smp_questions

Quiz by smp

20 questions

Q1
What is the difference between the maximum and minimum values in a dataset called?
Whisker
IQR
Variance
Median
Range
60s
Q2
What is the relationship between the Cumulative Distribution Function (CDF) and the Probability Density Function (PDF)?
The PDF is the discrete version of the CDF
The PDF is equal to the CDF at all points
The PDF is the derivative of the CDF
The CDF and PDF are not related
The CDF is the integral of the PDF over a range of values
120s
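The relationship between the two functions can be checked numerically. The sketch below uses the standard normal distribution as an illustrative choice (the question names no distribution); the helper names and step sizes are assumptions made for this example.

```python
import math

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    """Cumulative distribution of the standard normal, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def numeric_derivative(f, x, h=1e-5):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def numeric_integral(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

x = 0.7  # arbitrary evaluation point
slope_of_cdf = numeric_derivative(normal_cdf, x)
area_under_pdf = numeric_integral(normal_pdf, -1.0, 1.0)

# The slope of the CDF at x should match the PDF at x, and the area under
# the PDF on [-1, 1] should match the CDF difference over the same range.
print(abs(slope_of_cdf - normal_pdf(x)) < 1e-6)
print(abs(area_under_pdf - (normal_cdf(1.0) - normal_cdf(-1.0))) < 1e-6)
```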
Q3
There are 5 brown cows and 10 white cows in a farm. The fence is broken and two cows run away. What is the probability that a brown cow runs away first, then a white cow runs away?
0.66
0.11
0.24
0.18
0.34
60s
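A short sketch of the arithmetic behind this kind of question, assuming each cow is equally likely to escape and the first cow does not return before the second one leaves (sampling without replacement). Both assumptions are this example's, not stated in the question.

```python
from fractions import Fraction

brown, white = 5, 10
total = brown + white

# First escape: any of the 15 cows, 5 of which are brown.
p_brown_first = Fraction(brown, total)

# Second escape: 14 cows remain, and all 10 white cows are still there.
p_white_second = Fraction(white, total - 1)

p_sequence = p_brown_first * p_white_second
print(p_sequence, round(float(p_sequence), 2))  # 5/21 0.24
```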
Q4
A portfolio manager is trying to decide on the optimal investment for a customer. The customer is investing 100 units of money, and if the portfolio fails, i.e. goes bankrupt, all the money is gone. The customer has a squared risk perception of the expected loss, meaning "a 5% risk creates the perception of an additional -(0.05*100)^2 = -25 units of return".
What is the optimal choice?
return: 18 units of money; bankruptcy probability: 4%
return: 14 units of money; bankruptcy probability: 2%
return: 12 units of money; bankruptcy probability: 1%
return: 10 units of money; bankruptcy probability: 0%
return: 16 units of money; bankruptcy probability: 3%
120s
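The squared penalty in the question can be tabulated for each option. The sketch below takes one reading of the wording: perceived return equals the stated return minus the squared expected loss, with nothing else deducted. That reading is an assumption; a reading that also subtracts the expected loss itself would give different numbers. The function names are hypothetical.

```python
STAKE = 100  # units invested, from the question

def risk_penalty(bankrupt_percent, stake=STAKE):
    """Squared expected loss, e.g. 5% of 100 units -> 5 ** 2 = 25."""
    expected_loss = stake * bankrupt_percent / 100
    return expected_loss ** 2

def perceived_return(stated_return, bankrupt_percent, stake=STAKE):
    """Assumed reading: stated return minus the squared-risk penalty."""
    return stated_return - risk_penalty(bankrupt_percent, stake)

# (stated return in units, bankruptcy probability in percent), in quiz order
options = [(18, 4), (14, 2), (12, 1), (10, 0), (16, 3)]

for stated, percent in options:
    print(stated, percent, perceived_return(stated, percent))
```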
Q5
Which of the following is not a reason for outliers to occur?
z transformation
Sampling error
Measurement error
Incorrect data entry
Genuine unusual data values
60s
Q6
Dataset shift: Violation of the assumption that the training and testing data follow the same distribution.
When do we have no exact solution to the unexplained error caused by dataset shift?
random splitting is performed
the dynamics of the data change over time
biased splitting is performed
stratified splitting is performed
the curse of dimensionality occurs due to an abundant number of features
120s
Q7
Class overlap: It occurs when instances of more than one class share a common region in the data space.
Which of the following may be a solution when it occurs for the dependent variable?
i. Transformation of the target variable
ii. Transformation of the independent variables
iii. Using more complex models
only iii
ii, iii
only i
i, ii, iii
i, ii
120s
Q8
Which of the below is not correct regarding missing data handling?
Filling the missing values with the median for numeric values and with the majority class for categoric values is a free lunch method
The goal is minimizing the impact of missing data on the analysis, preventing biased or incomplete results
Listwise deletion, excluding cases with any missing values, is an option
Imputing the missing values in an input variable by using the information in the other input variables can cause correlation; thus, it comes with a cost
Pairwise deletion, excluding cases with missing values in specific variables, is an option
120s
Q9
Small disjunct: a data subset that covers only a few training examples.
Which of the solutions below best addresses the potential problems of small disjuncts?
Removing the noise from the rare subset
Undersampling the rare subset
Ignoring the rare subset in the analysis
Creating a specialized modeling solution for the rare subset.
Using a single learner for all of the data
120s
Q10
Which of the bias examples below is not correct?
We have 2 main products, and customers prefer the cheaper one. We want to create a third, decoy product priced close to the more expensive product in order to promote the more expensive of the 2 main products by using the endowment effect.
The outcome of an analytics project can help the business department make fewer decisions, thus preventing choice overload.
A department needs to overcome the status quo bias to develop an analytics solution that replaces a legacy non-analytics solution.
Involving the business department more in the analytics project increases the project's acceptance by that department due to the IKEA effect.
While planning the project schedule, dividing the project into small steps instead of guessing the total project time prevents the planning fallacy.
300s
Q11
We assume the house prices can be modeled with a simple linear regression with the intercept = +20 000 and the coefficient beta_m2 = +1000. What is the average absolute unexplained error for the two houses given below with the given predictions?
house1_m2 = 100, y_pred_house1 = 100 000 dollars
house2_m2 = 160, y_pred_house2 = 190 000 dollars
15 k
10 k
20 k
25 k
30 k
120s
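The calculation can be sketched as follows. The code compares each listed value with the output of the stated regression line; treating the listed `y_pred` values as the figures to compare against the line is an assumption of this example. The absolute gap is the same whichever side is called the prediction.

```python
INTERCEPT = 20_000  # from the question
BETA_M2 = 1_000     # price increase per square meter, from the question

def model_price(m2):
    """Price given by the stated simple linear regression."""
    return INTERCEPT + BETA_M2 * m2

# (area in m2, value listed in the question)
houses = [(100, 100_000), (160, 190_000)]

abs_errors = [abs(listed - model_price(m2)) for m2, listed in houses]
mean_abs_error = sum(abs_errors) / len(abs_errors)

print(abs_errors, mean_abs_error)  # [20000, 10000] 15000.0
```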
Q12
Variance-bias tradeoff: The balance between overfitting and underfitting.
Bias: It is the error caused by not fitting the data exactly, that is, by caution against fully believing the available information. It is mainly a limitation of the model type.
Variance: It is the variation in the average errors across different subsets of the same population, i.e. different training subsets. Since the train and test data are part of a 'bigger pool' of data, high variance may cause poor test performance because of this variation, which reflects a lack of generalization during training.
Which of the below is not correct?
Overfitting occurs when a model is too complex and fits the noise in the data.
Bias is the average of the same model's errors when it is trained on different datasets.
While lasso (L1 regularization) adds the squared values of the weights to the loss function, ridge (L2 regularization) adds the absolute values.
Variance is the variation of the same model's errors when it is trained on different datasets.
Underfitting occurs when a model is too simple to capture the complexity of the data.
300s
Q13
Confounding bias: It occurs when a regressor masks/distorts the association between another regressor and the regressand.
Omitted variable bias (OVB): It is a special type of confounding bias that occurs when the omission of a regressor adds an untrue association between another regressor and the regressand.
Which of the below is not correct?
Confounding occurs when the effect of the independent variable on the dependent variable is mixed with the effect of a third variable, while OVB occurs when a relevant variable is left out of the analysis.
Checking the coefficients and the signs of the variables with the business expert is a decent way to investigate the existence of OVB.
OVB may lead to a spurious correlation between the independent variable and the dependent variable.
Starting a project with a wider variable pool, potentially from pre-planned data marts, may be a decent way to prevent OVB.
Removing a variable not correlated with the dependent variable may cause OVB.
300s
Q14
Which of the below can be a solution whenever a prediction model cannot be created for any reason?
i. Using the mean value as the prediction result is the most basic approach.
ii. If the change in a time series data is slow, the previous value (t-1) can be used.
iii. If a time series data shows seasonality, the value from the previous season can be used.
iii
i, ii, iii
i, iii
ii, iii
i, ii
60s
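The three fallback approaches listed in the question can be sketched as simple baseline forecasters. The function names and the toy series are made up for illustration, and the season length is assumed to be known in advance.

```python
def mean_forecast(history):
    """Predict the average of everything observed so far."""
    return sum(history) / len(history)

def last_value_forecast(history):
    """Predict the previous value (t-1); suited to slowly changing series."""
    return history[-1]

def seasonal_naive_forecast(history, season_length):
    """Predict the value observed one full season earlier."""
    return history[-season_length]

# Toy series with an assumed season length of 4 (illustrative numbers only)
series = [10, 20, 30, 40, 12, 22, 32, 42]

print(mean_forecast(series))               # 26.0
print(last_value_forecast(series))         # 42
print(seasonal_naive_forecast(series, 4))  # 12
```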
Q15
Mark Twain: "Lies, damned lies, and statistics"
Which statistical trickery below is not given a correct example?
Trickery with mode: A student who always mentions his flawless science exam results while not mentioning his varied, mediocre exam scores in other subjects.
Trickery with range: A political party that advertises its small number of young nominees for the national congress and claims representation of all ages, whereas the majority of the nominees are old.
Trickery with mean: A mid-experienced employee who asks during the interview for the average salary in a company, including the upper management.
Trickery with variance: A store owner who attempts tax evasion by registering only her numerous small-volume sales while hiding her few but high-volume sales.
Trickery with median: A portfolio manager who presents last year as an achievement due to the earnings common to many of her customers, while not mentioning the bankruptcy of two of her big customers' portfolios.
300s
