
smp_questions
Quiz by smp
- Q1
What is the difference between the maximum and minimum values in a dataset called?
Whisker
IQR
Variance
Median
Range
60s - Q2
What is the relationship between the Cumulative Distribution Function (CDF) and the Probability Density Function (PDF)?
The PDF is the discrete version of the CDF
The PDF is equal to the CDF at all points
The PDF is the derivative of the CDF
The CDF and PDF are not related
The CDF is the integral of the PDF over a range of values
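The relationship between the two functions can be checked numerically. The following is a minimal sketch, assuming a standard normal distribution as the example (the question itself does not name a distribution); the helper names `normal_pdf`, `normal_cdf`, `numeric_derivative`, and `numeric_integral` are hypothetical and chosen only for illustration.

```python
import math


def normal_pdf(x):
    """Density of the standard normal distribution (assumed example)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    """Cumulative distribution of the standard normal, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def numeric_derivative(f, x, h=1e-5):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def numeric_integral(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


# The slope of the CDF at a point should match the PDF at that point.
slope_gap = abs(numeric_derivative(normal_cdf, 0.7) - normal_pdf(0.7))

# The area under the PDF over a range should match the change in the CDF.
area_gap = abs(numeric_integral(normal_pdf, -1.0, 1.0)
               - (normal_cdf(1.0) - normal_cdf(-1.0)))

print(slope_gap, area_gap)  # both gaps should be very close to zero
```

Both printed gaps should be tiny for this continuous example, which is consistent with differentiation and integration linking the two functions.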
120s - Q3
There are 5 brown cows and 10 white cows on a farm. The fence is broken and two cows run away. What is the probability that a brown cow runs away first and a white cow runs away second?
0.66
0.11
0.24
0.18
0.34
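The arithmetic behind this kind of question can be sketched as two dependent draws without replacement. This is a minimal sketch, assuming every cow is equally likely to run away at each step; the function name `prob_brown_then_white` is hypothetical.

```python
from fractions import Fraction


def prob_brown_then_white(brown, white):
    """P(first escapee is brown AND second escapee is white), without replacement."""
    total = brown + white
    p_brown_first = Fraction(brown, total)
    # After a brown cow has left, one fewer cow remains in total.
    p_white_second = Fraction(white, total - 1)
    return p_brown_first * p_white_second


p = prob_brown_then_white(5, 10)
print(p, round(float(p), 2))
```

Using `Fraction` keeps the intermediate result exact, so rounding only happens when the value is displayed.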
60s - Q4
A portfolio manager is trying to decide on the optimal investment for a customer. The customer invests 100 units of money, and if the portfolio fails (i.e. goes bankrupt), all the money is lost. The customer perceives risk as the square of the expected loss, meaning that "a 5% risk creates the perception of an additional -(0.05*100)^2 = -25 units of return".
What is the optimal choice?
return: 18 units of money; bankruptcy probability: 4%
return: 14 units of money; bankruptcy probability: 2%
return: 12 units of money; bankruptcy probability: 1%
return: 10 units of money; bankruptcy probability: 0%
return: 16 units of money; bankruptcy probability: 3%
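One way to compare the options is to subtract the squared risk penalty from each stated return. The sketch below assumes that the perceived value is simply the stated return minus (probability × invested amount)², as in the question's own 5% example, and that no separate expected-loss term is subtracted. The name `perceived_return` is hypothetical.

```python
def perceived_return(stated_return, bankrupt_pct, invested=100):
    """Stated return minus the squared-risk penalty described in the question.

    Assumption: penalty = (bankrupt_pct% * invested) ** 2, nothing else.
    """
    risk_penalty = (bankrupt_pct * invested / 100) ** 2
    return stated_return - risk_penalty


# (stated return, bankruptcy probability in percent) for each option
options = [(18, 4), (14, 2), (12, 1), (10, 0), (16, 3)]

scores = {option: perceived_return(*option) for option in options}
best = max(scores, key=scores.get)

for option, score in scores.items():
    print(option, score)
print("best under this assumption:", best)
```

Under a different reading of the risk perception (for example, also subtracting the expected loss itself), the scores would change, so the ranking here holds only for the stated assumption.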
120s - Q5
Which of the following is not a reason for outliers to occur?
z transformation
Sampling error
Measurement error
Incorrect data entry
Genuine unusual data values
60s - Q6
Dataset shift: Violation of the assumption that the training and testing data follow the same distribution.
When do we have no exact solution to the unexplained error caused by dataset shift?
random splitting is performed
the dynamics of the data change over time
biased splitting is performed
stratified splitting is performed
the curse of dimensionality occurs due to an abundant number of features
120s - Q7
Class overlap: It occurs when instances of more than one class share a common region in the data space.
Which of the following may be a solution when it occurs for the dependent variable?
i. Transformation of the target variable
ii. Transformation of the independent variables
iii. Using more complex models
only iii
ii, iii
only i
i, ii, iii
i, ii
120s - Q8
Which of the below is not correct regarding missing data handling?
Filling the missing values with the median for numeric values and with the majority class for categoric values is a free lunch method
The goal is minimizing the impact of missing data on the analysis, preventing biased or incomplete results
Listwise deletion, excluding cases with any missing values, is an option
Imputing the missing values in an input variable by using the information in the other input variables can cause correlation; thus, comes with a cost
Pairwise deletion, excluding cases with missing values in specific variables, is an option
120s - Q9
Small disjunct: a data subset that covers only a few training examples.
Which of the solutions below best addresses the potential problems of small disjuncts?
Removing the noise from the rare subset
Undersampling the rare subset
Ignoring the rare subset in the analysis
Creating a specialized modeling solution for the rare subset.
Using a single learner for all of the data
120s - Q10
Which of the bias examples below is not correct?
We have two main products, and customers prefer the cheaper one. We want to create a third, decoy product priced close to the more expensive product in order to promote the more expensive of the two main products by using the endowment effect.
The outcome of an analytics project can help the business department make fewer decisions and thus prevent choice overload.
A department needs to overcome the status quo bias to develop an analytics solution that replaces a legacy non-analytics solution.
Involving the business department more in the analytics project increases their acceptance of the project due to the IKEA effect.
While planning the project schedule, dividing the project into small steps instead of guessing the total project time prevents the planning fallacy.
300s - Q11
We assume house prices can be modeled with a simple linear regression with intercept = +20 000 and coefficient beta_m2 = +1000. What is the mean absolute unexplained error for the two houses below with the given predictions?
house1_m2 = 100, y_pred_house1 = 100 000 dollars
house2_m2 = 160, y_pred_house2 = 190 000 dollars
15 k
10 k
20 k
25 k
30 k
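The calculation can be sketched as follows. This is a minimal sketch, assuming the unexplained error of each house is the absolute difference between the value implied by the stated regression line and the listed `y_pred` figure; the function name `model_price` is hypothetical.

```python
def model_price(m2, intercept=20_000, beta_m2=1_000):
    """Price implied by the simple linear regression stated in the question."""
    return intercept + beta_m2 * m2


# (area in m2, listed y_pred in dollars) for the two houses
houses = [(100, 100_000), (160, 190_000)]

# Absolute gap between the regression line and each listed figure.
abs_errors = [abs(model_price(m2) - listed) for m2, listed in houses]
mean_abs_error = sum(abs_errors) / len(abs_errors)

print(abs_errors, mean_abs_error)
```

Taking absolute values before averaging matters here: the two errors have opposite signs, so a plain average would partly cancel them.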
120s - Q12
Variance-bias tradeoff: The balance between overfitting and underfitting.
Bias: It is the error caused by not fitting exactly to the data and caused by the cautiousness against completely believing the available information. It is mainly a limitation caused by the model type.
Variance: It is the variation of the average errors across different subsets of the same population, i.e. different training subsets. Since the training and test data are part of a 'bigger pool' of data, high variance may cause poor test performance because of this variation, which reflects a lack of generalization during training.
Which of the below is not correct?
Overfitting occurs when a model is too complex and fits the noise in the data.
Bias is the average of the same models' errors when they are trained on different datasets.
While lasso (L1 regularization) adds the squared values of the weights to the loss function, ridge (L2 regularization) adds the absolute values.
Variance is the variation of the same models' errors when they are trained on different datasets.
Underfitting occurs when a model is too simple to capture the complexity of the data.
300s - Q13
Confounding bias: It occurs when a regressor masks/distorts the association between another regressor and the regressand.
Omitted variable bias (OVB): It is a special type of confounding bias that occurs when the omission of a regressor adds an untrue association between another regressor and the regressand.
Which of the below is not correct?
Confounding occurs when the effect of the independent variable on the dependent variable is mixed with the effect of a third variable, while OVB occurs when a relevant variable is left out of the analysis.
Checking the coefficients and the signs of the variables with the business expert is a decent way to investigate the existence of OVB.
OVB may lead to a spurious correlation between the independent variable and the dependent variable.
Starting a project with a wider variable pool, potentially from pre-planned data marts, may be a decent way to prevent OVB.
Removing a variable not correlated with the dependent variable may cause OVB.
300s - Q14
Which of the below can be a solution whenever a prediction model cannot be created for any reason?
i. Using the mean value as the prediction result is the most basic approach.
ii. If the change in a time series data is slow, the previous value (t-1) can be used.
iii. If a time series data shows seasonality, the value from the previous season can be used.
iii
i, ii, iii
i, iii
ii, iii
i, ii
60s - Q15
Mark Twain: "Lies, damned lies, and statistics"
Which statistical trickery below is not given a correct example?
Trickery with mode: A student who always mentions his flawless science exam results while not mentioning his varied mediocre exam scores in other topics.
Trickery with range: A political party advertises its small number of young nominees for the national congress and claims representation from all ages, whereas the majority of the nominees are old.
Trickery with mean: A mid-level employee who, during the interview, asks for the company's average salary, which includes upper management.
Trickery with variance: A store owner who attempts tax evasion by registering only her numerous small-volume sales while hiding her few but high-volume sales.
Trickery with median: A portfolio manager who presents last year as an achievement because many of her customers had earnings, while not mentioning the bankruptcy of two of her big customers' portfolios.
300s