
smp_questions
Quiz by smp
What is the difference between the maximum and minimum values in a dataset called?
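A minimal Python sketch of the quantity this question describes. The helper name `max_min_difference` and the sample values are hypothetical, chosen only for illustration.

```python
def max_min_difference(values):
    """Return max(values) - min(values) for a non-empty dataset."""
    if not values:
        raise ValueError("undefined for an empty dataset")
    return max(values) - min(values)


# Hypothetical sample data
sample = [4, 9, 1, 7, 3]
print(max_min_difference(sample))  # 8
```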
What is the relationship between the Cumulative Distribution Function (CDF) and the Probability Density Function (PDF)?
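For a continuous random variable, the CDF F(x) is the integral of the PDF f(t) from -infinity to x, and f(x) = F'(x) wherever F is differentiable. The sketch below checks both directions numerically. It assumes a standard normal distribution as the example, and the helper names (`normal_pdf`, `normal_cdf`, `derivative`, `integrate`) are hypothetical. The finite-difference step and the trapezoidal grid are illustrative choices, not tuned values.

```python
import math


def normal_pdf(x):
    # Standard normal density f(x)
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def normal_cdf(x):
    # Standard normal CDF F(x), expressed through the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def derivative(f, x, h=1e-5):
    # Central finite-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)


def integrate(f, a, b, n=10_000):
    # Trapezoidal-rule approximation of the integral of f over [a, b]
    step = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * step)
    return total * step


x = 1.0
print(derivative(normal_cdf, x), normal_pdf(x))        # both about 0.2420
print(integrate(normal_pdf, -10.0, x), normal_cdf(x))  # both about 0.8413
```

The lower limit of -10 stands in for -infinity; the standard normal density is negligible below it.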
There are 5 brown cows and 10 white cows on a farm. The fence is broken and two cows run away. What is the probability that a brown cow runs away first, then a white cow?
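A worked version of the arithmetic. It assumes every remaining cow is equally likely to be the next to escape, so the two escapes are draws without replacement. `fractions.Fraction` keeps the result exact.

```python
from fractions import Fraction

brown, white = 5, 10
total = brown + white

# First escapee is brown: 5 of the 15 cows
p_brown_first = Fraction(brown, total)
# Second escapee is white: 10 of the remaining 14 cows (no replacement)
p_white_second = Fraction(white, total - 1)

p = p_brown_first * p_white_second
print(p)  # 5/21
```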
A portfolio manager is trying to decide on the optimal investment for a customer. The customer is investing 100 units of money, and if the portfolio fails (i.e., goes bankrupt), all the money is lost. The customer perceives risk as the square of the expected loss, meaning "a 5% risk creates the perception of an additional -(0.05*100)^2 = -25 units of return".
What is the optimal choice?
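The answer options are not reproduced here, so the sketch below scores two hypothetical options, "A" and "B", with made-up gains and failure probabilities. It assumes the perceived value of an option is its expected monetary return minus the squared expected loss, where the expected loss is the failure probability times the 100 units invested. How the question combines the two terms is an assumption; only the penalty formula comes from the text.

```python
CAPITAL = 100  # units invested, from the question


def perceived_return(gain_if_success, p_fail, capital=CAPITAL):
    """Assumed scoring rule: expected monetary return minus the squared expected loss.

    gain_if_success: profit in units if the portfolio does not fail (hypothetical input)
    p_fail: probability that the portfolio fails and the whole capital is lost
    """
    expected_return = (1 - p_fail) * gain_if_success - p_fail * capital
    expected_loss = p_fail * capital
    risk_penalty = expected_loss ** 2  # e.g. p_fail = 0.05 -> (0.05 * 100) ** 2 = 25
    return expected_return - risk_penalty


# Hypothetical options, for illustration only: name -> (gain_if_success, p_fail)
options = {"A": (10, 0.0), "B": (50, 0.05)}
scores = {name: perceived_return(g, p) for name, (g, p) in options.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Because the penalty grows with the square of the failure probability, a higher promised gain can still lose to a safer option once the risk rises.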
Dataset shift: Violation of the assumption that the training and testing data follow the same distribution.
When do we have no exact solution to the unexplained error caused by dataset shift?
Class overlap: It occurs when instances of more than one class share a common region in the data space.
Which of the following may be a solution when it occurs for the dependent variable?
i. Transformation of the target variable
ii. Transformation of the independent variables
iii. Using more complex models
Which of the below is not correct regarding missing data handling?
Small disjunct: a data subset that covers only a few training examples.
Which of the below solutions best addresses the potential problems caused by small disjuncts?
Which of the below statements about biases is not correct?
We assume house prices can be modeled with a simple linear regression with intercept = +20 000 and coefficient beta_m2 = +1000. What is the mean absolute unexplained error for the two houses given below, with the given predictions?
house1_m2 = 100, y_pred_house1 = 100 000 dollars
house2_m2 = 160, y_pred_house2 = 190 000 dollars
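A sketch of the arithmetic. It assumes the listed `y_pred` values are compared against the output of the regression line (price = 20 000 + 1000 * m2) and that the unexplained error for each house is the absolute difference between the two. The helper `model_price` is hypothetical.

```python
intercept = 20_000
beta_m2 = 1_000

# Values listed in the question: (area in m2, listed price in dollars)
houses = {"house1": (100, 100_000), "house2": (160, 190_000)}


def model_price(m2):
    # Simple linear regression: price = intercept + beta_m2 * m2
    return intercept + beta_m2 * m2


errors = [abs(listed - model_price(m2)) for m2, listed in houses.values()]
mean_abs_error = sum(errors) / len(errors)
print(errors, mean_abs_error)  # [20000, 10000] 15000.0
```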
Variance-bias tradeoff: The balance between overfitting and underfitting.
Bias: It is the error caused by not fitting the data exactly, which stems from caution against fully trusting the available information. It is mainly a limitation of the model type.
Variance: It is the variation in the average errors across different subsets of the same population, i.e., different training subsets. Since the training and test data are both drawn from a 'bigger pool' of data, high variance may cause poor test performance: the model depends too much on the particular training subset it saw, which is a lack of generalization during training.
Which of the below is not correct?
Confounding bias: It occurs when a regressor masks or distorts the association between another regressor and the regressand.
Omitted variable bias: It is a special type of confounding bias that occurs when omitting a regressor introduces a spurious association between another regressor and the regressand.
Which of the below is not correct?
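A small simulation of omitted variable bias with made-up coefficients. A confounder `z` drives both `x` and `y`. Regressing `y` on `x` alone credits part of z's effect to x, whereas including both regressors recovers the coefficients used to generate the data. The data-generating process (x = z + noise, y = 2x + 3z + noise) is hypothetical, and the fit is ordinary least squares solved through the normal equations.

```python
import random

random.seed(0)
n = 50_000

# Hypothetical data-generating process: z confounds x and y
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 1) for zi in z]
y = [2 * xi + 3 * zi + random.gauss(0, 1) for xi, zi in zip(x, z)]


def centred(v):
    # Subtract the mean so the intercept drops out of the slope formulas
    m = sum(v) / len(v)
    return [a - m for a in v]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


xc, zc, yc = centred(x), centred(z), centred(y)

# OLS slope of y on x alone (z omitted)
b_simple = dot(xc, yc) / dot(xc, xc)

# OLS slopes of y on x and z: solve the 2x2 normal equations by Cramer's rule
sxx, szz, sxz = dot(xc, xc), dot(zc, zc), dot(xc, zc)
sxy, szy = dot(xc, yc), dot(zc, yc)
det = sxx * szz - sxz ** 2
b_x = (szz * sxy - sxz * szy) / det
b_z = (sxx * szy - sxz * sxy) / det

print(f"x only: {b_simple:.2f}   x and z: {b_x:.2f}, {b_z:.2f}")
```

With this setup the single-regressor slope should land near 3.5 rather than 2, since it absorbs 3 * cov(x, z) / var(x) = 3 * 1/2 from the omitted z.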
Which of the below can be a solution whenever a prediction model cannot be created for any reason?
i. Using the mean value as the prediction result is the most basic approach.
ii. If a time series changes slowly, the previous value (t-1) can be used.
iii. If a time series shows seasonality, the value from the previous season can be used.
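The three fallbacks in the list can be sketched as one-line forecasters for the next time step. The function names and the quarterly sample series (season length 4) are hypothetical.

```python
def mean_forecast(history):
    # i. Predict the historical mean
    return sum(history) / len(history)


def naive_forecast(history):
    # ii. Predict the previous value (t-1)
    return history[-1]


def seasonal_naive_forecast(history, season_length):
    # iii. Predict the value observed one season earlier (t - season_length)
    return history[-season_length]


# Hypothetical quarterly series covering two years
history = [10, 20, 30, 40, 12, 22, 32, 42]
print(mean_forecast(history), naive_forecast(history), seasonal_naive_forecast(history, 4))
# 26.0 42 12
```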
Mark Twain: "Lies, damned lies, and statistics"
Which statistical trick below is paired with an incorrect example?
Assume we have built a fraud-detection solution whose output is a list of fraud probability scores, one per customer. Which is a valid range for these probability scores?
When selecting the next split variable, a decision tree can optimize only the next step, i.e., the branches created by that variable. Which of the below algorithms describes this situation?
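A sketch of a single greedy split search. It assumes Gini impurity as the criterion; entropy and other criteria are also used in practice. Every candidate (feature, threshold) pair is scored only by how pure the two resulting branches are right now, with no look-ahead to later splits. The function names and the toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    # Gini impurity of a list of class labels
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (weighted_gini, feature_index, threshold) for the best immediate split.

    Only the current split is optimized; how later splits would turn out is ignored.
    """
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must produce two non-empty branches
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


# Hypothetical toy data: two numeric features, two classes
rows = [(1, 5), (2, 6), (3, 1), (4, 2)]
labels = ["a", "a", "b", "b"]
print(best_split(rows, labels))  # (0.0, 0, 2)
```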
What strategy cannot help reduce overfitting in decision trees?
Which of the below statements regarding ensembling is not correct?
Considering the K-medoids algorithm, use points (0, 3), (2, 1), and (-2, 2) to form a single cluster. What is the centroid for this cluster?
TIP 1: Use Euclidean distance.
TIP 2: Take the difference between k-medoids and k-means into account.
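A sketch contrasting the two kinds of centre for the three points in the question, using Euclidean distance as TIP 1 suggests. It assumes the usual k-medoids rule that the centre must be one of the cluster's own points (the one with the smallest total distance to the others), while k-means uses the coordinate-wise mean, which need not be a data point. The helper `total_distance` is hypothetical.

```python
import math

points = [(0, 3), (2, 1), (-2, 2)]


def total_distance(candidate, pts):
    # Sum of Euclidean distances from candidate to every point in the cluster
    return sum(math.dist(candidate, p) for p in pts)


# k-means: the centre is the coordinate-wise mean
mean_centre = tuple(sum(c) / len(points) for c in zip(*points))

# k-medoids: the centre is the data point with the smallest total distance
medoid = min(points, key=lambda p: total_distance(p, points))

print(mean_centre, medoid)  # (0.0, 2.0) (0, 3)
```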