smp_questions

Quiz by smp

20 questions
  • Q1

    What is the difference between the maximum and minimum values in a dataset called?

    Whisker

    IQR

    Variance

    Median

    Range

    60s
  • Q2

    What is the relationship between the Cumulative Distribution Function (CDF) and the Probability Density Function (PDF)?

    The PDF is the discrete version of the CDF

    The PDF is equal to the CDF at all points

    The PDF is the derivative of the CDF

    The CDF and PDF are not related

    The CDF is the integral of the PDF over a range of values

    120s
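For reference, the standard relationship for a continuous random variable can be written as below. The symbols $X$, $f$ (density) and $F$ (distribution function) are introduced here for illustration and do not appear in the question itself.

```latex
F(x) = \int_{-\infty}^{x} f(t)\,dt,
\qquad
f(x) = \frac{d}{dx} F(x),
\qquad
P(a < X \le b) = F(b) - F(a) = \int_{a}^{b} f(t)\,dt
```

The middle identity holds wherever $F$ is differentiable.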
  • Q3

    There are 5 brown cows and 10 white cows on a farm. The fence is broken and two cows run away. What is the probability that a brown cow runs away first, and then a white cow runs away?

    0.66

    0.11

    0.24

    0.18

    0.34

    60s
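A minimal sketch of the arithmetic behind this kind of question, assuming each remaining cow is equally likely to escape and the two escapes happen one after the other without replacement. The variable names are illustrative only.

```python
from fractions import Fraction

brown, white = 5, 10
total = brown + white

# First cow is brown: 5 of 15.
# Second cow is white: 10 of the remaining 14.
p_brown_then_white = Fraction(brown, total) * Fraction(white, total - 1)

print(p_brown_then_white)                   # 5/21
print(round(float(p_brown_then_white), 2))  # 0.24
```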
  • Q4

    A portfolio manager is trying to decide on the optimal investment for a customer. The customer is investing 100 units of money, and if the portfolio fails, i.e. goes bankrupt, all the money is gone. The customer perceives risk as the square of the expected loss, meaning "a 5% risk creates the perception of an additional -(0.05*100)^2 = -25 units of return".

    What is the optimal choice?

    return: 18 units of money; bankruptcy probability: 4%

    return: 14 units of money; bankruptcy probability: 2%

    return: 12 units of money; bankruptcy probability: 1%

    return: 10 units of money; bankruptcy probability: 0%

    return: 16 units of money; bankruptcy probability: 3%

    120s
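One possible reading of the question can be sketched as follows: score each option as its stated return minus the squared expected loss, exactly as in the quoted 5% example. This scoring rule, and the names `perceived_value` and `options`, are assumptions made for illustration; the quiz does not state whether the return should also be weighted by the survival probability.

```python
INVESTED = 100  # units of money at stake

# (return in units, bankruptcy probability in percent) for each listed option
options = [(18, 4), (14, 2), (12, 1), (10, 0), (16, 3)]

def perceived_value(ret, pct_bankrupt, invested=INVESTED):
    """Stated return minus the squared expected loss (assumed reading)."""
    expected_loss = pct_bankrupt * invested / 100
    return ret - expected_loss ** 2

scores = {opt: perceived_value(*opt) for opt in options}
best = max(scores, key=scores.get)

for opt, score in scores.items():
    print(opt, score)
print("best under this reading:", best)
```

Under this assumed rule the penalty grows quadratically, so a small increase in bankruptcy probability quickly outweighs a few extra units of return.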
  • Q5

    Which of the following is not a reason for outliers to occur?

    z transformation

    Sampling error

    Measurement error

    Incorrect data entry

    Genuine unusual data values

    60s
  • Q6

    Dataset shift: Violation of the assumption that the training and testing data follow the same distribution.

    When do we have no exact solution to the unexplained error caused by dataset shift?

    random splitting is performed

    the dynamics of the data change over time

    biased splitting is performed

    stratified splitting is performed

    the curse of dimensionality occurs due to an abundant number of features

    120s
  • Q7

    Class overlap: It occurs when instances of more than one class share a common region in the data space.

    Which of the following may be a solution when it occurs for the dependent variable?

    i. Transformation of the target variable 

    ii. Transformation of the independent variables 

    iii. Using more complex models

    only iii

    ii, iii

    only i

    i, ii, iii

    i, ii

    120s
  • Q8

    Which of the following is not correct regarding missing data handling?

    Filling the missing values with the median for numeric variables and with the majority class for categorical variables is a free-lunch method

    The goal is minimizing the impact of missing data on the analysis, preventing biased or incomplete results

    Listwise deletion, excluding cases with any missing values, is an option

    Imputing the missing values in an input variable by using the information in the other input variables can cause correlation and thus comes with a cost

    Pairwise deletion, excluding cases with missing values in specific variables, is an option

    120s
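The simple imputation strategy mentioned in the options (median for numeric values, majority class for categorical values) can be sketched as below. The helper `impute` and the sample data are hypothetical, and missing values are assumed to be encoded as `None`.

```python
from statistics import median, mode

def impute(values, numeric):
    """Fill None entries with the median (numeric) or the most common value."""
    observed = [v for v in values if v is not None]
    fill = median(observed) if numeric else mode(observed)
    return [fill if v is None else v for v in values]

ages = [23, None, 31, 40, None]        # hypothetical numeric column
colors = ["red", "blue", None, "red"]  # hypothetical categorical column

print(impute(ages, numeric=True))      # [23, 31, 31, 40, 31]
print(impute(colors, numeric=False))   # ['red', 'blue', 'red', 'red']
```

Note that every filled entry takes the same value, so the spread of the imputed column shrinks compared with the observed data.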
  • Q9

    Small disjunct: a data subset that covers only a few training examples.

    Which of the solutions below best addresses the potential problems of small disjuncts?

    Removing the noise from the rare subset

    Undersampling the rare subset

    Ignoring the rare subset in the analysis

    Creating a specialized modeling solution for the rare subset.

    Using a single learner for all of the data

    120s
  • Q10

    Which of the statements below about biases is not correct?

    We have two main products, and customers prefer the cheaper one. To promote the more expensive of the two, we want to create a third, decoy product priced close to the expensive one, making use of the endowment effect.

    The outcome of an analytics project can leave the business department with fewer decisions to make, thus preventing choice overload.

    A department needs to overcome status quo bias to develop an analytics solution that replaces a legacy non-analytics solution.

    Involving the business department more in the analytics project increases its acceptance of the project due to the IKEA effect.

    While planning the project schedule, dividing the project into small steps instead of guessing the total project time helps prevent the planning fallacy.

    300s
  • Q11

    We assume that house prices can be modeled with a simple linear regression with intercept = +20 000 and coefficient beta_m2 = +1000. What is the average absolute unexplained error for the two houses below, given their predictions?

    house1_m2 = 100, y_pred_house1 = 100 000 dollars

    house2_m2 = 160, y_pred_house2 = 190 000 dollars 

    15 k

    10 k

    20 k

    25 k

    30 k

    120s
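A minimal sketch of the calculation, assuming the unexplained error of each house is the absolute gap between the value implied by the stated linear model and the value given in the question. The function `model_price` is a hypothetical helper.

```python
INTERCEPT = 20_000
BETA_M2 = 1_000

def model_price(m2):
    """Price implied by the stated simple linear regression."""
    return INTERCEPT + BETA_M2 * m2

# (area in m2, value given in the question as y_pred)
houses = [(100, 100_000), (160, 190_000)]

abs_errors = [abs(model_price(m2) - given) for m2, given in houses]
mean_abs_error = sum(abs_errors) / len(abs_errors)

print(abs_errors)      # [20000, 10000]
print(mean_abs_error)  # 15000.0
```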
  • Q12

    Variance-bias tradeoff: The balance between overfitting and underfitting.

    Bias: The error caused by not fitting the data exactly, which stems from caution against fully trusting the available information. It is mainly a limitation of the model type.

    Variance: The variation in the average errors across different subsets of the same population, i.e. different training subsets. Since the training and test data are parts of a 'bigger pool' of data, high variance may cause poor test performance because of this variation, which reflects a lack of generalization during training.

    Which of the statements below is not correct?

    Overfitting occurs when a model is too complex and fits the noise in the data. 

    Bias is the average of the same model's errors when it is trained on different datasets. 

    While lasso (L1 regularization) adds the squared values of the weights to the loss function, ridge (L2 regularization) adds the absolute values. 

    Variance is the variation of the same model's errors when it is trained on different datasets. 

    Underfitting occurs when a model is too simple to capture the complexity of the data. 

    300s
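For reference, the standard definitions of the two penalty terms can be sketched as below: the L1 (lasso) penalty sums the absolute values of the weights, and the L2 (ridge) penalty sums their squares. The function names and the regularization strength `lam` are illustrative names, not taken from the quiz.

```python
def l1_penalty(weights, lam=1.0):
    """Lasso term: lam times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam=1.0):
    """Ridge term: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)

weights = [0.5, -2.0, 3.0]  # hypothetical model weights

print(l1_penalty(weights))  # 5.5
print(l2_penalty(weights))  # 13.25
```

Either term is added to the training loss, so larger weights are penalized more heavily as `lam` grows.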
  • Q13

    Confounding bias: it occurs when a regressor masks/distorts the association between another regressor and the regressand. 

    Omitted variable bias (OVB): It is a special type of confounding bias that occurs when the omission of a regressor adds an untrue association between another regressor and the regressand. 

    Which of the below is not correct? 

    Confounding occurs when the effect of the independent variable on the dependent variable is mixed with the effect of a third variable, while OVB occurs when a relevant variable is left out of the analysis. 

    Checking the coefficients and the signs of the variables with the business expert is a decent way to investigate the existence of OVB. 

    OVB may lead to a spurious correlation between the independent variable and the dependent variable. 

    Starting a project with a wider variable pool, potentially drawn from data marts designed in advance, may be a decent way to prevent OVB. 

    Removing a variable not correlated with the dependent variable may cause OVB. 

    300s
  • Q14

    Which of the following can be a solution whenever a prediction model cannot be created for any reason? 

    i. Using the mean value as the prediction result is the most basic approach. 

    ii. If a time series changes slowly, the previous value (t-1) can be used. 

    iii. If a time series shows seasonality, the value from the previous season can be used. 

    iii

    i, ii, iii

    i, iii

    ii, iii

    i, ii

    60s
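The three fallback predictions described in statements i to iii can be sketched as simple baseline functions. The function names, the sample `history`, and the season length of 4 are hypothetical choices for illustration.

```python
from statistics import mean

def mean_forecast(series):
    """i. Predict the historical mean."""
    return mean(series)

def naive_forecast(series):
    """ii. Predict the previous value (t-1)."""
    return series[-1]

def seasonal_naive_forecast(series, season_length):
    """iii. Predict the value observed one season before the next step."""
    return series[-season_length]

history = [10, 20, 30, 40, 12, 22, 32, 42]  # hypothetical data, two seasons of 4

print(mean_forecast(history))               # 26
print(naive_forecast(history))              # 42
print(seasonal_naive_forecast(history, 4))  # 12
```

Baselines like these are also a useful yardstick: a fitted model that cannot beat them is usually not worth deploying.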
  • Q15

    Mark Twain: "Lies, damned lies, and statistics" 

    Which statistical trickery below is not paired with a correct example? 

    Trickery with mode: A student who always mentions his flawless science exam results while not mentioning his varied mediocre exam scores in other topics. 

    Trickery with range: A political party advertises its small number of young nominees for the national congress and claims representation from all ages, whereas the majority of the nominees are old. 

    Trickery with mean: A mid-experienced employee who, during the interview, asks for the average salary in the company, including the upper management. 

    Trickery with variance: A store owner who attempts tax evasion by registering only her numerous small-volume sales while hiding her few high-volume sales. 

    Trickery with median: A portfolio manager who presents last year as an achievement because many of her customers had earnings, while not mentioning the bankruptcy of two of her big customers' portfolios. 

    300s