
Psychass: Reliability, Validity, and Their Values/Formulas
Quiz by Gerard Dimaano
Rain is developing a psychological scale measuring social anxiety using 40 Likert-scale items. To assess how well these items measure the same construct, which reliability method should Rain use?
Ryan is reviewing the reliability of a test where all items are multiple-choice questions scored as correct or incorrect. However, the items vary in difficulty. Which index should he use to evaluate internal consistency?
Chuchay is constructing a speed test where each item has equal difficulty. Which internal consistency formula would best apply to her test?
Eudoxia correlates scores from the odd-numbered and even-numbered items of her test to estimate internal consistency. Which method is she using?
Chow used the Spearman-Brown Prophecy formula and got a reliability estimate of 0.65. To improve reliability, what should she do?
Ayan and Eve both rated a series of patient interviews using a categorical behavior scale. To measure the agreement between them, which statistic should be used?
Yuki coordinated a study in which five different raters categorized behavioral responses. Which reliability coefficient should be used to evaluate the agreement among them?
Nieve observed that some items on her test measured empathy while others assessed assertiveness. What does this indicate about the test's internal consistency?
Miffy administered a test and then calculated the Average Proportional Distance (APD) to determine how similar item responses were. What is APD primarily used to assess?
Ayan designed a test but accidentally split the items based on difficulty (e.g., easy in one half, difficult in the other) when computing split-half reliability. What error might this cause?
Pablo is tasked to measure the consistency of a personality test administered twice to the same group two weeks apart. He finds a high correlation between both scores. What type of reliability is this?
Stell noticed that the correlation between the test scores from Time 1 and Time 2 was inflated due to the test being administered just a day apart. What phenomenon most likely occurred?
Maloi designed two parallel forms of a musical aptitude test. She ensures that both have the same number of items, difficulty levels, and content coverage. Which error source is she minimizing?
Jhoanna is analyzing test scores using Classical Test Theory. She wants to compute the proportion of score variance due to true ability. What is this index called?
Mikha observed that the observed score of a trainee fluctuated due to fatigue and weather conditions. This inconsistency reflects what kind of error?
Denise developed a test to measure creativity using yes-no questions of varying difficulty. Which reliability estimate should she use to assess internal consistency?
Colet split her test items into odd and even numbered sets to check consistency. She then applied the Spearman-Brown Formula. What type of reliability is she estimating?
Anne and Elle independently rated applicants' performance in a dance audition. Their scores differed significantly. This inconsistency reflects what error?
Angela conducted a test but later learned half the participants didn't return for the retest due to an overseas tour. What reliability threat is this?
Justin wants to use the most universally applicable form of reliability for two equivalent test versions. What method is best suited for his study?
Jisoo received a raw score of 85 on an aptitude test. The manual reports a standard error of measurement (SEM) of 3. What does this imply at the 95% confidence level?
Jungkook is comparing two test scores to determine if they are significantly different. What statistical tool should he use?
In a college entrance test, V (Taehyung) develops a new predictor. He notices that it accurately detects those who will succeed. Which concept is best reflected?
Sana was rejected for a role based on a test that predicted failure, but she later proved highly capable. What classification error was made?
Suho analyzes his test's performance. He finds that it has high specificity. What does this tell him?
RM is conducting an item analysis for a dynamic (unstable) construct like mood. Which reliability estimate would best suit his test?
Woozi wants to improve his test's internal consistency. Which strategy should he avoid?
Lisa is studying how accurate the predicted scores are from a regression model. Which standard error should she examine?
In an audition, Nayeon sees that only 5 trainees were hired out of 100 applicants. What is the selection ratio?
Jin wants to know if his test can consistently identify individuals who truly have the talent. Which concept should he evaluate?
During an assessment, Seokjin obtained a raw score of 85. The test has a mean of 70 and a standard deviation of 10. What is Seokjin's Z-score?
An examinee has a Z-score of -1.0. Using the T-score formula, what is the equivalent T-score?
In a stanine system, a person who scores in the 77th percentile most likely falls into which stanine band?
What is the main purpose of computing the Standard Error of Measurement (SEM)?
You computed that Jennie's observed score is 105 and the SEM is 5. What is her 95% confidence interval?
Which of the following statements best describes the standard error of the difference (SEdiff)?
A test has a reliability coefficient of 0.81, and the standard deviation of the test is 12. What is the SEM?
(Formula: SEM = SD × √(1 - r))
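The formula above can be sketched in code. This is a minimal illustration only: the function names are hypothetical, the input values (SD = 15, r = 0.91, observed score = 100) are made up for the example and are not taken from any item in this quiz, and the 95% band assumes the usual normal-curve multiplier of 1.96.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - r), where r is the reliability coefficient."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def confidence_interval(observed: float, sem: float, z: float = 1.96):
    """Band around an observed score: observed +/- z * SEM (z = 1.96 for 95%)."""
    margin = z * sem
    return observed - margin, observed + margin


# Illustrative (made-up) values, not a quiz item.
sem = standard_error_of_measurement(sd=15.0, reliability=0.91)
low, high = confidence_interval(observed=100.0, sem=sem)
print(f"SEM = {sem:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

Many textbooks round the 95% multiplier to 2 SEM; pass z=2.0 if your course uses that convention.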
Which of the following reflects a true positive in the context of psychological assessment utility?
In a normally distributed test, what percent of scores fall between Z = -1 and Z = +1?
You are comparing the scores of two examinees, Rosé and Jisoo, on a test with known reliability. You want to determine if their scores differ significantly. Which computation will help you most?
Bells, a psychometrician, is asked to design a final exam for a college-level statistics course. She includes only multiple-choice questions that cover basic definitions and excludes computational or application items. What kind of validity might be threatened in this case?
Ria develops a screening test for ADHD and finds that her tool has a high correlation with an already validated ADHD checklist. Which form of validity is she providing evidence for?
A college entrance test predicts first-year GPA very well. Which type of validity is demonstrated?
A school counselor selects a personality test because students feel that it seems relevant and appropriate to their self-image, although there's no data supporting this. What type of validity is being described?
You are asked to evaluate a new job screening tool. The HR team wants to know whether the test adds anything beyond what their interview process already tells them. What should you evaluate?
You're administering a depression inventory to a group of patients, some with diagnosed depression and some without. If your tool can clearly distinguish between the two groups, which method of validation are you using?
In a COVID-19 mental health screening tool, a high number of people without anxiety are flagged as having it. This suggests the test might be weak in which area?
During test construction, the developers gather a panel of experts to rate whether each item on a stress inventory is "essential," "useful," or "not necessary." What psychometric method is being used here?
A clinical tool used for diagnosing anxiety yields consistent results across time but fails to correlate with known anxiety measures. What does this suggest about the test?
A researcher creates an academic aptitude test and finds that high scorers on the full test are consistently failing one particular item. What should they do next to improve construct validity?
A test developer reports a validity coefficient of 0.85 between a new job performance test and supervisor ratings. What does this suggest about the test?
Which of the following best represents an acceptable validity coefficient for a psychological test used in basic research?
If a test shows a validity coefficient of 0.10, what can we conclude?
Dee administers an entrance exam with a criterion validity coefficient of 0.75 with first-year college GPA. What does this imply?
Which of the following best explains a validity coefficient of 1.00?
A test developer finds that the new scale has a construct validity coefficient of 0.50. This suggests:
Which statement about validity coefficients is TRUE?
Eshie wants to justify the use of a new leadership assessment tool. She should aim for a minimum validity coefficient of:
What does a validity coefficient of 0.00 imply?
If a test has a validity coefficient of 0.60, what percentage of the variance in the criterion can be explained by the test?
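Several items in this quiz rely on standard conversions: Z-scores, T-scores, the Spearman-Brown formula, and the proportion of criterion variance explained by a validity coefficient. The sketch below shows those textbook formulas as small helper functions. The function names are hypothetical and all input values are made up for illustration; none are taken from the quiz items.

```python
def z_score(raw: float, mean: float, sd: float) -> float:
    """Z = (X - mean) / SD."""
    return (raw - mean) / sd


def t_score(z: float) -> float:
    """T = 50 + 10 * Z (T-scores have mean 50 and SD 10)."""
    return 50.0 + 10.0 * z


def spearman_brown(r: float, length_factor: float = 2.0) -> float:
    """Predicted reliability when test length is multiplied by length_factor.

    With length_factor = 2 this is the usual split-half correction.
    """
    return (length_factor * r) / (1.0 + (length_factor - 1.0) * r)


def variance_explained(validity_coefficient: float) -> float:
    """Coefficient of determination: r squared."""
    return validity_coefficient ** 2


# Illustrative (made-up) values, not quiz items.
z = z_score(raw=65.0, mean=50.0, sd=10.0)
print(f"Z = {z:.2f}, T = {t_score(z):.1f}")
print(f"Split-half r = 0.50 corrects to {spearman_brown(0.50):.3f}")
print(f"r = 0.40 explains {variance_explained(0.40):.0%} of criterion variance")
```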