Review for Exam 2
Chapter 7

Evaluating What a Test Really Measures
Validity:  Does the test measure what it claims to measure?

Types of Validity
• Content
• Face
• Criterion-Related (concurrent or predictive)
• Construct

Content Validity
• Adequately sampling the domain.
• In personnel selection: job-related.
• Determined by expert judgements.
• Intended outcomes

Face Validity
• The items look like they reflect whatever is being measured.
• Uses experts to evaluate.

Chapter 8

Using Tests to Make Decisions:

Criterion-Related Validity
What is a criterion?
This is the standard by which your measure is being judged or evaluated.

Criterion-Related Validity
• Predictive validity – correlating test scores with future behavior, after examinees have had a chance to exhibit the predicted behavior; e.g., success on the job.
• Concurrent validity – correlating test scores with an independent, currently available measure of the same trait that the test is designed to measure.

E.g., teachers’ ratings of reading ability validated by correlating with reading test scores.

Or being able to distinguish between groups known to be different; i.e., significantly different mean scores on the test.

• In both predictive and concurrent validity, we validate by comparing scores with a criterion (the standard by which your measure is being judged or evaluated).

Selecting a Criterion
• Objective criteria:  observable and measurable; e.g., sales figures, number of accidents, etc.
• Subjective criteria:  based on a person’s judgment; e.g., employee job ratings.  Example…

CRITERION MEASUREMENTS MUST THEMSELVES BE VALID!
• Usually use content validity; e.g., supervisors’ determination of what job characteristics are important.

BOTH PREDICTOR AND CRITERION MEASURES MUST BE RELIABLE FIRST!
• E.g., inter-rater reliability of the criterion measure.
• Reliability estimates of predictors can be obtained by one of the 4 methods covered in Chapter 6.

Validity vs. Reliability
• An unreliable test cannot be valid.
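• The validity of a test is limited by its reliability; i.e., a test can’t be expected to correlate with another measure more highly than it correlates with itself. (Strictly, the ceiling on a validity coefficient is the square root of the test’s reliability.)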

Correlation Between Predictor and Criterion
• Coefficient of determination: r² tells us how much covariation exists between predictor and criterion; e.g., if r = .7, then 49% of the variance is common to both.
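
The r = .7 example above can be reproduced on any pair of score lists. The sketch below is only an illustration: the helper name `pearson_r` and both data lists are made up for this review and are not from the textbook.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical data: predictor test scores and later job-performance ratings.
test_scores = [10, 12, 15, 18, 20, 22, 25, 28]
job_ratings = [3, 2, 4, 4, 6, 5, 7, 6]

r = pearson_r(test_scores, job_ratings)   # validity coefficient
r_squared = r ** 2                        # coefficient of determination
print(f"r = {r:.2f}, shared variance = {r_squared:.0%}")
```

Squaring r is the whole calculation; the point is that shared variance falls off quickly as r drops (r = .5 leaves only 25% in common).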

Using Validity Information To Make Predictions
• Decide what is “success” on the criterion; e.g., job performance 6 months after hire.
• Determine what minimum predictor score (“cut score”) will predict “success” on the job.

Outcomes of Prediction
Hits:    a) True positives - predicted to succeed and did.
         b) True negatives - predicted to fail and did.
Misses:  a) False positives - predicted to succeed and didn’t.
         b) False negatives - predicted to fail and would have succeeded.

WE WANT TO MAXIMIZE HITS AND MINIMIZE MISSES!
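
The four outcomes above can be counted for any cut score. This is a minimal sketch with made-up data; the function name, the cut score of 65, and the success level of 3 are all hypothetical. It also assumes the criterion is known for everyone (as in a validation study where all applicants were hired), since false negatives cannot be observed for people who were rejected.

```python
def classify_outcomes(predictor, criterion, cut_score, success_level):
    """Count hits and misses for a given cut score.

    A person is predicted to succeed when predictor >= cut_score and
    actually succeeds when criterion >= success_level.
    """
    counts = {"true_pos": 0, "true_neg": 0, "false_pos": 0, "false_neg": 0}
    for test_score, outcome in zip(predictor, criterion):
        predicted = test_score >= cut_score
        succeeded = outcome >= success_level
        if predicted and succeeded:
            counts["true_pos"] += 1      # hit
        elif not predicted and not succeeded:
            counts["true_neg"] += 1      # hit
        elif predicted:
            counts["false_pos"] += 1     # miss
        else:
            counts["false_neg"] += 1     # miss
    return counts

# Hypothetical data: test scores and job ratings 6 months after hire.
test_scores = [55, 60, 72, 80, 45, 90, 65, 70]
job_ratings = [2, 4, 5, 4, 1, 5, 2, 4]

outcomes = classify_outcomes(test_scores, job_ratings, cut_score=65, success_level=3)
hit_rate = (outcomes["true_pos"] + outcomes["true_neg"]) / len(test_scores)
print(outcomes, f"hit rate = {hit_rate:.2f}")
```

Re-running with a different `cut_score` shows the trade-off: raising it tends to convert false positives into false negatives, and lowering it does the reverse.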

Predictive validity correlation determines accuracy of prediction:
HIGHER r = MORE ACCURATE PREDICTION

Chapter 9

Construct Validity
What is a construct?
• An imaginary “trait” or disposition inferred from observations of specific instances of behavior that have something in common.
E.g., assertiveness, OCD, etc.
• Use indirect measures of the construct, e.g., a scale which contains examples of behaviors that we consider evidence of the construct.
• But how can we validate that scale?

Construct Validity
• Comparing high vs. low scoring people on behavior implied by the construct.
• Or by comparing groups known to differ on the construct; e.g., KKK members vs. NAACP members on Attitudes Toward Blacks scale.
• Unidimensionality of the construct being measured; i.e., homogeneity of items.

• ONLY ONE CONSTRUCT CAN BE MEASURED VALIDLY BY ONE SCALE!
Construct validity requires homogeneous items – high internal consistency reliability;
therefore unidimensional!

Convergent Validity
• Convergent validity: agreement among ratings, scales, or measurements gathered independently of one another, where the measures should be theoretically related.

Discriminant Validity
• Discriminant validity: the lack of a relationship among measures which theoretically should not be related.

Multitrait-Multimethod Design
• Searching for convergence across different measures of the same thing and for divergence between measures of different things.
• E.g., a scale of intelligence should correlate with a measure of verbal ability but not with assertiveness.
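
The intelligence example above can be checked with two correlations. The sketch below is a simplified illustration, not a full multitrait-multimethod matrix (which would cross several traits with several methods); the scores and the helper `pearson_r` are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical scores for the same eight people on three measures.
intelligence  = [95, 100, 105, 110, 115, 120, 125, 130]
verbal        = [40, 44, 43, 50, 52, 55, 54, 60]
assertiveness = [12, 20, 15, 11, 19, 14, 21, 13]

convergent = pearson_r(intelligence, verbal)           # should be high
discriminant = pearson_r(intelligence, assertiveness)  # should be near zero
print(f"convergent r = {convergent:.2f}, discriminant r = {discriminant:.2f}")
```

Evidence for construct validity needs both parts: a high convergent correlation alone does not rule out that the scale is measuring something broader than intended.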

Chapter 10

Developing Psychological Tests
Developing a test plan
• Defining the construct
• Choosing the test format
• Specifying admin and scoring methods
• Developing the test itself

Defining the construct
• Operationalizing the construct in terms of observable behaviors.
• Job analysis in terms of the knowledge, skills, abilities, and other characteristics (KSAOs) – what it takes to succeed.
• Learning objectives

Choosing the test format and Composing the test items

Objective items
1. Multiple choice
2. True/False
3. Forced choice

Subjective items
1. Essay
2. Interview
3. Projective
4. Sentence completion

Specifying Scoring Methods
• Cumulative
• Categorical
• Ipsative

Types of Response Bias
• Response sets
• Social desirability
• Acquiescence
• Random responding
• Faking

Writing Good Items
• Follow the test plan.
• Base each item on a learning objective.
• Items should not be answerable from a student’s general knowledge.
• Write each item in a clear, direct manner.
• Use appropriate language.
• Make all items independent.
• Have an expert review the items.

Multiple Choice Items
• Avoid negatives
• All choices should be similar in length and style.
• Only one answer correct or “best.”
• Avoid overlapping choices.
• Avoid “all of the above” or “none of the above.”

Comparing objective vs. subjective formats
• Objective items provide better content validity.
• Objective items are more difficult to construct.
• Objective items are easier to score and are scored more accurately.
• Subjective items are easier and quicker to write and can assess higher-order skills, but they are harder to grade and less valid.

Writing Admin Instructions
• Setting
• Specific requirements
• Time limits
• Admin script

Chapter 11

Piloting and Revising Tests
The Pilot Test
• A scientific investigation of the new test’s reliability and validity.
• Using a sample of the people for whom the test is intended.

Quantitative Item Analysis
• How valid is the item itself?
• Item difficulty:  p should be between .2 and .8.
• Item discrimination:  D = U – L (proportion correct in the upper-scoring group minus proportion correct in the lower-scoring group).
• D can reach its maximum only when p = .5.
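
Both statistics above can be computed from a table of right/wrong answers. This is a sketch under stated assumptions: the response data are made up, the function name is hypothetical, and the upper and lower groups are taken as the top and bottom thirds by total score (other splits are also used in practice).

```python
def item_analysis(responses, groups=3):
    """Return (difficulty, discrimination) lists, one value per item.

    responses: one list per test taker, 1 = correct, 0 = incorrect.
    groups: the upper and lower groups are each 1/groups of the sample.
    """
    n_people = len(responses)
    n_items = len(responses[0])
    ranked = sorted(responses, key=sum, reverse=True)  # highest total first
    size = max(1, n_people // groups)
    upper, lower = ranked[:size], ranked[-size:]

    difficulty, discrimination = [], []
    for i in range(n_items):
        p = sum(person[i] for person in responses) / n_people
        u = sum(person[i] for person in upper) / size
        l = sum(person[i] for person in lower) / size
        difficulty.append(p)
        discrimination.append(u - l)   # D = U - L
    return difficulty, discrimination

# Hypothetical pilot data: 6 test takers, 4 items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
p_values, d_values = item_analysis(responses)
for i, (p, d) in enumerate(zip(p_values, d_values), start=1):
    print(f"item {i}: p = {p:.2f}, D = {d:.2f}")
```

In this made-up data set the first item is easy (p above .8) and so cannot separate the groups as sharply as the second item, whose p is exactly .5.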

Inter-Item Correlations
• A measure of internal consistency or homogeneity.
• Items that don’t correlate with others may be measuring other constructs.
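
One way to spot an item that does not belong is to average its correlations with every other item. The sketch below assumes made-up rating data and hypothetical helper names; it is an illustration of the idea, not a prescribed procedure.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

def mean_inter_item_r(responses):
    """Average correlation of each item with all of the other items."""
    columns = list(zip(*responses))  # one tuple of scores per item
    means = []
    for i, col in enumerate(columns):
        rs = [pearson_r(col, other) for j, other in enumerate(columns) if j != i]
        means.append(sum(rs) / len(rs))
    return means

# Hypothetical ratings: 6 respondents (rows) by 4 items (columns).
responses = [
    [5, 4, 5, 2],
    [4, 4, 4, 5],
    [2, 1, 2, 3],
    [1, 2, 1, 4],
    [3, 3, 3, 1],
    [5, 5, 4, 3],
]
means = mean_inter_item_r(responses)
weakest = means.index(min(means))
print([round(m, 2) for m in means], "weakest item:", weakest + 1)
```

Here the fourth item barely relates to the other three, which is the pattern that suggests it may be measuring a different construct.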

Item Characteristic Curves
• Relates performance on each item to the testee’s ability on the construct being measured, i.e., his/her score on the test.
• Item characteristic curve (ICC) – a graph of the probability of answering an item correctly against the level of ability on the construct being measured. It shows both difficulty (p) and discrimination (D).
• Item characteristic curve (ICC):  the greater the slope, the greater the discrimination.
• The lower the curve, the more difficult the item.
• Different curves for different groups of testees can indicate bias.
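
A rough, empirical version of an ICC can be built by grouping testees into ability bands by total score and finding the proportion in each band who got the item right. The data, band boundaries, and function name below are hypothetical; a formal ICC would be fitted with an item response theory model rather than tabulated this way.

```python
def empirical_icc(records, bands):
    """Proportion answering the item correctly within each ability band.

    records: (total_score, item_correct) pairs, item_correct is 0 or 1.
    bands: inclusive (low, high) total-score ranges, ordered low to high.
    """
    curve = []
    for low, high in bands:
        in_band = [correct for total, correct in records if low <= total <= high]
        # None marks a band with no testees in it.
        curve.append(sum(in_band) / len(in_band) if in_band else None)
    return curve

# Hypothetical data for one item on a 10-point test.
records = [(2, 0), (3, 0), (4, 0), (5, 1), (6, 0), (7, 1), (8, 1), (9, 1), (10, 1)]
bands = [(0, 4), (5, 7), (8, 10)]   # low, middle, high ability

curve = empirical_icc(records, bands)
print(curve)
```

A steep rise from the low band to the high band corresponds to good discrimination. Computing the same curve separately for two groups of testees is one simple way to look for the group differences that can indicate bias.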

Concerns for each item
• Difficulty:  what % of testees got it correct.
• Discrimination:  how well it discriminates between high and lower scorers.
• Validity:  how well it correlates with test score.

Validation and Cross-Validation
• Try the test out on a sample different from the one used in the pilot test.

Differential Validity
• Different validity coefficients for different subgroups (e.g., men vs. women) are O.K.
• Unfair discrimination means that persons with equal chances of success on the job have unequal probabilities of being hired for the job.

Developing Cut Scores
• Minimum score for acceptance or hiring.
• May use a panel of experts.
• Or use the actual correlation of the predictor test with success on the job or in college.

Developing Norms
• Administer the test to a large random sample from the population.
• Sometimes, subgroup norms also.

Chapter 12
Survey Research vs. tests
• Tests measure individual behavior.
• Surveys measure group behavior (thoughts, feelings, attitudes, actions, etc.).

Causal vs. Correlational Methods
• Experimental research techniques – IV’s effects on a DV, controlling for RVs – only way to determine cause and effect relationships.
• Descriptive (correlational) research techniques – simply looking at frequency of one behavior related to the occurrence of another; e.g., suicide rates vary with amount of country music played.

The Survey Method
• Clear objectives.
• Clear and unbiased questions.
• Administered to a representative sample taken from a population.
• Answers analyzed to answer the objectives.
• Unbiased, objective reporting
• Reliable and valid.

Types of Surveys
• Self-administered; e.g., printed, mailed…
• Personal interviews:
  • Face-to-face
  • Telephone

Developing Survey Questions
• Open-ended questions
• Closed-ended questions, including multiple-choice (Likert), ranking, and rating questions.

Rules for Writing Questions
• Clear and unambiguous (“Check your sex.”)
• Use appropriate rating scales and response options.
• Include appropriate categorical alternatives (including “other”)
• No double-barreled questions.
• Appropriate reading level (4th grade).
• No leading or loaded questions.

Sources of Error
• Questions
• Sampling

Types of Samples
• Probability (random) – equal chance of being chosen.
  • Simple random sampling
  • Systematic sampling
  • Stratified random sampling
  • Cluster sampling

• Nonprobability (convenience) sampling.
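
The four probability sampling designs listed above differ only in how the random draw is organized. The sketch below uses a made-up population of 100 people with two strata and ten clusters; every name and number in it is hypothetical.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100 people, 60 in stratum "A" and 40 in "B".
population = [{"id": i, "stratum": "A" if i < 60 else "B"} for i in range(100)]
n = 10  # desired sample size

# Simple random sampling: every person has the same chance of selection.
simple = rng.sample(population, n)

# Systematic sampling: random starting point, then every k-th person.
k = len(population) // n
start = rng.randrange(k)
systematic = population[start::k]

# Stratified random sampling: draw from each stratum in proportion to its size.
stratified = []
for name in ("A", "B"):
    members = [p for p in population if p["stratum"] == name]
    share = round(n * len(members) / len(population))
    stratified.extend(rng.sample(members, share))

# Cluster sampling: randomly pick whole clusters (here, blocks of 10 ids)
# and survey everyone in the chosen clusters.
clusters = [population[i:i + 10] for i in range(0, len(population), 10)]
cluster_sample = [p for cluster in rng.sample(clusters, 2) for p in cluster]

print(len(simple), len(systematic), len(stratified), len(cluster_sample))
```

A convenience sample, by contrast, would simply take whoever is easiest to reach (for example, `population[:10]`), so no random draw is involved at all.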