CUBED Validity and Reliability: Assessments you can trust
There is considerable evidence to support the validity and reliability of the CUBED, especially with culturally and linguistically diverse students. The CUBED can be used as a universal screener and as a progress monitoring measure of phonemic awareness, decoding, reading fluency, and language and reading comprehension. It can help educators accurately identify diverse students who need supplemental or intensive intervention, and it can play a vital role in a multi-tiered system of supports (MTSS) for decoding and comprehension.
Estimates of reliability and evidence of validity are essential to consider when deciding whether a test fits an examiner’s needs. If a test is not reliable, the examiner cannot be confident that its results accurately estimate a student’s abilities. Evidence of validity tells the examiner how well the test measures what it is intended to measure, and therefore how well it is likely to serve the examiner’s purpose for administering it. An examiner should always evaluate a test’s validity evidence with that purpose in mind; most importantly, the examiner’s purpose for using the test should align with the test’s intended purpose.
Inter-Rater Reliability. Inter-rater reliability is the extent to which two independent examiners assign the same scores to the same student responses. It is especially important for the NLM subtests of the CUBED because, despite clear scoring procedures, scoring a student’s language in real time involves a certain amount of subjectivity.
Inter-rater Reliability Scores
< 80% Unacceptable
≥ 80% Acceptable
≥ 90% Preferred
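As a minimal sketch, assuming agreement is computed as simple item-by-item percent agreement (the exact scoring unit used for the CUBED is not specified here), the following Python computes the percentage of responses two examiners scored identically and labels the result with the thresholds above. The function names and example scores are hypothetical.

```python
from typing import Sequence


def percent_agreement(rater_a: Sequence[int], rater_b: Sequence[int]) -> float:
    """Percentage of responses to which two independent raters gave the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same, non-empty set of responses")
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return 100.0 * matches / len(rater_a)


def rate_agreement(pct: float) -> str:
    """Label a percent-agreement value using the thresholds listed above."""
    if pct >= 90:
        return "Preferred"
    if pct >= 80:
        return "Acceptable"
    return "Unacceptable"


# Hypothetical item-level scores from two examiners for one student response.
examiner_1 = [2, 1, 2, 0, 1, 2, 2, 1, 0, 2]
examiner_2 = [2, 1, 2, 0, 1, 2, 1, 1, 0, 2]

pct = percent_agreement(examiner_1, examiner_2)
print(f"{pct:.0f}% agreement: {rate_agreement(pct)}")  # 90% agreement: Preferred
```

Percent agreement is the simplest index; it does not correct for agreement expected by chance, which is one reason some reports also use chance-corrected statistics.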
Concurrent Validity. Evidence of validity can be derived by examining the relationship between the CUBED and other assessments administered at approximately the same time, often referred to as concurrent validity. Six research studies with 1,146 preschool through third grade students examined concurrent validity by comparing students’ highest CUBED NLM Listening Retell scores with their scores on several criterion measures of language. We also compared CUBED composite scores with scores on the Measures of Academic Progress (MAP) assessment. Most of these comparisons, reported as correlation coefficients, offer strong evidence of concurrent, criterion-related validity for the CUBED.
Positive Correlation Coefficient Scores
.20 – .29 Unacceptable
.30 – .39 Moderate
.40 – .69 Strong
≥ .70 Very Strong
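The correlations described above can be illustrated with a short Python sketch that computes a Pearson product-moment correlation between two sets of paired scores and labels it with the bands listed above. It assumes Pearson's r is the coefficient being reported, rounds r to two decimals before labeling so every value falls into one of the two-decimal bands, and treats .70 as the lower bound of "Very Strong". The function names and scores are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two lists of paired scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two pairs of scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def describe_r(r: float) -> str:
    """Label a positive correlation using the bands listed above (two-decimal rounding)."""
    r = round(r, 2)
    if r >= 0.70:
        return "Very Strong"
    if r >= 0.40:
        return "Strong"
    if r >= 0.30:
        return "Moderate"
    if r >= 0.20:
        return "Unacceptable"
    return "Below the listed bands"  # the table does not label values under .20


# Hypothetical scores for five students: a language measure and a criterion measure.
retell = [3, 6, 9, 12, 15]
criterion = [88, 92, 90, 99, 101]

r = pearson_r(retell, criterion)
print(f"r = {r:.2f} ({describe_r(r)})")  # r = 0.92 (Very Strong)
```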
Predictive Validity. Evidence of validity can also be derived by examining how well the CUBED predicts future performance on other tests. To demonstrate the predictive validity of the CUBED, we studied its relationship to the Measures of Academic Progress (MAP) and the Wyoming PAWS reading assessments. We report R² coefficients of determination to indicate the extent to which combinations of CUBED raw scores, collected in the fall from 1,512 kindergarten through third grade students, predicted performance on the MAP assessment in the winter. The interpretation of R² depends heavily on the testing context. For predicting reading, we consider R² values of .10 or greater (at least 10% of variance accounted for) to be meaningful.
R² Coefficients of Determination
≥ .10 Meaningful
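A coefficient of determination for a combination of predictors can be sketched as an ordinary least-squares regression with an intercept, followed by R² = 1 - SS_res / SS_tot. The document does not state how its regression models were specified, so this standard-library Python sketch, with hypothetical function names and made-up scores, only illustrates the statistic and the .10 threshold.

```python
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(predictors: Sequence[Sequence[float]], outcome: Sequence[float]) -> float:
    """R^2 from an OLS fit of outcome on the predictor columns plus an intercept."""
    n = len(outcome)
    # Design matrix: a leading 1 for the intercept, then one column per predictor.
    X = [[1.0] + [float(col[i]) for col in predictors] for i in range(n)]
    k = len(X[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)] for p in range(k)]
    xty = [sum(X[i][p] * outcome[i] for i in range(n)) for p in range(k)]
    beta = _solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mean_y = sum(outcome) / n
    ss_res = sum((y - f) ** 2 for y, f in zip(outcome, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in outcome)
    return 1.0 - ss_res / ss_tot


# Hypothetical fall scores on two subtests and winter outcome scores for six students.
fall_score_a = [5, 9, 12, 7, 15, 10]
fall_score_b = [20, 34, 41, 25, 50, 30]
winter_outcome = [150, 162, 171, 158, 180, 160]

r2 = r_squared([fall_score_a, fall_score_b], winter_outcome)
print(f"R^2 = {r2:.2f}; meaningful: {r2 >= 0.10}")
```

With a single predictor, this R² equals the square of Pearson's r, which is a quick way to check the sketch.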
Sensitivity and Specificity. We report sensitivity and specificity derived from discriminant function analyses that examined the predictive validity of the CUBED for end-of-year PAWS results. These analyses included data from 71 third grade students attending two elementary schools in Wyoming. Sensitivity here is the proportion of students who were at risk on PAWS whom the CUBED correctly identified as at risk; specificity is the proportion of students who were not at risk on PAWS whom the CUBED correctly identified as not at risk. Sensitivity and specificity provide evidence of both predictive and construct validity. Taken together, the predictive validity analyses indicate that the CUBED is moderately to strongly predictive of performance on the MAP and PAWS assessments.
Sensitivity and Specificity Scores
≥ 80% Acceptable
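The definitions of sensitivity and specificity can be made concrete with a small Python sketch. It assumes each student has already been classified by the screener (flagged at risk or not) and by the criterion test; it does not reproduce the discriminant function analysis itself. The function name and data are hypothetical.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    flagged: Sequence[bool], at_risk: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) as percentages.

    flagged: True if the screener identified the student as at risk.
    at_risk: True if the student was at risk on the criterion test.
    """
    pairs = list(zip(flagged, at_risk))
    tp = sum(f and a for f, a in pairs)              # at risk and correctly flagged
    fn = sum((not f) and a for f, a in pairs)        # at risk but missed
    tn = sum((not f) and (not a) for f, a in pairs)  # not at risk and correctly cleared
    fp = sum(f and (not a) for f, a in pairs)        # not at risk but flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need both at-risk and not-at-risk students")
    return 100.0 * tp / (tp + fn), 100.0 * tn / (tn + fp)


# Hypothetical classifications for ten students.
flagged = [True, True, True, False, False, False, False, False, True, False]
at_risk = [True, True, True, True, False, False, False, False, False, False]

sens, spec = sensitivity_specificity(flagged, at_risk)
print(f"sensitivity {sens:.0f}%, specificity {spec:.0f}%")  # sensitivity 75%, specificity 83%
print("meets 80% guideline:", sens >= 80, spec >= 80)  # meets 80% guideline: False True
```

In this made-up example, specificity meets the 80% guideline but sensitivity does not, because one of the four at-risk students was missed.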