How does NAEP analyze the assessment results?

Question:

How does NAEP analyze the assessment results?

Answer:

Before the data are analyzed, responses from each group of students assessed are assigned sampling weights so that the group's representation in NAEP results matches its actual share of the school population in the grades assessed.
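To make the idea of sampling weights concrete, here is a minimal sketch of a weighted percentage. It is an illustration only, not NAEP's weighting procedure: the function name `weighted_proportion`, the responses, and the weights are all invented for this example.

```python
def weighted_proportion(correct, weights):
    """Share of the population answering correctly, using sampling weights.

    correct: 1 if the sampled student answered correctly, else 0.
    weights: how many students in the population each sampled student
             represents (hypothetical values, not NAEP weights).
    """
    if len(correct) != len(weights):
        raise ValueError("correct and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(c * w for c, w in zip(correct, weights)) / total_weight


# Four sampled students; the second represents a larger group.
correct = [1, 0, 1, 1]
weights = [50.0, 150.0, 100.0, 100.0]

unweighted_share = sum(correct) / len(correct)          # 3 of 4 students
weighted_share = weighted_proportion(correct, weights)  # 250 of 400 weighted
```

In this made-up sample the unweighted share is 0.75, while the weighted share is 0.625, because the student who answered incorrectly stands in for a larger part of the population.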

  • Data for national and state NAEP assessments in most subjects are analyzed in the following steps:
    • Check Item Data and Performance: The data and performance of each item are checked in a number of ways, including scoring reliability checks, item analyses, and differential item functioning (DIF) analyses, to ensure fair and reliable measures of performance in the subject of the assessment.
    • Set the Scale for Assessment Data: Each subject assessed is divided into subskills, purposes, or content domains specified by the subject framework. Separate scales are developed for the content domains in each assessment subject area. A statistical procedure, Item Response Theory (IRT) scaling, is used to estimate the measurement characteristics of each assessment question.
    • Estimate Group Performance Results: Because NAEP must minimize the burden of time on students and schools by keeping assessment administration brief, no individual student takes more than a small portion of the assessment for a given content domain. NAEP therefore uses the results of scaling procedures to estimate the performance of groups of students (e.g., all fourth-grade students in the nation, or female eighth-grade students in a state) rather than of individual students.
    • Transform Results to the Reporting Scale: Results for assessments conducted in different years are linked to reporting scales to allow comparison of year-to-year trend results for common populations on related assessments.
    • Create a Database: A database is created and used to make comparisons of all results, such as scale scores, percentiles, percentages at or above achievement levels, and comparisons between groups and between years for a group. All comparisons are subjected to testing for statistical significance, and estimates of standard errors are computed for all statistics.
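Two of the steps above can be sketched in a few lines of code. The example below is a simplified illustration under stated assumptions, not NAEP's actual procedure: it uses a two-parameter logistic item response function as one common form of Item Response Theory model, and a normal-approximation test for the difference between two independent group means. The function names and all numbers are hypothetical.

```python
import math


def irt_2pl_probability(theta, discrimination, difficulty):
    """Probability of a correct answer under a two-parameter logistic model.

    theta: the student's proficiency on the latent scale.
    discrimination, difficulty: the item's measurement characteristics.
    (Assumed model form for illustration only.)
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


def difference_z_test(mean_a, se_a, mean_b, se_b):
    """Two-sided z-test for the difference between two group means.

    Assumes the groups are independent and the estimates are roughly
    normal, so the standard error of the difference is the square root
    of the sum of the squared standard errors.
    """
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = (mean_a - mean_b) / se_diff
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value
    return z, p_value


# A student whose proficiency equals the item's difficulty has an even
# chance of answering correctly under this model.
p_correct = irt_2pl_probability(theta=0.0, discrimination=1.0, difficulty=0.0)

# Hypothetical scale scores and standard errors for two groups.
z, p_value = difference_z_test(mean_a=282.0, se_a=0.8, mean_b=279.0, se_b=0.9)
is_significant = p_value < 0.05
```

With these invented numbers the three-point difference is larger than about twice its standard error, so it would be flagged as statistically significant at the 0.05 level. The same difference with larger standard errors would not be.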

To ensure the reliability of NAEP results, extensive quality control and plausibility checks are conducted as part of each analysis step. Quality control tasks verify that analysis steps have not introduced errors or artifacts into the results. Plausibility checks prompt analysts to consider whether the results make sense and what story they tell.