Stanford Education Data Archive
Technical Documentation
Version 3.0
July 2019

Erin M. Fahle, St. John's University
Benjamin R. Shear, University of Colorado Boulder
Demetra Kalogrides, Stanford University
Sean F. Reardon, Stanford University
Belen Chavez, Stanford University
Andrew D. Ho, Harvard University
Suggested citation:
Fahle, E. M., Shear, B. R., Kalogrides, D., Reardon, S. F., Chavez, B., & Ho, A. D. (2018). Stanford Education Data Archive: Technical Documentation (Version 3.0). Retrieved from http://purl.stanford.edu/db586ns4974.
Contents

I. What is SEDA?
    I.A. Overview of Test Score Data Files
    I.B. Covariate Data
    I.C. Data Use Agreement
II. Achievement Data Construction
    II.A. Source Data
    II.B. Definitions
    II.C. Construction Overview
    II.D. Detailed Construction Overview
        Notation
        Step 1. Creating the Crosswalk & Defining Geographic School Districts
        Step 2. Data Cleaning
        Step 3. Cutscore Estimation and Linking
        Step 4. Selecting Data for Mean Estimation
        Step 5. Estimating Means for Schools and Districts
        Step 6. Aggregating GSD-subgroup estimates to Counties, CZs and Metros
        Step 7. Scaling the Estimates
        Step 8. Calculating Achievement Gaps
        Step 9. Pooled Mean and Gap Estimates
        Step 10. Suppressing Data for Release
    II.E. Additional Notes
III. Covariate Data Construction
    III.A. ACS Data and SES Composite Construction
    III.B. Common Core of Data Imputation
IV. Versioning and Publication
References
Tables
Figures
Appendices
    Appendix A: Additional Detail on Statistical Methods
        1. Estimating County-Level Means and Standard Deviations
        2. Constructing OLS Standard Errors from Pooled Models
    Appendix B: Covariates
        1. List of Raw ACS Tables Used for SES Composite
        2. Measurement Error, Attenuation Bias and Solutions
        3. Computing the sampling variance of sums of ACS variables
        4. Estimating sampling variance of composite SES measures
I. What is SEDA?

The Stanford Education Data Archive (SEDA) is part of the Educational Opportunity Project at Stanford University (https://edopportunity.org), an initiative aimed at harnessing data to help scholars, policymakers, educators, and parents learn how to improve educational opportunities for all children. SEDA includes a range of detailed data on educational conditions, contexts, and outcomes in schools, school districts, counties, commuting zones, and metropolitan statistical areas across the United States. Available measures differ by aggregation; see Sections I.A. and I.B. for a complete list of files and data.

By making the data files available to the public, we hope that anyone who is interested can obtain detailed information about U.S. schools, communities, and student success. We hope that researchers will use these data to generate evidence about what policies and contexts are most effective at increasing educational opportunity, and that such evidence will inform educational policy and practices.

The construction of SEDA has been supported by grants from the Institute of Education Sciences, the Spencer Foundation, the William T. Grant Foundation, the Bill and Melinda Gates Foundation, the Overdeck Family Foundation, and by a visiting scholar fellowship from the Russell Sage Foundation. Some of the data used in constructing the SEDA files were provided by the National Center for Education Statistics (NCES). The findings and opinions expressed in the research reported here are those of the authors alone; they do not represent the views of the U.S. Department of Education, NCES, or any of the aforementioned funding agencies.
I.A. Overview of Test Score Data Files

SEDA 3.0 contains test score data files for schools, geographic school districts (GSDs), counties, commuting zones (CZs), and metropolitan statistical areas (metros). Test score data files contain information about the average academic achievement as measured by standardized test scores administered in 3rd through 8th grade in mathematics and English/Language Arts (ELA) over the 2008-09 through 2015-16 school years. The exact measures reported differ by these levels of aggregation.
School Files. There are two school-level test score data files, corresponding to the two different metrics in which the data are released: the cohort standardized (CS) scale and the grade cohort standardized (GCS) scale. In each file there are variables corresponding to the average test score in the middle grade of the data, the average "learning rate" across grades (grade slope), the "trend" in the test scores across cohorts (cohort slope), and the difference between math and ELA (math slope). Each measure is included along with its respective standard error. Estimates are reported for all students; no estimates are provided by demographic subgroup.
Geographic District, County, Commuting Zone, and Metropolitan Statistical Area Files. Twenty-four test score files are released, corresponding to the four units (GSDs, counties, CZs, and metros) by two scales (CS and GCS) by three pooling levels (long, pooled by subject, and pooled overall). "Long" files contain estimates for each grade and year separately; "pooled by subject" (or poolsub) files contain estimates that are averaged across grades and years within subjects; and "pooled overall" (or pool) files contain estimates that are averaged across grades, years, and subjects. In the long files there are variables corresponding to test score means by subgroup and their respective standard errors in each grade, year and subject. In the two types of pooled files, there are variables corresponding to the average test score mean (averaged across grades, years, and subjects), the average "learning rate" across grades, and the average "trend" in the test scores across cohorts, along with their standard errors. In the pooled overall file, there is also a variable that indicates the average difference between math and ELA and its standard error. Estimates are reported for all students and by demographic subgroups.

Table 1 lists the files and file structures. Lists of variables can be found in the codebook that accompanies this documentation.
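The count of twenty-four files follows from crossing the four units, two scales, and three pooling levels. A minimal sketch of that enumeration is below; the labels are shorthand taken from the text, and the tuples are illustrative only, not the actual file names (see Table 1 for those).

```python
from itertools import product

# Labels follow the text above; they are NOT the released file names.
UNITS = ["GSD", "county", "CZ", "metro"]
SCALES = ["CS", "GCS"]
POOLING_LEVELS = ["long", "poolsub", "pool"]


def enumerate_test_score_files():
    """Return every (unit, scale, pooling level) combination."""
    return list(product(UNITS, SCALES, POOLING_LEVELS))


if __name__ == "__main__":
    combos = enumerate_test_score_files()
    print(len(combos))  # 4 units x 2 scales x 3 pooling levels
    for unit, scale, pooling in combos:
        print(unit, scale, pooling)
```

The two school-level files (one per scale) are separate from this set and are not enumerated here.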
I.B. Covariate Data

SEDA 3.0 also provides estimates of socioeconomic, demographic and segregation characteristics of schools, geographic school districts, counties and metros. The measures included in the district, county, and metro covariates files come primarily from two sources. The first is the American Community Survey (ACS) detailed tables, which we obtained from the National Historical Geographic Information System (NHGIS) web portal.¹ These data include demographic and socioeconomic characteristics of individuals and households residing in each unit. The second is the Common Core of Data (CCD), which is an annual survey of all public elementary and secondary schools and school districts in the United States. The data include basic descriptive information on schools and school districts, including demographic characteristics.² The measures included in the school covariates file come from the CCD as well as the Civil Rights Data Collection (CRDC). The CRDC includes data about school demographics, teacher experience, school expenditures, and high school course enrollments, as well as other information not used here.³

Nine files (three per aggregation) in SEDA 3.0 contain CCD and ACS data that have been curated for use with the geographic school district-level, county-level, and metro-level achievement data. These data include raw measures as well as derived measures (e.g., a composite socioeconomic status measure, segregation measures). Each of the three covariate files we construct for each unit contains the same variables, but the files differ based on whether they report these variables separately for each grade and year, averaged across grades (providing a single value per unit per year), or averaged across grades and years (providing a single value per unit). A single data file is provided for schools, with one observation for each school in each year. The Covariate Data Construction section of the documentation provides more detail about the construction of these data files and the computation of derived variables. Table 2 lists the names and file structures of the covariate data files.
¹ The ACS data are available for download from the NHGIS website at: https://www.nhgis.org/
² The CCD raw data can be accessed at https://nces.ed.gov/ccd/.
³ More information about the Civil Rights Data Collection can be found here: https://ocrdata.ed.gov/
I.C. Data Use Agreement

Prior to downloading the data, users must sign the data use agreement, shown below.

You agree not to use the data sets for commercial advantage, or in the course of for-profit activities. Commercial entities wishing to use this Service should contact Stanford University's Office of Technology Licensing (info@otlmail.stanford.edu).

You agree that you will not use these data to identify or to otherwise infringe the privacy or confidentiality rights of individuals.

THE DATA SETS ARE PROVIDED "AS IS" AND STANFORD MAKES NO REPRESENTATIONS AND EXTENDS NO WARRANTIES OF ANY KIND, EXPRESS OR IMPLIED. STANFORD SHALL NOT BE LIABLE FOR ANY CLAIMS OR DAMAGES WITH RESPECT TO ANY LOSS OR OTHER CLAIM BY YOU OR ANY THIRD PARTY ON ACCOUNT OF, OR ARISING FROM THE USE OF THE DATA SETS.

You agree that this Agreement and any dispute arising under it is governed by the laws of the State of California of the United States of America, applicable to agreements negotiated, executed, and performed within California.

You agree to acknowledge the Stanford Education Data Archive as the source of these data. In publications, please cite the data as:

Reardon, S. F., Ho, A. D., Shear, B. R., Fahle, E. M., Kalogrides, D., Jang, H., Chavez, B., Buontempo, J., & DiSalvo, R. (2019). Stanford Education Data Archive (Version 3.0). Retrieved from http://purl.stanford.edu/db586ns4974.

Subject to your compliance with the terms and conditions set forth in this Agreement, Stanford grants you a revocable, non-exclusive, non-transferable right to access and make use of the Data Sets.
II. Achievement Data Construction

II.A. Source Data

The SEDA 3.0 achievement data are constructed using data from the EDFacts data system housed by the U.S. Department of Education (USEd), which collects aggregated test score data from each state's standardized testing program as required by federal law. The data include assessment outcomes for eight consecutive school years, from the 2008-09 school year to the 2015-16 school year, in grades 3 to 8 in English Language Arts (ELA) and math.
Under federal legislation, each state is required to test every student in grades 3 through 8 (and in one high school grade) in math and ELA each year. States have the flexibility to select (or design) and administer a test of their choice that measures student achievement relative to the state's standards. States then each set their own benchmarks or thresholds for the levels of performance or "proficiency" in each grade and subject. States are required to report the number of students scoring "proficient," both overall and disaggregated by certain demographic subgroups, for each school. More often, states report the number of students scoring at each of a small number (usually 3-5) of ordered performance levels, where one or more levels represent "proficient" grade-level achievement.
When states report this information to the USEd, it is compiled into the EDFacts database. The EDFacts database reports the number of students, disaggregated by subgroup, scoring in each of the ordered performance categories for each grade, year and subject; no individual student-level data are reported. The student subgroups include race/ethnicity, gender, and socioeconomic disadvantage, among others. In 2013-2016, the data are further broken out by assessment type: regular assessments, regular assessments with accommodations, and alternate assessments with grade-level standards, modified standards and alternate standards. However, in 2009-2012, we cannot distinguish students taking regular from alternate assessments; these counts were combined in the reported data. Therefore, for consistency in all years, we use all performance data reported in EDFacts, including results of students taking both regular and alternate assessments. Each row of data corresponds to a school-subgroup-subject-grade-year cell. The raw data include no suppressed cells, nor do they have a minimum cell size for reporting. Table 3 illustrates the structure of the raw data from EDFacts prior to use in constructing SEDA 3.0.
II.B. Definitions

Commuting Zone (CZ): Regions defined by the geographic boundaries of a local economy. We use the 2000 boundary definitions (https://www.ers.usda.gov/data-products/commuting-zones-and-labor-market-areas/), which are the most recent commuting zone definitions.

Geographic School District (GSD): The aggregate of all public schools, regardless of type and administrative control, residing in a geographic catchment area defined by a traditional public school district. GSDs allow linking of achievement data to demographic and economic information from EDGE/ACS, which is reported for students living in GSD boundaries regardless of where they attend school.

Group: A subgroup-unit (as defined below). For schools, the only available subgroup is all students. For GSDs, counties, CZs, and MSAs, data for subgroups are available when estimates are sufficiently precise.
Metropolitan Statistical Area (metro): A county or group of counties with a population exceeding 50,000 and encompassing an urban area, combined with any surrounding counties with strong commuting ties to the urban area (https://www.census.gov/programs-surveys/metro-micro/about/glossary.html). The U.S. Census Bureau revises the metropolitan statistical area definitions after each decennial census. We use the 2013 U.S. Census Bureau definitions, which are the definitions based on the 2010 census (https://www.census.gov/programs-surveys/metro-micro/geographies/geographic-reference-files.2013.html). We make one modification to the definitions: The Census defines very large metropolitan areas as Consolidated Metropolitan Statistical Areas (CMSAs); each CMSA is subdivided into Metropolitan Area Divisions. We treat each Division as a separate metropolitan area for analysis purposes, as the CMSAs are generally quite large.
Subgroup: The term "subgroup" refers to the group of students to which an estimate pertains. This may be: all, white, black, Hispanic, Asian, male, female, economically disadvantaged, or not economically disadvantaged students.

Unit: The term "unit" refers to the aggregation of the data. This may be a school, GSD, county, CZ, or metro.
II.C. Construction Overview

The construction process produces mean test score estimates for schools, GSDs, counties, CZs and metros on two nationally comparable scales in a series of ten steps, outlined in Figure 1. We provide a brief conceptual description of each step here. We then provide substantial description and technical details about each step in Section II.D.

Step 1: Creating the Crosswalk. This step assigns each public school district to a GSD and links each GSD uniquely to a county, CZ, and metro.

Step 2: Data Cleaning. This step removes data for states and units in particular subjects, grades, and years for which we cannot produce any estimates. We also remove any identified errors in the raw data here.

Step 3: Estimating and Linking Cutscores. This step uses Heteroskedastic Ordered Probit (HETOP) models to estimate the state-grade-subject-year cutscores from the GSD proficiency count data for all students. It links the estimated cutscores to the NAEP scale and then standardizes the linked cutscores to the Cohort Standardized (CS) scale. The resulting cutscores are comparable across states and years.

Step 4: Exclude and Prepare Data. This step excludes data for unit-subgroup-subject-grade-year cases with low participation in the assessment or high percentages of students taking alternate assessments.
Step 5: Estimating School and District Means. This step uses the pooled HETOP model to estimate school and GSD subgroup-subject-grade-year means and standard deviations, along with their standard errors, based on the cutscores from Step 3 and the data prepared in Step 4.

Step 6: Aggregating to County, CZ, and MSA Means. This step aggregates the GSD-subgroup estimates from Step 5 to counties, CZs, and metros. From this point onward, we have test score estimates for five units: schools, GSDs, counties, CZs, and metros. Subsequent steps are equivalent for all units unless otherwise noted.

Step 7: Scaling Across Grades. This step creates grade cohort standardized (GCS) estimates for all units. From this point onward, we have two scales of the data for all units: CS and GCS. Subsequent steps are equivalent for both scales unless otherwise noted.

Step 8: Calculating Achievement Gaps. This step estimates white-black, white-Hispanic, white-Asian, male-female, and nonpoor-poor achievement gaps for GSDs, counties, CZs, and metros in each subject-grade-year where there is sufficient data.

Step 9: Pooling Mean and Gap Estimates. This step estimates the average achievement, learning rate, and trend in test scores by subject and overall for each unit and scale. From this point onward, we have three levels of the data for all units: long (not pooled by grade, year, or subject), pooled by subject (poolsub), and pooled overall (pool).

Step 10: Suppressing Data for Release. This step suppresses estimates that are too imprecise to be useful or that do not reflect the performance of at least 20 unique students, in both long and pooled files for all units and scales. For estimates reported in the long files, this step also adds a small amount of random noise to meet the reporting requirements of the US Department of Education.
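The student-count part of the Step 10 rule can be sketched as a simple filter. This is an illustration only: the record layout and field names (`n_students`, `mean`, `se`) are hypothetical, and the precision-based suppression and the random noise are left out because this overview does not specify how they are computed.

```python
MIN_STUDENTS = 20  # threshold stated in the Step 10 overview


def suppress_small_cells(estimates, min_students=MIN_STUDENTS):
    """Blank out estimates based on fewer than `min_students` students.

    `estimates` is a list of dicts with the hypothetical keys
    'n_students', 'mean' and 'se'. The input list is not modified.
    """
    released = []
    for row in estimates:
        row = dict(row)  # copy so the caller's data stay intact
        if row["n_students"] < min_students:
            row["mean"] = None
            row["se"] = None
        released.append(row)
    return released


if __name__ == "__main__":
    example = [
        {"unit_id": "A", "n_students": 19, "mean": 0.10, "se": 0.05},
        {"unit_id": "B", "n_students": 20, "mean": 0.20, "se": 0.04},
    ]
    for row in suppress_small_cells(example):
        print(row)
```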
II.D. Detailed Construction Overview

Notation

In the remainder of the documentation, we use the following mathematical notation:

• Mean estimates are denoted by $\hat{\mu}$ and standard deviation estimates by $\hat{\sigma}$.

• The cutscore estimates are denoted as $\hat{c}_1, \ldots, \hat{c}_K$. There are $K$ total cutscores in each state-subject-grade-year.

• A subscript indicates the aggregation of the estimate. We use the following subscripts:
    $u$ = unit (generic)
    $n$ = school
    $d$ = GSD
    $c$ = county
    $z$ = CZ
    $m$ = metro
    $f$ = state
    $r$ = subgroup
        $all$ = all students
        $wht$ = white
        $blk$ = black
        $hsp$ = Hispanic
        $asn$ = Asian
        $mal$ = male
        $fem$ = female
        $ecd$ = economically disadvantaged
        $nec$ = not economically disadvantaged
        $wbg$ = white-black gap
        $whg$ = white-Hispanic gap
        $mfg$ = male-female gap
        $neg$ = not economically disadvantaged-economically disadvantaged gap
    $y$ = year
    $b$ = subject
    $g$ = grade

• A superscript indicates the scale of the estimate. The metric is generically designated as $x$. There are four scales. The first two scales are only used in construction. The latter two scales are reported:
    $state$ = state-referenced metric
    $naep$ = NAEP test score scale metric
    $cs$ = cohort scale metric
    $gcs$ = grade within cohort scale metric
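As an illustration of how these pieces combine, an estimated mean and standard deviation for GSD $d$, subgroup $r$, year $y$, grade $g$, and subject $b$ on the CS scale could be written as below. The ordering of the subscripts is an assumption made for this example only; it is not specified in the list above.

```latex
\hat{\mu}^{cs}_{drygb}, \qquad \hat{\sigma}^{cs}_{drygb}
```

Replacing the superscript with $gcs$ gives the same quantities on the grade within cohort scale, and replacing $d$ with $c$, $z$, or $m$ gives the county, CZ, or metro analogue.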
Step 1. Creating the Crosswalk & Defining Geographic School Districts

The primary purpose of the crosswalk is to assign schools to GSDs. Each traditional public school district in the U.S. is defined by a geographic catchment area; the schools that fall within this geographic boundary make up the GSD. Commonly, public school districts have administrative control over the traditional public schools that fall within their specific geographic boundaries. However, there may be some schools physically located within the geographic boundary of a school district that are not under its administrative control. For example, there may be charter schools located within the boundaries of a traditional public school district that are operated by a charter school network (which has no associated geographic boundary). Any school that is not affiliated with one of the traditional public school districts is assigned to a GSD based on its geographic location; the assigned GSD will be the traditional public school district in whose geographic boundaries the school is physically located. The GSD, therefore, contains all of the public school students living within the geographic boundaries of the school district. The motivation for this assignment is to better align the test scores for students living within school district boundaries with the demographic and socioeconomic data that we retrieve from other sources that report data by geographic school district boundaries.

Below are the GSD-assignment rules for common types of schools that are operated by a local education agency (LEA) without a straightforward geographic boundary.
Charter schools: If a charter school is operated by an administrative district that only has charter schools or is authorized by a state-wide administrative agency, it is geolocated and assigned to a GSD based on its location.⁴ If a charter school is operated by a traditional public school district, we use that as its GSD regardless of the school's location.

Schools operated by high school districts: In the cases where schools in high school districts serve students in grades 7 and 8, the high schools are assigned to the elementary school district in which they are geographically located.
⁴ Geographic location is determined by the latitude and longitude coordinates of a school's physical address as listed in the CCD. The GSD of charter schools varies from year to year for approximately 5.45% of the roughly 8,612 charter schools. In these cases, we use the GSD the charter is assigned to in the most recent year it is observed. Eighteen charter schools cannot be geolocated using the provided latitude/longitude information. All such schools are assigned to a single GSD with no geographic boundary.
Virtual schools: By their nature, most virtual schools do not draw students from within strict geographic boundaries. We therefore assign all the virtual schools within a state to a single "virtual school district". We identify schools as virtual using CCD data from 2013-14 through 2015-16. The virtual school identifier did not exist in earlier years of data, so we flag schools as virtual in all years of our data if they are identified as virtual by the later-year CCD indicators.⁵ Additionally, we identify virtual schools by searching school names for terms such as "virtual", "cyber", "online", "internet", "distance", "extending", "extended", "on-line", "digital" and "kaplan academy". Since schools may change names, if we identify a school as virtual by this approach in one year, we flag the school as virtual in all years.⁶ Note that virtual schools are retained in the estimation of state cutscores, but no mean estimates are produced or reported in SEDA 3.0 for virtual schools or virtual school districts (these are removed from the data in Step 4).

Schools belonging to GSDs that cross state boundaries: A few school districts overlap state borders. In this case, schools on either side of the state border take different accountability tests. We treat each of these districts as two GSDs, each one coded as part of the state in which it resides.
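The assignment rules above can be sketched roughly as follows. This is a simplified illustration, not the procedure actually used: the record fields, the `locate_district` callback (standing in for the point-in-polygon lookup against district boundaries), and the form of the virtual-district code are all hypothetical, and the manual review of ambiguous schools is not represented.

```python
# Search terms listed in the text above.
VIRTUAL_TERMS = (
    "virtual", "cyber", "online", "internet", "distance",
    "extending", "extended", "on-line", "digital", "kaplan academy",
)


def name_suggests_virtual(school_name):
    """True if the school name contains any of the listed terms."""
    name = school_name.lower()
    return any(term in name for term in VIRTUAL_TERMS)


def flag_virtual_all_years(records):
    """Return IDs of schools flagged virtual in at least one year.

    `records` holds one dict per school-year with the hypothetical keys
    'school_id', 'name' and 'ccd_virtual' (True, False, or None when
    the CCD indicator is unavailable). A flag in any year applies to
    every year, so a set of IDs is enough.
    """
    return {
        r["school_id"]
        for r in records
        if r.get("ccd_virtual") or name_suggests_virtual(r["name"])
    }


def assign_gsd(school, virtual_ids, locate_district):
    """Assign one school to a GSD code under the simplified rules."""
    if school["school_id"] in virtual_ids:
        # One virtual school district per state (code format is made up).
        return f"{school['state']}-VIRTUAL"
    if school.get("operator_is_traditional_district"):
        # Operated by a traditional district: use it regardless of location.
        return school["operator_lea_id"]
    # Otherwise assign by physical location (latitude/longitude).
    return locate_district(school["lat"], school["lon"])
```

Passing the geographic lookup in as a function keeps the sketch independent of any particular GIS library.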
The second purpose of the crosswalk is to identify a stable district ID for cases where school districts restructure or are reported differently in different data sets during the time period of our data. These cases are discussed below.

Schools in districts that restructure: Some districts changed structure during the time period covered by SEDA 3.0 data. We have identified a small number of these cases. In California, two Santa Barbara districts (LEA IDs: 0635360, 0635370) joined to become the Santa Barbara Unified School District. In South Carolina, two districts joined to become the Sumter School District (LEA IDs: 4503720, 4503690). In Tennessee, Memphis Public Schools and Shelby County Public Schools (LEA IDs: 4702940, 4703810) merged. In Texas, North Forest ISD merged with Houston ISD (LEA IDs: 4833060, 482364). For all cases, SEDA 3.0 contains estimated test score distributions for the combined GSDs.

Schools in New York City: The CCD assigns schools in New York City to one of thirty-two districts or one "special schools district." We aggregate all New York City schools to the city level and give them all the same GSD code, creating one unified New York City GSD code.

⁵ In 2013-2015, we identified 12 non-virtual schools in Alabama identified as "virtual" by the CCD indicator. We treat these as regular schools in all subsequent steps.
⁶ Some naming or classification of schools was ambiguous. When the type of school was unclear, research staff consulted school and district websites for additional details. Schools whose primary mode of instruction was online but that required regular attendance at a computer lab or school building were coded as belonging to the GSD in which they are located.
Finally, the crosswalk links the GSD estimates to counties, CZs, and metros. No additional geolocation is done in support of this aspect of the crosswalk. GSDs are assigned to counties, metros, and CZs based on the county codes provided in CCD. A small number of counties restructure during the time frame of our data, meaning that we observe some districts belonging to two different counties over the course of our data. To avoid this issue, we create a stable ID for each such county that is equivalent to the county definition in the most recent year of data. Districts are always assigned to this stable county ID, regardless of the year of the data. We use the 2013 metropolitan statistical area definitions.
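One way to picture the stable county ID is to take, for each district, the county code from the most recent year in which the district is observed and apply it to every year. The sketch below does exactly that; the input layout is hypothetical, and treating "most recent year of data" as the most recent year per district is an assumption.

```python
def stable_county_ids(district_year_county):
    """Map each district to the county code from its most recent year.

    `district_year_county` is an iterable of
    (district_id, year, county_code) tuples in any order.
    """
    latest = {}
    for district_id, year, county_code in district_year_county:
        # Keep only the record from the latest year seen so far.
        if district_id not in latest or year > latest[district_id][0]:
            latest[district_id] = (year, county_code)
    return {d: county for d, (_, county) in latest.items()}


if __name__ == "__main__":
    rows = [
        ("D1", 2009, "C-old"),
        ("D1", 2016, "C-new"),
        ("D2", 2012, "C-x"),
    ]
    print(stable_county_ids(rows))
```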
The crosswalk and the shape files used to locate schools within each geographic unit are available in the SEDA database. The county, metro, and CZ shape files are the originals from the US Census Bureau. A district level shape file was created using the U.S. Census Bureau's 2010 TIGER/Line Files. These files were obtained from the National Historical Geographic Information System (NHGIS). The Census Bureau provides three shape files: elementary district boundaries, high school district boundaries, and unified district boundaries. Research staff merged the elementary and unified shape files to conform to the decision rules outlined above. Note that in the data repository the shape files are labeled as "v21". No updates were made to these files in this release; their version number was not edited.
Step 2. Data Cleaning

In this step, we first merge the EDFacts data (described under II.A. Source Data, above) by NCES school ID and year with the crosswalk developed in Step 1. This merge provides us with counts of students scoring in each proficiency category by school-subgroup-subject-grade-year that are linked to GSDs, counties, CZs, and metros. As noted above, in 2008-09 through 2011-12, we cannot distinguish students taking regular from alternate assessments; these counts were combined in the reported data. Therefore, for consistency in all years, we combine the performance data for regular and alternate assessments as reported in EDFacts.
Notably,
in a
small number of cases
⎥⍥⌍⎥ ⎥⍥〈 ⎛⎥⌍⎥〈╦⎛ ⌍⍺⎥〈⎗⎁⌍⎥〈 ⌍⎛⎛〈⎛⎛⎀〈⎁⎥⎛ ⍥⌍⏋〈 ⎈⎁〈 ⌍⌥⌥⍨⎥⍨⎈⎁⌍⍺ ⎔〈⎗⌳⎈⎗⎀⌍⎁⌛〈
category relative to the
regular assessment.
7
Because o
ur estimation uses combined counts of
students
scoring in each performance category across all assessments
, this leads to the bottom
or top proficiency category of the data having a very small number of observations
.
To avoid
issues during estimation, w
e collapse
the sparse bottom or top category with
the adjacent
category in these
state

subject

grade

year cases
.
The affected state, subject, grade, and year
cases include: Arkansas, math and ELA, grades 3

8, years 2012
, 2013,
20
14 and 2016; Colorado,
math and ELA, grades 3

8, years 2012
, 2013, and
2014;
Iowa, math and ELA, grades 3 through 8,
years 2015 and 2016; New York, math, grades 3

6, years 2013 and 2014; Oregon, math and ELA,
grades 3

8 in 2013 and 2014; and South Carolina
, math and ELA, grades 3

8, years 2012
, 2013,
and
2014.
Next, we remove all data⁸ for state-subject-grade-year cases that do not meet the requirements of our estimation. A general description of these cases follows, and a list of specific cases can be found in Table 4:
• Students took incomparable tests within the state-subject-grade-year: There are two common ways this appears within the data. First, there are cases where districts were permitted to administer locally selected assessments. This occurred in Nebraska during SY 2008-2009 (ELA and Math) and SY 2009-2010 (Math). Second, students take end-of-course rather than end-of-grade assessments. This is the case in some or all years for 7th and 8th grade math for California, Virginia and Texas (among other states, reported in Table 5). The problem is that assessments were scored on different scales and using different cut scores. Therefore, proficiency counts cannot be compared across districts or schools within these state-subject-grade-year cases.

⁷ The EDFacts documentation notes these discrepancies in years after 2011-12.
⁸ For all subgroups and all schools in the state. In other words, no estimates will be available for these state-subject-grade-year cases.
• The state had participation lower than 95% in the tested subject-grade-year: Using the EDFacts data, we are able to estimate a participation rate for all state-subject-grade-year cases in the 2012-13 through 2014-15 school years. This participation data file is not available prior to the 2012-13 school year, and therefore we cannot calculate participation rates prior to 2012-13. Participation is the ratio of the number of test scores reported to the number of enrolled students in a given state-subject-grade-year:

$$ \mathit{part}_{fygb} = \frac{\mathit{numscores}_{fygb}}{\mathit{numenrl}_{fygb}} \tag{2.1} $$

for each state $f$, year $y$, grade $g$, and subject $b$.
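The participation rule in Equation (2.1) can be sketched as follows. This is a minimal illustration, not the SEDA production code; the function names and the example counts are hypothetical, and only the 95% threshold comes from the text.

```python
def participation_rate(num_scores: int, num_enrolled: int) -> float:
    """Equation (2.1): reported test scores divided by enrolled students."""
    if num_enrolled <= 0:
        raise ValueError("enrollment must be positive")
    return num_scores / num_enrolled


def is_suppressed(num_scores: int, num_enrolled: int, threshold: float = 0.95) -> bool:
    """True when a state-subject-grade-year falls below the participation threshold."""
    return participation_rate(num_scores, num_enrolled) < threshold


# Hypothetical state-subject-grade-year cases: (scores reported, students enrolled).
cases = {
    ("state A", 2013, 4, "math"): (9700, 10000),
    ("state B", 2014, 8, "ela"): (9000, 10000),
}
suppressed = [key for key, (scores, enrl) in cases.items() if is_suppressed(scores, enrl)]
```

In this made-up example only the second case would be removed, because its participation rate (90%) is below 95%.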
This state-level suppression is important because both the quality of the estimates and the linkage process depend on having the population of student test scores for that state-subject-grade-year. State participation may be low due to a number of factors, including student opt out or pilot testing. Note that we do not suppress any entire state-subject-grade-year cases prior to the 2012-13 school year, as enrollment data are not available in EDFacts. However, opt out was low in 2012-13 (no state was excluded based on this threshold), which suggests that states met the 95% threshold in prior years when data are not available.
• Insufficient data were reported to EDFacts: Some states reported no data in certain years: Wyoming did not report any assessment outcomes in 2009-10. Others reported data from which we cannot recover reliable estimates. In the 2008-09, 2009-10, and 2010-11 school years, Colorado reported data in only two proficiency categories, and a large majority of the data (88% across subjects, grades, and years) fall into a single category. These data do not provide sufficient information to estimate means and/or standard deviations in most regions. In the 2014-15 and 2015-16 school years, New Mexico reported data in only two proficiency categories. We remove these cases because the two years are consecutive and fall at the end of the time series of our data.
In addition to the exclusion of state-subject-grade-year cases, we also remove idiosyncratic data errors. These were identified by looking at the distribution of students across proficiency categories. When the distribution changed too abruptly for the given cohort in the given year compared with their performance in the prior and subsequent years, as well as compared with other cohorts in the GSD, these data were determined to be entry errors and were removed. These cases are listed in Table 5.
Step 3. Cutscore Estimation and Linking
In this step, we use HETOP models and the all-student GSD proficiency count data to estimate state-subject-grade-year cutscores on a common scale linked to NAEP. To address practical challenges that can arise in linking and the HETOP estimation framework, within a specific state-subject-grade-year we:
• Rearrange GSDs. We reconfigure GSDs that meet certain criteria within a state-subject-grade-year in order to improve the HETOP estimation process. First, we combine vectors of counts that have fewer than 20 students into "overflow" groups because estimates based on small sample sizes can be inaccurate. Second, in some vectors with more than 20 students the pattern of counts does not provide enough information to estimate a mean or a standard deviation; we also place these count vectors into the "overflow" group. If the resulting overflow groups have parameters that cannot be estimated via maximum likelihood, they are removed from the data. This reconfiguration allows us to retain the maximum possible number of test scores in the estimation sample for the cutscores. This is important as the linking methods we use later in this step rely on having information about the full population in each state-grade-year-subject.
• Constrain GSDs. For groups not in the "overflow" group, we always estimate a unique mean. But we can sometimes obtain more precise and identifiable estimates by placing additional constraints on group standard deviation parameters in the HETOP model. We constrain standard deviation parameter estimates for groups that meet the following conditions during estimation:
  o There are fewer than 50 student assessment outcomes in a GSD.
  o There are not sufficient data to estimate both a mean and standard deviation (all student assessment outcomes fall in only two adjacent performance level categories; all student assessment outcomes fall in the top and bottom performance categories; or all student assessment outcomes fall in a single performance level category).
After these data processing steps, we estimate a separate HETOP model for each state-subject-grade-year and save the cutscore estimates. For state-grade-year-subjects with only two proficiency categories, we cannot estimate unique GSD standard deviations and instead we use the model with a single, fixed standard deviation parameter (the HOMOP model).
We denote the estimated cutscores as $\hat{c}^{state}_{1fygb}, \ldots, \hat{c}^{state}_{K-1,fygb}$, for a state $f$, year $y$, grade $g$, and subject $b$, where the proficiency data are reported in $K$ categories. These cutscores are expressed in units of their respective state-year-grade-subject student-level standardized distribution. The HETOP model estimation procedure also provides standard errors of these cutscore estimates, denoted $se(\hat{c}^{state}_{kfygb})$ for $k = 1, \ldots, K-1$, respectively (Reardon, Shear, Castellano, & Ho, 2017). Note that we do not use the group-specific means or standard deviations that are simultaneously estimated along with the cutscores; mean estimation is described in Steps 5 and 6. See Reardon et al. (2017) and the description in Step 5 below for additional details about the HETOP model.
To place these cutscores on a common scale across states, grades, and years, we use data from the National Assessment of Educational Progress (NAEP). NAEP data provide estimates of 4th and 8th grade test score means and standard deviations for each state on a common scale, denoted $\hat{\mu}^{naep}_{fygb}$ and $\hat{\sigma}^{naep}_{fygb}$, respectively, as well as their standard errors.⁹ Because NAEP is administered only in 4th and 8th grades in odd-numbered years, we interpolate and extrapolate linearly to obtain estimates of these parameters for grades (3, 5, 6, and 7) and years (2010, 2012, 2014, and 2016) in which NAEP was not administered. First, within each NAEP-tested year (2009, 2011, 2013, 2015, and 2017) we linearly interpolate between grades 4 and 8 to grades 5, 6, and 7 and extrapolate to grade 3. Next, for all grades 3-8, we linearly interpolate between the odd NAEP-tested years to estimate parameters in 2010, 2012, 2014 and 2016, using the interpolation/extrapolation formulas here:

$$ \hat{\mu}^{naep}_{fygb} = \hat{\mu}^{naep}_{fy4b} + \frac{g-4}{4}\left(\hat{\mu}^{naep}_{fy8b} - \hat{\mu}^{naep}_{fy4b}\right), \quad \text{for } g \in \{3, 5, 6, 7\} $$

$$ \hat{\mu}^{naep}_{fygb} = \frac{1}{2}\left(\hat{\mu}^{naep}_{f[y+1]gb} + \hat{\mu}^{naep}_{f[y-1]gb}\right), \quad \text{for } y \in \{2010, 2012, 2014, 2016\} \tag{3.1} $$
⁹ Note that the NAEP scales are not comparable across math and reading, but they are comparable across states, grades and years within each subject.
We do the same to interpolate/extrapolate the state NAEP standard deviations. The reported NAEP means and standard deviations, along with interpolated values, by year and grade, are reported in Table 6.
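The two interpolation rules in Equation (3.1) can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical and the NAEP values used below are made up, not taken from Table 6.

```python
def interpolate_grades(value_g4: float, value_g8: float) -> dict:
    """First line of Eq. (3.1): linear in grade, anchored at grades 4 and 8.

    Grades 5-7 are interpolated and grade 3 is extrapolated.
    """
    return {g: value_g4 + (g - 4) / 4 * (value_g8 - value_g4) for g in range(3, 9)}


def interpolate_years(by_odd_year: dict) -> dict:
    """Second line of Eq. (3.1): each even year is the mean of its two neighbors."""
    filled = dict(by_odd_year)
    for y in (2010, 2012, 2014, 2016):
        filled[y] = 0.5 * (by_odd_year[y - 1] + by_odd_year[y + 1])
    return filled


# Hypothetical state NAEP means for one subject in one tested year.
by_grade = interpolate_grades(240.0, 280.0)

# Hypothetical grade-4 means in the NAEP-tested years.
by_year = interpolate_years({2009: 240.0, 2011: 242.0, 2013: 241.0, 2015: 243.0, 2017: 244.0})
```

The same two functions would apply unchanged to the state NAEP standard deviations, since the text states they are interpolated in the same way.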
We then use these state-specific NAEP estimates to place each state's cutscores on the NAEP scale. The methods we use, as well as a set of empirical analyses demonstrating the validity of this approach, are described in more detail by Reardon, Kalogrides, and Ho (Forthcoming). We provide a brief summary here. Because GSD test score moments and the cutscores are expressed on a state scale with mean 0 and unit variance, the estimated mapping of $\hat{c}^{state}_{kfygb}$ for $k = 1, \ldots, K-1$ to the NAEP scale is given by Equation (3.2) below, where $\rho^{state}_{fygb}$ is the estimated reliability of the state test. This mapping yields an estimate of the $k^{th}$ cutscore on the NAEP scale, denoted $\hat{c}^{naep}_{kfygb}$.

$$ \hat{c}^{naep}_{kfygb} = \hat{\mu}^{naep}_{fygb} + \frac{\hat{c}^{state}_{kfygb}}{\sqrt{\rho^{state}_{fygb}}} \cdot \hat{\sigma}^{naep}_{fygb} \tag{3.2} $$
The intuition behind Equation (3.2) is straightforward: cutscores in states with relatively high NAEP averages should be placed higher on the NAEP scale. The reliability term, $\rho^{state}_{fygb}$, in Equation (3.2) is necessary to account for measurement error in state accountability test scores. Note that cutscores on the state scale are expressed in terms of standard deviation units of the state score distribution. The state scale cutscores are biased toward zero due to measurement error. They must be disattenuated during mapping to the NAEP scale, given that the NAEP scale accounts for measurement error due to item sampling. We disattenuate the cutscores by dividing them by the square root of the state test score reliability estimate, $\rho^{state}_{fygb}$. The reliability data used to disattenuate the estimates come from Reardon and Ho (2015) and were supplemented with publicly available information from state technical reports. For cases where no information was available, test reliabilities were imputed using data from other grades and years in the same state.
Finally, we standardize the NAEP-linked cutscores relative to a reference cohort of students. This standardization is accomplished by subtracting the national grade-subject-specific mean and dividing by the national grade-subject-specific standard deviation for a reference cohort. We use the average of the three national cohorts that were in 4th grade in 2009, 2011, and 2013. We rescale at this step such that all means recovered in Step 5 will be interpretable as an effect size relative to the average of the three national cohorts that were in 4th grade in 2009, 2011, and 2013. For each grade, year and subject we calculate:

$$ \hat{\mu}^{naep}_{avg,gb} = \frac{1}{3} \sum_{Y \in \{2005, 2007, 2009\}} \hat{\mu}^{naep}_{(y=Y+g)gb} $$

$$ \hat{\sigma}^{naep}_{avg,gb} = \frac{1}{3} \sum_{Y \in \{2005, 2007, 2009\}} \hat{\sigma}^{naep}_{(y=Y+g)gb} \tag{3.3} $$
In Equation (3.3), $Y$ refers to the year in which the cohort was in the spring of kindergarten. For the 2009 4th grade cohort, this is equal to 2005 (or 2009 minus 4). Then we standardize each cutscore:

$$ \hat{c}^{cs}_{kfygb} = \frac{\hat{c}^{naep}_{kfygb} - \hat{\mu}^{naep}_{avg,gb}}{\hat{\sigma}^{naep}_{avg,gb}} \tag{3.4} $$

The resulting cutscores are on the CS scale, standardized to this nationally averaged reference cohort within subject, grade, and year.
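The chain from a state-scale cutscore to a CS-scale cutscore (Equations 3.2 through 3.4) can be traced with a short numeric sketch. All function names and input values below are hypothetical and chosen only to make the arithmetic easy to follow; they are not SEDA estimates.

```python
import math


def link_to_naep(c_state: float, reliability: float, mu_naep: float, sigma_naep: float) -> float:
    """Eq. (3.2): disattenuate the state-scale cutscore, then rescale to NAEP units."""
    return mu_naep + c_state / math.sqrt(reliability) * sigma_naep


def reference_average(values_by_cohort: list) -> float:
    """Eq. (3.3): simple average over the three reference cohorts."""
    return sum(values_by_cohort) / len(values_by_cohort)


def standardize(c_naep: float, mu_avg: float, sigma_avg: float) -> float:
    """Eq. (3.4): express the NAEP-linked cutscore on the CS scale."""
    return (c_naep - mu_avg) / sigma_avg


# Hypothetical inputs for one state-subject-grade-year cutscore.
c_naep = link_to_naep(c_state=0.5, reliability=0.81, mu_naep=250.0, sigma_naep=36.0)

# Hypothetical national means and SDs for the three reference cohorts in this grade.
mu_avg = reference_average([248.0, 250.0, 252.0])
sigma_avg = reference_average([34.0, 35.0, 36.0])

c_cs = standardize(c_naep, mu_avg, sigma_avg)
```

With these inputs the state cutscore of 0.5 is divided by the square root of 0.81, so it moves about 0.56 state standard deviations above the state NAEP mean before being re-expressed relative to the national reference cohort.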
Step 4. Selecting Data for Mean Estimation
In Step 5, we estimate a model separately for each unit-subgroup that draws only on the subject-grade-year data for that unit-subgroup. In some subjects, grades, and years, we are less confident in the quality of the unit-subgroup data and do not want to leverage it in estimation as it may bias the parameter estimates.¹⁰ These cases are described below:
• The participation rate is less than 95%. In these cases, the population of tested students on which the mean and standard deviation estimates are based may not be representative of the population of students in that school. Therefore, we remove all unit-subgroup-subject-grade-year cases where participation was lower than 95%. Participation is defined as:

$$ \mathit{part}_{urygb} = \frac{\mathit{numscores}_{urygb}}{\mathit{numenrl}_{urygb}}. \tag{4.1} $$

This measure can be constructed in the 2012-13 through 2015-16 school years; we do not remove data based on this rule in earlier years. If the participation rate for "all students" is less than 95%, we do not report any estimates for demographic subgroups regardless of whether the subgroup-specific participation rate was greater than 95% because we are concerned about data quality.
• Insufficient data reported by student demographic subgroups. There are a small number of cases where the total number of test scores reported by race or gender is less than 95% of the total reported test scores for all students. For example, there may be 50 test scores reported for all students, but only 20 test scores for male students and 20 test scores for female students. In this case, we would not report the male or female test score means because insufficient test scores were reported by gender. We calculate the reported percentage as:
$$ \mathit{rep}_{urygb} = \frac{\sum_{r} \mathit{numscores}_{urygb}}{\mathit{numscores}_{u,all,ygb}}. \tag{4.2} $$

This measure can be constructed in all years.

¹⁰ The logic of this data selection differs from the cleaning done in Step 2 to support cutscore estimation. For the cutscore estimation, we wanted to keep as much data as possible in the estimation process because the linking procedure at the end of Step 3 requires population-based data. Moreover, the cutscores are not particularly sensitive to low-quality data for individual GSDs. In contrast, the school/GSD estimates will be strongly affected by low-quality data (due to the factors described above). First, those parameters may not accurately reflect the academic performance in the unit. Second, in the model that we use (described more below), we "borrow" information across grades and years in some cases. If we include these low-quality data cases, we may be borrowing from "bad" information.
• More than 40% of students take alternate assessments. We are concerned that we are getting a biased estimate in unit-subgroup-subject-grade-year cases where over 40% of the students take alternate assessments. These assessments typically differ from the regular assessment and have different proficiency thresholds. This flag can be constructed in the 2012-13 through 2015-16 school years; we do not remove data based on this rule in earlier years.
• Students scored only in the top or only in the bottom proficiency category. We cannot obtain maximum likelihood estimates of unique means for these cases and therefore remove them prior to estimation. This flag can be constructed in every year.
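The four selection rules listed above can be summarized in a small sketch. The function names, argument names, and reason labels are hypothetical; the thresholds (95%, 40%) come from the text. Using the number of tested students as the denominator of the alternate-assessment share is an assumption of this sketch, since the text does not spell it out.

```python
def exclusion_reasons(counts, enrolled=None, alt_takers=None):
    """Return the reasons a unit-subgroup-subject-grade-year case would be removed.

    counts: students per proficiency category, ordered lowest to highest.
    enrolled / alt_takers: None in years where these data are unavailable.
    """
    n = sum(counts)
    if n == 0:
        return ["no_scores"]
    reasons = []
    if enrolled is not None and n / enrolled < 0.95:
        reasons.append("participation_below_95")
    # Assumption: share of alternate-assessment takers among tested students.
    if alt_takers is not None and alt_takers / n > 0.40:
        reasons.append("alternate_above_40")
    if counts[0] == n or counts[-1] == n:
        reasons.append("top_or_bottom_only")
    return reasons


def reported_share(subgroup_totals, all_students_total):
    """Eq. (4.2): subgroup test scores as a share of the all-student total."""
    return sum(subgroup_totals) / all_students_total


# The worked example from the text: 50 scores overall, 20 male and 20 female.
share = reported_share([20, 20], 50)
gender_means_reported = share >= 0.95
```

In the worked example the reported share is 80%, so neither gender-specific mean would be reported.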
We next flag and remove school-subgroups and GSD-subgroups that do not meet the minimum estimation requirements, described below.

First, we create a "type flag" for each unit-subgroup-subject-grade-year case. It is considered "deficient" if the case meets one of the following conditions: a) has all observations in a single category; b) has all observations in only 2 adjacent categories; c) has only 2 proficiency categories (one cut score); or, d) has all observations in only the top and bottom categories (e.g., X-0-0-X or X-0-X). Otherwise, cases are flagged as "sufficient". Constraints on the parameter estimates for "deficient" cases are needed during estimation because they do not provide sufficient data to freely estimate both a mean and a standard deviation.
Second, we construct a "size flag." We flag unit-subgroup-subject-grade-year cases as "small" if they have fewer than 100 test scores; otherwise, cases are flagged as "large". Each unit-subgroup-subject-grade-year, then, has two associated flags. These flags will be used again during estimation to place constraints on the standard deviation estimates for individual unit-subgroup-subject-grade-year cases.
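The type and size flags described above are simple functions of the vector of proficiency counts. The sketch below follows conditions (a) through (d) and the 100-score threshold as stated in the text; the function names are hypothetical.

```python
def type_flag(counts):
    """Classify a vector of proficiency counts (lowest to highest category)."""
    k = len(counts)
    nonzero = [i for i, c in enumerate(counts) if c > 0]
    if k == 2:
        return "deficient"  # (c) only two proficiency categories (one cut score)
    if len(nonzero) <= 1:
        return "deficient"  # (a) all observations in a single category
    if len(nonzero) == 2 and nonzero[1] - nonzero[0] == 1:
        return "deficient"  # (b) all observations in two adjacent categories
    if nonzero == [0, k - 1]:
        return "deficient"  # (d) only top and bottom, e.g. X-0-0-X or X-0-X
    return "sufficient"


def size_flag(counts):
    """'small' if fewer than 100 test scores, otherwise 'large'."""
    return "small" if sum(counts) < 100 else "large"


flags = (type_flag([12, 40, 35, 20]), size_flag([12, 40, 35, 20]))
```

Only cells that are both "sufficient" and "large" have their standard deviation freely estimated in Step 5.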
If a unit-subgroup has only one "deficient" or "small" unit-subgroup-subject-grade-year case, then that case is dropped from the data. We also drop entire unit-subgroups that have no "sufficient" unit-subgroup-subject-grade-year cases. Our estimation methods, described in the next step, cannot produce a standard deviation estimate for a given unit when these conditions are met.
Finally, we elect not to perform the mean estimation for a subset of whole schools and GSDs (across all subgroups, subjects, grades and years). These include: (1) virtual schools and GSDs (described in Step 2); (2) charter schools that could not be geolocated; and (3) schools and GSDs with more than 20% of all students taking alternate assessments. Note that while we technically perform this data selection only for schools and GSDs in this Step, we apply a subset of these rules to counties, CZs, and metros during the aggregation process. Table 7 shows the cases that are excluded based on these rules for all geographies.
Step 5. Estimating Means for Schools and Districts
The goal of this step is to estimate the mean and standard deviation of test scores for each subgroup in each unit (school or district) across subjects, grades, and years. We have two pieces of information that we use for this process: the observed proficiency counts for each subgroup-unit-state-grade-year-subject from Step 4 and the estimated cutscores separating the proficiency categories in the associated state-grade-year-subject from Step 3. We use these data and a pooled HETOP model (Shear and Reardon, 2019; Reardon et al., 2017) to estimate $\mu^{cs}_{urygb}$ and $\sigma^{cs}_{urygb}$, the mean and standard deviation of achievement on the CS scale for unit $u$ (school or GSD), subgroup $r$, year $y$, grade $g$, and subject $b$. As described below, the pooled HETOP model is run separately for each unit-subgroup-subject, but combines data across grades and years when estimating these parameters. Combining data across grades and years allows us to get better estimates of $\sigma^{cs}_{urygb}$ for years and grades in which sample sizes are small or where the proficiency count data are limited.
We use a pooled HETOP model in order to overcome three practical challenges. The challenges are: 1) in some states, years, and grades, $K = 2$ and there is not sufficient information to estimate both a mean and a standard deviation for each unit-subgroup-grade-year-subject; 2) if $K \geq 3$ but there are sampling zeros because test scores were not observed in all $K$ categories for a particular grade and year, there may not be sufficient information to estimate both a mean and a standard deviation; and 3) when the sample size $n_{urygb}$ is small, prior simulations (e.g., Reardon et al., 2017; Shear & Reardon, 2019) have shown that estimates of standard deviations can be biased and contain excessive sampling error.
We estimate a pooled HETOP model (Shear & Reardon, 2019) for each unit, separately for each subject and subgroup, by "pooling" data across all available grades and years, and maximizing the joint log likelihood function given by:

$$ L = \ln\left[P\left(\mathbf{N}_{urb} \mid \mathbf{M}^{cs}_{urb}, \boldsymbol{\Gamma}^{cs}_{urb}, \mathbf{C}^{cs}_{fb}\right)\right] = \sum_{y=1}^{Y} \sum_{g=1}^{G} \sum_{k=1}^{K} n_{kurygb} \ln\left(\pi_{kurygb}\right) $$

$$ = \sum_{y=1}^{Y} \sum_{g=1}^{G} \sum_{k=1}^{K} n_{kurygb} \ln\left( \Phi\left(\frac{\mu^{cs}_{urygb} - \hat{c}^{cs}_{k-1,fygb}}{\exp\left(h_{urb}(g,y)\right)}\right) - \Phi\left(\frac{\mu^{cs}_{urygb} - \hat{c}^{cs}_{k,fygb}}{\exp\left(h_{urb}(g,y)\right)}\right) \right), $$
where $\mathbf{N}_{urb}$ is a matrix of proficiency counts across all available grades ($G$) and years ($Y$) for unit $u$, subgroup $r$ and subject $b$, $\mathbf{M}^{cs}_{urb}$ is a vector of estimated means across grades and years, $\boldsymbol{\Gamma}^{cs}_{urb}$ is a vector of estimated parameters for the function $h()$ described below, and $\mathbf{C}^{cs}_{fb}$ is a matrix of cutscores across grades and years. The cutscores are treated as fixed here, using the values estimated in Step 3.
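To make the structure of this likelihood concrete, the sketch below evaluates it for given parameter values; it does not perform the maximization. It assumes a normal distribution of scores within each cell, so that each category probability is a difference of standard normal CDFs, with the lowest and highest categories bounded by minus and plus infinity. The function names and example inputs are hypothetical.

```python
import math


def norm_cdf(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def cell_probs(mu: float, sigma: float, cutscores: list) -> list:
    """Category probabilities, lowest to highest, for one grade-year cell."""
    bounds = [-math.inf] + list(cutscores) + [math.inf]
    return [
        norm_cdf((mu - lo) / sigma) - norm_cdf((mu - hi) / sigma)
        for lo, hi in zip(bounds[:-1], bounds[1:])
    ]


def cell_log_likelihood(counts, mu, sigma, cutscores) -> float:
    """Sum over categories of n_k * ln(pi_k); empty categories contribute nothing."""
    probs = cell_probs(mu, sigma, cutscores)
    return sum(n * math.log(p) for n, p in zip(counts, probs) if n > 0)


def pooled_log_likelihood(cells) -> float:
    """Sum over grade-year cells; each cell is (counts, mu, h, cutscores), sigma = exp(h)."""
    return sum(
        cell_log_likelihood(counts, mu, math.exp(h), cuts)
        for counts, mu, h, cuts in cells
    )


# Two hypothetical grade-year cells for one unit-subgroup-subject.
cells = [
    ([10, 10], 0.0, 0.0, [0.0]),
    ([5, 20, 15], 0.2, -0.1, [-0.5, 0.6]),
]
value = pooled_log_likelihood(cells)
```

In practice the means and the parameters of $h()$ would be chosen by a numerical optimizer to maximize this quantity, with the cutscores held fixed.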
We have replaced $\sigma^{cs}_{urygb}$ in the above equation with $\exp\left(h_{urb}(g,y)\right)$, where $h_{urb}(g,y)$ is a unit-specific function used to model the natural log of the standard deviations as a function of grade and year:

$$ h_{urb}(g,y) = \ln\left(\sigma^{cs}_{urygb}\right) = \gamma^{cs}_{urygb}. $$
We do this for two reasons. First, estimating $\gamma^{cs}_{urygb} = \ln\left(\sigma^{cs}_{urygb}\right)$ rather than $\sigma^{cs}_{urygb}$ directly ensures that the ML estimate will be positive. Second, defining $\gamma^{cs}_{urygb}$ as a function of grade and year, rather than allowing this value to be unique in each grade and year, defines the pooled HETOP model. If we place no constraints on the model and allow $h_{urb}(g,y) = \gamma^{cs}_{urygb}$ to take on a unique value in each grade and year, maximization of this likelihood will result in identical estimates to those obtained by maximizing the likelihood separately for each grade and year.
To leverage the data available across multiple grades and years and overcome the limitations noted above, we define $h_{urb}(g,y)$ in the following way. First, we allow $\gamma^{cs}_{urygb}$ to be freely estimated in each grade-year cell that is both "sufficient" and "large", by the flags defined above. For all other grade-year cells, we constrain $h_{urb}(g,y)$ such that the estimate of $\gamma^{cs}_{urygb}$ is equal to the mean of the $\gamma^{cs}_{urygb}$ estimates across the freely estimated cells. That is, we estimate a common "pooled" standard deviation across the grades and years in which there are "deficient" data and/or "small" cell sizes.

More formally, for all years and grades in which $n_{urygb} < 100$ and/or in which there are insufficient data to estimate both a mean and a standard deviation, we constrain $h_{urb}(g,y) = \gamma^{cs}_{urb}$ to be equal, while allowing $h_{urb}(g,y) = \gamma^{cs}_{urygb}$ to be freely estimated in grades and years with at least 100 test scores and sufficient data to estimate both a mean and standard deviation. We further constrain the model such that the "pooled" log standard deviation is equal to the (unweighted) mean of the unconstrained log standard deviations by defining the constraint:

$$ \gamma^{cs}_{urb} = \frac{\sum_{y=1}^{Y} \sum_{g=1}^{G} \left(I^{100}_{urygb} \cdot I_{urygb} \cdot \gamma^{cs}_{urygb}\right)}{\sum_{y=1}^{Y} \sum_{g=1}^{G} \left(I^{100}_{urygb} \cdot I_{urygb}\right)}, $$
where $I^{100}_{urygb}$ is the size indicator flag (equal to 1 if size is "large") and $I_{urygb}$ is the sufficient data indicator flag (equal to 1 if there are sufficient data). If $I^{100}_{urygb}$ and $I_{urygb}$ are equal to 1 for all cells in a unit, then we estimate a unique mean and standard deviation for each cell. For all other units, there will be a mix of freely estimated and constrained standard deviation parameters. Recall in Step 4 that we removed unit-subgroups where $I_{urygb} = 0$ for all cells because we are unable to estimate a standard deviation parameter.
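The pooling constraint amounts to an unweighted average of the freely estimated log standard deviations. The sketch below computes that average from already-known values purely to show the arithmetic; in the model itself the constraint is imposed jointly during maximum likelihood estimation, not applied afterwards. The data structure and names are hypothetical.

```python
def pooled_log_sd(cells):
    """Unweighted mean of log SDs over cells that are both 'large' and 'sufficient'.

    cells: list of dicts with keys 'gamma' (log SD), 'large' and 'sufficient' (bools).
    Returns None when no cell is freely estimated.
    """
    free = [c["gamma"] for c in cells if c["large"] and c["sufficient"]]
    if not free:
        return None
    return sum(free) / len(free)


def constrained_log_sds(cells):
    """Free cells keep their own log SD; every other cell takes the pooled value."""
    pooled = pooled_log_sd(cells)
    return [
        c["gamma"] if (c["large"] and c["sufficient"]) else pooled
        for c in cells
    ]


# Hypothetical grade-year cells for one unit-subgroup-subject.
cells = [
    {"gamma": 0.0, "large": True, "sufficient": True},
    {"gamma": 0.2, "large": True, "sufficient": True},
    {"gamma": None, "large": False, "sufficient": True},   # small cell
    {"gamma": None, "large": True, "sufficient": False},   # deficient cell
]
log_sds = constrained_log_sds(cells)
```

Here the two constrained cells share the pooled log standard deviation of 0.1, the mean of the two free values.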
Summary
The models described here are used to produce ML estimates of $\hat{\mu}^{cs}_{urygb}$ and $\hat{\sigma}^{cs}_{urygb}$ (where $\hat{\sigma}^{cs}_{urygb}$ may be constrained to be equal in some grades and years), as well as estimated standard errors $se\left(\hat{\mu}^{cs}_{urygb}\right)$ and $se\left(\hat{\sigma}^{cs}_{urygb}\right)$ and the estimated sampling covariances $cov\left(\hat{\mu}^{cs}_{urygb}, \hat{\sigma}^{cs}_{urygb}\right)$, where unit can be either a GSD $d$, or a school $n$. This process is applied separately for each district-subgroup-subject or school-subgroup-subject within each state. The estimates are on the CS scale described elsewhere, and can be transformed to other scales, such as the GCS scale.
Step 6. Aggregating GSD-subgroup estimates to Counties, CZs and Metros
We adopt a different approach to estimate the mean and standard deviation of achievement in counties, CZs and MSAs in a given year $y$, grade $g$, and subject $b$. We use the estimates for the GSDs from Step 5 that correspond to a given county, CZ or metro within a subject-grade-year to estimate an overall mean and variance for that unit. As noted above, we use stable county identifiers in cases where we observe that a district is placed in multiple counties during the years in our sample. The district is assigned to the county it is observed in during the 2015-16 school year (the last year of our data). We describe the process here for counties, but it also applies to CZs and MSAs.
Suppose there are a set of $C$ counties, each of which contains one or more unique GSDs. These higher-level units are defined geographically and are non-overlapping. Hence, each GSD falls within exactly one county. The county mean is estimated as the weighted average of GSD means across all $D_c$ GSDs in county $c$, computed as

$$ \hat{\mu}^{cs}_{crygb} = \sum_{d=1}^{D_c} p_{dc} \hat{\mu}^{cs}_{drygb}, \tag{6.1} $$

where $p_{dc}$ is the proportion of county $c$ represented by GSD $d$. The estimated county standard deviation is estimated as the square root of the estimated total variance between and within GSDs within a county,

$$ \hat{\sigma}^{cs}_{crygb} = \sqrt{\hat{\sigma}^{2}_{B_c} + \hat{\sigma}^{2}_{W_c}} \tag{6.2} $$
where $\hat{\sigma}^{2}_{B_c}$ is the estimated variance between GSDs in county $c$ and $\hat{\sigma}^{2}_{W_c}$ is the estimated variance within GSDs in county $c$. The formulas used to estimate $\hat{\sigma}^{2}_{B_c}$ and $\hat{\sigma}^{2}_{W_c}$ are based on equations in Reardon et al. (2017). These formulas and formulas for estimating the standard errors of the county means and standard deviations, $\hat{\mu}^{cs}_{crygb}$ and $\hat{\sigma}^{cs}_{crygb}$, are included in Appendix A1.
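The aggregation in Equations (6.1) and (6.2) can be illustrated with a simplified sketch. The county mean follows Equation (6.1) directly. For the two variance components, this sketch substitutes the plain law-of-total-variance decomposition (weighted squared deviations of GSD means for the between part, weighted GSD variances for the within part); that is an assumption made for illustration and is not the estimator given in Appendix A1, which also accounts for sampling error. All names and values are hypothetical.

```python
import math


def aggregate_to_county(gsds):
    """Combine GSD estimates into a county mean and standard deviation.

    gsds: list of (p, mu, sigma) tuples, where p is the GSD's share of the
    county (shares sum to 1), mu its mean and sigma its standard deviation.
    """
    mu_c = sum(p * mu for p, mu, _ in gsds)                      # Eq. (6.1)
    var_between = sum(p * (mu - mu_c) ** 2 for p, mu, _ in gsds)  # simplified
    var_within = sum(p * sigma ** 2 for p, _, sigma in gsds)      # simplified
    return mu_c, math.sqrt(var_between + var_within)             # Eq. (6.2)


# Two hypothetical GSDs of equal size in one county.
county_mean, county_sd = aggregate_to_county([(0.5, -0.2, 1.0), (0.5, 0.2, 1.0)])
```

The county standard deviation exceeds either GSD's standard deviation here because the two GSD means differ, which adds between-GSD variance.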
Step 7. Scaling the Estimates
As described in Step 3, we standardize the cutscores prior to estimation such that all mean estimates are produced on the CS scale. In this step, we establish a second scale: the Grade Cohort Standardized (GCS) scale. We recommend CS-scaled estimates for research purposes and the GCS scale for low-stakes reporting to non-research audiences.
Recall that the CS scale is standardized within subject and grade, relative to the average of the three cohorts in our data who were in 4th grade in 2009, 2011 and 2013. We use the average of three cohorts as our reference group because they provide a stable baseline for comparison. This metric is interpretable as an effect size, relative to the grade-specific standard deviation of student-level scores in this common, average cohort. For example, a GSD with a mean of 0.5 on the CS scale represents a GSD where the average student scored approximately one half of a standard deviation higher than the national reference cohort scored in that same grade. GSD means reported on the CS scale have an overall average near 0 as expected. Note that this scale retains information about absolute changes over time by relying on the stability of the NAEP scale over time. This scale does not enable absolute comparisons across grades, however.
The GCS scale standardizes the unit means relative to the average difference in NAEP scores between students one grade level apart. The average grade-level difference in national NAEP scores is estimated as the within-cohort grade-level change (separately by subject $b$), for the average of three cohorts of students in 4th grade in 2009, 2011, and 2013 (see detail on how $\hat{\mu}^{naep}_{avg,gb}$ and $\hat{\sigma}^{naep}_{avg,gb}$ are calculated in Step 3). It is denoted $\gamma_{avg,b}$:

$$ \gamma_{avg,b} = \frac{\hat{\mu}^{naep}_{avg,8b} - \hat{\mu}^{naep}_{avg,4b}}{4} \tag{7.1} $$
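We then identify the linear transformation that sets the grade 4 and 8 averages for this cohort at the "grade level" values of 4 and 8 respectively. Then we transform unit means, standard deviations, and their variances accordingly:

$$ \hat{\mu}^{gcs}_{urygb} = 4 + \frac{\hat{\mu}^{naep}_{avg,gb} - \hat{\mu}^{naep}_{avg,4b}}{\gamma_{avg,b}} + \frac{\hat{\sigma}^{naep}_{avg,gb}}{\gamma_{avg,b}} \hat{\mu}^{cs}_{urygb} \tag{7.2} $$

$$ \hat{\sigma}^{gcs}_{urygb} = \frac{\hat{\sigma}^{naep}_{avg,gb}}{\gamma_{avg,b}} \hat{\sigma}^{cs}_{urygb} $$

$$ var\left(\hat{\mu}^{gcs}_{urygb}\right) = \left(\frac{4\hat{\sigma}^{naep}_{avg,gb}}{\hat{\mu}^{naep}_{avg,8b} - \hat{\mu}^{naep}_{avg,4b}}\right)^{2} var\left(\hat{\mu}^{cs}_{urygb}\right) = \left(\frac{\hat{\sigma}^{naep}_{avg,gb}}{\gamma_{avg,b}}\right)^{2} var\left(\hat{\mu}^{cs}_{urygb}\right) $$

$$ var\left(\hat{\sigma}^{gcs}_{urygb}\right) = \left(\frac{4\hat{\sigma}^{naep}_{avg,gb}}{\hat{\mu}^{naep}_{avg,8b} - \hat{\mu}^{naep}_{avg,4b}}\right)^{2} var\left(\hat{\sigma}^{cs}_{urygb}\right) = \left(\frac{\hat{\sigma}^{naep}_{avg,gb}}{\gamma_{avg,b}}\right)^{2} var\left(\hat{\sigma}^{cs}_{urygb}\right) $$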
Then, $\hat{\mu}^{gcs}_{urygb}$ can be interpreted as the estimated average national "grade-level performance" of students in unit $u$, subgroup $r$, year $y$, grade $g$, and subject $b$. For example, if $\hat{\mu}^{gcs}_{ury4b} = 5$, 4th-grade students in unit $u$, subgroup $r$, and year $y$ are one grade level ($\gamma_{avg,b}$) above the 4th grade 2009-2013 national average ($\hat{\mu}^{naep}_{avg,4b}$) in performance on the tested subject $b$. GSD means reported on the GCS scale have an overall average near 5.5 (midway between grades 3 and 8) as expected.
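The CS-to-GCS transformation can be checked numerically with a short sketch. The function and argument names are hypothetical, and the national reference-cohort values used below are invented round numbers rather than actual NAEP figures.

```python
def cs_to_gcs(mu_cs, sigma_cs, ref_mean_g, ref_sd_g, ref_mean_g4, ref_mean_g8):
    """Convert a CS-scale mean and SD to the GCS scale.

    ref_mean_g / ref_sd_g: reference-cohort NAEP mean and SD in the unit's grade.
    ref_mean_g4 / ref_mean_g8: reference-cohort NAEP means in grades 4 and 8.
    """
    gamma = (ref_mean_g8 - ref_mean_g4) / 4            # Eq. (7.1): one grade level
    mu_gcs = 4 + (ref_mean_g - ref_mean_g4) / gamma + (ref_sd_g / gamma) * mu_cs
    sigma_gcs = (ref_sd_g / gamma) * sigma_cs
    return mu_gcs, sigma_gcs


# Hypothetical reference cohort: grade 4 mean 240, grade 8 mean 280, SD 30.
at_average_g4 = cs_to_gcs(0.0, 1.0, 240.0, 30.0, 240.0, 280.0)
at_average_g8 = cs_to_gcs(0.0, 1.0, 280.0, 30.0, 240.0, 280.0)
above_average_g4 = cs_to_gcs(0.5, 1.0, 240.0, 30.0, 240.0, 280.0)
```

A unit at the reference-cohort average maps to 4 in grade 4 and to 8 in grade 8, which is the anchoring the text describes. With these invented values one grade level equals 10 NAEP points, so a grade 4 unit half a standard deviation above average lands at 5.5.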
This metric enables absolute comparisons across grades and over time, but it does so by relying not only on the fact that the NAEP scale is stable over time but also that it is vertically linked across grades 4 and 8 and linear between grades. This metric is a simple linear transformation of the NAEP scale, intended to render the NAEP scale more interpretable. As such, this metric is useful for descriptive research to broad audiences not familiar with interpreting standard deviation units. However, we do not advise it for analyses where the vertical linking across grades and the linear interpolation assumptions are not required or defensible.
Step 8. Calculating Achievement Gaps
We provide achievement gap estimates in SEDA 3.0 for all units except schools. Gaps are estimated as the difference in average achievement between subgroups, using the mean estimates from Steps 5, 6 and 7. We provide white-black ($wbg$), white-Hispanic ($whg$), white-Asian ($wag$), male-female ($mfg$), and non-ECD-ECD ($neg$) achievement gaps. In each scale, the unit-subject-grade-year gap is given by the difference in the means, e.g., the white-black gap is given by:

$$ \widehat{wbg}^{x}_{uygb} = \hat{\mu}^{x}_{u(r=wht)ygb} - \hat{\mu}^{x}_{u(r=blk)ygb} \tag{8.1} $$
where $x$ denotes a particular scale (CS, GCS) described in Steps 3 and 7 above. The standard error of the gap is given by:

$$ se\left(\widehat{wbg}^{x}_{uygb}\right) = \sqrt{se\left(\hat{\mu}^{x}_{u(r=wht)ygb}\right)^{2} + se\left(\hat{\mu}^{x}_{u(r=blk)ygb}\right)^{2}} \tag{8.2} $$
The gaps can be interpreted similarly to the means in the units defined by the
CS and GCS
scales
.
If one or both of the subgroup means needed for the calculation is
excluded
in a given
unit

su
bject

grade

year, the gap estimate will
also
be
excluded
.
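The gap calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the SEDA production code: the function name `gap_and_se` and the example values are hypothetical, and a missing (excluded) subgroup mean is represented here by `None`.

```python
import math

def gap_and_se(mean_a, se_a, mean_b, se_b):
    """Gap (group a minus group b) and its standard error.

    Returns None when either subgroup estimate is excluded (None),
    mirroring the rule that the gap is excluded in that case.
    """
    if None in (mean_a, se_a, mean_b, se_b):
        return None
    gap = mean_a - mean_b
    # Standard error of a difference of two separately estimated means.
    se = math.sqrt(se_a ** 2 + se_b ** 2)
    return gap, se

# Hypothetical white and black means (CS scale) for one unit-subject-grade-year.
wbg = gap_and_se(0.25, 0.03, -0.35, 0.04)
```

The same function applies to any of the gaps listed above by passing the corresponding pair of subgroup means.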
Step 9. Pooled Mean and Gap Estimates
Pooled Mean Estimates
For each unit-subgroup, we have up to 96 subject-grade-year mean estimates (8 years, 6 grades, 2 subjects). We pool the estimates within a unit using precision-weighted random-coefficient models. These models provide more precise estimates of average performance in a unit (across grades and cohorts), as well as estimates of the grade slope (the "learning rate" at which scores change across grades, within a cohort) and cohort slope (the "trend" or rate at which scores change across student cohorts, within a grade). For GSDs, counties, CZs and metros, we provide both subject-specific and overall pooled estimates. For schools we provide only overall pooled estimates.
Subject-Specific Pooled Estimates. This model allows each unit-subgroup to have a subject-specific intercept (average test score), a subject-specific linear grade slope (the "learning rate"), and a subject-specific cohort trend (the "trend"). We fit the following model for GSDs, counties, CZs, and metros:

$$
\begin{aligned}
\hat{\mu}^{x}_{urygb} ={} & \left[\beta_{0mu} + \beta_{1mu}\left(cohort_{urygb} - 2006.5\right) + \beta_{2mu}\left(grade_{urygb} - 5.5\right)\right] M_b \\
& + \left[\beta_{0eu} + \beta_{1eu}\left(cohort_{urygb} - 2006.5\right) + \beta_{2eu}\left(grade_{urygb} - 5.5\right)\right] E_b + \epsilon_{urygb} + e_{urygb} \\
\beta_{0mu} ={} & \gamma_{0m0} + v_{0mu} \\
\beta_{1mu} ={} & \gamma_{1m0} + v_{1mu} \\
\beta_{2mu} ={} & \gamma_{2m0} + v_{2mu} \\
\beta_{0eu} ={} & \gamma_{0e0} + v_{0eu} \\
\beta_{1eu} ={} & \gamma_{1e0} + v_{1eu} \\
\beta_{2eu} ={} & \gamma_{2e0} + v_{2eu} \\
e_{urygb} \sim{} & N\left(0, \omega^{2}_{urygb}\right); \quad \epsilon_{urygb} \sim N\left(0, \sigma^{2}\right); \quad \left(v_{0mu}, \ldots, v_{2eu}\right)' \sim MVN\left(0, \boldsymbol{\tau}^{2}\right).
\end{aligned}
\tag{9.1}
$$
In this model, $M_b$ is an indicator variable equal to 1 if the subject is math and $E_b$ is an indicator variable equal to 1 if the subject is ELA. $\beta_{0bu}$ represents the mean test score in subject $b$, in unit $u$, in grade 5.5 for cohort 2006.5. $cohort$ is defined as $year - grade$, so this pseudo-cohort and pseudo-grade represents the center of our data's grade and cohort ranges, since the middle year is 2012 and the middle grade is 5.5. The $\beta_{1bu}$ parameter indicates the average within-grade (cohort-to-cohort) change per year in average test scores in unit $u$ in subject $b$; and the $\beta_{2bu}$ parameter indicates the average within-cohort change per grade in average test scores in unit $u$ in subject $b$.
If the model is fit using one of the scales that standardizes scores within grades (the $cs$ scale), the coefficients will be interpretable in NAEP student-level standard deviation units (relative to the specific standard deviation used to standardize the scale). Between-unit differences in $\beta_{0bu}$, $\beta_{1bu}$, and $\beta_{2bu}$ will be interpretable relative to this same scale. If the model is fit using the grade-level scale ($gcs$), the coefficients will be interpretable as test score differences relative to the average between-grade difference among students.
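As a small illustration of the centering used in this model, the sketch below builds the right-hand-side variables for a single subject-grade-year estimate. The function name, the dictionary keys, and the subject labels "math" and "ela" are hypothetical; the year is taken to be the spring of the school year.

```python
def pooling_regressors(year, grade, subject):
    """Centered cohort, centered grade, and subject indicators for one estimate."""
    cohort = year - grade                # cohort is defined as year minus grade
    is_math = 1 if subject == "math" else 0
    return {
        "cohort_c": cohort - 2006.5,     # centered at the pseudo-cohort 2006.5
        "grade_c": grade - 5.5,          # centered at the pseudo-grade 5.5
        "M": is_math,                    # math indicator
        "E": 1 - is_math,                # ELA indicator
    }

# Grade 4 math in spring 2013: cohort 2009, i.e. 2.5 cohorts after the center
# and 1.5 grades below the center.
row = pooling_regressors(2013, 4, "math")
```

With this coding, the intercept refers to grade 5.5 in cohort 2006.5, which is why it can be read as the average score at the center of the data.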
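Overall Pooled Estimates. SEDA 3.0 also provides estimates pooled across grades, years, and subjects. For GSDs, counties, CZs, and metros, this model is as follows:

$$
\begin{aligned}
y^{x}_{uygb} ={} & \beta_{0u} + \beta_{1u}\left(cohort_{uygb} - 2006.5\right) + \beta_{2u}\left(grade_{uygb} - 5.5\right) + \beta_{3u}\left(M_b - .5\right) + \epsilon_{uygb} + e_{uygb} \\
\beta_{0u} ={} & \gamma_{00} + v_{0u} \\
\beta_{1u} ={} & \gamma_{10} + v_{1u} \\
\beta_{2u} ={} & \gamma_{20} + v_{2u} \\
\beta_{3u} ={} & \gamma_{30} + v_{3u} \\
e_{uygb} \sim{} & N\left(0, \omega^{2}_{uygb}\right); \quad \epsilon_{uygb} \sim N\left(0, \sigma^{2}\right); \quad \left(v_{0u}, v_{1u}, v_{2u}, v_{3u}\right)' \sim MVN\left(0, \boldsymbol{\tau}^{2}\right).
\end{aligned}
\tag{9.2}
$$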
This model allows each unit to have a unit-specific intercept (average test score, pooled over subjects), linear grade slope (the "learning rate" at which scores change across grades, within a cohort, pooled over subjects), cohort trend (the "trend," or rate at which scores change across student cohorts, within a grade, pooled over subjects), and the math-ELA difference.
Tables 8 and 9 report the variance and covariance terms from the estimated $\boldsymbol{\tau}^{2}$ matrices from the pooling models for GSDs, counties, CZs, and metros. Tables 10 and 11 report the estimated reliabilities from these models.
For schools, we estimate the same general model as shown in equation (9.2). However, we use different grade and cohort centering. Specifically, we center relative to the middle grade of the school. We define the middle grade as the middle grade for which we have test score estimates from Step 5, regardless of whether or not the school serves additional grades or tested in other grades for which we could not produce estimates. For each school $n$, the middle grade is $mg_n = \left[\max(grade)_n + \min(grade)_n\right]/2$. Cohort is centered at $mc_n = (2012.5 - mg_n)$. Note that 2012.5 is the middle year of our data: $(2016 + 2009)/2 = 2012.5$. We use this same middle year, regardless of whether or not the school was observed over that whole time period.
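The school-specific centering can be illustrated with a short Python sketch. The function name and argument names are hypothetical; the input is assumed to be the set of grades for which a school has test score estimates.

```python
def school_centering(grades_with_estimates, middle_year=2012.5):
    """Return (middle grade, cohort center) for one school."""
    # Middle grade: midpoint of the lowest and highest grades with estimates.
    mg = (max(grades_with_estimates) + min(grades_with_estimates)) / 2
    # Cohort center: middle year of the data minus the middle grade.
    mc = middle_year - mg
    return mg, mc

# A grades 3-5 school is centered at grade 4.0 and cohort 2008.5;
# a grades 6-8 school is centered at grade 7.0 and cohort 2005.5.
elementary = school_centering([3, 4, 5])
middle = school_centering([6, 7, 8])
```

Because the centering differs by school, intercepts from the school models refer to each school's own middle grade rather than to grade 5.5.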
For reference, the schools in our sample tend to serve common grade spans: grades 3-5 (26,572 schools); grades 3-6 (13,330 schools); grades 3-8 (10,549 schools); grades 6-8 (12,729 schools); and grades 7-8 (5,426 schools). In total, schools serving these grade spans make up 85% of all schools in our sample.
Tables 12 and 13 report the variance and covariance terms from the estimated $\boldsymbol{\tau}^{2}$ matrices, as well as the reliabilities, from the school pooling models.
Pooled Gap Estimates
We use the same models to pool gaps in GSDs, counties, CZs, and metros; however, the interpretation of the parameters differs. From these models, we recover the average test score gap across grades and years, the rate at which the gap changes over grades within cohorts, and the trend in the gap across cohorts within grades.
Notably, the pooled gaps are not identical to the difference in the pooled mean estimates. For users interested in analyzing pooled achievement gaps, it is important to use the pooled gap estimates rather than taking the difference between pooled estimates of group-specific means. For example, the pooled white-black gap estimate in unit $u$ is obtained by 1) computing the gap (the difference in mean white and black scores) in each unit-grade-year-subject; 2) fitting model (9.1) or (9.2) above using these gap estimates on the left-hand side; and 3) constructing $\hat{\beta}^{ols}_{0u}$ and $\hat{\beta}^{eb}_{0u}$ from the estimates. This is the preferred method of computing the average gap in unit $u$.
The alternative approach (taking the difference of pooled white and black mean scores) will not yield the same estimates. That is, this preferred approach will not yield identical estimates of pooled gaps as: 1) fitting model (9.1) or (9.2) above using the white mean estimates on the left-hand side; 2) constructing $\hat{\beta}^{ols}_{0u(r=wht)}$ and $\hat{\beta}^{eb}_{0u(r=wht)}$ for white students from the estimates; 3) doing the same with black student mean scores to construct $\hat{\beta}^{ols}_{0u(r=blk)}$ and $\hat{\beta}^{eb}_{0u(r=blk)}$ for black students; and then 4) estimating gaps by subtracting $\hat{\beta}^{ols}_{0u(r=wht)} - \hat{\beta}^{ols}_{0u(r=blk)}$ and $\hat{\beta}^{eb}_{0u(r=wht)} - \hat{\beta}^{eb}_{0u(r=blk)}$. In particular, the EB shrunken mean of the gaps is not in general equal to the difference in the EB shrunken means. The former is preferred.
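A toy calculation shows why the two approaches differ. The sketch below uses a simplified univariate normal-normal shrinkage rule rather than the multivariate pooling models (9.1) and (9.2); the function name and all numeric values (estimates, standard errors, grand means, and variances) are hypothetical and chosen only for illustration.

```python
def eb_shrink(estimate, se, grand_mean, tau2):
    """Univariate EB shrinkage: pull a noisy estimate toward the grand mean."""
    weight = tau2 / (tau2 + se ** 2)     # weight on the unit's own estimate
    return grand_mean + weight * (estimate - grand_mean)

# Approach A (not preferred): shrink each group mean, then subtract.
white_eb = eb_shrink(0.40, 0.05, grand_mean=0.30, tau2=0.04)
black_eb = eb_shrink(-0.50, 0.20, grand_mean=-0.30, tau2=0.04)
diff_of_eb = white_eb - black_eb

# Approach B (preferred): compute the gap first, then shrink the gap.
gap = 0.40 - (-0.50)
gap_se = (0.05 ** 2 + 0.20 ** 2) ** 0.5
eb_of_gap = eb_shrink(gap, gap_se, grand_mean=0.60, tau2=0.02)
```

In this example the less precisely estimated black mean is shrunk much more than the white mean, so the difference of shrunken means is not the same quantity as the shrunken gap.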
OLS and EB Estimates from Pooled Models
SEDA 3.0 contains two sets of estimates derived from the pooling models described in Equations (9.1) and (9.2). First are what we refer to as the OLS estimates of $\beta_{0u}, \ldots, \beta_{3u}$. Second are the Empirical Bayes (EB) shrunken estimates of $\beta_{0u}, \ldots, \beta_{3u}$. The OLS estimates are the estimates of $\beta_{0u}, \ldots, \beta_{3u}$ that we would get if we took the fitted values from Model (9.1) or (9.2) and added in the residuals $v_{0u}, \ldots, v_{3u}$. That is, $\hat{\beta}^{ols}_{0u} = \gamma_{00} + v_{0u}$, for example. These are unbiased estimates of $\beta_{0u}, \ldots, \beta_{3u}$, but they may be noisy in small units. We obtain standard errors of these as described in Appendix A2.
The EB estimates are based on the fitted model as well, but they include the EB shrunken residual. That is, $\hat{\beta}^{eb}_{0u} = \gamma_{00} + v^{eb}_{0u}$, for example, where $v^{eb}_{0u}$ is the EB residual from the fitted model. The EB estimates are biased toward $\gamma_{00}$, but have statistical properties that make them suited for inclusion as predictor variables or when one is interested in identifying outlier GSDs. We report the square root of the posterior variance of the EB estimates as the standard error of the EB estimate.
For a small number of cases, we were unable to recover an estimate of the OLS SE for a given parameter. For these, we report only the EB estimates of the parameter and standard error.
In general, the EB estimates should be used for descriptive purposes and as predictor variables on the right-hand side of a regression model; they are the estimates shown on the website (https://edopportunity.org). They should not be used as outcome variables in a regression model because they are shrunken estimates. Doing so may lead to biased parameter estimates in fitted regression models. The OLS estimates are appropriate for use as outcome variables in a regression model. When using the OLS estimates as outcome variables, we recommend fitting precision-weighted models that account for the known error variance of the OLS estimates.
Replicating the Pooled Estimates
Notably, we pooled non-noised long-form estimates prior to data suppression in Step 10 (see below). Users will not be able to identically replicate our pooled estimates given two differences between the public long files and the ones used to create the pooled estimates: added noise and fewer estimates. However, the results should be largely similar.
Step 10. Suppressing Data for Release
Long Form Files
For the GSD, county, CZ, and metro long-form files, our agreement with the US Department of Education requires (1) that all reported cells reflect at least 20 students; and (2) that a small amount of random noise is added to each estimate in proportion to the sampling variance of the respective estimate. We (1) drop any estimate that does not reflect at least 20 students and (2) adjust the SEs of the means to account for the additional error. The added noise is roughly equivalent to randomly removing one student's score from each unit-subgroup-subject-grade-year estimate. These measures are taken to ensure that the raw counts of students in each proficiency category cannot be recovered from published estimates.
The random error added to each unit-subgroup estimate is drawn from a normal distribution $N\left(0, (1/n) \cdot \omega^{2}\right)$, where $\omega^{2}$ is the squared estimated standard error of the estimate and $n$ is the number of student assessment outcomes to which the estimate applies. SEs of the mean are adjusted to account for the additional error. The added noise is roughly equivalent to the amount of error that would be introduced by randomly removing one student's score from each unit-subgroup-grade-year estimate.
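The suppression and noise rules can be sketched as follows. This is an illustration under stated assumptions, not the procedure used to build the public files: the function name is hypothetical, and the SE adjustment shown (adding the noise variance to the squared SE) is an assumed form, since the text says only that the SEs are adjusted for the additional error.

```python
import math
import random

def release_estimate(estimate, se, n, rng, min_students=20):
    """Apply the minimum cell size rule and add noise with variance se^2 / n."""
    if n < min_students:
        return None                      # cell does not reflect at least 20 students
    noise_sd = se / math.sqrt(n)         # noise variance is (1/n) * se^2
    noisy = estimate + rng.gauss(0.0, noise_sd)
    # Assumed adjustment: the independent noise adds its variance to the squared SE.
    adjusted_se = math.sqrt(se ** 2 + noise_sd ** 2)
    return noisy, adjusted_se

rng = random.Random(0)                   # seeded only so the example is repeatable
released = release_estimate(0.20, 0.10, 25, rng)
suppressed = release_estimate(0.20, 0.10, 19, rng)
```

Under this assumed adjustment the SE grows by a factor of sqrt(1 + 1/n), which is small for any cell large enough to be released.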
In addition, we remove any imprecise individual estimates where the CS scale standard error is greater than 2 standard deviations. Any individual estimate with such a large standard error is too imprecise to use in analysis. Table 14 summarizes the cases removed in the GSD, county, CZ, and metro long files.
Pooled Files
In the interest of discouraging the over-interpretation of imprecisely estimated parameters, SEDA 3.0 does not report EB or OLS estimates of $\beta_{u}$ when OLS reliabilities are below 0.7. We compute the reliability of OLS estimate $\hat{\beta}^{ols}_{ku}$ as $\hat{\tau}^{2}_{k} / \left(\hat{\tau}^{2}_{k} + V_{ku}\right)$, where $\hat{\tau}^{2}_{k}$ is the $k^{th}$ diagonal element of the estimated $\boldsymbol{\tau}^{2}$ matrix (the estimated true variance of $\beta_{ku}$) and $V_{ku}$ is the square of the estimated standard error of $\hat{\beta}^{ols}_{ku}$. That is, we do not report $\hat{\beta}^{ols}_{ku}$ if $V_{ku} > \frac{3}{7}\hat{\tau}^{2}_{k}$. For subgroups, we use the same procedure. However, we use the standard error threshold determined for all students to censor estimates rather than calculate a subgroup-specific threshold.
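The reliability rule can be written as a short Python check. The function names and example values are hypothetical; the threshold of 0.7 is the one stated above, and it is equivalent to requiring that the squared standard error not exceed 3/7 of the estimated true variance.

```python
def ols_reliability(tau2_k, se_ols):
    """Reliability of an OLS estimate: true variance over total variance."""
    v = se_ols ** 2                      # squared standard error of the OLS estimate
    return tau2_k / (tau2_k + v)

def is_reported(tau2_k, se_ols, threshold=0.7):
    """True if the estimate meets the reliability threshold and would be reported."""
    return ols_reliability(tau2_k, se_ols) >= threshold

# With a true variance of 0.04, an SE of 0.10 gives reliability of about 0.8
# (reported), while an SE of 0.20 gives reliability of 0.5 (not reported).
precise = is_reported(0.04, 0.10)
noisy = is_reported(0.04, 0.20)
```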
II.E. Additional Notes
Gender Mean and Gap Estimates. Recent research reported by Reardon, Kalogrides, et al. (2019) suggests that the magnitude of gender achievement gaps can be impacted by the proportion of test items that are multiple-choice versus constructed-response. As a result, differences in gender gaps across states (or across time when a state changes the format of its test) may confound true differences in achievement with differences in the format of the state test used to measure achievement. See Reardon, Fahle, et al. (2019) for a description of an analytic strategy that can be used to adjust for these potential effects.
III. Covariate Data Construction
SEDA 3.0 contains CCD and ACS data that have been curated for use with the school, GSD, county, and metro achievement data. SEDA 3.0 differs from the prior version of SEDA in that it uses the new crosswalk files to aggregate the covariates to GSDs and counties, as well as releases school and metro covariate data.
III.A. ACS Data and SES Composite Construction
For GSDs, counties and metros, we use data from the ACS to construct measures of median family income, proportion of adults with a bachelor's degree or higher, proportion of adults that are unemployed, the household poverty rate, the proportion of households receiving SNAP benefits, and the proportion of households with children that are headed by a single mother. We also combine these measures to construct a single socioeconomic status composite.
ACS data for districts and counties are available as 5-year pooled samples, from which we use samples from 2006-2010 through 2012-2016. The samples we use here reflect data for the total population of residents in each unit. In select years, district-level tabulations are also available for families who live in each school district in the U.S. and who have children enrolled in public school. However, the most recent sample of this data that has all of the information we need is the 5-year 2007-2011 sample. We prefer to use the total population tabulation data from more recent years. We have compared measures constructed using the total population samples and the relevant children enrolled in public schools samples in years where both samples are available, and the measures are highly correlated (r ≥ 0.99) and not sensitive to which sample we use.
The construction of our derived measures from the ACS data occurs in a variety of steps, which we describe below. Our derivation of these measures is complicated by the fact that we use the ACS-reported margins of error to compute empirical Bayes shrunken versions of our key ACS measures. The shrunken measures help account for attenuation bias that results from the fact that smaller units' measures include more measurement error due to smaller sample sizes. Appendix B2 describes the problems of measurement error and attenuation bias in detail. Below we describe the steps we take to create our derived measures from the raw ACS data:
Step 1: We download and clean the raw ACS data for each year and unit, saving the measures of interest along with their margins of error. We use data from the 2006-2010, 2007-2011, 2008-2012, 2009-2013, 2010-2014, 2011-2015, and 2012-2016 samples. We were unable to locate all the necessary margins of error for the 2005-2009 sample so do not use those data here. In Appendix B1 we provide a list of the raw ACS data tables we downloaded and use to compute each derived measure.
Step 2: Some of our derived measures require combining various fields from ACS in order to compute our desired metric. For example, in order to compute the proportion of adults with a bachelor's degree or higher we sum the number of men with a bachelor's degree, a master's degree or a professional degree with the number of women with a bachelor's degree, a master's degree or a professional degree and divide that sum by the total number of adults in the unit. Each of these component measures is reported with its own margin of error in the raw ACS data. We use the margins of error from each component measure to generate a single standard error for the combined bachelor's degree attainment rate variable (and do the same for all 6 socioeconomic measures that make up the SES composite). Appendix B3 describes our methodology for computing the sampling variance of sums of ACS variables in detail.
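To make the idea of combining margins of error concrete, the sketch below uses a common approximation for ACS data: convert each published 90 percent margin of error to a standard error by dividing by 1.645, then combine the components as if they were independent. This is an assumption made for illustration and is not necessarily the method of Appendix B3; the function names and the example counts are hypothetical.

```python
import math

Z_90 = 1.645  # ACS margins of error are published at the 90 percent confidence level

def moe_to_se(moe):
    """Convert a 90 percent margin of error to a standard error."""
    return moe / Z_90

def sum_with_se(estimates, moes):
    """Sum component counts and approximate the SE of the sum.

    Assumes the components are independent, so their covariances are ignored.
    """
    total = sum(estimates)
    se = math.sqrt(sum(moe_to_se(m) ** 2 for m in moes))
    return total, se

# Hypothetical counts of adults with bachelor's, master's, and professional degrees.
total, se = sum_with_se([1200, 800, 300], [164.5, 82.25, 49.35])
```

The resulting standard error applies to the summed count; turning it into a standard error for a proportion requires a further step that is outside this sketch.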
Step 3: After constructing the 6 SES measures and their standard errors we impute some missing data using Stata's -mi impute chained- routine, which fills in missing values iteratively by using chained equations. We reshape the data from long (one observation for each unit and race group [all, white, black and Hispanic] in each year) to wide (one observation for each unit and a separate variable for each of the 6 SES by race measures in each year). We use both the 6 SES measures and their standard errors in the imputation model as well as the total population count in each unit. The imputation model, therefore, includes median income, proportion of adults with a bachelor's degree or higher, child poverty rate, SNAP receipt rate, single mother headed household rate, and unemployment rate for each race group (all, white, black, Hispanic) in each of 7 year spans for both the estimates and their standard errors. We estimate the imputation model 5 times.
Step 4: Next we use the imputed data to compute the SES composite. This is done once for each of the 5 imputed data sets, and then we take the average. This measure is computed as the first principal component score of the following measures (each standardized): median income, percent of adults ages 25 and older with a bachelor's degree or higher, child poverty rate, SNAP receipt rate, single mother headed household rate, and employment rate for adults ages 16-64. We use the logarithm of median income in these computations. We calculate the component loadings by conducting the analysis in 2008-2012 at the GSD level and weighting by GSD enrollment. We then use the loadings from this principal component analysis to calculate SES composite values for different subgroups, years and units. Note that only observations without any imputed ACS data are used in the computation of the factor weights.
Table 15 shows the component loadings for the socioeconomic status composite as well as the mean and standard deviation of each measure it includes. The "standardized loadings" indicate the coefficients used to compute the overall GSD SES composite score from the 6 standardized indicator variables in 2008-2012, resulting in an SES composite that has an enrollment-weighted mean of 0 and standard deviation of 1 across all GSDs in 2008-2012 without any imputed data. The "unstandardized loadings" are re-scaled versions of the coefficients that are used to construct an SES composite score from the raw (unstandardized) indicator variables, but which is on the same scale as the standardized SES composite scores. To provide context for interpreting values of the SES composite, Table 16 reports average values of the indicator variables at different values of the SES composite.
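The way fixed loadings are applied to new units can be sketched as follows. The reference means, standard deviations, and loadings below are hypothetical placeholders (they are not the Table 15 values), and only three of the six indicators are shown to keep the example short; the function and key names are likewise illustrative.

```python
import math

def ses_composite(indicators, means, sds, loadings):
    """Weighted sum of standardized indicators using fixed loadings."""
    return sum(
        loading * (indicators[name] - means[name]) / sds[name]
        for name, loading in loadings.items()
    )

# Hypothetical reference-sample statistics and loadings (NOT the Table 15 values).
MEANS = {"log_income": 10.9, "ba_rate": 0.25, "poverty_rate": 0.18}
SDS = {"log_income": 0.4, "ba_rate": 0.10, "poverty_rate": 0.08}
LOADINGS = {"log_income": 0.5, "ba_rate": 0.4, "poverty_rate": -0.5}

# A hypothetical unit; median income enters on the log scale.
unit = {"log_income": math.log(60000), "ba_rate": 0.30, "poverty_rate": 0.12}
score = ses_composite(unit, MEANS, SDS, LOADINGS)
```

Holding the loadings fixed at the values from one reference sample is what keeps composite scores comparable across subgroups, years, and units.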
Step 5: The next step is to construct a standard error of the SES composite. We discuss our methodology in detail in Appendix B4.
Step 6: The final step is to do the empirical Bayes shrinking for the SES composites as well as for each of the 6 SES measures that go into making the composite. In addition to the time-varying versions of the SES composite, we also create an SES composite that is the average of SES in the 2007-2011 and 2012-2016 ACS (i.e., using years with non-overlapping samples). The shrinkage is done using a random effects meta-analysis regression model weighted by the standard error of each measure.
III.B. Common Core of Data Imputation
School-level data from the CCD are available from Fall 1987 until Fall 2015. There is some missing data on racial composition and free/reduced-price lunch receipt for some schools in some years. We therefore impute missing data on race/ethnicity and free/reduced-price lunch counts at the school level prior to aggregating data to the GSD, county, or metro level. The imputation model includes school-level data from the 1991-92 through 2015-16 school years and measures of total enrollment, enrollments by race (black, Hispanic, white, Asian, and Native American), enrollments by free and reduced-price lunch receipt (note that reduced-price lunch is only available in 1998 and later), an indicator for whether the school is located in an urban area, and state fixed effects. To improve the imputation of free and reduced-price lunch in more recent years we also use the proportion of students at each school that are classified as economically disadvantaged in the EDFacts data for 2008-09 through 2015-16 in the imputation model. Different states use different definitions of economically disadvantaged, but these measures are highly correlated with free lunch rates from the CCD (r = 0.90). The imputations are estimated using predictive mean matching in Stata's -mi impute chained- routine, which fills in missing values iteratively by using chained equations. The idea behind this method is to impute variables iteratively using a sequence of univariate imputation models, one for each imputation variable, with all variables except the one being imputed included in the prediction equation on the right-hand side. This method is flexible for imputing data of different types. For more information, see: https://www.stata.com/manuals13/mi.pdf.
Prior to the imputation, we make three changes to the reported raw CCD data. First, for states with especially high levels of missing free and reduced-price lunch data in recent years, we searched state department of education websites for alternative sources of data. We were only able to locate the appropriate data for Oregon and Ohio. For these states we replace CCD counts of free and reduced-price lunch receipt with the counts reported in state department of education data for 2008-09 through 2015-16. In Ohio, 8% of schools were missing CCD free lunch data in 4 or more of the 7 EDFacts years. In Oregon, 5% of schools were missing CCD free lunch data in 4 or more of the 7 EDFacts years. Other states with high rates of missing free lunch data in the CCD during the EDFacts years are Alaska, Arizona, Montana, Texas, and Idaho. Unfortunately, we were unable to locate alternative data sources for these states, and rely on the imputation model to fill in missing data.
Second, starting in the 2011-12 school year some states began using community eligibility for the delivery of school meals, whereby all students attending schools in low-income areas would have access to free meals regardless of their individual household income. Free lunch counts in schools in the community eligibility program are not reported in the same way nationwide in the CCD. Among community eligible schools, some report that all of their students are eligible for free lunch while others report counts that are presumably based on the individual student-level eligibility. Because reported free lunch eligible rates of 100 percent in community eligible schools may not accurately reflect the number of children from poor families in the school, we impute free lunch eligible rates in these schools. We set free and reduced-price lunch counts to missing if the school is a community eligible program school in a given year and its reported CCD free lunch rate is 100 percent. We then impute the free lunch eligible rate as described above.
Third, and finally, prior to imputation we set free and reduced-price lunch counts to missing if the count was equal to 0. Anomalies in the CCD data led some cases to be reported as zeros when they should have been missing, so we preferred to delete these 0 values and impute them using other years of data from that school.
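The second and third recoding rules can be expressed as a small function applied to each school-year record before imputation. The function and argument names are hypothetical, missing values are represented by `None`, and a reported rate of 100 percent is taken here to mean that the free lunch count equals total enrollment.

```python
def recode_free_lunch_count(count, enrollment, is_community_eligible):
    """Return the count to pass to the imputation model, or None if it should be imputed."""
    if count is None:
        return None                      # already missing
    if count == 0:
        return None                      # zeros are treated as missing
    if is_community_eligible and count == enrollment:
        return None                      # community eligible school reporting 100 percent
    return count

kept = recode_free_lunch_count(120, 400, is_community_eligible=False)
zero = recode_free_lunch_count(0, 400, is_community_eligible=False)
cep_full = recode_free_lunch_count(400, 400, is_community_eligible=True)
```

A school that reports 100 percent but is not in the community eligibility program keeps its reported count under these rules.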
The structure of the data prior to imputation is wide; that is, there is one variable for each year for any given measure (e.g., total enrollment 1991, total enrollment 1992, total enrollment 1993, ..., total enrollment 2015) for all the measures described above. The exceptions are the time-invariant measures, urbanicity and state. We impute 6 datasets and use the average of the 6 imputed values for each school in each year.
IV. Versioning and Publication
New or revised data will be posted periodically to the SEDA website. SEDA updates that contain substantially new information are labeled as a new version (e.g., V1.0, V2.0, etc.). Updates that make corrections or minor revisions to previously posted data are labeled as a subsidiary of the current version (e.g., V1.1, V1.2, etc.). When citing any SEDA data set for presentation, publication or use in the field, please include the version number in the citation. All versions of the data will remain archived and available on the SEDA website to facilitate data verification and research replication.
SEDA 3.0 makes the following additions to the data contained in SEDA 2.1. We now release:
• Pooled estimates of the average test scores in schools with at least 20 students across grades and years.
• Subject-grade-year (long) estimates of the average test scores for all students and by student subgroups for metropolitan statistical areas and commuting zones.
• Subject-grade-year (long) estimates of the average test scores by economic disadvantage, including estimated achievement gaps between non-disadvantaged and disadvantaged students.
SEDA 3.0 makes the following modifications to the procedures used in SEDA 2.1:
• We changed the estimation procedure for all units to use the pooled HETOP model rather than the original HETOP model. When constraining estimates, this model draws on information from the same unit, rather than different units. We believe that this improves our mean estimates in units where some cells do not have sufficient data to estimate a unique standard deviation.
• We do not add any additional noise to the "pool" files per a revised agreement with the NCES. We also now release pooled estimates for units with at least 20 unique students (across grades/years), rather than requiring at least 20 students within each grade/year.
• Prior to estimation, we now remove cases where more than 40% of students take alternate assessments. We also do not report estimates for unit-subgroups with more than 20% of students taking alternate assessments.
• All test score and covariate data files have been updated to reflect updates to the crosswalk file (described in Step 1), including:
  o Minor corrections.
  o A new policy for districts that reorganize during the time frame of our data.
  o We use stable county identifiers in cases where we observe that a district is placed in multiple counties during the years in our sample. The district is assigned to the county it is observed in during the 2015-16 school year.
References
Reardon, S. F., Fahle, E. M., Kalogrides, D., Podolsky, A., & Zárate, R. C. (2019). Gender Achievement Gaps in U.S. School Districts. American Educational Research Journal, 000283121984382. https://doi.org/10.3102/0002831219843824
Reardon, S. F., & Ho, A. D. (2015). Practical issues in estimating achievement gaps from coarsened data. Journal of Educational and Behavioral Statistics, 40(2), 158-189. https://doi.org/10.3102/1076998615570944
Reardon, S. F., Kalogrides, D., & Ho, A. D. (Forthcoming). Validation methods for aggregate-level test scale linking: A case study mapping school district test score distributions to a common scale. Journal of Educational and Behavioral Statistics.
Reardon, S. F., Kalogrides, D., Fahle, E. M., Podolsky, A., & Zárate, R. C. (2018). The relationship between test item format and gender achievement gaps on math and ELA tests in fourth and eighth grades. Educational Researcher, 1-11. https://doi.org/10.3102/0013189X18762105
Reardon, S. F., Shear, B. R., Castellano, K. E., & Ho, A. D. (2017). Using heteroskedastic ordered probit models to recover moments of continuous test score distributions from coarsened data. Journal of Educational and Behavioral Statistics, 42(1), 3-45. https://doi.org/10.3102/1076998616666279
Shear, B. R., & Reardon, S. F. (2019). Using Pooled Heteroskedastic Ordered Probit Models to Improve Small-Sample Estimates of Latent Test Score Distributions. CEPA Working Paper No. 19-05. Retrieved from Stanford Center for Education Policy Analysis: http://cepa.stanford.edu/wp19-05
Tables
Table
1
.
Test Score Files
Notes:
Metric
:
CS = Cohort Scale; GCS = Grade Scale
Unit
Metro = Metropolitan
S
tatistical Area; CZ = Commuting Zone
Academic Years
:
2008/09
╿
2014/16
Grades
:
3
╿
8
Subjects
:
Math, ELA
Race
:
white, black, Hispanic, and Asian
Race Gaps
:
white

black, white

Hispanic, white

Asian
Gender:
male, female
Gender
Gaps:
ma
le

female
ECD:
economically disadvantaged, not disadvantaged (as defined by states)
ECD Gaps:
not disadvantaged

economically disadvantaged
49
Table
2
.
Covariate Data Files
50
Table
3
. Example
ED
Facts
Data Structure
51
Table
4
.
State

Subject

Year

Gra
de
Data Not Included in SEDA
3.0
Note: Year is spring of year, so 2016 is the 2015

16 school year.
52
Table
5
. Individual GSDs Removed Prior to Estimation
due to Data Errors
53
Table
6
. NAEP Means and Standard Deviations by
Year and Grade.
Note: Table sho
ws the interpolated national NAEP estimates. We use the expanded population estimates, which may differ slightly from those r
eported publicly
on the website.
54
Table
7
.
Subject

Grade

Year Cases
Removed Pre

Es
timation
55
Table 8. GSD and County Variances and Covariances
Note: GSD = Geographic district; CZ = Commuting zone; CS = cohort scale; GCS = grade-cohort scale; wht = white; blk = black; hsp = Hispanic; asn = Asian; m = male; f = female; wag = white-Asian gap; wbg = white-black gap; whg = white-Hispanic gap; mfg = male-female gap; tau = variance; rel = reliability
Table 9. CZ and Metro Variances and Covariances
Note: GSD = Geographic district; CZ = Commuting zone; CS = cohort scale; GCS = grade-cohort scale; wht = white; blk = black; hsp = Hispanic; asn = Asian; m = male; f = female; wag = white-Asian gap; wbg = white-black gap; whg = white-Hispanic gap; mfg = male-female gap; tau = variance; rel = reliability
Table 10. GSD and County Reliabilities
Note: GSD = Geographic district; CZ = Commuting zone; CS = cohort scale; GCS = grade-cohort scale; wht = white; blk = black; hsp = Hispanic; asn = Asian; m = male; f = female; wag = white-Asian gap; wbg = white-black gap; whg = white-Hispanic gap; mfg = male-female gap; tau = variance; rel = reliability
Table 11. CZ and Metro Reliabilities
Note: GSD = Geographic district; CZ = Commuting zone; CS = cohort scale; GCS = grade-cohort scale; wht = white; blk = black; hsp = Hispanic; asn = Asian; m = male; f = female; wag = white-Asian gap; wbg = white-black gap; whg = white-Hispanic gap; mfg = male-female gap; tau = variance; rel = reliability
Table 12. School Pooling Model Variances and Covariances
Note: CS = cohort scale; GCS = grade-cohort scale
Table 13. School Pooling Model Reliabilities
Note: CS = cohort scale; GCS = grade-cohort scale
Table 14. Suppressed Estimates by Unit Post-Estimation, Long Form Data for GSDs, Counties, CZs, and Metros
Table 15. Component Loadings and Summary Statistics for Socioeconomic Status Composite Construction.
Table 16. Summary Statistics at Different Values of the Socioeconomic Status Composite.
Figures
Figure 1. SEDA 3.0 Construction Process.
Appendices
Appendix A: Additional Detail on Statistical Methods
1. Estimating County-Level Means and Standard Deviations
This section briefly describes how means, standard deviations, and standard errors are estimated for counties and metros. As described above, we first estimate GSD-level means and standard deviations. We then estimate the county, CZ, and metro means as weighted averages of the GSD means and the county, CZ, and metro standard deviations as estimates of total variance within a county, CZ, or metro based on the GSD means and standard deviations. The county, CZ, and metro aggregates are estimated within subjects, grades, and years.
Let $\hat{\mu}_d$ and $\hat{\sigma}_d$ be the estimated means and standard deviations for the $D$ GSD units $d = 1, \ldots, D$ that will be aggregated for a given county, CZ, or metro. We also have estimates of the standard errors for each mean and standard deviation, $se(\hat{\mu}_d)$ and $se(\hat{\sigma}_d)$. We do not include grade, subject, year, or state subscripts here for clarity.
We estimate aggregate county, CZ, or metro means independently for each aggregate unit. To estimate the aggregate parameters we make the simplifying assumption that $cov(\hat{\mu}_i, \hat{\mu}_j) = cov(\hat{\sigma}_i, \hat{\sigma}_j) = cov(\hat{\mu}_i, \hat{\sigma}_i) = 0$ for $i \neq j$. The derivations for these expressions are based on the formulas in the appendix of Reardon et al. (2017) used to estimate the overall mean and variance of a set of groups in the HETOP model. Let $p_d = n_d / \sum_{d=1}^{D} n_d = n_d / N_c$ be the proportion of all students in the aggregate unit $c$ that are in GSD $d$.
We estimate the aggregate mean for aggregate unit $c$ as the weighted average of the GSD estimated means,
$$\hat{\mu}_c = \sum_{d=1}^{D} p_d \hat{\mu}_d,$$
with an estimated standard error of
$$se(\hat{\mu}_c) = \sqrt{\sum_{d=1}^{D} p_d^2 \cdot se(\hat{\mu}_d)^2}.$$
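As an illustration only, the weighted mean and its standard error can be sketched in a few lines of NumPy. The function name `aggregate_mean` and the three-GSD example values are hypothetical and are not taken from the SEDA code; the sketch relies on the independence assumption stated above.

```python
import numpy as np

def aggregate_mean(n, mu_hat, se_mu):
    """Enrollment-weighted mean of GSD means and its standard error.

    n      : number of students in each GSD (n_d)
    mu_hat : estimated GSD means
    se_mu  : standard errors of the GSD means
    """
    n = np.asarray(n, dtype=float)
    mu_hat = np.asarray(mu_hat, dtype=float)
    se_mu = np.asarray(se_mu, dtype=float)
    p = n / n.sum()                            # p_d = n_d / N_c
    mu_c = np.sum(p * mu_hat)                  # weighted average of GSD means
    se_c = np.sqrt(np.sum(p**2 * se_mu**2))    # assumes independent GSD estimates
    return mu_c, se_c

# Hypothetical county made up of three GSDs.
mu_c, se_c = aggregate_mean(n=[100, 300, 600],
                            mu_hat=[0.20, -0.10, 0.05],
                            se_mu=[0.10, 0.06, 0.04])
```

Because the weights are squared in the standard error, large GSDs dominate the precision of the aggregate mean.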
We estimate the standard deviation for aggregate unit $c$ as the square root of the sum of the estimated between- and within-GSD variance,
$$\hat{\sigma}_c = \sqrt{\sum_{d=1}^{D} \left[ p_d (\hat{\mu}_d - \hat{\mu}_c)^2 + q_d \hat{\sigma}_d^2 \right]},$$
with the associated estimated standard error
$$se(\hat{\sigma}_c) = \sqrt{z_c} \cdot \left( \frac{1}{\hat{\sigma}_c} \right).$$
In these expressions we define
⥘
ⷢ
⽗
(
⥗
ⷢ
⽐
(
⥕
ⷢ
⽑
╾
)
⥕
ⷢ
)
(
⥗
ⷢ
╾
⽐
╿
(
╾
╿
⥕
̃
ⷡ
)
⽽
⏬
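$$\tilde{n}_c = \left[ \left( \frac{1}{D} \right) \sum_{d=1}^{D} \left( \frac{1}{n_d - 1} \right) \right]^{-1},$$
and
$$z_c = \sum_{d=1}^{D} \left[ \left( p_d^2 (\hat{\mu}_d - \hat{\mu}_c)^2 \, se(\hat{\mu}_d)^2 \right) + \left( q_d^2 \cdot \hat{\sigma}_d^2 \cdot se(\hat{\sigma}_d)^2 \right) \right].$$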
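The aggregate standard deviation and its standard error can be sketched the same way. This is a minimal illustration, not the SEDA implementation: the function name `aggregate_sd` is hypothetical, and the within-GSD weights $q_d$ are passed in by the caller rather than computed, so the example below simply sets `q = p` as a placeholder.

```python
import numpy as np

def aggregate_sd(p, q, mu_hat, sigma_hat, se_mu, se_sigma):
    """Aggregate SD from between- and within-GSD variance, with its SE.

    p : enrollment shares p_d; q : within-variance weights q_d (caller-supplied).
    """
    p, q, mu_hat, sigma_hat, se_mu, se_sigma = (
        np.asarray(a, dtype=float)
        for a in (p, q, mu_hat, sigma_hat, se_mu, se_sigma)
    )
    mu_c = np.sum(p * mu_hat)                    # aggregate mean
    dev = mu_hat - mu_c                          # GSD deviation from aggregate mean
    sigma_c = np.sqrt(np.sum(p * dev**2 + q * sigma_hat**2))
    z_c = np.sum(p**2 * dev**2 * se_mu**2 + q**2 * sigma_hat**2 * se_sigma**2)
    se_sigma_c = np.sqrt(z_c) / sigma_c          # se = sqrt(z_c) * (1 / sigma_c)
    return sigma_c, se_sigma_c

# Hypothetical two-GSD example; q = p is a placeholder, not the document's q_d.
p = [0.5, 0.5]
sigma_c, se_sigma_c = aggregate_sd(p=p, q=p,
                                   mu_hat=[1.0, -1.0], sigma_hat=[1.0, 1.0],
                                   se_mu=[0.1, 0.1], se_sigma=[0.1, 0.1])
```

In this toy case the between and within components each contribute a variance of 1, so the aggregate standard deviation is the square root of 2.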
2. Constructing OLS Standard Errors from Pooled Models
In the SEDA 3.0 data, we release the OLS and EB estimates of the intercept and grade slope, as well as their standard errors, from the pooled models described in Section 9. The recovery of the OLS SEs is not straightforward from HLM. In order to recover these, we perform the estimation in two steps and calculate the OLS SEs post-estimation.
The remainder of this section describes the method and computational implementation. The equations are written to correspond to the pooling model shown in equation 9.2; however, this procedure is the same for the other variant of our pooling models.
Step 1. We estimate $\sigma^2$ using the three-level model described in equation 9.2 and define:
$$\phi_{drygb}^2 = \sigma^2 + \omega_{drygb}^2 \qquad \text{(A2.1)}$$
where $\omega_{drygb}^2$ is the variance of the $\hat{y}_{drygb}$ estimate (either $\mu$ or $\sigma$). We assume that $\sigma^2$ is a very precise estimate because of the large amount of data in the model.
Step 2. We then reweight the data and estimate a two-level HLM model:
Level-1:
$$\phi_{drygb}^{-1} \hat{y}_{drygb} = \begin{bmatrix} \beta_{0d} & \beta_{1d} & \beta_{2d} & \beta_{3d} \end{bmatrix} \begin{bmatrix} \phi_{drygb}^{-1} \\ \phi_{drygb}^{-1} (cohort_{drygb} - 2006.5) \\ \phi_{drygb}^{-1} (grade_{drygb} - 5.5) \\ \phi_{drygb}^{-1} (math_{drygb} - .5) \end{bmatrix} + \phi_{drygb}^{-1} e_{drygb}$$
Level-2:
$$\beta_{0d} = \gamma_{00} + \nu_{0d}$$
$$\beta_{1d} = \gamma_{10} + \nu_{1d}$$
$$\beta_{2d} = \gamma_{20} + \nu_{2d}$$
$$\beta_{3d} = \gamma_{30} + \nu_{3d} \qquad \text{(A2.2)}$$
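The reweighting in Step 2 amounts to dividing the outcome and every predictor (including the intercept column) by $\phi_{drygb}$. A minimal sketch of that transformation follows; the function name `reweight_level1` and the two-row example are hypothetical, and fitting the two-level model itself (done in HLM in the text) is not shown.

```python
import numpy as np

def reweight_level1(y_hat, omega2, sigma2, cohort, grade, math):
    """Scale outcome and predictors by 1/phi, where phi^2 = sigma^2 + omega^2 (A2.1)."""
    y_hat = np.asarray(y_hat, dtype=float)
    phi = np.sqrt(sigma2 + np.asarray(omega2, dtype=float))
    w = 1.0 / phi
    # Unweighted design: intercept, centered cohort, centered grade, centered math.
    X = np.column_stack([
        np.ones_like(y_hat),
        np.asarray(cohort, dtype=float) - 2006.5,
        np.asarray(grade, dtype=float) - 5.5,
        np.asarray(math, dtype=float) - 0.5,
    ])
    return w * y_hat, w[:, None] * X

# Hypothetical estimates for two subject-grade-year cells of one GSD.
y_w, X_w = reweight_level1(y_hat=[0.4, 0.6], omega2=[0.03, 0.08], sigma2=0.01,
                           cohort=[2006, 2007], grade=[5, 6], math=[1, 0])
```

After this scaling the level-1 residual has approximately unit variance in every row, which is what allows the second-stage model to be fit with a common residual variance.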
After estimation, the HLM residual file contains the OLS and EB estimates, as well as the posterior variance matrices, $\mathbf{V}_d^{EB}$, for each GSD. From the model, we also recover an estimate of $\boldsymbol{\tau}^2$. Using $\mathbf{V}_d^{EB}$ and $\boldsymbol{\tau}^2$, we can calculate the sampling variance matrix of the OLS estimates for each GSD, and hence their standard errors, as the inverse of:
$$(\mathbf{V}_d^{OLS})^{-1} = (\mathbf{V}_d^{EB})^{-1} - \boldsymbol{\tau}^{-2}. \qquad \text{(A2.3)}$$
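Equation A2.3 can be checked numerically with a short sketch. The function name `ols_cov_from_eb` and the 2x2 matrices below are made-up illustrations, not values from the SEDA models; the example builds a posterior variance matrix from a known OLS variance matrix and then recovers it.

```python
import numpy as np

def ols_cov_from_eb(V_eb, tau2):
    """Invert (V_eb)^-1 - (tau^2)^-1 to recover the OLS sampling variance matrix."""
    precision_ols = np.linalg.inv(V_eb) - np.linalg.inv(tau2)
    return np.linalg.inv(precision_ols)

# Hypothetical intercept/slope example.
V_ols_true = np.array([[0.04, 0.01],
                       [0.01, 0.09]])
tau2 = np.array([[0.25, 0.05],
                 [0.05, 0.16]])
# Posterior precision = OLS precision + prior precision.
V_eb = np.linalg.inv(np.linalg.inv(V_ols_true) + np.linalg.inv(tau2))

V_ols = ols_cov_from_eb(V_eb, tau2)
se_ols = np.sqrt(np.diag(V_ols))   # OLS standard errors for intercept and slope
```

The OLS standard errors are the square roots of the diagonal of the recovered matrix.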
Appendix B: Covariates
1. List of Raw ACS Tables Used for SES Composite
2. Measurement Error, Attenuation Bias and Solutions
Formally, attenuation bias can be specified as follows. As an example, consider the true relationship between race-specific achievement and socioeconomic status we would like to estimate:
$$Y_g = \beta_{0g} + \beta_{1g}(SES_g) + \varepsilon_g \qquad \text{(B2.1)}$$
where $Y$ is white or non-white minority achievement in a unit (district, county, or metropolitan area) ($g$ indexes group), and $SES$ is the average socioeconomic status of the group.
Race-specific SES is measured with error, and measurement error will be larger in units with relatively smaller sample sizes of non-white minorities. Thus, the data we observe are $W_g = SES_g + \varepsilon_g$. In this case, the bias in $\beta_{1g}$ is known as attenuation bias. This bias can be quantified by multiplying by the variable's reliability $\lambda = \frac{var(SES_g)}{var(SES_g) + \sigma_g^2}$, i.e. the true variance of the variable $SES_g$ relative to the true variance plus the variance of the measurement error.
To address attenuation bias, we use regression calibration, which makes use of the fact that the measurement error in $SES_g$ (and consequently $SESGap$) is known from Census data.11 Regression calibration is a method that replaces the error-prone variable $W$ with its best linear prediction (blp). The best linear predictor of $SES_g$ can be defined as:
$$SES_g^{blp} = E(SES_g) + \frac{cov(SES_g, W_g)}{var(W_g)} (W_g - E(W_g)) = \mu + \frac{cov(SES_g, SES_g + \varepsilon_g)}{\sigma_{SES_g}^2 + \sigma_g^2} (W_g - \mu) = \mu + \lambda (W_g - \mu) \qquad \text{(B2.2)}$$
11 Specifically, the ACS reports margins of error which can be easily converted to standard errors for each Census variable. Appendix B3: Computing the sampling variance of sums of ACS variables provides a full description of how standard errors for cross-tabulated Census data are constructed.
Note that $SES_g^{blp}$ is "shrunken" towards the mean value of $SES_g$ as a function of $\lambda$ which, recall, is equal to the reliability of the variable $SES_g$ and can be estimated as a random effect (or empirical Bayes estimate) from a generalized linear model.
Now, we show that regressing $Y_g$ on $SES_g^{blp}$ results in consistent estimates of $\beta_{1g}$.
$$\frac{cov(Y_g, \mu + \lambda (W_g - \mu))}{var(\mu + \lambda (W_g - \mu))} = \frac{cov(Y_g, \lambda W_g)}{\lambda^2 (\sigma_{SES_g}^2 + \sigma_g^2)} = \frac{cov(Y_g, SES_g)}{\lambda (\sigma_{SES_g}^2 + \sigma_g^2)} = \frac{cov(Y_g, SES_g)}{\sigma_{SES_g}^2} = \beta_{1g} \qquad \text{(B2.3)}$$
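A small simulation can make the attenuation result and its correction concrete. The sketch below is illustrative only: all parameter values are invented, the measurement error variance is held constant across units for simplicity (in the text it varies with sample size), and the reliability $\lambda$ is treated as known rather than estimated.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units = 200_000
beta0, beta1 = 0.5, 2.0          # hypothetical true coefficients
sd_ses, sd_err = 1.0, 0.5        # SD of true SES and of its measurement error

ses = rng.normal(0.0, sd_ses, n_units)                     # true SES_g
y = beta0 + beta1 * ses + rng.normal(0.0, 1.0, n_units)    # outcome Y_g
w = ses + rng.normal(0.0, sd_err, n_units)                 # observed W_g

lam = sd_ses**2 / (sd_ses**2 + sd_err**2)   # reliability, 0.8 here
mu = w.mean()
ses_blp = mu + lam * (w - mu)               # best linear predictor (B2.2)

def slope(x, y):
    """Bivariate OLS slope of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

naive = slope(w, y)             # attenuated: close to lam * beta1
calibrated = slope(ses_blp, y)  # close to beta1
```

Regressing on the error-prone measure gives a slope near $\lambda \beta_1$, while regressing on the calibrated measure rescales it by $1/\lambda$ and recovers a slope near $\beta_1$.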
3. Computing the sampling variance of sums of ACS variables
In each unit we are given counts in $K$ cells: $n_{1d}, n_{2d}, \ldots, n_{Kd}$; we also know total counts $t_d$; we also have margins of error of the counts $MoE(n_{1d}), MoE(n_{2d}), \ldots, MoE(n_{Kd})$. We then compute the sampling variances of the counts,
$$var(n_{kd}) = \left[ \frac{MOE(n_{kd})}{1.645} \right]^2;$$
from these we compute
$$p_{kd} = \frac{n_{kd}}{t_d}$$
and
$$var(p_{kd}) = \frac{var(n_{kd})}{t_d^2}.$$
We do not know the sampling rate in unit $d$; let's call it $r_d$. If the estimates come from a simple random sample, we would have
$$var^*(p_{kd}) = \frac{p_{kd}(1 - p_{kd})}{r_d t_d}.$$
The estimated design effect in district $d$ for variable $k$ is then
$$D_{kd} = \frac{var(p_{kd})}{var^*(p_{kd})}.$$
We can compute the average design effect in unit $d$ as
$$D_d = \frac{1}{K} \sum_{k=1}^{K} D_{kd}.$$
Now we compute
$$P_d = \frac{1}{t_d} \sum_{k=1}^{K} n_{kd} = \sum_{k=1}^{K} p_{kd}.$$
We want to know $var(P_d)$. If we had a simple random sample, we would have
$$var^*(P_d) = \frac{P_d(1 - P_d)}{r_d t_d}.$$
Given the design effect in unit $d$, however, we would expect this to be inflated by a factor $D_d$.
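So, we have:
$$\begin{aligned}
var(P_d) &= D_d \cdot var^*(P_d) \\
&= D_d \cdot \frac{P_d(1 - P_d)}{r_d t_d} \\
&= \left[ \frac{1}{K} \sum_{k=1}^{K} D_{kd} \right] \frac{P_d(1 - P_d)}{r_d t_d} \\
&= \left[ \frac{1}{K} \sum_{k=1}^{K} \frac{var(p_{kd})}{var^*(p_{kd})} \right] \frac{P_d(1 - P_d)}{r_d t_d} \\
&= \left[ \frac{1}{K} \sum_{k=1}^{K} \frac{r_d t_d \, var(p_{kd})}{p_{kd}(1 - p_{kd})} \right] \frac{P_d(1 - P_d)}{r_d t_d} \\
&= \left[ \frac{1}{K} \sum_{k=1}^{K} \frac{var(p_{kd})}{p_{kd}(1 - p_{kd})} \right] P_d(1 - P_d) \\
&= \left[ \frac{1}{K} \sum_{k=1}^{K} \frac{1}{n_{kd}} \right] P_d(1 - P_d) \\
&= \frac{1}{\tilde{n}_d} P_d(1 - P_d)
\end{aligned}$$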
where $n_{kd} = \frac{p_{kd}(1 - p_{kd})}{var(p_{kd})}$ is the effective sample size in cell $k$ in unit $d$ (not the raw cell count; it is the sample size $n_{kd}$ such that $\frac{p_{kd}(1 - p_{kd})}{n_{kd}} = var(p_{kd})$), and $\tilde{n}_d = \left( \frac{1}{K} \sum_{k=1}^{K} \frac{1}{n_{kd}} \right)^{-1}$ is the harmonic mean of the effective sample sizes across cells within unit $d$. Note that $\frac{\tilde{n}_d}{t_d} = \tilde{r}_d$ is the harmonic mean of the effective sampling rate across cells within $d$.
An alternate approach is to assume a common design effect across units:
$$var(P_d) = D_d \cdot var^*(P_d) = D_d \cdot \frac{P_d(1 - P_d)}{r_d t_d} = D \cdot \frac{P_d(1 - P_d)}{r_d t_d},$$
where $D = \frac{1}{T} \sum_{j=1}^{J} t_j D_j$ is the average design effect across units (weighted by unit size to increase precision), with $T = \sum_{j=1}^{J} t_j$ the total count across units. We can write
$$D = \frac{1}{T} \sum_{j=1}^{J} t_j D_j = \frac{1}{T} \sum_{j=1}^{J} t_j \left[ \frac{1}{K} \sum_{k=1}^{K} \frac{r_j t_j}{n_{kj}} \right] = \sum_{j=1}^{J} \frac{t_j}{T} \frac{r_j}{\tilde{r}_j}.$$
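So then,
$$\begin{aligned}
var(P_d) &= D_d \cdot var^*(P_d) = D_d \cdot \frac{P_d(1 - P_d)}{r_d t_d} = D \cdot \frac{P_d(1 - P_d)}{r_d t_d} \\
&= \left[ \sum_{j=1}^{J} \frac{t_j}{T} \frac{r_j}{\tilde{r}_j} \right] \frac{P_d(1 - P_d)}{r_d t_d} \\
&= \left[ \sum_{j=1}^{J} \frac{t_j}{T} \frac{r_j t_d}{\tilde{r}_j t_d} \right] \frac{P_d(1 - P_d)}{r_d t_d}
\end{aligned}$$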
Assume $r_j$ is constant across units and assume the effective sampling rate in unit $j$ is independent of the unit size $t_j$; then this simplifies to
$$var(P_d) = \frac{P_d(1 - P_d)}{t_d \tilde{r}},$$
where
$$\tilde{r} = \left[ \sum_{j=1}^{J} \frac{t_j}{T} \frac{1}{\tilde{r}_j} \right]^{-1}$$
is the (weighted) harmonic mean of the effective sampling rates. We can compute $\tilde{r}$ without knowing the actual sampling rates:
$$\tilde{r} = \left[ \sum_{j=1}^{J} \frac{t_j}{T} \, \frac{1}{\frac{1}{t_j} \left( \frac{1}{K} \sum_{k=1}^{K} \frac{var(p_{kj})}{p_{kj}(1 - p_{kj})} \right)^{-1}} \right]^{-1} = \left[ \sum_{j=1}^{J} \frac{t_j^2}{T} \left( \frac{1}{K} \sum_{k=1}^{K} \frac{var(p_{kj})}{p_{kj}(1 - p_{kj})} \right) \right]^{-1}$$
To recap, we have two approaches to compute the sampling variance of $P_d$:
1. For each unit, compute the harmonic mean of the effective sample size
$$\tilde{n}_d = \left( \frac{1}{K} \sum_{k=1}^{K} \frac{var(p_{kd})}{p_{kd}(1 - p_{kd})} \right)^{-1}$$
then
$$Var(P_d) = \frac{P_d(1 - P_d)}{\tilde{n}_d}.$$
Or:
2. Compute the weighted harmonic mean of the effective sampling rate across units (using any of these formulas, all identical):
$$\tilde{r} = \left[ \sum_{j=1}^{J} \frac{t_j}{T} \frac{1}{\tilde{r}_j} \right]^{-1} = \left[ \sum_{d=1}^{J} \frac{t_d^2}{T} \left( \frac{1}{K} \sum_{k=1}^{K} \frac{var(p_{kd})}{p_{kd}(1 - p_{kd})} \right) \right]^{-1} = \left[ \frac{1}{(1.645^2) \, TK} \sum_{d=1}^{J} \sum_{k=1}^{K} \frac{MoE(n_{kd})^2}{p_{kd}(1 - p_{kd})} \right]^{-1}$$
then
$$Var(P_d) = \frac{P_d(1 - P_d)}{\tilde{r} \, t_d}.$$
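The two approaches recapped above can be sketched as follows. This is a minimal illustration under stated assumptions, not the SEDA code: the function names are hypothetical, counts and margins of error are arranged as arrays with one row per unit and one column per cell, and every cell proportion is assumed to lie strictly between 0 and 1. The example data are constructed so that every unit has an effective sampling rate of 0.1, in which case both approaches should agree.

```python
import numpy as np

Z90 = 1.645  # ACS margins of error correspond to 90 percent confidence

def cell_stats(n, moe, t):
    """Cell proportions p_kd and their sampling variances var(p_kd)."""
    n, moe, t = (np.asarray(a, dtype=float) for a in (n, moe, t))
    var_n = (moe / Z90) ** 2           # var(n_kd) = [MOE / 1.645]^2
    p = n / t[:, None]                 # p_kd = n_kd / t_d
    var_p = var_n / t[:, None] ** 2    # var(p_kd) = var(n_kd) / t_d^2
    return p, var_p

def var_sum_approach1(n, moe, t):
    """Approach 1: unit-specific harmonic mean of effective sample sizes."""
    p, var_p = cell_stats(n, moe, t)
    inv_n_eff = var_p / (p * (1 - p))          # 1 / n_kd for each cell
    n_tilde = 1.0 / inv_n_eff.mean(axis=1)     # harmonic mean across cells
    P = p.sum(axis=1)
    return P * (1 - P) / n_tilde

def var_sum_approach2(n, moe, t):
    """Approach 2: common weighted harmonic mean of effective sampling rates."""
    p, var_p = cell_stats(n, moe, t)
    t = np.asarray(t, dtype=float)
    T = t.sum()
    inv_n_tilde = (var_p / (p * (1 - p))).mean(axis=1)   # 1 / n_tilde_d
    r_tilde = 1.0 / np.sum(t**2 / T * inv_n_tilde)
    P = p.sum(axis=1)
    return P * (1 - P) / (r_tilde * t), r_tilde

# Hypothetical data: two units, two cells, effective sampling rate 0.1 everywhere.
t = np.array([1000.0, 4000.0])
p_true = np.array([[0.10, 0.20], [0.05, 0.30]])
n = p_true * t[:, None]
moe = Z90 * t[:, None] * np.sqrt(p_true * (1 - p_true) / (0.1 * t[:, None]))

v1 = var_sum_approach1(n, moe, t)
v2, r_tilde = var_sum_approach2(n, moe, t)
```

With real ACS data the two approaches will generally differ, because the effective sampling rate implied by the margins of error varies across cells and units.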
The first approach allows a different design effect in each unit, but the design effect is probably noisily estimated, so there will be more noise in the estimated sampling variances. The second assumes a common design effect across units. Our decision criteria for generating sampling variances are as follows:
1. When $K = 1$ and $P_d > 0$, use the sampling variance provided by ACS, i.e., $var(\hat{p}_d) = \frac{var(n_d)}{t_d^2}$.
2. When $K = 1$ and $P_d = 0$, use the sampling variance method 2, i.e., $Var(P_d) = \frac{P_d(1 - P_d)}{\tilde{r} \, t_d}$, where $P_d = \frac{1}{t_d}$.
3. When $K > 1$ and $P_d > 0$, use the sampling variance method 2, i.e., $Var(P_d) = \frac{P_d(1 - P_d)}{\tilde{r} \, t_d}$.
4. When $K > 1$ and $P_d = 0$, use the sampling variance method 2, i.e., $Var(P_d) = \frac{P_d(1 - P_d)}{\tilde{r} \, t_d}$, where $P_d = \frac{1}{t_d}$.
4. Estimating sampling variance of composite SES measures
Let $\mathbf{X}'_d$ be the vector of 6 variables we use to construct the SES composite in unit $d$. Let $\mathbf{W}_d$ be the diagonal matrix containing the standard errors of $\mathbf{X}_d$.12 Our estimated SES composite ($S$) in unit $d$ is
$$\hat{S}_d = \mathbf{X}'_d \mathbf{B},$$
where $\mathbf{B}$ is a $6 \times 1$ vector of unstandardized coefficients. The sampling variance of $\hat{S}_d$ is
$$var(\hat{S}_d) = \mathbf{B}' \mathbf{V}_d \mathbf{B},$$
where $\mathbf{V}_d$ is the covariance matrix of $\mathbf{X}_d$. We know the diagonal elements of $\mathbf{V}_d$ (the squares of the standard errors in $\mathbf{W}_d$), but not the off-diagonals. We need to know $\mathbf{V}_d$ to get the standard error of $\hat{S}_d$. How can we compute $\mathbf{V}_d$?
Define $\mathbf{R}_d$, the correlation matrix describing the correlations of the estimates $\mathbf{X}_d$. If we knew $\mathbf{R}_d$, then we could get
$$\mathbf{V}_d = \mathbf{W}_d \mathbf{R}_d \mathbf{W}_d.$$
he key is getting an estimate of
⤋
ⷢ
.
W
e can use PUMS data to es
ti
81
mate
⤋
empirically (via
bootstrapp
mate
⤋
empirically (via
bootstrapped samples). We do this as follows:
a.
Set
⤻
⽗
▂
⏬
╽╽╽
, and
⤷
⽗
╾
⏬
╽╽╽
(or some other values)
b.
Pick PUMA
⥒
.
c.
From all families in PUMA
⥒
, draw a random sample of
⤻
families.
12
Note that we get the standard errors of these variables from ACS. The exception is ln(median income), as we get a
standard error for median income. Let
⤺
ⷢ
be the estimated medi
an income in unit
⥋
. The Delta method gives us
⥚⥌
⽮
⊙⊛
⽶
⤺
ⷢ
)
⽲
⽙
╾
⤺
ⷢ
⥚⥌
⽶
⤺
ⷢ
)
⏯
81
d.
Compute
⤑
ⷩ
from the micro

data (so if
⤑
incl
udes ln(median income), then
estimate ln(median income)
in
PUMA
⥒
from the sample, and likewise for the
6 variables we include in
⤑
).
e.
Repeat (c) and (d)
⤷
times for PUMA
⥒
.
f.
Estimate
⤋
ⷩ
ⷆ
from the
⤷
samples
g.
Repeat (b)

(f) for all PUMAs
⥒
⽗
╾
⏬
⏰
⏬
⤸
.
h.
Repeat (b)

(g) for each race/ethnic group
⥙
to get
⤋
ⷩⷰ
ⷆ
. We might need to set
⤻
⽗
╾
⏬
╽╽╽
for race

ethnic groups, because race samples are smaller in each
PUMA.
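The bootstrap steps above, and the way the resulting correlation matrix feeds into the variance of the composite, can be sketched as follows. Everything in this block is illustrative: the function names, the synthetic microdata, and the two-variable stand-in for the 6-variable vector are assumptions, as is drawing families with replacement (the text does not say how the random samples are drawn). Smaller values of N and J than in step (a) keep the example fast.

```python
import numpy as np

def bootstrap_corr(micro, stat_fn, n_draw, n_boot, rng):
    """Steps (c)-(f): correlation of the statistics X_k across bootstrap samples."""
    n_stats = len(stat_fn(micro))
    draws = np.empty((n_boot, n_stats))
    for j in range(n_boot):
        idx = rng.integers(0, len(micro), size=n_draw)  # with replacement (assumption)
        draws[j] = stat_fn(micro[idx])
    return np.corrcoef(draws, rowvar=False)

def composite_variance(B, se, R):
    """var(S_hat) = B' V B with V = W R W and W = diag(se)."""
    W = np.diag(se)
    V = W @ R @ W
    return float(B @ V @ B)

def se_log_median(median, se_median):
    """Delta-method standard error of ln(median income)."""
    return se_median / median

# Synthetic family-level microdata for one hypothetical PUMA.
rng = np.random.default_rng(0)
income = rng.lognormal(mean=11.0, sigma=0.7, size=20_000)
ba = (rng.random(20_000) < 0.2 + 0.3 * (income > np.median(income))).astype(float)
micro = np.column_stack([income, ba])

def stats(sample):
    # Two stand-in components of X: ln(median income) and share with a BA.
    return np.array([np.log(np.median(sample[:, 0])), sample[:, 1].mean()])

R_k = bootstrap_corr(micro, stats, n_draw=500, n_boot=200, rng=rng)
B = np.array([0.5, 2.0])      # hypothetical unstandardized coefficients
se = np.array([0.10, 0.05])   # hypothetical standard errors of X
var_S = composite_variance(B, se, R_k)
```

Setting `R` to the identity matrix reproduces the variance one would get by ignoring the correlations among the components, which is a convenient check on how much the off-diagonal terms matter.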
Next we examine how $\mathbf{R}_k$ and $\mathbf{R}_{kr}$ vary across PUMAs and race/ethnic groups. If $\mathbf{R}_k$ and $\mathbf{R}_{kr}$ are relatively constant across PUMAs and subgroups, we can just use a single common value of $\mathbf{R}$ for all units and subgroups. We find that they are generally similar, so we use a common $\mathbf{R}$ in