https://clinton.presidentiallibraries.us/files/original/e4f20ea9633090b804a50b1b69a92e7e.pdf
July 5, 2000
Dear Colleague:
Thank you for your comments and continued interest in the draft document entitled "The Use of Tests
When Making High-Stakes Decisions for Students: A Resource Guide for Educators and Policymakers"
(the "testing guide"), which is being prepared by the U.S. Department of Education's Office for Civil
Rights (OCR). I am pleased to provide you with the enclosed, revised draft for your further
consideration.
As you know, the Department strongly supports efforts to promote high standards for all students and to
increase accountability in education. These efforts, including the use of tests, must be done in a manner
consistent with federal civil rights laws, which reinforce sound educational practices. The purpose of the
enclosed draft testing guide is to provide educators and policymakers with a practical tool that
summarizes key test-measurement and legal principles that should inform the use of tests when making
high-stakes decisions for students.
This testing guide is being developed in close consultation with the education community to ensure its
accuracy and usefulness. The first draft of the testing guide was released in April 1999 and was the
subject of substantial comments leading to extensive revisions. The second draft was released in
December 1999 and once again received substantial comments. The draft also has been independently
reviewed by the National Academy of Sciences' Board on Testing and Assessment (BOTA), which held a
hearing earlier this year to discuss the draft testing guide and issued a letter report in June 2000
commenting on the draft.
The enclosed draft testing guide is the third draft released for public comment, this time with notice of
availability published in the Federal Register. This draft seeks to respond to comments received since the
release of the second draft, including comments from external stakeholders as well as BOTA. The
comments OCR received on the second draft were largely positive, and the foundations and structure of
the draft testing guide have remained essentially the same. However, numerous edits have been made
throughout the guide in response to comments seeking to clarify, make more accurate, and/or expand key
sections. For example:
• Substantial effort has been made to more fully and accurately reflect the 1999 Standards on
Educational and Psychological Testing (the "Joint Standards"), which were released during
completion of the second draft of the testing guide. The Joint Standards are widely viewed
as the primary technical authority on educational test-measurement issues, and they form the
foundation of the test-measurement discussion in the draft testing guide.
• Key legal principles have been clarified and/or expanded, including principles related to
"educational necessity" and to the use of tests for students with limited English proficiency
and students with disabilities.
• Language has been added or deleted as appropriate to clarify that the primary focus of the
guide is on the use of tests when making high-stakes decisions for students at the elementary
and secondary level, though the guide makes clear that the general principles are applicable in
the higher education context as well.
We would greatly appreciate your comments on the enclosed third draft of the testing guide. The
comment period is 30 days, meaning all comments must be received by August 7, 2000. Comments
should be sent to:
Jeanette Lim
U.S. Department of Education
400 Maryland Avenue, SW
Room 5212
Switzer Building
Washington, D.C. 20202-1100
Based on the comments we receive on the third draft, OCR will make appropriate edits and produce a
final version of the testing guide. We expect to publish a final version in August-September 2000.
Thank you again for your assistance in this important effort. Additional copies of the draft testing guide
are available on OCR's website at http://www.ed.gov/offices/OCR/testing or by calling (800) 421-3481.
If you have any immediate questions or concerns, please feel free to contact me or David Berkowitz at
(202) 205-5526.
Sincerely,
Scott R. Palmer
Deputy Assistant Secretary
for Civil Rights
Enclosure
FOR INTERNAL USE ONLY
July 5, 2000
U.S. DEPARTMENT OF EDUCATION, OFFICE FOR CIVIL RIGHTS
The Use of Tests When Making High-Stakes Decisions for Students:
A Resource Guide for Educators and Policymakers
TALKING POINTS
Introduction: OCR Has Prepared a Third Draft of Its Testing Guide for Public Comment
• The U.S. Department of Education's Office for Civil Rights (OCR) is in the process of producing a
guide concerning the proper use of tests when making high-stakes decisions for students. "High-stakes"
decisions are those that have significant consequences for students, such as placement,
promotion, and graduation decisions. The draft guide provides an overview of existing test-measurement
principles and well-established federal non-discrimination laws related to the use of
tests for such high-stakes purposes. The guide covers not only the laws that OCR enforces but also
key constitutional and test-measurement issues to provide a more complete picture for educators and
policymakers.
• The guide is being developed in close and extensive consultation with the education community,
including several rounds of comments and meetings with educators, parents, teachers, business
leaders, policymakers, test publishers, and others. A first draft was released in April 1999 and was
the subject of substantial comments leading to extensive revisions. A second draft was released in
December 1999. It was well-received and was again the subject of substantial comments. In
addition, the draft was reviewed by the U.S. Department of Justice, Civil Rights Division, and by the
National Research Council's Board on Testing and Assessment (BOTA), which held a hearing on the
draft testing guide earlier this year and issued a letter report in June 2000. (BOTA's letter report can
be found on the internet at: http://books.nap.edu/books/NI000224/html/1.html.)
• The third draft of the testing guide seeks to respond to the comments received since release of the
second draft. On July 6, OCR will publish in the Federal Register a notice of availability and
opportunity for comments on the third draft. (OCR also sent copies to parties who commented on
previous drafts.) There is a 30-day comment period on the third draft, ending August 7, 2000. The
draft guide will be available on OCR's Internet web site, at http://www.ed.gov/offices/OCR/testing.
It is also available upon request by contacting OCR at (800) 421-3481.
• During the weeks of July 10 and/or July 17, 2000, OCR will offer short briefings with elementary and
secondary, post-secondary, and civil rights groups, as well as Congressional staff, to introduce the
third draft of the testing guide.
• After reviewing comments received on the third draft testing guide and making appropriate changes,
OCR will issue the guide in final form and provide a notice of availability in the Federal Register. A
final version of the testing guide is expected in August-September 2000.
Background: Purpose of the Guide
• The purpose of the guide is to provide a practical resource for educators and policymakers to ensure
that tests used for high-stakes decisions are developed and implemented in ways that are
educationally sound and legally appropriate, and thereby promote the complementary goals of
excellence and equity in education.
• Throughout the 1990s, our nation has embraced the goal of promoting high standards for all students.
Federal non-discrimination laws fully support this goal.
• Tests, meaning various kinds of educational assessments, can play an important role in promoting
high standards for all students. Many states and other actors, therefore, are increasing their use of
tests, including tests for high-stakes purposes. Federal non-discrimination laws support this use of
tests when done in valid, educationally appropriate ways.
• In short, there is substantial alignment between sound educational principles and federal
non-discrimination laws when it comes to the use of tests for high-stakes purposes.
• However, there is currently a lack of guidance concerning established test-measurement principles
and legal standards that should inform the use of tests for high-stakes decisions. This guide is
intended to provide that vital information, including a summary of key test-measurement and legal
principles, and lists of additional resources for educators and policymakers.
The Third Draft: Significant Revisions Made Since The Release of the December 1999 Draft
• The comments received on the second draft of the testing guide were largely positive, and the
foundations and structure of the draft guide have remained essentially the same. However, numerous
changes have been made throughout the guide in response to comments seeking to clarify, make more
accurate, and/or expand key sections. For example:
  - Substantial effort has been made to more fully and accurately reflect the 1999 Standards on
    Educational and Psychological Testing (the "Joint Standards"), which were released during
    completion of the second draft of the testing guide. The Joint Standards are widely viewed as the
    primary technical authority on educational test-measurement issues, and they form the foundation
    of the test-measurement discussion in the draft testing guide. To ensure accuracy, all standards
    from the Joint Standards document cited in the guide have also been quoted in full, either in the
    text or footnotes. Also, readers are repeatedly encouraged to review the Joint Standards for
    additional information about the various test-measurement topics discussed in the guide.
  - Key legal principles have been clarified and/or expanded, including principles related to
    "educational necessity" and to the use of tests for students with limited English proficiency and
    students with disabilities.
  - Language has been added or deleted as appropriate to clarify that the primary focus of the guide
    is on the use of tests when making high-stakes decisions for students at the elementary and
    secondary level, though the guide makes clear that the general principles are applicable in the
    higher education context as well.
  - Language has been added consistent with the Joint Standards indicating the importance of
    alignment between what primary and secondary students are taught and what material is covered
    on tests used for high-stakes purposes, specifically promotion and graduation purposes.
    Language has also been added consistent with the Joint Standards cautioning that a single test
    should generally not be used as a sole criterion for making high-stakes decisions unless valid for
    that purpose; additional relevant information should be considered if it would enhance validity.
  - Additional examples have been added to the test-measurement chapter to aid in the readability of
    the document.
For further information, please contact Scott Palmer or David Berkowitz at (202) 205-5526.
Bethany Little
08/08/2000 07:49:55 AM
Record Type: Record
To: Caroline S. Chang/OPD/EOP@EOP, Kendra L. Brooks/OPD/EOP@EOP
cc:
Subject: OCR's Draft Testing Guide: Release of Third Draft for Public Comment
This morning I was going through my "to do" list, and stumbled across this. It's actually pretty important
that we review this carefully -- the first time this draft guidance was leaked we got a few articles with the
basic headline "White House forbids use of SAT." Clearly not a good thing ... Anyhow, can y'all please
take a look? I'll try to review today/tomorrow and we can all meet on this Thursday. Thanks!
---------------------- Forwarded by Bethany Little/OPD/EOP on 08/08/2000 07:46 AM ---------------------------
"Palmer, Scott" <Scott_Palmer@ed.gov>
07/05/2000 01:32:30 PM
Record Type: Record
To: See the distribution list at the bottom of this message
cc: See the distribution list at the bottom of this message
Subject: OCR's Draft Testing Guide: Release of Third Draft for Public Comment
As you may know, OCR is making available tomorrow, Thursday, July 6, for
public comment the third draft of our guide on the proper use of tests when
making high-stakes decisions for students. The draft has a 30-day comment
period, ending August 7, 2000, complete with notice of availability being
published tomorrow in the Federal Register. To maintain the good and open
process that folks have widely praised, copies of the third draft are being
sent to stakeholders who have commented on previous drafts. Copies will
also be available on the web at http://www.ed.gov/offices/OCR/testing, and
by calling OCR customer service at (800) 421-3481.
The purpose of the draft testing guide is to provide educators and
policymakers with a practical tool that summarizes key test-measurement and
legal principles that should inform the use of tests when making high-stakes
decisions for students, such as promotion, placement, and graduation
decisions. (The guide has always been focused primarily on the K-12 context,
though the general principles are applicable to the higher education
context as well, and the third draft makes that point even more clear.)
The message is that the Department strongly supports efforts to promote
high standards and accountability in education, and these efforts, including
the use of tests, should be done in a manner consistent with federal
nondiscrimination laws, which reinforce sound educational practices. It is
a positive message, and the second draft of the guide, which was released in
December 1999, was largely well-received. OCR did, however, receive
substantial comments to which we have sought to respond in the third draft.
The draft has also been independently reviewed within ED, by DOJ, and by the
National Academy of Sciences' Board on Testing and Assessment (BOTA).
Thanks to the good work of various ED offices, DOJ, BOTA, and more, the
draft testing guide has gotten even better and should not cause great
controversy in terms of substance. However, the issue of testing is hot and
timely, and various parties will likely be interested. We are working
within ED to schedule some short briefings for next week with K-12, higher
education, and civil rights groups, as well as Hill staff, to introduce the
third draft of the testing guide if folks are interested. We are also
working with ED public affairs as appropriate.
Meanwhile, attached for your reference are (1) internal talking points
regarding the release of the third draft, (2) a cover letter that will
accompany the copies being sent to stakeholders, and (3) a copy of the third
draft of the testing guide (though the web version will likely be better
formatted and should be used externally). If you have any questions or
concerns, please contact me or David Berkowitz at (202) 205-5526. Thanks.
«Policy-Testing Guide-Talking Points-July 2000.doc»
«Policy-Testing Guide-Cover letter for draft #3-main letter.doc»
«Policy-Testing Guide-July 6 Draft.doc»
Message Sent To:
"Holleman, Frank" <Frank_Holleman@ed.gov>
"Joshi, Sejal" <Sejal_Joshi@ed.gov>
"Ramirez, Heidi" <Heidi_Ramirez@ed.gov>
"Rossi, Diane" <Diane_Rossi@ed.gov>
"Tucker, Ben" <Ben_Tucker@ed.gov>
"Winston, Judith" <Judith_Winston@ed.gov>
"Winnick, Steve" <Steve_Winnick@ed.gov>
"Craig, Susan" <Susan_Craig@ed.gov>
"Lahring, Karl" <Karl_Lahring@ed.gov>
"Kole, Adina" <Adina_Kole@ed.gov>
"Jenkins, Kimberly" <Kimberly_Jenkins@ed.gov>
"Cohen, Mike" <Mike_Cohen@ed.gov>
"Johnson, Judith" <Judith_Johnson@ed.gov>
"Heumann, Judy" <Judy_Heumann@ed.gov>
"Warlick, Kenneth" <Kenneth_Warlick@ed.gov>
"McGuire, Kent" <Kent_McGuire@ed.gov>
"Phillips, Gary" <Gary_Phillips@ed.gov>
"Goldstein, Arnold" <Arnold_Goldstein@ed.gov>
"Cole, Arthur" <Arthur_Cole@ed.gov>
"Wohl, Alexander" <Alexander_Wohl@ed.gov>
"Heine, Roberta" <Roberta_Heine@ed.gov>
"Lyon, Tom" <Tom_Lyon@ed.gov>
"Murphey, Rodger" <Rodger_Murphey@ed.gov>
"Fleming, Scott" <Scott_Fleming@ed.gov>
"Kelley, Thomas" <Thomas_Kelley@ed.gov>
"Rairdin, Kae" <Kae_Rairdin@ed.gov>
"'anita_hodgkiss@usdoj.gov'" <anita_hodgkiss@usdoj.gov>
Peter Rundlet/WHO/EOP
Bethany Little/OPD/EOP
John B. Buxton/OPD/EOP
Message Copied To:
"Cantu, Norma V" <Norma_V_Cantu@ed.gov>
"Pierce, Raymond" <Raymond_Pierce@ed.gov>
"Patterson, Lindsay" <Lindsay_Patterson@ed.gov>
"Jackson, John H" <John_Jackson2@ed.gov>
"Lim, Jeanette" <Jeanette_Lim@ed.gov>
"Berkowitz, David" <David_Berkowitz@ed.gov>
"Fitch, Rebecca" <Rebecca_Fitch@ed.gov>
"Kopriva, Rebecca" <Rebecca_Kopriva@ed.gov>
"Wolkowitz, Barbara" <Barbara_Wolkowitz@ed.gov>
"Sowers, Susan" <Susan_Sowers@ed.gov>
"Lewis, Cathy H" <Cathy_H_Lewis@ed.gov>
"Dorka, Lilian" <Lilian_Dorka@ed.gov>
"Tosado, Rebekah" <Rebekah_Tosado@ed.gov>
"Cramolini, Steve" <Steve_Cramolini@ed.gov>
"Slayton, Lester" <Lester_Slayton@ed.gov>
"Hibino, Thomas" <Thomas_Hibino@ed.gov>
"Whitney, Helen" <Helen_Whitney@ed.gov>
"Fox, Wendella" <Wendella_Fox@ed.gov>
"McGovern, Linda" <Linda_McGovern@ed.gov>
"Orris, Harry" <Harry_Orris@ed.gov>
"Sennett, Angela" <Angela_Sennett@ed.gov>
"Walker, Gary" <Gary_Walker@ed.gov>
"August, Taylor" <Taylor_August@ed.gov>
"Wender, Alice" <Alice_Wender@ed.gov>
"Gutierrez, Lillian" <Lillian_Gutierrez@ed.gov>
"Rosenzweig, Stefan" <Stefan_Rosenzweig@ed.gov>
"Jackson, Gary" <Gary_Jackson@ed.gov>
The Use of Tests When
Making High-Stakes
Decisions for Students:
A Resource Guide for
Educators and Policymakers
U.S. Department of Education
Office for Civil Rights
Draft
July 6, 2000
UNITED STATES DEPARTMENT OF EDUCATION
OFFICE FOR CIVIL RIGHTS
THE ASSISTANT SECRETARY
Dear Colleague:
Adherence to good test use practices in education is a shared goal of government
officials, policymakers, educators, parents, and students. In an era of school reforms that
place increasing emphasis on measures of accountability, such as tests used for high-stakes
purposes for individual students,* the need to provide practical information about
good testing practices is well documented. In January 1999, the National Research
Council observed that we in the education community should work to better disseminate
information related to good testing practices with a focus on the standards of testing
professionals and the relevant legal principles that, together, "reflect many common
concerns."
The points of alignment between sound educational policies and judgments and federal
nondiscrimination laws compellingly illustrate the symmetry between the goals of
promoting educational excellence for all students and ensuring that educational practices
do not - intentionally or otherwise - unfairly deny educational opportunities to
students based upon their race, national origin, sex or disability. In short, federal civil
rights law affirms good test use practices. As a result, an understanding of the
measurement principles related to the use of tests for high-stakes purposes is an essential
foundation to better understanding the federal legal standards that are significantly
informed by those measurement principles.
In order to further the goal of accurate and fair judgments in high-stakes decision making
that involves the use of tests, we are pleased to provide you with this copy of The Use of
Tests When Making High-Stakes Decisions for Students: A Resource Guide for Educators
and Policymakers. This guide provides important information about the professional
standards relating to the use of tests for high-stakes purposes, the relevant federal laws
that apply to such practices, and references that can help shape educationally sound and
legally sufficient testing practices.
* As explained throughout the guide, the primary focus is the use of standardized tests or assessments (referred to in the
guide as tests) used to make decisions with important consequences for individual students. Examples of high-stakes
decisions include: student placement in gifted and talented programs or in programs serving students with limited
English proficiency; determinations of disability and eligibility to receive special education services; student promotion
from one grade level to another; graduation from high school and diploma awards; and admission decisions and
scholarship awards. The guide does not address teacher-created tests that are used for individual classroom purposes.
There are few simple or definitive answers to questions about the use of tests for high-stakes
purposes. Tests are a means to an end and, as such, can be understood only in the
context in which they are used. The education context - in which the relationship (and
attendant obligations) of the educator to the student is frequently more complex than that
between employer and employee - shows time and again that any decision regarding the
legality of a use of a test for high-stakes purposes under federal nondiscrimination law
cannot be made without regard to the educational interests and judgments upon which the
test use is premised.
Background
Throughout the 1990s, national, state and local education leaders have focused on raising
education standards and establishing strategies to promote accountability within the
education community. In fact, the promotion of challenging learning standards for all
students - coupled with assessment systems that monitor progress and hold schools
accountable - has been the centerpiece of the education policy agenda of the federal
government as well as many states.
Predictably, the number of states using tests as a condition for high school graduation is
on the rise, with (by a recent estimate) 26 states projected to use tests as conditions for
graduation by 2003 and six states now using tests as conditions for grade promotion, a
significant increase from past years. At the same time, more and more educators and
policymakers have requested advice and technical assistance from the U.S. Department
of Education regarding test use in the context of standards reforms.
The Department's Office for Civil Rights (OCR) is also addressing testing issues in a
more extensive array of complaints of discrimination being filed with our office, most of
them in a K-12 setting with implications for high-standards learning. OCR has
responsibility for enforcing Title VI of the Civil Rights Act of 1964, Title IX of the
Education Amendments of 1972, Section 504 of the Rehabilitation Act of 1973, and Title
II of the Americans with Disabilities Act of 1990. These statutes prohibit discrimination
on the basis of race, color, national origin, sex, and disability by educational institutions
that receive federal funds.
In a similar vein, institutions in the post-secondary community in recent years have
engaged in a thoughtful dialogue and analysis regarding merit in admissions and the
appropriate use of tests to establish foundations for high-stakes admissions decisions. In
some states, the use of tests in connection with admissions decisions has been an
important element in public post-secondary education reform.
These trends highlight the salience of two recent conclusions of the National Research
Council (NRC) Board on Testing and Assessment. In January of this year, the NRC
observed that too many policymakers and educators are not aware of the test-measurement
standards that should inform testing policies and practices. These standards
include the Standards for Educational and Psychological Tests, prepared by a joint
committee of the American Psychological Association (APA), the American Educational
Research Association (AERA), and the National Council on Measurement in Education
(NCME). The NRC also concluded that it "is essential that educators and policymakers
Draft 6/12/00
alike be aware of both the letter of the laws and their implications for test takers and test
users" [National Research Council, High Stakes: Testing for Tracking, Promotion and
Graduation, (Heubert and Hauser, eds., 1999)].
The Resource Guide
Toward this end, OCR has prepared this guide in an effort to assemble the best
information regarding psychometric standards, legal principles, and resources to help
educators and policymakers frame strategies and programs that promote learning to high
standards in ways consistent with federal non-discrimination law. Our goal is to inform
decisions related to the use of tests that have high-stakes consequences for students when,
for instance, they move from grade to grade or graduate from high school. Just as we
know that good test use practices can advance high standards for learning and equal
opportunity, we know that educationally inappropriate uses of tests do not. If we want
this generation of test-taking students and their teachers and schools to meet high
standards, then we should insist that the tests they take meet high standards. As
foundations for judgments that profoundly shape the lives of students, these tests must be
used in ways that accurately reflect educational standards and that do not inappropriately
deny opportunities to students based on their race, national origin, sex or disability.
The guide is organized to provide practical guidance related to the use of tests for high-stakes
purposes. The Introduction to the guide provides a broad, conceptual overview of
relevant principles so that those who are not familiar with test measurement principles or
applicable federal law can better understand the kinds of issues that relate to the use of
tests in many contexts - from grade-to-grade promotion to college admissions. Chapter
one of the guide provides a detailed discussion of the test measurement principles that
can provide a foundation for making well-informed decisions related to high-stakes
testing. The relevant principles that have been approved by the APA, AERA, and NCME
are discussed in detail in this chapter. Adherence to relevant professional standards can
help reduce the risk of legal liability when schools are using assessments for high-stakes
purposes. Chapter two provides an overview of the existing legal principles that have
guided federal courts and OCR when analyzing claims of race, national origin, sex, and
disability discrimination related to the use of tests as foundations in high-stakes decisions
affecting students. These principles, as applied by the courts and OCR, underscore the
importance of adhering to educationally sound testing practices. The Appendix includes
a Glossary of Test Measurement Terms, a Glossary of Legal Terms, a Compendium of
Federal Nondiscrimination Laws, and a Resources and References section.
Central Principles
There are several central principles reflected in the text of this guide.
First, federal nondiscrimination laws are consistent with the establishment of high
standards of learning for all students and educationally sound practices designed to meet
that goal. The goals of promoting high educational standards and ensuring
nondiscrimination are complementary objectives. Indeed, if the federal courts that have
applied civil rights statutes to education cases teach us anything, it is that compliance
with federal nondiscrimination standards rests in the first instance upon the school's
educational judgments, and that those judgments deserve deference. Not surprisingly, the
ultimate questions posed by our resource guide on the use of tests for high-stakes
purposes also center on educational sufficiency: Is the test valid for the purposes used?
Are the inferences derived from test scores, and the high-stakes decisions based on those
inferences, accurate and fair? These inquiries are not an effort to dumb down academic
standards or alter core education objectives integral to academic admissions or other
educational decisions. Rather, they focus the educator and policymaker on ensuring that
uses of tests with consequences for students are educationally sound and legally
appropriate.
Second, federal nondiscrimination laws support the use of tests, including large-scale
standardized tests, when they are used in valid, reliable, and educationally appropriate
ways. Importantly, tests can help indicate inequalities in the kinds of educational
opportunities students are receiving, and in turn, they may stimulate efforts to ensure that
all students have equal opportunity to achieve high standards. When tests accurately
indicate performance gaps, our concern should be with the quality of educational
opportunities afforded to under-performing students - rather than the integrity of the test
itself. The key question in the context of standards-based reforms and the use of tests as
measures of student accountability is: Have all students in certain school districts been
provided quality instruction, sufficient resources, and the kind of learning environment
that would foster success?
Third, a test score disparity among groups of students does not alone constitute
discrimination under federal law. The guarantee under federal law is for equal
opportunity, not equal results. Test results indicating that groups of students perform
differently should be a cause for further inquiry and examination, with a focus upon the
relevant educational programs and testing practices at issue. Differences in test scores
may result from a range of factors, some of which a school may be able to influence, and
others over which it has little control. Federal law recognizes this point, as it must. The
legal non-discrimination standard regarding neutral practices (referred to by the courts as
the "disparate impact" standard) provides that if the education decisions based upon test
scores reflect statistically significant disparities based on race, national origin, or sex in
the kinds of educational benefits afforded to students, then questions about the education
practices at issue (including testing practices) should be thoroughly examined to ensure
that they are in fact non-discriminatory and educationally sound. In short, the goal of the
federal legal standards is to help promote accurate and fair decisions that have real
consequences for students, not to water down academic standards or deter educators from
establishing and applying sensible and rigorous standards.
Conclusion
Recognizing the responsibility that educators and policymakers must shoulder in making
the promise of high standards learning a reality, U.S. Secretary of Education Richard
Riley in his commemoration of the 45th anniversary of the Brown v. Board of Education
decision said: "A quality education must be considered a key civil right for the twenty-first
century." This is the driving force behind OCR's continuing effort to provide
assistance to policymakers and educators as we continue to enforce federal laws that
prohibit discrimination against students. Rather than creating false and polarizing "win-lose"
choices on this all-important set of issues, we need to, as Secretary Riley
admonishes, "search for common ground" - ground, that is, in this case, expansive.
We have worked with literally dozens of groups and individuals, including educators,
parents, teachers, business leaders, policymakers, test publishers, and others, to solicit
input and advice regarding the scope, framing, and kinds of resources to include in this
guide, and we are grateful for their assistance. In addition, we have contracted with the
NRC's Board on Testing and Assessment, which has reviewed earlier drafts of the guide,
to ensure that the guide comports with professional standards. We are grateful for the
NRC's tireless efforts.
Working together with our education partners, we believe that we are providing a useful
resource that will serve the education community as it addresses the very complex and
important questions that stem from the institution of high standards and accountability
systems designed to promote the best schools in the world.
Very truly yours,
Norma V. Cantu
Table of Contents
INTRODUCTION: An Overview of the Resource Guide
CHAPTER 1. Test Measurement Principles
CHAPTER 2. Legal Principles
APPENDIX A: Glossary of Legal Terms
APPENDIX B: Glossary of Test Measurement Terms
APPENDIX C: Accommodations Used by States
APPENDIX D: Compendium of Federal Statutes and Regulations
APPENDIX E: Resources and References
The Use of Tests When Making High-Stakes Decisions for Students: A Resource Guide for Educators and Policymakers
Draft 7/6/00
INTRODUCTION: An Overview of the Resource Guide
I. Introduction
Decisions affecting students' educational opportunities should be made accurately and
fairly. When tests are used in making educational decisions for individual students, they
should accurately measure students' abilities, knowledge, skills or needs, and they should
do so in ways that do not discriminate in violation of federal law on the basis of the
students' race, national origin, sex or disability. The U.S. Department of Education's
Office for Civil Rights (OCR) 1 has developed this resource guide in order to provide
educators and policymakers with a useful, practical tool that will assist in their
development and implementation of policies that involve the use of tests in making high-stakes
decisions for students. It is intended to facilitate the proper use of tests for those purposes.
Chapter one of this guide provides information about professionally recognized test
measurement principles. Chapter two provides the legal frameworks that have guided
federal courts and OCR when addressing the use of tests that have high-stakes
consequences for students. The test measurement principles described in chapter one are
not legal principles. However, the use of tests in educationally appropriate ways
consistent with the principles described in chapter one can help to minimize the risk of
noncompliance with the federal nondiscrimination laws discussed in chapter two.
1 OCR enforces laws that prohibit discrimination on the basis of race, national origin, sex, disability, and age by
educational institutions that receive federal funds. The laws enforced by OCR are: 1) Title VI of the Civil Rights Act of
1964, 42 U.S.C. §§ 2000d, et seq. (2000) (Title VI), which prohibits discrimination on the basis of race, color, or
national origin; 2) Title IX of the Education Amendments of 1972, 20 U.S.C. §§ 1681, et seq. (1999) (Title IX), which
prohibits discrimination on the basis of sex; 3) Section 504 of the Rehabilitation Act of 1973, 29 U.S.C. §§ 794, et seq.
(1999) (Section 504), which prohibits discrimination on the basis of disability; 4) the Age Discrimination Act of 1975,
42 U.S.C. §§ 6101, et seq. (1995 and Supp. 1999) (as amended), which prohibits age discrimination; and 5) Title II of
the Americans with Disabilities Act of 1990, 42 U.S.C. §§ 12134, et seq. (1995 and Supp. 1999) (Title II), which
prohibits discrimination on the basis of disability by public entities, whether or not they receive federal financial
assistance.
The guide also includes a collection of resources related to test measurement and
nondiscrimination principles that are discussed in the guide - all in an effort to help
policymakers and educators ensure that decisions that have high-stakes consequences for
students are made accurately and fairly.
Educational stakeholders at all levels have approached OCR requesting advice and
technical assistance in a variety of test-use contexts, particularly as states and districts use
tests as part of their standards-based reforms. Also, increasingly, OCR is addressing
testing issues in a broader and more extensive array of complaints of discrimination that
have been filed with OCR. These corresponding developments confirm the need to
provide a useful resource that captures legal and test measurement principles and
resources to assist educators and policymakers. This document does not establish any
new legal or test measurement principles.
As used in this resource guide, "high-stakes decisions" refer to decisions with important
consequences for individual students. Education entities, including state agencies, local
education agencies, and individual education institutions, make a variety of decisions
affecting individual students during the course of their academic careers, beginning in
elementary school and extending through the post-secondary school years. Examples of
high-stakes decisions affecting students include: student placement in gifted and talented
programs or in programs serving students with limited-English proficiency;
determinations of disability and eligibility to receive special education services; student
promotion from one grade level to another; graduation from high school and diploma
awards; and admissions decisions and scholarship awards. 2
This guide is intended to apply to standardized tests that are used in making high-stakes
decisions affecting individual students and that are addressed in the Standards for
Educational and Psychological Testing (Joint Standards). The Joint Standards are
viewed as the primary technical authority on educational test measurement issues. They
have been prepared by a joint committee of the American Educational Research
Association, the American Psychological Association and the National Council on
Measurement in Education, the three leading organizations in the area of educational test
measurement. The Joint Standards were developed and revised by these three
organizations through a process that involved the participation of hundreds of testing
professionals and thousands of pages of written comment from both professionals and the
public. The current edition of the Joint Standards reflects the experience gained from
many years of wide use of previous versions of the Joint Standards in the testing
community.
2 The purpose of this guide is to address tests that are used in making high-stakes decisions for individual students. In
addition to using tests for high-stakes purposes for individual students, states and school districts are also using tests to
hold schools and districts accountable for student performance. Although using tests for this purpose is not the focus of
the guide, we have provided some useful background information about relevant principles and federal statutory
requirements.
The Joint Standards, which are discussed in more detail below, apply to standardized
measures generally recognized as tests, and also may be usefully applied to a broad range
of system-wide standardized assessment procedures. 3 For the sake of simplicity, this
guide will refer to tests, regardless of the type of label that might otherwise be applied to
them. The guide does not address teacher-created tests that are used for individual
classroom purposes.
States and school districts are also using another important kind of assessment system for
the purpose of promoting school and district accountability. For example, under Title I of
the Elementary and Secondary Education Act, states are required to develop content
standards, performance standards, and assessment systems that measure the progress that
schools and districts are making in educating students to the standards established by the
state. Title I explicitly requires that such assessments be valid and reliable for their
intended purpose and be consistent with relevant, nationally recognized technical and
professional standards. 4 When educators and policymakers consider using the same test
for school or district accountability purposes and for individual student high-stakes
purposes, they need to ensure that the test score inferences are valid and reliable for each
particular use for which the test is being considered.
When high-stakes decisions are made, test scores are often used in conjunction with other
criteria, such as grades and teacher recommendations. A test should not be used as the
sole criterion for making a high-stakes decision unless it is validated for this use. The
Joint Standards state that a high-stakes decision "should not be made on the basis of a
single test score. Other relevant information should be taken into account if it will
enhance the overall validity of the decision." 5 As explained in the Joint Standards,
"[w]hen interpreting and using scores about individuals or groups of students,
considerations of relevant collateral information can enhance the validity of the
interpretation, by providing corroborating evidence or evidence that helps explain student
performance. ... As the stakes of testing increase for individual students, the importance
of considering additional evidence to document the validity of score interpretations and
the fairness in testing increases accordingly." 6
3 The Joint Standards note that the applicability of the Joint Standards to an evaluation device or method is not altered
by the label used (e.g., test, assessment scale, inventory). A more complete discussion about the instruments covered by
the Joint Standards can be found in the introduction section of that document. See Joint Standards, Introduction, pp. 3-4.
4 20 U.S.C. 6311(b)(3)(C).
5 Standard 13.7 states, "In educational settings, a decision or characterization that will have major impact on a student,
should not be made on the basis of a single test score. Other relevant information should be taken into account if it will
enhance the overall validity of the decision."
6 Joint Standards, p. 141.
Although this guide focuses on the use of tests in making high-stakes decisions,
policymakers and the education community need to ensure that the operation of the entire
high-stakes decision-making process does not result in the discriminatory denial of
educational benefits or opportunities to students. 7 Applicable standards for technical
quality set forth in the Joint Standards are important principles to consider when other
criteria affect high-stakes decisions. Educators should carefully monitor inputs into the
high-stakes decision-making process and outcomes over time so that any potential
discrimination arising from the use of any of the criteria can be identified and eliminated.
The guide focuses primarily on tests used in making high-stakes decisions at the
elementary and secondary education level. However, it is important to recognize that the
general principles of sound educational measurement apply equally to tests used at the
elementary and secondary education level and at the post-secondary education level,
including admissions and other types of test use. 8 For example, post-secondary
admissions policies and practices should be derived from and clearly linked to an
institution's overarching educational goals, and the use of tests in the admissions process
should serve those institutional goals.
7 See Nondiscrimination Under Programs Receiving Federal Financial
Education Effectuation of Title VI of the Civil Rights Act of 1964, 34
100.3(b)(2) (1999); Nondiscrimination on the Basis of Handicap in Pr
Financial Assistance, 34 C.F.R. §§ 104.4(a), 104.4(b)(1)(i) and (iv), an
Basis of Sex in Education Programs and Activities Receiving or Bene
C.F.R. §§ 106.31(a) and 106.31(b) (1999).
8 For additional information regarding testing at the post-secondary le
Tradeoffs, 1999; Messick, S., Validity, in R.L. Linn, ed., Educational
13-103, 1989; Wigdor, Alexandra K., and Garner, Wendell R., ed., Ab
Controversies, chapter 5, National Academy Press, 1982.
9 See High Stakes, p. 23 and National Research Council, Placing Children in Special Education: A Strategy for Equity,
1982.
II. Foundations of the Resource Guide
A. Professional Standards of Sound Testing Practices
Chapter one summarizes the leading professionally recognized standards of sound testing practices
within the educational measurement field. They include those described in the Joint
Standards (1999), which represent the primary statement of professional consensus
regarding educational testing. Other leading professionally recognized standards of
sound testing practices within the educational measurement field include the Code of Fair
Testing Practices in Education (1988), and the Code of Professional Responsibilities in
Educational Measurement (1995). The guide also cites recent reports from the National
Research Council's Board on Testing and Assessment, including High Stakes: Testing for
Tracking, Promotion and Graduation (High Stakes, 1999), Myths and Tradeoffs: The
Role of Tests in Undergraduate Admissions (Myths and Tradeoffs, 1999), Testing,
Teaching, and Learning: A Guide for States and School Districts (Testing, Teaching, and
Learning, 1999), Improving Schooling for Language-Minority Children: A Research
Agenda (Improving Schooling for Language-Minority Children, 1997), and Educating
One & All: Students with Disabilities and Standards-Based Reform (Educating One &
All, 1997). 10 These reports help explain or elaborate principles that are stated in the Joint
Standards.
Designed to provide criteria for the evaluation of tests, testing practices, and the effects of
test use, the Joint Standards recommend that all professional test developers, sponsors,
publishers, and users make efforts to observe the Joint Standards and encourage others to
do so. 11 The Joint Standards include chapters on the test development process (with a
focus primarily on the responsibilities of test developers), the specific uses and
applications of tests (with a focus primarily on the responsibilities of test users), and the
rights and responsibilities of test takers. Because the Joint Standards are the most widely
accepted professional standards that are relied upon in developing testing instruments,
this guide includes a discussion of specific standards that are contained within the Joint
Standards, where relevant. Numbered standards that are referenced throughout this guide
refer to specific standards that are contained within the Joint Standards.
In order to ensure that information presented in the guide is readable and accessible to
educators and policymakers, we have paraphrased language from relevant standards. Our
goal in paraphrasing is to be concise and accurate. Where we have paraphrased in the
text, we have also provided the full text of the relevant standards in the footnotes.
Because the Joint Standards provide additional relevant discussion, we always encourage
readers also to review the full document.
Professional test measurement standards provide important information that is relevant to
making determinations about appropriate test use. The Joint Standards provide a frame
of reference to assist in the evaluation of tests, testing practices, and the effects of test
use. The Joint Standards caution that the acceptability of a test or test application does
not rest on the literal satisfaction of every standard in the Joint Standards and cannot be
determined by using a checklist. 12 The exercise of professional judgment is a critical
element in the interpretation and application of the standards, 13 and the interpretation of
individual standards should be considered in the overall context of the use of the test in
question. Failure to meet a particular professional test measurement standard does not
necessarily constitute a lack of compliance with federal civil rights laws.
10 The National Academy of Sciences, which is an independent, private, nonprofit entity, established the Board on
Testing and Assessment in 1993 to help policymakers evaluate the use of tests, alternative assessments, and other
indicators commonly used as tools of public policy. The Board provides guidance for judging the quality of testing or
assessment technologies and the intended and unintended consequences of particular uses of these technologies. The
Board concentrates on topics and conducts activities that serve the general public interest.
11 See, e.g., Joint Standards, Introduction, p. 2.
B. Legal Standards
Chapter two of the guide discusses the federal Constitutional, statutory and regulatory
nondiscrimination principles that apply to the use of tests for high-stakes purposes. This
guide is intended to reflect existing legal principles and does not establish new federal
legal requirements. The primary legal focus of the resource guide is an explanation of
principles that are clearly embedded in four nondiscrimination laws that have been
enacted by Congress: Title VI of the Civil Rights Act of 1964 (Title VI), Title IX of the
Education Amendments of 1972 (Title IX), Section 504 of the Rehabilitation Act of 1973
(Section 504), and Title II of the Americans with Disabilities Act of 1990 (Title II). 14
Within the U.S. Department of Education, the Office for Civil Rights has responsibility
for enforcing the requirements of these four statutes and their implementing regulations.
The due process and equal protection requirements of the Fifth and Fourteenth
Amendments to the U.S. Constitution have also been applied by courts to issues
regarding the use of tests in making high-stakes educational decisions. Although the
Office for Civil Rights does not enforce federal constitutional provisions, a brief
overview of these constitutional principles has been included for informational purposes.
12 Joint Standards, Introduction, p. 4.
13 Joint Standards, Introduction, p. 4.
14 Title VI prohibits discrimination on the basis of race, color and national origin in the programs and activities of
recipients that receive federal financial assistance. The U.S. Department of Education's regulation implementing Title
VI is found at 34 C.F.R. Part 100. Title IX prohibits discrimination on the basis of sex in educational programs and
activities of recipients of federal financial assistance. The U.S. Department of Education's regulation implementing
Title IX is found at 34 C.F.R. Part 106. Section 504 prohibits discrimination on the basis of disability in the programs
and activities of recipients of federal financial assistance. The U.S. Department of Education's regulation implementing
Section 504 is found at 34 C.F.R. Part 104. Title II prohibits discrimination on the basis of disability by public entities,
regardless of whether they receive federal funding. The U.S. Department of Education's regulation implementing Title
II is found at 28 C.F.R. Part 35.
III. Basic Principles
The brief overview of the test measurement and legal principles that follows establishes
the framework for more detailed discussions of test quality in chapter one and federal
legal standards in chapter two.
A. Test Use Principles
1. Educational Objectives and Context
Tests that are used in educationally appropriate ways and that are valid for the purposes
used are important instruments to help educators do their job. Before any state, school
district, or educational institution administers a test, the objectives for using the test
should be clear: What are the intended goals for and uses of the test in question? As an
educational matter, the answer to this question will guide all other relevant inquiries
about whether the test use is educationally appropriate. The context in which a test is to
be administered, the population of test takers, and the intended purpose for which the test
will be used are important considerations in determining which test would be appropriate
for a specific use, as illustrated below:
a. Placement Decisions
Placement decisions are by their very nature used to make a decision about the
future. Tests used in placement decisions generally determine what kinds of
programs, services, or interventions will be most appropriate for particular
students. Decisions concerning the appropriate educational program for a student
with a disability, placement in gifted and talented programs, and access to
language services are examples of placement decisions. The Joint Standards state
that there should be adequate evidence documenting the relationship among test
scores, appropriate instructional programs, and beneficial student outcomes. 15
When evidence about the relationship is limited, the test results should be
considered in light of other relevant student information. 16
15 Standard 13.9 states, "When test scores are intended to be used as part of the process for making decisions for
educational placement, promotion, or implementation of prescribed educational plans, empirical evidence documenting
the relationship among particular scores, the instructional programs, and desired student outcomes should be provided.
When adequate empirical information is not available, users should be cautioned to weigh the test results accordingly in
light of other relevant information about the student."
16 See id.
b. Promotion Decisions
Student promotion decisions are generally viewed as decisions incorporating a
determination about whether a student has mastered the subject matter or content
of instruction provided to date and a determination regarding whether the student
will be able to master the content at the next grade level (a placement decision). 17
At present, the focus of most school districts and states with promotion policies
has been primarily on assessing mastery of curriculum taught at a given grade
level. 18 When a test given for promotion purposes is being used to certify
mastery, the use of the test should adhere to professional standards for certifying
knowledge and skills for all students. 19 It is important that there be evidence that
the test adequately covers only the content and skills that students have actually
had an opportunity to learn. 20 Educational institutions should have information
indicating an alignment among the curriculum, instruction, and material covered
on such a high-stakes test. To the extent that a test for promotion purposes is
being used as a placement device, it should also adhere, as appropriate, to
professional standards regarding tests used for placement purposes. 21
17 See High Stakes, p. 123.
18 See American Federation of Teachers, Passing on Failure: District Promotion Policies and Practices, 1997.
19 See Standards 13.5 and 13.6; High Stakes, p. 123. Standard 13.5 states, "When test results substantially contribute to
making decisions about student promotion or graduation, there should be evidence that the test adequately covers only
the specific or generalized content and skills that students have had an opportunity to learn."
Standard 13.6 states, "Students who must demonstrate mastery of certain skills or knowledge before being promoted or
granted a diploma should have a reasonable number of opportunities to succeed on equivalent forms of the test or be
provided with construct-equivalent testing alternatives of equal difficulty to demonstrate the skills or knowledge. In
most circumstances, when students are provided with multiple opportunities to demonstrate mastery, the time interval
between the opportunities should allow for students to have the opportunity to obtain the relevant instructional
experiences."
20 See Standard 13.5, supra note 19; High Stakes, pp. 124-125.
21 See Standards 13.2 and 13.9; High Stakes, p. 123. Standard 13.2 states, "In educational settings, when a test is
designed or used to serve multiple purposes, evidence of the test's technical quality should be provided for each
purpose." See Standard 13.9, supra note 15.
c. Graduation Decisions
Graduation decisions are generally certification decisions: The diploma certifies
that the student has reached an acceptable level of mastery of knowledge and
skills. 22 When large-scale standardized tests are used in making graduation
decisions, there should be evidence that the test adequately covers only the
content and skills that students have had an opportunity to learn. 23 Therefore, all
students should be provided a meaningful opportunity to acquire the knowledge
and skills that are being tested, and information should indicate an alignment
among the curriculum, instruction, and material covered on the test used as a
condition for graduation.
2. Overarching Principles
The highly contextual and fact-based test measurement analyses applicable to a variety of
circumstances ultimately focus upon the following question: Is there sufficient
confidence in the test results at issue to allow for informed decisions to be made that will
have specified consequences for the students taking the test?
In the elementary and secondary
education context, regardless of whether tests are being used to make placement,
promotion, or graduation decisions, the National Academy of Sciences' Board on Testing
and Assessment has identified three principal criteria, which are based on established
professional standards, that can help inform and guide conclusions regarding this issue. 24
(1) Measurement validity: Is a test valid for a particular purpose, and does it
accurately measure the test taker's knowledge in the content area being
tested?
State and local educational agencies and educational institutions should ensure that a test
actually measures what it is intended to measure for all students. The inferences derived
from the test scores for a given use - for a specific purpose, in a specific type of
situation, and with specific types of students - are validated, rather than the test itself. It
is important for educators who use the test to request adequate evidence of test quality
(including validity and reliability evidence), evaluate the evidence, and ensure that the
test is used appropriately in a way that is consistent with information provided by the
developers or through supplemental validation studies.
22 See High Stakes, p. 166.
23 See Standard 13.5, supra note 19.
24 See High Stakes, p. 23 and National Research Council, Placing Children in Special Education: A Strategy for Equity,
1982.
(2) Attribution of cause: Does a student's performance on a test reflect
knowledge and skills based on appropriate instruction, or is it attributable
to poor instruction or to such factors as language barriers unrelated to the
skills being tested?
In some contexts, whether a particular test use is appropriate depends on whether test
scores are an accurate reflection of a student's knowledge or skills or whether they are
influenced by extraneous factors unrelated to the specific skills being tested. For
example, when tests are used in making student promotion or graduation decisions, state
and local education agencies should ensure that all students have an equal opportunity to
acquire the knowledge and skills that are being tested. 25 In some situations, it may be
necessary to provide appropriate accommodations for limited English proficient students
and students with disabilities to accurately and effectively measure students' knowledge
and skills in the particular content area being assessed. 26
(3) Effectiveness of treatment: Do test scores lead to placements and other
consequences that are educationally beneficial?
The most basic obligation of educators at the elementary and secondary level is to meet
the needs of students as they find them, with
their different backgrounds, and to teach knowledge and skills to allow them to grow to
maturity with meaningful expectations of a productive life in the workforce and
elsewhere. 27 This elementary and secondary educational obligation is no less present
when educators administer tests and evaluate and act on students' test results than it is
during classroom instruction. Relying upon the sound premise that tests should be
integral to the learning and achievement of students, one federal court distinguished
between testing in the employment and education settings:
If tests predict that a person is going to be a poor employee, the employer can
legitimately deny the person the job, but if tests suggest that a young child is
probably going to be a poor student, a school cannot on that basis alone deny that
child the opportunity to improve and develop the academic skills necessary to
success in our society. 28
25 See Standard 7.10, which states, "When the use of a test results in outcomes that affect the life chances or educational
opportunities of examinees, evidence of mean test score differences between relevant subgroups of examinees should,
where feasible, be examined for subgroups for which credible research reports mean differences for similar tests.
Where mean differences are found, an investigation should be undertaken to determine that such differences are not
attributable to a source of construct underrepresentation or construct-irrelevant variance. While initially, the
responsibility of the test developer, the test user bears responsibility for uses with groups other than those specified by
the developer."
26 See Joint Standards, p. 143.
27 See Brown v. Bd. of Educ., 347 U.S. 483, 493 (1954) (stating that "[education] is required in the performance of our
most basic public responsibilities, ... is the very foundation of good citizenship, ... [and] is [a] principal instrument ...
in preparing [the child] for later professional training ....").
Tests, in short, should be instruments used by elementary and secondary educators to
help students achieve their full potential. Test scores should lead to consequences that
are educationally beneficial for students. When making high-stakes decisions that
involve the use of tests, it is important for policymakers and educators to consider the
intended and unintended consequences that may result from the use of the test scores. 29
B. Legal Principles
Federal constitutional, statutory, and regulatory principles form the federal legal
nondiscrimination framework applicable to the use oftests for high-stakes purposes.
Title VI, Title IX, Section 504, and Title II, as well as the equal protection clause of the
Fourteent~ Amendment to the United States Constitution, prohibit intentional
discrimination based on race, national origin, sex, or disability. In addition, the
regulations that implement Title VI, Title IX, Section 504 and Title II prohibit intentional
.discrimination and policies or practices that have a <;liscriminatory disparate impact on
students based 'on their race, national origin, sex, or disability.3o The Section 504 ..
regulation and the Individuals with Disabilities Education Ace' contain specifi'c
provisions relative to the use of high-stakes tests for individuals with disabilities. 32
28 Larry P. v. Riles, 793 F.2d 969, 980 (9th Cir. 1984) (quoting Larry P. v. Riles, 495 F. Supp. 926, 969 (N.D. Cal.
1979)).
29 Research indicates that students in low-track classes do not have the opportunity to acquire knowledge and skills
strongly associated with future success that is offered to students in other tracks. The National Research Council
recommends that neither test scores nor other information should be used to place students in such classes. See High
Stakes, 1999: 282.
30 34 C.F.R. § 100.3(b)(2); 34 C.F.R. §§ 106.21(b)(2), 106.36(b), 106.52; 34 C.F.R. § 104.4(b)(4)(i); and 28 C.F.R. §
35.130(b)(3).
The authority of federal agencies to issue regulations with an "effects" standard has been consistently acknowledged by
U.S. Supreme Court decisions and applied by lower federal courts addressing claims of discrimination in education.
See, e.g., Lau v. Nichols, 414 U.S. 563, 568 (1974); Guardians Ass'n v. Civil Service Comm'n of City of N.Y., 463 U.S.
582, 584-593 (1983); Alexander v. Choate, 469 U.S. 287, 289-300 (1985). See also Memorandum from the Attorney
General for Heads of Departments and Agencies that Provide Federal Financial Assistance, "Use of the Disparate
Impact Standard in Administrative Regulations under Title VI of the Civil Rights Act of 1964," July 14, 1994.
31 The IDEA establishes rights and protections for students with disabilities and their families. It also provides federal
funds to local school districts and state agencies to assist in educating students with disabilities. Individuals with
Disabilities Education Act, 20 U.S.C. § 1400(\)(c).
32 34 C.F.R. §§ 104.35, 104.42(b); 20 U.S.C. §§ 1412(a)(17), 1414(b); 34 C.F.R. §§ 300.138-.139, 300.530-.536.
Further discussion of issues regarding testing of limited English proficient students and
students with disabilities is provided below.
1. Frameworks for Analysis
a. Different Treatment
Under federal law, policies and practices generally must be applied consistently to
similarly situated individuals or groups, regardless of their race, national origin, sex, or
disability. For example, a court concluded that a school district had intentionally treated
students differently on the basis of race where minority students whose test scores
qualified them for two or more ability levels were more likely to be assigned to the lower
level class than similarly situated white students, and no explanatory reason was
evident.33
In addition, educational systems that were previously segregated by race in violation of
the Fourteenth Amendment and have not achieved unitary status have an obligation to
dismantle their prior de jure segregation. In such instances, when a school district or
other educational system uses a test or assessment procedure for a high-stakes purpose
that has racially disproportionate effects, the school district or other educational system
must show that the disparity is not traceable to prior intentional segregation or that the
test or assessment procedure does not perpetuate the adverse effects of such
segregation.34 The school district is under "a 'heavy burden' of showing that actions that
increase[] or continue[] the effects of the dual system serve important and legitimate
ends."35
b. Disparate Impact
Discrimination under federal law may also occur where the application of neutral criteria
has discriminatory effects and those criteria are not educationally justified. The federal
nondiscrimination regulations provide that a recipient of federal funds may not "utilize
criteria or methods of administration which have the effect of subjecting individuals to
discrimination."36 (For a further discussion of issues related to testing of students with
33 See People Who Care v. Rockford Bd. of Educ., 851 F. Supp. 905, 958-1001 (N.D. Ill. 1994), remedial order rev'd in
part, 111 F.3d 528 (7th Cir. 1997). On appeal, the Seventh Circuit Court of Appeals stated that the appropriate remedy
in this case was to require the district to use objective, non-racial criteria to assign students to classes, rather than
abolishing the district's tracking system. 111 F.3d at 536.
34 See also United States v. Fordice, 505 U.S. 717, 731-732 (1992); Debra P. v. Turlington, 644 F.2d 397, 407 (5th Cir.
1981); McNeal v. Tate County Sch. Dist., 508 F.2d 1017, 1020-1021 (5th Cir. 1975); GI Forum v. Texas Educ. Agency,
No. SA-97-CA-1278-EP, 2000 U.S. Dist. LEXIS 153, slip op. at 56-57 (W.D. Tex. 2000).
35 Dayton Bd. of Educ. v. Brinkman, 443 U.S. at 538 (quoting Green v. County School Board, 391 U.S. 430, 439
(1968)).
36 See 34 C.F.R. § 100.3(b)(2) (Title VI); 34 C.F.R. § 104.4(b)(4)(i) (Section 504); and 28 C.F.R. § 35.130(b)(3)(i)
(Title II). See also 34 C.F.R. § 106.31 (Title IX). In Guardians, 463 U.S. at 589, the United States Supreme Court
upheld the use of the effects test, stating that the Title VI regulation forbids the use of federal funds "not only in
disabilities, see below.)
The disparate impact analysis has been frequently misunderstood to indicate a violation of law
based merely on disparities in student performance and to obligate educational institutions to
change their policies and procedures to guarantee equal results. Under federal law, a
statistically significant difference in outcomes creates the need for further examination of the
educational practices in question that have caused the disparities in order to ensure accurate
and nondiscriminatory decision making, but disparate impact alone is not sufficient to prove a
violation of federal civil rights laws.
Courts applying the disparate impact test have generally examined three questions to
determine if the practices at issue are discriminatory: (1) Does the practice or procedure
in question result in substantial differences in the award of benefits or services based on
race, national origin or sex? (2) Is the practice or procedure educationally justified? (3) Is
there an equally effective alternative that can accomplish the institution's educational goal
with less disparity?37 Under the regulations implementing Title VI and Title IX, the party
challenging the test has the burden of establishing disparate impact. If disparate impact is
established, the educational institution must provide sufficient evidence of an educational
justification for the practice in question. If sufficient evidence of an educational
justification has been provided, the party challenging the test must then demonstrate, in
order to prevail, that an alternative with less disparate impact is equally effective in
meeting the institution's educational goals or needs.38
2. Principles Relating to Inclusion and Accommodations
a. Limited English Proficient Students
programs that intentionally discriminate, but also in those endeavors that have a [racially disproportionate] impact on
racial minorities."
37 Courts use a variety of terms when discussing whether an alternative offered by the party challenging the practice is
feasible and would also effectively meet the institution's goals. See, e.g., Georgia State Conf. of Branches of NAACP v.
Georgia, 775 F.2d 1403, 1417 (11th Cir. 1985) (party challenging the practice "may ultimately prevail by proffering an
equally effective alternative practice which results in less racial disproportionality"); Sandoval v. Hagan, 7 F. Supp. 2d
1234, 1278 (M.D. Ala. 1998), aff'd, 197 F.3d 484, 507 (11th Cir. 1999) (plaintiff may prevail by offering a
"comparably effective" alternative practice which results in less disproportionality). These terms appear to be used
synonymously.
38 See Georgia State Conf., 775 F.2d at 1417. See also the Department of Justice's Title VI Legal Manual at p. 2.
The obligations of states and school districts with regard to high-stakes testing of limited
English proficient students in elementary and secondary schools must be examined
within the overall context of their Title VI obligation to provide equal educational
opportunities to limited English proficient students. Under Title VI, school districts have
an obligation to identify limited English proficient students and to provide them with a
program that enables them to acquire English-language proficiency as well as the
knowledge and skills that all students are required to master.39
States or school districts using tests for high-stakes purposes must ensure that, as with all
students, the tests effectively measure limited English proficient students' knowledge and
skills in the particular content area being assessed. For limited English proficient
elementary and secondary students in particular, it may be necessary in some situations to
provide accommodations so that the tests provide accurate and valid information about
the knowledge and skills intended to be measured.40
b. Students with Disabilities
Under Section 504, Title II, and the IDEA,41 school districts have a responsibility to
provide students with disabilities with a free appropriate public education. Providing
effective instruction in the general curriculum for students with disabilities is an
important aspect of providing a free appropriate public education. Under federal law,
students with disabilities must be included in statewide or district-wide assessment
programs and provided with appropriate accommodations, if necessary.42 There must be
an individualized determination of whether a student with a disability will participate in a
particular test and the appropriate accommodations, if any, that a student with a disability
will need. The individualized determinations of whether a student with a disability will
participate in a particular test, and what accommodations, if any, are appropriate must be
addressed through the individualized education program (IEP) process or other applicable
39 See Equal Educational Opportunities Act of 1974, P.L. No. 93-380, codified at 20 U.S.C. §§ 1701-1720; Lau v.
Nichols, 414 U.S. at 568-569; Castaneda v. Pickard, 648 F.2d 989, 1011 (5th Cir. 1981); Memorandum to OCR Senior
Staff from Michael L. Williams, Former Assistant Secretary for Civil Rights, September 27, 1991 (hereinafter Williams
Memorandum).
40 States and school districts are also required to provide LEP students with "reasonable adaptations and
accommodations" in certain situations when using assessments for the purpose of holding schools and districts
accountable for student performance under Title I. Title I of the Elementary and Secondary Education Act, 20 U.S.C. §
6311(a)(3)(F)(ii). Moreover, Title I requires States, to the extent practicable, to provide native-language assessments to
LEP students for Title I accountability purposes if that is the language and form of assessment most likely to yield
accurate and reliable information about what students know and can do. 20 U.S.C. § 6311(a)(3)(F)(iii). For a discussion
of comparability issues arising in the testing of LEP students, see pages 38-42 of this guide.
41 The Section 504 regulation is found at 34 C.F.R. Part 104 (1999). The Title II regulation is found at 28 C.F.R. Part
35 (1999). The IDEA regulation is found at 34 C.F.R. Part 300 (1999).
42 States and school districts are also required to provide students with disabilities with "reasonable adaptations and
accommodations" in certain situations when using assessments for the purpose of holding schools and districts
accountable for student performance under Title I. 20 U.S.C. § 6311(a)(3)(F)(ii).
evaluation and placement processes and included in either the student's IEP or Section
504 plan.43
Under Section 504, post-secondary education institutions may not make use of any test or
criterion for admission that has a disproportionate adverse impact on individuals with
disabilities unless (1) the test or criterion, as used by the institution, has been validated as
a predictor of success in the education program or activity and (2) alternate tests or
criteria that have a less disproportionate adverse impact are not shown to be available by
the party asserting that the test or criterion is discriminatory.44 Admissions tests must be
selected and administered so as best to ensure that, when a test is administered to an
applicant with a disability, the test results accurately reflect the applicant's aptitude or
achievement level, rather than reflecting the effect of the disability (except where the
functions impaired by the disability are the factors the test purports to measure).45
Admissions tests designed for persons with impaired sensory, manual, or speaking skills
must be offered as often and in as timely a manner as are other admissions tests.
Admissions tests must be offered in facilities that, on the whole, are accessible to
individuals with disabilities.
3. Federal Constitutional Questions Related to Testing of Elementary and
Secondary Students For High-Stakes Purposes
The equal protection and due process requirements of the Fifth and Fourteenth
Amendments to the U.S. Constitution would apply to ensure that high-stakes decisions by
public schools or states based on test use are made appropriately.46 The equal protection
principles involved in discrimination cases are, generally speaking, the same as the
standards applied to intentional discrimination claims under the applicable federal
nondiscrimination statutes.47 Courts addressing due process claims have examined three
questions related to the use of tests as bases for promotion or graduation decisions:
43 Under the IDEA, students with disabilities must be included in state and district-wide assessment programs. See 34
C.F.R. § 300.138(a). However, if the IEP team determines that a student should not participate in a particular statewide
or district-wide assessment of student achievement (or part of such an assessment), the student's IEP must include
statements of why that test is not appropriate for the student and how the student will be assessed. See 34 C.F.R. §
300.347(a)(5). The IDEA also requires state or local educational agencies to develop guidelines for students with
disabilities who cannot take part in state and district-wide assessments to participate in alternate assessments; these
alternate assessments must be developed and conducted beginning not later than July 1, 2000. See 34 C.F.R. §
300.138(b).
44 See 34 C.F.R. § 104.42(b)(2).
45 See 34 C.F.R. § 104.42(b)(3).
46 The requirements of Title VI, Title IX and Section 504 apply only to recipients of federal financial assistance. The
protections afforded by the Fifth and Fourteenth Amendments to the U.S. Constitution extend to actions by
governmental entities that are "state actors" and are not dependent on their receipt of federal financial assistance.
47 Federal cases may involve equal protection challenges to a jurisdiction's use of tests in which the claim is not based
on intentional race or sex discrimination, but, instead, on the alleged impropriety of the jurisdiction's use of tests to
separate out those students who should not be allowed to graduate. As a general matter, courts express reluctance to
second guess a state's educational policy choices when faced with such challenges, although they recognize that a state
cannot "exercise that [plenary] power without reason and without regard to the United States Constitution." See Debra
P. v. Turlington, 644 F.2d 397, 403 (5th Cir. 1981). When there is no claim of discrimination based on membership in a
• Is the purpose of the testing program legitimate and reasonable?48
• Have students received adequate notice of the test and its consequences?49
• Have students actually been taught the knowledge and skills measured by the test?50
Federal courts have typically deferred to educators' judgments about the beneficial
educational purposes of a testing program, as long as these judgments are not arbitrary or
capricious.51 Improving the quality of education, ensuring that students can compete on a
national and international level, and encouraging educational achievement through the
establishment of academic standards have been found to be reasonable goals for testing
programs.52
Courts have generally required advance notice of test requirements in order to give
students a reasonable chance to understand the standards against which they will be
evaluated and to learn the material for which they are to be accountable. A reasonable
transition period is required between the development of a new academic requirement
and the attachment of high-stakes consequences to tests used to measure academic
suspect class, the equal protection claim is reviewed under the rational basis standard. In these cases, the jurisdiction
need show only that the use of the tests has a rational relationship to a valid state interest. See Debra P., 644 F.2d at
406; Erik V. v. Causby, 977 F. Supp. 384, 389 (E.D. N.C. 1997).
48 See Regents of the Univ. of Mich. v. Ewing, 474 U.S. 214, 222, 226-27 (1985); Debra P., 644 F.2d at 406; Anderson
v. Banks, 520 F. Supp. 472, 506 (S.D. Ga. 1981).
49 See Brookhart v. Illinois State Bd. of Educ., 697 F.2d 179, 185 (7th Cir. 1983); Debra P., 644 F.2d at 404; Erik V.,
977 F. Supp. at 389-90 (E.D. N.C. 1997); Anderson, 520 F. Supp. at 1410-12.
50 See Brookhart, 697 F.2d at 184-87; Debra P., 644 F.2d at 406; Anderson, 520 F. Supp. at 509. Insofar as due process
cases may involve additional questions regarding the validity, reliability, and fairness of the test used to address the
educational institution's stated purposes, these issues are discussed in the portions of the guide addressing
discrimination under federal civil rights laws.
51 See Ewing, 474 U.S. at 226-27; Debra P., 644 F.2d at 406; Anderson, 520 F. Supp. at 506.
52 See Ewing, 474 U.S. at 226-27; Debra P., 644 F.2d at 406; Anderson, 520 F. Supp. at 506.
achievement. That time period varies, however, depending upon the precise context in
which the high-stakes decision is to be made. Relevant inquiries affecting determinations
about the constitutionality of notice and timing have included questions about the
alignment of curriculum and instruction with material tested, the number of test taking
opportunities provided to students, tutorial or remedial opportunities provided to students,
and whether factors in addition to test scores can affect high-stakes decisions.
Ultimately, in due process cases, federal courts have required, as a matter of
"fundamental fairness," that students have a reasonable opportunity to learn the material
covered by the test where passing the test is a condition of receipt of a high school
diploma or a condition for grade-to-grade promotion.53 For the test to meaningfully
measure student achievement, the test, the curriculum, and classroom instruction should
be aligned.
53 See Brookhart, 697 F.2d at 184-87; Debra P., 644 F.2d at 406; GI Forum, 2000 U.S. Dist. LEXIS 153, slip op. at
50-51; Anderson, 520 F. Supp. at 509.
CHAPTER 1. Test Measurement Principles
This chapter explains basic test measurement standards and related educational principles
for determining whether tests that are being used to make high-stakes educational
decisions for students provide accurate and fair information. As explained in chapter two
below, federal court decisions have been informed and guided by professional test
measurement standards and principles. Professional test measurement standards,
products of the test measurement community, can provide a basis for compliance with
federal nondiscrimination laws.54 This chapter is intended as a helpful discussion of how
to understand test measurement concepts and their use. These are not specific legal
requirements, but rather are foundations for understanding appropriate test use.
Educational institutions use tests to accomplish specific purposes based on their
educational goals, including making placement, promotion, graduation, admissions, and
other decisions. It is only after they have determined the underlying goal they want to
accomplish that they can identify the types of information that will best inform their
decision making. Information may include test results, as well as other relevant
measures, that will be able to effectively, accurately, and fairly address the purposes and
goals specified by the institutions.55 As stated in the Joint Standards, "[w]hen interpreting
and using scores about individuals or groups of students, considerations of relevant
collateral information can enhance the validity of the interpretation, by providing
corroborating evidence or evidence that helps explain student performance .... As the
stakes of testing increase for individual students, the importance of considering additional
evidence to document the validity of score interpretations and the fairness in testing
increases accordingly."56
In using tests to make high-stakes decisions, educational institutions should ensure that
the test will provide accurate results that are valid, reliable, and fair for all test takers.
This includes requesting adequate evidence of test quality, evaluating the evidence, and
ensuring that appropriate test use is based on adequate evidence provided by the
developers or through supplemental validation studies.57 When test results are used to
make high-stakes decisions about student promotion or graduation, evidence should be
54 See, e.g., High Stakes, p. 59-60.
55 Among other considerations, institutions will determine if they want test score interpretations that are
norm-referenced or criterion-referenced, or both. Norm-referenced means that the performances of students are compared to
the performances of other students in a specified reference population; criterion-referenced indicates the extent to
which students have mastered specific knowledge and skills.
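As an illustration only, the distinction between norm-referenced and criterion-referenced interpretations can be
sketched in a few lines of Python. The function names, the sample reference scores, and the 80 percent mastery cutoff
are hypothetical and are not drawn from the Joint Standards or from this guide; actual score interpretations rest on
professionally developed norms and standards.

```python
from bisect import bisect_left


def percentile_rank(score, reference_scores):
    """Norm-referenced reading: percent of the reference population scoring below `score`."""
    ordered = sorted(reference_scores)
    below = bisect_left(ordered, score)  # count of reference scores strictly below `score`
    return 100.0 * below / len(ordered)


def mastery_fraction(items_correct, items_total):
    """Criterion-referenced reading: share of the tested knowledge and skills answered correctly."""
    return items_correct / items_total


def meets_criterion(items_correct, items_total, cutoff=0.8):
    """Hypothetical mastery rule: the cutoff value is an assumption made for illustration."""
    return mastery_fraction(items_correct, items_total) >= cutoff


# Hypothetical reference population of ten raw scores on a 40-item test.
reference = [12, 15, 18, 20, 22, 25, 27, 30, 33, 36]

print(percentile_rank(27, reference))  # 60.0
print(meets_criterion(27, 40))         # False
```

In this sketch the same raw score of 27 looks relatively strong when compared with the reference population, yet
falls short of the assumed mastery cutoff, which is why the two kinds of interpretation answer different questions.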
56 Joint Standards, p. 141. See also Standard 13.7, which states, "In educational settings, a decision or characterization
that will have a major impact on a student should not be made on the basis of a single test score. Other relevant
information should be taken into account if it will enhance the overall validity of the decision."
57 In order to provide educational institutions with tests that are accurate and fair, test developers should develop tests
in accordance with professionally recognized standards, and provide educational institutions with adequate evidence of
test quality.
available which documents that students have had an adequate opportunity to learn the
material being tested.58
I. Key Considerations in Test Use
This section addresses the fundamental concepts of test validity and reliability. It will
also discuss issues associated with ensuring fairness in the meaning of test scores, and
issues related to using appropriate cutscores in high-stakes tests.
A. Validity
Test validity refers to a determination of how well a test actually measures what it says it
measures. The Joint Standards define validity as "[t]he degree to which accumulated
evidence and theory support specific interpretations of test scores entailed by proposed
uses of a test."59 The demonstration of validity is multifaceted and must always be
determined within the context of the specific use of a test. In order to promote
readability, the discussion on validity presented here is meant to reflect this complex
topic in an accurate, but concise and user-friendly way. The Joint Standards identify and
discuss in detail principles related to determining the validity of test scores within the
context of their use, and readers are encouraged to review the Joint Standards, Chapter 1,
Validity, for additional, relevant discussion.60
There are three central points to keep in mind:
• The focus of validity is not really on the test itself, but on the validity of the
inferences drawn from the test results for a given use.
• All validity is really a form of "construct validity."
• In validating the inferences of the test results, one must also consider the
consequences of the test's interpretation and use.
58 Standards 13.5 and 7.5. Standard 13.5, supra note 19.
Standard 7.5 states, "In testing applications involving individualized interpretations of test scores other than selection, a
test taker's score should not be accepted as a reflection of standing on the characteristic being assessed without
consideration of alternate explanations for the test taker's performance on that test at that time."
59 Joint Standards, p. 9, 184.
60 Joint Standards, Chapter 1, Validity, p. 9-24.
1. Validity of the Inferences of the Scores
It is not the test that is validated per se, but the inferences or meaning derived from the
test scores for a given use, that is, for a specific purpose, in a specific type of situation,
and with specific groups of students. The meaning of test scores will differ based on
such factors as how the test is designed, the types of questions that are asked, and the
documentation that supports how all groups of students are interpreting what the test is
asking and how effectively their performance can be generalized beyond the test.
For instance, in one case, the educational institution may want to evaluate how well
students can analyze complex issues and evaluate implications in history. For a given
amount of test time, they would want to use a test that measures the ability of students to
think deeply about a few selected history topics. The meaning of the scores should
reflect this purpose and the limits of the range of topics being measured on the test. In
another case, the institution may want to assess how well students know a range of facts
about a wide variety of historical events. The institution would want to use a test that
measures a broad range of knowledge about many different occurrences in history. The
inferences of the scores should accurately reflect how well students know a broad range
of historical facts.
2. Construct Validity
Construct validity refers to the degree to which the scores of test takers accurately reflect
the constructs a test is attempting to measure. The Joint Standards defines a construct as
"the concept or the characteristic that a test is designed to measure."61 Test scores and
their inferences are validated to measure one or more constructs described in a particular
content domain.62 In K-12 education, these domains are often explained in state or
district content standards in various subject areas.
For instance, in mathematics, constructs of mathematical problem solving and the
knowledge of number systems would be among the constructs described in a state's
elementary mathematics content standards. These standards would define the
mathematics domain in this situation. Items would be selected for the test that sample
from this domain, and are properly representative of the constructs identified within it.
The meaning of the test scores should accurately reflect the knowledge and skills defined
in the mathematics content standards domain.
Validity should be viewed as the overarching, integrative evaluation of the degree to
which all accumulated evidence supports the intended interpretation of the test scores for
61 Page 173.
62 The Joint Standards defines a content domain as "the set of behaviors, knowledge, skills, abilities, attitudes or other
characteristics to be measured by a test, represented in a detailed specification, and often organized into categories by
which items are classified (p. 174)." A domain, then, represents a definition of a content area for the purposes of a
particular test. Other tests will likely have a different definition of what knowledge and skills a particular content area
entails.
a proposed purpose.63 This unitary and comprehensive concept of validity is referred to
as "construct validity." Different sources of evidence may illuminate different aspects of
validity, but they do not represent distinct types of validity.64
Therefore, "construct validity" is not just one of the many types of validity; it is validity.
Demonstrating construct validity then means gathering a variety of types of evidence to
support the intended interpretations and uses of test scores. All validity evidence and the
interpretation of the evidence are focused on the basic question: Is the test measuring the
concept, skill, or trait in question? Is it, for example, really measuring mathematical
reasoning or reading comprehension for the types of students that are being tested? A
variety of types of evidence can be used to answer this question, none of which provides
a simple yes or no answer. The exact nature of the types of evidence that need to be
accumulated is directly related to the intended use of the test, which includes information
regarding the skills and knowledge being measured, the purpose for which the
information will be used, and the population of test takers.65
For instance, an educational institution may want to use a test to help make promotion
decisions. It may also want to use a test to place students in the appropriate sequence of
courses. In each situation, the types of validity evidence an institution would expect to
see would depend on how the test is being used.
In making promotion decisions, the test should reflect content the student has learned.
Appropriate validation would include adequate evidence that the test is measuring the
constructs identified in the curriculum, and that the inferences of the scores accurately
reflect the intended constructs for all test takers. Validation of the decision process
involving the use of the test would include adequate evidence that low scores reflect lack
of knowledge of students after they have been taught the material, rather than lack of
exposure to the curriculum in the first place.
In making placement decisions, on the other hand, the test may not need to measure
content that the student has already learned. Rather, at least in part, the educational
institution may want the test to measure aptitude for the future learning of knowledge or
skills that have been identified as necessary to complete a course sequence. Appropriate
validation would include documentation of the relationship between what constructs are
being measured in the test, and what skills and knowledge are actually needed in the
63 Joint Standards, Chapter 1, Validity, pp. 9-11, 184.
64 Therefore, construct validity can be seen as an umbrella that encompasses what has previously been described as
predictive validity, content validity, criterion validity, discriminant validity, etc. Rather, these terms refer to types or
sources of evidence that can be accumulated to support the validity argument. Definitions of these terms can be found
in Appendix B, Measurement Glossary.
65 Rather than follow the traditional nomenclature (e.g., predictive validity, content validity, criterion validity,
discriminant validity, etc.), the Joint Standards define sources of validity evidence as evidence based on test content,
evidence based on response processes, evidence based on internal structure, evidence based on relations to other
variables, and evidence based on consequences of testing. These are discussed in Chapter 1 of the Joint Standards, pp.
11-17.
future placements. Differential evidence would provide documentation that scores are
not significantly confounded by other factors irrelevant to the knowledge and skills the
test is intending to measure.
Institutions often think about using the same test for two or more purposes. This is
appropriate as long as the validity evidence properly supports the use of the test for each
purpose, and properly supports that the inferences of the results accurately reflect what
the test is measuring for all students taking the test.
The empirical evidence related to the various aspects of construct validity is collected
throughout test development, during test construction, and after the test is completed. It
is important for educators and policymakers to understand and expect that the
accumulated evidence spans the range of test development and implementation. There is
not just one set of documentation collected at one point in time.
When the empirical database is large and includes results from a number of studies
related to a given purpose, situation, and type of test takers, it may be appropriate to
generalize validity findings beyond validity data gathered for one particular test use.
That is, it may be appropriate to use evidence collected in one setting when determining
the validity of the meaning of the test scores for a similar use. If the accumulated validity
evidence for a particular purpose, situation, or subgroup is small, or features of the
proposed use of the test differ markedly from an adequate amount of validity evidence
already collected, evidence from this particular type of test use will generally need to be
compiled. 66 Regardless of where the evidence is collected, educational institutions
should expect adequate documentation of construct validity based on needs defined by
the particular purposes and populations for which a test is being used.
a. Sources of Validity Error
When considering the types of construct validity evidence to collect, the Joint Standards
emphasize that it is important to guard against the two major sources of validity error. This error
can distort the intended meaning of scores for particular groups of students, situations, or
purposes. 67
One potential source of error omits some important aspects of the intended construct being
tested. This is called construct underrepresentation. 68 An example would be a test that is being
66 As indicated in the Joint Standards, "The extent to which predictive or concurrent evidence of validity generalization
can be found in new situations is in large measure a function of accumulated research. Although evidence of
generalization can often help to support a claim of validity in a new situation, the extent of available data limits the
extent to which the claim can be sustained." Joint Standards, Chapter 1, pp. 15-16.
67 Joint Standards, Chapter 1, Validity, p. 10.
68 Messick, S. (1989). Validity. In Educational Measurement, 3rd Edition, R.L. Linn, ed. New York: Macmillan, pp. 13-103.
used to measure English language proficiency. When the institution has defined English
language proficiency as including specific skills in listening, speaking, reading, and writing the
English language, and wants to use a test which measures these aspects, construct
underrepresentation would occur if the test only measured the reading skills.
The other potential source of error occurs when a test measures material that is extraneous to the
intended construct, confounding the ability of the test to measure the construct that it intends to
measure. This source of error is called construct irrelevance. 69 For instance, how well a student
reads a mathematics test may influence the student's subtest score in mathematics computation.
In this case, the student's reading skills are irrelevant when the skill of mathematics computation
is what is being measured by the subtest. 70
An essential part of the accumulated validity information is collecting evidence not only about
what a test measures in particular situations or for particular students, but also evidence that
seeks to document that the intended meaning of the test scores is not unduly influenced by either
of the two sources of validity error.
3. Considering the Consequences of Test Use
Evidence about the intended and unintended consequences of test use can provide important
information about the validity of the inferences of the test results, or it can raise concerns about
an inappropriate use of a test where the inferences may be valid for other uses.
For instance, significant differences in placement test scores based on race, gender, or national
origin may trigger a further inquiry about the test and how it is being used to make placement
decisions. 71 The validity of the test scores would be called into question if the test scores are
substantially affected by irrelevant factors that are not related to the academic knowledge and
skills that the test is supposed to measure. 72
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons' responses and
performances as scientific inquiry into score meaning. American Psychologist 50(9): pp. 741-749.
69 Messick, 1989; 1995.
70 On the other hand, if an item is measuring the student's ability to apply mathematical skills in a written format (for
instance when an item requires students to fill out an order form), then writing skills may not be extraneous to the
construct being measured in this item.
71 See Code of Fair Testing Practices in Education, 1988.
72 Standards 7.5, 7.6 and 1.24. Standard 7.5, supra note 58.
Standard 7.6 states, "When empirical studies of differential prediction of a criterion for members of different subgroups
are conducted, they should include regression equations (or an appropriate equivalent) computed separately for each
group or treatment under consideration or an analysis in which the group or treatment variables are entered as
moderator variables."
Standard 1.24 states, "When unintended consequences result from test use, an attempt should be made to investigate
whether such consequences arise from the test's sensitivity to characteristics other than those it is intended to assess or
to the test's failure fully to represent the intended construct."
On the other hand, a test may accurately measure differences in the level of students'
academic achievement. That is, low scores may accurately reflect that some students do not
know the content. However, test users should ensure that they interpret those scores correctly
in the context of their high-stakes decisions. 73 For instance, test users could incorrectly
conclude that the scores reflect lack of ability to master the content for some students when, in
fact, the low test scores reflect the limited educational opportunities that the students have
received. In this case, it would be problematic to use the test scores to place low-performing
students in a special services program for students who have trouble learning and processing
academic content. It would be appropriate to use the test to evaluate program effectiveness,
however. 74

Standard 13.1
When educational testing programs are mandated by school, district, state, or other authorities,
the ways in which test results are intended to be used should be clearly described. It is the
responsibility of those who mandate the use of tests to monitor their impact and to identify and
minimize potential negative consequences. Consequences resulting from the uses of the test, both
intended and unintended, should also be examined by the test user.
B. Reliability
Reliability refers to the consistency of test results. While no test is ever an "error-free"
measure of student performance, 75 inferences of adequate test reliability refer to estimates
which demonstrate that the inconsistency of the scores is minimized over test
administrations, forms, items, scorers, and/or other facets of testing. 76 An example of
reliability of test results on different occasions is when the same students, taking the test
multiple times, receive similar scores. Consistency over parallel forms of a test occurs
73 Standards 7.5 and 7.10. Standard 7.5, supra note 58. Standard 7.10, supra note 25.
74 High Stakes, pp. 89-113.
75 All sources of assessment information, including test results, include some degree of error. There are two types of
error. The first is random error that affects scores in such a way that sometimes students will score lower and
sometimes higher than their "true" score (the actual mastery of the students' knowledge and skills). This type of error,
also known as measurement error, particularly affects reliability of scores. Therefore, test scores are considered reliable
when evidence demonstrates that there is a minimum amount of random measurement error in the test scores for a
given group.
The second type of error that affects test results is systematic error. Systematic error consistently affects scores in one
direction; that is, this type of error causes some students to consistently score lower or consistently score higher than
their "true" (or actual) level of mastery. For instance, visually impaired students will consistently score lower than they
should on a test which has not been administered for them in Braille or large print, because their difficulty in reading
the items on the page will negatively impact their score. This type of error generally affects the validity of the
interpretation of the test results and is discussed in the validity section above. Systematic error should also be
minimized in a test for all test takers.
When educators and policymakers are evaluating the adequacy of a test for their local population of students, it is
important to consider evidence concerning both types of error.
76 Evaluating the reliability of a test includes identifying the major sources of measurement error, the size of the errors
resulting from these sources, the indication of the degree of reliability to be expected, and the generalizability of results
across items, forms, raters, sampling, administrations, and other measurement facets.
when forms are developed to be equivalent in content and technical characteristics.
Reliability can also include estimates of a high degree of relationship across similar items
within a single test or subtest that are intended to measure the same knowledge or skill.
For judgmentally scored tests, such as essays, another widely used index of reliability
addresses consistency across raters or scorers. In each case, reliability can be estimated
in different ways, using one of several statistical procedures. 77 Different kinds of
reliability estimates vary in degree and nature of generalization.
In order to promote readability, the discussion on reliability presented here is meant to
reflect this complex topic in an accurate, but concise and user-friendly way. Readers are
encouraged to review Chapter 2, Reliability and Errors of Measurement, in the Joint
Standards for additional, relevant information. 78
77 These types of reliability estimates are known as test-retest, alternate forms, internal consistency, and inter-rater
estimates, respectively. See Joint Standards, Chapter 2, Reliability, for some examples of different procedures.
78 Joint Standards, Chapter 2, Reliability and Errors of Measurement, pp. 25-36.
C. Fairness
Tests are fair when they yield score interpretations that are valid and reliable for all
students who take the tests. That is, the academic tests must measure the same academic
constructs (knowledge and skills) for all students who take them, regardless of race,
national origin, gender, or disability. Similarly, the scores must not substantially and
systematically underestimate or overestimate the knowledge or skills of members of a
particular group. The Joint Standards discuss fairness in testing in terms of lack of bias,
equitable treatment in the testing process, equal scores for students who have equal
standing on the tested construct, and equity in opportunity to learn the material being
tested. 79 In order to promote readability, the discussion on fairness presented here is
meant to reflect this complex topic in an accurate, but concise and user-friendly way.
Readers are encouraged to review Chapter 7, Fairness in Testing and Test Use, in the
Joint Standards for additional, relevant information. 80
1. Fairness in Validity
Demonstrating fairness in the validation of test score inferences focuses primarily on
making sure that the scores reflect the same intended knowledge and skills for all
students taking the test. For the most part this means that the test should minimize the
measurement of material that is extraneous to the intended constructs and which
confounds the ability of the test to accurately measure the constructs that it intends to
measure. Rather, a test score should accurately reflect how well each student has
mastered the intended constructs. The score should not be significantly impacted by
construct irrelevant influences.
79 Joint Standards, Chapter 7, Fairness in Testing and Test Use, pp. 74-80. In test measurement, the term fairness has a
specific set of technical interpretations. Four of these interpretations are discussed in the Joint Standards. For instance,
bias is discussed in relation to fairness and is defined in the Joint Standards in two ways: "In a statistical context, (bias
refers to) a systematic error in a test score. In discussing test fairness, bias (also) may refer to construct
underrepresentation or construct-irrelevant components of test scores that differentially affect the performance of
different groups of test takers (p. 172)." Fairness as equitable treatment in the testing process "requires consideration
not only of the test itself, but also the context and purpose of testing, and the manner for which test scores are used (p.
74)." Equal scores for students of equal standing reflects that "examinees of equal standing with respect to the construct
the test is intended to measure should on average earn the same test score, irrespective of group membership (p. 74)."
For educational achievement tests, "When some test takers have not had the opportunity to learn the subject matter
covered by the test content, they are likely to get low scores ... low scores may have resulted in part from not having had
the opportunity to learn the material tested as well as from having had the opportunity and failed to learn (p. 76)."
80 Joint Standards, Chapter 7, Fairness in Testing and Test Use, pp. 73-84.
The Joint Standards identify a number of standards that outline important elements
related to validly measuring the intended constructs for all students. 81 The elements span
considerations of test development, test implementation, and the proper use of reported
test results.
Documenting fairness during test development involves gathering adequate evidence that
items and test scores are constructed so that the inferences validly reflect what is
intended. For all test takers, evidence should support that valid inferences can be drawn
from the scores. 82 When credible research reports that item and test results differ in
meaning across examinee subgroups, then to the extent feasible, separate validity
evidence for each relevant subgroup should be collected. 83 When items function
differently across relevant subgroups, appropriate studies should be conducted, when
feasible, so that bias in items due to test design, content, and format is detected and
eliminated. 84 Developers should strive to identify and eliminate language, form, and
content in tests that have a different meaning in one subgroup than in others, or that
generally have sensitive connotations, except when judged to be necessary for adequate
representation of the intended constructs. 85 Adequate differential analyses should be
conducted when evaluating the validity of scores for prediction purposes. 86
81 Joint Standards, Chapter 7, Fairness in Testing and Test Use, pp. 80-84.
82 Standard 7.2 states, "When credible research reports differences in the effects of construct-irrelevant variance across
subgroups of test takers on performance of some part of the test, the test should be used if at all only for those
subgroups for which evidence indicates that valid inferences can be drawn from test scores."
83 Standard 7.1 and 7.3. Standard 7.1 states, "When credible research reports that test scores differ in meaning across
examinee subgroups for the type of test in question, then to the extent feasible, the same forms of validity evidence
collected for the examinee population as a whole should also be collected for each relevant subgroup. Subgroups may
be found to differ with respect to appropriateness of test content, internal structure of test responses, the relation of test
scores to other variables, or the response processes employed by individual examinees. Any such findings should
receive due consideration in the interpretation and use of scores as well as in subsequent test revisions."
Standard 7.3 states, "When credible research reports that differential item functioning exists across age, gender,
racial/ethnic, cultural, disability and/or linguistic groups in the population of test takers in the content domain measured
by the test, test developers should conduct appropriate studies when feasible. Such research should seek to detect and
eliminate aspects of test design, content, and format that might bias test scores for particular groups."
84 See Standard 7.3, supra note 83.
85 Standard 7.3 and Standard 7.4. Standard 7.3, supra note 83.
Standard 7.4 states, "Test developers should strive to identify and eliminate language, symbols, words, phrases, and
content that are generally regarded as offensive by members of racial, ethnic, gender, or other groups, except when
judged to be necessary for adequate representation of the domain." Comment: "Two issues are involved. The first
deals with the inadvertent use of language that, unknown to the test developer, has a different meaning or connotation
in one subgroup than in others. Test publishers often conduct sensitivity reviews of all test material to detect and
remove sensitive material from the test. The second deals with settings in which sensitive material is essential for
validity. For example, history tests may appropriately include material on slavery or Nazis. Tests on subjects from life
sciences may appropriately include material on evolution. A test of understanding of an organization's sexual
harassment policy may require employees to evaluate examples of potentially offensive behavior."
86 Standard 7.6, supra note 72.
Adequate evidence should document the fair implementation of tests for all test takers.
The testing process should reflect equitable treatment for all examinees. 87 Linguistic or
reading demands in tests should be kept to a minimum except when these constructs are
being measured. 88
Documentation of appropriate reporting and test use should be available. Reported data
should be clear and accurate, especially when there are high-stakes consequences for
students. 89 When tests are used in decisions that have high-stakes consequences for
students, evidence of mean score differences between relevant subgroups should be
examined, where feasible. When mean differences are found between subgroups,
investigations should be undertaken to determine that such differences are not attributable
to construct underrepresentation or construct irrelevant error. 90 Evidence about
differences in mean scores and the significance of the validity errors should also be
considered when deciding which test to use. 91 In using test results for purposes other
than selection, a test taker's score should not be accepted as a reflection of standing on
the intended constructs without consideration of alternative explanations for the test
taker's performance. 92 Explanations might reflect limitations of the test, for instance
construct irrelevant factors may have significantly impacted the student's score.
Explanations may also reflect schooling factors external to the test, for instance lack of
instructional opportunities.
The issue of feasibility is discussed in a few of the standards summarized above. In the
comments associated with these standards, feasibility is generally addressed in terms of
adequate sample size, with continued operational use of a test as a way of accumulating
adequate numbers of subgroup results over administrations. When credible research
reports that results differ in meaning across subgroups, collecting separate and parallel
validity data verifies that the same knowledge and skills are being measured for all test
87 Standard 7.12 states, "The testing or assessment process should be carried out so that test takers receive comparable
and equitable treatment during all phases of the testing or assessment process."
88 Standard 7.7 states, "In testing applications where the level of linguistic or reading ability is not part of the construct
of interest, the linguistic or reading demands of the test should be kept to the minimum necessary for the valid
assessment of the intended construct."
89 Standards 7.8, 7.9, 7.10, 1.24. Standard 7.8 states, "When scores are disaggregated and publicly reported for groups
identified by characteristics such as gender, ethnicity, age, language proficiency, or disability, cautionary statements
should be included whenever credible research reports that test scores may not have comparable meaning across these
different groups."
Standard 7.9 states, "When tests or assessments are proposed for use as instruments of social, educational, or public
policy, the test developers or users proposing the test should fully and accurately inform policymakers of the
characteristics of the tests as well as any relevant and credible information that may be available concerning the likely
consequences of test use."
Standard 7.10, supra note 25. Standard 1.24, supra note 72.
90 Standard 7.10, supra note 25.
91 Standard 7.11 states, "When a construct can be measured in different ways that are approximately equal in their
degree of construct representation and freedom from construct-irrelevant variance, evidence of mean score differences
across relevant subgroups of examinees should be considered in deciding which test to use."
92 Standard 7.5, supra note 58.
takers. Particularly in high-stakes situations, feasibility decisions need to include the
potential costs to students of using information where the validity of the scores has not
been verified. 93
2. Fairness in Reliability
Fairness in reliability focuses on making sure that scores are stable and consistently accurate for
all students. Two standards discuss issues of fairness in reliability. First, when there are reasons
for expecting that test reliability analyses might differ substantially for different subpopulations,
reliability data should be presented as soon as feasible for each major population for whom the
test is recommended. 94 Second, "[w]hen significant variations are permitted in test
administration procedures, separate reliability analyses should be provided for scores produced
under each major variation if adequate sample sizes are available." 95 Often, continued
operational use of a test is a way to accumulate an adequate sample size over administrations.
D. Cutscores
The same principles regarding fairness, validity, and reliability apply generally to the
establishment and use of cutscores for the purpose of making high-stakes educational
decisions. Cutscores, also known as cut points or cutoff scores, are specific points on the
test or scale where test results are used to divide levels of knowledge, skill, or ability. A
cutscore may divide the demonstration of acceptable and unacceptable skills, as in
placement in gifted and talented programs where students are accepted or rejected. There
may be multiple cutscores that identify qualitatively distinct levels of performance.
Cutscores are used in a variety of contexts, including decisions for placement purposes or
for other specific outcomes, such as graduation, promotion, or admissions. 96
93 See comment associated with Standard 10.7: "In addition to modifying tests and test administration procedures for
people who have disabilities, evidence of validity for inferences drawn from these tests is needed. Validation is the
only way to amass knowledge about the usefulness of modified tests for people with disabilities. The costs of obtaining
validity evidence should be considered in light of the consequences of not having usable information regarding the
meanings of scores for people with disabilities. This standard is feasible in the limited circumstances where a sufficient
number of individuals with the same level or degree of a given disability is available (italics added)."
94 Standard 2.11 states, "If there are generally accepted theoretical or empirical reasons for expecting that reliability
coefficients, standard errors of measurement, or test information functions will differ substantially for various
subpopulations, publishers should provide reliability data as soon as feasible for each major population for which the
test is recommended."
95 Standard 2.18.
96 In order to promote readability, the discussion on cutscores presented here is meant to reflect this complex topic in an
accurate, but concise and user-friendly way. Readers are encouraged to review Chapter 4, Scales, Norms, and Score
Comparability, pp. 53-54, in the Joint Standards for additional, relevant information about cutscores. See also Standards
1.19, 13.9.
Standard 1.19 states, "If a test is recommended for use in assigning persons to alternative treatments or is likely to be so
used, and if outcomes from those treatments can reasonably be compared on a common criterion, then, whenever
feasible, supporting evidence of differential outcomes should be provided."
Standard 13.9, supra note 15.
Many of the concepts regarding test validity apply to cutscores; that is, the cut points
themselves must be accurate representations of the knowledge and skills of students. 97
Further, the validity evidence for cutscores should generally be able to demonstrate that
students above the cut point represent or demonstrate a qualitatively greater degree or
different type of skills and knowledge than those below the cut point, whenever these
types of inferences are made. 98
Reliability of the cutscores is also important. The Joint Standards state that where
cutscores are specified for selection or placement, the degree of measurement error
around each cutscore should be reported. 99 Evidence should also indicate the
misclassification rates, or percentage of error in classifying students, that is likely to
occur among students with comparable knowledge and skills. 100 This information should
be available by group as soon as feasible if there is a prior probability that the
misclassification rates may differ substantially by group. 101 For example, what
percentage of students who should be allowed to graduate would not be allowed to do so
because of error due to the test rather than differences in their actual knowledge and
skills? 102
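The question above can be made concrete with a small sketch based on classical test theory. It uses the standard relation SEM = SD x sqrt(1 - reliability) and assumes that measurement error is normally distributed around a student's true score. The function names, the score scale, and the numbers are hypothetical illustrations, not values taken from the Joint Standards or from any actual test.

```python
from math import sqrt
from statistics import NormalDist

def standard_error_of_measurement(score_sd, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    return score_sd * sqrt(1.0 - reliability)

def false_negative_probability(true_score, cutscore, sem):
    """Chance that the observed score falls below the cutscore for a student
    whose true score is at or above it (assumes normally distributed error)."""
    return NormalDist(mu=true_score, sigma=sem).cdf(cutscore)

# Hypothetical test: score SD of 10 points, reliability estimate of 0.91,
# so the SEM is roughly 3 score points.
sem = standard_error_of_measurement(score_sd=10.0, reliability=0.91)

# Students whose true scores sit at or above a hypothetical cutscore of 70.
for true_score in (70, 72, 75, 80):
    p = false_negative_probability(true_score, cutscore=70, sem=sem)
    print(f"true score {true_score}: chance of scoring below the cut = {p:.3f}")
```

The sketch shows why error near the cutscore matters most: the closer a student's true score is to the cut point, the larger the chance that measurement error alone changes the decision.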
There is no single right answer to the questions of when, where and how cutscores should
be set on a test with high-stakes consequences for students. 103 Many experts suggest,
97 Joint Standards, Chapter 1, Validity, pp. 9-16, discusses that the interpretation of all scores should be an accurate
representation of what is being measured.
98 See Standard 4.20's comment section for a discussion on these points. In high-stakes situations, it is important to
examine the validity of the inferences that underlie the specific decisions being made on the basis of the cutscores. In
other words, what must be validated is the specific use of the test based on how the scores of students above and below
the cutscore are being interpreted. What is also at issue is how scores clustered around the cut-off point are interpreted
in light of the high-stakes decision.
99 Standard 2.14 states, "Conditional standard errors of measurement should be reported at several score levels if
constancy cannot be assumed. Where cut scores are specified for selection or classification, the standard errors of
measurement should be reported in the vicinity of each cut score."
100 "Where the purpose of measurement is classification, some measurement errors are more serious than others. An
individual who is far above or far below the value established for pass/fail or for eligibility for a special program can be
mismeasured without serious consequences. Mismeasurement of examinees whose true scores are close to the cut score
is a more serious concern. ... The term classification consistency or inter-rater agreement, rather than reliability, would
be used in discussions of consistency of classification. Adoption of such usage would make it clear that the importance
of an error of any given size depends on the proximity of the examinee's score to the cut score." Joint Standards, p. 30.
101 Standard 2.11, supra note 94.
102 Misclassification of students above or below the cutpoints can result in both false positive and false negative
classifications, respectively. The example in the text is a false negative classification.
103 High Stakes, Chapter 7, p. 168.
however, that multiple methods of determining cutscores should be used when
determining a final cutscore. 104 Further, the reasonableness of the standard setting
process and the consequences for students should be clearly and specifically documented
for a given use. 105 Both the Joint Standards and High Stakes repeatedly state that
decisions should not be made solely or automatically on the basis of a single test score,
and that other relevant information should be taken into account if it will enhance the
overall validity of the decision. 106
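The concern about scores that fall close to a cutscore can be made concrete with a small calculation. The following is a minimal sketch in Python, not a procedure prescribed by the Joint Standards or High Stakes. It assumes a single overall standard error of measurement from the classical formula SEM = SD x sqrt(1 - reliability), whereas Standard 2.14 calls for conditional standard errors reported in the vicinity of each cut score. The function names, the one-SEM band, and all numbers are hypothetical.

```python
import math


def standard_error_of_measurement(score_sd, reliability):
    """Classical overall SEM: SD * sqrt(1 - reliability).

    Assumption for illustration only: one SEM for the whole score scale,
    rather than a conditional SEM estimated near the cutscore.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return score_sd * math.sqrt(1.0 - reliability)


def near_cutscore(observed_score, cutscore, sem, band_in_sems=1.0):
    """Return True when the cutscore lies within the error band of the score."""
    return abs(observed_score - cutscore) <= band_in_sems * sem


# Hypothetical test: score SD of 10 points, reliability of 0.91, cutscore of 70.
sem = standard_error_of_measurement(score_sd=10.0, reliability=0.91)
for score in (55, 68, 72, 80):
    status = "review with other information" if near_cutscore(score, 70, sem) else "clear of the cutscore"
    print(score, status)
```

A score flagged by a check of this kind is one for which other relevant information about the student is especially important before a high-stakes decision is made.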
104 High Stakes, Chapter 7, p. 169.
105 See Standards 4.19 and 4.21 and their comments. See also High Stakes, Chapters 5, 6, 7.
Standard 4.19 states, "When proposed score interpretations involve one or more cut scores, the rationale and
procedures used for establishing cut scores should be clearly documented."
Standard 4.21 states, "When cut scores defining pass-fail or proficiency categories are based on direct judgments about
the adequacy of item or test performances or performance levels, the judgmental process should be designed so that
judges can bring their knowledge and experience to bear in a reasonable way."
106 See High Stakes, Chapters 5, 6, 7; Joint Standards, Standard 13.7. Standard 13.7, supra note 56.
Test Measurement Principles:
Questions about Appropriate Test Use
In order to determine if a test is being used appropriately in making high-stakes
decisions about students, considerations about the context of the test use, and the
validity, reliability, and fairness of the scores and their interpretations need to be
addressed. In all cases, it is important that the evidence related to the technical merits
of the test be based on the current test being proposed.
1. What is the purpose for which the test is being used?
2. What information, besides the test, is being collected to inform this purpose?
3. Based on how the test results are to be used, is there adequate evidence of validity
to document that the test score inferences are accurate and meaningful for the
students taking the test? That is,
• Does the evidence support that the inferences accurately reflect the specific
knowledge and skills the test says it measures?
• Does the evidence support that the inferences are valid for the stated purpose,
and in the particular type of setting where the test is to be administered?
• Does the evidence support that the inferences are valid for the specific groups
of students who are taking the test?
4. Is there adequate evidence of reliability of the test scores for the proposed use?
5. Is there adequate evidence of fairness in validity and reliability to document that
the test score inferences are accurate and meaningful for all students taking the
test? That is,
• Does the evidence support that the inferences are measuring the same
constructs for all students?
• Does the evidence support that the scores do not systematically underestimate
or overestimate the knowledge or skills of members of a particular group?
• Does the evidence demonstrate validity and reliability of the score inferences
for each relevant subgroup when a prior probability exists that, across
examinee subgroups, test scores may differ in meaning or that the reliability
of the scores may vary substantially?
6. Is there adequate evidence that cutscores have been properly established and that
they will be used in ways that will provide accurate and meaningful information
for all test takers?
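Question 5 asks, among other things, whether scores systematically underestimate or overestimate the knowledge or skills of members of a particular group. One common way to look for this, sketched below in Python, is a differential prediction check: fit a single least-squares line that predicts an external criterion measure from test scores for all students, then compare the average prediction error by group. This is an illustrative assumption, not a method prescribed by this guide. The function names and the data are hypothetical, and a real analysis would also need adequate sample sizes and a criterion measure that is itself fair.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y predicted from x."""
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_residual_by_group(scores, criterion, groups):
    """Average (actual - predicted) criterion value for each group.

    A clearly positive mean means the common line underpredicts that group's
    criterion performance; a clearly negative mean means it overpredicts.
    """
    slope, intercept = fit_line(scores, criterion)
    residuals = {}
    for score, actual, group in zip(scores, criterion, groups):
        predicted = intercept + slope * score
        residuals.setdefault(group, []).append(actual - predicted)
    return {group: mean(values) for group, values in residuals.items()}


# Hypothetical data: two groups with identical test scores but different
# performance on the external criterion measure.
scores = [1, 2, 3, 4, 1, 2, 3, 4]
criterion = [2, 3, 4, 5, 0, 1, 2, 3]
groups = ["A"] * 4 + ["B"] * 4
print(mean_residual_by_group(scores, criterion, groups))
```

Group means that sit far from zero in opposite directions would be a signal to investigate further, not proof of unfairness on their own.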
II. Accuracy in Testing Limited English Proficient Students
and Students with Disabilities
All aspects of validity, reliability, fairness, and cutscores discussed above are applicable
to the measurement of knowledge and skills of all students, including limited English
proficient students 107 and students with disabilities. This section addresses additional
issues related to accurately measuring the knowledge and skills of these two student
populations.
Ensuring that test score inferences accurately reflect the intended constructs for all
students is a complex task. It involves several aspects of test construction, pilot testing,
implementation, analysis, and reporting. The appropriate inclusion of students from these
populations in validation and norming samples, and the meaningful inclusion of limited
English proficient experts and disability experts throughout the test development process,
help ensure suitable test quality and use for all test takers.
The proper inclusion of all students in testing programs helps to ensure that high-stakes
decisions are made on the basis of test results that are as comparable as possible across
all test takers, rather than on the basis of results from assessments that are developed to
measure different content domains. 108 The appropriate inclusion of all students can also
help to ensure that educational benefits attributable to the high-stakes decisions will be
available to all. In some cases, it is appropriate to test limited English proficient students
and students with disabilities under standardized conditions, as long as the evidence
supports the validity of the scores in a given situation for these students. In other cases,
the conditions may have to be accommodated to assure that the scores validly reflect the
students' mastery of the intended constructs. 109 The use of multiple measures generally
enhances the accuracy of the educational decisions, and these measures can be used to
confirm the validity of the test results.
A. General Considerations about Accommodations
Making similar inferences about academic test scores for all test takers, and making
appropriate decisions when using these scores, requires measuring the same academic
constructs (knowledge and skills in specific subject areas) across groups and contexts. In
measuring the knowledge and skills of limited English proficient students and students
with disabilities, it is particularly important that the tests actually measure the intended
knowledge and skills and not other factors which are extraneous to the intended
107 These are students who are learning English as a second language. Other documents sometimes refer to these
students as English language learners.
108 High Stakes, pp. 7, 80.
109 See Joint Standards, Chapter 7, Fairness in Testing and Test Use; Chapter 9, Testing Individuals of Differing
Linguistic Backgrounds; Chapter 10, Testing Individuals with Disabilities.
construct. 110 For instance, impaired visual capacity may influence a student's test score
in science when the student must sight read a typical paper and pencil science test. In
measuring science skills, the student's sight is likely not relevant to her knowledge of
science. Similarly, how well a limited English proficient student reads English may
influence the student's test score in mathematics when the student must read the test. In
this case, the student's reading skills are not relevant when the skills of mathematics
computation are to be measured.

Standard 10.1
In testing individuals with disabilities, test developers, test administrators, and test users should
take steps to ensure that the test score inferences accurately reflect the intended construct rather than
any disabilities and their associated characteristics extraneous to the intent of the measurement.

Typically, accommodations to established conditions are found in three main phases of testing:
1) the administration of tests, 2) how students are allowed to respond to the items, and 3) the
presentation of the tests (how the items are presented to the students on the test instrument).
Administration accommodations involve setting and timing, and can include extended time to
counteract the increased literacy demands or fatigue for a student with learning or physical disabilities.
Response accommodations allow students to demonstrate what they know in different
ways. Presentation accommodations can include format variations such as fewer items
per page, and plain language editing procedures, which use short sentences, common
words, and active voice. There is a wide variation in which accommodations are used
across states and school districts. (Appendix C lists many of the accommodations used in
large scale testing for limited English proficient students and students with disabilities.)
Issues regarding the use of accommodations are complex. When the possible use of an
accommodation for a student is being considered, two questions should be examined: 1)
What is being measured if conditions are accommodated? 2) What is being measured if
the conditions remain the same? The decision to use an accommodation or not should be
grounded in the ultimate goal of collecting test information that accurately and fairly
represents the knowledge and skills of the student on the intended constructs. The
overarching concern should be that test score inferences accurately reflect the intended
constructs rather than factors extraneous to the intent of the measurement. 111
110 This is known as construct irrelevance. See p. 25 above; Joint Standards, pp. 173-174.
111 Standards 9.1, 10.1; Messick, 1989. Standard 9.1 states, "Testing practice should be designed to reduce threats to
the reliability and validity of test score inferences that may arise from language differences."
Standard 10.1 states, "In testing individuals with disabilities, test developers, test administrators, and test users should
take steps to ensure that the test score inferences accurately reflect the intended construct rather than any disabilities
and their associated characteristics extraneous to the intent of the measurement."
Messick (1989), supra note 68.
B. Limited English Proficient Students
The Joint Standards and several recent measurement publications discuss the population
of limited English proficient students and how test publishers and users have handled
inclusion in tests to date. 112 This section briefly outlines principles derived from the Joint
Standards and these publications. It addresses two types of testing situations especially
relevant for limited English proficient students: the assessment of English language
proficiency and the assessment of academic educational achievement.

Standard 9.1
Testing practice should be designed to reduce threats to the reliability and validity of test score
inferences that may arise from language differences.

Interpretation of the scores of limited English proficient students should accurately and
fairly reflect the academic knowledge, skills, or abilities that the test intends to measure,
minimizing the effect of factors irrelevant to the intended constructs. 113 When credible
research evidence reports that scores may differ in meaning across subgroups of
linguistically diverse test takers, then, to the extent feasible, the same form of validity
evidence should be collected for each subgroup as for the examinee population as a
whole. 114 "When a test is recommended for use with linguistically diverse test takers, test
developers and publishers should provide the information necessary for appropriate test
use and interpretation;" 115 recommended accommodations should be used appropriately
and described in detail in the test manual; 116 translation methods and interpreter expertise
should be clearly described; 117 and evidence of the reliability and validity of the
112 For instance, Joint Standards, Chapter 9; High Stakes, Chapter 9; Improving Schooling for Language Minority
Children: A Research Agenda (National Research Council, August and Hakuta, 1997); Ensuring Accuracy in Testing
for English Language Learners (Kopriva, 2000, Washington, D.C.: Council of Chief State School Officers).
113 See Standard 9.1, supra note 111.
114 Standard 9.2 states, "When credible research evidence reports that test scores differ in meaning across subgroups of
linguistically diverse test takers, then to the extent feasible, test developers should collect for each linguistic subgroup
studied the same form of validity evidence collected for the examinee population as a whole."
115 Standard 9.6.
Standard 9.5 states, "When there is credible evidence of score comparability across regular and modified tests or
administrations, no flag should be attached to a score. When such evidence is lacking, specific information about the
nature of the modification should be provided, if permitted by law, to assist test users properly to interpret and act on
test scores."
116 Standard 9.4 states, "Linguistic modifications recommended by test publishers, as well as the rationale for the
modifications, should be described in detail in the test manual."
117 Standards 9.7, 9.11. Standard 9.7 states, "When a test is translated from one language to another, the methods used
in establishing the adequacy of the translation should be described, and empirical and logical evidence should be
provided for score reliability and the validity of the translated test's score inferences for the uses intended in the
linguistic groups to be tested."
Standard 9.11 states, "When an interpreter is used in testing, the interpreter should be fluent in both the language of
the test and the examinee's native language, should have expertise in translating, and should have a basic understanding
of the assessment process."
translated test score's inferences should be collected and made available in order to
support sound test use by educators and policymakers. 118
1. Assessing English Language Proficiency

Standard 9.10
Inferences about test takers' general language proficiency should be based on tests that measure a
range of language features, and not on a single linguistic skill.

Issues of validity, reliability, and fairness apply to tests and other relevant
assessments that measure English language proficiency. English language proficiency
is typically defined as proficiency in reading, writing, speaking, and
understanding English. 119 Assessments that measure English language proficiency
are generally used to make decisions about
who should receive English language acquisition services, the type of programs in which
these students are placed, and the progress of students in the appropriate programs.
They are also used to evaluate the English proficiency of students when exiting from
services, to ensure that they can successfully participate in the regular school
curriculum. In making decisions about which tests are appropriate, it is particularly
important to make sure that the tests accurately and completely reflect the intended
English language proficiency constructs so that the students are not misclassified. It is
generally accepted that a range of communicative abilities will typically
need to be assessed when placement decisions are being made. 120
118 Standard 9.7, supra note 117.
119 Improving Schooling for Language Minority Children, pp. 116-118.
120 Comment under Standard 9.10, pp. 99-100. Standard 9.10 states, "Inferences about test takers' general language
proficiency should be based on tests that measure a range of language features, and not on a single linguistic skill."
2. Testing the Academic Educational Achievement
of Limited English Proficient Students
Several factors typically affect how well the educational achievement of limited English
proficient students is measured on standardized academic tests. For all test takers, any
test that employs written or oral skills in English or in another language is, in part, a
measure of those skills in the particular language. Test use with individuals who have not
sufficiently acquired the literacy or linguistic skills in the language of the test may
introduce construct-irrelevant components to the testing process. In such instances, test
results may not reflect accurately the qualities and competencies intended to be
measured. 121 While it is very important that the test score inferences are valid, reliable,
and fair, the technical issues associated with developing meaningful achievement tests for
this population are complex and difficult to accomplish. Tests must be developed so that
they effectively measure the students' knowledge and skills in intended academic
achievement constructs rather than factors irrelevant to those constructs, e.g., literacy skills
when literacy is not what is being measured. This is particularly important when tests are
used to make high-stakes decisions for individual students. Reducing the influence of
construct irrelevant factors includes minimizing the confounding conditions in the test or
the testing process so that the students can access the test requirements. 122 It also
includes providing native language tests where possible, when this approach would yield
more accurate results for limited English proficient students. 123 In collecting evidence to
support the technical quality of a test for these students, the accumulation of data may
need to occur over several test administrations to ensure robust sample sizes.
a. Background Factors for Limited English Proficient Students
The background factors particularly salient in ensuring accuracy in testing for students
with limited English proficiency tend to relate to literacy, culture, and schooling. 124
Limited English proficient students often bring varying levels of English and home
language literacy skills to the testing situation. 125 These students may be adept in
conversing orally in their home language, but unless they have had formal schooling in
their home language, they may not have a corresponding level of literacy. Also, while
students with limited English proficiency may acquire a degree of oral proficiency in
English, literacy in English for many students comes later. 126 To add to the complexity,
121 See Joint Standards, p. 91.
122 See Standard 9.1, supra note 111.
123 Standard 9.3 states, "When testing an examinee proficient in two or more languages for which the test is available,
the examinee's relative language proficiency should be determined. The test generally should be administered in the
test taker's most proficient language, unless proficiency in the less proficient language is part of the assessment."
124 Improving Schooling for Language Minority Children, Chapter 5; Ensuring Accuracy in Testing for English
Language Learners, Chapter 1.
125 See Joint Standards, Chapter 9, pp. 91-100; Ensuring Accuracy in Testing for English Language Learners, Chapter 1.
126 Testing, Teaching and Learning, p. 61.
oral and literacy proficiency in either the home language or English involves both social
and academic components. Thus, a student may be able to write a well-organized social
letter in his or her home language, yet may not be able to explain adequately in
that language, orally, how to solve a mathematics problem that includes the knowledge of
concepts and words endemic to the field of mathematics. The same phenomenon may
occur in English as well. 127
Therefore, in determining how to effectively measure the academic knowledge and skills
of this population, educators and policymakers should consider how to minimize the
influence of literacy issues, except when these constructs are explicitly being measured.
The level of linguistic and literacy proficiencies of limited English proficient
students in their home language and in English will often affect which achievement tests
are appropriate for these students, and which accommodations to standardized testing
conditions, if any, might be most useful for which students. 128
Additionally, diverse cultural and other background experiences, including variations in
amount, type and location (home country and U.S.) of formal schooling, as well as
interrupted and multi-location schooling (of the type frequently experienced by children
of migrant workers), affect language literacy, the contextual content of items, and the
academic foundational knowledge base that can be assumed in educational achievement
tests. The format and procedures involved in testing can also affect accuracy in test
scores, particularly if the test practices differ substantially from ongoing instructional
practices in classrooms. 129
127 Improving Schooling for Language Minority Children, Chapter 5, pp. 113-137.
128 Id. at Chapter 5.
129 Ensuring Accuracy in Testing for English Language Learners, Chapters 3, 4, 7, and 9.
b. Accommodations for Limited English Proficient Students
Providing accommodations to established testing conditions for some students with
limited English proficiency may be appropriate when their use would yield the most valid
scores on the intended academic achievement constructs. Deciding which
accommodations to use for which students usually involves an understanding of which
construct irrelevant background factors would substantially influence the measurement of
intended knowledge and skills for individual students, and how the accommodations
would impact the validity of the test score interpretations for these students. 130 Appendix
C lists various test presentation, administration, and response accommodations that states
and districts generally employ when testing limited English proficient students.
Examples of accommodations in the presentation of the test include editing text so the
items are in plain language, or providing page formats which minimize confusion by
limiting use of columns and the number of items per page. Presenting the test in the
student's native language is an accommodation to a test written in English when the same
constructs are being measured on both the English and native language versions.
Administration accommodations include extending the length of the testing period,
permitting breaks, administering tests in small groups or in separate rooms, and allowing
English or native language glossaries or dictionaries as appropriate. Response
accommodations include oral response and permitting students to respond in their native
language.
C. Students with Disabilities
The Joint Standards and several recent measurement publications discuss the population
of students with disabilities and how test publishers and users have handled inclusion in
tests to date. 131 This section briefly outlines principles derived from the Joint Standards
and these publications. It addresses three types of testing situations especially relevant
for students with disabilities: tests used for diagnostic and intervention purposes, the
assessment of academic educational achievement, and alternate assessments for K-12
students with disabilities who cannot participate in school-wide tests.
The Joint Standards provide that interpretation of the scores of students with disabilities
should accurately and fairly reflect the academic knowledge, skills, or abilities that the
test intends to measure. The interpretation should not be confounded by the challenges of
the students that are extraneous to the intent of the measurement. 132 Rather, validity
130 See Ensuring Accuracy in Testing for English Language Learners, Chapters 6 and 8, for a discussion of which
accommodations might be most beneficial for students with various background factors.
131 For instance, Joint Standards, Chapter 10; High Stakes, Chapter 8; Educating One and All: Students with Disabilities
and Standards-Based Reform (National Research Council, McDonnell, McLaughlin, and Morison, 1997); Testing
Students with Disabilities (Thurlow, Elliot, and Ysseldyke, 1998, NY: Corwin Press).
132 Standards 10.1, 10.10. See Standard 10.1, supra note 111. Standard 10.10 states, "Any test modifications adopted
should be appropriate for the individual test taker, while maintaining all feasible standardized features. A test
evidence should document that the inferences of the scores of students with disabilities
are accurate. Pilot testing and other technical investigations should be conducted where
feasible to ensure the validity of the test inferences when accommodations have been
allowed. 133 Feasibility is always a consideration, although the Joint Standards comment,
"[T]he costs of obtaining validity evidence should be considered in light of the
consequences of not having usable information regarding the meanings of scores for
people with disabilities." 134
1. Tests Used for Diagnostic and Intervention Purposes

Standard 10.12
In testing individuals with disabilities for diagnostic and intervention purposes, the
test should not be used as the sole indicator of the test taker's functioning. Instead,
multiple sources of information should be used.

All issues of validity, reliability, and fairness
apply to tests and other assessments used to
make diagnostic and intervention decisions
for students with disabilities. Tests that
yield diagnostic information typically focus
in great detail on identifying the specific
professional needs to consider reasonably available information about each test taker's experiences, characteristics, and
capabilities that might impact test performance, and document the grounds for the modification."
133 Several standards discuss the appropriate types of validity evidence, including Standards 10.3, 10.5, 10.6, 10.7, 10.8,
and 10.11. Because of the low incidence nature of several of the disability groups, especially when different severity
levels and combinations of impairments are considered, this type of evidence will probably need to be accumulated
over time in order to have a large enough sample size.
Standard 10.3 states, "Where feasible, tests that have been modified for use with individuals with disabilities should be
pilot tested on individuals who have similar disabilities to investigate the appropriateness and feasibility of the
modifications."
Standard 10.5 states, "Technical material and manuals that accompany modified tests should include a careful
statement of the steps taken to modify the test to alert users to changes that are likely to alter the validity of inferences
drawn from the test scores."
Standard 10.6 states, "If a test developer recommends specific time limits for people with disabilities, empirical
procedures should be used, whenever possible, to establish time limits for modified forms of timed tests rather than
simply allowing test takers with disabilities a multiple of the standard time. When possible, fatigue should be
investigated as a potentially important factor when time limits are extended."
Standard 10.7 states, "When sample sizes permit, the validity of inferences made from test scores and the reliability of
scores on tests administered to individuals with various disabilities should be investigated and reported by the agency
or publisher that makes the modification. Such investigations should examine the effects of modifications made for
people with various disabilities on resulting scores, as well as the effects of administering standard unmodified tests to
them."
Standard 10.8 states, "Those responsible for decisions about test use with potential test takers who may need or may
request specific accommodations should (a) possess the information necessary to make an appropriate selection of
measures, (b) have current information regarding the availability of modified forms of the test in question, (c) inform
individuals, when appropriate, about the existence of modified forms, and (d) make these forms available to test takers
when appropriate and feasible."
Standard 10.11 states, "When there is credible evidence of score comparability across regular and modified
administrations, no flag should be attached to a score. When such evidence is lacking, specific information about the
nature of the modification should be provided, if permitted by law, to assist test users properly to interpret and act on
test scores."
134 Comment under Standard 10.7, p. 106.
challenges and strengths of a student. 135 These diagnostic tests are often administered in
one-to-one situations (test taker and examiner) rather than in a group situation. In many
cases they have been designed with standardized adaptations to fit the needs of individual
examinees. In making decisions about which tests are appropriate to use, it is important
to make sure that the tests accurately and completely reflect the intended constructs, so
that the interventions are appropriate and beneficial for the individual students.
2. Testing the Academic Educational Achievement
of Students with Disabilities
Several factors affect how well the educational achievement of students with disabilities
is measured on standardized academic tests. While it is very important that the test score
inferences are valid, reliable, and fair, the technical issues associated with developing
meaningful achievement tests for this population are complex and difficult to accomplish.
To ensure accuracy in testing of students with disabilities, tests must be developed so that
they effectively measure the students' knowledge and skills in academic achievement
rather than factors irrelevant to the intended constructs of the test. This is particularly
important when achievement tests are used to make high-stakes decisions for individual
students with disabilities. Reducing the influence of construct irrelevant factors includes
minimizing the confounding conditions in the test or the testing process so that the test
accurately measures what it is supposed to measure. 136 In collecting evidence to support
the technical quality of the test for these students, the accumulation of data may need to
occur over several test administrations to ensure robust sample sizes.
a. Background Factors for Students with Disabilities
The background factors particularly important to students with disabilities are generally
related to the nature of the disabilities or to the schooling experiences of these students. 137
135 Joint Standards, Chapters 10, 12, and 13; High Stakes, Chapter 1.
136 See Standard 10.1, supra note 111.
137 Educating One and All, Chapter 3; Testing Individuals with Disabilities.
Within any disability category, the type, number, and severity of impairments vary
greatly. 138 For instance, some students with learning disabilities have a processing
disability in only one subject, such as mathematics, while others experience accessing,
retrieval, and processing impairments that affect a broad number of school subjects and
contexts. For many of these students, one or more of the impairments may be relatively
mild, while for others one or more can be significant. Further, different types of
disabilities yield significantly different constellations of issues. For instance, the
considerations surrounding hearing impaired students overlap significantly with limited
English proficient students in some ways and with other students with disabilities in other
respects. This complexity poses a challenge not only to educators, but also to test
administrators and developers. In general, in determining how to use academic tests
appropriately for students with disabilities, educators and policymakers should consider
how to minimize the influence of the impairments in measuring the intended constructs.
138 Joint Standards, Chapter 10, Testing Individuals with Disabilities, pp. 101-105.
Educating One and All explains that the schooling experiences of students with
disabilities vary greatly as a function of their disability, the severity of impairments, and
expectations of their capabilities. 139 Two sets of educational experiences, in particular,
affect how educators and policy makers accommodate tests and use them appropriately
for this population. First, guidance about the schooling and evaluation of students with
disabilities is provided by individualized education program (IEP) teams made up of
educators and parents. These teams often recommend testing accommodations that they
feel would be appropriate for individual students. Second, classroom instructional
techniques affect large scale testing. While special educators have a long history of
accommodating instruction to fit student strengths, not all the instructional practices are
appropriate in large scale testing. Additionally, some students may not have been
exposed routinely to the types of accommodations that would be possible in large scale
testing. 140
b. Accommodations for Students with Disabilities
Providing accommodations to established testing conditions for some students with disabilities
may be appropriate when their use would yield the most valid scores on the intended academic
achievement constructs. Deciding which accommodations to use for which students usually
involves an understanding of which construct irrelevant background factors would substantially
influence the measurement of intended knowledge and skills for individual students, and how the
accommodations would impact the validity of the test score interpretations for these students. 141
Appendix C lists various presentation, administration, and response accommodations that states
and districts generally employ when testing students with disabilities. Examples of presentation
accommodations are the use of Braille, large print, oral reading, or providing page formats which
minimize confusion by limiting use of columns and the number of items per page.
Administration accommodations in setting include allowing students to take the test at home or
in a small group, and accommodations in timing include extended time and frequent breaks.
Variations in response format include allowing students to respond orally, point, or use a
computer.
3. Alternate Assessments
Alternate assessments are assessments for those students with disabilities who cannot participate
in state or district-wide standardized assessments, even with the use of appropriate
accommodations and modifications. 142 For the constructs being measured, the considerations
with respect to validity, reliability, and fairness apply to alternate assessments, as well.
Appropriate content needs to be identified, and procedures designed to ensure technical rigor
139 See Educating One and All, Chapter 3.
140 See Educating One and All, Chapter 5.
141 See Testing Students with Disabilities for a discussion of which accommodations might be most beneficial for
students with various impairments and other background factors.
142 The IDEA requires use of alternate assessments in certain areas. See 34 C.F.R. 300.138.
need to be followed. 143 In addition, strong evidence should show that the test measures the
knowledge and skills it intends to measure, and that the measurement is a valid reflection of
mastery in a range of contextual situations.
143 See Educating One and All, Chapter 5, and Testing Students with Disabilities for a discussion of the issues and
processes involved in developing and implementing alternate assessments.
CHAPTER 2. Legal Principles
It is important for educators and policy makers to understand the test measurement
principles and the legal principles that will enable them to ask informed questions and
make sound decisions regarding the use of tests for high-stakes purposes. The goal of
this chapter is to explain the legal principles that apply to educational testing.
The primary focus of this chapter is four federal nondiscrimination laws, enacted by
Congress, and their implementing regulations: Title VI of the Civil Rights Act of 1964
(Title VI), Title IX of the Education Amendments of 1972 (Title IX), Section 504 of the
Rehabilitation Act of 1973 (Section 504), and Title II of the Americans with Disabilities
Act of 1990 (Title II). 144 Within the U.S. Department of Education, the Office for Civil
Rights has responsibility for enforcing the requirements of these four statutes and their
implementing regulations. Although the Office for Civil Rights does not enforce federal
constitutional provisions, an overview of these constitutional principles, including those under
the Fifth and Fourteenth Amendments of the U.S. Constitution, has also been included for
informational purposes. The discussion of legal principles in this chapter is intended to
reflect existing legal principles and does not establish new requirements. 145
144 Title VI prohibits discrimination on the basis of race, color and national origin in the programs and activities of
recipients that receive federal financial assistance. The U.S. Department of Education's regulation implementing Title
VI is found at 34 C.F.R. Part 100. Title IX prohibits discrimination on the basis of sex in educational programs and
activities of recipients of federal financial assistance. The U.S. Department of Education's regulation implementing
Title IX is found at 34 C.F.R. Part 106. Section 504 prohibits discrimination on the basis of disability in the programs
and activities of recipients of federal financial assistance. The U.S. Department of Education's regulation implementing
Section 504 is found at 34 C.F.R. Part 104. Title II prohibits discrimination on the basis of disability by public entities,
regardless of whether they receive federal funding. The U.S. Department of Justice's regulation implementing Title II
is found at 28 C.F.R. Part 35.
145 Consistent with this approach, court decisions are not cited if the case is still on appeal or the time to request an
appeal has not ended.
146 See Sharif v. New York State Educ. Dep't, 709 F. Supp. 345, 354-355, 364 (S.D.N.Y. 1989) (in granting a motion
for preliminary injunction, where girls received comparatively lower scores than boys, court found that the state's use
of SAT scores as the sole basis for decisions awarding college scholarships intended to reward high school
achievement was not educationally justified for this purpose in that the SAT had been designed as an aptitude test to
predict college success and was not designed or validated to measure past high school achievement).
I. Discrimination Under Federal Statutes and Regulations
Congress has enacted four statutes prohibiting discrimination based on race, color,
national origin, sex, and disability in schools, colleges, and universities. Title VI
prohibits discrimination based on race, color, or national origin; Title IX prohibits
discrimination based on sex; and Section 504 and Title II of the Americans with
Disabilities Act (ADA) prohibit discrimination based on disability. Title VI, Title IX,
and Section 504 apply to all educational institutions that receive federal funds. Title II of
the ADA applies to public entities, including public school districts and state colleges and
universities. 151 The Title VI, Title IX, Section 504, and Title II statutes and their
implementing regulations, as well as the equal protection clause of the Fourteenth
Amendment to the United States Constitution, prohibit intentional discrimination based
on race, national origin, sex, or disability. In addition, the regulations that implement
Title VI, Title IX, Section 504 and Title II prohibit policies or practices that have a
147 See United States v. Fordice, 505 U.S. 717, 733-738 (1992) (invalidating state's exclusive reliance on ACT scores as
a basis for college admissions at historically segregated colleges where the state adopted the ACT for discriminatory
reasons and the ACT administering organization recommended that college admissions decisions consider high school
grades along with test scores); see also Sharif, 709 F. Supp. at 364.
148 See Lau v. Nichols, 414 U.S. at 566-569 (finding a violation of the Title VI regulations where limited English
proficient students were taught only in English and not provided any special assistance needed to meet English
language proficiency standards required by the state for a high school diploma). See also Debra P., 644 F.2d at 406-408
(holding that use of a graduation test that covered material that had not been taught in class would violate the due
process and equal protection clauses and that, under the circumstances of the case, immediate use of the diploma
sanction for test failure would punish black students for deficiencies created by an illegally segregated school system
which had provided them with inferior physical structures, course offerings, instructional materials, and equipment).
149 See Larry P. v. Riles, 793 F.2d at 980-981, 983 (finding that IQ tests the state used had not been validated for use as
the sole means for determining that black children should be placed in classes for educable mentally retarded students);
Sharif, 709 F. Supp. at 354 (observing that the SAT under-predicts success for female college freshmen as compared
with males). See also Parents in Action on Special Educ. v. Hannon, 506 F. Supp. 831, 836-837 (N.D. Ill. 1980)
(court's analysis of items on I.Q. test found only minimal amount of cultural bias not resulting in erroneous mental
retardation diagnoses given other information considered in process).
150 See Groves v. Alabama State Bd. of Educ., 776 F. Supp. 1518, 1530-1531 (M.D. Ala. 1991) (finding test required for
admission to undergraduate teacher training program would not be educationally justified if the passing score is not
itself a valid measure of the minimal ability necessary to become a teacher); Richardson v. Lamar County Bd. of Educ.,
729 F. Supp. 806, 823-825 (M.D. Ala. 1989) (evidence revealed that cut-off scores had not been set through a
well-conceived, systematic process nor could the scores be characterized as reflecting the good faith exercise of professional
judgment), aff'd sub nom., Richardson v. Alabama State Bd. of Educ., 935 F.2d 1240 (11th Cir. 1991).
151 OCR enforces five nondiscrimination statutes: Title VI of the Civil Rights Act of 1964, 42 U.S.C. §§ 2000d, et seq.
(2000); Title IX of the Education Amendments of 1972, 20 U.S.C. §§ 1681 et seq. (1999); Section 504 of the
Rehabilitation Act of 1973, as amended, 29 U.S.C. §§ 794 (1999); Title II of the Americans with Disabilities Act of
1990, 42 U.S.C. §§ 12131, et seq. (1995 and Supp. 1999); and the Age Discrimination Act of 1975, as amended, 42
U.S.C. §§ 6101, et seq. (1995 and Supp. 1999). Regulations issued by the United States Department of Education
implementing Title VI, Title IX, and Section 504, respectively, can be found at 34 C.F.R. Part 100, 34 C.F.R. Part 106,
and 34 C.F.R. Part 104. These regulations can be found on OCR's web-site at www.ed.gov/offices/OCR. For regulations
implementing Title II of the ADA, see 28 C.F.R. Part 35. Title III of the ADA, which is enforced by the U.S.
Department of Justice, prohibits discrimination in public accommodations by private entities, including schools.
Religious entities operated by religious organizations are exempt from Title III.
discriminatory disparate impact on students based on their race, national origin, sex, or
disability. 152
This section describes two central analytical frameworks for examining allegations of
discrimination as set forth in federal nondiscrimination regulations: different treatment
and disparate impact. 153 It also includes a further discussion of legal principles that apply
specifically to students with limited English proficiency and to students with disabilities.
A. Different Treatment
Under federal law, policies and practices generally must be applied consistently to
similarly situated individuals or groups, regardless of their race, national origin, sex, or
disability. 154 For example, a federal court concluded that a school district had
intentionally treated students differently on the basis of race where minority students
whose test scores qualified them for two or more ability levels were more likely to be
assigned to the lower level class than similarly situated white students, and no
explanatory reason was evident. 155
In addition, educational systems that were previously segregated by race in violation of
the Fourteenth Amendment and have not achieved unitary status have an obligation to
dismantle their prior de jure segregation. In such instances, when a school district or
other educational system uses a test or assessment procedure for a high-stakes purpose
that has racially disproportionate effects, the school district or other educational system
must show that the disparity is not traceable to prior intentional segregation or that the
test or assessment procedure does not perpetuate the adverse effects of such
152 34 C.F.R. § 100.3(b)(2); 34 C.F.R. §§ 106.21(b)(2), 106.36(b), 106.52; 34 C.F.R. § 104.4(b)(4)(i); and 28 C.F.R. §
35.130(b)(3).
The authority of federal agencies to issue regulations with an "effects" standard has been consistently acknowledged by
U.S. Supreme Court decisions and applied by lower federal courts addressing claims of discrimination in education.
See, e.g., Lau v. Nichols, 414 U.S. 563, 568 (1974); Guardians Ass'n v. Civil Service Comm'n of City of N.Y., 463 U.S.
582, 584-593 (1983); Alexander v. Choate, 469 U.S. 287, 289-300 (1985). See also Memorandum from the Attorney
General for Heads of Departments and Agencies that Provide Federal Financial Assistance, "Use of the Disparate
Impact Standard in Administrative Regulations under Title VI of the Civil Rights Act of 1964," July 14, 1994.
153 Intentional racial discrimination is a violation of both the Fourteenth Amendment to the United States Constitution
and federal civil rights statutes in cases where evidence demonstrates that an action such as the use of a test for high
stakes purposes is motivated by an intent to discriminate. See Elston v. Talladega County Bd. of Educ., 997 F.2d 1394,
1406 (11th Cir. 1993). As explained further in this section, the regulations promulgated under the federal civil rights
statutes prohibit the use of neutral criteria having disparate effects unless the criteria are educationally justified. See
Guardians Ass'n v. Civil Service Comm'n, 463 U.S. at 598.
154 For example, under the Fourteenth Amendment and Title VI, different treatment based on race is permitted only
when such action is narrowly tailored to further a compelling state interest. See Regents of the Univ. of Cal. v. Bakke,
438 U.S. 265 (1978); Adarand Constructors, Inc. v. Peña, 515 U.S. 200 (1995).
155 See People Who Care v. Rockford Bd. of Educ., 851 F. Supp. 905, 958-1001 (N.D. Ill. 1994), remedial order rev'd
in part, 111 F.3d 528 (7th Cir. 1997). On appeal, the Seventh Circuit Court of Appeals stated that the appropriate
remedy in this case was to require the district to use objective, non-racial criteria to assign students to classes, rather
than abolishing the district's tracking system. 111 F.3d at 536.
segregation. 156 The school district is under "a 'heavy burden' of showing that actions
that increase[] or continue[] the effects of the dual system serve important and legitimate
ends." 157
B. Disparate Impact
Discrimination under federal law may also occur where the application of neutral criteria
has discriminatory effects and those criteria are not educationally justified. The federal
nondiscrimination regulations provide that a recipient of federal funds may not "utilize
criteria or methods of administration which have the effect of subjecting individuals to
discrimination." 158 It is important to understand that disparities in student performance
based on race, national origin, sex, or disability, alone, do not constitute disparate impact
discrimination under federal law. Furthermore, nothing in federal law guarantees equal
results. (For a further discussion of issues related to testing of students with disabilities,
see pp. 56-60.)
Courts applying the disparate impact test have examined three questions to determine if
the practices at issue are discriminatory: (1) Does the practice or procedure in question
result in substantial differences in the award of benefits or services based on race,
national origin, or sex? (2) Is the practice or procedure educationally justified? and (3) Is
there an equally effective alternative that can accomplish the institution's educational
goal with less disparity? 159
156 See United States v. Fordice, 505 U.S. at 731-732 (finding state's requirement that students have higher ACT scores
for admission to historically white colleges than historically black colleges to be constitutionally suspect where the
requirement was enacted for discriminatory purposes, emanated from the prior de jure system that continued to have
segregative effects and was not shown to be justified in educational terms); Debra P. v. Turlington, 644 F.2d at 407
("[defendants] failed to demonstrate either that the disproportionate failure [rate] of blacks was not due to the present
effects of past intentional segregation or, that as presently used, the diploma sanction was necessary [in order] to
remedy those effects"); McNeal v. Tate County Sch. Dist., 508 F.2d 1017, 1020-1021 (5th Cir. 1975) (since ability
grouped classroom assignments preserved effects of past intentional discrimination, defendants were required to show
educational benefits of assignment practice on remand or propose an educationally sound alternative); GI Forum v.
Texas Educ. Agency, No. SA-97-CA-1278-EP, 2000 U.S. Dist. LEXIS 153, slip op. at 56-57 (W.D. Tex. 2000)
(upholding use of graduation test where the test is used to identify educational inequalities and attempt to address
them).
157 Dayton Bd. of Educ. v. Brinkman, 443 U.S. 526, 538 (1979) (quoting Green v. County School Board, 391 U.S. 430,
439 (1968)).
158 See 34 C.F.R. § 100.3(b)(2) (Title VI); 34 C.F.R. § 104.4(b)(4)(i) (Section 504); and 28 C.F.R. § 35.130(b)(3)(i)
(Title II). See also 34 C.F.R. § 106.31 (Title IX). In Guardians, 463 U.S. at 589-590, the U.S. Supreme Court upheld
the use of the effects test, stating that the Title VI regulation forbids the use of federal funds, "not only in programs that
intentionally discriminate on racial grounds but also in those endeavors that have a[n] [unjustified racially
disproportionate] impact on racial minorities."
159 See Georgia State Conf., 775 F.2d at 1417. See also Elston, 997 F.2d at 1407 & n. 14; Larry P., 793 F.2d at 982 &
n. 9; Groves, 776 F. Supp. at 1523-1524, 1529-1532; Sharif, 709 F. Supp. at 361. Many courts use the term "equally
effective" when discussing whether the alternative offered by the party challenging the test is feasible and would
effectively meet the institution's goals. See, e.g., Georgia State Conf., 775 F.2d at 1417; Sharif, 709 F. Supp. at 361.
Other courts use the term "comparably effective" in evaluating proposed alternatives. See, e.g., Sandoval, 7 F. Supp.
2d at 1278; Elston, 997 F.2d at 1407; Fitzpatrick v. City of Atlanta, 2 F.3d 1112, 1118 (11th Cir. 1993). Review of the
decisions in these cases indicates that the courts appear to be using the terms synonymously.
The party challenging the test has the burden of establishing disparate impact. If
disparate impact is established, the educational institution must provide sufficient
evidence of an educational justification. If an educational justification is established,
then the party challenging the test must demonstrate that an alternative with less disparate
impact is equally effective in meeting the institution's educational goals or needs in order
to prevail. 160
1. Determining disproportionate impact
The first question in the disparate impact analysis is whether there is information
indicating a significant disparity in the award of benefits or services to students based on
race, national origin, or sex. 161 To determine if a significant disparate impact exists,
courts have focused on evidence of statistical disparities. 162 Generally, a test has a
disproportionate adverse impact if a statistical analysis shows a significant difference
from the expected random distribution. 163 There is no rigid mathematical threshold
regarding the degree of disproportionality required; however, courts have used various
statistical methods to identify disparities that are sufficiently substantial to raise an
inference that the challenged practice caused the disparate results. 164 To establish
disparate impact in the context of a selection system, the comparison must be made
between those selected for the educational benefit or service and a relevant pool of
applicants or test-takers. 165
160 See Georgia State Conf., 775 F.2d at 1417. See also the Department of Justice's Title VI Legal Manual at p. 2.
161 For a further discussion of the legal principles regarding students with disabilities under the IDEA, Section 504 and
Title II of the ADA, see pp. 38-40.
162 See Watson v. Fort Worth Bank & Trust, 487 U.S. 977, 994-997 (1988) (O'Connor, J., plurality opinion).
163 See Watson, 487 U.S. at 995; Groves, 776 F. Supp. at 1526-1528.
164 See Watson, 487 U.S. at 994-995; Groves, 776 F. Supp. at 1526-1527. A variety of methods are commonly used by
courts to distinguish differences between outcomes that are statistically and practically significant from those that are
random. Some have used an 80% rule whereby disparate impact is shown when the rate of selection for the less
successful group is less than 80% of the rate of selection for the most successful group. Another type of statistical
analysis considers the difference between the expected and observed rates in terms of standard deviations, with the
difference generally expected to be more than two or three standard deviations. Another test is known as the "Shoben
formula" in which the difference or Z-value in the groups' success rates must be statistically significant. Groves, 776 F.
Supp. at 1526-1528 (discussing these methods and the cases in which they were used).
165 When determining disparate impact in the context of a selection system, the comparison pool generally consists of
all minimally qualified test-takers or applicants. When tests are used to determine placement or some other type of
educational treatment, the comparison is between those identified by the test for the placement or educational treatment
and the relevant pool of test takers. The precise composition of the comparison pool is determined on a case-by-case
basis. See Wards Cove Packing Co. v. Atonio, 490 U.S. 642, 650-651 (1989); Watson, 487 U.S. at 995-997; Groves,
776 F. Supp. at 1525-1526.
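The three statistical checks described in note 164 can be illustrated with a short Python sketch. It is an illustration only, not a statement of how any court computes these figures. The function names and the pass counts in the usage note are hypothetical. The guide does not give a formula for the "Shoben formula," so the third function assumes a standard pooled two-proportion Z statistic as a stand-in.

```python
import math


def selection_rate_ratio(rate_a, rate_b):
    """80% rule: ratio of the lower selection rate to the higher one.

    A ratio below 0.80 is the signal described in note 164.
    """
    low, high = sorted((rate_a, rate_b))
    return low / high


def standard_deviations_from_expected(observed, n, expected_rate):
    """Number of standard deviations between observed and expected counts.

    Assumes a binomial model: n test-takers, each expected to be
    selected with probability expected_rate.
    """
    expected = n * expected_rate
    sd = math.sqrt(n * expected_rate * (1 - expected_rate))
    return (observed - expected) / sd


def two_proportion_z(pass_a, n_a, pass_b, n_b):
    """Pooled two-proportion Z statistic for the difference in pass rates.

    Assumption: used here as a stand-in for the "Shoben formula",
    whose exact form is not given in the guide.
    """
    pooled = (pass_a + pass_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (pass_a / n_a - pass_b / n_b) / se
```

As a usage example with made-up numbers, suppose 300 of 400 test-takers in one group pass and 90 of 200 in another. Then selection_rate_ratio(300 / 400, 90 / 200) falls below 0.80, and both standard_deviations_from_expected(90, 200, 390 / 600) and two_proportion_z(300, 400, 90, 200) exceed three standard deviations in magnitude. As the text notes, there is no rigid mathematical threshold, so such figures support an inference and do not decide the question.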
In general, a specific policy, practice or procedure must be identified as causing the
disproportionate adverse effect on the basis of race, national origin, or sex. 166 For
example, when a particular use of a test is being challenged, the evidence should show
that the test use, rather than other selection factors, accounts for the disparity. 167
2. Determining educational necessity
Where the use of a test results in decisions that have a disparate impact on the basis of
race, national origin, or sex, the test use causing the disparity must significantly serve the
legitimate educational goals of the institution. 168 This inquiry is usually referred to as
determining the "educational necessity" of the test use or determining whether the test is
"educationally justified." 169 The test need not be "essential" or "indispensable" to
achieving the institution's educational goal; 170 rather, the educational institution must
show a manifest relationship between use of the test and the institution's educational
purposes. 171
In evaluating educational necessity, both the legitimacy of the educational goal asserted
by the institution and the use of the test as a valid means to advance this goal may be at
issue. Courts generally allow educational institutions to define their own educational
goals and focus on whether the challenged test serves the institution's articulated
objectives. 172
166 Elements of a decision-making process that cannot be separated for purposes of analysis may be analyzed as one
selection practice. See Title VII of the Civil Rights Act of 1964, 42 U.S.C. § 2000e-2[k][1][B][i]. This is necessary
because limiting the disparate impact analysis to a discrete component of a selection process would not allow for situations
"where the adverse impact is caused by the interaction of two or more components of the process." See Graffam v. Scott
Paper Co., 870 F. Supp. 389, 395 (D. Me. 1994), aff'd, 60 F.3d 809 (1995).
167 As noted in Watson, 487 U.S. at 994, courts have found it "relatively easy," when appropriate statistical proof is
presented, to identify a standardized test as causing the racial, national origin, or sex related disparity at issue. See also
GI Forum v. Texas Educ. Agency, No. SA-97-CA-1278-EP, 2000 U.S. Dist. LEXIS 153, slip op. at 35-40 (W.D. Tex.
2000) (given legally meaningful differences in the pass rates of minority and majority students, plaintiffs made a prima
facie showing of disparate impact resulting from a minimum competency test).
168 See Wards Cove, 490 U.S. at 659.
169 See Board of Educ. v. Harris, 444 U.S. 130, 151 (1979); Elston, 997 F.2d at 1412.
170 See Wards Cove, 490 U.S. at 659; Elston, 997 F.2d at 1412 (citing Georgia State Conf., 775 F.2d at 1417-1418).
171 See Georgia State Conf., 775 F.2d at 1418 (showing required that "achievement grouping practices bear a manifest
demonstrable relationship to classroom education"); Sharif, 709 F. Supp. at 362 (defendants must show a manifest
relationship between use of the SAT and recognition of academic achievement in high school). As explained in Elston,
997 F.2d at 1412, "from consulting the way in which ... [courts] analyze the 'educational necessity' issue, it becomes
clear that ... [they] are essentially requiring ... [the educational institution to] show that the challenged course of action
is demonstrably necessary to meeting an important educational goal." In other words, the institution can defend the
challenged practice on the grounds that it is "supported by a 'substantial legitimate justification.'" See Elston, 997 F.2d
at 1412 (quoting Georgia State Conf., 775 F.2d at 1417); see also Georgia State Conf., 775 F.2d at 1417-1418; Groves,
776 F. Supp. at 1529-1532.
172 See, e.g., Debra P., 644 F.2d at 402 (indicating that the court is not in a position to determine education policy and
state's efforts to establish minimum standards and improve educational quality are praiseworthy).
In conducting this analysis, courts have generally considered relevant evidence of
validity, reliability, and fairness 173 provided by the test developer and test user to
determine the acceptability of the test for the purpose used, giving appropriate deference
to the expertise and experience of educators and testing professionals. 174 The educational
justification inquiry thus generally looks at technical questions regarding the test's
accuracy in relation to the nature and importance of the educational institution's goals,
the educational consequences to students, and the relationship of the educational
173 In general, courts have said that validity refers to the accuracy of conclusions drawn from test results. See Allen v.
Alabama State Bd. of Educ., 976 F. Supp. 1410, 1420-1421 (M.D. Ala. 1997) ("Generally, validity is defined as the
degree to which a certain inference from a test is appropriate and meaningful", quoting Richardson v. Lamar County
Bd. of Educ., 729 F. Supp. 809, 820 (M.D. Ala. 1989), aff'd, 164 F.3d 1347 (1999), injunction granted, 2000 U.S. Dist.
LEXIS 123 (2000).) See also Richardson, 729 F. Supp. at 820-821 ("[A] test will be valid so long as it is built to yield
its intended inference and the design and execution of the test are within the bounds of professional standards accepted
by the testing industry."); Anderson, 520 F. Supp. at 489 ("Validity in the testing field indicates whether a test measures
what it is supposed to measure.").
174 See, e.g., United States v. LULAC, 793 F.2d 636, 640, 649 (5th Cir. 1986) (pointing to substantial expert evidence in
the record, including validity studies, indicating that the tests involved were valid measures of the basic skills that
teachers should have). The sponsors of the newly revised Joint Standards advise that the Joint Standards are intended
to provide guidance to testing professionals in making such judgments. See Joint Standards, Introduction, p. 4. The
Joint Standards are discussed more fully in Chapter One of this guide.
Where the evidence indicates that the educational institution is using a test in a manner that does not lead to valid
inferences, educational justification may be found lacking. See United States v. Fordice, 505 U.S. at 736-737 (ruling
that Mississippi's exclusive use of ACT scores in making college admissions decisions was not educationally justified,
since, among other factors, the ACT's administering organization discouraged this practice); Groves, 776 F. Supp. at
1530 (requiring minimum ACT score for admission to undergraduate teacher education programs violated the Title VI
regulations since ACT scores had not been validated for this purpose); Sharif, 709 F. Supp. at 361-363 (in ruling on a
motion for preliminary injunction, court found that the state's use of SAT scores as the sole basis for decisions
awarding college scholarships intended to reward high school achievement was not educationally justified for this
purpose in that the SAT had been designed as an aptitude test to predict college success and was not designed or
validated to measure past high school achievement).
Psychometric or scientific evidence is not the only way that validity can be demonstrated, however. Courts can draw
inferences of validity from a wide range of data points. See Watson v. Fort Worth Bank & Trust, 487 U.S. 977, 998
(1988) (referring to procedures used to evaluate personal qualities of candidates for managerial jobs).
institution to the student. 175 Where a test is used for promotion or graduation purposes,
courts may also consider whether the skills tested have been taught in the program. 176
3. Determining whether there are equally effective alternatives that
serve the institution's educational goal with less disparity
If the educational institution provides sufficient evidence that the test use in question is
justified educationally, the party challenging the test has the opportunity to show that there
exists an equally effective alternative practice that meets the institution's goals with less
disparity. 177 The feasibility of an alternative, including costs and administrative burdens, is
a relevant consideration. 178
II. Testing Of Students With Limited English Proficiency
Testing of students with limited English proficiency in the elementary and secondary
education context raises a set of unique issues. To understand the obligations of states
and school districts with regard to high-stakes testing of such students, it is important to
understand the basic obligations of school districts and states under Title VI and related
federal law that relate to language minority students who are learning English.
175 See, e.g., Georgia State Conf., 775 F.2d at 1417-1420; Groves, 776 F. Supp. at 1530-1531; Larry P., 793 F.2d at
980. In the educational context, tests play a complex role that bears on evaluation of educational justification. As noted
by the court in Larry P.,
[I]f tests can predict that a person is going to be a poor employee, the employer can legitimately deny that
person a job, but if tests suggest that a young child is probably going to be a poor student, the school cannot
on that basis alone deny that child the opportunity to improve and develop the academic skills necessary to
success in our society.
793 F.2d at 980 (quoting Larry P., 495 F. Supp. at 969). Because determining whether a test is a valid basis for
classifying students and placing them in different educational programs may be even more complex and difficult than
determining if a test validly predicts job performance, particular sensitivity is needed to all of the interests involved.
The question may be not only whether a test provides valid information about a student's ability and achievement, but
whether the educational services provided to the student as a consequence of the test serve the student's needs.
Inequality in the services provided to students prior to the test, as well as in the services provided as a consequence of
the test, may also be a factor considered as part of the educational justification for using a test in a particular way. See
Debra P., 644 F.2d at 407-408 (agreeing with the statement that Title VI would not be violated if the test were a fair
test of what students were taught); Debra P., 730 F.2d 1405, 1407, 1410-1411, 1416 (1984) (affirming that the extent
of remedial efforts to address test failure is relevant to evaluation of test use).
176 See Debra P., 644 F.2d at 408.
177 See New York Urban League v. New York, 71 F.3d 1031, 1036 (2d Cir. 1995) (stating "the plaintiff may still
prove his case by demonstrating that other less discriminatory means would serve the same objective"). See also
Albemarle Paper Co. v. Moody, 422 U.S. 405, 425 (1975); Richardson v. Lamar County Bd. of Educ., 729 F. Supp. at
815.
178 See Wards Cove, 490 U.S. at 661 (indicating that factors such as costs or other burdens are relevant in determining
whether the alternative is equally effective in serving employer's legitimate goals); Sharif, 709 F. Supp. at 363-364
(finding defendant's claim that proposed alternative was not feasible and excessively burdensome not persuasive since
most other states used proposed alternative); MacPherson v. University of Montevallo, 922 F.2d 766, 773 (11th Cir.
1991) (holding that plaintiff must show that the alternative is economically feasible).
Title VI prohibits discrimination based on race, color, or national origin. On May 25,
1970, the United States Department of Health, Education, and Welfare's Office for Civil
Rights issued a policy memorandum entitled "Identification of Discrimination and Denial
of Services on the Basis of National Origin." The May 25th memorandum clarified the
responsibility of school districts, under Title VI, to provide equal educational opportunity
to national origin minority group students whose inability to speak and understand the
English language excludes them from effective participation in the education program
offered by the school district. 179 This memorandum was cited with approval by the
Supreme Court in its decision in Lau v. Nichols, which held that the district's policy of
teaching national origin minority group children only in English, without any special
assistance, deprived them of the opportunity to benefit from the district's education
program, including meeting the English language proficiency standards required by the
state for a high school diploma. 180 The Lau case held that such policies are barred when
they have the effect of denying such benefits, even though no purposeful design is
present. 181
Subsequently, Castaneda v. Pickard, 182 relying on the language of the Equal Educational
Opportunities Act (EEOA), explained the steps school districts must take to help students
with limited English proficiency overcome language barriers to ensure that they can
participate meaningfully in the district's educational programs. 183 The court stated that
school districts have an obligation to provide services that enable students to acquire
English language proficiency. A school system that chooses to temporarily emphasize
English over other subjects retains an obligation to provide assistance necessary to
remedy academic deficits that may have occurred in other subjects while the student was
focusing on learning English.
Under the Castaneda standards, school districts have broad discretion in choosing a
program of instruction for limited English proficient students. However, the program
must be based on sound educational theory, must be adequately supported so that the
program has a realistic chance of success, and must be periodically evaluated and revised,
if necessary, to achieve its goals.
The disparate impact framework discussed above may also be used to examine whether
tests used for high-stakes purposes result in a discriminatory impact upon students with
limited English proficiency. As part of this analysis, questions may arise regarding the
179 See Identification of Discrimination and Denial of Services on the Basis of National Origin, 35 Fed. Reg. 11595
(1970). The Department of Health, Education and Welfare was the predecessor of the U.S. Department of Education.
180 See Lau, 414 U.S. at 566-568.
181 Id. at 568, citing, among other legal authority, the predecessor of 34 C.F.R. § 100.3(b)(2).
182 See Castaneda, 648 F.2d at 1005-1006, 1009-1012. The analytical framework in Castaneda, which was decided
under the Equal Educational Opportunities Act (EEOA), 20 U.S.C. §§ 1701 et seq., has been applied to OCR's Title VI
analysis. See Williams Memorandum, supra note 39. The EEOA contains standards related to limited English proficient
students similar to the Title VI regulations.
183 See Castaneda, 648 F.2d at 1011.
validity and reliability of the test for these students. 184 Depending upon the purpose of
the test and the characteristics of the populations being tested, in some situations,
accommodations or other forms of assessment of the same construct may be necessary.
In short, the obligation is to ensure that the same constructs are being measured for all
students.
There are three particularly important areas involving high-stakes testing of students with
limited English proficiency: (1) tests used to determine a student's proficiency in the
areas of speaking, listening, reading, or writing English for the purpose of determining
whether the student should be provided with a program to enable the student to acquire
English language skills (and, later, for the purpose of determining whether the student is
ready to exit the program); (2) tests used to determine if the student meets the criteria for
other specialized instructional programs, such as gifted and talented or vocational
education programs; and (3) system-wide tests administered to determine if students have
met performance standards.
Tests used to determine a student's initial and continuing need for special language
programs should be appropriate in light of the district's own performance expectations
and otherwise valid and reliable for the purpose used. Tests used by schools to help
select students for specialized instructional programs, including programs for gifted and
talented students, should not screen out limited English proficient students unless the
program itself requires proficiency in English for meaningful participation. 185 When a
state or school district adopts content and performance standards, and uses high-stakes
tests to measure whether students have mastered these standards, a critical factor is
whether the overall educational program provided to students with limited English
proficiency is reasonably calculated to enable the students to master the knowledge and
skills that all students are expected to master. When education agencies institute
standards-based testing, it is important for them to examine their programs for students
with limited English proficiency to determine when and how these students will be
provided with the instruction needed to prepare them to pass the test in question.
In addition, students with limited English proficiency may not be categorically excluded
from standardized testing designed to increase accountability of educational programs for
effective instruction and student performance. If these students are not included, the test
data will not fairly reflect the performance of all students for whom the education agency
is responsible. 186 Such test data can also help a district to assess the effectiveness of its
content and English language acquisition programs.
184 See pages 38-42 for a discussion of the psychometric principles involved in determining the reliability and validity
of tests used with limited English proficient students.
185 See Williams Memorandum, supra note 39.
186 Indeed, Title I of the Elementary and Secondary Education Act explicitly requires States to include limited English
proficient students in the statewide assessments used to hold schools and school districts accountable for student
performance. Title I of the Elementary and Secondary Education Act, 20 U.S.C. § 6311(b)(3)(F)(iii). If a school district
uses the results of a test given for program accountability purposes to make educational decisions about individual
students, the high-stakes use of the test must also be valid and reliable for this purpose.
For information on the factors that help ensure accuracy of tests for limited English
proficient students, see pages 38 - 40 above. In making decisions about testing limited
English proficient students, factors such as the student's level of English proficiency, the
primary language of instruction, the level of literacy in the native language, and the
number of years of instruction in English may all be pertinent. 187 When students
participate in assessments designed to meet the requirements of Title I of the Elementary
and Secondary Education Act, as amended, those assessments must be implemented in a
manner that is consistent with both the requirements of Title VI and Title I.
III. Testing Of Students With Disabilities
Three federal statutes provide basic protections for students with disabilities. Section 504
of the Rehabilitation Act of 1973 (Section 504) and Title II of the Americans with
Disabilities Act of 1990 (Title II) prohibit discrimination against persons with disabilities
by public schools. 188 The Individuals with Disabilities Education Act (IDEA) establishes
rights and protections for students with disabilities and their families. It also provides
federal funds to state education agencies and school districts to assist in educating
students with disabilities. 189 Under Section 504, Title II, and the IDEA, 190 school
districts have a responsibility to provide students with disabilities with a free appropriate
public education. Providing effective instruction in the general curriculum for students
with disabilities is an important aspect of providing a free appropriate public education.
The regulations implementing Section 504 and Title II specifically provide that a
recipient of federal funds may not "utilize criteria or methods of administration which
have the effect of subjecting individuals to discrimination." 191 Under Section 504, Title
II, and the IDEA, tests given to students with disabilities must be selected and
administered so that the test accurately reflects what the student knows or is able to do,
rather than the student's disability (except when the test is designed to measure
disability-related skills). This means that students with disabilities must be given appropriate
accommodations and modifications in the administration of the tests. Examples include
187 For more information on appropriate ways of testing students who are learning English, see Ensuring Accuracy in
Testing for English Language Learners, (CCSSO, 2000).
188 Although this part of the chapter deals only with students with disabilities attending public elementary and
secondary schools, private schools that are not religious schools operated by religious organizations are covered by
Title III of the ADA. Title III of the Americans with Disabilities Act of 1990, 42 U.S.C. §§ 12181 et seq. In addition,
Title I of the Elementary and Secondary Education Act of 1965, as amended, contains important provisions regarding
students with disabilities in the Title I program and their participation in assessments of Title I programs. 20 U.S.C. §
6311(b)(3)(F).
189 The Individuals with Disabilities Education Act, 20 U.S.C. § 1400(d)(1)(C).
190 The Section 504 regulation is found at 34 C.F.R. Part 104 (1999). The Title II regulation is found at 28 C.F.R. Part
35. The IDEA regulation is found at 34 C.F.R. Part 300.
191 See 34 C.F.R. § 100.3(b)(2) and similar provisions under Title IX, Section 504, and the ADA. In Guardians, 463
U.S. at 589, the United States Supreme Court upheld the use of the effects test, stating that the Title VI regulation
forbids the use of federal funds, "not only in programs that intentionally discriminate on racial grounds but also in those
endeavors that have a [racially disproportionate] impact on racial minorities."
oral testing, large print tests, Braille versions of tests, individual testing, and separate
group testing.
Generally, there are three critical areas in which high-stakes testing issues arise for
students with disabilities: (1) tests used to determine whether a student has a disability
and, if so, the nature of the disability; (2) tests used to determine if the student meets the
criteria for other specialized instructional programs, such as gifted and talented or
vocational education programs; and (3) system-wide tests administered to determine if
students have met performance standards. 192
Under Section 504, Title II, and the IDEA, before a student can be classified as having a
disability, the responsible education agency must individually evaluate the student in
accordance with specific statutory and regulatory requirements, including requirements
regarding the validity of tests and the provision of appropriate accommodations. 193 These
requirements prohibit the use of a single test score as the sole criterion for determining
whether a student has a disability and for determining an appropriate educational
placement for the student. 194
When tests are used for other purposes, such as in making decisions about placement in
gifted and talented programs, it is important that tests measure the skills and abilities
needed in the program, rather than the disability, unless the test purports to measure skills
or functions which are impaired by the disability and such functions are necessary for
participation in the program. 195 For this reason, appropriate accommodations may need
to be provided to students with disabilities in order to measure accurately their
performance in the skills and abilities required in the program.
Furthermore, federal law requires the inclusion of students with disabilities in state- and
district-wide assessment programs, including high-stakes tests, except as participation in
such tests is individually determined to be inappropriate for a particular student. Such
assessments provide valuable information which benefits students, either directly, such as
in the measurement of individual progress against standards, or indirectly, such as in
evaluating programs. Given these benefits, exclusion from assessment programs based
on disability generally would violate Section 504 and Title II. If a student with a
disability will take the system-wide assessment test, including a high-stakes test, the
student must be provided appropriate instruction and appropriate test accommodations. 196
192 Tests used for college admission are discussed on pp. 4-5.
193 See 34 C.F.R. § 104.35(b) for specific provisions covering the use of tests for evaluation purposes.
194 See 34 C.F.R. § 104.35(c), requiring placement decisions to consider information from a variety of sources.
195 See 34 C.F.R. § 104.35(b)(3) and 34 C.F.R. § 300.532.
196 See Brookhart, 697 F.2d at 183-184. Some courts have held that a student with a disability may be denied a diploma
if, despite receiving appropriate services and testing accommodations, the student, because of the disability, is unable to
pass the required test or meet other graduation requirements. Id. at 183; Anderson, 520 F. Supp. at 509-511; Board of
Educ. v. Ambach, 458 N.Y.S.2d 680, 684-685, 689 (N.Y. App. Div. 1982), aff'd, 469 N.Y.S.2d 669 (1983).
In addition, the Individuals with Disabilities Education Amendments of 1997 specifically
require states, as a condition of receiving IDEA funds, to include students with
disabilities in the regular state- and district-wide assessment programs, with appropriate
accommodations, where necessary. 197 The IDEA requirements cover tests with
high-stakes consequences given to measure individual achievement as well as tests given for
program accountability purposes. The IDEA also requires state or local educational
agencies to develop guidelines for the relatively small number of students with
disabilities who cannot take part in state- and district-wide tests to participate in alternate
assessments. 198
For children with disabilities, school personnel knowledgeable about the student, the
nature of the disability, and the testing program, in conjunction with the student's parent
or guardian, determine whether the student will participate in all or part of the state- or
district-wide assessment of student achievement. 199 The decision must be documented in
the student's individualized education program (IEP), or a similar record such as a
Section 504 plan. These records must also state any individual accommodations in the
administration of the state- or district-wide assessments of student achievement that are
needed to enable the student to participate in such assessment. An IEP, developed under
the IDEA, must also explain how the student will be assessed if it is inappropriate for the
student to participate in the testing program even with accommodations. 200
Section 504 and Title II also prohibit discrimination in virtually all public and private
post-secondary institutions. The regulatory requirements related to disability
discrimination are different in post-secondary education than in elementary and
secondary education. Post-secondary institutions are not required to evaluate students or
to provide them with a free appropriate education.
High-stakes testing issues at the post-secondary level generally relate to tests used in
admissions, including tests given by an educational institution or other covered entities as
prerequisites for entering a career or career path, and tests of academic competency
required by the institution to complete a program. This guide is not intended to offer a
complete or detailed explanation of each of these testing situations, but only a brief
synopsis. 201
197 See 34 C.F.R. § 300.138(a).
198 See 34 C.F.R. § 300.138(b). The IDEA Final Regulations, Attachment 1--Analysis of Comments and Changes, 64
Fed. Reg. 12406, 12564 (1999) projects that there will be a relatively small number of students who will not be able to
participate in the district or state assessment program with accommodations and modifications, and will therefore need
to be assessed through alternate means. These alternate assessments must be developed and conducted beginning not
later than July 1, 2000.
199 See 34 C.F.R. § 300.347(a)(5) for the IEP requirements applicable to assessment of students with disabilities under
IDEA and 34 C.F.R. § 104.33 for the more general evaluation requirements under Section 504.
200 See 34 C.F.R. § 300.347(a)(5).
201 Test providers that are not higher education institutions may be covered by Section 504 if they receive federal funds;
by Title II if they are parts of governmental units; or by Title III if they are private entities. Each of these laws has its
The Section 504 regulation specifically provides that higher education institutions'
admissions procedures may not make use of any test or criterion for admission that has a
disproportionate, adverse impact on individuals with disabilities unless (1) the test or
criterion, as used by the institution, has been validated as a predictor of success in the
education program or activity and (2) alternative tests or criteria that have a less
disproportionate, adverse impact are not shown to be available. 202 In administering tests,
appropriate accommodations must be provided so that the person can demonstrate his or
her aptitude and achievement, not the effect of the disability (except where the functions
impaired by the disability are the factors the test purports to measure). 203
For other high-stakes tests that an institution might administer, such as rising junior tests,
similar requirements apply. 204 The institution must provide adjustments or
accommodations and auxiliary aids and services that enable the student to demonstrate
the knowledge and skills being tested. 205
Students are required to notify the educational institution when accommodations are
needed and supply adequate documentation of a current disability and the need for
accommodation. The student's preferred accommodation does not have to be provided as
long as an effective accommodation is provided.
Test accommodations are intended to provide the person with disabilities the means by
which to demonstrate the skills and knowledge being tested. Although Section 504 and
Title II require a college or university to make reasonable modifications, neither Section
504 nor Title II requires a college or university to change, lower, waive, or eliminate
academic requirements or technical standards, including admissions requirements, that
can be demonstrated by the college or university to be essential to its program of
instruction or to any directly related licensing requirement. 206 Accommodations
requested by students need not be provided if they would result in a fundamental
alteration to the institution's program. 207
own requirements. For more information regarding testing under Title III of the ADA, consult the U.S. Department of
Justice.
202 34 C.F.R. § 104.42(b)(2). Appendix A to the Section 504 regulation, Subpart E-Post-secondary Education, No. 29,
notes that the party challenging the test would have the burden of showing that alternate tests with less disparate impact
are available.
203 See 34 C.F.R. § 104.42(b)(2). Appendix A to the Section 504 regulation, Subpart E-Post-secondary Education, No.
29, notes that the party challenging the test would have the burden of showing that alternate tests with less disparate
impact are available.
204 Some undergraduate college programs require students to pass a rising junior examination to determine whether
students have met the college's standards in writing or other academic skills as a prerequisite for advancement to junior
year status.
205 See 34 C.F.R. § 104.44(a) & (d).
206 See 34 C.F.R. § 104.44(a).
207 See Southeastern Community College v. Davis, 442 U.S. 397, 413 (1979); Wynne v. Tufts Univ. Sch. of Med., 976
F.2d 791, 794-796 (1st Cir. 1992), cert. denied, 507 U.S. 1030 (1993).
IV. Constitutional Protections
In addition to applying federal nondiscrimination statutes, courts have also considered
constitutional issues that may arise when public school districts or state education
agencies require students to pass certain tests that are intended to certify that students
have attained a level of competency in skills or knowledge taught in the program. 208
Constitutional challenges to testing programs under the Fourteenth Amendment have
raised both equal protection and due process claims. The equal protection principles
involved in discrimination cases are, generally speaking, the same as the standards
applied to intentional discrimination claims under the applicable federal
nondiscrimination statutes. 209
The due process clause of the Fourteenth Amendment is particularly associated with
cases challenging the adequacy of the notice provided to students prior to this type of test
and the students' opportunity to learn the required content. 210 In analyzing such due
process claims, courts have generally considered three issues:
208 The U.S. Department of Education, Office for Civil Rights, does not have jurisdiction to resolve constitutional
cases. However, some cases involve constitutional issues that overlap with discrimination issues arising under federal
civil rights laws.
209 Federal cases may involve equal protection challenges to a jurisdiction's use of tests in which the claim is not based
on intentional race or sex discrimination, but, instead, on the alleged impropriety of the jurisdiction's use of tests to
separate out those students who should not be allowed to graduate. As a general matter, courts express reluctance to
second guess a state's educational policy choices when faced with such challenges, although they recognize that a state
cannot "exercise that [plenary] power without reason and without regard to the United States Constitution." Debra P.,
644 F.2d at 403. When there is no claim of discrimination based on membership in a suspect class, the equal protection
claim is reviewed under the rational basis standard. In these cases, the jurisdiction need show only that the use of the
tests has a rational relationship to a valid state interest. Id. at 406. See also Erik V., 977 F. Supp. at 389.
210 A review of relevant cases reveals the highly fact and context-specific nature of the conclusions reached by federal
courts considering alleged violations of the due process clause. In Debra P., 644 F.2d at 404, the Fifth Circuit held that
students' due process rights were violated when a newly imposed minimum competency test required for high school
graduation was instituted without adequate notice and an opportunity for students to learn the material covered by the
test. Three years later, in Debra P. v. Turlington, 730 F.2d at 1416-1417, the court held that students who now had six
years notice of the exam were afforded the opportunity to learn the relevant material, given the state's remedial
programs. For additional courts identifying due process violations in the way in which a competency test was instituted,
see Brookhart, 697 F.2d at 186-187 (holding that district-required minimum competency test for graduation denied due
process to students with disabilities where notice was inadequate and students had not been exposed to 90% of the
material covered by the test); Crump v. Gilmer Indep. Sch. Dist., 797 F. Supp. 552, 556-557 (E.D. Tex. 1992) (granting
temporary restraining order where district had not demonstrated validity of graduation examination in light of actual
instructional content); Anderson, 520 F. Supp. at 508-509 (finding that school district failed to show that minimum
competency test required for high school graduation covered material actually taught at school). Other cases have
concluded that adequate notice was provided, t
Dublin Core
The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/.
Title
A name given to the resource
Kendra Brooks - Subject Series
Creator
An entity primarily responsible for making the resource
Domestic Policy Council
Kendra Brooks
Is Part Of
A related resource in which the described resource is physically or logically included.
<a href="http://clinton.presidentiallibraries.us/items/show/36031" target="_blank">Collection Finding Aid</a>
<a href="https://catalog.archives.gov/id/647992" target="_blank">National Archives Catalog Description</a>
Description
An account of the resource
The Kendra Brooks Subject Files contain correspondence, reports, articles, memos, and various printed material. Other documents include background information for education events and meetings. The files include material pertaining to charter schools, national testing, SAT preparation, school safety, school modernization/construction, affirmative action, Blue Ribbon Schools, class–size reduction, teacher quality, Limited English Proficiency (LEP), the White House Initiative on Education Excellence for Hispanic Americans, Tribal Colleges and Universities, Historically Black Colleges and Universities (HBCU’s), the Individuals with Disabilities Education Act (IDEA), and Title 1 of the Elementary and Secondary Education Act of 1965.
Provenance
A statement of any changes in ownership and custody of the resource since its creation that are significant for its authenticity, integrity, and interpretation. The statement may include a description of any changes successive custodians made to the resource.
Clinton Presidential Records: White House Staff and Office Files
Publisher
An entity responsible for making the resource available
William J. Clinton Presidential Library & Museum
Extent
The size or duration of the resource.
157 folders in 16 boxes
Text
A resource consisting primarily of words for reading. Examples include books, letters, dissertations, poems, newspapers, articles, archives of mailing lists. Note that facsimiles or images of texts are still of the genre Text.
Original Format
The type of object, such as painting, sculpture, paper, photo, and additional data
Paper
Dublin Core
The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/.
Title
A name given to the resource
[Education - High Stakes Testing] [2]
Creator
An entity primarily responsible for making the resource
Domestic Policy Council
Kendra Brooks
Subject Files
Is Part Of
A related resource in which the described resource is physically or logically included.
Box 4
<a href="http://clintonlibrary.gov/assets/Documents/Finding-Aids/Systematic/KendraBrookssubjectfile.pdf" target="_blank">Collection Finding Aid</a>
<a href="https://catalog.archives.gov/id/647992" target="_blank">National Archives Catalog Description</a>
Provenance
A statement of any changes in ownership and custody of the resource since its creation that are significant for its authenticity, integrity, and interpretation. The statement may include a description of any changes successive custodians made to the resource.
Clinton Presidential Records: White House Staff and Office Files
Format
The file format, physical medium, or dimensions of the resource
Adobe Acrobat Document
Publisher
An entity responsible for making the resource available
Clinton Presidential Library & Museum
Medium
The material or physical carrier of the resource.
Reproduction-Reference
Date Created
Date of creation of the resource.
1/17/2012
Source
A related resource from which the described resource is derived
647992-education-high-stakes-testing-2.pdf
647992