https://clinton.presidentiallibraries.us/files/original/e4f20ea9633090b804a50b1b69a92e7e.pdf

July 5, 2000

Dear Colleague:

Thank you for your comments and continued interest in the draft document entitled "The Use of Tests When Making High-Stakes Decisions for Students: A Resource Guide for Educators and Policymakers" (the "testing guide"), which is being prepared by the U.S. Department of Education's Office for Civil Rights (OCR). I am pleased to provide you with the enclosed, revised draft for your further consideration.

As you know, the Department strongly supports efforts to promote high standards for all students and to increase accountability in education. These efforts, including the use of tests, must be done in a manner consistent with federal civil rights laws, which reinforce sound educational practices. The purpose of the enclosed draft testing guide is to provide educators and policymakers with a practical tool that summarizes key test-measurement and legal principles that should inform the use of tests when making high-stakes decisions for students.

This testing guide is being developed in close consultation with the education community to ensure its accuracy and usefulness. The first draft of the testing guide was released in April 1999 and was the subject of substantial comments leading to extensive revisions. The second draft was released in December 1999 and once again received substantial comments. The draft also has been independently reviewed by the National Academy of Sciences' Board on Testing and Assessment (BOTA), which held a hearing earlier this year to discuss the draft testing guide and issued a letter report in June 2000 commenting on the draft. The enclosed draft testing guide is the third draft released for public comment, this time with notice of availability published in the Federal Register.
This draft seeks to respond to comments received since the release of the second draft, including comments from external stakeholders as well as BOTA. The comments OCR received on the second draft were largely positive, and the foundations and structure of the draft testing guide have remained essentially the same. However, numerous edits have been made throughout the guide in response to comments seeking to clarify, make more accurate, and/or expand key sections. For example:

• Substantial effort has been made to more fully and accurately reflect the 1999 Standards on Educational and Psychological Testing (the "Joint Standards"), which were released during completion of the second draft of the testing guide. The Joint Standards are widely viewed as the primary technical authority on educational test-measurement issues, and they form the foundation of the test-measurement discussion in the draft testing guide.

• Key legal principles have been clarified and/or expanded, including principles related to "educational necessity" and to the use of tests for students with limited English proficiency and students with disabilities.

• Language has been added or deleted as appropriate to clarify that the primary focus of the guide is on the use of tests when making high-stakes decisions for students at the elementary and secondary level, though the guide makes clear that the general principles are applicable in the higher education context as well.

We would greatly appreciate your comments on the enclosed third draft of the testing guide. The comment period is 30 days, meaning all comments must be received by August 7, 2000. Comments should be sent to:

Jeanette Lim
U.S. Department of Education
400 Maryland Avenue, SW
Room 5212 Switzer Building
Washington, D.C. 20202-1100

Based on the comments we receive on the third draft, OCR will make appropriate edits and produce a final version of the testing guide.
We expect to publish a final version in August-September 2000.

Thank you again for your assistance in this important effort. Additional copies of the draft testing guide are available on OCR's website at http://www.ed.gov/offices/OCR/testing or by calling (800) 421-3481. If you have any immediate questions or concerns, please feel free to contact me or David Berkowitz at (202) 205-5526.

Sincerely,

Scott R. Palmer
Deputy Assistant Secretary for Civil Rights

Enclosure

FOR INTERNAL USE ONLY
July 5, 2000

U.S. DEPARTMENT OF EDUCATION, OFFICE FOR CIVIL RIGHTS
The Use of Tests When Making High-Stakes Decisions for Students: A Resource Guide for Educators and Policymakers

TALKING POINTS

Introduction: OCR Has Prepared a Third Draft of Its Testing Guide for Public Comment

• The U.S. Department of Education's Office for Civil Rights (OCR) is in the process of producing a guide concerning the proper use of tests when making high-stakes decisions for students. "High-stakes" decisions are those that have significant consequences for students, such as placement, promotion, and graduation decisions. The draft guide provides an overview of existing test-measurement principles and well-established federal non-discrimination laws related to the use of tests for such high-stakes purposes. The guide covers not only the laws that OCR enforces but also key constitutional and test-measurement issues to provide a more complete picture for educators and policymakers.

• The guide is being developed in close and extensive consultation with the education community, including several rounds of comments and meetings with educators, parents, teachers, business leaders, policymakers, test publishers, and others. A first draft was released in April 1999 and was the subject of substantial comments leading to extensive revisions. A second draft was released in December 1999. It was well-received and was again the subject of substantial comments. In addition, the draft was reviewed by the U.S.
Department of Justice, Civil Rights Division, and by the National Research Council's Board on Testing and Assessment (BOTA), which held a hearing on the draft testing guide earlier this year and issued a letter report in June 2000. (BOTA's letter report can be found on the internet at: http://books.nap.edu/books/NI000224/html/1.html.)

• The third draft of the testing guide seeks to respond to the comments received since release of the second draft. On July 6, OCR will publish in the Federal Register a notice of availability and opportunity for comments on the third draft. (OCR also sent copies to parties who commented on previous drafts.) There is a 30-day comment period on the third draft, ending August 7, 2000. The draft guide will be available on OCR's Internet web site, at http://www.ed.gov/offices/OCR/testing. It is also available upon request by contacting OCR at (800) 421-3481.

• During the weeks of July 10 and/or July 17, 2000, OCR will offer short briefings with elementary and secondary, post-secondary, and civil rights groups, as well as Congressional staff, to introduce the third draft of the testing guide.

• After reviewing comments received on the third draft testing guide and making appropriate changes, OCR will issue the guide in final form and provide a notice of availability in the Federal Register. A final version of the testing guide is expected in August-September 2000.

Background: Purpose of the Guide

• The purpose of the guide is to provide a practical resource for educators and policymakers to ensure that tests used for high-stakes decisions are developed and implemented in ways that are educationally sound and legally appropriate, and thereby promote the complementary goals of excellence and equity in education.

• Throughout the 1990s, our nation has embraced the goal of promoting high standards for all students. Federal non-discrimination laws fully support this goal.
• Tests, meaning various kinds of educational assessments, can play an important role in promoting high standards for all students. Many states and other actors, therefore, are increasing their use of tests, including tests for high-stakes purposes. Federal non-discrimination laws support this use of tests when done in valid, educationally appropriate ways.

• In short, there is substantial alignment between sound educational principles and federal non-discrimination laws when it comes to the use of tests for high-stakes purposes.

• However, there is currently a lack of guidance concerning established test-measurement principles and legal standards that should inform the use of tests for high-stakes decisions. This guide is intended to provide that vital information, including a summary of key test-measurement and legal principles, and lists of additional resources for educators and policymakers.

The Third Draft: Significant Revisions Made Since The Release of the December 1999 Draft

• The comments received on the second draft of the testing guide were largely positive, and the foundations and structure of the draft guide have remained essentially the same. However, numerous changes have been made throughout the guide in response to comments seeking to clarify, make more accurate, and/or expand key sections. For example:

  - Substantial effort has been made to more fully and accurately reflect the 1999 Standards on Educational and Psychological Testing (the "Joint Standards"), which were released during completion of the second draft of the testing guide. The Joint Standards are widely viewed as the primary technical authority on educational test-measurement issues, and they form the foundation of the test-measurement discussion in the draft testing guide. To ensure accuracy, all standards from the Joint Standards document cited in the guide have also been quoted in full, either in the text or footnotes.
Also, readers are repeatedly encouraged to review the Joint Standards for additional information about the various test-measurement topics discussed in the guide.

  - Key legal principles have been clarified and/or expanded, including principles related to "educational necessity" and to the use of tests for students with limited English proficiency and students with disabilities.

  - Language has been added or deleted as appropriate to clarify that the primary focus of the guide is on the use of tests when making high-stakes decisions for students at the elementary and secondary level, though the guide makes clear that the general principles are applicable in the higher education context as well.

  - Language has been added consistent with the Joint Standards indicating the importance of alignment between what primary and secondary students are taught and what material is covered on tests used for high-stakes purposes, specifically promotion and graduation purposes. Language has also been added consistent with the Joint Standards cautioning that a single test should generally not be used as a sole criterion for making high-stakes decisions unless valid for that purpose; additional relevant information should be considered if it would enhance validity.

  - Additional examples have been added to the test-measurement chapter to aid in the readability of the document.

For further information, please contact Scott Palmer or David Berkowitz at (202) 205-5526.

Bethany Little
08/08/2000 07:49:55 AM
Record Type: Record
To: Caroline S. Chang/OPD/EOP@EOP, Kendra L. Brooks/OPD/EOP@EOP
cc:
Subject: OCR's Draft Testing Guide: Release of Third Draft for Public Comment

This morning I was going through my "to do" list, and stumbled across this. It's actually pretty important that we review this carefully -- the first time this draft guidance was leaked we got a few articles with the basic headline "White House forbids use of SAT." Clearly not a good thing ...
Anyhow, can y'all please take a look? I'll try to review today/tomorrow and we can all meet on this Thursday. Thanks!

---------------------- Forwarded by Bethany Little/OPD/EOP on 08/08/2000 07:46 AM ---------------------------

"Palmer, Scott" <Scott_Palmer@ed.gov>
07/05/2000 01:32:30 PM
Record Type: Record
To: See the distribution list at the bottom of this message
cc: See the distribution list at the bottom of this message
Subject: OCR's Draft Testing Guide: Release of Third Draft for Public Comment

As you may know, OCR is making available tomorrow, Thursday, July 6, for public comment the third draft of our guide on the proper use of tests when making high-stakes decisions for students. The draft has a 30-day comment period, ending August 7, 2000, complete with notice of availability being published tomorrow in the Federal Register. To maintain the good and open process that folks have widely praised, copies of the third draft are being sent to stakeholders who have commented on previous drafts. Copies will also be available on the web at http://www.ed.gov/offices/OCR/testing, and by calling OCR customer service at (800) 421-3481.

The purpose of the draft testing guide is to provide educators and policymakers with a practical tool that summarizes key test-measurement and legal principles that should inform the use of tests when making high-stakes decisions for students, such as promotion, placement, and graduation decisions. (The guide has always been focused primarily on the K-12 context (though the general principles are applicable to the higher education context as well), and the third draft makes that point even more clear.) The message is that the Department strongly supports efforts to promote high standards and accountability in education, and these efforts, including the use of tests, should be done in a manner consistent with federal nondiscrimination laws, which reinforce sound educational practices.
It is a positive message, and the second draft of the guide, which was released in December 1999, was largely well-received. OCR did, however, receive substantial comments to which we have sought to respond in the third draft. The draft has also been independently reviewed within ED, by DOJ, and by the National Academy of Sciences' Board on Testing and Assessment (BOTA).

Thanks to the good work of various ED offices, DOJ, BOTA, and more, the draft testing guide has gotten even better and should not cause great controversy in terms of substance. However, the issue of testing is hot and timely, and various parties will likely be interested. We are working within ED to schedule some short briefings for next week with K-12, higher education, and civil rights groups, as well as Hill staff, to introduce the third draft of the testing guide if folks are interested. We are also working with ED public affairs as appropriate.

Meanwhile, attached for your reference are (1) internal talking points regarding the release of the third draft, (2) a cover letter that will accompany the copies being sent to stakeholders, and (3) a copy of the third draft of the testing guide (though the web version will likely be better formatted and should be used externally). If you have any questions or concerns, please contact me or David Berkowitz at (202) 205-5526. Thanks.

«Policy-Testing Guide-Talking Points-July 2000.doc»
«Policy-Testing Guide-Cover letter for draft #3-main letter.doc»
«Policy-Testing Guide-July 6 Draft.doc»

Message Sent To:
"Holleman, Frank" <Frank_Holleman@ed.gov>
"Joshi, Sejal" <Sejal_Joshi@ed.gov>
"Ramirez, Heidi" <Heidi_Ramirez@ed.gov>
"Rossi, Diane" <Diane_Rossi@ed.gov>
"Tucker, Ben" <Ben_Tucker@ed.gov>
"Winston, Judith" <Judith_Winston@ed.gov>
"Winnick, Steve" <Steve_Winnick@ed.gov>
"Craig,
Susan" <Susan_Craig@ed.gov>
"Lahring, Karl" <Karl_Lahring@ed.gov>
"Kole, Adina" <Adina_Kole@ed.gov>
"Jenkins, Kimberly" <Kimberly_Jenkins@ed.gov>
"Cohen, Mike" <Mike_Cohen@ed.gov>
"Johnson, Judith" <Judith_Johnson@ed.gov>
"Heumann, Judy" <Judy_Heumann@ed.gov>
"Warlick, Kenneth" <Kenneth_Warlick@ed.gov>
"McGuire, Kent" <Kent_McGuire@ed.gov>
"Phillips, Gary" <Gary_Phillips@ed.gov>
"Goldstein, Arnold" <Arnold_Goldstein@ed.gov>
"Cole, Arthur" <Arthur_Cole@ed.gov>
"Wohl, Alexander" <Alexander_Wohl@ed.gov>
"Heine, Roberta" <Roberta_Heine@ed.gov>
"Lyon, Tom" <Tom_Lyon@ed.gov>
"Murphey, Rodger" <Rodger_Murphey@ed.gov>
"Fleming, Scott" <Scott_Fleming@ed.gov>
"Kelley, Thomas" <Thomas_Kelley@ed.gov>
"Rairdin, Kae" <Kae_Rairdin@ed.gov>
"'anita_hodgkiss@usdoj.gov'" <anita_hodgkiss@usdoj.gov>
Peter Rundlet/WHO/EOP
Bethany Little/OPD/EOP
John B. Buxton/OPD/EOP

Message Copied To:
"Cantu, Norma V" <Norma_V_Cantu@ed.gov>
"Pierce, Raymond" <Raymond_Pierce@ed.gov>
"Patterson, Lindsay" <Lindsay_Patterson@ed.gov>
"Jackson, John H" <John_Jackson2@ed.gov>
"Lim, Jeanette" <Jeanette_Lim@ed.gov>
"Berkowitz, David" <David_Berkowitz@ed.gov>
"Fitch, Rebecca" <Rebecca_Fitch@ed.gov>
"Kopriva, Rebecca" <Rebecca_Kopriva@ed.gov>
"Wolkowitz, Barbara" <Barbara_Wolkowitz@ed.gov>
"Sowers, Susan" <Susan_Sowers@ed.gov>
"Lewis, Cathy H" <Cathy_H_Lewis@ed.gov>
"Dorka, Lilian" <Lilian_Dorka@ed.gov>
"Tosado, Rebekah" <Rebekah_Tosado@ed.gov>
"Cramolini, Steve" <Steve.:,.Cramolini@ed.gov> : "Slayton, Lester" <Lester_Slayton@ed.gov> "Hibino, Thomas" <Thomas_Hibino@ed.gov> "Whitney, Helen" <Helen_Whitney@ed.gov> "Fox, Wendella" <Wendella_Fox@ed.gov> "McGovern, Linda" <Unda_McGovern@ed.gov> "Orris, Harry" <Harry_Orris@ed.gov> "Sennett, Angela" <Angela_Sennett@ed.gov> "Walker, Gary" <Gary_Walker@ed.gov> "August, Taylor" <Taylor_August@ed.gov> "Wender, Alice" <Alice_Wender@ed.gov> "Gutierrez, Lillian" <Lillian_Gutierrez@ed.gov> "Rosenzweig, Stefan" <Stefan_Rosenzweig@ed.gov> "Jackson, Gary" <Gary_Jackson@ed.gov> �The Use of Tests When· Making High-Stakes Decisions for Students: A Resource Guide for Educators and Policymakers . u.s. Departnlent of Education Office for Civil Rights . Draft ..... ~ .." .................. Draft .................. ~ ........Draft July 6, 2000 �UNITED STATES DEPARTMENT OF EDUCATION OFFICE FOR CIVIL RIGHTS . THE ASSISTANT SECRETARY (I Dear Colleague: Adherence to good test use practices in education is a shared goal of government officials, policymakers, educators, parents, and students. In an era of school reforms that place increasing emphasis'on measures of accountability, such as tests used for high stakes purposes for individual students,· the need to provide practical information about good testing practices is well documented. In January 1999, the National Research Council observed that we in the education community should work to better disseminate information related to good testing practices with a focus on the standards of testing professionals and the relevant legal principles that, together, "reflect many common concerns." The points of alignment between sound educational policies and judgments and federal nondiscrimination laws compellingly illustrate the symmetry between the goals of promoting educational excellence for all stl,ldents and ensuring that educational practices do not - intentionally or otherwise -. 
unfairly deny educational opportunities to students based upon their race, national origin, sex or disability. In short, federal civil rights law affirms good test use practices. As a result, an understanding of the measurement principles related to the use of tests for high-stakes purposes is an essential foundation to better understanding the federal legal standards that are significantly informed by those measurement principles.

In order to further the goal of accurate and fair judgments in high-stakes decision making that involves the use of tests, we are pleased to provide you with this copy of The Use of Tests When Making High-Stakes Decisions for Students: A Resource Guide for Educators and Policymakers. This guide provides important information about the professional standards relating to the use of tests for high-stakes purposes, the relevant federal laws that apply to such practices, and references that can help shape educationally sound and legally sufficient testing practices.

* As explained throughout the guide, the primary focus is the use of standardized tests or assessments (referred to in the guide as tests) used to make decisions with important consequences for individual students. Examples of high-stakes decisions include: student placement in gifted and talented programs or in programs serving students with limited English proficiency; determinations of disability and eligibility to receive special education services; student promotion from one grade level to another; graduation from high school and diploma awards; and admission decisions and scholarship awards. The guide does not address teacher-created tests that are used for individual classroom purposes.

There are few simple or definitive answers to questions about the use of tests for high-stakes purposes. Tests are a means to an end and, as such, can be understood only in the context in which they are used.
The education context - in which the relationship (and attendant obligations) of the educator to the student is frequently more complex than that between employer and employee - shows time and again that any decision regarding the legality of a use of a test for high-stakes purposes under federal nondiscrimination law cannot be made without regard to the educational interests and judgments upon which the test use is premised.

Background

Throughout the 1990s, national, state and local education leaders have focused on raising education standards and establishing strategies to promote accountability within the education community. In fact, the promotion of challenging learning standards for all students - coupled with assessment systems that monitor progress and hold schools accountable - has been the centerpiece of the education policy agenda of the federal government as well as many states. Predictably, the number of states using tests as a condition for high school graduation is on the rise, with (by a recent estimate) 26 states projected to use tests as conditions for graduation by 2003 and six states now using tests as conditions for grade promotion, a significant increase from past years.

At the same time, more and more educators and policymakers have requested advice and technical assistance from the U.S. Department of Education regarding test use in the context of standards reforms. The Department's Office for Civil Rights (OCR) is also addressing testing issues in a more extensive array of complaints of discrimination being filed with our office, most of them in a K-12 setting with implications for high-standards learning. OCR has responsibility for enforcing Title VI of the Civil Rights Act of 1964, Title IX of the Education Amendments of 1972, Section 504 of the Rehabilitation Act of 1973, and Title II of the Americans with Disabilities Act of 1990.
These statutes prohibit discrimination on the basis of race, color, national origin, sex, and disability by educational institutions that receive federal funds.

In a similar vein, institutions in the post-secondary community in recent years have engaged in a thoughtful dialogue and analysis regarding merit in admissions and the appropriate use of tests to establish foundations for high-stakes admissions decisions. In some states, the use of tests in connection with admissions decisions has been an important element in public post-secondary education reform.

These trends highlight the salience of two recent conclusions of the National Research Council (NRC) Board on Testing and Assessment. In January of this year, the NRC observed that too many policymakers and educators are not aware of the test measurement standards that should inform testing policies and practices. These standards include the Standards for Educational and Psychological Tests, prepared by a joint committee of the American Psychological Association (APA), the American Educational Research Association (AERA), and the National Council on Measurement in Education (NCME). The NRC also concluded that it "is essential that educators and policymakers alike be aware of both the letter of the laws and their implications for test takers and test users" [National Research Council, High Stakes: Testing for Tracking, Promotion, and Graduation (Heubert and Hauser, eds., 1999)].

The Resource Guide

Toward this end, OCR has prepared this guide in an effort to assemble the best information regarding psychometric standards, legal principles, and resources to help educators and policymakers frame strategies and programs that promote learning to high standards in ways consistent with federal non-discrimination law.
Our goal is to inform decisions related to the use of tests that have high-stakes consequences for students when, for instance, they move from grade to grade or graduate from high school.

Just as we know that good test use practices can advance high standards for learning and equal opportunity, we know that educationally inappropriate uses of tests do not. If we want this generation of test-taking students and their teachers and schools to meet high standards, then we should insist that the tests they take meet high standards. As foundations for judgments that profoundly shape the lives of students, these tests must be used in ways that accurately reflect educational standards and that do not inappropriately deny opportunities to students based on their race, national origin, sex or disability.

The guide is organized to provide practical guidance related to the use of tests for high-stakes purposes. The Introduction to the guide provides a broad, conceptual overview of relevant principles so that those who are not familiar with test measurement principles or applicable federal law can better understand the kinds of issues that relate to the use of tests in many contexts, from grade-to-grade promotion to college admissions. Chapter one of the guide provides a detailed discussion of the test measurement principles that can provide a foundation for making well-informed decisions related to high-stakes testing. The relevant principles that have been approved by the APA, AERA, and NCME are discussed in detail in this chapter. Adherence to relevant professional standards can help reduce the risk of legal liability when schools are using assessments for high-stakes purposes. Chapter two provides an overview of the existing legal principles that have guided federal courts and OCR when analyzing claims of race, national origin, sex, and disability discrimination related to the use of tests as foundations in high-stakes decisions affecting students.
These principles, as applied by the courts and OCR, underscore the importance of adhering to educationally sound testing practices. The Appendix includes a Glossary of Test Measurement Terms, a Glossary of Legal Terms, a Compendium of Federal Nondiscrimination Laws, and a Resources and References section.

Central Principles

There are several central principles reflected in the text of this guide.

First, federal nondiscrimination laws are consistent with the establishment of high standards of learning for all students and educationally sound practices designed to meet that goal. The goals of promoting high educational standards and ensuring nondiscrimination are complementary objectives. Indeed, if the federal courts that have applied civil rights statutes to education cases teach us anything, it is that compliance with federal nondiscrimination standards rests in the first instance upon the school's educational judgments, and that those judgments deserve deference. Not surprisingly, the ultimate questions posed by our resource guide on the use of tests for high-stakes purposes also center on educational sufficiency: Is the test valid for the purposes used? Are the inferences derived from test scores, and the high-stakes decisions based on those inferences, accurate and fair? These inquiries are not an effort to dumb down academic standards or alter core education objectives integral to academic admissions or other educational decisions. Rather, they focus the educator and policymaker on ensuring that uses of tests with consequences for students are educationally sound and legally appropriate.

Second, federal nondiscrimination laws support the use of tests, including large-scale standardized tests, when they are used in valid, reliable, and educationally appropriate ways.
Importantly, tests can help indicate inequalities in the kinds of educational opportunities students are receiving, and in turn, they may stimulate efforts to ensure that all students have equal opportunity to achieve high standards. When tests accurately indicate performance gaps, our concern should be with the quality of educational opportunities afforded to under-performing students - rather than the integrity of the test itself. The key question in the context of standards-based reforms and the use of tests as measures of student accountability is: Have all students in certain school districts been provided quality instruction, sufficient resources, and the kind of learning environment that would foster success?

Third, a test score disparity among groups of students does not alone constitute discrimination under federal law. The guarantee under federal law is for equal opportunity, not equal results. Test results indicating that groups of students perform differently should be a cause for further inquiry and examination, with a focus upon the relevant educational programs and testing practices at issue. Differences in test scores may result from a range of factors, some of which a school may be able to influence, and others over which it has little control. Federal law recognizes this point, as it must. The legal non-discrimination standard regarding neutral practices (referred to by the courts as the "disparate impact" standard) provides that if the education decisions based upon test scores reflect statistically significant disparities based on race, national origin, or sex in the kinds of educational benefits afforded to students, then questions about the education practices at issue (including testing practices) should be thoroughly examined to ensure that they are in fact non-discriminatory and educationally sound.
In short, the goal of the federal legal standards is to help promote accurate and fair decisions that have real consequences for students, not to water down academic standards or deter educators from establishing and applying sensible and rigorous standards.

Conclusion

Recognizing the responsibility that educators and policymakers must shoulder in making the promise of high-standards learning a reality, U.S. Secretary of Education Richard Riley, in his commemoration of the 45th anniversary of the Brown v. Board of Education decision, said: "A quality education must be considered a key civil right for the twenty-first century." This is the driving force behind OCR's continuing effort to provide assistance to policymakers and educators as we continue to enforce federal laws that prohibit discrimination against students. Rather than creating false and polarizing "win-lose" choices on this all-important set of issues, we need to, as Secretary Riley admonishes, "search for common ground" - ground, that is, in this case, expansive.

We have worked with literally dozens of groups and individuals, including educators, parents, teachers, business leaders, policymakers, test publishers, and others, to solicit input and advice regarding the scope, framing, and kinds of resources to include in this guide, and we are grateful for their assistance. In addition, we have contracted with the NRC's Board on Testing and Assessment, which has reviewed earlier drafts of the guide, to ensure that the guide comports with professional standards. We are grateful for the NRC's tireless efforts.

Working together with our education partners, we believe that we are providing a useful resource that will serve the education community as it addresses the very complex and important questions that stem from the institution of high standards and accountability systems designed to promote the best schools in the world.

Very truly yours,

DRAFT

Norma V.
Cantu

Table of Contents

INTRODUCTION: An Overview of the Resource Guide
CHAPTER 1. Test Measurement Principles
CHAPTER 2. Legal Principles
APPENDIX A: Glossary of Legal Terms
APPENDIX B: Glossary of Test Measurement Terms
APPENDIX C: Accommodations Used by States
APPENDIX D: Compendium of Federal Statutes and Regulations
APPENDIX E: Resources and References

Draft 7/6/00
The Use of Tests When Making High Stakes Decisions for Students: A Resource Guide For Educators and Policymakers

INTRODUCTION: An Overview of the Resource Guide

I. Introduction

Decisions affecting students' educational opportunities should be made accurately and fairly. When tests are used in making educational decisions for individual students, they should accurately measure students' abilities, knowledge, skills or needs, and they should do so in ways that do not discriminate in violation of federal law on the basis of the students' race, national origin, sex or disability. The U.S. Department of Education's Office for Civil Rights (OCR)1 has developed this resource guide in order to provide educators and policymakers with a useful, practical tool that will assist in their development and implementation of policies that involve the use of tests in making high stakes decisions for students. It is intended to facilitate the proper use of tests for those purposes. Chapter one of this guide provides information about professionally recognized test measurement principles. Chapter two provides the legal frameworks that have guided federal courts and OCR when addressing the use of tests that have high-stakes consequences for students.
The test measurement principles described in chapter one are not legal principles. However, the use of tests in educationally appropriate ways consistent with the principles described in chapter one can help to minimize the risk of noncompliance with the federal nondiscrimination laws discussed in chapter two. The guide also includes a collection of resources related to test measurement and nondiscrimination principles that are discussed in the guide - all in an effort to help policymakers and educators ensure that decisions that have high-stakes consequences for students are made accurately and fairly.

1 OCR enforces laws that prohibit discrimination on the basis of race, national origin, sex, disability, and age by educational institutions that receive federal funds. The laws enforced by OCR are: 1) Title VI of the Civil Rights Act of 1964, 42 U.S.C. §§ 2000d, et seq. (2000) (Title VI), which prohibits discrimination on the basis of race, color, or national origin; 2) Title IX of the Education Amendments of 1972, 20 U.S.C. §§ 1681, et seq. (1999) (Title IX), which prohibits discrimination on the basis of sex; 3) Section 504 of the Rehabilitation Act of 1973, 29 U.S.C. §§ 794, et seq. (1999) (Section 504), which prohibits discrimination on the basis of disability; 4) the Age Discrimination Act of 1975, 42 U.S.C. §§ 6101, et seq. (1995 and Supp. 1999) (as amended), which prohibits age discrimination; and 5) Title II of the Americans with Disabilities Act of 1990, 42 U.S.C. §§ 12134, et seq. (1995 and Supp. 1999) (Title II), which prohibits discrimination on the basis of disability by public entities, whether or not they receive federal financial assistance.
Educational stakeholders at all levels have approached OCR requesting advice and technical assistance in a variety of test-use contexts, particularly as states and districts use tests as part of their standards-based reforms. Also, increasingly, OCR is addressing testing issues in a broader and more extensive array of complaints of discrimination that have been filed with OCR. These corresponding developments confirm the need to provide a useful resource that captures legal and test measurement principles and resources to assist educators and policymakers. This document does not establish any new legal or test measurement principles.

As used in this resource guide, "high-stakes decisions" refer to decisions with important consequences for individual students. Education entities, including state agencies, local education agencies, and individual education institutions, make a variety of decisions affecting individual students during the course of their academic careers, beginning in elementary school and extending through the post-secondary school years. Examples of high-stakes decisions affecting students include: student placement in gifted and talented programs or in programs serving students with limited-English proficiency; determinations of disability and eligibility to receive special education services; student promotion from one grade level to another; graduation from high school and diploma awards; and admissions decisions and scholarship awards.2

This guide is intended to apply to standardized tests that are used in making high-stakes decisions affecting individual students and that are addressed in the Standards for Educational and Psychological Testing (Joint Standards). The Joint Standards are viewed as the primary technical authority on educational test measurement issues.
They have been prepared by a joint committee of the American Educational Research Association, the American Psychological Association and the National Council on Measurement in Education, the three leading organizations in the area of educational test measurement. The Joint Standards were developed and revised by these three organizations through a process that involved the participation of hundreds of testing professionals and thousands of pages of written comment from both professionals and the public. The current edition of the Joint Standards reflects the experience gained from many years of wide use of previous versions of the Joint Standards in the testing community. The Joint Standards, which are discussed in more detail below, apply to standardized measures generally recognized as tests, and also may be usefully applied to a broad range of system-wide standardized assessment procedures.3 For the sake of simplicity, this guide will refer to tests, regardless of the type of label that might otherwise be applied to them. The guide does not address teacher-created tests that are used for individual classroom purposes.

2 The purpose of this guide is to address tests that are used in making high-stakes decisions for individual students. In addition to using tests for high-stakes purposes for individual students, states and school districts are also using tests to hold schools and districts accountable for student performance. Although using tests for this purpose is not the focus of the guide, we have provided some useful background information about relevant principles and federal statutory requirements.

States and school districts are also using another important kind of assessment system for the purpose of promoting school and district accountability.
For example, under Title I of the Elementary and Secondary Education Act, states are required to develop content standards, performance standards, and assessment systems that measure the progress that schools and districts are making in educating students to the standards established by the state. Title I explicitly requires that such assessments be valid and reliable for their intended purpose and be consistent with relevant, nationally recognized technical and professional standards.4 When educators and policymakers consider using the same test for school or district accountability purposes and for individual student high-stakes purposes, they need to ensure that the test score inferences are valid and reliable for each particular use for which the test is being considered.

When high-stakes decisions are made, test scores are often used in conjunction with other criteria, such as grades and teacher recommendations. A test should not be used as the sole criterion for making a high-stakes decision unless it is validated for this use. The Joint Standards state that a high-stakes decision "should not be made on the basis of a single test score. Other relevant information should be taken into account if it will enhance the overall validity of the decision."5 As explained in the Joint Standards, "[w]hen interpreting and using scores about individuals or groups of students, considerations of relevant collateral information can enhance the validity of the interpretation, by providing corroborating evidence or evidence that helps explain student performance. ... As the stakes of testing increase for individual students, the importance of considering additional evidence to document the validity of score interpretations and the fairness in testing increases accordingly."6

3 The Joint Standards note that the applicability of the Joint Standards to an evaluation device or method is not altered by the label used (e.g., test, assessment scale, inventory).
A more complete discussion about the instruments covered by the Joint Standards can be found in the introduction section of that document. See Joint Standards, Introduction, pp. 3-4.

4 20 U.S.C. § 6311(b)(3)(C).

5 Standard 13.7 states, "In educational settings, a decision or characterization that will have major impact on a student should not be made on the basis of a single test score. Other relevant information should be taken into account if it will enhance the overall validity of the decision."

6 Joint Standards, p. 141.

Although this guide focuses on the use of tests in making high-stakes decisions, policymakers and the education community need to ensure that the operation of the entire high-stakes decision-making process does not result in the discriminatory denial of educational benefits or opportunities to students.7 Applicable standards for technical quality set forth in the Joint Standards are important principles to consider when other criteria affect high-stakes decisions. Educators should carefully monitor inputs into the high-stakes decision-making process and outcomes over time so that any potential discrimination arising from the use of any of the criteria can be identified and eliminated.

The guide focuses primarily on tests used in making high-stakes decisions at the elementary and secondary education level. However, it is important to recognize that the general principles of sound educational measurement apply equally to tests used at the elementary and secondary education level and at the post-secondary education level, including admissions and other types of test use.8 For example, post-secondary
admissions policies and practices should be derived from and clearly linked to an institution's overarching educational goals, and the use of tests in the admissions process should serve those institutional goals.

7 See Nondiscrimination Under Programs Receiving Federal Financial [...] Education Effectuation of Title VI of the Civil Rights Act of 1964, 34 [...] 100.3(b)(2) (1999); Nondiscrimination on the Basis of Handicap in Pr[...] Financial Assistance, 34 C.F.R. §§ 104.4(a), 104.4(b)(1)(i) and (iv), an[...] Basis of Sex in Education Programs and Activities Receiving or Bene[...] C.F.R. §§ 106.31(a) and 106.31(b) (1999).

8 For additional information regarding testing at the post-secondary le[...] Tradeoffs, 1999; Messick, S., Validity, in R.L. Linn, ed., Educational [...] 13-103, 1989; Wigdor, Alexandra K. and Garner, Wendell R., ed., Ab[...] Controversies, chapter 5, National Academy Press, 1982.

9 See High Stakes, p. 23 and National Research Council, Placing Children in Special Education: A Strategy for Equity, 1982.

II. Foundations of the Resource Guide

A. Professional Standards of Sound Testing Practices

Chapter one summarizes the leading professionally recognized standards of sound testing practices within the educational measurement field. They include those described in the Joint Standards (1999), which represent the primary statement of professional consensus regarding educational testing. Other leading professionally recognized standards of sound testing practices within the educational measurement field include the Code of Fair Testing Practices in Education (1988), and the Code of Professional Responsibilities in Educational Measurement (1995).
The guide also cites recent reports from the National Research Council's Board on Testing and Assessment, including High Stakes: Testing for Tracking, Promotion and Graduation (High Stakes, 1999), Myths and Tradeoffs: The Role of Tests in Undergraduate Admissions (Myths and Tradeoffs, 1999), Testing, Teaching, and Learning: A Guide for States and School Districts (Testing, Teaching, and Learning, 1999), Improving Schooling for Language-Minority Children: A Research Agenda (Improving Schooling for Language-Minority Children, 1997), and Educating One & All: Students with Disabilities and Standards-Based Reform (Educating One & All, 1997).10 These reports help explain or elaborate principles that are stated in the Joint Standards.

Designed to provide criteria for the evaluation of tests, testing practices, and the effects of test use, the Joint Standards recommend that all professional test developers, sponsors, publishers, and users make efforts to observe the Joint Standards and encourage others to do so.11 The Joint Standards include chapters on the test development process (with a focus primarily on the responsibilities of test developers), the specific uses and applications of tests (with a focus primarily on the responsibilities of test users), and the rights and responsibilities of test takers.

Because the Joint Standards are the most widely accepted professional standards that are relied upon in developing testing instruments, this guide includes a discussion of specific standards that are contained within the Joint Standards, where relevant. Numbered standards that are referenced throughout this guide refer to specific standards that are contained within the Joint Standards. In order to ensure that information presented in the guide is readable and accessible to educators and policymakers, we have paraphrased language from relevant standards. Our goal in paraphrasing is to be concise and accurate.
Where we have paraphrased in the text, we have also provided the full text of the relevant standards in the footnotes. Because the Joint Standards provide additional relevant discussion, we always encourage readers also to review the full document.

Professional test measurement standards provide important information that is relevant to making determinations about appropriate test use. The Joint Standards provide a frame of reference to assist in the evaluation of tests, testing practices, and the effects of test use. The Joint Standards caution that the acceptability of a test or test application does not rest on the literal satisfaction of every standard in the Joint Standards and cannot be determined by using a checklist.12 The exercise of professional judgment is a critical element in the interpretation and application of the standards,13 and the interpretation of individual standards should be considered in the overall context of the use of the test in question. Failure to meet a particular professional test measurement standard does not necessarily constitute a lack of compliance with federal civil rights laws.

10 The National Academy of Sciences, which is an independent, private, nonprofit entity, established the Board on Testing and Assessment in 1993 to help policymakers evaluate the use of tests, alternative assessments, and other indicators commonly used as tools of public policy. The Board provides guidance for judging the quality of testing or assessment technologies and the intended and unintended consequences of particular uses of these technologies. The Board concentrates on topics and conducts activities that serve the general public interest.

11 See, e.g., Joint Standards, Introduction, p. 2.

B.
Legal Standards

Chapter two of the guide discusses the federal constitutional, statutory and regulatory nondiscrimination principles that apply to the use of tests for high-stakes purposes. This guide is intended to reflect existing legal principles and does not establish new federal legal requirements. The primary legal focus of the resource guide is an explanation of principles that are clearly embedded in four nondiscrimination laws that have been enacted by Congress: Title VI of the Civil Rights Act of 1964 (Title VI), Title IX of the Education Amendments of 1972 (Title IX), Section 504 of the Rehabilitation Act of 1973 (Section 504), and Title II of the Americans with Disabilities Act of 1990 (Title II).14 Within the U.S. Department of Education, the Office for Civil Rights has responsibility for enforcing the requirements of these four statutes and their implementing regulations.

The due process and equal protection requirements of the Fifth and Fourteenth Amendments to the U.S. Constitution have also been applied by courts to issues regarding the use of tests in making high-stakes educational decisions. Although the Office for Civil Rights does not enforce federal constitutional provisions, a brief overview of these constitutional principles has been included for informational purposes.

12 Joint Standards, Introduction, p. 4.

13 Joint Standards, Introduction, p. 4.

14 Title VI prohibits discrimination on the basis of race, color and national origin in the programs and activities of recipients that receive federal financial assistance. The U.S. Department of Education's regulation implementing Title VI is found at 34 C.F.R. Part 100. Title IX prohibits discrimination on the basis of sex in educational programs and activities of recipients of federal financial assistance. The U.S. Department of Education's regulation implementing Title IX is found at 34 C.F.R. Part 106.
Section 504 prohibits discrimination on the basis of disability in the programs and activities of recipients of federal financial assistance. The U.S. Department of Education's regulation implementing Section 504 is found at 34 C.F.R. Part 104. Title II prohibits discrimination on the basis of disability by public entities, regardless of whether they receive federal funding. The U.S. Department of Education's regulation implementing Title II is found at 28 C.F.R. Part 35.

III. Basic Principles

The brief overview of the test measurement and legal principles that follows establishes the framework for more detailed discussions of test quality in chapter one and federal legal standards in chapter two.

A. Test Use Principles

1. Educational Objectives and Context

Tests that are used in educationally appropriate ways and that are valid for the purposes used are important instruments to help educators do their job. Before any state, school district, or educational institution administers a test, the objectives for using the test should be clear: What are the intended goals for and uses of the test in question? As an educational matter, the answer to this question will guide all other relevant inquiries about whether the test use is educationally appropriate. The context in which a test is to be administered, the population of test takers, and the intended purpose for which the test will be used are important considerations in determining which test would be appropriate for a specific use, as illustrated below:

a. Placement Decisions

Placement decisions are by their very nature used to make a decision about the future.
Tests used in placement decisions generally determine what kinds of programs, services, or interventions will be most appropriate for particular students. Decisions concerning the appropriate educational program for a student with a disability, placement in gifted and talented programs, and access to language services are examples of placement decisions. The Joint Standards state that there should be adequate evidence documenting the relationship among test scores, appropriate instructional programs, and beneficial student outcomes.15 When evidence about the relationship is limited, the test results should be considered in light of other relevant student information.16

15 Standard 13.9 states, "When test scores are intended to be used as part of the process for making decisions for educational placement, promotion, or implementation of prescribed educational plans, empirical evidence documenting the relationship among particular scores, the instructional programs, and desired student outcomes should be provided. When adequate empirical information is not available, users should be cautioned to weigh the test results accordingly in light of other relevant information about the student."

16 See id.

b. Promotion Decisions

Student promotion decisions are generally viewed as decisions incorporating a determination about whether a student has mastered the subject matter or content of instruction provided to date and a determination regarding whether the student will be able to master the content at the next grade level (a placement decision).17 At present, the focus of most school districts and states with promotion policies has been primarily on assessing mastery of curriculum taught at a given grade level.18
When a test given for promotion purposes is being used to certify mastery, the use of the test should adhere to professional standards for certifying knowledge and skills for all students.19 It is important that there be evidence that the test adequately covers only the content and skills that students have actually had an opportunity to learn.20 Educational institutions should have information indicating an alignment among the curriculum, instruction, and material covered on such a high-stakes test. To the extent that a test for promotion purposes is being used as a placement device, it should also adhere, as appropriate, to professional standards regarding tests used for placement purposes.21

17 See High Stakes, p. 123.

18 See American Federation of Teachers, Passing on Failure: District Promotion Policies and Practices, 1997.

19 See Standards 13.5 and 13.6; High Stakes, p. 123. Standard 13.5 states, "When test results substantially contribute to making decisions about student promotion or graduation, there should be evidence that the test adequately covers only the specific or generalized content and skills that students have had an opportunity to learn." Standard 13.6 states, "Students who must demonstrate mastery of certain skills or knowledge before being promoted or granted a diploma should have a reasonable number of opportunities to succeed on equivalent forms of the test or be provided with construct-equivalent testing alternatives of equal difficulty to demonstrate the skills or knowledge. In most circumstances, when students are provided with multiple opportunities to demonstrate mastery, the time interval between the opportunities should allow for students to have the opportunity to obtain the relevant instructional experiences."

20 See Standard 13.5, supra note 19; High Stakes, pp. 124-125.

21 See Standards 13.2 and 13.9; High Stakes, p.
123. Standard 13.2 states, "In educational settings, when a test is designed or used to serve multiple purposes, evidence of the test's technical quality should be provided for each purpose." See Standard 13.9, supra note 15.

c. Graduation Decisions

Graduation decisions are generally certification decisions: The diploma certifies that the student has reached an acceptable level of mastery of knowledge and skills.22 When large-scale standardized tests are used in making graduation decisions, there should be evidence that the test adequately covers only the content and skills that students have had an opportunity to learn.23 Therefore, all students should be provided a meaningful opportunity to acquire the knowledge and skills that are being tested, and information should indicate an alignment among the curriculum, instruction, and material covered on the test used as a condition for graduation.

2. Overarching Principles

The highly contextual and fact-based test measurement analyses applicable to a variety of circumstances ultimately focus upon the following question: Is there sufficient confidence in the test results at issue to allow for informed decisions to be made that will have specified consequences for the students taking the test? In the elementary and secondary education context, regardless of whether tests are being used to make placement, promotion, or graduation decisions, the National Academy of Sciences' Board on Testing and Assessment has identified three principal criteria, which are based on established professional standards, that can help inform and guide conclusions regarding this issue.24

(1) Measurement validity: Is a test valid for a particular purpose, and does it accurately measure the test taker's knowledge in the content area being tested?
State and local educational agencies and educational institutions should ensure that a test actually measures what it is intended to measure for all students. The inferences derived from the test scores for a given use - for a specific purpose, in a specific type of situation, and with specific types of students - are validated, rather than the test itself. It is important for educators who use the test to request adequate evidence of test quality (including validity and reliability evidence), evaluate the evidence, and ensure that the test is used appropriately in a way that is consistent with information provided by the developers or through supplemental validation studies.

22 See High Stakes, p. 166.

23 See Standard 13.5, supra note 19.

24 See High Stakes, p. 23 and National Research Council, Placing Children in Special Education: A Strategy for Equity, 1982.

(2) Attribution of cause: Does a student's performance on a test reflect knowledge and skills based on appropriate instruction, or is it attributable to poor instruction or to such factors as language barriers unrelated to the skills being tested?

In some contexts, whether a particular test use is appropriate depends on whether test scores are an accurate reflection of a student's knowledge or skills or whether they are influenced by extraneous factors unrelated to the specific skills being tested. For example, when tests are used in making student promotion or graduation decisions, state and local education agencies should ensure that all students have an equal opportunity to acquire the knowledge and skills that are being tested.25
In some situations, it may be necessary to provide appropriate accommodations for limited English proficient students and students with disabilities to accurately and effectively measure students' knowledge and skills in the particular content area being assessed.26

(3) Effectiveness of treatment: Do test scores lead to placements and other consequences that are educationally beneficial?

The most basic obligation of educators at the elementary and secondary level is to meet the needs of students as they find them, with their different backgrounds, and to teach knowledge and skills to allow them to grow to maturity with meaningful expectations of a productive life in the workforce and elsewhere.27 This elementary and secondary educational obligation is no less present when educators administer tests and evaluate and act on students' test results than it is during classroom instruction. Relying upon the sound premise that tests should be

25 See Standard 7.10, which states, "When the use of a test results in outcomes that affect the life chances or educational opportunities of examinees, evidence of mean test score differences between relevant subgroups of examinees should, where feasible, be examined for subgroups for which credible research reports mean differences for similar tests. Where mean differences are found, an investigation should be undertaken to determine that such differences are not attributable to a source of construct underrepresentation or construct-irrelevant variance. While initially the responsibility of the test developer, the test user bears responsibility for uses with groups other than those specified by the developer."

26 See Joint Standards, p. 143.

27 See Brown v. Bd. of Educ., 347 U.S. 483, 493 (1954) (stating that "[education] is required in the performance of our most basic public responsibilities, ... is the very foundation of good citizenship, ... [and] is [a] principal instrument ...
in preparing [the child] for later professional training ....").

integral to the learning and achievement of students, one federal court distinguished between testing in the employment and education settings:

If tests predict that a person is going to be a poor employee, the employer can legitimately deny the person the job, but if tests suggest that a young child is probably going to be a poor student, a school cannot on that basis alone deny that child the opportunity to improve and develop the academic skills necessary to success in our society.28

Tests, in short, should be instruments used by elementary and secondary educators to help students achieve their full potential. Test scores should lead to consequences that are educationally beneficial for students. When making high-stakes decisions that involve the use of tests, it is important for policymakers and educators to consider the intended and unintended consequences that may result from the use of the test scores.29

B. Legal Principles

Federal constitutional, statutory, and regulatory principles form the federal legal nondiscrimination framework applicable to the use of tests for high-stakes purposes. Title VI, Title IX, Section 504, and Title II, as well as the equal protection clause of the Fourteenth Amendment to the United States Constitution, prohibit intentional discrimination based on race, national origin, sex, or disability. In addition, the regulations that implement Title VI, Title IX, Section 504 and Title II prohibit intentional discrimination and policies or practices that have a discriminatory disparate impact on students based on their race, national origin, sex, or disability.30 The Section 504
regulation and the Individuals with Disabilities Education Ace' contain specifi'c provisions relative to the use of high-stakes tests for individuals with disabilities. 32 , . Larry P. v. Riles, 793 F.2d 969, 980 (9th Cir.1984)(quoting Larry P v. Riles, 495 F. Supp. 926, 969 (N.D. Cal. 1979».' . 28 29 Research indicates that students in low-track classes do not have the opportunity to acquire knowledge and skills strongly associated with future success that is offered to students in other tracks. The National Research Council recommends that neither test scores nor other information should be used to place students in such classes, See High . , . Stakes, 1999: 282. 30 34 C.FK § 100.3(b)(2); 34 C.F.R. §§ I 06.21 (b)(2), 106.36(b), 106.52; 34 C.F.R. § 104.4(b)(4)(i); and 28 CfR. § 35.130(b)(3). '. The authority of federal agencies to 'issue regulations with an "effects" standard has been consistently acknowledged by U.S .. Suprem.e Court deeisions and applied by lower federakourts addressing claims of discrimination in education. See, e.g., Lau v.Nichols, 414 U.S. 563, 568 (1974); Guardians Ass'n. v. City Service Comm'n. a/City o/N. Y., 463 U.S. 582; 584-593 (1983); Alexander v. Choate, 469 U.S. 287, 289-300 (J985): See also Memorandum from the Attorney General for Heads of Departments and Agencies that Provide Federal Financial Assistance, "Use of the Disparate. Impact Standard in Administrative Regulations under Title' VI ofthe Civil Rights Act of 1964," July 14, 1994. The IDEA establishes rights and protections for students with disabilities and their families. It also provides federal funds to local school districts and state. agencies to assist in erlucaiing students with disabiiities. Individuals with Disabilities Education Act, 20 U.S.c. § 1400(\)(c). . ' . 31 32 34 C.F.R. §§ \04.35, 104.42(b); 20 U.S.C. §§ 1412(a)(\7),1414(b); 34 C.F.R. § 300.138 - .139, 300.530 -'.536. 
Further discussion of issues regarding testing of limited English proficient students and students with disabilities is provided below.

1. Frameworks for Analysis

a. Different Treatment

Under federal law, policies and practices generally must be applied consistently to similarly situated individuals or groups, regardless of their race, national origin, sex, or disability. For example, a court concluded that a school district had intentionally treated students differently on the basis of race where minority students whose test scores qualified them for two or more ability levels were more likely to be assigned to the lower level class than similarly situated white students, and no explanatory reason was evident.33

In addition, educational systems that were previously segregated by race in violation of the Fourteenth Amendment and have not achieved unitary status have an obligation to dismantle their prior de jure segregation. In such instances, when a school district or other educational system uses a test or assessment procedure for a high-stakes purpose that has racially disproportionate effects, the school district or other educational system must show that the disparity is not traceable to prior intentional segregation or that the test or assessment procedure does not perpetuate the adverse effects of such segregation.34 The school district is under "a 'heavy burden' of showing that actions that increase[] or continue[] the effects of the dual system serve important and legitimate ends."35

b. Disparate Impact

Discrimination under federal law may also occur where the application of neutral criteria has discriminatory effects and those criteria are not educationally justified.
The federal nondiscrimination regulations provide that a recipient of federal funds may not "utilize criteria or methods of administration which have the effect of subjecting individuals to discrimination."36 (For a further discussion of issues related to testing of students with disabilities, see below.) The disparate impact analysis has been frequently misunderstood to indicate a violation of law based merely on disparities in student performance and to obligate educational institutions to change their policies and procedures to guarantee equal results. Under federal law, a statistically significant difference in outcomes creates the need for further examination of the educational practices in question that have caused the disparities in order to ensure accurate and nondiscriminatory decision making, but disparate impact alone is not sufficient to prove a violation of federal civil rights laws.

Courts applying the disparate impact test have generally examined three questions to determine if the practices at issue are discriminatory:

(1) Does the practice or procedure in question result in substantial differences in the award of benefits or services based on race, national origin or sex?
(2) Is the practice or procedure educationally justified?
(3) Is there an equally effective alternative that can accomplish the institution's educational goal with less disparity?37

Under the regulations implementing Title VI and Title IX, the party challenging the test has the burden of establishing disparate impact. If disparate impact is established, the educational institution must provide sufficient evidence of an educational justification for the practice in question. If sufficient evidence of an educational justification has been provided, the party challenging the test must then demonstrate, in order to prevail, that an alternative with less disparate impact is equally effective in meeting the institution's educational goals or needs.38

33 See People Who Care v. Rockford Bd. of Educ., 851 F. Supp. 905, 958-1001 (N.D. Ill. 1994), remedial order rev'd, in part, 111 F.3d 528 (7th Cir. 1997). On appeal, the Seventh Circuit Court of Appeals stated that the appropriate remedy in this case was to require the district to use objective, non-racial criteria to assign students to classes, rather than abolishing the district's tracking system. 111 F.3d at 536.
34 See also United States v. Fordice, 505 U.S. 717, 731-732 (1992); Debra P. v. Turlington, 644 F.2d 397, 407 (5th Cir. 1981); McNeal v. Tate County Sch. Dist., 508 F.2d 1017, 1020-1021 (5th Cir. 1975); GI Forum v. Texas Educ. Agency, No. SA-97-CA-1278-EP, 2000 U.S. Dist. LEXIS 153, slip op. at 56-57 (W.D. Tex. 2000).
35 Dayton Bd. of Educ. v. Brinkman, 443 U.S. at 538 (quoting Green v. County School Board, 391 U.S. 430, 439 (1968)).
36 See 34 C.F.R. § 100.3(b)(2) (Title VI); 34 C.F.R. § 104.4(b)(4)(i) (Section 504); and 28 C.F.R. § 35.130(b)(3)(i) (Title II). See also 34 C.F.R. § 106.31 (Title IX). In Guardians, 463 U.S. at 589, the United States Supreme Court upheld the use of the effects test, stating that the Title VI regulation forbids the use of federal funds "not only in programs that intentionally discriminate, but also in those endeavors that have a [racially disproportionate] impact on racial minorities."
37 Courts use a variety of terms when discussing whether an alternative offered by the party challenging the practice is feasible and would also effectively meet the institution's goals. See, e.g., Georgia State Conf. of Branches of NAACP v. Georgia, 775 F.2d 1403, 1417 (11th Cir. 1985) (party challenging the practice "may ultimately prevail by proffering an equally effective alternative practice which results in less racial disproportionality"); Sandoval v. Hagan, 7 F. Supp. 2d 1234, 1278 (M.D. Ala. 1998), aff'd, 197 F.3d 484, 507 (11th Cir. 1999) (plaintiff may prevail by offering a "comparably effective" alternative practice which results in less proportionality). These terms appear to be used synonymously.
38 See Georgia State Conf., 775 F.2d at 1417. See also the Department of Justice's Title VI Legal Manual at p. 2.

2. Principles Relating to Inclusion and Accommodations

a. Limited English Proficient Students

The obligations of states and school districts with regard to high-stakes testing of limited English proficient students in elementary and secondary schools must be examined within the overall context of their Title VI obligation to provide equal educational opportunities to limited English proficient students. Under Title VI, school districts have an obligation to identify limited English proficient students and to provide them with a program that enables them to acquire English-language proficiency as well as the knowledge and skills that all students are required to master.39

States or school districts using tests for high-stakes purposes must ensure that, as with all students, the tests effectively measure limited English proficient students' knowledge and skills in the particular content area being assessed. For limited English proficient elementary and secondary students in particular, it may be necessary in some situations to provide accommodations so that the tests provide accurate and valid information about the knowledge and skills intended to be measured.40

b. Students with Disabilities

Under Section 504, Title II, and the IDEA,41 school districts have a responsibility to provide students with disabilities with a free appropriate public education.
Providing effective instruction in the general curriculum for students with disabilities is an important aspect of providing a free appropriate public education. Under federal law, students with disabilities must be included in statewide or district-wide assessment programs and provided with appropriate accommodations, if necessary.42 There must be an individualized determination of whether a student with a disability will participate in a particular test and the appropriate accommodations, if any, that a student with a disability will need. The individualized determinations of whether a student with a disability will participate in a particular test, and what accommodations, if any, are appropriate must be addressed through the individualized education program (IEP) process or other applicable evaluation and placement processes and included in either the student's IEP or Section 504 plan.43

Under Section 504, post-secondary education institutions may not make use of any test or criterion for admission that has a disproportionate adverse impact on individuals with disabilities unless (1) the test or criterion, as used by the institution, has been validated as a predictor of success in the education program or activity and (2) alternate tests or criteria that have a less disproportionate adverse impact are not shown to be available by the party asserting that the test or criterion is discriminatory.44 Admissions tests must be selected and administered so as best to ensure that, when a test is administered to an applicant with a disability, the test results accurately reflect the applicant's aptitude or achievement level, rather than reflecting the effect of the disability (except where the functions impaired by the disability are the factors the test purports to measure).45 Admissions tests designed for persons with impaired sensory, manual, or speaking skills must be offered as often and in as timely a manner as are other admissions tests. Admissions tests must be offered in facilities that, on the whole, are accessible to individuals with disabilities.

39 See Equal Educational Opportunities Act of 1974, P.L. No. 93-380, codified at 20 U.S.C. §§ 1701-1720; Lau v. Nichols, 414 U.S. at 568-569; Castaneda v. Pickard, 648 F.2d 989, 1011 (5th Cir. 1981); Memorandum to OCR Senior Staff from Michael L. Williams, Former Assistant Secretary for Civil Rights, September 27, 1991 (hereinafter Williams Memorandum).
40 States and school districts are also required to provide LEP students with "reasonable adaptations and accommodations" in certain situations when using assessments for the purpose of holding schools and districts accountable for student performance under Title I. Title I of the Elementary and Secondary Education Act, 20 U.S.C. § 6311(a)(3)(F)(ii). Moreover, Title I requires States, to the extent practicable, to provide native-language assessments to LEP students for Title I accountability purposes if that is the language and form of assessment most likely to yield accurate and reliable information about what students know and can do. 20 U.S.C. § 6311(a)(3)(F)(iii). For a discussion of comparability issues arising in the testing of LEP students, see pages 38-42 of this guide.
41 The Section 504 regulation is found at 34 C.F.R. Part 104 (1999). The Title II regulation is found at 28 C.F.R. Part 35 (1999). The IDEA regulation is found at 34 C.F.R. Part 300 (1999).
42 States and school districts are also required to provide students with disabilities with "reasonable adaptations and accommodations" in certain situations when using assessments for the purpose of holding schools and districts accountable for student performance under Title I. 20 U.S.C. § 6311(a)(3)(F)(ii).
43 Under the IDEA, students with disabilities must be included in state and district-wide assessment programs. See 34 C.F.R. § 300.138(a). However, if the IEP team determines that a student should not participate in a particular statewide or district-wide assessment of student achievement (or part of such an assessment), the student's IEP must include statements of why that test is not appropriate for the student and how the student will be assessed. See 34 C.F.R. § 300.347(a)(5). The IDEA also requires state or local educational agencies to develop guidelines for students with disabilities who cannot take part in state and district-wide assessments to participate in alternate assessments; these alternate assessments must be developed and conducted beginning not later than July 1, 2000. See 34 C.F.R. § 300.138(b).
44 See 34 C.F.R. § 104.42(b)(2).
45 See 34 C.F.R. § 104.42(b)(3).

3. Federal Constitutional Questions Related to Testing of Elementary and Secondary Students For High-Stakes Purposes

The equal protection and due process requirements of the Fifth and Fourteenth Amendments to the U.S. Constitution would apply to ensure that high-stakes decisions by public schools or states based on test use are made appropriately.46 The equal protection principles involved in discrimination cases are, generally speaking, the same as the standards applied to intentional discrimination claims under the applicable federal nondiscrimination statutes.47

Courts addressing due process claims have examined three questions related to the use of tests as bases for promotion or graduation decisions:

• Is the purpose of the testing program legitimate and reasonable?48
• Have students received adequate notice of the test and its consequences?49
• Have students actually been taught the knowledge and skills measured by the test?50

Federal courts have typically deferred to educators' judgments about the beneficial educational purposes of a testing program, as long as these judgments are not arbitrary or capricious.51 Improving the quality of education, ensuring that students can compete on a national and international level, and encouraging educational achievement through the establishment of academic standards have been found to be reasonable goals for testing programs.52

46 The requirements of Title VI, Title IX and Section 504 apply only to recipients of federal financial assistance. The protections afforded by the Fifth and Fourteenth Amendments to the U.S. Constitution extend to actions by governmental entities that are "state actors" and are not dependent on their receipt of federal financial assistance.
47 Federal cases may involve equal protection challenges to a jurisdiction's use of tests in which the claim is not based on intentional race or sex discrimination, but, instead, on the alleged impropriety of the jurisdiction's use of tests to separate out those students who should not be allowed to graduate. As a general matter, courts express reluctance to second guess a state's educational policy choices when faced with such challenges, although they recognize that a state cannot "exercise that [plenary] power without reason and without regard to the United States Constitution." See Debra P. v. Turlington, 644 F.2d 397, 403 (5th Cir. 1981). When there is no claim of discrimination based on membership in a suspect class, the equal protection claim is reviewed under the rational basis standard. In these cases, the jurisdiction need show only that the use of the tests has a rational relationship to a valid state interest. See Debra P., 644 F.2d at 406; Erik V. v. Causby, 977 F. Supp. 384, 389 (E.D.N.C. 1997).
48 See Regents of the Univ. of Mich. v. Ewing, 474 U.S. 214, 222, 226-27 (1985); Debra P., 644 F.2d at 406; Anderson v. Banks, 520 F. Supp. 472, 506 (S.D. Ga. 1981).
49 See Brookhart v. Illinois State Bd. of Educ., 697 F.2d 179, 185 (7th Cir. 1983); Debra P., 644 F.2d at 404; Erik V., 977 F. Supp. at 389-90 (E.D.N.C. 1997); Anderson, 520 F. Supp. at 1410-12.
50 See Brookhart, 697 F.2d at 184-87; Debra P., 644 F.2d at 406; Anderson, 520 F. Supp. at 509. Insofar as due process cases may involve additional questions regarding the validity, reliability, and fairness of the test used to address the educational institution's stated purposes, these issues are discussed in the portions of the guide addressing discrimination under federal civil rights laws.
51 See Ewing, 474 U.S. at 226-27; Debra P., 644 F.2d at 406; Anderson, 520 F. Supp. at 506.
52 See Ewing, 474 U.S. at 226-27; Debra P., 644 F.2d at 406; Anderson, 520 F. Supp. at 506.

Courts have generally required advance notice of test requirements in order to give students a reasonable chance to understand the standards against which they will be evaluated and to learn the material for which they are to be accountable. A reasonable transition period is required between the development of a new academic requirement and the attachment of high-stakes consequences to tests used to measure academic achievement. That time period varies, however, depending upon the precise context in which the high-stakes decision is to be made.
Relevant inquiries affecting determinations about the constitutionality of notice and timing have included questions about the alignment of curriculum and instruction with material tested, the number of test taking opportunities provided to students, tutorial or remedial opportunities provided to students, and whether factors in addition to test scores can affect high-stakes decisions. Ultimately, in due process cases, federal courts have required, as a matter of "fundamental fairness," that students have a reasonable opportunity to learn the material covered by the test where passing the test is a condition of receipt of a high school diploma or a condition for grade-to-grade promotion.53 For the test to meaningfully measure student achievement, the test, the curriculum, and classroom instruction should be aligned.

53 See Brookhart, 697 F.2d at 184-87; Debra P., 644 F.2d at 406; GI Forum, 2000 U.S. Dist. LEXIS 153, slip op. at 50-51; Anderson, 520 F. Supp. at 509.

CHAPTER 1. Test Measurement Principles

This chapter explains basic test measurement standards and related educational principles for determining whether tests that are being used to make high-stakes educational decisions for students provide accurate and fair information. As explained in chapter two below, federal court decisions have been informed and guided by professional test measurement standards and principles. Professional test measurement standards, products of the test measurement community, can provide a basis for compliance with federal nondiscrimination laws.54 This chapter is intended as a helpful discussion of how to understand test measurement concepts and their use. These are not specific legal requirements, but rather are foundations for understanding appropriate test use.
Educational institutions use tests to accomplish specific purposes based on their educational goals, including making placement, promotion, graduation, admissions, and other decisions. It is only after they have determined the underlying goal they want to accomplish that they can identify the types of information that will best inform their decision making. Information may include test results, as well as other relevant measures, that will be able to effectively, accurately, and fairly address the purposes and goals specified by the institutions.55 As stated in the Joint Standards, "[w]hen interpreting and using scores about individuals or groups of students, considerations of relevant collateral information can enhance the validity of the interpretation, by providing corroborating evidence or evidence that helps explain student performance .... As the stakes of testing increase for individual students, the importance of considering additional evidence to document the validity of score interpretations and the fairness in testing increases accordingly."56

In using tests to make high-stakes decisions, educational institutions should ensure that the test will provide accurate results that are valid, reliable, and fair for all test takers. This includes requesting adequate evidence of test quality, evaluating the evidence, and ensuring that appropriate test use is based on adequate evidence provided by the developers or through supplemental validation studies.57 When test results are used to make high-stakes decisions about student promotion or graduation, evidence should be available which documents that students have had an adequate opportunity to learn the material being tested.58

54 See, e.g., High Stakes, p. 59-60.
55 Among other considerations, institutions will determine if they want test score interpretations that are norm-referenced or criterion-referenced, or both. Norm-referenced means that the performances of students are compared to the performances of other students in a specified reference population; criterion-referenced indicates the extent to which students have mastered specific knowledge and skills.
56 Joint Standards, p. 141. See also Standard 13.7, which states, "In educational settings, a decision or characterization that will have a major impact on a student should not be made on the basis of a single test score. Other relevant information should be taken into account if it will enhance the overall validity of the decision."
57 In order to provide educational institutions with tests that are accurate and fair, test developers should develop tests in accordance with professionally recognized standards, and provide educational institutions with adequate evidence of test quality.

I. Key Considerations in Test Use

This section addresses the fundamental concepts of test validity and reliability. It will also discuss issues associated with ensuring fairness in the meaning of test scores, and issues related to using appropriate cutscores in high-stakes tests.

A. Validity

Test validity refers to a determination of how well a test actually measures what it says it measures. The Joint Standards define validity as "[t]he degree to which accumulated evidence and theory support specific interpretations of test scores entailed by proposed uses of a test."59 The demonstration of validity is multifaceted and must always be determined within the context of the specific use of a test. In order to promote readability, the discussion on validity presented here is meant to reflect this complex topic in an accurate, but concise and user-friendly way.
The Joint Standards identify and discuss in detail principles related to determining the validity of test scores within the context of their use, and readers are encouraged to review the Joint Standards, Chapter 1, Validity, for additional, relevant discussion.60 There are three central points to keep in mind:

• The focus of validity is not really on the test itself, but on the validity of the inferences drawn from the test results for a given use.
• All validity is really a form of "construct validity."
• In validating the inferences of the test results, one must also consider the consequences of the test's interpretation and use.

58 Standards 13.5 and 7.5. Standard 13.5, supra note 19. Standard 7.5 states, "In testing applications involving individualized interpretations of test scores other than selection, a test taker's score should not be accepted as a reflection of standing on the characteristic being assessed without consideration of alternate explanations for the test taker's performance on that test at that time."
59 Joint Standards, p. 9, 184.
60 Joint Standards, Chapter 1, Validity, p. 9-24.

1. Validity of the Inferences of the Scores

It is not the test that is validated per se, but the inferences or meaning derived from the test scores for a given use, that is, for a specific purpose, in a specific type of situation, and with specific groups of students. The meaning of test scores will differ based on such factors as how the test is designed, the types of questions that are asked, and the documentation that supports how all groups of students are interpreting what the test is asking and how effectively their performance can be generalized beyond the test.

For instance, in one case, the educational institution may want to evaluate how well students can analyze complex issues and evaluate implications in history.
For a given amount of test time, they would want to use a test that measures the ability of students to think deeply about a few selected history topics. The meaning of the scores should reflect this purpose and the limits of the range of topics being measured on the test. In another case, the institution may want to assess how well students know a range of facts about a wide variety of historical events. The institution would want to use a test that measures a broad range of knowledge about many different occurrences in history. The inferences of the scores should accurately reflect how well students know a broad range of historical facts.

2. Construct Validity

Construct validity refers to the degree to which the scores of test takers accurately reflect the constructs a test is attempting to measure. The Joint Standards define a construct as "the concept or the characteristic that a test is designed to measure."61 Test scores and their inferences are validated to measure one or more constructs described in a particular content domain.62 In K-12 education, these domains are often explained in state or district content standards in various subject areas. For instance, in mathematics, constructs of mathematical problem solving and the knowledge of number systems would be among the constructs described in a state's elementary mathematics content standards. These standards would define the mathematics domain in this situation. Items would be selected for the test that sample from this domain, and are properly representative of the constructs identified within it. The meaning of the test scores should accurately reflect the knowledge and skills defined in the mathematics content standards domain.

61 Page 173.
62 The Joint Standards define a content domain as "the set of behaviors, knowledge, skills, abilities, attitudes or other characteristics to be measured by a test, represented in a detailed specification, and often organized into categories by which items are classified (p. 174)." A domain, then, represents a definition of a content area for the purposes of a particular test. Other tests will likely have a different definition of what knowledge and skills a particular content area entails.

Validity should be viewed as the overarching, integrative evaluation of the degree to which all accumulated evidence supports the intended interpretation of the test scores for a proposed purpose.63 This unitary and comprehensive concept of validity is referred to as "construct validity." Different sources of evidence may illuminate different aspects of validity, but they do not represent distinct types of validity.64 Therefore, "construct validity" is not just one of the many types of validity; it is validity. Demonstrating construct validity then means gathering a variety of types of evidence to support the intended interpretations and uses of test scores. All validity evidence and the interpretation of the evidence are focused on the basic question: Is the test measuring the concept, skill, or trait in question? Is it, for example, really measuring mathematical reasoning or reading comprehension for the types of students that are being tested? A variety of types of evidence can be used to answer this question, none of which provides a simple yes or no answer. The exact nature of the types of evidence that needs to be accumulated is directly related to the intended use of the test, which includes information regarding the skills and knowledge being measured, the purpose for which the information will be used, and the population of test takers.

For instance, an educational institution may want to use a test to help make promotion decisions.
It may also want to use a test to place students in the appropriate sequence of courses. In each situation, the types of validity evidence an institution would expect to see would depend on how the test is being used.

In making promotion decisions, the test should reflect content the student has learned. Appropriate validation would include adequate evidence that the test is measuring the constructs identified in the curriculum, and that the inferences of the scores accurately reflect the intended constructs for all test takers. Validation of the decision process involving the use of the test would include adequate evidence that low scores reflect lack of knowledge of students after they have been taught the material, rather than lack of exposure to the curriculum in the first place.

In making placement decisions, on the other hand, the test may not need to measure content that the student has already learned. Rather, at least in part, the educational institution may want the test to measure aptitude for the future learning of knowledge or skills that have been identified as necessary to complete a course sequence. Appropriate validation would include documentation of the relationship between what constructs are being measured in the test, and what skills and knowledge are actually needed in the future placements. Differential evidence would provide documentation that scores are not significantly confounded by other factors irrelevant to the knowledge and skills the test is intending to measure.

63 Joint Standards, Chapter 1, Validity, pp. 9-11, 184.
64 Therefore, construct validity can be seen as an umbrella that encompasses what has previously been described as predictive validity, content validity, criterion validity, discriminant validity, etc. Rather, these terms refer to types or sources of evidence that can be accumulated to support the validity argument. Definitions of these terms can be found in Appendix B, Measurement Glossary.
65 Rather than follow the traditional nomenclature (e.g., predictive validity, content validity, criterion validity, discriminant validity, etc.), the Joint Standards define sources of validity evidence as evidence based on test content, evidence based on response processes, evidence based on internal structure, evidence based on relations to other variables, and evidence based on consequences of testing. These are discussed in Chapter 1 of the Joint Standards, p. 11-17.

Institutions often think about using the same test for two or more purposes. This is appropriate as long as the validity evidence properly supports the use of the test for each purpose, and properly supports that the inferences of the results accurately reflect what the test is measuring for all students taking the test.

The empirical evidence related to the various aspects of construct validity is collected throughout test development, during test construction, and after the test is completed. It is important for educators and policymakers to understand and expect that the accumulated evidence spans the range of test development and implementation. There is not just one set of documentation collected at one point in time.

When the empirical database is large and includes results from a number of studies related to a given purpose, situation, and type of test takers, it may be appropriate to generalize validity findings beyond validity data gathered for one particular test use. That is, it may be appropriate to use evidence collected in one setting when determining the validity of the meaning of the test scores for a similar use.
If the accumulated validity evidence for a particular purpose, situation, or subgroup is small, or features of the proposed use of the test differ markedly from an adequate amount of validity evidence already collected, evidence from this particular type of test use will generally need to be compiled.66 Regardless of where the evidence is collected, educational institutions should expect adequate documentation of construct validity based on needs defined by the particular purposes and populations for which a test is being used.

a. Sources of Validity Error

When considering the types of construct validity evidence to collect, the Joint Standards emphasize that it is important to guard against the two major sources of validity error. This error can distort the intended meaning of scores for particular groups of students, situations, or purposes.67 One potential source of error omits some important aspects of the intended construct being tested. This is called construct underrepresentation.68

66 As indicated in the Joint Standards, "The extent to which predictive or concurrent evidence of validity generalization can be found in new situations is in large measure a function of accumulated research. Although evidence of generalization can often help to support a claim of validity in a new situation, the extent of available data limits the extent to which the claim can be sustained." Joint Standards, Chapter 1, pp. 15-16.
67 Joint Standards, Chapter 1, Validity, p. 10.
68 Messick, S. (1989). Validity. In Educational Measurement, 3rd Edition, R.L. Linn, ed. New York: Macmillan, pp. 13-103.

An example would be a test that is being used to measure English language proficiency.
When the institution has defined English language proficiency as including specific skills in listening, speaking, reading, and writing the English language, and wants to use a test which measures these aspects, construct underrepresentation would occur if the test only measured the reading skills.

The other potential source of error occurs when a test measures material that is extraneous to the intended construct, confounding the ability of the test to measure the construct that it intends to measure. This source of error is called construct irrelevance.69 For instance, how well a student reads a mathematics test may influence the student's subtest score in mathematics computation. In this case, the student's reading skills are irrelevant when the skill of mathematics computation is what is being measured by the subtest.70

An essential part of the accumulated validity information is collecting evidence not only about what a test measures in particular situations or for particular students, but also evidence that seeks to document that the intended meaning of the test scores is not unduly influenced by either of the two sources of validity error.

3. Considering the Consequences of Test Use

Evidence about the intended and unintended consequences of test use can provide important information about the validity of the inferences drawn from the test results, or it can raise concerns about an inappropriate use of a test where the inferences may be valid for other uses. For instance, significant differences in placement test scores based on race, gender, or national origin may trigger a further inquiry about the test and how it is being used to make placement decisions.71 The validity of the test scores would be called into question if the test scores are substantially affected by irrelevant factors that are not related to the academic knowledge and skills that the test is supposed to measure.72

Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons' responses and performances as scientific inquiry into score meaning. American Psychologist 50(9), pp. 741-749.
69 Messick, 1989; 1995.
70 On the other hand, if an item is measuring the student's ability to apply mathematical skills in a written format (for instance when an item requires students to fill out an order form), then writing skills may not be extraneous to the construct being measured in this item.
71 See Code of Fair Testing Practices in Education, 1988.
72 Standards 7.5, 7.6 and 1.24. Standard 7.5, supra note 58. Standard 7.6 states, "When empirical studies of differential prediction of a criterion for members of different subgroups are conducted, they should include regression equations (or an appropriate equivalent) computed separately for each group or treatment under consideration or an analysis in which the group or treatment variables are entered as moderator variables." Standard 1.24 states, "When unintended consequences result from test use, an attempt should be made to investigate whether such consequences arise from the test's sensitivity to characteristics other than those it is intended to assess or to the test's failure fully to represent the intended construct."

On the other hand, a test may accurately measure differences in the level of students' academic achievement. That is, low scores may accurately reflect that some students do not know the content. However, test users should ensure that they interpret those scores correctly in the context of their high-stakes decisions.73 For instance, test users could incorrectly conclude that the scores reflect lack of ability to master the content for some students when, in fact, the low test scores reflect the limited educational opportunities that the students have received. In this case, it would be problematic to use the test scores to place low performing students in a special services program for students who have trouble learning and processing academic content. It would be appropriate to use the test to evaluate program effectiveness, however.74

Standard 13.1

When educational testing programs are mandated by school, district, state, or other authorities, the ways in which test results are intended to be used should be clearly described. It is the responsibility of those who mandate the use of tests to monitor their impact and to identify and minimize potential negative consequences. Consequences resulting from the uses of the test, both intended and unintended, should also be examined by the test user.

B. Reliability

Reliability refers to the consistency of test results. While no test is ever an "error-free" measure of student performance,75 inferences of adequate test reliability refer to estimates which demonstrate that the inconsistency of the scores is minimized over test administrations, forms, items, scorers, and/or other facets of testing.76

73 Standards 7.5 and 7.10. Standard 7.5, supra note 58. Standard 7.10, supra note 25.
74 High Stakes, pp. 89-113.
75 All sources of assessment information, including test results, include some degree of error. There are two types of error. The first is random error that affects scores in such a way that sometimes students will score lower and sometimes higher than their "true" score (the actual mastery of the students' knowledge and skills). This type of error, also known as measurement error, particularly affects reliability of scores. Therefore, test scores are considered reliable when evidence demonstrates that there is a minimum amount of random measurement error in the test scores for a given group. The second type of error that affects test results is systematic error. Systematic error consistently affects scores in one direction; that is, this type of error causes some students to consistently score lower or consistently score higher than their "true" (or actual) level of mastery. For instance, visually impaired students will consistently score lower than they should on a test which has not been administered for them in Braille or large print, because their difficulty in reading the items on the page will negatively impact their score. This type of error generally affects the validity of the interpretation of the test results and is discussed in the validity section above. Systematic error should also be minimized in a test for all test takers. When educators and policymakers are evaluating the adequacy of a test for their local population of students, it is important to consider evidence concerning both types of error.
76 Evaluating the reliability of a test includes identifying the major sources of measurement error, the size of the errors resulting from these sources, the indication of the degree of reliability to be expected, and the generalizability of results across items, forms, raters, sampling, administrations, and other measurement facets.

An example of reliability of test results on different occasions is when the same students, taking the test multiple times, receive similar scores. Consistency over parallel forms of a test occurs when forms are developed to be equivalent in content and technical characteristics. Reliability can also include estimates of a high degree of relationship across similar items within a single test or subtest that are intended to measure the same knowledge or skill.
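As an illustration only, the consistency across similar items described above is often summarized with an internal-consistency coefficient. The sketch below computes one widely used coefficient, Cronbach's alpha, which is not prescribed by this guide. The function name and the sample data are hypothetical, and the sketch assumes every student answered every item.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Estimate internal consistency (Cronbach's alpha).

    item_scores: one row per student; each row lists that student's
    score on every item (hypothetical layout chosen for this sketch).
    """
    n_items = len(item_scores[0])
    if n_items < 2 or len(item_scores) < 2:
        raise ValueError("need at least two items and two students")

    # Variance of each item across students.
    item_variances = [
        variance([row[i] for row in item_scores]) for i in range(n_items)
    ]
    # Variance of the students' total scores.
    total_variance = variance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical scores for four students on three items.
example_scores = [[2, 3, 3], [4, 4, 5], [1, 2, 1], [5, 4, 5]]
print(round(cronbach_alpha(example_scores), 2))
```

A high coefficient indicates only that the items behave consistently with one another for the group analyzed. It does not show that the items measure the intended construct, which remains a validity question.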
For judgmentally scored tests, such as essays, another widely used index of reliability addresses consistency across raters or scorers. In each case, reliability can be estimated in different ways, using one of several statistical procedures.77 Different kinds of reliability estimates vary in degree and nature of generalization.

In order to promote readability, the discussion on reliability presented here is meant to reflect this complex topic in an accurate, but concise and user-friendly way. Readers are encouraged to review Chapter 2, Reliability and Errors of Measurement, in the Joint Standards for additional, relevant information.78

77 These types of reliability estimates are known as test-retest, alternate forms, internal consistency, and inter-rater estimates, respectively. See Joint Standards, Chapter 2, Reliability, for some examples of different procedures.
78 Joint Standards, Chapter 2, Reliability and Errors of Measurement, pp. 25-36.

C. Fairness

Tests are fair when they yield score interpretations that are valid and reliable for all students who take the tests. That is, the academic tests must measure the same academic constructs (knowledge and skills) for all students who take them, regardless of race, national origin, gender, or disability. Similarly, the scores must not substantially and systematically underestimate or overestimate the knowledge or skills of members of a particular group. The Joint Standards discuss fairness in testing in terms of lack of bias, equitable treatment in the testing process, equal scores for students who have equal standing on the tested construct, and equity in opportunity to learn the material being tested.79

In order to promote readability, the discussion on fairness presented here is meant to reflect this complex topic in an accurate, but concise and user-friendly way. Readers are encouraged to review Chapter 7, Fairness in Testing and Test Use, in the Joint Standards for additional, relevant information.80

1. Fairness in Validity

Demonstrating fairness in the validation of test score inferences focuses primarily on making sure that the scores reflect the same intended knowledge and skills for all students taking the test. For the most part this means that the test should minimize the measurement of material that is extraneous to the intended constructs and which confounds the ability of the test to accurately measure the constructs that it intends to measure. Rather, a test score should accurately reflect how well each student has mastered the intended constructs. The score should not be significantly impacted by construct irrelevant influences.

79 Joint Standards, Chapter 7, Fairness in Testing and Test Use, pp. 74-80. In test measurement, the term fairness has a specific set of technical interpretations. Four of these interpretations are discussed in the Joint Standards. For instance, bias is discussed in relation to fairness and is defined in the Joint Standards in two ways: "In a statistical context, (bias refers to) a systematic error in a test score. In discussing test fairness, bias (also) may refer to construct underrepresentation or construct-irrelevant components of test scores that differentially affect the performance of different groups of test takers (p. 172)." Fairness as equitable treatment in the testing process "requires consideration not only of the test itself, but also the context and purpose of testing, and the manner for which test scores are used (p. 74)." Equal scores for students of equal standing reflects that "examinees of equal standing with respect to the construct the test is intended to measure should on average earn the same test score, irrespective of group membership (p. 74)." For educational achievement tests, "When some test takers have not had the opportunity to learn the subject matter covered by the test content, they are likely to get low scores ... low scores may have resulted in part from not having had the opportunity to learn the material tested as well as from having had the opportunity and failed to learn (p. 76)."
80 Joint Standards, Chapter 7, Fairness in Testing and Test Use, pp. 73-84.

The Joint Standards identify a number of standards that outline important elements related to validly measuring the intended constructs for all students.81 The elements span considerations of test development, test implementation, and the proper use of reported test results.

Documenting fairness during test development involves gathering adequate evidence that items and test scores are constructed so that the inferences validly reflect what is intended. For all test takers, evidence should support that valid inferences can be drawn from the scores.82 When credible research reports that item and test results differ in meaning across examinee subgroups, then to the extent feasible, separate validity evidence for each relevant subgroup should be collected.83 When items function differently across relevant subgroups, appropriate studies should be conducted, when feasible, so that bias in items due to test design, content, and format is detected and eliminated.84 Developers should strive to identify and eliminate language, form, and content in tests that have a different meaning in one subgroup than in others, or that generally have sensitive connotations, except when judged to be necessary for adequate representation of the intended constructs.85 Adequate differential analyses should be conducted when evaluating the validity of scores for prediction purposes.86

81 Joint Standards, Chapter 7, Fairness in Testing and Test Use, pp. 80-84.
82 Standard 7.2 states, "When credible research reports differences in the effects of construct-irrelevant variance across subgroups of test takers on performance of some part of the test, the test should be used if at all only for those subgroups for which evidence indicates that valid inferences can be drawn from test scores."
83 Standards 7.1 and 7.3. Standard 7.1 states, "When credible research reports that test scores differ in meaning across examinee subgroups for the type of test in question, then to the extent feasible, the same forms of validity evidence collected for the examinee population as a whole should also be collected for each relevant subgroup. Subgroups may be found to differ with respect to appropriateness of test content, internal structure of test responses, the relation of test scores to other variables, or the response processes employed by individual examinees. Any such findings should receive due consideration in the interpretation and use of scores as well as in subsequent test revisions." Standard 7.3 states, "When credible research reports that differential item functioning exists across age, gender, racial/ethnic, cultural, disability and/or linguistic groups in the population of test takers in the content domain measured by the test, test developers should conduct appropriate studies when feasible. Such research should seek to detect and eliminate aspects of test design, content, and format that might bias test scores for particular groups."
84 See Standard 7.3, supra note 83.
85 Standard 7.3 and Standard 7.4. Standard 7.3, supra note 83. Standard 7.4 states, "Test developers should strive to identify and eliminate language, symbols, words, phrases, and content that are generally regarded as offensive by members of racial, ethnic, gender, or other groups, except when judged to be necessary for adequate representation of the domain." Comment: "Two issues are involved. The first deals with the inadvertent use of language that, unknown to the test developer, has a different meaning or connotation in one subgroup than in others. Test publishers often conduct sensitivity reviews of all test material to detect and remove sensitive material from the test. The second deals with settings in which sensitive material is essential for validity. For example, history tests may appropriately include material on slavery or Nazis. Tests on subjects from life sciences may appropriately include material on evolution. A test of understanding of an organization's sexual harassment policy may require employees to evaluate examples of potentially offensive behavior."
86 Standard 7.6, supra note 72.

Adequate evidence should document the fair implementation of tests for all test takers. The testing process should reflect equitable treatment for all examinees.87 Linguistic or reading demands in tests should be kept to a minimum except when these constructs are being measured.88

Documentation of appropriate reporting and test use should be available. Reported data should be clear and accurate, especially when there are high-stakes consequences for students.89 When tests are used in decisions that have high-stakes consequences for students, evidence of mean score differences between relevant subgroups should be examined, where feasible.
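As a minimal sketch of what examining mean score differences can involve, the example below expresses the difference between two subgroups in pooled standard deviation units. This statistic is one common convention and is not specified by the guide or quoted from the Joint Standards. The function name and the scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(group_a, group_b):
    """Difference in mean scores, in pooled standard deviation units.

    group_a, group_b: lists of test scores for two subgroups
    (hypothetical inputs; each group needs at least two scores).
    """
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighting by degrees of freedom.
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    if pooled_variance == 0:
        raise ValueError("scores do not vary; the difference is undefined")
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)


# Hypothetical scores for two subgroups.
print(standardized_mean_difference([2, 4, 6], [1, 3, 5]))
```

A difference of this kind is descriptive only. By itself it does not show that a test is biased, and it does not show that the test is sound.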
When mean differences are found between subgroups, investigations should be undertaken to determine that such differences are not attributable to construct underrepresentation or construct irrelevant error.90 Evidence about differences in mean scores and the significance of the validity errors should also be considered when deciding which test to use.91 In using test results for purposes other than selection, a test taker's score should not be accepted as a reflection of standing on the intended constructs without consideration of alternative explanations for the test taker's performance.92 Explanations might reflect limitations of the test, for instance construct irrelevant factors may have significantly impacted the student's score. Explanations may also reflect schooling factors external to the test, for instance lack of instructional opportunities.

87 Standard 7.12 states, "The testing or assessment process should be carried out so that test takers receive comparable and equitable treatment during all phases of the testing or assessment process."
88 Standard 7.7 states, "In testing applications where the level of linguistic or reading ability is not part of the construct of interest, the linguistic or reading demands of the test should be kept to the minimum necessary for the valid assessment of the intended construct."
89 Standards 7.8, 7.9, 7.10, 1.24. Standard 7.8 states, "When scores are disaggregated and publicly reported for groups identified by characteristics such as gender, ethnicity, age, language proficiency, or disability, cautionary statements should be included whenever credible research reports that test scores may not have comparable meaning across these different groups." Standard 7.9 states, "When tests or assessments are proposed for use as instruments of social, educational, or public policy, the test developers or users proposing the test should fully and accurately inform policymakers of the characteristics of the tests as well as any relevant and credible information that may be available concerning the likely consequences of test use." Standard 7.10, supra note 25. Standard 1.24, supra note 72.
90 Standard 7.10, supra note 25.
91 Standard 7.11 states, "When a construct can be measured in different ways that are approximately equal in their degree of construct representation and freedom from construct-irrelevant variance, evidence of mean score differences across relevant subgroups of examinees should be considered in deciding which test to use."
92 Standard 7.5, supra note 58.

The issue of feasibility is discussed in a few of the standards summarized above. In the comments associated with these standards, feasibility is generally addressed in terms of adequate sample size, with continued operational use of a test as a way of accumulating adequate numbers of subgroup results over administrations. When credible research reports that results differ in meaning across subgroups, collecting separate and parallel validity data verifies that the same knowledge and skills are being measured for all test takers. Particularly in high-stakes situations, feasibility decisions need to include the potential costs to students of using information where the validity of the scores has not been verified.93

2. Fairness in Reliability

Fairness in reliability focuses on making sure that scores are stable and consistently accurate for all students. Two standards discuss issues of fairness in reliability. First, when there are reasons for expecting that test reliability analyses might differ substantially for different subpopulations, reliability data should be presented as soon as feasible for each major population for whom the test is recommended.94 Second, "[w]hen significant variations are permitted in test administration procedures, separate reliability analyses should be provided for scores produced under each major variation if adequate sample sizes are available."95 Often, continued operational use of a test is a way to accumulate an adequate sample size over administrations.

D. Cutscores

The same principles regarding fairness, validity, and reliability apply generally to the establishment and use of cutscores for the purpose of making high-stakes educational decisions. Cutscores, also known as cut points or cutoff scores, are specific points on the test or scale where test results are used to divide levels of knowledge, skill, or ability. A cutscore may divide the demonstration of acceptable and unacceptable skills, as in placement in gifted and talented programs where students are accepted or rejected. There may be multiple cutscores that identify qualitatively distinct levels of performance. Cutscores are used in a variety of contexts, including decisions for placement purposes or for other specific outcomes, such as graduation, promotion, or admissions.96

93 See comment associated with Standard 10.7: "In addition to modifying tests and test administration procedures for people who have disabilities, evidence of validity for inferences drawn from these tests is needed. Validation is the only way to amass knowledge about the usefulness of modified tests for people with disabilities. The costs of obtaining validity evidence should be considered in light of the consequences of not having usable information regarding the meanings of scores for people with disabilities. This standard is feasible in the limited circumstances where a sufficient number of individuals with the same level or degree of a given disability is available" (italics added).
94 Standard 2.11 states, "If there are generally accepted theoretical or empirical reasons for expecting that reliability coefficients, standard errors of measurement, or test information functions will differ substantially for various subpopulations, publishers should provide reliability data as soon as feasible for each major population for which the test is recommended."
95 Standard 2.18.
96 In order to promote readability, the discussion on cutscores presented here is meant to reflect this complex topic in an accurate, but concise and user-friendly way. Readers are encouraged to review Chapter 4, Scales, Norms, and Score Comparability, pp. 53-54, in the Joint Standards for additional, relevant information about cutscores. See also Standards 1.19, 13.9. Standard 1.19 states, "If a test is recommended for use in assigning persons to alternative treatments or is likely to be so used, and if outcomes from those treatments can reasonably be compared on a common criterion, then, whenever feasible, supporting evidence of differential outcomes should be provided." Standard 13.9, supra note 15.

Many of the concepts regarding test validity apply to cutscores; that is, the cut points themselves must be accurate representations of the knowledge and skills of students.97 Further, the validity evidence for cutscores should generally be able to demonstrate that students above the cut point represent or demonstrate a qualitatively greater degree or different type of skills and knowledge than those below the cut point, whenever these types of inferences are made.98 Reliability of the cutscores is also important.
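To make the role of reliability near a cutscore concrete, the sketch below uses the classical test theory formula for the standard error of measurement, SEM = SD x sqrt(1 - reliability), and flags scores that fall within a chosen number of SEMs of the cutscore. This is a simplified illustration under stated assumptions: it uses a single overall SEM rather than an error estimate specific to the region of the cutscore, and the function names, band width, and numbers are hypothetical.

```python
from math import sqrt


def standard_error_of_measurement(score_sd, reliability):
    """Classical SEM: score standard deviation times sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return score_sd * sqrt(1.0 - reliability)


def within_error_band(score, cutscore, sem, n_sems=1.0):
    """True when a score lies within n_sems standard errors of the cutscore.

    Scores inside the band are those most exposed to misclassification
    from measurement error alone (n_sems is an illustrative choice).
    """
    return abs(score - cutscore) <= n_sems * sem


# Hypothetical test: score SD of 10 points, reliability estimate of 0.91.
sem = standard_error_of_measurement(10.0, 0.91)
for score in (60, 68, 71, 80):
    print(score, within_error_band(score, cutscore=70, sem=sem))
```

Students whose scores fall inside the band could plausibly land on either side of the cutscore on another administration. This is one reason the amount of error around each cutscore matters for high-stakes decisions.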
The Joint Standards state that where cutscores are specified for selection or placement, the degree of measurement error around each cutscore should be reported.99 Evidence should also indicate the misclassification rates, or percentage of error in classifying students, that is likely to occur among students with comparable knowledge and skills.100 This information should be available by group as soon as feasible if there is a prior probability that the misclassification rates may differ substantially by group.101 For example, what percentage of students who should be allowed to graduate would not be allowed to do so because of error due to the test rather than differences in their actual knowledge and skills?102

There is no single right answer to the questions of when, where and how cutscores should be set on a test with high-stakes consequences for students.103 Many experts suggest, however, that multiple methods of determining cutscores should be used when determining a final cutscore.104 Further, the reasonableness of the standard setting process and the consequences for students should be clearly and specifically documented for a given use.105 Both the Joint Standards and High Stakes repeatedly state that decisions should not be made solely or automatically on the basis of a single test score, and that other relevant information should be taken into account if it will enhance the overall validity of the decision.106

97 Joint Standards, Chapter 1, Validity, pp. 9-16, discusses that the interpretation of all scores should be an accurate representation of what is being measured.
98 See Standard 4.20's comment section for a discussion on these points. In high-stakes situations, it is important to examine the validity of the inferences that underlie the specific decisions being made on the basis of the cutscores. In other words, what must be validated is the specific use of the test based on how the scores of students above and below the cutscore are being interpreted. What is also at issue is how scores clustered around the cut-off point are interpreted in light of the high-stakes decision.
99 Standard 2.14 states, "Conditional standard errors of measurement should be reported at several score levels if constancy cannot be assumed. Where cut scores are specified for selection or classification, the standard errors of measurement should be reported in the vicinity of each cut score."
100 "Where the purpose of measurement is classification, some measurement errors are more serious than others. An individual who is far above or far below the value established for pass/fail or for eligibility for a special program can be mismeasured without serious consequences. Mismeasurement of examinees whose true scores are close to the cut score is a more serious concern .... The term classification consistency or inter-rater agreement, rather than reliability, would be used in discussions of consistency of classification. Adoption of such usage would make it clear that the importance of an error of any given size depends on the proximity of the examinee's score to the cut score." Joint Standards, p. 30.
101 Standard 2.11, supra note 94.
102 Misclassification of students above or below the cutpoints can result in both false positive and false negative classifications, respectively. The example in the text is a false negative classification.
103 High Stakes, Chapter 7, p. 168.
104 High Stakes, Chapter 7, p. 169.
105 See Standards 4.19 and 4.21 and their comments. See also High Stakes, Chapters 5, 6, 7. Standard 4.19 states, "When proposed score interpretations involve one or more cut scores, the rationale and procedures used for establishing cut scores should be clearly documented."
Standard 4.21 states, "When cut scores defining pass-fail or proficiency categories are based on direct judgments about the adequacy of item or test performances or performance levels, the judgmental process should be designed so that judges can bring their knowledge and experience to bear in a reasonable way."
106 See High Stakes, Chapters 5, 6, 7; Joint Standards, Standard 13.7. Standard 13.7, supra note 56.

Test Measurement Principles: Questions about Appropriate Test Use

In order to determine if a test is being used appropriately in making high-stakes decisions about students, considerations about the context of the test use, and the validity, reliability, and fairness of the scores and their interpretations need to be addressed. In all cases, it is important that the evidence related to the technical merits of the test be based on the current test being proposed.

1. What is the purpose for which the test is being used?
2. What information, besides the test, is being collected to inform this purpose?
3. Based on how the test results are to be used, is there adequate evidence of validity to document that the test score inferences are accurate and meaningful for the students taking the test? That is,
   • Does the evidence support that the inferences accurately reflect the specific knowledge and skills the test says it measures?
   • Does the evidence support that the inferences are valid for the stated purpose, and in the particular type of setting where the test is to be administered?
   • Does the evidence support that the inferences are valid for the specific groups of students who are taking the test?
4. Is there adequate evidence of reliability of the test scores for the proposed use?
5. Is there adequate evidence of fairness in validity and reliability to document that the test score inferences are accurate and meaningful for all students taking the test? That is,
   • Does the evidence support that the inferences are measuring the same constructs for all students?
   • Does the evidence support that the scores do not systematically underestimate or overestimate the knowledge or skills of members of a particular group?
   • Does the evidence demonstrate validity and reliability of the score inferences for each relevant subgroup when a prior probability exists that, across examinee subgroups, test scores may differ in meaning or that the reliability of the scores may vary substantially?
6. Is there adequate evidence that cutscores have been properly established and that they will be used in ways that will provide accurate and meaningful information for all test takers?

II. Accuracy in Testing Limited English Proficient Students and Students with Disabilities

All aspects of validity, reliability, fairness, and cutscores discussed above are applicable to the measurement of knowledge and skills of all students, including limited English proficient students 107 and students with disabilities. This section addresses additional issues related to accurately measuring the knowledge and skills of these two student populations. Ensuring that test score inferences accurately reflect the intended constructs for all students is a complex task. It involves several aspects of test construction, pilot testing, implementation, analysis, and reporting.
The appropriate inclusion of students from these populations in validation and norming samples, and the meaningful inclusion of limited English proficient experts and disability experts throughout the test development process, help ensure suitable test quality and use for all test takers. The proper inclusion of all students in testing programs helps to ensure that high-stakes decisions are made on the basis of test results that are as comparable as possible across all test takers, rather than on the basis of results from assessments that are developed to measure different content domains. 108 The appropriate inclusion of all students can also help to ensure that educational benefits attributable to the high-stakes decisions will be available to all.

In some cases, it is appropriate to test limited English proficient students and students with disabilities under standardized conditions, as long as the evidence supports the validity of the scores in a given situation for these students. In other cases, the conditions may have to be accommodated to assure that the scores validly reflect the students' mastery of the intended constructs. 109 The use of multiple measures generally enhances the accuracy of the educational decisions, and these measures can be used to confirm the validity of the test results.

107 These are students who are learning English as a second language. Other documents sometimes refer to these students as English language learners.
108 High Stakes, p. 7, 80.
109 See Joint Standards, Chapter 7, Fairness in Testing and Test Use; Chapter 9, Testing Individuals of Differing Linguistic Backgrounds; Chapter 10, Testing Individuals with Disabilities.

A. General Considerations about Accommodations

Making similar inferences about academic test scores for all test takers, and making appropriate decisions when using these scores, requires measuring the same academic constructs (knowledge and skills in specific subject areas) across groups and contexts. In measuring the knowledge and skills of limited English proficient students and students with disabilities, it is particularly important that the tests actually measure the intended knowledge and skills and not other factors which are extraneous to the intended construct. 110 For instance, impaired visual capacity may influence a student's test score in science when the student must sight read a typical paper and pencil science test. In measuring science skills, the student's sight is likely not relevant to her knowledge of science. Similarly, how well a limited English proficient student reads English may influence the student's test score in mathematics when the student must read the test. In this case, the student's reading skills are not relevant when the skills of mathematics computation are to be measured.

Typically, accommodations to established conditions are found in three main phases of testing: 1) the administration of tests, 2) how students are allowed to respond to the items, and 3) the presentation of the tests (how the items are presented to the students on the test instrument). Administration accommodations involve setting and timing, and can include extended time to counteract the increased literacy demands or fatigue for a student with learning or physical disabilities. Response accommodations allow students to demonstrate what they know in different ways. Presentation accommodations can include format variations such as fewer items per page, and plain language editing procedures, which use short sentences, common words, and active voice. There is a wide variation in which accommodations are used across states and school districts. (Appendix C lists many of the accommodations used in large scale testing for limited English proficient students and students with disabilities.)

Standard 10.1
In testing individuals with disabilities, test developers, test administrators, and test users should take steps to ensure that the test score inferences accurately reflect the intended construct rather than any disabilities and their associated characteristics extraneous to the intent of the measurement.

Issues regarding the use of accommodations are complex. When the possible use of an accommodation for a student is being considered, two questions should be examined: 1) What is being measured if conditions are accommodated? 2) What is being measured if the conditions remain the same? The decision to use an accommodation or not should be grounded in the ultimate goal of collecting test information that accurately and fairly represents the knowledge and skills of the student on the intended constructs. The overarching concern should be that test score inferences accurately reflect the intended constructs rather than factors extraneous to the intent of the measurement. 111

110 This is known as construct irrelevance. See p. 25 above; Joint Standards, p. 173-174.
111 Standards 9.1, 10.1; Messick, 1989. Standard 9.1 states, "Testing practice should be designed to reduce threats to the reliability and validity of test score inferences that may arise from language differences." Standard 10.1 states, "In testing individuals with disabilities, test developers, test administrators, and test users should take steps to ensure that the test score inferences accurately reflect the intended construct rather than any disabilities and their associated characteristics extraneous to the intent of the measurement." Messick (1989), supra note 68.

B.
Limited English Proficient Students

The Joint Standards and several recent measurement publications discuss the population of limited English proficient students and how test publishers and users have handled inclusion in tests to date. 112 This section briefly outlines principles derived from the Joint Standards and these publications. It addresses two types of testing situations especially relevant for limited English proficient students: the assessment of English language proficiency and the assessment of academic educational achievement.

Interpretation of the scores of limited English proficient students should accurately and fairly reflect the academic knowledge, skills, or abilities that the test intends to measure, minimizing the effect of factors irrelevant to the intended constructs. 113 When credible research evidence reports that scores may differ in meaning across subgroups of linguistically diverse test takers, then, to the extent feasible, the same form of validity evidence should be collected for each subgroup as for the examinee population as a whole. 114 "When a test is recommended for use with linguistically diverse test takers, test developers and publishers should provide the information necessary for appropriate test use and interpretation;" 115 recommended accommodations should be used appropriately and described in detail in the test manual; 116 translation methods and interpreter expertise should be clearly described; 117 and evidence of the reliability and validity of the translated test score's inferences should be collected and made available in order to support sound test use by educators and policy makers. 118

Standard 9.1
Testing practice should be designed to reduce threats to the reliability and validity of test score inferences that may arise from language differences.

112 For instance, Joint Standards, Chapter 9; High Stakes, Chapter 9; Improving Schooling for Language Minority Children: A Research Agenda (National Research Council, August and Hakuta, 1997); Ensuring Accuracy in Testing for English Language Learners (Kopriva, 2000, Washington, D.C.: Council of Chief State School Officers).
113 See Standard 9.1, supra note 111.
114 Standard 9.2 states, "When credible research evidence reports that test scores differ in meaning across subgroups of linguistically diverse test takers, then to the extent feasible, test developers should collect for each linguistic subgroup studied the same form of validity evidence collected for the examinee population as a whole."
115 Standard 9.6. Standard 9.5 states, "When there is credible evidence of score comparability across regular and modified tests or administrations, no flag should be attached to a score. When such evidence is lacking, specific information about the nature of the modification should be provided, if permitted by law, to assist test users properly to interpret and act on test scores."
116 Standard 9.4 states, "Linguistic modifications recommended by test publishers, as well as the rationale for the modifications, should be described in detail in the test manual."
117 Standards 9.7, 9.11. Standard 9.7 states, "When a test is translated from one language to another, the methods used in establishing the adequacy of the translation should be described, and empirical and logical evidence should be provided for score reliability and the validity of the translated test's score inferences for the uses intended in the linguistic groups to be tested." Standard 9.11 states, "When an interpreter is used in testing, the interpreter should be fluent in both the language of the test and the examinee's native language, should have expertise in translating, and should have a basic understanding of the assessment process."

1. Assessing English Language Proficiency

Issues of validity, reliability, and fairness apply to tests and other relevant assessments that measure English language proficiency. English language proficiency is typically defined as proficiency in reading, writing, speaking, and understanding English. 119 Assessments that measure English language proficiency are generally used to make decisions about who should receive English language acquisition services, the type of programs in which these students are placed, and the progress of students in the appropriate programs. They are also used to evaluate the English proficiency of students when exiting from services, to ensure that they can successfully participate in the regular school curriculum. In making decisions about which tests are appropriate, it is particularly important to make sure that the tests accurately and completely reflect the intended English language proficiency constructs so that the students are not misclassified. It is generally accepted that a range of communicative abilities will typically need to be assessed when placement decisions are being made. 120

Standard 9.10
Inferences about test takers' general language proficiency should be based on tests that measure a range of language features, and not on a single linguistic skill.

118 Standard 9.7, supra note 117.
119 Improving Schooling for Language Minority Children, p. 116-118.
120 Comment under Standard 9.10, p. 99-100. Standard 9.10 states, "Inferences about test takers' general language proficiency should be based on tests that measure a range of language features, and not on a single linguistic skill."

2.
Testing the Academic Educational Achievement of Limited English Proficient Students

Several factors typically affect how well the educational achievement of limited English proficient students is measured on standardized academic tests. For all test takers, any test that employs written or oral skills in English or in another language is, in part, a measure of those skills in the particular language. Test use with individuals who have not sufficiently acquired the literacy or linguistic skills in the language of the test may introduce construct-irrelevant components to the testing process. In such instances, test results may not reflect accurately the qualities and competencies intended to be measured. 121

While it is very important that the test score inferences are valid, reliable, and fair, the technical issues associated with developing meaningful achievement tests for this population are complex and difficult to resolve. Tests must be developed so that they effectively measure the students' knowledge and skills in intended academic achievement constructs rather than factors irrelevant to those constructs, i.e., literacy skills when literacy is not what is being measured. This is particularly important when tests are used to make high-stakes decisions for individual students. Reducing the influence of construct irrelevant factors includes minimizing the confounding conditions in the test or the testing process so that the students can access the test requirements. 122 It also includes providing native language tests where possible, when this approach would yield more accurate results for limited English proficient students. 123 In collecting evidence to support the technical quality of a test for these students, the accumulation of data may need to occur over several test administrations to ensure robust sample sizes.

a.
Background Factors for Limited English Proficient Students

The background factors particularly salient in ensuring accuracy in testing for students with limited English proficiency tend to relate to literacy, culture, and schooling. 124 Limited English proficient students often bring varying levels of English and home language literacy skills to the testing situation. 125 These students may be adept in conversing orally in their home language, but unless they have had formal schooling in their home language, they may not have a corresponding level of literacy. Also, while students with limited English proficiency may acquire a degree of oral proficiency in English, literacy in English for many students comes later. 126 To add to the complexity, oral and literacy proficiency in either the home language or English involves both social and academic components. Thus, a student may be able to write a well-organized social letter in his or her home language, yet may not be able to explain adequately in that language, orally, how to solve a mathematics problem that requires knowledge of concepts and words specific to the field of mathematics. The same phenomenon may occur in English as well. 127

Therefore, in determining how to effectively measure the academic knowledge and skills of this population, educators and policymakers should consider how to minimize the influence of literacy issues, except when these constructs are explicitly being measured. The level of linguistic and literacy proficiency of limited English proficient students in their home language and in English will often affect which achievement tests are appropriate for these students, and which accommodations to standardized testing conditions, if any, might be most useful for which students. 128

Additionally, diverse cultural and other background experiences, including variations in amount, type, and location (home country and U.S.) of formal schooling, as well as interrupted and multi-location schooling (of the type frequently experienced by children of migrant workers), affect language literacy, the contextual content of items, and the academic foundational knowledge base that can be assumed in educational achievement tests. The format and procedures involved in testing can also affect accuracy in test scores, particularly if the test practices differ substantially from ongoing instructional practices in classrooms. 129

121 See Joint Standards, p. 91.
122 See Standard 9.1, supra note 111.
123 Standard 9.3 states, "When testing an examinee proficient in two or more languages for which the test is available, the examinee's relative language proficiency should be determined. The test generally should be administered in the test taker's most proficient language, unless proficiency in the less proficient language is part of the assessment."
124 Improving Schooling for Language Minority Children, Chapter 5; Ensuring Accuracy in Testing for English Language Learners, Chapter 1.
125 See Joint Standards, Chapter 9, p. 91-100; Ensuring Accuracy in Testing for English Language Learners, Chapter 1.
126 Testing, Teaching and Learning, p. 61.
127 Improving Schooling for Language Minority Children, Chapter 5, p. 113-137.
128 Id. at Chapter 5.
129 Ensuring Accuracy in Testing for English Language Learners, Chapters 3, 4, 7, and 9.

b.
Accommodations for Limited English Proficient Students

Providing accommodations to established testing conditions for some students with limited English proficiency may be appropriate when their use would yield the most valid scores on the intended academic achievement constructs. Deciding which accommodations to use for which students usually involves an understanding of which construct irrelevant background factors would substantially influence the measurement of intended knowledge and skills for individual students, and how the accommodations would impact the validity of the test score interpretations for these students. 130

Appendix C lists various test presentation, administration, and response accommodations that states and districts generally employ when testing limited English proficient students. Examples of accommodations in the presentation of the test include editing text so the items are in plain language, or providing page formats which minimize confusion by limiting use of columns and the number of items per page. Presenting the test in the student's native language is an accommodation to a test written in English when the same constructs are being measured on both the English and native language versions. Administration accommodations include extending the length of the testing period, permitting breaks, administering tests in small groups or in separate rooms, and allowing English or native language glossaries or dictionaries as appropriate. Response accommodations include oral response and permitting students to respond in their native language.

C. Students with Disabilities

The Joint Standards and several recent measurement publications discuss the population of students with disabilities and how test publishers and users have handled inclusion in tests to date. 131 This section briefly outlines principles derived from the Joint Standards and these publications.
It addresses three types of testing situations especially relevant for students with disabilities: tests used for diagnostic and intervention purposes, the assessment of academic educational achievement, and alternate assessments for K-12 students with disabilities who cannot participate in school-wide tests.

The Joint Standards provide that interpretation of the scores of students with disabilities should accurately and fairly reflect the academic knowledge, skills, or abilities that the test intends to measure. The interpretation should not be confounded by the challenges of the students that are extraneous to the intent of the measurement. 132 Rather, validity evidence should document that the inferences of the scores of students with disabilities are accurate. Pilot testing and other technical investigations should be conducted where feasible to ensure the validity of the test inferences when accommodations have been allowed. 133 Feasibility is always a consideration, although the Joint Standards comment, "[T]he costs of obtaining validity evidence should be considered in light of the consequences of not having usable information regarding the meanings of scores for people with disabilities." 134

130 See Ensuring Accuracy in Testing for English Language Learners, Chapters 6 and 8, for a discussion of which accommodations might be most beneficial for students with various background factors.
131 For instance, Joint Standards, Chapter 10; High Stakes, Chapter 8; Educating One and All: Students with Disabilities and Standards-Based Reform (National Research Council, McDonnell, McLaughlin, and Morison, 1997); Testing Students with Disabilities (Thurlow, Elliot, and Ysseldyke, 1998, NY: Corwin Press).
132 Standards 10.1, 10.10. See Standard 10.1, supra note 111. Standard 10.10 states, "Any test modifications adopted should be appropriate for the individual test taker, while maintaining all feasible standardized features. A test professional needs to consider reasonably available information [...] capabilities that might impact test performance, and document the grounds for the modification."
133 Several standards discuss the appropriate types of validity evidence, including Standards 10.3, 10.5, 10.6, 10.7, 10.8, and 10.11. Because of the low incidence nature of several of the disability groups, especially when different severity levels and combinations of impairments are considered, this type of evidence will probably need to be accumulated over time in order to have a large enough sample size.
Standard 10.3 states, "Where feasible, tests that have been modified for use with individuals with disabilities should be pilot tested on individuals who have similar disabilities to investigate the appropriateness and feasibility of the modifications."
Standard 10.5 states, "Technical material and manuals that accompany modified tests should include a careful statement of the steps taken to modify the test to alert users to changes that are likely to alter the validity of inferences drawn from the test scores."
Standard 10.6 states, "If a test developer recommends specific time limits for people with disabilities, empirical procedures should be used, whenever possible, to establish time limits for modified forms of timed tests rather than simply allowing test takers with disabilities a multiple of the standard time. When possible, fatigue should be investigated as a potentially important factor when time limits are extended."
Standard 10.7 states, "When sample sizes permit, the validity of inferences made from test scores and the reliability of scores on tests administered to individuals with various disabilities should be investigated and reported by the agency or publisher that makes the modification. Such investigations should examine the effects of modifications made for people with various disabilities on resulting scores, as well as the effects of administering standard unmodified tests to them."
Standard 10.8 states, "Those responsible for decisions about test use with potential test takers who may need or may request specific accommodations should (a) possess the information necessary to make an appropriate selection of measures, (b) have current information regarding the availability of modified forms of the test in question, (c) inform individuals, when appropriate, about the existence of modified forms, and (d) make these forms available to test takers when appropriate and feasible."
Standard 10.11 states, "When there is credible evidence of score comparability across regular and modified administrations, no flag should be attached to a score. When such evidence is lacking, specific information about the nature of the modification should be provided, if permitted by law, to assist test users properly to interpret and act on test scores."
134 Comment under Standard 10.7, pg. 106.

1. Tests Used for Diagnostic and Intervention Purposes

All issues of validity, reliability, and fairness apply to tests and other assessments used to make diagnostic and intervention decisions for students with disabilities. Tests that yield diagnostic information typically focus in great detail on identifying the specific challenges and strengths of a student. 135 These diagnostic tests are often administered in one-to-one situations (test taker and examiner) rather than in a group situation. In many cases they have been designed with standardized adaptations to fit the needs of individual examinees. In making decisions about which tests are appropriate to use, it is important to make sure that the tests accurately and completely reflect the intended constructs, so that the interventions are appropriate and beneficial for the individual students.

Standard 10.12
In testing individuals with disabilities for diagnostic and intervention purposes, the test should not be used as the sole indicator of the test taker's functioning. Instead, multiple sources of information should be used.

2. Testing the Academic Educational Achievement of Students with Disabilities

Several factors affect how well the educational achievement of students with disabilities is measured on standardized academic tests. While it is very important that the test score inferences are valid, reliable, and fair, the technical issues associated with developing meaningful achievement tests for this population are complex and difficult to resolve. To ensure accuracy in testing of students with disabilities, tests must be developed so that they effectively measure the students' knowledge and skills in academic achievement rather than factors irrelevant to the intended constructs of the test. This is particularly important when achievement tests are used to make high-stakes decisions for individual students with disabilities. Reducing the influence of construct irrelevant factors includes minimizing the confounding conditions in the test or the testing process so that the test accurately measures what it is supposed to measure. 136 In collecting evidence to support the technical quality of the test for these students, the accumulation of data may need to occur over several test administrations to ensure robust sample sizes.

a.
Background Factors for Students with Disabilities

The background factors particularly important to students with disabilities are generally related to the nature of the disabilities or to the schooling experiences of these students. 137

Within any disability category, the type, number, and severity of impairments vary greatly. 138 For instance, some students with learning disabilities have a processing disability in only one subject, such as mathematics, while others experience accessing, retrieval, and processing impairments that affect a broad number of school subjects and contexts. For many of these students, one or more of the impairments may be relatively mild, while for others one or more can be significant. Further, different types of disabilities yield significantly different constellations of issues. For instance, the considerations surrounding hearing impaired students overlap significantly with limited English proficient students in some ways and with other students with disabilities in other respects. This complexity poses a challenge not only to educators, but also to test administrators and developers. In general, in determining how to use academic tests appropriately for students with disabilities, educators and policymakers should consider how to minimize the influence of the impairments in measuring the intended constructs.

135 Joint Standards, Chapters 10, 12, and 13; High Stakes, Chapter 1.
136 See Standard 10.1, supra note 111.
137 Educating One and All, Chapter 3; Testing Individuals with Disabilities.
138 Joint Standards, Chapter 10, Testing Individuals with Disabilities, p. 101-105.
Educating One and All explains that the schooling experiences of students with disabilities vary greatly as a function of their disability, the severity of impairments, and expectations of their capabilities. 139 Two sets of educational experiences, in particular, affect how educators and policy makers accommodate tests and use them appropriately for this population. First, guidance about the schooling and evaluation of students with disabilities is provided by individualized education program (IEP) teams made up of educators and parents. These teams often recommend testing accommodations that they feel would be appropriate for individual students. Second, classroom instructional techniques affect large scale testing. While special educators have a long history of accommodating instruction to fit student strengths, not all the instructional practices are appropriate in large scale testing. Additionally, some students may not have been exposed routinely to the types of accommodations that would be possible in large scale testing. 140

b. Accommodations for Students with Disabilities

Providing accommodations to established testing conditions for some students with disabilities may be appropriate when their use would yield the most valid scores on the intended academic achievement constructs. Deciding which accommodations to use for which students usually involves an understanding of which construct irrelevant background factors would substantially influence the measurement of intended knowledge and skills for individual students, and how the accommodations would impact the validity of the test score interpretations for these students. 141

Appendix C lists various presentation, administration, and response accommodations that states and districts generally employ when testing students with disabilities.
Examples of presentation accommodations are the use of Braille, large print, oral reading, or providing page formats which minimize confusion by limiting use of columns and the number of items per page. Administration accommodations in setting include allowing students to take the test at home or in a small group, and accommodations in timing include extended time and frequent breaks. Variations in response format include allowing students to respond orally, point, or use a computer.

3. Alternate Assessments

Alternate assessments are assessments for those students with disabilities who cannot participate in state or district-wide standardized assessments, even with the use of appropriate accommodations and modifications.142 For the constructs being measured, the considerations with respect to validity, reliability, and fairness apply to alternate assessments as well. Appropriate content needs to be identified, and procedures designed to ensure technical rigor need to be followed.143 In addition, strong evidence should show that the test measures the knowledge and skills it intends to measure, and that the measurement is a valid reflection of mastery in a range of contextual situations.

139 See Educating One and All, Chapter 3.
140 See Educating One and All, Chapter 5.
141 See Testing Students with Disabilities for a discussion of which accommodations might be most beneficial for students with various impairments and other background factors.
142 The IDEA requires use of alternate assessments in certain areas. See 34 C.F.R. 300.138.
143 See Educating One and All, Chapter 5, and Testing Students with Disabilities for a discussion of the issues and processes involved in developing and implementing alternate assessments.
CHAPTER 2. Legal Principles

It is important for educators and policymakers to understand the test measurement principles and the legal principles that will enable them to ask informed questions and make sound decisions regarding the use of tests for high-stakes purposes. The goal of this chapter is to explain the legal principles that apply to educational testing. The primary focus of this chapter is four federal nondiscrimination laws, enacted by Congress, and their implementing regulations: Title VI of the Civil Rights Act of 1964 (Title VI), Title IX of the Education Amendments of 1972 (Title IX), Section 504 of the Rehabilitation Act of 1973 (Section 504), and Title II of the Americans with Disabilities Act of 1990 (Title II).144 Within the U.S. Department of Education, the Office for Civil Rights has responsibility for enforcing the requirements of these four statutes and their implementing regulations. Although the Office for Civil Rights does not enforce federal constitutional provisions, an overview of these constitutional principles, including under the Fifth and Fourteenth Amendments of the U.S. Constitution, has also been included for informational purposes. The discussion of legal principles in this chapter is intended to reflect existing legal principles and does not establish new requirements.145

144 Title VI prohibits discrimination on the basis of race, color and national origin in the programs and activities of recipients that receive federal financial assistance. The U.S. Department of Education's regulation implementing Title VI is found at 34 C.F.R. Part 100. Title IX prohibits discrimination on the basis of sex in educational programs and activities of recipients of federal financial assistance. The U.S. Department of Education's regulation implementing Title IX is found at 34 C.F.R. Part 106.
Section 504 prohibits discrimination on the basis of disability in the programs and activities of recipients of federal financial assistance. The U.S. Department of Education's regulation implementing Section 504 is found at 34 C.F.R. Part 104. Title II prohibits discrimination on the basis of disability by public entities, regardless of whether they receive federal funding. The U.S. Department of Justice's regulation implementing Title II is found at 28 C.F.R. Part 35.
145 Consistent with this approach, court decisions are not cited if the case is still on appeal or the time to request an appeal has not ended.
146 See Sharif v. New York State Educ. Dep't, 709 F. Supp. 345, 354-355, 364 (S.D.N.Y. 1989) (in granting a motion for preliminary injunction, where girls received comparatively lower scores than boys, court found that the state's use of SAT scores as the sole basis for decisions awarding college scholarships intended to reward high school achievement was not educationally justified for this purpose in that the SAT had been designed as an aptitude test to predict college success and was not designed or validated to measure past high school achievement).

I. Discrimination Under Federal Statutes and Regulations

Congress has enacted four statutes prohibiting discrimination based on race, color, national origin, sex, and disability in schools, colleges, and universities. Title VI prohibits discrimination based on race, color, or national origin; Title IX prohibits discrimination based on sex; and Section 504 and Title II of the Americans with Disabilities Act (ADA) prohibit discrimination based on disability. Title VI, Title IX, and Section 504 apply to all educational institutions that receive federal funds.
Title II of the ADA applies to public entities, including public school districts and state colleges and universities.151

The Title VI, Title IX, Section 504, and Title II statutes and their implementing regulations, as well as the equal protection clause of the Fourteenth Amendment to the United States Constitution, prohibit intentional discrimination based on race, national origin, sex, or disability. In addition, the regulations that implement Title VI, Title IX, Section 504 and Title II prohibit policies or practices that have a

147 See United States v. Fordice, 505 U.S. 717, 733-738 (1992) (invalidating state's exclusive reliance on ACT scores as a basis for college admissions at historically segregated colleges where the state adopted the ACT for discriminatory reasons and the ACT administering organization recommended that college admissions decisions consider high school grades along with test scores); see also Sharif, 709 F. Supp. at 364.
148 See Lau v. Nichols, 414 U.S. at 566-569 (finding a violation of the Title VI regulations where limited English proficient students were taught only in English and not provided any special assistance needed to meet English language proficiency standards required by the state for a high school diploma). See also Debra P., 644 F.2d at 406-408 (holding that use of a graduation test that covered material that had not been taught in class would violate the due process and equal protection clauses and that, under the circumstances of the case, immediate use of the diploma sanction for test failure would punish black students for deficiencies created by an illegally segregated school system which had provided them with inferior physical structures, course offerings, instructional materials, and equipment).
149 See Larry P. v.
Riles, 793 F.2d at 980-981, 983 (finding that IQ tests the state used had not been validated for use as the sole means for determining that black children should be placed in classes for educable mentally retarded students); Sharif, 709 F. Supp. at 354 (observing that the SAT under-predicts success for female college freshmen as compared with males). See also Parents in Action on Special Educ. v. Hannon, 506 F. Supp. 831, 836-837 (N.D. Ill. 1980) (court's analysis of items on I.Q. test found only minimal amount of cultural bias not resulting in erroneous mental retardation diagnoses given other information considered in process).
150 See Groves v. Alabama State Bd. of Educ., 776 F. Supp. 1518, 1530-1531 (M.D. Ala. 1991) (finding test required for admission to undergraduate teacher training program would not be educationally justified if the passing score is not itself a valid measure of the minimal ability necessary to become a teacher); Richardson v. Lamar County Bd. of Educ., 729 F. Supp. 806, 823-825 (M.D. Ala. 1989) (evidence revealed that cut off scores had not been set through a well conceived, systematic process nor could the scores be characterized as reflecting the good faith exercise of professional judgment), aff'd sub nom., Richardson v. Alabama State Bd. of Educ., 935 F.2d 1240 (11th Cir. 1991).
151 OCR enforces five nondiscrimination statutes: Title VI of the Civil Rights Act of 1964, 42 U.S.C. §§ 2000d, et seq. (2000); Title IX of the Education Amendments of 1972, 20 U.S.C. §§ 1681 et seq. (1999); Section 504 of the Rehabilitation Act of 1973, as amended, 29 U.S.C. § 794 (1999); Title II of the Americans with Disabilities Act of 1990, 42 U.S.C. §§ 12131, et seq. (1995 and Supp. 1999); and the Age Discrimination Act of 1975, as amended, 42 U.S.C. §§ 6101, et seq. (1995 and Supp. 1999). Regulations issued by the United States Department of Education implementing Title VI, Title IX, and Section 504, respectively, can be found at 34 C.F.R.
Part 100, 34 C.F.R. Part 106, and 34 C.F.R. Part 104. These regulations can be found on OCR's web site at www.ed.gov/offices/OCR. For regulations implementing Title II of the ADA, see 28 C.F.R. Part 35. Title III of the ADA, which is enforced by the U.S. Department of Justice, prohibits discrimination in public accommodations by private entities, including schools. Religious entities operated by religious organizations are exempt from Title III.

discriminatory disparate impact on students based on their race, national origin, sex, or disability.152

This section describes two central analytical frameworks for examining allegations of discrimination as set forth in federal nondiscrimination regulations: different treatment and disparate impact.153 It also includes a further discussion of legal principles that apply specifically to students with limited English proficiency and to students with disabilities.

A. Different Treatment

Under federal law, policies and practices generally must be applied consistently to similarly situated individuals or groups, regardless of their race, national origin, sex, or disability.154 For example, a federal court concluded that a school district had intentionally treated students differently on the basis of race where minority students whose test scores qualified them for two or more ability levels were more likely to be assigned to the lower level class than similarly situated white students, and no explanatory reason was evident.155 In addition, educational systems that were previously segregated by race in violation of the Fourteenth Amendment and have not achieved unitary status have an obligation to dismantle their prior de jure segregation.
In such instances, when a school district or other educational system uses a test or assessment procedure for a high-stakes purpose that has racially disproportionate effects, the school district or other educational system must show that the disparity is not traceable to prior intentional segregation or that the test or assessment procedure does not perpetuate the adverse effects of such

152 34 C.F.R. § 100.3(b)(2); 34 C.F.R. §§ 106.21(b)(2), 106.36(b), 106.52; 34 C.F.R. § 104.4(b)(4)(i); and 28 C.F.R. § 35.130(b)(3). The authority of federal agencies to issue regulations with an "effects" standard has been consistently acknowledged by U.S. Supreme Court decisions and applied by lower federal courts addressing claims of discrimination in education. See, e.g., Lau v. Nichols, 414 U.S. 563, 568 (1974); Guardians Ass'n v. Civil Service Comm'n of City of N.Y., 463 U.S. 582, 584-593 (1983); Alexander v. Choate, 469 U.S. 287, 289-300 (1985). See also Memorandum from the Attorney General for Heads of Departments and Agencies that Provide Federal Financial Assistance, "Use of the Disparate Impact Standard in Administrative Regulations under Title VI of the Civil Rights Act of 1964," July 14, 1994.
153 Intentional racial discrimination is a violation of both the Fourteenth Amendment to the United States Constitution and federal civil rights statutes in cases where evidence demonstrates that an action such as the use of a test for high stakes purposes is motivated by an intent to discriminate. See Elston v. Talladega County Bd. of Educ., 997 F.2d 1394, 1406 (11th Cir. 1993). As explained further in this section, the regulations promulgated under the federal civil rights statutes prohibit the use of neutral criteria having disparate effects unless the criteria are educationally justified. See Guardians Ass'n v. Civil Service Comm'n, 463 U.S. at 598.
154 For example, under the Fourteenth Amendment and Title VI, different treatment based on race is permitted only when such action is narrowly tailored to further a compelling state interest. See Regents of the Univ. of Cal. v. Bakke, 438 U.S. 265 (1978); Adarand Constructors, Inc. v. Pena, 515 U.S. 200 (1995).
155 See People Who Care v. Rockford Bd. of Educ., 851 F. Supp. 905, 958-1001 (N.D. Ill. 1994), remedial order rev'd, in part, 111 F.3d 528 (7th Cir. 1997). On appeal, the Seventh Circuit Court of Appeals stated that the appropriate remedy in this case was to require the district to use objective, non-racial criteria to assign students to classes, rather than abolishing the district's tracking system. 111 F.3d at 536.

segregation.156 The school district is under "a 'heavy burden' of showing that actions that increase[] or continue[] the effects of the dual system serve important and legitimate ends."157

B. Disparate Impact

Discrimination under federal law may also occur where the application of neutral criteria has discriminatory effects and those criteria are not educationally justified. The federal nondiscrimination regulations provide that a recipient of federal funds may not "utilize criteria or methods of administration which have the effect of subjecting individuals to discrimination."158 It is important to understand that disparities in student performance based on race, national origin, sex, or disability, alone, do not constitute disparate impact discrimination under federal law. Furthermore, nothing in federal law guarantees equal results. (For a further discussion of issues related to testing of students with disabilities, see pp. 56-60.)
Courts applying the disparate impact test have examined three questions to determine if the practices at issue are discriminatory: (1) Does the practice or procedure in question result in substantial differences in the award of benefits or services based on race, national origin, or sex? (2) Is the practice or procedure educationally justified? and (3) Is there an equally effective alternative that can accomplish the institution's educational goal with less disparity?159

156 See United States v. Fordice, 505 U.S. at 731-732 (finding state's requirement that students have higher ACT scores for admission to historically white colleges than historically black colleges to be constitutionally suspect where the requirement was enacted for discriminatory purposes, emanated from the prior de jure system, continued to have segregative effects, and was not shown to be justified in educational terms); Debra P. v. Turlington, 644 F.2d at 407 ("[defendants] failed to demonstrate either that the disproportionate failure [rate] of blacks was not due to the present effects of past intentional segregation or, that as presently used, the diploma sanction was necessary [in order] to remedy those effects"); McNeal v. Tate County Sch. Dist., 508 F.2d 1017, 1020-1021 (5th Cir. 1975) (since ability grouped classroom assignments preserved effects of past intentional discrimination, defendants were required to show educational benefits of assignment practice on remand or propose an educationally sound alternative); GI Forum v. Texas Educ. Agency, No. SA-97-CA-1278-EP, 2000 U.S. Dist. LEXIS 153, slip op. at 56-57 (W.D. Tex. 2000) (upholding use of graduation test where the test is used to identify educational inequalities and attempt to address them).
157 Dayton Bd. of Educ. v. Brinkman, 443 U.S. 526, 538 (1979) (quoting Green v. County School Board, 391 U.S. 430, 439 (1968)).
158 See 34 C.F.R. § 100.3(b)(2) (Title VI); 34 C.F.R. § 104.4(b)(4)(i) (Section 504); and 28 C.F.R.
§ 35.130(b)(3)(i) (Title II). See also 34 C.F.R. § 106.31 (Title IX). In Guardians, 463 U.S. at 589-590, the U.S. Supreme Court upheld the use of the effects test, stating that the Title VI regulation forbids the use of federal funds, "not only in programs that intentionally discriminate on racial grounds but also in those endeavors that have a[n] [unjustified racially disproportionate] impact on racial minorities."
159 See Georgia State Conf., 775 F.2d at 1417. See also Elston, 997 F.2d at 1407 & n.14; Larry P., 793 F.2d at 982 & n.9; Groves, 776 F. Supp. at 1523-1524, 1529-1532; Sharif, 709 F. Supp. at 361. Many courts use the term "equally effective" when discussing whether the alternative offered by the party challenging the test is feasible and would effectively meet the institution's goals. See, e.g., Georgia State Conf., 775 F.2d at 1417; Sharif, 709 F. Supp. at 361. Other courts use the term "comparably effective" in evaluating proposed alternatives. See, e.g., Sandoval, 7 F. Supp. 2d at 1278; Elston, 997 F.2d at 1407; Fitzpatrick v. City of Atlanta, 2 F.3d 1112, 1118 (11th Cir. 1993). Review of the decisions in these cases indicates that the courts appear to be using the terms synonymously.

The party challenging the test has the burden of establishing disparate impact. If disparate impact is established, the educational institution must provide sufficient evidence of an educational justification. If an educational justification is established, then the party challenging the test must demonstrate that an alternative with less disparate impact is equally effective in meeting the institution's educational goals or needs in order to prevail.160

1.
Determining disproportionate impact

The first question in the disparate impact analysis is whether there is information indicating a significant disparity in the award of benefits or services to students based on race, national origin, or sex.161 To determine if a significant disparate impact exists, courts have focused on evidence of statistical disparities.162 Generally, a test has a disproportionate adverse impact if a statistical analysis shows a significant difference from the expected random distribution.163 There is no rigid mathematical threshold regarding the degree of disproportionality required; however, courts have used various statistical methods to identify disparities that are sufficiently substantial to raise an inference that the challenged practice caused the disparate results.164 To establish disparate impact in the context of a selection system, the comparison must be made between those selected for the educational benefit or service and a relevant pool of applicants or test-takers.165

160 See Georgia State Conf., 775 F.2d at 1417. See also the Department of Justice's Title VI Legal Manual at p. 2.
161 For a further discussion of the legal principles regarding students with disabilities under the IDEA, Section 504 and Title II of the ADA, see pp. 38-40.
162 See Watson v. Fort Worth Bank & Trust, 487 U.S. 977, 994-997 (1988) (O'Connor, J., plurality opinion).
163 See Watson, 487 U.S. at 995; Groves, 776 F. Supp. at 1526-1528.
164 See Watson, 487 U.S. at 994-995; Groves, 776 F. Supp. at 1526-1527. A variety of methods are commonly used by courts to distinguish differences between outcomes that are statistically and practically significant from those that are random. Some have used an 80% rule whereby disparate impact is shown when the rate of selection for the less successful group is less than 80% of the rate of selection for the most successful group.
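As a rough illustration of the 80% rule described in this note, and of comparing two groups' success rates in units of standard deviations, the following Python sketch computes both screens for two hypothetical groups of test-takers. The function names, the pooled two-proportion z statistic, and the sample counts are assumptions made for illustration only; they are not drawn from Groves or any other decision, and the computations courts have actually relied on may differ.

```python
import math


def selection_rate(selected: int, total: int) -> float:
    """Share of a group that was selected (or passed the test)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return selected / total


def impact_ratio(rate_a: float, rate_b: float) -> float:
    """Ratio of the lower selection rate to the higher one."""
    high = max(rate_a, rate_b)
    if high == 0:
        raise ValueError("at least one group must have a nonzero rate")
    return min(rate_a, rate_b) / high


def below_four_fifths(ratio: float, threshold: float = 0.8) -> bool:
    """True when the ratio falls under the 80% screen described in the note."""
    return ratio < threshold


def two_proportion_z(sel_a: int, n_a: int, sel_b: int, n_b: int) -> float:
    """Difference in selection rates in standard-error units.

    Assumption: a pooled two-proportion z statistic, one common way to
    express such a difference; not necessarily the formula any court used.
    """
    p_a = selection_rate(sel_a, n_a)
    p_b = selection_rate(sel_b, n_b)
    pooled = (sel_a + sel_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("standard error is zero; rates are all 0 or all 1")
    return (p_a - p_b) / std_err


if __name__ == "__main__":
    # Hypothetical counts: 60 of 100 pass in group A, 80 of 100 in group B.
    ratio = impact_ratio(selection_rate(60, 100), selection_rate(80, 100))
    z = two_proportion_z(60, 100, 80, 100)
    print(f"impact ratio: {ratio:.2f}")          # about 0.75
    print(f"below 80% screen: {below_four_fifths(ratio)}")
    print(f"z statistic: {z:.2f}")               # about -3.09
```

With these hypothetical counts, the ratio of pass rates falls below 80% and the difference exceeds three standard errors. Neither figure is a legal conclusion; as the text notes, there is no rigid mathematical threshold, and the composition of the comparison pool matters.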
Another type of statistical analysis considers the difference between the expected and observed rates in terms of standard deviations, with the difference generally expected to be more than two or three standard deviations. Another test is known as the "Shoben formula" in which the difference or Z-value in the groups' success rates must be statistically significant. Groves, 776 F. Supp. at 1526-1528 (discussing these methods and the cases in which they were used).
165 When determining disparate impact in the context of a selection system, the comparison pool generally consists of all minimally qualified test-takers or applicants. When tests are used to determine placement or some other type of educational treatment, the comparison is between those identified by the test for the placement or educational treatment and the relevant pool of test takers. The precise composition of the comparison pool is determined on a case-by-case basis. See Wards Cove Packing Co. v. Atonio, 490 U.S. 642, 650-651 (1989); Watson, 487 U.S. at 995-997; Groves, 776 F. Supp. at 1525-1526.

In general, a specific policy, practice or procedure must be identified as causing the disproportionate adverse effect on the basis of race, national origin, or sex.166 For example, when a particular use of a test is being challenged, the evidence should show that the test use, rather than other selection factors, accounts for the disparity.167

2. Determining educational necessity

Where the use of a test results in decisions that have a disparate impact on the basis of race, national origin, or sex, the test use causing the disparity must significantly serve the legitimate educational goals of the institution.
168 This inquiry is usually referred to as determining the "educational necessity" of the test use or determining whether the test is "educationally justified."169 The test need not be "essential" or "indispensable" to achieving the institution's educational goal;170 rather, the educational institution must show a manifest relationship between use of the test and the institution's educational purposes.171 In evaluating educational necessity, both the legitimacy of the educational goal asserted by the institution and the use of the test as a valid means to advance this goal may be at issue. Courts generally allow educational institutions to define their own educational goals and focus on whether the challenged test serves the institution's articulated objectives.172

166 Elements of a decision-making process that cannot be separated for purposes of analysis may be analyzed as one selection practice. See Title VII of the Civil Rights Act of 1964, 42 U.S.C. § 2000e-2(k)(1)(B)(i). This is necessary because limiting the disparate impact analysis to a discrete component of a selection process would not allow for situations "where the adverse impact is caused by the interaction of two or more components of the process." See Graffam v. Scott Paper Co., 870 F. Supp. 389, 395 (D. Me. 1994), aff'd, 60 F.3d 809 (1995).
167 As noted in Watson, 487 U.S. at 994, courts have found it "relatively easy," when appropriate statistical proof is presented, to identify a standardized test as causing the racial, national origin, or sex related disparity at issue. See also GI Forum v. Texas Educ. Agency, No. SA-97-CA-1278-EP, 2000 U.S. Dist. LEXIS 153, slip op. at 35-40 (W.D. Tex. 2000) (given legally meaningful differences in the pass rates of minority and majority students, plaintiffs made a prima facie showing of disparate impact resulting from a minimum competency test).
168 See Wards Cove, 490 U.S. at 659.
169 See Board of Educ. v. Harris, 444 U.S.
130, 151 (1979); Elston, 997 F.2d at 1412.
170 See Wards Cove, 490 U.S. at 659; Elston, 997 F.2d at 1412 (citing Georgia State Conf., 775 F.2d at 1417-1418).
171 See Georgia State Conf., 775 F.2d at 1418 (showing required that "achievement grouping practices bear a manifest demonstrable relationship to classroom education"); Sharif, 709 F. Supp. at 362 (defendants must show a manifest relationship between use of the SAT and recognition of academic achievement in high school). As explained in Elston, 997 F.2d at 1412, "from consulting the way in which ... [courts] analyze the 'educational necessity' issue, it becomes clear that ... [they] are essentially requiring ... [the educational institution to] show that the challenged course of action is demonstrably necessary to meeting an important educational goal." In other words, the institution can defend the challenged practice on the grounds that it is "supported by a 'substantial legitimate justification.'" See Elston, 997 F.2d at 1412 (quoting Georgia State Conf., 775 F.2d at 1417); see also Georgia State Conf., 775 F.2d at 1417-1418; Groves, 776 F. Supp. at 1529-1532.
172 See, e.g., Debra P., 644 F.2d at 402 (indicating that the court is not in a position to determine education policy and state's efforts to establish minimum standards and improve educational quality are praiseworthy).

In conducting this analysis, courts have generally considered relevant evidence of validity, reliability, and fairness173 provided by the test developer and test user to determine the acceptability of the test for the purpose used, giving appropriate deference to the expertise and experience of educators and testing professionals.
174 The educational justification inquiry thus generally looks at technical questions regarding the test's accuracy in relation to the nature and importance of the educational institution's goals, the educational consequences to students, and the relationship of the educational

173 In general, courts have said that validity refers to the accuracy of conclusions drawn from test results. See Allen v. Alabama State Bd. of Educ., 976 F. Supp. 1410, 1420-1421 (M.D. Ala. 1997) ("Generally, validity is defined as the degree to which a certain inference from a test is appropriate and meaningful", quoting Richardson v. Lamar County Bd. of Educ., 729 F. Supp. 806, 820 (M.D. Ala. 1989)), aff'd, 164 F.3d 1347 (1999), injunction granted, 2000 U.S. Dist. LEXIS 123 (2000). See also Richardson, 729 F. Supp. at 820-821 ("[A] test will be valid so long as it is built to yield its intended inference and the design and execution of the test are within the bounds of professional standards accepted by the testing industry."); Anderson, 520 F. Supp. at 489 ("Validity in the testing field indicates whether a test measures what it is supposed to measure.").
174 See, e.g., United States v. LULAC, 793 F.2d 636, 640, 649 (5th Cir. 1986) (pointing to substantial expert evidence in the record, including validity studies, indicating that the tests involved were valid measures of the basic skills that teachers should have). The sponsors of the newly revised Joint Standards advise that the Joint Standards are intended to provide guidance to testing professionals in making such judgments. See Joint Standards, Introduction, p. 4. The Joint Standards are discussed more fully in Chapter One of this guide. Where the evidence indicates that the educational institution is using a test in a manner that does not lead to valid inferences, educational justification may be found lacking. See United States v. Fordice, 505 U.S.
at 736-737 (ruling that Mississippi's exclusive use of ACT scores in making college admissions decisions was not educationally justified, since, among other factors, the ACT's administering organization discouraged this practice); Groves, 776 F. Supp. at 1530 (requiring minimum ACT score for admission to undergraduate teacher education programs violated the Title VI regulations since ACT scores had not been validated for this purpose); Sharif, 709 F. Supp. at 361-363 (in ruling on a motion for preliminary injunction, court found that the state's use of SAT scores as the sole basis for decisions awarding college scholarships intended to reward high school achievement was not educationally justified for this purpose in that the SAT had been designed as an aptitude test to predict college success and was not designed or validated to measure past high school achievement). Psychometric or scientific evidence is not the only way that validity can be demonstrated, however. Courts can draw inferences of validity from a wide range of data points. See Watson v. Fort Worth Bank & Trust, 487 U.S. 977, 998 (1988) (referring to procedures used to evaluate personal qualities of candidates for managerial jobs).

institution to the student.175 Where a test is used for promotion or graduation purposes, courts may also consider whether the skills tested have been taught in the program.176

3.
Determining whether there are equally effective alternatives that serve the institution's educational goal with less disparity

If the educational institution provides sufficient evidence that the test use in question is justified educationally, the party challenging the test has the opportunity to show that there exists an equally effective alternative practice that meets the institution's goals with less disparity.177 The feasibility of an alternative, including costs and administrative burdens, is a relevant consideration.178

II. Testing Of Students With Limited English Proficiency

Testing of students with limited English proficiency in the elementary and secondary education context raises a set of unique issues. To understand the obligations of states and school districts with regard to high-stakes testing of such students, it is important to understand the basic obligations of school districts and states under Title VI and related federal law that relate to language minority students who are learning English.

175 See, e.g., Georgia State Conf., 775 F.2d at 1417-1420; Groves, 776 F. Supp. at 1530-1531; Larry P., 793 F.2d at 980. In the educational context, tests play a complex role that bears on evaluation of educational justification. As noted by the court in Larry P.,

[I]f tests can predict that a person is going to be a poor employee, the employer can legitimately deny that person a job, but if tests suggest that a young child is probably going to be a poor student, the school cannot on that basis alone deny that child the opportunity to improve and develop the academic skills necessary to success in our society.

793 F.2d at 980 (quoting Larry P., 495 F. Supp. at 969).
Because determining whether a test is a valid basis for classifying students and placing them in different educational programs may be even more complex and difficult than determining if a test validly predicts job performance, particular sensitivity is needed to all of the interests involved. The question may be not only whether a test provides valid information about a student's ability and achievement, but whether the educational services provided to the student as a consequence of the test serve the student's needs. Inequality in the services provided to students prior to the test, as well as in the services provided as a consequence of the test, may also be a factor considered as part of the educational justification for using a test in a particular way. See Debra P., 644 F.2d at 407-408 (agreeing with the statement that Title VI would not be violated if the test were a fair test of what students were taught); Debra P., 730 F.2d 1405, 1407, 1410-1411, 1416 (1984) (affirming that the extent of remedial efforts to address test failure is relevant to evaluation of test use).
176 See Debra P., 644 F.2d at 408.
177 See New York Urban League v. New York, 71 F.3d 1031, 1036 (2d Cir. 1995) (stating "the plaintiff may still prove his case by demonstrating that other less discriminatory means would serve the same objective"). See also Albemarle Paper Co. v. Moody, 422 U.S. 405, 425 (1975); Richardson v. Lamar County Bd. of Educ., 729 F. Supp. at 815.
178 See Wards Cove, 490 U.S. at 661 (indicating that factors such as costs or other burdens are relevant in determining whether the alternative is equally effective in serving employer's legitimate goals); Sharif, 709 F. Supp. at 363-364 (finding defendant's claim that proposed alternative was not feasible and excessively burdensome not persuasive since most other states used proposed alternative); MacPherson v. University of Montevallo, 922 F.2d 766, 773 (11th Cir.
1991) (holding that plaintiff must show that the alternative is economically feasible).

Title VI prohibits discrimination based on race, color, or national origin. On May 25, 1970, the United States Department of Health, Education, and Welfare's Office for Civil Rights issued a policy memorandum entitled "Identification of Discrimination and Denial of Services on the Basis of National Origin." The May 25th memorandum clarified the responsibility of school districts, under Title VI, to provide equal educational opportunity to national origin minority group students whose inability to speak and understand the English language excludes them from effective participation in the education program offered by the school district. 179 This memorandum was cited with approval by the Supreme Court in its decision in Lau v. Nichols, which held that the district's policy of teaching national origin minority group children only in English, without any special assistance, deprived them of the opportunity to benefit from the district's education program, including meeting the English language proficiency standards required by the state for a high school diploma. 180 The Lau case held that such policies are barred when they have the effect of denying such benefits, even though no purposeful design is present. 181

Subsequently, Castaneda v. Pickard, 182 relying on the language of the Equal Educational Opportunities Act (EEOA), explained the steps school districts must take to help students with limited English proficiency overcome language barriers to ensure that they can participate meaningfully in the district's educational programs. 183 The court stated that school districts have an obligation to provide services that enable students to acquire English language proficiency.
A school system that chooses to temporarily emphasize English over other subjects retains an obligation to provide assistance necessary to remedy academic deficits that may have occurred in other subjects while the student was focusing on learning English. Under the Castaneda standards, school districts have broad discretion in choosing a program of instruction for limited English proficient students. However, the program must be based on sound educational theory, must be adequately supported so that the program has a realistic chance of success, and must be periodically evaluated and revised, if necessary, to achieve its goals.

The disparate impact framework discussed above may also be used to examine whether tests used for high-stakes purposes result in a discriminatory impact upon students with limited English proficiency. As part of this analysis, questions may arise regarding the validity and reliability of the test for these students. 184 Depending upon the purpose of the test and the characteristics of the populations being tested, in some situations, accommodations or other forms of assessment of the same construct may be necessary. In short, the obligation is to ensure that the same constructs are being measured for all students.

179 See Identification of Discrimination and Denial of Services on the Basis of National Origin, 35 Fed. Reg. 11595 (1970). The Department of Health, Education and Welfare was the predecessor of the U.S. Department of Education.

180 See Lau, 414 U.S. at 566-568.

181 Id. at 568, citing, among other legal authority, the predecessor of 34 C.F.R. § 100.3(b)(2).

182 See Castaneda, 648 F.2d at 1005-1006, 1009-1012. The analytical framework in Castaneda, which was decided under the Equal Educational Opportunities Act (EEOA), 20 U.S.C. §§ 1701 et seq., has been applied to OCR's Title VI analysis. See Williams Memorandum, supra note 39. The EEOA contains standards related to limited English proficient students similar to the Title VI regulations.

183 See Castaneda, 648 F.2d at 1011.

There are three particularly important areas involving high-stakes testing of students with limited English proficiency: (1) tests used to determine a student's proficiency in the areas of speaking, listening, reading, or writing English for the purpose of determining whether the student should be provided with a program to enable the student to acquire English language skills (and, later, for the purpose of determining whether the student is ready to exit the program); (2) tests used to determine if the student meets the criteria for other specialized instructional programs, such as gifted and talented or vocational education programs; and (3) system-wide tests administered to determine if students have met performance standards.

Tests used to determine a student's initial and continuing need for special language programs should be appropriate in light of the district's own performance expectations and otherwise valid and reliable for the purpose used. Tests used by schools to help select students for specialized instructional programs, including programs for gifted and talented students, should not screen out limited English proficient students unless the program itself requires proficiency in English for meaningful participation. 185

When a state or school district adopts content and performance standards, and uses high-stakes tests to measure whether students have mastered these standards, a critical factor is whether the overall educational program provided to students with limited English proficiency is reasonably calculated to enable the students to master the knowledge and skills that all students are expected to master.
When education agencies institute standards-based testing, it is important for them to examine their programs for students with limited English proficiency to determine when and how these students will be provided with the instruction needed to prepare them to pass the test in question.

In addition, students with limited English proficiency may not be categorically excluded from standardized testing designed to increase accountability of educational programs for effective instruction and student performance. If these students are not included, the test data will not fairly reflect the performance of all students for whom the education agency is responsible. 186 Such test data can also help a district to assess the effectiveness of its content and English language acquisition programs.

184 See pages 38-42 for a discussion of the psychometric principles involved in determining the reliability and validity of tests used with limited English proficient students.

185 See Williams Memorandum, supra note 39.

186 Indeed, Title I of the Elementary and Secondary Education Act explicitly requires States to include limited English proficient students in the statewide assessments used to hold schools and school districts accountable for student performance. Title I of the Elementary and Secondary Education Act, 20 U.S.C. § 6311(b)(3)(F)(iii). If a school district uses the results of a test given for program accountability purposes to make educational decisions about individual students, the high-stakes use of the test must also be valid and reliable for this purpose.

For information on the factors that help ensure accuracy of tests for limited English proficient students, see pages 38-40 above.
In making decisions about testing limited English proficient students, factors such as the student's level of English proficiency, the primary language of instruction, the level of literacy in the native language, and the number of years of instruction in English may all be pertinent. 187 When students participate in assessments designed to meet the requirements of Title I of the Elementary and Secondary Education Act, as amended, those assessments must be implemented in a manner that is consistent with both the requirements of Title VI and Title I.

III. Testing Of Students With Disabilities

Three federal statutes provide basic protections for students with disabilities. Section 504 of the Rehabilitation Act of 1973 (Section 504) and Title II of the Americans with Disabilities Act of 1990 (Title II) prohibit discrimination against persons with disabilities by public schools. 188 The Individuals with Disabilities Education Act (IDEA) establishes rights and protections for students with disabilities and their families. It also provides federal funds to state education agencies and school districts to assist in educating students with disabilities. 189

Under Section 504, Title II, and the IDEA, 190 school districts have a responsibility to provide students with disabilities with a free appropriate public education. Providing effective instruction in the general curriculum for students with disabilities is an important aspect of providing a free appropriate public education.
The regulations implementing Section 504 and Title II specifically provide that a recipient of federal funds may not "utilize criteria or methods of administration which have the effect of subjecting individuals to discrimination." 191 Under Section 504, Title II, and the IDEA, tests given to students with disabilities must be selected and administered so that the test accurately reflects what the student knows or is able to do, rather than the student's disability (except when the test is designed to measure disability-related skills). This means that students with disabilities must be given appropriate accommodations and modifications in the administration of the tests. Examples include oral testing, large print tests, Braille versions of tests, individual testing, and separate group testing.

187 For more information on appropriate ways of testing students who are learning English, see Ensuring Accuracy in Testing for English Language Learners (CCSSO, 2000).

188 Although this part of the chapter deals only with students with disabilities attending public elementary and secondary schools, private schools that are not religious schools operated by religious organizations are covered by Title III of the ADA. Title III of the Americans with Disabilities Act of 1990, 42 U.S.C. §§ 12181 et seq. In addition, Title I of the Elementary and Secondary Education Act of 1965, as amended, contains important provisions regarding students with disabilities in the Title I program and their participation in assessments of Title I programs. 20 U.S.C. § 6311(b)(3)(F).

189 The Individuals with Disabilities Education Act, 20 U.S.C. § 1400(d)(1)(C).

190 The Section 504 regulation is found at 34 C.F.R. Part 104 (1999). The Title II regulation is found at 28 C.F.R. Part 35. The IDEA regulation is found at 34 C.F.R. Part 300.

191 See 34 C.F.R. § 100.3(b)(2) and similar provisions under Title IX, Section 504, and the ADA. In Guardians, 463 U.S. at 589, the United States Supreme Court upheld the use of the effects test, stating that the Title VI regulation forbids the use of federal funds, "not only in programs that intentionally discriminate on racial grounds but also in those endeavors that have a [racially disproportionate] impact on racial minorities."

Generally, there are three critical areas in which high-stakes testing issues arise for students with disabilities: (1) tests used to determine whether a student has a disability and, if so, the nature of the disability; (2) tests used to determine if the student meets the criteria for other specialized instructional programs, such as gifted and talented or vocational education programs; and (3) system-wide tests administered to determine if students have met performance standards. 192

Under Section 504, Title II, and the IDEA, before a student can be classified as having a disability, the responsible education agency must individually evaluate the student in accordance with specific statutory and regulatory requirements, including requirements regarding the validity of tests and the provision of appropriate accommodations. 193 These requirements prohibit the use of a single test score as the sole criterion for determining whether a student has a disability and for determining an appropriate educational placement for the student. 194

When tests are used for other purposes, such as in making decisions about placement in gifted and talented programs, it is important that tests measure the skills and abilities needed in the program, rather than the disability, unless the test purports to measure skills or functions which are impaired by the disability and such functions are necessary for participation in the program. 195 For this reason, appropriate accommodations may need to be provided to students with disabilities in order to measure accurately their performance in the skills and abilities required in the program.

Furthermore, federal law requires the inclusion of students with disabilities in state- and district-wide assessment programs, including high-stakes tests, except as participation in such tests is individually determined to be inappropriate for a particular student. Such assessments provide valuable information which benefits students, either directly, such as in the measurement of individual progress against standards, or indirectly, such as in evaluating programs. Given these benefits, exclusion from assessment programs based on disability generally would violate Section 504 and Title II. If a student with a disability will take the system-wide assessment test, including a high-stakes test, the student must be provided appropriate instruction and appropriate test accommodations. 196

192 Tests used for college admission are discussed on pp. 4-5.

193 See 34 C.F.R. § 104.35(b) for specific provisions covering the use of tests for evaluation purposes.

194 See 34 C.F.R. § 104.35(c), requiring placement decisions to consider information from a variety of sources.

195 See 34 C.F.R. § 104.35(b)(3) and 34 C.F.R. § 300.532.

196 See Brookhart, 697 F.2d at 183-184.
Some courts have held that a student with a disability may be denied a diploma if, despite receiving appropriate services and testing accommodations, the student, because of the disability, is unable to pass the required test or meet other graduation requirements. Id. at 183; Anderson, 520 F. Supp. at 509-511; Board of Educ. v. Ambach, 458 N.Y.S.2d 680, 684-685, 689 (N.Y. App. Div. 1982), aff'd, 469 N.Y.S.2d 669 (1983).

In addition, the Individuals with Disabilities Education Amendments of 1997 specifically require states, as a condition of receiving IDEA funds, to include students with disabilities in the regular state- and district-wide assessment programs, with appropriate accommodations, where necessary. 197 The IDEA requirements cover tests with high-stakes consequences given to measure individual achievement as well as tests given for program accountability purposes. The IDEA also requires state or local educational agencies to develop guidelines for the relatively small number of students with disabilities who cannot take part in state- and district-wide tests to participate in alternate assessments. 198

For children with disabilities, school personnel knowledgeable about the student, the nature of the disability, and the testing program, in conjunction with the student's parent or guardian, determine whether the student will participate in all or part of the state- or district-wide assessment of student achievement. 199 The decision must be documented in the student's individualized education program (IEP), or a similar record such as a Section 504 plan. These records must also state any individual accommodations in the administration of the state- or district-wide assessments of student achievement that are needed to enable the student to participate in such assessment.
An IEP, developed under the IDEA, must also explain how the student will be assessed if it is inappropriate for the student to participate in the testing program even with accommodations. 200

Section 504 and Title II also prohibit discrimination in virtually all public and private post-secondary institutions. The regulatory requirements related to disability discrimination are different in post-secondary education than in elementary and secondary education. Post-secondary institutions are not required to evaluate students or to provide them with a free appropriate education. High-stakes testing issues at the post-secondary level generally relate to tests used in admissions, including tests given by an educational institution or other covered entities as prerequisites for entering a career or career path, and tests of academic competency required by the institution to complete a program. This guide is not intended to offer a complete or detailed explanation of each of these testing situations, but only a brief synopsis. 201

197 See 34 C.F.R. § 300.138(a).

198 See 34 C.F.R. § 300.138(b). The IDEA Final Regulations, Attachment 1--Analysis of Comments and Changes, 64 Fed. Reg. 12406, 12564 (1999) projects that there will be a relatively small number of students who will not be able to participate in the district or state assessment program with accommodations and modifications, and will therefore need to be assessed through alternate means. These alternate assessments must be developed and conducted beginning not later than July 1, 2000.

199 See 34 C.F.R. § 300.347(a)(5) for the IEP requirements applicable to assessment of students with disabilities under IDEA and 34 C.F.R. § 104.33 for the more general evaluation requirements under Section 504.

200 See 34 C.F.R. § 300.347(a)(5).
201 Test providers that are not higher education institutions may be covered by Section 504 if they receive federal funds; by Title II if they are parts of governmental units; or by Title III if they are private entities. Each of these laws has its own requirements. For more information regarding testing under Title III of the ADA, consult the U.S. Department of Justice.

The Section 504 regulation specifically provides that higher education institutions' admissions procedures may not make use of any test or criterion for admission that has a disproportionate, adverse impact on individuals with disabilities unless (1) the test or criterion, as used by the institution, has been validated as a predictor of success in the education program or activity and (2) alternative tests or criteria that have a less disproportionate, adverse impact are not shown to be available. 202 In administering tests, appropriate accommodations must be provided so that the person can demonstrate his or her aptitude and achievement, not the effect of the disability (except where the functions impaired by the disability are the factors the test purports to measure). 203

For other high-stakes tests that an institution might administer, such as rising junior tests, similar requirements apply. 204 The institution must provide adjustments or accommodations and auxiliary aids and services that enable the student to demonstrate the knowledge and skills being tested. 205 Students are required to notify the educational institution when accommodations are needed and supply adequate documentation of a current disability and the need for accommodation. The student's preferred accommodation does not have to be provided as long as an effective accommodation is provided.

Test accommodations are intended to provide the person with disabilities the means by which to demonstrate the skills and knowledge being tested. Although Section 504 and Title II require a college or university to make reasonable modifications, neither Section 504 nor Title II requires a college or university to change, lower, waive, or eliminate academic requirements or technical standards, including admissions requirements, that can be demonstrated by the college or university to be essential to its program of instruction or to any directly related licensing requirement. 206 Accommodations requested by students need not be provided if they would result in a fundamental alteration to the institution's program. 207

202 34 C.F.R. § 104.42(b)(2). Appendix A to the Section 504 regulation, Subpart E-Post-secondary Education, No. 29, notes that the party challenging the test would have the burden of showing that alternate tests with less disparate impact are available.

203 See 34 C.F.R. § 104.42(b)(2). Appendix A to the Section 504 regulation, Subpart E-Post-secondary Education, No. 29, notes that the party challenging the test would have the burden of showing that alternate tests with less disparate impact are available.

204 Some undergraduate college programs require students to pass a rising junior examination to determine whether students have met the college's standards in writing or other academic skills as a prerequisite for advancement to junior year status.

205 See 34 C.F.R. § 104.44(a) & (d).

206 See 34 C.F.R. § 104.44(a).

207 See Southeastern Community College v. Davis, 442 U.S. 397, 413 (1979); Wynne v. Tufts Univ. Sch. of Med., 976 F.2d 791, 794-796 (1st Cir. 1992), cert. denied, 507 U.S. 1030 (1993).

IV.
Constitutional Protections

In addition to applying federal nondiscrimination statutes, courts have also considered constitutional issues that may arise when public school districts or state education agencies require students to pass certain tests that are intended to certify that students have attained a level of competency in skills or knowledge taught in the program. 208 Constitutional challenges to testing programs under the Fourteenth Amendment have raised both equal protection and due process claims. The equal protection principles involved in discrimination cases are, generally speaking, the same as the standards applied to intentional discrimination claims under the applicable federal nondiscrimination statutes. 209

The due process clause of the Fourteenth Amendment is particularly associated with cases challenging the adequacy of the notice provided to students prior to this type of test and the students' opportunity to learn the required content. 210 In analyzing such due process claims, courts have generally considered three issues:

208 The U.S. Department of Education, Office for Civil Rights, does not have jurisdiction to resolve constitutional cases. However, some cases involve constitutional issues that overlap with discrimination issues arising under federal civil rights laws.

209 Federal cases may involve equal protection challenges to a jurisdiction's use of tests in which the claim is not based on intentional race or sex discrimination, but, instead, on the alleged impropriety of the jurisdiction's use of tests to separate out those students who should not be allowed to graduate. As a general matter, courts express reluctance to second-guess a state's educational policy choices when faced with such challenges, although they recognize that a state cannot "exercise that [plenary] power without reason and without regard to the United States Constitution." Debra P., 644 F.2d at 403.
When there is no claim of discrimination based on membership in a suspect class, the equal protection claim is reviewed under the rational basis standard. In these cases, the jurisdiction need show only that the use of the tests has a rational relationship to a valid state interest. Id. at 406. See also Erik V., 977 F. Supp. at 389.

210 A review of relevant cases reveals the highly fact- and context-specific nature of the conclusions reached by federal courts considering alleged violations of the due process clause. In Debra P., 644 F.2d at 404, the Fifth Circuit held that students' due process rights were violated when a newly imposed minimum competency test required for high school graduation was instituted without adequate notice and an opportunity for students to learn the material covered by the test. Three years later, in Debra P. v. Turlington, 730 F.2d at 1416-1417, the court held that students who now had six years' notice of the exam were afforded the opportunity to learn the relevant material, given the state's remedial programs. For additional courts identifying due process violations in the way in which a competency test was instituted, see Brookhart, 697 F.2d at 186-187 (holding that district-required minimum competency test for graduation denied due process to students with disabilities where notice was inadequate and students had not been exposed to 90% of the material covered by the test); Crump v. Gilmer Indep. Sch. Dist., 797 F. Supp. 552, 556-557 (E.D. Tex. 1992) (granting temporary restraining order where district had not demonstrated validity of graduation examination in light of actual instructional content); Anderson, 520 F. Supp. at 508-509 (finding that school district failed to show that minimum competency test required for high school graduation covered material actually taught at school).
Other cases have concluded that adequate notice was provided, t

Dublin Core
Title: Kendra Brooks - Subject Series
Creator: Domestic Policy Council Kendra Brooks
Is Part Of: Collection Finding Aid (http://clinton.presidentiallibraries.us/items/show/36031); National Archives Catalog Description (https://catalog.archives.gov/id/647992)
Description: The Kendra Brooks Subject Files contain correspondence, reports, articles, memos, and various printed material. Other documents include background information for education events and meetings. The files include material pertaining to charter schools, national testing, SAT preparation, school safety, school modernization/construction, affirmative action, Blue Ribbon Schools, class-size reduction, teacher quality, Limited English Proficiency (LEP), the White House Initiative on Education Excellence for Hispanic Americans, Tribal Colleges and Universities, Historically Black Colleges and Universities (HBCU's), the Individuals with Disabilities Education Act (IDEA), and Title 1 of the Elementary and Secondary Education Act of 1965.
Provenance: Clinton Presidential Records: White House Staff and Office Files
Publisher: William J. Clinton Presidential Library & Museum
Extent: 157 folders in 16 boxes
Text
Original Format: Paper

Dublin Core
Title: [Education - High Stakes Testing] [2]
Creator: Domestic Policy Council Kendra Brooks Subject Files
Is Part Of: Box 4; Collection Finding Aid (http://clintonlibrary.gov/assets/Documents/Finding-Aids/Systematic/KendraBrookssubjectfile.pdf); National Archives Catalog Description (https://catalog.archives.gov/id/647992)
Provenance: Clinton Presidential Records: White House Staff and Office Files
Format: Adobe Acrobat Document
Publisher: Clinton Presidential Library & Museum
Medium: Reproduction-Reference
Date Created: 1/17/2012
Source: 647992-education-high-stakes-testing-2.pdf 647992