https://clinton.presidentiallibraries.us/files/original/c7b1612a7344820abb17b828213a9417.pdf b1a20f22a8b8283d36f82e81b9af20cc PDF Text Text TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES P·resident's Advisory Commission on Educational Excellence for Hispanic'Americans APRIL 2000 DRAFT DRAFT DRAFT. NOT FOR CITATION Prepared by': DRAFT DRAFT DRAFT DO NOT DUPLICATE' . Richard A. Figueroa, University of California at Davis Sonia Hernandez, California Department of Education �PRESIDENT'S ADVISORY COMMISSION ON EDUCATIONAL EXCELLENCE FOR HISPANIC AMERICANS Guillermo Linares (Chair)* New York, New York Erlinda Paiz Archuleta· Denver, Colorado Cecilia Preciado Burciaga Seaside, California George Castro* San Jose, California Darlene Chavira Chavez Tuscon, Arizona David J. Cortiella Boston, Massachusetts Miriam Cruz Washington, D.C. Juliet Villareal Garcia Brownsville, Texas Jose Gonzalez San Juan, Puerto Rico Waldemar Rojas· Dallas, Texas lsaura Santiago-Santiago New York, New York John Phillip Santos New York, New York Samuel Vigil· Las Vegas, New Mexico Diana Wasserman Ft. Lauderdale, Florida Ruben Zacarias· Los Angeles, California (*Members, Commission Assessment Committee) White House Initiative on Educational Excellence for Hispanic Americans Staff Sarita E. Brown Sonia Hernandez· Sacramento, California Cipriano Munoz· San Antonio, Texas Harry P. Pachon Claremont, California Executive Director 'Deborah A. Santiago Deputy Director Richard Toscano Special Assistant for Interagency Affairs Julie Laurel Eduardo J. Padron Miami, Florida Janice Petrovich New York, New York Policy Analyst Debbie Montoya to the Executive Director Assistant Danielle Gonazales Policy Intern Gloria Rodriguez San Antonio, Texas �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES ACKNOWLEDGEMENTS A very special thanks is extended to Dianna Gutierrez. We are indebted for the tremendous amount of research that she directed in the preparation of this report. Commendations are extended to her assistant, Marysol Flores. We must also acknowledge the help of many colleagues throughout the United States. They were all generous, prompt and supportive in their efforts. The Professional Staff of WestEd: Stanley Rabinowitz, Rose Marie Fontana, Sri Ananda, Robert Linquanti Guadalupe Valdes, Stanford University . Janette Klinger, University of Miami Jose Cintron, California State University Sacraniento Lila Jacobs, California State University Sacramento Linda Murai, Sacramento County Office of Education Pedro Pedraza, Center for Puerto Rican Studies, Hunter College Diane August, August & Associates Jon Sandoval, University of California at Davis Eugene Garcia, University of California at Berkeley Alfredo Artiles, University of California at Los Angeles Maria Trejo and David Dolson, California Department of Education " 2 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES Forevvard . There is no more promising reform in public education today than the standards-based movement. It is'not only the most widely accepted school change'process, it also offers the greatest probability for leveling the playing field for all children, by clearly stating expectations for instruction, assessing the progress of each child toward achieving the standards, and holding schools accountable for student learning. Where these three core elements of a standards-based system--clear expectations,assessment and accountability--are in place, students experience success as never before. This is ,?specially true for the growing Hispanic student population in the United States, which has traditionally had limited access to rigorous mainstream instruction. But in the cLirrent rush to implement world-class standards supported by systems of accountability in the nation's public schools, state education leaders have compromised the educational future ofHispanic students by making high-stakes decisions based on inaccurate and inadequate testing information. Hundreds of thousands of Hispanic students, many lacking functional fluency in English, are assessed with a myriad of tests entirely in English and, oftentimes, only in English. ,The resulting data is used to determine high-stakes decisions, such as for student promotion or retention,'or high'. school'graduation--but rarely for the purposes of true accountability. When it comes to holding schools accountable for the academic achievement of our ~tudEmts,states allow Hispanic youngsters to. become invisible inside the very system charged with educating ~em. ' State policies often require that Hispanic students be assessed in English with tests they, may not even understand or with alternative but less rigorous tests in Spanish whether or not they are receiving instruction inthat'language. While' neither approach produces accurate information about student leaming, the resulting data is often used to hold students accountable for their own success, rather than the educators or the public school systems. Who should be responsible for vvhat Hispanic students learn in school? The answer is simple: students, educators, and parents all must share the responsibility. '. ' But what kinds of assessments should be ,used to provide accurate information about what students have been taught? Regrettably, the answer to this question is not as simple. It ,is explored in this document. . With few exceptions, students bear the weight of academic success or failure on the basis of one or two test scores. Where exemptions from testing exist~ Hispanics disappear 'from the accountability reports, triggering both positive and negative consequences for the responsible adults in the system. Thus more than two million . Hispanic students in the United States are underrepresented or absent from the rolls of stud,ents who are counted via ~ssessment and who, therefore, count. It is our belief that Hispanic studer)ts, whether they are English dominant or English Language Learners, should be tested with appropriate test instruments in order to be included at all times in the states' accountability systems. If th,is does not occur, 3 �· TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES Hispanic children will not benefit from the powerful arid promising standards movement. As the United States enters the new millennium, deliberate action by policymakers at every level must be taken to include the country's fastest growing and soon-to-be largest minority, within the bounds of systems accountability using accurate information for decisionmaking. The purpose of this report is twofold: (1) to bring attention to the growing crisis of the "invisible" Hispanic students in public education to the nation's leaders and (2) to provide guidance to the nation and the states on taking the necessary steps to rectify the conditions that allow Hispanic students to be wrongly measured and unaccounted for in their own schools. It is our intent to help education leaders in this country choose wisely for the sake of the children. Commission Assessment Commitlee--President's Advisory Commission on Educational Excellence For Hispanic Americans Washington, D.C. September 15, 1999 4 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICYISSUES CHAPTER 1: INTRODUCTION All forms of human mental measurement are fragile and problematic (Gould, 1981). At their best, for example, psychometric tests account for a modest 2,5-35 percent of the variance of what they predict (Neiser, Bodoo, Bouchard, Boykin, Brody, Ceci, . He'lpern, Loehlin, Perloff, Sternberg, & Urbina, 1996; Cleary, Humphreys, Kendrick, & Wesman, 1975). This isa technical ceiling that test-makers have not succeeded in breaking for nearly a century. A fundamental assumption of all testing is that the normative framework (psychometric, criterion, or rubrics-based) on which the test scores are based assumes'a high degree of experiential homogeneity, culturailliriguistic similaritya'nd equity in learning opportunities among test takers (Colvin, 192·1; Woodrow, 1921; Heller, Holtzman, & Messick, 1982). UndeHhese conditions, a test score becomes , a measure that belongs pre-eminently to the individu.al and his or her talents, achievements, traits and predispositions. In a real sense, tests work best in a perfect democracy of monolingual and monocultural citizens. . Hispanic Americans present a mC)'ssiv.echallenge to the assumptions of tests. The vast majority has varying levels of exposure:to and proficiency in Spanish, though many· also come from other linguistic backgrounds (e.g., Portuguese, Catalan, Basque). Their cultural ancestries include Mexico, Latin America, Puerto Rico, Cuba, the Caribbean, . Spain and Portugal. Their cultural experiences in the United States are multigenerational and reflect a broad range of acculturation levels, soCioeconomic differences, and political power. So vast is their heterogeneity, that the assumptions of tests about homogeneity may well be untenable. Yet, Hispanic stUdents and Hispanic citizens are tested every day and are compared to middle class America in the unique reification of democracy and assimilation that tests impose. But the history of testing Hispanics in the United States has never been typified by equanimity. . There are few issues in Americanpsychol~gy or education that are as complex or as misunderstood as the testing of Hispanic students. Two fundamental questions have , challenged and continue to perplex test-makers, test-givers and test~users: Does Spanish in the home or as the primary language affect test scores? and, Do any aspects of Hispanic culture in the. United States attenuate or .change test outcomes? In the 1930s, the great Mexican American psychologist, George Sanchez, ' addressed both issues. The reiative responsibility of the school and of the child in the achievement of desirable goals must be examined. Is the fact that a child makes an inferior score on an intelligence test prima 'facie evidence that he is dull? Or is it the function of the test to reflect the inferior or different training and development with which the child was furnished by'his home, his language, the culture of his people, and by his school? When the child fails in promotion is it his failure or has the school failed to use the proper whetstone in bringing out the true temper and quality of his steel? The school has the. responsibility ot'supplying those experiences to the child which will make the experiences sampled by standard measures as common to him as they were to those on whom the norms of the measures were based. When the school has met the language, cultural. disciplinary, and informational lacks of the child and the. child has, reached a saturation point of his capacity, in the assimilation of fundamental experiences 5 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES and activities - then failure on his part to respond to tests of such experiences and activities may be considered his failure. As long as the tests do not at least sample in equal degree a state of saturation that is equal for the "norm children" and the particular bilingual child it cannot be assumed that the test is a valid one for the child. (Sanchez, 1934, pgs. 770-771) Presently, the impact of cultural differences on test scores remains understudied. Most of what is known about cultural effects comes from the use of U.S.-made tests on foreign populations. Anthropologists were early consumers who believed that the scientific nature of tests made them appropriate for universal use. Very little is known about cross-cultural differences in testing, in fact, precisely because monocultural tests, when translated into the local language, yielded predominantly lower scores and Anglocentric interpretations. There are, however, some notable exceptions. Holtzman, Diaz~Guerrero, & Schwarts (1975) conducted a longitudinal study comparing approximately 400 middle class Mexican children with 400 middle class white children from northern Texas. One of the unique aspects of this investigation was that a comprehensive attempt was made to make all the sociological, psychological and educational tests and scoring protocols appropriate for Mexican students and their families. The result was a compelling description of cultural differences as well as of the production of knowledge about how psychometric tests need to undergo a radical overhaul for crosscultural use and how cultural bias can subtly affect scores .. Regrettably, this type of investigation has never been replicated with Hispanic children and their families living in the United States. By and large, the study of cultural differences in testing has always operated from a "black box" design. Culture ha$ resided in the "Puerto Rican", "Mexican," or "Cuban American" samples used (Valdes & Figueroa, 1994). The only cultural effect on U.S.~ normed, English~language tests has been lower scores. Interestingly, these types of "black box" studies have seldom found evidence of lower test reliabilities or validities because of cultural differences (Cleary, Humphreys, Kendrick, & Wesman, 1975; Geisinger, 1992; Sandoval,Frisby, Gei~inger, Scheuneman, & Grenier, 1998). The same has not been true for the one cultural variable that has left its mark on virtually every investigation using tests with Hispanic populations. Linguistic exposure to Spanish has affected every type of psychometric test and test score given in the United States (Valdes & Figueroa, 1994). It is the one variable for which there is evidence of psychometric bias (Figueroa & Garcia, 1995). It is the one variable that finally has drawn the attention of the scientific community as a complex disrupter of established testing policies and practices (Pellegrino, Jones, & Mitchell; 1999). 6 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES CHAPTER 2: THE HISTORICAL CONTEXT In 1922, Charles Brigham published The Study of American Intelligence, an analysis of government data from testing conducted on World War I recruits. A subset of the large sample included 11,300 foreign-born recruits who were tested with the Army Alpha (a verbal test of intelligence) and the Army Beta. (a nonverbal test of intelligence). Approximately 32 percent of them had been in the United States 0-5 years,38 percent 6-10 years, 17 percent 11-15 years, 7 percent 16-20 years, and 6 percent over 20 years. Because of its high verbal content, the Army Alpha was a good measure of English language proficiency. In spite of Brigham's tortured defense of the integrity of this sample of foreign-born recruits. there is a strong indication that after 10 years the sample reflected a different type of immigrants-those who did not return to their native countries and who in all likelihood had adapted culturally and linguistically. Taking this into account, the increase in Army Alpha scores for the first two, five-year groups was a meager .11 of a point. This is one of the first major empirical findings that showed proficiency in a language besides English systematically produces lower verbal scores- . possibly for avery, very long time. The 1920s and 1930s produced a great amount of research on ethnic minorities in the United States. Much of it came under the title of "Race Psychology" and reflected a naive use of test scores to support genetic arguments about lower intellectual potential in non-Nordic groups. A considerable amount of this published work included Hispanic test subjects whose linguistic backgrounds can generally be described as "bilingual." That is, they came from homes where Spanish was spoken and with varying and unknown degrees of proficiency in Spanish. Most.of these studies were conducted on Mexican American children. Several conclusions can be extracted from this early research on bilingual test takers. First, the test results of bilingual individuals compared to those of monolinguals, for all age groups, consistently produced a profil~ of lower (English) test scores regardless of the test being used. This was most pronounced in tests of verbal intelligence, although a similar profile appeared in tests of academic achievement (Brown, 1922; Cebollero, 1936; Johnson, 1938; Koch & Simmons, 1926; Manuel, 1935; Pratt, 1929). There, English-dependent skills such as vocabulary, comprehension, sentence completion skills, analogies,essay composition, etceteras were markedly low in bilingual test-takers ill comparison to their arithmetic and memory skills. The effect of differences in exposure to English appeared to be unerasable (Saer, 1923), or as the Brigham study showed, virtually unerasable. This phenomenon became widely known as the "language handicap" of all immigrant test·takers. In many research publications, this provided a rationale for denigrating or eradicating bilingualism and instruction in the primary language. Second, the psychometric properties of tests showed a curious profile. Bilingualism had no effect on the internal consistency and stability of tests, particularly indices of reliability (Figueroa, 1990). But on the critical external indices of validity, particularly predictive validity, bilingualism appeared to attenuate the power of tests 7 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES (Altus, 1945; Davenport; 1932; Feingold, 1924; Garth, 1928; Paschal & Sullivan, 1925; Pintner & Keller, 1922; Wheeler, 1932; Wood, 1929; Yoder, 1928). Third, some anomalous data appeared. Bilingual individuals from middle or upper-middle class homes occasionally either outperformed monolingual, English . speakers or did as well in test scores (Darcy, 1946; Feingold, 1924; Manuel, 1935; Pintner & Arsenian 1937). The "bilingual handicap" in effect was cured by advantaged or enriched environments and backgrounds. Clearly, however, in the early part of the 20th century, foreign-born individuals with such cultural capital were relatively rare. Also, individuals with two under-developed languages did worse on tests than individuals with a single, educationally developed foreign language (Altus, 1949; Arsenian, 1945,; Smith, 1949, 1957). Another finding that to this day remains present and unexplained is the ability of bilingual individuals to do better than English speakers on recalling digits backward (a staple of IQ tests since their inception) (Darsie, 1926; Hung-Hsia. 1929; Jensen & Inouye, 1980 Luh & WY,1931; Manuel, 1935). Finally, on school grades the "bilingual handicap" did not materialize to the same degree or persistence as on tests (Bell, 1935; Smith, 1942). Fourth, the psychometric, scientific community began the unfortunate procedural tradition of dealing with cultural groups as monolithic entities (e.g., Garth, 1920). The "Mexican" sample operationalized "Mexican culture". Socioeconomic and other intervening variables were often ignored. English language proficiency in test subjects remained as an uncontrolled source of error. Background factors such as educational backgrounds or the segregated nature of public schooling for bilingual students were overlooked. The methodological flaws in the design of studies with bilingual persons, in effect, were substantial and virtually precluded reasonable inferences or attributions. Many of these design flaws continue (e.g., MacMillan, Gresham, & Bocian,' 1998; Sandoval, 1979).· . For test users, however, the "language handicap" produced several innovations or, in the current lexicon, a series of accommodations. The testing community came to believe that nonverbal tests of mental ability were free of linguistic factors and were culturally neutral (Brigham, 1922). To this day, they are seen as culture fair measures of intelligence, mental aptitudes or personality. Tests were often simply translated without conducting norming studies (Lester, 1929; Mitchell, 1937; Paschal & Sullivan, 1925). These translations were used for research purposes and for conducting actual assessments. Eth'nic norms were occasionally produced for some bilingual groups (Ammons & Aguero, 1950; Luh & Wy, 1931). Many caveats and precautions on the use of tests with bilingu·al subjects were voiced. For example, Charles Brigham, the father of the modern SAT (Scholastic Aptitude Test) and the principal investigator in "The Study of American Intelligence," also concluded that: For purposes of comparing individuals or groups, it is apparent that tests in the vernacular [English] must be used only with individuals having equal opportunity to acquire the vernacular of the test. This requirement precludes the use of such tests in making comparative studies of individuals brought up in homes in which the vernacular of the test is not used, or in which two vernaculars are used. The last condition is frequently violated 8 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLlcvlSSUES here in studies of children born in this country whose parents spe,ak another tongue. It is important as the effects of bilingualism are not entirely known. (Brigham. 1930, pg. 165) Some 70 years later, "the effects oJ bilingualism [still] are not entirely known." What has changed is that there are more caveats about testing bilinguals. 9 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES CHAPTER 3: TESTING STANDARDS AND OFFICE OF CIVIL RIGHTS (OCR) GUIDELINES . The most important and potentially powerful set of regulations and policies on the development and use of tests in the United States is the, Stancfards for Education{!J1 and Psychological Testing (American Equcational Research Association, American Psychological Association, National Council on Measurement in Education, 1999). They are the, "standard of the industry" and constitute somewhat of an ultimate arbiter on all matters related to test development and usage. The importance of the Standards for addressing the problems associated with testing Hispanic stUdents cannot be overstated. ' " , What is sing!J1arly unique is that the current Standards have outdistanced current test technology and testing practices with "Individuals of Differing Linguistic Backgrounds." The gulf between what the Standards promulgate and what test developers and test 'users actually do is very large. Given the directives propOsed by the Office forCivil Rights in their "Nondiscrimination in High-Stakes Testing: A Resource Guide",(U.S. Department of Education. draft, December 1999), this gulf may well constitute a denial of substantive, due process with Hispanic students, and citizens. The following is a historical look at how the current Standards evolved with respect to testing "bilingual" individua'is. A review of the OCR Guidelines then follows. ' THE STANDARDS FOR EDUCATIONAL AND PSYCHOLOGICAL TESTING The first set ofthese Standarqs appeared in 1966, even though both the , American Educational Research Association and the American Psychological Association had address~d the issues attendant to achievement and, psychological/diagnostictesting ,in two prior, separate documents respectively in the mid-1950s. The 1966.Standards for Educational and Psychological Tests and Manuals (American Psychological Association, American Educational Research Association, National Council on Measurement in Education, '1966) had one overriding goal: ''the essential principl~ underlying this document is that a test manual should carry information sufficient to enable any qualified user to make sound judgments regarding , the usefulness and interpretation of the test" (pg. 2). Behind this "essential principle" was the recognition "that tests are used in arriving at decisions which may have great influence on the, ultimate welfare of the persons tested, on educational pOints of view and practices, and on development and utilization of human resources" (pg. 1). Interestingly, in the entire 36 pages of text, there are only occasional references to general demographic variables that should be addressed by test manuals. There are only two instances when these references vaguely touch on linguistic and cultural diversity. In both instances they are not prescribed as ESSENTIAL: 10 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES CS.S. If the validity of the test is likely to be different for subsamples that can be identified when the test is given, the manual should report the results for each subsample separately or should report that no differences , were found. VERYDESIREABLE ' (pg. 20), 0.2.21. [concerning the psychometric index of reliability] Demographic information, such as distributions of the subjects with respect to age, sex, socioeconomic level, intellectual level, employment status or history, and minority group membership should be given in the test manual. DESIREABLE (pg.28) In 1974, the new edition of the Standards for Educationaland Psychological Tests {American Psychological Association, American Educational Research Association, National Council on Measurement in Education, 1974) paid more attention to the issues of cultural and linguistic diversity. There was recognition that the validity of a test could be attenuated for certain groups and under certain conditions. This acknowledgement was due, in great part; to the impactof federal court cases alleging diagnostic bias in tests. In many school districts, tests produced inflated incidence rates of mental disabilities among Latino and African American stu<:1ent populations. Typically, these inflated incidence rates appeared with greater frequency than in the past. 81.3. The manual should call 'attention to marked in-nuences on test scores known to be associated with region, socioeconomic status, race, creed, color, national origin, or sex. Essential [Comment Social or cultural factors known to affect performance on the test differentially, administrator errors that are frequently repeated, examiner examinee differences, and other factors that may result in spurious or unfair test scores should, for example, be clearly and prominently identified in the manual.] (pg. 14) Of even greater importance, two warnings appeared about the possible impact of tests and testing practices on English-language learners. Standard G2 directed test users to know the research literature on tests and testing particularly with respect to the problems associated with testing individuals with "limited or restricted cultural exposure." Standard G2 suggested that theoverrepresentation of African American and Spanish speaking children "with limited cultural exposure" was caused by test users' lack of knowledge about the limitations of tests when cultural differences existed~ Standards J5, J.5.3, and J.5.3.1. went even further. They recommended that when there were great cultural differences between the test taker and the test's norming sample, the tester should not test ("Essential"). They also set forth an accommodation that has become exceedingly popular: when there are no appropriatetests for a given person or popUlation, the tester should use "a broad-based approach to assessment using as many methods as are available to him. Very Desirable" (pg. 71). What this was interpreted to mean by many was to do more assessments with more tests. The Comment for this Standard elaborated on this .. 11 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES [Comment: The standard is to do the best one can. This perhaps includes the use of a test, even though no appropriate normative data are available, simply as a means of finding out how the individual approa'ches the task of the test. It might include references, extensive interviews, or perhaps some ad hoc situational tasks. Efforts to help solve educational or psychological problems should not be abandoned simply because of the absence of an appropriate standardized instrument.f(pg. 71) When a test or tests are not appropriate, giving more tests simply to see how an individual handles the testing situation is questionable. Data exist shovying that in some high-stakes testing situations, giving more tests helps neither the tester (Mehan, ' Hertweck, & Meihls, 1986) nor the child (Taylor, 1991). Whena test is not appropriate. it should not be given if it may hurt an individual or lead to serious negative consequences for the individual. The use of an inappropriate test can only be justified if it has little or no consequences for the individual and if it helps assess system effects on similar individuals. For Hispanic students, the use of inappropriate tests is a national problem with a long history of abuse. As the Comment cited above underscores, there are, " alternative ways to help solve psychological and educational problems. With Hispanic children, these must be linguistically and culturally appropriate. In 1985, the Third Edition of the Standards was published (American Educational Research Association, American Psychological Association, National Council on Measurement in Education, 1985). The challenges posed by the multiple aspects of . diversity (culture, language, disability, gender, socioeconomic status, etceteras) appear throughout the entire document. But the most Significant evolution of the Standards is Chapter 13, "Testing Linguistic Minorities." In many ways, Chapter 13 was revolutionary. , The text that introduced the seven Standards for this chapter began with the most profound acknowledgment. "For a non-native English speaker or for a speaker of some, dialects of English, every test given in English becomes, in part, a language or literacy test" (pg. 73) of English. Basically, this means that for bilinguals who have been exposed to another language, every test, except a test of English language proficiency, contains an unknown, systematic degree of error. Such tests, in effect, are biased because they may not be measuring accurately whatever is being measured. Accordingly, the Standards called for "special attention" to these issues on the part of test development, test use, and test interpretation. It was also recognized that bilingual individuals vary , extensively in their functional, academic and literate use of each language separately or simultaneously. Also, cognitive processing in the weaker language is more fragile and can be slower. Language background, in effect, is an important consideration in all aspects of testing and test validity. With respect to using tests t,hat are in the primary language of bilingual individuals, the Standards made several, key pronouncements. Translating a test does not guarantee that the test items will have the same degree of difficulty in the other language,' The latter must be empirically established. For example, a straight translation of a second-grade test of reading ability will not necessarily yield a second-grade . reading test in the other language. Tests for determining English language profici~ncy 12 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES are vitally important for making educational placement decisions. However, these tests must assess multiple dimensions of linguistic ability. Chapter 13 also made a distinction between "natural" uses of language and more formal, cognitively demanding uses. Because of these "special difficulties" attendant on the use of tests with persons who have not had adequate exposure to the language of the test, it was suggested that more testing and observations be done with them. As previously noted, more testing was and is a questionable policy. Chapter 13 also acknowledged the possible influence of culturally mediated ways of responding to test questions. Elaborated speech may not be congruent with culturally specific ways of speaking to adults. When these factors are ignored the validity of . interpret~tions and recommendations maybe questionable and harmful. Chapter 13 of the 1985 Standards was vitally important for the educational and psychological testing of English-language learners in the United States. There were, however, several problems. First, it was not known how well the testing industry and professions would abide by them. Throughout the 1980s and 1990s, the manuals of most tests showed that Chapter 13 of Standards was routinely ignored. Also, these Standards ignored several, historical practices and problems. They did not address one of the most widely used techniques for testing English-language learners--the use of interpreters who either translate the test into the primary language on the spot or who help administer a test that has already been translated. They were silent on the apparent inability of tests and test users to differentiafe among cultural factors, language ' proficiency levels, and mental/emotional disabilities. They endorsed a historical solution for what to do when there are no tests ("test more") in spite of the fact that there was no evidence that this worked. In fact, there was evidence to the contrary. In August of 1999, a new edition of the Standards was approved. Chapter 9 is titled, "Testing Individuals of Differing Linguistic Backgrounds". Just as in the 1985 Standards, the narrative introducing this chapter cautions that, with 'indiViduals from diverse linguistic backgrounds, tests that are in English become tests of English ability to a degree that is generally more pronounced than with monolingual English speakers. With individuals of varying levels of bilingualism, tests may fail to measure what they intend to measure. Accordingly, norms developed for monolingual English-speaking populations should either not be used or should be interpreted with the understanding that English language proficiency is a contaminating factor. Precautions regarding processing speed factors are also raised. The chapter suggests accommodations should be undertaken with English-language learners. It also notes that cultural factors can affect test scores, so attention should be paid to these factors. The problem with this part of the narrative in Chapter 9 is that, in spite of acknowledging the complexities associated with testing bilingual students, the precautions are tenuous and weak. Chapter 9 repeats many historical caveats about translating tests without conducting norming stUdies. Back translations are specifically mentioned as being inadequate by themselves. A similar point was made in the 1985 Standards. Yet, . publications describing and endorsing this process continue to appear {(3eis,inger, 13 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES 1994a,b). A translated test is an inappropriate test. The practice should be proscribed nationally. . The 1999 Standards break new ground along several dimensions. They note that ,several issues raised in Chapter 9 apply to persons with disabilities that affect communication such as deafness and visual impairments. This connection with disability appears to be part of a general trend in addressing the issues related to linguistic diversity. It also appears in a new, important text commissioned by the American Psychological Association (Sandoval, Frisby, Geisinger, Scheuneman, & Grenier, 1998). But, historically. bilingualism all too often has been equated with a handic,ap. Linking the test accommodations appropriate for bilingual learners with those appropriate for students with disabilities is exceedingly problematic for bilingual children. Accommodations may seem similar, but their use and outcomes may be different for bilingual children (Thurlow, Liu, Erickson, Spicuzza, & EI Sawaf, 1996). This is a matter for serious consideration. Being bilingual is not a handicap. but an asset. . The new 1999 Standards discuss several types of test accommodations that may have to be done with English-language 'E!arners: using only sections of the test that match the linguistic proficiency of the test-taker, changing the test and response formats, administering the test in a different context, and allowing more time for taking the test. Most of these modifications are currently understudy. It is difficult to see, however, how theSe will overcome the well-documented, historical impact of bilingualism on tests. It is difficult to see how testing a student in the wrong language, or testing for content that ' has not been taught, or testing for cultural material that is not in a Hispanic child's i repertOire will be made fair by the changes suggested in these accommodations. The issue of "equivalence" receives a great deal of attention in the new Standards. This refers to several aspects of test-development, use and interpretation, for example: the degree of confidence that a test-user can exercise in determining whether a test score means the same for someone who is unlike.the norming population, the equivalence across translated and renormed versions of the sam~ test, and equivalence in psychometric characteristics. As versions of the same test appear in both English and Spanish, this will become a major topic for reSearch and examination. However, some ' have asserted that the conceptual basis for testing bilingual childr.en in the United States using monolingual norms in both Spanish and English may be flawed (Grosjean, 1989; Valdes & Figueroa, 1994). It is argued that a bilingual child cannot validly be co'mpared against norms for children whose linguistic experience and development is with only one language. Bilingual children need norms derived from bilingual norming samples, controlling for differential levels of linguistic proficiencies. This issue urgently needs empirical studies and calls for an Immediate analysis. A new directive in these Standards calls for taking into account a determination of both language dominance and language proficiency. Consideration should be given to . the possibility that bilinguals may have "domain-specific" competencies in one or both languages. For example, a bilingual person may have competency .in speaking Spanish but not in reading Spanish. It is recommended that an individual's degree and type of . bilingualism be understood in order to use test results properly. This directive has 14 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES particular relevance for state wide testing .programs. Clearly, if achievement measures are interpreted without the degree of linguistic information suggested by the· new Standards, test results may not be correctly analyzed or understood. Extensive attention is given to the process of administering tests and to the possible impact of test-giver variables (culture, bilingualism, gender, time limits, and the use of interpreters). Pi crucial principle is recommended for testing English-language learners: give them enoughtime to finish the test and to show what they know and what they can do. This principle should also be applied in state wide testing programs across the United States. . ' Or)e of the most surprising parts of these new Standards' is the attention given to the use of interpreters. Not only are there multiple precautions, there is also a veritable map for how to train and. use interpreters. The unfortunate .part of this section of the Standards is that there is no empirical evidence that even remotely validates any of the procedures for using interprete~s. In fact, several, new dissertations are reporting findings to the contrary (e.g., Sanchez-Boyce, 1999). ' , There are actually11 Standardsin this new chapter, ''Testing Individuals of Differing Linguistic Backgrounds." Most are similar to the ones promulgated in the 1985 edition. Because of their vital importance for the testing of Hispanic children, they are each reviewed and critiqued here: In Appendix A, they are reproduced in their entirety. There are several meta-issues in these new Standards that are very important. Test users are charged with the responsibility of determining when a test may be inappropriate with linguistic minorities because they do not know "the language of the test"'(Conimentfor Standard '9.1). Similarly, test developers are held responsible, plausibly under "legal or regulatory requirements," for collecting evidence of test validity when the~e is research indicating differential meaning for test scores for a linguistic group.' In a' break with historical practices, the use of "representative" norming samples for these validity studies is proscribed. Separate norming studies specific to a linguistic group are called for (Comment for Standard 9.2). Test users are also required to use professional judgement in order to determine language proficiencies prior to testing. Then they are to test in either the ma'st proficient language or using both languages in order to assure construct validity (Comment for Standard 9.3). This is a highly ambitious directive that rests in some exceedingly tenuous assumptions: c;tS it applies to Hispanic students, it is assumed tl1at there are equivalent language proficiency tests in Spanish and English, that such equivalenttests can measure the complexity of linguistic proficiency in both languages, that suchtests . ' would have universal application among Hispanic Americans in the United States, that variation in linguistic proficiencies can be used to interpret an individual's test score (how' does one interpret a score ina language that is 70 percent proficient and another score in a language that is 55 percent proficient?). and that bilingualism is the sum of two languages (in which case language proficiency testing makes some sense) rather than a linguistic unit (in which case linguistic proficiency testing may be of limited use). 15 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES Standard 9.4 seems to tacitly accept the validity of "linguistic modifications" of tests that are in English. These may involve changing the test to the test taker's primary language (translating?) or altering the test given in English. There are no justifications for this Standard (9.4) and the historical, empirical literature reviewed in this document argues against modifications such as translations. The Standards (Comment on 9.4) place the responsibility for justifying these modifications on test developers. The current research on test modification~ for English learners is not sufficient to warrant the existence of this Standard. . Standard 9.5 address~s the issue of "flagging" a test score when "linguistic modifications" were provided during the testing. The Standard basically suggests that such flagging may be unfair, and not useful if the score from the modified administration is "comparable" to the score on the "nonmodified" administration and if there is "no reasonable basis" for thinking that such modification affects "score comparability." This is a very problematic Standard because of its lack of specificity. Are tests to be administered to linguistic minority individuals with and without modifications to see if there is comparability? What is a "reasonable basis" for determining comparability among scores? More than anything, the problem with this Standard comes from the unknowns related to "linguistic modifications," or what in the literature is known as "test accommodations" for English language learners. The current, available literature on such accommodations (Abebi, 1999a,b,c; Thurlow, Liu, Erickson, Spicuzza, & EI Sawaf, 1996) makes several points. The use of accommodations varies greatly across and within states that provide·such test adaptations. The question of who gets accommodations also varies greatly within and across these states. The most popular accommodations in statewide testing programs are: allowing for extra time, using of a bilingual dictionary, being tested in a separate room, receiving oral translations of directions, offering multiple testing sessions, answering questions, providing written and Qral translations, having words defined, and allowing for students to mark the test booklets (Thurlow, Liu, Erickson, Spicuzza, & EI Sawaf, 1996). These researchers also note: "Few accommodations are universally allowed, and further research on the appropriateness and technical adequacy of different types of accommodations would be beneficial" (pg. 13). Actual, empirical studies of accommodations (Abebi, 1999a,b,c) have produced modest results in improved test scores of bilingual children. This applies to achievement tests that are among the easiest to "accommodate," namely, math tests. In fact, one could argue that the study of accommodations has made its greatest contribution to children who do not need accommodations. Researchers have found that tests often include test language that is needlessly obtuse and immaterial to the construct being measured~ Cleaning up such language improves the performance of all test takers. Research on test accommodations is currently insufficient to support Standard 9.5. Testing bilingual, Hispanic children on an English test with accommodations may not b~ adequate to remove the high level of "distortion" or the construct-irrelevant error (Kopriva, 1999) implicated in the assessment of bilingual learners since the 1920s. Accommodations, in effect, may prove to be a subterfuge procedure for testing in the 16 �, i " TESTING HISPANic STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES wrong language. Data already exist showing that some of the most' popular accommod~tions--translating and interpreting:-simply do not work. Standard 9.6 requires test developers and users to explicitly address the questions related to test use and interpretation with non-native speakers. This Standard , seems to eXGluqe simultaneous bilinguals from such consideratiori It is, regrettably, an example of the 1999 Standards' lack of precision ~bout the Complexity of bilingualism. ' Standard 9.7 establishes a rule for using translated tests. The methods of translating have to be described and so does the evidence for establishing validity and reliability across language groups. This basically asserts, once and for all, the need to ' go beyond translations before inte'rpreting or using test scores . .But this 'Standard is 'problematic. Properly translating a test and then establishing its reliability and validity within different Hispanic cultural'groLips is really only a first step. There is also a ne'ed to ' establish the validity and reliability of the test within different levels of linguistic pro'ficiencies within different Hispanic cultLiral groups. ' This is another example of how the Standards are fundameritally naIve about the linguistic nature of Hispanic populations inthe.United States: A.well-:translated, well- ' normed test'will be confronted not by Spanish speakers of one level of proficiency, but by the typical bilingual Spanish- speaking population of this country: culturally diverse and bilingually diverse. ' ' , ' ' Standard 9.8 speaks to the need for concurrence between what a test measures' and what a cr~dential or an occupation demands in terms of actual performance on the ' job. Particular attention is given in this Standard to the equivalence that should exist in the linguistic demands of the test and those of tbe job. Standard .9.9 req~ires that tests that are availab'le in ~o languages provide evidence that each linguistic version is comparable to the other in, terms of reliability, validity (particularly construCt validity) and other data. Once again, however, this : Standard does not address,the existential reality of bilinguals in the United States. Tests that are available in two languages have to demonstrate equivalence across a wide spectrum of linguistic abilities. ' L .' Standard 9.10 is critical to this chapter as well as to all testing of Hispanic individuals. The measurement of linguistic proficiency sh9Uld be done across "a range of language features" (pg. 154) and in more than one testing format (such as multiple ' choice)., Language profiCiency is a crucial covariate, or control measure in much of what Chapter 9 of the new Standards propo~es as solutions to testing bilingual individuals. Here, the requirement is that language proficiency be measured in multiple ways. This is an important directive, a critically necessary albeit insufficient step in testing bilingual populations. What remains 'unaddressed is how such multiple measures bf linguistic, ' proficiency are to be used for the interpretation of test scores in both the primary and the secondary language and across the language 'proficiency profiles that exist in Hispanic ' and other bilingual populations. 17 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES Standard 9.11 is on interpreters. It touches on what is probably the most common historical accommodation in the testing of Hispanics. Unfortunately, it is the most problematic Standard in this chapter. As mentioned, there are no data to substantiate the assumption that it is possible to use an interpreter without severely and negatively affecting the standardization requisites, psychometric properties and the interpretation of test scores. The Standard seems to sanction the translation of tests by the interpreter and requires that the tester assume responsibility for the competence of the interpreter when there is no empirically validated model for training interpreters. Also •. in most real life situations, it is the school district or the clinic that is responsible for selecting. training, and assigning interpreters to testing situations. This is a Standard that urgently needs more deliberation and research. One interesting omission in the new Standards is the historical accommodation that when there are no appropriate tests available, more testing is an implied and acceptable methodology. This is a welcome change. However; this change should be broadly publicized in order to stop the practice of "more testing." THE OFFICE FOR CIVIL RIGHTS' RESOURCE GUIDE The issue of nondiscriminatory testing took on a unique significance in the federal courts in the early.1970s because of the overrepresentation of minority students in classes for pupils with disabilities. The most current and extensive analysis of nondiscriminatory assessment has been done by the U.S. Office for Civil Rights (U.S. Department of Education, draft, December 1999). Their Resource Guide, however, does not just address nondiscriminatory assessment from the perspective of special education diagnosis. It extends the application of nondiscriminatory assessment beyond special education to all "high-stakes testing".and all assessment methods (norm-referenced, criterion-referenced, and alternative testing methods). An important point highlighted by this Resource Guide is that nondiscriminatory assessment must be seen as part of the high-standards movement in American education. It must include, a priori, equity in·the . provision of opportunities to learn for a" students. lri educational contexts, tests function as measures of system accountability and as measures of current status or prediction for the student. With this in mind, the . Resource Guide underlines a critical distinction made by the courts between educational and employment testing: Inests predict that a person is going to be a poor employee, the employer can legitimately deny the person the job, but if tests suggest that a young child is probably gOing to be a poor student, a school cannot on that basis alone deny that child the opportunity to improve and develop the . academic skills necessary to success in our society. (larry P. v. Riles, 1984; cited in U.S. Department of Education, draft, December 1999, pg. ii) The OCR Resource Guide is fundamentally an exposition of the "testing and assessment principles that lie at the core of Title VI of the Civil Rights Act of 1964 (Title VI) and Title IX of the Education Amendments of 1972 (Title IX) ..... {U.S. Department of 18' �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES Education, draft, December 1999, pg. i). Two legal theories of test discrimination are presented: disparate impact and disparate treatment. " . The analysis ofdisparate impact concentrates on whether test practices and policies, regardless of neutrality of application, produce "adverse consequences" (pg. jv) with specific racial, gender or national origin groups. The negative consequences described include "granting or denial of benefits or opportunities" (pg.' iv). "Educational necessity", for example .placement or student designations, is the only exception in·this regard. This means that tests must be reliable and valid for their intended educational purpose. Further, tests may not be used under this analysis if they are "not the least discriminatory practical alternative that can serVe the educational institution's educational purpose" (pg. iv). There are three criteria that define "disparate impact." . First, a test yields disparate results based on race, national origin, or gender. Second, the test has no educational utility. Third, there are no other practical, valid, or reliable alternatives to assess the students. . For Hispanic students, ample evidence exists showing that tests do cause disparate outcomes in high-stakes decisions: high special education representation rates for LEP:students in certain categories of disabilities (United States Department of Education, 1993); low representation rates in programs for the Gifted and Talented (Callahim, Hunsaker, Adams, Moore,.& Blend, 1995); and low representation rates in higher education (President's.Advisory Commission on Educational Excellence for Hispanic Americans. 1996). For all students, the question of a test's 'educational usefulness can often be answered in the negative with respect to Gurricular, remedial, or pedagogical decisions about an individual. For example. the most test.;.driven educational document, the special education InqividualizedEd~cation Program (IEP). has not produced educational' benefits fo children with disabilities (Skrtic, 1991). A clear distinction needs to be made here that "educational usefulness" is both an individual as well as a system . consideration. It may work for the latter but not the former, particularly when a test score fora bilingual Hispanic child contains systematic error (American Educational Research Association, et aI., 1985. Chapter 13; 1999, Chapter 9). The educational usefulness of tests to Hispanic students is an issue that needs serious, empirical consideration. The possible attenuation of tests' predictive validities with Hispanic, bilingual populations augurs badly for most forms of instructional validity or educational utility . . Invalid inferences are highly probable'when tests are used on Hispanic children with varying degrees of exposure to a language other than English. The tests measure something other than what they intend to measure. Predictive validity stUdies that control for language background strongly indicate that psychometric bias is a real possibility in the testing of students from diverse linguistic backgrounds (Figueroa, 1990; Figueroa & . Garcia, 1994). There is a great need for ·Iarge, longitudinal studies on the' predictive validity of tests used in educational contexts holding linguistic backgro~nd' and proficiencies as controls. If this type of predictive bias is further substantiated, the. legal theory of disparate impact with Hispanic students would. be significantly strengthened. 19 �TESTING HISPANIC STUDENTS IN THE UNITED STATES; TeCHNICAL AND POLICY ISSUES Practical alternatives to these tests include grades, portfolios, and student work products keyed to rubrics. However, the requirement that the measures be reliable psychometrically, is somewhat problematic. The history of testing Hispanic students clearly shows that reliability indices are insensitive to linguistic or cultural differences. Also, the requirements for educationally useful alternatives to tests should focus on visual and instructional indices of validity for appropriate consequences since criterion indices of validity with Hispanic students are currently suspect (Figueroa, 1990; Figueroa & Garcia, 1994). On the matter of cut-off scores, the OCR Resource G'uide relies on the principle that "the method and rationale for setting the cut score" including the technical analyses, should be presented in a manual or in a report" (American Educational Research Association, et aI., 1985, Standard 6.9). The most prevalent use .of such scores occurs in colleges and universities with respect to admissions. Yet, institutions of higher' learning typically do not provide any technical analyses that would justify a particular cut off score on educational grounds. More often than not, universities set up cut-off scores without any empirical consideration as to'what score differentiates between those who can learn in university settings and those who cannot. Cut-off scores ignore the systematic under-education of Hispanic students. Whereas institutions of higher learning may see them as indices of merit, for most Hispanic students they are also measures of unequal opportunities to learn at the K-12 level. Cut-off scores are largely responsible for Latinos underrepresentation in institutions of higher learning. Further, as the National Council of La Raza reports, California's Proposition 209 banning affirmative action programs in colleges and universities may re-establish the prominence of using cut·off scores in the SAT and GRE exams for admission purposes. The impact of this would only exacerbate an already inequitable situation. Between 1987 and 1997, Hispanic, students' SAT scores decreased in relation to white students' (National Council of La Raza, 1998). . The analysis of disparate treatment focuses on whether testing policies or practices are done differently for individuals or groups with distinct racial, national origin, or gender characteJistics~ Examples of differential treatments would include "being tested under different conditions" (pg. iv) or "whether students with the same test scores are ... .treated differently by an educational institution" (pg. iv). The OCR Resource Guide fails to consider the possibility that, for a Hispanic stUdent from a linguistically or culturally different background, tests administered in English are tests given "under different conditions" (pg. iv) than those for a monolingual, monocultural student. Clearly, given the evidence reviewed here and 'acknowledged by the.Standards for Educational and Psychological Testing (American Educational Research Association, et aI., 1985; 1999) this is an ,issue that should be addressed by the U.S. Office for Civil Rights. The Resource Guide's section titled Equal Opportunity for Limited-English Proficient Students (pg. 9) is quite inadequate in this respect. Some of its suggested accommodations lack any empirical justification (such as "bilingual dictionaries") and may actually attenuate psychometric properties. Similarly, the suggested "Remedies" are problematic for Hispanic children and youth: test more, revise the test, substitute the test. 20 �( TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES Presently, a strong argument can be made that tests produce disparate impacts and that they do constitute a disparate treatment with regard to Hispanic students from diverse linguistic backgrounds. The viability of such arguments should be debated. It should be done on two levels: the legal/policy level and the psychometric/professional· level (with respect to consequential validity). 21 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES CHAPTER 4: BIAS In the early part of the 20th century, the discussion of bias in tests and testing focused on two areas: the impact of the "language handicap" experienced by bilingual individuals and the possible misinterpretation of test data to assert genetic differences among groups, particularly with regards to intelligence. But it was not until the 1960s that the' problem of test bias became prominent. Tests were linked with tracking and segregation policies in school districts in several court cases. Most notably in Hobsen v. Hansen (1967), the pivotal use of academic aptitude tests for tracking African American children in vocational, highschool programs was outlawed by the court. The court found that the tests were biased because they could not really measure student learning potential and because they produced the sort of segregation proscribed by Brown v. Board of Education. In Hobsen v. Hansen (1967) and subsequent litigation that involved testing practices (Diana v. California Board of Education,1970; Larry P. v. Riles, 1979). test bias became linked with the civil rights meaning of bias, discrimination and prejudice. One ofthe consequences of this linkage was a vigorous response from the testing community in the form of extensive research on the empirical documentation of bias. Studies conducted on racial/ethnic groups across the full spectrum of available tests· were fairly unanimous: test bias could not be found (Cleary, Humphreys, Kendrick, & Wesman, 1975) in the multiple indices of reliability (items, factors, alternate forms, retest) and all the various forms of validity (content, criterion, concurrent, construct). However, a careful examination and interpretation of the research data on . Hispanics since the 1920s suggests that there is evidence of bias. Table 1 presents the sources of test bias and the degree of empirical evidence available .. .TABLE 1: SOURCES OF TEST BIAS WITH HISPANIC TEST-TAKERS 1) SIGNIFICANT EXPOSURE TO A LANGUAGE OTHER-THAN ENGLISH -Extensive documentation 2) PROCESSING SPEED IN THE WEAKER LANGUAGE -EXttimsive documentation , 3) . USING TRANSLATIONS OF TESTS -Extensive documentation 4) DIMINISHED OPPORTUNITY TO LEARN . -Extensive documentation 5) USING INTERPRETERS DURING TESTING :"Emerging documentation . 6) DECISION-MAKING BASED ON TEST~ -Emerging documentation 22 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES The following is an extended explanation of each source of bias presented in Table 1. 1) SIGNIFICANT EXPOSURE TO A LANGUAGE OTHER-THAN ENGLISH Since the 1920s, Hispanic children and adults have consistently demonstrated a classic test profile on tests of mental ability and academic achievement: low [English] verbal scores and high non-verbal/math test scores. The depressed verbal scores are directly related to the degree of exposure to Spanish in the home and the community. Its not that the tests are inherently flawed, but rather that they are applied to the wrong population. Tests in English. since they are generally normed on monolingual English speaking populations, inherently tap a developmental sequence of English proficiency and English literacy. When exposure to English varies in degree across chronological ages (as with simultaneous bilinguals), or in the time of onset (as with sequential bilinguals), or both, tests register this as a subtrahend. Item analysis studies (Sandoval, 1979; Figueroa,1983) clearly show this asa 'generalized impact throughout the test items rather than as a discrete phenomenon affecting some items and not others. This explains why studies of test bias in test item structures have not found such bias (Cotter & Burke, 1981). It also explains why internal indices of test reliability and stability have also not found bias with Hispanic children and adults (Valdes & Figueroa,1994). Item difficulty levels are not changed. The total scores are loWer, but the test items perform the same way regardless of English proficiency. The most powerful impact from exposure to Spanish is manjfested in one of the most critical functions of tests: prediction. Though empirical data have suggested this from the beginnings of psychometrics, more recent studies have clearlydocllmented that the greater the degree of exposure to Spanish the lower the predictive validity of [English] tests (Gandara, Keogh, Yashioka-Maxwell, 1980; Pil~ington, Piersel, & Ponterotto. 1988; Emerling, 1990; Kaufman & Wang, 1992; Stone, 1992; Valdez & Valdez, 1983; Valencia, 1982; Valencia & Rankin, 1988; Figueroa, 1990; Figueroa & Garcia, 1995; Pennock-Roman, 1990). , ' . The clearest example of this comes from the validity study of the System of. Multicultural Pluralistic Assessment (Mercer, 1979). Mercer developed a battery of tests that purported to operationalize the federal requirement of nondiscriminatory testing :in special education diagnoses .. Shenormed the tests in 1972 on what is in all likelihood the most random and representative sample of Hispanic children (N=700) in California. A critical aspect of this study was that the children were all judged to be English proficient by the school staff and by those who individually administered the tests. However, the children came from three types of linguistic environments: homes where only Spanish was spoken, where Spanish and English were spoken, and where only English was spoken. In 1982, a predictive validity study of the tests was undertaken (Figueroa & Sassenrath, 1989) on approximately half of the original 'norming sample. It was found that on the WISC-R IQ's the predictive validity coefficients for: the Hispanic children varied in direct proportion to their exposure to Spanish. Figure 1 presents these data. 23 �TESTING HISPANIC STUDENTS IN THE UN!TEP STATES: TECHNICAL AND POLICY ISSUES 0.7 , . . . - - - : - - - - - - - - - - - - - - - ' : : - - - - : - - - - - - - - , 0.6 0.5 -+- Verbal 10 0.4 _ _ Performance 10 . ~FuIiIO 0.3 0.1 O+-----+----~r_....,.._---~-....,.._~-----_+-----~ .. 2 3 English English/Spanish Spanish . 4 5 6 7 English English/Spanish Spanish READING· MATH Figure 1: Evidence of psychometric bias in the predictive validity of 1912 IQ's relative to 1982 standardized .measures of reading and math achievement for Hispanic children's home language background. . As Figure 1 shows, the more Spanish there was in the home in 1972, the less the. IQ predicted reading and math achievement in 1982. Ironically, the nonverballQ, the historically used solution for measuring· mental abilities and aptitudes in bilinguals, proved to be the most hypersensitive to Spanish in the home: Other studies show similar results (Figu~roa & Garcia, 1994). However, a definitive predictive validity study using multiple measures of the Hispanic subjects' English proficiency as a control urgently needs to be done. Systematic differences in predictive validity are:one of the key signatures of bias in tests. . . 2) PROCESSING SPEED IN THE WEAKER LANGUAGE Differences in cognitive processing speed between mbnolinguals and bilinguals is dramatically demonstrated in the work of Dornic (1979, 1978a, 1978b, 1977) in Europe. Basically, he found that processing information in the weaker language produced consistently slower functioning. Further, the entire process ofmentation became progressively more and more vulnerable (to the point of shutting down) when the material was too complex, when the testing situation was too noisy, or when stress levels increased. The importance of these findings for testing that is done under timed, noisy or stressful conditions with Hispanic children merits further research, It should be·· noted, however, that accommodations providing more time for English learners are already routinely recommended. . 24 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES These are critical findings whose relevance may extend beyond issues of test faimes's or bias. Processing speed and automaticity are crucial requisites for certain academic skills. Currently, for example, the great emphasis being given to fluency in , reading as a major indicator of reading proficiency may well be biased or invalid when , applied to Hispanic English-language le~mers. Research has also continued to document how Hispanic and Japanese bilingual children are better at naming digits backward than monolingual children (Jensen & Inouye,' 1980; Figueroa, 1987). Though the typical explanation attributes this to "bilingualism," a more precise explanation m'ay be slower processing in English in children from Spanish-speaking homes. Slower processing may actually be favorable for naming digits backward. Just as likely, slower processing may come from translating material into Spanish and then back into English. In conditions where "slow" is not an impediment, there may be better remembering and possibly better learning (Malakoff & ,Hakuta, 1991). ' 3) USING TRANSLATIONS OF TESTS The 1985 and 1999 Standards place all manner of caveats on translating tests without any re-norming on the target populations for which the translation was developed. This limits the effectiveness of test translations, The rationale for this is quite , clear. When a test ,is normed, the item difficulty levels and the actual norms flow from the responses of the norming sample. Translating a test, no matter how well done, no matter ifthe translation is then back-translated to English, does not necessarily produce either an equivalent or a useful instrument. But are the caveats in the 1985 and 1999 Standards about translating tests devoid of any empirical proof? Can psychometricians really proceed with elaborate directives on how to translate tests (Geisinger, 1994b) without having to actually re-norm the new, translated test? In 1982, the Mexican government undertook a renorming of the WISC-R intelligence test in Mexico City (Gomez-Palacio, Padilla, '&, Roll, 1983). They began with a straight translation of the verbal test items as well as all the instructions. They also added extra items to most of the verbal subtests in order to make these more aligned to Mexican children's opportunities-to-Ieam in both their public schools and their communities. When the norming was finished, approximately 80 percent of the items . that were translated from the English remained. The critical questions in this discussion are: What happened to the test items? Did they remain in the same location as when they were in the translation and in the English version? Not only did the item sequences, or the degree of difficulty that the children experienced with each word, change; they did so in a most unusual manner. In the first half of the vocabulary subtest,the items were generally easier for Mexican children. In the second half, there were more items that were harder. Overall, Figure 1 clearly shows that equivalence attests merely through translation does not work. This refutes the practices associated with just translating a test. The new translated test, in all likelihood, 25 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES will not be equivalent to the first. The Mexico City data indicate that translating a test, no matter how well done, is a biased procedure. It produces an instrument with unknown psychometric characteristics. It precludes any useful decisions based on the scores. 4) Tests are based on a set of critical assumptions .. It is assumed that the individual being tested has had similar experiences asthe individuals who generate the test norms. It is concomitantly assumed that there is general equivalence in the . opportunities that individuals have had to learn the content in the test, the linguistic genre of the test, and the demands of tests. Intelligence and achievement tests are particularly dependent on meeting these assumptions since without them the attributions to intellectual or academic abilities cannot be reasonably made. In the case of achievement testing, however, it is possible to determine from multiple research sources when the opportunity to leam for a population is not even or not fair (Orfield & Yun, 1999; Moreno, 1999). In this case, the test scores may reflect or even assess opportunity-to-Iearn. They become measures of system accountability. Such scores, however, may not reflect the learning ability of the individual nor his/her potential for learning. In effect, any high-stakes decisions based on scores that do not meet the assumptions of some equivalence in opportunity-to-Iearn can be unfair and invalid. Such may be the case in statewide testing programs when .the tests are administered in English to English-language learners. The scores themselves become,. in unknown degrees, measures of opportunity-ta-Ieam (and English language proficiency), not of individual achievement and certainly not of future academic achievement. In 1982, the National Academy of Sciences broke tradition.with American psychology (Heller, Holzman, & Messick, 1982). It suggested that individual differences in academic achievement may not be the primary source of score differences. It recommended that before a child is tested for special education diagnosis, his or her present instructional setting be evaluated for its validity, effectiveness, and delivery. Similar considerations should be taken into account when testing Hispanic children from diverse cultural, linguistic and schooling communities in the United States. 5) The 1985 Standards for Educational and Psychological Testing was criticized ·(Valdes & Figueroa, 1994) for not addressing one of the most commonly used practices in the testing of bilingual individuals, that is, the use of interpreters. The 1999 Standards finally discuss this practice. But they also endorse it and set about describing the who, how and what for using an interpreter during testing. The unfortunate aspect of this is that there is no empirical evidence in the testing literature.that can attest to the procedural equivalence of this process to doing. testing in one language or to any scientific documentation that the process actually works reasonably well. . . ' In fact, the few doctoral studies that have recently been done investigating the use of interpreters conclude the opposite (DuFon, 1991; Sanchez-Boyce, 1999). Sanchez-Boyce's dissertation {1999} describes the actual process of using an interpreter during individualized testing sessions for special education placement of Hispanic students as chaotic, erratic, and fairly devoid of any standardization· 26 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES procedures. She documents how the conclusions reached from such testing sessions are really socially constructed and have no bearing on what the child can or cannot actually do. Research is urgently needed in this widespread practice to either refute or validate these investigations. If the latter turns out to be the case, the 1999 Standards' should be revised relative to their endorsement of the use of interpreters and it should be done before the next comprehensive draft appears around 2010. 6) The actual process of making decisions on the basis of test scores administered in either English or Spanish has never received much attention. Recently. however, Sandoval (1998) has heuristically taken findings from the research literature on decision-making theory and attempted to apply them to the testing situation where the subject is from a different cultural and linguistic background. Table 3 presents the multiple set of factors that a test-user must engage or consider when testing a Hispanic subject. TABLE 2: FACTORS TO CONSIDER IN MAKING DIAGNOSTIC DECISIONS WHERE ISSUES OF LINGUISTIC AND CULTURAL DIVERSITY ApPLY 1. REPRESENTATIVE BIAS Ignoring the prevalence of a behavior in a given population Reaching decisions on the basis of a limited sample of observations Failing to take into account the fact that some scores will be off by chance Making premature casual inferences on the basis of correlations that are not generalizable, coherent, consistent, robust, or reversible ·2. CONFIRMATORY BIAS Bias from what is expected or what is stereotypic 3. AVAILABILITY BIAS Bias that comes from vivid, recent data .4. INTELLECTUAL BIAS Bias due to intellectual limitations or cognitive overload due to high degrees of complexity (for example, interactions among cultural, lingUistic, and opportunity to-learn variables) Assuming that even two of these factors are robust in doing testing and diagnostic work with Hispanic populations, two implications arise: Can anyone with cultural and linguistic backgrounds that are different from the student being tested actually do the task? and, Is·it possible to train individuals to effectively use these parameters in making test/diagnostic decisions? As noted earlier, decision-making has never been adequately studied with multilingual and multicultural populations. The distinct possibility exists that this has been and continues to be an inadequately studied source of test bias with Hispanic populations. 27 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES The content of this chapter provides empirical evidence that tests used with Hispanic students show evidence of bias. Comprehensive, longitudinal investigations on this question should be commissioned. The impact of Hispanic culture and Spanish language proficiency levels on the predictive, consequential, and/or instructional validity . indices of tests should be determined.. 28 �.) TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES CHAPTER 5. EDUCATIONAL ACCOUNTABILITY For all Hispanic Americans, achievement tests are one of the most important sources of information affecting their lives and communities. The emphasis on higher academic standards for American schools has brought with it an unparalleled degree of concern about system and student accountability. Achievement tests supposedly fulfill this· goal better than any other indicator. The precarious aspect of this is that achievement tests are among the most difficult measurement instruments to develop and interpret particularly when they are given in group situations in order to .compare academic gains across states, school districts and schools (Cronb,ach, Linn, Brennan, & Haertel, 1997). These are fragile instruments that suffer from low reliabilities and that all ' too often assess not just what has been learned but also degrees of English-language proficiency, cultural differences, socioeconomic status in the home and community, quality ot-past and, present pedagogy, and the economic advantage or disadvantage of school districts. Historically, achievement tests have nearly always described a chronic pattern of underachievement for Hispanic students of all ages. Reynolds (1933) called attention to the 'one to two-standard deviation differences in achievement test scores between Anglo and Hispanic American populations in the Southwest. In the Coleman Report (Coleman et ai, 1966), the academic levels of Mexican American and Puerto Rican children, continued to show a one to two standard deviation deficit compared to white children. In the 1970s the U.S. Commission on Civil Rights (1971a, 1971b, 1972, 1973, 1974) documented a similar level of underachievement for Mexican American children. These reports also found that the education of Mexican American children differed significantly from that of white children: fewer questions from their teachers, less reinforcementfor their classroom responses, poorer schools, no reflection of their ethnic/cultural background in the curricula, and no Mexican'American models in the teaching, administrative, or counseling staffs. Other data in the 1980s and 1990s also continued to show evidence of comprehensive, national levels of underachievement among Hispanic children and youth (National Commission on Secondary Education for Hispanics, 1984a,b; Arias, 1986; Valencia, 1991; President's AdviSOry Commission on Educational Excellence for Hispanic Americans, 1996; National Council of La Raza, 1998; Laosa, 1998; Moreno, 1999). , . Although there were some modest gains between the NAEP achievement scores of Hispanic students across the country between 1988 and 1994, particularly in math and science, the overall picture is one of decline. The achievement gap between white and Hispanic students continues to increase. Key indicators (such as lower enrollment in preschool programs than white students, higher Hispanic enrollment below modal gr~de, underrepresentation in gifted and talented programs, increased enrollment in segregated schools, a 103 percent increase in suspension rates, a growing digital divide, an increase in the drop-out rates between white and Hispanic students) show that Hispanic students continue to have different educational experiences than their white counterparts in the public schools of the United States. 29 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES Yet i;lchievement tests have always operated under the belief that the public schools provide similar educational experiences and opportunities for all students. In 1975, this belief was explicitly noted in a report from the American P~ychological Association (Cleary, Humphreys, Kendrick, & Wesman, 1975): ' ( It is recognized that three assumptions are basic to this .report. The first assumption is acceptance of a single society; heterogeneous though it may be, rather than a divided one. The .second assumption is that few radical changes are expected in curriculum content or methodology of instruction in our educational establishmenL. ...The third assumption accepts the importance of evaluation in education: The assumption of homogeneity of "opportunity to learn" in the public schools of the United States is never really mentioned, but it is implied. With all achievement . testing, this is a necessary condition that must be met prior to any interpretation about a student's or a group's level of academic achievement. Recently, Orfield & Yun (1999) have documented a national trend towards more segregation of Hispanic students in the public schools of the United States. As has always been the case, segregated schools typically have fewer financial resources, the most inexperienced teachers, and the lowest academic achievement levels. In effect, the Hispanic student does not receive the same curricula or pedagogy or resources as do those students in affluent or middle income school districts. The issue of differential "opportunities to learn" for Hispanic children in the public schools of the United States during the last 100 years is incontrovertible.·· , According to the legal analysis on "Nondiscrimination in High-Stakes Testing" author(3d by the U.S. Office for Civil Rights, this condition of limited opportunity to learn establishes a claim of substantive due process violation related to achievement testing because "the students were not taught the material on which the tests were based" (U.S. Department of Education, draft, December 1999, pg. 2). This is a national claim and one that needs to be addressed, particularly in the current Climate of setting progressively higher and higher academic standards without a concomitant resolve to equalize opportunities to learn. NEW INITIATIVES The 1990s produced an unprecedented degree of attention to the measurement of academic achievement.in Hispanic students. For example, considerable legislative· workwas focus~d on including English language learners in all large-scale achievement testing programs (Goals 2000, Educate America Act, P.L. 103-227; the Perkins Act, P.L. 98-524; Improving America's Schools Act, .P.L. 103 328). Three initiatives, however, are particularly noteworthy because of their potential impact on national policy and discussion on this matter: the new Stanqards for Educational and Psychological Testing (American Educational Research Association, American Psychological Association, National Council on Measurement in Education, 1999), the National Research Council's report Improving Schooling for Language.,.Minority Children (August & Hakuta,1997), and Grading the Nation's Report Card, by the National Research Council. Each of these· R 30 �\ TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES addresses the challenges involved in testing the academic achievement of English language learners. In the new Standards for Educational and Psychological Testing (American Educational Research Association,et al., 1999), there is a particularly powerful caveat· noted, with respect to achievement tests: it is not known how much they become measu res of English language proficiency when they are used with individuals whose primary language is not just English. [" ..among non-native speakers of the language of the test. one may not know whether a test designed to measure primarily academic achievement becomes in whole or in part a measure of proficiency in the language of the test" (pg. 9-4)]. Across the United States, however, the test scores of English language learners are never described in these terms nor is the possible degree of test bias or error recognized .. Statewide testing programs of academic achievement should include statewide standardized measures of English language proficiency capable of measuring multiple dimensions ofthis competence. As the 1999 testing Standards suggest, a determination should be made first about which language is dominant. Then, the degree of proficiency in the dominant language should be measured along dimensions such as reading, . writing, comprehension, grammar, pronunciation, and communicative competence. There should be a clear understanding, hoWever. that even with this type of comprehensive language proficiency assessment, knowledge in certain domains may be missed by achievement tests. The new Standards therefore recommend doing academic . testing in both languages even when proficiency in English is established. In 1997, the National Research Co'uncil, through its Committee on Developing a Research Agenda on the Education of Limited-English-Proficient and Bilingual Students, ostensibly summarized the current knowledge base on. testing English-language . learners in its report on Improving Schooling for Language-Minority Children (August & Hakuta, 1997). Four contemporary concerns are addressed: the measurement and use of students' L 1 and L2 linguistic proficiencies in school districts across the United States, the use of tests with bilingual students for entrance and exit criteria from educational programs (including Title I and special education). the impact of L 1 on the validity and reliability of tests, and the measurement of academic achievement particularly in standards-driven educational contexts. The report sets forth the following research agenda for the assessment of second language learners (August & Hakuta, 1997, pgs. 113-134): . 1) Given that current tests tend to measure only discrete linguistic features, the assessment of linguistic proficiencies in L1 and L2 needs to be aligned with current research on how language acquisition occurs in children in bilingual communities. 2) Because different dimensions of English are required by different academic subjects and in different grades, it is necessary to determine how to use 31 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES multifaceted measurements of English proficiency to validly predict success in English-only classrooms. 3) Research is needed on how to measure knowledge in academic subjects. Specifically: Does, testing in English underestimate academic knowledge if English proficiency is limited? Does lack of familiarity with "test language" affect academic test scores even when the tests are given in the primary language? Is it better to test in English or in the primary language when subject matter has been given only in English? What is the !mpact on test scores from varying levels of native language proficiency, years of schooling in English, and difficulty of academic content? 4) Studies need to be done on how English learners take tests, specifically, how test demands, test format and test language (such as instructions) affect scores., These studies also need to answer one question: when can an English learner validly and reliably take a test in English and when does an English learner need test' modifications or accommodations? Further, what is the impact of such modifications on test reliability and validity? 5) Studies are needed to determine how rater or scorer error can be reduced in "open-ended or performance-based" (pg. 130) assessments when the test-taker's English proficiency can influence scoring decisions. 6) -'rhe standards and accountability reform movements in education call for content, performance and opportunity-to-Iearn benchmarks (McLaughlin & ' Shepard, 1995) throughout schools, school districts and federal programs. Research is needed on how to operationalize these for English language learners. Specifically: Can indicators of subject matter competence and English proficiency development be produ,ced for English language learners? How can the progress of English language learners (ELLs) be gauged within school district standards and on indices of academic achievement? Ifnonstandard assessments are used with ELLs, how can these be included within state and district accountability measures? What is the operational meaning of "yearly progress" for ELLs given the possibility that ELLs "may take more time to meet ... standards" (pg. 127)? Finally, given the fact that there are few data on effective pedagogical, curricular, or contextual conditions for the schooling of ELLs, how can opportunity-to-Iearn standards be operationalized for them? There is one area missing in this research agenda. In spite of the fact that there is text (pgs. 124-125) in the report on meeting the assessment needs of English-language learners referred for special education testing, the report fails to heed its own voice. 8ec:;ause there are no assessment instruments that can differentiate between linguistic/cultural differences and disabilities, .research is' needed on how to ope rationalize the "nondiscriminatory .assessment" provisions of federal special \ education laws, as these apply to Hispanic, English language learners. 32 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES, By and large, the research agenda pro'posed by the National Research Council should be endorsed and funded. Its importance is twofold. First, it points to work that is vitally needed now. Second; it highlights the elementary stage that the country is in with respect t6 measuring'Jhe academic achievement levels of English langl(age learners., , ' The National Research Council has, also evaluated the efforts of the National Assessment of Educational Progress (NAEP) in measuring the academic achievement ofall students in the United States {Pe"egrino~ Jones, & Mitchell, 1999. NAEP is one qf the more problematic test-developers and test-users with respect to the academic achievement testing of Hispanic students. Up to the 1~990s, NAEP, for example, did not even have reliable procedures for identifying the ethnic background of Hispanic students (Rivera, 1986; Rivera & Pennock-Roman, 1987; Baratz-Snowden; Pollack, & Rock, ' 1988). The manner in which the students were chosen for inclusion in NAEP testing was not random and systematically excluded English language learners. Even when NAEP attempted to address the complexities associated with testing Hispanic children, its' efforts were so fJawedthat it precluded any meani'ngful interpretation of test ~cores (e.g., 8aratz-Snowden, Rock" Pollack, & Wilder,1988). , , , In the most recent report on NAEP, the National Research Council (Pellegrino, Jones, & Mitchell, 1999) pays particular attention to the assessment of English language learners'. Asserting that NAEP and other assessment programs have made many efforts to accommodate the special needs of English language learners (Olson & Goldstein, 1997), this report highlights the efforts of.the Puerto Rico Assessment of Educational Progress, unique Spanish language translation and accommodation of NAEP , mathematics tests. After the administration of NAEP in Puerto Rico, several key findings about the complexities associated with transporting American tests across languages were made: translations often 1ail (Anderson & Olson, 1996), and i~em response theory analyses yield'noncomparable scales for English and Spanish versions ofthe test (Olson & Goldstein, 1997). In another study (Anderson, Jenkins, & Miller, 1996), similar conclusions were ,reached with respect to the translation of other NAEP tests: "the translated versions of the assessment are not parallel in measurement properties to the English version and scores are not comparable" (pg. 31). a In 1995, a field test of the NAEP mathematics' test included more English language learners than ever before. In fact, where previously the NAEP instructed schools across the nation on'which ELL students to exclude, inthis field test the instructions were on how to include .ELL students who, the school staff thought could actually take the test. Also, the following accommodations were included: more time, ' more testing.sessions, different testing sessions (group and individual), using an ' ' interpreter to elaborate on instructions, and test booklets in Spanish. There is some' evidence that accommodations do enhance participation, but the explicit of such accommodations on test scores are not clear. Recent research from UCLA suggests that the benefits may be marginal (Abedi, 1999a, b, c). Problems also remain with respect to the test technology used to identify and classify ELL students nationally and with respect to the wide variation across states and school districts in the criteria that they use for these purposes (August & Lara, 1996; Valdes & Figueroa, 1994). As the ,National Research Couj)cil' acknowledges: ' 33 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES iTo date, the dilemmas described ..... have not been resolved. Children potentially in need of native language support are still being assessed at entry level using one of several instruments that many scholars have questioned, and some years later they are tested again using another of such instruments that is in no way comparable to the first. The field is no closer to developing means for assessing whether a child can or cannot function satisfactorily in an all-English program-or participate in all-English large-scale assessments-than it was in 1964. (Pellegrino, Jones; & Mitchell, 1999, pg. 105) Table 3 presents the "research agenda" suggested by the National Research Council in order to help NAEP cope with the inclusion of English language learners as part of the Nation's Report Card. As will be. noted, 'some of these (such as using translations) are questionable given the historical and scientific experience of ~he country with such procedures. TABLE 3: THE NATIONAL RESEARCH COUNCIL'S CONSIDERATIONS FOR INCLUDING ENGLISH LANGUAGE LEARNERS IN THE NATIONAL ASSESSMENT OF EDUCATIONAL PROGRESS . What types of demands do different assessments make on English language learners and how do different types of accommodations help different types of students? How do accommodations affect the construct validity of the tests? Is it useful and cost effective to try accommodations such as translations? Do scores from accommodated administrations have the same scaling properties and can they be reported in the same fashion as for all other students in NAEP? Do English language learners have the same opportunity to learn and curricula as non-ELL students? . What are.possible, alternative assessment methods for ELLs? (Pellegrino, Jones, & Mitchell, 1999, pg. 1,10-111) Table 4 presents the major conclusions and recommendations of the National . Research Council for NAEP's testing of English language learners. 34 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES . TABLE 4: THE NATIONAL RESEARCH COUNCIL'S MAJOR CONCLUSIONS AND RECOMMENDATIONS FOR TESTING ENGLISH LANGUAGE LEARNERS ON THE NATIONAL ASSESSMENT OF EDUCATIONAL PROGRE~S (NAEP) \ CONCLUSIONS Conclusion 3A. The participation and accommodation of .... English-language learners . are necessary if NAEP results are to be representative of the nation's students. There is currently a paucity of interpretable achievement data and accompanying contextual .. data on the performance and educational needs of these populations .. . . Conclusion 38. Enhanced participation of ..... English language learners in NAEP depends on (1) the consistent application of well·defined criteria to identify these students and (2) accurate collection and reporting of information about them. RECOMMENDATIONS Recommendation 3A. NAEP should include sufficient numbers of ..... English language learners in the large-scale assessment so that the results are representative of the nation and reliable subgroup information can be reported. Recommendation 38. Criteria for identifying ...... English language learners for inclusion in· the large-scale survey need to be more clearly defined and consistently applied.· . . Recommendation 3C. For students who cannot participate in NAEP's standard large scale surveys, appropriate, alternative methods should be devised for the ongoing collection of data on their achievement, educational opportunities, and instructional experiences. Recommendation 3D. In order to accomplish the committee's recommendations, the NAEP program should investigate the following: XI. Methods for appropriately assessing, providing accommodations, and reporting on the achievements oL ... English language learners, and XII. Effects ofchanges in inclusion criteria and accommodation trends in , achievement results." (Pellegrino, Jones, & Mitchell, 1999, pg. 112-113) 35 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES Table 4 presents an exceedingly modest set of recommendations (for English language learners. and for Hispanic children. Some of these recommendations go back nearly 70 years (Sanchez,' 1934; Reynolds 1933). It is time for a comprehensive set of action plans to make large-scale, educational accountability systems, such as NAEP, relevant and useful for the educational present and future of Hi~panic children. The points made in Table 4 are good starting points. But there are other, more complex issues that need to be asked and answered. For example, assuming that it is possible to measure students' multidimensional levels of proficiency in both languages, the challenge remains as to what to do with language proficiency scores. Should they be used to, generate expectancy scores? Should they be used to adjust achievement scores to compensate for error? Should they alter cut-offs for eligibility, detention or promotion purposes? Research is needed to establish the function of such scores for interpreting the academic achievement of Hispanic, bilingual individuals with varying levels of acculturation and from the multiple ethnic and cultural backgrounds of Hispanic Americans. j 36 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES CHAPTER 6. DIAGNOSTIC TESTING FOR SPECIAL EDUCATION Linguistic minority children, for nearly a century, have tended to overpopulate classes for students with mental disabilities. No other area of education is as linked to issues of genetic differences in intelligence as special education. The fundamental question that has plagued this area of American education is why there are so many minority children with low las. Race psychologists in the 1920s and 1930s attributed this , to genetic inferiority. Apostles of the same doctrine have asserted the same in the 1960s and continue to do so even now. On the other hand you look at a lot of kids in the inner cities who have not seen a book by the time they come to kindergarten, and you give them one and they hold it upside down and the wrong way ... The language interactions that they've had at home are nil. They , have never even heard these sound systems. Are they lousy readers? A lot of them are. Are they genetically predisposed? Some of them are making that combination a tough one to treat. (Reid Lyon. quoted in Taylor, 1998, pg. 192) In the 1960s, the federal courts entered this debate. The questions posed then were: Why are there so many minority children in classrooms for the mentally retarded? and, Are the tests used to diagnose mental disabilities biased against them? Many of these court cases focused on the possible linguistic bias of 10 tests and on the denial of equal educational opportunities to Hispanic students placed in special education {Diana v. California Board of Education. 1970; Jose P. v. Ambach, 1979; Arreola v. Board of . Education, 1968 Guadalupe Organization v. Tempe School District, 1978; Ruiz v. State Board of Education, 1971; Covarrubias v. San Diego Unified School District, 1972; Lora v. Board of Education of the City of New York, 1984). For the most part, the courts ruled in the direction of using non-verbal tests of intelligence, establishing monitoring systems for determining when Hispanic children were overrepresented in classes for students with mental disabilities, and overseeing training programs for staff in order to do nondiscriminatory assessments. This last provision became part of the federal law for special education in 1975 and was repeated in the mandates for the 1997 Individuals with Disabilities Education. Act. However, nondiscriminatory assessment remains· either an enigma or serious problem for the American Psychological Association. In its latest, major document on testing individuals from diverse backgrounds (Sandoval. et ai, 1998), nondiscriminatory assessment does not exist. . The sort of testing that is done in order to determine whether a child has disabilities is unique in education. It is really an analog of what a medical doctor does when an individual has serious symptoms. Most children who are tested for special education placement go to the school psychologist with one predominant symptom, poor reading. After the administration of a few or many, many tests, the psychologist does a diagnOSis and at an Individualized Education Program meeting recommends either special or general education placement. At that time he/she also prescribes a treatment to "cure" the symptom. But the similarities between a medical doctor and a school psychologist are illUSOry. The tests that the school psychologist uses have no real power to diagnose and the educational treatments are usually ineffective (Skrtic. 1991). 37 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES Most children who are tested for special education have no clear, biological reasons that triggered the testing. Most children are sent to special education testing between the second and fifth grades. Again, the predominant reason for being sent is poor reading. There are four categories ofdisabilities (Learning Disabilities, mild Mental Retardation, Behavioral Disorders, Speech and Language Problems) that are unique in this entire enterprise because they are suspect. Many researchers have argued that the diagnostic tests used in special education are incapable of differentiating among these four disabilities (Keogh 1990; Lyon, 1996). Also, for many children, particularly from culturally and linguistically diverse backgrounds, these categories may be socially· constructed. That is to say, these students could have a disability or their "symptoms" could be due to socioeconomic, cultural, linguistic or poor opportunity-to-Iearn factors (Rueda & Forness, 1994; Trueba, 1987; Heller, Holtzman, & Messick, 1982). The tests, however,cannot "diagnose" the difference. For Hispanic children, the Handicapped Minority Research Institutes in Texas and California (Rueda, Figueroa, Mercado, & Cardoza, 1984; Rueda, Cardoza, Mercer, & Carpenter, 1984; Garcia, 1985; Ortiz, 1986; Ortiz & Maldonado-Colon, 1986; Ortiz & Polyzoi, 1986 ; Ortiz & Yates, 1987; Swedo, 1987; Wilikinson & Ortiz, 1986; Willig & Swedo, 1987) documented the unique problems Hispanic children and their families faced when confronted by the testing/diagnostic process in special education. Language . proficiency levels of the students were often not considered. Most testing was done in English. The linguistic challenges experienced by English learners were often diagnosed as a disability. Depending on the tests given, a Hispanic child could easily qualify for the Learning Disability or the Communication Handicapped categories. When retested after being in special education, the Hispanic children's las decreased. Limited English Proficient students were more likely to be recategorized with another disability. Tests developed and normed for Spanish-speaking children were just as problematic as tests normed on English speakers. If the parents were born outside the United States, there was a greater likelihood that their child would end up in special education. Finally, it was found that diagnostic tests were capricious in their "diagnoses": when the tests were given to an entire class of Hispanic children in general education, 53 percent were found eligible for the Learning Disability program. When "mentally retarded" Hispanic children were tested, 43 percent of them were "diagnosed" as Learning Disabfed. It should be noted that many researchers in the area of special education see no problem in the use of psychometric tests with Hispanic or African American children. For them, the issue of nondiscriminatory assessment simply does not apply, neither do linguistic and cultural factors describ~d in the 1985 or 1999 Testing Standards (MacMillan, Gresham, & Bocian, 1998). Special education testing with Hispanic students has very little empirical, research data to support many of its extant practices. The development ofdiagnostic tests normed on Spanish-speaking populations abroad provides one of the most widespread uses of diagnostic instruments with Hispanic children. But as some have argued: The bilingual is NOT the sum of two complete or incomplete monolinguals: rather he or she has a unique and specific linguistic configuration. The coexistence and constant interaction 38 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES of two languages in the bilingual has produced a different but complete linguistic entity. (Grosjean, 1989. pg.6) , a When bilingual individual confronts a monolingual test. developed by monolingual individuals, and standardized andnormed on a monolingual population, both the test taker 'and the test are asked to do something that they cannot. The bilingual test taker cannot perform like the monolingual. The monolingual' test cannot "measure" in the other language. (Hakuta& Garcia, 1989; Hakuta. Ferdman, & Dfaz, 1986) , ' Ironically, single-language tests deceptively measure the "monolingual" part of the bilingual (one or the other of the bilingual's two languages). irrespective of proficiency in that language, and they do so reliably, But these,tests fail insofar as they may exclude mental , content that is available to the bilingual in the other language, and mental processes, and abilities that are the product of bilingualism. ' (Valdes & Figueroa, 1994, pg.87) There is an urgent need to determine the diagnostic validity of Spanish language tests normed on monolingual populations andused,for diagnostic purposes with U.S. bilingual populations. ' ' :. , Another current practice in special education testing ofHispanic children is to test repeatedly until the right diagnostic profile appears or to conduct elaborate decision making procedures in order to get to the "real disability." Data exist, independent of issues of cultural and language differences, indicating that the more testing that is'done the less likely the "real disability" will appear (Mehan, Hertweck, & Meihls, 1986; Taylor, 1991 ). The use of interpreters' is widespread in special education testing (Langdon, 1994). As already mentioned, emerging. empirical data on this practice suggest that it should be proscribed; Another common practice in special 'education testing is the use of translated tests. This practice should also be stopped. Though most of this chapter has focused on diagnostic testing, other forms of assessment are routinely done in special education (such as curriculum-based " assessments, ,functional assessments, ecological assessments, dynamiC assessments); The Same general principles that have been discussed with respect linguistic/cultural differences, bias, validity and reliability apply. For the population of Hispanic children and their families, all of these testing practices also need to be considered in light of several questions: Does placement in special education provide any educational , benefits for the student? Does any testing really help to promote educational, achievement in special education? Many researchers (Skrtic, 1995; Figueroa' & Artiles, 1999; Mehan, Hertweck, & Meihls, 1986) would answer both in the negative. , , Finally, there is the question of testing for placement in the gifted ,and talented programs,acrosS the United States. The search for a measure of Hispanic intelligence that would give Hispanic students a fairchance at being equitably represented in such' programs has been extensive (Bernal &Reyna,1975; Chambers, Barron, & Sprecher, 1980; Zappia, 1989; Perrine, '1989; Bermudez & Rakow, 1990; Marquez, Bermudez & Rakow, 1992; Johnsen, Ryser, & Dougherty, 1993; Sawyer& Marquez, 1993; Garcia; 39 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES 1994; Maker, 1996; Maker, Nielson, & Rogers, 1994}: By and large, however, none of 'these have succeeded in establishing a national procedure for identifying gifted Hispanic , students. Hispanic pupils, accordingly, are very underrepresented in these programs. They will continue to be absent as long as developers and users of tests and eligibility criteria for gifted and talented programs fail to realize that the opportunity-to-Iearn experiences of Hispanic children in America's public schools are very different and that tests respond to these differences in the form of lower scores. Cultural factors may also complicate this form of assessment. Data suggest that the family contexts in Hispanic homes of highly academically gifted students vary significantly from Anglo, middle class homes. In Hispanic homes, family values take precedence over individualism and bilingualism is prized over monolingualism (Soto, 1988; Fernandez & Nielsen, 1986; Hine, 1993). ' For the emerging Hispanic community in the United States, there is an ' overarching question to be asked about the education of their intelligent children: Is it a good idea to isolate the "smartest" and to give only them an enriched, gifted education? or, Is it a better idea to offer this to all students, including those with disabilities? Some research on programs for the gifted'in close-knit communities and in schools suggests that the social impact of these programs can be quite negative for all children, their. parents and their communities {Margolin, 1994; Sapon-Shevin, 1994}. f 40 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES CHAPTER 7 - OTHER TYPES OF TESTING·· Two other broad types of tests are often given to Hispanic students in educational settiilgs. These are personality and vocational· tests. Both are used in counseling settings, though the former is also used Jor special education diagnoses of emotional disturbance or personality disorders. Both,. in varying degrees, are affected by the same issues extant in all other forms of assessments with Hispanic students: cultural differences and linguistic backgrounds. PERSONALITY TESTING The empirical and professional literature on personality tests, by and large,has never really attended to the issues ofcultural differences or bilingualism in the Hispanic community (MalgadYi Constantino, & Rogier, 1987; Olmedo, 1981; Bernal, & Castro, . 1994). The officiCiI "Guidelines for providers of psychological service~ to ethnic, linguistic, and culturally diverse populations" (American Psychological Association, 1993) acknowledges tha~ there are problems associated with tests that have not been validated for use with minority populations. But there are no serious limitations placed on their use. Similarly, cultural factors in mental health testing and diagnosis receive very little attention in the "Diagnostic and Statistical Manual of Mental Disorders IV" (Dana, 1995). . < As a consequence, some psychological providers recommend that tests be us~d with interpreters, with norms from other countries (such as Spain), or as a spontaneous translation (e.g., Nieves~Grafals, 1995; Dana, 1995). Yet, there is evidence that . Hispanic test-takers can give different meanings to the connotative, emotive vocabulary that is often used in such tests (Brizuela,1975; Diaz-Guerrero, 1988; Gonzalez-Reigoza, 1976). Similarly, there is research on how they express affective material in ways that are quite different in Spanish than in English (Gonzalez, 1978; Ruiz, 1975). Also, data exist on how Hispanic clients are rated differently on personality variables when they are evaluated in English versus when they are evaluated in Spanish (Grand, Marcos, Freedman, & Barroso, 1977; Westermeyer, 1987; Edgerton & Kamo, 1971). . . A contemporary approach to personality testing involves the parallel use of tests of acculturation. Research on one such test, the "Acculturation Rating Scale for Mexican Americans" (ARSMA I & II) (Cuellar, Harris, & Jasso, 1980; Cuellar, Arnold, & Maldonado, 1995) has proven to be of some importance both in terms of heuristics and irnprovements in the personality testing of Hispanics (Velasquez & Callahan, 1992). It has been demonstrated, for example, that on the most widely used test of personality, the MMPI, Hispanic Americans whose cultural orientation is "traditional" show more· "pathology." These outcomes are basically the result of differential (cultural) treatment. The ARMA studies show this (Dana, 1995) in a particularly "emic" way, though other , research similarly confirms that differential cultural impacts persist even in the Mrv'JPI-2 (Whitworth & McBlaine, 1993; Whitworth & Unterbrink, 1994). 41 �TESTING HISPANIG STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUE;; The use of acculturation scales, such as the ARMA I and 1/, is usually limited to a specific subgroup of Hispanics. Currently, acCulturation scales are more abundant for Cuban American and Mexican American populations (Marin, 1992). Given the limited . representation of the scales, their use is limited. Also the use, of such scales remains optional. No attempt is being made to psychometrically incorporate such measures into tests such as the MMPI. Admittedly, however, using measures of acculturation or sociocultural variation to "correct" score bias has not fared well in the past. The Sociocultural Scales of the System of Multicultural Pluralistic Assessment (SOMPA) {Mercer, 1979}, for example, do produce a new IQ estimate based on the degree of distance that a Hispanic child's family exhibits from middle class, white families. Bat the new IQ proved to be neither a better predictor nor a better "corrector" (Figueroa & Sassenrath, 1989; Valdes & Figueroa, 1994). . . . There is formal resistance to norminga test such as the MMPI. within Hispanic , populations. This is a reluctance that cuts across all types of tests currently in use: As Dana {1995} has suggested, the press for group-specific norms may have to come from studies showing the error or mistake rates that mono-normative tests produce in personality diagnoses with Hispanics. In many ways these data are already available in all other areas of testing. Most tests do produce negative, differential impacts with Hispanic students. The present state of testing Hispanics on personality tests relies on a series of . questionable practices: making testers "culturally competent," doing "corrections on the tests, promoting ideographic, tester in~erpretations based on measures of acculturation. There is very little, actual research on how to do diagnostiC personality work with Hispanic children and youth (Cervantes & Arroyo, 1994). As a consequence, recommendations to clinicians, even when they are very well crafted (such as Cervantes & Arroyo, 1994), remain anecdotal and of \Jnknown generalizability, utility and validity. As. is the case with most tests used in the United States, new tests specifically made appropriate for the Hispanic populations' bilingual, multicultural status are needed. Preliminary efforts in this regard {Costantino, Malgady, Rogier, & Tsui, 1988} have been fruitful and deserve more research and development. A sea change in attitude about testing distinCtly bilingual/multicultural populations is needed within the testing community in the United States. Dana {1995} notes: j , The [current] repertoire of standard' tests emerged in an era when a "melting pot" conception of acculturation was in vogue and new immigrants were expected to assume a relatively homogeneous identity after three generations in this country. This expectation did not occur uniformly even fpr descendents of European immigrants. Now that diversity instead of homogenization has become the hallmark of American society, professional acts and technologies must reflect this societal change. (Dana, 1995, pg. 314) OCCUPATIONAL INTEREST TESTS Helping a student choose a career and plan a requisite program of academic preparation are critical counseling functions. Often, this process begins with the administration of interest inventories as early as elementary school. A fundamental 42 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES assumption behind these tests is that students in society have somewhat equivalent levels of cultural capital. That is, they have a relatively realistic notion about how societal systems function and what it takes to negotiate er:ttrance into such systems. In terms of careers and jobs, cultural capital means knowing what defines a particular area of .occupational interest and what levels are possible within a chosen occupation. Cultural capital in this context also means knowing how to negotiate entrance into systems that help produce the requisite' competencies and the desired job or career. For Hispanic students, recent research (Stanton-Salazar, 1997; Stanton-Salazar & Dornbusch, 1995) suggests that, in fact, cultural capital is not easily accessed by them. Res,earch on the use and impact of occupational interest tests with Hispanic students is neither elaborate nor elegant. There are not many studies and those that do exist do not control for either bilingualism or acculturation in terms of cultural capital. . On the other hand, the related area of employment testing is uniquely optimistic about its fairness, validity and virtual universality (Ramos, 1992) for use with Hispanic adults. Some of this comes from predictive validity studies that show that there are no differences in the way tests function with Hispanic job applicants. The proplem here, however, is that these tests are very poor predictors for everybody and that the effects of bilingualism on predictive validity remain's unknown. Also, a central element discovered in the employment testing of Hi~panics is that so much of the success in the tests and in the jobs depends on educational background. . . 43 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES CHAPTER 8 - SUMMARY AND RECOMMENDATIONS . . The testing of Hispanic children has not made much progress in the 20 th century. The areas where there has been quite a bit of progress is in the empirical documentation of the impact of bilingualism on test scores and on the development of policies and caveats associated with the testing of Hispanic individuals. However, there has not been· much progress in test development and technology in any area of testing \,Vith respect to these students. On the basis of what has been reviewed, seven options appear to exist concerning the measurement of Hispanic students' abilities, achievements, personality, and occupational interests: 1) tests can be administered in English using what are basically monocultural norms, 2) testers can be given "cultural training" so that they can interpret the tests in ways that appear to be more valid, 3) accommodations in the tests and the testing situation can be provided, 4) a testing moratorium on the use,of individual test scores for any high-stakes assessment can be put in place until research sorts out the complex issues associated with testing Hispanic students, 5) tests can be used for holding systems legally and politically accountable for the educational decisions that adversely impact Hispanic students as manifested in differential, negative outcomes, 6) Hispanic-specific local norms 'can be developed in order to compare students with similar cultural, linguistic, and scholastic experiences, and 7) school systems and opportunities to learn are made equitable for Hispanic children across the United States, thereby meeting the crucial assumption of tests about experiential homogeneity. At present, only the first three are viable and in use. None of these three, however, can demonstrate that they are free of Significant degrees of bias, unfairness, or denial of substantive due process. . The fourth option has been suggested (Valdes & Figueroa, 1994) but has received virtually no support. The fifth option has not really been tried in the last decade, but it remains a plausible response to political attacks, such as California's propositions 227 and 209; that are already inflicting harm and damage on Hispanic children and that can be documented by the tests' ability to measure contextual effects. In Kern County in California, for example, the school board has decreed that Hispanic children must learn English in three months and then receive their education in English. The impact of this decision will become manifest in the tests administered in English. The sixth option may well be the most immediately relevant for both test developers and the Hispanic communities in the United States. But there is a great deal of opposition from both political and professional interests. Ethnic/linguistic norms will provide comparisons among children with generally homogeneous experiences and background in local communities. But, they arouse suspicions about a "divided" society. They may also be seen as sources of reverse discrimination. In employment testing, the courts and Congress have refused to accept group-specific norming precisely because of issues related to reverse discrimination (Sireci & Geisinger, 1998). Ironically, the intellectual community has not been so reluctant. The National Academy of Sciences recommended this as a solution to the bias that results from employment tests among job applicants with differential opportunities-to·learn (Hartigan & Wigdor, 1989). 44 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES Education, however, has always occupied a different status with the courts. This may also apply with regard to testing in educational contexts. The issue of group norms in all aspects of schooling should be studied and debated. Certainly, the 1999 Standards for Educational and Psychological Testing may already have sided with this option. There are clear mandates there that validity needs to be grounded within linguistic groups when research indicates that test scores are affected by language background (Comment for Standard 9.2). The seventh is the best option; but if history is any indicator, it .is the one most likely to take multi-generations to accomplish. It is also the option that best explains why tests are such a failure for Hispanic communities. The primary problem with tests is not the tests. It is the educational context in which they are developed, used, and studied. The historical and contemporary data have clearly documented that, in the United States, public education has not wor:ked for Hispanic children. Tests help and perpetuate much of the dysfunction that Hispanic children get in schools. The one positive conclusion that can be drawn from the review presented in this document is that the testing community is finally beginning to realize that the . problems with testing Hispanic students are far more complex than ever imagined and that they are potentially irremediable in the status quo (Heller, Holtzman, & Messick, 1982; American Educational Research Association, et aI., 1985, 1999; Pellegrino, Jones, & . Mitchell, 1999; August & Hakuta, 1997; Sandoval et aI., 1998; Heubert& Hauser, 1999). The solution to the problems engendered and embodied in tests resides in changing the educational experiences of Hispanic children. . A compelling example of what this may entail was described by Garcia and Otheguy (1995). They set out to answer four research questions in an ethnographic study of "seven private, but low-tuition, non-elite schools in Dade County, Florida." They were "run by and for Cubans." The parents of the children were predominantly from working class and middle class income levels. They were, in effect, similar to families of Hispanic children in urban school districts. The four research questions were typically those that preoccupy educational researchers about bilingual children in U.S. public schools: Should Spanish be used? How is language dominance measured and used? When do you use English? In which language is reading taught? The authors were unable to answer these research questions. The following are the reasons for this failure. When majority educators look at the education of Hispanic children in the United States, they focus on their linguistic deficits .... Discussions about the education of these children begin and end with the issue of the English language, or how they lack it, and how best to give it to them ..... However. when Hispanic parents and educators in control of the education of their own children think about the educational process, they ask different questions. They ask questions about the way to educate their children, about pedagogy, instructional strategies and teaching methods, about curriculum and materials. We asked them about language. they told us about education ..... Spanish naturally belongs in ethnic schools that are controlled, staffed and run by the Hispanic community, so there is no need to question its role in public education ..... Those of us in public education need to learn from these educatprs that substantive high expectations do matter; that bilingualism and biliteracy are obtainable .if . . 45 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES one holds both children and teachers unequivocally responsible for obtaining them; that . initial literacy in two languages is possible and doesn't have to be limited to Spanish; that advanced literacy in two languages is possible and doesn't have to be limited to English; that in US society all children acquire English naturally and that therefore English acquisition should not be the main focus of education; that parents and community do matter for education; that when they are in controL .. .the results are ultimately superior; that the context of a child's home culture is essential..; and that continuity with the intellectual and social climate of the home is of paramount importance if the school is to help children develop and foster their intellectual and social growth. (Garcia and Otheguy, 1995, pg. 99-100) The public education of Hispanic children needs to focus on education. It needs to be reformed pre-eminently in terms of local control. Until such time as when the U.S. educational system is locally and proportionally controlled by Hispanic commu·nities and until it achieves a modicum of equity in how it distributes resources. cultural capital, and the application of "high standards" across all school districts, tests and test scores will continue to show massive technical problems of bias, differential treatments and differential outcomes. They will continue to· impede the future of Hispanic communities. Tests will "work" when the public education of Hispanic children becomes democratic and effective. . 46 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES .; RECOMMENDATIONS 1. The U.S. Department of Education's Office for Civil Rights should undertake a legal analysis of test usage with Hispanic students and individuals, focusing on the dysjuncture that exists between what the Standards for Educational and Psychological Testing prescribe and what test users (individual testers, school district testing programs, state testing programs) actually do. Particular attention should be focused on the testing of bilingual individuals. 2. The U.S. Department of Education's Office for Civil Rights needs to determine whether the "disparate treatment" legal analysis under Title VI and Title IX statutes applies to the historical experience of many, if not most, Hispanic students with tests and testing. A compelling, empirical argument can be made that they are tested under different conditions: they are tested with monolingual norms when most of them have varying levels of bilingual status,'all of which have left an indelible, if not unerasable, mark on all tests that use English as the main vehicle for eliciting responses; and, their scores show evidence of attenuated predictive validity related directly to their varying levels of exposure to Spanish. 3. Excessive testing should be discouraged .. There is a widespread belief that with students forwhom current testing technology may not be appropriate, the thing to do is to test them more using many different tests. There is no evidence to support this approach. There are, however, data suggesting that excessive testing does not improve diagnostic decisions (Mehan, Hertweck, & Meihls, 1986), but, rather, that it may negatively affect children (Taylor, .1991). . 4. The U.S. Department of Education's Office for Civil Rights needs to determine whether the "disparate impact" legal analysis under Title VI and Title IX statutes applies to the comprehensive and chronic pattern of Hispanic students' 'underrepresentation in Gifted and Talented programs, their overrepresentation in programs for students with disabilities, and their miniscule presence in institutions of higher learning in the United States. There is, in effect, a clear pattern of a disparate impact from testing practices across·a wide array oftests used in multiple . educational contexts. There is also compelling evidence that there is. bias in prediction and that this differentially' constricts tests' educational pu rposes when used with most Hispanic students. . 5: Translated tests should not be used. There is very little likelihood that the new translated test will have the same technical properties as the original, and there is a substantial likelihood that the translated test will not work. The practice of translating tests and of using their scores for making decisions about individuals should stop. . 6. A clear distinction, if not separation, needs to be drawn between the issues that are significant in meeting the challenges of a disability with those involved in the education of children with two linguistic systems. Recent publications on "diversity" and "test accommodations" are linking the issues relevant to English language learners with those that are meaningful for students with disabilities. One of the great . , . 47 �TESTING HIS'PANIC STUDENTS IN THE UNITED. STATES: TECHNICAL AND POLICY ISSUES historical mistakes in American education has been the tendency to perceive bilingualism as a handicap. For example, special education is dedicated to diminishing the impact of a disability. The education of English learners should not be guided by the diminution of an asset such as bilingualism. 7. Tests that purport to have equivalent test versions in English and Spanish need to show empirical evidence that, in fact, there is equivalence. Similarly, research is , urgently needed on whether bilingual, Hispanic children in the United States can be validly and fairly compared on Spanish/English tests that relied on monolingual samples to generate monolingual norms in English and Spanish. 8. The use of interpreters should be discouraged, if not proscribed. Interpreters are· basically poor substitutes for what should be provided to Hispanic students: culturally knowledgeable, linguistically competent testers from their own communities. As currently envisioned in the 1999 Standards for Educational and Psychological Testing, interpreters can be trained and used in testing situations. New data on this practice, however, suggest that the'use of interpreters may somewhat destroy comprehensive standardization. Further, in special education, the use of interpreters may lead to invalid inferences and conclusions. The failure to recruit, train and graduate Hispanics in the testing professions cannot be ameliorated by the use of interpreters. This is a practice that may really be a malpractice. ' 9. It is recommended that the Standards in Chapter 9 for "Testing Individuals of Diverse linguistiC Backgrounds" be analyzed by experts in second-language acquisition, language proficiency testing, and bilingual assessment in order to examine the ambiguities and the assumptions of that chapter. The 1999 Standards for Educational and Psychological Testing are problematic in the areas of language proficiency testing and the use of testing accommodations with bilingual subjects. 10. The impact of Hispanic culture and Spanish language proficiency levels on the' predictive, consequential, and/or instructional validity indices of tests should be determined. There is empirical evidence that tests used with Hispanic students show , evidence of bias. Comprehensive, longitudinal investigations on this' question should be commissioned. 11. The U. S. Department of Education's Office for Civil Rights should conduct an analysis of testing practices with Hisp?\nic students throughout the states and by the National Assessment for Educational Progress to determine whether some or all of these do not meet the legal criteria of discrimination under Title VI and Title IX. 12. The research agenda on assessment proposed by the National Research Council's Committee on Developing a Research Agenda on the Education of Limited-English Proficient and Bilingual Students in its report Improving Schooling for Language Minority Children (August, & Hakuta, 1997) should be endorsed and funded. 13. The recommendations of the National Research Council on testing English language learners on NAEP (Table 4) should be adopted, funded and applied. They should 48 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES also be broadened to include Hispanic children from all the major ethnic cultural' backgrounds. The issues related to cultural factors in achievement testing (such as . acculturation, the measurement of acculturation, the use of acculturation levels) should be investigated. It is time for a comprehensive set of action plans to make large~scale, educational accountability systems, such as NAEP, relevant and useful for the educational present and future of Hispanic children. 14. There is an urgent need to determine the diagnostic validity of Spanish language tests normed on monolingual populations and used for diagnostic purposes with U.S. bilingual populations. Diagnostic tests should not be administered to Hispanic students or they should be relegated to a lower status in the decision-making process for special education, or gifted and talented education. Alternatives to the . typical battery of diagnostic tests exist, all the way from placing a child in an enriched treatment situation to "diagnosing" their work products. 15. As is the case with most tests used in the United States, new personality tests specifically made appropriate for the Hispanic population's bilingual, multicultural· status are needed. There isvery little actual research on how to do diagnostic personality work with Hispanic children and youth. 16. It is recommended that research in occupational interest tests be significantly and quickly increased. Given the widespread use of occupational interest tests with elementary and high school students, as well as their possible role in tracking students in academic programs, the lack of research on the use and impact of these measurement instruments on Hispanic children and youth is a major knowledge gap. 17. Extended analyses and debate need to be conducted on whether Hispanic students' test scores should be interpreted primarily within a school district's "normative framework." That is to say, should national or statewide comparisons that are used to . determine an individual's eligibility for promotion, graduation, or admission to higher education continue to be made given the current knowledge base on testing Hispanic students? This does not preclude the use of tests to measure the performance of school systems (schools, districts) to determine how well or how poorly they are working. Clearly, however, in those school districts where there is no equality in educational programs and opportunities for Hispanic students, the question of what Gonstitutes a fair, normative comparison needs to be answered .. 18. It should be made clear that the starting point for the reform of unfair testing of Hispanic'students is not the tests; It is the instructional context. Until there is some semblance in equity, of standards, curricula, pedagogy and resources throughout schools, school districts and states, tests will continue to reify the inequality of educational opportunities in the country. Tests will continue to blame the Hispanic student for low scores and will continue to deny him or her promotion, eligibility and opportunity. 49 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES REFERENCES Abedi, J. (1999a). NCME examining the effectiveness of accommodation on math performance' of English Language Learners. 'Paper presented at the 1999 AERA conference, Montreal, Canada. Abedi, J. (1999b). The impact of students' background characteristics on accomodation results for students with limited English proficiency. Paper presented at the 1999 AERA Conference, Montreal. Canada. ' Abedi, J. (1999c). NAEP math test accommodationsfor students with limited English proficiency. Paper presented at the 1999 AERA Conference. Montreal. Canada . .' Altus, W.O. (1945). Racial and bi-lingual group differences in predictability and in mean aptitude test scores in an Army special training center. Psychological Bulletin, 42,310-320. Altus, W.O. (1949). The Mexican American: The survival of a culture. The Journal of Social Psychology, 29, 211-220 American Educational Hesearch Association, American Psychological Association, The National Council on Measurement in Education. (1966). Standards for educational and psychological tests and manuals. Washington, DC: American Psychological Association. American Educational Research Association, American Psychological Association, The National Council on Measurement in Education. (1974). Standards for educational and psychological tests and manuals. Washington, DC: American Psychological Association. American Educational Research Association. American Psychological Association. National Council on Measurement in Education. (1985). Standards for educational and psychological testing. Washington. DC: American Psychological Association. American Educational Research Association, American Psychological Association, The National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Psychological Association. American Psychological Association (1993). Guidelines for providers of psychological services to ethnic, linguistic, and culturally diverse populations. American Psychologist, 48, 45-48. Ammons. R.B. & Aguero, A. (1950). The Full-Range Picture Vocabulary Test: VII. Results for a Spanish American school-age population. Journal of Social Psychology, 32, 3-10. Anderson, N.E., Jenkins, F.F. & Miller, K.E. (1996). NAEP inclusion criteria and testing accommodations: Findings from the NAEP 1995 field test in mathematics. Princeton. NJ: Educational Testing Service. Anderson, N.E. & Olson, J. (1996). Puerto Rico Assessment of Educational Progress: 1994 PRAEP Technical Report. Princeton, NJ: Educational Testing Service. Arias, B. (Ed.). (1986). The education of Hispanic Americans: A challenge for the future. American Journal of Education, 95. Arreola v. Santa Ana Board of Education (Orange County, California). No. 160-577. (1968). Arsenian, S. (1945). Bilingualism in the post-war world. Psychological Bulletin, 42. 65-86. August, O. & Hakuta, K. (Eds.). (1997). Improving schooling for language-minority children: A research agenda. Washington. DC: National Academy Press. 50 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES August, D. & lara J. (1996). Systemic reform and limited English proficient students. Washington, D. C.: Council of Chief State Schobl Officers. Baratz-Snowden, J., Pollack, J. & Rock, D. (1988). Quality of responses of selected items on NAEP special study student survey. Princeton, NJ: National Assessment of Educational Progress. Baratz-Snowden, J., Rock, D., Pollack, J., & Wilder. G. (1988). The educational progress of language minority children: Findings from the ·1985-86 special study. Princeton, NJ: Educational Testing Service. Bell, R (1935). Public school education of second generation Japanese in California. Stanford, CA: Stanford University Press. Bermudez, A.B. & Rakow, S.J. (1990). Analyzing teachers' perceptions of identification procedures for gifted and talented Hispanic limited English-proficient students at-risk. Educationallssues of Language Minority Students, 7,21-34. Bernal, E.M. Jr. & Reyna, J. (1975). Analysis and identification of giftedness in Mexican American children: A pilot study. In B.O. Boston (Ed), A resource manual of information on educating the gifted and talented. Reston, VA: Council for Exceptional Children. Bernal, M.E. & Castro, F.G. (1994). Are clinical psychologists prepared for service and research with ethnic minorities?: Report of a decade of progress. American Psychologist, 49, 797-805. Brigham, C.C. (1922). A study of American intelligence. Princeton, NJ: Princeton University Press. Brigham. C.C. (1930). Intelligence tests of immigrant groups. Psychological Review, 37, 158-165. Brizuela. C.S. (1975). Semantic differential responses of bilinguals in Argentina. Dissertation Abstracts International, 36(12-B), 6439. University Microfilms No. 76-13.514). Brown, G.L. (1922). Intelligence as related to nationality .. Journal of Educational Research, 5, 324-327. Brown v. Board of Education. 347 U.S. 483 (1954). . Callahan, C.M., Hunsaker. S.L., Adams, C.M., Moore. S.D. & Blend. l.C. (1995). Instruments. used in the identification of gifted and talented students. Charlottesville. VA: The National Research Center on the Gifted and Talented, Research Monograph 95130. Cebollero, P.A. (1936). Reactions of Puerto Rican children in New York City to psychological tests. An analysis of the study by Armstrong, Achilles, and Sacks of the same name. San Juan. P.R: The Puerto Rico School Review. pt. 11. Cervantes. RC. & Arroyo. W. (1994). DSM-IV: Implications for Hispanic children and adolescents. Hispanic Journal of Behavioral Sciences, 16, 8-27. Chambers, J.A., Barron, F. & Sprecher, J.W. (1980). Identifying gifted Mexican American students. Gifted Child Quarterly, 24, 123-128. Cleary. TA, Humphrey, l.G., Kendrick, SA & Wesman, A. (1975). Educational uses of tests with disadvantaged students. American Psychologist, 15, 15-40. Coleman, J.S., et al. (1966). Equality of educational opportunity. Washington. D.C.: U.S. Office of Education. 51 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES Colvin, S.S. (1921). Intelligence and its measurement: A symposium (IV). Journal of Educational Psychology, 12, 136-139. Costantino, G., Malgady, RG., Rogier, LH. & Tsui, E.C. (1988). Discriminant analysis of clinical outpatients and public school children by TEMAS: A Thematic Apperception Test for Hispanics and Blacks. Journal of Personality Assessment, 52, 670-678. Cotter, D.E & Berk, RA. (1981). Item bias in the WISC-R using Black, White, and Hispanic learning disabled children. Paper presented at the Annual Meeting of the American Educational Research Association, Los Angeles, CA. . Covarrubias v. San Diego Unified School District. No. 70-394-T, San Diego, CA. (1972). Cronbach, L.J.• Linn, RL, Brennan. Rl: & Haertel, EH. (1997). Generalizability'analysis for performance assessments of student achievement or school effectiveness. Educational Psychological Measurement, 57, 373-399. Cuellar, I., Arnold, B. & Maldonado, R (1995). Acculturation Rating Scale for Mexican Americans II: A revision of tlie original ARSMA scale. Hispanic Journal of Behavioral Sciences, 17,275-304. Cuellar, I., Harris, LC. & Jasso, R. (1980). An acculturation scale for Mexican American normal and clinical popul~tions; Hispanic Journal of Behavioral Sciences, 2, 197-217. Dana, RH. (1995). Culturally compJterit MMPI assessment of Hispanic populations. Hispanic Journal of Behavioral Sciences, 17, 305-319. Darcy, N.T. (1946). The effect of bilingualism upon the measurement of the intelligence of children of pre-school age. Journal of Educational Psychology, 38, 21-44. Darsie, M.L. (1926). The mental capacity of American-born Japanese children. Comparative Psychology Monographs, 3, 1-89. Davenport, E.L (1932). The intelligence quotients of Mexican and non-Mexican siblings. School and Society. 36, 304-306 . . Diana v California State Board of Education. No. C-70-37, United States District Court of Northern California.. (1970). Diaz-Guerrero, R (1988). Psicologia del Mexicano [Psychology of the Mexican] (4th ed.). Mexico: Editorial Trillas. Dornic, S; (1977). Information processing and bilingualism. Department of Psychology, University of Stockholm. Dornic, S. (1978a). Noise and language dominance. Department of Psychology, University of Stockholm; . . . Dornic, S. (1978b). The bilingual's performance: Language dominance, stress,and individual differences" In D. Gerver & H. Sinaiko (Eds.), Language and Interpretation and Communication. New York: Plenum Press. Dornic, S. (1979). Information processing in bilinguals: Some selected issues. Psychological Research, 40,329-348. DuFon, MA (1991). Politeness in interpreted and non-interpreted IEP (Individualized Education Program) conferences with Hispanic Americans. (Doctoral dissertation, University of Hawaii, 1991). UMI Dissertation Services, 1345950. 52 �TE$TING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES . Edgerton, RB. & Karno, M. (1971). Mexican-American illness.. Archives of General Psychiatry, 24, 286-290. bilinguali~m and the perception of mental . Emerling, F. (1990). An investigation of test bias in two nonverbal cognitive measures for two . . ethnic groups. Joumal of Psychoeducational Assessment, 3, 233-244. Feingold, G.A. (1924). Intelligen.ce of the first generation of immigrant groups. Joumal of Educational Psychology, 15, 65-82. . . Fernandez, RM. & Nielsen, F. (1986). Bilingualism and Hispanic scholastic achievement: Some baseline results. Social Science Research, 15,43-70. Figueroa, RA. (1983). Test bias and Hispanic children. Joumal of Special Education, 17,431 440. Figueroa, RA. (1987). Special education assessment of Hispanic pupils in Califomia: Looking ahead to the 1990s, Sacramento: California State Department of Education, Office of Special Education. Figueroa, RA. (1990). Assessment of linguistic minority group children. In C.R Reynolds and R.W. Kamphaus (Eds.), Handbook ,of psychological and educational assessment of children: Vol 1. Intelligence and achievement. ·New York: Guilford. . Figueroa, RA & Artiles, A. (1999). Disproportionate minority placement in special education programs: Old problem, new explanations. In A. Tashakkori & S.H. Ochoa (Eds.), Readings on Equal Education: Volume 16, Education of Hispanics in the United States: Politics, Policies, and Outcomes (pgs. 93-118). New York: AMS Press. Figueroa, RA. & Garcia. E. (1994). Issues in testing students from culturally and linguistically' diverse backgrounds. Multicultural Educatioi1:Fall; 1994, pgs. 10-19. Figueroa, RA. & Sassenrath, J .M. (1989). A longitudinal study ofthe predictive validity of the System of Multicultural Pluralistic Assessment (SOMPA). Psychology in the Schools, 26.5-19. Gandara, P., Keogh, BK. & Yoshioka-Maxwell. B. (1980). Predicting academic performance of Anglo and Mexican American Kindergarten children. Psychology in the Schools, 17, 174-177. Garcia. J.H. (1994). Nonstandardized instruments for the assessment of Mexican-American children for gifted/talented programs. In S.B. Garcia (Ed), Addressing Cultural and Linguistic Diversity in Special Education: Issues and trends (pp 46-57). Reston, VA: The Council for Exceptional Children. Garcia, O. & Otheguy. R (1995). The bilingual education of Cuban American children in Dade County's ethnic schools. In O.Garcia & C. Baker (Eds). Policy and practice in bilingual education: A reader extending the foundations (pgs. 93~102). Clevedon. England: Multilingual Matters. Garcia. S.B. (1985).. Characteristics of limited English proficient Hispanic students served in programs for the learning disabled: Implications for policy. practice and research (Part 1). Biiingual Special Education Newsletter, pp1~5. Garth, T.R., (1920). Racial differences in mental fatigue. Joumal of Applied Psychology, 4,235 244. Garth, T.R (1928). Astudy of the intelligence arid achievement of full-blooded Indians. Jouma/of Applied Psychology, 12,511-516. . 53 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES Geisinger, K.F. (1992). Fairness and selected psychometric issues in the psychological testing of Hispanics. In K.F. Geisinger (Ed.), Psychological testing of Hispanics (pgs. 17-42). Washington, DC: Am~rican Psychological Association. Geisinger, K.F. (Ed.). (1992). Psychological testing of Hispanics. Washington, DC: American Psychological Association. . Geisinger, K.F. (1994a). Psychometric issues in testing students with disabilities. Applied Measurement in education, 7, 121-140. Geisinger, K.F. (1994b). Cross-cultural normative assessment: Translation and adaptation issues influencing the normative interpretation of assessment instruments. Psychological Assessment, 6, 304 312. G6mez-Palacio, M.M., Padilla, ER & Roll, S. (1983). Escala de Inteligencia para Nivel Escolar Weschler [Wechsler Intelligence Scale for Children-Revised]. Mexico City: EI Manual Moderna, SA de C.V. Gonzalez, JR. (1978). Language factors affecting treatment of bilingual schizophrenics. Psychiatric Annuals, 8,68-70. . Gonzalez~Reigoza, F. (1976). The anxiety-arousing effect of taboo words in bilinguais. In C.D. Spielberger & R Diaz-Guerrero (Eds.), Cross-cultural anxiety (pp. 89-105). Washington, DC: Hemisphere. Gould, S. J .. (1981). The mismeasure of man. New York: Norton. Grand, S., Marcos, L.. Freedman, N. & Barroso, F. (1977). Relation of psychopathology and bilingualism to kinesthetic aspects of interview behavior in schizophrenia. Journal of Abnormal Psychology. 86(5), 492-500. Grosjean, F. (1989). Neurolinguists, beware! The bilingual is not two monolinguals in one person. Brain and Language, 36,3-15. Guadalupe Organization v. Tempe Elementary School District. 587 F.2d 1022, U.S. Dist. Court of Arizona. (1978). . Hakuta, K., Ferdman, B.M. & Diaz, RM. (1986). Bilingualism and cognitive development: Three perspectives and methodological implications. Los Angeles: University of California, Los Angeles. Center for Language Education and Research. . Hakuta, K. & Garcia, E.E. (1989). Bilingualism and education. American Psychologist. 44, 374 379. Hartigan, JA & Wigdor, AK (Eds.1989). Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery. Washington, D.C.: National Academy Press. Heller, KA, Holtzman, W.H., & Messick, S. (1982). Placing children in special education: A strategy for equity. Washington, DC: National Academy Press. Heubert, J.P. & Hauser, RM. (1999). High-stakes testing for tracking, promotion, and graduation. Washington, D.C.: National Academy Press. Hine, C.Y. (1993). The home environment of gifted Puerto Rican children: Family factors which support high achievement. Presented at the annual National Research Symposium on Limited English Proficient Student Issues, Washington, D.C. . Hobson v Hansen. (1967). 269 F. Supp. 401 (D.D.C.). 54 �. . TESTING HISPANIC STUDENTS IN THE'UNITED STATES: TECHNICAL AND POLICY ISSUES . . Holtzman, W.H., Draz-Guerrero, R, & Swartz, J.D. (1975). Personality development in two cultures: A cross-cultural longitudinal study of schQol children in Mexico and the United States. Austin: University of Texas Press. Hung-Hsia, H. (1929). The mentality of the Chinese and Japanese. Journal of Applied PsychOlogy, 13,9-31. ' Jensen, A.R. & Inouye,A. R. (1980). Levell and Level II abilities in Asian, Wh'ite, and Black children. Intelligence, 4,41-49. Johnsen, SK, Ryser, G., & Dougherty, E., (1993). The validity of product portfolios in the identification of gifted students. Gifted International: A Talent Development Journal, B( 1). 40-43. Johnson, L.W. (1938). A comparison of vocabularies of Anglo-American and Spanish-American , . high school pupils. Journalof Educational Psychology, 29, 135-144.' Jose P. v. Ambach" No.C-270 (E.D.N.Y. 1979). Kaufman, A.S. & Wang, J-J. (1992). Gender, race and education differences on the K-BIT at ages 4 to 90 y~ars. Journal of Psychoe.ducational As~essment, 10, 219-229. Keogh, B. K. (1990). Narrowing the gap between policy and practice. Exceptional Children, 57, 186-190.' Koch, H.L.,& Simmons; R. (1926). A study of the test performance of American, Mexican and Negro children. Psychology, Monographs, .35, 1-116. ' Kopriva, R.J. (1999). A conceptual framework for the valid and comparable measurement of all students. Washington, D. C.: Council otChief State School, Qfficers. .. 'Langdon, H.W. (1994). The interprfJter-translator process in the educational setting: A resource· manual. Sacramento, CA: Resources in Special Education, California Department of Education. Laosa, L. M. (1998). School segregation of children who migrate to the United States from Puerto Rico. Princeton, NJ:Educational Testing Service. ' Larry P. v. Riles. 343 F. Supp. 1306 (N,D. Cal. 1972)affl' 502 F.2d 963 (9th Cir. 1974); 495 F. Supp. 926 (N.D. Cal. 1979); appeal docketed, No. 80-4027 (9th Cir., Jan. 17, 1980). Lester,O.o. (1929). Performance tests and foreign children. Journal of Educational Psychology, 20,303-309. . . . ' ,Lora v. Board of Educa'tion of the City of New York, 587 F. Supp. 1592 (E.D.N.Y. 1984). , ' Luh, C.W. & Wy, T.M. (1931). A comparative study of the intelligence of Chinese children on the . Pintner performance and the Binet 'tests. Journal of Social Psychology, 2, 402-408. . Lyon, G.R. (1996). Learning disabilities. The Future of Children: Special Education for Students with Disabilities, 6(1}, 54-76. Los Altos, CA: Center for the Future of Children, David and Lucille Packard Foundation.; . MacMillan, D.L., Gresham, F.M. & Bocian, K.M. (1998). Discrepancy betwe~n definitions of learning disabilities and, school practices: An empirical investigation. Journa!of Learning Disabilities, 31, , 314-326, . , 55 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES Maker, C.J. (1996). Identification of gifted minority students: A national problem, needed changes, and a promising solution. The Gifted Child Quarterly, 40, 41-50. Maker, C.J., Nielson & Rogers (1994). Giftednessjdiversity and problem-solving. Teaching Exceptional Children, 27,4-19. Malakoff, M. & Hakuta, K. (1991). Translation skill and metalinguistic awareness in bilinguals. In E. Bialystock (Ed.), Language processing in bilingual children. Cambridge, MA: Cambridge University ~~. . Malgady, R. Constantino, G. & Rogier. L. (1984). Development of a thematic apperception te'st (TEMAS) for urban Hispanic children. Journafof Consulting and'Clinical Psychology, 52(6), 986-996. Manuel, H.T. (1935). Spanish and English editions of the Stanford-Binet in relation to the abilities of Mexican children (University of Texas Bulletin No. 3532). Austin: University of Texas. Margolin. L. (1994). Goodness personified: The emergence of gifted children. New York: Aldine de Gruyter. . Marin, G.(1992). Issues in the measurement of acculturation among Hispanics. In K.F. Geisinger (Ed.), Psychological testing of Hispanics (pgs. 235-252). Washington, DC: American Psychological Association. Marquez, J.A., Bermudez, A.B. & Rakow, S.J. (1992). Incorporating community perceptions in the identification of gifted and talented Hispanic students. The Journal of Educational Issues of Language Minority Students, 10, 117-127.' . McLaughlin. M. W. & Shepard, L.A. (1995). Improving education through standards-based reform. A report of the National of Education Panel on standards-based education reform. Stanford. CA: National Academy of Education. Mehan, H., Hertweck. H. & Meihls, J.L (1986). Handicapping the handicapped. Palo Alto, CA: Stanford University Press. Mercer, J.R (1979). The system of multicultural pluralistic assessment: Technical manual. New York: Psychological Corporation. Mitchell, A.J. (1937). The effect of bilingualism on the measurement of intelligence. Elementary School Journal, 38, 29-37.' . Moreno. J.F. (Ed.)(1999). The elusive quest for equality. Cambridge, MA: Harvard Educational Review. National Commission on Secondary Education for Hispanics. (1984a). "Make Something Happen n : Hispanic and urban high school reform. V.I. Washington, DC: Hispanic Policy Development Project. National Commission on Secondary Education for Hispanics. (1984b) .. "Make Something Happen:" Hispanics and urban high school reform. V. II. Washington, DC:The Hispanic Policy Development Project Inc.. National Council ofLa Raza. (1998). Latino education, status and prospects. Washington. DC: National Council of La Raza. Neiser, U., Bodoo. G., Bouchard, T.J., Boykin, A.W., Brody, N .• Ceci, S.J .• Halpern. D.F., Loehlin. J.C., Perl off, R, Sternberg, RJ. & Urbina, S. (1996). Intelligence: Knowns and unknowns [Task Force Report of the American Psychological Association]. American Psychologist. 51.77-101. 56 �TESTING HISPANIC.STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES Nieves-Grafals, S. (1995). Psychological testing as a diagnostic and therapeutic tool in the . treatment of traumatized Latin American and African Refugees. Cultural Diversity and Mental Health, 1, 19-27. . . . Olmedo, EL. (1981). Testing linguistic minorities. American Psychologist, 36. 1078-1085. Olson, J.F. & Goldstein, AA. (1997). The inclusion of students with disabilities.and limited English proficiency in large-scale assessments: A summary of a recent progress. Washington, DC: U.S. Department of Education. Ortield, G. & Yun, J.T. (1999). Resegregation in American schools. The Civil Rights Project: Harvard University. Ortiz. AA. (1986). Characteristics of limited English proficient Hispanic students served in programs for the learning disabled: Implications for policy and practice (Part II). Bilingual Special Education Newsletter, pp. 1-5. Ortiz, AA & Maldonado-Colon, E. (1986). Recognizing'learning disabilities in bilingual children: How to lessen inappropriate referral of language minority students to special education. Journal of Reading, Writing, and Learning Disabilities International, 43(1).47-56. Ortiz, AA & Polyzoi, E. (1986). Characteristics of limited English-proficient Hispanic students . served in programs for the learning disabled: Implications for policy and practice. Austin: University of Texas. (ERIC Document Reproduction Service No. ED 267 597) Ortiz, AA & Yates. J.R. (1987). Characteristics of learning disabled, mentally retarded, and speech-language handicapped Hispanic students at initial evaluation and re-evaluation. Unpublished manuscript, University of Texas at Austin. Paschal. F.C. & Sullivan. L.R. (1925). Racial influences in the mental and physical development of Mexican children. Comparative Psychology Monographs, 3,1-76. Pellegrino, J.W., Jones, L.R., & Mitchell, KJ. (Eds.) (1999). Grading the nation's report card: Evaluating NAEP and transforming the assessment of educational progress. WaShington, DC: National Academy Press. Pennock-Roman. M. (1990). Test validity and language background: A study of Hispanic American students at six universities. New York: College Entrance Examination Board. Perrine, J. (1989). The efficacy of situational identification of gifted Hispanics in East Los Angeles through nurturance that capitalizes on their culture. In C.J. Maker & S.W; Schiever (Eds.), Critical issues in gifted education: Vol. 2. Defensible programs for cultural and ethnic minorities. Austin, TX: Pro-Ed Publishers. (pp.3-18). Pilkington, C., Piersel, W., &Ponterotto, J. (1988). Home language as a predictor of first grade achievement for Anglo- and Mexican-American children. Contemporary Educational Psychology, 13(1), 1 14. Pintner, R. & Arsenian, S. (1937). The relation of bilingualism to verbal intelligence and school adjustment. Journal of Educational Research, 31, 255-263. Pintner, R.& Keller, R. (1922). Intelligence tests of foreign children. Journal of Educational Psychology, 13.214-222. Pratt, H.G. (1929). Some conclusions from a comparison of school achievement of certain racial groups. Journal of Educational Psychology. 20.661-668. �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLlcvlsSUES ... . . President's Advisory Commission on Educational Excellence for Hispanic Americans. (1996). Our nation on the fault line: Hispanic American education. Washington, DC: President's Advisory Commission on Educationar Excellence for Hispanic Americans. Ramos, RA. (1992). Testing and assessment of Hispanics for occupational and management positions: A developmental needs analysis. In K.F. Geisinger (Ed;), Psychological testing of Hispanics (pgs. 173-194). Washington, DC:. American Psychological Association. Reynolds, A. (1933). The education of Spanish-speaking children in five southwestern states (1933,Bulletin No.11). Washington, DC: U.S. Department of the Interior. Rivera, C. (1986). The national assessment of educational p'rogress: Issues and concerns for the assessment of Hispanic students. Chicago: Spencer Foundation .. Rivera, C. & Pennock-Roman; M.· (1987). Issues in race/ethnicity identification procedures in the national assessment of educational progress, 'Part I: A cpmparison of observer reports and selfidentification. Princeton, NJ: Educational Testing Service; . . Rueda, R, Cardoza, D., Mercer, J.R, & Carpenter, L(1984). An examination of special education decision-making with Hispanic first-time referrals in large urban school districts. Las Alamitos, CA: Southwest Regional Library. Rueda, R, Figueroa; R, Mercado, P. & Cardoza. D.' (1984). Performance of Hispanic educable mentally retarded learning disabled and nonclassified students on the WISC-RM, SOMPA and S-KABC. Los Alamitos, CA: Southwest Regional Laboratory. Rueda, R S. & Forness. S.R (1994). Childhood depression: Ethnic and cultural issues in special education. In RL Peterson & S.I. Jordan (Eds.). Multicultural issues in the .education of students with behavioral disorders (pgs. 40-62). Cambridge, MA: Brookline.. Ruiz. E.J. (1975). Influence of bilingualism on communication groups. International Journal of -Group Psychotherapy, 25,391-395. . . . Ruii v. State Board of Education. C.A. No. 218394 (Super; Ct. Cal. Sacramento County, 1971). Saer, D.J.- (1923). The effect of bilingualism on intelligence. British Journal of Psychology, 14,25 , '" < Sanchez, G.I. (1934). Bilingualism and Mental Measures: A word of caution. Journal of Applied Psychology, 18, 765-772. , . . Sanchez-Boyce, M. (1999).The social construction of meaning .ininterpreted events: Use of foreign language interpreters in education. Doctoral dissertation, University of California at Davis. '. Sandov"!l, J. (1979). The WISC-R and,internal evidence oftest bias withminority groups. Journal of Consulting and Clinical Psychology, 47, 9~9-927. . Sandoval, J. (1998). Critical thinking in test interpretation. In , J.Sandoval, C.L Frisby, K.F GeiSinger, J.D. Scheuneman & J.R Grenier, (Eds.). Test interpretation and diversity: Achieving equity in assessment (pgs. 31-50). Washington, D.C.: American Psychologic~1 Association. Sandoval, J., Frisby, C.L., Geisinger,KF., Scheuneman, J.D .• & Grenier, J.R. (199B). Test interpretation and diversity: Achieving equity in assessment. Washington. DC: American Psychological Association. 58 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES Sapon-Shevin, M. (1994). Playing favorites: Gifted education and the disruption of community. New York: State University Press: Sawyer, C.B. 8. Marquez, JA (1993). Discrimination against LEP students in gifted and talented classes. The Journal of Educational Issues of Language Minority Students, 12, 143-149. Sired, S.G. & Geisinger, K.F. (1998). Equity issues in employment testing. In J. Sandoval, C.L. Frisby, K.F. Geisinger, J.D. Scheuneman, & J.R Grenier (Eds.). Test interpretation and diversity: Achieving equity in assessment (pgs. 105-140). Washington, DC: American Psychological Association. Skrtic, R (1991). The special education paradox: Equity as the way to excellence. Harvard Educational Review, 61, 148-186. Skrtic, T. M. (Ed.) (1995). Disability and democracy. New York: Teachers College Press. Smith, M.E. (1942): The effect of bilingual background on college and aptitude scores and grade point ratios earned by students at the University of Hawaii. Journal of Educational Psychology,' 33, 356 364. Smith, M.E. (1949). Measurement of vocabularies of young bilingual children in both of the . languages used. Journal of Genetic Psychology, 74, 305-310. Smith. M.E. (1957). Word variety as a measure of bilingualism in preschool children. Journal of Genetic Psychology, 90, 143-150. . Soto, L.D. (1988). The home environment of higher and lower achieving Puerto Rican children.' Hispanic Journal of Behavioral Sciences, 10, 161-167, . Stanton-Salazar, R.D. (1997). A social capital framework for understanding the socialization of racial minority children and youth. Harvard Educational Review, 67, 1-34 . . Stanton-Salazar, RD. & Dornbusch, S.M. (1995). Social capital and the reproduction of inequality; Information networks among MeXican-origin high school students. Sociology of Education, 68, 116-135. Stone, B.J. (1992). Prediction of achievement by Asian-American and White children. Journal of School Psychology, 30, 91-99. Swedo. J. (1987). Effective teaching strategies for handicapped limited English-proficient . students. Bilingual Special Education Newsletter. pp.1-5. Taylor. D. (1991). Learning Denied. Portsmouth. NH: Heinemann. Taylor. D. (1998). Beginning to read and the spin doctors of science: The political campaign to change America's mind about how children learn to read. Urbana, IL: National Council of Teachers of English.' . Taylor, RL. & Richards, S.B. (1991).' Patterns of'intellectual differences of Black, Hispanic, and White children. Psychology in the Schools, 28, 5-8. Thurlow, M., Liu, K.,Erickson,R., Spicuzza, R & EI Sawaf. H. (1996). Accommodations for students with limited English proficiency: Analysis of Guidelines from states with graduation exams. Minneapolis, ~N: National Center on Educational Outcomes. Trueba, H.T. (1987). Success or failure? Learning and the language minority student. New York: Newbury. 59 �TESTING HISPANIC STUDENTS iN THE UNITED STATES: TECHNICAL AND POLICY ISSUES U.S. Commission on Civil Rights. (1971a). Ethnic isolation of Mexican Americans in the public schools of the Southwest (Report I: Mexican American Educational Study). Washington, DC: U.S. Commission on Civil Rights. . U.S. Commission on Civil Rights. (1971 b). The unfinished education: Outcomes for Minorities in the Five Southwestem States (Report II: Mexican American Educational Study). Washington. DC U.S. Commission on Civil Rights. ' U.S. Commission on Civil Rights. (1972). The excluded student: Educational practices affecting Mexican Americans in the Southwest (Report III: Mexican American Educational Study). Washington, DC: U.S. Commission on Civil Rights., U.S. Commission on Civil Rights. (1973). Teachers and students: Differences in teacher interaction with Mexican-American and Anglo'students (Report V: Mexican American Educational Study). Washington, DC: U.S. Commission on Civil Rights. U.S. Commission on Civil Rights (1974). Toward quality educationfor Mexican-Americans (Report VI: Mexican-Am~rican Educational Study). Washington, DC: U.S. ·Commission on Civil Rights. U.S. Department of Education (1993). Fifteenth annual report to Congress on the implementation of The Education of the Handicapped Act. Washington, DC: U.S.-Commission on Civil Rights. U.S. Department of Education Office of Civil Rights. (draft, December 1999). Nondiscrimination in high-stakes testing: A resource guide. Washington, DC: U.S. Department of Education. Valdes, G. & Figueroa, RA. (1994). Bilingualism and testing: A special case of bias. NorwOOd, NJ: Ablex Publishing. Valdez, RS., & Valdez, C. (1983). Detecting predictive bias: The WISC-R vs. achievement scores of Mexican-American and non-minority students. Learning Disability Quarterly, 6(4),440-447. Valencia, RR (1982). Predicting academic achievement of Mexican-American children: Preliminary analysis of the McCarthy Scales. Educational and Psychological Measurement, 42, 1269 1278. Valencia, RR (Ed.). (1991). Chicano school failure and success: Research and policy agendas for the 1990s (The Stanford Series on Education and Public Policy). Basingstoke, England: Falmer Press. Valencia, RR & Rankin, RJ. (1983). Concurrent validity and reliability of the Kaufman version of the McCarthy Scales Short Form for a sample of Mexican-American children.· Educational and Psychological Measurement, 43, 915-925. Velasquez, RJ. & Callahan, W.J. (1992). Psychological testing of Hispanic Americans in clinical settings: Overview and is~ues. In K.F. Geisinger (Ed.), Psychological testing of Hispanics (pp. 253-265). Washington, DC: American Psychological Association. Westermeyer, J .. (1987). Cultural factors in clinical assessment. Journal of Consulting and Clinical Psychology, 55,471-478. Wheeler, L. R (1932). The mental growth of dull Italian children. The Journal of Applied Psychology. 16,650-667. Whitworth, RH. & McBlaine, D.O. (1993). Coinparison of the MMPI and MMPI-2 administered to Anglo- and Hispanic-American university students. Journal of Personality Assessment, 61, 19-27. Whitworth, RH. & Unterbrink, C. (1994). Comparison of MMPI-2 clinical and content scales administered to Hispanic a~d Anglo-Americans. Hispanic Journal of Behavioral Sciences, 16,255-264. 60 �TeSTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL At;lD POLICY ISSUES Wilkinson, C. Y. & Ortiz, AA (1986). Characteristics of limited English-proficient and English proficient leaming disabled Hispanic students at initial assessment and at reevaluation. Austin: University of Texas. Handicapped Minority Research Institute on Language Proficiency. Willig, AC. & Swedo, J.J. (1987). Improving teaching strategies for exceptional Hispanic limited English proficient students: An exploratory study of task engagement and teaching strategies. Paper presented at the annual meeting of the American Educational Research Association, Washington, DC. WOOd, M.M. (1929). Mental test findings with Armenian, Turkish, Greek and Bulgarian subjects. Joumal ofAppliedPsycho/ogy, 13, 266-273. . . Woodrow, H. (1921). Psychology, 12.207-210. Intelligen<~e and its measurement: A symposium (XI). Joumal of Educational ' Yoder. D. (1928). Present status of the question of racial differences. Joumal of Educational Psychology, 19,463-470. Zappia.IA (1989). Identification of gifted Hispanic students: A multidimensional view. In C.J . . Maker & S.W. Schiever (Eds.), Critical issues in gifted education. Vol 2: Defensible programs for cultural and ethnic minorities (pp. 19-26). Austin. TX: Pro-Ed. 61 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY .ISSUES APPENDIX A The 1999 Standards for "Testing Individuals of Diverse Linguistic Backgrounds" . (Chapter 9) Standard 9.1 Testing practice should be designed to reduce threats to the reliability and validity of test score inferences that may arise from language differences. . , Comment: Some tests are inappropriate for use with individuals whose knowledge of the language of the test is questionable. Assessment methods together with careful professional judgment are required to determine when language differences are relevant. Test users can judge how best to address this standard in a particular testing situation. . . . Standa'rd 9.2 When credible research evidence reports that test scores differ in .meaning across subgroups of linguistically diverse test takers, then to the extent feasible, test developers should collect for each linguistic subgroup studied the same form of validity evidence collected for the examinee population as a whole. Comment: Linguistic subgroups may be found to differ with respect to appropriateness of test content, the internal structure of their test responses, the relation of their test scores to other variables, or the response processes employed by individual examinees. Any such' findings need to receive due consideration in the interpretation and use of scores as well as in test revisions. There may also. be legal or regulatory requirements to collect subgroup validity evidence. Not all forms of evidence can be examined separately for members of all linguistic groups. The validity argument may rely on existing research literature, for example, and such literature may not be available for some populations. For some kinds of evidence, separate linguistic subgroup analyses may not be feasible due to the limited number of cases.available. Data may sometimes be accumi.Jlated so that these analyses can be performed after the test has been in l.!se for' a period of ~ime. It is important to note that this standard calls for more than representativeness in the selection of samples used for , validation or norming studies. Rather, it calls for separate,· parallel analyses of data for members of different linguistic groups, sample sizes permitting. If a test is being used while such data are being collected, then cautionary statements are in order regarding the limitations of interpretations based on test scores. Standard 9.3 'When testing an examinee proficient in two 'or more languages for which the test is available, the examinee's relative language proficiencies should be determined. The test generally should be administered in the test taker's most proficient language,' unless proficiency in the less proficient language is part of the assessment. ' ,. Comment: Unless the purpose of the test,ing is to determine proficiency in a particular language.or the level of language proficiency required for the test is a work requirement, test users need to take into account the linguistic 62 �TESTING HiSpANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES characteristics of examinees who are bilingual or use multiple languages. This may require the sole use of one languag~ or use of multiple languages in order to minimize the introduction of construct-irrelevant components to the measurement process. For example, in educational settings, testing in both the language used in school and the native language of the examinee may be necessary in order to determine the optimal kind of instruction required by the examinee. Professional judgement needs to be used to determine the most appropriate procedures for establishing relative language proficiencies. Such procedures may range from self-identification by examinees through formal proficiency testing. Standard 9.4 Linguistic modifications recommended by test publishers, as well as the rationale for the modifications, should be described in detail in the test manual. , Comment: Linguistic modifications may be recommended for the original test in the primary language or for an adapted version in a secondary language, or both. In any case, the test manual should provide appropriate information regarding the recommended modifications, their rationales, and the appropriate use of scores' ' obtained using these linguistic modifications. Standard 9.5 When there is credible evidence of score comparability across regular and modified tests or administrations, no flag should be attached to a score. When such evidence is lacking, specific information about the nature of the modifications should be provided, if permitted by law, to assist test users properly to interpret, and act on test scores. ' Comment: The inclusion of a flag on a test score ,where a linguistic modification was provided may conflict with legal and social policy goals promoting fairness in the treatment of individuals of diverse linguistic backgrounds. if a score from a modified administration is comparable to a score from a nonmodified administration, there is no need for a flag. Similarly, if a modification is provided for which there is no reasonable basis for believing that the modification would affect score comparability, there is no need for a flag. Further, reporting practices that use asterisks or other non-specific symbols to indicate that a test's administration has been modified provide little useful information to test users. Standard 9.6 When a test is recommended for use with linguistically diverse test takers, test developers and publishers should provide the information necessary for appropriate test use and interpretation . .comment: Test developers should include in test manuals and in instructions for score interpretation explicit statements about the applicability of the test with individuals who are not native speakers of the original language of the test. However, it should be recognized that test developers and publishers seldom will find it feasible to conduct studies specific to the large number of linguistic groups found in certain countries. Standard 9.7 When a test is translated from one language to another, the methods used in establishing the adequacy of the translation should be described, and empirical and logical evidence should be provided for score reliability and the �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLlCv.lsSUES , validity' of the translated test's score inferences for the uses intended in the linguistic groups to be tested. ' , Comment: For example, if a'test is translated into Spanish for use with Mexican, 'Puerto Rican, Cuban, Centra,lArrierican and Spanish populations, score reliability , and the validity 9ftest score inferences should be established with members of , " each of these groups separately where feasible. In addition, the test translation ' . methods used need to be described in detail. Standard 9.8 In employment and credentialing testing, the proficiency level required in the language of the test should npt exceed that appropriate to the relevant occupation or profession. . , Comment: Many occupations and professions require a suitable facility in the language of the· test. In such cases,a test that isused as a part of selection, advancement, or credentialing maY,appropriately reflect that aspect of performance., However, the level of language proficiency required on the test should be no greater than the level needed to meet work requirements. Similarly, the modality in which language proficiency is assessed should be comparable t9 that on the job. For example, if thejob requires only that employees understand verbal' instructions in the language used on the job, it would be inappropriate for a selection test to require proficiency in reading and writing that particular language. Standard 9.9 When multiple versions of a test are intended to be comparable, test developers should report evidence of test comparability~ Comment: Evidence of test comparability may include but is not limited to evidence that the different language versions measure equivalent or similar constructs, and that score reliability and validity of inferences from scores from the two versions are comparable .. Standard 9.10 Inferences about test takers' general language proficiency should be based on tests that measure a range of language features, and not a single linguistic skill. Comment: For example, a multiple-choice, pencil-and,"-'paper test of vocabulary does not indicate how well a person understands the language when spoken nor how well the person speaks the languag~. However, the test score might be helpful in determining how well a person. understands some aspects of the written language. In making educational placement decisions, a more complete range of communicative abilities (e.g., V\iord knowledge, syntax) will typically need to be assessed. ' 64 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES· . , Standard 9.11 When an interpreter is used in testing, the interpreter should be fluent in both the language of the test and the examinee's native language, should have expertise in translating, and should have a basic understanding of the assessment process~ Comment: Although individuals with limited proficiency in the language of the test should ideally. be tested by professipnally trained bilingual examiners, the use of an interpreter may be necessary in some situations. If an interpreter is required, the professional examiner is responsible for insuring that the interpreter has the appropriate qualifications, experience, and preparation to assist appropriately in the administration of the test. It is necessary for the interpreter to understand the importance of following standardized procedures, how testing is conducted typically, the importance of accurately conveying to the examiner an examinee's aC,tual responses, and the role and responsibilities of the interpreter in testing. 65 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES "By the Authority vested in 'ine as President by the Constitution and the lawS of the United States of America, and in order to advance the development of human potential, to strengthen the Nation's capacity to provide high-quality education, and to increase the opportunities forHispanic Americans to participate in and benefit from Federal education programs, it is hereby ordered... " Founding language of Executive Order 12900 President Clinton, February 22, 1994 Recognizing the importance of increasing the level of educational attainment for Hispanic Americans, President Clinton established the White House Initiative on Educational Excellence for Hispanic Americans through Executive Order 12900 in SeptElmber 1994. Guiding the White House Initiative is the President's Advisory Commission on Educational Excellence for Hispanic Americans, whose responsibility is to advise the president, the secretary of education, and the nation on the most preSSing educational needs of Hispanic Americans. The White House Initiative also provides the connection between the Commission, the White House, the federal government and the Hispanic community throughout the nation. . Current White House Initiative activities include initiating policy seminars, offering anational conference series, uExce/encia en Educacion: The Role of Parents in the Education of Their Children," focused on improving the education of latino youth by better engaging latino parents, increasing understanding and awareness of Hispanic Serving Institutions (HSls), and coordinating a new round of high·level efforts across the national government to improve the education of Hispanics. These activities are driven by the president's request to assess: • • • • • Hispanic educational attainment from pre·K through graduate and professional school; Current federal efforts to promote the highest Hispanic educational attainment; State, private sector, and community involvement in education; Expanded federal education activities to complement existing efforts; and Hispanic federal employment and effective federal recruitment strategies. Accelerating the educational success of Hispanic Americans is among the most important keys to America's continued success. Please join us in ensuring educational excellence for all Americans. ' White House Initiative Staff Sarita E. Brown Executive Director Richard Toscano Special Assistant for Interagency Affairs Deborah A. Santiago Deputy Director Debbie Montoya Assistant to the Executive Director Julieta laurel Policy Analyst Danielle Gonzales Policylntem 66 �; , �Clinton Presidential Records Digital Records Marker This is not a presidential record. This is used as an administrative marker by t~e William J. Clinton PresidentIal Library Staff. This marker identifies the place of a publication. Publications have not been scanned in their entirety for the purpose of digitization. To see the full publication please search online or visit the Clinton Presidential Library's Research Room. � Dublin Core The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/. Title A name given to the resource Kendra Brooks - Subject Series Creator An entity primarily responsible for making the resource Domestic Policy Council Kendra Brooks Is Part Of A related resource in which the described resource is physically or logically included. <a href="http://clinton.presidentiallibraries.us/items/show/36031" target="_blank">Collection Finding Aid</a> <a href="https://catalog.archives.gov/id/647992" target="_blank">National Archives Catalog Description</a> Description An account of the resource The Kendra Brooks Subject Files contain correspondence, reports, articles, memos, and various printed material. Other documents include background information for education events and meetings. The files include material pertaining to charter schools, national testing, SAT preparation, school safety, school modernization/construction, affirmative action, Blue Ribbon Schools, class–size reduction, teacher quality, Limited English Proficiency (LEP), the White House Initiative on Education Excellence for Hispanic Americans, Tribal Colleges and Universities, Historically Black Colleges and Universities (HBCU’s), the Individuals with Disabilities Education Act (IDEA), and Title 1 of the Elementary and Secondary Education Act of 1965. Provenance A statement of any changes in ownership and custody of the resource since its creation that are significant for its authenticity, integrity, and interpretation. The statement may include a description of any changes successive custodians made to the resource. Clinton Presidential Records: White House Staff and Office Files Publisher An entity responsible for making the resource available William J. Clinton Presidential Library & Museum Extent The size or duration of the resource. 157 folders in 16 boxes Text A resource consisting primarily of words for reading. Examples include books, letters, dissertations, poems, newspapers, articles, archives of mailing lists. Note that facsimiles or images of texts are still of the genre Text. Original Format The type of object, such as painting, sculpture, paper, photo, and additional data Paper Dublin Core The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/. Title A name given to the resource [Education - Hispanic] [2] Creator An entity primarily responsible for making the resource Domestic Policy Council Kendra Brooks Subject Files Is Part Of A related resource in which the described resource is physically or logically included. Box 4 <a href="http://clintonlibrary.gov/assets/Documents/Finding-Aids/Systematic/KendraBrookssubjectfile.pdf" target="_blank">Collection Finding Aid</a> <a href="https://catalog.archives.gov/id/647992" target="_blank">National Archives Catalog Description</a> Provenance A statement of any changes in ownership and custody of the resource since its creation that are significant for its authenticity, integrity, and interpretation. The statement may include a description of any changes successive custodians made to the resource. Clinton Presidential Records: White House Staff and Office Files Format The file format, physical medium, or dimensions of the resource Adobe Acrobat Document Publisher An entity responsible for making the resource available Clinton Presidential Library & Museum Medium The material or physical carrier of the resource. Reproduction-Reference Date Created Date of creation of the resource. 1/17/2012 Source A related resource from which the described resource is derived 647992-education-hispanic-2b.pdf 647992

<src>https://clinton.presidentiallibraries.us/files/original/c7b1612a7344820abb17b828213a9417.pdf</src>

<text>TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES P·resident's Advisory Commission on Educational Excellence for Hispanic'Americans APRIL 2000 DRAFT DRAFT DRAFT. NOT FOR CITATION Prepared by': DRAFT DRAFT DRAFT DO NOT DUPLICATE' . Richard A. Figueroa, University of California at Davis Sonia Hernandez, California Department of Education �PRESIDENT'S ADVISORY COMMISSION ON EDUCATIONAL EXCELLENCE FOR HISPANIC AMERICANS Guillermo Linares (Chair)* New York, New York Erlinda Paiz Archuleta· Denver, Colorado Cecilia Preciado Burciaga Seaside, California George Castro* San Jose, California Darlene Chavira Chavez Tuscon, Arizona David J. Cortiella Boston, Massachusetts Miriam Cruz Washington, D.C. Juliet Villareal Garcia Brownsville, Texas Jose Gonzalez San Juan, Puerto Rico Waldemar Rojas· Dallas, Texas lsaura Santiago-Santiago New York, New York John Phillip Santos New York, New York Samuel Vigil· Las Vegas, New Mexico Diana Wasserman Ft. Lauderdale, Florida Ruben Zacarias· Los Angeles, California (*Members, Commission Assessment Committee) White House Initiative on Educational Excellence for Hispanic Americans Staff Sarita E. Brown Sonia Hernandez· Sacramento, California Cipriano Munoz· San Antonio, Texas Harry P. Pachon Claremont, California Executive Director 'Deborah A. Santiago Deputy Director Richard Toscano Special Assistant for Interagency Affairs Julie Laurel Eduardo J. Padron Miami, Florida Janice Petrovich New York, New York Policy Analyst Debbie Montoya to the Executive Director Assistant Danielle Gonazales Policy Intern Gloria Rodriguez San Antonio, Texas �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES ACKNOWLEDGEMENTS A very special thanks is extended to Dianna Gutierrez. We are indebted for the tremendous amount of research that she directed in the preparation of this report. Commendations are extended to her assistant, Marysol Flores. We must also acknowledge the help of many colleagues throughout the United States. They were all generous, prompt and supportive in their efforts. The Professional Staff of WestEd: Stanley Rabinowitz, Rose Marie Fontana, Sri Ananda, Robert Linquanti Guadalupe Valdes, Stanford University . Janette Klinger, University of Miami Jose Cintron, California State University Sacraniento Lila Jacobs, California State University Sacramento Linda Murai, Sacramento County Office of Education Pedro Pedraza, Center for Puerto Rican Studies, Hunter College Diane August, August & Associates Jon Sandoval, University of California at Davis Eugene Garcia, University of California at Berkeley Alfredo Artiles, University of California at Los Angeles Maria Trejo and David Dolson, California Department of Education " 2 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES Forevvard . There is no more promising reform in public education today than the standards-based movement. It is'not only the most widely accepted school change'process, it also offers the greatest probability for leveling the playing field for all children, by clearly stating expectations for instruction, assessing the progress of each child toward achieving the standards, and holding schools accountable for student learning. Where these three core elements of a standards-based system--clear expectations,assessment and accountability--are in place, students experience success as never before. This is ,?specially true for the growing Hispanic student population in the United States, which has traditionally had limited access to rigorous mainstream instruction. But in the cLirrent rush to implement world-class standards supported by systems of accountability in the nation's public schools, state education leaders have compromised the educational future ofHispanic students by making high-stakes decisions based on inaccurate and inadequate testing information. Hundreds of thousands of Hispanic students, many lacking functional fluency in English, are assessed with a myriad of tests entirely in English and, oftentimes, only in English. ,The resulting data is used to determine high-stakes decisions, such as for student promotion or retention,'or high'. school'graduation--but rarely for the purposes of true accountability. When it comes to holding schools accountable for the academic achievement of our ~tudEmts,states allow Hispanic youngsters to. become invisible inside the very system charged with educating ~em. ' State policies often require that Hispanic students be assessed in English with tests they, may not even understand or with alternative but less rigorous tests in Spanish whether or not they are receiving instruction inthat'language. While' neither approach produces accurate information about student leaming, the resulting data is often used to hold students accountable for their own success, rather than the educators or the public school systems. Who should be responsible for vvhat Hispanic students learn in school? The answer is simple: students, educators, and parents all must share the responsibility. '. ' But what kinds of assessments should be ,used to provide accurate information about what students have been taught? Regrettably, the answer to this question is not as simple. It ,is explored in this document. . With few exceptions, students bear the weight of academic success or failure on the basis of one or two test scores. Where exemptions from testing exist~ Hispanics disappear 'from the accountability reports, triggering both positive and negative consequences for the responsible adults in the system. Thus more than two million . Hispanic students in the United States are underrepresented or absent from the rolls of stud,ents who are counted via ~ssessment and who, therefore, count. It is our belief that Hispanic studer)ts, whether they are English dominant or English Language Learners, should be tested with appropriate test instruments in order to be included at all times in the states' accountability systems. If th,is does not occur, 3 �· TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES Hispanic children will not benefit from the powerful arid promising standards movement. As the United States enters the new millennium, deliberate action by policymakers at every level must be taken to include the country's fastest growing and soon-to-be largest minority, within the bounds of systems accountability using accurate information for decisionmaking. The purpose of this report is twofold: (1) to bring attention to the growing crisis of the "invisible" Hispanic students in public education to the nation's leaders and (2) to provide guidance to the nation and the states on taking the necessary steps to rectify the conditions that allow Hispanic students to be wrongly measured and unaccounted for in their own schools. It is our intent to help education leaders in this country choose wisely for the sake of the children. Commission Assessment Commitlee--President's Advisory Commission on Educational Excellence For Hispanic Americans Washington, D.C. September 15, 1999 4 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICYISSUES CHAPTER 1: INTRODUCTION All forms of human mental measurement are fragile and problematic (Gould, 1981). At their best, for example, psychometric tests account for a modest 2,5-35 percent of the variance of what they predict (Neiser, Bodoo, Bouchard, Boykin, Brody, Ceci, . He'lpern, Loehlin, Perloff, Sternberg, & Urbina, 1996; Cleary, Humphreys, Kendrick, & Wesman, 1975). This isa technical ceiling that test-makers have not succeeded in breaking for nearly a century. A fundamental assumption of all testing is that the normative framework (psychometric, criterion, or rubrics-based) on which the test scores are based assumes'a high degree of experiential homogeneity, culturailliriguistic similaritya'nd equity in learning opportunities among test takers (Colvin, 192·1; Woodrow, 1921; Heller, Holtzman, & Messick, 1982). UndeHhese conditions, a test score becomes , a measure that belongs pre-eminently to the individu.al and his or her talents, achievements, traits and predispositions. In a real sense, tests work best in a perfect democracy of monolingual and monocultural citizens. . Hispanic Americans present a mC)'ssiv.echallenge to the assumptions of tests. The vast majority has varying levels of exposure:to and proficiency in Spanish, though many· also come from other linguistic backgrounds (e.g., Portuguese, Catalan, Basque). Their cultural ancestries include Mexico, Latin America, Puerto Rico, Cuba, the Caribbean, . Spain and Portugal. Their cultural experiences in the United States are multigenerational and reflect a broad range of acculturation levels, soCioeconomic differences, and political power. So vast is their heterogeneity, that the assumptions of tests about homogeneity may well be untenable. Yet, Hispanic stUdents and Hispanic citizens are tested every day and are compared to middle class America in the unique reification of democracy and assimilation that tests impose. But the history of testing Hispanics in the United States has never been typified by equanimity. . There are few issues in Americanpsychol~gy or education that are as complex or as misunderstood as the testing of Hispanic students. Two fundamental questions have , challenged and continue to perplex test-makers, test-givers and test~users: Does Spanish in the home or as the primary language affect test scores? and, Do any aspects of Hispanic culture in the. United States attenuate or .change test outcomes? In the 1930s, the great Mexican American psychologist, George Sanchez, ' addressed both issues. The reiative responsibility of the school and of the child in the achievement of desirable goals must be examined. Is the fact that a child makes an inferior score on an intelligence test prima 'facie evidence that he is dull? Or is it the function of the test to reflect the inferior or different training and development with which the child was furnished by'his home, his language, the culture of his people, and by his school? When the child fails in promotion is it his failure or has the school failed to use the proper whetstone in bringing out the true temper and quality of his steel? The school has the. responsibility ot'supplying those experiences to the child which will make the experiences sampled by standard measures as common to him as they were to those on whom the norms of the measures were based. When the school has met the language, cultural. disciplinary, and informational lacks of the child and the. child has, reached a saturation point of his capacity, in the assimilation of fundamental experiences 5 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES and activities - then failure on his part to respond to tests of such experiences and activities may be considered his failure. As long as the tests do not at least sample in equal degree a state of saturation that is equal for the "norm children" and the particular bilingual child it cannot be assumed that the test is a valid one for the child. (Sanchez, 1934, pgs. 770-771) Presently, the impact of cultural differences on test scores remains understudied. Most of what is known about cultural effects comes from the use of U.S.-made tests on foreign populations. Anthropologists were early consumers who believed that the scientific nature of tests made them appropriate for universal use. Very little is known about cross-cultural differences in testing, in fact, precisely because monocultural tests, when translated into the local language, yielded predominantly lower scores and Anglocentric interpretations. There are, however, some notable exceptions. Holtzman, Diaz~Guerrero, & Schwarts (1975) conducted a longitudinal study comparing approximately 400 middle class Mexican children with 400 middle class white children from northern Texas. One of the unique aspects of this investigation was that a comprehensive attempt was made to make all the sociological, psychological and educational tests and scoring protocols appropriate for Mexican students and their families. The result was a compelling description of cultural differences as well as of the production of knowledge about how psychometric tests need to undergo a radical overhaul for crosscultural use and how cultural bias can subtly affect scores .. Regrettably, this type of investigation has never been replicated with Hispanic children and their families living in the United States. By and large, the study of cultural differences in testing has always operated from a "black box" design. Culture ha$ resided in the "Puerto Rican", "Mexican," or "Cuban American" samples used (Valdes & Figueroa, 1994). The only cultural effect on U.S.~ normed, English~language tests has been lower scores. Interestingly, these types of "black box" studies have seldom found evidence of lower test reliabilities or validities because of cultural differences (Cleary, Humphreys, Kendrick, & Wesman, 1975; Geisinger, 1992; Sandoval,Frisby, Gei~inger, Scheuneman, & Grenier, 1998). The same has not been true for the one cultural variable that has left its mark on virtually every investigation using tests with Hispanic populations. Linguistic exposure to Spanish has affected every type of psychometric test and test score given in the United States (Valdes & Figueroa, 1994). It is the one variable for which there is evidence of psychometric bias (Figueroa & Garcia, 1995). It is the one variable that finally has drawn the attention of the scientific community as a complex disrupter of established testing policies and practices (Pellegrino, Jones, & Mitchell; 1999). 6 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES CHAPTER 2: THE HISTORICAL CONTEXT In 1922, Charles Brigham published The Study of American Intelligence, an analysis of government data from testing conducted on World War I recruits. A subset of the large sample included 11,300 foreign-born recruits who were tested with the Army Alpha (a verbal test of intelligence) and the Army Beta. (a nonverbal test of intelligence). Approximately 32 percent of them had been in the United States 0-5 years,38 percent 6-10 years, 17 percent 11-15 years, 7 percent 16-20 years, and 6 percent over 20 years. Because of its high verbal content, the Army Alpha was a good measure of English language proficiency. In spite of Brigham's tortured defense of the integrity of this sample of foreign-born recruits. there is a strong indication that after 10 years the sample reflected a different type of immigrants-those who did not return to their native countries and who in all likelihood had adapted culturally and linguistically. Taking this into account, the increase in Army Alpha scores for the first two, five-year groups was a meager .11 of a point. This is one of the first major empirical findings that showed proficiency in a language besides English systematically produces lower verbal scores- . possibly for avery, very long time. The 1920s and 1930s produced a great amount of research on ethnic minorities in the United States. Much of it came under the title of "Race Psychology" and reflected a naive use of test scores to support genetic arguments about lower intellectual potential in non-Nordic groups. A considerable amount of this published work included Hispanic test subjects whose linguistic backgrounds can generally be described as "bilingual." That is, they came from homes where Spanish was spoken and with varying and unknown degrees of proficiency in Spanish. Most.of these studies were conducted on Mexican American children. Several conclusions can be extracted from this early research on bilingual test takers. First, the test results of bilingual individuals compared to those of monolinguals, for all age groups, consistently produced a profil~ of lower (English) test scores regardless of the test being used. This was most pronounced in tests of verbal intelligence, although a similar profile appeared in tests of academic achievement (Brown, 1922; Cebollero, 1936; Johnson, 1938; Koch & Simmons, 1926; Manuel, 1935; Pratt, 1929). There, English-dependent skills such as vocabulary, comprehension, sentence completion skills, analogies,essay composition, etceteras were markedly low in bilingual test-takers ill comparison to their arithmetic and memory skills. The effect of differences in exposure to English appeared to be unerasable (Saer, 1923), or as the Brigham study showed, virtually unerasable. This phenomenon became widely known as the "language handicap" of all immigrant test·takers. In many research publications, this provided a rationale for denigrating or eradicating bilingualism and instruction in the primary language. Second, the psychometric properties of tests showed a curious profile. Bilingualism had no effect on the internal consistency and stability of tests, particularly indices of reliability (Figueroa, 1990). But on the critical external indices of validity, particularly predictive validity, bilingualism appeared to attenuate the power of tests 7 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES (Altus, 1945; Davenport; 1932; Feingold, 1924; Garth, 1928; Paschal & Sullivan, 1925; Pintner & Keller, 1922; Wheeler, 1932; Wood, 1929; Yoder, 1928). Third, some anomalous data appeared. Bilingual individuals from middle or upper-middle class homes occasionally either outperformed monolingual, English . speakers or did as well in test scores (Darcy, 1946; Feingold, 1924; Manuel, 1935; Pintner & Arsenian 1937). The "bilingual handicap" in effect was cured by advantaged or enriched environments and backgrounds. Clearly, however, in the early part of the 20th century, foreign-born individuals with such cultural capital were relatively rare. Also, individuals with two under-developed languages did worse on tests than individuals with a single, educationally developed foreign language (Altus, 1949; Arsenian, 1945,; Smith, 1949, 1957). Another finding that to this day remains present and unexplained is the ability of bilingual individuals to do better than English speakers on recalling digits backward (a staple of IQ tests since their inception) (Darsie, 1926; Hung-Hsia. 1929; Jensen & Inouye, 1980 Luh & WY,1931; Manuel, 1935). Finally, on school grades the "bilingual handicap" did not materialize to the same degree or persistence as on tests (Bell, 1935; Smith, 1942). Fourth, the psychometric, scientific community began the unfortunate procedural tradition of dealing with cultural groups as monolithic entities (e.g., Garth, 1920). The "Mexican" sample operationalized "Mexican culture". Socioeconomic and other intervening variables were often ignored. English language proficiency in test subjects remained as an uncontrolled source of error. Background factors such as educational backgrounds or the segregated nature of public schooling for bilingual students were overlooked. The methodological flaws in the design of studies with bilingual persons, in effect, were substantial and virtually precluded reasonable inferences or attributions. Many of these design flaws continue (e.g., MacMillan, Gresham, & Bocian,' 1998; Sandoval, 1979).· . For test users, however, the "language handicap" produced several innovations or, in the current lexicon, a series of accommodations. The testing community came to believe that nonverbal tests of mental ability were free of linguistic factors and were culturally neutral (Brigham, 1922). To this day, they are seen as culture fair measures of intelligence, mental aptitudes or personality. Tests were often simply translated without conducting norming studies (Lester, 1929; Mitchell, 1937; Paschal & Sullivan, 1925). These translations were used for research purposes and for conducting actual assessments. Eth'nic norms were occasionally produced for some bilingual groups (Ammons & Aguero, 1950; Luh & Wy, 1931). Many caveats and precautions on the use of tests with bilingu·al subjects were voiced. For example, Charles Brigham, the father of the modern SAT (Scholastic Aptitude Test) and the principal investigator in "The Study of American Intelligence," also concluded that: For purposes of comparing individuals or groups, it is apparent that tests in the vernacular [English] must be used only with individuals having equal opportunity to acquire the vernacular of the test. This requirement precludes the use of such tests in making comparative studies of individuals brought up in homes in which the vernacular of the test is not used, or in which two vernaculars are used. The last condition is frequently violated 8 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLlcvlSSUES here in studies of children born in this country whose parents spe,ak another tongue. It is important as the effects of bilingualism are not entirely known. (Brigham. 1930, pg. 165) Some 70 years later, "the effects oJ bilingualism [still] are not entirely known." What has changed is that there are more caveats about testing bilinguals. 9 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES CHAPTER 3: TESTING STANDARDS AND OFFICE OF CIVIL RIGHTS (OCR) GUIDELINES . The most important and potentially powerful set of regulations and policies on the development and use of tests in the United States is the, Stancfards for Education{!J1 and Psychological Testing (American Equcational Research Association, American Psychological Association, National Council on Measurement in Education, 1999). They are the, "standard of the industry" and constitute somewhat of an ultimate arbiter on all matters related to test development and usage. The importance of the Standards for addressing the problems associated with testing Hispanic stUdents cannot be overstated. ' " , What is sing!J1arly unique is that the current Standards have outdistanced current test technology and testing practices with "Individuals of Differing Linguistic Backgrounds." The gulf between what the Standards promulgate and what test developers and test 'users actually do is very large. Given the directives propOsed by the Office forCivil Rights in their "Nondiscrimination in High-Stakes Testing: A Resource Guide",(U.S. Department of Education. draft, December 1999), this gulf may well constitute a denial of substantive, due process with Hispanic students, and citizens. The following is a historical look at how the current Standards evolved with respect to testing "bilingual" individua'is. A review of the OCR Guidelines then follows. ' THE STANDARDS FOR EDUCATIONAL AND PSYCHOLOGICAL TESTING The first set ofthese Standarqs appeared in 1966, even though both the , American Educational Research Association and the American Psychological Association had address~d the issues attendant to achievement and, psychological/diagnostictesting ,in two prior, separate documents respectively in the mid-1950s. The 1966.Standards for Educational and Psychological Tests and Manuals (American Psychological Association, American Educational Research Association, National Council on Measurement in Education, '1966) had one overriding goal: ''the essential principl~ underlying this document is that a test manual should carry information sufficient to enable any qualified user to make sound judgments regarding , the usefulness and interpretation of the test" (pg. 2). Behind this "essential principle" was the recognition "that tests are used in arriving at decisions which may have great influence on the, ultimate welfare of the persons tested, on educational pOints of view and practices, and on development and utilization of human resources" (pg. 1). Interestingly, in the entire 36 pages of text, there are only occasional references to general demographic variables that should be addressed by test manuals. There are only two instances when these references vaguely touch on linguistic and cultural diversity. In both instances they are not prescribed as ESSENTIAL: 10 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES CS.S. If the validity of the test is likely to be different for subsamples that can be identified when the test is given, the manual should report the results for each subsample separately or should report that no differences , were found. VERYDESIREABLE ' (pg. 20), 0.2.21. [concerning the psychometric index of reliability] Demographic information, such as distributions of the subjects with respect to age, sex, socioeconomic level, intellectual level, employment status or history, and minority group membership should be given in the test manual. DESIREABLE (pg.28) In 1974, the new edition of the Standards for Educationaland Psychological Tests {American Psychological Association, American Educational Research Association, National Council on Measurement in Education, 1974) paid more attention to the issues of cultural and linguistic diversity. There was recognition that the validity of a test could be attenuated for certain groups and under certain conditions. This acknowledgement was due, in great part; to the impactof federal court cases alleging diagnostic bias in tests. In many school districts, tests produced inflated incidence rates of mental disabilities among Latino and African American stu<:1ent populations. Typically, these inflated incidence rates appeared with greater frequency than in the past. 81.3. The manual should call 'attention to marked in-nuences on test scores known to be associated with region, socioeconomic status, race, creed, color, national origin, or sex. Essential [Comment Social or cultural factors known to affect performance on the test differentially, administrator errors that are frequently repeated, examiner examinee differences, and other factors that may result in spurious or unfair test scores should, for example, be clearly and prominently identified in the manual.] (pg. 14) Of even greater importance, two warnings appeared about the possible impact of tests and testing practices on English-language learners. Standard G2 directed test users to know the research literature on tests and testing particularly with respect to the problems associated with testing individuals with "limited or restricted cultural exposure." Standard G2 suggested that theoverrepresentation of African American and Spanish speaking children "with limited cultural exposure" was caused by test users' lack of knowledge about the limitations of tests when cultural differences existed~ Standards J5, J.5.3, and J.5.3.1. went even further. They recommended that when there were great cultural differences between the test taker and the test's norming sample, the tester should not test ("Essential"). They also set forth an accommodation that has become exceedingly popular: when there are no appropriatetests for a given person or popUlation, the tester should use "a broad-based approach to assessment using as many methods as are available to him. Very Desirable" (pg. 71). What this was interpreted to mean by many was to do more assessments with more tests. The Comment for this Standard elaborated on this .. 11 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES [Comment: The standard is to do the best one can. This perhaps includes the use of a test, even though no appropriate normative data are available, simply as a means of finding out how the individual approa'ches the task of the test. It might include references, extensive interviews, or perhaps some ad hoc situational tasks. Efforts to help solve educational or psychological problems should not be abandoned simply because of the absence of an appropriate standardized instrument.f(pg. 71) When a test or tests are not appropriate, giving more tests simply to see how an individual handles the testing situation is questionable. Data exist shovying that in some high-stakes testing situations, giving more tests helps neither the tester (Mehan, ' Hertweck, & Meihls, 1986) nor the child (Taylor, 1991). Whena test is not appropriate. it should not be given if it may hurt an individual or lead to serious negative consequences for the individual. The use of an inappropriate test can only be justified if it has little or no consequences for the individual and if it helps assess system effects on similar individuals. For Hispanic students, the use of inappropriate tests is a national problem with a long history of abuse. As the Comment cited above underscores, there are, " alternative ways to help solve psychological and educational problems. With Hispanic children, these must be linguistically and culturally appropriate. In 1985, the Third Edition of the Standards was published (American Educational Research Association, American Psychological Association, National Council on Measurement in Education, 1985). The challenges posed by the multiple aspects of . diversity (culture, language, disability, gender, socioeconomic status, etceteras) appear throughout the entire document. But the most Significant evolution of the Standards is Chapter 13, "Testing Linguistic Minorities." In many ways, Chapter 13 was revolutionary. , The text that introduced the seven Standards for this chapter began with the most profound acknowledgment. "For a non-native English speaker or for a speaker of some, dialects of English, every test given in English becomes, in part, a language or literacy test" (pg. 73) of English. Basically, this means that for bilinguals who have been exposed to another language, every test, except a test of English language proficiency, contains an unknown, systematic degree of error. Such tests, in effect, are biased because they may not be measuring accurately whatever is being measured. Accordingly, the Standards called for "special attention" to these issues on the part of test development, test use, and test interpretation. It was also recognized that bilingual individuals vary , extensively in their functional, academic and literate use of each language separately or simultaneously. Also, cognitive processing in the weaker language is more fragile and can be slower. Language background, in effect, is an important consideration in all aspects of testing and test validity. With respect to using tests t,hat are in the primary language of bilingual individuals, the Standards made several, key pronouncements. Translating a test does not guarantee that the test items will have the same degree of difficulty in the other language,' The latter must be empirically established. For example, a straight translation of a second-grade test of reading ability will not necessarily yield a second-grade . reading test in the other language. Tests for determining English language profici~ncy 12 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES are vitally important for making educational placement decisions. However, these tests must assess multiple dimensions of linguistic ability. Chapter 13 also made a distinction between "natural" uses of language and more formal, cognitively demanding uses. Because of these "special difficulties" attendant on the use of tests with persons who have not had adequate exposure to the language of the test, it was suggested that more testing and observations be done with them. As previously noted, more testing was and is a questionable policy. Chapter 13 also acknowledged the possible influence of culturally mediated ways of responding to test questions. Elaborated speech may not be congruent with culturally specific ways of speaking to adults. When these factors are ignored the validity of . interpret~tions and recommendations maybe questionable and harmful. Chapter 13 of the 1985 Standards was vitally important for the educational and psychological testing of English-language learners in the United States. There were, however, several problems. First, it was not known how well the testing industry and professions would abide by them. Throughout the 1980s and 1990s, the manuals of most tests showed that Chapter 13 of Standards was routinely ignored. Also, these Standards ignored several, historical practices and problems. They did not address one of the most widely used techniques for testing English-language learners--the use of interpreters who either translate the test into the primary language on the spot or who help administer a test that has already been translated. They were silent on the apparent inability of tests and test users to differentiafe among cultural factors, language ' proficiency levels, and mental/emotional disabilities. They endorsed a historical solution for what to do when there are no tests ("test more") in spite of the fact that there was no evidence that this worked. In fact, there was evidence to the contrary. In August of 1999, a new edition of the Standards was approved. Chapter 9 is titled, "Testing Individuals of Differing Linguistic Backgrounds". Just as in the 1985 Standards, the narrative introducing this chapter cautions that, with 'indiViduals from diverse linguistic backgrounds, tests that are in English become tests of English ability to a degree that is generally more pronounced than with monolingual English speakers. With individuals of varying levels of bilingualism, tests may fail to measure what they intend to measure. Accordingly, norms developed for monolingual English-speaking populations should either not be used or should be interpreted with the understanding that English language proficiency is a contaminating factor. Precautions regarding processing speed factors are also raised. The chapter suggests accommodations should be undertaken with English-language learners. It also notes that cultural factors can affect test scores, so attention should be paid to these factors. The problem with this part of the narrative in Chapter 9 is that, in spite of acknowledging the complexities associated with testing bilingual students, the precautions are tenuous and weak. Chapter 9 repeats many historical caveats about translating tests without conducting norming stUdies. Back translations are specifically mentioned as being inadequate by themselves. A similar point was made in the 1985 Standards. Yet, . publications describing and endorsing this process continue to appear {(3eis,inger, 13 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES 1994a,b). A translated test is an inappropriate test. The practice should be proscribed nationally. . The 1999 Standards break new ground along several dimensions. They note that ,several issues raised in Chapter 9 apply to persons with disabilities that affect communication such as deafness and visual impairments. This connection with disability appears to be part of a general trend in addressing the issues related to linguistic diversity. It also appears in a new, important text commissioned by the American Psychological Association (Sandoval, Frisby, Geisinger, Scheuneman, & Grenier, 1998). But, historically. bilingualism all too often has been equated with a handic,ap. Linking the test accommodations appropriate for bilingual learners with those appropriate for students with disabilities is exceedingly problematic for bilingual children. Accommodations may seem similar, but their use and outcomes may be different for bilingual children (Thurlow, Liu, Erickson, Spicuzza, & EI Sawaf, 1996). This is a matter for serious consideration. Being bilingual is not a handicap. but an asset. . The new 1999 Standards discuss several types of test accommodations that may have to be done with English-language 'E!arners: using only sections of the test that match the linguistic proficiency of the test-taker, changing the test and response formats, administering the test in a different context, and allowing more time for taking the test. Most of these modifications are currently understudy. It is difficult to see, however, how theSe will overcome the well-documented, historical impact of bilingualism on tests. It is difficult to see how testing a student in the wrong language, or testing for content that ' has not been taught, or testing for cultural material that is not in a Hispanic child's i repertOire will be made fair by the changes suggested in these accommodations. The issue of "equivalence" receives a great deal of attention in the new Standards. This refers to several aspects of test-development, use and interpretation, for example: the degree of confidence that a test-user can exercise in determining whether a test score means the same for someone who is unlike.the norming population, the equivalence across translated and renormed versions of the sam~ test, and equivalence in psychometric characteristics. As versions of the same test appear in both English and Spanish, this will become a major topic for reSearch and examination. However, some ' have asserted that the conceptual basis for testing bilingual childr.en in the United States using monolingual norms in both Spanish and English may be flawed (Grosjean, 1989; Valdes & Figueroa, 1994). It is argued that a bilingual child cannot validly be co'mpared against norms for children whose linguistic experience and development is with only one language. Bilingual children need norms derived from bilingual norming samples, controlling for differential levels of linguistic proficiencies. This issue urgently needs empirical studies and calls for an Immediate analysis. A new directive in these Standards calls for taking into account a determination of both language dominance and language proficiency. Consideration should be given to . the possibility that bilinguals may have "domain-specific" competencies in one or both languages. For example, a bilingual person may have competency .in speaking Spanish but not in reading Spanish. It is recommended that an individual's degree and type of . bilingualism be understood in order to use test results properly. This directive has 14 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES particular relevance for state wide testing .programs. Clearly, if achievement measures are interpreted without the degree of linguistic information suggested by the· new Standards, test results may not be correctly analyzed or understood. Extensive attention is given to the process of administering tests and to the possible impact of test-giver variables (culture, bilingualism, gender, time limits, and the use of interpreters). Pi crucial principle is recommended for testing English-language learners: give them enoughtime to finish the test and to show what they know and what they can do. This principle should also be applied in state wide testing programs across the United States. . ' Or)e of the most surprising parts of these new Standards' is the attention given to the use of interpreters. Not only are there multiple precautions, there is also a veritable map for how to train and. use interpreters. The unfortunate .part of this section of the Standards is that there is no empirical evidence that even remotely validates any of the procedures for using interprete~s. In fact, several, new dissertations are reporting findings to the contrary (e.g., Sanchez-Boyce, 1999). ' , There are actually11 Standardsin this new chapter, ''Testing Individuals of Differing Linguistic Backgrounds." Most are similar to the ones promulgated in the 1985 edition. Because of their vital importance for the testing of Hispanic children, they are each reviewed and critiqued here: In Appendix A, they are reproduced in their entirety. There are several meta-issues in these new Standards that are very important. Test users are charged with the responsibility of determining when a test may be inappropriate with linguistic minorities because they do not know "the language of the test"'(Conimentfor Standard '9.1). Similarly, test developers are held responsible, plausibly under "legal or regulatory requirements," for collecting evidence of test validity when the~e is research indicating differential meaning for test scores for a linguistic group.' In a' break with historical practices, the use of "representative" norming samples for these validity studies is proscribed. Separate norming studies specific to a linguistic group are called for (Comment for Standard 9.2). Test users are also required to use professional judgement in order to determine language proficiencies prior to testing. Then they are to test in either the ma'st proficient language or using both languages in order to assure construct validity (Comment for Standard 9.3). This is a highly ambitious directive that rests in some exceedingly tenuous assumptions: c;tS it applies to Hispanic students, it is assumed tl1at there are equivalent language proficiency tests in Spanish and English, that such equivalenttests can measure the complexity of linguistic proficiency in both languages, that suchtests . ' would have universal application among Hispanic Americans in the United States, that variation in linguistic proficiencies can be used to interpret an individual's test score (how' does one interpret a score ina language that is 70 percent proficient and another score in a language that is 55 percent proficient?). and that bilingualism is the sum of two languages (in which case language proficiency testing makes some sense) rather than a linguistic unit (in which case linguistic proficiency testing may be of limited use). 15 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES Standard 9.4 seems to tacitly accept the validity of "linguistic modifications" of tests that are in English. These may involve changing the test to the test taker's primary language (translating?) or altering the test given in English. There are no justifications for this Standard (9.4) and the historical, empirical literature reviewed in this document argues against modifications such as translations. The Standards (Comment on 9.4) place the responsibility for justifying these modifications on test developers. The current research on test modification~ for English learners is not sufficient to warrant the existence of this Standard. . Standard 9.5 address~s the issue of "flagging" a test score when "linguistic modifications" were provided during the testing. The Standard basically suggests that such flagging may be unfair, and not useful if the score from the modified administration is "comparable" to the score on the "nonmodified" administration and if there is "no reasonable basis" for thinking that such modification affects "score comparability." This is a very problematic Standard because of its lack of specificity. Are tests to be administered to linguistic minority individuals with and without modifications to see if there is comparability? What is a "reasonable basis" for determining comparability among scores? More than anything, the problem with this Standard comes from the unknowns related to "linguistic modifications," or what in the literature is known as "test accommodations" for English language learners. The current, available literature on such accommodations (Abebi, 1999a,b,c; Thurlow, Liu, Erickson, Spicuzza, & EI Sawaf, 1996) makes several points. The use of accommodations varies greatly across and within states that provide·such test adaptations. The question of who gets accommodations also varies greatly within and across these states. The most popular accommodations in statewide testing programs are: allowing for extra time, using of a bilingual dictionary, being tested in a separate room, receiving oral translations of directions, offering multiple testing sessions, answering questions, providing written and Qral translations, having words defined, and allowing for students to mark the test booklets (Thurlow, Liu, Erickson, Spicuzza, & EI Sawaf, 1996). These researchers also note: "Few accommodations are universally allowed, and further research on the appropriateness and technical adequacy of different types of accommodations would be beneficial" (pg. 13). Actual, empirical studies of accommodations (Abebi, 1999a,b,c) have produced modest results in improved test scores of bilingual children. This applies to achievement tests that are among the easiest to "accommodate," namely, math tests. In fact, one could argue that the study of accommodations has made its greatest contribution to children who do not need accommodations. Researchers have found that tests often include test language that is needlessly obtuse and immaterial to the construct being measured~ Cleaning up such language improves the performance of all test takers. Research on test accommodations is currently insufficient to support Standard 9.5. Testing bilingual, Hispanic children on an English test with accommodations may not b~ adequate to remove the high level of "distortion" or the construct-irrelevant error (Kopriva, 1999) implicated in the assessment of bilingual learners since the 1920s. Accommodations, in effect, may prove to be a subterfuge procedure for testing in the 16 �, i " TESTING HISPANic STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES wrong language. Data already exist showing that some of the most' popular accommod~tions--translating and interpreting:-simply do not work. Standard 9.6 requires test developers and users to explicitly address the questions related to test use and interpretation with non-native speakers. This Standard , seems to eXGluqe simultaneous bilinguals from such consideratiori It is, regrettably, an example of the 1999 Standards' lack of precision ~bout the Complexity of bilingualism. ' Standard 9.7 establishes a rule for using translated tests. The methods of translating have to be described and so does the evidence for establishing validity and reliability across language groups. This basically asserts, once and for all, the need to ' go beyond translations before inte'rpreting or using test scores . .But this 'Standard is 'problematic. Properly translating a test and then establishing its reliability and validity within different Hispanic cultural'groLips is really only a first step. There is also a ne'ed to ' establish the validity and reliability of the test within different levels of linguistic pro'ficiencies within different Hispanic cultLiral groups. ' This is another example of how the Standards are fundameritally naIve about the linguistic nature of Hispanic populations inthe.United States: A.well-:translated, well- ' normed test'will be confronted not by Spanish speakers of one level of proficiency, but by the typical bilingual Spanish- speaking population of this country: culturally diverse and bilingually diverse. ' ' , ' ' Standard 9.8 speaks to the need for concurrence between what a test measures' and what a cr~dential or an occupation demands in terms of actual performance on the ' job. Particular attention is given in this Standard to the equivalence that should exist in the linguistic demands of the test and those of tbe job. Standard .9.9 req~ires that tests that are availab'le in ~o languages provide evidence that each linguistic version is comparable to the other in, terms of reliability, validity (particularly construCt validity) and other data. Once again, however, this : Standard does not address,the existential reality of bilinguals in the United States. Tests that are available in two languages have to demonstrate equivalence across a wide spectrum of linguistic abilities. ' L .' Standard 9.10 is critical to this chapter as well as to all testing of Hispanic individuals. The measurement of linguistic proficiency sh9Uld be done across "a range of language features" (pg. 154) and in more than one testing format (such as multiple ' choice)., Language profiCiency is a crucial covariate, or control measure in much of what Chapter 9 of the new Standards propo~es as solutions to testing bilingual individuals. Here, the requirement is that language proficiency be measured in multiple ways. This is an important directive, a critically necessary albeit insufficient step in testing bilingual populations. What remains 'unaddressed is how such multiple measures bf linguistic, ' proficiency are to be used for the interpretation of test scores in both the primary and the secondary language and across the language 'proficiency profiles that exist in Hispanic ' and other bilingual populations. 17 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES Standard 9.11 is on interpreters. It touches on what is probably the most common historical accommodation in the testing of Hispanics. Unfortunately, it is the most problematic Standard in this chapter. As mentioned, there are no data to substantiate the assumption that it is possible to use an interpreter without severely and negatively affecting the standardization requisites, psychometric properties and the interpretation of test scores. The Standard seems to sanction the translation of tests by the interpreter and requires that the tester assume responsibility for the competence of the interpreter when there is no empirically validated model for training interpreters. Also •. in most real life situations, it is the school district or the clinic that is responsible for selecting. training, and assigning interpreters to testing situations. This is a Standard that urgently needs more deliberation and research. One interesting omission in the new Standards is the historical accommodation that when there are no appropriate tests available, more testing is an implied and acceptable methodology. This is a welcome change. However; this change should be broadly publicized in order to stop the practice of "more testing." THE OFFICE FOR CIVIL RIGHTS' RESOURCE GUIDE The issue of nondiscriminatory testing took on a unique significance in the federal courts in the early.1970s because of the overrepresentation of minority students in classes for pupils with disabilities. The most current and extensive analysis of nondiscriminatory assessment has been done by the U.S. Office for Civil Rights (U.S. Department of Education, draft, December 1999). Their Resource Guide, however, does not just address nondiscriminatory assessment from the perspective of special education diagnosis. It extends the application of nondiscriminatory assessment beyond special education to all "high-stakes testing".and all assessment methods (norm-referenced, criterion-referenced, and alternative testing methods). An important point highlighted by this Resource Guide is that nondiscriminatory assessment must be seen as part of the high-standards movement in American education. It must include, a priori, equity in·the . provision of opportunities to learn for a" students. lri educational contexts, tests function as measures of system accountability and as measures of current status or prediction for the student. With this in mind, the . Resource Guide underlines a critical distinction made by the courts between educational and employment testing: Inests predict that a person is going to be a poor employee, the employer can legitimately deny the person the job, but if tests suggest that a young child is probably gOing to be a poor student, a school cannot on that basis alone deny that child the opportunity to improve and develop the . academic skills necessary to success in our society. (larry P. v. Riles, 1984; cited in U.S. Department of Education, draft, December 1999, pg. ii) The OCR Resource Guide is fundamentally an exposition of the "testing and assessment principles that lie at the core of Title VI of the Civil Rights Act of 1964 (Title VI) and Title IX of the Education Amendments of 1972 (Title IX) ..... {U.S. Department of 18' �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES Education, draft, December 1999, pg. i). Two legal theories of test discrimination are presented: disparate impact and disparate treatment. " . The analysis ofdisparate impact concentrates on whether test practices and policies, regardless of neutrality of application, produce "adverse consequences" (pg. jv) with specific racial, gender or national origin groups. The negative consequences described include "granting or denial of benefits or opportunities" (pg.' iv). "Educational necessity", for example .placement or student designations, is the only exception in·this regard. This means that tests must be reliable and valid for their intended educational purpose. Further, tests may not be used under this analysis if they are "not the least discriminatory practical alternative that can serVe the educational institution's educational purpose" (pg. iv). There are three criteria that define "disparate impact." . First, a test yields disparate results based on race, national origin, or gender. Second, the test has no educational utility. Third, there are no other practical, valid, or reliable alternatives to assess the students. . For Hispanic students, ample evidence exists showing that tests do cause disparate outcomes in high-stakes decisions: high special education representation rates for LEP:students in certain categories of disabilities (United States Department of Education, 1993); low representation rates in programs for the Gifted and Talented (Callahim, Hunsaker, Adams, Moore,.& Blend, 1995); and low representation rates in higher education (President's.Advisory Commission on Educational Excellence for Hispanic Americans. 1996). For all students, the question of a test's 'educational usefulness can often be answered in the negative with respect to Gurricular, remedial, or pedagogical decisions about an individual. For example. the most test.;.driven educational document, the special education InqividualizedEd~cation Program (IEP). has not produced educational' benefits fo children with disabilities (Skrtic, 1991). A clear distinction needs to be made here that "educational usefulness" is both an individual as well as a system . consideration. It may work for the latter but not the former, particularly when a test score fora bilingual Hispanic child contains systematic error (American Educational Research Association, et aI., 1985. Chapter 13; 1999, Chapter 9). The educational usefulness of tests to Hispanic students is an issue that needs serious, empirical consideration. The possible attenuation of tests' predictive validities with Hispanic, bilingual populations augurs badly for most forms of instructional validity or educational utility . . Invalid inferences are highly probable'when tests are used on Hispanic children with varying degrees of exposure to a language other than English. The tests measure something other than what they intend to measure. Predictive validity stUdies that control for language background strongly indicate that psychometric bias is a real possibility in the testing of students from diverse linguistic backgrounds (Figueroa, 1990; Figueroa & . Garcia, 1994). There is a great need for ·Iarge, longitudinal studies on the' predictive validity of tests used in educational contexts holding linguistic backgro~nd' and proficiencies as controls. If this type of predictive bias is further substantiated, the. legal theory of disparate impact with Hispanic students would. be significantly strengthened. 19 �TESTING HISPANIC STUDENTS IN THE UNITED STATES; TeCHNICAL AND POLICY ISSUES Practical alternatives to these tests include grades, portfolios, and student work products keyed to rubrics. However, the requirement that the measures be reliable psychometrically, is somewhat problematic. The history of testing Hispanic students clearly shows that reliability indices are insensitive to linguistic or cultural differences. Also, the requirements for educationally useful alternatives to tests should focus on visual and instructional indices of validity for appropriate consequences since criterion indices of validity with Hispanic students are currently suspect (Figueroa, 1990; Figueroa & Garcia, 1994). On the matter of cut-off scores, the OCR Resource G'uide relies on the principle that "the method and rationale for setting the cut score" including the technical analyses, should be presented in a manual or in a report" (American Educational Research Association, et aI., 1985, Standard 6.9). The most prevalent use .of such scores occurs in colleges and universities with respect to admissions. Yet, institutions of higher' learning typically do not provide any technical analyses that would justify a particular cut off score on educational grounds. More often than not, universities set up cut-off scores without any empirical consideration as to'what score differentiates between those who can learn in university settings and those who cannot. Cut-off scores ignore the systematic under-education of Hispanic students. Whereas institutions of higher learning may see them as indices of merit, for most Hispanic students they are also measures of unequal opportunities to learn at the K-12 level. Cut-off scores are largely responsible for Latinos underrepresentation in institutions of higher learning. Further, as the National Council of La Raza reports, California's Proposition 209 banning affirmative action programs in colleges and universities may re-establish the prominence of using cut·off scores in the SAT and GRE exams for admission purposes. The impact of this would only exacerbate an already inequitable situation. Between 1987 and 1997, Hispanic, students' SAT scores decreased in relation to white students' (National Council of La Raza, 1998). . The analysis of disparate treatment focuses on whether testing policies or practices are done differently for individuals or groups with distinct racial, national origin, or gender characteJistics~ Examples of differential treatments would include "being tested under different conditions" (pg. iv) or "whether students with the same test scores are ... .treated differently by an educational institution" (pg. iv). The OCR Resource Guide fails to consider the possibility that, for a Hispanic stUdent from a linguistically or culturally different background, tests administered in English are tests given "under different conditions" (pg. iv) than those for a monolingual, monocultural student. Clearly, given the evidence reviewed here and 'acknowledged by the.Standards for Educational and Psychological Testing (American Educational Research Association, et aI., 1985; 1999) this is an ,issue that should be addressed by the U.S. Office for Civil Rights. The Resource Guide's section titled Equal Opportunity for Limited-English Proficient Students (pg. 9) is quite inadequate in this respect. Some of its suggested accommodations lack any empirical justification (such as "bilingual dictionaries") and may actually attenuate psychometric properties. Similarly, the suggested "Remedies" are problematic for Hispanic children and youth: test more, revise the test, substitute the test. 20 �( TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES Presently, a strong argument can be made that tests produce disparate impacts and that they do constitute a disparate treatment with regard to Hispanic students from diverse linguistic backgrounds. The viability of such arguments should be debated. It should be done on two levels: the legal/policy level and the psychometric/professional· level (with respect to consequential validity). 21 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES CHAPTER 4: BIAS In the early part of the 20th century, the discussion of bias in tests and testing focused on two areas: the impact of the "language handicap" experienced by bilingual individuals and the possible misinterpretation of test data to assert genetic differences among groups, particularly with regards to intelligence. But it was not until the 1960s that the' problem of test bias became prominent. Tests were linked with tracking and segregation policies in school districts in several court cases. Most notably in Hobsen v. Hansen (1967), the pivotal use of academic aptitude tests for tracking African American children in vocational, highschool programs was outlawed by the court. The court found that the tests were biased because they could not really measure student learning potential and because they produced the sort of segregation proscribed by Brown v. Board of Education. In Hobsen v. Hansen (1967) and subsequent litigation that involved testing practices (Diana v. California Board of Education,1970; Larry P. v. Riles, 1979). test bias became linked with the civil rights meaning of bias, discrimination and prejudice. One ofthe consequences of this linkage was a vigorous response from the testing community in the form of extensive research on the empirical documentation of bias. Studies conducted on racial/ethnic groups across the full spectrum of available tests· were fairly unanimous: test bias could not be found (Cleary, Humphreys, Kendrick, & Wesman, 1975) in the multiple indices of reliability (items, factors, alternate forms, retest) and all the various forms of validity (content, criterion, concurrent, construct). However, a careful examination and interpretation of the research data on . Hispanics since the 1920s suggests that there is evidence of bias. Table 1 presents the sources of test bias and the degree of empirical evidence available .. .TABLE 1: SOURCES OF TEST BIAS WITH HISPANIC TEST-TAKERS 1) SIGNIFICANT EXPOSURE TO A LANGUAGE OTHER-THAN ENGLISH -Extensive documentation 2) PROCESSING SPEED IN THE WEAKER LANGUAGE -EXttimsive documentation , 3) . USING TRANSLATIONS OF TESTS -Extensive documentation 4) DIMINISHED OPPORTUNITY TO LEARN . -Extensive documentation 5) USING INTERPRETERS DURING TESTING :"Emerging documentation . 6) DECISION-MAKING BASED ON TEST~ -Emerging documentation 22 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES The following is an extended explanation of each source of bias presented in Table 1. 1) SIGNIFICANT EXPOSURE TO A LANGUAGE OTHER-THAN ENGLISH Since the 1920s, Hispanic children and adults have consistently demonstrated a classic test profile on tests of mental ability and academic achievement: low [English] verbal scores and high non-verbal/math test scores. The depressed verbal scores are directly related to the degree of exposure to Spanish in the home and the community. Its not that the tests are inherently flawed, but rather that they are applied to the wrong population. Tests in English. since they are generally normed on monolingual English speaking populations, inherently tap a developmental sequence of English proficiency and English literacy. When exposure to English varies in degree across chronological ages (as with simultaneous bilinguals), or in the time of onset (as with sequential bilinguals), or both, tests register this as a subtrahend. Item analysis studies (Sandoval, 1979; Figueroa,1983) clearly show this asa 'generalized impact throughout the test items rather than as a discrete phenomenon affecting some items and not others. This explains why studies of test bias in test item structures have not found such bias (Cotter & Burke, 1981). It also explains why internal indices of test reliability and stability have also not found bias with Hispanic children and adults (Valdes & Figueroa,1994). Item difficulty levels are not changed. The total scores are loWer, but the test items perform the same way regardless of English proficiency. The most powerful impact from exposure to Spanish is manjfested in one of the most critical functions of tests: prediction. Though empirical data have suggested this from the beginnings of psychometrics, more recent studies have clearlydocllmented that the greater the degree of exposure to Spanish the lower the predictive validity of [English] tests (Gandara, Keogh, Yashioka-Maxwell, 1980; Pil~ington, Piersel, & Ponterotto. 1988; Emerling, 1990; Kaufman & Wang, 1992; Stone, 1992; Valdez & Valdez, 1983; Valencia, 1982; Valencia & Rankin, 1988; Figueroa, 1990; Figueroa & Garcia, 1995; Pennock-Roman, 1990). , ' . The clearest example of this comes from the validity study of the System of. Multicultural Pluralistic Assessment (Mercer, 1979). Mercer developed a battery of tests that purported to operationalize the federal requirement of nondiscriminatory testing :in special education diagnoses .. Shenormed the tests in 1972 on what is in all likelihood the most random and representative sample of Hispanic children (N=700) in California. A critical aspect of this study was that the children were all judged to be English proficient by the school staff and by those who individually administered the tests. However, the children came from three types of linguistic environments: homes where only Spanish was spoken, where Spanish and English were spoken, and where only English was spoken. In 1982, a predictive validity study of the tests was undertaken (Figueroa & Sassenrath, 1989) on approximately half of the original 'norming sample. It was found that on the WISC-R IQ's the predictive validity coefficients for: the Hispanic children varied in direct proportion to their exposure to Spanish. Figure 1 presents these data. 23 �TESTING HISPANIC STUDENTS IN THE UN!TEP STATES: TECHNICAL AND POLICY ISSUES 0.7 , . . . - - - : - - - - - - - - - - - - - - - ' : : - - - - : - - - - - - - - , 0.6 0.5 -+- Verbal 10 0.4 _ _ Performance 10 . ~FuIiIO 0.3 0.1 O+-----+----~r_....,.._---~-....,.._~-----_+-----~ .. 2 3 English English/Spanish Spanish . 4 5 6 7 English English/Spanish Spanish READING· MATH Figure 1: Evidence of psychometric bias in the predictive validity of 1912 IQ's relative to 1982 standardized .measures of reading and math achievement for Hispanic children's home language background. . As Figure 1 shows, the more Spanish there was in the home in 1972, the less the. IQ predicted reading and math achievement in 1982. Ironically, the nonverballQ, the historically used solution for measuring· mental abilities and aptitudes in bilinguals, proved to be the most hypersensitive to Spanish in the home: Other studies show similar results (Figu~roa & Garcia, 1994). However, a definitive predictive validity study using multiple measures of the Hispanic subjects' English proficiency as a control urgently needs to be done. Systematic differences in predictive validity are:one of the key signatures of bias in tests. . . 2) PROCESSING SPEED IN THE WEAKER LANGUAGE Differences in cognitive processing speed between mbnolinguals and bilinguals is dramatically demonstrated in the work of Dornic (1979, 1978a, 1978b, 1977) in Europe. Basically, he found that processing information in the weaker language produced consistently slower functioning. Further, the entire process ofmentation became progressively more and more vulnerable (to the point of shutting down) when the material was too complex, when the testing situation was too noisy, or when stress levels increased. The importance of these findings for testing that is done under timed, noisy or stressful conditions with Hispanic children merits further research, It should be·· noted, however, that accommodations providing more time for English learners are already routinely recommended. . 24 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES These are critical findings whose relevance may extend beyond issues of test faimes's or bias. Processing speed and automaticity are crucial requisites for certain academic skills. Currently, for example, the great emphasis being given to fluency in , reading as a major indicator of reading proficiency may well be biased or invalid when , applied to Hispanic English-language le~mers. Research has also continued to document how Hispanic and Japanese bilingual children are better at naming digits backward than monolingual children (Jensen & Inouye,' 1980; Figueroa, 1987). Though the typical explanation attributes this to "bilingualism," a more precise explanation m'ay be slower processing in English in children from Spanish-speaking homes. Slower processing may actually be favorable for naming digits backward. Just as likely, slower processing may come from translating material into Spanish and then back into English. In conditions where "slow" is not an impediment, there may be better remembering and possibly better learning (Malakoff & ,Hakuta, 1991). ' 3) USING TRANSLATIONS OF TESTS The 1985 and 1999 Standards place all manner of caveats on translating tests without any re-norming on the target populations for which the translation was developed. This limits the effectiveness of test translations, The rationale for this is quite , clear. When a test ,is normed, the item difficulty levels and the actual norms flow from the responses of the norming sample. Translating a test, no matter how well done, no matter ifthe translation is then back-translated to English, does not necessarily produce either an equivalent or a useful instrument. But are the caveats in the 1985 and 1999 Standards about translating tests devoid of any empirical proof? Can psychometricians really proceed with elaborate directives on how to translate tests (Geisinger, 1994b) without having to actually re-norm the new, translated test? In 1982, the Mexican government undertook a renorming of the WISC-R intelligence test in Mexico City (Gomez-Palacio, Padilla, '&, Roll, 1983). They began with a straight translation of the verbal test items as well as all the instructions. They also added extra items to most of the verbal subtests in order to make these more aligned to Mexican children's opportunities-to-Ieam in both their public schools and their communities. When the norming was finished, approximately 80 percent of the items . that were translated from the English remained. The critical questions in this discussion are: What happened to the test items? Did they remain in the same location as when they were in the translation and in the English version? Not only did the item sequences, or the degree of difficulty that the children experienced with each word, change; they did so in a most unusual manner. In the first half of the vocabulary subtest,the items were generally easier for Mexican children. In the second half, there were more items that were harder. Overall, Figure 1 clearly shows that equivalence attests merely through translation does not work. This refutes the practices associated with just translating a test. The new translated test, in all likelihood, 25 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES will not be equivalent to the first. The Mexico City data indicate that translating a test, no matter how well done, is a biased procedure. It produces an instrument with unknown psychometric characteristics. It precludes any useful decisions based on the scores. 4) Tests are based on a set of critical assumptions .. It is assumed that the individual being tested has had similar experiences asthe individuals who generate the test norms. It is concomitantly assumed that there is general equivalence in the . opportunities that individuals have had to learn the content in the test, the linguistic genre of the test, and the demands of tests. Intelligence and achievement tests are particularly dependent on meeting these assumptions since without them the attributions to intellectual or academic abilities cannot be reasonably made. In the case of achievement testing, however, it is possible to determine from multiple research sources when the opportunity to leam for a population is not even or not fair (Orfield & Yun, 1999; Moreno, 1999). In this case, the test scores may reflect or even assess opportunity-to-Iearn. They become measures of system accountability. Such scores, however, may not reflect the learning ability of the individual nor his/her potential for learning. In effect, any high-stakes decisions based on scores that do not meet the assumptions of some equivalence in opportunity-to-Iearn can be unfair and invalid. Such may be the case in statewide testing programs when .the tests are administered in English to English-language learners. The scores themselves become,. in unknown degrees, measures of opportunity-ta-Ieam (and English language proficiency), not of individual achievement and certainly not of future academic achievement. In 1982, the National Academy of Sciences broke tradition.with American psychology (Heller, Holzman, & Messick, 1982). It suggested that individual differences in academic achievement may not be the primary source of score differences. It recommended that before a child is tested for special education diagnosis, his or her present instructional setting be evaluated for its validity, effectiveness, and delivery. Similar considerations should be taken into account when testing Hispanic children from diverse cultural, linguistic and schooling communities in the United States. 5) The 1985 Standards for Educational and Psychological Testing was criticized ·(Valdes & Figueroa, 1994) for not addressing one of the most commonly used practices in the testing of bilingual individuals, that is, the use of interpreters. The 1999 Standards finally discuss this practice. But they also endorse it and set about describing the who, how and what for using an interpreter during testing. The unfortunate aspect of this is that there is no empirical evidence in the testing literature.that can attest to the procedural equivalence of this process to doing. testing in one language or to any scientific documentation that the process actually works reasonably well. . . ' In fact, the few doctoral studies that have recently been done investigating the use of interpreters conclude the opposite (DuFon, 1991; Sanchez-Boyce, 1999). Sanchez-Boyce's dissertation {1999} describes the actual process of using an interpreter during individualized testing sessions for special education placement of Hispanic students as chaotic, erratic, and fairly devoid of any standardization· 26 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES procedures. She documents how the conclusions reached from such testing sessions are really socially constructed and have no bearing on what the child can or cannot actually do. Research is urgently needed in this widespread practice to either refute or validate these investigations. If the latter turns out to be the case, the 1999 Standards' should be revised relative to their endorsement of the use of interpreters and it should be done before the next comprehensive draft appears around 2010. 6) The actual process of making decisions on the basis of test scores administered in either English or Spanish has never received much attention. Recently. however, Sandoval (1998) has heuristically taken findings from the research literature on decision-making theory and attempted to apply them to the testing situation where the subject is from a different cultural and linguistic background. Table 3 presents the multiple set of factors that a test-user must engage or consider when testing a Hispanic subject. TABLE 2: FACTORS TO CONSIDER IN MAKING DIAGNOSTIC DECISIONS WHERE ISSUES OF LINGUISTIC AND CULTURAL DIVERSITY ApPLY 1. REPRESENTATIVE BIAS Ignoring the prevalence of a behavior in a given population Reaching decisions on the basis of a limited sample of observations Failing to take into account the fact that some scores will be off by chance Making premature casual inferences on the basis of correlations that are not generalizable, coherent, consistent, robust, or reversible ·2. CONFIRMATORY BIAS Bias from what is expected or what is stereotypic 3. AVAILABILITY BIAS Bias that comes from vivid, recent data .4. INTELLECTUAL BIAS Bias due to intellectual limitations or cognitive overload due to high degrees of complexity (for example, interactions among cultural, lingUistic, and opportunity to-learn variables) Assuming that even two of these factors are robust in doing testing and diagnostic work with Hispanic populations, two implications arise: Can anyone with cultural and linguistic backgrounds that are different from the student being tested actually do the task? and, Is·it possible to train individuals to effectively use these parameters in making test/diagnostic decisions? As noted earlier, decision-making has never been adequately studied with multilingual and multicultural populations. The distinct possibility exists that this has been and continues to be an inadequately studied source of test bias with Hispanic populations. 27 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES The content of this chapter provides empirical evidence that tests used with Hispanic students show evidence of bias. Comprehensive, longitudinal investigations on this question should be commissioned. The impact of Hispanic culture and Spanish language proficiency levels on the predictive, consequential, and/or instructional validity . indices of tests should be determined.. 28 �.) TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES CHAPTER 5. EDUCATIONAL ACCOUNTABILITY For all Hispanic Americans, achievement tests are one of the most important sources of information affecting their lives and communities. The emphasis on higher academic standards for American schools has brought with it an unparalleled degree of concern about system and student accountability. Achievement tests supposedly fulfill this· goal better than any other indicator. The precarious aspect of this is that achievement tests are among the most difficult measurement instruments to develop and interpret particularly when they are given in group situations in order to .compare academic gains across states, school districts and schools (Cronb,ach, Linn, Brennan, & Haertel, 1997). These are fragile instruments that suffer from low reliabilities and that all ' too often assess not just what has been learned but also degrees of English-language proficiency, cultural differences, socioeconomic status in the home and community, quality ot-past and, present pedagogy, and the economic advantage or disadvantage of school districts. Historically, achievement tests have nearly always described a chronic pattern of underachievement for Hispanic students of all ages. Reynolds (1933) called attention to the 'one to two-standard deviation differences in achievement test scores between Anglo and Hispanic American populations in the Southwest. In the Coleman Report (Coleman et ai, 1966), the academic levels of Mexican American and Puerto Rican children, continued to show a one to two standard deviation deficit compared to white children. In the 1970s the U.S. Commission on Civil Rights (1971a, 1971b, 1972, 1973, 1974) documented a similar level of underachievement for Mexican American children. These reports also found that the education of Mexican American children differed significantly from that of white children: fewer questions from their teachers, less reinforcementfor their classroom responses, poorer schools, no reflection of their ethnic/cultural background in the curricula, and no Mexican'American models in the teaching, administrative, or counseling staffs. Other data in the 1980s and 1990s also continued to show evidence of comprehensive, national levels of underachievement among Hispanic children and youth (National Commission on Secondary Education for Hispanics, 1984a,b; Arias, 1986; Valencia, 1991; President's AdviSOry Commission on Educational Excellence for Hispanic Americans, 1996; National Council of La Raza, 1998; Laosa, 1998; Moreno, 1999). , . Although there were some modest gains between the NAEP achievement scores of Hispanic students across the country between 1988 and 1994, particularly in math and science, the overall picture is one of decline. The achievement gap between white and Hispanic students continues to increase. Key indicators (such as lower enrollment in preschool programs than white students, higher Hispanic enrollment below modal gr~de, underrepresentation in gifted and talented programs, increased enrollment in segregated schools, a 103 percent increase in suspension rates, a growing digital divide, an increase in the drop-out rates between white and Hispanic students) show that Hispanic students continue to have different educational experiences than their white counterparts in the public schools of the United States. 29 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES Yet i;lchievement tests have always operated under the belief that the public schools provide similar educational experiences and opportunities for all students. In 1975, this belief was explicitly noted in a report from the American P~ychological Association (Cleary, Humphreys, Kendrick, & Wesman, 1975): ' ( It is recognized that three assumptions are basic to this .report. The first assumption is acceptance of a single society; heterogeneous though it may be, rather than a divided one. The .second assumption is that few radical changes are expected in curriculum content or methodology of instruction in our educational establishmenL. ...The third assumption accepts the importance of evaluation in education: The assumption of homogeneity of "opportunity to learn" in the public schools of the United States is never really mentioned, but it is implied. With all achievement . testing, this is a necessary condition that must be met prior to any interpretation about a student's or a group's level of academic achievement. Recently, Orfield & Yun (1999) have documented a national trend towards more segregation of Hispanic students in the public schools of the United States. As has always been the case, segregated schools typically have fewer financial resources, the most inexperienced teachers, and the lowest academic achievement levels. In effect, the Hispanic student does not receive the same curricula or pedagogy or resources as do those students in affluent or middle income school districts. The issue of differential "opportunities to learn" for Hispanic children in the public schools of the United States during the last 100 years is incontrovertible.·· , According to the legal analysis on "Nondiscrimination in High-Stakes Testing" author(3d by the U.S. Office for Civil Rights, this condition of limited opportunity to learn establishes a claim of substantive due process violation related to achievement testing because "the students were not taught the material on which the tests were based" (U.S. Department of Education, draft, December 1999, pg. 2). This is a national claim and one that needs to be addressed, particularly in the current Climate of setting progressively higher and higher academic standards without a concomitant resolve to equalize opportunities to learn. NEW INITIATIVES The 1990s produced an unprecedented degree of attention to the measurement of academic achievement.in Hispanic students. For example, considerable legislative· workwas focus~d on including English language learners in all large-scale achievement testing programs (Goals 2000, Educate America Act, P.L. 103-227; the Perkins Act, P.L. 98-524; Improving America's Schools Act, .P.L. 103 328). Three initiatives, however, are particularly noteworthy because of their potential impact on national policy and discussion on this matter: the new Stanqards for Educational and Psychological Testing (American Educational Research Association, American Psychological Association, National Council on Measurement in Education, 1999), the National Research Council's report Improving Schooling for Language.,.Minority Children (August & Hakuta,1997), and Grading the Nation's Report Card, by the National Research Council. Each of these· R 30 �\ TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES addresses the challenges involved in testing the academic achievement of English language learners. In the new Standards for Educational and Psychological Testing (American Educational Research Association,et al., 1999), there is a particularly powerful caveat· noted, with respect to achievement tests: it is not known how much they become measu res of English language proficiency when they are used with individuals whose primary language is not just English. [" ..among non-native speakers of the language of the test. one may not know whether a test designed to measure primarily academic achievement becomes in whole or in part a measure of proficiency in the language of the test" (pg. 9-4)]. Across the United States, however, the test scores of English language learners are never described in these terms nor is the possible degree of test bias or error recognized .. Statewide testing programs of academic achievement should include statewide standardized measures of English language proficiency capable of measuring multiple dimensions ofthis competence. As the 1999 testing Standards suggest, a determination should be made first about which language is dominant. Then, the degree of proficiency in the dominant language should be measured along dimensions such as reading, . writing, comprehension, grammar, pronunciation, and communicative competence. There should be a clear understanding, hoWever. that even with this type of comprehensive language proficiency assessment, knowledge in certain domains may be missed by achievement tests. The new Standards therefore recommend doing academic . testing in both languages even when proficiency in English is established. In 1997, the National Research Co'uncil, through its Committee on Developing a Research Agenda on the Education of Limited-English-Proficient and Bilingual Students, ostensibly summarized the current knowledge base on. testing English-language . learners in its report on Improving Schooling for Language-Minority Children (August & Hakuta, 1997). Four contemporary concerns are addressed: the measurement and use of students' L 1 and L2 linguistic proficiencies in school districts across the United States, the use of tests with bilingual students for entrance and exit criteria from educational programs (including Title I and special education). the impact of L 1 on the validity and reliability of tests, and the measurement of academic achievement particularly in standards-driven educational contexts. The report sets forth the following research agenda for the assessment of second language learners (August & Hakuta, 1997, pgs. 113-134): . 1) Given that current tests tend to measure only discrete linguistic features, the assessment of linguistic proficiencies in L1 and L2 needs to be aligned with current research on how language acquisition occurs in children in bilingual communities. 2) Because different dimensions of English are required by different academic subjects and in different grades, it is necessary to determine how to use 31 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES multifaceted measurements of English proficiency to validly predict success in English-only classrooms. 3) Research is needed on how to measure knowledge in academic subjects. Specifically: Does, testing in English underestimate academic knowledge if English proficiency is limited? Does lack of familiarity with "test language" affect academic test scores even when the tests are given in the primary language? Is it better to test in English or in the primary language when subject matter has been given only in English? What is the !mpact on test scores from varying levels of native language proficiency, years of schooling in English, and difficulty of academic content? 4) Studies need to be done on how English learners take tests, specifically, how test demands, test format and test language (such as instructions) affect scores., These studies also need to answer one question: when can an English learner validly and reliably take a test in English and when does an English learner need test' modifications or accommodations? Further, what is the impact of such modifications on test reliability and validity? 5) Studies are needed to determine how rater or scorer error can be reduced in "open-ended or performance-based" (pg. 130) assessments when the test-taker's English proficiency can influence scoring decisions. 6) -'rhe standards and accountability reform movements in education call for content, performance and opportunity-to-Iearn benchmarks (McLaughlin & ' Shepard, 1995) throughout schools, school districts and federal programs. Research is needed on how to operationalize these for English language learners. Specifically: Can indicators of subject matter competence and English proficiency development be produ,ced for English language learners? How can the progress of English language learners (ELLs) be gauged within school district standards and on indices of academic achievement? Ifnonstandard assessments are used with ELLs, how can these be included within state and district accountability measures? What is the operational meaning of "yearly progress" for ELLs given the possibility that ELLs "may take more time to meet ... standards" (pg. 127)? Finally, given the fact that there are few data on effective pedagogical, curricular, or contextual conditions for the schooling of ELLs, how can opportunity-to-Iearn standards be operationalized for them? There is one area missing in this research agenda. In spite of the fact that there is text (pgs. 124-125) in the report on meeting the assessment needs of English-language learners referred for special education testing, the report fails to heed its own voice. 8ec:;ause there are no assessment instruments that can differentiate between linguistic/cultural differences and disabilities, .research is' needed on how to ope rationalize the "nondiscriminatory .assessment" provisions of federal special \ education laws, as these apply to Hispanic, English language learners. 32 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES, By and large, the research agenda pro'posed by the National Research Council should be endorsed and funded. Its importance is twofold. First, it points to work that is vitally needed now. Second; it highlights the elementary stage that the country is in with respect t6 measuring'Jhe academic achievement levels of English langl(age learners., , ' The National Research Council has, also evaluated the efforts of the National Assessment of Educational Progress (NAEP) in measuring the academic achievement ofall students in the United States {Pe"egrino~ Jones, & Mitchell, 1999. NAEP is one qf the more problematic test-developers and test-users with respect to the academic achievement testing of Hispanic students. Up to the 1~990s, NAEP, for example, did not even have reliable procedures for identifying the ethnic background of Hispanic students (Rivera, 1986; Rivera & Pennock-Roman, 1987; Baratz-Snowden; Pollack, & Rock, ' 1988). The manner in which the students were chosen for inclusion in NAEP testing was not random and systematically excluded English language learners. Even when NAEP attempted to address the complexities associated with testing Hispanic children, its' efforts were so fJawedthat it precluded any meani'ngful interpretation of test ~cores (e.g., 8aratz-Snowden, Rock" Pollack, & Wilder,1988). , , , In the most recent report on NAEP, the National Research Council (Pellegrino, Jones, & Mitchell, 1999) pays particular attention to the assessment of English language learners'. Asserting that NAEP and other assessment programs have made many efforts to accommodate the special needs of English language learners (Olson & Goldstein, 1997), this report highlights the efforts of.the Puerto Rico Assessment of Educational Progress, unique Spanish language translation and accommodation of NAEP , mathematics tests. After the administration of NAEP in Puerto Rico, several key findings about the complexities associated with transporting American tests across languages were made: translations often 1ail (Anderson & Olson, 1996), and i~em response theory analyses yield'noncomparable scales for English and Spanish versions ofthe test (Olson & Goldstein, 1997). In another study (Anderson, Jenkins, & Miller, 1996), similar conclusions were ,reached with respect to the translation of other NAEP tests: "the translated versions of the assessment are not parallel in measurement properties to the English version and scores are not comparable" (pg. 31). a In 1995, a field test of the NAEP mathematics' test included more English language learners than ever before. In fact, where previously the NAEP instructed schools across the nation on'which ELL students to exclude, inthis field test the instructions were on how to include .ELL students who, the school staff thought could actually take the test. Also, the following accommodations were included: more time, ' more testing.sessions, different testing sessions (group and individual), using an ' ' interpreter to elaborate on instructions, and test booklets in Spanish. There is some' evidence that accommodations do enhance participation, but the explicit of such accommodations on test scores are not clear. Recent research from UCLA suggests that the benefits may be marginal (Abedi, 1999a, b, c). Problems also remain with respect to the test technology used to identify and classify ELL students nationally and with respect to the wide variation across states and school districts in the criteria that they use for these purposes (August & Lara, 1996; Valdes & Figueroa, 1994). As the ,National Research Couj)cil' acknowledges: ' 33 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES iTo date, the dilemmas described ..... have not been resolved. Children potentially in need of native language support are still being assessed at entry level using one of several instruments that many scholars have questioned, and some years later they are tested again using another of such instruments that is in no way comparable to the first. The field is no closer to developing means for assessing whether a child can or cannot function satisfactorily in an all-English program-or participate in all-English large-scale assessments-than it was in 1964. (Pellegrino, Jones; & Mitchell, 1999, pg. 105) Table 3 presents the "research agenda" suggested by the National Research Council in order to help NAEP cope with the inclusion of English language learners as part of the Nation's Report Card. As will be. noted, 'some of these (such as using translations) are questionable given the historical and scientific experience of ~he country with such procedures. TABLE 3: THE NATIONAL RESEARCH COUNCIL'S CONSIDERATIONS FOR INCLUDING ENGLISH LANGUAGE LEARNERS IN THE NATIONAL ASSESSMENT OF EDUCATIONAL PROGRESS . What types of demands do different assessments make on English language learners and how do different types of accommodations help different types of students? How do accommodations affect the construct validity of the tests? Is it useful and cost effective to try accommodations such as translations? Do scores from accommodated administrations have the same scaling properties and can they be reported in the same fashion as for all other students in NAEP? Do English language learners have the same opportunity to learn and curricula as non-ELL students? . What are.possible, alternative assessment methods for ELLs? (Pellegrino, Jones, & Mitchell, 1999, pg. 1,10-111) Table 4 presents the major conclusions and recommendations of the National . Research Council for NAEP's testing of English language learners. 34 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES . TABLE 4: THE NATIONAL RESEARCH COUNCIL'S MAJOR CONCLUSIONS AND RECOMMENDATIONS FOR TESTING ENGLISH LANGUAGE LEARNERS ON THE NATIONAL ASSESSMENT OF EDUCATIONAL PROGRE~S (NAEP) \ CONCLUSIONS Conclusion 3A. The participation and accommodation of .... English-language learners . are necessary if NAEP results are to be representative of the nation's students. There is currently a paucity of interpretable achievement data and accompanying contextual .. data on the performance and educational needs of these populations .. . . Conclusion 38. Enhanced participation of ..... English language learners in NAEP depends on (1) the consistent application of well·defined criteria to identify these students and (2) accurate collection and reporting of information about them. RECOMMENDATIONS Recommendation 3A. NAEP should include sufficient numbers of ..... English language learners in the large-scale assessment so that the results are representative of the nation and reliable subgroup information can be reported. Recommendation 38. Criteria for identifying ...... English language learners for inclusion in· the large-scale survey need to be more clearly defined and consistently applied.· . . Recommendation 3C. For students who cannot participate in NAEP's standard large scale surveys, appropriate, alternative methods should be devised for the ongoing collection of data on their achievement, educational opportunities, and instructional experiences. Recommendation 3D. In order to accomplish the committee's recommendations, the NAEP program should investigate the following: XI. Methods for appropriately assessing, providing accommodations, and reporting on the achievements oL ... English language learners, and XII. Effects ofchanges in inclusion criteria and accommodation trends in , achievement results." (Pellegrino, Jones, & Mitchell, 1999, pg. 112-113) 35 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES Table 4 presents an exceedingly modest set of recommendations (for English language learners. and for Hispanic children. Some of these recommendations go back nearly 70 years (Sanchez,' 1934; Reynolds 1933). It is time for a comprehensive set of action plans to make large-scale, educational accountability systems, such as NAEP, relevant and useful for the educational present and future of Hi~panic children. The points made in Table 4 are good starting points. But there are other, more complex issues that need to be asked and answered. For example, assuming that it is possible to measure students' multidimensional levels of proficiency in both languages, the challenge remains as to what to do with language proficiency scores. Should they be used to, generate expectancy scores? Should they be used to adjust achievement scores to compensate for error? Should they alter cut-offs for eligibility, detention or promotion purposes? Research is needed to establish the function of such scores for interpreting the academic achievement of Hispanic, bilingual individuals with varying levels of acculturation and from the multiple ethnic and cultural backgrounds of Hispanic Americans. j 36 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES CHAPTER 6. DIAGNOSTIC TESTING FOR SPECIAL EDUCATION Linguistic minority children, for nearly a century, have tended to overpopulate classes for students with mental disabilities. No other area of education is as linked to issues of genetic differences in intelligence as special education. The fundamental question that has plagued this area of American education is why there are so many minority children with low las. Race psychologists in the 1920s and 1930s attributed this , to genetic inferiority. Apostles of the same doctrine have asserted the same in the 1960s and continue to do so even now. On the other hand you look at a lot of kids in the inner cities who have not seen a book by the time they come to kindergarten, and you give them one and they hold it upside down and the wrong way ... The language interactions that they've had at home are nil. They , have never even heard these sound systems. Are they lousy readers? A lot of them are. Are they genetically predisposed? Some of them are making that combination a tough one to treat. (Reid Lyon. quoted in Taylor, 1998, pg. 192) In the 1960s, the federal courts entered this debate. The questions posed then were: Why are there so many minority children in classrooms for the mentally retarded? and, Are the tests used to diagnose mental disabilities biased against them? Many of these court cases focused on the possible linguistic bias of 10 tests and on the denial of equal educational opportunities to Hispanic students placed in special education {Diana v. California Board of Education. 1970; Jose P. v. Ambach, 1979; Arreola v. Board of . Education, 1968 Guadalupe Organization v. Tempe School District, 1978; Ruiz v. State Board of Education, 1971; Covarrubias v. San Diego Unified School District, 1972; Lora v. Board of Education of the City of New York, 1984). For the most part, the courts ruled in the direction of using non-verbal tests of intelligence, establishing monitoring systems for determining when Hispanic children were overrepresented in classes for students with mental disabilities, and overseeing training programs for staff in order to do nondiscriminatory assessments. This last provision became part of the federal law for special education in 1975 and was repeated in the mandates for the 1997 Individuals with Disabilities Education. Act. However, nondiscriminatory assessment remains· either an enigma or serious problem for the American Psychological Association. In its latest, major document on testing individuals from diverse backgrounds (Sandoval. et ai, 1998), nondiscriminatory assessment does not exist. . The sort of testing that is done in order to determine whether a child has disabilities is unique in education. It is really an analog of what a medical doctor does when an individual has serious symptoms. Most children who are tested for special education placement go to the school psychologist with one predominant symptom, poor reading. After the administration of a few or many, many tests, the psychologist does a diagnOSis and at an Individualized Education Program meeting recommends either special or general education placement. At that time he/she also prescribes a treatment to "cure" the symptom. But the similarities between a medical doctor and a school psychologist are illUSOry. The tests that the school psychologist uses have no real power to diagnose and the educational treatments are usually ineffective (Skrtic. 1991). 37 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES Most children who are tested for special education have no clear, biological reasons that triggered the testing. Most children are sent to special education testing between the second and fifth grades. Again, the predominant reason for being sent is poor reading. There are four categories ofdisabilities (Learning Disabilities, mild Mental Retardation, Behavioral Disorders, Speech and Language Problems) that are unique in this entire enterprise because they are suspect. Many researchers have argued that the diagnostic tests used in special education are incapable of differentiating among these four disabilities (Keogh 1990; Lyon, 1996). Also, for many children, particularly from culturally and linguistically diverse backgrounds, these categories may be socially· constructed. That is to say, these students could have a disability or their "symptoms" could be due to socioeconomic, cultural, linguistic or poor opportunity-to-Iearn factors (Rueda & Forness, 1994; Trueba, 1987; Heller, Holtzman, & Messick, 1982). The tests, however,cannot "diagnose" the difference. For Hispanic children, the Handicapped Minority Research Institutes in Texas and California (Rueda, Figueroa, Mercado, & Cardoza, 1984; Rueda, Cardoza, Mercer, & Carpenter, 1984; Garcia, 1985; Ortiz, 1986; Ortiz & Maldonado-Colon, 1986; Ortiz & Polyzoi, 1986 ; Ortiz & Yates, 1987; Swedo, 1987; Wilikinson & Ortiz, 1986; Willig & Swedo, 1987) documented the unique problems Hispanic children and their families faced when confronted by the testing/diagnostic process in special education. Language . proficiency levels of the students were often not considered. Most testing was done in English. The linguistic challenges experienced by English learners were often diagnosed as a disability. Depending on the tests given, a Hispanic child could easily qualify for the Learning Disability or the Communication Handicapped categories. When retested after being in special education, the Hispanic children's las decreased. Limited English Proficient students were more likely to be recategorized with another disability. Tests developed and normed for Spanish-speaking children were just as problematic as tests normed on English speakers. If the parents were born outside the United States, there was a greater likelihood that their child would end up in special education. Finally, it was found that diagnostic tests were capricious in their "diagnoses": when the tests were given to an entire class of Hispanic children in general education, 53 percent were found eligible for the Learning Disability program. When "mentally retarded" Hispanic children were tested, 43 percent of them were "diagnosed" as Learning Disabfed. It should be noted that many researchers in the area of special education see no problem in the use of psychometric tests with Hispanic or African American children. For them, the issue of nondiscriminatory assessment simply does not apply, neither do linguistic and cultural factors describ~d in the 1985 or 1999 Testing Standards (MacMillan, Gresham, & Bocian, 1998). Special education testing with Hispanic students has very little empirical, research data to support many of its extant practices. The development ofdiagnostic tests normed on Spanish-speaking populations abroad provides one of the most widespread uses of diagnostic instruments with Hispanic children. But as some have argued: The bilingual is NOT the sum of two complete or incomplete monolinguals: rather he or she has a unique and specific linguistic configuration. The coexistence and constant interaction 38 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES of two languages in the bilingual has produced a different but complete linguistic entity. (Grosjean, 1989. pg.6) , a When bilingual individual confronts a monolingual test. developed by monolingual individuals, and standardized andnormed on a monolingual population, both the test taker 'and the test are asked to do something that they cannot. The bilingual test taker cannot perform like the monolingual. The monolingual' test cannot "measure" in the other language. (Hakuta& Garcia, 1989; Hakuta. Ferdman, & Dfaz, 1986) , ' Ironically, single-language tests deceptively measure the "monolingual" part of the bilingual (one or the other of the bilingual's two languages). irrespective of proficiency in that language, and they do so reliably, But these,tests fail insofar as they may exclude mental , content that is available to the bilingual in the other language, and mental processes, and abilities that are the product of bilingualism. ' (Valdes & Figueroa, 1994, pg.87) There is an urgent need to determine the diagnostic validity of Spanish language tests normed on monolingual populations andused,for diagnostic purposes with U.S. bilingual populations. ' ' :. , Another current practice in special education testing ofHispanic children is to test repeatedly until the right diagnostic profile appears or to conduct elaborate decision making procedures in order to get to the "real disability." Data exist, independent of issues of cultural and language differences, indicating that the more testing that is'done the less likely the "real disability" will appear (Mehan, Hertweck, & Meihls, 1986; Taylor, 1991 ). The use of interpreters' is widespread in special education testing (Langdon, 1994). As already mentioned, emerging. empirical data on this practice suggest that it should be proscribed; Another common practice in special 'education testing is the use of translated tests. This practice should also be stopped. Though most of this chapter has focused on diagnostic testing, other forms of assessment are routinely done in special education (such as curriculum-based " assessments, ,functional assessments, ecological assessments, dynamiC assessments); The Same general principles that have been discussed with respect linguistic/cultural differences, bias, validity and reliability apply. For the population of Hispanic children and their families, all of these testing practices also need to be considered in light of several questions: Does placement in special education provide any educational , benefits for the student? Does any testing really help to promote educational, achievement in special education? Many researchers (Skrtic, 1995; Figueroa' & Artiles, 1999; Mehan, Hertweck, & Meihls, 1986) would answer both in the negative. , , Finally, there is the question of testing for placement in the gifted ,and talented programs,acrosS the United States. The search for a measure of Hispanic intelligence that would give Hispanic students a fairchance at being equitably represented in such' programs has been extensive (Bernal &Reyna,1975; Chambers, Barron, & Sprecher, 1980; Zappia, 1989; Perrine, '1989; Bermudez & Rakow, 1990; Marquez, Bermudez & Rakow, 1992; Johnsen, Ryser, & Dougherty, 1993; Sawyer& Marquez, 1993; Garcia; 39 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES 1994; Maker, 1996; Maker, Nielson, & Rogers, 1994}: By and large, however, none of 'these have succeeded in establishing a national procedure for identifying gifted Hispanic , students. Hispanic pupils, accordingly, are very underrepresented in these programs. They will continue to be absent as long as developers and users of tests and eligibility criteria for gifted and talented programs fail to realize that the opportunity-to-Iearn experiences of Hispanic children in America's public schools are very different and that tests respond to these differences in the form of lower scores. Cultural factors may also complicate this form of assessment. Data suggest that the family contexts in Hispanic homes of highly academically gifted students vary significantly from Anglo, middle class homes. In Hispanic homes, family values take precedence over individualism and bilingualism is prized over monolingualism (Soto, 1988; Fernandez & Nielsen, 1986; Hine, 1993). ' For the emerging Hispanic community in the United States, there is an ' overarching question to be asked about the education of their intelligent children: Is it a good idea to isolate the "smartest" and to give only them an enriched, gifted education? or, Is it a better idea to offer this to all students, including those with disabilities? Some research on programs for the gifted'in close-knit communities and in schools suggests that the social impact of these programs can be quite negative for all children, their. parents and their communities {Margolin, 1994; Sapon-Shevin, 1994}. f 40 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES CHAPTER 7 - OTHER TYPES OF TESTING·· Two other broad types of tests are often given to Hispanic students in educational settiilgs. These are personality and vocational· tests. Both are used in counseling settings, though the former is also used Jor special education diagnoses of emotional disturbance or personality disorders. Both,. in varying degrees, are affected by the same issues extant in all other forms of assessments with Hispanic students: cultural differences and linguistic backgrounds. PERSONALITY TESTING The empirical and professional literature on personality tests, by and large,has never really attended to the issues ofcultural differences or bilingualism in the Hispanic community (MalgadYi Constantino, & Rogier, 1987; Olmedo, 1981; Bernal, & Castro, . 1994). The officiCiI "Guidelines for providers of psychological service~ to ethnic, linguistic, and culturally diverse populations" (American Psychological Association, 1993) acknowledges tha~ there are problems associated with tests that have not been validated for use with minority populations. But there are no serious limitations placed on their use. Similarly, cultural factors in mental health testing and diagnosis receive very little attention in the "Diagnostic and Statistical Manual of Mental Disorders IV" (Dana, 1995). . < As a consequence, some psychological providers recommend that tests be us~d with interpreters, with norms from other countries (such as Spain), or as a spontaneous translation (e.g., Nieves~Grafals, 1995; Dana, 1995). Yet, there is evidence that . Hispanic test-takers can give different meanings to the connotative, emotive vocabulary that is often used in such tests (Brizuela,1975; Diaz-Guerrero, 1988; Gonzalez-Reigoza, 1976). Similarly, there is research on how they express affective material in ways that are quite different in Spanish than in English (Gonzalez, 1978; Ruiz, 1975). Also, data exist on how Hispanic clients are rated differently on personality variables when they are evaluated in English versus when they are evaluated in Spanish (Grand, Marcos, Freedman, & Barroso, 1977; Westermeyer, 1987; Edgerton & Kamo, 1971). . . A contemporary approach to personality testing involves the parallel use of tests of acculturation. Research on one such test, the "Acculturation Rating Scale for Mexican Americans" (ARSMA I & II) (Cuellar, Harris, & Jasso, 1980; Cuellar, Arnold, & Maldonado, 1995) has proven to be of some importance both in terms of heuristics and irnprovements in the personality testing of Hispanics (Velasquez & Callahan, 1992). It has been demonstrated, for example, that on the most widely used test of personality, the MMPI, Hispanic Americans whose cultural orientation is "traditional" show more· "pathology." These outcomes are basically the result of differential (cultural) treatment. The ARMA studies show this (Dana, 1995) in a particularly "emic" way, though other , research similarly confirms that differential cultural impacts persist even in the Mrv'JPI-2 (Whitworth & McBlaine, 1993; Whitworth & Unterbrink, 1994). 41 �TESTING HISPANIG STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUE;; The use of acculturation scales, such as the ARMA I and 1/, is usually limited to a specific subgroup of Hispanics. Currently, acCulturation scales are more abundant for Cuban American and Mexican American populations (Marin, 1992). Given the limited . representation of the scales, their use is limited. Also the use, of such scales remains optional. No attempt is being made to psychometrically incorporate such measures into tests such as the MMPI. Admittedly, however, using measures of acculturation or sociocultural variation to "correct" score bias has not fared well in the past. The Sociocultural Scales of the System of Multicultural Pluralistic Assessment (SOMPA) {Mercer, 1979}, for example, do produce a new IQ estimate based on the degree of distance that a Hispanic child's family exhibits from middle class, white families. Bat the new IQ proved to be neither a better predictor nor a better "corrector" (Figueroa & Sassenrath, 1989; Valdes & Figueroa, 1994). . . . There is formal resistance to norminga test such as the MMPI. within Hispanic , populations. This is a reluctance that cuts across all types of tests currently in use: As Dana {1995} has suggested, the press for group-specific norms may have to come from studies showing the error or mistake rates that mono-normative tests produce in personality diagnoses with Hispanics. In many ways these data are already available in all other areas of testing. Most tests do produce negative, differential impacts with Hispanic students. The present state of testing Hispanics on personality tests relies on a series of . questionable practices: making testers "culturally competent," doing "corrections on the tests, promoting ideographic, tester in~erpretations based on measures of acculturation. There is very little, actual research on how to do diagnostiC personality work with Hispanic children and youth (Cervantes & Arroyo, 1994). As a consequence, recommendations to clinicians, even when they are very well crafted (such as Cervantes & Arroyo, 1994), remain anecdotal and of \Jnknown generalizability, utility and validity. As. is the case with most tests used in the United States, new tests specifically made appropriate for the Hispanic populations' bilingual, multicultural status are needed. Preliminary efforts in this regard {Costantino, Malgady, Rogier, & Tsui, 1988} have been fruitful and deserve more research and development. A sea change in attitude about testing distinCtly bilingual/multicultural populations is needed within the testing community in the United States. Dana {1995} notes: j , The [current] repertoire of standard' tests emerged in an era when a "melting pot" conception of acculturation was in vogue and new immigrants were expected to assume a relatively homogeneous identity after three generations in this country. This expectation did not occur uniformly even fpr descendents of European immigrants. Now that diversity instead of homogenization has become the hallmark of American society, professional acts and technologies must reflect this societal change. (Dana, 1995, pg. 314) OCCUPATIONAL INTEREST TESTS Helping a student choose a career and plan a requisite program of academic preparation are critical counseling functions. Often, this process begins with the administration of interest inventories as early as elementary school. A fundamental 42 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES assumption behind these tests is that students in society have somewhat equivalent levels of cultural capital. That is, they have a relatively realistic notion about how societal systems function and what it takes to negotiate er:ttrance into such systems. In terms of careers and jobs, cultural capital means knowing what defines a particular area of .occupational interest and what levels are possible within a chosen occupation. Cultural capital in this context also means knowing how to negotiate entrance into systems that help produce the requisite' competencies and the desired job or career. For Hispanic students, recent research (Stanton-Salazar, 1997; Stanton-Salazar & Dornbusch, 1995) suggests that, in fact, cultural capital is not easily accessed by them. Res,earch on the use and impact of occupational interest tests with Hispanic students is neither elaborate nor elegant. There are not many studies and those that do exist do not control for either bilingualism or acculturation in terms of cultural capital. . On the other hand, the related area of employment testing is uniquely optimistic about its fairness, validity and virtual universality (Ramos, 1992) for use with Hispanic adults. Some of this comes from predictive validity studies that show that there are no differences in the way tests function with Hispanic job applicants. The proplem here, however, is that these tests are very poor predictors for everybody and that the effects of bilingualism on predictive validity remain's unknown. Also, a central element discovered in the employment testing of Hi~panics is that so much of the success in the tests and in the jobs depends on educational background. . . 43 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES CHAPTER 8 - SUMMARY AND RECOMMENDATIONS . . The testing of Hispanic children has not made much progress in the 20 th century. The areas where there has been quite a bit of progress is in the empirical documentation of the impact of bilingualism on test scores and on the development of policies and caveats associated with the testing of Hispanic individuals. However, there has not been· much progress in test development and technology in any area of testing \,Vith respect to these students. On the basis of what has been reviewed, seven options appear to exist concerning the measurement of Hispanic students' abilities, achievements, personality, and occupational interests: 1) tests can be administered in English using what are basically monocultural norms, 2) testers can be given "cultural training" so that they can interpret the tests in ways that appear to be more valid, 3) accommodations in the tests and the testing situation can be provided, 4) a testing moratorium on the use,of individual test scores for any high-stakes assessment can be put in place until research sorts out the complex issues associated with testing Hispanic students, 5) tests can be used for holding systems legally and politically accountable for the educational decisions that adversely impact Hispanic students as manifested in differential, negative outcomes, 6) Hispanic-specific local norms 'can be developed in order to compare students with similar cultural, linguistic, and scholastic experiences, and 7) school systems and opportunities to learn are made equitable for Hispanic children across the United States, thereby meeting the crucial assumption of tests about experiential homogeneity. At present, only the first three are viable and in use. None of these three, however, can demonstrate that they are free of Significant degrees of bias, unfairness, or denial of substantive due process. . The fourth option has been suggested (Valdes & Figueroa, 1994) but has received virtually no support. The fifth option has not really been tried in the last decade, but it remains a plausible response to political attacks, such as California's propositions 227 and 209; that are already inflicting harm and damage on Hispanic children and that can be documented by the tests' ability to measure contextual effects. In Kern County in California, for example, the school board has decreed that Hispanic children must learn English in three months and then receive their education in English. The impact of this decision will become manifest in the tests administered in English. The sixth option may well be the most immediately relevant for both test developers and the Hispanic communities in the United States. But there is a great deal of opposition from both political and professional interests. Ethnic/linguistic norms will provide comparisons among children with generally homogeneous experiences and background in local communities. But, they arouse suspicions about a "divided" society. They may also be seen as sources of reverse discrimination. In employment testing, the courts and Congress have refused to accept group-specific norming precisely because of issues related to reverse discrimination (Sireci & Geisinger, 1998). Ironically, the intellectual community has not been so reluctant. The National Academy of Sciences recommended this as a solution to the bias that results from employment tests among job applicants with differential opportunities-to·learn (Hartigan & Wigdor, 1989). 44 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES Education, however, has always occupied a different status with the courts. This may also apply with regard to testing in educational contexts. The issue of group norms in all aspects of schooling should be studied and debated. Certainly, the 1999 Standards for Educational and Psychological Testing may already have sided with this option. There are clear mandates there that validity needs to be grounded within linguistic groups when research indicates that test scores are affected by language background (Comment for Standard 9.2). The seventh is the best option; but if history is any indicator, it .is the one most likely to take multi-generations to accomplish. It is also the option that best explains why tests are such a failure for Hispanic communities. The primary problem with tests is not the tests. It is the educational context in which they are developed, used, and studied. The historical and contemporary data have clearly documented that, in the United States, public education has not wor:ked for Hispanic children. Tests help and perpetuate much of the dysfunction that Hispanic children get in schools. The one positive conclusion that can be drawn from the review presented in this document is that the testing community is finally beginning to realize that the . problems with testing Hispanic students are far more complex than ever imagined and that they are potentially irremediable in the status quo (Heller, Holtzman, & Messick, 1982; American Educational Research Association, et aI., 1985, 1999; Pellegrino, Jones, & . Mitchell, 1999; August & Hakuta, 1997; Sandoval et aI., 1998; Heubert& Hauser, 1999). The solution to the problems engendered and embodied in tests resides in changing the educational experiences of Hispanic children. . A compelling example of what this may entail was described by Garcia and Otheguy (1995). They set out to answer four research questions in an ethnographic study of "seven private, but low-tuition, non-elite schools in Dade County, Florida." They were "run by and for Cubans." The parents of the children were predominantly from working class and middle class income levels. They were, in effect, similar to families of Hispanic children in urban school districts. The four research questions were typically those that preoccupy educational researchers about bilingual children in U.S. public schools: Should Spanish be used? How is language dominance measured and used? When do you use English? In which language is reading taught? The authors were unable to answer these research questions. The following are the reasons for this failure. When majority educators look at the education of Hispanic children in the United States, they focus on their linguistic deficits .... Discussions about the education of these children begin and end with the issue of the English language, or how they lack it, and how best to give it to them ..... However. when Hispanic parents and educators in control of the education of their own children think about the educational process, they ask different questions. They ask questions about the way to educate their children, about pedagogy, instructional strategies and teaching methods, about curriculum and materials. We asked them about language. they told us about education ..... Spanish naturally belongs in ethnic schools that are controlled, staffed and run by the Hispanic community, so there is no need to question its role in public education ..... Those of us in public education need to learn from these educatprs that substantive high expectations do matter; that bilingualism and biliteracy are obtainable .if . . 45 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES one holds both children and teachers unequivocally responsible for obtaining them; that . initial literacy in two languages is possible and doesn't have to be limited to Spanish; that advanced literacy in two languages is possible and doesn't have to be limited to English; that in US society all children acquire English naturally and that therefore English acquisition should not be the main focus of education; that parents and community do matter for education; that when they are in controL .. .the results are ultimately superior; that the context of a child's home culture is essential..; and that continuity with the intellectual and social climate of the home is of paramount importance if the school is to help children develop and foster their intellectual and social growth. (Garcia and Otheguy, 1995, pg. 99-100) The public education of Hispanic children needs to focus on education. It needs to be reformed pre-eminently in terms of local control. Until such time as when the U.S. educational system is locally and proportionally controlled by Hispanic commu·nities and until it achieves a modicum of equity in how it distributes resources. cultural capital, and the application of "high standards" across all school districts, tests and test scores will continue to show massive technical problems of bias, differential treatments and differential outcomes. They will continue to· impede the future of Hispanic communities. Tests will "work" when the public education of Hispanic children becomes democratic and effective. . 46 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES .; RECOMMENDATIONS 1. The U.S. Department of Education's Office for Civil Rights should undertake a legal analysis of test usage with Hispanic students and individuals, focusing on the dysjuncture that exists between what the Standards for Educational and Psychological Testing prescribe and what test users (individual testers, school district testing programs, state testing programs) actually do. Particular attention should be focused on the testing of bilingual individuals. 2. The U.S. Department of Education's Office for Civil Rights needs to determine whether the "disparate treatment" legal analysis under Title VI and Title IX statutes applies to the historical experience of many, if not most, Hispanic students with tests and testing. A compelling, empirical argument can be made that they are tested under different conditions: they are tested with monolingual norms when most of them have varying levels of bilingual status,'all of which have left an indelible, if not unerasable, mark on all tests that use English as the main vehicle for eliciting responses; and, their scores show evidence of attenuated predictive validity related directly to their varying levels of exposure to Spanish. 3. Excessive testing should be discouraged .. There is a widespread belief that with students forwhom current testing technology may not be appropriate, the thing to do is to test them more using many different tests. There is no evidence to support this approach. There are, however, data suggesting that excessive testing does not improve diagnostic decisions (Mehan, Hertweck, & Meihls, 1986), but, rather, that it may negatively affect children (Taylor, .1991). . 4. The U.S. Department of Education's Office for Civil Rights needs to determine whether the "disparate impact" legal analysis under Title VI and Title IX statutes applies to the comprehensive and chronic pattern of Hispanic students' 'underrepresentation in Gifted and Talented programs, their overrepresentation in programs for students with disabilities, and their miniscule presence in institutions of higher learning in the United States. There is, in effect, a clear pattern of a disparate impact from testing practices across·a wide array oftests used in multiple . educational contexts. There is also compelling evidence that there is. bias in prediction and that this differentially' constricts tests' educational pu rposes when used with most Hispanic students. . 5: Translated tests should not be used. There is very little likelihood that the new translated test will have the same technical properties as the original, and there is a substantial likelihood that the translated test will not work. The practice of translating tests and of using their scores for making decisions about individuals should stop. . 6. A clear distinction, if not separation, needs to be drawn between the issues that are significant in meeting the challenges of a disability with those involved in the education of children with two linguistic systems. Recent publications on "diversity" and "test accommodations" are linking the issues relevant to English language learners with those that are meaningful for students with disabilities. One of the great . , . 47 �TESTING HIS'PANIC STUDENTS IN THE UNITED. STATES: TECHNICAL AND POLICY ISSUES historical mistakes in American education has been the tendency to perceive bilingualism as a handicap. For example, special education is dedicated to diminishing the impact of a disability. The education of English learners should not be guided by the diminution of an asset such as bilingualism. 7. Tests that purport to have equivalent test versions in English and Spanish need to show empirical evidence that, in fact, there is equivalence. Similarly, research is , urgently needed on whether bilingual, Hispanic children in the United States can be validly and fairly compared on Spanish/English tests that relied on monolingual samples to generate monolingual norms in English and Spanish. 8. The use of interpreters should be discouraged, if not proscribed. Interpreters are· basically poor substitutes for what should be provided to Hispanic students: culturally knowledgeable, linguistically competent testers from their own communities. As currently envisioned in the 1999 Standards for Educational and Psychological Testing, interpreters can be trained and used in testing situations. New data on this practice, however, suggest that the'use of interpreters may somewhat destroy comprehensive standardization. Further, in special education, the use of interpreters may lead to invalid inferences and conclusions. The failure to recruit, train and graduate Hispanics in the testing professions cannot be ameliorated by the use of interpreters. This is a practice that may really be a malpractice. ' 9. It is recommended that the Standards in Chapter 9 for "Testing Individuals of Diverse linguistiC Backgrounds" be analyzed by experts in second-language acquisition, language proficiency testing, and bilingual assessment in order to examine the ambiguities and the assumptions of that chapter. The 1999 Standards for Educational and Psychological Testing are problematic in the areas of language proficiency testing and the use of testing accommodations with bilingual subjects. 10. The impact of Hispanic culture and Spanish language proficiency levels on the' predictive, consequential, and/or instructional validity indices of tests should be determined. There is empirical evidence that tests used with Hispanic students show , evidence of bias. Comprehensive, longitudinal investigations on this' question should be commissioned. 11. The U. S. Department of Education's Office for Civil Rights should conduct an analysis of testing practices with Hisp?\nic students throughout the states and by the National Assessment for Educational Progress to determine whether some or all of these do not meet the legal criteria of discrimination under Title VI and Title IX. 12. The research agenda on assessment proposed by the National Research Council's Committee on Developing a Research Agenda on the Education of Limited-English Proficient and Bilingual Students in its report Improving Schooling for Language Minority Children (August, & Hakuta, 1997) should be endorsed and funded. 13. The recommendations of the National Research Council on testing English language learners on NAEP (Table 4) should be adopted, funded and applied. They should 48 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES also be broadened to include Hispanic children from all the major ethnic cultural' backgrounds. The issues related to cultural factors in achievement testing (such as . acculturation, the measurement of acculturation, the use of acculturation levels) should be investigated. It is time for a comprehensive set of action plans to make large~scale, educational accountability systems, such as NAEP, relevant and useful for the educational present and future of Hispanic children. 14. There is an urgent need to determine the diagnostic validity of Spanish language tests normed on monolingual populations and used for diagnostic purposes with U.S. bilingual populations. Diagnostic tests should not be administered to Hispanic students or they should be relegated to a lower status in the decision-making process for special education, or gifted and talented education. Alternatives to the . typical battery of diagnostic tests exist, all the way from placing a child in an enriched treatment situation to "diagnosing" their work products. 15. As is the case with most tests used in the United States, new personality tests specifically made appropriate for the Hispanic population's bilingual, multicultural· status are needed. There isvery little actual research on how to do diagnostic personality work with Hispanic children and youth. 16. It is recommended that research in occupational interest tests be significantly and quickly increased. Given the widespread use of occupational interest tests with elementary and high school students, as well as their possible role in tracking students in academic programs, the lack of research on the use and impact of these measurement instruments on Hispanic children and youth is a major knowledge gap. 17. Extended analyses and debate need to be conducted on whether Hispanic students' test scores should be interpreted primarily within a school district's "normative framework." That is to say, should national or statewide comparisons that are used to . determine an individual's eligibility for promotion, graduation, or admission to higher education continue to be made given the current knowledge base on testing Hispanic students? This does not preclude the use of tests to measure the performance of school systems (schools, districts) to determine how well or how poorly they are working. Clearly, however, in those school districts where there is no equality in educational programs and opportunities for Hispanic students, the question of what Gonstitutes a fair, normative comparison needs to be answered .. 18. It should be made clear that the starting point for the reform of unfair testing of Hispanic'students is not the tests; It is the instructional context. Until there is some semblance in equity, of standards, curricula, pedagogy and resources throughout schools, school districts and states, tests will continue to reify the inequality of educational opportunities in the country. Tests will continue to blame the Hispanic student for low scores and will continue to deny him or her promotion, eligibility and opportunity. 49 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES REFERENCES Abedi, J. (1999a). NCME examining the effectiveness of accommodation on math performance' of English Language Learners. 'Paper presented at the 1999 AERA conference, Montreal, Canada. Abedi, J. (1999b). The impact of students' background characteristics on accomodation results for students with limited English proficiency. Paper presented at the 1999 AERA Conference, Montreal. Canada. ' Abedi, J. (1999c). NAEP math test accommodationsfor students with limited English proficiency. Paper presented at the 1999 AERA Conference. Montreal. Canada . .' Altus, W.O. (1945). Racial and bi-lingual group differences in predictability and in mean aptitude test scores in an Army special training center. Psychological Bulletin, 42,310-320. Altus, W.O. (1949). The Mexican American: The survival of a culture. The Journal of Social Psychology, 29, 211-220 American Educational Hesearch Association, American Psychological Association, The National Council on Measurement in Education. (1966). Standards for educational and psychological tests and manuals. Washington, DC: American Psychological Association. American Educational Research Association, American Psychological Association, The National Council on Measurement in Education. (1974). Standards for educational and psychological tests and manuals. Washington, DC: American Psychological Association. American Educational Research Association. American Psychological Association. National Council on Measurement in Education. (1985). Standards for educational and psychological testing. Washington. DC: American Psychological Association. American Educational Research Association, American Psychological Association, The National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Psychological Association. American Psychological Association (1993). Guidelines for providers of psychological services to ethnic, linguistic, and culturally diverse populations. American Psychologist, 48, 45-48. Ammons. R.B. & Aguero, A. (1950). The Full-Range Picture Vocabulary Test: VII. Results for a Spanish American school-age population. Journal of Social Psychology, 32, 3-10. Anderson, N.E., Jenkins, F.F. & Miller, K.E. (1996). NAEP inclusion criteria and testing accommodations: Findings from the NAEP 1995 field test in mathematics. Princeton. NJ: Educational Testing Service. Anderson, N.E. & Olson, J. (1996). Puerto Rico Assessment of Educational Progress: 1994 PRAEP Technical Report. Princeton, NJ: Educational Testing Service. Arias, B. (Ed.). (1986). The education of Hispanic Americans: A challenge for the future. American Journal of Education, 95. Arreola v. Santa Ana Board of Education (Orange County, California). No. 160-577. (1968). Arsenian, S. (1945). Bilingualism in the post-war world. Psychological Bulletin, 42. 65-86. August, O. & Hakuta, K. (Eds.). (1997). Improving schooling for language-minority children: A research agenda. Washington. DC: National Academy Press. 50 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES August, D. & lara J. (1996). Systemic reform and limited English proficient students. Washington, D. C.: Council of Chief State Schobl Officers. Baratz-Snowden, J., Pollack, J. & Rock, D. (1988). Quality of responses of selected items on NAEP special study student survey. Princeton, NJ: National Assessment of Educational Progress. Baratz-Snowden, J., Rock, D., Pollack, J., & Wilder. G. (1988). The educational progress of language minority children: Findings from the ·1985-86 special study. Princeton, NJ: Educational Testing Service. Bell, R (1935). Public school education of second generation Japanese in California. Stanford, CA: Stanford University Press. Bermudez, A.B. & Rakow, S.J. (1990). Analyzing teachers' perceptions of identification procedures for gifted and talented Hispanic limited English-proficient students at-risk. Educationallssues of Language Minority Students, 7,21-34. Bernal, E.M. Jr. & Reyna, J. (1975). Analysis and identification of giftedness in Mexican American children: A pilot study. In B.O. Boston (Ed), A resource manual of information on educating the gifted and talented. Reston, VA: Council for Exceptional Children. Bernal, M.E. & Castro, F.G. (1994). Are clinical psychologists prepared for service and research with ethnic minorities?: Report of a decade of progress. American Psychologist, 49, 797-805. Brigham, C.C. (1922). A study of American intelligence. Princeton, NJ: Princeton University Press. Brigham. C.C. (1930). Intelligence tests of immigrant groups. Psychological Review, 37, 158-165. Brizuela. C.S. (1975). Semantic differential responses of bilinguals in Argentina. Dissertation Abstracts International, 36(12-B), 6439. University Microfilms No. 76-13.514). Brown, G.L. (1922). Intelligence as related to nationality .. Journal of Educational Research, 5, 324-327. Brown v. Board of Education. 347 U.S. 483 (1954). . Callahan, C.M., Hunsaker. S.L., Adams, C.M., Moore. S.D. & Blend. l.C. (1995). Instruments. used in the identification of gifted and talented students. Charlottesville. VA: The National Research Center on the Gifted and Talented, Research Monograph 95130. Cebollero, P.A. (1936). Reactions of Puerto Rican children in New York City to psychological tests. An analysis of the study by Armstrong, Achilles, and Sacks of the same name. San Juan. P.R: The Puerto Rico School Review. pt. 11. Cervantes. RC. & Arroyo. W. (1994). DSM-IV: Implications for Hispanic children and adolescents. Hispanic Journal of Behavioral Sciences, 16, 8-27. Chambers, J.A., Barron, F. & Sprecher, J.W. (1980). Identifying gifted Mexican American students. Gifted Child Quarterly, 24, 123-128. Cleary. TA, Humphrey, l.G., Kendrick, SA & Wesman, A. (1975). Educational uses of tests with disadvantaged students. American Psychologist, 15, 15-40. Coleman, J.S., et al. (1966). Equality of educational opportunity. Washington. D.C.: U.S. Office of Education. 51 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES Colvin, S.S. (1921). Intelligence and its measurement: A symposium (IV). Journal of Educational Psychology, 12, 136-139. Costantino, G., Malgady, RG., Rogier, LH. & Tsui, E.C. (1988). Discriminant analysis of clinical outpatients and public school children by TEMAS: A Thematic Apperception Test for Hispanics and Blacks. Journal of Personality Assessment, 52, 670-678. Cotter, D.E & Berk, RA. (1981). Item bias in the WISC-R using Black, White, and Hispanic learning disabled children. Paper presented at the Annual Meeting of the American Educational Research Association, Los Angeles, CA. . Covarrubias v. San Diego Unified School District. No. 70-394-T, San Diego, CA. (1972). Cronbach, L.J.• Linn, RL, Brennan. Rl: & Haertel, EH. (1997). Generalizability'analysis for performance assessments of student achievement or school effectiveness. Educational Psychological Measurement, 57, 373-399. Cuellar, I., Arnold, B. & Maldonado, R (1995). Acculturation Rating Scale for Mexican Americans II: A revision of tlie original ARSMA scale. Hispanic Journal of Behavioral Sciences, 17,275-304. Cuellar, I., Harris, LC. & Jasso, R. (1980). An acculturation scale for Mexican American normal and clinical popul~tions; Hispanic Journal of Behavioral Sciences, 2, 197-217. Dana, RH. (1995). Culturally compJterit MMPI assessment of Hispanic populations. Hispanic Journal of Behavioral Sciences, 17, 305-319. Darcy, N.T. (1946). The effect of bilingualism upon the measurement of the intelligence of children of pre-school age. Journal of Educational Psychology, 38, 21-44. Darsie, M.L. (1926). The mental capacity of American-born Japanese children. Comparative Psychology Monographs, 3, 1-89. Davenport, E.L (1932). The intelligence quotients of Mexican and non-Mexican siblings. School and Society. 36, 304-306 . . Diana v California State Board of Education. No. C-70-37, United States District Court of Northern California.. (1970). Diaz-Guerrero, R (1988). Psicologia del Mexicano [Psychology of the Mexican] (4th ed.). Mexico: Editorial Trillas. Dornic, S; (1977). Information processing and bilingualism. Department of Psychology, University of Stockholm. Dornic, S. (1978a). Noise and language dominance. Department of Psychology, University of Stockholm; . . . Dornic, S. (1978b). The bilingual's performance: Language dominance, stress,and individual differences" In D. Gerver & H. Sinaiko (Eds.), Language and Interpretation and Communication. New York: Plenum Press. Dornic, S. (1979). Information processing in bilinguals: Some selected issues. Psychological Research, 40,329-348. DuFon, MA (1991). Politeness in interpreted and non-interpreted IEP (Individualized Education Program) conferences with Hispanic Americans. (Doctoral dissertation, University of Hawaii, 1991). UMI Dissertation Services, 1345950. 52 �TE$TING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES . Edgerton, RB. & Karno, M. (1971). Mexican-American illness.. Archives of General Psychiatry, 24, 286-290. bilinguali~m and the perception of mental . Emerling, F. (1990). An investigation of test bias in two nonverbal cognitive measures for two . . ethnic groups. Joumal of Psychoeducational Assessment, 3, 233-244. Feingold, G.A. (1924). Intelligen.ce of the first generation of immigrant groups. Joumal of Educational Psychology, 15, 65-82. . . Fernandez, RM. & Nielsen, F. (1986). Bilingualism and Hispanic scholastic achievement: Some baseline results. Social Science Research, 15,43-70. Figueroa, RA. (1983). Test bias and Hispanic children. Joumal of Special Education, 17,431 440. Figueroa, RA. (1987). Special education assessment of Hispanic pupils in Califomia: Looking ahead to the 1990s, Sacramento: California State Department of Education, Office of Special Education. Figueroa, RA. (1990). Assessment of linguistic minority group children. In C.R Reynolds and R.W. Kamphaus (Eds.), Handbook ,of psychological and educational assessment of children: Vol 1. Intelligence and achievement. ·New York: Guilford. . Figueroa, RA & Artiles, A. (1999). Disproportionate minority placement in special education programs: Old problem, new explanations. In A. Tashakkori & S.H. Ochoa (Eds.), Readings on Equal Education: Volume 16, Education of Hispanics in the United States: Politics, Policies, and Outcomes (pgs. 93-118). New York: AMS Press. Figueroa, RA. & Garcia. E. (1994). Issues in testing students from culturally and linguistically' diverse backgrounds. Multicultural Educatioi1:Fall; 1994, pgs. 10-19. Figueroa, RA. & Sassenrath, J .M. (1989). A longitudinal study ofthe predictive validity of the System of Multicultural Pluralistic Assessment (SOMPA). Psychology in the Schools, 26.5-19. Gandara, P., Keogh, BK. & Yoshioka-Maxwell. B. (1980). Predicting academic performance of Anglo and Mexican American Kindergarten children. Psychology in the Schools, 17, 174-177. Garcia. J.H. (1994). Nonstandardized instruments for the assessment of Mexican-American children for gifted/talented programs. In S.B. Garcia (Ed), Addressing Cultural and Linguistic Diversity in Special Education: Issues and trends (pp 46-57). Reston, VA: The Council for Exceptional Children. Garcia, O. & Otheguy. R (1995). The bilingual education of Cuban American children in Dade County's ethnic schools. In O.Garcia & C. Baker (Eds). Policy and practice in bilingual education: A reader extending the foundations (pgs. 93~102). Clevedon. England: Multilingual Matters. Garcia. S.B. (1985).. Characteristics of limited English proficient Hispanic students served in programs for the learning disabled: Implications for policy. practice and research (Part 1). Biiingual Special Education Newsletter, pp1~5. Garth, T.R., (1920). Racial differences in mental fatigue. Joumal of Applied Psychology, 4,235 244. Garth, T.R (1928). Astudy of the intelligence arid achievement of full-blooded Indians. Jouma/of Applied Psychology, 12,511-516. . 53 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES Geisinger, K.F. (1992). Fairness and selected psychometric issues in the psychological testing of Hispanics. In K.F. Geisinger (Ed.), Psychological testing of Hispanics (pgs. 17-42). Washington, DC: Am~rican Psychological Association. Geisinger, K.F. (Ed.). (1992). Psychological testing of Hispanics. Washington, DC: American Psychological Association. . Geisinger, K.F. (1994a). Psychometric issues in testing students with disabilities. Applied Measurement in education, 7, 121-140. Geisinger, K.F. (1994b). Cross-cultural normative assessment: Translation and adaptation issues influencing the normative interpretation of assessment instruments. Psychological Assessment, 6, 304 312. G6mez-Palacio, M.M., Padilla, ER & Roll, S. (1983). Escala de Inteligencia para Nivel Escolar Weschler [Wechsler Intelligence Scale for Children-Revised]. Mexico City: EI Manual Moderna, SA de C.V. Gonzalez, JR. (1978). Language factors affecting treatment of bilingual schizophrenics. Psychiatric Annuals, 8,68-70. . Gonzalez~Reigoza, F. (1976). The anxiety-arousing effect of taboo words in bilinguais. In C.D. Spielberger & R Diaz-Guerrero (Eds.), Cross-cultural anxiety (pp. 89-105). Washington, DC: Hemisphere. Gould, S. J .. (1981). The mismeasure of man. New York: Norton. Grand, S., Marcos, L.. Freedman, N. & Barroso, F. (1977). Relation of psychopathology and bilingualism to kinesthetic aspects of interview behavior in schizophrenia. Journal of Abnormal Psychology. 86(5), 492-500. Grosjean, F. (1989). Neurolinguists, beware! The bilingual is not two monolinguals in one person. Brain and Language, 36,3-15. Guadalupe Organization v. Tempe Elementary School District. 587 F.2d 1022, U.S. Dist. Court of Arizona. (1978). . Hakuta, K., Ferdman, B.M. & Diaz, RM. (1986). Bilingualism and cognitive development: Three perspectives and methodological implications. Los Angeles: University of California, Los Angeles. Center for Language Education and Research. . Hakuta, K. & Garcia, E.E. (1989). Bilingualism and education. American Psychologist. 44, 374 379. Hartigan, JA & Wigdor, AK (Eds.1989). Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery. Washington, D.C.: National Academy Press. Heller, KA, Holtzman, W.H., & Messick, S. (1982). Placing children in special education: A strategy for equity. Washington, DC: National Academy Press. Heubert, J.P. & Hauser, RM. (1999). High-stakes testing for tracking, promotion, and graduation. Washington, D.C.: National Academy Press. Hine, C.Y. (1993). The home environment of gifted Puerto Rican children: Family factors which support high achievement. Presented at the annual National Research Symposium on Limited English Proficient Student Issues, Washington, D.C. . Hobson v Hansen. (1967). 269 F. Supp. 401 (D.D.C.). 54 �. . TESTING HISPANIC STUDENTS IN THE'UNITED STATES: TECHNICAL AND POLICY ISSUES . . Holtzman, W.H., Draz-Guerrero, R, & Swartz, J.D. (1975). Personality development in two cultures: A cross-cultural longitudinal study of schQol children in Mexico and the United States. Austin: University of Texas Press. Hung-Hsia, H. (1929). The mentality of the Chinese and Japanese. Journal of Applied PsychOlogy, 13,9-31. ' Jensen, A.R. & Inouye,A. R. (1980). Levell and Level II abilities in Asian, Wh'ite, and Black children. Intelligence, 4,41-49. Johnsen, SK, Ryser, G., & Dougherty, E., (1993). The validity of product portfolios in the identification of gifted students. Gifted International: A Talent Development Journal, B( 1). 40-43. Johnson, L.W. (1938). A comparison of vocabularies of Anglo-American and Spanish-American , . high school pupils. Journalof Educational Psychology, 29, 135-144.' Jose P. v. Ambach" No.C-270 (E.D.N.Y. 1979). Kaufman, A.S. & Wang, J-J. (1992). Gender, race and education differences on the K-BIT at ages 4 to 90 y~ars. Journal of Psychoe.ducational As~essment, 10, 219-229. Keogh, B. K. (1990). Narrowing the gap between policy and practice. Exceptional Children, 57, 186-190.' Koch, H.L.,& Simmons; R. (1926). A study of the test performance of American, Mexican and Negro children. Psychology, Monographs, .35, 1-116. ' Kopriva, R.J. (1999). A conceptual framework for the valid and comparable measurement of all students. Washington, D. C.: Council otChief State School, Qfficers. .. 'Langdon, H.W. (1994). The interprfJter-translator process in the educational setting: A resource· manual. Sacramento, CA: Resources in Special Education, California Department of Education. Laosa, L. M. (1998). School segregation of children who migrate to the United States from Puerto Rico. Princeton, NJ:Educational Testing Service. ' Larry P. v. Riles. 343 F. Supp. 1306 (N,D. Cal. 1972)affl' 502 F.2d 963 (9th Cir. 1974); 495 F. Supp. 926 (N.D. Cal. 1979); appeal docketed, No. 80-4027 (9th Cir., Jan. 17, 1980). Lester,O.o. (1929). Performance tests and foreign children. Journal of Educational Psychology, 20,303-309. . . . ' ,Lora v. Board of Educa'tion of the City of New York, 587 F. Supp. 1592 (E.D.N.Y. 1984). , ' Luh, C.W. & Wy, T.M. (1931). A comparative study of the intelligence of Chinese children on the . Pintner performance and the Binet 'tests. Journal of Social Psychology, 2, 402-408. . Lyon, G.R. (1996). Learning disabilities. The Future of Children: Special Education for Students with Disabilities, 6(1}, 54-76. Los Altos, CA: Center for the Future of Children, David and Lucille Packard Foundation.; . MacMillan, D.L., Gresham, F.M. & Bocian, K.M. (1998). Discrepancy betwe~n definitions of learning disabilities and, school practices: An empirical investigation. Journa!of Learning Disabilities, 31, , 314-326, . , 55 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES Maker, C.J. (1996). Identification of gifted minority students: A national problem, needed changes, and a promising solution. The Gifted Child Quarterly, 40, 41-50. Maker, C.J., Nielson & Rogers (1994). Giftednessjdiversity and problem-solving. Teaching Exceptional Children, 27,4-19. Malakoff, M. & Hakuta, K. (1991). Translation skill and metalinguistic awareness in bilinguals. In E. Bialystock (Ed.), Language processing in bilingual children. Cambridge, MA: Cambridge University ~~. . Malgady, R. Constantino, G. & Rogier. L. (1984). Development of a thematic apperception te'st (TEMAS) for urban Hispanic children. Journafof Consulting and'Clinical Psychology, 52(6), 986-996. Manuel, H.T. (1935). Spanish and English editions of the Stanford-Binet in relation to the abilities of Mexican children (University of Texas Bulletin No. 3532). Austin: University of Texas. Margolin. L. (1994). Goodness personified: The emergence of gifted children. New York: Aldine de Gruyter. . Marin, G.(1992). Issues in the measurement of acculturation among Hispanics. In K.F. Geisinger (Ed.), Psychological testing of Hispanics (pgs. 235-252). Washington, DC: American Psychological Association. Marquez, J.A., Bermudez, A.B. & Rakow, S.J. (1992). Incorporating community perceptions in the identification of gifted and talented Hispanic students. The Journal of Educational Issues of Language Minority Students, 10, 117-127.' . McLaughlin. M. W. & Shepard, L.A. (1995). Improving education through standards-based reform. A report of the National of Education Panel on standards-based education reform. Stanford. CA: National Academy of Education. Mehan, H., Hertweck. H. & Meihls, J.L (1986). Handicapping the handicapped. Palo Alto, CA: Stanford University Press. Mercer, J.R (1979). The system of multicultural pluralistic assessment: Technical manual. New York: Psychological Corporation. Mitchell, A.J. (1937). The effect of bilingualism on the measurement of intelligence. Elementary School Journal, 38, 29-37.' . Moreno. J.F. (Ed.)(1999). The elusive quest for equality. Cambridge, MA: Harvard Educational Review. National Commission on Secondary Education for Hispanics. (1984a). "Make Something Happen n : Hispanic and urban high school reform. V.I. Washington, DC: Hispanic Policy Development Project. National Commission on Secondary Education for Hispanics. (1984b) .. "Make Something Happen:" Hispanics and urban high school reform. V. II. Washington, DC:The Hispanic Policy Development Project Inc.. National Council ofLa Raza. (1998). Latino education, status and prospects. Washington. DC: National Council of La Raza. Neiser, U., Bodoo. G., Bouchard, T.J., Boykin, A.W., Brody, N .• Ceci, S.J .• Halpern. D.F., Loehlin. J.C., Perl off, R, Sternberg, RJ. & Urbina, S. (1996). Intelligence: Knowns and unknowns [Task Force Report of the American Psychological Association]. American Psychologist. 51.77-101. 56 �TESTING HISPANIC.STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES Nieves-Grafals, S. (1995). Psychological testing as a diagnostic and therapeutic tool in the . treatment of traumatized Latin American and African Refugees. Cultural Diversity and Mental Health, 1, 19-27. . . . Olmedo, EL. (1981). Testing linguistic minorities. American Psychologist, 36. 1078-1085. Olson, J.F. & Goldstein, AA. (1997). The inclusion of students with disabilities.and limited English proficiency in large-scale assessments: A summary of a recent progress. Washington, DC: U.S. Department of Education. Ortield, G. & Yun, J.T. (1999). Resegregation in American schools. The Civil Rights Project: Harvard University. Ortiz. AA. (1986). Characteristics of limited English proficient Hispanic students served in programs for the learning disabled: Implications for policy and practice (Part II). Bilingual Special Education Newsletter, pp. 1-5. Ortiz, AA & Maldonado-Colon, E. (1986). Recognizing'learning disabilities in bilingual children: How to lessen inappropriate referral of language minority students to special education. Journal of Reading, Writing, and Learning Disabilities International, 43(1).47-56. Ortiz, AA & Polyzoi, E. (1986). Characteristics of limited English-proficient Hispanic students . served in programs for the learning disabled: Implications for policy and practice. Austin: University of Texas. (ERIC Document Reproduction Service No. ED 267 597) Ortiz, AA & Yates. J.R. (1987). Characteristics of learning disabled, mentally retarded, and speech-language handicapped Hispanic students at initial evaluation and re-evaluation. Unpublished manuscript, University of Texas at Austin. Paschal. F.C. & Sullivan. L.R. (1925). Racial influences in the mental and physical development of Mexican children. Comparative Psychology Monographs, 3,1-76. Pellegrino, J.W., Jones, L.R., & Mitchell, KJ. (Eds.) (1999). Grading the nation's report card: Evaluating NAEP and transforming the assessment of educational progress. WaShington, DC: National Academy Press. Pennock-Roman. M. (1990). Test validity and language background: A study of Hispanic American students at six universities. New York: College Entrance Examination Board. Perrine, J. (1989). The efficacy of situational identification of gifted Hispanics in East Los Angeles through nurturance that capitalizes on their culture. In C.J. Maker & S.W; Schiever (Eds.), Critical issues in gifted education: Vol. 2. Defensible programs for cultural and ethnic minorities. Austin, TX: Pro-Ed Publishers. (pp.3-18). Pilkington, C., Piersel, W., &Ponterotto, J. (1988). Home language as a predictor of first grade achievement for Anglo- and Mexican-American children. Contemporary Educational Psychology, 13(1), 1 14. Pintner, R. & Arsenian, S. (1937). The relation of bilingualism to verbal intelligence and school adjustment. Journal of Educational Research, 31, 255-263. Pintner, R.& Keller, R. (1922). Intelligence tests of foreign children. Journal of Educational Psychology, 13.214-222. Pratt, H.G. (1929). Some conclusions from a comparison of school achievement of certain racial groups. Journal of Educational Psychology. 20.661-668. �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLlcvlsSUES ... . . President's Advisory Commission on Educational Excellence for Hispanic Americans. (1996). Our nation on the fault line: Hispanic American education. Washington, DC: President's Advisory Commission on Educationar Excellence for Hispanic Americans. Ramos, RA. (1992). Testing and assessment of Hispanics for occupational and management positions: A developmental needs analysis. In K.F. Geisinger (Ed;), Psychological testing of Hispanics (pgs. 173-194). Washington, DC:. American Psychological Association. Reynolds, A. (1933). The education of Spanish-speaking children in five southwestern states (1933,Bulletin No.11). Washington, DC: U.S. Department of the Interior. Rivera, C. (1986). The national assessment of educational p'rogress: Issues and concerns for the assessment of Hispanic students. Chicago: Spencer Foundation .. Rivera, C. & Pennock-Roman; M.· (1987). Issues in race/ethnicity identification procedures in the national assessment of educational progress, 'Part I: A cpmparison of observer reports and selfidentification. Princeton, NJ: Educational Testing Service; . . Rueda, R, Cardoza, D., Mercer, J.R, & Carpenter, L(1984). An examination of special education decision-making with Hispanic first-time referrals in large urban school districts. Las Alamitos, CA: Southwest Regional Library. Rueda, R, Figueroa; R, Mercado, P. & Cardoza. D.' (1984). Performance of Hispanic educable mentally retarded learning disabled and nonclassified students on the WISC-RM, SOMPA and S-KABC. Los Alamitos, CA: Southwest Regional Laboratory. Rueda, R S. & Forness. S.R (1994). Childhood depression: Ethnic and cultural issues in special education. In RL Peterson & S.I. Jordan (Eds.). Multicultural issues in the .education of students with behavioral disorders (pgs. 40-62). Cambridge, MA: Brookline.. Ruiz. E.J. (1975). Influence of bilingualism on communication groups. International Journal of -Group Psychotherapy, 25,391-395. . . . Ruii v. State Board of Education. C.A. No. 218394 (Super; Ct. Cal. Sacramento County, 1971). Saer, D.J.- (1923). The effect of bilingualism on intelligence. British Journal of Psychology, 14,25 , '" < Sanchez, G.I. (1934). Bilingualism and Mental Measures: A word of caution. Journal of Applied Psychology, 18, 765-772. , . . Sanchez-Boyce, M. (1999).The social construction of meaning .ininterpreted events: Use of foreign language interpreters in education. Doctoral dissertation, University of California at Davis. '. Sandov"!l, J. (1979). The WISC-R and,internal evidence oftest bias withminority groups. Journal of Consulting and Clinical Psychology, 47, 9~9-927. . Sandoval, J. (1998). Critical thinking in test interpretation. In , J.Sandoval, C.L Frisby, K.F GeiSinger, J.D. Scheuneman & J.R Grenier, (Eds.). Test interpretation and diversity: Achieving equity in assessment (pgs. 31-50). Washington, D.C.: American Psychologic~1 Association. Sandoval, J., Frisby, C.L., Geisinger,KF., Scheuneman, J.D .• & Grenier, J.R. (199B). Test interpretation and diversity: Achieving equity in assessment. Washington. DC: American Psychological Association. 58 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES Sapon-Shevin, M. (1994). Playing favorites: Gifted education and the disruption of community. New York: State University Press: Sawyer, C.B. 8. Marquez, JA (1993). Discrimination against LEP students in gifted and talented classes. The Journal of Educational Issues of Language Minority Students, 12, 143-149. Sired, S.G. & Geisinger, K.F. (1998). Equity issues in employment testing. In J. Sandoval, C.L. Frisby, K.F. Geisinger, J.D. Scheuneman, & J.R Grenier (Eds.). Test interpretation and diversity: Achieving equity in assessment (pgs. 105-140). Washington, DC: American Psychological Association. Skrtic, R (1991). The special education paradox: Equity as the way to excellence. Harvard Educational Review, 61, 148-186. Skrtic, T. M. (Ed.) (1995). Disability and democracy. New York: Teachers College Press. Smith, M.E. (1942): The effect of bilingual background on college and aptitude scores and grade point ratios earned by students at the University of Hawaii. Journal of Educational Psychology,' 33, 356 364. Smith, M.E. (1949). Measurement of vocabularies of young bilingual children in both of the . languages used. Journal of Genetic Psychology, 74, 305-310. Smith. M.E. (1957). Word variety as a measure of bilingualism in preschool children. Journal of Genetic Psychology, 90, 143-150. . Soto, L.D. (1988). The home environment of higher and lower achieving Puerto Rican children.' Hispanic Journal of Behavioral Sciences, 10, 161-167, . Stanton-Salazar, R.D. (1997). A social capital framework for understanding the socialization of racial minority children and youth. Harvard Educational Review, 67, 1-34 . . Stanton-Salazar, RD. & Dornbusch, S.M. (1995). Social capital and the reproduction of inequality; Information networks among MeXican-origin high school students. Sociology of Education, 68, 116-135. Stone, B.J. (1992). Prediction of achievement by Asian-American and White children. Journal of School Psychology, 30, 91-99. Swedo. J. (1987). Effective teaching strategies for handicapped limited English-proficient . students. Bilingual Special Education Newsletter. pp.1-5. Taylor. D. (1991). Learning Denied. Portsmouth. NH: Heinemann. Taylor. D. (1998). Beginning to read and the spin doctors of science: The political campaign to change America's mind about how children learn to read. Urbana, IL: National Council of Teachers of English.' . Taylor, RL. & Richards, S.B. (1991).' Patterns of'intellectual differences of Black, Hispanic, and White children. Psychology in the Schools, 28, 5-8. Thurlow, M., Liu, K.,Erickson,R., Spicuzza, R & EI Sawaf. H. (1996). Accommodations for students with limited English proficiency: Analysis of Guidelines from states with graduation exams. Minneapolis, ~N: National Center on Educational Outcomes. Trueba, H.T. (1987). Success or failure? Learning and the language minority student. New York: Newbury. 59 �TESTING HISPANIC STUDENTS iN THE UNITED STATES: TECHNICAL AND POLICY ISSUES U.S. Commission on Civil Rights. (1971a). Ethnic isolation of Mexican Americans in the public schools of the Southwest (Report I: Mexican American Educational Study). Washington, DC: U.S. Commission on Civil Rights. . U.S. Commission on Civil Rights. (1971 b). The unfinished education: Outcomes for Minorities in the Five Southwestem States (Report II: Mexican American Educational Study). Washington. DC U.S. Commission on Civil Rights. ' U.S. Commission on Civil Rights. (1972). The excluded student: Educational practices affecting Mexican Americans in the Southwest (Report III: Mexican American Educational Study). Washington, DC: U.S. Commission on Civil Rights., U.S. Commission on Civil Rights. (1973). Teachers and students: Differences in teacher interaction with Mexican-American and Anglo'students (Report V: Mexican American Educational Study). Washington, DC: U.S. Commission on Civil Rights. U.S. Commission on Civil Rights (1974). Toward quality educationfor Mexican-Americans (Report VI: Mexican-Am~rican Educational Study). Washington, DC: U.S. ·Commission on Civil Rights. U.S. Department of Education (1993). Fifteenth annual report to Congress on the implementation of The Education of the Handicapped Act. Washington, DC: U.S.-Commission on Civil Rights. U.S. Department of Education Office of Civil Rights. (draft, December 1999). Nondiscrimination in high-stakes testing: A resource guide. Washington, DC: U.S. Department of Education. Valdes, G. & Figueroa, RA. (1994). Bilingualism and testing: A special case of bias. NorwOOd, NJ: Ablex Publishing. Valdez, RS., & Valdez, C. (1983). Detecting predictive bias: The WISC-R vs. achievement scores of Mexican-American and non-minority students. Learning Disability Quarterly, 6(4),440-447. Valencia, RR (1982). Predicting academic achievement of Mexican-American children: Preliminary analysis of the McCarthy Scales. Educational and Psychological Measurement, 42, 1269 1278. Valencia, RR (Ed.). (1991). Chicano school failure and success: Research and policy agendas for the 1990s (The Stanford Series on Education and Public Policy). Basingstoke, England: Falmer Press. Valencia, RR & Rankin, RJ. (1983). Concurrent validity and reliability of the Kaufman version of the McCarthy Scales Short Form for a sample of Mexican-American children.· Educational and Psychological Measurement, 43, 915-925. Velasquez, RJ. & Callahan, W.J. (1992). Psychological testing of Hispanic Americans in clinical settings: Overview and is~ues. In K.F. Geisinger (Ed.), Psychological testing of Hispanics (pp. 253-265). Washington, DC: American Psychological Association. Westermeyer, J .. (1987). Cultural factors in clinical assessment. Journal of Consulting and Clinical Psychology, 55,471-478. Wheeler, L. R (1932). The mental growth of dull Italian children. The Journal of Applied Psychology. 16,650-667. Whitworth, RH. & McBlaine, D.O. (1993). Coinparison of the MMPI and MMPI-2 administered to Anglo- and Hispanic-American university students. Journal of Personality Assessment, 61, 19-27. Whitworth, RH. & Unterbrink, C. (1994). Comparison of MMPI-2 clinical and content scales administered to Hispanic a~d Anglo-Americans. Hispanic Journal of Behavioral Sciences, 16,255-264. 60 �TeSTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL At;lD POLICY ISSUES Wilkinson, C. Y. & Ortiz, AA (1986). Characteristics of limited English-proficient and English proficient leaming disabled Hispanic students at initial assessment and at reevaluation. Austin: University of Texas. Handicapped Minority Research Institute on Language Proficiency. Willig, AC. & Swedo, J.J. (1987). Improving teaching strategies for exceptional Hispanic limited English proficient students: An exploratory study of task engagement and teaching strategies. Paper presented at the annual meeting of the American Educational Research Association, Washington, DC. WOOd, M.M. (1929). Mental test findings with Armenian, Turkish, Greek and Bulgarian subjects. Joumal ofAppliedPsycho/ogy, 13, 266-273. . . Woodrow, H. (1921). Psychology, 12.207-210. Intelligen<~e and its measurement: A symposium (XI). Joumal of Educational ' Yoder. D. (1928). Present status of the question of racial differences. Joumal of Educational Psychology, 19,463-470. Zappia.IA (1989). Identification of gifted Hispanic students: A multidimensional view. In C.J . . Maker & S.W. Schiever (Eds.), Critical issues in gifted education. Vol 2: Defensible programs for cultural and ethnic minorities (pp. 19-26). Austin. TX: Pro-Ed. 61 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY .ISSUES APPENDIX A The 1999 Standards for "Testing Individuals of Diverse Linguistic Backgrounds" . (Chapter 9) Standard 9.1 Testing practice should be designed to reduce threats to the reliability and validity of test score inferences that may arise from language differences. . , Comment: Some tests are inappropriate for use with individuals whose knowledge of the language of the test is questionable. Assessment methods together with careful professional judgment are required to determine when language differences are relevant. Test users can judge how best to address this standard in a particular testing situation. . . . Standa'rd 9.2 When credible research evidence reports that test scores differ in .meaning across subgroups of linguistically diverse test takers, then to the extent feasible, test developers should collect for each linguistic subgroup studied the same form of validity evidence collected for the examinee population as a whole. Comment: Linguistic subgroups may be found to differ with respect to appropriateness of test content, the internal structure of their test responses, the relation of their test scores to other variables, or the response processes employed by individual examinees. Any such' findings need to receive due consideration in the interpretation and use of scores as well as in test revisions. There may also. be legal or regulatory requirements to collect subgroup validity evidence. Not all forms of evidence can be examined separately for members of all linguistic groups. The validity argument may rely on existing research literature, for example, and such literature may not be available for some populations. For some kinds of evidence, separate linguistic subgroup analyses may not be feasible due to the limited number of cases.available. Data may sometimes be accumi.Jlated so that these analyses can be performed after the test has been in l.!se for' a period of ~ime. It is important to note that this standard calls for more than representativeness in the selection of samples used for , validation or norming studies. Rather, it calls for separate,· parallel analyses of data for members of different linguistic groups, sample sizes permitting. If a test is being used while such data are being collected, then cautionary statements are in order regarding the limitations of interpretations based on test scores. Standard 9.3 'When testing an examinee proficient in two 'or more languages for which the test is available, the examinee's relative language proficiencies should be determined. The test generally should be administered in the test taker's most proficient language,' unless proficiency in the less proficient language is part of the assessment. ' ,. Comment: Unless the purpose of the test,ing is to determine proficiency in a particular language.or the level of language proficiency required for the test is a work requirement, test users need to take into account the linguistic 62 �TESTING HiSpANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES characteristics of examinees who are bilingual or use multiple languages. This may require the sole use of one languag~ or use of multiple languages in order to minimize the introduction of construct-irrelevant components to the measurement process. For example, in educational settings, testing in both the language used in school and the native language of the examinee may be necessary in order to determine the optimal kind of instruction required by the examinee. Professional judgement needs to be used to determine the most appropriate procedures for establishing relative language proficiencies. Such procedures may range from self-identification by examinees through formal proficiency testing. Standard 9.4 Linguistic modifications recommended by test publishers, as well as the rationale for the modifications, should be described in detail in the test manual. , Comment: Linguistic modifications may be recommended for the original test in the primary language or for an adapted version in a secondary language, or both. In any case, the test manual should provide appropriate information regarding the recommended modifications, their rationales, and the appropriate use of scores' ' obtained using these linguistic modifications. Standard 9.5 When there is credible evidence of score comparability across regular and modified tests or administrations, no flag should be attached to a score. When such evidence is lacking, specific information about the nature of the modifications should be provided, if permitted by law, to assist test users properly to interpret, and act on test scores. ' Comment: The inclusion of a flag on a test score ,where a linguistic modification was provided may conflict with legal and social policy goals promoting fairness in the treatment of individuals of diverse linguistic backgrounds. if a score from a modified administration is comparable to a score from a nonmodified administration, there is no need for a flag. Similarly, if a modification is provided for which there is no reasonable basis for believing that the modification would affect score comparability, there is no need for a flag. Further, reporting practices that use asterisks or other non-specific symbols to indicate that a test's administration has been modified provide little useful information to test users. Standard 9.6 When a test is recommended for use with linguistically diverse test takers, test developers and publishers should provide the information necessary for appropriate test use and interpretation . .comment: Test developers should include in test manuals and in instructions for score interpretation explicit statements about the applicability of the test with individuals who are not native speakers of the original language of the test. However, it should be recognized that test developers and publishers seldom will find it feasible to conduct studies specific to the large number of linguistic groups found in certain countries. Standard 9.7 When a test is translated from one language to another, the methods used in establishing the adequacy of the translation should be described, and empirical and logical evidence should be provided for score reliability and the �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLlCv.lsSUES , validity' of the translated test's score inferences for the uses intended in the linguistic groups to be tested. ' , Comment: For example, if a'test is translated into Spanish for use with Mexican, 'Puerto Rican, Cuban, Centra,lArrierican and Spanish populations, score reliability , and the validity 9ftest score inferences should be established with members of , " each of these groups separately where feasible. In addition, the test translation ' . methods used need to be described in detail. Standard 9.8 In employment and credentialing testing, the proficiency level required in the language of the test should npt exceed that appropriate to the relevant occupation or profession. . , Comment: Many occupations and professions require a suitable facility in the language of the· test. In such cases,a test that isused as a part of selection, advancement, or credentialing maY,appropriately reflect that aspect of performance., However, the level of language proficiency required on the test should be no greater than the level needed to meet work requirements. Similarly, the modality in which language proficiency is assessed should be comparable t9 that on the job. For example, if thejob requires only that employees understand verbal' instructions in the language used on the job, it would be inappropriate for a selection test to require proficiency in reading and writing that particular language. Standard 9.9 When multiple versions of a test are intended to be comparable, test developers should report evidence of test comparability~ Comment: Evidence of test comparability may include but is not limited to evidence that the different language versions measure equivalent or similar constructs, and that score reliability and validity of inferences from scores from the two versions are comparable .. Standard 9.10 Inferences about test takers' general language proficiency should be based on tests that measure a range of language features, and not a single linguistic skill. Comment: For example, a multiple-choice, pencil-and,"-'paper test of vocabulary does not indicate how well a person understands the language when spoken nor how well the person speaks the languag~. However, the test score might be helpful in determining how well a person. understands some aspects of the written language. In making educational placement decisions, a more complete range of communicative abilities (e.g., V\iord knowledge, syntax) will typically need to be assessed. ' 64 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES· . , Standard 9.11 When an interpreter is used in testing, the interpreter should be fluent in both the language of the test and the examinee's native language, should have expertise in translating, and should have a basic understanding of the assessment process~ Comment: Although individuals with limited proficiency in the language of the test should ideally. be tested by professipnally trained bilingual examiners, the use of an interpreter may be necessary in some situations. If an interpreter is required, the professional examiner is responsible for insuring that the interpreter has the appropriate qualifications, experience, and preparation to assist appropriately in the administration of the test. It is necessary for the interpreter to understand the importance of following standardized procedures, how testing is conducted typically, the importance of accurately conveying to the examiner an examinee's aC,tual responses, and the role and responsibilities of the interpreter in testing. 65 �TESTING HISPANIC STUDENTS IN THE UNITED STATES: TECHNICAL AND POLICY ISSUES "By the Authority vested in 'ine as President by the Constitution and the lawS of the United States of America, and in order to advance the development of human potential, to strengthen the Nation's capacity to provide high-quality education, and to increase the opportunities forHispanic Americans to participate in and benefit from Federal education programs, it is hereby ordered... " Founding language of Executive Order 12900 President Clinton, February 22, 1994 Recognizing the importance of increasing the level of educational attainment for Hispanic Americans, President Clinton established the White House Initiative on Educational Excellence for Hispanic Americans through Executive Order 12900 in SeptElmber 1994. Guiding the White House Initiative is the President's Advisory Commission on Educational Excellence for Hispanic Americans, whose responsibility is to advise the president, the secretary of education, and the nation on the most preSSing educational needs of Hispanic Americans. The White House Initiative also provides the connection between the Commission, the White House, the federal government and the Hispanic community throughout the nation. . Current White House Initiative activities include initiating policy seminars, offering anational conference series, uExce/encia en Educacion: The Role of Parents in the Education of Their Children," focused on improving the education of latino youth by better engaging latino parents, increasing understanding and awareness of Hispanic Serving Institutions (HSls), and coordinating a new round of high·level efforts across the national government to improve the education of Hispanics. These activities are driven by the president's request to assess: • • • • • Hispanic educational attainment from pre·K through graduate and professional school; Current federal efforts to promote the highest Hispanic educational attainment; State, private sector, and community involvement in education; Expanded federal education activities to complement existing efforts; and Hispanic federal employment and effective federal recruitment strategies. Accelerating the educational success of Hispanic Americans is among the most important keys to America's continued success. Please join us in ensuring educational excellence for all Americans. ' White House Initiative Staff Sarita E. Brown Executive Director Richard Toscano Special Assistant for Interagency Affairs Deborah A. Santiago Deputy Director Debbie Montoya Assistant to the Executive Director Julieta laurel Policy Analyst Danielle Gonzales Policylntem 66 �; , �Clinton Presidential Records Digital Records Marker This is not a presidential record. This is used as an administrative marker by t~e William J. Clinton PresidentIal Library Staff. This marker identifies the place of a publication. Publications have not been scanned in their entirety for the purpose of digitization. To see the full publication please search online or visit the Clinton Presidential Library's Research Room. �</text>

</elementText>

</elementTextContainer>

</element>

</elementContainer>

</elementSet>

</elementSetContainer>

</file>

</fileContainer>

<name>Dublin Core</name>

<description>The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/.</description>

<name>Title</name>

<description>A name given to the resource</description>

<text>Kendra Brooks - Subject Series</text>

</elementText>

</elementTextContainer>

</element>

<name>Creator</name>

<description>An entity primarily responsible for making the resource</description>

<text>Domestic Policy Council</text>

</elementText>

<text>Kendra Brooks</text>

</elementText>

</elementTextContainer>

</element>

<description>A related resource in which the described resource is physically or logically included.</description>

<text><a href="http://clinton.presidentiallibraries.us/items/show/36031" target="_blank">Collection Finding Aid</a></text>

</elementText>

<text><a href="https://catalog.archives.gov/id/647992" target="_blank">National Archives Catalog Description</a></text>

</elementText>

</elementTextContainer>

</element>

<name>Description</name>

<description>An account of the resource</description>

<text>The Kendra Brooks Subject Files contain correspondence, reports, articles, memos, and various printed material. Other documents include background information for education events and meetings. The files include material pertaining to charter schools, national testing, SAT preparation, school safety, school modernization/construction, affirmative action, Blue Ribbon Schools, class–size reduction, teacher quality, Limited English Proficiency (LEP), the White House Initiative on Education Excellence for Hispanic Americans, Tribal Colleges and Universities, Historically Black Colleges and Universities (HBCU’s), the Individuals with Disabilities Education Act (IDEA), and Title 1 of the Elementary and Secondary Education Act of 1965.</text>

</elementText>

</elementTextContainer>

</element>

<name>Provenance</name>

<description>A statement of any changes in ownership and custody of the resource since its creation that are significant for its authenticity, integrity, and interpretation. The statement may include a description of any changes successive custodians made to the resource.</description>

<text>Clinton Presidential Records: White House Staff and Office Files</text>

</elementText>

</elementTextContainer>

</element>

<name>Publisher</name>

<description>An entity responsible for making the resource available</description>

<text>William J. Clinton Presidential Library & Museum</text>

</elementText>

</elementTextContainer>

</element>

<name>Extent</name>

<description>The size or duration of the resource.</description>

<text>157 folders in 16 boxes</text>

</elementText>

</elementTextContainer>

</element>

</elementContainer>

</elementSet>

</elementSetContainer>

</collection>

<description>A resource consisting primarily of words for reading. Examples include books, letters, dissertations, poems, newspapers, articles, archives of mailing lists. Note that facsimiles or images of texts are still of the genre Text.</description>

<name>Original Format</name>

<description>The type of object, such as painting, sculpture, paper, photo, and additional data</description>

<text>Paper</text>

</elementText>

</elementTextContainer>

</element>

</elementContainer>

</itemType>

<name>Dublin Core</name>

<name>Title</name>

<description>A name given to the resource</description>

<text>[Education - Hispanic] [2]</text>

</elementText>

</elementTextContainer>

</element>

<name>Creator</name>

<description>An entity primarily responsible for making the resource</description>

<text>Domestic Policy Council</text>

</elementText>

<text>Kendra Brooks</text>

</elementText>

<text>Subject Files</text>

</elementText>

</elementTextContainer>

</element>

<description>A related resource in which the described resource is physically or logically included.</description>

</elementText>

<text><a href="http://clintonlibrary.gov/assets/Documents/Finding-Aids/Systematic/KendraBrookssubjectfile.pdf" target="_blank">Collection Finding Aid</a></text>

</elementText>

<text><a href="https://catalog.archives.gov/id/647992" target="_blank">National Archives Catalog Description</a></text>

</elementText>

</elementTextContainer>

</element>

<name>Provenance</name>

<text>Clinton Presidential Records: White House Staff and Office Files</text>

</elementText>

</elementTextContainer>

</element>

<name>Format</name>

<description>The file format, physical medium, or dimensions of the resource</description>

<text>Adobe Acrobat Document</text>

</elementText>

</elementTextContainer>

</element>

<name>Publisher</name>

<description>An entity responsible for making the resource available</description>

<text>Clinton Presidential Library & Museum</text>

</elementText>

</elementTextContainer>

</element>

<name>Medium</name>

<description>The material or physical carrier of the resource.</description>

<text>Reproduction-Reference</text>

</elementText>

</elementTextContainer>

</element>

<name>Date Created</name>

<description>Date of creation of the resource.</description>

</elementText>

</elementTextContainer>

</element>

<name>Source</name>

<description>A related resource from which the described resource is derived</description>

<text>647992-education-hispanic-2b.pdf</text>

</elementText>

</elementText>

</elementTextContainer>

</element>

</elementContainer>

</elementSet>

</elementSetContainer>

</item>