Friday, June 7, 2019

The Past, Present, and Future of Automated Scoring Essay Example for Free

The Past, Present, and Future of Automated Scoring

"No sensible decision can be made any longer without taking into account not only the world as it is, but the world as it will be." - Isaac Asimov (5)

Introduction

Although some realities of the classroom remain constant (they wouldn't exist without the presence, whether actual or virtual, of students and teachers), the technology age is changing not only the way that we teach, but also how students learn. While the implications of this affect all disciplines, it is acutely evident in the teaching of writing. In the last twenty years, we have seen a rapid transformation in how we read, compose, and care for text. Compositionist Carl Whithaus maintains that writing is becoming an increasingly "multimodal and multimedia activity" (xxvi). It is no surprise, then, that there are currently 100 million blogs in existence worldwide and 171 billion email messages sent daily (Olson 23), and the trend toward digitally-based composition is also moving into the classroom. The typical student today writes almost exclusively on a computer, typically one equipped with automated tools to help them spell, check grammar, and even choose the right words (Cavanaugh 10). Furthermore, CCC notes that "increasingly, classes and programs in writing require that students compose digitally" (785). Given the effect of technology on writing and the current culture of high-stakes testing ushered in by the mandates of the No Child Left Behind Act of 2001, a seemingly natural product of the combination of the two is computer-based assessment of writing.
An idea still in its infancy, the pace of technological change in combination with federal testing mandates has resulted in several states incorporating computer-based testing into their writing assessments, not only because of students' widespread familiarity with computers, but also because of the demands of college and the workplace, where word-processing skills are a must (Cavanaugh 10). Although it makes sense to have students accustomed to composing on computer write in the same mode for high-stakes tests, does it make sense to assess their writing by computer as well? This is a controversial question that has both supporters and detractors. Supporters like Stan Jones, Indiana's Commissioner of Higher Education, believe that computerized essay grading is inevitable (Hurwitz n.p.), while detractors, primarily educators, assert that such assessment defies what we know about writing and its assessment, because regardless of the medium all writing is social; accordingly, response to and evaluation of writing are human activities (CCC 786). Even so, the reality is that the law requires testing nationwide, and in all probability that mandate is not going to change anytime soon. With NCLB up for revision this year, even politicians like Sen. Edward Kennedy of Massachusetts agree that standards are a good idea and that testing is one way to ensure that they are met. At some point, we need to pull away from all-or-none polarization and create a new paradigm. The sooner we realize that computer technology will link up with assessment technology in some way (Penrod 157), the sooner we will be able to address how we, as teachers of writing, can use technology effectively for assessment. In the past, Brian Huot notes, teachers' responses have been reactionary, "cobbled together at the last minute in response to an outside call" (150).
Teachers need to be proactive in addressing technological convergence in the composition classroom, because if we don't, others can and will impose certain technologies on our teaching (Penrod 156). Instead of passively leaving the development of assessment software solely to programmers, teachers need to be actively involved in the process in order to ensure the application of sound pedagogy in its creation and application. This essay will argue that automated essay scoring (AES) is an inevitability that provides many more positive possibilities than negative ones. While the research presented here spans K-16 education, this essay will primarily address its application in secondary environments, focusing on high school juniors, a group currently consisting of approximately 4 million students in the United States, because this group represents the targeted population for secondary school high-stakes testing in this country (U.S. Census Bureau). It will first present a brief history of AES, then explore the current state of AES, and finally consider the implications of AES for writing instruction and assessment in the future.

A Brief History of Computers and Assessment

The first time standardized objective testing in writing occurred was in 1916 at the University of Missouri as part of a Carnegie Foundation sponsored study (Savage 284). As the 20th century continued, these tests began to grow in popularity because of their efficiency and perceived reliability, and they are the cornerstone of what Kathleen Blake Yancey describes as the first wave of writing assessment (484). To articulate the progression of composition assessment, Yancey identifies three distinct, yet overlapping, waves (483).
The first wave, occurring approximately from 1950-1970, primarily focused on using objective (multiple-choice) tests to assess writing simply because, as she quotes Michael Williams, they were "the best response that could be tied to testing theory, to institutional need, to cost, and ultimately to efficiency" (Yancey 489). During Yancey's first wave of composition assessment, another wave was forming in the parallel universe of computer software design, where developers began to address the possibilities of not only programming computers to mimic the process of human reading, but to emulate the value judgments that human readers make when they read student writing in the context of large-scale assessment (Herrington and Moran 482). Herrington and Moran identify The Analysis of Essays by Computer, a 1968 book by Ellis Page and Dieter Paulus, as one of the first composition studies books to address AES. Their goal was to evaluate student writing as faithfully as human readers, and they attempted to identify computer-measurable text features that would correlate with the kinds of intrinsic features that are the basis for human judgments, settling on thirty quantifiable features, which included essay length in words, average word length, total and kind of punctuation, number of common words, and number of spelling errors (Herrington and Moran 482). In their study, they found a high enough statistical correlation, .71, to support the use of the computer to score student writing. The authors note that the response of the composition community in 1968 to Page and Paulus's book was one of indignation and uproar. In 2007, not much has changed in terms of the composition community's position regarding computer-based assessment of student writing. To many, it is an unknown, mystifying Orwellian entity waiting in the shadows for the perfect moment to jump out and usurp teachers' autonomy in the classroom.
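To make the kind of "computer-measurable text features" Page and Paulus worked with concrete, the sketch below computes a few of them. The implementation and the common-word list are illustrative assumptions of mine, not a reconstruction of their actual 1968 program.

```python
import re
import string

# A small stand-in for a common-word list; Page and Paulus's actual
# feature set (thirty features) is only summarized in the text above.
COMMON_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that"}

def extract_features(essay: str) -> dict:
    """Compute a few surface features of the kind described above."""
    words = re.findall(r"[A-Za-z']+", essay)
    punctuation = [ch for ch in essay if ch in string.punctuation]
    return {
        "length_in_words": len(words),
        "avg_word_length": sum(len(w) for w in words) / max(len(words), 1),
        "punctuation_count": len(punctuation),
        "punctuation_kinds": len(set(punctuation)),
        "common_word_count": sum(w.lower() in COMMON_WORDS for w in words),
    }

features = extract_features("The essay is short, and it shows.")
```

The striking thing, then as now, is that nothing in such a program "reads" the essay in any human sense; the features are purely quantitative proxies for the intrinsic qualities human raters respond to.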
Nancy Patterson describes computerized writing assessment as a "horror story" that may come sooner than we realize (56). Furthermore, P.L. Thomas offers the following question and response: "How can a computer determine accuracy, originality, valuable elaboration, empty language, language maturity, and a long list of similar qualities that are central to assessing writing? Computers can't. We must ensure that the human element remains the dominant factor in the assessing of student writing" (29). Herrington and Moran make the issue a central one in the teaching of writing and have serious concerns about the potential effects of machine reading of student writing "on our teaching, on our students' learning, and therefore on the profession of English" (495). Finally, CCC definitively writes, "We oppose the use of machine-scored writing in the assessment of writing" (789). While the argument against AES is clear here, the responses appear to be based on a lack of understanding of the technology and an unwillingness to change. Instead of taking a reactionary position, it might be more constructive for teachers to assume the inevitability of computerized assessment technology (it is not going away) and to use that assumption as the basis for taking a proactive role in its implementation.

The Current Culture of High-Stakes Testing

At any given time in the United States, there are approximately 16 million 15-18 year-olds, the majority of whom receive a high school education (U.S. Census). Even when factoring in a maximum of 10 percent (1.6 million) who may drop out or otherwise not receive a diploma, there is a strong number of students, 14-15 million, who are attending high school.
The majority of these students are members of the public school system and as such must be tested annually according to NCLB, though the most significant focus group for high-stakes testing is eleventh-grade students. Currently in Michigan, 95% of any given public high school's junior population must sit for the MME, Michigan Merit Exam, in order for the school to qualify for AYP, Adequate Yearly Progress. Interestingly, those students do not all have to pass currently, though by 2014 the government mandates a 100% passing rate, a number that most admit is an impossibility and will probably be addressed as the NCLB Act comes up for review this year. In the past, as part of the previous 11th-grade examination, the MEAP, Michigan Educational Assessment Program, required students to complete an essay response, which was assessed by a variety of people, mostly college students and retired teachers, for a minimal amount of money, usually in the $7.50-$10.00 per hour range. As a side note, neighboring Ohio sends its writing test to North Carolina to be scored by workers receiving $9.50 per hour (Patterson 57), a wage that fast-food employees make in some states. Because of this, it was consistently difficult for the state to assess these writings in a short period of time, causing huge delays in returning the results of the exams to the school districts. This posed a huge problem, as schools could not use the testing information to address educational shortfalls of their students or programs in a timely manner, one of the purposes behind getting prompt feedback. This year (2007), as a result of increased graduation requirements and testing mandates driven by NCLB, the Michigan Department of Education began administering a new examination to 11th graders, the MME, an ACT-fueled assessment, as ACT was awarded the testing contract. The MME is comprised of several sections and required most high schools to administer it over a period of 2-3 days.
Day one consists of the ACT + Writing, a 3.5-hour test that includes an argumentative essay. Days two/three (depending on district implementation) consist of the ACT WorkKeys, a basic work-skills test of math and English; further mathematics testing (to address curricular content not covered by the ACT + Writing); and a social studies test, which incorporates another essay that the state combines with the argumentative essay in the ACT + Writing in order to determine an overall writing score. Miraculously, under the auspices of ACT, students received their ACT + Writing scores in the mail approximately three weeks after testing, unlike the MEAP, where some schools did not receive test scores for six months. In 2005, a MEAP official admitted that the cost of scoring the writing assessment was forcing the state to go another route (Patterson 57), and now it has. So how is this related to automated essay scoring? My hypothesis is that as states are required to test writing as part of NCLB, there is going to be a lack of qualified people able to read and assess student essays and determine results within a reasonable amount of time to purposefully inform needed curricular and instructional change, which is supposed to be the point of testing in the first place. Four million plus essays to evaluate each year (sometimes more if more writing is required, like Michigan requiring two essays) on a national level is a huge amount. Michigan Virtual University's Jamey Fitzpatrick says, "Let's face it. It's a very difficult task to sit down and read essays" (Stover n.p.). Furthermore, it only makes sense that instead of states working on their own test management, they will contract state-wide testing to big testing agencies, like Michigan and Illinois have with ACT, to reduce costs and improve efficiency.
Because of the move to contract ACT, my guess is that we are moving in the direction of having all of these writings scored by computer. In email correspondence that I had in early 2007 with Harry Barfoot of Vantage Learning, a company that creates and markets AES software, he said, "Ed Roeber has been to visit us and he is the high stakes assessment guru in Michigan, and who was part of the MEAP 11th grade becoming an ACT test, which Vantage will end up being part of under the covers of ACT." This indicates the inevitability of AES as part of high-stakes testing. In spite of the fact that no states rely on computer assessment of writing yet, state education officials are looking at the potential of this technology to limit the need for costly human scorers and reduce the time needed to grade tests and get them back into the hands of classroom teachers (Stover n.p.). Because we live in an age where the budget axe frequently cuts funding to public education, it is in the interest of states to save money any way they can, and states stand to save millions of dollars by adopting computerized writing assessment (Patterson 56). Although AES is not a reality yet, every indication is that we are moving toward it as a solution to the cost and efficiency issues of standardized testing. Herrington and Moran observe that pressures for common assessments across state public K-12 systems and higher education, both for placement and for proficiency testing, "make attractive a machine that promises to assess the writing of large numbers of students in a fast and reliable way" (481). To date, one of the two readers (the other is still human) for the GMAT is e-Rater, an AES software program, and some universities are using Vantage's WritePlacerPlus software in order to place first-year university students (Herrington and Moran 480). However, one of the largest obstacles to bringing AES to K-12 is one of access.
In order for students' writing to be assessed electronically, it must be inputted electronically, meaning that every student will have to compose their essays via computer. Sean Cavanagh's article of two months ago maintains that ACT has already suggested delivering computers to districts that do not have adequate technology in order to accommodate technology differences (10). As of last month, March 2007, Indiana is the only state that relies on computer scoring of 11th-grade essays for the state-mandated English examination (Stover n.p.), covering 80 percent of its 60,000 11th graders (Associated Press), though the state's Assistant Superintendent for Assessment, Research, and Information, West Bruce, says that the state's computer software assigns a confidence rating to each essay, where low-confidence essays are referred to a human scorer (Stover n.p.). In addition, in 2005 West Virginia began using an AES program to grade 44,000 middle and high school writing samples from the state's writing assessment (Stover n.p.). At present, only ten percent of states incorporate computers into their writing assessments, and two more are piloting such exams (Cavanagh 10). As technology becomes more accessible for all public education students, the possibilities for not only computer-based assessment but also AES become very real.

Automated Essay Scoring

Weighing the technological possibilities against logistical considerations, however, when might we expect to see full-scale implementation of AES? Semire Dikli, a Ph.D. candidate from Florida State University, writes that "for practical reasons the transition of large-scale writing assessment from paper to computer delivery will be a gradual one" (2).
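The confidence-rating triage Indiana is described as using, where the machine keeps its own score only when it is confident and refers low-confidence essays to a human reader, can be sketched as follows. The threshold value and the function names are hypothetical illustrations, not details of Indiana's actual system.

```python
# Below this (hypothetical) confidence level, the essay is routed to a human.
CONFIDENCE_THRESHOLD = 0.8

def route_essay(machine_score: int, confidence: float, human_scorer) -> int:
    """Keep the machine score when confidence is high; otherwise defer."""
    if confidence >= CONFIDENCE_THRESHOLD:
        return machine_score
    return human_scorer()  # low confidence: a human reader re-scores

# A high-confidence essay keeps the machine score; a low-confidence
# one is re-scored by the human (here a stand-in that returns 5).
kept = route_essay(4, 0.93, human_scorer=lambda: 5)
referred = route_essay(4, 0.41, human_scorer=lambda: 5)
```

The appeal of this design for states is plain: human readers are reserved for exactly the essays the machine handles least reliably.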
Similarly, Russell and Haney suspect that it will be some years before schools generally develop the capacity to administer wide-ranging assessments via computer (16 of 20). The natural extension of this, then, is that AES cannot happen on a large scale until we are able to provide conditions that allow each student to compose essays via computer, with Internet access to upload files. At issue as well is the reliability of the company contracted to do the assessing. A March 24, 2007 Steven Carter article in The Oregonian reports that access issues resulted in the state of Oregon canceling its contract with Vantage and signing a long-term contract with American Institutes for Research, the long-standing company that conducts NAEP testing. Even though the state tests only reading, science, and math this way (not writing), this nevertheless indicates that reliable access is an ongoing issue that must be resolved. Presently, there are four commercially available AES systems: Project Essay Grade (Measurement, Inc.), Intelligent Essay Assessor (Pearson), Intellimetric (Vantage), and e-Rater (ETS) (Dikli 5). All of these incorporate the same process in the software: "First, the developers identify relevant text features that can be extracted by computer (e.g., the similarity of the words used in an essay to the words used in high-scoring essays, the average word length, the frequency of grammatical errors, the number of words in the response). Next, they create a program to extract those features. Third, they combine the extracted features to form a score. And finally, they evaluate the machine scores empirically" (Dikli 5). At issue with the programming, however, is that "the weighting of text features derived by an automated scoring system may not be the same as the one that would result from the judgments of writing experts" (Dikli 6).
There is still a significant difference between statistically optimal approaches to measurement and scientific or educational approaches to measurement, where the aspects of writing that students need to focus on to improve their scores are not the ones that writing experts most value (Dikli 6). This is the tension that Diane Penrod addresses in Composition in Convergence, mentioned earlier, in which she recommends that teachers and compositionists become proactive by getting involved in the creation of the software instead of leaving it exclusively to programmers. And this makes sense. Currently, there are 50-60 features of writing that can be extracted from text, but current programs use only about 8-12 of the most predictive features to determine scores (Powers et al. 413). Moreover, Thomas writes that composition experts must determine what students learn about writing; "if that is left to the programmers and the testing experts, we have failed" (29). If compositionists and teachers can enmesh themselves in the creation of the software, working with programmers, then the product would likely be one that is more palatable and suitable, based on what we know good writing is. While the aura of mystery behind the creation of AES software is of concern to educators, it could easily be addressed by education and involvement. CCC reasons that since "we can not know the criteria by which the computer scores the writing, we can not know whether particular kinds of bias may have been reinforced into the scoring" (489).
It stands to reason, then, that if we take an active role in the development of the software, we will have more control over issues such as bias. Another point of contention with moving toward computer-based writing and assessment is the concern that high-stakes testing will result in students having a narrow view of good writing, particularly those moving to the college level, where writing skill is expected to be more comprehensive than a prompt-based five-paragraph essay written in 30 minutes. Grand Valley State University's Nancy Patterson opposes computer scoring of high-stakes testing, saying that no computer can evaluate subtle or creative styles of writing, nor can computers assess the quality of an essay's intellectual content (Stover n.p.). She also writes that standardized writing assessment is "already having an adverse effect on the teaching of writing, luring many teachers into more formulaic approaches and an over-emphasis on surface features" (Patterson 57). Again, education is key here, specifically teacher education. Yes, we live in a culture of high-stakes testing, and students must be prepared to write successfully for this genre. But test-writing is just that, a genre, and should be taught as such, just not to the detriment of the rest of a writing program, something that the authors of Writing on Demand assert when they write, "We believe it is possible to integrate writing on demand into a plan for teaching based on best practices" (5). AES is not an attack on best practices, but a tool for cost-effective and efficient scoring.
Even though Thomas warns against the demands of standards and high-stakes testing becoming the entire writing program, we still must realize that computers for composition and assessment can have positive results, and "many of the roadblocks to more effective writing instruction (the paper load, the time involved in writing instruction and assessment, the need to address surface features individually) can be lessened by using computer programs" (29). In addition to pedagogical concerns, skeptics of AES are leery of the companies themselves, particularly the aggressive marketing tactics they use, especially those that teachers perceive as threats not only to their autonomy, but to their jobs. To begin, companies market aggressively because we live in a capitalist society and they are out to make money. But, to cite Penrod, both computers and assessment are by-products of capitalist thinking applied to education, in that the two reflect "speed and efficiency in textual production" (157). This is no different from the first standardized testing experiments by the Carnegie Foundation at the beginning of the 20th century, and it is definitely nothing new. Furthermore, Herrington and Moran admit that "computer power has increased exponentially, text- and content-analysis programs have become more plausible as replacements for human readers, and our administrators are now the targets of heavy marketing from companies that offer to read and evaluate student writing rapidly and cheaply" (480). In addition, they see a threat in companies marketing programs that define the task of reading, evaluating, and responding to student writing "not as a complex, demanding, and rewarding aspect of our teaching, but as a burden that should be lifted from our shoulders" (480). In response to their first concern, teachers becoming involved in the process of creating assessment software will help to define the task the computers perform.
Also, teachers will always read, evaluate, and respond, but probably differently. Not all writing is for high-stakes testing. Secondly, and maybe I'm alone in this (but I think not), I'd love to have the tedious task of assessing student writing lifted from my plate, especially on sunny weekends when I'm stuck inside for most of the daylight hours assessing student work. To be a dedicated writing teacher does not necessarily involve martyrdom, and if some of the tedious work is removed, it can give us more time to actually teach writing. Imagine that!

The Future of Automated Essay Scoring

On March 14, 2007, an article appeared in Education Week reporting that beginning in 2011, the National Assessment of Educational Progress will conduct its writing testing for 8th and 12th grade students by having the students compose on computers, a decision unanimously approved as part of its new writing assessment framework. This new assessment will require students to write two 30-minute essays and will evaluate students' ability to write to persuade, to explain, and to convey experience, tasks typically deemed necessary both in school and in the workplace (Olson 23). Currently, NAEP testing is assessed by AIR (mentioned above), which will no doubt incorporate AES for assessing these writings. In response, Kathleen Blake Yancey, Florida State University professor and president-elect of NCTE, said the framework "provides for a more rhetorical view of writing, where purpose and audience are at the center of writing tasks, while also requiring students to write at the keyboard, providing a direct link to the kind of composing writers do in college and in the workplace, thus bringing assessment in line with lifelong composing practices" (Olson 23). We are on the cusp of a new era. With the excitement of new possibilities, though, we must remember, as P.L. Thomas reminds us, that while technology can be a wonderful thing, "it has never been and never will be a panacea" (29).
At the same time, we must also discard our tendency to avoid change and embrace the overwhelming possibilities of incorporating computers and technology into writing instruction. Thomas also says that writing teachers need to see the inevitability of computer-assisted writing instruction and assessment as a great opportunity: "We should work to see that this influx of technology can help increase the time students spend actually composing in our classrooms and increase the amount of writing students produce" (29). Moreover, we must consider that the methods used to program AES software are not very different from the rubrics that classroom teachers use in holistic scoring, something Penrod identifies as having "numerous subsets and criteria that do indeed divide the students' work into pieces" (93). I argue that our time is better spent working within the system to ensure that its inevitable changes reflect sound pedagogy, because the trend we're seeing is not substantially different from previous ones. The issue is in how we choose to address it. Instead of eschewing change, we should embrace it and make the most of its possibilities.
