On The Possible Advantages Of Robot Graders

Some very interesting news from the trenches about robot graders: the piece acknowledges the ‘strong case against using robo-graders for assigning grades and test scores’ but then goes on to note:

But there’s another use for robo-graders — a role for them to play in which…they may not only be as good as humans, but better. In this role, the computer functions not as a grader but as a proofreader and basic writing tutor, providing feedback on drafts, which students then use to revise their papers before handing them in to a human.

Instructors at the New Jersey Institute of Technology have been using a program called E-Rater…and they’ve observed a striking change in student behavior…Andrew Klobucar, associate professor of humanities at NJIT, notes that students almost universally resist going back over material they’ve written. But [Klobucar’s] students are willing to revise their essays, even multiple times, when their work is being reviewed by a computer and not by a human teacher. They end up writing nearly three times as many words in the course of revising as students who are not offered the services of E-Rater, and the quality of their writing improves as a result…students who feel that handing in successive drafts to an instructor wielding a red pen is “corrective, even punitive” do not seem to feel rebuked by similar feedback from a computer….

The computer program appeared to transform the students’ approach to the process of receiving and acting on feedback…Comments and criticism from a human instructor actually had a negative effect on students’ attitudes about revision and on their willingness to write, the researchers note….interactions with the computer produced overwhelmingly positive feelings, as well as an actual change in behavior — from “virtually never” revising, to revising and resubmitting at a rate of 100 percent. As a result of engaging in this process, the students’ writing improved; they repeated words less often, used shorter, simpler sentences, and corrected their grammar and spelling. These changes weren’t simply mechanical. Follow-up interviews with the study’s participants suggested that the computer feedback actually stimulated reflectiveness in the students — which, notably, feedback from instructors had not done.

Why would this be? First, the feedback from a computer program like Criterion is immediate and highly individualized….Second, the researchers observed that for many students in the study, the process of improving their writing appeared to take on a game-like quality, boosting their motivation to get better. Third, and most interesting, the students’ reactions to feedback seemed to be influenced by the impersonal, automated nature of the software.

Not all interactions with fellow humans are positive; many features of conversations and face-to-face spaces inhibit the full participation of those present. Some of these shortcomings can be compensated for, and directly addressed, by computerized, automated interlocutors (as, for instance, in the settings described above). The history of online communication showed how new avenues for verbal and written expression opened up for those inhibited in previously valorized physical spaces; robot graders similarly promise to reveal interesting new personal dimensions of automated spaces for interaction.

Get Your Computer’s Hands Off My Students’ Essays

Last week, the New York Times alerted readers to the possibility of computers grading college-level student essays. As with any news about the use of ‘artificial intelligence’ to replace humans, reactions to this announcement feature the usual skewed mix of techno-boosterism, assertions of human uniqueness, and fears of deskilling and job loss.

First, a sample of the boosterism:

Anant Agarwal, an electrical engineer who is president of EdX, predicted that the instant-grading software would be a useful pedagogical tool, enabling students to take tests and write essays over and over and improve the quality of their answers. He said the technology would offer distinct advantages over the traditional classroom system, where students often wait days or weeks for grades.

“There is a huge value in learning with instant feedback,” Dr. Agarwal said. “Students are telling us they learn much better with instant feedback.”

Then, the assertions of human uniqueness.

From the Professionals Against Machine Scoring of Student Essays in High-Stakes Assessment:

Let’s face the realities of automatic essay scoring. Computers cannot ‘read.’ They cannot measure the essentials of effective written communication: accuracy, reasoning, adequacy of evidence, good sense, ethical stance, convincing argument, meaningful organization, clarity, and veracity, among others.

From Professor Alison Frank Johnson at Harvard University (Letters Section):

Only a human grader can recognize irony, can appreciate a beautiful turn of phrase, can smile with unanticipated pleasure at the poetic when only the accurate was required.

From Professor Cathy Bernard of the New York Institute of Technology (Letters again), resisting the submit, get feedback, revise, resubmit model (which, to my mind, sounds a great deal like a standard iterative writing process):

Writing is thinking, and revision is a slow process, unpredictable and exploratory. A piece of writing, like a cake taken from the oven, needs some time to cool before the revision process can even begin.

Finally, on Twitter, there was ample commentary to the effect that the ‘other tasks’ in ‘freeing professors for other tasks’ would merely be the revision of résumés as professors looked for work outside the academy, their jobs taken over by the latest technological marvel promoted by the Borg-like combination of corporate rent-seekers eager to enter the academic market and university administrators eager to let them in. Fair enough.

Still, I wonder if matters are quite so straightforward.

First, the assertions of human uniqueness in the domain of reading and writing leave me cold. They do not take into account the possibility that greater exposure to human examples of writing, especially essays graded by human professors (that is, better training data), will significantly improve the quality of these automated graders, which rely on statistical machine learning techniques. (I’m not sure what ‘realities’ the PAMSSEHSA want us to face; computers can read, as you can verify the next time you enter your name in an online form. As for the lacunae they point out, I fail to see an argument that these are not achievable in principle.)
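To make that last point concrete, here is a minimal sketch of the kind of supervised statistical learner such graders rely on. It is a toy, not E-Rater or Criterion, whose internals are proprietary; the essays and scores below are hypothetical. The point is simply that the grader is a model fit to human-graded examples, so more (and better) human-graded essays directly improve its predictions.

```python
# Toy illustration of a statistical essay scorer: a regression model
# fit to essays that human professors have already graded. This is an
# assumption about the general approach, not a reconstruction of any
# commercial product.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Hypothetical training data: essays paired with human-assigned grades.
essays = [
    "The evidence presented supports the author's central claim because...",
    "I think the essay was good because it was good and I liked it...",
]
human_scores = [5.0, 2.0]  # grades a human reader assigned

# Fit a simple regression on textual features of the graded essays.
model = make_pipeline(TfidfVectorizer(), Ridge())
model.fit(essays, human_scores)

# Score an ungraded draft. With more human-graded essays to learn from,
# predictions like this track human judgment more closely.
print(model.predict(["A new student draft to be scored..."]))
```

A real system would use far richer features and far more data, but the dependence on human-graded examples is the same: the model has nothing to learn from except us.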

Second, I note that both Professors Johnson and Bernard are writing from private institutions where, I presume, professors often have teaching assistants to help them with grading. From my vantage point at a severely underfunded public university, I admit to wanting help with grading student assignments. I would be able to assign more written work, especially in upper-tier core classes, if I could rely on some assistance with grading. In general, as with the discourse surrounding online education, the same fallacy is committed here: comparing the worst of A with the best of B. The original New York Times article features the obligatory story of ‘stupid computers’ fooled by human trickery, and the letter writers wax lyrical about the magic of professors grading student writing. But what about tired, overworked professors who can only provide cursory feedback? Would their students do better with computer grading? Would every human professor always do better than an automated grader?