January 28, 2025
By Philip D. Bunn
Higher education faces a crisis at the hands of new “Artificial Intelligence” tools. Everyone is on edge, and sides are being chosen. Some side with the optimists, gleefully proclaiming the advantages of our new tools and the work they will do for us. Others, like me, are less sanguine about the state of education in the ChatGPT era. Everyone is grappling with questions of academic dishonesty, the possibility of being made irrelevant or obsolete, and more.
So, when I made the following claim, it caused a minor stir:
"Students have always cheated" is the most useless, out of touch response to the crisis of generative AI. No, they have not all always cheated this pervasively and undetectably. You don't believe that. Stop saying it.
The reactions were quite dramatic. Some repeated the hackneyed claim to which I objected, others suggested AI tools are the future and I had fallen behind, and others still compared generative AI to calculators. Here, I broadly explain the problem of AI in higher education as I see it from my perspective as a teacher of undergraduate students. I then tie my concerns about AI use today to my concerns about technology and liberty more broadly. I conclude by responding specifically to some of the more salient and perceptive arguments made by my interlocutors, both on Twitter (I will not call it X) and elsewhere.
The State of the Problem
I will concede one thing from the outset: Students have always cheated. I do not deny this, as it would be foolish to do so. I have found my tests in online repositories, learned from students that my tests have been kept in physical test banks, discovered traditional plagiarism in written work, found evidence of purchasing and sharing of papers, and the list goes on. Students are creative—especially so when it comes to getting out of work required of them. That college-aged students do cheat, have cheated in the past, and will continue to cheat is no surprise to me.
Our current situation is, however, quite literally unprecedented. While students were always able to steal someone else’s work or copy verbatim from an article in the library or on the internet, the strictures and structures of assignment-writing could account for this somewhat. A sufficiently savvy professor could, for example, include enough qualifications and details in her writing prompts to make ready-made internet answers less responsive to the prompt. These methods were, in earlier eras, effective at both encouraging real engagement by students and detecting cheating when necessary. This is in addition to the fact that elementary plagiarism, which often takes the form of copying from a publicly available source, is sometimes detectable with a simple Google search.
In contrast to “traditional” cheating methods, generative AI tools are now capable of producing lengthy responses that account for and adapt to these baked-in and course-specific nuances. A savvy student could prompt an instance of a large language model (LLM) with the course syllabus and then ask it to answer writing prompts with material from that syllabus. That same student could ask the generative AI tool to rephrase its output to sound more formal, or more casual, more like a college freshman, or perhaps to expound and elaborate on some underdeveloped point.
When the student is satisfied, copies the result, and submits work that is not their own, it is true that the resulting product may be lackluster in some way. But, with all due respect to my students, the writing of college students is often lackluster. That some students produce bad writing is a truth universally acknowledged, so the fact of generative AI being below-average does not inherently or automatically make it detectable, even to perceptive and trained eyes.
That AI models sometimes “hallucinate,” invent citations and sources or make dramatic mistakes in the predictive generation of text, is a truism often referenced in these conversations. Some teachers catch cheaters when unedited AI outputs have invented sources that do not exist or have pontificated on things easily proven factually untrue. It is true that careful attention to the output of these tools will occasionally reveal the weaknesses of these LLMs and the writing they produce. We ought certainly to be concerned that the internet is currently being flooded with material produced by these complicated, flawed machines.
But those who rest on hallucinations and errors as their source of hope in the face of the flood of both low-quality AI content and rampant, worldwide academic misconduct will find their hope built on shifting sands. The environment is constantly changing. As these models have grown more sophisticated, so too have their outputs. The simple fact is that, since their introduction, public, free or low-cost AI tools have steadily gotten better at mimicking human writing and responding to the writing prompts that many professors would use in their assessments.
It is important to note that I leave aside here philosophical considerations about the extent or possibility of these tools “understanding,” possessing “consciousness” and the like, not because they are unimportant, but because educators face a practical and factual question. That is: can this machine, this tool that many or most students are currently using, replicate the kind of output I would expect from an average undergraduate student without me being able to tell that the writing in question was not human generated? Despite protestations to the contrary, the answer is increasingly yes.
As a personal example, I have regularly, at intervals, asked various AI tools to respond to essay prompts in my courses since the first public-facing version of ChatGPT went live late in the fall semester of 2022. One common prompt I have used consists of some variation of “compare and contrast the three major social contract thinkers,” providing no additional clarification.
The early public versions of ChatGPT produced responses to this prompt that were clunky and full of mistakes, sometimes correctly identifying the “Big Three” of Hobbes, Locke, and Rousseau, but nevertheless often making elementary errors in summary, confusing terms, inventing fictions, and the like. These errors were easily detectable by subject matter experts, and such mistakes, when present in essays or exams, operated as clues that students either had misunderstood the material at a fundamental level or else had used a flawed AI tool in the writing of their response.
More recent versions of ChatGPT and other LLMs are, I am afraid to say, far more impressive. While I doubt the outputs of these tools will be winning any awards for creative writing or advancing scholarship independently any time soon, they have proven quite capable of at least appearing to understand complex thinkers and ideas, even providing additional, roughly accurate clarifications when prompted. My ability to detect whether the writing in front of me is from a less-than-stellar human writer or from an AI has declined apace, and I am skeptical of professors who argue they exceed me in this. My skepticism is not based on a strong estimation of my competence relative to that of others, but on a growing realization that generative AI has wildly exceeded what I expected it to be capable of producing.
I am also skeptical because this is not merely my anecdotal experience. While evidence currently is sparse, as these tools are novel and human-generated research and writing take time, what evidence we do have suggests that professors and graders are overconfident in their ability to detect AI use, compared to their actual success in detecting AI-generated writing, which is mixed at best.
For professors who prefer to use commercial “AI detection” tools, the results are even more bleak. While humans may have been fairly reliably capable of noticing AI-generated content from earlier iterations of these large language models, some research has shown that commercial AI detection applications “too often present false positives and false negatives” to be effectively used as evidence of academic misconduct. Anecdotally, my own, fully-human-generated writing has occasionally been flagged by these tools in personal tests as “likely AI-generated”, while essays I have generated using public AI tools have been cleared as “100% human-generated.” Based on this research and my own experience, I cannot trust these tools or even necessarily my own intuition to detect the kind of cheating that I nevertheless know to be prevalent.
In short, I would disagree with those critics of my earlier statement who suggested I suffer from a kind of misconduct myopia. They variously suggested that “this is always the way it has been,” or that “AI generated text is like unto using a calculator for a math class.” In my understanding, the situation educators face is entirely novel. It has changed dramatically in only a two-year window, with these new, fun, fancy tools moving from novelties with obvious weaknesses to pernicious and pervasive replacements for every step of student learning.
Technology and Liberty
My key concern through this entire discussion is the intellectual development of these students in our care. Through preparing to be a teacher, I have been convinced that developing both a knowledge base and a kind of capacity for good judgment in our students should be goals in our coursework, assignments, and assessments. The practice of this judgment is, I would argue, one of the most essential kinds of freedom available to individuals. But to the extent that we offload our intellectual independence to algorithms that choose for us, we and our students have stunted our ability to develop that judgment and have thus hampered our ability to remain meaningfully free.
This is an intuition that calls back to early eras of what we now think of as “liberalism.” In his book A Third Concept of Liberty, philosopher Samuel Fleischacker suggests that liberal thinkers like Adam Smith and John Stuart Mill develop an idea of liberty that is not precisely aligned with the famous “positive” and “negative” liberty distinction advanced by Isaiah Berlin and others. Instead, Fleischacker argues these thinkers conceive of liberty as the practice of a developed faculty of “independent” judgment. The free exercise of judgment is “independent” in the sense that it is made by an individual, not by a parent or a guardian or a warden or anyone else on behalf of a subject under tutelage.
This faculty can be developed when we make judgments and subsequent choices and submit them to the evaluation of others. For example, Smith envisions someone developing “moral sentiments” in social situations, such as telling a joke that falls flat. One might question why the joke did not land, whether the situation, audience, and timing were appropriate, and revise their future judgments accordingly. This process of development, in both low-stakes social situations and high-stakes moral and political situations, teaches us to be more perceptive of the feelings of others, to tamp down our own unsocial emotions and behaviors, and, by extension, to become better people ourselves. Someone who has developed their capacity for this kind of judgment is meaningfully “free” on this account.
By analogy, students develop their capacity to explore ideas and own and understand them capably through a similar process of submission of their own judgments in the form of written and oral work to the judgment of their professors in the form of number and letter grades and accompanying feedback. A student who puts serious time and effort into understanding the material, producing a piece of writing in response to a prompt, and taking feedback on that writing has taken a serious step in their own intellectual development. They will then, hopefully, take the feedback received into account in the future practice of writing and the practice of study more generally.
This kind of iterative, reflective education produces someone who understands, not just someone who parrots. That some of our students would be merely parrots, learning by “cram,” was one of the fears of John Stuart Mill, articulated in an essay published in the Edinburgh Review entitled “On Genius.” In this essay, Mill reflects on the human capacity for genius, in response to another essay in the Edinburgh Review that effectively complained there was little room for the contemporary exercise of genius on the level of a Galileo, Newton, or Bacon. Since so much has been discovered and so little relatively remains to be discovered, Mill’s interlocutor wonders, how many humans can really exercise and demonstrate “genius” today?
Against his interlocutor, Mill argues that “genius” properly understood is the human capacity for understanding. Practicing genius is not necessarily learning something altogether new, putting together thoughts or facts in some combination never previously uncovered, but instead each individual personally understanding something anew for the first time. A proper education, Mill argues, is not one where students are taught merely to “cram” facts without understanding, to echo their teacher’s maxims without internalizing them, or to perform what he literally calls algorithmic operations absent human intellectual intervention. Instead, a proper education cultivates “genius” insofar as it ensures students hear, apprehend, comprehend, understand, and can then take some independent ownership of the ideas under consideration.
Mill’s reflections on “genius” help illuminate the problems of at least one response to my unintentionally provocative claim that occasioned this reflection. It is very common for proponents of generative AI use in the classroom to compare these new tools to calculators. Calculators, they reason, shortcut the operations of mathematical calculations. No longer are we committed to using an abacus or even our own brains, hands, pencil, and paper to perform longhand calculations. Instead, calculators supposedly free up mental resources by efficiently performing operations that would be onerous to perform by hand or by less sophisticated instruments. This then allows students and practitioners to use the freed-up mental RAM to engage in other, higher tasks.
Even leaving aside the possible deleterious effects of overreliance on calculators, which my opponents understate, the difference Mill has articulated between an education of cram or rote repetition and a properly human education shows us the error of this way of thinking. The point is not unlike Searle’s famous Chinese Room thought experiment: if I, for example, perfectly learn how to translate complex physics equations into usable form for my calculator, which then solves those equations for me and gives a technically correct output in response to a homework problem, “I” have not, properly speaking, done anything or understood anything, much less learned and grown as a result of my work. I have merely transcribed symbols in front of me for the sake of a machine that then gives me an output which I transcribe in turn, and at no point in this process have I understood anything about the underlying math or physics involved.
This way of using a calculator contrasts with how a capable mathematician might use one: to perform functions he already understands for the sake of freeing up time for higher, more complex thought and operation. If a calculator substitutes for my understanding, it is undercutting the goals of my education. If I possess understanding and use a calculator to build upon what I already understand, then it has become a useful tool. It is only the most blindered AI optimist who could look at our current educational situation and think that students are, on the whole, using generative AI in a way that looks more like the latter than the former.
Conclusion
I want to close by considering the most interesting and constructive pushback I receive when talking about the problem of AI in higher education. That is: that using these generative AI tools will be an important skill going forward in a host of professions, such that students who know how to use them will be advantaged, and those who are prevented from learning how to use them will be hindered. Those who know how to use these tools well and apply them effectively in their work will succeed economically, so the argument goes, while those who do not bother or are not permitted to adapt to changing circumstances will be left behind. In other words, it is “more” important in some way for our students to learn to use AI well than it is for us to ensure they are not submitting something that they did not personally birth from their heads and type on their keyboards.
I do not personally know the future and thus do not know enough to determine whether the specific claim of the relative importance of “prompt engineering” as a skill is accurate. What I do know is that generative AI, if it is to be used well, requires a user with knowledge and skill. That is, I do not doubt that generative AI tools can produce useful, robust, “correct” outputs depending on the use case. I would argue, however, that it will always be true that those who are most capable of utilizing these tools will be those most skilled and knowledgeable in the underlying material.
Returning to the flawed analogy of a calculator: the uses to which a complex, powerful calculator can be put are many, and the time it saves its users is great. However, given my personal lack of mathematical acumen, my use of a calculator will always be limited and relatively fruitless compared to that of someone who understands how to apply it to the problems that trouble them. That the calculator can perform a calculation does not absolve the user of the need to learn the underlying math, how to transform an equation into something the calculator can work with, and so on.
Generative AI can certainly produce serviceable output in the hands of users with little understanding or expertise. This is part of the problem of AI in higher education that I have identified above. But despite the advantages the “generative” part of generative AI lends these tools, it seems true to me that the best users of these tools going forward will be those with the requisite knowledge to recognize mistakes and correct them when the machines go awry. As a scholar of certain texts, I can recognize AI hallucinations when the AI inappropriately summarizes a text I know intimately, or when it invents a citation to secondary literature that I know does not exist. I am able to ask it effectively to improve upon its output only when I am equipped to know that the output is mistaken or flawed in some way.
But the regular, habitual use of these AI tools discourages me from mastering the material that would help me in this very process of using AI well. If our hope for our students is both for their professional success and for their personal intellectual and moral growth, we could hardly do worse than incentivizing them to substitute the machine for the operations of their own intellects, to substitute artificially “intelligent” outputs for their own developing, independent, human voices and choices. To whatever extent we can, teachers ought to endeavor to design classrooms and assignments that discourage and punish the use of these AI tools until such time as students can show themselves capable of using them productively as aids to their own judgment and not substitutes for it.
Philip D. Bunn, Ph.D., is an Assistant Professor of Political Science at Covenant College in Lookout Mountain, Georgia.