We would like to thank Marilyn Derby, Associate Director, Student Support and Judicial Affairs at UC Davis, for her contribution to the ideas in this piece and for some of the visual resources used in this post. We would also like to thank Amanda Clarke, English department chair at Viewpoint School in California, for her excellent reference on distinguishing the characteristics of student writing versus AI writing.
Perhaps you've received an essay and you have a gut feeling that the writing does not belong to the student in your class. You run it through Pangram, and it comes back 99.9% confident that the writing is AI-generated.
Or perhaps you are an academic integrity officer, and a professor reports a student's work for AI-generated plagiarism, yet the student and the parent absolutely insist that the student wrote it themselves.
You read the writing and it has all of the telltale signs of AI writing. "In today's technological era," the writing begins. The student explains that an author "weaves the details intricately together through a rich tapestry of perspectives." The grammatically perfect, evenly structured essay ends with the classic phrase "In conclusion,..." or maybe "Overall,..."
You know deep down that your student has not written the assignment, but you just can't prove it. When the people you're trying to convince say, "AI detectors don't work and can't be trusted," or "It is impossible to know for sure," what do you do?
As we've talked about before, a positive AI detection is just the beginning of the conversation, and can never stand alone when punitive action against a student is being considered. Although we stand by the accuracy of our product, we also believe a holistic approach should be used when the stakes are high, and more evidence needs to be collected beyond the AI detection score in order to prove beyond a reasonable doubt that a student's work is inauthentic or unoriginal.
Today, we'll talk through 7 strategies to collect additional evidence for such cases.
AI-generated writing is never "given away" by one particular phrase or word choice: Pangram makes its decision based on the accumulation of many weak signals in the text. Similarly, you can look for the signals present in AI-generated text and weigh them in totality to show that they could not have shown up by random chance. Start by looking for common AI phrases and seeing how often they appear. In clear-cut cases, the AI-generated writing contains so many of them that it is very hard to argue they are a coincidence, such as in the samples below.
Common AI phrases and words
You can view a comprehensive list of commonly used AI vocabulary words and phrasing patterns in Jenna Russell's guide.
Pangram can pull out these phrases automatically as well, along with their frequencies. It's important to understand that no single phrase is proof that the text was AI-generated, but a combination of many of them is very strong evidence, because it becomes vanishingly unlikely that they all appeared by coincidence.
Example of AI phrase frequency analysis
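To make the "accumulation of weak signals" idea concrete, here is a minimal sketch in Python of counting how often a handful of stock AI phrases appear in a submission. The phrase list and the submission.txt filename are purely illustrative; this is not Pangram's actual signal set or methodology.

```python
import re
from collections import Counter

# Purely illustrative phrase list -- not Pangram's actual signal set.
AI_PHRASES = [
    "in today's",
    "rich tapestry",
    "delve into",
    "it is important to note",
    "in conclusion",
]

def phrase_frequencies(text: str) -> Counter:
    """Count case-insensitive, whole-phrase occurrences of each stock phrase."""
    lowered = text.lower()
    counts = Counter()
    for phrase in AI_PHRASES:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        counts[phrase] = len(re.findall(pattern, lowered))
    return counts

if __name__ == "__main__":
    essay = open("submission.txt", encoding="utf-8").read()
    for phrase, n in phrase_frequencies(essay).most_common():
        if n:
            print(f"{phrase!r} appears {n} time(s)")
```

No single count means much on its own; the point is that several of these phrases showing up together in one short essay is very unlikely to be a coincidence.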
Beyond individual words and phrases, you can also look for higher-level characteristics of AI writing.
Guide to distinguishing student vs AI writing
This excellent guide from Amanda Clarke lays out some of the most important differences in style and tone between student writing and AI-generated writing.
It is also worth noting that when a student's authentic writing is mixed with AI-generated writing, there can often be abrupt shifts in tone and style.
When work is written by a student, it comes out of a process: brainstorming, outlining, drafting, revising, and proofreading. When work is plagiarized from generative AI, it is often just copied and pasted.
A simple way to check a student's writing process is to ask them for artifacts: their notes, their brainstorms, and their outlines. If it is a final draft, ask to see their rough draft. Many times this is enough to establish proof of process: innocent students are unafraid to show their work, and cheating students often simply cannot produce these artifacts.
There are also tools available to check a student's writing process. For example, Draftback is a Chrome extension that lets you replay the student's writing history in Google Docs. We are also aware of Brisk Teaching, Cursive Technologies, and Visible AI. Used in combination with Pangram, these can be powerful tools.
Example of Draftback replay data
In the Draftback trace above, you can see where the student edited their writing, or whether the document arrived as one big copy and paste.
Writing process tools alone should not be treated as bulletproof evidence. Students know that teachers now check revision history for signs of academic dishonesty, and that a single copy and paste leaves them vulnerable. Some students will simply transcribe ChatGPT's output into their document by hand, making it appear that they wrote it themselves.
Worse, there now exist software tools that fake revision history, such as this "Human Auto Typer" Chrome extension.
Example of a "Human Auto Typer" Chrome extension
Beware that while examining a student's writing process and revision history can be useful, there are now ways that students can get around these simple checks.
Generative AI will often make up citations, misquote sources, and make other attribution mistakes that are easy to spot. When an AI chatbot does not know which sources support a claim it is making, it will, most of the time, happily invent a fictitious citation. See the example from Claude below.
Example of Claude making up citations
Citation mistakes are often some of the most compelling evidence in AI cases, because intentionally falsifying a research source is in itself an academic integrity violation. Often you can simply go through the bibliography or works cited and check whether the entries correspond to real papers. If you search for a cited paper and it does not exist, that is incredibly strong evidence of a violation.
Again, one must be careful: real citations do not prove that the student did not use AI. Newer tools such as Deep Research and Perplexity do cite real sources, and chatbots are rapidly improving at not hallucinating false ones.
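If you want to spot-check a long bibliography quickly, one rough approach (a sketch under our own assumptions, not a feature of Pangram) is to query the public CrossRef API for each cited title and see whether a closely matching real publication comes back. The example title below is hypothetical, and a miss should prompt a manual search, not a conclusion.

```python
import requests  # assumes the requests package is installed

def crossref_candidates(cited_title: str, rows: int = 3) -> list[str]:
    """Return the closest real publication titles CrossRef can find."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": cited_title, "rows": rows},
        timeout=10,
    )
    resp.raise_for_status()
    items = resp.json()["message"]["items"]
    return [item["title"][0] for item in items if item.get("title")]

if __name__ == "__main__":
    # Hypothetical bibliography entry pulled from the submission.
    cited = "A Comparative Study of Narrative Voice in Postwar Fiction"
    print("Closest real titles CrossRef found:")
    for title in crossref_candidates(cited):
        print(" -", title)
    # No close match is a reason to check by hand, not proof on its own.
```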
One of the easiest ways to check whether a student's work is original or falsified is simply to ask them questions about the paper. If the writing level of the submission does not match the writing level of the student, ask them about the most complex parts of the writing. For younger students, asking about the meaning of a complicated word that ChatGPT uses often but that students at that level never use (such as "axiomatic") is frequently enough to get the student to admit they used AI.
At the university level, where students may be expected to come up with novel, original ideas, you may want to ask them questions about how they came up with the idea. Often, this can lead into a discussion about writing process where you can collect information about how the writing came together, as we described in point 2.
It is important to have empathy and to create a safe space for discussion. An academic integrity discussion with students can be highly stressful, and the student may be defensive when presented with evidence. The best way to frame the conversation with the student is to simply come to a meaningful understanding of what happened, so that you can do your best to help the student succeed in the future. Give the student a chance to correct their mistakes and explain why they needed to resort to using AI instead of doing the assignment themselves. We also encourage openness to the fact that the use of AI may have been the result of a misunderstanding rather than an intentional act of wrongdoing. We wrote more about how to have these kinds of conversations in one of our previous blog posts.
This is particularly applicable to younger or developing students: AI writing is often significantly more advanced than what one would expect at the student's level.
We recommend pulling up previous writing samples from the student. Universities often have central databases from which essays written for other classes can be pulled. If the student is new to you, feel free to ask their previous teacher for a few writing samples.
A sudden jump from a student who writes poorly to one who writes with perfect spelling and grammar is cause for concern.
ChatGPT's output does not vary as much as you might expect. If you paste the same prompt into ChatGPT twice, it will not return the exact same text, but the two outputs will often share close similarities that are difficult to attribute to coincidence.
Example of side-by-side comparison with ChatGPT
Using Pangram's Side By Side feature, you can automatically see the submission next to ChatGPT's output. While the phrases won't be exactly the same, we highlight and link phrases that are very similar in meaning to each other.
Another tactic is to generate multiple responses from ChatGPT and look at their similarity to the submission. If the submission cannot easily be picked out of the bunch, it is likely AI as well.
It helps if you know the assignment: that way, you can use the assignment directly as the prompt to ChatGPT. If the assignment is unknown, you can still come up with a reasonable prompt. Aim for a prompt that is specific enough to produce an essay like the one you are looking at, but not so specific that it reproduces the essay simply by copying it. ChatGPT itself can help here: paste the essay in, ask what main ideas, topics, and questions it addresses, and then try multiple prompts until you find ones that produce semantically similar essays, so you can check whether they also match stylistically.
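If you want to put a rough number on "reasonably similar," here is a hedged sketch that embeds the submission and a few regenerated ChatGPT drafts with the open-source sentence-transformers library and compares them by cosine similarity. The model choice and file names are illustrative, and this is not how Pangram's Side By Side feature works internally.

```python
# Assumes the open-source sentence-transformers package is installed.
from sentence_transformers import SentenceTransformer, util

# Small general-purpose embedding model; the choice is illustrative.
model = SentenceTransformer("all-MiniLM-L6-v2")

def similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between the embeddings of two documents."""
    embeddings = model.encode([text_a, text_b], convert_to_tensor=True)
    return util.cos_sim(embeddings[0], embeddings[1]).item()

if __name__ == "__main__":
    submission = open("submission.txt", encoding="utf-8").read()
    # Hypothetical regenerated drafts saved from ChatGPT for the same prompt.
    drafts = [open(f"chatgpt_draft_{i}.txt", encoding="utf-8").read() for i in (1, 2, 3)]
    for i, draft in enumerate(drafts, start=1):
        print(f"Draft {i} vs. submission: {similarity(submission, draft):.2f}")
    # If the submission scores about as high against the drafts as the
    # drafts do against each other, that is one more weak signal -- not proof.
```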
According to Russell et al., research from the University of Maryland that we have previously discussed, individual experts can be 92.7% accurate at determining whether a text is AI-generated. A panel of 5 experts, however, can be nearly perfect when you take the majority vote (across the 300 texts the researchers studied, the majority vote was 100% accurate).
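To see why a small panel helps so much, here is a back-of-the-envelope calculation: if each expert were independently correct 92.7% of the time (independence is an idealization, since real readers share blind spots), a 5-person majority vote would be correct roughly 99.7% of the time.

```python
from math import comb

def majority_accuracy(p: float, n: int = 5) -> float:
    """Probability that a majority of n independent judges, each correct
    with probability p, reaches the right verdict."""
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))

print(f"Single expert: {0.927:.1%}")
print(f"Majority of 5: {majority_accuracy(0.927):.1%}")  # about 99.7%
```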
We encourage you to train others in your department or at your school to detect AI-generated text by eye, so that you can get multiple opinions in difficult cases. Talking through the different signals each individual judge picks up on is a great way to gain confidence in assessing the authenticity of a piece of writing.
Additionally, as in all legal-adjacent cases, individuals can be consciously or subconsciously biased in their decision-making for reasons outside the student's control. Using a panel of multiple people to determine whether a student has violated academic integrity not only helps you be more accurate; at the end of the day, it should also make your process more fair.
In this blog post, we've looked at a number of ways you can go beyond the score and use Pangram together with other tools to build evidence, whether you are making the case that AI was used improperly or protecting a student who has been wrongly accused of AI cheating.
No single piece of evidence is foolproof in determining the outcome of a case, but the more evidence you can collect, the more fair and defensible your academic integrity process will be.