
Stanford study finds AI tops law professors
Law professors preferred answers generated by artificial intelligence over responses written by fellow academics in a Stanford University-led study examining how large language models perform on legal reasoning tasks.
The study involved 16 professors from 14 US law schools, who created 40 contract law questions. Google’s Gemini 2.5 Pro won 75.92% of blinded comparisons against human instructors, and NotebookLM won 74.75%.
“Observed agreement exceeded the level expected if judgments were entirely idiosyncratic, indicating that the LLMs’ success reflects alignment with common disciplinary criteria,” the researchers wrote.
Researchers found AI models outperformed human instructors across recall questions, hypotheticals and policy discussions, while Gemini and NotebookLM recorded harmfulness rates of 3.41% and 3.64%, respectively, compared with 12.06% for professor-written responses.
A separate analysis ranked Anthropic’s Claude Opus 4.7 first, followed by OpenAI’s ChatGPT 5.4 and Gemini 2.5 Pro, with every AI model evaluated outperforming human instructors on average.
The researchers cautioned that the study did not determine whether AI-generated answers matched individual teaching preferences and suggested some responses may have been viewed as broadly acceptable rather than tailored to a specific professor’s approach.
The findings come as courts, law firms and universities increasingly adopt AI tools, although concerns remain after incidents such as a recent Sullivan & Cromwell filing to a US bankruptcy court that contained AI-generated fake citations.