How does plagiarism software (like Turnitin) work?


I’m talking about software like Turnitin, which is used mostly for academic purposes. Recently it has also added an AI detection feature that flags whether any part was copied from ChatGPT or some other AI chatbot.

How does the platform work? How do they know which sites I’ve copied text from? Specifically, how does it know whether any part of the text copied is from some AI platform?

In: Technology

2 Answers

Anonymous 0 Comments

There are two distinct processes at work here. The first looks for unoriginal text that already exists somewhere else, and it works like any internet search engine: vast quantities of text are processed into an index. To test a document, the software breaks it into chunks and searches for each chunk in that index. There are lots of optimization steps involved, but that’s the general idea.

This lets you specify exactly which text was copied and link to the original document. It’s strong evidence of a match.
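To make the index-and-search idea concrete, here is a minimal sketch in Python. It is not Turnitin’s actual code (that isn’t public); it just assumes the "chunks" are overlapping runs of n words, and the function names and sample documents are made up for illustration.

```python
from collections import defaultdict

def shingles(text, n=5):
    """Yield every run of n consecutive lowercased words in the text."""
    words = text.lower().split()
    for i in range(len(words) - n + 1):
        yield " ".join(words[i:i + n])

def build_index(corpus, n=5):
    """Map each chunk to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in corpus.items():
        for chunk in shingles(text, n):
            index[chunk].add(name)
    return index

def find_matches(submission, index, n=5):
    """Return {chunk: [source names]} for each submission chunk found in the index."""
    return {
        chunk: sorted(index[chunk])
        for chunk in shingles(submission, n)
        if chunk in index
    }

# Hypothetical corpus standing in for "vast quantities of text".
corpus = {
    "essay_a": "the quick brown fox jumps over the lazy dog near the river",
    "essay_b": "machine learning models are trained on large volumes of text",
}
index = build_index(corpus, n=4)

submission = "my cat watched as the quick brown fox jumps over a fence"
for chunk, sources in find_matches(submission, index, n=4).items():
    print(f"{chunk!r} also appears in {sources}")
```

Every hit comes with the exact matching chunk and the name of the document it was found in, which is why this kind of result can point back to a source.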

Then there’s the “this seems to be generated by an AI” process. That is more mysterious: a machine learning classifier is trained on volumes of text that were generated by AIs and volumes of text that were generated by humans. But that only gets you a black box classifier that assigns a probability like “that appears 82% likely to be AI generated.”

This cannot link you to a copy of the original text, because there isn’t one. And the company’s ability to inspect *why* it arrived at that answer is actually very poor. They just have a score.
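As a rough illustration of why all you get is a score, here is a toy classifier in Python. It is an assumption-heavy sketch: a tiny word-count naive Bayes model trained on four made-up sentences, which is almost certainly far simpler than whatever a commercial detector uses. The labels, training sentences, and function names are all hypothetical.

```python
import math
from collections import Counter

def train(samples):
    """Count words per label from a list of (text, label) pairs."""
    word_counts = {}
    label_counts = Counter()
    for text, label in samples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts

def probability(text, label, model):
    """Estimate P(label | text) with naive Bayes and add-one smoothing."""
    word_counts, label_counts = model
    vocab = set().union(*word_counts.values())
    n_samples = sum(label_counts.values())
    log_scores = {}
    for lab, counts in word_counts.items():
        total = sum(counts.values())
        score = math.log(label_counts[lab] / n_samples)
        for word in text.lower().split():
            # The extra +1 in the denominator reserves a slot for unseen words.
            score += math.log((counts[word] + 1) / (total + len(vocab) + 1))
        log_scores[lab] = score
    # Convert log scores to probabilities that sum to 1 across labels.
    top = max(log_scores.values())
    weights = {lab: math.exp(s - top) for lab, s in log_scores.items()}
    return weights[label] / sum(weights.values())

# Made-up training data; a real system would use huge volumes of text.
samples = [
    ("as an ai language model i cannot", "ai"),
    ("in conclusion it is important to note that", "ai"),
    ("lol i dunno my dog ate it", "human"),
    ("ugh i forgot the deadline again", "human"),
]
model = train(samples)
score = probability("it is important to note", "ai", model)
print(f"appears {score:.0%} likely to be AI generated")
```

The output is a single number. There is no source document to point to, and even in this toy version the only "explanation" is that some words showed up more often in one pile of training text than the other.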

A lot of universities are choosing not to use the machine learning-based checker for exactly that reason: it’s not proof of wrongdoing, it’s a suggestion.

Anonymous 0 Comments

>Specifically, how does it know whether any part of the text copied is from some AI platform?

It guesses. The exact details on how it makes the guess are secret. It is very often wrong.

>How do they know which sites I’ve copied text from?

They search the internet. It’s no different from your teacher seeing a suspicious phrase and copy/pasting it into Google.