One small bit of terminology that will help: “string” is the term linguists use for a sequence of words.
The shorter a string is, the less likely it is to be unique. Think about a two-word string. Virtually every noun in English is going to show up in a two-word string of the form “the NOUN”. A plagiarism checker isn’t even going to bother trying to spot those, because there would be so many similarities with so many sources that it would be pointless. But the longer a string is, the more likely it is to be unique. Let’s take a completely innocuous sentence: “Reddit is very fun”. This is a four-word string (a 4-string), and if we do a Google search for it, we end up with ~5,070 results. Now let’s add just a little bit more to it: “I think Reddit is very fun”. Just adding those two words to the beginning of the sentence drops the results from ~5,070 to 0.
It is actually fairly easy to search for similarities in strings of words, and what plagiarism checkers basically do is compare the strings of words in a document to sources that are available online, as well as to other sources in their database of student papers.
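To make the idea concrete, here is a minimal Python sketch of comparing strings of words between two texts. It is only an illustration of the general idea, not how Turnitin or any other real checker is implemented. The function names and both example sentences are made up for this example.

```python
import re

def word_ngrams(text, n):
    """Return the set of n-word strings (as tuples) found in text."""
    # Lowercase and keep only word-like tokens, so punctuation is ignored.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_strings(document, source, n):
    """Return the n-word strings that appear in both texts."""
    return word_ngrams(document, n) & word_ngrams(source, n)

# Hypothetical texts, invented for illustration only.
student = "Honestly, I think Reddit is very fun most days."
source = "My brother says Reddit is very fun and I agree."

print(shared_strings(student, source, 2))  # three short strings match
print(shared_strings(student, source, 4))  # only "reddit is very fun" matches
print(shared_strings(student, source, 6))  # nothing matches
```

The longer the string you ask for, the fewer accidental matches you get, which is the same pattern as the search results above.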
A couple of things here. I won’t speak for all plagiarism checkers, but Turnitin doesn’t identify something as plagiarism or not. It checks for similarities, and leaves the identification of plagiarism up to the person reading the report. It will often flag direct quotations as being similar, and the person reading the report will recognize based on the context that they are quotations and not plagiarism. Sometimes it will note similarities that are incidental. These are more common with shorter sentences, but can also happen with longer ones if the sentence contains a phrase that has to occur in a particular way, or if it states a bit of factual information where there honestly aren’t that many ways of phrasing it. For instance, a source discussing the Patient Protection and Affordable Care Act is already going to have similarities to a bunch of other sources just by virtue of that name, and sentences containing those kinds of phrases might be more likely to be flagged erroneously.
One thing to note is that plagiarism detectors aren’t perfect. They don’t catch all plagiarism, and just because they detect a similarity, it doesn’t mean that it is plagiarism. However, an identical passage that is two or more substantive sentences long is almost always going to be plagiarism. There are sometimes ways to avoid detection by plagiarism checkers. For instance, students will sometimes substitute synonyms for some of the words or will delete or change some parts of the sentence. But to avoid notice, the number of words that have to be changed is typically so large that the meaning of the passage is altered in a way that may not be apparent to students but often is apparent to professors. Also, importantly, this is still a form of plagiarism.
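Here is a rough Python sketch of why swapping out a word or two doesn’t hide much. It measures what fraction of a copy’s three-word strings also appear in the source. This is an assumption about how one might measure overlap, not a description of any real checker, and the function names and sentences are invented for the example.

```python
def ngrams(words, n):
    """Return the set of n-word strings (as tuples) in a list of words."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_fraction(copy, source, n=3):
    """Fraction of the copy's n-word strings that also occur in the source."""
    copy_grams = ngrams(copy.lower().split(), n)
    source_grams = ngrams(source.lower().split(), n)
    if not copy_grams:
        return 0.0
    return len(copy_grams & source_grams) / len(copy_grams)

# Hypothetical sentences, invented for illustration only.
source = "the law expanded health insurance coverage to millions of uninsured americans"
light_edit = "the law expanded health insurance coverage to millions of uninsured citizens"
heavy_edit = "the statute broadened medical insurance access to many uninsured citizens"

print(overlap_fraction(source, source))      # verbatim copy: everything matches
print(overlap_fraction(light_edit, source))  # one synonym: most strings still match
print(overlap_fraction(heavy_edit, source))  # many changes: no three-word string survives
```

Changing one word still leaves most of the strings intact. Getting the overlap down to nothing takes changing so many words that the sentence no longer says quite the same thing.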