How do anti-plagiarism detectors and programs work to find exact wordings from other sources and determine that they aren’t just coincidence?

9 Answers

Anonymous 0 Comments

They do detect coincidences. That effect can be reduced by looking for larger word blocks. The code can exclude the most common conflicts among the background documents, and consider them to be coincidences. These tools still require some thought to interpret.
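
To make the "exclude the most common conflicts" idea concrete, here is a minimal sketch. It assumes the detector treats a phrase as "common" when it shows up in at least `min_docs` background documents; the function names, `n=3`, and `min_docs=2` are all made up for illustration and are not taken from any real product.

```python
from collections import Counter

def word_ngrams(text, n):
    """Return the set of n-word sequences in text (lowercased, split on whitespace)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def common_ngrams(background_docs, n, min_docs):
    """N-grams that appear in at least min_docs of the background documents."""
    doc_freq = Counter()
    for doc in background_docs:
        # word_ngrams returns a set, so each document counts a phrase only once
        doc_freq.update(word_ngrams(doc, n))
    return {gram for gram, count in doc_freq.items() if count >= min_docs}

def suspicious_matches(submission, source, background_docs, n=3, min_docs=2):
    """Phrases shared by submission and source, minus phrases that are common everywhere."""
    shared = word_ngrams(submission, n) & word_ngrams(source, n)
    return shared - common_ngrams(background_docs, n, min_docs)
```

A stock phrase like "on the other hand" gets dropped as a coincidence if enough background documents contain it, while a rarer shared phrase is kept for a human to look at.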

Anonymous 0 Comments

They don’t. These detectors never take into account that there are only so many ways to phrase the same thought without looking like you perused a thesaurus for oblique phraseology.

Anonymous 0 Comments

It is VERY rare for a single sentence to be repeated by random chance. The odds of an entire paragraph or page matching by chance are effectively zero.

To give you some context: I’ve written several 100,000-word books. Often I’ll think to myself “Oh, I need to edit this section or that section” and I’ll use the search feature. If I can remember one or two words from the section I want to find, then there are never more than 3 hits and usually it’s just 1. If you want, you can test it out with an online Bible. If you remember any few words, you can usually immediately find the Bible verse you were thinking of.

Plagiarism software shouldn’t be (and almost never is) used to try to find one sentence in a multi-thousand-word essay. It looks for patterns of behavior where big blocks of text have been taken from a handful of sources. And that just never happens honestly.

Anonymous 0 Comments

They don’t, and they commonly match things that may have one or two word differences, to stop students from copying and then changing a few words. They also don’t recognize if you cite the source, so if you are writing a quote then it will flag that as plagiarism. That’s why the teachers or TAs have to double-check the report the system spits out to check for actual plagiarism. I straight up had a teacher say they only looked at reports that had above a certain plagiarism percentage.
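
Matching sentences that differ by only a word or two can be sketched with Python's standard `difflib`. This is only an assumption about how such matching could work, not how any particular checker does it; `word_similarity`, `near_matches`, and the `0.8` threshold are hypothetical.

```python
from difflib import SequenceMatcher

def word_similarity(a, b):
    """Similarity in [0, 1] between two sentences, compared word by word."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()

def near_matches(submission_sentences, source_sentences, threshold=0.8):
    """Return (submission sentence, source sentence, score) for pairs at or above threshold."""
    hits = []
    for s in submission_sentences:
        for t in source_sentences:
            score = word_similarity(s, t)
            if score >= threshold:
                hits.append((s, t, round(score, 2)))
    return hits
```

Swapping "jumps" for "leaps" in an otherwise identical nine-word sentence still scores well above the threshold, so the pair is reported. Note that nothing here knows about quotation marks or citations, which fits the point above that a human still has to read the report.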

Anonymous 0 Comments

It has been 15 years, but my wife took a few sentences that I typed off the top of my head on a technology subject, completely original sentences, and the plagiarism detector flagged those sentences alone as not fitting into the rest of her document. I believe the system compiles the wording a person uses on average, and the tone and sentence structure were different enough that they were flagged. She wasn’t penalized since there was no plagiarism.

Anonymous 0 Comments

The system is able to detect commonly quoted authors, definitions, etc. People will literally rip off entire sections from source materials and put them word for word in their papers. Put quotes around it and cite your work and you are totally fine. If you are writing about abortion, many sentences will be very similar across the spectrum because the subject has been talked to death. A similar sentence is OK, but exactly the same is very suspect. It doesn’t report a ‘pass/fail’; it gives the professor a percentage of similarity, and it is up to them to decide whether you plagiarized.

I used Turnitin for some philosophy classes I took a few years ago. We were talking about very well-trodden subjects, my arguments were not particularly unique or insightful, and my papers were ~5% similar to other papers. When I have talked to professors who have nailed people for plagiarism, they have told me whole paragraphs were copied wholesale.

They can tell when you were just being too close to the source material. The more you read another author, the more you will adopt their way of talking/writing, and it is a skill to take their ideas and describe them from your perspective. Depending on who is teaching you, you might not get nailed for plagiarism but you may get a bad grade on the paper. When you are talking about the difference between a graduate school writer and an undergrad writer, this skill is one of the defining factors between those two cohorts of students.

Anonymous 0 Comments

One small bit of terminology that will help: string is the term linguists use for a sequence of words.

The shorter a string is, the less likely it is to be unique. Think about a two-word string. Virtually every noun in English is going to exist in a two-word string “the NOUN”. A plagiarism checker isn’t going to even bother trying to spot those things, because there will be so many similarities with so many sources that it would be pointless. But the longer a string is, the more likely it is to be unique. Let’s take a completely innocuous sentence: “Reddit is very fun”. This is a 4-string, and if we do a Google search for it, we end up with ~5,070 results. Now let’s add just a little bit more to it: “I think Reddit is very fun”. Just adding those two words to the beginning of the sentence drops the results from ~5,070 to 0. It is actually fairly easy to search for similarities in strings of words, and what plagiarism checkers basically do is compare sequences of strings in a document to sources that are available online as well as to other sources that are available in their database of student papers.
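
The string comparison described above can be sketched in a few lines. This is a toy version under simple assumptions (lowercasing, splitting on whitespace, no punctuation handling), and `word_ngrams` and `shared_ngrams` are names invented for the example, not part of any real checker.

```python
def word_ngrams(text, n):
    """Return the set of n-word strings in text (lowercased, split on whitespace)."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_ngrams(submission, source, n):
    """N-word strings that occur in both texts."""
    return word_ngrams(submission, n) & word_ngrams(source, n)

submission = "I think Reddit is very fun and I visit it daily"
source = "Everyone agrees that Reddit is very fun"

# Short strings overlap easily; long strings rarely do unless text was copied.
short_hits = shared_ngrams(submission, source, 2)
long_hits = shared_ngrams(submission, source, 6)
```

With these two sentences, the two-word strings overlap in a few places ("reddit is", "is very", "very fun"), while no six-word string is shared, which is the same effect as the search-result counts dropping when the sentence gets longer.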

A couple of things here. I won’t speak for all plagiarism checkers, but Turnitin doesn’t identify something as plagiarism or not. It checks for similarities, and leaves the identification of plagiarism up to the person reading the report. It will often identify direct quotations as being similar, and the person reading the report will recognize based on the context that they are quotations and not plagiarism. Sometimes it will note similarities that are incidental. They are more common with shorter sentences, but can also happen for longer ones if the sentence contains a phrase that has to occur in a particular way or that is about a bit of factual information where there aren’t honestly that many ways of phrasing the information. For instance, a source discussing the Patient Protection and Affordable Care Act is already going to have similarities to a bunch of other sources just by virtue of its name, and sentences containing those kinds of phrases might be more likely to be flagged erroneously.

One thing to note is that plagiarism detectors aren’t perfect. They don’t catch all plagiarism, and just because they detect a similarity, it doesn’t mean that it is plagiarism. However, an identical passage that is two or more substantive sentences long is almost always going to be plagiarism. There are sometimes ways to avoid detection by plagiarism checkers. For instance, students will sometimes substitute synonyms for some of the words or will delete or change some parts of the sentence. But to avoid notice, the number of words that have to be changed is typically so large that the meaning of the passage is altered in a way that may not be apparent to students but often is apparent to professors. Also, importantly, this is still a form of plagiarism.

Anonymous 0 Comments

It’s all statistics. At a certain point, a plagiarised work will have far more matching sentences, phrases, and paragraphs than non-plagiarised work. And the number doesn’t even have to be very high. The probability of an entire paragraph matching by pure coincidence is next to zero. The worst offenders will copy far more than one paragraph.

In a 3-page paper, a whole sentence might be a carbon copy and be a coincidence. 10 sentences??? Better be 10 cited quotations.
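
A rough way to see the "one sentence vs. 10 sentences" point is a toy binomial model. Everything here is an assumption for illustration: that sentences match independently, and that each one has the same small chance `p` of matching something by coincidence. Any value you plug in for `p` is made up, not measured.

```python
from math import comb

def prob_at_least(k, n, p):
    """Probability that at least k of n sentences match by chance.

    Toy binomial model: each sentence matches independently with probability p.
    """
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical numbers: a 60-sentence paper, 1% chance per sentence.
one_match = prob_at_least(1, 60, 0.01)
ten_matches = prob_at_least(10, 60, 0.01)
```

Under those made-up numbers, one coincidental match is quite plausible, while ten is many orders of magnitude less likely, which is why a pile of matches is treated so differently from a single one.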

Anonymous 0 Comments

Back when I was a TA we used a method where the student would submit the paper through the plagiarism detector. The detector would scan the paper and highlight any sections of text that matched other sections of text it had found on the internet or in other papers.

Determining whether or not something was coincidence was up to us, but I mean it’s pretty obvious when someone just took a whole paragraph from Wikipedia or their page is the same as someone else’s or they took a chunk from some paper they found online.
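
For anyone curious what "highlight the matching sections" might look like in code, here is a minimal sketch. It assumes matching on runs of `n` consecutive words and marks matched words with `[[ ]]`; the function names, the marker, and `n=5` are hypothetical, and real tools are certainly more sophisticated.

```python
def matched_word_flags(submission, sources, n=5):
    """Flag each word of the submission that lies inside an n-word run found in any source."""
    words = submission.lower().split()
    source_grams = set()
    for src in sources:
        sw = src.lower().split()
        source_grams.update(tuple(sw[i:i + n]) for i in range(len(sw) - n + 1))
    flags = [False] * len(words)
    for i in range(len(words) - n + 1):
        if tuple(words[i:i + n]) in source_grams:
            for j in range(i, i + n):
                flags[j] = True
    return flags

def similarity_percent(flags):
    """Share of flagged words, as a percentage of all words in the submission."""
    return 100.0 * sum(flags) / len(flags) if flags else 0.0

def highlight(submission, flags):
    """Wrap each flagged word in [[ ]] so a reader can see the matched stretch."""
    return " ".join(f"[[{w}]]" if f else w for w, f in zip(submission.split(), flags))
```

The output is just highlighted text plus a percentage. Deciding whether a highlighted stretch is a quotation, a coincidence, or copying is still left to the person reading it.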