It all started in 2018 with research…
As part of one of my personal projects, on 12/02/2018 I tested almost all of the online plagiarism checkers that existed at the time, using either free access or a trial account.
I picked a text at random and confirmed that a Google search for it led to the site where the original is posted. It served as the benchmark text for my research. In a copy of this text I then replaced just 7 Latin characters (i, a, e, o, p, c, x) with visually similar ones (і, а, е, о, р, с, х) from the Ukrainian keyboard layout. This copy served as the test text.
After that, I ran both the benchmark text and the test text through each of the 11 online plagiarism detection tools.
They all found plagiarism in the benchmark text. None of them found plagiarism in the test text, and only one tool reported a suspiciously large number of non-Latin characters.
Vulnerability status for tools in my research from 12/02/2018
…and continued in 2021
On 07/27/2021, after reading the article about Grammarly’s bug bounty program, I re-checked the plagiarism detection services. The check showed that the problem had not been fixed in almost 3 years and still existed in most of the services on the list.
New players have also appeared in this market. Many of them have the issue I found, but some have either already fixed it or never had it.
Vulnerability status for tools in my research from 07/27/2021 to 08/18/2021
You can read my detailed research at this link: https://docs.google.com/document/d/1gckTyg_5tGg3lyMjcl0Nta6P1P9-zSYKDqAfZugl5Mc/edit?usp=sharing
What the hack?
Replacing the characters i, a, e, o, p, c, x in a text with their look-alikes from the Ukrainian keyboard layout causes plagiarism detectors (the Grammarly plagiarism checker and others) to let the text through: they mark it as unique (free of plagiarism) and do not even signal that characters have been replaced with Cyrillic ones. The text with replaced characters looks natural and is just as readable as the original.
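To see why the two texts are different for a machine even though they look the same to a reader, here is a minimal Python sketch. It only uses the seven character pairs listed in this article; the sample word is my own illustration, and the sketch says nothing about how any particular checker works internally.

```python
import unicodedata

# The seven pairs from this article: (Latin letter, Cyrillic look-alike).
# Escapes are used so the two scripts cannot be confused in the source code.
PAIRS = [
    ("a", "\u0430"), ("c", "\u0441"), ("e", "\u0435"), ("i", "\u0456"),
    ("o", "\u043e"), ("p", "\u0440"), ("x", "\u0445"),
]

# Print the code point and official Unicode name of each character.
for latin, cyrillic in PAIRS:
    print(f"{latin} U+{ord(latin):04X} {unicodedata.name(latin)}")
    print(f"{cyrillic} U+{ord(cyrillic):04X} {unicodedata.name(cyrillic)}")

# "pie" typed in Latin vs. the same word built from the Cyrillic look-alikes.
latin_word = "pie"
cyrillic_word = "\u0440\u0456\u0435"
print(latin_word == cyrillic_word)  # False: no code point is shared
```

Any comparison that works on exact characters or words treats the two strings as unrelated, which is consistent with the behaviour I observed.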
Steps To Reproduce (with an example for Grammarly plagiarism checker):
- Take a sample text that has been on the Internet for a long time (“benchmark text”) and whose source URL is easy to find with a Google search.
- In the “benchmark text”, replace the following characters as listed here to get the “test text” (the codes are Unicode code points, as shown in the Windows-1251 character set table https://en.wikipedia.org/wiki/Windows-1251): a (0061) → а (0430), c (0063) → с (0441), e (0065) → е (0435), i (0069) → і (0456), o (006F) → о (043E), p (0070) → р (0440), x (0078) → х (0445)
- Go to the url https://www.grammarly.com/plagiarism-checker
- Paste the “benchmark text” into the text box and press the “Scan for plagiarism” button
- You will receive a report stating that significant plagiarism was found
- Go to the url https://www.grammarly.com/plagiarism-checker again
- Paste the “test text” into the text box and press the “Scan for plagiarism” button
- You will receive a report stating that no plagiarism was found.
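The replacement step above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the helper name `make_test_text` and the sample sentence are hypothetical, and only the seven lowercase letters from the list are replaced.

```python
# Mapping from the reproduction steps: Latin letter -> Cyrillic look-alike.
HOMOGLYPHS = str.maketrans({
    "a": "\u0430",
    "c": "\u0441",
    "e": "\u0435",
    "i": "\u0456",
    "o": "\u043e",
    "p": "\u0440",
    "x": "\u0445",
})


def make_test_text(benchmark_text: str) -> str:
    """Hypothetical helper: build the "test text" from the "benchmark text"."""
    return benchmark_text.translate(HOMOGLYPHS)


# Made-up sample; in the real steps this is text already published online.
benchmark = "a simple explanation of plagiarism checks"
test = make_test_text(benchmark)

# Count how many characters were actually swapped.
changed = sum(1 for b, t in zip(benchmark, test) if b != t)
print(test)
print(f"{changed} of {len(benchmark)} characters replaced")
```

The output has the same length as the input and looks the same in most fonts, but every replaced position now holds a Cyrillic code point.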
What about impact?
My guess is that the business end users of the product (companies that pay for plagiarism checks) are the ones who could be affected by this vulnerability. Suppose a well-known company collects user reviews on its website, rewards users for them and relies on a vulnerable plagiarism checker. An attacker can then easily bypass the check using the method above. Through a set of fake accounts, the attacker can upload reviews simply copied from other sites, or even from the same site, without spending any resources on making the content unique, and the plagiarism checker will not flag them and will report 100% unique content. A company that uses such a vulnerable plagiarism checker therefore faces the following negative impacts:
- mass publication of plagiarized content on its own site
- reward costs paid out for fake reviews
- visibly duplicated reviews (the site may contain both the original reviews and the copies with replaced characters)
- reputational damage if the authors of the original reviews notice and report it to the press, which is eager for sensational stories about well-known companies
- a drop in SEO rankings, since Google demotes such content and the entire site also receives a penalty
Another example is freelance marketplaces or services that rely on such vulnerable plagiarism checkers, where customers can order an article, an advertising post, a scientific paper, an essay, etc. In this case:
- the attacker simply copies someone else’s work or composes a source text from several works
- the attacker replaces characters using the described method
- the attacker submits the result to the service, which checks it with a vulnerable plagiarism checker
- the checker says that everything is clean (!)
- having passed the plagiarism check, the work is sent to the customer
- the customer pays and receives 100% plagiarism
- this plagiarism is then posted on the website
- the customer suffers all the negative impacts I described above
Either way, the described method breaks the plagiarism check and corrupts the results it returns, while the text with replaced characters is visually indistinguishable from the original (provided the website uses a font in which the listed Cyrillic characters (а, о, і, е, х, с, р) look the same as their Latin counterparts).
Examples of vulnerability
https://www.grammarly.com/plagiarism-checker
Original text
Text with replaced chars
https://copyleaks.com/
Original text
Text with replaced chars
Tools that are still vulnerable to character replacement
- https://www.grammarly.com/plagiarism-checker
- https://copyleaks.com/
- https://www.plagium.com/en/plagiarismchecker
- https://www.quetext.com/
- https://www.duplichecker.com/
- https://www.plagiarismchecker.co/
- https://plagiarismsearch.com/
- https://plagiarismcheckerx.com/
- https://writer.com/plagiarism-checker/
- https://myassignmenthelp.com/plagiarism-checker.php
- https://1text.com/plagiarismchecker
Tools that are not vulnerable to character replacement (or have already fixed it)
- https://www.plagscan.com/
- https://unicheck.com/
- https://pltext.com/
- https://onlineplagiarismchecker.net/
- https://plagiarismcheck.org/
How to fix it?
I’ve found 3 good examples of how to notify users about text with replaced characters. These services are not vulnerable to my method, and I think the approaches described below are good practice for fixing the issue in the services that are still vulnerable.
1. Unicheck
The service shows both 100% plagiarism and symbols that have been replaced according to the method I described.
2. Plagscan
The service does not show plagiarism, but it shows a warning about mixed alphabets from different writing systems and highlights the characters that have been replaced by the method I described.
3. Plagiarismcheck
The service notifies users about text with replaced characters. Its report also contains only the text with the non-Latin characters cut out.
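I do not know how these three services are implemented, but the two ideas they show (warn about mixed alphabets, and check the text with the look-alikes mapped back) can be sketched in Python. Everything here is an assumption for illustration: the function names are hypothetical, the script detection relies on Unicode character names, and the reverse mapping covers only the seven characters from this article.

```python
import re
import unicodedata

# Reverse mapping for the seven look-alikes from this article only.
TO_LATIN = str.maketrans({
    "\u0430": "a", "\u0441": "c", "\u0435": "e", "\u0456": "i",
    "\u043e": "o", "\u0440": "p", "\u0445": "x",
})


def script_of(ch: str) -> str:
    """Rough script label derived from the Unicode character name."""
    name = unicodedata.name(ch, "")
    if name.startswith("LATIN"):
        return "latin"
    if name.startswith("CYRILLIC"):
        return "cyrillic"
    return "other"


def mixed_script_words(text: str) -> list[str]:
    """Return the words that contain both Latin and Cyrillic letters."""
    suspicious = []
    for word in re.findall(r"\w+", text):
        scripts = {script_of(ch) for ch in word if ch.isalpha()}
        if {"latin", "cyrillic"} <= scripts:
            suspicious.append(word)
    return suspicious


def normalize(text: str) -> str:
    """Map the seven Cyrillic look-alikes back to Latin before checking."""
    return text.translate(TO_LATIN)


# "plagiarism" with p, a and i replaced by their Cyrillic look-alikes.
sample = "\u0440l\u0430g\u0456\u0430r\u0456sm check"
print(mixed_script_words(sample))
print(normalize(sample))  # plagiarism check
```

A word built entirely from replaced characters is not caught by the per-word check, so a real service would likely also look at the mix of scripts across the whole document. Mapping characters back to Latin would also damage genuine Cyrillic text, so it is safer to run the plagiarism check on the normalized copy in addition to the original than to replace the original with it.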
Grammarly plagiarism checker case on the HackerOne platform
As I wrote above, on 07/27/21, after reading the article about Grammarly’s bug bounty program, I remembered my research from 2018, reproduced it and submitted the case to Grammarly’s program on the HackerOne platform (https://hackerone.com/grammarly/). Almost a month later, I received an official response from the Grammarly team.
The responsible team decided not to track this report as a security or major product issue. According to the team’s response, the described behaviour is a known limitation.
I did not receive any reward from Grammarly for this business-logic vulnerability because, in their opinion:
The described behavior is a functionality limitation that does not affect the security and privacy of our product
OK, I didn’t dispute their decision. I simply requested permission to publish my report and received it officially:
You can share the details of your research publicly since we’ve decided not to take immediate action on this report.
On October 28, a representative of the Grammarly team disclosed the report, so now I can, with a clear conscience, publish my research in detail.
Here is the link to my report on HackerOne: https://hackerone.com/reports/1282282
Conclusion
I still think that my report is a pretty interesting case and will be useful to many IT people and companies in the plagiarism-checking market.
The impact I have described is, for now, just my assumption. I simply want to warn the sites that could become victims of attackers using the method I described.
I will be glad to see this issue fixed on the sites I mentioned above, and I will gladly update their status in my article as soon as they account for the bypass I described.