Our Measures Against Plagiarism


What sets passionate programmers apart from those who resort to plagiarism in coding competitions? We believe it is the love of programming. CodeChef is a product of this love, and the collective sentiment of the community has made the platform what it is today.

Plagiarism is an unfortunate by-product of almost any online contest, and ours has been no exception. Plagiarising on a platform built to strengthen one's programming skills is simply counterproductive in the long run.

Plagiarists have been around almost since the beginning, but we only started taking measures against them a few years in. Over the last several months, however, we have seen a significantly larger number of plagiarism cases, partly because the number of participants has grown significantly and partly because of our inaction at a particular penalisation step.

A typical short contest these days gets about 50,000 – 90,000 submissions, while a long challenge gets about 5–6 lakh. Not an easy number to weed out plagiarism from, but we do try our best. For the community's understanding, these are the steps we follow:

  • All submissions are run through MOSS (a plagiarism-checker tool), which flags pairs of submissions along with their code-similarity percentage.
  • We then manually review these pairs to identify the right parameters for finding similar-looking solutions.
  • Once identified, we disqualify those submissions, and as a result, the corresponding users go down the contest rank list.
  • However, we do allow them to make an appeal for their case, informing them of the same via email. 
  • If they reply and we are convinced that they were wrongly accused, their disqualification is revoked. The onus lies with them to prove their innocence.
  • If found guilty, we drop their ratings by 275 points.
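The steps above can be sketched as a small pipeline. This is only an illustrative sketch, not CodeChef's actual tooling; all function names, thresholds, and data shapes are hypothetical:

```python
# Hypothetical sketch of the plagiarism-handling pipeline described above.
# Names, thresholds, and structures are illustrative, not CodeChef's tooling.

RATING_PENALTY = 275  # rating drop applied to a confirmed case

def review_moss_pairs(pairs, threshold=0.85):
    """Step 1-2: keep only MOSS pairs above a similarity threshold,
    so that manual review focuses on the most suspicious cases."""
    return [p for p in pairs if p["similarity"] >= threshold]

def process_case(case, appeal_accepted):
    """Steps 3-6: resolve one flagged case after its appeal window.
    A successful appeal revokes disqualification; otherwise the user
    is disqualified and takes the rating penalty."""
    if appeal_accepted:
        return {"user": case["user"], "disqualified": False, "rating_delta": 0}
    return {"user": case["user"], "disqualified": True,
            "rating_delta": -RATING_PENALTY}
```

In this sketch, the manual-review and appeal stages are modelled only as inputs (`threshold`, `appeal_accepted`), since those decisions are made by humans.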

The whole process takes a tremendous amount of our time, and we have tried to be as fair as we can. It requires manual scrutiny: we try to make sure that no false positives slip through, and we therefore facilitate an extensive appeal process.

Hence, while most of the above steps have been implemented, we have not been able to execute the last step since August 2018. We understand that this has encouraged many more participants to take the system lightly. However, we want to assure you that we have not been casual about this. Being a small team, we simply could not prioritise it over the many other things we have been doing.

Given how dire the situation has become, we sincerely apologise and will make sure it does not continue. Going forward, we will also take a proactive approach against users who are caught plagiarising.

We want to inform the entire community that we are currently in the process of dropping ratings for all users who have been found to have plagiarised in any contest between [AUG18] and [FEB20]. The penalisation for contests after FEB20 will be done in the coming months. As mentioned above, the whole process takes a lot of time, and we will also try to come up with ways in which the community can help us speed it up.

For future contests, we will continue to punish users engaging in plagiarism with a 275-point rating drop. In addition, we will ban users who are caught plagiarising three times, with the count starting from AUG20.
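The stated policy (a 275-point drop per confirmed offence, plus a ban on the third offence) could be expressed as, for example:

```python
# Hypothetical sketch of the stated penalty policy: a 275-point
# rating drop per confirmed offence, and a ban once a user reaches
# three offences (counted from AUG 2020 onward). Illustrative only.

RATING_PENALTY = 275
BAN_THRESHOLD = 3

def apply_penalty(rating, prior_offences):
    """Return (new_rating, banned) after one more confirmed offence,
    where prior_offences counts earlier offences since AUG20."""
    offences = prior_offences + 1
    return rating - RATING_PENALTY, offences >= BAN_THRESHOLD
```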

We want to reiterate that while taking reference from study materials and published solutions is acceptable, ripping off someone else's solution isn't in the best spirit of the competition.

If you need to report any cases of plagiarism, you can always do so here.

We hope that these measures will help address a long pending issue at hand. 

Happy Coding.
The Chef


12 Replies to “Our Measures Against Plagiarism”

  1. I am just curious about the number of cases where MOSS detects a similarity and manual checking happens, and what percentage of those turn out to be false positives. If the false-positive rate is low, then instead of the manual review in the second step, perhaps you could simply email users whenever the similarity is sufficiently high and let them appeal in their defence. Also, if possible, you could publish the similarity scores publicly and reward users with x Laddus per plagiarism detection, which may further reduce manual checks and speed up the process.

    1. Hi, Rathi_22.
      The number of MOSS cases we receive is in the thousands. Many are false positives for various reasons: a similar (or the same) code template, code taken from an online source (e.g. GeeksforGeeks, cp-algorithms) that was published before the contest started, short codes, similar coding styles, etc. This is why manual checking is unavoidable. Even if we published the similarity scores, there would simply be too many requests; that would slow down the process instead of speeding it up, since we'd then have to check each request individually. The volume could even exceed the number of MOSS cases and would involve many complications.

      1. When I mentioned publishing similarity scores, I was talking about a system similar to CSGO Overwatch (I came across this idea in ‘Discuss’). Basically, you set an experience threshold (say 2 years on CC and an 1800+ rating, or some other criteria) and mark some users as ‘trusted’. For each plagiarism case, suppose 7 trusted users give their verdict (Y/N on whether plagiarism has happened) and you conclude the case based on the majority verdict. Trusted users can also be weighted by the percentage of cases where their verdict matched the majority. To begin with, the current team which handles this can establish themselves as trusted users and incentivise more people to join by rewarding users with some Laddus every time their verdict is deemed correct.

        1. The idea here is that this will enable you guys to outsource a part of the work to users who volunteer to fight against plagiarism, without your team personally having to go through all the cases.
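The scheme proposed in this thread could be sketched roughly like this (a hypothetical weighted majority vote; the weight-update rule and all numbers are illustrative, not anything CodeChef has committed to):

```python
# Hypothetical weighted majority vote among "trusted" reviewers.
# Each vote is True (plagiarism) or False, weighted by the reviewer's
# historical agreement with past majority verdicts. Illustrative only.

def weighted_verdict(votes):
    """votes: list of (is_plagiarism, weight) pairs.
    Returns True if the weighted 'plagiarism' votes outweigh the rest."""
    yes = sum(w for v, w in votes if v)
    no = sum(w for v, w in votes if not v)
    return yes > no

def update_weight(weight, agreed, step=0.05, lo=0.1, hi=2.0):
    """Nudge a reviewer's weight up if their vote matched the final
    verdict, down otherwise, clamped to the range [lo, hi]."""
    new = weight + step if agreed else weight - step
    return max(lo, min(hi, new))
```

With something like this, the team could seed the trusted pool themselves, as the commenter suggests, and reviewers who consistently agree with final verdicts would gradually gain influence.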

  2. Really, it takes 2 years to drop ratings?
    lol
    so u are saying that if someone cheats today , his rating will not be dropped for years???
    then what action are u taking against cheating??
    looolchef

  3. Every user caught plagiarising was informed beforehand of the punishment they’d receive; they already knew the consequences and the rules. Also, we’re planning to drop the ratings in a way that is fair to users, especially those who cheated in a single contest and have not cheated in many contests since. Ideally it doesn’t take this long to drop ratings, and that’s the reason why we are apologising.

    As for your question, “If someone cheats today, his rating will not be dropped for years???”: we didn’t say that, and the blog explains what actually happens.

    And for the question, “Then what action are u taking against cheating”, well, the whole blog is about our measures against cheaters caught in plagiarism.


  4. SOLUTION to this: after the contest ends, show the codes to the audience and survey whether they are the same or not. If the majority of people who checked the code found it the same, reduce the offenders’ ranks, etc.; and if someone files an appeal, you can check the code manually or have some trusted coders check it.

    1. The MOSS % match should not be published, as this would let plagiarisers check the similarity percentage against the source they are copying from and tweak their code to evade detection.
