https://github.com/cptanalatriste/copycat-detector

A Naive-Bayes classifier for detecting plagiarism.

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: springer.com
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (5.9%) to scientific vocabulary

Keywords

amazon-sagemaker naive-bayes-classifier scikit-learn
Last synced: 10 months ago

Repository

A Naive-Bayes classifier for detecting plagiarism.

Basic Info
  • Host: GitHub
  • Owner: cptanalatriste
  • Language: Jupyter Notebook
  • Default Branch: master
  • Homepage:
  • Size: 593 KB
Statistics
  • Stars: 0
  • Watchers: 2
  • Forks: 0
  • Open Issues: 13
  • Releases: 0
Topics
amazon-sagemaker naive-bayes-classifier scikit-learn
Created about 6 years ago · Last pushed over 3 years ago
Metadata Files
Readme

README.md

copycat-detector

A Naive-Bayes classifier for detecting plagiarism, trained on a dataset of short answers developed by Clough and Stevenson.
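
The README does not say which Naive Bayes variant the classifier uses. As a minimal sketch, assuming each answer has already been reduced to a few numeric similarity features with a binary label (1 for plagiarised, 0 otherwise), a Gaussian Naive Bayes model can be written in plain Python. The class name `GaussianNaiveBayes` and the toy feature values are hypothetical; scikit-learn's `sklearn.naive_bayes.GaussianNB` offers a tuned implementation of the same model.

```python
from __future__ import annotations

import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes for numeric feature vectors (illustrative sketch)."""

    def __init__(self, var_smoothing: float = 1e-9):
        # Absolute value added to every variance so a constant feature never divides by zero.
        self.var_smoothing = var_smoothing

    def fit(self, X: list[list[float]], y: list[int]) -> GaussianNaiveBayes:
        by_class: dict[int, list[list[float]]] = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        self.classes_ = sorted(by_class)
        self.priors_, self.means_, self.vars_ = {}, {}, {}
        for c in self.classes_:
            rows = by_class[c]
            columns = list(zip(*rows))
            means = [sum(col) / len(col) for col in columns]
            variances = [
                sum((v - m) ** 2 for v in col) / len(col) + self.var_smoothing
                for col, m in zip(columns, means)
            ]
            self.priors_[c] = len(rows) / len(X)  # class prior P(c)
            self.means_[c] = means
            self.vars_[c] = variances
        return self

    def _log_posterior(self, row: list[float], c: int) -> float:
        # log P(c) + sum over features of log N(x_j; mean_cj, var_cj), up to a shared constant
        total = math.log(self.priors_[c])
        for x, m, v in zip(row, self.means_[c], self.vars_[c]):
            total += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return total

    def predict(self, X: list[list[float]]) -> list[int]:
        # Pick the class with the highest log posterior for each row.
        return [max(self.classes_, key=lambda c: self._log_posterior(row, c)) for row in X]


# Hypothetical toy features: [containment, normalised LCS]; label 1 = plagiarised.
X = [[0.9, 0.8], [0.85, 0.9], [0.95, 0.75], [0.1, 0.2], [0.2, 0.15], [0.05, 0.3]]
y = [1, 1, 1, 0, 0, 0]
model = GaussianNaiveBayes().fit(X, y)
print(model.predict([[0.8, 0.85], [0.15, 0.1]]))  # → [1, 0]
```

Working in log space keeps the product of many small per-feature likelihoods from underflowing.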

Getting started

To train the classifier, be sure to do the following first:

  1. Clone this repository.
  2. Download a modified version of the dataset.
  3. Place the dataset files in your cloned copy of the repository.
  4. Make sure you have installed all the Python packages defined in requirements.txt.
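
For step 4, a small helper can confirm that the installed packages match the exact `name==version` pins in requirements.txt. This is a sketch built only on the standard library's `importlib.metadata`; the function names are hypothetical and not part of the repository.

```python
from __future__ import annotations

import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def parse_pins(text: str) -> dict[str, str]:
    """Return {package: version} for every 'name==version' line, ignoring blanks and comments."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        match = re.fullmatch(r"([A-Za-z0-9._-]+)\s*==\s*([^\s;]+)", line)
        if match:
            pins[match.group(1)] = match.group(2)
    return pins


def mismatched_packages(pins: dict[str, str]) -> dict[str, Optional[str]]:
    """Map each pinned package whose installed version differs to that version (None if missing)."""
    problems: dict[str, Optional[str]] = {}
    for name, wanted in pins.items():
        try:
            installed: Optional[str] = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            problems[name] = installed
    return problems


def check_requirements(path: str = "requirements.txt") -> dict[str, Optional[str]]:
    """Read a requirements file and report every pin the current environment does not satisfy."""
    with open(path, encoding="utf-8") as handle:
        return mismatched_packages(parse_pins(handle.read()))
```

Calling `check_requirements()` from the root of the cloned repository returns an empty dict when every pin is satisfied.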

Instructions

The feature engineering steps are defined in the 2_Plagiarism_Feature_Engineering.ipynb Jupyter notebook. Most of the code is contained in the copycat_detector module.
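
The README does not list the engineered features. Two similarity measures often used with the Clough and Stevenson corpus are word n-gram containment (the share of an answer's n-grams that also appear in the source text) and the longest common subsequence of words normalised by answer length. The sketch below computes both; whether copycat_detector uses exactly these definitions is an assumption, and the function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def ngrams(words: list[str], n: int) -> Counter:
    """Count the word n-grams (as tuples) in a tokenised text."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def containment(answer: str, source: str, n: int) -> float:
    """Share of the answer's n-grams that also occur in the source (counts clipped by the source)."""
    answer_grams = ngrams(answer.lower().split(), n)
    source_grams = ngrams(source.lower().split(), n)
    total = sum(answer_grams.values())
    if total == 0:
        return 0.0
    shared = sum((answer_grams & source_grams).values())  # & keeps the minimum count per n-gram
    return shared / total


def normalised_lcs(answer: str, source: str) -> float:
    """Longest common subsequence of words, divided by the answer length."""
    a, s = answer.lower().split(), source.lower().split()
    if not a:
        return 0.0
    # prev[j] holds the LCS length of the answer prefix processed so far and s[:j]
    prev = [0] * (len(s) + 1)
    for word in a:
        curr = [0] * (len(s) + 1)
        for j, other in enumerate(s, start=1):
            curr[j] = prev[j - 1] + 1 if word == other else max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1] / len(a)


print(containment("A dog ran home", "the dog ran home quickly", 1))  # → 0.75
print(normalised_lcs("a b c d", "a x c d"))  # → 0.75
```

Both scores fall between 0 and 1, so they can be fed directly to a classifier as numeric features.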

For training, the 3_Training_a_Model.ipynb notebook was run on an Amazon SageMaker instance.
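
The README only states that the training notebook ran on SageMaker. If training used SageMaker script mode (an assumption; the repository pins sagemaker 1.56.1 and scikit-learn 0.22.1), the entry-point script finds its input and output locations through environment variables such as `SM_CHANNEL_TRAIN` and `SM_MODEL_DIR`. This sketch reads them with local fallbacks; the argument names, the `train.csv` file name, and the label-first CSV layout are hypothetical.

```python
from __future__ import annotations

import argparse
import csv
import os


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    """Read paths the way a SageMaker training container exposes them, with local fallbacks."""
    parser = argparse.ArgumentParser()
    # SageMaker sets SM_MODEL_DIR and SM_CHANNEL_TRAIN inside the training container;
    # the fallbacks let the same script run on a laptop.
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "model"))
    parser.add_argument("--data-dir", default=os.environ.get("SM_CHANNEL_TRAIN", "data"))
    parser.add_argument("--train-file", default="train.csv")  # hypothetical file name
    return parser.parse_args(argv)


def load_training_data(path: str) -> tuple[list[list[float]], list[int]]:
    """Load a header-less CSV whose first column is the label (an assumed layout)."""
    features, labels = [], []
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            if row:  # skip blank lines
                labels.append(int(float(row[0])))
                features.append([float(value) for value in row[1:]])
    return features, labels
```

Inside the container, `args = parse_args()` followed by `load_training_data(os.path.join(args.data_dir, args.train_file))` yields feature rows and labels ready for a scikit-learn estimator's `fit`.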

Owner

  • Name: Carlos Gavidia-Calderon
  • Login: cptanalatriste
  • Kind: user
  • Location: London, United Kingdom
  • Company: @alan-turing-institute

Systems engineer by training, software developer by trade. Research Software Engineer at @alan-turing-institute.

Issues and Pull Requests

Last synced: over 1 year ago

All Time
  • Total issues: 0
  • Total pull requests: 26
  • Average time to close issues: N/A
  • Average time to close pull requests: 5 months
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.46
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 26
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Pull Request Authors
  • dependabot[bot] (24)
Top Labels
Pull Request Labels
  • dependencies (24)

Dependencies

requirements.txt (PyPI)
  • Jinja2 ==2.11.1
  • MarkupSafe ==1.1.1
  • Pillow ==7.0.0
  • Pygments ==2.6.1
  • QtPy ==1.9.0
  • Send2Trash ==1.5.0
  • appnope ==0.1.0
  • attrs ==19.3.0
  • backcall ==0.1.0
  • beautifulsoup4 ==4.9.0
  • bleach ==3.1.0
  • boto3 ==1.12.47
  • botocore ==1.15.47
  • certifi ==2020.4.5.1
  • chardet ==3.0.4
  • cycler ==0.10.0
  • decorator ==4.4.2
  • defusedxml ==0.6.0
  • docutils ==0.15.2
  • entrypoints ==0.3
  • future ==0.18.2
  • idna ==2.9
  • importlib-metadata ==1.5.0
  • ipykernel ==5.1.4
  • ipython ==7.13.0
  • ipython-genutils ==0.2.0
  • ipywidgets ==7.5.1
  • jedi ==0.16.0
  • jmespath ==0.9.5
  • joblib ==0.14.1
  • jsonschema ==3.2.0
  • jupyter ==1.0.0
  • jupyter-client ==6.1.2
  • jupyter-console ==6.1.0
  • jupyter-core ==4.6.3
  • kiwisolver ==1.0.1
  • matplotlib ==3.1.3
  • mistune ==0.8.4
  • mkl-fft ==1.0.15
  • mkl-service ==2.3.0
  • nbconvert ==5.6.1
  • nbformat ==5.0.4
  • nltk ==3.4.5
  • notebook ==6.0.3
  • numpy ==1.18.1
  • olefile ==0.46
  • packaging ==20.3
  • pandas ==1.0.3
  • pandocfilters ==1.4.2
  • parso ==0.6.2
  • pexpect ==4.8.0
  • pickleshare ==0.7.5
  • prometheus-client ==0.7.1
  • prompt-toolkit ==3.0.4
  • protobuf ==3.11.3
  • protobuf3-to-dict ==0.1.5
  • ptyprocess ==0.6.0
  • pyparsing ==2.4.6
  • pyrsistent ==0.16.0
  • python-dateutil ==2.8.1
  • pytz ==2019.3
  • pyzmq ==18.1.1
  • qtconsole ==4.7.2
  • requests ==2.23.0
  • requests-toolbelt ==0.9.1
  • s3transfer ==0.3.3
  • sagemaker ==1.56.1
  • scikit-learn ==0.22.1
  • scipy ==1.4.1
  • six ==1.14.0
  • smdebug-rulesconfig ==0.1.2
  • soupsieve ==2.0
  • terminado ==0.8.3
  • testpath ==0.4.4
  • torch ==1.4.0
  • torchvision ==0.5.0
  • tornado ==6.0.4
  • tqdm ==4.45.0
  • traitlets ==4.3.3
  • udacity-pa ==0.2.9
  • urllib3 ==1.25.8
  • wcwidth ==0.1.9
  • webencodings ==0.5.1
  • widgetsnbextension ==3.5.1
  • zipp ==2.2.0