shiny-decision-trees

Shiny app for creating spam filter decision trees

https://github.com/foggalong/shiny-decision-trees

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.3%) to scientific vocabulary

Keywords

decision-trees machine-learning r shiny spam-filtering

Repository

Statistics
  • Stars: 3
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created almost 6 years ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

README.md

Decision Tree Shiny App

Isabella Deutsch and I gave a mini lecture and tutorial on machine learning as part of the Sutton Trust's 2019 and 2020 summer schools. One question on our problem sheet asked students to explore building a decision tree for spam filtering using this Shiny app. The app is currently deployed on ShinyApps.io.

Background

The webapp is written in the R programming language using a toolkit called Shiny to create the interface.

R itself is a reasonably mature language (26 years old at time of writing), well favoured by statisticians across academia and industry for the depth of the tools it provides [1][2]. For example, a function called rpart does a considerable portion of the heavy lifting here in calculating the decision tree. If you do a math degree at university you will no doubt come across R in your statistics courses.
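
As a rough illustration of the kind of call rpart supports, here is a minimal sketch of fitting a classification tree. The toy data frame, its column names (freq_free, freq_meeting) and the labelling rule are all invented for this example; none of it is taken from app.R.

```r
library(rpart)

# Hypothetical toy data: two made-up word-frequency features per email.
set.seed(1)
n <- 200
toy <- data.frame(
  freq_free    = runif(n, 0, 5),
  freq_meeting = runif(n, 0, 5)
)

# Labelling rule invented for illustration: heavy use of "free" means spam.
toy$spam <- factor(ifelse(toy$freq_free > 2.5, "spam", "ham"))

# method = "class" asks rpart for a classification (not regression) tree.
fit <- rpart(spam ~ freq_free + freq_meeting, data = toy, method = "class")

# Predict class labels for the same emails and measure training accuracy.
pred <- predict(fit, toy, type = "class")
accuracy <- mean(pred == toy$spam)

# Printing the fit lists each node with its splitting rule.
print(fit)
```

Because the toy label depends on a single threshold, the tree only needs one split on freq_free; real email data is far noisier and gives deeper trees.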

Shiny by comparison is relatively new; development started about 8 years ago but it's really come into its own over the last couple of years. Creating intuitive, interactive webapps is normally a tricky business and an area of development in its own right. Part of what makes Shiny so appealing is that it strips UI back to a basic selection of customisable widgets, all handled in R. This makes it incredibly easy for statisticians, most of whom have no background in UI or web development, to create and deploy nice-looking applications for others to explore their work.
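
To give a flavour of the widget approach, the sketch below wires one slider to an rpart tree plot. It is not the code from app.R: the toy data, the input and output IDs (depth, tree) and the layout are assumptions made for illustration.

```r
library(shiny)
library(rpart)

# Hypothetical toy data standing in for the spam data set.
set.seed(1)
toy <- data.frame(x1 = runif(200), x2 = runif(200))
toy$label <- factor(ifelse(toy$x1 + toy$x2 > 1, "spam", "ham"))

# UI: a title, one slider widget and a placeholder for the plot.
ui <- fluidPage(
  titlePanel("Toy decision tree"),
  sliderInput("depth", "Maximum tree depth", min = 1, max = 5, value = 2),
  plotOutput("tree")
)

# Server: refit and redraw the tree whenever the slider value changes.
server <- function(input, output) {
  output$tree <- renderPlot({
    fit <- rpart(label ~ x1 + x2, data = toy, method = "class",
                 control = rpart.control(maxdepth = input$depth))
    plot(fit, margin = 0.1)
    text(fit, use.n = TRUE)
  })
}

# Build the app object; runApp(app) would launch it in a browser.
app <- shinyApp(ui, server)
```

Everything here, from the slider to the plot, is declared in R; no HTML, CSS or JavaScript is written by hand.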

Data Set

The app uses the Spambase Data Set from the UCI Machine Learning Repository [3]. A partitioned version of this dataset is included with this repository, but in summary each row is a different email and the columns are as follows:

  • 1 to 48 are "word" (i.e. a sequence of non-whitespace characters) frequencies as percentages of the email body text,
  • 49 to 54 are character frequencies as percentages of the email body text,
  • 55 is the average length of uninterrupted sequences of capital letters,
  • 56 is the length of the longest uninterrupted sequence of capital letters,
  • 57 is the total number of capital letters in the email,
  • 58 is a bool (i.e. true/false) variable as to whether the email is truly spam.
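
The column layout above can be sketched in R as follows. The column names and the read_spam_csv helper are hypothetical, and the helper assumes a headerless, comma-separated file; this README does not say how the files shipped with the repository are formatted, so treat that as an assumption. A two-row temporary file with made-up values stands in for real data.

```r
# Illustrative names for the 58 columns (the names themselves are assumptions).
col_names <- c(
  paste0("word_freq_", 1:48),     # columns 1 to 48: word frequencies
  paste0("char_freq_", 1:6),      # columns 49 to 54: character frequencies
  "capital_run_length_average",   # column 55
  "capital_run_length_longest",   # column 56
  "capital_run_length_total",     # column 57
  "is_spam"                       # column 58: the true label
)

# Hypothetical helper: read a headerless CSV laid out as described above.
read_spam_csv <- function(path) {
  emails <- read.csv(path, header = FALSE, col.names = col_names)
  emails$is_spam <- as.logical(emails$is_spam)
  emails
}

# Demo on a temporary file holding two made-up emails.
tmp <- tempfile(fileext = ".csv")
rows <- rbind(
  c(rep(0.5, 54), 3.2, 15, 120, 1),  # invented spam email
  c(rep(0.0, 54), 1.0,  1,   4, 0)   # invented non-spam email
)
write.table(rows, tmp, sep = ",", row.names = FALSE, col.names = FALSE)
emails <- read_spam_csv(tmp)
str(emails[, 55:58])
```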

There are 4,601 emails, of which 1,813 are spam and 2,788 are not. As mentioned, we've randomly partitioned this into two files: one for training (70% of the entries) and one for testing (the remaining 30%).
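
For a sense of the numbers involved, here is a small sketch of a random 70/30 split over row indices. The seed and the use of floor() for the training size are assumptions; the exact procedure used to produce the files in this repository is not described here.

```r
set.seed(2020)  # arbitrary seed so the split is reproducible

n_emails <- 4601
n_spam   <- 1813

# Share of spam in the full data set. A filter that always answers
# "not spam" would be right on the remaining share of emails.
spam_rate <- n_spam / n_emails
baseline_accuracy <- 1 - spam_rate

# Randomly pick 70% of the row indices for training; the rest are for testing.
train_idx <- sample(n_emails, size = floor(0.7 * n_emails))
test_idx  <- setdiff(seq_len(n_emails), train_idx)

cat("training rows:", length(train_idx), "\n")
cat("testing rows: ", length(test_idx), "\n")
cat("spam rate:    ", round(spam_rate, 3), "\n")
```

The baseline is worth keeping in mind when judging a tree: since roughly 61% of the emails are not spam, any useful filter needs to beat that accuracy on the testing set.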

License

The app.R file is released under an MIT License and provided without warranty.

References

  • [1] Fox, John & Andersen, Robert (2005), "Using the R Statistical Computing Environment to Teach Social Statistics Courses". Department of Sociology, McMaster University.
  • [2] Vance, Ashlee (2009). "Data Analysts Captivated by R's Power". New York Times.
  • [3] Dua, Dheeru & Graff, Casey (2017), "UCI Machine Learning Repository". University of California, Irvine, School of Information and Computer Sciences.

Owner

  • Name: Josh Fogg
  • Login: Foggalong
  • Kind: user
  • Location: Edinburgh, UK
  • Company: University of Edinburgh

Mathematician, Writer, Gamer, and Techie. Not necessarily in that order.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it using these metadata."
title: "Decision Tree Shiny App"
date-released: 2020-06-17
abstract: "A Shiny App built with R to demonstrate the building of decision trees for email spam filtering systems."
authors: 
  - family-names: Fogg
    given-names: Josh
    affiliation: "University of Edinburgh"
  - family-names: Deutsch
    given-names: Isabella
    affiliation: "University of Edinburgh"
version: 1.0.0
license: MIT
repository-code: "https://github.com/Foggalong/shiny-decision-trees"
keywords: 
  - "machine-learning"
  - r
  - shiny
  - "decision-trees"
  - "spam-filtering"
...

GitHub Events

Total
  • Delete event: 1
  • Create event: 1
Last Year
  • Delete event: 1
  • Create event: 1

Issues and Pull Requests

Last synced: over 1 year ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0