phishing-dataset

Phishing dataset with more than 88,000 instances and 111 features. Web application available at https://gregavrbancic.github.io/Phishing-Dataset/

https://github.com/gregavrbancic/phishing-dataset

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (4.8%) to scientific vocabulary

Keywords

dataset machine-learning phishing phishing-websites-detection

Keywords from Contributors

interactive evolutionary-algorithms swarm-intelligence metaheuristics optimization-algorithms nature-inspired-algorithms mesh interpretability sequences generic
Last synced: 7 months ago

Repository

Basic Info
Statistics
  • Stars: 65
  • Watchers: 3
  • Forks: 21
  • Open Issues: 4
  • Releases: 0
Topics
dataset machine-learning phishing phishing-websites-detection
Created almost 7 years ago · Last pushed about 3 years ago
Metadata Files
Readme Citation

README.md

Datasets for Phishing Websites Detection

This repository presents two variants of the phishing dataset.

Web application

To preview the dataset interactively or tailor it to your needs, visit the dedicated web application at https://gregavrbancic.github.io/Phishing-Dataset/

dataset_full.csv

Short description of the full variant dataset:

- Total number of instances: 88,647
- Number of legitimate website instances (labeled as 0): 58,000
- Number of phishing website instances (labeled as 1): 30,647
- Total number of features: 111 (without target)

dataset_small.csv

Short description of the small variant dataset:

- Total number of instances: 58,645
- Number of legitimate website instances (labeled as 0): 27,998
- Number of phishing website instances (labeled as 1): 30,647
- Total number of features: 111 (without target)
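The class counts listed for both variants can be checked after download with a few lines of standard-library Python. The following is a minimal sketch, not part of the repository: the helper name `class_counts` is hypothetical, the sample rows are made up, and the feature headers are copied from the table in this README (the spelling in the actual CSV header may differ). Only the target column name `phishing` and its 0/1 labels are taken from the description.

```python
import csv
import io
from collections import Counter


def class_counts(csv_file, target="phishing"):
    """Count rows per label in an open CSV file that has a header row."""
    reader = csv.DictReader(csv_file)
    return Counter(row[target] for row in reader)


# Tiny in-memory stand-in for dataset_full.csv (made-up rows, two features only).
sample = io.StringIO(
    "qtydoturl,lengthurl,phishing\n"
    "2,25,0\n"
    "4,87,1\n"
    "1,18,0\n"
)
counts = class_counts(sample)
print(counts["0"], counts["1"])  # legitimate vs. phishing rows in the sample
```

For the real files, pass an open file instead, for example `with open("dataset_full.csv", newline="") as f: class_counts(f)`. If the file matches the description above, label 0 should appear 58,000 times and label 1 should appear 30,647 times.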

Extracted Features

| Feature | Description |
|----------------------------|----------------------------------------------------|
| qtydoturl | count (.) in URL |
| qtyhyphenurl | count (-) in URL |
| qtyunderlineurl | count (_) in URL |
| qtyslashurl | count (/) in URL |
| qtyquestionmarkurl | count (?) in URL |
| qtyequalurl | count (=) in URL |
| qtyaturl | count (@) in URL |
| qtyandurl | count (&) in URL |
| qtyexclamationurl | count (!) in URL |
| qtyspaceurl | count ( ) in URL |
| qtytildeurl | count (~) in URL |
| qtycommaurl | count (,) in URL |
| qtyplusurl | count (+) in URL |
| qtyasteriskurl | count (*) in URL |
| qtyhashtagurl | count (#) in URL |
| qtydollarurl | count ($) in URL |
| qtypercenturl | count (%) in URL |
| qtytldurl | top-level-domain length |
| lengthurl | URL length |
| qtydotdomain | count (.) in domain |
| qtyhyphendomain | count (-) in domain |
| qtyunderlinedomain | count (_) in domain |
| qtyslashdomain | count (/) in domain |
| qtyquestionmarkdomain | count (?) in domain |
| qtyequaldomain | count (=) in domain |
| qtyatdomain | count (@) in domain |
| qtyanddomain | count (&) in domain |
| qtyexclamationdomain | count (!) in domain |
| qtyspacedomain | count ( ) in domain |
| qtytildedomain | count (~) in domain |
| qtycommadomain | count (,) in domain |
| qtyplusdomain | count (+) in domain |
| qtyasteriskdomain | count (*) in domain |
| qtyhashtagdomain | count (#) in domain |
| qtydollardomain | count ($) in domain |
| qtypercentdomain | count (%) in domain |
| qtyvowelsdomain | count vowels in domain |
| domainlength | domain length |
| domaininip | URL domain in IP address format |
| serverclientdomain | domain contains the keywords "server" or "client" |
| qtydotdirectory | count (.) in directory |
| qtyhyphendirectory | count (-) in directory |
| qtyunderlinedirectory | count (_) in directory |
| qtyslashdirectory | count (/) in directory |
| qtyquestionmarkdirectory | count (?) in directory |
| qtyequaldirectory | count (=) in directory |
| qtyatdirectory | count (@) in directory |
| qtyanddirectory | count (&) in directory |
| qtyexclamationdirectory | count (!) in directory |
| qtyspacedirectory | count ( ) in directory |
| qtytildedirectory | count (~) in directory |
| qtycommadirectory | count (,) in directory |
| qtyplusdirectory | count (+) in directory |
| qtyasteriskdirectory | count (*) in directory |
| qtyhashtagdirectory | count (#) in directory |
| qtydollardirectory | count ($) in directory |
| qtypercentdirectory | count (%) in directory |
| directorylength | directory length |
| qtydotfile | count (.) in file |
| qtyhyphenfile | count (-) in file |
| qtyunderlinefile | count (_) in file |
| qtyslashfile | count (/) in file |
| qtyquestionmarkfile | count (?) in file |
| qtyequalfile | count (=) in file |
| qtyatfile | count (@) in file |
| qtyandfile | count (&) in file |
| qtyexclamationfile | count (!) in file |
| qtyspacefile | count ( ) in file |
| qtytildefile | count (~) in file |
| qtycommafile | count (,) in file |
| qtyplusfile | count (+) in file |
| qtyasteriskfile | count (*) in file |
| qtyhashtagfile | count (#) in file |
| qtydollarfile | count ($) in file |
| qtypercentfile | count (%) in file |
| filelength | file length |
| qtydotparams | count (.) in parameters |
| qtyhyphenparams | count (-) in parameters |
| qtyunderlineparams | count (_) in parameters |
| qtyslashparams | count (/) in parameters |
| qtyquestionmarkparams | count (?) in parameters |
| qtyequalparams | count (=) in parameters |
| qtyatparams | count (@) in parameters |
| qtyandparams | count (&) in parameters |
| qtyexclamationparams | count (!) in parameters |
| qtyspaceparams | count ( ) in parameters |
| qtytildeparams | count (~) in parameters |
| qtycommaparams | count (,) in parameters |
| qtyplusparams | count (+) in parameters |
| qtyasteriskparams | count (*) in parameters |
| qtyhashtagparams | count (#) in parameters |
| qtydollarparams | count ($) in parameters |
| qtypercentparams | count (%) in parameters |
| paramslength | parameters length |
| tldpresentparams | TLD presence in arguments |
| qtyparams | number of parameters |
| emailinurl | email present in URL |
| timeresponse | search time (response) domain (lookup) |
| domainspf | domain has SPF |
| asnip | AS Number (or ASN) |
| timedomainactivation | time (in days) of domain activation |
| timedomainexpiration | time (in days) of domain expiration |
| qtyipresolved | number of resolved IPs |
| qtynameservers | number of resolved name servers (NameServers - NS) |
| qtymxservers | number of MX Servers |
| ttlhostname | time-to-live (TTL) value associated with hostname |
| tlssslcertificate | valid TLS / SSL Certificate |
| qtyredirects | number of redirects |
| urlgoogleindex | check if URL is indexed on Google |
| domaingoogleindex | check if domain is indexed on Google |
| urlshortened | check if URL is shortened |
| phishing | is phishing website |
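This README does not describe how the features were extracted. As a rough illustration of what the character-count rows mean, the sketch below computes a handful of them for a single URL with `urllib.parse`. It is an assumption-laden example, not the authors' extraction code: the helpers `count_features` and `url_features` are hypothetical, the way the URL is split into domain and parameters is one plausible reading of the table, and the feature names follow the spelling shown in the table (the actual CSV header may differ).

```python
from urllib.parse import urlsplit

# Character-to-name mapping taken from the qty* rows of the feature table.
CHAR_NAMES = {
    ".": "dot", "-": "hyphen", "_": "underline", "/": "slash",
    "?": "questionmark", "=": "equal", "@": "at", "&": "and",
    "!": "exclamation", " ": "space", "~": "tilde", ",": "comma",
    "+": "plus", "*": "asterisk", "#": "hashtag", "$": "dollar",
    "%": "percent",
}


def count_features(text, suffix):
    """Return {'qty<name><suffix>': count} for every tracked character."""
    return {f"qty{name}{suffix}": text.count(ch) for ch, name in CHAR_NAMES.items()}


def url_features(url):
    """Compute URL-, domain- and parameter-level counts for one URL."""
    parts = urlsplit(url)
    domain = parts.hostname or ""  # assumption: "domain" means the host name
    features = count_features(url, "url")
    features["lengthurl"] = len(url)
    features.update(count_features(domain, "domain"))
    features["domainlength"] = len(domain)
    # Assumption: "parameters" means the query string after the "?".
    features.update(count_features(parts.query, "params"))
    features["paramslength"] = len(parts.query)
    return features


example = url_features("http://example.com/a-b/index.php?id=1&x=2")
print(example["qtydoturl"], example["qtyslashurl"], example["lengthurl"])
```

Network-derived features such as `timeresponse`, `qtymxservers` or `ttlhostname` require DNS and HTTP lookups and are outside the scope of this sketch.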

Cite this dataset

G. Vrbančič, I. Fister Jr., V. Podgorelec. Datasets for Phishing Websites Detection. Data in Brief, Vol. 33, 2020, DOI: 10.1016/j.dib.2020.106438

Owner

  • Name: Grega Vrbančič
  • Login: GregaVrbancic
  • Kind: user
  • Location: Maribor, Slovenia
  • Company: University of Maribor, Faculty of Electrical Engineering and Computer Science

Assistant professor at University of Maribor, Faculty of Electrical Engineering and Computer Science

Citation (CITATION.cff)

# YAML 1.2
---
authors: 
  -
    family-names: "Vrbančič"
    given-names: Grega
  -
    family-names: "Fister Jr."
    given-names: Iztok
  -
    family-names: "Podgorelec"
    given-names: Vili
cff-version: "1.1.0"
date-released: 2020
doi: "10.1016/j.dib.2020.106438"
license: MIT
title: "Datasets for Phishing Websites Detection"
...

GitHub Events

Total
  • Watch event: 12
  • Fork event: 3
Last Year
  • Watch event: 12
  • Fork event: 3

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 81
  • Total Committers: 3
  • Avg Commits per committer: 27.0
  • Development Distribution Score (DDS): 0.42
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Grega Vrbančič g****c@g****m 47
dependabot[bot] 4****] 32
Iztok Fister Jr i****k@i****u 2
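
The all-time figures can be reproduced from the commit counts in the table above. The sketch below assumes that the Development Distribution Score is defined as 1 minus the top committer's share of all commits; that definition is not stated on this page, although it matches the reported value.

```python
# Commit counts copied from the Top Committers table.
commits = {"Grega Vrbančič": 47, "dependabot[bot]": 32, "Iztok Fister Jr": 2}

total = sum(commits.values())            # total commits
avg = total / len(commits)               # average commits per committer
dds = 1 - max(commits.values()) / total  # assumed DDS definition

print(total, round(avg, 1), round(dds, 2))  # prints: 81 27.0 0.42
```

Under that assumption, a DDS of 0.0 for the past year simply reflects that no commits were made in that period.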

Issues and Pull Requests

Last synced: 9 months ago

All Time
  • Total issues: 2
  • Total pull requests: 98
  • Average time to close issues: 3 days
  • Average time to close pull requests: 14 days
  • Total issue authors: 2
  • Total pull request authors: 3
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.81
  • Merged pull requests: 42
  • Bot issues: 0
  • Bot pull requests: 88
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • Adrimeov (1)
Pull Request Authors
  • dependabot[bot] (88)
  • GregaVrbancic (8)
  • firefly-cpp (2)
Top Labels
Issue Labels
Pull Request Labels
dependencies (88) hacktoberfest-accepted (1)

Dependencies

web-app/package.json npm
  • @rollup/plugin-commonjs ^22.0.0 development
  • @rollup/plugin-node-resolve ^14.0.1 development
  • @rollup/plugin-url ^7.0.0 development
  • gh-pages ^4.0.0 development
  • rollup 2.79.0 development
  • rollup-plugin-livereload ^2.0.0 development
  • rollup-plugin-svelte ^7.0.0 development
  • rollup-plugin-terser ^7.0.0 development
  • svelte ^3.38.3 development
  • svelte-feather-icons ^4.0.0 development
  • bulma ^0.9.1
  • fusioncharts ^3.15.3
  • node-sass ^7.0.1
  • papaparse ^5.3.0
  • postcss ^8.3.5
  • rollup-plugin-css-only ^3.1.0
  • rollup-plugin-postcss ^4.0.0
  • sirv-cli ^2.0.1
  • svelte-content-loader ^1.1.3
  • svelte-data-grid ^3.0.0
  • svelte-fusioncharts ^1.0.0
  • svelte-preprocess ^4.5.1
  • svelte-select ^4.2.6
  • svelte-simple-modal ^1.0.0
.github/workflows/deploy.yml actions
  • actions/checkout v2 composite
  • actions/setup-node v2.1.2 composite
  • peaceiris/actions-gh-pages v3 composite
web-app/pnpm-lock.yaml npm
  • 489 dependencies