phishing-dataset
Phishing dataset with more than 88,000 instances and 111 features. Web application available at https://gregavrbancic.github.io/Phishing-Dataset/
Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 4 DOI reference(s) in README
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (4.8%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: GregaVrbancic
- Language: Svelte
- Default Branch: master
- Homepage: https://gregavrbancic.github.io/Phishing-Dataset/
- Size: 19.3 MB
Statistics
- Stars: 65
- Watchers: 3
- Forks: 21
- Open Issues: 4
- Releases: 0
Metadata Files
README.md
Datasets for Phishing Websites Detection
This repository presents two variants of the phishing dataset.
Web application
To preview the dataset interactively or tailor it to your needs, visit the dedicated web application at https://gregavrbancic.github.io/Phishing-Dataset/.
dataset_full.csv
Short description of the full variant dataset:
- Total number of instances: 88,647
- Number of legitimate website instances (labeled as 0): 58,000
- Number of phishing website instances (labeled as 1): 30,647
- Total number of features: 111 (without target)
dataset_small.csv
Short description of the small variant dataset:
- Total number of instances: 58,645
- Number of legitimate website instances (labeled as 0): 27,998
- Number of phishing website instances (labeled as 1): 30,647
- Total number of features: 111 (without target)
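The class counts listed above can be checked against a downloaded copy of either file. The following is a minimal sketch using only the Python standard library. It assumes the CSV has a header row and that the target column is named `phishing`, as in the last row of the feature table; the helper names `load_rows` and `class_counts` are hypothetical and not part of the repository.

```python
import csv
from collections import Counter
from pathlib import Path


def load_rows(path):
    """Yield one dict per CSV row, keyed by the header names."""
    with open(path, newline="") as handle:
        yield from csv.DictReader(handle)


def class_counts(rows, target="phishing"):
    """Count rows per label (0 = legitimate, 1 = phishing).

    The column name `phishing` is an assumption taken from the feature table.
    """
    return Counter(int(float(row[target])) for row in rows)


if __name__ == "__main__":
    # Assumed location: the CSV sits in the current working directory.
    path = Path("dataset_full.csv")
    if path.exists():
        counts = class_counts(load_rows(path))
        print("legitimate:", counts[0], "phishing:", counts[1])
```

If the file matches the description, the full variant should report 58,000 legitimate and 30,647 phishing rows, and the small variant 27,998 and 30,647.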
Extracted Features
| Feature | Description |
|----------------------------|----------------------------------------------------|
| qtydoturl | count (.) in URL |
| qtyhyphenurl | count (-) in URL |
| qtyunderlineurl | count (_) in URL |
| qtyslashurl | count (/) in URL |
| qtyquestionmarkurl | count (?) in URL |
| qtyequalurl | count (=) in URL |
| qtyaturl | count (@) in URL |
| qtyandurl | count (&) in URL |
| qtyexclamationurl | count (!) in URL |
| qtyspaceurl | count ( ) in URL |
| qtytildeurl | count (~) in URL |
| qtycommaurl | count (,) in URL |
| qtyplusurl | count (+) in URL |
| qtyasteriskurl | count (*) in URL |
| qtyhashtagurl | count (#) in URL |
| qtydollarurl | count ($) in URL |
| qtypercenturl | count (%) in URL |
| qtytldurl | top-level-domain length |
| lengthurl | URL length |
| qtydotdomain | count (.) in domain |
| qtyhyphendomain | count (-) in domain |
| qtyunderlinedomain | count (_) in domain |
| qtyslashdomain | count (/) in domain |
| qtyquestionmarkdomain | count (?) in domain |
| qtyequaldomain | count (=) in domain |
| qtyatdomain | count (@) in domain |
| qtyanddomain | count (&) in domain |
| qtyexclamationdomain | count (!) in domain |
| qtyspacedomain | count ( ) in domain |
| qtytildedomain | count (~) in domain |
| qtycommadomain | count (,) in domain |
| qtyplusdomain | count (+) in domain |
| qtyasteriskdomain | count (*) in domain |
| qtyhashtagdomain | count (#) in domain |
| qtydollardomain | count ($) in domain |
| qtypercentdomain | count (%) in domain |
| qtyvowelsdomain | count vowels in domain |
| domainlength | domain length |
| domaininip | URL domain in IP address format |
| serverclientdomain | domain contains the keywords "server" or "client" |
| qtydotdirectory | count (.) in directory |
| qtyhyphendirectory | count (-) in directory |
| qtyunderlinedirectory | count (_) in directory |
| qtyslashdirectory | count (/) in directory |
| qtyquestionmarkdirectory | count (?) in directory |
| qtyequaldirectory | count (=) in directory |
| qtyatdirectory | count (@) in directory |
| qtyanddirectory | count (&) in directory |
| qtyexclamationdirectory | count (!) in directory |
| qtyspacedirectory | count ( ) in directory |
| qtytildedirectory | count (~) in directory |
| qtycommadirectory | count (,) in directory |
| qtyplusdirectory | count (+) in directory |
| qtyasteriskdirectory | count (*) in directory |
| qtyhashtagdirectory | count (#) in directory |
| qtydollardirectory | count ($) in directory |
| qtypercentdirectory | count (%) in directory |
| directorylength | directory length |
| qtydotfile | count (.) in file |
| qtyhyphenfile | count (-) in file |
| qtyunderlinefile | count (_) in file |
| qtyslashfile | count (/) in file |
| qtyquestionmarkfile | count (?) in file |
| qtyequalfile | count (=) in file |
| qtyatfile | count (@) in file |
| qtyandfile | count (&) in file |
| qtyexclamationfile | count (!) in file |
| qtyspacefile | count ( ) in file |
| qtytildefile | count (~) in file |
| qtycommafile | count (,) in file |
| qtyplusfile | count (+) in file |
| qtyasteriskfile | count (*) in file |
| qtyhashtagfile | count (#) in file |
| qtydollarfile | count ($) in file |
| qtypercentfile | count (%) in file |
| filelength | file length |
| qtydotparams | count (.) in parameters |
| qtyhyphenparams | count (-) in parameters |
| qtyunderlineparams | count (_) in parameters |
| qtyslashparams | count (/) in parameters |
| qtyquestionmarkparams | count (?) in parameters |
| qtyequalparams | count (=) in parameters |
| qtyatparams | count (@) in parameters |
| qtyandparams | count (&) in parameters |
| qtyexclamationparams | count (!) in parameters |
| qtyspaceparams | count ( ) in parameters |
| qtytildeparams | count (~) in parameters |
| qtycommaparams | count (,) in parameters |
| qtyplusparams | count (+) in parameters |
| qtyasteriskparams | count (*) in parameters |
| qtyhashtagparams | count (#) in parameters |
| qtydollarparams | count ($) in parameters |
| qtypercentparams | count (%) in parameters |
| paramslength | parameters length |
| tldpresentparams | TLD presence in arguments |
| qtyparams | number of parameters |
| emailinurl | email present in URL |
| timeresponse | search time (response) domain (lookup) |
| domainspf | domain has SPF |
| asnip | AS Number (or ASN) |
| timedomainactivation | time (in days) of domain activation |
| timedomainexpiration | time (in days) of domain expiration |
| qtyipresolved | number of resolved IPs |
| qtynameservers | number of resolved name servers (NameServers - NS) |
| qtymxservers | number of MX Servers |
| ttlhostname | time-to-live (TTL) value associated with hostname |
| tlssslcertificate | valid TLS / SSL Certificate |
| qtyredirects | number of redirects |
| urlgoogleindex | check if URL is indexed on Google |
| domaingoogleindex | check if domain is indexed on Google |
| urlshortened | check if URL is shortened |
| phishing | is phishing website |
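To illustrate how the character-count features in the table might be derived, here is a rough sketch that splits a URL into the five parts the table refers to (whole URL, domain, directory, file, parameters) and counts each symbol in each part. This is not the authors' extraction code: the splitting rules, the `SYMBOLS` mapping, and the function names are assumptions made for illustration, and the output keys simply follow the feature names as printed in the table.

```python
from urllib.parse import urlsplit

# Symbol names as they appear in the feature names of the table.
SYMBOLS = {
    "dot": ".", "hyphen": "-", "underline": "_", "slash": "/",
    "questionmark": "?", "equal": "=", "at": "@", "and": "&",
    "exclamation": "!", "space": " ", "tilde": "~", "comma": ",",
    "plus": "+", "asterisk": "*", "hashtag": "#", "dollar": "$",
    "percent": "%",
}


def split_url(url):
    """Split a URL into the parts used in the table (assumed rules)."""
    parts = urlsplit(url)
    # Everything up to the last "/" is treated as the directory,
    # the remainder of the path as the file.
    directory, _, filename = parts.path.rpartition("/")
    return {
        "url": url,
        "domain": parts.netloc,
        "directory": directory + "/" if parts.path else "",
        "file": filename,
        "params": parts.query,
    }


def count_features(url):
    """Return a dict such as {"qtydoturl": 2, "qtyslashdirectory": 2, ...}."""
    features = {}
    for part, text in split_url(url).items():
        for name, symbol in SYMBOLS.items():
            features[f"qty{name}{part}"] = text.count(symbol)
    return features
```

For example, `count_features("http://example.com/a-b/login.php?id=1&x=2")` yields 85 counts (17 symbols over 5 parts). Length features and the DNS, WHOIS, and search-index features in the table need external lookups and are outside this sketch.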
Cite this dataset
G. Vrbančič, I. Fister Jr., V. Podgorelec. Datasets for Phishing Websites Detection. Data in Brief, Vol. 33, 2020. DOI: 10.1016/j.dib.2020.106438
Owner
- Name: Grega Vrbančič
- Login: GregaVrbancic
- Kind: user
- Location: Maribor, Slovenia
- Company: University of Maribor, Faculty of Electrical Engineering and Computer Science
- Website: https://grega.xyz
- Twitter: GregaVrbancic
- Repositories: 62
- Profile: https://github.com/GregaVrbancic
Assistant professor at University of Maribor, Faculty of Electrical Engineering and Computer Science
Citation (CITATION.cff)
# YAML 1.2
---
authors:
  -
    family-names: "Vrbančič"
    given-names: Grega
  -
    family-names: "Fister Jr."
    given-names: Iztok
  -
    family-names: "Podgorelec"
    given-names: Vili
cff-version: "1.1.0"
date-released: 2020
doi: "10.1016/j.dib.2020.106438"
license: MIT
title: "Datasets for Phishing Websites Detection"
...
GitHub Events
Total
- Watch event: 12
- Fork event: 3
Last Year
- Watch event: 12
- Fork event: 3
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Grega Vrbančič | g****c@g****m | 47 |
| dependabot[bot] | 4****] | 32 |
| Iztok Fister Jr | i****k@i****u | 2 |
Issues and Pull Requests
Last synced: 9 months ago
All Time
- Total issues: 2
- Total pull requests: 98
- Average time to close issues: 3 days
- Average time to close pull requests: 14 days
- Total issue authors: 2
- Total pull request authors: 3
- Average comments per issue: 1.0
- Average comments per pull request: 0.81
- Merged pull requests: 42
- Bot issues: 0
- Bot pull requests: 88
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- Adrimeov (1)
Pull Request Authors
- dependabot[bot] (88)
- GregaVrbancic (8)
- firefly-cpp (2)
Dependencies
- @rollup/plugin-commonjs ^22.0.0 development
- @rollup/plugin-node-resolve ^14.0.1 development
- @rollup/plugin-url ^7.0.0 development
- gh-pages ^4.0.0 development
- rollup 2.79.0 development
- rollup-plugin-livereload ^2.0.0 development
- rollup-plugin-svelte ^7.0.0 development
- rollup-plugin-terser ^7.0.0 development
- svelte ^3.38.3 development
- svelte-feather-icons ^4.0.0 development
- bulma ^0.9.1
- fusioncharts ^3.15.3
- node-sass ^7.0.1
- papaparse ^5.3.0
- postcss ^8.3.5
- rollup-plugin-css-only ^3.1.0
- rollup-plugin-postcss ^4.0.0
- sirv-cli ^2.0.1
- svelte-content-loader ^1.1.3
- svelte-data-grid ^3.0.0
- svelte-fusioncharts ^1.0.0
- svelte-preprocess ^4.5.1
- svelte-select ^4.2.6
- svelte-simple-modal ^1.0.0
- actions/checkout v2 composite
- actions/setup-node v2.1.2 composite
- peaceiris/actions-gh-pages v3 composite
- 489 dependencies