compatible-traf
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (1.7%) to scientific vocabulary
Last synced: 6 months ago
Repository
Basic Info
- Host: GitHub
- Owner: GameTaco
- License: gpl-3.0
- Language: Python
- Default Branch: master
- Size: 22.9 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 1
- Created: over 2 years ago
- Last pushed: over 2 years ago
Metadata Files
- Readme
- Changelog
- Contributing
- License
- Citation
README.md
⚠ This is a testing repo for incompatibility issues when using duckduckgo-search, or presumably any repo using lxml. You can find the original repo here.
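When reproducing an incompatibility like this, it usually helps to record which versions of the involved packages are actually installed. The following is a minimal sketch, not part of the original repo: the helper name `installed_versions` and the choice of packages to inspect are assumptions made for illustration, based on the names mentioned in the note above and in the dependency list.

```python
from importlib import metadata

# Distribution names assumed from the README note and the dependency list;
# adjust to whatever packages are involved in the conflict being tested.
PACKAGES = ("lxml", "duckduckgo-search", "trafilatura")


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the current environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Running this in the environment where the problem appears gives a short version report that can be attached to an issue.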
Owner
- Name: Matt
- Login: GameTaco
- Kind: user
- Repositories: 1
- Profile: https://github.com/GameTaco
Citation (CITATION.cff)
authors:
  - family-names: Barbaresi
    given-names: Adrien
    orcid: https://orcid.org/0000-0002-8079-8694
cff-version: 1.2.0
identifiers:
  - description: "This is the collection of archived snapshots of all versions of Trafilatura"
    type: doi
    value: 10.5281/zenodo.3460969
message: "If you use this software, please cite both the article from preferred-citation and the software itself."
preferred-citation:
  authors:
    - family-names: Barbaresi
      given-names: Adrien
  title: "Trafilatura: A Web Scraping Library and Command-Line Tool for Text Discovery and Extraction"
  type: article
  year: 2021
repository: https://github.com/adbar/trafilatura
repository-code: https://github.com/adbar/trafilatura
title: Trafilatura
type: software
url: https://trafilatura.readthedocs.io/
Dependencies
docs/requirements.txt
pypi
- docutils >=0.20.1
- pydata-sphinx-theme >=0.13.3
- sphinx >=7.2.4
- trafilatura *
requirements-dev.txt
pypi
setup.py
pypi
- certifi *
- charset_normalizer *
- courlan *
- htmldate *
- justext *
- lxml *
- urllib3 *
tests/eval-requirements.txt
pypi
- beautifulsoup4 ==4.12.1 test
- boilerpy3 ==1.0.6 test
- goose3 ==3.1.13 test
- html-text ==0.5.2 test
- html2text ==2020.1.16 test
- inscriptis ==2.3.2 test
- justext ==3.0.0 test
- news-please ==1.5.22 test
- newspaper3k ==0.2.8 test
- readabilipy ==0.2.0 test
- readability-lxml ==0.8.1 test
- resiliparse ==0.14.3 test
- trafilatura ==1.5.0 test