Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (1.7%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: GameTaco
  • License: gpl-3.0
  • Language: Python
  • Default Branch: master
  • Size: 22.9 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created over 2 years ago · Last pushed over 2 years ago
Metadata Files
Readme Changelog Contributing License Citation

README.md

⚠ This is a testing repository for incompatibility issues that arise when using duckduckgo-search, or presumably any package that depends on lxml. The original repository is https://github.com/adbar/trafilatura.
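To reproduce a conflict like this, it helps to first record which versions of the packages involved are installed. The following is a minimal sketch using only the standard library's `importlib.metadata`. The package list (`lxml`, `duckduckgo-search`, `trafilatura`) and the helper name `installed_versions` are illustrative assumptions, not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical list of distributions assumed relevant to the lxml conflict.
PACKAGES = ("lxml", "duckduckgo-search", "trafilatura")


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in the current environment.
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Running it in each environment before and after installing duckduckgo-search shows whether the lxml version changed between them.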

Owner

  • Name: Matt
  • Login: GameTaco
  • Kind: user

Citation (CITATION.cff)

authors:
  - family-names: Barbaresi
    given-names: Adrien
    orcid: https://orcid.org/0000-0002-8079-8694
cff-version: 1.2.0
identifiers:
  - description: "This is the collection of archived snapshots of all versions of Trafilatura"
    type: doi
    value: 10.5281/zenodo.3460969
message: "If you use this software, please cite both the article from preferred-citation and the software itself."
preferred-citation:
  authors:
    - family-names: Barbaresi
      given-names: Adrien
  title: "Trafilatura: A Web Scraping Library and Command-Line Tool for Text Discovery and Extraction"
  type: article
  year: 2021
repository: https://github.com/adbar/trafilatura
repository-code: https://github.com/adbar/trafilatura
title: Trafilatura
type: software
url: https://trafilatura.readthedocs.io/

Dependencies

docs/requirements.txt pypi
  • docutils >=0.20.1
  • pydata-sphinx-theme >=0.13.3
  • sphinx >=7.2.4
  • trafilatura *
requirements-dev.txt pypi
setup.py pypi
  • certifi *
  • charset_normalizer *
  • courlan *
  • htmldate *
  • justext *
  • lxml *
  • urllib3 *
tests/eval-requirements.txt pypi
  • beautifulsoup4 ==4.12.1 test
  • boilerpy3 ==1.0.6 test
  • goose3 ==3.1.13 test
  • html-text ==0.5.2 test
  • html2text ==2020.1.16 test
  • inscriptis ==2.3.2 test
  • justext ==3.0.0 test
  • news-please ==1.5.22 test
  • newspaper3k ==0.2.8 test
  • readabilipy ==0.2.0 test
  • readability-lxml ==0.8.1 test
  • resiliparse ==0.14.3 test
  • trafilatura ==1.5.0 test