scrapy
Scrapy, a fast high-level web crawling & scraping framework for Python.
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (9.9%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: scrapy
- License: bsd-3-clause
- Language: Python
- Default Branch: master
- Homepage: https://scrapy.org
- Size: 27.4 MB
Statistics
- Stars: 58,095
- Watchers: 1,774
- Forks: 11,028
- Open Issues: 685
- Releases: 43
Metadata Files
README.rst
|logo|

.. |logo| image:: https://raw.githubusercontent.com/scrapy/scrapy/master/docs/_static/logo.svg
   :target: https://scrapy.org
   :alt: Scrapy
   :width: 480px

|version| |python_version| |ubuntu| |macos| |windows| |coverage| |conda| |deepwiki|

.. |version| image:: https://img.shields.io/pypi/v/Scrapy.svg
   :target: https://pypi.org/pypi/Scrapy
   :alt: PyPI Version

.. |python_version| image:: https://img.shields.io/pypi/pyversions/Scrapy.svg
   :target: https://pypi.org/pypi/Scrapy
   :alt: Supported Python Versions

.. |ubuntu| image:: https://github.com/scrapy/scrapy/workflows/Ubuntu/badge.svg
   :target: https://github.com/scrapy/scrapy/actions?query=workflow%3AUbuntu
   :alt: Ubuntu

.. |macos| image:: https://github.com/scrapy/scrapy/workflows/macOS/badge.svg
   :target: https://github.com/scrapy/scrapy/actions?query=workflow%3AmacOS
   :alt: macOS

.. |windows| image:: https://github.com/scrapy/scrapy/workflows/Windows/badge.svg
   :target: https://github.com/scrapy/scrapy/actions?query=workflow%3AWindows
   :alt: Windows

.. |coverage| image:: https://img.shields.io/codecov/c/github/scrapy/scrapy/master.svg
   :target: https://codecov.io/github/scrapy/scrapy?branch=master
   :alt: Coverage report

.. |conda| image:: https://anaconda.org/conda-forge/scrapy/badges/version.svg
   :target: https://anaconda.org/conda-forge/scrapy
   :alt: Conda Version

.. |deepwiki| image:: https://deepwiki.com/badge.svg
   :target: https://deepwiki.com/scrapy/scrapy
   :alt: Ask DeepWiki

Scrapy_ is a web scraping framework to extract structured data from websites.
It is cross-platform, and requires Python 3.9+. It is maintained by Zyte_
(formerly Scrapinghub) and `many other contributors`_.

.. _many other contributors: https://github.com/scrapy/scrapy/graphs/contributors
.. _Scrapy: https://scrapy.org/
.. _Zyte: https://www.zyte.com/

Install with:

.. code:: bash

    pip install scrapy

Then follow the documentation_ to learn how to use it.

.. _documentation: https://docs.scrapy.org/en/latest/


If you wish to contribute, see Contributing_.

.. _Contributing: https://docs.scrapy.org/en/master/contributing.html
Owner
- Name: Scrapy project
- Login: scrapy
- Kind: organization
- Website: https://scrapy.org
- Repositories: 26
- Profile: https://github.com/scrapy
An open source and collaborative framework for extracting the data you need from websites, in a fast, simple, yet extensible way.
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 639
- Total pull requests: 1,363
- Average time to close issues: over 1 year
- Average time to close pull requests: 8 months
- Total issue authors: 347
- Total pull request authors: 389
- Average comments per issue: 4.08
- Average comments per pull request: 2.26
- Merged pull requests: 645
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 170
- Pull requests: 584
- Average time to close issues: 8 days
- Average time to close pull requests: 5 days
- Issue authors: 72
- Pull request authors: 116
- Average comments per issue: 1.28
- Average comments per pull request: 1.59
- Merged pull requests: 310
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- wRAR (125)
- Gallaecio (52)
- mohmad-null (26)
- kmike (18)
- Prometheus3375 (11)
- Ehsan-U (11)
- elacuesta (5)
- GeorgeA92 (5)
- pawelmhm (5)
- starrify (4)
- ddebernardy (4)
- Urahara (4)
- jtoallen (4)
- lopuhin (4)
- zuozhehao (3)
Pull Request Authors
- wRAR (392)
- Gallaecio (123)
- Laerte (50)
- Rotzbua (14)
- jxlil (14)
- mlmsmith (14)
- GeorgeA92 (13)
- Rohitkr117 (10)
- Sintivrousai (10)
- MehrazRumman (9)
- elramen (9)
- mery16q (9)
- kumar-sanchay (9)
- thalissonvs (9)
- LucasSD (8)
Packages
- Total packages: 8
- Total downloads:
  - pypi: 2,096,674 last month
- Total docker downloads: 588,557
- Total dependent packages: 142 (may contain duplicates)
- Total dependent repositories: 2,831 (may contain duplicates)
- Total versions: 161
- Total maintainers: 8
- Total advisories: 15
pypi.org: scrapy
A high-level Web Crawling and Web Scraping framework
- Homepage: https://scrapy.org/
- Documentation: https://docs.scrapy.org/
- License: bsd-3-clause
- Latest release: 2.13.3 (published 8 months ago)
Maintainers (4)
Advisories (15)
- Scrapy decompression bomb vulnerability
- Duplicate Advisory: Scrapy decompression bomb vulnerability
- Duplicate Advisory: ReDos vulnerability of XMLFeedSpider
- Duplicate Advisory: Scrapy authorization header leakage on cross-domain redirect
- Scrapy authorization header leakage on cross-domain redirect
- Scrapy before 2.6.2 and 1.8.3 vulnerable to one proxy sending credentials to another
- Scrapy cookie-setting is not restricted based on the public suffix list
- Incorrect Authorization and Exposure of Sensitive Information to an Unauthorized Actor in scrapy
- Scrapy HTTP authentication credentials potentially leaked to target websites
- Scrapy vulnerable to ReDoS via XMLFeedSpider
- ...and 5 more
conda-forge.org: scrapy
Scrapy is an open source and collaborative framework for extracting the data you need from websites in a fast, simple, yet extensible way.
- Homepage: https://scrapy.org/
- License: BSD-3-Clause-Clear
- Latest release: 2.7.1 (published over 3 years ago)
pypi.org: pylab-utils
python utility tools
- Homepage: https://github.com/scrapy/scrapy
- Documentation: https://pylab-utils.readthedocs.io/
- License: BSD
- Latest release: 0.5 (published over 4 years ago)
Maintainers (1)
anaconda.org: scrapy
Scrapy is an open source and collaborative framework for extracting the data you need from websites in a fast, simple, yet extensible way.
- Homepage: https://scrapy.org
- License: BSD-3-Clause
- Latest release: 2.13.3 (published 6 months ago)
pypi.org: scrapy-hls
scrapy integration for m3u8 files
- Homepage: https://github.com/scrapy/scrapy
- Documentation: https://scrapy-hls.readthedocs.io/
- License: BSD
- Latest release: 0.1 (published almost 5 years ago)
Maintainers (1)
pypi.org: bf-scrapy-base
A high-level Web Crawling and Web Scraping framework
- Homepage: https://github.com/pypa/sampleproject
- Documentation: https://docs.scrapy.org/
- License: BSD-3-Clause
- Latest release: 0.0.3 (published 6 months ago)
Maintainers (1)
pypi.org: scrapy-qfm
A high-level Web Crawling and Web Scraping framework
- Homepage: https://scrapy.org
- Documentation: https://docs.scrapy.org/
- License: BSD
- Latest release: 2.11.2 (published over 1 year ago)
Maintainers (1)
pypi.org: aminer-scrapy
A high-level Web Crawling and Web Scraping framework
- Homepage: https://scrapy.org
- Documentation: https://docs.scrapy.org/
- License: BSD
- Latest release: 2.11.1 (published about 2 years ago)
Maintainers (1)
Dependencies
- actions/checkout v3 composite
- actions/setup-python v4 composite
- pre-commit/action v3.0.0 composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
- pypa/gh-action-pypi-publish v1.6.4 composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
- sphinx ==5.0.2
- sphinx-hoverxref ==1.1.1
- sphinx-notfound-page ==0.8
- sphinx-rtd-theme ==1.0.0
- 23.8.0 *
- Twisted >=18.9.0,<23.8.0
- cryptography >=36.0.0
- cssselect >=0.9.1
- itemadapter >=0.1.0
- itemloaders >=1.0.1
- lxml >=4.4.1
- packaging *
- parsel >=1.5.0
- protego >=0.1.15
- pyOpenSSL >=21.0.0
- queuelib >=1.4.2
- service_identity >=18.1.0
- setuptools *
- tldextract *
- w3lib >=1.17.0
- zope.interface >=5.1.0
- attrs * test
- bpython * test
- brotli ==1.0.9 test
- brotli * test
- ipython * test
- pyftpdlib * test
- pytest * test
- pytest-cov ==4.0.0 test
- pytest-xdist * test
- pywin32 * test
- testfixtures * test
- uvloop * test
- zstandard * test