https://github.com/abdoulfataoh/lefaso-net-scraper
The ultimate library for data scientists to scrape data from https://www.lefaso.net
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (13.7%) to scientific vocabulary
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 16
Metadata Files
README.md
lefaso-net-scraper
Description
lefaso-net-scraper is a Python library for extracting articles from www.lefaso.net, a popular online news source in Burkina Faso. It collects both the article content and the user comments posted on lefaso.net.
Important
This scraper, like any scraper, depends on the structure of the target website, so changes to lefaso.net's pages can break it. We run automated workflows that regularly check for such breakage, but they cannot catch every change. Please report any issues you encounter, and make sure you are using the latest version.
JSON/dictionary Fields
Installation
- To install support for Python script files only:
```bash
pip install --upgrade lefaso-net-scraper
```
- To additionally include support for Jupyter Notebook (optional):
```bash
pip install --upgrade lefaso-net-scraper[notebook]
```
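Before reporting a problem, it can help to confirm which version is actually installed, since the Important note asks users to stay on the latest release. A minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); the helper name `installed_version` is only illustrative and is not part of lefaso-net-scraper:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "lefaso-net-scraper") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "lefaso-net-scraper is not installed")
```

Compare the printed value with the latest release listed on PyPI; if it is older, rerun the `pip install --upgrade` command.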
Usage
```python
# coding: utf-8
from lefaso_net_scraper import LefasoNetScraper

topic_url = 'https://lefaso.net/spip.php?rubrique473'  # a lefaso.net topic (rubrique) page
scraper = LefasoNetScraper(topic_url)
data = scraper.run()  # collect the topic's articles and their comments
```
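The field list for scraped records is not reproduced on this page. Assuming `run()` returns a list of dictionaries (as the "JSON/dictionary Fields" heading and the pandas `DataFrame.from_records` call in the CSV example suggest), a quick way to discover which keys are present, and in how many records, is to count them. `field_coverage` is a hypothetical helper, not part of the library:

```python
from collections import Counter
from typing import Any, Dict, Iterable, Mapping


def field_coverage(records: Iterable[Mapping[str, Any]]) -> Dict[str, int]:
    """Count, for each key, how many records contain it."""
    counts: Counter = Counter()
    for record in records:
        # Each key is counted once per record in which it appears.
        counts.update(record.keys())
    return dict(counts)
```

For example, call `print(field_coverage(data))` after `data = scraper.run()`; a key whose count is lower than `len(data)` is missing from some records.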
- Set the pagination range
```python
# coding: utf-8
from lefaso_net_scraper import LefasoNetScraper
topic_url = 'https://lefaso.net/spip.php?rubrique473'
scraper = LefasoNetScraper(topic_url)
scraper.set_pagination_range(start=20, stop=100)  # limit scraping to this pagination range
data = scraper.run()
```
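For long pagination ranges, one option is to split the work into several smaller ranges so that each batch can be inspected or saved before the next one starts. The sketch below assumes only what the examples show: the scraper is built from a topic URL and exposes `set_pagination_range(start=..., stop=...)` and `run()`. It creates a fresh scraper per range rather than assuming `run()` can be called repeatedly on one instance. Whether `stop` is inclusive is not documented here, so choose range boundaries accordingly. `scrape_ranges` is a hypothetical helper:

```python
from typing import Any, Callable, Iterable, List, Tuple


def scrape_ranges(
    make_scraper: Callable[[str], Any],
    topic_url: str,
    ranges: Iterable[Tuple[int, int]],
) -> List[Any]:
    """Run one scraper per (start, stop) pagination range and concatenate the results.

    make_scraper is expected to behave like LefasoNetScraper: called with a topic URL,
    it returns an object with set_pagination_range(start=..., stop=...) and run().
    """
    results: List[Any] = []
    for start, stop in ranges:
        scraper = make_scraper(topic_url)  # fresh instance for each range
        scraper.set_pagination_range(start=start, stop=stop)
        batch = scraper.run()
        print(f"range {start}-{stop}: {len(batch)} records")
        results.extend(batch)
    return results
```

With the real class this would look like `data = scrape_ranges(LefasoNetScraper, topic_url, [(20, 60), (60, 100)])`; passing the class as a factory also makes the helper easy to test with a stand-in object.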
- Save data to CSV
```python
# coding: utf-8
from lefaso_net_scraper import LefasoNetScraper
import pandas as pd

topic_url = 'https://lefaso.net/spip.php?rubrique473'
scraper = LefasoNetScraper(topic_url)
data = scraper.run()
df = pd.DataFrame.from_records(data)  # one row per scraped record
df.to_csv('path/to/df.csv')
```
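CSV stores every value as flat text, so any nested values in the records (for example, if comments come back as a list; the record layout is not documented here) would be written out as strings. JSON Lines keeps nested structures intact. A minimal sketch using only the standard library, again assuming `run()` returns a list of dictionaries; `save_jsonl` is a hypothetical helper:

```python
import json
from pathlib import Path
from typing import Any, Iterable, Mapping, Union


def save_jsonl(records: Iterable[Mapping[str, Any]], path: Union[str, Path]) -> int:
    """Write one JSON object per line (UTF-8) and return the number of records written."""
    count = 0
    with open(path, "w", encoding="utf-8") as fh:
        for record in records:
            # ensure_ascii=False keeps accented French text readable in the file;
            # default=str turns values json cannot encode (e.g. datetimes) into strings.
            fh.write(json.dumps(dict(record), ensure_ascii=False, default=str) + "\n")
            count += 1
    return count
```

For example, `save_jsonl(data, 'path/to/data.jsonl')`. If pandas is already in use, `df.to_json('path/to/df.jsonl', orient='records', lines=True, force_ascii=False)` writes a comparable file from the DataFrame.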
We ❤ open source
Owner
- Login: abdoulfataoh
- Kind: user
- Repositories: 2
- Profile: https://github.com/abdoulfataoh
GitHub Events
Total
- Create event: 3
- Issues event: 1
- Release event: 2
- Push event: 10
- Gollum event: 4
Last Year
- Create event: 3
- Issues event: 1
- Release event: 2
- Push event: 10
- Gollum event: 4
Packages
- Total packages: 1
- Total downloads: 46 last month (PyPI)
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 16
- Total maintainers: 1
pypi.org: lefaso-net-scraper
- Homepage: https://github.com/abdoulfataoh/lefaso-net-scraper
- Documentation: https://github.com/abdoulfataoh/lefaso-net-scraper/wiki
- License: other
- Latest release: 0.4.0 (published about 1 year ago)
Maintainers (1)
Dependencies
- actions/checkout v3 composite
- aiohttp 3.8.5
- aiosignal 1.3.1
- appnope 0.1.3
- asttokens 2.4.0
- async-timeout 4.0.3
- attrs 23.1.0
- backcall 0.2.0
- beautifulsoup4 4.12.2
- certifi 2023.7.22
- charset-normalizer 3.2.0
- colorama 0.4.6
- decorator 5.1.1
- environs 9.5.0
- exceptiongroup 1.1.3
- executing 1.2.0
- flake8 6.1.0
- frozenlist 1.4.0
- idna 3.4
- iniconfig 2.0.0
- ipython 8.15.0
- jedi 0.19.0
- marshmallow 3.20.1
- matplotlib-inline 0.1.6
- mccabe 0.7.0
- multidict 6.0.4
- mypy 1.5.1
- mypy-extensions 1.0.0
- packaging 23.1
- parso 0.8.3
- pexpect 4.8.0
- pickleshare 0.7.5
- pluggy 1.3.0
- prompt-toolkit 3.0.39
- ptyprocess 0.7.0
- pure-eval 0.2.2
- pycodestyle 2.11.0
- pyflakes 3.1.0
- pygments 2.16.1
- pytest 7.4.2
- python-dotenv 1.0.0
- requests 2.31.0
- six 1.16.0
- soupsieve 2.5
- stack-data 0.6.2
- tomli 2.0.1
- traitlets 5.10.0
- typing-extensions 4.8.0
- unidecode 1.3.6
- urllib3 2.0.4
- wcwidth 0.2.6
- yarl 1.9.2
- aiohttp ^3.8.5
- beautifulsoup4 ^4.12.2
- environs ^9.5.0
- pytest ^7.4.2
- python ^3.10
- requests ^2.31.0
- unidecode ^1.3.6
- actions/checkout v3 composite
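The last group of entries (`python ^3.10`, `requests ^2.31.0`, and so on) uses caret constraints, which appear to come from a Poetry-style `pyproject.toml`. Under Poetry's caret rule, the leftmost non-zero component may not change: `^3.10` means `>=3.10,<4.0`, and `^0.2.3` would mean `>=0.2.3,<0.3.0`. The sketch below implements that rule for purely numeric versions and checks the running interpreter against `^3.10`; it illustrates the rule and is not Poetry's own resolver:

```python
import sys
from typing import Sequence, Tuple

Version = Tuple[int, ...]


def caret_bounds(spec: str) -> Tuple[Version, Version]:
    """Return (inclusive lower, exclusive upper) bounds for a caret spec like "^3.10".

    Bumps the leftmost non-zero component, or the last given component if all
    are zero (^0.0 -> <0.1, ^0 -> <1). Only purely numeric specs are handled.
    """
    parts = [int(p) for p in spec.lstrip("^").split(".")]
    lower = tuple(parts)
    # Index of the leftmost non-zero component, or the last index if all are zero.
    idx = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    upper = tuple(parts[:idx]) + (parts[idx] + 1,)
    return lower, upper


def _pad(v: Sequence[int], n: int) -> Version:
    """Right-pad a version with zeros so tuples of different lengths compare correctly."""
    return tuple(v) + (0,) * (n - len(v))


def satisfies_caret(version: Sequence[int], spec: str) -> bool:
    """True if version (e.g. (3, 11, 4)) lies in [lower, upper) for the caret spec."""
    lower, upper = caret_bounds(spec)
    n = max(len(version), len(lower), len(upper))
    return _pad(lower, n) <= _pad(version, n) < _pad(upper, n)


if __name__ == "__main__":
    current = tuple(sys.version_info[:3])
    ok = satisfies_caret(current, "^3.10")
    print(f"Python {'.'.join(map(str, current))} satisfies ^3.10: {ok}")
```

Padding both sides with zeros avoids a subtle tuple-comparison trap: without it, `(3, 10)` would compare as smaller than `(3, 10, 0)`.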