https://github.com/abdoulfataoh/lefaso-net-scraper

The ultimate library for data scientists to scrape data from https://www.lefaso.net

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.7%) to scientific vocabulary

Keywords

data-science python3 scraper
Last synced: 5 months ago

Repository

The ultimate library for data scientists to scrape data from https://www.lefaso.net

Basic Info
  • Host: GitHub
  • Owner: abdoulfataoh
  • License: other
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 15.5 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 16
Topics
data-science python3 scraper
Created about 3 years ago · Last pushed 10 months ago
Metadata Files
Readme License

README.md

lefaso-net-scraper

Description

lefaso-net-scraper is a Python library that extracts articles from www.lefaso.net, a popular online news source in Burkina Faso. It collects both article content and reader comments.

Important

This scraper, like any scraper, depends on the structure of the target website, so changes to that structure can break it. We run automated workflows frequently to detect such breakage, but they cannot catch everything. Please use the latest version and report any issues you encounter.
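Because the note above recommends using the latest version, it can help to check which version is installed before reporting an issue. The following is a minimal sketch using the standard library's `importlib.metadata`. The helper name `installed_version` is hypothetical and is not part of lefaso-net-scraper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "lefaso-net-scraper") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed.

    Hypothetical helper for illustration; not provided by the library itself.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the version string, or None when the package is absent.
    print(installed_version())
```

Comparing the printed value with the latest release on PyPI shows whether an upgrade is needed.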

JSON/dictionary Fields

| Field                    | Description                                         |
|--------------------------|-----------------------------------------------------|
| `article_topic`          | Category or subject of the article.                 |
| `article_title`          | The main headline or title of the article.          |
| `article_published_date` | Date when the article was published.                |
| `article_origin`         | Source or platform where the article was published. |
| `article_url`            | Web link to the article.                            |
| `article_content`        | Full text or body of the article.                   |
| `article_comments`       | Feedback or responses from readers.                 |
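Since the scraper depends on the site's structure, one way to notice breakage early is to check that each returned record carries the fields listed in the table above. This is a minimal sketch: the field names come from the table, while the helper `missing_fields`, the sample record, and its placeholder values are assumptions made for illustration. The table does not specify value types.

```python
from typing import Any, List, Mapping

# Field names copied from the table above.
EXPECTED_FIELDS = (
    "article_topic",
    "article_title",
    "article_published_date",
    "article_origin",
    "article_url",
    "article_content",
    "article_comments",
)


def missing_fields(record: Mapping[str, Any]) -> List[str]:
    """Return the expected field names that are absent from `record`."""
    return [name for name in EXPECTED_FIELDS if name not in record]


# Hypothetical record with invented placeholder values.
# It deliberately omits "article_comments" to show the check.
sample = {
    "article_topic": "example topic",
    "article_title": "Example title",
    "article_published_date": "2024-01-01",
    "article_origin": "example origin",
    "article_url": "https://example.org/article",
    "article_content": "Example body text.",
}

print(missing_fields(sample))
```

An empty list means the record has every documented field. A non-empty list may be a sign that the site's structure has changed.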

Installation

  • To install support for Python script files only

```bash
pip install --upgrade lefaso-net-scraper
```

  • To additionally include support for Jupyter Notebook (optional)

```bash
pip install --upgrade lefaso-net-scraper[notebook]
```

Usage

```python
# coding: utf-8

from lefaso_net_scraper import LefasoNetScraper

# URL of the lefaso.net topic (rubrique) to scrape
topic_url = 'https://lefaso.net/spip.php?rubrique473'
scraper = LefasoNetScraper(topic_url)
data = scraper.run()
```

  • Setting the pagination range

```python
# coding: utf-8

from lefaso_net_scraper import LefasoNetScraper

topic_url = 'https://lefaso.net/spip.php?rubrique473'
scraper = LefasoNetScraper(topic_url)
# Restrict scraping to the given pagination range
scraper.set_pagination_range(start=20, stop=100)
data = scraper.run()
```

  • Saving data to CSV

```python
# coding: utf-8

from lefaso_net_scraper import LefasoNetScraper
import pandas as pd

topic_url = 'https://lefaso.net/spip.php?rubrique473'
scraper = LefasoNetScraper(topic_url)
data = scraper.run()
# Build a DataFrame from the scraped records and write it to disk
df = pd.DataFrame.from_records(data)
df.to_csv('path/to/df.csv')
```
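If pandas is not available, the same records can be written with the standard library alone. The sketch below assumes that `data` is a list of dictionaries and that a field such as `article_comments` may hold a nested list or dict; neither assumption is confirmed by this README. The helper `records_to_csv` and the sample record are hypothetical.

```python
import csv
import io
import json


def records_to_csv(records, stream):
    """Write an iterable of dict records to `stream` as CSV.

    List or dict values (for example nested comments) are JSON-encoded
    so that each record stays on a single CSV row.
    """
    records = list(records)
    if not records:
        return
    # Column order follows the keys of the first record.
    fieldnames = list(records[0].keys())
    writer = csv.DictWriter(stream, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    for record in records:
        row = {
            key: json.dumps(value, ensure_ascii=False)
            if isinstance(value, (list, dict))
            else value
            for key, value in record.items()
        }
        writer.writerow(row)


# Hypothetical sample standing in for the output of scraper.run().
sample = [
    {"article_title": "Example title", "article_comments": ["first", "second"]},
]
buffer = io.StringIO()
records_to_csv(sample, buffer)
print(buffer.getvalue())
```

When writing to a real file instead of `io.StringIO`, open it with `newline=""` and an explicit encoding such as `utf-8`, as the `csv` module documentation recommends.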

We ❤ open source

Owner

  • Login: abdoulfataoh
  • Kind: user

GitHub Events

Total
  • Create event: 3
  • Issues event: 1
  • Release event: 2
  • Push event: 10
  • Gollum event: 4
Last Year
  • Create event: 3
  • Issues event: 1
  • Release event: 2
  • Push event: 10
  • Gollum event: 4

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 46 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 16
  • Total maintainers: 1
pypi.org: lefaso-net-scraper

The ultimate library for data scientists to scrape data from https://www.lefaso.net

  • Versions: 16
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 46 Last month
Rankings
Dependent packages count: 7.4%
Average: 38.1%
Dependent repos count: 68.9%
Maintainers (1)
Last synced: 6 months ago

Dependencies

.github/workflows/test-action.yaml actions
  • actions/checkout v3 composite
poetry.lock pypi
  • aiohttp 3.8.5
  • aiosignal 1.3.1
  • appnope 0.1.3
  • asttokens 2.4.0
  • async-timeout 4.0.3
  • attrs 23.1.0
  • backcall 0.2.0
  • beautifulsoup4 4.12.2
  • certifi 2023.7.22
  • charset-normalizer 3.2.0
  • colorama 0.4.6
  • decorator 5.1.1
  • environs 9.5.0
  • exceptiongroup 1.1.3
  • executing 1.2.0
  • flake8 6.1.0
  • frozenlist 1.4.0
  • idna 3.4
  • iniconfig 2.0.0
  • ipython 8.15.0
  • jedi 0.19.0
  • marshmallow 3.20.1
  • matplotlib-inline 0.1.6
  • mccabe 0.7.0
  • multidict 6.0.4
  • mypy 1.5.1
  • mypy-extensions 1.0.0
  • packaging 23.1
  • parso 0.8.3
  • pexpect 4.8.0
  • pickleshare 0.7.5
  • pluggy 1.3.0
  • prompt-toolkit 3.0.39
  • ptyprocess 0.7.0
  • pure-eval 0.2.2
  • pycodestyle 2.11.0
  • pyflakes 3.1.0
  • pygments 2.16.1
  • pytest 7.4.2
  • python-dotenv 1.0.0
  • requests 2.31.0
  • six 1.16.0
  • soupsieve 2.5
  • stack-data 0.6.2
  • tomli 2.0.1
  • traitlets 5.10.0
  • typing-extensions 4.8.0
  • unidecode 1.3.6
  • urllib3 2.0.4
  • wcwidth 0.2.6
  • yarl 1.9.2
pyproject.toml pypi
  • aiohttp ^3.8.5
  • beautifulsoup4 ^4.12.2
  • environs ^9.5.0
  • pytest ^7.4.2
  • python ^3.10
  • requests ^2.31.0
  • unidecode ^1.3.6
.github/workflows/publish-action.yaml actions
  • actions/checkout v3 composite