web-scraping-python

Web Scraping with Python

https://github.com/carpentries-incubator/web-scraping-python

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (11.1%) to scientific vocabulary

Keywords

carpentries-incubator english helpwanted-list lesson pre-alpha python web-scraping

Last synced: 6 months ago

Repository

Web Scraping with Python

Basic Info

Host: GitHub
Owner: carpentries-incubator
License: other
Language: Jupyter Notebook
Default Branch: main
Homepage: http://carpentries-incubator.github.io/web-scraping-python/
Size: 12.5 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 1
Releases: 0

Topics

carpentries-incubator english helpwanted-list lesson pre-alpha python web-scraping

Created over 1 year ago · Last pushed 6 months ago

Metadata Files

Readme Contributing License Code of conduct Citation

Web Scraping with Python

This lesson teaches people with basic Python knowledge how to use the tools and libraries for web scraping, that is, extracting data from websites. It has three episodes.

Episode 1 begins with an introduction to how websites are structured using HTML. You’ll learn how to explore this structure using your browser and how to extract information from it using the BeautifulSoup package.
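As a minimal sketch of the kind of extraction Episode 1 describes, the snippet below parses a small HTML string with BeautifulSoup and pulls out the heading and the links. The HTML, the paths, and the helper name `extract_links` are invented for illustration and are not taken from the lesson.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML document, invented for this example.
SAMPLE_HTML = """
<html>
  <body>
    <h1>Workshops</h1>
    <ul>
      <li><a href="/python">Intro to Python</a></li>
      <li><a href="/scraping">Web Scraping</a></li>
    </ul>
  </body>
</html>
"""


def extract_links(html):
    """Return (link text, href) pairs for every <a> tag that has an href."""
    soup = BeautifulSoup(html, "html.parser")  # parser from the standard library
    return [
        (a.get_text(strip=True), a["href"])
        for a in soup.find_all("a", href=True)
    ]


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
heading = soup.find("h1").get_text(strip=True)  # text of the first <h1>
links = extract_links(SAMPLE_HTML)
print(heading)
print(links)
```

The tag names passed to `find` and `find_all` are the same ones you would see when inspecting the page structure in the browser.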

In Episode 2, you’ll learn how to retrieve the HTML of a webpage using the requests package and continue practicing how to parse and extract specific content with BeautifulSoup.
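A rough sketch of the Episode 2 workflow might look like the following: one function downloads a page with requests and another parses the result with BeautifulSoup. The function names `fetch_html` and `page_title` and the 10-second timeout are assumptions made for this example, not part of the lesson.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page and return its HTML as text."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return response.text


def page_title(html):
    """Return the text of the <title> tag, or None if the page has none."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.find("title")
    return title.get_text(strip=True) if title else None
```

Keeping the download and the parsing in separate functions lets you test the parsing on a saved HTML string without touching the network. On a live page you would call `page_title(fetch_html(url))` with a URL you are allowed to scrape.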

Finally, in Episode 3, you’ll explore the difference between static and dynamic webpages and learn how to scrape dynamic content using Selenium.
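To illustrate the static versus dynamic distinction, the sketch below compares what a selector finds in the HTML a plain HTTP request would receive with what it finds after a browser has run the page's JavaScript. The two HTML strings, the script name `load-items.js`, and the helper names are hypothetical. `fetch_rendered_html` shows one way to obtain rendered HTML with Selenium, assuming Selenium 4 and a local Chrome installation; it is defined here but not called.

```python
from bs4 import BeautifulSoup


def count_matches(html, css_selector):
    """Count elements in an HTML string that match a CSS selector."""
    return len(BeautifulSoup(html, "html.parser").select(css_selector))


def fetch_rendered_html(url, css_selector, timeout=10):
    """Load a page in headless Chrome and return the HTML after JavaScript ran."""
    # Imported here so the rest of the example works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # run Chrome without a window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until the dynamically inserted element is present in the DOM.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on errors


# Hypothetical page source as a plain HTTP request would receive it:
# the list is empty because a script fills it in later.
STATIC_HTML = '<ul id="items"></ul><script src="load-items.js"></script>'
# Hypothetical DOM after the browser has run that script.
RENDERED_HTML = '<ul id="items"><li>First</li><li>Second</li></ul>'

static_count = count_matches(STATIC_HTML, "#items li")
rendered_count = count_matches(RENDERED_HTML, "#items li")
print(static_count, rendered_count)
```

If a selector matches nothing in the downloaded HTML but the elements are visible in the browser, the content is probably inserted dynamically, which is the case where a browser-driving tool such as Selenium becomes useful.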

This workshop is intended for learners who already have a basic understanding of Python. In particular, you should be comfortable with:

Installing and importing packages and modules
Using lists and dictionaries
Using conditional statements (if, elif, else)
Using for loops
Calling functions and understanding parameters/arguments and return values

The rendered version of the lesson is available at: https://ucsbcarpentry.github.io/web-scraping-python/

Teaching and contributing

We'd love to know if you are teaching this lesson and to hear any suggestions you have for improving it!

You can do this by submitting an issue in this repo, or by sending an email to dreamlab@library.ucsb.edu or jose_nino@ucsb.edu.

If you want to know more about contributing to this lesson and other Carpentries efforts, please read the CONTRIBUTING guide.

Maintainer

Current maintainer of this lesson: Jose Niño Muriel

Acknowledgements

Thanks to Noah Spahn, Ronald Lencevičius, and Seth Erickson for their feedback the first time this workshop was taught at UCSB.

Citation

Please cite this lesson using the information in the CITATION.cff file when you refer to it in publications, and/or if you re-use, adapt, or expand on the content in your own training material.

License

This instructional content is made available for use and adaptation under the Creative Commons Attribution license (CC BY 4.0). See the LICENSE.md file for additional information.

Owner

Name: carpentries-incubator
Login: carpentries-incubator
Kind: organization

Repositories: 107
Profile: https://github.com/carpentries-incubator

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Web Scraping with Python
message: >-
  Please cite this lesson using the information in this file
  when you refer to it in publications, and/or if you
  re-use, adapt, or expand on the content in your own
  training material.
type: dataset
authors:
  - given-names: Jose David
    family-names: Niño Muriel
    email: jose_nino@ucsb.edu
    affiliation: 'University of California, Santa Barbara'
  - name: DREAM Lab - UCSB Library
    address: Bldg. 525 UCEN Road
    city: Santa Barbara
    country: US
    post-code: '93106'
    email: dreamlab@library.ucsb.edu
    website: 'https://www.library.ucsb.edu/dreamlab'
abstract: >-
  This lesson teaches people with basic Python knowledge the
  tools and libraries to do web scraping, which means
  extracting data from websites
keywords:
  - web scraping
  - python
  - BeautifulSoup
  - b4s
  - Selenium
  - requests
license: CC-BY-4.0
version: '1.0'
date-released: '2025-06-11'

GitHub Events

Total

Push event: 3

Last Year

Push event: 3

Dependencies

.github/workflows/pr-close-signal.yaml actions

actions/upload-artifact v4 composite

.github/workflows/pr-comment.yaml actions

actions/checkout v4 composite
carpentries/actions/check-valid-pr main composite
carpentries/actions/comment-diff main composite
carpentries/actions/download-workflow-artifact main composite

.github/workflows/pr-post-remove-branch.yaml actions

carpentries/actions/download-workflow-artifact main composite
carpentries/actions/remove-branch main composite

.github/workflows/pr-preflight.yaml actions

carpentries/actions/check-valid-pr main composite
carpentries/actions/comment-diff main composite

.github/workflows/pr-receive.yaml actions

actions/checkout v4 composite
actions/upload-artifact v4 composite
carpentries/actions/check-valid-pr main composite
carpentries/actions/setup-lesson-deps main composite
carpentries/actions/setup-sandpaper main composite
r-lib/actions/setup-pandoc v2 composite
r-lib/actions/setup-r v2 composite

.github/workflows/sandpaper-main.yaml actions

actions/checkout v4 composite
carpentries/actions/setup-lesson-deps main composite
carpentries/actions/setup-sandpaper main composite
r-lib/actions/setup-pandoc v2 composite
r-lib/actions/setup-r v2 composite

.github/workflows/update-cache.yaml actions

actions/checkout v4 composite
carpentries/actions/check-valid-credentials main composite
carpentries/actions/update-lockfile main composite
carpentries/create-pull-request main composite
r-lib/actions/setup-r v2 composite

.github/workflows/update-workflows.yaml actions

actions/checkout v4 composite
carpentries/actions/check-valid-credentials main composite
carpentries/actions/update-workflows main composite
carpentries/create-pull-request main composite
