web-scraping-python

Web Scraping with Python

https://github.com/carpentries-incubator/web-scraping-python

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.1%) to scientific vocabulary

Keywords

carpentries-incubator, english, helpwanted-list, lesson, pre-alpha, python, web-scraping
Last synced: 6 months ago

Repository

Web Scraping with Python

Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 1
  • Releases: 0
Created over 1 year ago · Last pushed 6 months ago
Metadata Files
Readme, Contributing, License, Code of conduct, Citation

README.md

Web Scraping with Python

This lesson teaches people with basic Python knowledge the tools and libraries used for web scraping, that is, extracting data from websites. It has three episodes.

Episode 1 begins with an introduction to how websites are structured using HTML. You’ll learn how to explore this structure using your browser and how to extract information from it using the BeautifulSoup package.
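As a rough illustration of the Episode 1 workflow, the sketch below parses a small HTML string with BeautifulSoup and pulls out a heading and a list of links. The HTML snippet, its tag contents, and the paths in it are invented for this example and are not taken from the lesson. It assumes the beautifulsoup4 package is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML document, made up for illustration.
html = """
<html>
  <body>
    <h1>Workshop schedule</h1>
    <ul>
      <li><a href="/episode-1">HTML and BeautifulSoup</a></li>
      <li><a href="/episode-2">Requests</a></li>
      <li><a href="/episode-3">Selenium</a></li>
    </ul>
  </body>
</html>
"""

# "html.parser" is Python's built-in parser, so no extra parser is needed.
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag; get_text() returns its text content.
title = soup.find("h1").get_text(strip=True)

# find_all() returns every matching tag; attributes are read like dict keys.
links = [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a")]

print(title)
for text, href in links:
    print(f"{text} -> {href}")
```

The same find() and find_all() calls work on HTML saved from a browser, which is how the tag names seen in the browser's inspector map onto code.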

In Episode 2, you’ll learn how to retrieve the HTML of a webpage using the requests package and continue practicing how to parse and extract specific content with BeautifulSoup.
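The retrieve-then-parse pattern described for Episode 2 might look roughly like the sketch below. The function names fetch_html and extract_links, the sample HTML, and the example.org URLs are assumptions made for illustration and do not come from the lesson. It assumes the requests and beautifulsoup4 packages are installed.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page and return its HTML text, raising on HTTP errors."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
    return response.text


def extract_links(html):
    """Return (text, href) pairs for every <a> tag that has an href attribute."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        (a.get_text(strip=True), a["href"])
        for a in soup.find_all("a", href=True)
    ]


# Offline demonstration with a made-up HTML string, so no network is needed.
sample_html = '<p><a href="https://example.org/a">First</a> <a>no link</a></p>'
print(extract_links(sample_html))

# With network access, the two steps combine like this (placeholder URL):
# html = fetch_html("https://example.org/")
# print(extract_links(html))
```

Keeping the download and the parsing in separate functions means the parsing logic can be tried on saved or hand-written HTML without sending requests to a live site.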

Toward the end of the workshop, in Episode 3, we’ll explore the difference between static and dynamic webpages, and how to scrape dynamic content using Selenium.
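One way to picture the static versus dynamic distinction from Episode 3 is sketched below. The page source, the scrape_dynamic function, and its parameters are hypothetical and not taken from the lesson. The first part needs only beautifulsoup4. The Selenium function assumes Selenium 4 and a local Chrome installation, and is defined here but not called.

```python
from bs4 import BeautifulSoup

# Hypothetical page source: the list is empty until JavaScript fills it in.
page_source = """
<html>
  <body>
    <ul id="results"></ul>
    <script>
      document.getElementById("results").innerHTML = "<li>Loaded by JS</li>";
    </script>
  </body>
</html>
"""

# A static parser never runs the script, so it sees no list items.
soup = BeautifulSoup(page_source, "html.parser")
static_items = soup.find(id="results").find_all("li")
print(len(static_items))


def scrape_dynamic(url, css_selector, timeout=10):
    """Render a page in headless Chrome and return the text of matching elements."""
    # Imported here so the static part above runs even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until JavaScript has added at least one matching element.
        elements = WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, css_selector))
        )
        return [element.text for element in elements]
    finally:
        driver.quit()  # always close the browser, even on errors
```

Because a real browser executes the page's JavaScript, a call such as scrape_dynamic(url, "#results li") could return items that the static parse above cannot see.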

This workshop is intended for learners who already have a basic understanding of Python. In particular, you should be comfortable with:

  • Installing and importing packages and modules
  • Using lists and dictionaries
  • Using conditional statements (if, else, elif)
  • Using for loops
  • Calling functions and understanding parameters/arguments and return values
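As an informal self-check, the short script below touches each of the skills listed above. It is not part of the lesson, and the circle_area function and the size labels are made up for illustration. If every line reads comfortably, the prerequisites are probably met.

```python
import math  # importing a module from the standard library


def circle_area(radius):
    """Take one argument and return a value."""
    return math.pi * radius ** 2


radii = [1, 2, 3]  # a list
labels = {}        # a dictionary, filled in below

for r in radii:  # a for loop
    area = circle_area(r)  # calling a function with an argument
    if area < 5:  # conditional statements
        labels[r] = "small"
    elif area < 20:
        labels[r] = "medium"
    else:
        labels[r] = "large"

print(labels)
```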

The rendered version of the lesson is available at: https://ucsbcarpentry.github.io/web-scraping-python/

Teaching and contributing

We'd love to know if you are teaching this lesson and to hear any suggestions you have for improving it!

You can do this by submitting an issue in this repo, or by sending an email to dreamlab@library.ucsb.edu or jose_nino@ucsb.edu.

If you want to know more about contributing to this lesson and other Carpentries efforts, please read the CONTRIBUTING guide.

Maintainer

Current maintainer of this lesson: Jose Niño Muriel

Acknowledgements

Thanks to Noah Spahn, Ronald Lencevičius, and Seth Erickson for their feedback the first time this workshop was taught at UCSB.

Citation

Please cite this lesson using the information in the CITATION.cff file when you refer to it in publications, and/or if you re-use, adapt, or expand on the content in your own training material.

License

This instructional content is made available for use and adaptation under the Creative Commons Attribution license (CC BY 4.0). Review the LICENSE.md file for additional information.

Owner

  • Name: carpentries-incubator
  • Login: carpentries-incubator
  • Kind: organization

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Web Scraping with Python
message: >-
  Please cite this lesson using the information in this file
  when you refer to it in publications, and/or if you
  re-use, adapt, or expand on the content in your own
  training material.
type: dataset
authors:
  - given-names: Jose David
    family-names: Niño Muriel
    email: jose_nino@ucsb.edu
    affiliation: 'University of California, Santa Barbara'
  - name: DREAM Lab - UCSB Library
    address: Bldg. 525 UCEN Road
    city: Santa Barbara
    country: US
    post-code: '93106'
    email: dreamlab@library.ucsb.edu
    website: 'https://www.library.ucsb.edu/dreamlab'
abstract: >-
  This lesson teaches people with basic Python knowledge the
  tools and libraries to do web scraping, which means
  extracting data from websites
keywords:
  - web scraping
  - python
  - BeautifulSoup
  - bs4
  - Selenium
  - requests
license: CC-BY-4.0
version: '1.0'
date-released: '2025-06-11'

GitHub Events

Total
  • Push event: 3
Last Year
  • Push event: 3

Dependencies

.github/workflows/pr-close-signal.yaml actions
  • actions/upload-artifact v4 composite
.github/workflows/pr-comment.yaml actions
  • actions/checkout v4 composite
  • carpentries/actions/check-valid-pr main composite
  • carpentries/actions/comment-diff main composite
  • carpentries/actions/download-workflow-artifact main composite
.github/workflows/pr-post-remove-branch.yaml actions
  • carpentries/actions/download-workflow-artifact main composite
  • carpentries/actions/remove-branch main composite
.github/workflows/pr-preflight.yaml actions
  • carpentries/actions/check-valid-pr main composite
  • carpentries/actions/comment-diff main composite
.github/workflows/pr-receive.yaml actions
  • actions/checkout v4 composite
  • actions/upload-artifact v4 composite
  • carpentries/actions/check-valid-pr main composite
  • carpentries/actions/setup-lesson-deps main composite
  • carpentries/actions/setup-sandpaper main composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
.github/workflows/sandpaper-main.yaml actions
  • actions/checkout v4 composite
  • carpentries/actions/setup-lesson-deps main composite
  • carpentries/actions/setup-sandpaper main composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
.github/workflows/update-cache.yaml actions
  • actions/checkout v4 composite
  • carpentries/actions/check-valid-credentials main composite
  • carpentries/actions/update-lockfile main composite
  • carpentries/create-pull-request main composite
  • r-lib/actions/setup-r v2 composite
.github/workflows/update-workflows.yaml actions
  • actions/checkout v4 composite
  • carpentries/actions/check-valid-credentials main composite
  • carpentries/actions/update-workflows main composite
  • carpentries/create-pull-request main composite