web-scraping-python
Web Scraping with Python
https://github.com/carpentries-incubator/web-scraping-python
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (11.1%) to scientific vocabulary
Repository
Web Scraping with Python
Basic Info
- Host: GitHub
- Owner: carpentries-incubator
- License: other
- Language: Jupyter Notebook
- Default Branch: main
- Homepage: http://carpentries-incubator.github.io/web-scraping-python/
- Size: 12.5 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
Web Scraping with Python
This lesson teaches people with basic Python knowledge the tools and libraries to do web scraping, which means extracting data from websites. It has three episodes.
Episode 1 begins with an introduction to how websites are structured using HTML. You’ll learn how to explore this structure using your browser and how to extract information from it using the BeautifulSoup package.
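As a rough illustration of the kind of extraction Episode 1 describes, the sketch below parses a small HTML snippet with BeautifulSoup and pulls out a heading and the links. The HTML string, its tags, and the class name are invented for this example and are not taken from the lesson material.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, made up for illustration only.
html = """
<html>
  <body>
    <h1>Upcoming workshops</h1>
    <ul class="workshops">
      <li><a href="/workshops/python">Python basics</a></li>
      <li><a href="/workshops/scraping">Web scraping</a></li>
    </ul>
  </body>
</html>
"""

# "html.parser" is the parser that ships with Python's standard library.
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag; get_text() returns its text content.
title = soup.find("h1").get_text(strip=True)

# find_all() returns every matching tag; attributes are read like dictionary keys.
links = [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a")]

print(title)
print(links)
```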
In Episode 2, you’ll learn how to retrieve the HTML of a webpage using the requests package and continue practicing how to parse and extract specific content with BeautifulSoup.
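The retrieve-then-parse workflow of Episode 2 might look roughly like the sketch below. The function names `fetch_html` and `extract_links` are hypothetical helpers introduced here for illustration; they do not come from the lesson.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its HTML as text (hypothetical helper)."""
    response = requests.get(url, timeout=timeout)
    # Raise an exception on 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return response.text


def extract_links(html: str) -> list[str]:
    """Return the href of every <a> tag that has one (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True skips anchors that have no href attribute.
    return [a["href"] for a in soup.find_all("a", href=True)]
```

Under these assumptions, `extract_links(fetch_html(url))` would return the link targets found on a page whose address is stored in `url`. Keeping the download and the parsing in separate functions lets the parsing step be tried on a saved HTML string without any network access.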
Toward the end of the workshop, in Episode 3, we’ll explore the difference between static and dynamic webpages, and how to scrape dynamic content using Selenium.
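A dynamic page builds part of its content with JavaScript after loading, so the HTML returned by a plain HTTP request can lack the data visible in the browser. One possible way to handle this with Selenium is sketched below. It assumes Chrome and a matching driver are available on the machine; `render_page` and its parameters are hypothetical names chosen for this example.

```python
def render_page(url: str, css_selector: str, timeout: float = 10.0) -> str:
    """Load a page in headless Chrome and return the HTML after JavaScript has run.

    Hypothetical helper: css_selector should match an element that only
    exists once the dynamic content has been rendered.
    """
    # Imported inside the function so this module loads without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # run without opening a browser window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until the element is present in the DOM; raises TimeoutException otherwise.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

The returned HTML string can then be parsed with BeautifulSoup in the same way as a static page.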
This workshop is intended for learners who already have a basic understanding of Python. In particular, you should be comfortable with:
- Installing and importing packages and modules
- Using lists and dictionaries
- Using conditional statements (if, else, elif)
- Using for loops
- Calling functions and understanding parameters/arguments and return values
The rendered version of the lesson is available at: https://ucsbcarpentry.github.io/web-scraping-python/
Teaching and contributing
We'd love to know if you are teaching this lesson and to hear any suggestions you have for improving it!
You can do this by submitting an issue in this repo, or sending an email to dreamlab@library.ucsb.edu or jose_nino@ucsb.edu.
If you want to know more about contributing to this lesson and other Carpentries efforts, please read the CONTRIBUTING guide.
Maintainer
Current maintainer of this lesson:
- Jose Niño Muriel
Acknowledgements
Thanks to Noah Spahn, Ronald Lencevičius, and Seth Erickson for their feedback the first time this workshop was taught at UCSB.
Citation
Please cite this lesson using the information in the CITATION.cff file when you refer to it in publications, and/or if you re-use, adapt, or expand on the content in your own training material.
License
The use and adaptation of this instructional content is made available under the Creative Commons Attribution license - CC BY 4.0. Review the LICENSE.md file for additional information.
Owner
- Name: carpentries-incubator
- Login: carpentries-incubator
- Kind: organization
- Repositories: 107
- Profile: https://github.com/carpentries-incubator
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: Web Scraping with Python
message: >-
  Please cite this lesson using the information in this file
  when you refer to it in publications, and/or if you
  re-use, adapt, or expand on the content in your own
  training material.
type: dataset
authors:
  - given-names: Jose David
    family-names: Niño Muriel
    email: jose_nino@ucsb.edu
    affiliation: 'University of California, Santa Barbara'
  - name: DREAM Lab - UCSB Library
    address: Bldg. 525 UCEN Road
    city: Santa Barbara
    country: US
    post-code: '93106'
    email: dreamlab@library.ucsb.edu
    website: 'https://www.library.ucsb.edu/dreamlab'
abstract: >-
  This lesson teaches people with basic Python knowledge the
  tools and libraries to do web scraping, which means
  extracting data from websites
keywords:
  - web scraping
  - python
  - BeautifulSoup
  - b4s
  - Selenium
  - requests
license: CC-BY-4.0
version: '1.0'
date-released: '2025-06-11'
GitHub Events
Total
- Push event: 3
Last Year
- Push event: 3
Dependencies
- actions/upload-artifact v4 composite
- actions/checkout v4 composite
- carpentries/actions/check-valid-pr main composite
- carpentries/actions/comment-diff main composite
- carpentries/actions/download-workflow-artifact main composite
- carpentries/actions/download-workflow-artifact main composite
- carpentries/actions/remove-branch main composite
- carpentries/actions/check-valid-pr main composite
- carpentries/actions/comment-diff main composite
- actions/checkout v4 composite
- actions/upload-artifact v4 composite
- carpentries/actions/check-valid-pr main composite
- carpentries/actions/setup-lesson-deps main composite
- carpentries/actions/setup-sandpaper main composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- actions/checkout v4 composite
- carpentries/actions/setup-lesson-deps main composite
- carpentries/actions/setup-sandpaper main composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- actions/checkout v4 composite
- carpentries/actions/check-valid-credentials main composite
- carpentries/actions/update-lockfile main composite
- carpentries/create-pull-request main composite
- r-lib/actions/setup-r v2 composite
- actions/checkout v4 composite
- carpentries/actions/check-valid-credentials main composite
- carpentries/actions/update-workflows main composite
- carpentries/create-pull-request main composite