thefork-scraping-code

The Fork assignment for your data analysis courses

https://github.com/fbietti/thefork-scraping-code


Repository

Basic Info
  • Host: GitHub
  • Owner: fbietti
  • Language: R
  • Default Branch: main
  • Size: 71.3 KB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created over 2 years ago · Last pushed over 1 year ago
Metadata Files

README.md

The Fork scraping code

This project arose from the need to find new topics for my data analysis assignments at the university. The idea was to identify data online, structure it into a dataset, then conduct analyses and engage the students in practical exercises!

In this project, you will find a file for scraping data from The Fork website. The code is written in Python and is well commented. It does not require prior knowledge of HTML or CSS: you only need to identify the element you are interested in on the site and copy the corresponding code into the for loop.

File: scraping_code

This file contains the code for performing web scraping. First, you need to go to The Fork website and initiate your search. In my case, I was interested in Parisian pizzerias, so I launched a search with the keyword 'pizzeria.' The site responded with 168 pages of results. Each page contains multiple results, and each result corresponds to a pizzeria. Each result includes various elements, such as the address, price, pizzeria name, etc.

You should inspect the element you are interested in. In my case, I focused on the pizzeria name, rating, price, review, and address. I initialized lists for each of these elements and also set up page numbers so that the code could navigate from one page to another.
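The loop described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code in `scraping_code`. The CSS class names in `FIELDS` and the `fetch_page` callable are hypothetical: replace the class names with the ones you find when inspecting the page, and supply your own function that returns the HTML of a given result page.

```python
from html.parser import HTMLParser

# Hypothetical CSS class names; replace them with the ones you find by
# inspecting the live page in your browser.
FIELDS = {
    "restaurant-name": "name",
    "restaurant-rating": "rating",
    "restaurant-price": "price",
    "restaurant-review": "review",
    "restaurant-address": "address",
}


class FieldParser(HTMLParser):
    """Collect the text of each element whose class maps to a field."""

    def __init__(self, fields):
        super().__init__()
        self.fields = fields
        self.current = None  # field currently being read, if any
        self.found = {field: [] for field in fields.values()}

    def handle_starttag(self, tag, attrs):
        # Look up the tag's class attribute; unknown classes give None.
        self.current = self.fields.get(dict(attrs).get("class"))

    def handle_data(self, data):
        if self.current and data.strip():
            self.found[self.current].append(data.strip())
            self.current = None

    def handle_endtag(self, tag):
        self.current = None


def scrape(fetch_page, n_pages):
    """Walk pages 1..n_pages and accumulate one list per field.

    fetch_page is a hypothetical callable: page number -> HTML string.
    """
    results = {field: [] for field in FIELDS.values()}
    for page in range(1, n_pages + 1):
        parser = FieldParser(FIELDS)
        parser.feed(fetch_page(page))
        for field, values in parser.found.items():
            results[field].extend(values)
    return results
```

One caveat of keeping a separate list per field: if a result is missing an element (for example, a pizzeria with no rating), the lists fall out of alignment, so it is worth checking that they all have the same length before building the dataset.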

Next, using the lists, I created a dictionary, and I transformed the dictionary into a data frame. Finally, with the data frame, I exported a CSV file.
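As a rough sketch of that last step, the example below goes from lists to a dictionary to a CSV export using only the standard library. The helper names `lists_to_rows` and `write_csv` and the sample values are made up for illustration. If you use pandas, as the term "data frame" suggests, the same dictionary can be passed to `pandas.DataFrame` and exported with its `to_csv` method.

```python
import csv
import io


def lists_to_rows(columns):
    """Turn a dict of equal-length lists into a header plus data rows."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    header = list(columns)
    rows = list(zip(*columns.values()))
    return header, rows


def write_csv(columns, handle):
    """Write the dict of lists to an open text file or buffer as CSV."""
    header, rows = lists_to_rows(columns)
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


# Made-up values, for illustration only.
data = {
    "name": ["Pizzeria A", "Pizzeria B"],
    "price": [14, 18],
    "address": ["1 Rue A, 75011 Paris", "2 Rue B, 75002 Paris"],
}

buffer = io.StringIO()  # swap for open("pizzerias.csv", "w", newline="")
write_csv(data, buffer)
```

The length check is there because lists collected one field at a time can silently end up with different lengths.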

File: celaningdataset

This file contains R commands to structure the dataset. They extract the district (arrondissement) number from each address, which makes it possible to compare the average prices of pizzerias across the neighborhoods of Paris.
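The same idea can be sketched in Python. This is not a translation of the R file, only an illustration that assumes each address contains a Paris postal code: these run from 75001 to 75020, plus 75116 for part of the 16th arrondissement, so the last two digits give the district. The function names and sample rows are hypothetical.

```python
import re
from collections import defaultdict

# Paris postal codes: 75001..75020, plus 75116 for part of the 16th.
PARIS_POSTCODE = re.compile(r"\b75[01](\d{2})\b")


def extract_district(address):
    """Return the district number (1-20) found in an address, or None."""
    match = PARIS_POSTCODE.search(address)
    if match is None:
        return None
    district = int(match.group(1))
    return district if 1 <= district <= 20 else None


def mean_price_by_district(rows):
    """Average the prices of (address, price) pairs per district."""
    prices = defaultdict(list)
    for address, price in rows:
        district = extract_district(address)
        if district is not None:  # skip addresses without a Paris code
            prices[district].append(price)
    return {d: sum(p) / len(p) for d, p in sorted(prices.items())}


# Made-up rows, for illustration only.
sample = [
    ("1 Rue A, 75011 Paris", 12.0),
    ("2 Rue B, 75011 Paris", 18.0),
    ("3 Avenue C, 75116 Paris", 20.0),
    ("address without a postal code", 9.0),
]
averages = mean_price_by_district(sample)
```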

File: assignmetpizzaprice_paris

This file contains an exercise that, if you are a teacher, you can give to your data analysis students. I wrote it in R, but it can easily be translated into Python if you prefer. It starts with some commands to change variable types, find individuals, and so on, and continues with recoding and bivariate analyses. I ask students to create tables, formulate hypotheses, test them, and create a graph. The file contains the answers to each question.
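To give a flavor of what such an exercise might look like in Python, here is a small sketch of a recoding step followed by a bivariate table and a chi-square statistic computed by hand. It does not reproduce the questions in the R file: the price threshold, the "central"/"outer" grouping, and the sample data are all hypothetical.

```python
from collections import Counter


def recode_price(price, threshold=15.0):
    """Recode a numeric price into two bands (hypothetical threshold)."""
    return "low" if price < threshold else "high"


def crosstab(pairs):
    """Count how often each (row_category, column_category) pair occurs."""
    return Counter(pairs)


def chi_square(table):
    """Pearson chi-square statistic of a two-way contingency table."""
    rows = sorted({r for r, _ in table})
    cols = sorted({c for _, c in table})
    n = sum(table.values())
    row_total = {r: sum(table[(r, c)] for c in cols) for r in rows}
    col_total = {c: sum(table[(r, c)] for r in rows) for c in cols}
    statistic = 0.0
    for r in rows:
        for c in cols:
            # Expected count under independence of rows and columns.
            expected = row_total[r] * col_total[c] / n
            statistic += (table[(r, c)] - expected) ** 2 / expected
    return statistic


# Made-up (district group, price) observations, for illustration only.
observations = [("central", 20), ("central", 22), ("outer", 10), ("outer", 12)]
table = crosstab((group, recode_price(price)) for group, price in observations)
statistic = chi_square(table)
```

Students can then compare the statistic with the critical value for the table's degrees of freedom, or use a statistics library to obtain a p-value directly.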

Owner

  • Login: fbietti
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Bietti
    given-names: Federico
    orcid: https://orcid.org/0000-0002-3912-3951
title: "The price of pizza in Paris: an example of an assignment in data analysis using The Fork website"
version: 
identifiers:
  - type: 
    value: 
date-released: 2023-01-23
