gtfs-data-pipeline-tfnsw-bus

GTFS Data Pipeline for TfNSW Bus Datasets

https://github.com/teckkean/gtfs-data-pipeline-tfnsw-bus

Science Score: 64.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
    1 of 5 committers (20.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.5%) to scientific vocabulary

Keywords

data-pipeline datapipeline gtfs gtfs-realtime gtfs-static open-data opendata python tfnsw
Last synced: 6 months ago

Repository

GTFS Data Pipeline for TfNSW Bus Datasets

Basic Info
  • Host: GitHub
  • Owner: teckkean
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage:
  • Size: 12 MB
Statistics
  • Stars: 8
  • Watchers: 1
  • Forks: 2
  • Open Issues: 0
  • Releases: 2
Topics
data-pipeline datapipeline gtfs gtfs-realtime gtfs-static open-data opendata python tfnsw
Created over 4 years ago · Last pushed over 3 years ago
Metadata Files
Readme Citation

README.md


GTFS Data Pipeline for TfNSW Bus Datasets


Introduction

Research Project Title: Smart City Applications in Land Use and Transport (SCALUT)

This is a data pipeline developed as part of the research project, SCALUT, at the University of Sydney's TransportLab.

The datasets generated using this pipeline have been used to validate the performance of TfNSW's Transit Signal Priority Request via the Public Transport Information and Priority System (PTIPS).

The data pipeline is written in Python and has been tested to work on Windows, Linux and Mac using the Version 1 GTFS TfNSW Bus Datasets.

Note: A separate data pipeline is currently being developed and tested to work with a wider collection of GTFS datasets.

Installation

You can either download or clone this repository from GitHub, or install the package via pip:

pip install gtfs_dpl

Data Availability Statement

The datasets generated will be made available to the public on the University of Sydney Data Repository.

On-going static and realtime datasets are available on the Transport for NSW Open Data Hub:

  • GTFS Static Datasets: https://opendata.transport.nsw.gov.au/dataset/timetables-complete-gtfs
  • GTFS Realtime v1 Datasets:
    - Trip Update: https://opendata.transport.nsw.gov.au/dataset/public-transport-realtime-trip-update
    - Vehicle Position: https://opendata.transport.nsw.gov.au/dataset/public-transport-realtime-vehicle-positions

Data Pipeline Directory Structure

GTFS_TfNSW_Bus_DataWareHouse
├───10_Raw_PB
│   └───FileTP
├───10_Raw_Static
├───10_TfNSW_Traffic_Lights_Location
├───10_TfNSW_Traffic_Volume_Viewer
├───11_CSV_Raw_TU
│   └───FileTP
├───11_CSV_Raw_VP
│   └───FileTP
├───12_CSV_Transformed_TU
│   └───FileTP
├───12_CSV_Transformed_VP
│   └───FileTP
├───12_CSV_Transformed_VP_byAgency
│   └───FileTP
├───13_CSV_Cleaned_Unique_TU
│   └───FileTP
├───13_CSV_Cleaned_Unique_TU_byAgency
│   └───FileTP
├───21_SA_Static
│   ├───GTFS_Static_StaticId
│   └───TL_Location_StaticId
├───22_CSV_Fu_Nodes_Links
│   └───FileTP
├───22_SHP_Fu_Nodes_Links
│   └───FileTP
├───22_SHP_VP_GIS
│   └───FileTP
└───22_SHP_VP_GIS_byAgency
    └───FileTP
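
As a rough illustration, the folder skeleton above could be created with a short script like the one below. This is a sketch, not part of the pipeline: the function name `make_skeleton` and its parameters are hypothetical. It assumes that `FileTP` and `StaticId` in the tree are placeholders for values you supply.

```python
from pathlib import Path

# Folder names copied from the directory tree above.
PLAIN = [
    "10_Raw_Static",
    "10_TfNSW_Traffic_Lights_Location",
    "10_TfNSW_Traffic_Volume_Viewer",
]
WITH_FILETP = [
    "10_Raw_PB",
    "11_CSV_Raw_TU",
    "11_CSV_Raw_VP",
    "12_CSV_Transformed_TU",
    "12_CSV_Transformed_VP",
    "12_CSV_Transformed_VP_byAgency",
    "13_CSV_Cleaned_Unique_TU",
    "13_CSV_Cleaned_Unique_TU_byAgency",
    "22_CSV_Fu_Nodes_Links",
    "22_SHP_Fu_Nodes_Links",
    "22_SHP_VP_GIS",
    "22_SHP_VP_GIS_byAgency",
]


def make_skeleton(root, file_tp, static_id):
    """Create the warehouse folders under `root` and return the leaf paths.

    Assumption: `file_tp` replaces the "FileTP" placeholder and `static_id`
    replaces the "StaticId" suffix shown in the tree.
    """
    base = Path(root) / "GTFS_TfNSW_Bus_DataWareHouse"
    leaves = [base / name for name in PLAIN]
    leaves += [base / name / file_tp for name in WITH_FILETP]
    leaves += [
        base / "21_SA_Static" / f"GTFS_Static_{static_id}",
        base / "21_SA_Static" / f"TL_Location_{static_id}",
    ]
    for leaf in leaves:
        # parents=True also creates the top-level and intermediate folders.
        leaf.mkdir(parents=True, exist_ok=True)
    return leaves
```

Calling `make_skeleton(".", "<FileTP>", "<StaticId>")` with your own values creates the whole tree in the current directory. Rerunning it is harmless because existing folders are left untouched.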

Usage instructions

1.1 Convert .PB.GZ (Gzipped Protocol Buffer) to .CSV Files

python TU_PBtoCSV.py <DataDir> <FileTP>
python VP_PBtoCSV.py <DataDir> <FileTP>

Note: The tfnswgtfsrealtime_pb2.py file is required to be stored in the same folder.
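
To give a sense of what step 1.1 starts from, the sketch below decompresses a single .pb.gz file with Python's standard `gzip` module. It is not taken from the pipeline scripts: the helper name `read_pb_gz` is hypothetical, and the parsing step is described only in a comment because it depends on `tfnswgtfsrealtime_pb2.py`.

```python
import gzip


def read_pb_gz(path):
    """Return the raw (decompressed) protocol buffer bytes of a .pb.gz file."""
    with gzip.open(path, "rb") as fh:
        return fh.read()


# Assumption: tfnswgtfsrealtime_pb2 exposes a FeedMessage class, as the
# standard GTFS Realtime bindings do. If so, the bytes returned above could
# be parsed with FeedMessage().ParseFromString(raw) before being flattened
# into CSV rows.
```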

1.2 Transform .CSV Files

python TU_Transform.py <DataDir> <FileTP> <FileIdStatic>
python VP_Transform.py <DataDir> <FileTP>

1.2A Transform .CSV Files by Agency (Daily to Monthly)

python VP_Transform_byAgency.py <DataDir> <FileTP> <FileIdStatic> <DaysInMonth> <Flt_Agency>

1.3 Prepare Cleaned Unique Datasets

python TU_ClnUnique_byAgency.py <DataDir> <FileTP> <FileIdStatic> <DaysInMonth> <Flt_Agency>

Usage example

The package comes with some data for you to explore. If you installed the package via pip, the following command shows the install path in its "Location" field:

pip show gtfs_dpl

To process the example data included with the package, you can run:

python TU_PBtoCSV.py /path/to/gtfs_dpl/example_data/ <FileTP>
python VP_PBtoCSV.py /path/to/gtfs_dpl/example_data/ <FileTP>
python TU_Transform.py /path/to/gtfs_dpl/example_data/ <FileTP> <FileIdStatic>
python VP_Transform.py /path/to/gtfs_dpl/example_data/ <FileTP>
python VP_Transform_byAgency.py /path/to/gtfs_dpl/example_data/ <FileTP> <FileIdStatic> <DaysInMonth> <Flt_Agency>
python TU_ClnUnique_byAgency.py /path/to/gtfs_dpl/example_data/ <FileTP> <FileIdStatic> <DaysInMonth> <Flt_Agency>
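
The six commands above could also be chained from one small driver script. The following is a minimal sketch under stated assumptions: the helper names `pipeline_commands` and `run_pipeline` are hypothetical, the scripts are assumed to sit in the working directory, and the steps are assumed to run in the order listed.

```python
import subprocess
import sys


def pipeline_commands(data_dir, file_tp, file_id_static, days_in_month,
                      flt_agency, python=sys.executable):
    """Build the argument lists for each pipeline step, in the listed order."""
    common = [str(data_dir), str(file_tp)]
    by_agency = common + [str(file_id_static), str(days_in_month), str(flt_agency)]
    return [
        [python, "TU_PBtoCSV.py", *common],
        [python, "VP_PBtoCSV.py", *common],
        [python, "TU_Transform.py", *common, str(file_id_static)],
        [python, "VP_Transform.py", *common],
        [python, "VP_Transform_byAgency.py", *by_agency],
        [python, "TU_ClnUnique_byAgency.py", *by_agency],
    ]


def run_pipeline(*args, **kwargs):
    """Run every step in sequence, stopping at the first failing script."""
    for cmd in pipeline_commands(*args, **kwargs):
        # check=True raises CalledProcessError if a script exits non-zero.
        subprocess.run(cmd, check=True)
```

Building the argument lists separately from running them lets you print and inspect the commands before anything is executed.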

Owner

  • Name: Teck Kean Chin
  • Login: teckkean
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "Please consider to cite this work using the metadata provided below."
authors:
- family-names: "Chin"
  given-names: "Teck Kean"
  orcid: "https://orcid.org/0000-0003-1223-6822"
- family-names: "Marks"
  given-names: "Benjy"
  orcid: "https://orcid.org/0000-0003-2928-9349"
- family-names: "Moylan"
  given-names: "Emily"
  orcid: "https://orcid.org/0000-0003-1680-6407"
title: "GTFS Data Pipeline for TfNSW Bus Datasets"
version: 1.0.0
doi: 10.5281/zenodo.5594397
date-released: 2021-10-24
url: "https://github.com/teckkean/GTFS-Data-Pipeline-TfNSW-Bus"

Committers

Last synced: about 2 years ago

All Time
  • Total Commits: 36
  • Total Committers: 5
  • Avg Commits per committer: 7.2
  • Development Distribution Score (DDS): 0.361
Past Year
  • Commits: 1
  • Committers: 1
  • Avg Commits per committer: 1.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Teck Kean Chin 8****n 23
Tim-Xian 5****n 8
Teck Kean Chin T****k@O****m 3
Benjamin Marks b****s@s****u 1
Teck Kean Chin 8****n 1

Issues and Pull Requests

Last synced: over 1 year ago

All Time
  • Total issues: 0
  • Total pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: 31 minutes
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • benjym (1)

Packages

  • Total packages: 2
  • Total downloads: unknown
  • Total dependent packages: 0
    (may contain duplicates)
  • Total dependent repositories: 0
    (may contain duplicates)
  • Total versions: 2
proxy.golang.org: github.com/teckkean/GTFS-Data-Pipeline-TfNSW-Bus
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 6.5%
Average: 6.7%
Dependent repos count: 6.9%
Last synced: 6 months ago
proxy.golang.org: github.com/teckkean/gtfs-data-pipeline-tfnsw-bus
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 6.5%
Average: 6.7%
Dependent repos count: 6.9%
Last synced: 6 months ago

Dependencies

setup.py pypi
  • flat_table *
  • geopandas *
  • gzip *
  • json *
  • pandas *
  • pytz *
  • zipfile *