Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (10.1%) to scientific vocabulary
Keywords
Repository
Free travel times between U.S. Census geographies
Basic Info
- Host: GitHub
- Owner: dfsnow
- License: mit
- Language: Python
- Default Branch: main
- Homepage: https://opentimes.org
- Size: 2.91 MB
Statistics
- Stars: 159
- Watchers: 2
- Forks: 1
- Open Issues: 2
- Releases: 1
Topics
Metadata Files
README.md
OpenTimes
About OpenTimes | Interactive Map | Data Directory
OpenTimes is a database of pre-computed, point-to-point travel times between United States Census geographies. The travel times are stored as partitioned Parquet files, which allows them to be downloaded directly, read using various libraries, or queried with SQL using DuckDB.
Below is an example of the main travel time data (sourced from this file), where the `_id`-suffixed columns are Census GEOIDs for counties and `duration_sec` is the driving time between the centroids of those counties (in seconds).

| origin_id | destination_id | duration_sec |
|:---------:|:--------------:|-------------:|
| 17031     | 17031          | 0            |
| 17031     | 17043          | 1926         |
| 17031     | 17197          | 3080         |
| 17031     | 18089          | 3463         |
| ...       | ...            | ...          |
OpenTimes is essentially just a few hundred billion records that look exactly like this, compressed and stored in a way that makes them easy to use and cheap to serve.
Getting the data
Direct download
OpenTimes has a file directory of all its public files. Individual Parquet files can be downloaded with a click. They can also be read directly into your software of choice using open-source libraries:
```r
# Using R's arrow implementation
library(arrow)

times <- read_parquet(paste0(
  "https://data.opentimes.org/times/version=0.0.1/mode=car/year=2024",
  "/geography=tract/state=17/times-0.0.1-car-2024-tract-17-0.parquet"
))
```
```python
# Using Python's pandas
import pandas as pd

times = pd.read_parquet((
    "https://data.opentimes.org/times/version=0.0.1/mode=car/year=2024"
    "/geography=tract/state=17/times-0.0.1-car-2024-tract-17-0.parquet"
))

# Or the equivalent in polars
import polars as pl

times = pl.read_parquet((
    "https://data.opentimes.org/times/version=0.0.1/mode=car/year=2024"
    "/geography=tract/state=17/times-0.0.1-car-2024-tract-17-0.parquet"
))
```
Using DuckDB
In addition to individual files, OpenTimes also provides DuckDB pointer databases. These database files contain links to all the relevant static Parquet files in each table. That means the entire OpenTimes database can be queried directly with SQL. For example, using R:
```r
library(DBI)
library(duckdb)

# Create a temporary database in memory and attach to the pointer database
conn <- dbConnect(duckdb(), dbdir = ":memory:")
dbExecute(
  conn = conn,
  "
  INSTALL httpfs;
  LOAD httpfs;
  ATTACH 'https://data.opentimes.org/databases/0.0.1.duckdb' AS opentimes;
  "
)

# Query only tract-level times starting from Cook County, Illinois
times <- dbGetQuery(
  conn = conn,
  "
  SELECT origin_id, destination_id, duration_sec
  FROM opentimes.public.times
  WHERE version = '0.0.1'
    AND mode = 'car'
    AND year = '2024'
    AND geography = 'tract'
    AND state = '17'
    AND origin_id LIKE '17031%'
  "
)
```
Or Python:
```python
import duckdb

# Create a temporary database in memory and attach to the pointer database
conn = duckdb.connect(database=":memory:")
conn.execute("""
    INSTALL httpfs;
    LOAD httpfs;
    ATTACH 'https://data.opentimes.org/databases/0.0.1.duckdb' AS opentimes;
""")

# Query only tract-level times starting from Cook County, Illinois
times = conn.execute("""
    SELECT origin_id, destination_id, duration_sec
    FROM opentimes.public.times
    WHERE version = '0.0.1'
      AND mode = 'car'
      AND year = '2024'
      AND geography = 'tract'
      AND state = '17'
      AND origin_id LIKE '17031%'
""").fetchdf()
```
Some notes on using DuckDB:
- Use as many partition keys as possible in the `WHERE` clause of your query. Similarly, specify only the columns you need in `SELECT`. Doing both of these will greatly increase query speed.
- The OpenTimes data is pretty big: roughly 140 billion rows and 500 GB compressed. If you try to `SELECT *` the whole `times` table you'll probably crash DuckDB. Be warned.
- Conversely, querying individual pairs using DuckDB is highly performant. If you specify all partition keys, an origin ID, and a destination ID, you'll usually get a response in a few seconds (see the sketch below).
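As an illustration of that last point, here is a minimal single-pair lookup sketch in Python. The county GEOIDs come from the sample table above and the attach step mirrors the earlier Python example; the `geography = 'county'` partition value is an assumption based on the Coverage list below.

```python
import duckdb

# Attach the pointer database, as in the example above
conn = duckdb.connect(database=":memory:")
conn.execute("""
    INSTALL httpfs;
    LOAD httpfs;
    ATTACH 'https://data.opentimes.org/databases/0.0.1.duckdb' AS opentimes;
""")

# Specifying every partition key plus both IDs lets DuckDB prune all but
# one Parquet file, so this should return in seconds.
# NOTE: the 'county' partition value is assumed, not confirmed by this README.
pair = conn.execute("""
    SELECT origin_id, destination_id, duration_sec
    FROM opentimes.public.times
    WHERE version = '0.0.1'
      AND mode = 'car'
      AND year = '2024'
      AND geography = 'county'
      AND state = '17'
      AND origin_id = '17031'
      AND destination_id = '17043'
""").fetchdf()
```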
Example queries
Below are some common query patterns for OpenTimes using DuckDB.
Get pair coordinates
Match the GEOID of the Census geography to its respective coordinates from
the points table. These are the coordinates after they're snapped to the
OSM network.
```sql
SELECT
po.lon_snapped AS origin_x,
po.lat_snapped AS origin_y,
pd.lon_snapped AS destination_x,
pd.lat_snapped AS destination_y,
t.duration_sec
FROM opentimes.public.times t
LEFT JOIN opentimes.public.points po
ON t.origin_id = po.id
AND po.mode = t.mode
AND po.year = t.year
AND po.geography = t.geography
AND po.state = t.state
AND po.point_type = 'origin'
LEFT JOIN opentimes.public.points pd
ON t.destination_id = pd.id
AND pd.mode = t.mode
AND pd.year = t.year
AND pd.geography = t.geography
AND pd.state = t.state
AND pd.point_type = 'destination'
WHERE t.version = '0.0.1'
AND t.mode = 'car'
AND t.year = '2024'
AND t.geography = 'tract'
AND t.state = '17'
AND t.origin_id LIKE '17031%'
```
Coverage
OpenTimes data covers and includes times for:
- All 50 states plus Washington D.C.
- All years after 2020 (inclusive)
- The following Census geographies (see this chart for the relationship hierarchy):
- States
- Counties
- County subdivisions
- Tracts
- Block groups
- ZCTAs (ZIP codes)
All routing is performed from each origin in a state to all destinations in the same state plus a 300km buffer around the state. Routing only occurs between geographies of the same type, i.e. tracts route to tracts, counties to counties, etc.
Data is updated once new Census geographies are released (usually fall of a given year). Yearly updates are considered a SemVer minor version. Small data corrections and tweaks are typically patch versions.
Limitations
OpenTimes is relatively complete (i.e. there are few missing pairs), but still has major limitations:
- It doesn't include traffic data. Traffic is basically assumed to be free-flowing at the maximum speed limit allowed by OpenStreetMap tags. As a result, times tend to be optimistic (greatly so in cities). Traffic data is expensive, usually proprietary, and hard to come by, so this isn't likely to be fixed soon.
- OSRM routing is imprecise compared to something like Google Maps or even Valhalla. It doesn't have elevation handling, accurate turn penalties, administrative boundaries, or a whole host of other accuracy-increasing measures.
- No transit times are included. I couldn't find a routing engine fast enough to do continent-scale transit routing. This may change in the future if Valhalla adds multi-modal support to their Matrix API.
- Travel distances are limited to within a state plus a 300km buffer around it. This limit is self-imposed in order to make routing work on GitHub Actions (only a tiny portion of the national OSRM graph can fit in runner memory).
Database structure
Tables
OpenTimes is made up of four tables, each of which is stored in a separate set of static files. Each table contains the columns specified below, in addition to the partition columns shared by all tables.
1. times
This is the primary table and contains origin-destination pairs and the travel time between each pair.
| Column | Type | Description |
|:------------------------|:-------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------|
| origin_id | varchar | GEOID of the Census geography centroid that routing started at. |
| destination_id | varchar | GEOID of the Census geography centroid that routing ended at. |
| duration_sec | double | Travel time in seconds between the points. There is no maximum duration; however, routing only occurs between points in the same state plus a buffer. |
2. points
Describes the location of each origin and destination point in space.
| Column | Type | Description |
|:------------------------------|:----------|:------------------------------------------------------------------------------------------------------------------------|
| point_type | varchar | One of "origin" or "destination". Corresponds to the equivalent column in the times table. |
| id | varchar | Census GEOID of the point. Joins to the times ID columns. |
| lon / lat | double | Coordinates of the GEOID's population-weighted centroid. |
| lon_snapped / lat_snapped | double | Coordinates after being snapped to the nearest OpenStreetMap way. Snap location changes based on routing mode. |
| is_snapped | boolean | Boolean. True if the point moved from its original location. |
3. metadata
Information about how the times were generated, what inputs were used, how long
it took, etc. Note that chunk columns are mostly for diagnostic purposes and
don't affect the public files.
| Column | Type | Description |
|:------------------------------|:----------|:----------------------------------------------------------------------------------------------------------------------------------------------------|
| run_id | varchar | Unique identifier for the run/outputs |
| calc_datetime_finished | datetime | The datetime when routing finished. |
| calc_time_elapsed_sec | int | The time elapsed for the routing, in seconds. |
| calc_chunk_id | varchar | Identifier for the chunk, where numbers to the left of the underscore index the origins file, and numbers to the right index the destinations file. |
| calc_chunk_n_origins | int | Number of origin points in the chunk. |
| calc_chunk_n_destinations | int | Number of destination points in the chunk. |
| calc_n_origins | int | Total number of origin points in the calculation. |
| calc_n_destinations | int | Total number of destination points in the calculation. |
| calc_n_pairs | int | Total number of origin-destination pairs in the calculation, excluding missing pairs. |
| calc_n_missing_pairs | int | Number of missing origin-destination pairs in the calculation. |
| git_commit_sha_short | varchar | Short version of the Git commit SHA. |
| git_commit_sha_long | varchar | Long version of the Git commit SHA. |
| param_network_buffer_m | int | Network buffer parameter in meters. |
| param_destination_buffer_m | int | Destination buffer parameter in meters. |
| param_max_split_size | int | Maximum split size parameter. |
| param_use_snapped | boolean | Boolean if snapped points were used. |
| file_input_origins_md5 | varchar | MD5 checksum of the input origins file. |
| file_input_destinations_md5 | varchar | MD5 checksum of the input destinations file. |
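The metadata table can be queried like any other. A hedged sketch, assuming the pointer database exposes metadata under the same opentimes.public schema as times:

```python
import duckdb

conn = duckdb.connect(database=":memory:")
conn.execute("""
    INSTALL httpfs;
    LOAD httpfs;
    ATTACH 'https://data.opentimes.org/databases/0.0.1.duckdb' AS opentimes;
""")

# When did the 2024 car/tract runs for Illinois finish, how long did
# they take, and which commit produced them?
runs = conn.execute("""
    SELECT run_id, calc_datetime_finished, calc_time_elapsed_sec,
           git_commit_sha_short
    FROM opentimes.public.metadata
    WHERE version = '0.0.1'
      AND mode = 'car'
      AND year = '2024'
      AND geography = 'tract'
      AND state = '17'
""").fetchdf()
```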
4. missing_pairs
This is essentially just the NULL values of the times table; it
contains point pairs that were unroutable for various reasons. These are kept
separate because it seems to help with Parquet compression. The most common
cause of a pair being unroutable is one point being on an island.
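To see which origins in a state have unroutable destinations, you can query missing_pairs directly. A sketch, under the same assumption that the table is exposed through the pointer database:

```python
import duckdb

conn = duckdb.connect(database=":memory:")
conn.execute("""
    INSTALL httpfs;
    LOAD httpfs;
    ATTACH 'https://data.opentimes.org/databases/0.0.1.duckdb' AS opentimes;
""")

# Tally unroutable destinations per origin tract (island tracts will
# typically dominate this list)
missing = conn.execute("""
    SELECT origin_id, COUNT(*) AS n_missing
    FROM opentimes.public.missing_pairs
    WHERE version = '0.0.1'
      AND mode = 'car'
      AND year = '2024'
      AND geography = 'tract'
      AND state = '17'
    GROUP BY origin_id
    ORDER BY n_missing DESC
""").fetchdf()
```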
ER Diagram
```mermaid
erDiagram
    times {
        varchar origin_id PK,FK
        varchar destination_id PK,FK
        double duration_sec
    }
    points {
        varchar point_type PK
        varchar id PK,FK
        double lon
        double lat
        double lon_snapped
        double lat_snapped
        boolean is_snapped
    }
    missing_pairs {
        varchar origin_id PK,FK
        varchar destination_id PK,FK
    }
    metadata {
        varchar run_id
        datetime calc_datetime_finished
        varchar calc_chunk_id
        varchar file_input_origins_md5
        varchar file_input_destinations_md5
    }
    %% metadata columns between calc_chunk_id and the file checksums are
    %% elided here; see the metadata table above for the full list

    metadata ||--o{ times : "describes"
    points ||--o{ times : "between"
    metadata ||--o{ missing_pairs : "describes"
    points ||--o{ missing_pairs : "between"
```
Partitioning
OpenTimes uses Hive-style partitioning to split its tables into smaller files and save space. Files are split by partition keys and organized into folders. Visit the data directory to see the full file structure.
All tables use the following partition keys, in order:
| Partition Key | Description |
|:----------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| version | The OpenTimes data release version. Follows Semantic Versioning. |
| mode | Travel mode. Currently one of "car", "bicycle", or "foot", where each corresponds to one of the default Open Source Routing Machine profiles. |
| year | Census geography and OpenStreetMap data year. Origin and destinations points are pulled from the TIGER/Line files. OSM data is from archived Geofabrik North America extracts. |
| geography | Census geography type. See Coverage. |
| state | Census state-level FIPS code. Includes all 50 states and Washington D.C. |
| centroid_type | Census geography centroid type, one of "weighted" or "unweighted". Currently only weighted centroids are used. |
| chunk_id | Not technically a partition key. This value is derived from the filename of each Parquet file after it is written to a staging bucket. It is included in most tables but can be ignored for most use cases. |
Specifying partition key values when reading or joining files is highly recommended, as most query engines will skip reading any unnecessary files. See the Using DuckDB section for an example.
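Because the partitioning is Hive-style, key values are encoded directly in each file's path, so a file URL can be assembled from the keys above. A minimal sketch; the path pattern and the trailing "-0" chunk index are inferred from the direct-download examples earlier in this README, and larger states may have additional chunks:

```python
import pandas as pd

# Partition key values for one file (order matches the table above;
# centroid_type is omitted, as in the download examples)
keys = {
    "version": "0.0.1",
    "mode": "car",
    "year": "2024",
    "geography": "tract",
    "state": "17",
}

# Hive-style path: /key=value/ segments, then a filename that repeats
# the key values joined by dashes
base = "https://data.opentimes.org/times"
path = "".join(f"/{k}={v}" for k, v in keys.items())
name = "/times-" + "-".join(keys.values()) + "-0.parquet"

times = pd.read_parquet(base + path + name)
```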
License
OpenTimes uses the MIT license. Input data is from OpenStreetMap and the U.S. Census. The basemap on the homepage is from OpenFreeMap. Times are calculated using OSRM.
Attribution
Attribution is required when using OpenTimes data.
Please see the CITATION file. You can also generate APA and BibTeX citations directly from the project sidebar above.
Owner
- Name: Dan Snow
- Login: dfsnow
- Kind: user
- Location: San Francisco, CA
- Company: @turquoisehealth
- Website: sno.ws
- Repositories: 24
- Profile: https://github.com/dfsnow
Researcher @turquoisehealth. Formerly Director of Data Science @ccao-data. Focused on improving public goods and explaining complex systems.
Citation (CITATION.cff)
title: "OpenTimes"
version: "0.0.1"
type: dataset
message: "If you use this software, please cite it using these metadata."
abstract: "Free travel times between U.S. Census geographies"
authors:
- family-names: Snow
given-names: Dan
url: "https://opentimes.org"
repository-code: "https://github.com/dfsnow/opentimes"
cff-version: 1.2.0
date-released: "2024-11-01"
license: "MIT"
keywords:
- "accessibility"
- "spatial access"
- "travel times"
- "travel time estimation"
GitHub Events
Total
- Create event: 25
- Issues event: 2
- Release event: 1
- Watch event: 135
- Delete event: 23
- Issue comment event: 4
- Push event: 502
- Pull request event: 29
- Fork event: 1
Last Year
- Create event: 25
- Issues event: 2
- Release event: 1
- Watch event: 135
- Delete event: 23
- Issue comment event: 4
- Push event: 502
- Pull request event: 29
- Fork event: 1
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Dan Snow | d****n@s****s | 480 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 2
- Total pull requests: 19
- Average time to close issues: N/A
- Average time to close pull requests: about 4 hours
- Total issue authors: 2
- Total pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.26
- Merged pull requests: 17
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 2
- Pull requests: 19
- Average time to close issues: N/A
- Average time to close pull requests: about 4 hours
- Issue authors: 2
- Pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.26
- Merged pull requests: 17
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- dfsnow (1)
- ajfriend (1)
Pull Request Authors
- dfsnow (25)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 1
-
Total downloads:
- pypi 11 last-month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 1
- Total maintainers: 1
pypi.org: opentimes
Free travel times between U.S. Census geographies
- Documentation: https://opentimes.readthedocs.io/
- License: MIT License, Copyright (c) Dan Snow
- Latest release: 0.0.1 (published 11 months ago)
Rankings
Maintainers (1)
Dependencies
- actions/checkout v4 composite
- docker/build-push-action v6 composite
- docker/login-action v3 composite
- docker/setup-buildx-action v3 composite
- actions/checkout v4 composite
- actions/setup-python v5 composite
- ./.github/actions/build-docker * composite
- ./.github/actions/setup-cloudflare-s3 * composite
- ./.github/actions/setup-dvc * composite
- actions/cache/restore v4 composite
- actions/cache/save v4 composite
- actions/checkout v4 composite
- ubuntu 22.04 build
- requests >=2.32.3
- ./.github/actions/setup-dvc * composite
- actions/cache/restore v4 composite
- actions/cache/save v4 composite
- ./.github/actions/setup-cloudflare-s3 * composite
- actions/checkout v4 composite
- actions/setup-python v5 composite
- ./.github/actions/setup-cloudflare-s3 * composite
- actions/checkout v4 composite
- actions/setup-python v5 composite
- ./.github/actions/setup-dvc * composite
- actions/cache/restore v4 composite
- actions/cache/save v4 composite
- ./.github/actions/build-docker * composite
- ./.github/actions/setup-cloudflare-s3 * composite
- ./.github/actions/setup-dvc * composite
- actions/checkout v4 composite
- actions/download-artifact v4 composite
- actions/cache v4 composite
- actions/checkout v4 composite
- astral-sh/setup-uv v5 composite
- actions/checkout v4 composite
- astral-sh/setup-uv v5 composite
- pypa/gh-action-pypi-publish release/v1 composite
- @eslint-community/eslint-utils 4.4.1 development
- @eslint-community/regexpp 4.12.1 development
- @eslint/config-array 0.19.1 development
- @eslint/core 0.9.1 development
- @eslint/eslintrc 3.2.0 development
- @eslint/js 9.17.0 development
- @eslint/object-schema 2.1.5 development
- @eslint/plugin-kit 0.2.4 development
- @humanfs/core 0.19.1 development
- @humanfs/node 0.16.6 development
- @humanwhocodes/module-importer 1.0.1 development
- @humanwhocodes/retry 0.4.1 development
- @humanwhocodes/retry 0.3.1 development
- @types/estree 1.0.6 development
- @types/json-schema 7.0.15 development
- acorn 8.14.0 development
- acorn-jsx 5.3.2 development
- ajv 6.12.6 development
- ansi-styles 4.3.0 development
- argparse 2.0.1 development
- balanced-match 1.0.2 development
- brace-expansion 1.1.11 development
- callsites 3.1.0 development
- chalk 4.1.2 development
- color-convert 2.0.1 development
- color-name 1.1.4 development
- concat-map 0.0.1 development
- cross-spawn 7.0.6 development
- debug 4.4.0 development
- deep-is 0.1.4 development
- escape-string-regexp 4.0.0 development
- eslint 9.17.0 development
- eslint-scope 8.2.0 development
- eslint-visitor-keys 3.4.3 development
- eslint-visitor-keys 4.2.0 development
- espree 10.3.0 development
- esquery 1.6.0 development
- esrecurse 4.3.0 development
- estraverse 5.3.0 development
- esutils 2.0.3 development
- fast-deep-equal 3.1.3 development
- fast-json-stable-stringify 2.1.0 development
- fast-levenshtein 2.0.6 development
- file-entry-cache 8.0.0 development
- find-up 5.0.0 development
- flat-cache 4.0.1 development
- flatted 3.3.2 development
- glob-parent 6.0.2 development
- globals 14.0.0 development
- has-flag 4.0.0 development
- ignore 5.3.2 development
- import-fresh 3.3.0 development
- imurmurhash 0.1.4 development
- is-extglob 2.1.1 development
- is-glob 4.0.3 development
- isexe 2.0.0 development
- js-yaml 4.1.0 development
- json-buffer 3.0.1 development
- json-schema-traverse 0.4.1 development
- json-stable-stringify-without-jsonify 1.0.1 development
- keyv 4.5.4 development
- levn 0.4.1 development
- locate-path 6.0.0 development
- lodash.merge 4.6.2 development
- minimatch 3.1.2 development
- ms 2.1.3 development
- natural-compare 1.4.0 development
- optionator 0.9.4 development
- p-limit 3.1.0 development
- p-locate 5.0.0 development
- parent-module 1.0.1 development
- path-exists 4.0.0 development
- path-key 3.1.1 development
- prelude-ls 1.2.1 development
- punycode 2.3.1 development
- resolve-from 4.0.0 development
- shebang-command 2.0.0 development
- shebang-regex 3.0.0 development
- strip-json-comments 3.1.1 development
- supports-color 7.2.0 development
- type-check 0.4.0 development
- uri-js 4.4.1 development
- which 2.0.2 development
- word-wrap 1.2.5 development
- yocto-queue 0.1.0 development
- fzstd 0.1.1
- hyparquet 1.6.4
- hyparquet-compressors 1.0.0
- hysnappy 1.0.0
- @eslint/js ^9.17.0 development
- eslint ^9.17.0 development
- hyparquet ^1.6.4
- hyparquet-compressors ^1.0.0