pnw-ml
An ML-ready curated dataset for a wide range of seismic signals from the Pacific Northwest
Statistics
- Stars: 15
- Watchers: 2
- Forks: 4
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
Pacific Northwest Curated Seismic Dataset
A curated dataset for a wide range of sources from the Pacific Northwest

Overview
Each dataset consists of two files: a waveform file (in HDF5 format) and a metadata file (in CSV format). Both follow the SeisBench format. See here to learn more about the file structure.
Datasets
We host two copies of the dataset: one on Google Drive and the other on the UW-ESS server. All datasets are also available through SeisBench.
1. ComCat Events
EH, BH, and HH channel (velocity)
EN (accelerometer)
2. Noise Waveform (EH, BH, and HH)
3. Exotic Events (EH, BH, and HH)
4. Northern California Sequence (December 2022)
5. ML-enhanced catalog
- CSV (~93 MB): [GDrive]
Access
Quick tour to the dataset
Here are several ways to use the PNW dataset.
- Jupyter Notebook
A Jupyter notebook for loading and plotting the PNW dataset is available here. Download and run it on a local machine to enable interactive plotting (e.g., zooming in and out to check the picks).
A notebook on accessing the dataset with the SeisBench APIs is available here.
If you are more familiar with Google Colab, go to the link above. Note that interactive plotting is not available on Colab.
Demo sets
A micro version of the dataset contains 10 earthquake streams, 10 explosion streams, 10 sonic boom streams, 10 thunder streams, and 10 surface event streams. See data/microPNW.
A mini version of the dataset contains 500 earthquake streams, 500 explosion streams, 500 surface event streams, 126 sonic boom streams, and 94 thunder quake streams.
- waveform (640 MB): [Google Drive] | [UW-ESS]
- metadata (424 KB): [Google Drive] | [UW-ESS]
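After downloading a demo set such as the mini version, a quick sanity check is to count the streams per class in the metadata CSV. The sketch below uses only the standard library. The column name `source_type` is an assumption based on the metadata attributes documented in this README, and the inline rows are synthetic stand-ins, not real PNW data.

```python
import csv
import io
from collections import Counter


def count_by_class(csv_file, column="source_type"):
    """Count metadata rows per class label in an open CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


# Tiny synthetic stand-in for a metadata CSV (hypothetical rows).
sample = io.StringIO(
    "event_id,source_type\n"
    "ev1,earthquake\n"
    "ev2,earthquake\n"
    "ev3,explosion\n"
)
counts = count_by_class(sample)
print(counts)
```

For a real file, pass an open handle instead, for example `open(path, newline="")`, where `path` points to the metadata CSV you downloaded.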
A meso version with 10% of the full ComCat dataset (only earthquake + explosion).
- waveform (6.3 GB): [Google Drive] | [UW-ESS]
- metadata (4.7 MB): [Google Drive] | [UW-ESS]
Metadata
| Attribute | Description | Example |
| ----------- | ----------- | ------- |
| event_id | Event identifier | uw10564613 |
| source_origin_time | Source origin time in UTC | 2002-10-03T01:56:49.530000Z |
| source_latitude_deg | - | 48.553 |
| source_longitude_deg | - | -122.52 |
| source_type | - | earthquake |
| source_type_pnsn_label | PNSN AQMS event type | eq |
| source_depth_km | - | 14.907 |
| source_magnitude_preferred | - | 2.1 |
| source_magnitude_type_preferred | - | Md |
| source_magnitude_uncertainty_preferred | - | 0.03 |
| source_local/duration/hand_magnitude | Ml, Md, and Mh if available | 1.32 |
| source_local/duration_magnitude_uncertainty | Magnitude uncertainty if available | 0.15 |
| source_depth_uncertainty_km | - | 1.69 |
| source_horizontal_uncertainty_km | - | 0.694 |
| station_network_code | FDSN network code | UW |
| station_code | FDSN station code | GNW |
| station_location_code | FDSN location code | 01 |
| station_channel_code | FDSN channel code (first two characters) | BH |
| station_latitude_deg | - | 47.5641 |
| station_longitude_deg | - | -122.825 |
| station_elevation_m | - | 220.0 |
| trace_name | Bucket and array index | bucket1\$0,:3:15001 |
| trace_sampling_rate_hz | All traces resampled to 100 Hz | 100 |
| trace_start_time | Trace start time in UTC | 2002-10-03T01:55:59.530000Z |
| trace_P/S_arrival_sample | Closest sample index of arrival | 8097 |
| trace_P/S_arrival_uncertainty_s | Picking uncertainty in seconds | 0.02 |
| trace_P/S_onset | - | emergent |
| trace_P_polarity | P-wave onset polarity | positive, negative, or undecidable |
| trace_has_offset | Any visible offset in the trace | 1 |
| trace_missing_channel | Number of missing channels in the trace | 2 |
| trace_snr_db | SNR for each component | 6.135\|3.065\|11.766 |
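Several of these fields need a little decoding before use. The following is a minimal sketch, using only the Python standard library, of three helpers: splitting a trace name into its bucket and array index, converting an arrival sample index into an absolute UTC time, and parsing the per-component SNR string. The function names are hypothetical, and the inputs are the example values from the table, which may not all belong to the same trace. It assumes the trace name in the CSV contains a plain `$` separator and that the timestamps always carry microseconds and a trailing `Z`.

```python
from datetime import datetime, timedelta, timezone


def parse_trace_name(trace_name):
    """Split a trace name such as 'bucket1$0,:3:15001' into (bucket, index)."""
    bucket, rest = trace_name.split("$", 1)
    # The array index is the part before the first comma.
    index = int(rest.split(",", 1)[0])
    return bucket, index


def arrival_time(trace_start_time, arrival_sample, sampling_rate_hz=100.0):
    """Convert an arrival sample index into an absolute UTC time."""
    start = datetime.strptime(trace_start_time, "%Y-%m-%dT%H:%M:%S.%fZ")
    start = start.replace(tzinfo=timezone.utc)
    return start + timedelta(seconds=arrival_sample / sampling_rate_hz)


def parse_snr(snr_field):
    """Parse a pipe-separated SNR string into one float per component."""
    return [float(value) for value in snr_field.split("|")]


bucket, index = parse_trace_name("bucket1$0,:3:15001")
p_time = arrival_time("2002-10-03T01:55:59.530000Z", 8097)
snr = parse_snr("6.135|3.065|11.766")
print(bucket, index, p_time.isoformat(), snr)
```

With the bucket name and index in hand, the matching waveform can then be looked up in the HDF5 file; the exact group layout is described in the SeisBench file-structure documentation rather than here.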
Reference
Ni, Y., Hutko, A., Skene, F., Denolle, M., Malone, S., Bodin, P., Hartog, R., & Wright, A. (2023). Curated Pacific Northwest AI-ready Seismic Dataset. Seismica, 2(1). https://doi.org/10.26443/seismica.v2i1.368
BibTeX:
```bibtex
@article{ni2023pnw,
  title={Curated Pacific Northwest AI-ready Seismic Dataset},
  volume={2},
  url={https://seismica.library.mcgill.ca/article/view/368},
  number={1},
  journal={Seismica},
  author={Ni, Yiyu and Hutko, Alexander and Skene, Francesca and Denolle, Marine and Malone, Stephen and Bodin, Paul and Hartog, Renate and Wright, Amy},
  year={2023},
  month={05},
  doi={10.26443/seismica.v2i1.368}
}
```
Known issues
- [August 2023] A small number of events (~15) in the ComCat dataset may have inconsistent `event_type_pnsn_label` and `event_type` values. This issue comes from outdated ComCat event metadata. Please prioritize the PNSN label when such an inconsistency occurs.
- [June 2025] The `trace_start_time` field in the exotic metadata was delayed by 50 seconds. The metadata has now been corrected for all affected files.
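One way to act on the first issue is to flag rows where the two labels disagree and then let the PNSN label win. The sketch below is illustrative only: the function name is hypothetical, the rows are synthetic, and the column names are taken from the issue description above, so adjust them if your metadata file names them differently. Only the `eq` to `earthquake` pairing comes from this README; any other label codes must be added from a source you have verified.

```python
import csv
import io


def find_label_conflicts(rows, expected_type,
                         label_col="event_type_pnsn_label",
                         type_col="event_type"):
    """Return rows whose event type disagrees with the PNSN label."""
    conflicts = []
    for row in rows:
        expected = expected_type.get(row[label_col])
        # Labels missing from the mapping are skipped, not guessed.
        if expected is not None and row[type_col] != expected:
            conflicts.append(row)
    return conflicts


# Assumption: only this pairing is documented in the README.
expected_type = {"eq": "earthquake"}

# Synthetic rows standing in for the ComCat metadata CSV.
sample = io.StringIO(
    "event_id,event_type_pnsn_label,event_type\n"
    "ev1,eq,earthquake\n"
    "ev2,eq,explosion\n"
)
rows = list(csv.DictReader(sample))
conflicts = find_label_conflicts(rows, expected_type)

# Prioritize the PNSN label, as recommended above.
for row in conflicts:
    row["event_type"] = expected_type[row["event_type_pnsn_label"]]

print([row["event_id"] for row in conflicts])
```

Passing the column names as parameters keeps the helper usable whichever naming the file follows.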
Report bugs
If you find any issue in the dataset, please report it through a GitHub issue or by email.
Owner
- Name: Yiyu Ni
- Login: niyiyu
- Kind: user
- Location: Seattle, Washington
- Company: University of Washington
- Website: https://niyiyu.github.io
- Twitter: niyiyu_
- Repositories: 15
- Profile: https://github.com/niyiyu
PhD Student at Earth and Space Sciences, University of Washington
Citation (citation.cff)
cff-version: 1.2.0
message: "Please cite the following reference if you use the data & code for your work"
authors:
  - family-names: "Ni"
    given-names: "Yiyu"
url: "https://github.com/niyiyu/PNW-ML"
doi: 10.5281/zenodo.7627103
preferred-citation:
  type: article
  authors:
    - family-names: "Ni"
      given-names: "Yiyu"
    - family-names: "Hutko"
      given-names: "Alexander"
    - family-names: "Skene"
      given-names: "Francesca"
    - family-names: "Denolle"
      given-names: "Marine"
    - family-names: "Malone"
      given-names: "Stephen"
    - family-names: "Bodin"
      given-names: "Paul"
    - family-names: "Hartog"
      given-names: "Renate"
    - family-names: "Wright"
      given-names: "Amy"
  title: "Curated Pacific Northwest AI-ready Seismic Dataset."
  journal: "Seismica"
  month: May
  volume: 2
  number: 1
  doi: https://doi.org/10.26443/seismica.v2i1.368
  url: https://seismica.library.mcgill.ca/article/view/368
GitHub Events
Total
- Watch event: 4
- Push event: 4
Last Year
- Watch event: 4
- Push event: 4