reverse-engineering-yjmob100k-grid
Revealing urban area from mobile positioning data
https://github.com/pintergreg/reverse-engineering-yjmob100k-grid
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 1
README.md
Revealing urban area from mobile positioning data
Usage
The pyproject.toml documents the required dependencies. Using the Poetry packaging tool is suggested: issuing the poetry install command sets up a virtual environment with all the necessary dependencies.
After the development environment has been set up, run the notebooks in the following order to reproduce the results.
- plot_heatmaps.ipynb
- this will reproduce the heatmaps [Figure 6] from the data description paper,
- do the inverse-transformed plots, and
- some related plots for the paper
- generate_grid.ipynb
- this will locate the observation area within Japan and generate the grid
These two notebooks contain the main work. The detect_homes.ipynb, validate_home_detection.ipynb, calculate_grid_complexity.ipynb, and src/plot_grid.ipynb notebooks are optional steps that reproduce the figure in the technical validation section of the paper.
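The run order of the two main notebooks could be scripted, for instance as in the sketch below. It assumes that Jupyter's nbconvert command-line tool is available in the Poetry environment (it is not listed among the dependencies), and the helper names build_commands and run_all are hypothetical, not part of the repository.

```python
import subprocess

# Run order of the two main notebooks, as listed in the Usage section.
NOTEBOOKS = ["plot_heatmaps.ipynb", "generate_grid.ipynb"]


def build_commands(notebooks):
    """Return one `jupyter nbconvert` command per notebook, in run order."""
    return [
        ["poetry", "run", "jupyter", "nbconvert", "--to", "notebook",
         "--execute", "--inplace", notebook]
        for notebook in notebooks
    ]


def run_all(notebooks=NOTEBOOKS):
    """Execute the notebooks one after another, stopping at the first failure."""
    for command in build_commands(notebooks):
        # check=True raises CalledProcessError if a notebook fails to execute.
        subprocess.run(command, check=True)
```

Calling run_all() from the repository root would then execute both notebooks in place; the optional notebooks can be appended to the list in the same way.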
Upscaling grid
- Upscale heatmap as a template
- merges neighboring cells, summing the activity of the four merged cells, which results in lower-resolution template heatmaps
- Locate upscaled observation area
- plots the land area of the selected six prefectures proportionally to the upscaled heatmap (grid) and applies template matching
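The two steps above can be illustrated with a minimal NumPy sketch. It assumes that the four merged cells form a 2 x 2 block, and it swaps in a brute-force sum-of-squared-differences search as the matcher; the notebooks may use a different matching method (opencv-python is among the dependencies). The function names upscale and locate are hypothetical.

```python
import numpy as np


def upscale(heatmap):
    """Merge each 2 x 2 block of cells into one cell by summing the activity."""
    rows, cols = heatmap.shape
    if rows % 2 or cols % 2:
        raise ValueError("both dimensions must be even")
    # Split both axes into (block index, position within block) and sum the blocks.
    return heatmap.reshape(rows // 2, 2, cols // 2, 2).sum(axis=(1, 3))


def locate(image, template):
    """Return the (row, col) offset with the smallest sum of squared differences."""
    t_rows, t_cols = template.shape
    best_offset, best_score = None, np.inf
    for r in range(image.shape[0] - t_rows + 1):
        for c in range(image.shape[1] - t_cols + 1):
            window = image[r:r + t_rows, c:c + t_cols]
            score = float(((window - template) ** 2).sum())
            if score < best_score:
                best_offset, best_score = (r, c), score
    return best_offset, best_score
```

Here image would stand for a raster of the land area and template for the (upscaled) heatmap, both drawn at the same scale; the returned offset is the position of the observation area within the raster.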
Other cities
- Helsinki
- the Helsinki notebook processes the data for different grid sizes in one run
- London
- the rx and ry parameters set the grid size; use either 500, 1000, 2000, or 4000
- Toronto
- it is the same notebook as for London, because the dataset is the same
- enable the Toronto parameter block
- Dallas--Fort Worth
- the RES parameter sets the H3 resolution; values between 6 and 10 were applied
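To make the role of the rectangular grid-size parameters concrete, the sketch below covers a bounding box with cells of a given width and height. It assumes the sizes are expressed in the units of a projected coordinate system (for example metres), which the README does not state; the function grid_cells is hypothetical and is not taken from the notebooks.

```python
import math


def grid_cells(min_x, min_y, max_x, max_y, rx, ry):
    """Yield (left, bottom, right, top) bounds of rx x ry cells covering the box."""
    n_cols = math.ceil((max_x - min_x) / rx)
    n_rows = math.ceil((max_y - min_y) / ry)
    for row in range(n_rows):
        for col in range(n_cols):
            left = min_x + col * rx
            bottom = min_y + row * ry
            # Cells on the top and right edges may extend past the bounding box.
            yield (left, bottom, left + rx, bottom + ry)
```

Doubling rx and ry reduces the number of cells roughly fourfold, which is the effect of moving between the 500, 1000, 2000, and 4000 settings.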
User identifiability
A user is considered k-identifiable if the k most frequently visited locations are distinguishable [^zang2011anonymization]. The top four locations were determined for every user, then the grid cells were upscaled to 1, 2, 4, 8, and 16 km.
The following table compares the top-four-location identifiable users by upscaled grids. The relevant notebook is here.
| distinguishable cells | 1 km x 1 km | 2 km x 2 km | 4 km x 4 km | 8 km x 8 km | 16 km x 16 km |
|------------------------:|--------------:|--------------:|--------------:|--------------:|----------------:|
| 4 | 35469 | 12882 | 5090 | 1810 | 470 |
| 3 | 48228 | 42323 | 28457 | 16752 | 7438 |
| 2 | 15582 | 38548 | 50987 | 52608 | 44939 |
| 1 | 721 | 6247 | 15466 | 28830 | 47153 |
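The counting behind such a table can be sketched as follows. This is an illustration under stated assumptions, not the notebook's code: cells are taken to be zero-based (x, y) index pairs, upscaling is modeled as integer division of the indices by a merge factor, and the function names are hypothetical.

```python
from collections import Counter


def top_k_cells(visits, k=4):
    """Return the k most frequently visited (x, y) cells of one user."""
    return [cell for cell, _ in Counter(visits).most_common(k)]


def distinguishable_cells(cells, factor):
    """Count the distinct cells left after merging factor x factor blocks."""
    return len({(x // factor, y // factor) for x, y in cells})


# Toy visit log of a single user: one (x, y) cell per observation.
visits = [(10, 10)] * 5 + [(11, 10)] * 4 + [(20, 35)] * 3 + [(80, 3)] * 2 + [(0, 0)]
top_four = top_k_cells(visits)
counts = {factor: distinguishable_cells(top_four, factor) for factor in (1, 2, 4, 8)}
```

Tallying this count over all users, once per grid size, gives one column of the table per grid size and one row per number of distinguishable cells.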
[^zang2011anonymization]: Hui Zang and Jean Bolot. 2011. Anonymization of location data does not work: a large-scale measurement study. In Proceedings of the 17th annual international conference on Mobile computing and networking (MobiCom '11). Association for Computing Machinery, New York, NY, USA, 145–156. https://doi.org/10.1145/2030613.2030630
Results
The results are included so that they are available without executing the code, most notably the reproduced grid (in the EPSG:2449 projection).
Choropleth maps using the reproduced grid
The spatial distribution of the activity (first) and the number of unique users (second) per cell using the reproduced grid.

Citation
Use the following BibTeX entry to cite the paper.
BibTeX
@article{pinter2024revealing,
title={Revealing urban area from mobile positioning data},
author={Pint{\'e}r, Gerg{\H{o}}},
journal={Scientific Reports},
volume={14},
number={1},
pages={30948},
year={2024},
publisher={Nature Publishing Group UK London}
}
The code can be cited via GitHub.
Data sources
- Mobility data: YJMob100K
- details about how to prepare it
- OpenStreetMap data
- Copyrighted by OpenStreetMap contributors. It is available under the Open Database License (ODbL).
- Administrative data is from OpenStreetMap
- downloaded from OSM-Boundaries
- prefectures (admin level 4), then filtered manually
- municipalities (admin level 7), then filtered manually
- wards (admin level 8), then filtered to Nagoya
- Coastline is downloaded from https://osmdata.openstreetmap.de/data/land-polygons.html
- the islands of Japan were extracted using the prefecture boundaries
- Census data
- The Population Census 2020, Population, Households, Sex, Age and Marital status, Table 1-1 was downloaded from the Portal Site of Official Statistics of Japan website (https://www.e-stat.go.jp/)
License
- The code is licensed under BSD-3-Clause
- The documentation and figures are CC BY 4.0
- The shape files are from OpenStreetMap and licensed under the Open Data Commons Open Database License (ODbL)
- The census data was downloaded from the Portal Site of Official Statistics of Japan website (https://www.e-stat.go.jp/)
More details are in the REUSE.toml, based on the REUSE definition.
Owner
- Name: Gergő Pintér
- Login: pintergreg
- Kind: user
- Location: Budapest, Hungary
- Twitter: pintergreg
- Repositories: 24
- Profile: https://github.com/pintergreg
data scientist, PhD | Research Fellow, Corvinus University of Budapest
GitHub Events
Total
- Release event: 1
- Push event: 4
- Create event: 1
Last Year
- Release event: 1
- Push event: 4
- Create event: 1
Dependencies
- ipykernel ^6.29.2 develop
- pandas-stubs ^2.2.1.240316 develop
- rich ^13.7.1 develop
- contextily ^1.3.0
- ecomplexity ^0.5.2
- geopandas ^0.14.2
- h3 ^3.7.7
- haversine ^2.8.1
- jinja2 ^3.1.4
- mapclassify ^2.6.1
- matplotlib ^3.8.2
- networkx ^3.2.1
- numpy ^1.26.3
- opencv-python ^4.9.0
- openpyxl ^3.1.2
- osmnx ^1.6.0
- pandarallel ^1.6.5
- pandas ^2.2.0
- pyaml ^23.9.7
- pyarrow ^15.0.0
- pyogrio ^0.6.0
- python ^3.12
- scipy ^1.12.0
- seaborn ^0.13.0
- structlog ^24.1.0
- tabulate ^0.9.0