reverse-engineering-yjmob100k-grid

Revealing urban area from mobile positioning data

https://github.com/pintergreg/reverse-engineering-yjmob100k-grid

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.3%) to scientific vocabulary

Keywords

humob2023-challenge mobile-positioning-data reverse-engineering urban-mobility yjmob100k

Repository

Revealing urban area from mobile positioning data

Basic Info
  • Host: GitHub
  • Owner: pintergreg
  • License: bsd-3-clause
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage:
  • Size: 41.1 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created almost 2 years ago · Last pushed 11 months ago
Metadata Files
Readme License Citation

README.md

Revealing urban area from mobile positioning data

Usage

The pyproject.toml documents the required dependencies. Using the Poetry packaging tool is suggested: in that case, run the poetry install command to set up a virtual environment with all the necessary dependencies.

After the development environment has been set up, run the notebooks in the following order to reproduce the results.

  1. plot_heatmaps.ipynb
    • this will reproduce the heatmaps [Figure 6] from the data description paper,
    • do the inverse-transformed plots, and
    • some related plots for the paper
  2. generate_grid.ipynb
    • this will locate the observation area within Japan and generate the grid

These two notebooks contain the main work. The detect_homes.ipynb, validatehomedetection.ipynb, calculategridcomplexity.ipynb, and the src/plot_grid.ipynb are optional steps to reproduce the figure in the technical validation section of the paper.

Upscaling grid

  • Upscale heatmap as a template
    • merges each group of four neighboring cells into one, summing their activity, which yields lower-resolution template heatmaps
  • Locate upscaled observation area
    • plots the land area of the selected six prefectures proportionally to the upscaled heatmap (grid) and applies template matching
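The merging step described above can be sketched with NumPy. This is a minimal illustration, not the notebook's code: the function name `upscale_heatmap`, the `factor` parameter, and the toy 4 x 4 array are assumptions made for the example, and the template matching step is not shown.

```python
import numpy as np


def upscale_heatmap(heatmap: np.ndarray, factor: int = 2) -> np.ndarray:
    """Merge factor x factor blocks of cells, summing the activity in each block."""
    rows, cols = heatmap.shape
    if rows % factor or cols % factor:
        raise ValueError("heatmap dimensions must be divisible by the factor")
    # Give every block its own pair of axes, then sum over those axes.
    blocks = heatmap.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.sum(axis=(1, 3))


# Toy 4 x 4 heatmap standing in for the real activity grid (hypothetical data).
toy = np.arange(16).reshape(4, 4)
coarse = upscale_heatmap(toy)
print(coarse)
```

With `factor=2`, four cells are merged into one, so the total activity is preserved while the resolution is halved along each axis.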

Other cities

  1. Helsinki
    • the Helsinki notebook processes the data for different grid sizes in one run
  2. London
    • the rx and ry parameters set the grid size; use either 500, 1000, 2000, or 4000
  3. Toronto
    • it is the same notebook as for London, because the dataset is the same
    • enable the Toronto parameter block
  4. Dallas--Fort Worth
    • the RES parameter sets the H3 resolution; values between 6 and 10 were applied
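To give an idea of what a rectangular grid-size parameter pair such as rx and ry controls, the sketch below bins projected coordinates into cells. It assumes the values are cell widths and heights in metres and that the grid origin is at (0, 0); the function `assign_grid_cells`, the origin arguments `x0` and `y0`, and the sample points are hypothetical and not taken from the notebooks.

```python
import numpy as np


def assign_grid_cells(x, y, rx=500, ry=500, x0=0.0, y0=0.0):
    """Return integer (column, row) cell indices for projected coordinates."""
    # Floor division by the cell size maps every point to the cell containing it.
    col = np.floor((np.asarray(x, dtype=float) - x0) / rx).astype(int)
    row = np.floor((np.asarray(y, dtype=float) - y0) / ry).astype(int)
    return col, row


# Hypothetical points in a projected (metre-based) coordinate system.
x = [120.0, 999.9, 1000.0, 2600.0]
y = [40.0, 510.0, 1999.0, 0.0]
col, row = assign_grid_cells(x, y, rx=1000, ry=1000)
print(list(zip(col.tolist(), row.tolist())))
```

Doubling rx and ry makes every cell cover four cells of the finer grid, which is why the listed sizes form a doubling sequence.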

User identifiability

A user is considered k-identifiable if their k most frequently visited locations are distinguishable [^zang2011anonymization]. The top four locations were determined for every user, then the grid cells were upscaled to 1, 2, 4, 8, and 16 km.

The following table shows, for each upscaled grid, how many users have 4, 3, 2, or 1 distinguishable cells among their top four locations. The relevant notebook is here.

| distinguishable cells | 1 km x 1 km | 2 km x 2 km | 4 km x 4 km | 8 km x 8 km | 16 km x 16 km |
|----------------------:|------------:|------------:|------------:|------------:|--------------:|
| 4 | 35469 | 12882 | 5090 | 1810 | 470 |
| 3 | 48228 | 42323 | 28457 | 16752 | 7438 |
| 2 | 15582 | 38548 | 50987 | 52608 | 44939 |
| 1 | 721 | 6247 | 15466 | 28830 | 47153 |
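The counting behind such a table can be sketched as follows. This is an illustration under assumptions, not the notebook's code: each user's top four locations are assumed to be given as (x, y) cell indices of the original grid, upscaling is modelled as integer division of those indices by a `factor`, and the three users and their cells are made up.

```python
from collections import Counter


def distinguishable_cells(top_cells, factor):
    """Count the distinct cells left after merging factor x factor blocks."""
    return len({(x // factor, y // factor) for x, y in top_cells})


# Hypothetical top-four locations (x, y cell indices) for three users.
top_four = {
    "user_a": [(10, 10), (11, 10), (40, 52), (90, 3)],
    "user_b": [(0, 0), (1, 1), (2, 2), (3, 3)],
    "user_c": [(5, 5), (60, 60), (120, 7), (33, 99)],
}

# One table column per upscaling factor: users grouped by distinguishable cells.
table = {
    factor: Counter(distinguishable_cells(cells, factor) for cells in top_four.values())
    for factor in (1, 2, 4)
}
for factor, counts in table.items():
    print(factor, dict(sorted(counts.items(), reverse=True)))
```

As the cells grow, nearby locations collapse into the same cell, so users move from the "4" row toward the "1" row, which is the trend visible in the table.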

[^zang2011anonymization]: Hui Zang and Jean Bolot. 2011. Anonymization of location data does not work: a large-scale measurement study. In Proceedings of the 17th annual international conference on Mobile computing and networking (MobiCom '11). Association for Computing Machinery, New York, NY, USA, 145-156. https://doi.org/10.1145/2030613.2030630

Results

The results are included in the repository so that they are available without executing the code, most notably the reproduced grid (in the EPSG:2449 projection).

Choropleth maps using the reproduced grid

The spatial distribution of the activity (first) and the number of unique users (second) per cell using the reproduced grid.

[Figure: spatial distribution of activity]

[Figure: spatial distribution of unique users]

Citation

Use the following BibTeX entry to cite the paper.

@article{pinter2024revealing,
  title={Revealing urban area from mobile positioning data},
  author={Pint{\'e}r, Gerg{\H{o}}},
  journal={Scientific Reports},
  volume={14},
  number={1},
  pages={30948},
  year={2024},
  publisher={Nature Publishing Group UK London}
}
  

The code can be cited via GitHub.

Data sources

  1. Mobility data: YJMob100K
  2. OpenStreetMap data
    • Copyrighted by OpenStreetMap contributors. It is available under the Open Database License (ODbL).
    • Administrative data is from OpenStreetMap
      • downloaded from OSM-Boundaries
        • prefectures (admin level 4), then filtered manually
        • municipalities (admin level 7), then filtered manually
        • wards (admin level 8), then filtered to Nagoya
    • Coastline is downloaded from https://osmdata.openstreetmap.de/data/land-polygons.html
      • the islands of Japan were extracted using the prefecture boundaries
  3. Census data

License

  • The code is licensed under BSD-3-Clause
  • The documentation and figures are CC BY 4.0
  • The shape files are from OpenStreetMap and licensed under the Open Data Commons Open Database License (ODbL)
  • The census data was downloaded from the Portal Site of Official Statistics of Japan website (https://www.e-stat.go.jp/)

More details are in REUSE.toml, based on the REUSE definition.

Owner

  • Name: Gergő Pintér
  • Login: pintergreg
  • Kind: user
  • Location: Budapest, Hungary

data scientist, PhD | Research Fellow, Corvinus University of Budapest

GitHub Events

Total
  • Release event: 1
  • Push event: 4
  • Create event: 1
Last Year
  • Release event: 1
  • Push event: 4
  • Create event: 1

Dependencies

pyproject.toml pypi
  • ipykernel ^6.29.2 develop
  • pandas-stubs ^2.2.1.240316 develop
  • rich ^13.7.1 develop
  • contextily ^1.3.0
  • ecomplexity ^0.5.2
  • geopandas ^0.14.2
  • h3 ^3.7.7
  • haversine ^2.8.1
  • jinja2 ^3.1.4
  • mapclassify ^2.6.1
  • matplotlib ^3.8.2
  • networkx ^3.2.1
  • numpy ^1.26.3
  • opencv-python ^4.9.0
  • openpyxl ^3.1.2
  • osmnx ^1.6.0
  • pandarallel ^1.6.5
  • pandas ^2.2.0
  • pyaml ^23.9.7
  • pyarrow ^15.0.0
  • pyogrio ^0.6.0
  • python ^3.12
  • scipy ^1.12.0
  • seaborn ^0.13.0
  • structlog ^24.1.0
  • tabulate ^0.9.0