deepgravity

a PyTorch implementation of the paper "Deep Gravity: enhancing mobility flows generation with deep neural networks and geographic information"

https://github.com/scikit-mobility/deepgravity

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 5 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.5%) to scientific vocabulary

Keywords

deep-learning flow flow-generator human-mobility mobility mobility-model pytorch

Keywords from Contributors

mobility-analysis mobility-flows network-science risk-assessment scikit-mobility synthetic-flows
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: scikit-mobility
  • Language: Jupyter Notebook
  • Default Branch: master
  • Homepage:
  • Size: 16.1 MB
Statistics
  • Stars: 94
  • Watchers: 2
  • Forks: 38
  • Open Issues: 9
  • Releases: 1
Created over 4 years ago · Last pushed about 4 years ago
Metadata Files
Readme Citation

README.md

A Deep Gravity model for mobility flows generation

Table of contents

  1. Citing
  2. Abstract
  3. Architecture of Deep Gravity
  4. Running Deep Gravity

Citing

If you use the code in this repository, please cite our paper:

F. Simini, G. Barlacchi, M. Luca, L. Pappalardo, A Deep Gravity model for mobility flows generation, Nature Communications 12, 6576 (2021). https://doi.org/10.1038/s41467-021-26752-4

@article{Simini2021,
  author  = {Simini, Filippo and Barlacchi, Gianni and Luca, Massimiliano and Pappalardo, Luca},
  title   = {{A Deep Gravity model for mobility flows generation}},
  journal = {Nature Communications},
  volume  = {12},
  number  = {1},
  pages   = {6576},
  year    = {2021},
  issn    = {2041-1723},
  doi     = {10.1038/s41467-021-26752-4},
  url     = {https://doi.org/10.1038/s41467-021-26752-4}
}

and the official code repository: DOI

Abstract

The movements of individuals within and among cities influence critical aspects of our society, such as well-being, the spreading of epidemics, and the quality of the environment. When information about mobility flows is not available for a particular region of interest, we must rely on mathematical models to generate them. We propose Deep Gravity, an effective model to generate flow probabilities that exploits many features (e.g., land use, road network, transport, food, health facilities) extracted from voluntary geographic data, and uses deep neural networks to discover non-linear relationships between those features and mobility flows. Our experiments, conducted on mobility flows in England, Italy, and New York State, show that Deep Gravity achieves a significant increase in performance, especially in densely populated regions of interest, with respect to the classic gravity model and models that do not use deep neural networks or geographic data. Deep Gravity has good generalization capability, generating realistic flows also for geographic areas for which there is no data availability for training. Finally, we show how flows generated by Deep Gravity may be explained in terms of the geographic features and highlight crucial differences among the three considered countries interpreting the model’s prediction with explainable AI techniques.

Figure 1. Performance, in terms of Common Part of Commuters (CPC), of Deep Gravity (DG) vs. the gravity model (G) in a highly populated area in England.
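The Common Part of Commuters is commonly defined as twice the sum of the element-wise minima of generated and observed flows, divided by the sum of all generated and observed flows. The following is a minimal sketch in plain Python, assuming flows are stored as dictionaries keyed by (origin, destination) pairs; the function name `common_part_of_commuters` is illustrative and not taken from the repository.

```python
def common_part_of_commuters(generated, observed):
    """CPC = 2 * sum(min(g, o)) / (sum(g) + sum(o)); 1.0 means identical flows."""
    pairs = set(generated) | set(observed)
    # A pair missing from one of the two dictionaries counts as a zero flow.
    overlap = sum(min(generated.get(p, 0.0), observed.get(p, 0.0)) for p in pairs)
    total = sum(generated.values()) + sum(observed.values())
    return 2.0 * overlap / total if total > 0 else 0.0


# Toy example with two destinations for a single origin (made-up numbers).
observed = {("a", "b"): 10, ("a", "c"): 30}
generated = {("a", "b"): 20, ("a", "c"): 20}
cpc = common_part_of_commuters(generated, observed)
```

With this definition, CPC ranges from 0 (no overlap between generated and observed flows) to 1 (perfect match).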

Architecture of Deep Gravity

To generate the flows from a given origin location (e.g., $l_i$), Deep Gravity uses a number of input features to compute the probability $p_{i,j}$ that any of the $n$ locations in the region of interest (e.g., $l_j$) is the destination of a trip from $l_i$. Specifically, the model output is an $n$-dimensional vector of probabilities $p_{i,j}$ for $j = 1, \dots, n$. These probabilities are computed in three steps (see figure below).

Figure 2. Architecture of Deep Gravity.

  1. The input vectors $x(l_i, l_j) = \mathrm{concat}[x_i, x_j, r_{i,j}]$ for $j = 1, \dots, n$ are obtained by concatenating the following input features: $x_i$, the feature vector of the origin location $l_i$; $x_j$, the feature vector of the destination location $l_j$; and the distance between origin and destination, $r_{i,j}$. For each origin location (e.g., $l_i$), $n$ input vectors $x(l_i, l_j)$ with $j = 1, \dots, n$ are created, one for each location in the region of interest that could be a potential destination.

  2. The input vectors $x(l_i, l_j)$ are fed in parallel to the same feed-forward neural network. The network has 15 hidden layers of dimensions 256 (the bottom six layers) and 128 (the other layers) with LeakyReLU activation function, $a$. Specifically, the output of hidden layer $h$ is given by the vector $z^{(0)}(l_i, l_j) = a(W^{(0)} \cdot x(l_i, l_j))$ for the first layer ($h = 0$) and $z^{(h)}(l_i, l_j) = a(W^{(h)} \cdot z^{(h-1)}(l_i, l_j))$ for $h > 0$, where $W^{(h)}$ are matrices whose entries are parameters learned during training.

  3. The output of the last layer is a scalar $s(l_i, l_j) \in (-\infty, +\infty)$ called score: the higher the score for a pair of locations $(l_i, l_j)$, the higher the probability to observe a trip from $l_i$ to $l_j$ according to the model. Finally, the scores are transformed into probabilities using a softmax function, $p_{i,j} = e^{s(l_i, l_j)} / \sum_{k} e^{s(l_i, l_k)}$, which transforms all scores into positive numbers that sum up to one. The generated flow between two locations is then obtained by multiplying the probability (i.e., the model's output) and the origin's total outflow.
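The three steps above can be sketched in PyTorch as follows. This is an illustrative sketch, not the repository's model class: the helper name `build_scorer`, the use of bias terms in `nn.Linear`, the default LeakyReLU slope, the input dimension of 39, and the random inputs and outflows are all assumptions made for the example.

```python
import torch
from torch import nn


def build_scorer(input_dim=39, n_hidden=15, n_wide=6, wide=256, narrow=128):
    """Feed-forward scorer: six 256-unit layers, nine 128-unit layers, scalar output."""
    layers, prev = [], input_dim
    for h in range(n_hidden):
        width = wide if h < n_wide else narrow
        layers += [nn.Linear(prev, width), nn.LeakyReLU()]
        prev = width
    layers.append(nn.Linear(prev, 1))  # final layer outputs one score per pair
    return nn.Sequential(*layers)


torch.manual_seed(0)
scorer = build_scorer()

# Step 1: one input vector per (origin, destination) pair (random placeholders here).
x = torch.randn(4, 10, 39)               # 4 origins, 10 candidate destinations each

# Step 2: the same network scores every pair in parallel.
scores = scorer(x).squeeze(-1)           # shape (4, 10)

# Step 3: softmax over each origin's destinations, then scale by total outflow.
probs = torch.softmax(scores, dim=-1)
outflow = torch.tensor([100.0, 50.0, 20.0, 5.0])  # made-up total outflows
flows = probs * outflow.unsqueeze(-1)    # generated flows, shape (4, 10)
```

Because the softmax is taken per origin, the generated flows of each origin sum to that origin's total outflow by construction.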

The location feature vector $x_i$ provides a spatial representation of an area, and it contains features describing some properties of location $l_i$, e.g., the total length of residential roads or the number of restaurants therein. Its dimension, $d$, is equal to the total number of features considered. The location features we use include the population size of each location and geographical features extracted from OpenStreetMap belonging to the following categories:

  • Land use areas (5 features): total area (in square km) for each possible land use class, i.e., residential, commercial, industrial, retail and natural;
  • Road network (3 features): total length (in km) for each type of road, i.e., residential, main and other;
  • Transport facilities (2 features): total count of Points Of Interest (POIs) and buildings related to each possible transport facility, e.g., bus/train station, bus stop, car parking;
  • Food facilities (2 features): total count of POIs and buildings related to food facilities, e.g., bar, cafe, restaurant;
  • Health facilities (2 features): total count of POIs and buildings related to health facilities, e.g., clinic, hospital, pharmacy;
  • Education facilities (2 features): total count of POIs and buildings related to education facilities, e.g., school, college, kindergarten;
  • Retail facilities (2 features): total count of POIs and buildings related to retail facilities, e.g., supermarket, department store, mall.

In addition, Deep Gravity includes as a feature the geographic distance, $r_{i,j}$, between two locations $l_i$ and $l_j$, which is defined as the distance measured along the surface of the earth between the centroids of the two polygons representing the locations. All feature values for a given location (excluding distance) are normalized by dividing them by the location's area.
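As a rough illustration of these two operations, the sketch below computes a great-circle distance between two centroids with the haversine formula and divides a location's feature values by its area. The haversine formula, the Earth radius constant, and both function names are assumptions chosen for the example; the repository may compute the distance differently.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def normalize_by_area(features, area_km2):
    """Divide every feature value of a location by the location's area."""
    return [value / area_km2 for value in features]


# One degree of longitude along the equator.
d = haversine_km(0.0, 0.0, 0.0, 1.0)
# Hypothetical location with 4 km of residential roads and 2 restaurants over 2 square km.
densities = normalize_by_area([4.0, 2.0], 2.0)
```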

Each flow in Deep Gravity is hence described by 39 features (18 geographic features of the origin and 18 of the destination, distance between origin and destination, and their populations).
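The count of 39 follows from 2 × (18 + 1) + 1: 18 geographic features plus the population for each of the two locations, plus one distance. A small sketch of the concatenation, where the ordering of the entries and the name `build_input_vector` are assumptions made for illustration:

```python
# 5 land use + 3 road network + 2 each for transport, food, health, education, retail
N_GEO_FEATURES = 5 + 3 + 2 * 5


def build_input_vector(origin_geo, origin_pop, dest_geo, dest_pop, distance_km):
    """Concatenate origin features, destination features, and their distance."""
    if len(origin_geo) != N_GEO_FEATURES or len(dest_geo) != N_GEO_FEATURES:
        raise ValueError("expected 18 geographic features per location")
    return [*origin_geo, origin_pop, *dest_geo, dest_pop, distance_km]


# Placeholder values only; real features come from OpenStreetMap and census data.
x_ij = build_input_vector([0.1] * 18, 1200.0, [0.2] * 18, 800.0, 3.5)
```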

The loss function of Deep Gravity is the cross-entropy:

$$H = -\sum_{i} \sum_{j} \frac{y(l_i, l_j)}{O_i} \ln p_{i,j}$$

where $y(l_i, l_j)/O_i$ is the fraction of observed flows from $l_i$ that go to $l_j$ (with $O_i$ the total outflow of $l_i$) and $p_{i,j}$ is the model's probability of a unit flow from $l_i$ to $l_j$. Note that the sum over $i$ of the cross-entropies of different origin locations follows from the assumption that flows from different locations are independent events, which allows us to apply the additive property of the cross-entropy for independent random variables.
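A minimal plain-Python sketch of a cross-entropy of this kind, assuming the loss sums, over all origins and destinations, the observed flow fraction times the negative log of the model probability. The function name and the nested-list layout are illustrative choices, not the repository's implementation.

```python
import math


def cross_entropy_loss(observed_flows, probabilities):
    """observed_flows[i][j]: observed flow from i to j; probabilities[i][j]: model probability."""
    loss = 0.0
    for y_i, p_i in zip(observed_flows, probabilities):
        outflow = sum(y_i)
        if outflow == 0:
            continue  # origins without observed trips contribute nothing
        for y_ij, p_ij in zip(y_i, p_i):
            if y_ij > 0:
                loss -= (y_ij / outflow) * math.log(p_ij)
    return loss


# Two origins, two destinations each (made-up numbers).
loss = cross_entropy_loss([[2, 2], [0, 5]], [[0.5, 0.5], [0.2, 0.8]])
```

In this toy case the first origin contributes ln 2 and the second contributes -ln 0.8, and the two terms are simply added, reflecting the independence assumption across origins.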

The network is trained for 20 epochs with the RMSprop optimizer with momentum 0.9 and learning rate $5 \cdot 10^{-6}$, using batches of 64 origin locations. To reduce the training time, we use negative sampling and consider up to 512 randomly selected destinations for each origin location.
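The destination sampling can be pictured as follows. This is a simplified sketch that draws destinations uniformly at random without replacement; the actual sampling scheme in the repository may differ, and `sample_destinations` is a hypothetical helper name.

```python
import random


def sample_destinations(candidates, max_destinations=512, rng=None):
    """Return at most max_destinations candidates, drawn uniformly without replacement."""
    if rng is None:
        rng = random.Random()
    if len(candidates) <= max_destinations:
        return list(candidates)  # small regions: keep every destination
    return rng.sample(candidates, max_destinations)


rng = random.Random(0)
large = sample_destinations(list(range(1000)), rng=rng)  # capped at 512
small = sample_destinations(list(range(10)), rng=rng)    # returned unchanged
```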

Running Deep Gravity

Setup

Make sure you have the following dependencies installed:

  • pytorch 1.7.1
  • numpy 1.19.2
  • pandas 1.2.4
  • geopandas 0.9.0
  • scikit-mobility 1.1.0
  • area

Experiments

Once you have installed all the packages correctly, you can run the experiments.

We expect to find some datasets in a path named data/<country_name>, where <country_name> is a parameter that can be passed to the model. In particular, we expect to find:

  • tessellation.geojson or tessellation.shp. The tessellation can also be generated by using the parameters tessellation-area and tessellation-size when the model is called.
  • output_areas.geojson or output_areas.shp. A file containing the location code and the geometry of the output areas. The column containing the location code can be specified using the parameter oa-id-column when calling the model.
  • flows.csv containing three columns indicating the origin, the destination and the actual flow of people. The column names can be specified with the parameters flow-origin-column, flow-destination-column and flow-flows-column. Due to GitHub policy, the file containing the flows for the running example of New York has to be downloaded from here. Data are derived from the GeoDS COVID-19 project.
  • features.csv containing at least a column named like oa-id-column and a set of other columns representing the features of the model
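Before launching a run, it can help to check that the expected files are in place. The sketch below is not part of the repository; it only encodes the file names listed above, and `missing_inputs` is a hypothetical helper. A missing tessellation is not necessarily an error, since it can be generated through the tessellation-area and tessellation-size parameters.

```python
import tempfile
from pathlib import Path


def missing_inputs(data_dir):
    """Return the names of expected inputs for which no accepted file exists."""
    data_dir = Path(data_dir)
    required = {
        "tessellation": ["tessellation.geojson", "tessellation.shp"],
        "output_areas": ["output_areas.geojson", "output_areas.shp"],
        "flows": ["flows.csv"],
        "features": ["features.csv"],
    }
    return [
        name
        for name, options in required.items()
        if not any((data_dir / filename).exists() for filename in options)
    ]


# Demonstration on a temporary directory standing in for data/<country_name>.
with tempfile.TemporaryDirectory() as tmp:
    for filename in ("output_areas.shp", "flows.csv", "features.csv"):
        (Path(tmp) / filename).touch()
    missing = missing_inputs(tmp)
```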

An example dataset collected in New York is already included in the repository, and the following examples are based on it. Note that when main.py is launched for the first time, a set of additional files is generated in a folder called processed. These files should not be removed.

The model can be run with the following command:

python main.py --dataset new_york --oa-id-column GEOID --flow-origin-column geoid_o --flow-destination-column geoid_d --flow-flows-column pop_flows --epochs 1 --device cpu --mode train

You can also include some parameters related to the model:

  • batch-size to specify the input batch size for training. Default is 1
  • test-batch-size to specify the batch size at test time. Default is 1
  • epochs default is 10
  • lr that is the learning rate. Default is 5e-6
  • momentum default is 0.9
  • seed
  • device can be cpu or gpu
  • mode that can be train or test
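To make the options and their documented defaults easier to scan, here is an `argparse` sketch that mirrors the flags used in this README. It is an illustration only and not the parser in main.py: the value types, the `choices` restrictions, and the absence of defaults for dataset, device, mode and seed are assumptions.

```python
import argparse


def build_parser():
    """Parser mirroring the documented flags (illustrative, not the one in main.py)."""
    parser = argparse.ArgumentParser(description="Deep Gravity options (sketch)")
    parser.add_argument("--dataset")
    parser.add_argument("--oa-id-column")
    parser.add_argument("--flow-origin-column")
    parser.add_argument("--flow-destination-column")
    parser.add_argument("--flow-flows-column")
    parser.add_argument("--batch-size", type=int, default=1)
    parser.add_argument("--test-batch-size", type=int, default=1)
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--lr", type=float, default=5e-6)
    parser.add_argument("--momentum", type=float, default=0.9)
    parser.add_argument("--seed", type=int)  # default not documented in this README
    parser.add_argument("--device", choices=["cpu", "gpu"])
    parser.add_argument("--mode", choices=["train", "test"])
    return parser


command = (
    "--dataset new_york --oa-id-column GEOID --flow-origin-column geoid_o "
    "--flow-destination-column geoid_d --flow-flows-column pop_flows "
    "--epochs 1 --device cpu --mode train"
)
args = build_parser().parse_args(command.split())
```

Options that are not passed on the command line, such as lr and momentum here, fall back to the defaults listed above.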

There are also some parameters related to the

Once your model is trained, you will find the results of the test phase in a file in the results directory. The file will be named tile2cpc_<model-type>_<country>_<no-round>.csv. In the same folder, you will also find the trained model named model_<model-type>_<country>_<no-round>.pt

Plot of the results

Once you have the results for all four models in at least one country and for at least one no-round, you can reproduce Figure 3 and Table 1 of the paper using the notebook plot_results.ipynb.

Additional Data

The datasets used in the experiments can be found at:

  • England
      • https://census.ukdataservice.ac.uk/use-data/guides/flow-data.aspx
      • https://census.ukdataservice.ac.uk/use-data/guides/boundary-data
  • Italy
      • http://datiopen.istat.it/datasetPND.php
      • https://www.istat.it/it/archivio/104317#accordions
  • New York
      • https://github.com/GeoDS/COVID19USFlows
      • https://www.census.gov/cgi-bin/geo/shapefiles/index.php?year=2020&layergroup=Census+Tracts

Data related to POIs should be retrieved from appropriate services. Examples are the Overpass API and HOTosm; alternatively (the suggested option), download a local copy of the OSM database into a PostgreSQL instance and run appropriate queries. The query we used to retrieve POI information is available in osm_query.yaml.

Owner

  • Name: scikit-mobility
  • Login: scikit-mobility
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.1.0
message: "If you use this code, please cite this software."
abstract: The movements of individuals within and among cities influence key aspects of our society, such as the objective and subjective well-being, the diffusion of innovations, the spreading of epidemics, and the quality of the environment. For this reason, there is increasing interest around the challenging problem of flow generation, which consists in generating the flows between a set of geographic locations, given the characteristics of the locations and without any information about the real flows. Existing solutions to flow generation are mainly based on mechanistic approaches, such as the gravity model and the radiation model, which suffer from underfitting and overdispersion, neglect important variables such as land use and the transportation network, and cannot describe non-linear relationships between these variables. In this paper, we propose the Multi-Feature Deep Gravity (MFDG) model as an effective solution to flow generation. On the one hand, the MFDG model exploits a large number of variables (e.g., characteristics of land use and the road network; transport, food, and health facilities) extracted from voluntary geographic information data (OpenStreetMap). On the other hand, our model exploits deep neural networks to describe complex non-linear relationships between those variables. Our experiments, conducted on commuting flows in England, show that the MFDG model achieves a significant increase in the performance (up to 250\% for highly populated areas) than mechanistic models that do not use deep neural networks, or that do not exploit geographic voluntary data. Our work presents a precise definition of the flow generation problem, which is a novel task for the deep learning community working with spatio-temporal data, and proposes a deep neural network model that significantly outperforms current state-of-the-art statistical models.
authors:
  - family-names: Simini
    given-names: Filippo
    orcid: https://orcid.org/0000-0001-8675-3529
  - family-names: Barlacchi
    given-names: Gianni
    orcid: https://orcid.org/0000-0002-9896-0610
  - family-names: Luca
    given-names: Massimiliano
    orcid: 
  - family-names: Pappalardo
    given-names: Luca
    orcid: https://orcid.org/0000-0002-1547-6007
title: Deep Gravity
version: 1.1.0
doi: https://arxiv.org/abs/2012.00489
date-released: 2021-08-03
keywords:
  - human mobility
  - deep learning
  - data science
  - artificial intelligence
  - explainable AI
  - AI
  - urban informatics
  - research software
license: "CC-BY-4.0"

GitHub Events

Total
  • Issues event: 2
  • Watch event: 14
  • Issue comment event: 6
  • Fork event: 3
Last Year
  • Issues event: 2
  • Watch event: 14
  • Issue comment event: 6
  • Fork event: 3

Committers

Last synced: about 2 years ago

All Time
  • Total Commits: 26
  • Total Committers: 4
  • Avg Commits per committer: 6.5
  • Development Distribution Score (DDS): 0.615
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Luca Pappalardo l****4@g****m 10
FilippoSimini f****i@g****m 8
Massimiliano Luca m****l@g****m 6
Luca Pappalardo l****o@d****t 2

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 9
  • Total pull requests: 1
  • Average time to close issues: about 20 hours
  • Average time to close pull requests: less than a minute
  • Total issue authors: 9
  • Total pull request authors: 1
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: about 20 hours
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 1.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • fatimavrojas (1)
  • sunshineYin (1)
  • Umaruchain (1)
  • lawrence1999 (1)
  • ShuaiQi2025 (1)
  • j4freeman (1)
  • dmolitor (1)
  • xiang526 (1)
  • newbie0621 (1)
Pull Request Authors
  • MassimilianoLuca (1)
Top Labels
Issue Labels
help wanted (3) bug (1)