scikit-mobility

scikit-mobility: mobility analysis in Python

https://github.com/scikit-mobility/scikit-mobility

Science Score: 77.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓
CITATION.cff file
Found CITATION.cff file
✓
codemeta.json file
Found codemeta.json file
✓
.zenodo.json file
Found .zenodo.json file
✓
DOI references
Found 4 DOI reference(s) in README
✓
Academic publication links
Links to: springer.com, nature.com, zenodo.org
✓
Committers with academic emails
2 of 25 committers (8.0%) from academic institutions
○
Institutional organization owner
○
JOSS paper metadata
○
Scientific vocabulary similarity
Low similarity (12.7%) to scientific vocabulary

Keywords

complex-systems data-analysis data-science human-mobility mobility-analysis mobility-flows network-science risk-assessment scikit-mobility statistics synthetic-flows


Repository

scikit-mobility: mobility analysis in Python

Basic Info

Host: GitHub
Owner: scikit-mobility
License: bsd-3-clause
Language: Python
Default Branch: master
Homepage: https://scikit-mobility.github.io/scikit-mobility/
Size: 34.5 MB

Statistics

Stars: 768
Watchers: 31
Forks: 162
Open Issues: 64
Releases: 7

Created about 7 years ago · Last pushed about 2 years ago

Metadata Files


README.md


scikit-mobility - mobility analysis in Python

Try `scikit-mobility` without installing it, in a MyBinder notebook or on Jovian.

scikit-mobility is a library for human mobility analysis in Python. The library allows users to:

- represent trajectories and mobility flows with proper data structures, TrajDataFrame and FlowDataFrame;
- manage and manipulate mobility data of various formats (call detail records, GPS data, data from social media, survey data, etc.);
- extract mobility metrics and patterns from data, both at the individual and collective level (e.g., length of displacements, characteristic distance, origin-destination matrix, etc.);
- generate synthetic individual trajectories using standard mathematical models (random walk models, exploration and preferential return model, etc.);
- generate synthetic mobility flows using standard migration models (gravity model, radiation model, etc.);
- assess the privacy risk associated with a mobility data set.

Documentation
Citing
Collaborate with us
Installation
Tutorials
Examples

Documentation

The documentation of scikit-mobility's classes and functions is available at: https://scikit-mobility.github.io/scikit-mobility/

Citing

If you use scikit-mobility, please cite the following paper:

Pappalardo, L., Simini, F., Barlacchi, G., & Pellungrini, R. (2022). scikit-mobility: A Python Library for the Analysis, Generation, and Risk Assessment of Mobility Data. Journal of Statistical Software, 103(1), 1–38. https://doi.org/10.18637/jss.v103.i04

BibTeX:

    @article{JSSv103i04,
      title={scikit-mobility: A Python Library for the Analysis, Generation, and Risk Assessment of Mobility Data},
      volume={103},
      url={https://www.jstatsoft.org/index.php/jss/article/view/v103i04},
      doi={10.18637/jss.v103.i04},
      number={1},
      journal={Journal of Statistical Software},
      author={Pappalardo, Luca and Simini, Filippo and Barlacchi, Gianni and Pellungrini, Roberto},
      year={2022},
      pages={1–38}
    }

Collaborate with us

scikit-mobility is an active project and any contribution is welcome.

If you would like to include your algorithm in scikit-mobility, feel free to fork the project, open an issue and contact us.

Installation

scikit-mobility for Python >= 3.8 and all its dependencies are available from conda-forge and can be installed using `conda install -c conda-forge scikit-mobility`.

Note that it is NOT recommended to install scikit-mobility from PyPI! If you're on Windows or Mac, many GeoPandas / scikit-mobility dependencies cannot be pip installed (for details see the corresponding notes in the GeoPandas documentation).

installation with pip (python >= 3.8 required)

Create an environment skmob
```
python3 -m venv skmob
```
Activate
```
source skmob/bin/activate
```
Install skmob
```
pip install scikit-mobility
```
OPTIONAL to use scikit-mobility on the jupyter notebook

- Activate the virtualenv:

        source skmob/bin/activate

- Install jupyter notebook:

        pip install jupyter

- Run jupyter notebook

        jupyter notebook

- (Optional) install the kernel with a specific name

        ipython kernel install --user --name=skmob

installation with conda - miniconda

Create an environment skmob and install pip

    conda create -n skmob pip python=3.9 rtree

Activate
```
conda activate skmob
```

Install skmob

    conda install -c conda-forge scikit-mobility

OPTIONAL to use scikit-mobility on the jupyter notebook

- Install the kernel

      conda install jupyter -c conda-forge

- Open a notebook and check if the kernel `skmob` is on the kernel list. If not, run the following:
    - On Mac and Linux

          env=$(basename `echo $CONDA_PREFIX`)
          python -m ipykernel install --user --name "$env" --display-name "Python [conda env:"$env"]"

   - On Windows

         python -m ipykernel install --user --name skmob --display-name "Python [conda env: skmob]"

:exclamation: You may run into dependency issues when you try to import the package in Python. If so, try installing the following packages:

    conda install -n skmob pyproj urllib3 chardet markupsafe

Test the installation

```
> source activate skmob
(skmob)> python
>>> import skmob
```

Google Colab

scikit-mobility can be installed on Google Colab using the following commands:

    !apt-get install -qq curl g++ make
    !curl -L http://download.osgeo.org/libspatialindex/spatialindex-src-1.8.5.tar.gz | tar xz
    import os
    os.chdir('spatialindex-src-1.8.5')
    !./configure
    !make
    !make install
    !pip install rtree
    !ldconfig
    !pip install scikit-mobility

Tutorials

You can find some tutorials on scikit-mobility here: https://github.com/scikit-mobility/tutorials.

Examples

Create a `TrajDataFrame`

In scikit-mobility, a set of trajectories is described by a TrajDataFrame, an extension of the pandas DataFrame that has specific column names and data types. A TrajDataFrame can contain many trajectories, and each row in the TrajDataFrame represents a point of a trajectory, described by three mandatory fields (aka columns):

- latitude (type: float);
- longitude (type: float);
- datetime (type: date-time).

Additionally, two optional columns can be specified:

- uid (type: string) identifies the object associated with the point of the trajectory. If uid is not present, scikit-mobility assumes that the TrajDataFrame contains trajectories associated with a single moving object;
- tid specifies the identifier of the trajectory to which the point belongs. If tid is not present, scikit-mobility assumes that all rows in the TrajDataFrame associated with a uid belong to the same trajectory.

Note that, besides the mandatory columns, the user can add to a TrajDataFrame as many columns as they want since the data structures in scikit-mobility inherit all the pandas DataFrame functionalities.
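The column contract described above can be illustrated without the library. The following is a minimal sketch in plain Python, not scikit-mobility's implementation: the helper `to_trajectory_records` and its parameter names are hypothetical and only mimic how raw columns are mapped to the standard names `lat`, `lng`, `datetime` and `uid`.

```python
from datetime import datetime

def to_trajectory_records(rows, latitude, longitude, datetime_col, user_id=None):
    """Map raw rows to dicts with the standard trajectory column names.

    Hypothetical helper: `latitude`, `longitude`, `datetime_col` and `user_id`
    are the positions (or keys) of the corresponding fields in each raw row.
    """
    records = []
    for row in rows:
        record = {
            "lat": float(row[latitude]),
            "lng": float(row[longitude]),
            "datetime": datetime.fromisoformat(row[datetime_col]),
        }
        # uid is optional: without it, all points belong to one moving object
        if user_id is not None:
            record["uid"] = row[user_id]
        records.append(record)
    return records

raw_rows = [
    [1, 39.984094, 116.319236, "2008-10-23 13:53:05"],
    [1, 39.984198, 116.319322, "2008-10-23 13:53:06"],
]
records = to_trajectory_records(raw_rows, latitude=1, longitude=2, datetime_col=3, user_id=0)
print(records[0])
```

Any extra fields in a raw row are simply ignored here, whereas a TrajDataFrame keeps them as additional columns.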

Create a TrajDataFrame from a list:

```python
import skmob

# create a TrajDataFrame from a list
data_list = [[1, 39.984094, 116.319236, '2008-10-23 13:53:05'],
             [1, 39.984198, 116.319322, '2008-10-23 13:53:06'],
             [1, 39.984224, 116.319402, '2008-10-23 13:53:11'],
             [1, 39.984211, 116.319389, '2008-10-23 13:53:16']]
tdf = skmob.TrajDataFrame(data_list, latitude=1, longitude=2, datetime=3)

# print a portion of the TrajDataFrame
print(tdf.head())
```

       0        lat         lng            datetime
    0  1  39.984094  116.319236 2008-10-23 13:53:05
    1  1  39.984198  116.319322 2008-10-23 13:53:06
    2  1  39.984224  116.319402 2008-10-23 13:53:11
    3  1  39.984211  116.319389 2008-10-23 13:53:16

```python
print(type(tdf))
```

Create a TrajDataFrame from a pandas DataFrame:

```python
import pandas as pd

# create a DataFrame from the previous list
data_df = pd.DataFrame(data_list, columns=['user', 'latitude', 'lng', 'hour'])

# print the type of the object
print(type(data_df))
```

    <class 'pandas.core.frame.DataFrame'>

```python
# now create a TrajDataFrame from the pandas DataFrame
tdf = skmob.TrajDataFrame(data_df, latitude='latitude', datetime='hour', user_id='user')

# print the type of the object
print(type(tdf))
```

    <class 'skmob.core.trajectorydataframe.TrajDataFrame'>

```python
# print a portion of the TrajDataFrame
print(tdf.head())
```

       uid        lat         lng            datetime
    0    1  39.984094  116.319236 2008-10-23 13:53:05
    1    1  39.984198  116.319322 2008-10-23 13:53:06
    2    1  39.984224  116.319402 2008-10-23 13:53:11
    3    1  39.984211  116.319389 2008-10-23 13:53:16

We can also create a TrajDataFrame from a file. For example, in the following we create a TrajDataFrame from a portion of a GPS trajectory dataset collected in the context of the GeoLife project by 178 users in a period of over four years from April 2007 to October 2011.

```python
# download the file from https://raw.githubusercontent.com/scikit-mobility/scikit-mobility/master/examples/geolife_sample.txt.gz

# read the trajectory data (GeoLife, Beijing, China)
tdf = skmob.TrajDataFrame.from_file('geolife_sample.txt.gz', latitude='lat', longitude='lon', user_id='user', datetime='datetime')

# print a portion of the TrajDataFrame
print(tdf.head())
```

             lat         lng            datetime  uid
    0  39.984094  116.319236 2008-10-23 05:53:05    1
    1  39.984198  116.319322 2008-10-23 05:53:06    1
    2  39.984224  116.319402 2008-10-23 05:53:11    1
    3  39.984211  116.319389 2008-10-23 05:53:16    1
    4  39.984217  116.319422 2008-10-23 05:53:21    1

A TrajDataFrame can be plotted on a folium interactive map using the plot_trajectory function.

```python
tdf.plot_trajectory(zoom=12, weight=3, opacity=0.9, tiles='Stamen Toner')
```

Plot Trajectory

Create a `FlowDataFrame`

In scikit-mobility, an origin-destination matrix is described by the FlowDataFrame structure, an extension of the pandas DataFrame that has specific column names and data types. A row in a FlowDataFrame represents a flow of objects between two locations, described by three mandatory columns:

- origin (type: string);
- destination (type: string);
- flow (type: integer).

Again, the user can add to a FlowDataFrame as many columns as they want since the FlowDataFrame data structure inherits all the pandas DataFrame functionalities.

In mobility tasks, the territory is often discretized by mapping the coordinates to a spatial tessellation, i.e., a covering of the bi-dimensional space using a countable number of geometric shapes (e.g., squares, hexagons), called tiles, with no overlaps and no gaps. For instance, for the analysis or prediction of mobility flows, a spatial tessellation is used to aggregate flows of people moving among locations (the tiles of the tessellation). For this reason, each FlowDataFrame is associated with a spatial tessellation, a geopandas GeoDataFrame that contains two mandatory columns:

- tile_ID (type: integer) indicates the identifier of a location;
- geometry indicates the polygon (or point) that describes the geometric shape of the location on a territory (e.g., a square, a Voronoi shape, the shape of a neighborhood).

Note that each location identifier in the origin and destination columns of a FlowDataFrame must be present in the associated spatial tessellation.
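To make the relation between a tessellation and flows concrete, here is a minimal sketch in plain Python that does not use scikit-mobility. It assumes a hypothetical square-grid tessellation whose tile identifiers are built from cell indices (the helper names `tile_of` and `flows_from_trips` and the 0.5-degree cell size are illustrative choices), aggregates individual trips into tile-to-tile flows, and checks that every origin and destination belongs to the tessellation.

```python
import math
from collections import Counter

def tile_of(lat, lng, cell_deg=0.5):
    """Return the id of the square grid cell that contains the coordinate."""
    return f"{math.floor(lat / cell_deg)}_{math.floor(lng / cell_deg)}"

def flows_from_trips(trips, cell_deg=0.5):
    """Aggregate (origin point, destination point) trips into tile-to-tile flows."""
    counts = Counter()
    for (olat, olng), (dlat, dlng) in trips:
        counts[(tile_of(olat, olng, cell_deg), tile_of(dlat, dlng, cell_deg))] += 1
    return [{"origin": o, "destination": d, "flow": n}
            for (o, d), n in sorted(counts.items())]

# illustrative trips, each a pair of (lat, lng) points
trips = [
    ((40.71, -73.95), (40.73, -73.99)),  # starts and ends in the same tile
    ((40.71, -73.95), (42.65, -73.75)),
    ((40.72, -73.98), (42.66, -73.76)),
]
flows = flows_from_trips(trips)

# the set of tile ids plays the role of the tessellation's tile_ID column
tessellation_ids = {tile_of(lat, lng) for trip in trips for lat, lng in trip}
missing = [f for f in flows
           if f["origin"] not in tessellation_ids or f["destination"] not in tessellation_ids]
print(flows, missing)
```

The first trip becomes a self-loop, which is why later steps that compute outflows exclude rows where origin equals destination.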

Create a spatial tessellation from a file describing counties in New York state:

```python
import skmob
import geopandas as gpd

# load a spatial tessellation
url_tess = skmob.utils.constants.NY_COUNTIES_2011
tessellation = gpd.read_file(url_tess).rename(columns={'tile_id': 'tile_ID'})

# print a portion of the spatial tessellation
print(tessellation.head())
```

      tile_ID  population                                           geometry
    0   36019       81716  POLYGON ((-74.006668 44.886017, -74.027389 44....
    1   36101       99145  POLYGON ((-77.099754 42.274215, -77.0996569999...
    2   36107       50872  POLYGON ((-76.25014899999999 42.296676, -76.24...
    3   36059     1346176  POLYGON ((-73.707662 40.727831, -73.700272 40....
    4   36011       79693  POLYGON ((-76.279067 42.785866, -76.2753479999...

Create a FlowDataFrame from a spatial tessellation and a file of real flows between counties in New York state:

```python
# load real flows into a FlowDataFrame
fdf = skmob.FlowDataFrame.from_file(skmob.utils.constants.NY_FLOWS_2011,
                                    tessellation=tessellation,
                                    tile_id='tile_ID',
                                    sep=",")

# print a portion of the flows
print(fdf.head())
```

         flow origin destination
    0  121606  36001       36001
    1       5  36001       36005
    2      29  36001       36007
    3      11  36001       36017
    4      30  36001       36019

A FlowDataFrame can be visualized on a folium interactive map using the plot_flows function, which plots the flows on a geographic map as lines between the centroids of the tiles in the FlowDataFrame's spatial tessellation:

```python
fdf.plot_flows(flow_color='red')
```

Plot Fluxes

Similarly, the spatial tessellation of a FlowDataFrame can be visualized using the plot_tessellation function. The argument popup_features (type: list, default: [constants.TILE_ID]) makes the plot more interactive: when the user clicks on a tile, a popup window shows the information contained in the columns of the tessellation's GeoDataFrame listed in the argument:

```python
fdf.plot_tessellation(popup_features=['tile_ID', 'population'])
```

Plot Tessellation

The spatial tessellation and the flows can be visualized together using the map_f argument, which specifies the folium object on which to plot:

```python
m = fdf.plot_tessellation()                # plot the tessellation
fdf.plot_flows(flow_color='red', map_f=m)  # plot the flows
```

Plot Tessellation and Flows

Trajectory preprocessing

Like any analytical process, mobility data analysis requires data cleaning and preprocessing steps. The preprocessing module allows the user to perform four main preprocessing steps:

- noise filtering;
- stop detection;
- stop clustering;
- trajectory compression.

Note that, if a TrajDataFrame contains multiple trajectories from multiple users, the preprocessing methods automatically apply to the single trajectory and, when necessary, to the single moving object.

Noise filtering

In scikit-mobility, the function filter filters out a point if the speed from the previous point is higher than the parameter max_speed_kmh, which is set to 500 km/h by default.

```python
from skmob.preprocessing import filtering

# filter out all points with a speed (in km/h) from the previous point higher than 500 km/h
ftdf = filtering.filter(tdf, max_speed_kmh=500.)
print(ftdf.parameters)
```

    {'from_file': 'geolife_sample.txt.gz', 'filter': {'function': 'filter', 'max_speed_kmh': 500.0, 'include_loops': False, 'speed_kmh': 5.0, 'max_loop': 6, 'ratio_max': 0.25}}

```python
n_deleted_points = len(tdf) - len(ftdf)  # number of deleted points
print(n_deleted_points)
```

    54

Note that the TrajDataFrame structure has the parameters attribute, which records the operations that have been applied to the TrajDataFrame. This attribute is a dictionary whose keys are the signatures of the functions applied.
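The speed criterion described above can be sketched in a few lines of plain Python. This is an illustration of the idea only, not the library's code: the helper names are hypothetical, distances use the haversine formula, and each point is compared with the last point that was kept, which is one possible reading of "previous point".

```python
import math
from datetime import datetime

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two (lat, lng) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def filter_by_speed(points, max_speed_kmh=500.0):
    """Keep a point only if the speed from the last kept point is <= max_speed_kmh.

    `points` is a time-sorted list of (lat, lng, datetime) tuples.
    Points with a non-positive time gap are dropped as well.
    """
    if not points:
        return []
    kept = [points[0]]
    for lat, lng, t in points[1:]:
        plat, plng, pt = kept[-1]
        hours = (t - pt).total_seconds() / 3600.0
        if hours > 0 and haversine_km(plat, plng, lat, lng) / hours <= max_speed_kmh:
            kept.append((lat, lng, t))
    return kept

points = [
    (39.984094, 116.319236, datetime(2008, 10, 23, 13, 53, 5)),
    (39.984198, 116.319322, datetime(2008, 10, 23, 13, 53, 6)),
    (40.984224, 116.319402, datetime(2008, 10, 23, 13, 53, 11)),  # about 111 km in 5 s: noise
    (39.984211, 116.319389, datetime(2008, 10, 23, 13, 53, 16)),
]
filtered = filter_by_speed(points)
print(len(points) - len(filtered), "point(s) removed")
```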

Stop detection

Some points in a trajectory can represent Points of Interest (POIs) such as schools, restaurants, and bars, or they can represent user-specific places such as home and work locations. These points are usually called Stay Points or Stops, and they can be detected in different ways. A common approach is to apply spatial clustering algorithms to cluster trajectory points by looking at their spatial proximity. In scikit-mobility, the stay_locations function, contained in the detection module, finds the stay points visited by a moving object. For instance, to identify the stops where the object spent at least minutes_for_a_stop minutes within a distance spatial_radius_km * stop_radius_factor from a given point, we can use the following code:

```python
from skmob.preprocessing import detection

# compute the stops for each individual in the TrajDataFrame
stdf = detection.stay_locations(tdf, stop_radius_factor=0.5, minutes_for_a_stop=20.0, spatial_radius_km=0.2, leaving_time=True)

# print a portion of the detected stops
print(stdf.head())
```

             lat         lng            datetime  uid    leaving_datetime
    0  39.978030  116.327481 2008-10-23 06:01:37    1 2008-10-23 10:32:53
    1  40.013820  116.306532 2008-10-23 11:10:19    1 2008-10-23 23:45:27
    2  39.978419  116.326870 2008-10-24 00:21:52    1 2008-10-24 01:47:30
    3  39.981166  116.308475 2008-10-24 02:02:31    1 2008-10-24 02:30:29
    4  39.981431  116.309902 2008-10-24 02:30:29    1 2008-10-24 03:16:35

```python
print('Points of the original trajectory:\t%s'%len(tdf))
print('Points of stops:\t\t\t%s'%len(stdf))
```

    Points of the original trajectory:	217653
    Points of stops:			391

A new column leaving_datetime is added to the TrajDataFrame in order to indicate the time when the user left the stop location. We can then visualize the detected stops using the plot_stops function:

```python
m = stdf.plot_trajectory(max_users=1, start_end_markers=False)
stdf.plot_stops(max_users=1, map_f=m)
```

Plot Stops
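The stop criterion (a minimum time spent within a given radius) can be illustrated with a simplified, library-independent sketch. The function `detect_stops` below is hypothetical and is not scikit-mobility's algorithm: `radius_km` stands for the product spatial_radius_km * stop_radius_factor, the stop is placed at the mean coordinates of the grouped points, and the leaving time is approximated by the timestamp of the last point inside the radius.

```python
import math
from datetime import datetime, timedelta

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two (lat, lng) points."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lng2 - lng1) / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def detect_stops(points, radius_km=0.1, min_minutes=20.0):
    """Return stops from a time-sorted list of (lat, lng, datetime) points."""
    stops = []
    i, n = 0, len(points)
    while i < n:
        # extend the group while consecutive points stay within radius_km of point i
        j = i + 1
        while j < n and haversine_km(points[i][0], points[i][1],
                                     points[j][0], points[j][1]) <= radius_km:
            j += 1
        group = points[i:j]
        minutes = (group[-1][2] - group[0][2]).total_seconds() / 60.0
        if len(group) > 1 and minutes >= min_minutes:
            stops.append({
                "lat": sum(p[0] for p in group) / len(group),
                "lng": sum(p[1] for p in group) / len(group),
                "datetime": group[0][2],
                "leaving_datetime": group[-1][2],
            })
            i = j  # continue after the stop
        else:
            i += 1
    return stops

start = datetime(2008, 10, 23, 8, 0)
# 40 minutes around one place, then 5 minutes at a place about 4 km away
points = [(39.97803 + 0.00001 * k, 116.32748, start + timedelta(minutes=10 * k)) for k in range(5)]
points += [(40.01382, 116.30653, start + timedelta(minutes=50)),
           (40.01382, 116.30653, start + timedelta(minutes=55))]
stops = detect_stops(points, radius_km=0.2 * 0.5, min_minutes=20.0)
print(stops)
```

Only the first place qualifies as a stop, because the second one is visited for less than 20 minutes.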

Trajectory compression

The goal of trajectory compression is to reduce the number of trajectory points while preserving the structure of the trajectory. This step results in a significant reduction of the number of trajectory points. In scikit-mobility, we can use one of the methods in the compression module under the preprocessing module. For instance, to merge all the points that are closer than 0.2km from each other, we can use the following code:

```python
from skmob.preprocessing import compression

# compress the trajectory using a spatial radius of 0.2 km
ctdf = compression.compress(tdf, spatial_radius_km=0.2)

# print the difference in points between original and compressed TrajDataFrame
print('Points of the original trajectory:\t%s'%len(tdf))
print('Points of the compressed trajectory:\t%s'%len(ctdf))
```

    Points of the original trajectory:	217653
    Points of the compressed trajectory:	6281
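As an illustration of the merging idea, the sketch below compresses a trajectory in plain Python. It is a simplified assumption about how such a step can work, not the library's implementation: consecutive points within `radius_km` of the first point of a group are replaced by a single point with the median coordinates of the group and the timestamp of its first point.

```python
import math
from statistics import median

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two (lat, lng) points."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lng2 - lng1) / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def compress(points, radius_km=0.2):
    """Merge runs of consecutive (lat, lng, time) points that stay within radius_km."""
    compressed = []
    i, n = 0, len(points)
    while i < n:
        j = i + 1
        while j < n and haversine_km(points[i][0], points[i][1],
                                     points[j][0], points[j][1]) <= radius_km:
            j += 1
        group = points[i:j]
        # one representative point per group: median position, first timestamp
        compressed.append((median(p[0] for p in group),
                           median(p[1] for p in group),
                           group[0][2]))
        i = j
    return compressed

# times are seconds from the start of the trajectory (illustrative values)
points = [
    (39.984094, 116.319236, 0),
    (39.984198, 116.319322, 1),
    (39.984224, 116.319402, 6),
    (39.984211, 116.319389, 11),
    (39.994211, 116.319389, 60),  # about 1.1 km north of the first group
    (39.994300, 116.319400, 65),
]
compressed = compress(points, radius_km=0.2)
print(len(points), "->", len(compressed))
```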

Mobility measures

Several measures have been proposed in the literature to capture the patterns of human mobility, both at the individual and collective levels. Individual measures summarize the mobility patterns of a single moving object, while collective measures summarize the mobility patterns of a population as a whole. scikit-mobility provides a wide set of mobility measures, each implemented as a function that takes a TrajDataFrame as input and outputs a pandas DataFrame. Individual and collective measures are implemented in the skmob.measures.individual and skmob.measures.collective modules, respectively.

For example, the following code computes the radius of gyration, the jump lengths and the home locations of a TrajDataFrame:

```python
from skmob.measures.individual import jump_lengths, radius_of_gyration, home_location

# load a TrajDataFrame from a URL
url = "https://snap.stanford.edu/data/loc-brightkite_totalCheckins.txt.gz"
df = pd.read_csv(url, sep='\t', header=0, nrows=100000,
                 names=['user', 'check-in_time', 'latitude', 'longitude', 'location id'])
tdf = skmob.TrajDataFrame(df, latitude='latitude', longitude='longitude', datetime='check-in_time', user_id='user')

# compute the radius of gyration for each individual
rg_df = radius_of_gyration(tdf)
print(rg_df)
```

       uid  radius_of_gyration
    0    0         1564.436792
    1    1         2467.773523
    2    2         1439.649774
    3    3         1752.604191
    4    4         5380.503250

```python
# compute the jump lengths for each individual
jl_df = jump_lengths(tdf.sort_values(by='datetime'))
print(jl_df.head())
```

       uid                                       jump_lengths
    0    0  [19.640467328877936, 0.0, 0.0, 1.7434311010381...
    1    1  [6.505330424378251, 46.75436600375988, 53.9284...
    2    2  [0.0, 0.0, 0.0, 0.0, 3.6410097195943507, 0.0, ...
    3    3  [3861.2706300798827, 4.061631313492122, 5.9163...
    4    4  [15511.92758595804, 0.0, 15511.92758595804, 1....

Note that for some measures, such as jump_lengths, the TrajDataFrame must be sorted in increasing order by the column datetime (see the documentation for the measures that require this condition: https://scikit-mobility.github.io/scikit-mobility/reference/measures.html).
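To show what these two measures compute and why the ordering matters, here is a small sketch in plain Python. The function names are hypothetical and the formulas are the textbook definitions, under the simplifying assumption that the center of mass is the plain mean of latitudes and longitudes: the radius of gyration is the root mean square distance of the points from their center of mass, and the jump lengths are the distances between consecutive points.

```python
import math

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two (lat, lng) points."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lng2 - lng1) / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def radius_of_gyration_km(points):
    """Root mean square distance of (lat, lng) points from their center of mass."""
    lat_cm = sum(lat for lat, _ in points) / len(points)
    lng_cm = sum(lng for _, lng in points) / len(points)
    return math.sqrt(sum(haversine_km(lat, lng, lat_cm, lng_cm) ** 2
                         for lat, lng in points) / len(points))

def jump_lengths_km(points):
    """Distances between consecutive (lat, lng) points; the order of points matters."""
    return [haversine_km(*a, *b) for a, b in zip(points, points[1:])]

# visits as (time, lat, lng), deliberately out of order (illustrative values)
visits = [(3, 39.7392, -104.9847), (1, 39.8911, -105.0685), (2, 39.8911, -105.0685)]
ordered = sorted(visits)  # sort by time before computing jump lengths
points = [(lat, lng) for _, lat, lng in ordered]

rg = radius_of_gyration_km(points)
jumps = jump_lengths_km(points)
print(rg, jumps)
```

The radius of gyration does not depend on the order of the points, while the jump lengths do, which is why only the latter require a sorted input.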

```python
# compute the home location for each individual
hl_df = home_location(tdf)
print(hl_df.head())
```

       uid        lat         lng
    0    0  39.891077 -105.068532
    1    1  37.630490 -122.411084
    2    2  39.739154 -104.984703
    3    3  37.748170 -122.459192
    4    4  60.180171   24.949728

```python
# now let's visualize a heat map of the home locations
import folium
from folium.plugins import HeatMap

m = folium.Map(tiles='openstreetmap', zoom_start=12, control_scale=True)
HeatMap(hl_df[['lat', 'lng']].values).add_to(m)
m
```

Heat map of home locations

Collective generative models

Collective generative models estimate spatial flows between a set of discrete locations. Examples of spatial flows estimated with collective generative models include commuting trips between neighborhoods, migration flows between municipalities, freight shipments between states, and phone calls between regions.

In scikit-mobility, a collective generative model takes as input a spatial tessellation, i.e., a geopandas GeoDataFrame. To be a valid input for a collective model, the spatial tessellation should contain two columns, geometry and relevance, which are necessary to compute the two variables used by collective algorithms: the distance between tiles and the importance (aka "attractiveness") of each tile. A collective algorithm produces a FlowDataFrame that contains the generated flows and the spatial tessellation. scikit-mobility implements the most common collective generative algorithms:

- the Gravity model;
- the Radiation model.

Gravity model

The class Gravity, implementing the Gravity model, has two main methods:

- fit, which calibrates the model's parameters using a FlowDataFrame;
- generate, which generates the flows on a given spatial tessellation.

Load the spatial tessellation and a data set of real flows in a FlowDataFrame:

```python
import skmob
from skmob.utils import utils, constants
import geopandas as gpd
from skmob.models.gravity import Gravity
import numpy as np

# load a spatial tessellation
url_tess = skmob.utils.constants.NY_COUNTIES_2011
tessellation = gpd.read_file(url_tess).rename(columns={'tile_id': 'tile_ID'})

# load the file with the real flows
fdf = skmob.FlowDataFrame.from_file(skmob.utils.constants.NY_FLOWS_2011,
                                    tessellation=tessellation,
                                    tile_id='tile_ID',
                                    sep=",")

# compute the total outflows from each location of the tessellation (excluding self loops)
tot_outflows = fdf[fdf['origin'] != fdf['destination']].groupby(by='origin', axis=0)[['flow']].sum().fillna(0)
tessellation = tessellation.merge(tot_outflows, left_on='tile_ID', right_on='origin').rename(columns={'flow': 'tot_outflow'})
```

Instantiate a Gravity model object and generate synthetic flows:

```python
# instantiate a singly constrained Gravity model
gravity_singly = Gravity(gravity_type='singly constrained')
print(gravity_singly)
```

    Gravity(name="Gravity model", deterrence_func_type="power_law", deterrence_func_args=[-2.0], origin_exp=1.0, destination_exp=1.0, gravity_type="singly constrained")

```python
# start the generation of the synthetic flows
np.random.seed(0)
synth_fdf = gravity_singly.generate(tessellation,
                                    tile_id_column='tile_ID',
                                    tot_outflows_column='tot_outflow',
                                    relevance_column='population',
                                    out_format='flows')

# print a portion of the synthetic flows
print(synth_fdf.head())
```

      origin destination  flow
    0  36019       36101   101
    1  36019       36107    66
    2  36019       36059  1041
    3  36019       36011   151
    4  36019       36123    33

Fit the parameters of the Gravity model from the FlowDataFrame and generate the synthetic flows:

```python
# instantiate a Gravity object (with default parameters)
gravity_singly_fitted = Gravity(gravity_type='singly constrained')
print(gravity_singly_fitted)
```

    Gravity(name="Gravity model", deterrence_func_type="power_law", deterrence_func_args=[-2.0], origin_exp=1.0, destination_exp=1.0, gravity_type="singly constrained")

```python
# fit the parameters of the Gravity model from the FlowDataFrame
gravity_singly_fitted.fit(fdf, relevance_column='population')
print(gravity_singly_fitted)
```

    Gravity(name="Gravity model", deterrence_func_type="power_law", deterrence_func_args=[-1.9947152031914186], origin_exp=1.0, destination_exp=0.6471759552223144, gravity_type="singly constrained")

```python
# generate the synthetic flows
np.random.seed(0)
synth_fdf_fitted = gravity_singly_fitted.generate(tessellation,
                                                  tile_id_column='tile_ID',
                                                  tot_outflows_column='tot_outflow',
                                                  relevance_column='population',
                                                  out_format='flows')

# print a portion of the synthetic flows
print(synth_fdf_fitted.head())
```

      origin destination  flow
    0  36019       36101   102
    1  36019       36107    66
    2  36019       36059  1044
    3  36019       36011   152
    4  36019       36123    33

Plot the real flows and the synthetic flows:

```python
m = fdf.plot_flows(min_flow=100, flow_exp=0.01, flow_color='blue')
synth_fdf_fitted.plot_flows(min_flow=1000, flow_exp=0.01, map_f=m)
```

Gravity model: real flows vs synthetic flows
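The idea behind a singly constrained gravity model with a power-law deterrence function can be sketched in plain Python. This is a textbook formulation written as a hypothetical helper, not the library's implementation: the outflow of each origin is split among the other tiles in proportion to relevance^destination_exp * distance^deterrence_exp, so the flows leaving each origin sum to its total outflow. The centroid coordinates and outflows below are rough illustrative values.

```python
import math

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two (lat, lng) points."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lng2 - lng1) / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def gravity_singly_constrained(tiles, tot_outflows, deterrence_exp=-2.0, destination_exp=1.0):
    """Expected flows for tiles given as {tile_id: (lat, lng, relevance)}."""
    flows = {}
    for o, (olat, olng, _) in tiles.items():
        # unnormalized attractiveness of every other tile as seen from origin o
        weights = {}
        for d, (dlat, dlng, relevance) in tiles.items():
            if d == o:
                continue
            dist = haversine_km(olat, olng, dlat, dlng)
            weights[d] = (relevance ** destination_exp) * (dist ** deterrence_exp)
        total = sum(weights.values())
        for d, w in weights.items():
            flows[(o, d)] = tot_outflows.get(o, 0) * w / total
    return flows

# illustrative tiles: approximate centroids and populations (assumed values)
tiles = {
    "36019": (44.75, -73.68, 81716),
    "36101": (42.27, -77.38, 99145),
    "36059": (40.73, -73.59, 1346176),
}
tot_outflows = {"36019": 1000, "36101": 2000, "36059": 5000}
flows = gravity_singly_constrained(tiles, tot_outflows)
for (o, d), f in sorted(flows.items()):
    print(o, d, round(f, 1))
```

The sketch returns expected (real-valued) flows and involves no randomness, unlike the example above, which seeds NumPy's random generator before calling generate.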

Radiation model

The Radiation model is parameter-free and has only one method: generate. Given a spatial tessellation, the synthetic flows can be generated using the Radiation class as follows:

```python
from skmob.models.radiation import Radiation

# instantiate a Radiation object
radiation = Radiation()

# start the simulation
np.random.seed(0)
rad_flows = radiation.generate(tessellation,
                               tile_id_column='tile_ID',
                               tot_outflows_column='tot_outflow',
                               relevance_column='population',
                               out_format='flows_sample')

# print a portion of the synthetic flows
print(rad_flows.head())
```

      origin destination   flow
    0  36019       36033  11648
    1  36019       36031   4232
    2  36019       36089   5598
    3  36019       36113   1596
    4  36019       36041    117
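For readers who want to see why the Radiation model needs no parameters, the following plain-Python sketch evaluates the standard radiation formula. It is an illustrative, hypothetical helper and not the library's code: the probability of a trip from tile i to tile j is proportional to m_i * n_j / ((m_i + s_ij) * (m_i + n_j + s_ij)), where m_i and n_j are the relevances of origin and destination and s_ij is the total relevance of the tiles closer to i than j is (excluding i and j). Probabilities are normalized so that each origin's flows sum to its total outflow.

```python
import math

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two (lat, lng) points."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lng2 - lng1) / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def radiation_flows(tiles, tot_outflows):
    """Expected flows for tiles given as {tile_id: (lat, lng, relevance)}."""
    flows = {}
    for o, (olat, olng, m_i) in tiles.items():
        dists = {d: haversine_km(olat, olng, lat, lng)
                 for d, (lat, lng, _) in tiles.items() if d != o}
        probs = {}
        for d, dist_od in dists.items():
            n_j = tiles[d][2]
            # relevance of the intervening tiles within distance dist_od of the origin
            s_ij = sum(tiles[k][2] for k, dist_ok in dists.items()
                       if k != d and dist_ok <= dist_od)
            probs[d] = m_i * n_j / ((m_i + s_ij) * (m_i + n_j + s_ij))
        total = sum(probs.values())
        for d, p in probs.items():
            flows[(o, d)] = tot_outflows.get(o, 0) * p / total
    return flows

# three tiles along the equator with equal relevance (illustrative values)
tiles = {"A": (0.0, 0.0, 100), "B": (0.0, 1.0, 100), "C": (0.0, 2.5, 100)}
tot_outflows = {"A": 60, "B": 30, "C": 10}
flows = radiation_flows(tiles, tot_outflows)
print(flows[("A", "B")], flows[("A", "C")])
```

From tile A, the nearer tile B receives three times the flow of tile C even though both have the same relevance, because B lies between A and C and absorbs part of the trips.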

Individual generative models

The goal of individual generative models of human mobility is to create a population of agents whose mobility patterns are statistically indistinguishable from those of real individuals. An individual generative model typically generates a synthetic trajectory corresponding to a single moving object, assuming that an object is independent of the others.

scikit-mobility implements the most common individual generative models, such as the Exploration and Preferential Return model and its variants, and DITRAS. Each generative model is a Python class with a public method generate, which starts the generation of synthetic trajectories.

The following code generates synthetic trajectories using the DensityEPR model:

```python
import skmob
import pandas as pd
import geopandas as gpd
from skmob.models.epr import DensityEPR

# load a spatial tessellation on which to perform the simulation
url = skmob.utils.constants.NY_COUNTIES_2011
tessellation = gpd.read_file(url)

# starting and end times of the simulation
start_time = pd.to_datetime('2019/01/01 08:00:00')
end_time = pd.to_datetime('2019/01/14 08:00:00')

# instantiate a DensityEPR object
depr = DensityEPR()

# start the simulation
tdf = depr.generate(start_time, end_time, tessellation, relevance_column='population', n_agents=100)
print(tdf.head())
```

       uid                   datetime        lat        lng
    0    1 2019-01-01 08:00:00.000000  42.452018 -76.473618
    1    1 2019-01-01 08:32:30.108708  42.170344 -76.306260
    2    1 2019-01-01 09:09:11.760703  43.241550 -75.435903
    3    1 2019-01-01 10:00:22.832309  42.170344 -76.306260
    4    1 2019-01-01 14:00:25.923314  42.267915 -77.383591

```python
print(tdf.parameters)
```

    {'model': {'class': , 'generate': {'start_date': Timestamp('2019-01-01 08:00:00'), 'end_date': Timestamp('2019-01-14 08:00:00'), 'gravity_singly': {}, 'n_agents': 100, 'relevance_column': 'population', 'random_state': None, 'verbose': True}}}
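The mechanism shared by the Exploration and Preferential Return family can be sketched in plain Python. This is a deliberately simplified illustration with hypothetical names, not the DensityEPR implementation: at each step an agent that has visited S distinct locations explores a new one with probability rho * S^(-gamma), and otherwise returns to a known location chosen in proportion to its past visits. Here new locations are drawn uniformly and waiting times are ignored; the default values of rho and gamma are assumptions taken from the commonly cited EPR formulation.

```python
import random

def epr_visits(locations, n_steps, rho=0.6, gamma=0.21, seed=0):
    """Generate a sequence of visited locations with an EPR-like mechanism."""
    rng = random.Random(seed)
    first = rng.choice(locations)
    visits = [first]
    counts = {first: 1}  # number of visits per known location
    for _ in range(n_steps - 1):
        unvisited = [loc for loc in locations if loc not in counts]
        p_new = rho * len(counts) ** (-gamma)
        if unvisited and rng.random() < p_new:
            nxt = rng.choice(unvisited)  # exploration
        else:
            known = list(counts)
            # preferential return: weight each known location by its visit count
            nxt = rng.choices(known, weights=[counts[k] for k in known])[0]
        counts[nxt] = counts.get(nxt, 0) + 1
        visits.append(nxt)
    return visits

locations = [f"tile_{i}" for i in range(20)]  # hypothetical tile ids
visits = epr_visits(locations, n_steps=50, seed=0)
print(len(set(visits)), "distinct locations in", len(visits), "steps")
```

Because the exploration probability decreases as S grows, the agent discovers new places more and more slowly while a few locations accumulate most of the visits.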

Privacy

Mobility data is sensitive since the movements of individuals can reveal confidential personal information or allow the re-identification of individuals in a database, creating serious privacy risks. In the literature, privacy risk assessment relies on the concept of re-identification of a moving object in a database through an attack by a malicious adversary. A common framework for privacy risk assessment assumes that during the attack a malicious adversary acquires, in some way, the access to an anonymized mobility data set, i.e., a mobility data set in which the moving object associated with a trajectory is not known. Moreover, it is assumed that the malicious adversary acquires, in some way, information about the trajectory (or a portion of it) of an individual represented in the acquired data set. Based on this information, the risk of re-identification of that individual is computed estimating how unique that individual's mobility data are with respect to the mobility data of the other individuals represented in the acquired data set.

scikit-mobility provides several attack models, each implemented as a Python class. For example, in a location attack model, implemented in the LocationAttack class, the malicious adversary knows a certain number of locations visited by an individual, but they do not know the temporal order of the visits. To instantiate a LocationAttack object we can run the following code:

```python
import skmob
from skmob.privacy import attacks

at = attacks.LocationAttack(knowledge_length=2)
```

The argument knowledge_length specifies how many locations the malicious adversary knows of each object's movement. The re-identification risk is computed based on the worst possible combination of knowledge_length locations out of all possible combinations of locations.

To assess the re-identification risk associated with a mobility data set, represented as a TrajDataFrame, we specify it as input to the assess_risk method, which returns a pandas DataFrame that contains the uid of each object in the TrajDataFrame and the associated re-identification risk as the column risk (type: float, range: $[0,1]$ where 0 indicates minimum risk and 1 maximum risk).

```python
tdf = skmob.TrajDataFrame.from_file(filename="privacy_toy.csv")
tdf_risk = at.assess_risk(tdf)
print(tdf_risk.head())
```

       uid      risk
    0    1  0.333333
    1    2  0.500000
    2    3  0.333333
    3    4  0.333333
    4    5  0.250000

Since risk assessment may be time-consuming for larger datasets, scikit-mobility provides the option to focus only on a subset of the objects with the argument targets. For example, in the following code, we compute the re-identification risk for the objects with uid 1 and 2 only:

```python
tdf_risk = at.assess_risk(tdf, targets=[1,2])
print(tdf_risk)
```

       uid      risk
    0    1  0.333333
    1    2  0.500000
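The worst-case reasoning behind a location attack can be made concrete with a short plain-Python sketch. It is a hypothetical illustration of the definition given above, not the library's implementation: for every combination of `knowledge_length` locations visited by an individual, we count how many individuals in the data set visited all of them, and the risk is the largest value of 1 / count over all combinations.

```python
from itertools import combinations

def location_attack_risk(visited, knowledge_length=2):
    """Re-identification risk per individual for visited = {uid: set of location ids}."""
    risks = {}
    for uid, locations in visited.items():
        # the adversary cannot know more locations than the individual visited
        k = min(knowledge_length, len(locations))
        worst = 0.0
        for combo in combinations(sorted(locations), k):
            # individuals compatible with the adversary's background knowledge
            matches = sum(1 for other in visited.values() if set(combo) <= other)
            worst = max(worst, 1.0 / matches)
        risks[uid] = worst
    return risks

# toy data set (illustrative values)
visited = {
    1: {"a", "b", "c"},
    2: {"a", "b"},
    3: {"a", "b", "d"},
    4: {"e", "f"},
}
risks = location_attack_risk(visited, knowledge_length=2)
print(risks)
```

Individual 2 shares both known locations with two other individuals, so the risk is 1/3, while individuals 1, 3 and 4 each have at least one pair of locations that nobody else visited and are therefore uniquely identifiable.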

Downloading datasets

The data module of scikit-mobility provides users with an easy way to:

1. download ready-to-use mobility data (e.g., trajectories, flows, spatial tessellations, and auxiliary data);
2. load and transform the downloaded dataset into standard skmob structures (TrajDataFrame, GeoDataFrame, FlowDataFrame, DataFrame);
3. allow developers and contributors to add new datasets to the library.

The data module provides three functions:

- list_datasets
- get_dataset_info
- load_dataset

The user can download the list of all datasets currently available in the library using list_datasets:

```python
import skmob
from skmob.data.load import list_datasets

list_datasets()
```

    ['flow_foursquare_nyc', 'foursquare_nyc', 'nyc_boundaries', 'parking_san_francisco', 'taxi_san_francisco']

The user can retrieve information about a specific dataset in the library using get_dataset_info:

```python
import skmob
from skmob.data.load import get_dataset_info

get_dataset_info("foursquare_nyc")
```

{'name': 'Foursquare_NYC',
 'description': 'Dataset containing the Foursquare checkins of individuals moving in New York City',
 'url': 'http://www-public.it-sudparis.eu/~zhang_da/pub/dataset_tsmc2014.zip',
 'hash': 'cbe3fdab373d24b09b5fc53509c8958c77ff72b6c1a68589ce337d4f9a80235b',
 'auth': 'no',
 'data_type': 'trajectory',
 'download_format': 'zip',
 'sep': '   ',
 'encoding': 'ISO-8859-1'}
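The `hash` field in the dataset description suggests that a downloaded archive can be checked for integrity. The sketch below shows one way to do this with the standard library. It assumes, based only on the 64-character hexadecimal value, that the hash is a SHA-256 digest of the downloaded file; the helper names are hypothetical and the demonstration uses a temporary file rather than a real dataset.

```python
import hashlib
import os
import tempfile

def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_expected_hash(path, expected_hex):
    """Return True if the file's digest equals the expected hex string."""
    return sha256_of_file(path) == expected_hex.lower()

# demonstration on a temporary file with known content
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
demo_path = tmp.name
demo_ok = matches_expected_hash(demo_path, hashlib.sha256(b"abc").hexdigest())
demo_bad = matches_expected_hash(demo_path, "0" * 64)
os.remove(demo_path)
print(demo_ok, demo_bad)
```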

Finally, the user can download a specific dataset using load_dataset:

```python
import skmob
from skmob.data.load import load_dataset, list_datasets

tdf_nyc = load_dataset("foursquare_nyc", drop_columns=True)
print(tdf_nyc.head())
```

       uid        lat        lng                  datetime
    0  470  40.719810 -74.002581 2012-04-03 18:00:09+00:00
    1  979  40.606800 -74.044170 2012-04-03 18:00:25+00:00
    2   69  40.716162 -73.883070 2012-04-03 18:02:24+00:00
    3  395  40.745164 -73.982519 2012-04-03 18:02:41+00:00
    4   87  40.740104 -73.989658 2012-04-03 18:03:00+00:00

Related packages

movingpandas is a similar package that deals with movement data. Instead of implementing new data structures tailored for trajectories (TrajDataFrame) and mobility flows (FlowDataFrame), movingpandas describes a trajectory using a geopandas GeoDataFrame. A comparison of the scikit-mobility tutorials and the movingpandas tutorials shows little overlap in covered use cases and implemented functionality. scikit-mobility focuses on computing human mobility metrics, generating synthetic trajectories, and assessing the privacy risks of mobility datasets. movingpandas, on the other hand, focuses on spatio-temporal data exploration, with corresponding functions for data manipulation and analysis.

Owner

Name: scikit-mobility
Login: scikit-mobility
Kind: organization

Repositories: 4
Profile: https://github.com/scikit-mobility

Citation (CITATION.cff)

cff-version: 1.1.0
message: "If you use this code, please cite this software."
abstract: The last decade has witnessed the emergence of massive mobility data sets, such as tracks generated by GPS devices, call detail records, and geo-tagged posts from social media platforms. These data sets have fostered a vast scientific production on various applications of mobility analysis, ranging from computational epidemiology to urban planning and transportation engineering. A strand of literature addresses data cleaning issues related to raw spatiotemporal trajectories, while the second line of research focuses on discovering the statistical "laws" that govern human movements. A significant effort has also been put on designing algorithms to generate synthetic trajectories able to reproduce, realistically, the laws of human mobility. Last but not least, a line of research addresses the crucial problem of privacy, proposing techniques to perform the re-identification of individuals in a database. A view on state of the art cannot avoid noticing that there is no statistical software that can support scientists and practitioners with all the aspects mentioned above of mobility data analysis. In this paper, we propose scikit-mobility, a Python library that has the ambition of providing an environment to reproduce existing research, analyze mobility data, and simulate human mobility habits. scikit-mobility is efficient and easy to use as it extends pandas, a popular Python library for data analysis. Moreover, scikit-mobility provides the user with many functionalities, from visualizing trajectories to generating synthetic data, from analyzing statistical patterns to assessing the privacy risk related to the analysis of mobility data sets.
authors:
  - family-names: Pappalardo
    given-names: Luca
    orcid: https://orcid.org/0000-0002-1547-6007
  - family-names: Simini
    given-names: Filippo
    orcid: https://orcid.org/0000-0001-8675-3529
  - family-names: Barlacchi
    given-names: Gianni
    orcid: https://orcid.org/0000-0002-9896-0610
  - family-names: Pellungrini
    given-names: Roberto
    orcid: 
title: scikit-mobility
version: 1.1.0
doi: 10.5281/zenodo.3273053
date-released: 2019-07-08
keywords:
  - human mobility
  - data science
  - artificial intelligence
  - urban informatics
  - research software
license: "CC-BY-4.0"

GitHub Events

Total

Issues event: 4
Watch event: 50
Issue comment event: 1
Fork event: 5

Last Year

Issues event: 4
Watch event: 50
Issue comment event: 1
Fork event: 5

Committers

Last synced: over 2 years ago

All Time

Total Commits: 712
Total Committers: 25
Avg Commits per committer: 28.48
Development Distribution Score (DDS): 0.681

Past Year

Commits: 57
Committers: 10
Avg Commits per committer: 5.7
Development Distribution Score (DDS): 0.614

Top Committers

Name	Email	Commits
Michele Ferretti	m**i@g**m	227
lucpappalard	l**4@g**m	195
Filippo Simini	f**i@g**m	78
gbarlacchi	p*****	68
Roberto	r**i@d**t	44
Giuliano Cornacchia	3**a@u**m	31
Michele Ferretti	m**r@u**m	19
Luca Pappalardo	l**o@d**t	7
Sebastian Wolf	s**f@g**m	7
gbarlacchi	g**i@g**m	7
Michele Girolami	m**i@i**t	4
Lorenzo	l**o@g**m	3
pareyesv	p**v@u**m	3
dependabot[bot]	4**]@u**m	2
Anita Graser	a**r@g**t	2
Massimiliano Luca	m**l@g**m	2
Lorenzo F. Lucchnini	3**i@u**m	2
John	j**n@r**m	2
Arash Badie Modiri	a**i@a**i	2
Giovanni	m**i@g**m	2
Shiqing	s**8@y**m	1
Marco De Nadai	me@m****t	1
Larry Dong	l**g@m**a	1
Graser Anita	A**r@a**t	1
Violet	v**5@g**m	1
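
The page does not define the Development Distribution Score. Assuming the usual definition, 1 minus the share of commits made by the single most active committer, the all-time figures above can be reproduced from the Top Committers table. The function name below is hypothetical.

```python
def development_distribution_score(commit_counts):
    """DDS under the assumed definition: 1 - (top committer's commits / total)."""
    total = sum(commit_counts)
    return 1 - max(commit_counts) / total


# Commit counts copied from the Top Committers table, one entry per row.
commit_counts = [227, 195, 78, 68, 44, 31, 19, 7, 7, 7, 4, 3, 3,
                 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1]

total_commits = sum(commit_counts)
dds = development_distribution_score(commit_counts)
avg_per_committer = total_commits / len(commit_counts)

print(total_commits)                 # 712
print(round(dds, 3))                 # 0.681
print(round(avg_per_committer, 2))   # 28.48
```

Note that the table counts each name and email pair as a separate committer, so a person who committed under two addresses appears twice.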

Committer Domains (Top 20 + Academic)

- di.unipi.it: 2
- ait.ac.at: 1
- mail.mcgill.ca: 1
- marcodena.it: 1
- yandex.com: 1
- aalto.fi: 1
- rystadenergy.com: 1
- gmx.at: 1
- isti.cnr.it: 1

Issues and Pull Requests

Last synced: 11 months ago

All Time

Total issues: 89
Total pull requests: 39
Average time to close issues: about 1 year
Average time to close pull requests: about 1 month
Total issue authors: 49
Total pull request authors: 16
Average comments per issue: 1.34
Average comments per pull request: 0.49
Merged pull requests: 24
Bot issues: 0
Bot pull requests: 8

Past Year

Issues: 2
Pull requests: 0
Average time to close issues: about 23 hours
Average time to close pull requests: N/A
Issue authors: 2
Pull request authors: 0
Average comments per issue: 0.0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

jonpappalord (17)
gbarlacchi (5)
apiszcz (4)
FilippoSimini (4)
mauruscz (3)
vlingenfelter (3)
GiulianoCornacchia (3)
michelegirolami (2)
stevegaunt (2)
SmolakK (2)
valavakilian (2)
denadai2 (2)
cachatj (2)
txusma23 (2)
gegen07 (2)

Pull Request Authors

miccferr (12)
dependabot[bot] (7)
GiulianoCornacchia (4)
gbarlacchi (2)
vlingenfelter (2)
denadai2 (2)
Apolsus (2)
pitmonticone (1)
mirqr (1)
lgtm-com[bot] (1)
francescodizzia (1)
FilippoSimini (1)
michelegirolami (1)
javandres (1)
lwdovico (1)

Top Labels

Issue Labels

enhancement (21) bug (14) library usage (8) documentation issue (6) question (4) help wanted (4) installation issue (2) wontfix (1) invalid (1) good first issue (1)

Pull Request Labels

dependencies (7) bug (1) enhancement (1)

Packages

Total packages: 3
Total downloads:
- pypi 4,531 last-month
Total docker downloads: 34

Total dependent packages: 4
(may contain duplicates)
Total dependent repositories: 11
(may contain duplicates)
Total versions: 12
Total maintainers: 1

pypi.org: scikit-mobility

A toolbox for analyzing and processing mobility data.

Documentation: https://scikit-mobility.github.io/scikit-mobility
License: new BSD
Latest release: 1.3.1
published about 4 years ago

Versions: 6
Dependent Packages: 4
Dependent Repositories: 9
Downloads: 4,531 Last month
Docker Downloads: 34

Rankings

Dependent packages count: 1.9%

Docker downloads count: 3.4%

Average: 3.5%

Downloads: 3.8%

Dependent repos count: 4.8%

Maintainers (1)

gbarlacchi

Last synced: 11 months ago

proxy.golang.org: github.com/scikit-mobility/scikit-mobility

Documentation: https://pkg.go.dev/github.com/scikit-mobility/scikit-mobility#section-documentation
License: bsd-3-clause
Latest release: v1.1.2
published over 5 years ago

Versions: 2
Dependent Packages: 0
Dependent Repositories: 0

Rankings

Forks count: 2.4%

Stargazers count: 2.6%

Average: 4.0%

Dependent packages count: 5.4%

Dependent repos count: 5.7%

Last synced: 11 months ago

conda-forge.org: scikit-mobility

scikit-mobility is a library for human mobility analysis in Python. The library allows users to: (i) represent trajectories and mobility flows with proper data structures, TrajDataFrame and FlowDataFrame; (ii) manage and manipulate mobility data of various formats (call detail records, GPS data, data from social media, survey data, etc.); (iii) extract mobility metrics and patterns from data, both at individual and collective level (e.g., length of displacements, characteristic distance, origin-destination matrix, etc.); (iv) generate synthetic individual trajectories using standard mathematical models (random walk models, exploration and preferential return model, etc.); (v) generate synthetic mobility flows using standard migration models (gravity model, radiation model, etc.); (vi) assess the privacy risk associated with a mobility data set.

Homepage: https://github.com/scikit-mobility/scikit-mobility
License: BSD-3-Clause
Latest release: 1.3.1
published about 4 years ago

Versions: 4
Dependent Packages: 0
Dependent Repositories: 2

Rankings

Forks count: 15.3%

Stargazers count: 16.2%

Dependent repos count: 20.1%

Average: 25.8%

Dependent packages count: 51.5%

Last synced: 11 months ago

Dependencies

poetry.lock pypi

atomicwrites 1.4.0 develop
cachecontrol 0.12.11 develop
cachy 0.3.0 develop
cffi 1.15.0 develop
cfgv 3.3.1 develop
cleo 0.8.1 develop
clikit 0.6.2 develop
coverage 5.5 develop
coverage-badge 1.1.0 develop
crashtest 0.3.1 develop
cryptography 37.0.2 develop
distlib 0.3.4 develop
filelock 3.7.1 develop
html5lib 1.1 develop
identify 2.5.1 develop
importlib-metadata 4.11.4 develop
iniconfig 1.1.1 develop
jeepney 0.8.0 develop
keyring 23.6.0 develop
lockfile 0.12.2 develop
msgpack 1.0.4 develop
nodeenv 1.6.0 develop
pastel 0.2.1 develop
pexpect 4.8.0 develop
pkginfo 1.8.3 develop
platformdirs 2.5.2 develop
pluggy 1.0.0 develop
poetry 1.1.13 develop
poetry-core 1.0.8 develop
pre-commit 2.19.0 develop
ptyprocess 0.7.0 develop
py 1.11.0 develop
pycparser 2.21 develop
pylev 1.4.0 develop
pytest 6.2.5 develop
pywin32-ctypes 0.2.0 develop
pyyaml 6.0 develop
requests-toolbelt 0.9.1 develop
rtree 0.9.7 develop
secretstorage 3.3.2 develop
shellingham 1.4.0 develop
toml 0.10.2 develop
tomlkit 0.11.0 develop
tox 3.25.0 develop
virtualenv 20.14.1 develop
webencodings 0.5.1 develop
zipp 3.8.0 develop
appdirs 1.4.4
attrs 21.4.0
branca 0.5.0
certifi 2022.5.18.1
charset-normalizer 2.0.12
click 8.1.3
click-plugins 1.1.1
cligj 0.7.2
colorama 0.4.4
cycler 0.11.0
fiona 1.8.21
folium 0.12.1.post1
fonttools 4.33.3
geojson 2.5.0
geopandas 0.10.2
h3 3.7.4
idna 3.3
igraph 0.9.11
jinja2 3.1.2
joblib 1.1.0
kiwisolver 1.4.3
markupsafe 2.1.1
matplotlib 3.5.2
mpmath 1.2.1
munch 2.5.0
numpy 1.22.4
packaging 20.9
pandas 1.4.2
patsy 0.5.2
pillow 9.1.1
pooch 1.6.0
powerlaw 1.5
pyparsing 3.0.9
pyproj 3.3.1
python-dateutil 2.8.2
python-igraph 0.9.11
pytz 2022.1
requests 2.28.0
scikit-learn 1.1.1
scipy 1.6.1
setuptools-scm 6.4.2
shapely 1.8.2
six 1.16.0
statsmodels 0.13.1
texttable 1.6.4
threadpoolctl 3.1.0
tomli 2.0.1
tqdm 4.64.0
urllib3 1.26.9

pyproject.toml pypi

Rtree ^0.9.7 develop
coverage ^5.5 develop
coverage-badge ^1.0.1 develop
poetry ^1.1.6 develop
pre-commit ^2.12.1 develop
pytest ^6.2.4 develop
tox ^3.23.1 develop
folium 0.12.1.post1
geojson ^2.5.0
geopandas ^0.10.2
h3 ^3.7.3
pandas ^1.1.5
pooch ^1.6.0
powerlaw ^1.4.6
python >=3.8,<4
python-igraph ^0.9.1
requests ^2.25.1
scikit-learn *
statsmodels ^0.13.0
tqdm ^4.60.0
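
Most of the pyproject.toml constraints above use Poetry's caret (`^`) notation, which allows updates that leave the leftmost non-zero version component unchanged. The sketch below expands a caret requirement into an explicit range. The function `caret_range` is hypothetical and handles only plain dotted numeric versions, not pre-release tags or other constraint operators.

```python
def caret_range(spec):
    """Expand a caret requirement such as '^0.9.7' into '>=0.9.7,<0.10.0'."""
    lower = spec.lstrip("^")
    parts = [int(p) for p in lower.split(".")]
    upper = list(parts)
    for i, part in enumerate(parts):
        if part != 0:
            # Bump the leftmost non-zero component and zero everything after it.
            upper[i] = part + 1
            upper[i + 1:] = [0] * (len(parts) - i - 1)
            break
    else:
        # All components are zero: bump the last one given.
        upper[-1] += 1
    return ">={},<{}".format(lower, ".".join(str(p) for p in upper))


# A few of the constraints listed above.
for spec in ["^0.9.7", "^1.1.5", "^0.10.2", "^4.60.0"]:
    print(spec, "->", caret_range(spec))
```

Under this reading, `pandas ^1.1.5` accepts any 1.x release from 1.1.5 onward, while `geopandas ^0.10.2` stays within the 0.10.x series.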

.github/workflows/ci.yaml actions

JamesIves/github-pages-deploy-action 4.1.1 composite
actions/checkout v2 composite
actions/download-artifact v2 composite
actions/setup-python v2 composite
actions/upload-artifact v2 composite
