beiwe-forest
Forest is a library for analyzing smartphone-based high-throughput digital phenotyping data
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 16 DOI reference(s) in README
- ✓ Academic publication links: Links to joss.theoj.org, zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (18.5%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: onnela-lab
- License: bsd-3-clause
- Language: Python
- Default Branch: develop
- Homepage: https://forest.beiwe.org
- Size: 3.15 MB
Statistics
- Stars: 34
- Watchers: 7
- Forks: 19
- Open Issues: 14
- Releases: 4
Metadata Files
README.md

The Onnela Lab at the Harvard T.H. Chan School of Public Health develops the Forest library to analyze smartphone-based high-throughput digital phenotyping data collected with the Beiwe platform. The main intellectual challenge in smartphone-based digital phenotyping has moved from data collection to data analysis, and our research focuses on developing mathematical and statistical methods for analyzing intensive high-dimensional data. Forest implements these methods as a Python package released under the BSD-3 open-source license. The library is under active development and will continue to grow over the coming years as we develop new analytical methods.
Forest can be run locally but is also integrated into the Beiwe back-end on AWS, consistent with the preferred big-data computing paradigm of moving computation to the data. Integrated with Beiwe, Forest can be used to generate on-demand analytics, most importantly daily or hourly summary statistics of collected data, which are stored in a relational database on AWS. The system also implements an API for Tableau, which supports the creation of customizable workbooks and dashboards to view data summaries and troubleshoot any issues with data collection. Tableau is commercial software but is available under free viewer licenses and may be free to academic users for the first year (see Tableau for more information).
For more detailed info on specific subpackages, see our Documentation.
Description
Description of how beiwe data looks (folder structure + on/off cycles)
Input: typically raw data from smartphones
Output: typically summary files
- Creating synthetic data
  - Want to try out our methods, but don't have smartphone data at hand? Use bonsai
- Data preparation
  - Identifying time zones and unit conversion: use poplar
  - Collating Beiwe survey data into .csv files per participant or per study: use sycamore
- Data imputation
  - State-of-the-art GPS imputation: use jasmine
- Data summarizing (see tables below for summary metrics)
  - Mobility metrics from GPS data: use jasmine
  - Daily summaries of call & text metadata: use willow
  - Survey completion time from survey metadata: use sycamore
Usage
Please note that Forest is tested only against Python 3.11 and 3.12 (preferred). To install:
```console
pip install beiwe-forest
```
Alternatively, install directly from GitHub with pip. Because the repository is public, pip won't prompt you to log in. If you've used Forest in the past, it may be prudent to run pip uninstall forest first.
```console
pip install git+https://github.com/onnela-lab/forest
```
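Before running anything, it can help to confirm that the interpreter matches the tested Python versions and that the package is visible to it. The following is a minimal sketch using only the standard library. The names `TESTED_VERSIONS`, `is_tested_python`, and `installed_version` are hypothetical helpers, not part of Forest. The distribution name `beiwe-forest` is taken from the PyPI install command above; a GitHub install may register under a different name, which is an assumption to verify locally.

```python
import sys
from importlib import metadata

# Python versions Forest is stated to be tested against (major, minor).
TESTED_VERSIONS = {(3, 11), (3, 12)}


def is_tested_python(version_info=sys.version_info):
    """Return True if the major.minor version is one of the tested versions."""
    return tuple(version_info[:2]) in TESTED_VERSIONS


def installed_version(distribution="beiwe-forest"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Running a tested Python version:", is_tested_python())
    print("Installed beiwe-forest version:", installed_version())
```

A result of `None` from `installed_version()` only means that no distribution with that exact name was found in the current environment.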
To try Forest right away, adapt the file paths in the code below and run it:
```python
# Currently, all imports from forest must be explicit. For the example below
# you need to import the following.
# In the future, it would be great to have all functions import automatically.
import datetime

from forest.bonsai.simulate_log_data import sim_log_data
from forest.bonsai.simulate_gps_data import sim_gps_data, gps_to_csv
from forest.jasmine.traj2stats import Frequency, gps_stats_main
from forest.willow.log_stats import log_stats_main

# 1. If you don't have any smartphone data (yet) you can generate fake data
path_to_synthetic_gps_data = "ENTER/PATH1/HERE"
path_to_synthetic_log_data = "ENTER/PATH2/HERE"
path_to_gps_summary = "ENTER/PATH/TO/DESIRED/OUTPUT/FOLDER1/HERE"
path_to_log_summary = "ENTER/PATH/TO/DESIRED/OUTPUT/FOLDER2/HERE"

# Generate fake call and text logs
# Because of the explicit imports, you don't have to precede the functions
# with forest.subpackage.
sim_log_data(path_to_synthetic_log_data)

# Generate synthetic gps data and communication logs data as csv files
# Define parameters for generating the data
# To save smartphone battery power, we typically collect location data
# intermittently: e.g. during an on-cycle of 3 minutes, followed by an
# off-cycle of 12 minutes. We'll generate data in this way
# number of persons to generate
n_persons = 1
# location of person to generate, format: Country_2_letter_ISO_code/City_Name
location = "GB/Bristol"
# start date of generated trajectories
start_date = datetime.date(2021, 10, 1)
# end date of trajectories
end_date = datetime.date(2021, 10, 5)
# api key for openroute service, generated from https://openrouteservice.org/
api_key = "mock_api_key"
# Length of off-cycle + length of on-cycle in minutes
cycle = 15
# Length off-cycle / (length off-cycle + length on-cycle)
percentage = 0.8
# dictionary of personal attributes for each user, set to None if random,
# check Attributes class for usage in simulate_gps_data module.
personal_attributes = {
    "User 1": {
        "main_employment": "none",
        "vehicle": "car",
        "travelling_status": 10,
        "active_status": 7,
    },
    "Users 2-4": {
        "main_employment": "university",
        "vehicle": "bicycle",
        "travelling_status": 8,
        "active_status": 8,
        "active_status-16": 2,
    },
    "User 5": {
        "main_employment": "office",
        "vehicle": "foot",
        "travelling_status": 9,
        "travelling_status-20": 1,
        "preferred_exits": ["cafe", "bar", "cinema"],
    },
}
sample_gps_data = sim_gps_data(
    n_persons, location, start_date, end_date,
    cycle, percentage, api_key, personal_attributes,
)
# save data in format of csv files
gps_to_csv(sample_gps_data, path_to_synthetic_gps_data, start_date, end_date)

# 2. Specify parameters for imputation
# See https://forest.beiwe.org/en/latest/jasmine.html for details
# time zone where the study took place (assumes that all participants were
# always in this time zone)
tz_str = "Etc/GMT-1"
# Generate summary metrics e.g. Frequency.HOURLY, Frequency.DAILY or
# Frequency.HOURLY_AND_DAILY (see Frequency class in constants.py)
frequency = Frequency.DAILY
# Save imputed trajectories?
save_traj = False
# Hyperparameters class for imputation (default leave None),
# from forest.jasmine.traj2stats import Hyperparameters
parameters = None
# list of locations to track if visited, leave None if you don't want these
# summary statistics
places_of_interest = ["cafe", "bar", "hospital"]
# list of OpenStreetMap tags to use for identifying locations, leave None to
# default to amenity and leisure tagged locations or if you don't want to use
# OSM (see OSMTags class in constants.py)
osm_tags = None

# 3. Impute location data and generate mobility summary metrics using the
# simulated data above
gps_stats_main(
    study_folder=path_to_synthetic_gps_data,
    output_folder=path_to_gps_summary,
    tz_str=tz_str,
    frequency=frequency,
    save_traj=save_traj,
    parameters=parameters,
    places_of_interest=places_of_interest,
    osm_tags=osm_tags,
)

# 4. Generate daily summary metrics for call/text logs
option = Frequency.DAILY
time_start = None
time_end = None
participant_ids = None

log_stats_main(
    path_to_synthetic_log_data, path_to_log_summary, tz_str,
    option, time_start, time_end, participant_ids,
)
```
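Two parameter conventions in the example above are easy to misread. First, `cycle` is the combined on-cycle and off-cycle length and `percentage` is the off-cycle share, so 15 and 0.8 correspond to the 3-minute on-cycle and 12-minute off-cycle mentioned in the comments. Second, zone names of the form `Etc/GMT-N` follow the POSIX sign convention, in which the sign is inverted, so `"Etc/GMT-1"` denotes a zone one hour ahead of UTC. The sketch below works through both. The helpers `on_off_minutes` and `etc_gmt_utc_offset` are hypothetical illustrations and are not part of Forest.

```python
import re
from datetime import timedelta


def on_off_minutes(cycle, percentage):
    """Split a sampling cycle into (on, off) durations in minutes.

    cycle: on-cycle plus off-cycle length in minutes.
    percentage: off-cycle share of the whole cycle, between 0 and 1.
    """
    if cycle <= 0 or not 0 <= percentage <= 1:
        raise ValueError("cycle must be positive and percentage within [0, 1]")
    off = cycle * percentage
    return cycle - off, off


def etc_gmt_utc_offset(tz_str):
    """Return the UTC offset for an "Etc/GMT+N" or "Etc/GMT-N" zone name.

    The sign in these names is inverted: a minus sign means AHEAD of UTC.
    This sketch does not check that N is within the range of real zones.
    """
    match = re.fullmatch(r"Etc/GMT([+-])(\d{1,2})", tz_str)
    if match is None:
        raise ValueError(f"not an Etc/GMT offset zone: {tz_str!r}")
    sign, hours = match.groups()
    hours = int(hours)
    return timedelta(hours=hours if sign == "-" else -hours)


# Values taken from the usage example above.
on, off = on_off_minutes(cycle=15, percentage=0.8)
print(f"on-cycle: {on:g} min, off-cycle: {off:g} min")
print("Etc/GMT-1 offset from UTC:", etc_gmt_utc_offset("Etc/GMT-1"))
```

If a study took place in a region with daylight saving time, a fixed-offset name such as `Etc/GMT-1` will not follow the seasonal change, which is worth keeping in mind when choosing `tz_str`.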
More info
Publications
- Straczkiewicz M, Huang EJ, and Onnela JP. A “one-size-fits-most” walking recognition method for smartphones, smartwatches, and wearable accelerometers. npj Digit. Med. 6, 29 (2023). Open Access
- Huang E, Yan K, and Onnela JP. Smartphone-Based Activity Recognition Using Multistream Movelets Combining Accelerometer and Gyroscope Data. Sensors 22(7), 2618 (2022)
- Onnela JP, Dixon C, Griffin K, Jaenicke T, Minowada L, Esterkin S, Siu A, Zagorsky J, and Jones E. Beiwe: A data collection platform for high-throughput digital phenotyping. Journal of Open Source Software 6(68), 3417 (2021)
- Liu G and Onnela JP. Bidirectional imputation of spatial GPS trajectories with missingness using sparse online Gaussian Process. Journal of the American Medical Informatics Association 28(8), 1777 (2021)
- Barnett I and Onnela JP. Inferring mobility measures from GPS with missing data. Biostatistics 21(2), e98 (2020). Open Access
- Huang E and Onnela JP. Augmented movelet method for activity classification using smartphone gyroscope and accelerometer data. Sensors 20(13), 3706 (2020)
Owner
- Name: onnela-lab
- Login: onnela-lab
- Kind: organization
- Repositories: 10
- Profile: https://github.com/onnela-lab
Citation (CITATION.cff)
cff-version: 1.2.0
title: Forest
message: "If you use Forest, please cite it using the metadata from this file."
type: software
authors:
  - family-names: Onnela
    given-names: Jukka-Pekka
    orcid: "https://orcid.org/0000-0001-6613-8668"
  - family-names: Barback
    given-names: Josh
  - family-names: Clement
    given-names: Zachary
    orcid: "https://orcid.org/0000-0003-2279-5265"
  - family-names: Dawood
    given-names: Hassan
    orcid: "https://orcid.org/0000-0002-2190-5146"
  - family-names: Efstathiadis
    given-names: Georgios
    orcid: "https://orcid.org/0009-0006-2278-1882"
  - family-names: Emedom-Nnamdi
    given-names: Patrick
    orcid: "https://orcid.org/0000-0003-4442-924X"
  - family-names: Huang
    given-names: Emily J.
    orcid: "https://orcid.org/0000-0003-1964-5231"
  - family-names: Karas
    given-names: Marta
    orcid: "https://orcid.org/0000-0001-5889-3970"
  - family-names: Liu
    given-names: Gang
    orcid: "https://orcid.org/0000-0003-3544-363X"
  - family-names: Ponarul
    given-names: Nellie
    orcid: "https://orcid.org/0009-0003-1279-3757"
  - family-names: Straczkiewicz
    given-names: Marcin
    orcid: "https://orcid.org/0000-0002-8703-4451"
  - family-names: Sytchev
    given-names: Ilya
    orcid: "https://orcid.org/0009-0003-0647-5613"
  - family-names: Beukenhorst
    given-names: Anna
    orcid: "https://orcid.org/0000-0002-1765-4890"
repository-code: "https://github.com/onnela-lab/forest"
url: "https://forest.beiwe.org"
abstract: "Forest is a library for analyzing smartphone-based high-throughput digital phenotyping data."
keywords:
  - "digital phenotyping"
  - smartphone
  - statistics
  - accelerometer
  - GPS
license: BSD-3-Clause
references:
  - authors:
      - family-names: Straczkiewicz
        given-names: Marcin
        affiliation: "Department of Biostatistics, Harvard University"
        orcid: "https://orcid.org/0000-0002-8703-4451"
      - family-names: Huang
        given-names: Emily J.
        affiliation: "Department of Statistical Sciences, Wake Forest University"
        orcid: "https://orcid.org/0000-0003-1964-5231"
      - family-names: Onnela
        given-names: Jukka-Pekka
        affiliation: "Department of Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University"
        orcid: "https://orcid.org/0000-0001-6613-8668"
    doi: 10.1038/s41746-022-00745-z
    journal: "npj Digital Medicine"
    month: 2
    start: 29
    title: "A “one-size-fits-most” walking recognition method for smartphones, smartwatches, and wearable accelerometers"
    type: article
    volume: 6
    year: 2023
  - authors:
      - family-names: Huang
        given-names: Emily J.
        affiliation: "Department of Mathematics and Statistics, Wake Forest University"
        orcid: "https://orcid.org/0000-0003-1964-5231"
      - family-names: Yan
        given-names: Kebin
        affiliation: "Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania"
      - family-names: Onnela
        given-names: Jukka-Pekka
        affiliation: "Department of Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University"
        orcid: "https://orcid.org/0000-0001-6613-8668"
    doi: 10.3390/s22072618
    issue: 7
    journal: Sensors
    start: 2618
    title: "Smartphone-Based Activity Recognition Using Multistream Movelets Combining Accelerometer and Gyroscope Data"
    type: article
    volume: 22
    year: 2022
  - authors:
      - family-names: Liu
        given-names: Gang
        affiliation: "Department of Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University"
        orcid: "https://orcid.org/0000-0003-3544-363X"
      - family-names: Onnela
        given-names: Jukka-Pekka
        affiliation: "Department of Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University"
        orcid: "https://orcid.org/0000-0001-6613-8668"
    doi: 10.1093/jamia/ocab069
    issue: 8
    journal: "Journal of the American Medical Informatics Association"
    start: 1777
    title: "Bidirectional imputation of spatial GPS trajectories with missingness using sparse online Gaussian Process"
    type: article
    volume: 28
    year: 2021
  - authors:
      - family-names: Onnela
        given-names: Jukka-Pekka
        affiliation: "Department of Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University"
        orcid: "https://orcid.org/0000-0001-6613-8668"
      - family-names: Dixon
        given-names: Caleb
        affiliation: "Zagaran, Inc."
      - family-names: Griffin
        given-names: Keary
        affiliation: "Rocket Farm Studios"
      - family-names: Jaenicke
        given-names: Tucker
        affiliation: "Zagaran, Inc."
      - family-names: Minowada
        given-names: Leila
        affiliation: "Zagaran, Inc."
      - family-names: Esterkin
        given-names: Sean
        affiliation: "Zagaran, Inc."
      - family-names: Siu
        given-names: Alvin
        affiliation: "Zagaran, Inc."
      - family-names: Zagorsky
        given-names: Josh
        affiliation: "Zagaran, Inc."
      - family-names: Jones
        given-names: Eli
        affiliation: "Zagaran, Inc."
    doi: 10.21105/joss.03417
    issue: 68
    journal: "Journal of Open Source Software"
    month: 12
    start: 3417
    title: "Beiwe: A data collection platform for high-throughput digital phenotyping"
    type: article
    volume: 6
    year: 2021
  - authors:
      - family-names: Barnett
        given-names: Ian
        affiliation: "Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania"
        orcid: "https://orcid.org/0000-0003-3256-5703"
      - family-names: Onnela
        given-names: Jukka-Pekka
        affiliation: "Department of Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University"
        orcid: "https://orcid.org/0000-0001-6613-8668"
    doi: 10.1093/biostatistics/kxy059
    end: e112
    issue: 2
    journal: Biostatistics
    start: e98
    title: "Inferring mobility measures from GPS with missing data"
    type: article
    volume: 21
    year: 2020
  - authors:
      - family-names: Huang
        given-names: Emily J.
        affiliation: "Department of Mathematics and Statistics, Wake Forest University"
        orcid: "https://orcid.org/0000-0003-1964-5231"
      - family-names: Onnela
        given-names: Jukka-Pekka
        affiliation: "Department of Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University"
        orcid: "https://orcid.org/0000-0001-6613-8668"
    doi: 10.3390/s20133706
    issue: 13
    journal: Sensors
    start: 3706
    title: "Augmented movelet method for activity classification using smartphone gyroscope and accelerometer data"
    type: article
    volume: 20
    year: 2020
GitHub Events
Total
- Create event: 31
- Release event: 3
- Issues event: 45
- Watch event: 6
- Delete event: 26
- Member event: 1
- Issue comment event: 67
- Push event: 106
- Pull request review comment event: 2
- Pull request review event: 21
- Pull request event: 29
- Fork event: 4
Last Year
- Create event: 31
- Release event: 3
- Issues event: 45
- Watch event: 6
- Delete event: 26
- Member event: 1
- Issue comment event: 67
- Push event: 106
- Pull request review comment event: 2
- Pull request review event: 21
- Pull request event: 29
- Fork event: 4
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 6
- Total pull requests: 7
- Average time to close issues: 7 months
- Average time to close pull requests: 4 months
- Total issue authors: 4
- Total pull request authors: 6
- Average comments per issue: 0.17
- Average comments per pull request: 0.29
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 6
- Pull requests: 6
- Average time to close issues: 7 months
- Average time to close pull requests: 4 days
- Issue authors: 4
- Pull request authors: 5
- Average comments per issue: 0.17
- Average comments per pull request: 0.17
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- hackdna (24)
- biblicabeebli (4)
- JiaxinCCC (3)
- muschellij2 (2)
- YUHUINI1995 (1)
- GeorgeEfstathiadis (1)
Pull Request Authors
- GeorgeEfstathiadis (7)
- hackdna (7)
- clementzach (3)
- MMel099 (2)
- jprince127 (1)
- hydawo (1)
- NielsGudd (1)
- biblicabeebli (1)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 1
- Total downloads:
  - pypi: 142 last month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 1
- Total maintainers: 3
pypi.org: beiwe-forest
Forest is a library for analyzing smartphone-based high-throughput digital phenotyping data
- Homepage: https://forest.beiwe.org
- Documentation: https://forest.beiwe.org
- License: BSD-3-Clause
- Latest release: 1.0 (published 8 months ago)
Rankings
Maintainers (3)
Dependencies
- myst-parser ==0.17.2
- sphinx ==4.5.0
- sphinx-copybutton ==0.5.0
- sphinx_rtd_theme ==1.0.0
- flake8 ==4.0.1
- mypy ==0.950
- pytest ==7.1.2
- pytest-mock ==3.7.0
- types-dataclasses ==0.6.5
- types-pytz ==2021.3.7
- types-requests ==2.27.25
- types-setuptools ==57.4.14
- actions/checkout v3 composite
- actions/setup-python v4 composite
- actions/checkout v4 composite
- actions/setup-python v4 composite