beiwe-forest

Forest is a library for analyzing smartphone-based high-throughput digital phenotyping data

https://github.com/onnela-lab/forest

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 16 DOI reference(s) in README
  • Academic publication links
    Links to: joss.theoj.org, zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (18.5%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: onnela-lab
  • License: bsd-3-clause
  • Language: Python
  • Default Branch: develop
  • Homepage: https://forest.beiwe.org
  • Size: 3.15 MB
Statistics
  • Stars: 34
  • Watchers: 7
  • Forks: 19
  • Open Issues: 14
  • Releases: 4
Created over 6 years ago · Last pushed 7 months ago
Metadata Files
Readme License Citation

README.md

The Onnela Lab at the Harvard T.H. Chan School of Public Health has developed the Forest library to analyze smartphone-based high-throughput digital phenotyping data collected with the Beiwe platform. The main intellectual challenge in smartphone-based digital phenotyping has moved from data collection to data analysis, and our research focuses on developing mathematical and statistical methods for analyzing intensive high-dimensional data. Forest implements these methods as a Python package and is released under the BSD-3 open-source license. The library is under active development and will continue to grow over the coming years as we develop new analytical methods.

Forest can be run locally but is also integrated into the Beiwe back-end on AWS, consistent with the preferred big-data computing paradigm of moving computation to the data. Integrated with Beiwe, Forest can be used to generate on-demand analytics, most importantly daily or hourly summary statistics of collected data, which are stored in a relational database on AWS. The system also implements an API for Tableau, which supports the creation of customizable workbooks and dashboards to view data summaries and troubleshoot any issues with data collection. Tableau is commercial software but is available under free viewer licenses and may be free to academic users for the first year (see Tableau for more information).

For more detailed info on specific subpackages, see our Documentation.

Description

Description of how beiwe data looks (folder structure + on/off cycles)

Input: typically raw data from smartphones

Output: typically summary files

  • Creating synthetic data
    • Want to try out our methods, but don't have smartphone data at hand? Use bonsai
  • Data preparation
    • Identifying time zones and unit conversion: use poplar
    • Collate Beiwe survey data into .csvs per participant or per study: use sycamore
  • Data imputation
    • State-of-the-art GPS imputation: use jasmine
  • Data summarizing (see tables below for summary metrics)
    • Mobility metrics from GPS data: use jasmine
    • Daily summaries of call & text metadata: use willow
    • Survey completion time from survey metadata: use sycamore

Usage

Please note that Forest is tested only against Python 3.11 and 3.12 (preferred). To install:

```console
pip install beiwe-forest
```

Alternatively, install directly from GitHub with pip. As the repo is public, you won't be prompted to log in. If you've used forest in the past, it might be prudent to run pip uninstall forest first.

```console
pip install git+https://github.com/onnela-lab/forest
```
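Because Forest is tested only against Python 3.11 and 3.12, it may be worth checking the interpreter before or after installing. The following is a minimal sketch using only the standard library; the helper names `python_is_tested` and `installed_forest_version` are hypothetical and not part of Forest, and the lookup assumes the package is registered under the PyPI distribution name `beiwe-forest`.

```python
import sys
from importlib import metadata
from typing import Optional

# Python versions the README says Forest is tested against.
TESTED_VERSIONS = {(3, 11), (3, 12)}


def python_is_tested(version_info=sys.version_info) -> bool:
    """Return True if the interpreter's major.minor version is a tested one."""
    return (version_info[0], version_info[1]) in TESTED_VERSIONS


def installed_forest_version() -> Optional[str]:
    """Return the installed beiwe-forest version, or None if it is not installed."""
    try:
        return metadata.version("beiwe-forest")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python version tested with Forest:", python_is_tested())
    print("Installed beiwe-forest version:", installed_forest_version())
```

A `None` result from the second helper only means no distribution with that name was found in the current environment.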

To immediately test out forest, adapt the filepaths in the code below and run:

```python
# Currently, all imports from forest must be explicit. For the below example you need to import the following
# In the future, it would be great to have all functions import automatically
import datetime

from forest.bonsai.simulate_log_data import sim_log_data
from forest.bonsai.simulate_gps_data import sim_gps_data, gps_to_csv
from forest.jasmine.traj2stats import Frequency, gps_stats_main
from forest.willow.log_stats import log_stats_main

# 1. If you don't have any smartphone data (yet) you can generate fake data
path_to_synthetic_gps_data = "ENTER/PATH1/HERE"
path_to_synthetic_log_data = "ENTER/PATH2/HERE"
path_to_gps_summary = "ENTER/PATH/TO/DESIRED/OUTPUT/FOLDER1/HERE"
path_to_log_summary = "ENTER/PATH/TO/DESIRED/OUTPUT/FOLDER2/HERE"

# Generate fake call and text logs
# Because of the explicit imports, you don't have to precede the functions with forest.subpackage.
sim_log_data(path_to_synthetic_log_data)

# Generate synthetic gps data and communication logs data as csv files
# Define parameters for generating the data
# To save smartphone battery power, we typically collect location data intermittently: e.g. during an on-cycle of 3 minutes, followed by an off-cycle of 12 minutes. We'll generate data in this way
# number of persons to generate
n_persons = 1
# location of person to generate format: Country_2_letter_ISO_code/City_Name
location = "GB/Bristol"
# start date of generated trajectories
start_date = datetime.date(2021, 10, 1)
# end date of trajectories
end_date = datetime.date(2021, 10, 5)
# api key for openroute service, generated from https://openrouteservice.org/
api_key = "mock_api_key"
# Length of off-cycle + length of on-cycle in minutes
cycle = 15
# Length off-cycle / (length off-cycle + length on-cycle)
percentage = 0.8
# dictionary of personal attributes for each user, set to None if random, check Attributes class for usage in simulate_gps_data module.
personal_attributes = {
    "User 1":
    {
        "main_employment": "none",
        "vehicle" : "car",
        "travelling_status": 10,
        "active_status": 7
    },

    "Users 2-4":
    {
        "main_employment": "university",
        "vehicle" : "bicycle",
        "travelling_status": 8,
        "active_status": 8,
        "active_status-16": 2
    },

    "User 5":
    {
        "main_employment": "office",
        "vehicle" : "foot",
        "travelling_status": 9,
        "travelling_status-20": 1,
        "preferred_exits": ["cafe", "bar", "cinema"]
    }
}
sample_gps_data = sim_gps_data(n_persons, location, start_date, end_date, cycle, percentage, api_key, personal_attributes)
# save data in format of csv files
gps_to_csv(sample_gps_data, path_to_synthetic_gps_data, start_date, end_date)

# 2. Specify parameters for imputation
# See https://forest.beiwe.org/en/latest/jasmine.html for details
# time zone where the study took place (assumes that all participants were always in this time zone)
tz_str = "Etc/GMT-1"
# Generate summary metrics e.g. Frequency.HOURLY, Frequency.DAILY or Frequency.HOURLY_AND_DAILY (see Frequency class in constants.py)
frequency = Frequency.DAILY
# Save imputed trajectories?
save_traj = False
# Hyperparameters class for imputation (default leave None), from forest.jasmine.traj2stats import Hyperparameters
parameters = None
# list of locations to track if visited, leave None if don't want these summary statistics
places_of_interest = ['cafe', 'bar', 'hospital']
# list of OpenStreetMap tags to use for identifying locations, leave None to default to amenity and leisure tagged locations or if you don't want to use OSM (see OSMTags class in constants.py)
osm_tags = None

# 3. Impute location data and generate mobility summary metrics using the simulated data above
gps_stats_main(
    study_folder = path_to_synthetic_gps_data,
    output_folder = path_to_gps_summary,
    tz_str = tz_str,
    frequency = frequency,
    save_traj = save_traj,
    parameters = parameters,
    places_of_interest = places_of_interest,
    osm_tags = osm_tags,
)

# 4. Generate daily summary metrics for call/text logs
option = Frequency.DAILY
time_start = None
time_end = None
participant_ids = None

log_stats_main(path_to_synthetic_log_data, path_to_log_summary, tz_str, option, time_start, time_end, participant_ids)
```
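Two of the parameters in the example above are easy to misread, so a quick standalone check may help. The sketch below uses only the Python standard library and does not call Forest; the helper name `on_off_minutes` is hypothetical. It recovers the on- and off-cycle lengths implied by `cycle` and `percentage`, using the definitions given in the example's comments, and looks up the UTC offset of the `Etc/GMT-1` zone, whose POSIX-style sign is inverted relative to the usual UTC+N notation.

```python
import datetime
from typing import Optional, Tuple
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def on_off_minutes(cycle: float, percentage: float) -> Tuple[float, float]:
    """Split a sampling cycle into (on, off) minutes.

    cycle      = off-cycle length + on-cycle length, in minutes
    percentage = off-cycle length / (off-cycle length + on-cycle length)
    """
    off = cycle * percentage
    on = cycle - off
    return on, off


def utc_offset(tz_name: str) -> Optional[datetime.timedelta]:
    """Return the UTC offset of a named zone, or None if no tz database is available."""
    try:
        moment = datetime.datetime(2021, 10, 1, 12, 0, tzinfo=ZoneInfo(tz_name))
    except ZoneInfoNotFoundError:
        return None
    return moment.utcoffset()


# Values from the example: a 15-minute cycle that is 80% off-cycle.
on, off = on_off_minutes(cycle=15, percentage=0.8)
print(f"on-cycle: {on:g} min, off-cycle: {off:g} min")

# "Etc/GMT-1" is one hour AHEAD of UTC, despite the minus sign in its name.
print("UTC offset of Etc/GMT-1:", utc_offset("Etc/GMT-1"))
```

With the example's values this corresponds to the 3-minute on-cycle and 12-minute off-cycle described in the comments.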

More info

Publications

  • Straczkiewicz M, Huang EJ, and Onnela JP. A “one-size-fits-most” walking recognition method for smartphones, smartwatches, and wearable accelerometers. npj Digit. Med. 6, 29 (2023) DOI Open Access
  • Huang E, Yan K, and Onnela JP. Smartphone-Based Activity Recognition Using Multistream Movelets Combining Accelerometer and Gyroscope Data. Sensors 22(7), 2618 (2022) DOI
  • Onnela JP, Dixon C, Griffin K, Jaenicke T, Minowada L, Esterkin S, Siu A, Zagorsky J, and Jones E. Beiwe: A data collection platform for high-throughput digital phenotyping. Journal of Open Source Software 6(68), 3417 (2021) DOI
  • Liu G and Onnela JP. Bidirectional imputation of spatial GPS trajectories with missingness using sparse online Gaussian Process. Journal of the American Medical Informatics Association 28(8), 1777 (2021) DOI
  • Barnett I and Onnela JP. Inferring mobility measures from GPS with missing data. Biostatistics 21(2), e98 (2020) DOI Open Access
  • Huang E and Onnela JP. Augmented movelet method for activity classification using smartphone gyroscope and accelerometer data. Sensors 20(13), 3706 (2020) DOI

Owner

  • Name: onnela-lab
  • Login: onnela-lab
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.2.0
title: Forest
message: "If you use Forest, please cite it using the metadata from this file."
type: software
authors:
  - family-names: Onnela
    given-names: Jukka-Pekka
    orcid: "https://orcid.org/0000-0001-6613-8668"
  - family-names: Barback
    given-names: Josh
  - family-names: Clement
    given-names: Zachary
    orcid: "https://orcid.org/0000-0003-2279-5265"
  - family-names: Dawood
    given-names: Hassan
    orcid: "https://orcid.org/0000-0002-2190-5146"
  - family-names: Efstathiadis
    given-names: Georgios
    orcid: "https://orcid.org/0009-0006-2278-1882"
  - family-names: Emedom-Nnamdi
    given-names: Patrick
    orcid: "https://orcid.org/0000-0003-4442-924X"
  - family-names: Huang
    given-names: Emily J.
    orcid: "https://orcid.org/0000-0003-1964-5231"
  - family-names: Karas
    given-names: Marta
    orcid: "https://orcid.org/0000-0001-5889-3970"
  - family-names: Liu
    given-names: Gang
    orcid: "https://orcid.org/0000-0003-3544-363X"
  - family-names: Ponarul
    given-names: Nellie
    orcid: "https://orcid.org/0009-0003-1279-3757"
  - family-names: Straczkiewicz
    given-names: Marcin
    orcid: "https://orcid.org/0000-0002-8703-4451"
  - family-names: Sytchev
    given-names: Ilya
    orcid: "https://orcid.org/0009-0003-0647-5613"
  - family-names: Beukenhorst
    given-names: Anna
    orcid: "https://orcid.org/0000-0002-1765-4890"
repository-code: "https://github.com/onnela-lab/forest"
url: "https://forest.beiwe.org"
abstract: "Forest is a library for analyzing smartphone-based high-throughput digital phenotyping data."
keywords:
  - "digital phenotyping"
  - smartphone
  - statistics
  - accelerometer
  - GPS
license: BSD-3-Clause
references:
  - authors:
      - family-names: Straczkiewicz
        given-names: Marcin
        affiliation: "Department of Biostatistics, Harvard University"
        orcid: "https://orcid.org/0000-0002-8703-4451"
      - family-names: Huang
        given-names: Emily J.
        affiliation: "Department of Statistical Sciences, Wake Forest University"
        orcid: "https://orcid.org/0000-0003-1964-5231"
      - family-names: Onnela
        given-names: Jukka-Pekka
        affiliation: "Department of Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University"
        orcid: "https://orcid.org/0000-0001-6613-8668"
    doi: 10.1038/s41746-022-00745-z
    journal: "npj Digital Medicine"
    month: 2
    start: 29
    title: "A “one-size-fits-most” walking recognition method for smartphones, smartwatches, and wearable accelerometers"
    type: article
    volume: 6
    year: 2023
  - authors:
      - family-names: Huang
        given-names: Emily J.
        affiliation: "Department of Mathematics and Statistics, Wake Forest University"
        orcid: "https://orcid.org/0000-0003-1964-5231"
      - family-names: Yan
        given-names: Kebin
        affiliation: "Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania"
      - family-names: Onnela
        given-names: Jukka-Pekka
        affiliation: "Department of Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University"
        orcid: "https://orcid.org/0000-0001-6613-8668"
    doi: 10.3390/s22072618
    issue: 7
    journal: Sensors
    start: 2618
    title: "Smartphone-Based Activity Recognition Using Multistream Movelets Combining Accelerometer and Gyroscope Data"
    type: article
    volume: 22
    year: 2022
  - authors:
      - family-names: Liu
        given-names: Gang
        affiliation: "Department of Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University"
        orcid: "https://orcid.org/0000-0003-3544-363X"
      - family-names: Onnela
        given-names: Jukka-Pekka
        affiliation: "Department of Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University"
        orcid: "https://orcid.org/0000-0001-6613-8668"
    doi: 10.1093/jamia/ocab069
    issue: 8
    journal: "Journal of the American Medical Informatics Association"
    start: 1777
    title: "Bidirectional imputation of spatial GPS trajectories with missingness using sparse online Gaussian Process"
    type: article
    volume: 28
    year: 2021
  - authors:
      - family-names: Onnela
        given-names: Jukka-Pekka
        affiliation: "Department of Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University"
        orcid: "https://orcid.org/0000-0001-6613-8668"
      - family-names: Dixon
        given-names: Caleb
        affiliation: "Zagaran, Inc."
      - family-names: Griffin
        given-names: Keary
        affiliation: "Rocket Farm Studios"
      - family-names: Jaenicke
        given-names: Tucker
        affiliation: "Zagaran, Inc."
      - family-names: Minowada
        given-names: Leila
        affiliation: "Zagaran, Inc."
      - family-names: Esterkin
        given-names: Sean
        affiliation: "Zagaran, Inc."
      - family-names: Siu
        given-names: Alvin
        affiliation: "Zagaran, Inc."
      - family-names: Zagorsky
        given-names: Josh
        affiliation: "Zagaran, Inc."
      - family-names: Jones
        given-names: Eli
        affiliation: "Zagaran, Inc."
    doi: 10.21105/joss.03417
    issue: 68
    journal: "Journal of Open Source Software"
    month: 12
    start: 3417
    title: "Beiwe: A data collection platform for high-throughput digital phenotyping"
    type: article
    volume: 6
    year: 2021
  - authors:
      - family-names: Barnett
        given-names: Ian
        affiliation: "Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania"
        orcid: "https://orcid.org/0000-0003-3256-5703"
      - family-names: Onnela
        given-names: Jukka-Pekka
        affiliation: "Department of Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University"
        orcid: "https://orcid.org/0000-0001-6613-8668"
    doi: 10.1093/biostatistics/kxy059
    end: e112
    issue: 2
    journal: Biostatistics
    start: e98
    title: "Inferring mobility measures from GPS with missing data"
    type: article
    volume: 21
    year: 2020
  - authors:
      - family-names: Huang
        given-names: Emily J.
        affiliation: "Department of Mathematics and Statistics, Wake Forest University"
        orcid: "https://orcid.org/0000-0003-1964-5231"
      - family-names: Onnela
        given-names: Jukka-Pekka
        affiliation: "Department of Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University"
        orcid: "https://orcid.org/0000-0001-6613-8668"
    doi: 10.3390/s20133706
    issue: 13
    journal: Sensors
    start: 3706
    title: "Augmented movelet method for activity classification using smartphone gyroscope and accelerometer data"
    type: article
    volume: 20
    year: 2020

GitHub Events

Total
  • Create event: 31
  • Release event: 3
  • Issues event: 45
  • Watch event: 6
  • Delete event: 26
  • Member event: 1
  • Issue comment event: 67
  • Push event: 106
  • Pull request review comment event: 2
  • Pull request review event: 21
  • Pull request event: 29
  • Fork event: 4
Last Year
  • Create event: 31
  • Release event: 3
  • Issues event: 45
  • Watch event: 6
  • Delete event: 26
  • Member event: 1
  • Issue comment event: 67
  • Push event: 106
  • Pull request review comment event: 2
  • Pull request review event: 21
  • Pull request event: 29
  • Fork event: 4

Issues and Pull Requests

All Time
  • Total issues: 6
  • Total pull requests: 7
  • Average time to close issues: 7 months
  • Average time to close pull requests: 4 months
  • Total issue authors: 4
  • Total pull request authors: 6
  • Average comments per issue: 0.17
  • Average comments per pull request: 0.29
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 6
  • Pull requests: 6
  • Average time to close issues: 7 months
  • Average time to close pull requests: 4 days
  • Issue authors: 4
  • Pull request authors: 5
  • Average comments per issue: 0.17
  • Average comments per pull request: 0.17
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • hackdna (24)
  • biblicabeebli (4)
  • JiaxinCCC (3)
  • muschellij2 (2)
  • YUHUINI1995 (1)
  • GeorgeEfstathiadis (1)
Pull Request Authors
  • GeorgeEfstathiadis (7)
  • hackdna (7)
  • clementzach (3)
  • MMel099 (2)
  • jprince127 (1)
  • hydawo (1)
  • NielsGudd (1)
  • biblicabeebli (1)
Top Labels
Issue Labels
  • technical debt (14)
  • bug (7)
  • enhancement (5)
  • documentation (4)
  • question (1)
Pull Request Labels
  • technical debt (7)
  • bug (3)
  • documentation (2)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 142 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 1
  • Total maintainers: 3
pypi.org: beiwe-forest

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 142 Last month
Rankings
Dependent packages count: 8.9%
Average: 29.6%
Dependent repos count: 50.2%
Maintainers (3)

Dependencies

docs/requirements.txt pypi
  • myst-parser ==0.17.2
  • sphinx ==4.5.0
  • sphinx-copybutton ==0.5.0
  • sphinx_rtd_theme ==1.0.0
requirements.txt pypi
  • flake8 ==4.0.1
  • mypy ==0.950
  • pytest ==7.1.2
  • pytest-mock ==3.7.0
  • types-dataclasses ==0.6.5
  • types-pytz ==2021.3.7
  • types-requests ==2.27.25
  • types-setuptools ==57.4.14
.github/workflows/build.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
.github/workflows/docs.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v4 composite
setup.py pypi