unicef-ai4d-poverty-mapping
UNICEF AI4D Relative Wealth Mapping Project - datasets, models, and scripts for building relative wealth estimation models across Southeast Asia (SEA)
https://github.com/thinkingmachines/unicef-ai4d-poverty-mapping
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found codemeta.json file)
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity: 15.7%)
Repository
Basic Info
- Host: GitHub
- Owner: thinkingmachines
- License: MIT
- Language: Jupyter Notebook
- Default Branch: main
- Homepage: https://thinkingmachines.github.io/unicef-ai4d-poverty-mapping
- Size: 151 MB
Statistics
- Stars: 24
- Watchers: 2
- Forks: 8
- Open Issues: 18
- Releases: 0
Metadata Files
README.md
📜 Description
The UNICEF AI4D Relative Wealth Project aims to develop open datasets and machine learning (ML) models for relative wealth estimation and poverty mapping across nine countries in Southeast Asia (SEA).
We also aim to open-source all the scripts, experiments, and other artifacts used to develop these datasets and models, so that others can replicate our work as well as collaborate on and extend it for their own use cases.
This project is part of Thinking Machines' overall push for open science through the AI4D (AI for Development) Research Bank, which aims to accelerate the development and adoption of effective machine learning (ML) models for development across Southeast Asia.
Documentation covering our methodology and experiments can be found at https://thinkingmachines.github.io/unicef-ai4d-poverty-mapping.
💻 Replicating model training and rollout for a country
Our final trained models, and their use to produce nationwide estimates, can be replicated through our notebooks, assuming you've followed the Data Setup and Local Development setup below.
For countries with available DHS training data (Cambodia, Myanmar, Philippines, and Timor-Leste), please refer to the notebooks here: https://github.com/thinkingmachines/unicef-ai4d-poverty-mapping/tree/main/notebooks/2023-02-21-single-country-rollouts
For the other countries without DHS training data (Indonesia, Laos, Malaysia, Thailand, and Vietnam), please refer to the notebooks here: https://github.com/thinkingmachines/unicef-ai4d-poverty-mapping/tree/main/notebooks/2023-02-21-cross-country-rollouts
All output files (models, datasets, intermediate files) can be downloaded from here.
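If you haven't cloned the repo yet, a standard git clone works (the URL is just the repository link above):

```bash
# Clone the repo and enter the project directory
git clone https://github.com/thinkingmachines/unicef-ai4d-poverty-mapping.git
cd unicef-ai4d-poverty-mapping
```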
📚 Data Setup
DHS Data
Due to the sensitive nature of the data and the DHS program terms of use, we cannot provide the raw DHS data used in our experiments.
You will have to request access to the raw data yourself on the DHS website.
In general, the experiment notebooks in this repo assume that the contents of the DHS Stata and Shape zip files are unzipped into their own folders under data/dhs/<iso-country-code>/, where <iso-country-code> is the two-letter ISO country code.
For example, the data for the Philippines will have this directory structure:
```
data/
  dhs/
    ph/
      PHGE81FL/
        DHS_README.txt
        GPS_Displacement_README.txt
        PHGE81FL.cpg
        PHGE81FL.dbf
        PHGE81FL.prj
        PHGE81FL.sbn
        PHGE81FL.sbx
        PHGE81FL.shp
        PHGE81FL.shp.xml
        PHGE81FL.shx
      PHHR82DT/
        PHHR82FL.DCT
        PHHR82FL.DO
        PHHR82FL.DTA
        PHHR82FL.FRQ
        PHHR82FL.FRW
        PHHR82FL.MAP
```
If you create your own notebooks, you are of course free to modify these filepath conventions. Out of the box, however, this is what our notebooks assume.
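As a minimal sketch, assuming your DHS downloads use the Philippine filenames from the example above (the zip filenames here are illustrative; yours will depend on the country and survey round):

```bash
# Illustrative zip filenames; adjust to match your actual DHS download
mkdir -p data/dhs/ph
unzip PHGE81FL.zip -d data/dhs/ph/PHGE81FL   # Shape (GPS) data
unzip PHHR82DT.zip -d data/dhs/ph/PHHR82DT   # Stata household recode data
```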
Night Lights from EOG
The only other data access requirement is the EOG Nightlights Data, which requires registering for an account. The notebooks use these credentials (username and password) to download the nightlights data automatically.
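One way to keep those credentials out of the notebooks themselves is to export them as environment variables; the variable names below are hypothetical, not something the repo requires:

```bash
# Hypothetical variable names; adapt to however your notebooks read credentials
export EOG_USERNAME="your-eog-username"
export EOG_PASSWORD="your-eog-password"
```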
General Dataset Notes
All the other datasets used in this project are publicly available, and the notebooks provide the code necessary to automatically download and cache the data.
Due to the size of the datasets, please make sure you have enough disk space (a minimum of 40-50 GB) to accommodate all the data used in building the models.
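A quick way to verify you have room (plain coreutils, nothing repo-specific):

```bash
# Check free space on the filesystem that will hold the data/ directory
df -h .
```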
⚙️ Local Setup for Development
This repo assumes the use of miniconda for simplicity in installing GDAL.
Requirements
- Python 3.9
- make
- miniconda
🐍 One-time Set-up
Run these steps the very first time you are setting up the project on a machine, to create a local Python environment for it.

1. Install miniconda for your environment if you don't have it yet.

```bash
wget "https://repo.anaconda.com/miniconda/Miniconda3-latest-$(uname)-$(uname -m).sh"
bash Miniconda3-latest-$(uname)-$(uname -m).sh
```

2. Create a local conda env and activate it. This will create a conda env folder in your project directory.

```bash
make conda-env
conda activate ./env
```

3. Run the one-time set-up make command.

```bash
make setup
```

4. To test if the setup was successful, run the tests. You should get a message that all the tests passed.

```bash
make test
```
At this point, you should be ready to run all the existing notebooks on your local machine.
📦 Dependencies
Over the course of development, you will likely introduce new library dependencies. This repo uses pip-tools to manage the python dependencies.
There are two main files involved:
* `requirements.in` - contains the high-level requirements; this is what we edit when adding/removing libraries
* `requirements.txt` - contains the exact list of python libraries (including dependencies of the main libraries) your environment needs to run the repo code; compiled from `requirements.in`

When you add new python libs, please do the following:

1. Add the library to the `requirements.in` file. You may optionally pin the version if you need a particular version of the library.
2. Run `make requirements` to compile a new version of the `requirements.txt` file and update your python env.
3. Commit both the `requirements.in` and `requirements.txt` files so other devs can get the updated list of project requirements.

Note: When you are the one updating your python env to follow library changes from other devs (reflected through an updated `requirements.txt` file), simply run `pip-sync requirements.txt`.
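As a concrete sketch of that flow (the library name is purely illustrative):

```bash
# Example: add a new library to the project (scikit-learn is illustrative)
echo "scikit-learn" >> requirements.in
make requirements                      # recompiles requirements.txt and updates your env
git add requirements.in requirements.txt
git commit -m "Add scikit-learn dependency"
```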
📜 Documentation
We are using Quarto to maintain the UNICEF AI4D Relative Wealth documentation site.
Here are some quick tips for running Quarto and updating the doc site, assuming you're on Linux.
For other platforms, please refer to Quarto's website.
Download:

```bash
wget https://github.com/quarto-dev/quarto-cli/releases/download/v1.2.247/quarto-1.2.247-linux-amd64.deb
```

Install:

```bash
sudo dpkg -i quarto-1.2.247-linux-amd64.deb
```

Preview the site locally (view at http://localhost:4444):

```bash
quarto preview --port 4444 --no-browser
```

Update the site (must have the maintainer role):

```bash
quarto publish gh-pages --no-browser
```

Pro-tip: If you are using VS Code as your code editor, install the Quarto extension to make editing/previewing the doc site a lot smoother.
☸️ Running in Docker
We have created a docker image (ghcr.io/butchtm/povmap-jupyter) of the poverty mapping repo for those who want to view the notebooks or roll out the models for new countries and new data (e.g. new nightlights and Ookla years).
To run these docker images, copy and paste the following scripts into your Linux, macOS, or Windows (WSL) terminal:
- View Jupyter notebooks (read-only): runs a Jupyter notebook environment containing the poverty mapping notebooks at http://localhost:8888/lab/tree/notebooks
```bash
curl -s https://raw.githubusercontent.com/thinkingmachines/unicef-ai4d-poverty-mapping/main/localscripts/run-povmap-jupyter-notebook.sh > run-povmap-jupyter-notebook.sh && \
chmod +x run-povmap-jupyter-notebook.sh && \
./run-povmap-jupyter-notebook.sh
```
- Country-wide rollout: runs an interactive dialog that rolls out the poverty mapping models for different countries and time periods
```bash
curl -s https://raw.githubusercontent.com/thinkingmachines/unicef-ai4d-poverty-mapping/main/localscripts/run-povmap-rollout.sh > run-povmap-rollout.sh && \
chmod +x run-povmap-rollout.sh && \
./run-povmap-rollout.sh
```
- Copy rollout to local directory: copies the contents of the rollout notebooks and rollout data into your current directory (after running a new rollout) to `rollout-data` and `rollout-output-notebooks`
```bash
curl -s https://raw.githubusercontent.com/thinkingmachines/unicef-ai4d-poverty-mapping/main/localscripts/copy-rollout-to-local.sh > copy-rollout-to-local.sh && \
chmod +x copy-rollout-to-local.sh && \
./copy-rollout-to-local.sh
```
Note: These commands assume that `curl` is installed; they download the scripts, make them executable, and run them. After the initial download, you can simply rerun the scripts, which will have been saved to your current directory.
Note: The scripts create and use a docker volume named `povmap-data`, which contains the outputs and caches the data used for generating features from the public datasets.
Note: Rolling out the notebooks requires downloading EOG nightlights data, so a user ID and password are required, as detailed in the Data Setup section above.
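If you want to inspect the cached data directly, standard docker commands work against that volume (the alpine image below is just an arbitrary small image for listing files, not something the project prescribes):

```bash
# Show metadata for the volume the scripts create
docker volume inspect povmap-data

# List its contents via a throwaway container (alpine is an arbitrary choice)
docker run --rm -v povmap-data:/data alpine ls /data
```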
Owner
- Name: Thinking Machines Data Science
- Login: thinkingmachines
- Kind: organization
- Email: hello@thinkingmachin.es
- Website: https://thinkingmachin.es
- Twitter: thinkdatasci
- Repositories: 63
- Profile: https://github.com/thinkingmachines
Thinking Machines is a leading-edge data technology consultancy that transforms organizations by building enterprise AI and cloud data platforms.
GitHub Events
Total
- Watch event: 5
- Fork event: 1
Last Year
- Watch event: 5
- Fork event: 1
Committers
Last synced: 7 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Butch Landingin | b****h@t****s | 183 |
| tm_jc_nacpil | j****l@t****s | 69 |
| alron | a****n@t****s | 46 |
| Jace Peralta | j****p@t****s | 39 |
| tm-danna-ang | d****a@t****s | 7 |
| LevyMedinaII | 7****a | 2 |
| Ardie Orden | a****n@g****m | 2 |
| dependabot[bot] | 4****] | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 60
- Total pull requests: 42
- Average time to close issues: 30 days
- Average time to close pull requests: 3 days
- Total issue authors: 7
- Total pull request authors: 7
- Average comments per issue: 0.33
- Average comments per pull request: 1.9
- Merged pull requests: 35
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- tm-kah-alforja (54)
- tm-jc-nacpil (1)
- AnthonyMockler (1)
- CEduardoSQUTEC (1)
- GIS243 (1)
- joshuacortez (1)
- tm-jace-peralta (1)
Pull Request Authors
- tm-jc-nacpil (14)
- tm-jace-peralta (14)
- butchtm (5)
- alronlam (4)
- ardieorden (2)
- tm-danna-ang (2)
- ghost (1)
Dependencies
- black *
- earthengine-api *
- gdal *
- geopandas *
- geowrangler *
- jupyterlab *
- loguru *
- numpy *
- pandas *
- pre-commit *
- pytest *
- rasterio *
- rasterstats *
- 139 dependencies in total
- python 3.9.16 build