https://github.com/alexanderquispe/ethiopia_raster_outcomes
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (2.7%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: alexanderquispe
- Language: Jupyter Notebook
- Default Branch: main
- Size: 60.1 MB
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
EthiopiaRasterOutcomes
Environment
Python
- python version: 3.10.9
sh
pip install pipenv
pipenv install
Download Data
sh
python "./src/download/gejson.py"
Data
The data is organized by type and file format. The following directories were not present at the start of the project and were created during it: gjson, Raster, Shapefiles/9_internet_speed
📂Data
┣ 📂gjson
┃ ┣ 📂okkla
┃ ┃ ┗ 📜ookla_intenet.geojson
┃ ┣ 📜adm_0.geojson
┃ ┣ 📜adm_1.geojson
┃ ┣ 📜adm_2.geojson
┃ ┗ 📜adm_3.geojson
┣ 📂Raster
┃ ┣ 📂ALOS_topoDiversity
┃ ┃ ┗ 📜ethiopia.tif
┃ ┣ 📂ETH_Maternal_and_child_socioeconomic
┃ ┃ ┣ 📜ETH_DECISION_MEAN.tif
┃ ┃ ┣ 📜ETH_DECISION_SD.tif
┃ ┃ ┣ 📜ETH_HWEALTH_MEAN.tif
┃ ┃ ┣ 📜ETH_HWEALTH_SD.tif
┃ ┃ ┣ 📜ETH_MEDUCATION_MEAN.tif
┃ ┃ ┗ 📜ETH_MEDUCATION_SD.tif
┃ ┣ 📂osm
┃ ┃ ┗ 📜eth_osm_dst_road_100m_2016.tif
┃ ┣ 📂Population
┃ ┃ ┗ 📜eth_ppp_2020_constrained.tif
┃ ┣ 📂population_unconstrained
┃ ┃ ┗ 📜eth_ppp_2020.tif
┃ ┣ 📂settlement
┃ ┃ ┣ 📂Each
┃ ┃ ┃ ┣ 📜GHS_BUILT_C_MSZ_E2018_GLOBE_R2023A_54009_10_V1_0_R8_C21.clr
┃ ┃ ┃ ┣ 📜GHS_BUILT_C_MSZ_E2018_GLOBE_R2023A_54009_10_V1_0_R8_C21.tif
┃ ┃ ┃ ┣ 📜GHS_BUILT_C_MSZ_E2018_GLOBE_R2023A_54009_10_V1_0_R8_C21.tif.ovr
┃ ┃ ┃ ┣ 📜GHS_BUILT_C_MSZ_E2018_GLOBE_R2023A_54009_10_V1_0_R8_C21.zip
┃ ┃ ┃ ┣ 📜GHS_BUILT_C_MSZ_E2018_GLOBE_R2023A_54009_10_V1_0_R8_C22.clr
┃ ┃ ┃ ┣ 📜GHS_BUILT_C_MSZ_E2018_GLOBE_R2023A_54009_10_V1_0_R8_C22.tif
┃ ┃ ┃ ┣ 📜GHS_BUILT_C_MSZ_E2018_GLOBE_R2023A_54009_10_V1_0_R8_C22.tif.ovr
┃ ┃ ┃ ┣ 📜GHS_BUILT_C_MSZ_E2018_GLOBE_R2023A_54009_10_V1_0_R8_C22.zip
┃ ┃ ┃ ┣ 📜GHS_BUILT_C_MSZ_E2018_GLOBE_R2023A_54009_10_V1_0_R8_C23.clr
┃ ┃ ┃ ┣ 📜GHS_BUILT_C_MSZ_E2018_GLOBE_R2023A_54009_10_V1_0_R8_C23.tif
┃ ┃ ┃ ┣ 📜GHS_BUILT_C_MSZ_E2018_GLOBE_R2023A_54009_10_V1_0_R8_C23.tif.ovr
┃ ┃ ┃ ┣ 📜GHS_BUILT_C_MSZ_E2018_GLOBE_R2023A_54009_10_V1_0_R8_C23.zip
┃ ┃ ┃ ┣ 📜GHS_BUILT_C_MSZ_E2018_GLOBE_R2023A_54009_10_V1_0_R9_C21.clr
┃ ┃ ┃ ┣ 📜GHS_BUILT_C_MSZ_E2018_GLOBE_R2023A_54009_10_V1_0_R9_C21.tif
┃ ┃ ┃ ┣ 📜GHS_BUILT_C_MSZ_E2018_GLOBE_R2023A_54009_10_V1_0_R9_C21.tif.ovr
┃ ┃ ┃ ┣ 📜GHS_BUILT_C_MSZ_E2018_GLOBE_R2023A_54009_10_V1_0_R9_C21.zip
┃ ┃ ┃ ┣ 📜GHS_BUILT_C_MSZ_E2018_GLOBE_R2023A_54009_10_V1_0_R9_C22.clr
┃ ┃ ┃ ┣ 📜GHS_BUILT_C_MSZ_E2018_GLOBE_R2023A_54009_10_V1_0_R9_C22.tif
┃ ┃ ┃ ┣ 📜GHS_BUILT_C_MSZ_E2018_GLOBE_R2023A_54009_10_V1_0_R9_C22.tif.ovr
┃ ┃ ┃ ┣ 📜GHS_BUILT_C_MSZ_E2018_GLOBE_R2023A_54009_10_V1_0_R9_C22.zip
┃ ┃ ┃ ┣ 📜GHS_BUILT_C_MSZ_E2018_GLOBE_R2023A_54009_10_V1_0_R9_C23.clr
┃ ┃ ┃ ┣ 📜GHS_BUILT_C_MSZ_E2018_GLOBE_R2023A_54009_10_V1_0_R9_C23.tif
┃ ┃ ┃ ┣ 📜GHS_BUILT_C_MSZ_E2018_GLOBE_R2023A_54009_10_V1_0_R9_C23.tif.ovr
┃ ┃ ┃ ┣ 📜GHS_BUILT_C_MSZ_E2018_GLOBE_R2023A_54009_10_V1_0_R9_C23.zip
┃ ┃ ┃ ┣ 📜GHSL_Data_Package_2023_light.pdf
┃ ┃ ┃ ┣ 📜GHSL_Data_Package_2023_light(1).pdf
┃ ┃ ┃ ┣ 📜GHSL_Data_Package_2023_light(2).pdf
┃ ┃ ┃ ┣ 📜GHSL_Data_Package_2023_light(3).pdf
┃ ┃ ┃ ┣ 📜GHSL_Data_Package_2023_light(4).pdf
┃ ┃ ┃ ┗ 📜GHSL_Data_Package_2023_light(5).pdf
┃ ┃ ┣ 📜ethiopia_ghs_masked.tif
┃ ┃ ┣ 📜ethiopia_ghs.tif
┃ ┃ ┗ 📜GHS_BUILT_C_MSZ_E2018_GLOBE_R2022A_54009_10_V1_0.tif
┃ ┗ 📂viirs_100m
┃ ┗ 📜eth_viirs_100m_2016.tif
┣ 📂Shapefiles
┃ ┣ 📂9_internet_speed
┃ ┃ ┗ 📜2020-04-01_performance_fixed_tiles.zip
┃ ┣ 📜_Source.txt
┃ ┣ 📜ET_Admin0_2023.dbf
┃ ┣ 📜ET_Admin0_2023.shp
┃ ┣ 📜ET_Admin1_2023.dbf
┃ ┣ 📜ET_Admin1_2023.shp
┃ ┣ 📜ET_Admin2_2023.dbf
┃ ┣ 📜ET_Admin2_2023.shp
┃ ┣ 📜ET_Admin3_2023.dbf
┃ ┗ 📜ET_Admin3_2023.shp
┣ 📂Woreda-level
┃ ┣ 📂Population
┃ ┃ ┗ 📜eth_admpop_2023.xlsx
┃ ┣ 📜_Data Sources.xlsx
┃ ┗ 📜PSNP_Case load by woreda.xlsx
┣ 📜Banks.xlsx
┣ 📜Kits_Assignment_2023.02.27.xlsx
┣ 📜NIDP Centers_Locations_2023.02.27.csv
┗ 📜NIDP_Administrative Subdivision_2023.02.27_Original.xlsx
Shapefiles
In this directory, there is Data/Shapefiles/9_internet_speed/2020-04-01_performance_fixed_tiles.zip, which is a zip file containing a shapefile representing data released by Ookla on internet download and upload speeds worldwide. The processed file for Ethiopia is located at Data/gjson/okkla/ookla_intenet.geojson.
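As an illustration of this kind of filtering, a cheap first pass can discard tiles whose centre lies outside Ethiopia's bounding box before any exact polygon test is made. The sketch below uses only the standard library. The bounding-box values are rough approximations, and the tile records and the `in_bbox` helper are hypothetical; none of this is taken from the repository's notebook.

```python
# Rough bounding box for Ethiopia in degrees (approximate, for illustration only).
ETH_BBOX = {"min_lon": 33.0, "max_lon": 48.0, "min_lat": 3.4, "max_lat": 15.0}


def in_bbox(lon, lat, bbox=ETH_BBOX):
    """Return True when the point falls inside the bounding box."""
    return (bbox["min_lon"] <= lon <= bbox["max_lon"]
            and bbox["min_lat"] <= lat <= bbox["max_lat"])


# Hypothetical tile records: centre coordinates plus an average download speed.
tiles = [
    {"quadkey": "a", "lon": 38.75, "lat": 9.03, "avg_d_kbps": 12000},    # near Addis Ababa
    {"quadkey": "b", "lon": -77.04, "lat": -12.05, "avg_d_kbps": 30000},  # Lima, outside
    {"quadkey": "c", "lon": 42.0, "lat": 6.5, "avg_d_kbps": 4000},
]

# Keep only the tiles whose centre is inside the box.
candidates = [t for t in tiles if in_bbox(t["lon"], t["lat"])]
print([t["quadkey"] for t in candidates])
```

Tiles that pass this test would still need an exact intersection with the country polygon (for example the one in Data/gjson/adm_0.geojson), because a bounding box also covers parts of neighbouring countries.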
Gjson - GeoJSON files
- Data/gjson/okkla/ookla_intenet.geojson: Processed data from Ookla.
- Data/gjson/adm_0.geojson: Country-level GeoJSON. Used to filter rasters and data to Ethiopia, which reduces the computational load.
- Data/gjson/adm_1.geojson: Level 1 GeoJSON. Not used.
- Data/gjson/adm_2.geojson: Level 2 GeoJSON. Not used.
- Data/gjson/adm_3.geojson: Level 3 GeoJSON. Used in indicator creation.
Raster - TIF and TIFF Files
- Data/Raster/settlement/GHS_BUILT_C_MSZ_E2018_GLOBE_R2022A_54009_10_V1_0.tif: This file delineates the boundaries of human settlements at a 10-meter resolution and describes their inner characteristics in terms of the morphology of the built environment and functional use.
- Data/Raster/ALOS_topoDiversity/ethiopia.tif: Topographic diversity (D) represents the variety of temperature and moisture conditions available to species as local habitats. It expresses the logic that a higher variety of topo-climate niches should support higher diversity, especially plant diversity, and support species persistence given climatic change.
- Data/Raster/ETH_Maternal_and_child_socioeconomic
  - Eth_HWEALTH_mean.tif: Proportion of children aged 12 to 23 months born to the poorest/poorer households according to DHS/MICS-NICS classifications.
  - Eth_MEDUCATION.tif: Proportion of children born to mothers who had no formal education.
  - Eth_DECISION.tif: Proportion of women aged 15 to 49 years who did not participate in decision-making in their households.
- Data/Raster/osm/eth_osm_dst_road_100m_2016.tif: Distance to OpenStreetMap major roads 2015 in Ethiopia at a 100-meter resolution.
- Data/Raster/viirs_100m/eth_viirs_100m_2016.tif: Resampled VIIRS night-time lights data for Ethiopia in 2016 at a 100-meter resolution.
- Data/Raster/population_unconstrained/eth_ppp_2020.tif: Estimated total number of people per grid cell. The dataset is available in GeoTIFF format at a resolution of 3 arc seconds (approximately 100 meters at the equator). The projection is Geographic Coordinate System, WGS84. The units are the number of people per pixel. The mapping approach is Random Forest-based dasymetric redistribution.
CSVs
- Data/csvs/buildings/each_google/*.tif: CSV files containing observations only for Ethiopia, processed by notebooks/ref/building_google.ipynb to obtain a single CSV file, the result of which is located at Data/csvs/buildings/etiopia_google.csv.
- Data/csvs/buildings/etiopia_google.csv: Final result.
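Combining many per-tile CSV files into a single file can be sketched with the standard library alone. The function name `combine_csvs`, the column names, and the throwaway demo files below are hypothetical. The sketch only concatenates files that share a header and does not reproduce the Ethiopia-only filtering done in the notebook.

```python
import csv
import glob
import os
import tempfile


def combine_csvs(pattern, out_path):
    """Concatenate all CSV files matching `pattern` into `out_path`, writing the header once."""
    header = None
    n_rows = 0
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in sorted(glob.glob(pattern)):
            with open(path, newline="") as f:
                reader = csv.reader(f)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"Header mismatch in {path}")
                for row in reader:
                    writer.writerow(row)
                    n_rows += 1
    return n_rows


# Tiny demonstration with two throwaway files (hypothetical columns).
with tempfile.TemporaryDirectory() as tmp:
    for name, rows in [("a.csv", [["1", "10.5"]]), ("b.csv", [["2", "7.0"], ["3", "3.2"]])]:
        with open(os.path.join(tmp, name), "w", newline="") as f:
            w = csv.writer(f)
            w.writerow(["id", "area"])
            w.writerows(rows)
    out_file = os.path.join(tmp, "combined.csv")
    # The pattern matches only a.csv and b.csv, so the output file is never re-read.
    total = combine_csvs(os.path.join(tmp, "[ab].csv"), out_file)
    print(total)
```

Streaming row by row keeps memory use low, which matters when the inputs are large.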
Notebooks
notebooks
┣ 📂ref
┃ ┣ 📜ghs_result.ipynb
┃ ┣ 📜ookla.ipynb
┃ ┗ 📜building_google.ipynb
┣ 📂types_data
┃ ┣ 📂__pycache__
┃ ┃ ┣ 📜__init__.cpython-310.pyc
┃ ┃ ┣ 📜settlement.cpython-310.pyc
┃ ┃ ┗ 📜utils.cpython-310.pyc
┃ ┣ 📜__init__.py
┃ ┣ 📜settlement.py
┃ ┗ 📜utils.py
┣ 📜__init__.py
┣ 📜0_GHS.ipynb
┣ 📜0_internet.ipynb
┣ 📜0_maternal_chil_socieconomic.ipynb
┣ 📜0_osm_viirs.ipynb
┣ 📜0_shapefiles_pop.ipynb
┗ 📜salem.ipynb
ref
Within the notebooks folder, there is a directory named ref. Two of its notebooks are described here:
- notebooks/ref/ghs_result.ipynb: Outlines the cleaning and transformation procedure for the GHS data and the desired output.
- notebooks/ref/ookla.ipynb: Filters the worldwide Ookla tiles to keep only the data for Ethiopia.
types_data
This is the Python package written for the project. Its modules simplify the processing of a map like the following one, which represents a single administrative unit from Data/gjson/adm_3.geojson.

To represent this, we have the file notebooks/types_data/utils.py and the class RasterIOInd. We can use the get_result() method, which performs the following steps:
- Crop raster: The method get_data_raster_shapefiles extracts the cropped raster.
- Extract z values
  - Continuous variables: We use the _raster_to_data method, which returns the mean, standard deviation, and sum in a single-row dataframe with the corresponding row IDs.
  - Other cases: We use the _metric_values method with the class parameter settlement set to true, which returns a dataframe of n rows with the values in one column and the counts of those values in another column.

These steps are repeated for each row to generate one indicator per row, and the results are aggregated into a final result dataframe.

- Concat result: We create a dataframe to which we append each result row from the previous step.
  - GHS: For this case, we use settlement.py. We use the join_percent, generate_data, and join_dummy methods to obtain all existing categories. By default, it generates 2 types of results (with NaNs and without NaNs).
- Save result: Within each class, there is a save parameter which indicates the location of the CSV file to export.
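The extraction and concatenation steps above can be sketched in plain Python. This is not the package's code: `continuous_metrics` and `category_counts` are hypothetical stand-ins for the roles the text gives to `_raster_to_data` and `_metric_values`, the pixel values and the nodata marker are made up, and the use of the population standard deviation is an assumption.

```python
from collections import Counter
from statistics import mean, pstdev


def continuous_metrics(values, row_id, nodata=None):
    """Mean, standard deviation, and sum of the valid pixels, as a one-row record."""
    valid = [v for v in values if v != nodata]
    return {"id": row_id, "mean": mean(valid), "std": pstdev(valid), "sum": sum(valid)}


def category_counts(values):
    """One record per distinct pixel value, with how many pixels carry it."""
    return [{"value": v, "count": c} for v, c in sorted(Counter(values).items())]


# Hypothetical pixel values already cropped to one administrative area.
pixels = [2.0, 4.0, 4.0, 6.0, -9999.0]
row = continuous_metrics(pixels, row_id=222908, nodata=-9999.0)
print(row)

# Hypothetical class codes for a categorical raster.
classes = [1, 1, 11, 255, 255, 255]
print(category_counts(classes))

# Repeat per area and collect the one-row records into the final result.
areas = [(1, [1.0, 3.0]), (2, [5.0, -9999.0])]
result = [continuous_metrics(p, i, nodata=-9999.0) for i, p in areas]
print(len(result))
```

Collecting the rows in a list and building the table once at the end is usually cheaper than appending to a dataframe inside the loop.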
Notebooks/.
- 0_GHS.ipynb: Works with the Data/Raster/settlement/GHS_BUILT_C_MSZ_E2018_GLOBE_R2022A_54009_10_V1_0.tif dataset and generates:
  - output/GHS/ghs_with_na.csv
  - output/GHS/ghs_without_na.csv
- 0_internet.ipynb: Contains little relevant information (nothing exported).
- 0_maternal_chil_socieconomic.ipynb: With the package, generating the results takes just 3 lines of code:
  - Works with Data/Raster/ETH_Maternal_and_child_socioeconomic/ETH_DECISION_MEAN.tif to generate output/maternal_child_socioeconomic/decision.csv.
  - Works with Data/Raster/ETH_Maternal_and_child_socioeconomic/ETH_HWEALTH_MEAN.tif to generate output/maternal_child_socioeconomic/hwealth.csv.
  - Works with Data/Raster/ETH_Maternal_and_child_socioeconomic/ETH_MEDUCATION_MEAN.tif to generate output/maternal_child_socioeconomic/education.csv.
- 0_osm_viirs.ipynb: With the package, generating the results takes just 3 lines of code:
  - Works with Data/Raster/osm/eth_osm_dst_road_100m_2016.tif to generate output/osm/distance_osm.csv.
  - Works with Data/Raster/viirs_100m/eth_viirs_100m_2016.tif to generate output/virrs/night_time.csv.
  - Works with Data/Raster/ALOS_topoDiversity/ethiopia.tif to generate output/Topodiversity/ALOS_topo_diversity.csv.
  - Works with Data/Raster/population_unconstrained/eth_ppp_2020.tif to generate output/population_unconstrained/population_unconstrained.csv.
- 0_shapefiles_pop.ipynb: One of the earliest notebooks written; it served as the basis for building the package:
  - Works with Data/Raster/Population/eth_ppp_2020_constrained.tif to generate output/population/pop.csv.
- salem.ipynb: In an attempt to reduce the time spent working with rasters, the salem package was tested and proved much faster than rasterio. However, it is not recommended for large raster files because it requires a lot of RAM:
  - For Data/Raster/settlement/GHS_BUILT_C_MSZ_E2018_GLOBE_R2022A_54009_10_V1_0.tif: > 1 TB of RAM.
  - For Data/Raster/settlement/ethiopia_ghs.tif: > 98 GB of RAM.
- 0_building_google.ipynb:
  - Combines all CSV files into one, considering only the area of Ethiopia (~18 min).
  - Based on other indicators, the code was adapted to generate the indicators (~8 min).
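One likely reason for the RAM requirements noted for salem is that holding an entire raster in memory costs roughly width × height × bytes per pixel. The sketch below shows that arithmetic. The grid dimensions are hypothetical and are not the real dimensions of the GHS files, and `raster_memory_gib` is an illustrative helper, not part of any library.

```python
def raster_memory_gib(width, height, bytes_per_pixel=1, bands=1):
    """Approximate memory needed to hold a full raster array, in GiB."""
    return width * height * bands * bytes_per_pixel / 1024 ** 3


# Hypothetical grid: 150,000 x 120,000 pixels (about 1,500 km x 1,200 km at 10 m).
as_uint8 = raster_memory_gib(150_000, 120_000, bytes_per_pixel=1)
as_float64 = raster_memory_gib(150_000, 120_000, bytes_per_pixel=8)
print(round(as_uint8, 1), round(as_float64, 1))
```

The same grid needs eight times more memory once it is promoted from 8-bit integers to 64-bit floats, so an estimate like this is worth making before loading a large file in full.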
Output
```
┣ 📂GHS
┃ ┣ 📜ghs_with_na.csv
┃ ┗ 📜ghs_without_na.csv
┣ 📂maternal_child_socioeconomic
┃ ┣ 📜decision.csv
┃ ┣ 📜education.csv
┃ ┗ 📜hwealth.csv
┣ 📂osm
┃ ┗ 📜distance_osm.csv
┣ 📂population
┃ ┗ 📜pop.csv
┣ 📂population_unconstrained
┃ ┗ 📜population_unconstrained.csv
┣ 📂Topodiversity
┃ ┗ 📜ALOS_topo_diversity.csv
┣ 📂googleareas
┃ ┗ 📜googlemetrics.csv
┗ 📂virrs
  ┗ 📜night_time.csv
```
IDs
The ID is built from the following columns, which come from the original attributes in Data/gjson/adm_3.geojson and are merged horizontally (column-wise) with the metrics.
python
"id", "fnid", "parent_id", "admin_0", "admin_1", "admin_2", "admin_3",
Non-Continuous Variables
For non-continuous variables, the following column format generated from the raster is used, where the percentage of that category within the area is obtained.
```python
# Example
variable_name = "ghs"
# Indicator columns: one per category value found in the raster
values = [1, 2, 3, 4, 5, 11, 12, 13, 14, 15, 21, 22, 23, 24, 25]
new_cols = [f"{variable_name}_{x}" for x in values]
print(new_cols)
# ['ghs_1', 'ghs_2', 'ghs_3', ...]
```
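To make the "percentage of that category within the area" concrete, here is a minimal sketch that turns a list of pixel class codes into one percentage column per category. The function name `category_percentages` and the pixel values are hypothetical; the repository's own methods may compute this differently.

```python
from collections import Counter


def category_percentages(pixel_values, variable_name, categories):
    """Percentage of pixels in each category, keyed by '<variable_name>_<value>'."""
    counts = Counter(pixel_values)
    total = len(pixel_values)
    return {f"{variable_name}_{c}": 100.0 * counts.get(c, 0) / total for c in categories}


# Hypothetical class codes for the pixels inside one area.
pixels = [1, 1, 1, 2, 11, 11, 11, 11]
shares = category_percentages(pixels, "ghs", categories=[1, 2, 3, 11])
print(shares)
```

Passing the full list of categories, rather than only those present, keeps the columns identical across areas, with 0.0 for categories that do not occur.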
Settlement
┣ 📜ghs_with_na.csv
┗ 📜ghs_without_na.csv
For the particular case of GHS, its documentation states that the value of 255 is considered NA.
- Filtering NA values
| index | id | fnid | parent_id | admin_0 | admin_1 | admin_2 | admin_3 | ghs_1 | ghs_2 | ghs_3 | ghs_4 | ghs_5 | ghs_11 | ghs_12 | ghs_13 | ghs_14 | ghs_15 | ghs_21 | ghs_22 | ghs_23 | ghs_24 | ghs_25 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 112.0 | 222908 | ET2023A3020207 | 222703 | Ethiopia | Afar | Kilbati | Afdera | 54.98789691015236 | 0.1594760074042432 | 0.002847785846504343 | 0.0 | 0.44995016374768615 | 44.39982913284921 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 761.0 | 223557 | ET2023A3050503 | 222748 | Ethiopia | Somali | Korahe | Shilabo | 44.10304625799172 | 10.851447912749155 | 0.30763444904099285 | 0.0 | 0.051147047762316655 | 40.89883414817601 | 2.0985332831891688 | 1.6893569010906355 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 797.0 | 223593 | ET2023A3050901 | 222752 | Ethiopia | Somali | Liben | Filtu | 16.74891992038964 | 35.83113541852074 | 3.195741169236744 | 0.0 | 0.4611575864468213 | 39.26150062296727 | 2.059837219462468 | 2.4417080629763275 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
- Considering NA values
| index | id | fnid | parent_id | admin_0 | admin_1 | admin_2 | admin_3 | ghs_1 | ghs_2 | ghs_3 | ghs_4 | ghs_5 | ghs_11 | ghs_12 | ghs_13 | ghs_14 | ghs_15 | ghs_21 | ghs_22 | ghs_23 | ghs_24 | ghs_25 | ghs_255 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 112.0 | 222908 | ET2023A3020207 | 222703 | Ethiopia | Afar | Kilbati | Afdera | 0.024064201180061865 | 6.97910438698775e-05 | 1.2462686405335268e-06 | 0.0 | 0.0001969104452042972 | 0.019430574374558213 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 99.95623727668766 |
| 761.0 | 223557 | ET2023A3050503 | 222748 | Ethiopia | Somali | Korahe | Shilabo | 0.06264879958019161 | 0.01541458568335336 | 0.00043699768104883376 | 0.0 | 7.265487117682321e-05 | 0.058097185591761205 | 0.0029809866262255407 | 0.002399
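The two tables appear to be related by a simple rescaling: dropping the NA share (value 255) and renormalizing the remaining categories so they sum to 100. The sketch below checks this on the Afdera row, using values copied from the "Considering NA values" table. The helper `drop_na_share` is hypothetical and is not the repository's implementation.

```python
def drop_na_share(percentages, na_key):
    """Rescale percentages so that the non-NA categories sum to 100."""
    valid_share = 100.0 - percentages[na_key]
    if valid_share <= 0:
        raise ValueError("Area contains only NA pixels")
    return {k: 100.0 * v / valid_share for k, v in percentages.items() if k != na_key}


# Values copied from the Afdera row of the "Considering NA values" table.
with_na = {
    "ghs_1": 0.024064201180061865,
    "ghs_11": 0.019430574374558213,
    "ghs_255": 99.95623727668766,
}
without_na = drop_na_share(with_na, "ghs_255")
print(round(without_na["ghs_1"], 2), round(without_na["ghs_11"], 2))
```

The rescaled values come out close to the 54.99 and 44.40 reported for the same row in the "Filtering NA values" table.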
R
These R files were used for comparison with Python. R turned out to be faster for this procedure.
- etiopia_settlement.r: Generates the 2 GHS files for Ethiopia only.
- time_raster.r: Measures the time R takes to process the rasters and generate the indicators (it is faster than Python).
src
Generates the necessary source files, mainly the files Data/gjson/adm_0.geojson, Data/gjson/adm_1.geojson, Data/gjson/adm_2.geojson, Data/gjson/adm_3.geojson.
Advice
- Corrupted source files: Data/Shapefiles/*.shp (a .shp file comes along with the .shx, .prj, and .dbf files) were replaced with files generated through queries to the API. See src/download/gejson.py, which generates, among others, the administrative-level files and saves them in Data/gjson/*.geojson.
- Several files within the Data/Raster folder are manually downloaded data.
- Settlement
  - Files within Data/Raster/settlement/Each/* were downloaded tile by tile to generate a 10 m GHS raster for Ethiopia only, but it could not be generated due to a lack of computing power.
  - The files Data/Raster/settlement/ethiopia_ghs.tif and Data/Raster/settlement/ethiopia_ghs_masked.tif were generated with R, using the terra and sf packages, as they made more efficient use of resources.
- The attempt to optimize resource usage with salem was not viable, as it requires more RAM than was available.
- notebooks/ref/building_google.ipynb (~30 min): running this notebook requires >= 29 GB of available RAM.
Owner
- Name: Alexander Quispe
- Login: alexanderquispe
- Kind: user
- Repositories: 12
- Profile: https://github.com/alexanderquispe
Dependencies
- black *
- geopandas *
- ipykernel *
- matplotlib *
- numpy *
- pandas *
- rasterio *
- requests *
- salem *
- scikit-image *
- shapely *
- tqdm *
- affine ==2.4.0
- asttokens ==2.4.1
- attrs ==23.2.0
- black ==24.3.0
- certifi ==2024.2.2
- cftime ==1.6.3
- charset-normalizer ==3.3.2
- click ==8.1.7
- click-plugins ==1.1.1
- cligj ==0.7.2
- colorama ==0.4.6
- comm ==0.2.2
- contourpy ==1.2.1
- cycler ==0.12.1
- debugpy ==1.8.1
- decorator ==5.1.1
- exceptiongroup ==1.2.0
- executing ==2.0.1
- fiona ==1.9.6
- fonttools ==4.50.0
- geopandas ==0.14.3
- idna ==3.6
- imageio ==2.34.0
- ipykernel ==6.29.4
- ipython ==8.23.0
- jedi ==0.19.1
- joblib ==1.3.2
- jupyter-client ==8.6.1
- jupyter-core ==5.7.2
- kiwisolver ==1.4.5
- lazy-loader ==0.3
- matplotlib ==3.8.3
- matplotlib-inline ==0.1.6
- mypy-extensions ==1.0.0
- nest-asyncio ==1.6.0
- netcdf4 ==1.6.5
- networkx ==3.2.1
- numpy ==1.26.4
- packaging ==24.0
- pandas ==2.2.1
- parso ==0.8.3
- pathspec ==0.12.1
- pillow ==10.3.0
- platformdirs ==4.2.0
- prompt-toolkit ==3.0.43
- psutil ==5.9.8
- pure-eval ==0.2.2
- pygments ==2.17.2
- pyparsing ==3.1.2
- pyproj ==3.6.1
- python-dateutil ==2.9.0.post0
- pytz ==2024.1
- pywin32 ==306
- pyzmq ==25.1.2
- rasterio ==1.3.9
- requests ==2.31.0
- salem ==0.3.10
- scikit-image ==0.22.0
- scipy ==1.13.0
- setuptools ==69.2.0
- shapely ==2.0.3
- six ==1.16.0
- snuggs ==1.4.7
- stack-data ==0.6.3
- tifffile ==2024.2.12
- tomli ==2.0.1
- tornado ==6.4
- tqdm ==4.66.2
- traitlets ==5.14.2
- typing-extensions ==4.10.0
- tzdata ==2024.1
- urllib3 ==2.2.1
- wcwidth ==0.2.13
- xarray ==2024.3.0