rnsurvey_bogota_dataanalysis
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.1%) to scientific vocabulary
Repository
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
Rn Data analysis and interactive visualization
In this repository you can find the raw radon (Rn) data collected in Bogotá, Colombia, and the Python code (Jupyter notebooks, .ipynb) used to analyse the radon concentration (RC) data retrieved for the publication Indoor 222Rn Modeling in Data-Scarce Regions: An Interactive Dashboard Approach for Bogotá, Colombia. Additionally, a dashboard was created to make interacting with the data more user-friendly and to facilitate the replication of this type of study in other areas. Further information about the dashboard source code and functionality can be found here.
The repository is divided into three Jupyter notebooks and four data folders.
- Folders:
  - Dataset for fitting
    Folder with the raw data (`Raw_Results_LR115.xlsx`) used in `Data distribution.ipynb` and the dataset with the dependent and independent variables used to fit the regression models (`Processed_DataFrame.csv`).
  - Dataset for regression
    Folder with the cadastre data to which the regression model is applied. This dataset must contain the same independent variables as the dataset used to fit the model. **In the repository this data is zipped for storage purposes**. The dataset is unzipped when `Multivariate analysis.ipynb` is run.
  - Figures
    Folder with all the figures created in `Data distribution.ipynb` and `Multivariate analysis.ipynb`.
  - Regression results
    Folder with the regression results produced in `Multivariate analysis.ipynb`.
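The unzipping step mentioned for the Dataset for regression folder can be sketched with Python's standard `zipfile` module. This is a minimal sketch, not the notebook's code: the archive name and output directory are hypothetical, because this README does not state the name of the zipped file.

```python
import zipfile
from pathlib import Path


def unzip_regression_dataset(zip_path: str, out_dir: str) -> list[str]:
    """Extract a zipped dataset and return the names of the extracted members.

    zip_path and out_dir are hypothetical placeholders: the real archive name
    inside the `Dataset for regression` folder is not given in this README.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out)     # unzip every member into out_dir
        return zf.namelist()   # names of the members that were extracted
```

For example, `unzip_regression_dataset("Dataset for regression/cadastre.zip", "Dataset for regression")` would extract a (hypothetical) `cadastre.zip` next to itself.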
Notebooks
- Data distribution.ipynb
  Jupyter notebook with a basic statistical analysis of the raw RC data (`Raw_Results_LR115.xlsx`).
- Multivariate analysis.ipynb
  Jupyter notebook that:
  - performs a multivariate analysis of the processed dataset (`Processed_DataFrame.csv`) (correlation matrix, PCA, etc.);
  - fits the RC data using the predictors;
  - performs feature selection;
  - estimates RC for the houses in the Dataset for regression (cadastre information).

Dashboard App
An improved and updated version of this dashboard can be accessed online here. The datasets presented in this repository can nonetheless be used as an example input for that dashboard.
Publication abstract
Radon ($^{222}$Rn) is a naturally occurring gas that represents a health threat due to its causal relationship with lung cancer. Despite its potential health impacts, several regions have not conducted radon studies, mainly due to data scarcity and/or economic constraints. This study aims to bridge the baseline information gap by building an interactive dashboard that uses inferential statistical methods to estimate the spatial distribution of indoor radon concentration (IRC) for a target area. We demonstrate the functionality of the dashboard by modeling IRC in the city of Bogotá, Colombia, using 30 in situ measurements. IRC was measured for 35 days using alpha-track detectors (LR-115). The measured IRC values were the highest reported in the country, with a geometric mean of 91 ± 14 Bq/m$^3$ and a maximum concentration of 407 Bq/m$^3$. In 56.66% of the residences, IRC exceeded the WHO's recommended level of 100 Bq/m$^3$. A prediction map for houses registered in Bogotá’s cadaster was built in the dashboard using a log-linear regression model fitted with the in situ measurements together with meteorological, geologic, and building-specific variables. After feature selection, the log-linear model showed a cross-validation root mean squared error (RMSE) of 56.5 Bq/m$^3$. Furthermore, the model showed that the age of the house had a statistically significant positive association with IRC. According to the model, IRC measured in houses built before 1980 shows a statistically significant increase of 71.60% compared with houses built after 1980 (p-value = 0.045). The prediction map showed higher IRC in older buildings, most likely related to cracks in the structure that could enhance gas migration in older houses.
This study highlights the importance of expanding $^{222}$Rn studies in countries with a lack of baseline values and provides a cost-effective alternative that could help deal with the scarcity of IRC data and get a better understanding of place-specific IRC spatial distribution.
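The percent change reported for houses built before 1980 can be related to a regression coefficient as follows. This is a worked sketch that assumes the usual log-linear form and that construction before 1980 enters the model as a 0/1 indicator $x_j$; the exact specification used in the publication is not reproduced in this README.

$$
\ln(\mathrm{IRC}) = \beta_0 + \sum_{k=1}^{p} \beta_k x_k + \varepsilon
$$

Switching the indicator from 0 to 1, with all other predictors held fixed, multiplies the predicted IRC by $e^{\beta_j}$:

$$
\frac{\mathrm{IRC}(x_j = 1)}{\mathrm{IRC}(x_j = 0)} = e^{\beta_j}
\quad\Rightarrow\quad
\%\Delta_j = 100\left(e^{\beta_j} - 1\right)
$$

An increase of 71.60% therefore corresponds to a coefficient of

$$
\beta_j = \ln\left(1 + \tfrac{71.60}{100}\right) = \ln(1.716) \approx 0.540
$$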
Data distribution code (Data distribution.ipynb)
This Jupyter notebook reads the raw data (`Raw_Results_LR115.xlsx`) and creates graphs for easier visualization and comparison with recommended levels and with previous measurements in the Latin America and the Caribbean (LAC) region.
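The comparison with the recommended level can be sketched as a small summary function. This is a minimal NumPy sketch, not the notebook's code: the function name is hypothetical, and the use of a geometric standard deviation is an assumption about how spread might be summarized for log-normally distributed radon data. The 100 Bq/m³ threshold is the WHO level quoted in the abstract.

```python
import numpy as np

WHO_REFERENCE_BQ_M3 = 100.0  # WHO recommended level quoted in the abstract


def summarize_rc(rc_bq_m3):
    """Summarize radon concentrations (Bq/m^3) against the WHO reference level.

    Hypothetical helper: the geometric statistics assume RC is roughly
    log-normal, which is common for indoor radon but not confirmed here.
    """
    rc = np.asarray(rc_bq_m3, dtype=float)
    log_rc = np.log(rc)
    return {
        "n": rc.size,
        "geometric_mean": float(np.exp(log_rc.mean())),
        "geometric_sd": float(np.exp(log_rc.std(ddof=1))),  # sample GSD
        "max": float(rc.max()),
        # share of measurements strictly above the reference level, in %
        "pct_above_who": float(100.0 * np.mean(rc > WHO_REFERENCE_BQ_M3)),
    }
```

In practice the input would be the RC column read from `Raw_Results_LR115.xlsx` (for example with `pandas.read_excel`); the column name depends on the spreadsheet and is not given here.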
Multivariate analysis code (Multivariate analysis.ipynb)
This Jupyter notebook uses the RC data (dependent variable) and the independent variables in `Processed_DataFrame.csv` to fit a log-linear regression model.
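A log-linear fit of this kind can be sketched with NumPy alone. This is a minimal sketch under stated assumptions: ordinary least squares on ln(RC), back-transformation with a plain exponential (no bias correction), and leave-one-out cross-validation for the RMSE. The notebook may use a different library (for example statsmodels or scikit-learn) and a different cross-validation scheme, and all function names are hypothetical.

```python
import numpy as np


def fit_log_linear(X, rc):
    """OLS fit of ln(RC) on the predictors in X (shape n x p).

    Returns the coefficient vector [intercept, b_1, ..., b_p].
    """
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta, *_ = np.linalg.lstsq(design, np.log(rc), rcond=None)
    return beta


def predict_rc(beta, X):
    """Back-transform log-scale predictions to Bq/m^3 (no bias correction)."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])
    return np.exp(design @ beta)


def percent_change(beta):
    """Percent change in RC per unit increase of each predictor (intercept excluded)."""
    return 100.0 * (np.exp(beta[1:]) - 1.0)


def loocv_rmse(X, rc):
    """Leave-one-out cross-validated RMSE on the original Bq/m^3 scale."""
    X = np.asarray(X, dtype=float)
    rc = np.asarray(rc, dtype=float)
    errors = []
    for i in range(len(rc)):
        keep = np.arange(len(rc)) != i            # hold out observation i
        beta = fit_log_linear(X[keep], rc[keep])  # refit without it
        errors.append(predict_rc(beta, X[i:i + 1])[0] - rc[i])
    return float(np.sqrt(np.mean(np.square(errors))))
```

Fitting on ln(RC) makes each coefficient a multiplicative effect, which is why `percent_change` reports `100 * (exp(b) - 1)` rather than the raw coefficient.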
Subsequently, the notebook applies the fitted model to every house in Bogotá's cadastre for which the independent variables are available. The resulting estimates are rasterized using GDAL tools.
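The rasterization step can be illustrated without GDAL. The block below swaps in a plain NumPy technique, averaging the per-house estimates that fall in each grid cell, so it is a sketch of the idea rather than what the notebook does with GDAL (GDAL's `gdal_rasterize`, for instance, burns values into cells instead of averaging them by default). Projected coordinates, the grid origin, the cell size and the function name are all assumptions.

```python
import numpy as np


def rasterize_mean(x, y, values, x_min, y_max, cell_size, n_cols, n_rows):
    """Average point values into a north-up grid; empty cells become NaN.

    Row 0 is the northern edge (y_max), as in a typical GeoTIFF layout.
    Points falling outside the grid are ignored.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    values = np.asarray(values, dtype=float)
    cols = np.floor((x - x_min) / cell_size).astype(int)
    rows = np.floor((y_max - y) / cell_size).astype(int)
    inside = (cols >= 0) & (cols < n_cols) & (rows >= 0) & (rows < n_rows)
    sums = np.zeros((n_rows, n_cols))
    counts = np.zeros((n_rows, n_cols))
    np.add.at(sums, (rows[inside], cols[inside]), values[inside])  # accumulate values
    np.add.at(counts, (rows[inside], cols[inside]), 1)             # count points per cell
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(counts > 0, sums / counts, np.nan)
```

Here `values` would be the estimated RC for each house (in Bq/m³); writing the resulting array to a GeoTIFF such as `Log_Linear_estimations.tif` would still require a raster library like GDAL.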
The outputs of this notebook are:
- Figures:
  - Variable characterization figure (`Figures/Caracterization.png`)
  - Principal component biplot (`Figures/PCA_RC.png`)
  - Percent change calculated for all independent variables (`Figures/Regresión_LogLineal.png`)
  - Percent change calculated for the independent variables retained after feature selection (`Figures/Regresión_LogLineal_withFeatureSel.png`)
  - Estimated distribution of residential RC (`Figures/Estimated_Rn_Histogram.png`)
- Files (saved to the `Regression results` folder):
  - RC estimated for each house in the cadastre data (`LinReg_model_results.csv`)
  - Raster with the RC regression results (`Log_Linear_estimations.tif`)
Dashboard app
Refer to the GitHub repository here for the dashboard source code, and to the online running version of the dashboard here to use it.
Initial display of Dashboard app.
Owner
- Name: Martin Dominguez
- Login: mdominguezd
- Kind: user
- Location: Wageningen, Netherlands
- Repositories: 1
- Profile: https://github.com/mdominguezd
Citation (CITATION.cff)
cff-version: 1.1.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Martín
    given-names: Domínguez-Durán
    orcid: https://orcid.org/0009-0007-0134-9588
title: "Modeling of indoor 222Rn in data-scarce regions. Dataset and Code."
version: v.0.1
date-released: 2024-04-22