https://github.com/co822ee/youchenshen.github.io
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 28 DOI reference(s) in README
- ✓ Academic publication links: Links to researchgate.net, scholar.google, sciencedirect.com
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (8.8%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: co822ee
- Default Branch: main
- Size: 12.7 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Data Scientist/Analyst
I have over 4 years of experience using geospatial data to solve environmental problems. I am passionate about collaborating with diverse teams to drive data-driven decisions and about creating appealing data visualizations. I value self-learning with a growth mindset and maintaining data integrity, and I strive to make a meaningful impact through innovative solutions.
Technical Skills: R, Python, PostgreSQL, Google Earth Engine (JavaScript), Google Cloud Platform, Google BigQuery, ArcPy, ArcGIS, QGIS
Education
- Ph.D., Environmental Health | Utrecht University | Sept 2020-Mar 2025
- M.S., Earth Surface and Water | Utrecht University | Sept 2018-Jul 2020
- B.S., Environmental & Agriculture Engineering | National Taiwan University | Sept 2014-Jun 2018
Work Experience
Scientist Innovator @ TNO (Apr 2025 - Present)
PhD candidate @ Utrecht University (Sept 2020 - Mar 2025)
EU-funded project EXPANSE -- air quality modelling
- Published 3 peer-reviewed papers as first author (36 citations) and created societal impact by developing data dashboards that visualize annual and monthly geospatial maps; the data were featured in The Guardian
- Collaborated with 12 researchers from cross-functional teams, including software developers, environmental epidemiologists, project managers, and model developers
- Developed spatial models for estimating air pollution concentrations
- Wrangled and processed 1 TB of Waze traffic jam data using BigQuery on Google Cloud Platform and created visualizations with Looker Studio to support data-driven decisions with stakeholders
- Developed an anomaly detection algorithm that identifies outliers in time-series data with 75%+ validity for modelling purposes
- Improved prediction accuracy by 19% in air quality models, providing actionable insights for stakeholders to promote a sustainable environment
- Presented data visualization results to stakeholders at EU institutions to support actionable plans for understanding the health impacts of air quality and raising public awareness, through 5+ research projects, 2+ infographics, 10+ charts, and 2+ data dashboards
- Created an automated data pipeline for web scraping and data processing in Python and R, reducing the processing time of geospatial and remote sensing data fivefold using cloud computing infrastructure (Google Earth Engine and Google Cloud Platform) and allowing faster analysis and decision-making
- Shared data insights with colleagues, resulting in 4 peer-reviewed papers (and 3 other papers in preparation)
- Attended 6 international conferences and was involved in 10+ peer-reviewed papers (4 as first author)
- Mentored 5 master's students on their graduate research projects, focusing on mobility-related health studies and environmental predictive modelling with machine learning
EU-funded project EXPANSE -- road traffic noise modelling
- Analyzed and visualized large geospatial datasets to uncover key insights using PostgreSQL and R, resulting in a 4% improvement in road traffic modelling and contributing to urban environmental sustainability and decision-making for 5+ EU institutions
- Predicted road traffic noise levels at 14 million building façade points via parallel computation on the Dutch national supercomputer (Snellius) with PostgreSQL
- Collaborated with 14 researchers from cross-functional teams, including epidemiologists, model developers, software engineers, stakeholders, and statisticians
- Learnt and implemented domain-specific methods, demonstrating my capacity to adapt to new technical challenges
Research Assistant @ Utrecht University (Dec 2018 - Jun 2019)
- Quantified and analyzed mangrove dynamics in Suriname using remote sensing imagery (Landsat)
- Communicated results in a scientific report and peer-reviewed paper to develop sustainable coastal protection methods and to enhance biodiversity
Projects
1. Europe-wide air quality modelling from 2000 to 2019 at a 25 m × 25 m resolution
Visualization dashboard - Annual
Visualization dashboard - Monthly
Used R & Google Earth Engine (JavaScript) to train three machine learning algorithms (Random Forests, Geographically Weighted Regression, Supervised Linear Regression) that estimated particulate matter, nitrogen dioxide, and ozone concentrations based on over 150 geospatial variables and 20k monitoring observations across Europe over 20 years. We found that spatially varying linear regression gave higher predictive accuracy than nonlinear regression and spatially fixed linear regression. The resulting air quality maps allow us to disentangle key interactions between the environment and human health.

Europe-wide annual average ground-level NO2 concentrations (µg/m3) estimated by geographically- and temporally-weighted regression from 2000 to 2019, with a zoom-in on the Paris region

Europe-wide annual average ground-level NO2, O3, PM10, and PM2.5 concentrations (µg/m3) estimated by geographically- and temporally-weighted regression in 2000, 2005, 2010, 2015, and 2019 (Base map source: Google Maps)
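
To illustrate what "spatially varying" linear regression means, here is a minimal Python sketch of the general idea behind geographically weighted regression. It is not the project's actual R and Google Earth Engine code. It assumes a single predictor, a Gaussian distance kernel, and a fixed bandwidth, and the monitoring sites and function names (`gaussian_weight`, `gwr_predict`, `SITES`) are made up for illustration.

```python
import math

def gaussian_weight(distance, bandwidth):
    """Gaussian kernel: observations near the target get weights close to 1."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)

def gwr_predict(target_xy, target_predictor, sites, bandwidth):
    """Fit a distance-weighted simple linear regression centred on target_xy.

    sites: list of (x, y, predictor, concentration) tuples (hypothetical data).
    Returns (intercept, slope, prediction) for the target location.
    """
    weights = [
        gaussian_weight(math.dist(target_xy, (x, y)), bandwidth)
        for x, y, _, _ in sites
    ]
    w_sum = sum(weights)
    # Weighted means of the predictor and the observed concentration.
    mean_p = sum(w * s[2] for w, s in zip(weights, sites)) / w_sum
    mean_c = sum(w * s[3] for w, s in zip(weights, sites)) / w_sum
    # Weighted least-squares slope and intercept.
    cov = sum(w * (s[2] - mean_p) * (s[3] - mean_c) for w, s in zip(weights, sites))
    var = sum(w * (s[2] - mean_p) ** 2 for w, s in zip(weights, sites))
    slope = cov / var
    intercept = mean_c - slope * mean_p
    return intercept, slope, intercept + slope * target_predictor

# Hypothetical sites: the western cluster follows c = 10 + 2.0 * p,
# the eastern cluster follows c = 30 + 0.5 * p.
SITES = [
    (0.0, 0.0, 1.0, 12.0), (1.0, 0.0, 2.0, 14.0), (0.0, 1.0, 3.0, 16.0),
    (100.0, 0.0, 1.0, 30.5), (101.0, 0.0, 2.0, 31.0), (100.0, 1.0, 3.0, 31.5),
]

west = gwr_predict((0.5, 0.5), 2.0, SITES, bandwidth=5.0)
east = gwr_predict((100.5, 0.5), 2.0, SITES, bandwidth=5.0)
print("west (intercept, slope, prediction):", west)
print("east (intercept, slope, prediction):", east)
```

Because each target location gets its own weighted fit, the two clusters recover different local coefficients, whereas a single global regression would blend them. A real application would use many predictors, a tuned bandwidth, and a temporal weighting term as well.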
2. Europe-wide high-spatial resolution air quality models are improved by including traffic flow estimates on all roads
Used R & Google Earth Engine (JavaScript) to develop Random Forests that estimated road traffic flow based on over 150 geospatial variables. We found that regression variables can be used to estimate on-road vehicle numbers at large spatial scales with high accuracy (r² > 0.7).

We improved Europe-wide high spatial resolution air quality models using traffic flow estimates on all roads
3. Europe-wide road traffic noise modelling
Used R & PostgreSQL to estimate road traffic noise using a physically-based noise model (CNOSSOS-EU) at millions of points across Europe. I created an automated data pipeline for parallel computing on the Dutch national supercomputer (Snellius).

Noise level estimates (in dBA) at the noisiest façade building points in the city center of Utrecht
Publications
- Shen, Y., de Hoogh, K., Schmitz, O., Clinton, N., Tuxen-Bettman, K., Brandt, J., Christensen, J.H., Frohn, L.M., Geels, C., Karssenberg, D., Vermeulen, R., Hoek, G., 2022a. Europe-wide air pollution modeling from 2000 to 2019 using geographically weighted regression. Environ. Int. 168, 107485. https://doi.org/10.1016/j.envint.2022.107485
- Shen, Y., de Hoogh, K., Schmitz, O., Gulliver, J., Vienneau, D., Vermeulen, R., Hoek, G., Karssenberg, D., 2024b. Europe-wide high-spatial resolution air pollution models are improved by including traffic flow estimates on all roads. Atmos. Environ. 335, 120719. https://doi.org/10.1016/J.ATMOSENV.2024.120719
- Shen, Y., Ruijsch, J., Lu, M., Sutanudjaja, E.H., Karssenberg, D., 2022b. Random forests-based error-correction of streamflow from a large-scale hydrological model: Using model state variables to estimate error terms. Comput. Geosci. 159, 105019. https://doi.org/10.1016/j.cageo.2021.105019
- Ndiaye, A., Shen, Y., Kyriakou, K., Karssenberg, D., Schmitz, O., Flückiger, B., Hoogh, K. de, Hoek, G., 2024. Hourly land-use regression modeling for NO2 and PM2.5 in the Netherlands. Environ. Res. 256, 119233. https://doi.org/10.1016/J.ENVRES.2024.119233
- Yuan, Z., Kerckhoffs, J., Shen, Y., de Hoogh, K., Hoek, G., Vermeulen, R., 2023. Integrating large-scale stationary and local mobile measurements to estimate hyperlocal long-term air pollution using transfer learning methods. Environ. Res. 228. https://doi.org/10.1016/J.ENVRES.2023.115836
- de Jong, S.M., Shen, Y., de Vries, J., Bijnaar, G., van Maanen, B., Augustinus, P., Verweij, P., 2021. Mapping mangrove dynamics and colonization patterns at the Suriname coast using historic satellite data and the LandTrendr algorithm. Int. J. Appl. Earth Obs. Geoinf. 97, 102293. https://doi.org/10.1016/j.jag.2020.102293
- Magni, M., Sutanudjaja, E.H., Shen, Y., Karssenberg, D., 2023. Global streamflow modelling using process-informed machine learning. J. Hydroinformatics 25, 1648–1666. https://doi.org/10.2166/HYDRO.2023.217
View all publications on Google Scholar or on ResearchGate
Owner
- Name: Youchen Shen
- Login: co822ee
- Kind: user
- Location: Utrecht, the Netherlands
- Company: Utrecht University
- Repositories: 1
- Profile: https://github.com/co822ee
GitHub Events
Total
- Push event: 1
Last Year
- Push event: 1