aims-dscbi

Data science course for Rwanda national statistical system (NSS) staff

https://github.com/dmatekenya/aims-dscbi

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.9%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 2
  • Watchers: 0
  • Forks: 25
  • Open Issues: 3
  • Releases: 0
Created 7 months ago · Last pushed 7 months ago
Metadata Files
Readme · Contributing · License · Code of conduct · Citation

README.md

Data Science Capacity Building Initiative (DSCBI)

This repository provides information about the capacity building initiative, a collaboration between the African Institute for Mathematical Sciences (AIMS), the National Institute of Statistics of Rwanda (NISR), and Cenfri. The initiative aims to deliver data science training to staff from over 20 institutions that are part of Rwanda's National Statistical System (NSS).

Course Modules

The training program is structured around the following core modules:

  1. Python Foundations for Data Science
    Covers core and advanced Python programming, object-oriented design, development tools, Git/GitHub workflows, Python packaging, and best practices for maintainable code.

  2. Data Analysis with Python
    Focuses on cleaning, transforming, and analyzing structured data using pandas and polars, with applications to real-world datasets like census and surveys.

  3. Working with Spatial Data in Python
    Introduces geospatial data handling using GeoPandas, shapely, and rasterio; includes spatial joins, projections, and mapping access to services.

  4. Working with Time Series Data in Python
    Covers handling temporal data, resampling, rolling windows, time series decomposition, and forecasting using statsmodels and Prophet.

  5. Databases and APIs
    Introduces SQL and relational databases, extracting and analyzing data from APIs, and building REST APIs with FastAPI.

  6. Introduction to Machine Learning with Scikit-learn
    Provides foundations in supervised and unsupervised learning, including models like logistic regression, decision trees, and PCA, along with model evaluation.

  7. Natural Language Processing (NLP) and Large Language Models (LLMs)
    Covers foundational NLP, LLMs, embeddings, vector search, and building AI-powered applications using frameworks like Hugging Face and LangChain.

  8. Advanced Topics in Data Science
    Explores interactive dashboards, data integration from unstructured sources, advanced database techniques, and cloud storage for analytics.

  9. Capstone Project
    Teams develop and present real-world data projects scoped from their institutional needs, applying techniques learned across modules.
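
As an illustration of the resampling and rolling-window techniques named in Module 4, here is a minimal sketch using pandas. The series, its values, and the name `visits` are made up for this example and are not taken from the course materials.

```python
import pandas as pd

# Hypothetical daily series: 60 days of made-up counts (not course data).
dates = pd.date_range("2024-01-01", periods=60, freq="D")
daily = pd.Series(range(60), index=dates, name="visits")

# Resampling: aggregate the daily values into calendar-month totals.
# "MS" labels each bin by the first day of its month.
monthly_total = daily.resample("MS").sum()

# Rolling window: 7-day moving average. The first six entries are NaN
# because a full 7-day window is not yet available.
weekly_avg = daily.rolling(window=7).mean()

print(monthly_total)
print(weekly_avg.tail(3))
```

Resampling changes the frequency of the series (daily to monthly here), while a rolling window keeps the original frequency and smooths each point using its recent neighbours.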

Course Structure

The course is divided into self-contained modules, each designed to provide useful skills and knowledge. The modules are organized sequentially to build on skills learned in previous modules. To make the course engaging and informative, each module includes the following components:

  • Lecture
    Each lecture covers key conceptual knowledge for the topic at hand.

  • Practical Labs
    Programming activities provide learners with practical skills to implement solutions discussed in lectures. These labs include adaptable recipes for various use cases.

  • Case Studies
    Case studies showcase elaborate projects that demonstrate real-world applications.

  • Assessment
    Each module assessment combines theoretical (quizzes) and programming questions to evaluate learners' understanding of the concepts and skills covered in the module.

Repository Structure and Contents

This repository serves as the primary resource for accessing course content, including slides, Python programming labs, example applications using LLMs, and additional materials to support learning about Generative AI and building applications with LLMs. For easy navigation, use the link and contents outlined below.

Contents

Please visit the documentation website for a complete table of contents.

License

The template is licensed under the Mozilla Public License. Remember to replace the license if necessary. If open source, choose an open source license.

Owner

  • Name: Dunstan Matekenya
  • Login: dmatekenya
  • Kind: user
  • Location: Washington, DC
  • Company: The World Bank

Data Scientist at the World Bank Group

Citation (CITATION.cff)

cff-version: 1.2.0
message: "Country borders or names do not necessarily reflect the World Bank Group’s official position. All maps are for illustrative purposes and do not imply the expression of any opinion on the part of the World Bank, concerning the legal status of any country or territory or concerning the delimitation of frontiers or boundaries."
title: "World Bank Data Lab Project Template"
authors:
  - affiliation: World Bank
    family-names: Stefanini Vicente
    given-names: Gabriel
    orcid: https://orcid.org/0000-0001-6530-3780
keywords:
  - Open Science
repository-code: https://github.com/worldbank/template/tree/main

GitHub Events

Total
  • Issues event: 1
  • Watch event: 1
  • Issue comment event: 3
  • Member event: 2
  • Push event: 28
  • Pull request event: 3
  • Fork event: 10
  • Create event: 3
Last Year
  • Issues event: 1
  • Watch event: 1
  • Issue comment event: 3
  • Member event: 2
  • Push event: 28
  • Pull request event: 3
  • Fork event: 10
  • Create event: 3

Dependencies

.github/workflows/gh-pages.yml actions
  • actions/checkout v4 composite
  • actions/deploy-pages v4 composite
  • actions/setup-python v5 composite
  • actions/upload-pages-artifact v3 composite
.github/workflows/release.yml actions
  • actions/checkout v4 composite
  • actions/download-artifact v4 composite
  • actions/setup-python v5 composite
  • actions/upload-artifact v4 composite
  • pypa/gh-action-pypi-publish release/v1 composite
pyproject.toml pypi
  • bokeh >=3,<4
  • pandas >=2
  • pycountry >=22.3.5
  • requests >=2.28.1
.github/workflows/working-gh-pages.yml actions
  • actions/checkout v4 composite
  • actions/deploy-pages v4 composite
  • actions/setup-python v5 composite
  • actions/upload-pages-artifact v3 composite
requirements.txt pypi
  • docutils >=0.18.1
  • jupyter-book >=0.15.0
  • myst-parser >=2.0.0
  • sphinx >=7.0.0
  • sphinx-book-theme >=1.0.0
  • sphinx-external-toc >=0.3.1
  • sphinx-multitoc-numbering >=0.1.3