public_open_source_data_science

A repository of open source data science projects for social good

https://github.com/neelsoumya/public_open_source_data_science

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 7 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.2%) to scientific vocabulary

Keywords

citizen-data-science citizen-science data-analysis data-science datascience datascience-social-good datascience-socialgood deep-learning machine-learning paper python social
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: neelsoumya
  • Language: Jupyter Notebook
  • Default Branch: master
  • Homepage:
  • Size: 28 MB
Statistics
  • Stars: 3
  • Watchers: 4
  • Forks: 1
  • Open Issues: 0
  • Releases: 4
Created over 5 years ago · Last pushed about 1 year ago
Metadata Files
Readme Funding Citation

README.md

Introduction

Source code and data for open source data science for social good. This is a data science portfolio.

List of projects

1) university_sexcrimes

Analysis of data on sex crimes on US university campuses.

2) heartdiseaserisk_prediction

Predicting heart disease risk from open data.

3) cancermortalityprediction

Predicting cancer survival using logistic regression from open data.

4) predictingnewspopularity

Predicting popularity of news articles from open data.

5) opensourcemappingproject

Open source mapping project. 

6) astroinformatics

Analysis of astronomy data using machine learning techniques.

7) scientific_collaboration

Project to analyze planetary scale scientific collaboration data.

8) accident_prediction

Road accident forecasting and data exploration project.

Interactive website using shiny at:

https://neelsoumya.shinyapps.io/accident_prediction/

9) patternsincrime

Predicting patterns of crime using data science. Larger cities have disproportionately more crime per capita than smaller cities (super-linear scaling of crime). We used techniques from dynamical systems and complex systems to explain the super-linear scaling of crime in cities and other socio-technological systems.
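Super-linear scaling is commonly written as a power law, Y = Y0 * N^beta with beta > 1, where N is city population and Y is the crime count; per-capita crime Y/N = Y0 * N^(beta - 1) then grows with city size. The sketch below is only an illustration of how such an exponent can be estimated with a log-log least-squares fit. It is not the project's own code: the function name `fit_power_law`, the synthetic populations, and the exponent 1.15 used to generate the data are all assumptions made up for the example.

```python
import math
import random


def fit_power_law(populations, crime_counts):
    """Ordinary least squares on log(Y) = log(Y0) + beta * log(N).

    Returns the pair (Y0, beta).
    """
    xs = [math.log(n) for n in populations]
    ys = [math.log(y) for y in crime_counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta = sxy / sxx
    y0 = math.exp(mean_y - beta * mean_x)
    return y0, beta


# Synthetic data, NOT taken from the repository: ten hypothetical cities
# whose crime counts follow a power law with exponent 1.15 plus small noise.
rng = random.Random(0)
populations = [10_000 * 2 ** k for k in range(10)]
assumed_beta = 1.15
crime_counts = [
    0.002 * n ** assumed_beta * math.exp(rng.gauss(0, 0.05))
    for n in populations
]

y0, beta = fit_power_law(populations, crime_counts)
print(f"estimated beta = {beta:.2f}")
print("super-linear" if beta > 1 else "linear or sub-linear")
```

An estimated beta above 1 indicates that crime grows faster than population in the fitted data; on real city data the fit would also need confidence intervals before drawing that conclusion.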

10) spam_classification

Building an SVM-based spam classifier trained on data from the UCI repository.

11) breastcancerprediction

Downloads data from the UCI Machine Learning Repository and uses a random forest to make predictions for breast cancer. A few features, such as epithelial cell size, turn out to be especially important for prediction.

12) fundingtrendsscience

Project to analyze data on funding trends in biomedical science.

13) infectiousdiseaseprediction

Project to analyze data on emerging infectious diseases.

14) forecasting_imports

Project to forecast imports and model supply chains.  

15) deeplearningbasic

Basic deep learning model using keras for prediction.

16) ai_healthcare

Machine learning and AI applied to healthcare.

17) aisocialgood

Machine learning, data science and AI for social good. 

18) aibigdatabiology

Machine learning and bioinformatics for big data in biology. 

19) browserbaseddata_science

Browser-based data science for democratic access to data science tools.

20) clinical_informatics

Open source privacy-preserving clinical informatics.

21) policypapergeneral_public

Policy paper for general public on Ethical Artificial Intelligence (EAI) for social good.

22) nlp

Resources, code and data for natural language processing.

23) selforganisingmapwinedataset

A self-organising map (SOM) on the UCI wine dataset using the Orange data science tool.

24) LLMs

Hackathons and resources for large language models (LLMs).

25) outreach

Outreach for machine learning and AI for the general public.

26) teaching_resources

Teaching resources for machine learning, data science and AI for a general audience.

What is this repository for?

  • Quick summary

    • Open source code and data for open source data science.

Citation

  • If you use this code, please cite the paper and code:

    • Citizen Data Science for Social Good: Case Studies and Vignettes from Recent Projects https://doi.org/10.13140/RG.2.1.1846.6002
    • Citizen Data Science for Social Good in Complex Systems, Interdisciplinary Description of Complex Systems, 16(1):88-91, 2018 http://indecs.eu/index.php?s=x&y=2018&p=88-91
    • Banerjee, Soumya. (2017, September 3). Citizen Data Science for Social Good: Case Studies and Vignettes from Recent Projects (Supplementary Resources). Zenodo. http://doi.org/10.5281/zenodo.883783

    DOI: https://doi.org/10.5281/zenodo.883783

  • These projects are an example of my approach to data science for good. I work very closely with domain experts and stakeholders and use computational tools for good. I outline my design and work philosophy below.

    • data science philosophy

Installation

Install R, RStudio, MATLAB and Python.

Install R

https://www.r-project.org/

and RStudio

https://www.rstudio.com/products/rstudio/download/preview/

Then run the following in R:

source("https://raw.githubusercontent.com/neelsoumya/rlib/master/INSTALL_MANY_MODULES.R")

Install Python dependencies as follows:

pip3 install -r requirements.txt

Contact

 Soumya Banerjee

 https://sites.google.com/site/neelsoumya/

 sb2333@cam.ac.uk

Owner

  • Name: Soumya Banerjee
  • Login: neelsoumya
  • Kind: user
  • Location: Cambridge, UK
  • Company: University of Cambridge

My research interests are in complex systems data science, machine learning, computational biology, computational immunology and computational immunogenomics.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Banerjee"
  given-names: "Soumya"
  orcid: "https://orcid.org/0000-0001-7748-9885"
title: "CITIZEN DATA SCIENCE FOR SOCIAL GOOD IN COMPLEX SYSTEMS"
version: 1.0.0
doi: 10.7906/indecs.16.1.6
date-released: 2022-01-02
url: "http://indecs.eu/index.php?s=x&y=2018&p=88-91"

GitHub Events

Total
  • Push event: 1
Last Year
  • Push event: 1

Committers

Last synced: about 2 years ago

All Time
  • Total Commits: 41
  • Total Committers: 1
  • Avg Commits per committer: 41.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Soumya Banerjee n****a@g****m 41
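
The page does not state how the Development Distribution Score is computed. A minimal sketch, assuming the common definition (1 minus the top committer's share of all commits), reproduces the all-time figures above. The function name `development_distribution_score` and the dictionary layout are hypothetical, chosen only for this illustration.

```python
def development_distribution_score(commits_by_committer):
    """Assumed definition: 1 - (commits by top committer / total commits).

    Returns 0.0 when there are no commits at all.
    """
    total = sum(commits_by_committer.values())
    if total == 0:
        return 0.0
    top = max(commits_by_committer.values())
    return 1 - top / total


# All-time figures listed above: a single committer with 41 commits.
all_time = {"Soumya Banerjee": 41}

dds = development_distribution_score(all_time)
avg_commits = sum(all_time.values()) / len(all_time)

print(f"DDS: {dds}")
print(f"Avg commits per committer: {avg_commits}")
```

Under this definition a single-committer repository always scores 0.0, and the score approaches 1 as commits are spread more evenly across many committers.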

Issues and Pull Requests

Last synced: over 1 year ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0