python-datascientist

Repository associated with the course Python for data scientists (ENSAE, 2nd year)

https://github.com/linogaliana/python-datascientist

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.7%) to scientific vocabulary

Keywords

data-science jupyter jupyter-notebook machine-learning opendata python teaching
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 134
  • Watchers: 1
  • Forks: 49
  • Open Issues: 5
  • Releases: 10
Topics
data-science jupyter jupyter-notebook machine-learning opendata python teaching
Created over 5 years ago · Last pushed 6 months ago
Metadata Files
Readme License Citation

README.md

Data Science with Python

[!NOTE]
This is the English 🇬🇧🇺🇸 version of the README.
A French 🇫🇷 version is also available.


📚 About

This repository contains the source files for the course Python for Data Science taught in the second year (Master 1) at ENSAE.

The course website is available here:
🌐 https://pythonds.linogaliana.fr/


🎨 Gallery

Some visualizations produced during the course:

[Gallery of 16 figures, Figure 1 to Figure 16]


📖 Course content

This course is suitable for both beginners and advanced learners.
The syllabus below is fully clickable and collapsible.

1. Getting started: why Python for data science? 🔗 https://pythonds.linogaliana.fr/en/content/getting-started/
   - Getting a functional Python environment for data science
   - How to deal with a data set
   - Python basics
2. Data wrangling 🔗 https://pythonds.linogaliana.fr/en/content/manipulation/
   - Numpy, the foundation of data science
   - Introduction to Pandas
   - Data wrangling with Pandas
   - Spatial data with GeoPandas
   - Webscraping with Python
   - Retrieving data with APIs
   - Mastering regular expressions
   - Importing data from Parquet and S3
3. Data visualisation and communication 🔗 https://pythonds.linogaliana.fr/en/content/visualisation/
   - Building graphics with Python
   - Introduction to cartography
4. Modeling 🔗 https://pythonds.linogaliana.fr/en/content/modelisation/
   - Why preprocessing matters
   - Evaluating model quality
   - Introduction to classification
   - Introduction to regression
   - Feature selection
   - Clustering
5. Natural Language Processing (NLP) 🔗 https://pythonds.linogaliana.fr/en/content/nlp/
   - Cleaning and structuring texts
   - Bag-of-words approach
   - Text embeddings

🔗 Resources

The course content relies heavily on open data, including French datasets (from data.gouv and Insee) and American datasets.

Complementary course with Romain Avouac (@avouacr):
https://ensae-reproductibilite.github.io/website/


🚀 Accessing the course in Jupyter Notebooks

[!TIP]
Run examples instantly on SSP Cloud or Google Colab. For example, the Pandas chapter can be opened in SSP Cloud VSCode, SSP Cloud Jupyter, or Google Colab.


🤝 Contributing

I welcome contributions!

[!NOTE]
See the guide for contributors:
`CONTRIBUTING.md`

Owner

  • Name: Lino Galiana
  • Login: linogaliana
  • Kind: user
  • Location: Paris
  • Company: Insee

Data Scientist Insee - Teaching at ENSAE

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use some content of this repository, please cite it as below."
authors:
- family-names: "Galiana"
  given-names: "Lino"
  orcid: "https://orcid.org/0000-0001-8663-5100"
title: "Python pour la data science"
doi: 10.5281/zenodo.5386096
date-released: 2024-06-01
url: "https://github.com/linogaliana/python-datascientist"

GitHub Events

Total
  • Issues event: 28
  • Watch event: 23
  • Delete event: 40
  • Issue comment event: 16
  • Push event: 315
  • Pull request event: 71
  • Fork event: 3
  • Create event: 39
Last Year
  • Issues event: 28
  • Watch event: 23
  • Delete event: 40
  • Issue comment event: 16
  • Push event: 315
  • Pull request event: 71
  • Fork event: 3
  • Create event: 39

Committers

Last synced: 11 months ago

All Time
  • Total Commits: 833
  • Total Committers: 14
  • Avg Commits per committer: 59.5
  • Development Distribution Score (DDS): 0.077
Past Year
  • Commits: 262
  • Committers: 5
  • Avg Commits per committer: 52.4
  • Development Distribution Score (DDS): 0.034
Top Committers
| Name | Email | Commits |
|---|---|---|
| Lino Galiana | l****a@i****r | 769 |
| Romain Avouac | 4****r | 27 |
| Antoine Palazzolo | 9****z | 12 |
| Julien PRAMIL | 1****l | 7 |
| Thomas Faria | 5****a | 5 |
| Kim A | k****y@l****t | 2 |
| Raphaele Adjerad | 5****d | 2 |
| lbaudin | 1****n | 2 |
| tomseimandi | t****i@g****m | 2 |
| Expressso | 9****o | 1 |
| Idrissa KONKOBO | 9****a | 1 |
| Mélissa Tamine | 9****a | 1 |
| jblaval | l****e@g****m | 1 |
| romanegajdos | 7****s | 1 |
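
The all-time committer figures can be cross-checked from the per-committer counts. The sketch below assumes that the Development Distribution Score is defined as one minus the share of commits made by the most active committer; this page does not state the definition, but the assumption reproduces the reported values. The function name `contribution_stats` is hypothetical.

```python
def contribution_stats(commits):
    """Return (total, average per committer, DDS) for a list of commit counts."""
    total = sum(commits)
    average = total / len(commits)
    # Assumed definition: 1 minus the share of commits made by the top committer.
    dds = 1 - max(commits) / total
    return total, average, dds


# Commit counts copied from the Top Committers listing.
all_time = [769, 27, 12, 7, 5, 2, 2, 2, 2, 1, 1, 1, 1, 1]
total, average, dds = contribution_stats(all_time)
print(f"total={total}, average={average:.1f}, dds={dds:.3f}")
# prints: total=833, average=59.5, dds=0.077
```

A score this close to zero indicates that almost all commits come from a single committer, which is consistent with 769 of the 833 commits being attributed to one author.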
Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 64
  • Total pull requests: 102
  • Average time to close issues: 7 months
  • Average time to close pull requests: 4 days
  • Total issue authors: 7
  • Total pull request authors: 5
  • Average comments per issue: 0.63
  • Average comments per pull request: 0.01
  • Merged pull requests: 88
  • Bot issues: 0
  • Bot pull requests: 5
Past Year
  • Issues: 12
  • Pull requests: 52
  • Average time to close issues: 30 days
  • Average time to close pull requests: 3 days
  • Issue authors: 5
  • Pull request authors: 4
  • Average comments per issue: 0.92
  • Average comments per pull request: 0.0
  • Merged pull requests: 38
  • Bot issues: 0
  • Bot pull requests: 5
Top Authors
Issue Authors
  • linogaliana (74)
  • fa5fou5 (3)
  • jpramil (3)
  • daniel-odc (3)
  • jaerdoster (1)
  • leomignot (1)
  • avouacr (1)
  • antoine-palazz (1)
  • bpezet (1)
  • Orlogskapten (1)
  • raphaelfournier (1)
Pull Request Authors
  • linogaliana (151)
  • jpramil (5)
  • dependabot[bot] (5)
  • avouacr (4)
  • ThomasFaria (2)
  • antoine-palazz (2)
  • lbaudin (1)
  • ntoulemonde (1)
  • fa5fou5 (1)
  • romanegajdos (1)
Top Labels
Issue Labels
Website (22), enhancement :rocket: (17), bug (12), Structure dépôt (7), Partie manipulation (6), Partie visualisation (6), cartographie (5), CI (5), Partie modélisation (4), git (4), Jupyter (3), pandas :panda_face: (3), matplotlib (2), scikit (2), geopandas (2), NLP :book: (2), Introduction (1), help wanted (1), question (1), numpy (1), API (1), exercice (1), notebooks :notebook: (1)
Pull Request Labels
Website (11), Partie manipulation (9), enhancement :rocket: (6), Structure dépôt (6), english 🇬🇧 (5), python (5), dependencies (5), Partie visualisation (5), Partie modélisation (4), scikit (4), Introduction (4), Jupyter (4), bug (4), git (3), CI (3), NLP :book: (3), numpy (3), pandas :panda_face: (3), geopandas (3), cartographie (2), documentation (2), matplotlib (1), liste projets élèves (1)

Dependencies

.github/workflows/netlify-test.yaml actions
  • actions/checkout v2 composite
  • actions/setup-node v2 composite
  • actions/upload-artifact v2 composite
.github/workflows/prod.yml actions
  • actions/checkout v2 composite
  • actions/setup-node v2 composite
  • actions/upload-artifact v1 composite
.github/workflows/checks.yaml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
  • actions/upload-artifact v3 composite
.github/workflows/notebooks.yml actions
  • actions/checkout v3 composite
  • actions/upload-artifact v2 composite
  • linogaliana/github-action-push-to-another-repository main composite
requirements.txt pypi
  • contextily *
  • geoplot *
  • graphviz *
  • kaleido *
  • plotnine *
  • pynsee *
  • pywaffle *
  • wordcloud *
  • xlrd *
  • yellowbrick *
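
The `requirements.txt` entries above are unpinned (`*`), so the versions actually installed depend on the environment. As a minimal sketch, the snippet below uses the standard-library `importlib.metadata` module to report which of these distributions are present locally. The helper name `installed_versions` is hypothetical and not part of the repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names copied from the requirements.txt listing.
REQUIREMENTS = [
    "contextily", "geoplot", "graphviz", "kaleido", "plotnine",
    "pynsee", "pywaffle", "wordcloud", "xlrd", "yellowbrick",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(REQUIREMENTS).items():
    print(f"{name}: {ver or 'not installed'}")
```

Querying distribution metadata rather than importing each package avoids triggering heavy imports and works even when the import name differs from the distribution name.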