https://github.com/csyhuang/data-science-from-scratch

code for Data Science From Scratch book

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    1 of 5 committers (20.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.2%) to scientific vocabulary
Last synced: 10 months ago

Repository

code for Data Science From Scratch book

Basic Info
  • Host: GitHub
  • Owner: csyhuang
  • License: unlicense
  • Language: Python
  • Default Branch: master
  • Size: 210 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Fork of joelgrus/data-science-from-scratch
Created almost 9 years ago · Last pushed about 9 years ago
Metadata Files
Readme, License

README.md

Data Science from Scratch

Here's all the code and examples from my book Data Science from Scratch. The code directory contains Python 2.7 versions, and the code-python3 directory contains the Python 3 equivalents. (I tested them in 3.5, but they should work in any 3.x.)

Each can be imported as a module, for example (after you cd into the /code directory):

```python
from linear_algebra import distance, vector_mean
v = [1, 2, 3]
w = [4, 5, 6]
print distance(v, w)
print vector_mean([v, w])
```
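The snippet above uses the Python 2.7 `print` statement. For Python 3, a minimal self-contained sketch of the same calls might look like the following. The two function bodies here are illustrative stand-ins, written on the assumption that `distance` is the Euclidean distance and `vector_mean` is the componentwise mean; they are not copied from the repository's `linear_algebra` module.

```python
import math
from typing import List

Vector = List[float]

def distance(v: Vector, w: Vector) -> float:
    """Euclidean distance between two equal-length vectors (assumed definition)."""
    assert len(v) == len(w), "vectors must be the same length"
    return math.sqrt(sum((v_i - w_i) ** 2 for v_i, w_i in zip(v, w)))

def vector_mean(vectors: List[Vector]) -> Vector:
    """Componentwise mean of a non-empty list of equal-length vectors (assumed definition)."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]

v = [1, 2, 3]
w = [4, 5, 6]
print(distance(v, w))        # sqrt(27), roughly 5.196
print(vector_mean([v, w]))   # [2.5, 3.5, 4.5]
```

When working inside the code-python3 directory, the two definitions would presumably be replaced by the same `from linear_algebra import distance, vector_mean` line shown above.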

Or can be run from the command line to get a demo of what it does (and to execute the examples from the book):

```bat
python recommender_systems.py
```
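A file that can be both imported and run as a demo typically relies on Python's `if __name__ == "__main__":` guard. The sketch below illustrates that pattern only; `square_all` and `main` are hypothetical names, and this is not a file from the repository.

```python
# hypothetical_module.py: an illustrative stand-in, not a file from the repository

def square_all(xs):
    """Library function that other code can import."""
    return [x * x for x in xs]

def main():
    """Demo that is meant to run when the file is executed directly."""
    result = square_all([1, 2, 3])
    print("squares:", result)
    return result

# True only when the file is run as a script, not when it is imported.
if __name__ == "__main__":
    main()
```

With this layout, `python hypothetical_module.py` prints the demo output, while `from hypothetical_module import square_all` imports the function without running the demo.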

Additionally, I've collected all the links from the book.

And, by popular demand, I made an index of functions defined in the book, by chapter and page number. The data is in a spreadsheet, or I also made a toy (experimental) searchable webapp.

Table of Contents

  1. Introduction
  2. A Crash Course in Python
  3. Visualizing Data
  4. Linear Algebra
  5. Statistics
  6. Probability
  7. Hypothesis and Inference
  8. Gradient Descent
  9. Getting Data
  10. Working With Data
  11. Machine Learning
  12. k-Nearest Neighbors
  13. Naive Bayes
  14. Simple Linear Regression
  15. Multiple Regression
  16. Logistic Regression
  17. Decision Trees
  18. Neural Networks
  19. Clustering
  20. Natural Language Processing
  21. Network Analysis
  22. Recommender Systems
  23. Databases and SQL
  24. MapReduce
  25. Go Forth And Do Data Science

Owner

  • Name: Clare S. Y. Huang
  • Login: csyhuang
  • Kind: user

Data Scientist. Climate Scientist. Ph.D. in Geophysical Sciences (U of Chicago). Love coding, writing and playing music.

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 55
  • Total Committers: 5
  • Avg Commits per committer: 11.0
  • Development Distribution Score (DDS): 0.073
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name               Email            Commits
Joel Grus          j****s@g****m    51
Pandeesh           p****h@g****m    1
unknown            曾****恺          1
Brooke Anderson    g****s@j****u    1
Lucy Park          me@l****r        1

Issues and Pull Requests

Last synced: over 2 years ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0