hello-penguins

Machine learning experiments with the Palmer Penguins dataset

https://github.com/h-fuzzy-logic/hello-penguins

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.2%) to scientific vocabulary

Keywords

explainable-ml machine-learning mlflow python
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: h-fuzzy-logic
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 1.36 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 11 months ago · Last pushed 11 months ago
Metadata Files
Readme, License, Citation

README.md

Hello, Penguins

Palmer Penguins illustration by @allison_horst

Welcome

Welcome to the “Hello Penguins” repository, a collection of machine learning experiments with the Palmer Penguins dataset.

Inspired by the “Hello, World!” programming tradition, this repository is a series of small experiments that illustrate foundational machine learning concepts. Each experiment includes evaluation metrics and visuals to verify that the model's predictions make sense and are explainable.

Software engineering practices are applied to keep the code testable and reproducible.

To learn more about the dataset, check out the official Palmer Penguins GitHub repo.

Training Approach and Technology

MLflow is used for model training and evaluation instead of notebooks.

Training runs locally, and the experiment results are shared in an MLflow portfolio hosted on Google Cloud Run. The portfolio is intended to be highly available, but it may occasionally be offline. The Docker container files for the portfolio are in the docker-portfolio directory.

Pre-Training Checks

  • Consider data bias
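
One simple way to start considering data bias is to look at how the classes are balanced and where values are missing. The sketch below assumes the column layout of penguins.csv and that missing values are encoded as `NA`. The sample rows are made up for illustration and are not real measurements, and the helper names `class_shares` and `missing_counts` are hypothetical.

```python
import csv
import io
from collections import Counter

# Illustrative rows in the penguins.csv column layout; NOT real measurements.
SAMPLE_CSV = """species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
Adelie,Torgersen,39.0,18.5,180,3700,male,2007
Adelie,Torgersen,NA,NA,NA,NA,NA,2007
Adelie,Biscoe,38.0,18.0,185,3600,female,2008
Gentoo,Biscoe,46.0,14.0,215,5000,female,2008
Gentoo,Biscoe,49.0,15.5,220,5500,male,2009
Chinstrap,Dream,48.0,18.0,195,3800,female,2009
"""


def class_shares(rows, column):
    """Return the fraction of non-missing rows that fall in each category."""
    counts = Counter(row[column] for row in rows if row[column] != "NA")
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def missing_counts(rows):
    """Count the 'NA' entries in each column that has at least one."""
    missing = Counter()
    for row in rows:
        for column, value in row.items():
            if value == "NA":
                missing[column] += 1
    return dict(missing)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
species_shares = class_shares(rows, "species")
sex_shares = class_shares(rows, "sex")
missing = missing_counts(rows)
```

A strongly uneven share for one species or sex, or missing values concentrated in one group, suggests the model may perform worse on the under-represented group and that per-group metrics are worth reporting.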

Acknowledgements and Sources

This repo builds on many foundations:

  • Allison Horst’s Palmer Penguins repo
    • Data downloaded on March 16, 2025 with: curl -o data/penguins.csv https://raw.githubusercontent.com/allisonhorst/palmerpenguins/master/inst/extdata/penguins.csv
  • Lynn Langit’s mentorship and amazing resources for learning cloud
  • Santiago Valdarrama’s ML School repo

Owner

  • Name: Heather Woods
  • Login: h-fuzzy-logic
  • Kind: user
  • Location: United States

Always learning new ways to use technology for the greater good. Software Engineer with expertise in Data Engineering and Data Science.

GitHub Events

Total
  • Delete event: 3
  • Push event: 7
  • Pull request event: 5
  • Create event: 5
Last Year
  • Delete event: 3
  • Push event: 7
  • Pull request event: 5
  • Create event: 5

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 15
  • Total Committers: 1
  • Avg Commits per committer: 15.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 15
  • Committers: 1
  • Avg Commits per committer: 15.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name | Email | Commits
h-fuzzy-logic | h****c@g****m | 15

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 0
  • Total pull requests: 3
  • Average time to close issues: N/A
  • Average time to close pull requests: 4 minutes
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 3
  • Average time to close issues: N/A
  • Average time to close pull requests: 4 minutes
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • h-fuzzy-logic (5)

Dependencies

pyproject.toml pypi
  • awscurl >=0.36
  • azure-ai-ml >=1.22.4
  • azureml-mlflow >=1.58.0
  • evidently >=0.5.0
  • ipykernel >=6.29.5
  • jax[cpu] >=0.4.20,<0.5.0
  • jupyter >=1.1.1
  • keras >=3.7.0
  • metaflow >=2.13
  • metaflow-card-html >=1.0.2
  • mlflow[extras] >=2.18.0
  • mlserver >=1.6.1
  • mlserver-mlflow >=1.6.1
  • numpy >=2.0.2
  • pandas >=2.2.3
  • pylint >=3.3.2
  • pytest >=8.3.4
  • scikit-learn >=1.6.0
  • seaborn >=0.13.2
docker-portfolio/Dockerfile docker
  • python 3.12.0-slim-bookworm build
docker-portfolio/requirements.txt pypi
  • mlflow *