Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.5%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: Shashvat-Jain
- License: mit
- Language: Jupyter Notebook
- Default Branch: main
- Size: 16.2 MB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
CO₂ Emissions Prediction from Vehicle Features
Author: Shashvat Jain
Affiliation: Integrated M.Tech. in Mathematics & Computing, IIT Dhanbad
GitHub: https://github.com/Shashvat-Jain/CO2-predictions-using-Automotive-Features
co2_emissions_ml
End-to-end Python package for analyzing and predicting on-road vehicle CO₂ emissions (g/km) via machine learning.
Features
- Preprocessing & Feature Engineering: scaling, one-hot encoding, target transformation
- Baseline Models: linear, polynomial, ridge/lasso, random forest, XGBoost, LightGBM, CatBoost
- Stacked Ensemble: LightGBM + XGBoost + CatBoost → MLP meta-learner → Ridge residual correction
- Bayesian Hyperparameter Tuning: Optuna pruners, early stopping
- Diagnostics & Explainability: parity plots, residual analysis, learning curves, permutation importance, SHAP
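The feature list combines preprocessing, a target transformation and a stacked ensemble with a residual-correction stage. The sketch below is one possible way to wire those ideas together in scikit-learn, not the package's own implementation. Assumptions, all hypothetical: the column names; scikit-learn's gradient boosting, random forest and extra trees regressors standing in for LightGBM, XGBoost and CatBoost; a log transform as the target transformation; and a Ridge corrector fitted on the stack's out-of-fold residuals.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, TransformedTargetRegressor
from sklearn.ensemble import (ExtraTreesRegressor, GradientBoostingRegressor,
                              RandomForestRegressor, StackingRegressor)
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names; the real schema is described in DATA_DICTIONARY.md.
NUMERIC = ["engine_size_l", "cylinders", "fuel_consumption_l_100km"]
CATEGORICAL = ["fuel_type", "transmission"]


def make_preprocessor() -> ColumnTransformer:
    """Standard-scale numeric columns and one-hot encode categorical ones."""
    return ColumnTransformer([
        ("num", StandardScaler(), NUMERIC),
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ])


def make_stack(seed: int = 0) -> TransformedTargetRegressor:
    """Three tree ensembles stacked under an MLP meta-learner, fitted on log(CO2)."""
    # scikit-learn stand-ins for LightGBM, XGBoost and CatBoost (an assumption).
    base = [
        ("gbr", GradientBoostingRegressor(n_estimators=50, random_state=seed)),
        ("rf", RandomForestRegressor(n_estimators=50, random_state=seed)),
        ("et", ExtraTreesRegressor(n_estimators=50, random_state=seed)),
    ]
    # Meta-learner: scaled base-model predictions feed a small MLP.
    meta = make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=seed),
    )
    stack = StackingRegressor(estimators=base, final_estimator=meta, cv=3)
    # Assumed target transformation: train on log(y), predict back in g/km.
    return TransformedTargetRegressor(
        regressor=make_pipeline(make_preprocessor(), stack),
        func=np.log, inverse_func=np.exp,
    )


class ResidualCorrectedStack:
    """Stack prediction plus a Ridge model fitted on the stack's out-of-fold residuals."""

    def __init__(self, seed: int = 0):
        self.seed = seed
        self.stack = make_stack(seed)
        self.corrector = make_pipeline(make_preprocessor(), Ridge(alpha=1.0))

    def fit(self, X: pd.DataFrame, y: np.ndarray) -> "ResidualCorrectedStack":
        # Out-of-fold predictions stop the corrector from fitting in-sample optimism.
        oof = cross_val_predict(make_stack(self.seed), X, y, cv=3)
        self.corrector.fit(X, y - oof)
        self.stack.fit(X, y)
        return self

    def predict(self, X: pd.DataFrame) -> np.ndarray:
        return self.stack.predict(X) + self.corrector.predict(X)


# Made-up toy data standing in for the real dataset.
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "engine_size_l": rng.uniform(1.0, 6.0, n),
    "cylinders": rng.choice([4, 6, 8], n),
    "fuel_consumption_l_100km": rng.uniform(5.0, 20.0, n),
    "fuel_type": rng.choice(["X", "Z", "D", "E"], n),
    "transmission": rng.choice(["A", "M", "AV"], n),
})
y = 23.0 * X["fuel_consumption_l_100km"].to_numpy() + rng.normal(0.0, 5.0, n)

model = ResidualCorrectedStack().fit(X, y)
print(model.predict(X.head()))
```

Fitting the corrector on out-of-fold residuals rather than training residuals is a deliberate choice in this sketch: in-sample residuals of tree ensembles are optimistically small, so a corrector trained on them would learn little.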
Key result (test set): R² = 0.9830, MAE ≈ 3.08 g/km, RMSE ≈ 8.64 g/km
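The three test-set numbers follow the standard definitions: R² = 1 − Σ(y − ŷ)² / Σ(y − ȳ)², MAE = mean |y − ŷ| and RMSE = √(mean (y − ŷ)²). A minimal NumPy sketch (the function name regression_metrics is illustrative, not part of the package) shows how they are computed. Because RMSE squares residuals, an RMSE well above the MAE, as with 8.64 vs 3.08 g/km, is consistent with a small number of vehicles carrying large errors.

```python
import numpy as np


def regression_metrics(y_true, y_pred) -> dict:
    """Return R^2, MAE and RMSE for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    ss_res = np.sum(resid ** 2)                    # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": float(np.mean(np.abs(resid))),
        "rmse": float(np.sqrt(np.mean(resid ** 2))),
    }


m = regression_metrics([200, 250, 300], [210, 240, 300])
print(m)  # r2 = 0.96, mae ≈ 6.67, rmse ≈ 8.16
```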
📦 Repository Structure
```bash
.
├── README.md
├── LICENSE
├── CITATION.cff
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── DATA_DICTIONARY.md
├── .gitignore
├── environment.yml
├── requirements.txt
├── setup.py
├── Dockerfile
│
├── data/
│   └── New Dataset.csv
│
├── notebooks/
│   └── co2-emissions-predict.ipynb
│
├── src/
│   ├── models/
│   └── co2_emissions_ml/
│       ├── __init__.py
│       ├── preprocessing.py
│       ├── models.py
│       ├── evaluation.py
│       └── pipeline.py
│
├── tests/
│   └── test_pipeline.py
│
├── scripts/
│   └── train_and_save.py
│
├── Figures/
│   ├── parity_plot.png
│   ├── residual_hist.png
│   ├── qq_plot.png
│   ├── residuals_vs_pred.png
│   ├── mae_decile.png
│   ├── learning_curve.png
│   ├── perm_importance.png
│   ├── shap_summary.png
│   ├── shap_dependence.png
│   └── pipeline_diagram.png
│
├── Slides/
│   └── End Evaluation.pdf
│
└── Reports/
    ├── Split Report
    └── Final Report with plag report.pdf
```
⚙️ Installation
```bash
# From PyPI
pip install co2-emissions-ml

# Or install the latest version from GitHub
pip install git+https://github.com/Shashvat-Jain/CO2-predictions-using-Automotive-Features.git
```
Quickstart
- Predict via CLI
```bash
run_co2 \
  --data path/to/your_new_data.csv \
  --model path/to/pretrained_bundle.pkl \
  --output path/to/predictions.csv
```
- --data (required): input CSV containing the vehicle features
- --model (optional): path to the serialized bundle.pkl (default: models/bundle.pkl)
- --output (optional): path of the CSV file the predictions are written to
- --target (optional): name of the dependent (target) variable column in the input CSV
- Programmatic API
```python
import pandas as pd
import joblib
from co2_emissions_ml.models import predict_bundle

# Load the pre-trained bundle
bundle = joblib.load("models/bundle.pkl")
# Prepare new data
df_new = pd.read_csv("your_new_data.csv")
X_new = df_new.copy()
# Predict and save
df_new["predicted_CO2"] = predict_bundle(bundle, X_new)
df_new.to_csv("predictions.csv", index=False)
```
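The README does not say what bundle.pkl contains. As a hypothetical illustration only (the dict layout, FEATURES, save_bundle and predict_from_bundle are assumptions, not the package's API), the sketch below shows the joblib save/load round trip that a pre-trained bundle implies, using a small scikit-learn pipeline on made-up data.

```python
import tempfile
from pathlib import Path

import joblib
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical bundle layout: a fitted model plus the feature columns it expects.
FEATURES = ["engine_size_l", "fuel_consumption_l_100km"]


def save_bundle(model, path: Path) -> None:
    """Serialize the model and its expected feature list together."""
    joblib.dump({"model": model, "features": FEATURES}, path)


def predict_from_bundle(bundle: dict, df: pd.DataFrame):
    """Hypothetical helper (not the package's predict_bundle)."""
    # Select the stored columns so extra input columns are ignored.
    return bundle["model"].predict(df[bundle["features"]])


train = pd.DataFrame({
    "engine_size_l": [1.5, 2.0, 3.0, 5.0],
    "fuel_consumption_l_100km": [6.0, 8.0, 10.5, 15.0],
})
y = 23.0 * train["fuel_consumption_l_100km"]
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0)).fit(train, y)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "bundle.pkl"
    save_bundle(model, path)
    bundle = joblib.load(path)

preds = predict_from_bundle(bundle, train.assign(extra_column=1))
print(preds)
```

Storing the feature list next to the model lets the prediction step tolerate extra columns (such as a target column) in the input CSV; whether predict_bundle behaves this way is not stated in the README.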
🚀 Usage of GitHub Repository
Prepare data: place New Dataset.csv under data/.
Run notebook: open and execute notebooks/co2-emissions-predict.ipynb to reproduce the EDA, model training, and evaluation.
Diagnostics & plots: generated in Figures/:
- Parity plot
- Residual histogram & Q-Q plot
- Learning curve
- Permutation & SHAP importance charts
Note: The notebook co2-emissions-predict.ipynb contains the complete code for the thesis, whereas the src/ folder contains only the code for the new pipeline presented in this research.
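Permutation importance, one of the diagnostics listed above, measures how much a held-out score drops when a single feature column is shuffled. The sketch below uses scikit-learn's permutation_importance on made-up data, with hypothetical feature names and a random forest in place of the project's ensemble; it also computes the held-out residuals that parity and residual plots are built from. It is an illustration, not the notebook's code.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Hypothetical toy features: the first drives the target, the second is weak, the third is noise.
feature_names = ["fuel_consumption", "engine_size", "noise"]
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
y = 250.0 + 40.0 * X[:, 0] + 5.0 * X[:, 1] + rng.normal(0.0, 2.0, 500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Shuffle one column at a time on held-out data; the mean drop in R^2
# (the regressor's default score) is that feature's importance.
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)

# Held-out residuals: the raw material for parity, histogram and Q-Q plots.
residuals = y_te - model.predict(X_te)

for idx in np.argsort(result.importances_mean)[::-1]:
    print(f"{feature_names[idx]:>16}: {result.importances_mean[idx]:.3f} "
          f"+/- {result.importances_std[idx]:.3f}")
print(f"residual mean {residuals.mean():.2f}, std {residuals.std():.2f}")
```

Computing importance on the held-out split rather than the training split avoids rewarding features the forest merely memorized.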
📄 License
This project is licensed under the MIT License. See LICENSE for details.
Owner
- Name: shashvat_sj
- Login: Shashvat-Jain
- Kind: user
- Repositories: 4
- Profile: https://github.com/Shashvat-Jain
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this work, please cite as below."
authors:
  - family-names: "Jain"
    given-names: "Shashvat"
title: "CO₂ Emissions Prediction from Vehicle Features"
version: "v1.0.0"
doi: "10.5281/zenodo.xxxxxxx"
date-released: "2025-05-28"
GitHub Events
Total
- Push event: 4
- Public event: 1
- Create event: 1
Last Year
- Push event: 4
- Public event: 1
- Create event: 1
Packages
- Total packages: 1
Total downloads:
- pypi 127 last-month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 1
- Total maintainers: 1
pypi.org: co2-emissions-ml
End-to-end ML pipeline for vehicle CO2 emissions prediction
- Homepage: https://github.com/Shashvat-Jain/CO2-predictions-using-Automotive-Features/
- Documentation: https://co2-emissions-ml.readthedocs.io/
- License: MIT License
Latest release: 1.0.1
published 9 months ago
Maintainers (1)
Dependencies
- jupyterlab *
- catboost >=1.1
- jupyterlab >=3.6
- lightgbm >=3.3
- matplotlib >=3.5
- numpy >=1.23
- optuna >=3.0
- pandas >=1.5
- scikit-learn >=1.2
- scipy >=1.7.0
- seaborn >=0.12
- shap >=0.40
- xgboost >=1.7