Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.5%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: Shashvat-Jain
  • License: mit
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 16.2 MB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 9 months ago · Last pushed 9 months ago
Metadata Files
Readme Contributing License Code of conduct Citation

README.md

CO₂ Emissions Prediction from Vehicle Features


Authors: Shashvat Jain
Affiliation: Integrated M.Tech. in Mathematics & Computing, IIT Dhanbad
GitHub: https://github.com/Shashvat-Jain/CO2-predictions-using-Automotive-Features


co2_emissions_ml

End-to-end Python package for analyzing and predicting on-road vehicle CO₂ emissions (g/km) via machine learning.

Features

  • Preprocessing & Feature Engineering: scaling, one-hot encoding, target transformation
  • Baseline Models: linear, polynomial, ridge/lasso, random forest, XGBoost, LightGBM, CatBoost
  • Stacked Ensemble: LightGBM + XGBoost + CatBoost → MLP meta-learner → Ridge residual correction
  • Bayesian Hyperparameter Tuning: Optuna pruners, early stopping
  • Diagnostics & Explainability: parity plots, residual analysis, learning curves, permutation importance, SHAP
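The stacked-ensemble bullet can be pictured with the rough sketch below. It is not the package's implementation: scikit-learn's GradientBoostingRegressor and RandomForestRegressor stand in for LightGBM, XGBoost, and CatBoost, the data is synthetic, and fitting the Ridge correction on the raw features against out-of-fold residuals is an assumption about how the residual step might work.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical): three features, noisy nonlinear target.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(400, 3))
y = 100 + 80 * X[:, 0] + 40 * X[:, 1] ** 2 + rng.normal(0, 2, size=400)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Level 0: tree ensembles standing in for the three boosting libraries.
base_learners = [
    ("gbr", GradientBoostingRegressor(random_state=0)),
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
]

# Level 1: a small MLP meta-learner trained on out-of-fold base predictions.
meta_learner = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(16,), solver="lbfgs",
                 max_iter=2000, random_state=0),
)
stack = StackingRegressor(
    estimators=base_learners, final_estimator=meta_learner, cv=3
)

# Residual correction (assumed design): a Ridge model learns whatever the
# stack still gets wrong, using out-of-fold stack predictions to avoid leakage.
oof_pred = cross_val_predict(stack, X_train, y_train, cv=3)
residual_model = Ridge(alpha=1.0).fit(X_train, y_train - oof_pred)

# Fit the stack on all training data, then add the predicted residual.
stack.fit(X_train, y_train)
final_pred = stack.predict(X_test) + residual_model.predict(X_test)

rmse = float(np.sqrt(np.mean((y_test - final_pred) ** 2)))
print(f"RMSE after residual correction: {rmse:.2f}")
```

Using out-of-fold predictions at both levels keeps the meta-learner and the residual model from seeing predictions made on data the base learners were trained on.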

Key result:

Test set: R² = 0.9830, MAE ≈ 3.08 g/km, RMSE ≈ 8.64 g/km
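For readers unfamiliar with these three metrics, here is a small sketch of their standard definitions in plain Python. The function name `regression_metrics` and the toy values are hypothetical and are not taken from the package or its dataset.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R², MAE, and RMSE for two equal-length sequences."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r ** 2 for r in residuals)              # unexplained variation
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total variation
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(r) for r in residuals) / n,
        "rmse": math.sqrt(ss_res / n),
    }


# Toy values in g/km (illustrative only).
y_true = [200.0, 250.0, 300.0, 350.0]
y_pred = [210.0, 240.0, 300.0, 350.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

RMSE is never smaller than MAE, and the gap widens when a few predictions miss by a large margin, so an RMSE well above the MAE usually points to a small number of large errors.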


📦 Repository Structure

```bash
.
├── README.md
├── LICENSE
├── CITATION.cff
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── DATA_DICTIONARY.md
├── .gitignore
├── environment.yml
├── requirements.txt
├── setup.py
├── Dockerfile
│
├── data/
│   └── New Dataset.csv
│
├── notebooks/
│   └── co2-emissions-predict.ipynb
│
├── src/
│   ├── models
│   └── co2_emissions_ml
│       ├── __init__.py
│       ├── preprocessing.py
│       ├── models.py
│       ├── evaluation.py
│       └── pipeline.py
│
├── tests/
│   └── test_pipeline.py
│
├── scripts/
│   └── train_and_save.py
│
├── Figures/
│   ├── parity_plot.png
│   ├── residual_hist.png
│   ├── qq_plot.png
│   ├── residuals_vs_pred.png
│   ├── mae_decile.png
│   ├── learning_curve.png
│   ├── perm_importance.png
│   ├── shap_summary.png
│   ├── shap_dependence.png
│   └── pipeline_diagram.png
│
├── Slides/
│   └── End Evaluation.pdf
│
└── Reports/
    ├── Split Report
    └── Final Report with plag report.pdf
```


⚙️ Installation

```bash
# From PyPI
pip install co2_emissions_ml

# Or install the latest version from GitHub
pip install git+https://github.com/Shashvat-Jain/CO2-predictions-using-Automotive-Features.git
```

Quickstart

  1. Predict via CLI

```bash
run_co2 \
  --data path/to/your_new_data.csv \
  --model path/to/pretrained_bundle.pkl \
  --output path/to/predictions.csv
```

  • --data (required): input CSV with vehicle features

  • --model (optional): path to serialized bundle.pkl (default: models/bundle.pkl)

  • --output (optional): CSV path for predictions

  • --target (optional): name of the target (dependent-variable) column in the input CSV
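The flags listed above map naturally onto an argparse parser. The sketch below is a hypothetical reconstruction based only on this list; the real `run_co2` entry point may be wired differently, and the `None` defaults for `--output` and `--target` are assumptions, since the README does not state them.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented run_co2 options (hypothetical)."""
    parser = argparse.ArgumentParser(
        prog="run_co2",
        description="Predict vehicle CO2 emissions from a CSV of features.",
    )
    parser.add_argument("--data", required=True,
                        help="input CSV with vehicle features")
    parser.add_argument("--model", default="models/bundle.pkl",
                        help="path to the serialized model bundle")
    parser.add_argument("--output", default=None,
                        help="CSV path for predictions (assumed default: none)")
    parser.add_argument("--target", default=None,
                        help="target column name in the input CSV, if present")
    return parser


# Parse an explicit argument list so the example runs without a shell.
args = build_parser().parse_args(["--data", "vehicles.csv"])
print(args.data, args.model, args.output, args.target)
```

Only `--data` is mandatory here, which matches the list: omitting it makes argparse exit with a usage error, while the other options fall back to their defaults.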

  2. Programmatic API

```python
import pandas as pd
import joblib
from co2_emissions_ml.models import predict_bundle

# Load pre-trained bundle
bundle = joblib.load("models/bundle.pkl")

# Prepare new data
df_new = pd.read_csv("your_new_data.csv")
X_new = df_new.copy()

# Predict
df_new["predicted_CO2"] = predict_bundle(bundle, X_new)
df_new.to_csv("predictions.csv", index=False)
```

🚀 Usage of GitHub Repository

  1. Prepare data: place New Dataset.csv under data/.

  2. Run notebook: open and execute notebooks/co2-emissions-predict.ipynb to reproduce the EDA, model training, and evaluation.

  3. Diagnostics & plots (written to Figures/):

    • Parity plot
    • Residual histogram & Q-Q plot
    • Learning curve
    • Permutation & SHAP importance charts
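As background for the permutation importance chart, the idea can be sketched in a few lines of NumPy: shuffle one feature at a time and measure how much the error grows. The function name, the synthetic data, and the least-squares model below are all hypothetical stand-ins, not the repository's code.

```python
import numpy as np


def permutation_importance_mae(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MAE when each column of X is shuffled in turn."""
    rng = np.random.default_rng(seed)
    baseline = np.mean(np.abs(y - predict(X)))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Break the link between feature j and the target.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            increases.append(np.mean(np.abs(y - predict(X_perm))) - baseline)
        importances[j] = np.mean(increases)
    return importances


# Synthetic data: feature 0 drives the target, feature 1 is pure noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 2))
y = 50.0 * X[:, 0] + rng.normal(scale=1.0, size=500)

# Ordinary least squares with an intercept as a stand-in model.
design = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)


def predict(M):
    return np.column_stack([M, np.ones(len(M))]) @ coef


importances = permutation_importance_mae(predict, X, y)
print(importances)
```

In practice the importances are usually computed on held-out data, so that they reflect what the model relies on to generalize rather than what it memorized.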

Note: The notebook co2-emissions-predict.ipynb contains the complete code for the thesis, whereas the src folder contains only the code for the new pipeline presented in this research.

📊 Results Snapshot

Figure: Predicted vs. True CO₂ Emissions

Figure: Learning Curve

📄 License

This project is licensed under the MIT License. See LICENSE for details.

Owner

  • Name: shashvat_sj
  • Login: Shashvat-Jain
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this work, please cite as below."
authors:
  - family-names: "Jain"
    given-names: "Shashvat "
title: "CO₂ Emissions Prediction from Vehicle Features"
version: "v1.0.0"
doi: "10.5281/zenodo.xxxxxxx"
date-released: "2025-05-28"

GitHub Events

Total
  • Push event: 4
  • Public event: 1
  • Create event: 1
Last Year
  • Push event: 4
  • Public event: 1
  • Create event: 1

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 127 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 1
  • Total maintainers: 1
pypi.org: co2-emissions-ml

End-to-end ML pipeline for vehicle CO2 emissions prediction

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 127 Last month
Rankings
Dependent packages count: 9.0%
Average: 30.0%
Dependent repos count: 50.9%
Maintainers (1)
Last synced: 7 months ago

Dependencies

environment.yml pypi
  • jupyterlab *
requirements.txt pypi
  • catboost >=1.1
  • jupyterlab >=3.6
  • lightgbm >=3.3
  • matplotlib >=3.5
  • numpy >=1.23
  • optuna >=3.0
  • pandas >=1.5
  • scikit-learn >=1.2
  • scipy >=1.7.0
  • seaborn >=0.12
  • shap >=0.40
  • xgboost >=1.7
setup.py pypi
  • catboost >=1.1
  • jupyterlab >=3.6
  • lightgbm >=3.3
  • matplotlib >=3.5
  • numpy >=1.23
  • optuna >=3.0
  • pandas >=1.5
  • scikit-learn >=1.2
  • scipy >=1.7.0
  • seaborn >=0.12
  • shap >=0.40
  • xgboost >=1.7
src/co2_emissions_ml.egg-info/requires.txt pypi
  • catboost >=1.1
  • jupyterlab >=3.6
  • lightgbm >=3.3
  • matplotlib >=3.5
  • numpy >=1.23
  • optuna >=3.0
  • pandas >=1.5
  • scikit-learn >=1.2
  • scipy >=1.7.0
  • seaborn >=0.12
  • shap >=0.40
  • xgboost >=1.7