https://github.com/aarya-gupta/used_cars_price_prediction
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (13.4%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: Aarya-Gupta
- License: mit
- Language: Jupyter Notebook
- Default Branch: main
- Size: 1000 Bytes
Statistics
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
Used Cars Price Prediction - README
Overview
This project performs regression analysis on a dataset of used cars to predict their prices. It trains 15 popular machine learning models and compares their performance using R², relative error, and RMSE. The hyperparameters of some of the more complex models are tuned to improve results.
Table of Contents
Features
- Exploratory Data Analysis (EDA)
- Data Preprocessing and Feature Engineering
- Model Training and Hyperparameter Tuning
- Model Evaluation and Comparison
- Prediction and Insights
Usage
- Instructions for running the notebook
- Dependencies
Models Included
- Linear Regression
- Support Vector Machines (SVR and Linear SVR)
- Multi-Layer Perceptron Regressor (MLP)
- Stochastic Gradient Descent (SGD)
- Decision Tree and Random Forest Regressors
- XGBoost and LightGBM
- Gradient Boosting Regressor
- Ridge Regressor
- Bagging Regressor
- ExtraTrees Regressor
- AdaBoost Regressor
- Voting Regressor
Features
1. Dataset
- The dataset is downloaded and preprocessed to remove unnecessary or redundant columns and handle missing values.
- Target variable: price
- Features include year, manufacturer, condition, cylinders, fuel type, odometer reading, transmission type, drive type, vehicle type, and paint color.
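The cleaning step described above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the small in-memory table, the `url` column, and the choice to drop incomplete rows are all assumptions, since the README does not name the raw file or the missing-value strategy.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the raw listings; the real file and its exact
# column names are not given in this README.
raw = pd.DataFrame({
    "url": ["a", "b", "c", "d"],  # assumed example of a redundant column
    "price": [5000, 12000, np.nan, 8000],
    "year": [2010, 2015, 2012, np.nan],
    "manufacturer": ["ford", "toyota", None, "honda"],
    "odometer": [120000.0, 60000.0, 90000.0, 75000.0],
})

# Keep only the target and the feature columns of interest.
keep = ["price", "year", "manufacturer", "odometer"]
df = raw[keep]

# One simple way to handle missing values (an assumption): drop incomplete rows.
df = df.dropna().reset_index(drop=True)

print(df.shape)
```

Dropping rows is the simplest option; imputing missing values instead would keep more data at the cost of extra assumptions.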
2. Preprocessing
- Categorical features are encoded using LabelEncoder.
- Continuous features are scaled using StandardScaler.
- Data is split into training, validation, and testing sets.
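The three preprocessing steps can be sketched roughly as below. The toy data, the column groupings, and the 60/20/20 split ratio are assumptions made for illustration; the README does not state the ratios used. Fitting the scaler on the training split only is a design choice of this sketch, made to avoid leaking validation and test statistics into training.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Toy data with assumed column names.
df = pd.DataFrame({
    "price": [5000, 12000, 8000, 15000, 3000, 9500, 7000, 11000, 4000, 13000],
    "year": [2010, 2015, 2012, 2018, 2005, 2014, 2011, 2016, 2007, 2017],
    "odometer": [120000, 60000, 90000, 30000, 180000,
                 70000, 110000, 50000, 150000, 40000],
    "fuel": ["gas", "gas", "diesel", "hybrid", "gas",
             "diesel", "gas", "hybrid", "gas", "gas"],
})
categorical = ["fuel"]
continuous = ["year", "odometer"]

# Encode each categorical column as integer labels.
for col in categorical:
    df[col] = LabelEncoder().fit_transform(df[col])

# Cast continuous columns to float so the scaled values fit the dtype.
df[continuous] = df[continuous].astype(float)

X = df.drop(columns="price")
y = df["price"]

# Assumed 60/20/20 split: hold out 40%, then halve it into validation and test.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, random_state=0)

# Fit the scaler on the training split only, then apply it to every split.
scaler = StandardScaler().fit(X_train[continuous])
X_train, X_val, X_test = X_train.copy(), X_val.copy(), X_test.copy()
for part in (X_train, X_val, X_test):
    part[continuous] = scaler.transform(part[continuous])
```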
3. Model Evaluation
- Models are evaluated using:
- R² score for goodness of fit
- Relative Error for accuracy
- RMSE for prediction errors
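For reference, the three metrics can be computed as in the sketch below. The README does not define "relative error"; this example assumes it means the sum of absolute errors divided by the sum of absolute true values. The helper name `regression_report` is hypothetical.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score


def regression_report(y_true, y_pred):
    """Return R², relative error (assumed definition), and RMSE."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    r2 = r2_score(y_true, y_pred)
    # Assumed definition: total absolute error relative to total true value.
    rel_err = np.sum(np.abs(y_true - y_pred)) / np.sum(np.abs(y_true))
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    return {"r2": float(r2), "relative_error": float(rel_err), "rmse": float(rmse)}


report = regression_report([10000, 20000, 30000], [11000, 19000, 30000])
print(report)
```

Higher R² is better, while lower relative error and RMSE are better, so the three metrics should be read together when ranking models.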
Usage
1. Install Dependencies
Ensure the following Python libraries are installed:
- Data Handling & Visualization: numpy, pandas, matplotlib
- Modeling: scikit-learn (imported as sklearn), xgboost, lightgbm
- Hyperparameter Tuning: hyperopt
To install the dependencies:
```bash
pip install numpy pandas matplotlib scikit-learn xgboost lightgbm hyperopt
```
2. Run the Project
The main notebook is main.ipynb. Open it in Jupyter Notebook or JupyterLab, and execute the cells sequentially.
- Load the Dataset: Ensure the dataset is available in the specified path. Modify the file path in the notebook if required.
- Explore and Preprocess Data: The notebook performs EDA and data preparation.
- Train and Evaluate Models: Run each model and observe its performance metrics.
- Predict Prices: Use the best-performing model for predictions on unseen data.
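The train, evaluate, and pick-the-best steps above can be sketched as a simple comparison loop. This is an illustrative sketch only: it uses synthetic data from `make_regression` in place of the car dataset, covers four of the listed models with scikit-learn only, and uses arbitrary hyperparameters rather than the notebook's tuned values.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed car features and prices.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0)

# A subset of the listed models; hyperparameters here are arbitrary.
models = {
    "Linear Regression": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "Voting": VotingRegressor([
        ("lr", LinearRegression()),
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
    ]),
}

# Fit each model and score it on the validation split.
results = []
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_val)
    rmse = float(np.sqrt(mean_squared_error(y_val, pred)))
    results.append((name, float(r2_score(y_val, pred)), rmse))

# Rank by validation RMSE, lowest (best) first.
results.sort(key=lambda row: row[2])
for name, r2, rmse in results:
    print(f"{name:<18} R2={r2:.3f} RMSE={rmse:.1f}")

best_name = results[0][0]
```

The model named by `best_name` would then be refit and checked once on the held-out test set before being used for predictions on unseen data.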
Results
The best-performing models based on RMSE, R², and relative error are:
- LightGBM (LGBM)
- Bagging Regressor
- XGBoost (XGB)
Visualizations
- R² scores, relative errors, and RMSE for all models are plotted for comparison.
Contribution and Feedback
Comments and feedback are welcome to improve the implementation and results.
Owner
- Login: Aarya-Gupta
- Kind: user
- Repositories: 1
- Profile: https://github.com/Aarya-Gupta
GitHub Events
Total
- Public event: 1
- Push event: 4
- Pull request event: 1
- Fork event: 2
Last Year
- Public event: 1
- Push event: 4
- Pull request event: 1
- Fork event: 2