regression-algorithms-from-scratch
https://github.com/krishnaaggarwal2003/regression-algorithms-from-scratch
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references: none found
- ○ Academic publication links: none found
- ○ Academic email domains: none found
- ○ Institutional organization owner: no
- ○ JOSS paper metadata: none found
- ○ Scientific vocabulary similarity: low similarity (10.0%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: KrishnaAggarwal2003
- License: MIT
- Language: Jupyter Notebook
- Default Branch: main
- Size: 262 KB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Regression Algorithms from Scratch
This repository demonstrates Linear Regression and Logistic Regression implemented from scratch with NumPy, including detailed training, evaluation, and visualisation. The goal is to provide a clear, educational look at how these foundational machine learning algorithms fit models under the hood.
Contents
- LR_code.ipynb: Linear Regression from scratch (for continuous targets)
- Logistic.ipynb: Logistic Regression from scratch (for binary classification)
1. Linear Regression (LR_code.ipynb)
Overview
- Data Generation: Synthetic data is created with random features, coefficients, and Gaussian noise.
- Model: Implements multivariate linear regression using gradient descent.
- Training: Tracks cost (MSE) and R² score (accuracy) over epochs, with early stopping.
- Evaluation: Reports MSE, MAE, R², and visualizes predictions, residuals, and learned coefficients.
Key Steps
Data Creation:
- Features (X), coefficients (beta), and noise are randomly generated.
- Target (Y) is computed as a linear combination of features plus noise.
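The data-creation steps above can be sketched in a few lines of NumPy. The sizes, coefficient range, and noise scale below are illustrative assumptions, not the notebook's exact values:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 1000, 5  # assumed sizes for illustration

# Random feature matrix, true coefficients, and Gaussian noise
X = rng.normal(size=(n_samples, n_features))
beta = rng.uniform(-10, 10, size=n_features)
noise = rng.normal(scale=1.0, size=n_samples)

# Target is a linear combination of features plus noise
Y = X @ beta + noise
```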
Model Training:
- Custom LinearRegression class with manual gradient descent.
- Updates both coefficients and intercept.
- Early stopping if the cost converges.
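A minimal sketch of the training loop described above. The class name `LinearRegressionGD` and the hyperparameter defaults are hypothetical; the notebook's own `LinearRegression` class may differ in detail:

```python
import numpy as np

class LinearRegressionGD:
    """Illustrative gradient-descent linear regression (not the notebook's exact code)."""

    def __init__(self, lr=0.01, epochs=500, tol=1e-8):
        self.lr, self.epochs, self.tol = lr, epochs, tol

    def fit(self, X, y):
        n, d = X.shape
        self.coef_ = np.zeros(d)
        self.intercept_ = 0.0
        prev_cost = np.inf
        for _ in range(self.epochs):
            y_pred = X @ self.coef_ + self.intercept_
            error = y_pred - y
            cost = np.mean(error ** 2)
            # Early stopping once the MSE cost converges
            if abs(prev_cost - cost) < self.tol:
                break
            prev_cost = cost
            # Gradients of MSE w.r.t. coefficients and intercept
            self.coef_ -= self.lr * (2.0 / n) * (X.T @ error)
            self.intercept_ -= self.lr * (2.0 / n) * error.sum()
        return self

    def predict(self, X):
        return X @ self.coef_ + self.intercept_
```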
Evaluation & Visualization:
- Calculates MSE, MAE, and R² on the test set.
- Plots predicted vs. actual values, residual distribution, and compares true vs. learned coefficients.
Output obtained from Test-data
```
Epoch 0/500, Cost: 2.6241, Accuracy: -181.53%
...
Epoch 499/500, Cost: 0.0003, Accuracy: 99.97%

Range of Y data: -354.16 to 400.74, i.e. 754.9
Mean-squared Error: 3.6998
Mean-Absolute Error: 1.5285
R² score (Accuracy): 99.9647%
```
The model achieved excellent results. With a very high R² score of 99.9647%, it explains almost all of the variance in the Y data. The low Mean Squared Error (3.6998) and Mean Absolute Error (1.5285) relative to the wide range of the Y data (754.9) further confirm the model's high accuracy and small prediction errors.
The graph clearly shows that the blue predicted points cluster tightly around the red "Ideal Fit" line. This strong alignment visually confirms the model's excellent performance, as indicated by the high R² score (99.9647%) and low error metrics previously discussed. The model's predictions are remarkably close to the true values across the entire range of data.
This "Distribution of Residuals" histogram demonstrates the model's excellent performance by showing that its errors are normally distributed and centred around zero. This ideal distribution indicates that most predictions are highly accurate with small, unbiased errors, reinforcing the model's overall robustness and reliability.
This graph visually confirms the model's success in learning the underlying data relationships, as the "Learned Coefficients" (black bars) closely mirror the "Actual Coefficients" (blue bars) in both magnitude and direction for each feature variable. This strong alignment demonstrates the model's high accuracy in identifying the true influence of each feature on the target.
2. Logistic Regression (Logistic.ipynb)
Overview
- Data Generation: Uses make_classification to create a synthetic binary classification dataset, with optional label noise.
- Model: Implements logistic regression with options for L1, L2, or combined regularization.
- Training: Uses gradient descent, tracks loss and accuracy, and supports early stopping.
- Evaluation: Reports classification metrics, confusion matrix, ROC curve, and visualises loss/accuracy curves.
Key Steps
Data Creation:
- Features are standardised, and a bias term is added.
- Optional label noise for realism.
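The preprocessing above might look like the following. The dataset sizes and the 5% noise rate are assumed values for illustration:

```python
import numpy as np
from sklearn.datasets import make_classification

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Standardise features to zero mean and unit variance
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Prepend a bias column of ones
X = np.hstack([np.ones((X.shape[0], 1)), X])

# Optional label noise: flip a small fraction of labels for realism
noise_rate = 0.05  # assumed rate
flip = rng.random(y.shape[0]) < noise_rate
y = np.where(flip, 1 - y, y)
```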
Model Training:
- Custom LogisticRegression class with manual gradient descent.
- Supports L1, L2, and combined regularization.
- Tracks cost and accuracy per epoch.
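A compact sketch of gradient-descent logistic regression with the regularization options described above. The class name `LogisticRegressionGD` and defaults are hypothetical, and this version assumes the bias is already included as a column of ones in X:

```python
import numpy as np

class LogisticRegressionGD:
    """Illustrative logistic regression with optional L1/L2 penalties."""

    def __init__(self, lr=0.1, epochs=2000, l1=0.0, l2=0.0):
        self.lr, self.epochs, self.l1, self.l2 = lr, epochs, l1, l2

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def fit(self, X, y):
        n, d = X.shape
        self.w = np.zeros(d)
        for _ in range(self.epochs):
            p = self._sigmoid(X @ self.w)
            # Gradient of cross-entropy loss plus L1/L2 penalty terms
            grad = (X.T @ (p - y)) / n + self.l1 * np.sign(self.w) + self.l2 * self.w
            self.w -= self.lr * grad
        return self

    def predict_proba(self, X):
        return self._sigmoid(X @ self.w)

    def predict(self, X):
        return (self.predict_proba(X) >= 0.5).astype(int)
```

Setting both l1 and l2 to nonzero values gives the combined (elastic-net style) penalty mentioned above.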
Evaluation & Visualization:
- Classification report (precision, recall, f1-score, accuracy).
- Plots: loss curve, accuracy curve, confusion matrix, ROC curve with AUC.
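The confusion-matrix counts and the precision/recall/f1 entries of the classification report reduce to simple comparisons; these helper functions are a sketch, not the notebook's code:

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return tp, tn, fp, fn

def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 for the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```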
Output from Test-data
```
Epoch 0/2000, Cost: 0.9063, Accuracy: 61.18%
...
Epoch 1999/2000, Cost: 0.5265, Accuracy: 88.98%
```
Classification Report
The "Loss Curve for Logistic Regression" illustrates the model's optimisation process. The rapid decrease in loss followed by its convergence to a stable minimum demonstrates effective training and efficient parameter optimisation, indicating the model successfully learned from the data and reached a state of optimal performance.
The plot above shows the progression of model accuracy over 2000 training iterations. Initially, accuracy increases rapidly, indicating that the model is learning effectively during the early phase of training. Around iteration 500, accuracy begins to plateau near 0.89, suggesting the model has reached convergence. After this point, the performance stabilizes with minimal fluctuation, reflecting a well-trained model with consistent accuracy.
Confusion Matrix
The confusion matrix summarises the classification performance of the model on the test dataset:
- True Positives (1 predicted as 1): 1800
- True Negatives (0 predicted as 0): 1721
- False Positives (0 predicted as 1): 291
- False Negatives (1 predicted as 0): 188
The model demonstrates strong classification performance for both classes, with a relatively low number of misclassifications. It handles class 1 slightly better than class 0, as indicated by fewer false negatives. This matrix reinforces the overall high accuracy observed during training.
ROC curve
The ROC (Receiver Operating Characteristic) curve visualises the model's diagnostic ability across various threshold settings. The curve shows a strong upward trajectory with an Area Under the Curve (AUC) of 0.89, indicating that the model has high discriminative power in distinguishing between the two classes. An AUC close to 1.0 reflects a robust classifier, and the observed value of 0.89 suggests that the model maintains an excellent balance between sensitivity (true positive rate) and specificity (false positive rate).
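The AUC described above has an equivalent probabilistic reading: it is the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one. A sketch computing it that way (not the notebook's method, which plots the ROC curve):

```python
import numpy as np

def auc_score(y_true, scores):
    """AUC as P(score of random positive > score of random negative); ties count half."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```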
3. Educational Value
- No high-level model fitting: All learning logic is implemented manually.
- Step-by-step: Each notebook walks through data creation, model logic, training, and evaluation.
- Visualisation: Plots help interpret model performance and learning dynamics.
4. Requirements
- Python 3.x
- NumPy
- scikit-learn
- matplotlib
- seaborn
- tqdm
- pandas
Install requirements with:

```bash
pip install numpy scikit-learn matplotlib seaborn tqdm pandas
```
5. License
This repository is licensed under the MIT License. It is intended for educational and research purposes and demonstrates the inner workings of linear and logistic regression, including gradient descent, regularisation techniques, and performance evaluation metrics.
Owner
- Login: KrishnaAggarwal2003
- Kind: user
- Repositories: 1
- Profile: https://github.com/KrishnaAggarwal2003
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this code, please cite it using the metadata below."
title: "Regression Algorithms from Scratch"
authors:
- family-names: Aggarwal
given-names: Krishna
affiliation: Your Affiliation or University
date-released: 2025-05-28
version: "1.0.0"
repository-code: https://github.com/KrishnaAggarwal2003/Regression-Algorithms-from-Scratch
license: MIT
GitHub Events
Total
- Push event: 12
- Create event: 2
Last Year
- Push event: 12
- Create event: 2