diamondvaluationprediction

💎 Diamond Valuation Prediction

https://github.com/mindful-ai-assistants/diamondvaluationprediction

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓
CITATION.cff file
Found CITATION.cff file
✓
codemeta.json file
Found codemeta.json file
✓
.zenodo.json file
Found .zenodo.json file
○
DOI references
○
Academic publication links
○
Academic email domains
○
Institutional organization owner
○
JOSS paper metadata
○
Scientific vocabulary similarity
Low similarity (12.2%) to scientific vocabulary

Keywords

jupytrnotebooks knn-clustering latex-code mathematical-modelling oneness-consciousness powrebi preditcion python3 statics streamlit

Last synced: 6 months ago

Repository

💎 Diamond Valuation Prediction

Basic Info

Host: GitHub
Owner: Mindful-AI-Assistants
License: mit
Language: Jupyter Notebook
Default Branch: main
Homepage: https://github.com/Mindful-AI-Assistants/DiamondValuationPrediction
Size: 21.2 MB

Statistics

Stars: 9
Watchers: 1
Forks: 3
Open Issues: 3
Releases: 0

Topics

jupytrnotebooks knn-clustering latex-code mathematical-modelling oneness-consciousness powrebi preditcion python3 statics streamlit

Created over 1 year ago · Last pushed 6 months ago

Metadata Files

Readme Contributing Funding License Code of conduct Citation Codeowners

README.md

[🇧🇷 Português] [🇺🇸 English]

💎 Diamond Valuation Prediction

https://github.com/user-attachments/assets/292156b0-1430-48f7-b6d7-2ce08a7d6fee

📺 Watch on YouTube

About This Project

This repository contains a Python and Jupyter Notebook project developed for the AI Project Showcase Competition 2024, organized by Ready Tensor AI.

The project focuses on analyzing a dataset of diamond characteristics to predict their prices using machine learning techniques, including linear regression and K-Nearest Neighbors (KNN).

For more information and access to the project, please visit the GitHub repository.

Introduction
Data Set
Methodology
Discoveries
Numerical-resource-analysis
Analysis of Categorical Resources
Insights
Recommendations
Conclusion
File Structure
Starting to Clone
Contributing
Git Commands
Data Analysis, Codes and Report
Access the Streamlit Site
QR Codes
Our Team
Code of Conduct
License

This project explores the fascinating world of diamonds and aims to predict their price based on a variety of factors. Our goal is to uncover hidden relationships between diamond characteristics and their value, contributing to a deeper understanding of the diamond market.

The purpose of this predictive analysis is to create a website that determines the price of a diamond based on its characteristics: carat, cut, color, clarity, depth, table, x (length), y (width), and z (depth). When a quick estimate is required, it may not be feasible to measure all of these characteristics. Therefore, a study is necessary to determine the minimum set of characteristics needed to estimate the price accurately.

For the database study, we will use various statistical strategies, including linear regression, and apply chemistry knowledge to formulate mathematical equations that define diamond prices based on their characteristics. The database contains missing values, so we will use the KNN (K-Nearest Neighbors) algorithm both to clean it and to predict the value of diamonds from their characteristics. KNN is a supervised learning algorithm, not a clustering method. To estimate a missing value, KNN calculates the distance between the diamond with the missing value and each diamond with a known value, using the characteristics that are known for both. It then identifies the diamonds closest to the one being analyzed and uses their values to predict the missing one. The same process will be applied to predict the price of diamonds.
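The imputation idea described above can be sketched in plain Python. This is a simplified illustration, not the project's code and not scikit-learn's `KNNImputer`: the function name `knn_impute`, the toy rows, and the assumption that only one column has missing values are all hypothetical.

```python
import math


def knn_impute(rows, target, k):
    """Fill missing (None) values in column `target` with the mean of that
    column over the k nearest rows whose value is known."""
    known = [r for r in rows if r[target] is not None]
    filled = []
    for r in rows:
        if r[target] is not None:
            filled.append(list(r))
            continue

        # Euclidean distance over every column except the one being imputed
        def dist(other):
            return math.sqrt(sum((r[i] - other[i]) ** 2
                                 for i in range(len(r)) if i != target))

        nearest = sorted(known, key=dist)[:k]
        estimate = sum(n[target] for n in nearest) / len(nearest)
        filled.append([estimate if i == target else v for i, v in enumerate(r)])
    return filled


# Toy rows of [carat, x_mm, price]; the numbers are invented for illustration
toy = [
    [0.30, 4.3, 500.0],
    [0.32, 4.4, 560.0],
    [1.00, 6.4, 5000.0],
    [1.05, 6.5, 5400.0],
    [0.31, 4.35, None],  # price is missing
]
result = knn_impute(toy, target=2, k=2)
print(result[-1])  # [0.31, 4.35, 530.0]
```

The two nearest neighbours of the last row are the two small diamonds, so the missing price is filled with their mean. In practice the features would usually be scaled first so that no single column dominates the distance.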

Dataset 📊

The dataset used in this project is "Diamondsvaluesfaltantes.csv" and includes the following columns:

| Column Name | Description |
|---|---|
| carat | Weight of the diamond in carats |
| cut | Quality of the diamond's cut (Ideal, Premium, Very Good, Good, Fair) |
| color | Color of the diamond (D, E, F, G, H, I, J) |
| clarity | Clarity of the diamond (IF, VVS1, VVS2, VS1, VS2, SI1, SI2, I1) |
| depth | Percentage of the diamond's depth |
| table | Percentage of the diamond's table width |
| price | Price of the diamond in US dollars |
| x | Length of the diamond in millimeters |
| y | Width of the diamond in millimeters |
| z | Depth of the diamond in millimeters |

Methodology 🛠️

Loading and Data Exploration

```python
import math
import random

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.impute import KNNImputer
from sklearn.preprocessing import OrdinalEncoder

# Change this to the path of the database on your machine
path = r"DataBases\Diamondsvaluesfaltantes.csv"
diamonds = pd.read_csv(path)

diamonds
```

Visualization of the Linear Correlation Coefficient and Separation of the Dataset, for better KNN Implementation

👇 Below is the number of missing values per column

```python
# Count the missing values in each column
counter = {}
for x in range(diamonds.shape[1]):
    column_name = diamonds.columns[x]
    counter[column_name] = diamonds.shape[0] - len(diamonds[column_name].dropna())

counter_df = pd.DataFrame(list(counter.items()), columns=["Column", "Number of NaN"])
counter_df

# Heat map of the linear correlation between the numerical columns
plt.figure(figsize=(8, 6))
sns.heatmap(diamonds[["carat", "depth", "table", "price", "x", "y", "z"]].corr(),
            vmin=-1, vmax=1, annot=True, cmap="magma")
plt.title("Linear Correlation Coefficient")
plt.show()
```

Three Methods to Estimate Diamond Prices:

Request the Diamond's Mass from the Client:

$$\text{Carat} = \frac{\text{Mass} \ (\text{mg})}{200}$$

When the User Provides the Diamond's Points:

$$\text{Carat} = \frac{\text{Points} \ (\text{pt})}{100}$$
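As a quick check of the two conversions, here is a minimal sketch that uses the equivalences stated in this README (one carat is 200 mg, one point is 0.01 carat). The function names are hypothetical and not part of the project.

```python
MG_PER_CARAT = 200       # one carat is equivalent to 200 mg
POINTS_PER_CARAT = 100   # one point is equivalent to 0.01 carat


def carat_from_mass(mass_mg):
    """Convert a mass in milligrams to carats."""
    return mass_mg / MG_PER_CARAT


def carat_from_points(points):
    """Convert diamond points to carats."""
    return points / POINTS_PER_CARAT


print(carat_from_mass(150))    # 0.75
print(carat_from_points(75))   # 0.75
```

Both calls describe the same stone: 150 mg and 75 points are each 0.75 carat.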

Using Four Elements to Estimate the Carat of the Diamond:

For the third method of estimating the diamond's carat, four elements are required: Length (mm), Width (mm), Depth (mm), and Density ($\frac{mg}{mm^3}$). We use the definition of density to first calculate the diamond's mass:

$$\text{Density} = \frac{\text{Mass}}{\text{Volume}} \rightarrow \text{Mass} = \text{Density} \times \text{Volume}$$

However, we don't have the diamond's volume. To obtain it, we'll break down the volume calculation of an object as follows:

$$\text{Volume} = \text{Length} \times \text{Width} \times \text{Depth}$$

Substituting this into the original formula gives:

$$\text{Mass} = \text{Density} \times (\text{Length} \times \text{Width} \times \text{Depth})$$

Now, we need to find the diamond's carat. To do this, we'll use Formula 1 to estimate the diamond's carat:

$$\text{Carat} = \frac{\text{Mass}(\text{mg})}{200}$$

The general formula becomes:

$$\text{Carat} = \frac{\text{Density} \times \text{Volume}}{200}$$

$$\text{Carat} = \frac{\text{Length} \times \text{Width} \times \text{Depth} \times \text{Density}}{200}$$
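The general formula can be sketched as a small function. The default density of 3.51 mg/mm³ is an assumption (a commonly quoted value for diamond) and is not taken from the project; the function name is hypothetical as well.

```python
ASSUMED_DENSITY_MG_PER_MM3 = 3.51  # assumption: typical density of diamond


def carat_from_dimensions(length_mm, width_mm, depth_mm,
                          density=ASSUMED_DENSITY_MG_PER_MM3):
    """Estimate carat as Length x Width x Depth x Density / 200."""
    volume_mm3 = length_mm * width_mm * depth_mm   # volume of the bounding box
    mass_mg = density * volume_mm3
    return mass_mg / 200                            # one carat is 200 mg


estimate = carat_from_dimensions(4.0, 4.0, 2.5)
print(round(estimate, 3))
```

Because Length × Width × Depth is the volume of the box that encloses the stone, a real cut diamond occupies less than this volume, so the formula should be read as an upper estimate unless a shape correction is applied.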

Feature Engineering

Analysis of the Heat Map Above Based on Price:

The heat map shows that price is only weakly correlated with depth (the total depth percentage) and with table: the correlation is -0.0086 with depth and 0.13 with table. In contrast, price has a strong positive linear correlation with carat (0.92), x (length, 0.89), y (width, 0.89), and z (depth, 0.88).

Based on this heat map analysis, we can conclude that the larger the carat, x (length), y (width), and z (depth), the higher the diamond's price can be.

However, there may be some cases where a diamond has a very high carat but a low price, just as there may be diamonds with a low carat but a high price. This can also happen with x (length), y (width), and z (depth). Because of this, we question the following: how well can the carat, x (length), y (width), and z (depth) determine the value of the diamond? To answer this, we need to derive the Coefficient of Determination.

```python
plt.figure(figsize=(8, 6))
# Squaring the correlation matrix gives the coefficient of determination,
# which lies between 0 and 1
sns.heatmap(diamonds[["carat", "depth", "table", "price", "x", "y", "z"]].corr() ** 2,
            vmin=0, vmax=1, annot=True, cmap="magma")
plt.title("Coefficient of Determination")
plt.show()
```

Analysis of the heat map above based on price:

The heat map shows that carat is the numerical variable that best explains the price, with a coefficient of determination of 0.85. In other words, a linear relationship with carat accounts for about 85% of the variation in price; the remaining 15% is not explained by carat alone. So although price generally rises with carat, the rule is not exact.

For x (length) and y (width), the coefficient of determination is 0.79, and for z (depth) it is 0.78. These values are lower than for carat, so the dimensions may be disregarded if the categorical variables can accurately define the price of the diamond.
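To make the link between the two heat maps concrete, here is a minimal sketch that computes the linear correlation coefficient r and squares it to obtain the coefficient of determination. The helper `pearson_r` and the toy carat and price values are hypothetical; only the 0.92 figure comes from the analysis above.

```python
import math


def pearson_r(xs, ys):
    """Pearson linear correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented toy data, for illustration only
carat = [0.3, 0.5, 0.7, 1.0, 1.5]
price = [400, 1200, 2500, 5000, 11000]

r = pearson_r(carat, price)
r_squared = r ** 2   # share of the price variance explained by a linear fit on carat
print(round(r, 3), round(r_squared, 3))

# The value reported for carat in the heat map: r = 0.92
print(round(0.92 ** 2, 2))  # 0.85
```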

Below, we separate the diamonds database by its categorical variables so that the machine learning process is more effective:

- Cut has 5 classification types: Ideal, Premium, Good, Very Good, and Fair

- Color has 7 classification types: E, I, J, H, F, G, and D

- Clarity has 8 classification types: SI2, SI1, VS1, VS2, VVS2, VVS1, I1, and IF
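Before KNN can measure distances, these categories have to become numbers. The sketch below maps each grade to an integer using an explicit worst-to-best order, read from the order in which the grades are listed in the dataset table. That ordering and the helper `encode_grade` are assumptions for illustration; the notebook itself uses `OrdinalEncoder()` with its default settings, which orders categories by sorting their values rather than by quality.

```python
# Assumed worst-to-best orderings of the categorical grades
CUT_ORDER = ["Fair", "Good", "Very Good", "Premium", "Ideal"]
COLOR_ORDER = ["J", "I", "H", "G", "F", "E", "D"]
CLARITY_ORDER = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]


def encode_grade(value, order):
    """Return the position of `value` in `order` (0 is the lowest grade)."""
    return order.index(value)


diamond = {"cut": "Premium", "color": "E", "clarity": "VS2"}
encoded = [
    encode_grade(diamond["cut"], CUT_ORDER),
    encode_grade(diamond["color"], COLOR_ORDER),
    encode_grade(diamond["clarity"], CLARITY_ORDER),
]
print(encoded)  # [3, 5, 3]
```

With a quality-based order, two grades that are close in quality are also close in distance, which is the property KNN relies on.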

Implementation of K-NN Algorithm

Setting length, width, and/or depth measurements of a diamond equal to 0 as NaN

```python
# Columns from index 7 onward hold the measurements x, y, and z.
# A measurement of 0 mm, or of 30 mm or more, is treated as missing.
for x in range(diamonds.shape[0]):
    for y in range(7, diamonds.shape[1]):
        if diamonds.iloc[x, y] == 0 or diamonds.iloc[x, y] >= 30:
            diamonds.iloc[x, y] = np.nan
diamonds
```

👇 Below is the implementation of K-NN Algorithm in the numerical columns

Note: some books advise choosing K with the formula K = log n, where n is the number of rows in the database. The code below follows this rule, using the natural logarithm rounded to the nearest integer.
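To see what this rule gives in practice, here is a small sketch. The helper `choose_k` and the row counts are hypothetical, and the lower bound of 1 is an added safeguard that is not part of the rule.

```python
import math


def choose_k(n_rows):
    """K = round(ln n), with a floor of 1 so that K is always usable."""
    return max(1, round(math.log(n_rows)))


for n in (100, 1000, 50000):
    print(n, choose_k(n))
# 100 5
# 1000 7
# 50000 11
```

K grows very slowly with the size of the database, so even a large table uses only around ten neighbours.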

```python
# Impute missing numerical values from the K nearest diamonds, with K = round(ln n)
classification = KNNImputer(n_neighbors=round(math.log(diamonds.shape[0])))
diamonds[["carat", "depth", "table", "price", "x", "y", "z"]] = classification.fit_transform(
    diamonds[["carat", "depth", "table", "price", "x", "y", "z"]]
)

diamonds
```

Applying the K-NN Algorithm to the Categorical Columns

```python
# KNN for categorical values: encode the categories as numbers, impute, then decode
encoder = OrdinalEncoder()
diamonds_encoder = encoder.fit_transform(diamonds)

knn_imputer = KNNImputer(n_neighbors=round(math.log(diamonds.shape[0])))
diamonds_imputer = knn_imputer.fit_transform(diamonds_encoder)
diamonds_imputer = pd.DataFrame(diamonds_imputer, columns=diamonds.columns)
diamonds_imputer = encoder.inverse_transform(diamonds_imputer)
```

Angular Coefficient Graphic

### Replacing missing values in the main diamonds database

```python
# Columns 1 to 3 are the categorical columns cut, color, and clarity
for x in range(diamonds.shape[0]):
    for y in range(1, 4):
        if pd.isna(diamonds.iloc[x, y]):
            diamonds.iloc[x, y] = diamonds_imputer[x][y]
diamonds
```

### 👇 Below we are rounding the numerical columns to a consistent precision

```python
# Standardization of numerical columns
diamonds[["carat", "x", "y", "z"]] = round(diamonds[["carat", "x", "y", "z"]], 2)
diamonds[["table", "price"]] = round(diamonds[["table", "price"]])
diamonds["depth"] = round(diamonds["depth"], 1)
diamonds
```

### Coefficient of Determination Graphic

## Price Prediction Model

### Saving the already cleaned database without missing values

```python
path = r"DataBases\Diamonds_clean.csv"
try:
    # Do not overwrite a cleaned file that already exists
    pd.read_csv(path)
    print(f"This dataframe already exists in the directory: {path}")
except FileNotFoundError:
    diamonds.to_csv(path, index=False)
    print(f"Cleaned database added to directory: {path} successfully!!")
```

### Analysis of the Price Relationship of the Numerical Columns

#### **⭕️ IMPORTANT INFORMATION:**

1. **One carat is equivalent to 200 mg**
2. **One point is equivalent to 0.01 carats**

### 👇 The graph below compares the relationship of the length of a diamond with the carat and with the price

```python
plt.figure(figsize=(17, 10))

# Top panel: length against price
plt.subplot(2, 1, 1)
sns.scatterplot(data=diamonds, x="x", y="price")
plt.xlabel("Length (mm)")
plt.ylabel("Price")
plt.gca().spines["right"].set_visible(False)
plt.gca().spines["top"].set_visible(False)
plt.gca().spines["left"].set_visible(False)
plt.grid(axis="y", alpha=0.5)

# Bottom panel: length against carat
plt.subplot(2, 1, 2)
sns.scatterplot(data=diamonds, x="x", y="carat")
plt.xlabel("Length (mm)")
plt.ylabel("Carat")
plt.gca().spines["right"].set_visible(False)
plt.gca().spines["top"].set_visible(False)
plt.gca().spines["left"].set_visible(False)
plt.grid(axis="y", alpha=0.5)

plt.show()
```

### Relationship of a Diamond’s Length with the Carat and Price Graphic

### 👇 The graph below compares the relationship of the width of a diamond with the carat and with the price

```python
plt.figure(figsize=(17, 10))

# Top panel: width against price
plt.subplot(2, 1, 1)
sns.scatterplot(data=diamonds, x="y", y="price")
plt.xlabel("Width (mm)")
plt.ylabel("Price")
plt.gca().spines["right"].set_visible(False)
plt.gca().spines["top"].set_visible(False)
plt.gca().spines["left"].set_visible(False)
plt.grid(axis="y", alpha=0.5)

# Bottom panel: width against carat
plt.subplot(2, 1, 2)
sns.scatterplot(data=diamonds, x="y", y="carat")
plt.xlabel("Width (mm)")
plt.ylabel("Carat")
plt.gca().spines["right"].set_visible(False)
plt.gca().spines["top"].set_visible(False)
plt.gca().spines["left"].set_visible(False)
plt.grid(axis="y", alpha=0.5)

plt.show()
```

### Relationship of a Diamond’s Width with the Carat and Price

![4 Relationship of a diamond’s width with the carat and price](https://github.com/Mindful-AI-Assistants/DiamondValuationPrediction/assets/113218619/a2b83a69-1570-4c76-85ba-3b98726160d4)

### 👇 The graph below compares the relationship of the depth of a diamond with the carat and with the price

```python
plt.figure(figsize=(17, 10))

# Top panel: depth against price
plt.subplot(2, 1, 1)
sns.scatterplot(data=diamonds, x="z", y="price")
plt.xlabel("Depth (mm)")
plt.ylabel("Price")
plt.gca().spines["right"].set_visible(False)
plt.gca().spines["top"].set_visible(False)
plt.gca().spines["left"].set_visible(False)
plt.grid(axis="y", alpha=0.5)

# Bottom panel: depth against carat
plt.subplot(2, 1, 2)
sns.scatterplot(data=diamonds, x="z", y="carat")
plt.xlabel("Depth (mm)")
plt.ylabel("Carat")
plt.gca().spines["right"].set_visible(False)
plt.gca().spines["top"].set_visible(False)
plt.gca().spines["left"].set_visible(False)
plt.grid(axis="y", alpha=0.5)

plt.show()
```

### Relationship of the Depth of a Diamond with the Carat and with the Price

### 👇 The graph below compares the relationship of the carat of a diamond with the price

```python
plt.figure(figsize=(17, 5))
sns.scatterplot(data=diamonds, x="carat", y="price")
plt.xlabel("Carat")
plt.ylabel("Price")
plt.title("Price and Carat Relationship")
plt.gca().spines["right"].set_visible(False)
plt.gca().spines["top"].set_visible(False)
plt.gca().spines["left"].set_visible(False)
plt.grid(axis="y", alpha=0.5)
plt.show()
```

### Relationship of the carat of a diamond with the price

### Dataset:

- [Dataset Diamond Valuation - from Kaggle](https://github.com/Mindful-AI-Assistants/DiamondValuationPrediction/blob/57c0e19bc9bd3efbdd8256442129261df9ac358f/Database/Diamonds_limpa%20(1).csv)

### Data Analysis Report:

- [Data Analysis Report](https://github.com/Mindful-AI-Assistants/DiamondValuationPrediction/blob/86c2111a01c153279e7e8a7744f398041ef5d35b/Data%20Analyze%20Report/Data%20Analyse%20English/Data%20Analyse%20English.pdf)

## 👑 Access the Streamlit Site

🚀 [Tap here and teleport to the Streamlit Site](https://diamondsvalues.streamlit.app/)

## QR Codes

👑 QR Code of the Site on Streamlit

QR Code 1

:octocat: QR Code of the GitHub Repository

QR Code 2

👥 Our Team

🤝 Code of Conduct

We are committed to fostering a welcoming and inclusive community for all team members. We expect everyone to adhere to the following principles:

Be respectful: Treat others with courtesy and respect, regardless of their background, identity, or opinions.
Be constructive: Focus on providing helpful feedback and constructive criticism.
Be open-minded: Be open to different perspectives and ideas.
Be accountable: Take responsibility for your actions and words.
Be inclusive: Promote a welcoming and inclusive environment for everyone.

If you witness any violation of this code of conduct, please contact [your contact information] so we can address the situation appropriately.

Name: 𖤐 Mindful AI ॐ
Login: Mindful-AI-Assistants
Kind: organization
Email: fabicampanari@proton.me
Location: Brazil

Website: https://github.com/Mindful-AI-Assistants
Repositories: 4
Profile: https://github.com/Mindful-AI-Assistants

𖤐 Empowering businesses with AI-driven technologies like Copilots, Agents, Bots and Predictions, alongside intelligent Decision-Making Support 𖤐

Citation (CITATION.cff)

cff-version: 1.2.0
title: 💎 Diamond Valuation Prediction repository
message: If you really want to cite this repository, here's how you should cite it.
type: software
authors:
  - given-names: Mindful-AI-Assistants
repository-code: https://github.com/Mindful-AI-Assistants/DiamondValuationPrediction
license: MIT License

GitHub Events

Total

Issues event: 4
Delete event: 84
Issue comment event: 9
Push event: 84
Pull request event: 162
Fork event: 1
Create event: 82

Last Year

Issues event: 4
Delete event: 84
Issue comment event: 9
Push event: 84
Pull request event: 162
Fork event: 1
Create event: 82

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 0
Total pull requests: 147
Average time to close issues: N/A
Average time to close pull requests: 2 days
Total issue authors: 0
Total pull request authors: 3
Average comments per issue: 0
Average comments per pull request: 0.03
Merged pull requests: 131
Bot issues: 0
Bot pull requests: 52

Past Year

Issues: 0
Pull requests: 52
Average time to close issues: N/A
Average time to close pull requests: 7 days
Issue authors: 0
Pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.1
Merged pull requests: 36
Bot issues: 0
Bot pull requests: 52


Top Authors

Issue Authors

FabianaCampanari (23)

Pull Request Authors

FabianaCampanari (422)
dependabot[bot] (99)
ppvyctor (20)

Top Labels

Issue Labels

Pull Request Labels

dependencies (99) python (99)

Dependencies

package-lock.json npm

package.json npm

@primer/css 21.1.1

.devcontainer/Dockerfile docker

ghcr.io/containerbase/devcontainer 10.1.4 build
python 3.11-slim-buster build

requirements.txt pypi

beautifulsoup4 ==4.10.0
ipywidgets ==7.6.5
jupyter ==1.0.0
keras ==2.6
matplotlib ==3.4.3
mpmath ==1.3.0
nltk ==3.6.6
notebook ==6.4.4
notebook ==6.4.0
numpy ==1.22.0
pandas ==1.3.3
plotly ==5.20.0
psycopg2-binary ==2.9.1
requests ==2.26.0
scikit-learn ==1.0.1
scipy ==1.11.1
seaborn ==0.11.2
spacy ==3.1.0
sqlalchemy ==1.4.23
streamlit ==1.36.0
tensorflow ==2.11.1
