https://github.com/agnideeppoddar/netflix-data-cleaning-analysis-and-visualization
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (10.2%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: AgnideepPoddar
- License: mit
- Language: Jupyter Notebook
- Default Branch: main
- Size: 735 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
📊 Netflix Data: Cleaning, Analysis, and Visualization
Analyzing and visualizing Netflix content data (2008-2021) to explore trends in movies, TV shows, and original content using SQL, Python, and Tableau.
📂 Dataset
This dataset contains details about Netflix's vast catalog of movies and TV shows, including release years, genres, ratings, and addition dates. It has been cleaned and structured for analysis.
Dataset Features
- Show ID: Unique identifier for each show
- Type: Movie or TV Show
- Title: Name of the content
- Director: Director of the movie or show
- Cast: Lead actors in the movie or show
- Country: Country where the content was produced
- Date Added: When the content was added to Netflix
- Release Year: Year the content was released
- Rating: Audience rating (e.g., PG-13, TV-MA)
- Duration: Length of the content (minutes for movies, seasons for TV shows)
- Genres: Categories or genres of the content
- Description: Short summary of the movie or show
Use Cases
- 📊 Data Analysis: Identify trends in Netflix content additions
- 🎥 Genre Trends: Explore most popular content categories
- 🕒 Release Year Patterns: Find how Netflix's content library evolved
- 🌍 Country-Based Insights: Discover content distribution across countries
- 🔍 Rating Analysis: Analyze the distribution of content ratings (e.g., PG-13, TV-MA) and what it suggests about target audiences
🛠️ Tools & Technologies
- Python, SQL, Excel: Data cleaning and analysis
- PostgreSQL: Data storage and processing
- Tableau: Data visualization and dashboard creation
- Jupyter Notebook / VS Code: Development environment
- Pandas & NumPy: Data manipulation
- Matplotlib & Seaborn: Data visualization
🔍 Data Cleaning Process
To ensure high-quality analysis, the dataset is cleaned using PostgreSQL with the following steps:
🛑 Handling Null Values
- Fill missing values in key columns (e.g., Director, Cast)
- Remove rows with excessive missing data
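The null-handling steps above can be sketched roughly as follows. This is a minimal illustration only, not the repository's actual cleaning script: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, and the table name `netflix`, the column name `cast_members`, the placeholder `'Not Given'`, and the "three missing fields" threshold are all assumptions.

```python
import sqlite3

# Hypothetical table and sample rows; the repository's real schema may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE netflix ("
    "show_id TEXT, title TEXT, director TEXT, cast_members TEXT, country TEXT)"
)
conn.executemany(
    "INSERT INTO netflix VALUES (?, ?, ?, ?, ?)",
    [
        ("s1", "Title A", "Director A", "Actor A", "India"),
        ("s2", "Title B", None, "Actor B", "Japan"),
        ("s3", "Title C", None, None, None),
    ],
)

# Remove rows with excessive missing data (threshold of 3 is an assumption).
# In SQLite, (col IS NULL) evaluates to 0 or 1, so the sum counts missing fields.
# PostgreSQL would need an explicit cast, e.g. (director IS NULL)::int.
conn.execute(
    """
    DELETE FROM netflix
    WHERE (director IS NULL) + (cast_members IS NULL) + (country IS NULL) >= 3
    """
)

# Fill the remaining gaps in key columns with an explicit placeholder value.
conn.execute(
    """
    UPDATE netflix
    SET director = COALESCE(director, 'Not Given'),
        cast_members = COALESCE(cast_members, 'Not Given')
    """
)

rows = conn.execute(
    "SELECT show_id, director, cast_members FROM netflix ORDER BY show_id"
).fetchall()
```

Deleting before filling matters here: once placeholders are written, the sparse rows can no longer be told apart from complete ones.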
📌 Removing Duplicates
- Identify and remove duplicate records
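One possible way to identify and remove duplicates is sketched below, again using `sqlite3` as a stand-in for PostgreSQL. The table name `netflix` and the choice of `show_id` as the duplicate key are assumptions, and the `rowid` trick is SQLite-specific; in PostgreSQL a `ROW_NUMBER()` window or `ctid` would typically play that role.

```python
import sqlite3

# Hypothetical table with one repeated show_id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE netflix (show_id TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO netflix VALUES (?, ?)",
    [("s1", "Title A"), ("s2", "Title B"), ("s1", "Title A")],
)

# Step 1: identify keys that occur more than once.
duplicates = conn.execute(
    """
    SELECT show_id, COUNT(*) AS n
    FROM netflix
    GROUP BY show_id
    HAVING COUNT(*) > 1
    """
).fetchall()

# Step 2: keep the first physical row per show_id and delete the rest.
conn.execute(
    """
    DELETE FROM netflix
    WHERE rowid NOT IN (SELECT MIN(rowid) FROM netflix GROUP BY show_id)
    """
)

remaining = conn.execute(
    "SELECT show_id, title FROM netflix ORDER BY show_id"
).fetchall()
```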
📊 Populating Missing Rows
- Infer missing values using available data
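The README does not say which values are inferred or how. As one hypothetical illustration, a missing country could be filled from the country most often associated with the same director elsewhere in the data. The field names and the inference rule below are assumptions, not the project's documented method.

```python
from collections import Counter, defaultdict

# Hypothetical records; None marks a missing value.
records = [
    {"show_id": "s1", "director": "Director A", "country": "India"},
    {"show_id": "s2", "director": "Director A", "country": "India"},
    {"show_id": "s3", "director": "Director A", "country": None},
    {"show_id": "s4", "director": "Director B", "country": None},
]


def fill_country_from_director(rows):
    """Fill a missing country with the director's most frequent known country."""
    counts = defaultdict(Counter)
    for row in rows:
        if row["director"] and row["country"]:
            counts[row["director"]][row["country"]] += 1

    filled = []
    for row in rows:
        country = row["country"]
        # Only infer when the director has at least one known country.
        if country is None and row["director"] in counts:
            country = counts[row["director"]].most_common(1)[0][0]
        filled.append({**row, "country": country})
    return filled


filled = fill_country_from_director(records)
```

Rows with no supporting evidence (here `s4`) stay missing rather than receiving a guessed value.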
🗑️ Dropping Unneeded Columns
- Remove irrelevant features to improve dataset efficiency
🔍 Splitting Columns
- Extract key details from columns like Duration, Genres, etc.
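A minimal sketch of column splitting, assuming Duration values look like `90 min` or `2 Seasons` and Genres is a comma-separated string. Both formats are assumptions based on the feature descriptions above, and the function names are hypothetical.

```python
import re


def split_duration(duration):
    """Split a duration string into (value, unit); return (None, None) if unparseable."""
    match = re.fullmatch(r"\s*(\d+)\s+(min|Seasons?)\s*", duration)
    if match is None:
        return None, None
    value, unit = match.groups()
    return int(value), ("minutes" if unit == "min" else "seasons")


def split_genres(genres):
    """Split a comma-separated genre string into a list of trimmed genre names."""
    return [genre.strip() for genre in genres.split(",") if genre.strip()]


movie_duration = split_duration("90 min")
show_duration = split_duration("2 Seasons")
genre_list = split_genres("Dramas, International Movies")
```

Separating the number from its unit makes it possible to compare movie runtimes numerically without mixing them with season counts.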
🔹 Detailed cleaning steps and justifications are included in the code comments.
📈 Data Visualization
After cleaning, the dataset is visualized using Tableau to generate insights.
📌 Steps & Implementation
Dataset Exploration
- Load and inspect the raw data
- Identify missing values and duplicates
Data Cleaning with SQL & Pandas
- Perform data transformation, cleaning, and formatting
Exploratory Data Analysis (EDA)
- Find trends in content additions, ratings, and genre popularity
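As a rough illustration of this step, the sketch below counts titles added per year and titles per rating using only the standard library. The sample records, the field names, and the `Month D, YYYY` date format are assumptions; the project itself lists Pandas for this kind of work.

```python
from collections import Counter
from datetime import datetime

# Hypothetical cleaned records.
records = [
    {"type": "Movie", "date_added": "September 25, 2021", "rating": "PG-13"},
    {"type": "TV Show", "date_added": "January 1, 2020", "rating": "TV-MA"},
    {"type": "Movie", "date_added": "March 3, 2021", "rating": "TV-MA"},
]


def additions_per_year(rows):
    """Count how many titles were added in each calendar year."""
    years = Counter()
    for row in rows:
        added = datetime.strptime(row["date_added"].strip(), "%B %d, %Y")
        years[added.year] += 1
    return dict(sorted(years.items()))


def rating_counts(rows):
    """Count how many titles carry each rating label."""
    return Counter(row["rating"] for row in rows)


per_year = additions_per_year(records)
per_rating = rating_counts(records)
```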
Data Visualization with Tableau
- Build interactive dashboards showcasing insights
📊 Project Difficulty Level
Intermediate
🚀 Getting Started
- Clone the repository:
  git clone https://github.com/AgnideepPoddar/Netflix-Data-Cleaning-Analysis-and-Visualization.git
- Run the analysis scripts to explore insights.
🤝 Contributions
Contributions are welcome! Feel free to open issues or submit pull requests.
📜 License
This project is intended for educational purposes and is released under the MIT License.
Owner
- Login: AgnideepPoddar
- Kind: user
- Repositories: 1
- Profile: https://github.com/AgnideepPoddar
GitHub Events
Total
- Push event: 3
- Create event: 2
Last Year
- Push event: 3
- Create event: 2