https://github.com/faisalhakimi22/telecom-churn-prediction
Telecom churn prediction and analysis project using Python libraries
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found codemeta.json file)
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (9.5%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: Faisalhakimi22
- License: mit
- Language: Jupyter Notebook
- Default Branch: main
- Size: 1.94 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Telecom Churn Prediction and Analysis
Created By: Faisal Hakimi
Date: May 10, 2024
Abstract
This project explores and analyzes a Telecom Churn dataset to understand the factors contributing to customer churn and to develop a predictive model for forecasting it. The analysis includes data preprocessing, exploratory data analysis (EDA), feature engineering, predictive modeling using various algorithms, model tuning, interpretation of results, and recommendations for reducing churn.
Introduction
Customer churn is a significant challenge for telecom companies, leading to revenue loss and impacting customer acquisition costs. This project aimed to explore and analyze factors contributing to customer churn in a telecom company's dataset. The goal was to develop a predictive model that identifies customers at high risk of churn with high accuracy. Techniques from Python's data manipulation, visualization, and predictive modeling libraries were employed.
Data Description
The Telecom Churn dataset comprised information on customers, including:
- CustomerID: unique identifier
- Gender: Male or Female
- Age: customer age
- Tenure: months with the company
- ServiceCalls: number of customer service calls
- MonthlyCharges: monthly bill amount
- TotalCharges: total amount charged
- Churn: customer churn status (Yes or No)
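As a rough illustration of how a table with these columns might be prepared, here is a minimal sketch. The rows are synthetic stand-ins, not data from the project, and the `clean` helper is hypothetical. It assumes that TotalCharges arrives as text with occasional blank entries, which is one common reason for missing values in that column, and that Churn is recoded to 1/0 for modeling.

```python
import pandas as pd

# Synthetic stand-in rows using the column names listed above.
# These values are invented for illustration only.
raw = pd.DataFrame({
    "CustomerID": ["C001", "C002", "C003", "C004"],
    "Gender": ["Male", "Female", "Female", "Male"],
    "Age": [34, 52, 27, 45],
    "Tenure": [1, 72, 0, 24],
    "ServiceCalls": [3, 0, 1, 5],
    "MonthlyCharges": [70.5, 99.9, 20.0, 55.25],
    # Assumption: the column is read as text and may contain blanks.
    "TotalCharges": ["70.5", "7192.8", " ", "1326.0"],
    "Churn": ["Yes", "No", "No", "Yes"],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: fix dtypes, drop rows with missing TotalCharges."""
    out = df.copy()
    # Unparseable entries (such as blanks) become NaN instead of raising.
    out["TotalCharges"] = pd.to_numeric(out["TotalCharges"], errors="coerce")
    out = out.dropna(subset=["TotalCharges"])
    # Encode the target as 1 (churned) / 0 (stayed).
    out["Churn"] = out["Churn"].map({"Yes": 1, "No": 0})
    return out.reset_index(drop=True)


cleaned = clean(raw)
print(cleaned.dtypes)
```

Dropping rows is reasonable only when few values are missing; with many gaps, imputation would be the safer choice.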
Methodology
The project followed these steps:
- Data Preprocessing: Missing values were handled, data types converted, and necessary transformations performed.
- Exploratory Data Analysis (EDA): The distribution of key variables and their relationships, especially with churn, were analyzed. Insights were visualized with charts and plots.
- Feature Engineering: New features potentially impacting churn were derived, and relevant features were selected for model building.
- Predictive Modeling: Data was split into training and testing sets. Multiple models were chosen for churn prediction (Logistic Regression, Random Forest, SVM, AdaBoost, XGBoost, K-Nearest Neighbors). Models were trained and evaluated with accuracy, precision, recall, F1-score, and ROC-AUC metrics.
- Model Tuning: Model parameters were optimized through cross-validation and grid search techniques to improve performance.
- Interpretation and Conclusion: Model results were interpreted to identify key churn predictors. Actionable insights and recommendations for churn reduction were provided.
- Presentation: A presentation summarizing findings was created for a technical audience with clear visualizations and explanations.
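The modeling and tuning steps above can be sketched roughly as follows. This is an illustrative outline, not the project's notebook code: the data is generated with scikit-learn's `make_classification` as a stand-in for the churn features, only two of the listed models are shown, and the parameter grid, split ratio, and `evaluate` helper are assumptions chosen for brevity.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in data (roughly 26% positives).
X, y = make_classification(n_samples=600, n_features=8, n_informative=4,
                           weights=[0.74, 0.26], random_state=0)

# Stratify so both splits keep a similar churn share.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}


def evaluate(model):
    """Hypothetical helper: fit on the training split, score on the test split."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    proba = model.predict_proba(X_test)[:, 1]
    return {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
        "f1": f1_score(y_test, pred, zero_division=0),
        "roc_auc": roc_auc_score(y_test, proba),
    }


results = {name: evaluate(model) for name, model in models.items()}

# Grid search with cross-validation; the grid values are illustrative.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    scoring="roc_auc",
    cv=3,
)
grid.fit(X_train, y_train)
best_params = grid.best_params_

for name, scores in results.items():
    print(name, {k: round(v, 3) for k, v in scores.items()})
print("best params:", best_params)
```

With imbalanced classes, accuracy alone can look good even when few churners are caught, which is one reason to report recall, F1, and ROC-AUC alongside it.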
Results and Discussion
Data Preprocessing:
- Missing values for Total Charges (11 instances) were identified and removed due to the small number.
Exploratory Data Analysis (EDA):
- Customer Churn: The analysis reported an average churn rate in line with industry trends (around 1.9% to 2%). This figure is not the dataset's overall churn share: 74% of customers did not churn, so about 26% did.
- Contract Length and Churn: Month-to-month contracts exhibited a positive correlation with churn, indicating customers are more likely to switch providers with this flexibility. Two-year contracts showed a negative correlation with churn, suggesting these contracts lock customers in and reduce churn.
- Services and Churn: Services like online security, streaming TV, online backup, and tech support (without internet connection) had a negative correlation with churn. This suggests customers who value these add-on services are more likely to remain with the company.
- Demographics: The customer base was relatively balanced by gender (around 50% male and 50% female). Most customers were younger, with only 16% being senior citizens. Roughly 50% of customers had partners, and 30% had dependents. There was no significant difference in these distributions by gender or senior citizen status.
- Customer Account Information: Tenure distribution showed concentrations of customers at short tenures (1 month) and at long tenures (around 72 months), potentially reflecting different contract lengths. The majority of customers (74%) did not churn, indicating a class imbalance in the data (more non-churning customers).
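Correlations like the ones described above are often computed by one-hot encoding the categorical columns and correlating each indicator with a binary churn flag. The sketch below shows that idea on a tiny invented table. The column names `Contract` and `OnlineSecurity` and all values are assumptions for illustration, so the resulting numbers say nothing about the real dataset.

```python
import pandas as pd

# Invented toy rows; column names are assumed, not taken from the dataset.
toy = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "Two year", "Two year",
                 "One year", "Month-to-month", "Two year", "Month-to-month"],
    "OnlineSecurity": ["No", "No", "Yes", "Yes", "Yes", "No", "Yes", "No"],
    "Churn": ["Yes", "Yes", "No", "No", "No", "Yes", "No", "No"],
})

# Class balance: share of churners vs. non-churners.
class_share = toy["Churn"].value_counts(normalize=True)

# One-hot encode the categorical features as 0.0/1.0 indicator columns.
encoded = pd.get_dummies(toy, columns=["Contract", "OnlineSecurity"],
                         dtype=float)
encoded["Churn"] = (encoded["Churn"] == "Yes").astype(float)

# Pearson correlation of every indicator column with the churn flag.
churn_corr = (encoded.corr()["Churn"]
              .drop("Churn")
              .sort_values(ascending=False))

print(class_share)
print(churn_corr)
```

A positive value means the category appears more often among churners, a negative value less often. These are associations only and do not show that a contract type or service causes customers to stay or leave.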
Conclusion
This project analyzed a telecom churn dataset to develop a churn prediction model. Key factors influencing churn were identified, including contract length and value-added services. The findings provide insights that telecom companies can use to develop targeted retention strategies and reduce customer churn.
This project was created using the following technologies:
- Python: Programming language used for development.
- Pandas: Library for data manipulation and analysis.
- NumPy: Library for numerical computations.
- Matplotlib: Library for creating static, animated, and interactive visualizations.
- Scikit-Learn: Machine learning library.
- Jupyter Notebook: Interactive environment for the analysis notebooks.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Owner
- Name: Faisal Hakimi
- Login: Faisalhakimi22
- Kind: user
- Location: Pakistan
- Website: https://medium.com/@faisalh5556
- Repositories: 1
- Profile: https://github.com/Faisalhakimi22
Computer Science | Aspiring Data Analyst | AI Enthusiast | Machine Learning
GitHub Events
Total
- Watch event: 1
Last Year
- Watch event: 1