practicalstats-pucsp-2024
Statistical Measures in Python - Age and Salary Analysis
https://github.com/fabianacampanari/practicalstats-pucsp-2024
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.6%) to scientific vocabulary
Keywords
Keywords from Contributors
Repository
Statistical Measures in Python - Age and Salary Analysis
Basic Info
- Host: GitHub
- Owner: FabianaCampanari
- License: MIT
- Language: Jupyter Notebook
- Default Branch: main
- Homepage: https://github.com/FabianaCampanari/statisticalMeasures-python-
- Size: 61.6 MB
Statistics
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 3
- Releases: 0
Topics
Metadata Files
README.md
✍🏻 Practical Statistics and Probability in Python and Excel
University of Data Science and Artificial Intelligence - PUC-SP - 2nd Semester/2024
🎶 Prelude Suite no.1 (J. S. Bach) - Sound Design Remix 🖤
https://github.com/user-attachments/assets/867282f7-2962-4957-a080-12fd42151ebd
📺 For better resolution, watch the video on YouTube.
Statistics and Probability
This repository, created by Fabiana 🚀 Campanari in the 2nd semester of 2024, consolidates the materials and code developed for the Statistics and Probability course within the Data Science and Artificial Intelligence program at PUC-SP, under the guidance of Professor Eric Bacconi Gonçalves. It is designed to support hands-on learning through exercises, scripts, and datasets.
Repository Contents:
- Python Scripts: This section includes scripts for a wide range of statistical analyses, covering key topics such as distributions, population and sample concepts, and hypothesis testing. Calculations include:
- Central Tendency: Mean, median, and mode
- Dispersion: Standard deviation, variance, and range
- Positional Measures: Percentiles and quartiles (Q1, Q2, Q3)
- Distribution Shape: Skewness and kurtosis
- Confidence Intervals for estimating population parameters
- Correlation and Covariance for bivariate analysis
- Practical Exercises: Available in Python and Excel, these exercises provide hands-on practice in calculating statistical measures and applying concepts, including:
- Analysis of Variance (ANOVA): Comparing means across multiple groups
- Hypothesis Testing: Null hypothesis (H0) tests, such as t-tests (one-sample, independent, and paired), ANOVA, and chi-square tests
- Regression Analysis: Linear regression models for predictive analysis
- Probability: Exercises covering probability distributions, expected value, and variance
- Statistical Tests: This section includes implementations of statistical tests tailored to analyze variables like age and salary, categorized by region and educational level. Each test includes the process of setting up and testing the null hypothesis (H0) for statistical significance.
- Support Materials: Supplementary documentation on probability, relevant datasets, and homework assignments to reinforce key concepts.
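Confidence intervals, distribution shape, and the bivariate measures listed above are not walked through in the code sections further down. A minimal sketch on synthetic data (the column names mirror the course dataset, but the values here are made up):

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in for the course dataset (values are illustrative)
rng = np.random.default_rng(2903)
data = pd.DataFrame({
    "idade": rng.integers(20, 60, size=36),
    "salario": rng.normal(11, 4, size=36).round(2),
})

# 95% confidence interval for the mean salary, using the t distribution
mean = data["salario"].mean()
sem = stats.sem(data["salario"])
ci_low, ci_high = stats.t.interval(0.95, len(data) - 1, loc=mean, scale=sem)
print(f"95% CI for the mean salary: ({ci_low:.2f}, {ci_high:.2f})")

# Distribution shape
print("Skewness:", data["salario"].skew())
print("Kurtosis:", data["salario"].kurt())

# Bivariate measures
print("Covariance:", data["idade"].cov(data["salario"]))
print("Correlation:", data["idade"].corr(data["salario"]))
```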
Study Topics:
This repository provides a comprehensive foundation in core topics, including Descriptive Statistics, Probability Distributions, Population and Sample, Hypothesis Testing I and II (featuring null hypothesis (H0) testing), and Regression Analysis. It serves as a practical tool for building statistical and analytical skills.
Statistical Measures Analysis in Python:
This repository contains Python scripts for descriptive statistical analysis of employee salary and age data, including analyses for the entire dataset as well as subgrouped by education level and region.
Features:
- Descriptive statistics: mean, median, mode, variance, standard deviation, coefficient of variation (CV), and amplitude (range).
- Grouped analysis: the same statistics calculated by grouping the data by region and education level.
- Designed for students: easy-to-follow code with comments and explanations for each step.
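As a quick illustration of two of the less common measures above, here are the coefficient of variation and the amplitude computed on a small made-up salary series (not the course dataset):

```python
import pandas as pd

salarios = pd.Series([4.0, 7.5, 9.2, 11.1, 15.3])  # illustrative values

cv = salarios.std() / salarios.mean()        # coefficient of variation (relative dispersion)
amplitude = salarios.max() - salarios.min()  # amplitude (range)

print(f"CV: {cv:.2%}")
print(f"Amplitude: {amplitude:.1f}")
```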
Dataset:
The dataset used in this analysis contains employee details, including their age, salary, region of origin (reg_proc), and education level (grau_instrucao). Click here to get the Dataset
Getting Started:
To run this script, ensure you have the following:
Python 3 installed.
Necessary libraries (pandas) installed.
An Excel file containing the dataset in the appropriate format.
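A quick way to check that the prerequisites above are in place (openpyxl is assumed here because pandas uses it as the engine for reading .xlsx files):

```python
import importlib.util

# Report which of the required packages are importable in the current environment
for pkg in ("pandas", "openpyxl"):
    if importlib.util.find_spec(pkg) is None:
        print(f"{pkg}: missing (install with: pip install {pkg})")
    else:
        print(f"{pkg}: OK")
```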
Codes:
1. Statistical Measures
👇 Copy code
```python
# Import the necessary library
import pandas as pd

# Load the dataset into a DataFrame and display the first rows
file_path = 'add_your_dataset_path_here'
df = pd.read_excel(file_path)
print(df.head())

# --- Descriptive statistics for 'salario' (salary) ---
print("Descriptive statistics for 'salario':")
print(df['salario'].describe())

# Range (amplitude) of the 'salario' column
ampl_salario = df['salario'].max() - df['salario'].min()
print("\nAmplitude of 'salario':", ampl_salario)

# Mode of the 'salario' column
moda_salario = df['salario'].mode()[0]
print("\nMode of 'salario':", moda_salario)

# Variance of the 'salario' column
var_salario = df['salario'].var()
print("\nVariance of 'salario':", var_salario)

# Coefficient of variation (CV) for 'salario'
cv_salario = df['salario'].std() / df['salario'].mean()
print("\nCoefficient of variation (CV) of 'salario':", cv_salario)

# --- Descriptive statistics for 'salario' by 'grau_instrucao' (education level) ---
print("\nDescriptive statistics for 'salario' grouped by 'grau_instrucao':")
print(df.groupby('grau_instrucao')['salario'].describe())

# Range (amplitude) of 'salario' by 'grau_instrucao'
ampl_salario_grau = (df.groupby('grau_instrucao')['salario'].max()
                     - df.groupby('grau_instrucao')['salario'].min())
print("\nAmplitude of 'salario' by 'grau_instrucao':")
print(ampl_salario_grau)

# Mode of 'salario' by 'grau_instrucao'
moda_salario_grau = df.groupby('grau_instrucao')['salario'].agg(lambda x: x.mode()[0])
print("\nMode of 'salario' by 'grau_instrucao':")
print(moda_salario_grau)

# Variance of 'salario' by 'grau_instrucao'
var_salario_grau = df.groupby('grau_instrucao')['salario'].var()
print("\nVariance of 'salario' by 'grau_instrucao':")
print(var_salario_grau)

# Coefficient of variation (CV) for 'salario' by 'grau_instrucao'
cv_salario_grau = (df.groupby('grau_instrucao')['salario'].std()
                   / df.groupby('grau_instrucao')['salario'].mean())
print("\nCoefficient of variation (CV) of 'salario' by 'grau_instrucao':")
print(cv_salario_grau)

# Summary of key descriptive statistics
print("\nSummary for 'salario' as a whole:")
print(f"Amplitude: {ampl_salario}")
print(f"Mode: {moda_salario}")
print(f"Variance: {var_salario}")
print(f"Coefficient of variation (CV): {cv_salario}")

print("\nSummary by 'grau_instrucao':")
print(f"Amplitude: \n{ampl_salario_grau}")
print(f"Mode: \n{moda_salario_grau}")
print(f"Variance: \n{var_salario_grau}")
print(f"Coefficient of variation (CV): \n{cv_salario_grau}")
```
2. Sample Selection
👇 Copy code
```python
# Import pandas and numpy libraries
import pandas as pd
import numpy as np

# Read the Excel file into a DataFrame
file_path = 'add_your_dataset_path_here'
df = pd.read_excel(file_path)

# --- Sample selection ---

# Simple random sample without replacement with 20 elements
sample = df.sample(20, replace=False)
print(sample)

# Simple random sample without replacement with 20 elements (fixing the random seed)
sample = df.sample(n=20, replace=False, random_state=2903)
print(sample)

# Check the classes of the variable and their proportions
perc_est_civ = df['estado_civil'].value_counts(normalize=True)
print(perc_est_civ)

# Equal-size stratified sample by marital status
sample_strat_equal = df.groupby('estado_civil', group_keys=False).apply(
    lambda x: x.sample(n=10, replace=False, random_state=2903))
print(sample_strat_equal)

# Define the desired total sample size
N = 20

# Proportional stratified sample by marital status
sample_strat_prop = df.groupby('estado_civil', group_keys=False).apply(
    lambda x: x.sample(int(np.rint(N * len(x) / len(df))), random_state=2903)  # proportional allocation
).sample(frac=1, random_state=2903).reset_index(drop=True)  # shuffle the sample
print(sample_strat_prop)

# Check the proportion of each marital status in the sample
perc_est_civ = sample_strat_prop['estado_civil'].value_counts(normalize=True)
print(perc_est_civ)

# Equal-size stratified sample by region of origin
sample_strat_equal = df.groupby('reg_proc', group_keys=False).apply(
    lambda x: x.sample(n=10, replace=False, random_state=2903))
print(sample_strat_equal)

# Equal-size stratified sample by education level
sample_strat_equal = df.groupby('grau_instrucao', group_keys=False).apply(
    lambda x: x.sample(n=10, replace=False, random_state=2903))
print(sample_strat_equal)

# Equal-size stratified sample by region of origin and education level
sample_strat_equal = df.groupby(['reg_proc', 'grau_instrucao'], group_keys=False).apply(
    lambda x: x.sample(n=10, replace=False, random_state=2903))
print(sample_strat_equal)

# Equal-size stratified sample by education level and region of origin,
# capping each stratum at its available size
sample_strat_equal = df.groupby(['grau_instrucao', 'reg_proc'], group_keys=False).apply(
    lambda x: x.sample(n=min(len(x), 10), replace=False, random_state=2903)).reset_index(drop=True)
print(sample_strat_equal)

# Save the stratified sample to a new Excel file
output_path = 'path_to_save_sample/stratified_sample.xlsx'
sample_strat_equal.to_excel(output_path, index=False)
print(f"The stratified sample has been saved to {output_path}")
```
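In the proportional sample above, each stratum receives round(N · stratum size / population size) elements. A small worked example of that allocation (the stratum counts here are made up; rounding can make the total differ slightly from N):

```python
import numpy as np

N = 20                                         # desired total sample size
strata_sizes = {"solteiro": 20, "casado": 16}  # illustrative stratum counts
total = sum(strata_sizes.values())

# round(N * n / total) elements per stratum
alloc = {k: int(np.rint(N * n / total)) for k, n in strata_sizes.items()}
print(alloc)
```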
3. One-Sample t-Test
👇 Copy code
```python
# Exercise 3 – Test the hypothesis that the salary is equal to 12. What is your conclusion?

# Import the pandas and scipy libraries
import pandas as pd
import scipy.stats as stats

# Load the file into Python
file_path = 'add_your_dataset_path_here'
df = pd.read_excel(file_path)
print(df.head())

# Bring only the age variable to perform the test
base_age = df['idade']

# Execute the t-test: H0: age = 32 vs. H1: age ≠ 32
result_t_test = stats.ttest_1samp(base_age, 32)
p_value = result_t_test.pvalue
alpha = 0.05

if p_value < alpha:
    print("We reject the null hypothesis (H0).")
else:
    print("We do not reject the null hypothesis (H0).")

# Remember the mean
print(f"Mean age: {df['idade'].mean()}")

# Execute the t-test: H0: age = 34 vs. H1: age ≠ 34
result_t_test = stats.ttest_1samp(base_age, 34)
p_value = result_t_test.pvalue

# Decision based on p-value
if p_value < alpha:
    print("We reject the null hypothesis (H0).")
else:
    print("We do not reject the null hypothesis (H0).")
```

With a p-value of 0.045, which is less than the significance level of 0.05, we reject the null hypothesis: there is statistically significant evidence that the average age of employees differs from 34.

```python
# Execute the t-test: H0: age = 35 vs. H1: age ≠ 35
result_t_test = stats.ttest_1samp(base_age, 35)
p_value = result_t_test.pvalue

# Decision based on p-value
if p_value < alpha:
    print("We reject the null hypothesis (H0).")
else:
    print("We do not reject the null hypothesis (H0).")
```

With a p-value of 0.2234, which is greater than the significance level of 0.05, we do not reject the null hypothesis: there is not enough evidence to conclude that the average age of employees differs from 35.

Answering question 3 (test the hypothesis that the salary is equal to 12):

```python
import pandas as pd
import scipy.stats as stats

# Load the data
file_path = 'add_your_dataset_path_here'
df = pd.read_excel(file_path)
print(df.head())

# Select the variable of interest (salary)
salaries = df['salario']

# Execute the t-test: H0: salary = 12 vs. H1: salary ≠ 12
result_t_test = stats.ttest_1samp(salaries, 12)
print(result_t_test)

# Get the p-value from the test result and define the significance level
p_value = result_t_test.pvalue
alpha = 0.05

# Define the conclusion based on the p-value and the significance level
if p_value < alpha:
    conclusion = "We reject the null hypothesis (H0)."
else:
    conclusion = "We do not reject the null hypothesis (H0)."

# Display the test result and conclusion
print(f"t-test statistic: {result_t_test.statistic}")
print(f"p-value: {result_t_test.pvalue}")
print(conclusion)
```

Analysis and conclusion:

We tested the null hypothesis (H0) that the average salary of employees is equal to 12 against the alternative hypothesis (H1) that it is different from 12. The results of the t-test were:

- t-test statistic: -4.500727991746298
- p-value: 8.755117588192511e-06

With a p-value of approximately 8.76e-06, far below the significance level of 0.05, we reject the null hypothesis (H0) in favor of H1: the average salary of employees is not equal to 12.
4. Two-Sample t-Test
👇 Copy code
To test the hypothesis that income is equal for the two marital statuses (single and married), we use the t-test for independent samples:

1. Extract income data for single and married individuals.
2. Perform the t-test for independent samples.
3. Interpret the p-value to reject or fail to reject the null hypothesis.

```python
# Install the libraries if necessary (in a notebook): %pip install scipy pandas

import pandas as pd
from scipy import stats

# Load the data from the Excel file
file_path = 'add_your_dataset_path_here'
df = pd.read_excel(file_path)

# Visualize the first rows of the DataFrame
print(df.head())

# Check the mean salary by marital status group
print(df.groupby('estado_civil')['salario'].describe())

# Extract income columns for single ('s') and married ('c') individuals
single_income = df[df['estado_civil'] == 's']['salario']
married_income = df[df['estado_civil'] == 'c']['salario']

# Perform the t-test for independent samples (Welch's t-test, unequal variances)
t_stat, p_value = stats.ttest_ind(married_income, single_income, equal_var=False)

# Display the results of the t-test
print("Results of the t-test:")
print(f"t-statistic: {t_stat}")
print(f"p-value: {p_value}")

# Interpret the result
alpha = 0.05
if p_value < alpha:
    print("Conclusion: We reject the null hypothesis. The incomes differ between the two marital statuses.")
else:
    print("Conclusion: We do not reject the null hypothesis. There is no evidence that the incomes differ.")
```

Results of the t-test:

- t-statistic: 4.567472731259726
- p-value: 6.527014259249644e-06

The p-value (about 6.53e-06) is far below the usual significance level of 0.05, so we reject the null hypothesis: the incomes differ between the two marital statuses (single and married).
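The test above passes `equal_var=False`, i.e. Welch's t-test, which does not assume the two groups share a common variance. A toy comparison against the pooled-variance version (synthetic data, not the course dataset):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(10, 1, 30)   # larger, low-variance group
b = rng.normal(11, 5, 12)   # smaller, high-variance group

t_pooled, p_pooled = stats.ttest_ind(a, b)                  # assumes equal variances
t_welch, p_welch = stats.ttest_ind(a, b, equal_var=False)   # Welch's correction
print(f"pooled p = {p_pooled:.4f}, Welch p = {p_welch:.4f}")
```

With unequal group sizes and variances, the two versions give different p-values; Welch's form is the safer default when the equal-variance assumption is doubtful.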
5. One-Way ANOVA
👇 Copy code
```python
# Import necessary libraries
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Load the data into Python and display the first rows
file_path = 'add_your_dataset_path_here'
df = pd.read_excel(file_path)
print(df.head())

# Check the average salary by education level
print("Average salary by education level:")
print(df.groupby('grau_instrucao')['salario'].describe())

# Create a model to compare salary by education level and perform ANOVA
model = ols('salario ~ grau_instrucao', data=df).fit()
anova_result = sm.stats.anova_lm(model)
print("ANOVA Results:")
print(anova_result)

# Interpret the results
alpha = 0.05
p_value = anova_result['PR(>F)'].iloc[0]
if p_value < alpha:
    conclusion_anova = "There is a significant difference in salaries among education levels."
else:
    conclusion_anova = "There is no significant difference in salaries among education levels."
print(f"Conclusion from ANOVA: {conclusion_anova}")

# Post hoc Tukey test to evaluate pairwise differences
tukey = pairwise_tukeyhsd(endog=df['salario'], groups=df['grau_instrucao'])
print("Post Hoc Tukey Test Results:")
print(tukey.summary())

# Interpret the Tukey results row by row (skipping the header)
print("Interpreting Tukey's test results:")
for group1, group2, meandiff, p_adj, lower, upper, reject in tukey.summary().data[1:]:
    status = "Significant" if reject else "No significant"
    print(f"{status} difference between {group1} and {group2}: "
          f"mean difference = {meandiff:.4f}, p-adj = {p_adj:.4f}")

# Overall conclusion
print("Overall Conclusion:")
print("The results indicate that salary is significantly affected by education level, "
      "with higher education corresponding to higher salaries.")
```
6. Two-Way ANOVA
👇 Copy code
```python
# Import necessary libraries
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Load the data into Python and display the first rows
file_path = 'add_your_dataset_path_here'
df = pd.read_excel(file_path)
print(df.head())

# Check the average salary by education level and marital status
print("\nAverage salary by education level and marital status:")
print(df.groupby(['grau_instrucao', 'estado_civil'])['salario'].describe())

# Model salary by education level and marital status (main effects only), then apply ANOVA
model = ols('salario ~ grau_instrucao + estado_civil', data=df).fit()
anova_result = sm.stats.anova_lm(model)
print("\nANOVA Results:")
print(anova_result)

# Interpret the results
alpha = 0.05
p_value_instrucao = anova_result['PR(>F)']['grau_instrucao']
p_value_civil = anova_result['PR(>F)']['estado_civil']

conclusion_instrucao = ("There is a significant difference in salaries among education levels."
                        if p_value_instrucao < alpha
                        else "There is no significant difference in salaries among education levels.")
conclusion_civil = ("There is a significant difference in salaries among marital statuses."
                    if p_value_civil < alpha
                    else "There is no significant difference in salaries among marital statuses.")
print(f"\nConclusion from ANOVA for Education Level: {conclusion_instrucao}")
print(f"Conclusion from ANOVA for Marital Status: {conclusion_civil}")

# Post hoc Tukey tests to evaluate pairwise differences for each factor
print("\nPost Hoc Tukey Test Results for Marital Status:")
tukey_estado_civil = pairwise_tukeyhsd(endog=df['salario'], groups=df['estado_civil'])
print(tukey_estado_civil.summary())

print("\nPost Hoc Tukey Test Results for Education Level:")
tukey_instrucao = pairwise_tukeyhsd(endog=df['salario'], groups=df['grau_instrucao'])
print(tukey_instrucao.summary())

# Overall conclusion
print("\nOverall Conclusion:")
print("The results indicate that salary is significantly affected by both education level and marital status.")
```
7. Two-way ANOVA with Interaction
👇 Copy code
```python
# Import necessary libraries
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Load the data into Python and display the first rows
file_path = 'add_your_dataset_path_here'
df = pd.read_excel(file_path)
print(df.head())

# Check the average salary by education level and marital status
print("\nAverage salary by education level and marital status:")
print(df.groupby(['grau_instrucao', 'estado_civil'])['salario'].describe())

# Model salary by education level and marital status, including the interaction, then apply ANOVA
model = ols('salario ~ grau_instrucao * estado_civil', data=df).fit()
anova = sm.stats.anova_lm(model)
print("\nANOVA Results:")
print(anova)

# Interpret the results
alpha = 0.05
p_value_instrucao = anova['PR(>F)']['grau_instrucao']
p_value_civil = anova['PR(>F)']['estado_civil']
p_value_interaction = anova['PR(>F)']['grau_instrucao:estado_civil']

# Conclusions from ANOVA
conclusions = {
    "Grau de Instrução": "significant" if p_value_instrucao < alpha else "not significant",
    "Estado Civil": "significant" if p_value_civil < alpha else "not significant",
    "Interação": "significant" if p_value_interaction < alpha else "not significant",
}
for factor, result in conclusions.items():
    print(f"Conclusion from ANOVA for {factor}: the effect is {result}.")

# Post hoc Tukey tests to evaluate pairwise differences for each factor
print("\nPost Hoc Tukey Test Results for Marital Status:")
tukey_estado_civil = pairwise_tukeyhsd(endog=df['salario'], groups=df['estado_civil'])
print(tukey_estado_civil.summary())

print("\nPost Hoc Tukey Test Results for Education Level:")
tukey_instrucao = pairwise_tukeyhsd(endog=df['salario'], groups=df['grau_instrucao'])
print(tukey_instrucao.summary())

# Overall conclusion
print("\nOverall Conclusion:")
print("The results indicate that salary is significantly affected by both education level and marital status,")
print("and there is also a significant interaction between these two factors.")
```
8. Chi-Square Test for One Variable
👇 Copy code
```python
# Import necessary libraries
import pandas as pd
import scipy.stats as stats

# Load the data into a DataFrame and display the first rows
file_path = 'add_your_dataset_path_here'
df = pd.read_excel(file_path)
print(df.head())

# Frequency table for the variable 'reg_proc'
freq_reg_proc = df['reg_proc'].value_counts()
print("\nFrequency of the 'reg_proc' variable:")
print(freq_reg_proc)

# Perform the chi-square goodness-of-fit test
# (by default, H0 assumes all categories are equally frequent)
chi2_stat, p_val = stats.chisquare(freq_reg_proc)
print("\nChi-Square Test Results:")
print(f"Chi-Square Statistic: {chi2_stat}")
print(f"p-value: {p_val}")

# Interpretation and conclusion
alpha = 0.05  # significance level
if p_val < alpha:
    print("Reject the null hypothesis. The regions of origin are not equally distributed in the sample.")
else:
    print("Fail to reject the null hypothesis. There is no evidence that the regions of origin are unequally distributed.")
```
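`stats.chisquare` tests against a uniform distribution by default; to compare the observed counts against other expected proportions, pass them via `f_exp`. A sketch with made-up counts and proportions:

```python
import numpy as np
from scipy import stats

observed = np.array([12, 10, 14])  # illustrative counts per region

# Default: H0 says all categories are equally frequent
chi2_uniform, p_uniform = stats.chisquare(observed)

# Custom H0: expected proportions of 25% / 25% / 50%
f_exp = observed.sum() * np.array([0.25, 0.25, 0.50])
chi2_custom, p_custom = stats.chisquare(observed, f_exp=f_exp)

print(f"uniform H0: p = {p_uniform:.4f}")
print(f"custom H0:  p = {p_custom:.4f}")
```

Note that `f_exp` must sum to the same total as the observed counts, which is why the proportions are scaled by `observed.sum()`.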
9. Chi-Square Test for Independence of Two Variables
👇 Copy code
```python
# Import necessary libraries
import pandas as pd
import scipy.stats as stats

# Load the data into a DataFrame and display the first rows
file_path = 'add_your_dataset_path_here'
df = pd.read_excel(file_path)
print(df.head())

# Create a contingency table for 'grau_instrucao' and 'reg_proc'
contingency_table = pd.crosstab(df['grau_instrucao'], df['reg_proc'])
print("\nContingency Table:")
print(contingency_table)

# Perform the chi-square test of independence
chi2_stat, p_val, dof, expected = stats.chi2_contingency(contingency_table)
print("\nResults of the Chi-Square Test of Independence:")
print(f"Chi-Square Statistic: {chi2_stat}")
print(f"p-value: {p_val}")
print(f"Degrees of Freedom: {dof}")
print("Expected Frequencies:")
print(expected)

# Interpretation and conclusion
alpha = 0.05  # significance level
if p_val < alpha:
    print("\nConclusion: Reject the null hypothesis. The distribution of education levels varies with the region of origin.")
else:
    print("\nConclusion: Fail to reject the null hypothesis. There is no evidence that education level varies with region of origin.")
```
Copyright 2024 Fabiana Campanari. Code released under the MIT license.
Owner
- Name: Fabiana ⚡️ Campanari
- Login: FabianaCampanari
- Kind: user
- Location: Brazil 🇧🇷
- Company: @Mindful-AI-Assistants | @Quantum-Software-Development
- Website: fabicampanari@proton.me
- Twitter: CampanariFabi
- Repositories: 20
- Profile: https://github.com/FabianaCampanari
🇶 AI/ML Dev · Data Scientist (Humanistic AI) · Software & Design · Psych Grad · Quantum Mindset · 🕸️ Seeker of the Unknown 𝚿
GitHub Events
Total
- Issues event: 9
- Watch event: 2
- Delete event: 135
- Push event: 202
- Pull request event: 269
- Create event: 135
Last Year
- Issues event: 9
- Watch event: 2
- Delete event: 135
- Push event: 202
- Pull request event: 269
- Create event: 135
Committers
Last synced: over 1 year ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Fabiana 🚀 Campanari | f****i@g****m | 256 |
| dependabot[bot] | 4****] | 1 |
Issues and Pull Requests
Last synced: 9 months ago
All Time
- Total issues: 8
- Total pull requests: 295
- Average time to close issues: 4 minutes
- Average time to close pull requests: about 3 hours
- Total issue authors: 1
- Total pull request authors: 3
- Average comments per issue: 0.0
- Average comments per pull request: 0.01
- Merged pull requests: 292
- Bot issues: 0
- Bot pull requests: 4
Past Year
- Issues: 8
- Pull requests: 295
- Average time to close issues: 4 minutes
- Average time to close pull requests: about 3 hours
- Issue authors: 1
- Pull request authors: 3
- Average comments per issue: 0.0
- Average comments per pull request: 0.01
- Merged pull requests: 292
- Bot issues: 0
- Bot pull requests: 4
Top Authors
Issue Authors
- FabianaCampanari (9)
Pull Request Authors
- FabianaCampanari (330)
- dependabot[bot] (3)
- imgbot[bot] (2)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- actions/checkout v4 composite
- github/codeql-action/analyze v3 composite
- github/codeql-action/init v3 composite
- actions/checkout v4 composite
- actions/setup-python v4 composite
- pypa/gh-action-pypi-publish release/v1 composite
- actions/checkout v4 composite
- actions/setup-python v5 composite
- matplotlib ==3.4.3
- numpy ==1.22.0
- pandas ==1.3.3
- scikit-learn ==1.5.0
- scipy ==1.7.1
- seaborn ==0.11.2
- statsmodels ==0.13.1