storytelling-2024-25

🎭 Storytelling for Data Science and Artificial Intelligence - PUC-SP University

https://github.com/mindful-ai-assistants/storytelling-2024-25

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file (found)
✓ .zenodo.json file (found)
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (9.0%) to scientific vocabulary

Keywords

artificial-inteligemce bot copilot data-sicence jupiter-notebook latex mathpix mathplotlib numpy oneness-consciousness pandas python3 sympy-library tensorflow

Last synced: 6 months ago

Repository

🎭 Storytelling for Data Science and Artificial Intelligence - PUC-SP University

Basic Info

Host: GitHub
Owner: Mindful-AI-Assistants
License: mit
Language: Jupyter Notebook
Default Branch: main
Homepage: https://github.com/MindfulAI-Copilots-Bots/storytelling
Size: 256 MB

Statistics

Stars: 9
Watchers: 2
Forks: 1
Open Issues: 0
Releases: 0

Created almost 2 years ago · Last pushed 8 months ago

Metadata Files

Readme Contributing Funding License Citation Security

README.md

Storytelling

Storytelling: First, Second, and Third Semesters of 2024/2025

for Data Science and Artificial Intelligence - PUC-SP University

Welcome to the Storytelling repository! This repository is dedicated to the Storytelling course offered in the 1st and 2nd semesters of the 2024/2025 academic year within the Data Science and Humanistic Artificial Intelligence undergraduate program at the Pontifical Catholic University of São Paulo (PUC-SP).

We extend our sincere gratitude to our beloved Professor Rooney Ribeiro Albuquerque Coelho for his invaluable guidance and expertise throughout this course. His dedication to excellence in teaching has been instrumental in shaping our understanding of storytelling and data science.

Introduction to the Course

The Data Science and AI course at PUC-SP provides a comprehensive understanding of data science techniques, tools, and methodologies. The course emphasizes the importance of storytelling in presenting data insights effectively. Students learn to apply various data analysis methods and communicate findings in a compelling manner.

About Storytelling
Projects
Resources
Readings
Python Libraries for Data Science and Artificial Intelligence
How to Run the Code
Contributing
License

About Storytelling

Storytelling is the craft of shaping information into a narrative. In Data Science and Artificial Intelligence, it is essential for communicating findings and insights effectively.

Through Storytelling, students learn to transform raw data into stories that inform, persuade, and inspire. This is crucial because while data analysis can reveal valuable insights, these insights are useless if they cannot be effectively communicated.

The Storytelling discipline in the Data Science and Artificial Intelligence undergraduate program at PUC-SP aims to equip students with the necessary skills to tell effective stories with data.

Key Storytelling concepts include:

Narrative: The structure and flow of the story you are telling with your data.
Data Visualization: The graphical representation of data to highlight trends and patterns.
Context: The background information that helps frame and interpret the data.
Simplicity: The ability to convey complex information in a simple and easy-to-understand manner.

Projects

Here you will find a variety of projects developed during the course. Each project is an opportunity to apply the concepts learned in the classroom and to tell compelling stories with data.

Highlighted Projects

Orange Datamining

Orange is an open-source data visualization and analysis tool widely used in data mining and machine learning projects. It allows for intuitive workflows and visual programming without the need for extensive code.

Projects Developed in Orange

The 23 projects developed using Orange include the following:

Data Exploration and Visualization
- Description: Explore data through various visualizations and understand relationships between variables.
- Objectives: Understand data distribution and variable interactions.
Predictive Modeling
- Description: Apply models like logistic regression and Random Forest to make predictions and evaluate performance.
- Objectives: Create and validate predictive models with performance metrics.
Clustering Techniques
- Description: Utilize clustering techniques to group data into distinct clusters.
- Objectives: Identify patterns and group data based on similarities.
Dataset Classification
- Description: Apply classification algorithms and evaluate their performance.
- Objectives: Classify data accurately and assess the model.
Feature Selection
- Description: Evaluate and select important features for modeling.
- Objectives: Improve model performance by reducing dimensionality.
Time Series Analysis
- Description: Analyze and visualize temporal data using time series charts.
- Objectives: Understand trends and variations over time.
Data Preprocessing and Cleaning
- Description: Clean and transform data to prepare it for modeling.
- Objectives: Ensure data quality before model use.
Sentiment Analysis
- Description: Analyze sentiment in textual data and visualize word frequency.
- Objectives: Identify sentiment patterns in texts.
Regression Analysis
- Description: Apply regression models to predict continuous variables.
- Objectives: Predict continuous values based on independent variables.
Decision Trees
- Description: Create and visualize decision trees for classification or regression.
- Objectives: Explain decision-making based on hierarchical rules.
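Orange builds these analyses visually, so no code is required. For readers curious about what a regression widget computes, the sketch below fits a one-feature ordinary least squares line and reports R² in plain Python. It rests on simplifying assumptions (a single predictor, and toy data chosen so that the fit is exact). It is neither code from the course projects nor Orange's own implementation, and the function names are hypothetical.

```python
from statistics import mean


def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x (one feature)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(xs, ys, intercept, slope):
    """Share of the variance in ys explained by the fitted line."""
    y_bar = mean(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical toy data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
b0, b1 = fit_simple_linear_regression(xs, ys)
print(b0, b1, r_squared(xs, ys, b0, b1))
```

On real data the points will not sit on a line, so R² falls below 1. That value is the kind of performance metric the "Predictive Modeling" and "Regression Analysis" projects report.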

Resources

This repository also contains links to the main libraries and platforms used during the course, which were essential for developing our projects and strengthening our skills in data science and storytelling.

If you're looking for more resources related to storytelling and data science, check out the following:

Additional Resources

Readings

We also provide files of the books studied during the program. These readings complement our classroom learning about the art and science of storytelling.

Python Libraries for Data Science and Artificial Intelligence

1. General Purpose and Core Libraries

Fundamental libraries for data manipulation, mathematical computations, and general support:

NumPy: Python library for working with arrays, with functions for linear algebra, Fourier transforms, and matrix operations.
Pandas: Library for data manipulation and analysis, providing data structures like DataFrames for handling tabular data.
Matplotlib: Comprehensive library for creating static, animated, and interactive visualizations in Python.
Seaborn: High-level interface for creating informative and attractive statistical graphics.
SciPy: Library for scientific and technical computing, providing functions for optimization, integration, interpolation, eigenvalue problems, and other mathematical tasks.
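To make "data manipulation" concrete without installing anything, the sketch below uses only the standard library to reproduce what `df.groupby("region")["sales"].mean()` would return in Pandas. The table, the column names `region` and `sales`, and the helper `mean_by_group` are hypothetical.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical sales table; with Pandas this would be loaded via pd.read_csv(...).
RAW_CSV = """region,sales
North,120
South,80
North,100
South,60
East,90
"""


def mean_by_group(csv_text, key, value):
    """Group rows by the `key` column and average the numeric `value` column."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row[key]].append(float(row[value]))
    # Sort by group key so the result order is deterministic (Pandas sorts by default too).
    return {k: mean(v) for k, v in sorted(groups.items())}


print(mean_by_group(RAW_CSV, "region", "sales"))
```

Pandas performs the same split-apply-combine steps in vectorized code, which is why it is the practical choice for tables of any real size.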

2. Statistical Analysis

Libraries for statistical data analysis and modeling:

Statsmodels: Provides classes and functions for estimating statistical models, as well as conducting statistical tests.
Pingouin: Statistical tests, effect sizes, and Bayesian analysis in Python.
PyMC: Probabilistic programming framework for Bayesian statistical modeling and machine learning.
Scipy.stats: Functions for statistical analysis and hypothesis testing, including distributions, tests, and more.
Reliability: Python package for reliability analysis and statistical modeling.
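These libraries provide statistical tests such as Welch's two-sample t-test, which `scipy.stats.ttest_ind(a, b, equal_var=False)` runs. The minimal sketch below uses the standard `statistics` module to compute only the t statistic and the Welch-Satterthwaite degrees of freedom. It does not compute a p-value, and the two samples are made up.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each sample needs at least two observations")
    # Squared standard error of each mean: sample variance (n - 1 denominator) / n.
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df


# Hypothetical samples, e.g. scores of two groups.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 3, 4, 5, 6]
t, df = welch_t(group_a, group_b)
print(t, df)
```

To turn the statistic into a two-sided p-value, you would evaluate the t distribution's tail, for example `2 * scipy.stats.t.sf(abs(t), df)`. In practice, calling `ttest_ind` directly is simpler.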

3. Machine Learning and Deep Learning

Widely used tools for machine learning and neural networks:

Scikit-learn: A simple and efficient tool for data mining and data analysis, featuring a wide variety of machine learning algorithms.
TensorFlow: An open-source framework for machine learning and deep learning, often used for training neural networks.
Keras: A high-level neural networks API that runs on top of TensorFlow, designed for fast prototyping.
PyTorch: A deep learning framework for flexibility and performance in neural network modeling.
LightGBM: A gradient boosting framework that is particularly effective with large datasets.
XGBoost: A powerful, efficient implementation of gradient boosting algorithms.
CatBoost: A gradient boosting library with built-in handling of categorical features, used to build high-performance machine learning models.
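Scikit-learn's `KNeighborsClassifier` is a common first classifier. To make the mechanics visible, the sketch below implements the same idea in plain Python: k-nearest-neighbor majority voting with Euclidean distance, scored by accuracy on a held-out set. The points, labels, and helper names are hypothetical. A real project would use the library rather than this loop.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, query, k=3):
    """Majority label among the k training points closest to `query` (Euclidean)."""
    # sorted() is stable, so equal distances keep their training order.
    neighbors = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical two-class toy data with a separate test set.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["A", "A", "A", "B", "B", "B"]
test_X = [(1.1, 0.9), (5.1, 5.0)]
test_y = ["A", "B"]

preds = [knn_predict(train_X, train_y, q) for q in test_X]
print(preds, accuracy(test_y, preds))
```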

4. Natural Language Processing (NLP)

Libraries for working with natural language:

SpaCy: Industrial-strength NLP library, known for its speed and accuracy.
Transformers: Hugging Face library providing pretrained state-of-the-art NLP models such as BERT, GPT-2, and many more.
NLTK: A toolkit for working with human language data, providing easy access to text processing libraries.
Gensim: Topic modeling and document similarity analysis.
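NLTK and spaCy begin most text pipelines with tokenization and word counts. Those two steps also underlie the "Sentiment Analysis" project in the Orange list. The sketch below approximates them with a regular-expression tokenizer and `collections.Counter`, and adds a toy lexicon-based sentiment score. The word lists `POSITIVE` and `NEGATIVE` are hypothetical and far too small for real use.

```python
import re
from collections import Counter

# Hypothetical toy sentiment lexicon.
POSITIVE = {"good", "great", "clear", "insightful"}
NEGATIVE = {"bad", "confusing", "boring"}


def tokenize(text):
    """Lowercase and keep runs of ASCII letters (a crude stand-in for a real tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def lexicon_sentiment(tokens):
    """Positive minus negative word count: > 0 leans positive, < 0 leans negative."""
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


text = "The chart was clear and insightful, but the legend was confusing."
tokens = tokenize(text)
print(Counter(tokens).most_common(2), lexicon_sentiment(tokens))
```

Because the pattern matches only ASCII letters, Portuguese words with accents (for example "análise") would be split apart. This is one reason to prefer the tokenizers provided by NLTK or spaCy.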

5. Computer Vision

Libraries and tools for working with image data:

OpenCV: A library for computer vision and machine learning, offering tools for image processing and manipulation.
PyTorch Vision: A collection of computer vision tools integrated with PyTorch for image-based tasks.
TensorFlow Image: Image processing functions for TensorFlow, including resizing, cropping, and filtering.
Keras Applications: Pre-trained deep learning models for computer vision tasks, such as image classification.
Albumentations: An image augmentation library that provides various transformations for image preprocessing.
SimpleCV: A framework for building computer vision applications using Python.
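A small example of the image processing these libraries perform: OpenCV's `cv2.blur(img, (3, 3))` smooths an image with a normalized 3x3 box filter. The sketch below implements that filter in plain Python for a grayscale image stored as a list of rows. Its border handling (averaging only the in-bounds neighbors) is a simplifying assumption that differs from OpenCV's default padding, so edge pixels will not match OpenCV exactly.

```python
def box_blur(image):
    """3x3 mean filter on a grayscale image given as a list of equal-length rows.

    Border pixels average only the neighbors that fall inside the image.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            # Collect the up-to-3x3 neighborhood, clipped at the image edges.
            window = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
            ]
            out[r][c] = sum(window) / len(window)
    return out


# Hypothetical 3x3 image with a single bright pixel in the center.
img = [
    [0, 0, 0],
    [0, 9, 0],
    [0, 0, 0],
]
blurred = box_blur(img)
print(blurred[1][1], blurred[0][0])
```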

6. Data Visualization

Libraries for creating graphs, dashboards, and interactive maps:

Matplotlib: A comprehensive library for creating static, animated, and interactive visualizations.
Seaborn: High-level interface for drawing attractive statistical graphics.
Plotly: A graphing library for creating interactive visualizations.
Bokeh: Visualization library for creating interactive plots and dashboards.
Altair: Declarative visualization library for statistical data visualization.
Dash: A framework for creating interactive, web-based data dashboards.
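Plotting libraries differ in syntax, but a common storytelling technique works the same way in all of them: sort the categories and draw attention to the one that carries the message. In Matplotlib, this usually means giving one bar a distinct color and adding an annotation. The dependency-free sketch below illustrates the idea with a text bar chart. The data and the `highlight` parameter are hypothetical.

```python
def text_bar_chart(data, width=20, highlight=None):
    """Render {label: value} as text bars, largest first, marking the label to emphasize."""
    peak = max(data.values())
    label_w = max(len(k) for k in data)
    lines = []
    for label, value in sorted(data.items(), key=lambda kv: kv[1], reverse=True):
        # Bar length is proportional to the value, with the largest value filling `width`.
        bar = "#" * round(value / peak * width)
        marker = "  <-- key point" if label == highlight else ""
        lines.append(f"{label:<{label_w}} | {bar} {value}{marker}")
    return "\n".join(lines)


sales = {"North": 110, "South": 70, "East": 90}  # hypothetical values
print(text_bar_chart(sales, highlight="North"))
```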

7. Geospatial Analysis and Mapping

Libraries for geospatial data analysis and mapping:

Geopandas: Python library for working with geospatial data, including tools for geometric operations and map visualization.
Shapely: A library for manipulation and analysis of geometric shapes.
Folium: A library for creating interactive maps with Leaflet.js.
Kepler.gl: A powerful geospatial data visualization tool for large-scale data exploration.
Cartopy: A library for cartographic projections and geospatial data visualization.
Pyproj: A library for performing coordinate transformations and projections.
Rasterio: Library for reading and writing geospatial raster data.
OSMnx: Tools for downloading and analyzing street networks from OpenStreetMap.
Geopy: Geocoding library for performing forward and reverse geocoding.
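Distance between coordinates is one of the most frequent geospatial computations. Geopy's `geodesic` (in `geopy.distance`) measures it on an ellipsoidal Earth model. The minimal sketch below instead uses the haversine formula, which assumes a spherical Earth with a mean radius of 6371 km, so its results differ slightly from `geodesic`.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = radians(lat2 - lat1)
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# One degree of longitude along the equator.
print(round(haversine_km(0, 0, 0, 1), 2))
```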

8. Document Preparation and Collaboration

Tools for preparing technical documents:

Overleaf: Online LaTeX editor for collaborative writing of technical documents.
Jupyter Notebooks: Web-based interactive environment for data analysis, combining code and rich text.

9. Automated Machine Learning (AutoML)

Libraries that simplify model training:

H2O.ai: Open-source machine learning and AutoML platform for building models at scale.
TPOT: AutoML tool based on genetic algorithms to optimize machine learning pipelines.
Auto-sklearn: AutoML system for scikit-learn that automatically selects models and tunes hyperparameters.
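AutoML tools search over models and hyperparameters automatically, often with sophisticated strategies (TPOT, for instance, uses genetic algorithms). The sketch below shows only the simplest form of that idea: an exhaustive grid search over the neighbor count k of a hand-written k-nearest-neighbor classifier, scored by leave-one-out accuracy. This is not how any of the listed tools work internally. The data set, including the deliberately mislabeled point at (0.5, 0.5), is hypothetical.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k):
    """Majority label among the k nearest (point, label) pairs."""
    nearest = sorted(train, key=lambda pair: dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def loo_accuracy(data, k):
    """Leave-one-out accuracy: predict each point from all the other points."""
    hits = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, x, k) == y
    return hits / len(data)


def grid_search(data, candidate_ks):
    """Score every candidate k; ties go to the candidate listed first."""
    scores = {k: loo_accuracy(data, k) for k in candidate_ks}
    best = max(candidate_ks, key=lambda k: scores[k])
    return best, scores


# Hypothetical data: two clusters plus one noisy "B" label inside cluster A.
data = [
    ((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"), ((0.5, 0.5), "B"),
    ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B"),
]
best_k, scores = grid_search(data, [1, 3, 5])
print(best_k, scores)
```

With k = 1 the noisy point misleads its neighbors. A larger k outvotes it, which is the kind of trade-off an AutoML search discovers without manual tuning.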

10. Time Series Analysis

Libraries for time series data:

Statsmodels: Provides tools for time series analysis, regression, and statistical modeling.
Prophet: Forecasting tool from Facebook for handling time series data.
Darts: Library for time series forecasting that offers both classical statistical models and deep learning models.
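Statsmodels, Prophet, and Darts fit far richer models, but a useful baseline for any time series is a moving average. The sketch below computes a trailing moving average and uses the mean of the last few observations as a naive one-step forecast. The monthly values and the function names are hypothetical.

```python
def moving_average(series, window):
    """Trailing moving average: element i is the mean of series[i : i + window]."""
    if not 1 <= window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [sum(series[i:i + window]) / window for i in range(len(series) - window + 1)]


def naive_ma_forecast(series, window):
    """Forecast the next value as the mean of the last `window` observations."""
    return sum(series[-window:]) / window


monthly = [10, 12, 14, 13, 15, 17]  # hypothetical monthly values
print(moving_average(monthly, 3), naive_ma_forecast(monthly, 3))
```

A model from one of the libraries above is worth its extra complexity only if it beats a baseline like this on held-out data.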

11. Business Intelligence and Reporting

Tools for business intelligence, reporting, and dashboarding:

Power BI: Business analytics tool for creating interactive reports and dashboards from data.

12. Others

Additional tools for specialized analyses:

Orange: Open-source data visualization and analysis tool, designed for both novice and expert users.
BeautifulSoup: Library for parsing HTML and XML documents and extracting data.
Scrapy: Framework for building web scrapers and extracting data from websites.
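With BeautifulSoup, extracting links is typically a call such as `soup.find_all("a")`. For comparison, the sketch below does the same job with the standard library's `html.parser`. The `LinkExtractor` class and the HTML snippet are hypothetical, and the sketch parses a string rather than fetching a live page.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


html = '<p>See <a href="https://www.python.org">Python</a> and <a href="/docs">docs</a>.</p>'
parser = LinkExtractor()
parser.feed(html)
print(parser.links)
```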

How to Run the Code

1. Clone this repository:

```bash
git clone https://github.com/mindful-ai-assistants/storytelling-2024-25.git
```

2. Install the dependencies:

```bash
pip install -r requirements.txt
```

3. Run the scripts in the main directory:

```bash
python main_script.py
```

Contributing

We welcome contributions to this project! If you'd like to contribute, please follow these steps:

Fork the repository: Click the "Fork" button at the top of the repository page to create your own copy of the project.
Clone the repository: Clone your forked repository to your local machine using the following command:

```bash
git clone https://github.com/your-username/your-forked-repository.git
```


Owner

Name: 𖤐 Mindful AI ॐ
Login: Mindful-AI-Assistants
Kind: organization
Email: fabicampanari@proton.me
Location: Brazil

Website: https://github.com/Mindful-AI-Assistants
Repositories: 4
Profile: https://github.com/Mindful-AI-Assistants

𖤐 Empowering businesses with AI-driven technologies like Copilots, Agents, Bots and Predictions, alongside intelligent Decision-Making Support 𖤐

GitHub Events

Total

Watch event: 1
Delete event: 261
Push event: 436
Pull request event: 506
Create event: 252

Last Year

Watch event: 1
Delete event: 261
Push event: 436
Pull request event: 506
Create event: 252

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 1
Total pull requests: 183
Average time to close issues: less than a minute
Average time to close pull requests: about 4 hours
Total issue authors: 1
Total pull request authors: 2
Average comments per issue: 0.0
Average comments per pull request: 0.0
Merged pull requests: 169
Bot issues: 0
Bot pull requests: 2

Past Year

Issues: 1
Pull requests: 183
Average time to close issues: less than a minute
Average time to close pull requests: about 4 hours
Issue authors: 1
Pull request authors: 2
Average comments per issue: 0.0
Average comments per pull request: 0.0
Merged pull requests: 169
Bot issues: 0
Bot pull requests: 2


Top Authors

Issue Authors

Pull Request Authors

FabianaCampanari (178)
dependabot[bot] (2)

Top Labels

Issue Labels

Pull Request Labels

dependencies (2) python (2)

Dependencies

.github/workflows/GH_TOKEN.yml actions

.github/workflows/calibreapp-image-actions.yml actions

actions/checkout v3 composite
calibreapp/image-actions main composite
peter-evans/create-pull-request v4 composite

.github/workflows/jupyter-execute.yml actions

.github/workflows/publish-python-package.yml actions

actions/checkout v4 composite
actions/setup-python v4 composite
pypa/gh-action-pypi-publish release/v1 composite

.github/workflows/python-app.yml actions

actions/checkout v4 composite
actions/setup-python v5 composite

.devcontaiver/Dockerfile docker

ghcr.io/containerbase/devcontainer 10.1.4 build
python 3.11-slim-buster build

package-lock.json npm

package.json npm

@primer/css 21.1.1

requirements.txt pypi

beautifulsoup4 ==4.10.0
ipywidgets ==7.6.5
joblib ==1.2.0
jupyter ==1.0.0
keras ==2.13.1
matplotlib ==3.4.3
nltk ==3.9
notebook ==6.4.12
numpy ==1.22.0
pandas ==1.3.3
psycopg2-binary ==2.9.1
requests ==2.32.0
scikit-learn ==1.5.0
scipy ==1.11.1
seaborn ==0.11.2
spacy ==3.1.3
sqlalchemy ==1.4.23
tensorflow ==2.11.1

.github/workflows/dependabot.yml actions