BrainCode-AI-ML-From-Tensors-to-ANNs
Hands-on AI & ML guide: from tensors to neural networks, with code, formulas, and model evaluation.
https://github.com/Mindful-AI-Assistants/BrainCode-AI-ML-From-Tensors-to-ANNs
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
○ CITATION.cff file
✓ codemeta.json file: found codemeta.json file
✓ .zenodo.json file: found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (11.7%) to scientific vocabulary
Basic Info
- Host: GitHub
- Owner: Mindful-AI-Assistants
- License: mit
- Language: Jupyter Notebook
- Default Branch: main
- Homepage: https://github.com/Mindful-AI-Assistants/BrainCode-AI-ML-From-Tensors-to-ANNs
- Size: 138 MB
Statistics
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 6
- Releases: 0
Metadata Files
README.md
Brain Made of Code, Created with Heart
From Regression to Neural Nets: Learning with Gradient Descent & Beyond
[!IMPORTANT]
Heads Up
Projects and deliverables may be made publicly available whenever possible.
The course prioritizes hands-on practice with real data in consulting scenarios.
All activities comply with the academic and ethical guidelines of PUC-SP.
Confidential information from this repository remains private in private repositories.
This project provides a comprehensive and hands-on guide to Machine Learning and Artificial Neural Networks (ANNs), combining theory, Python code, and visualizations.
Historical & Mathematical Background: Full explanations of the theory behind algorithms, including derivations, historical context, and all key formulas. Formulas are provided in LaTeX, ready to copy-paste for documentation or reports.
Foundations: understanding tensors as the core data structure in deep learning.
Datasets: loading and working with real datasets online (e.g., MNIST) for experimentation.
Regression & Optimization: training models with Batch Gradient Descent, Stochastic Gradient Descent, and Mini-batch GD.
Regularization: applying techniques like Elastic Net to reduce overfitting and improve generalization.
Model Evaluation: exploring accuracy metrics, detecting and handling outliers, and understanding error analysis.
Neural Networks: building progressively from the artificial neuron model to Multilayer Perceptrons (MLP), forward and backward propagation, and optimization improvements like Momentum.
Step-by-step explanations before each implementation.
Python code split into reproducible cells, ready for Jupyter Notebook or Google Colab.
Practical examples and visualizations to illustrate convergence, performance, overfitting behavior, and accuracy measurement.
Ideal for beginners and intermediate learners looking to build a solid foundation in machine learning optimization algorithms, and who want to go beyond running code to truly understand the underlying concepts of optimization, evaluation, and neural networks.
I - Artificial Neural Networks From Perceptron to Modern Learning Algorithms
This repository provides a hands-on and structured overview of Artificial Neural Networks (ANNs), starting from the Perceptron, moving through the Multilayer Perceptron (MLP), and exploring the learning algorithms that power Machine Learning, such as Gradient Descent and its variations.
The content is inspired by the official theoretical material Decreasing-Gradient.pdf from the Undergraduate Program in Humanistic AI and Data Science at PUC São Paulo, Brazil.
Motivation
The human brain processes information in a nonlinear, adaptive, and massively parallel way, very different from how conventional computers work.
For example, the brain can recognize a familiar face in milliseconds, even in a completely new environment. Meanwhile, traditional computers may take much longer to solve much simpler problems.
Taking inspiration from biology, Artificial Neural Networks are computational models designed to learn from data, adapt through experience, and replicate human-like problem solving.
Historical Context
McCulloch & Pitts (1943): Introduced the first neural network models.
Hebb (1949): Developed the basic model of self-organization.
Rosenblatt (1958): Introduced the perceptron, a supervised learning model.
Hopfield (1982) and Rumelhart, Hinton & Williams (1986): Revived the field with symmetric networks for optimization and the backpropagation method, respectively.
Artificial Neuron Model
Each artificial neuron receives input signals $X_1, X_2, \ldots, X_p$ (binary or real values), each multiplied by a weight $w_1, w_2, \ldots, w_p$ (real values). The neuron computes a weighted sum (activity level):
$$ \Huge a = w_1 X_1 + w_2 X_2 + \cdots + w_p X_p $$
```latex
a = w_1 X_1 + w_2 X_2 + \cdots + w_p X_p
```
The output $y$ is determined by an activation function, such as a step function with threshold $t$:
$$ \Huge y = \begin{cases} 1, & \text{if } a \geq t \\ 0, & \text{if } a < t \end{cases} $$
```latex
y =
\begin{cases}
1, & \text{if } a \geq t \\
0, & \text{if } a < t
\end{cases}
```
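The weighted sum followed by a threshold can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the course material. The weights `w = (1, 1)` and threshold `t = 2` are assumed values, chosen so that the neuron behaves like a logical AND.

```python
import numpy as np

def threshold_neuron(x, w, t):
    """Weighted sum a = sum_i w_i x_i followed by a step activation with threshold t."""
    a = np.dot(w, x)           # activity level
    return 1 if a >= t else 0  # fires only when a reaches the threshold

# Illustrative parameters (assumed): both inputs must be active to reach t = 2
w = np.array([1.0, 1.0])
t = 2.0
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "->", threshold_neuron(np.array(x), w, t))
```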
Key Benefits of ANNs
- Adaptability through learning
- Ability to operate with partial knowledge
- Fault tolerance
- Generalization
- Contextual information processing
- Input-output mapping
Application Areas
- Pattern classification
- Clustering/categorization
- Function approximation
- Prediction
- Optimization
- Content-addressable memory
- Control systems
Learning Process
ANNs operate in two main phases:
- Training Phase: The network learns by adjusting its free parameters (weights) to perform a specific function.
- Application Phase: The trained network is used for its intended purpose (e.g., pattern or image classification).
The learning process involves:
- Stimulation by the environment (input).
- Modification of free parameters (weights) as a result.
- The network responds differently due to internal changes.
Learning is governed by a set of pre-established rules (learning algorithm) and a learning paradigm (model).
Error Correction Learning
The output of neuron $k$ at iteration $n$ is $y_k(n)$, and the desired response is $d_k(n)$. The error signal is:
$$ \Huge e_k(n) = d_k(n) - y_k(n) $$
The goal is to minimize the cost function (performance index):
$$ \Huge E(n) = \frac{1}{2} e_k^2(n) $$
Weights are updated as:
$$ \Huge w_{kj}(n+1) = w_{kj}(n) + \Delta w_{kj}(n) $$
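The text leaves $\Delta w_{kj}(n)$ unspecified. This sketch assumes a common choice, the delta (Widrow-Hoff) rule $\Delta w_{kj}(n) = \eta\, e_k(n)\, x_j(n)$, which is gradient descent on $E(n)$ for a linear neuron. It applies the rule to hypothetical data generated from known weights, so we can check that learning recovers them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data: targets generated by the linear rule d = 2*x1 - 1*x2
X = rng.uniform(-1, 1, size=(200, 2))
d = X @ np.array([2.0, -1.0])

w = np.zeros(2)   # initial weights w_kj(0)
eta = 0.1         # learning rate (assumed value)

for epoch in range(20):
    for x_n, d_n in zip(X, d):
        y_n = np.dot(w, x_n)        # linear neuron output y_k(n)
        e_n = d_n - y_n             # error signal e_k(n) = d_k(n) - y_k(n)
        w = w + eta * e_n * x_n     # delta rule: dw_kj(n) = eta * e_k(n) * x_j(n)

print("Learned weights:", w)  # should approach [2, -1]
```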
The Perceptron
The perceptron, proposed by Rosenblatt (1958), is the simplest type of ANN. It uses supervised learning and error correction to adjust the weight vector. For a perceptron with two inputs and a bias:
- The bias allows the threshold value in the activation function to be set, and is updated like any other weight.
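Rosenblatt's learning rule can be sketched as $w \leftarrow w + \eta\,(d - y)\,x$ with a step activation. The bias is treated as the weight of a constant input $x_0 = 1$. The AND data set and $\eta = 1$ below are illustrative assumptions. The sketch trains a two-input perceptron until every pattern is classified correctly. Convergence is guaranteed here because AND is linearly separable.

```python
import numpy as np

# Logical AND: linearly separable, so the perceptron rule is guaranteed to converge
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
d = np.array([0, 0, 0, 1])

# Prepend a constant input x0 = 1 so the bias is learned like any other weight
X_b = np.hstack([np.ones((len(X), 1)), X])
w = np.zeros(3)
eta = 1.0  # with zero initial weights, eta only rescales w

def predict(x, w):
    return 1 if np.dot(w, x) >= 0 else 0   # step activation (threshold folded into the bias)

for epoch in range(100):
    errors = 0
    for x_n, d_n in zip(X_b, d):
        e = d_n - predict(x_n, w)          # error is -1, 0 or +1
        w += eta * e * x_n                 # perceptron update, bias included
        errors += int(e != 0)
    if errors == 0:                        # stop once every pattern is classified correctly
        break

print("Weights [bias, w1, w2]:", w)
print("Predictions:", [predict(x, w) for x in X_b])
```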
Nonlinearities and Activation Functions
Nonlinearities are inherent in most real-world problems.
They are incorporated through nonlinear activation functions (e.g., sigmoid, tanh) and multiple layers.
MLPs commonly use sigmoid-type functions in the hidden layers and linear functions in the output layer.
MLP (MultiLayer Perceptron)
Composed of neurons with nonlinear activation functions in intermediate (hidden) layers.
Only the output layer receives a desired output during training.
The error for hidden layers is estimated by the effect they cause on the output error (backpropagation).
Two-Layer Perceptron Architecture
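A two-layer perceptron (MLP with one hidden layer and one output layer) with sigmoidal hidden units can approximate any continuous function on a bounded domain to arbitrary accuracy, provided the hidden layer has enough neurons (Cybenko, 1989).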
Layer 1 (Hidden/Intermediate): Each neuron contributes lines (hyperplanes) to form surfaces in input space, "linearizing" the features.
Layer 2 (Output): Neurons combine these lines to form convex regions, enabling complex decision boundaries.
The representational capacity of the network increases with the number of neurons. Too many neurons, however, can lead to overfitting and poorer generalization.
Empirically, 35 neurons per layer strike a good balance between modeling power and computational cost.
Input Layer: Receives input patterns.
Hidden Layer(s): Main processing; feature extraction.
Output Layer: Produces the final result.
Main Concepts and Key Formulas
$$ \Huge a = \sum_{i=1}^{p} w_i X_i $$
$$ \Huge y = f(a) $$
where $\Huge f$ is the activation function (e.g., sigmoid, tanh)
$$ \Huge e_k(n) = d_k(n) - y_k(n) $$
$$ \Huge E(n) = \frac{1}{2} e_k^2(n) $$
$$ \Huge w_{kj}(n+1) = w_{kj}(n) - \eta \frac{\partial E(n)}{\partial w_{kj}} $$
$$ \Huge \delta^{(2)}(t) = (d(t) - y(t)) \cdot f'^{(2)}(u) $$
$$ \Huge \delta_j^{(1)}(t) = \left( \sum_k \delta_k^{(2)} w_{kj}^{(2)} \right) \cdot f'^{(1)}\left(u_j^{(1)}\right) $$
Training: Two-Phase Process
1. Forward Phase
Initialize learning rate $\eta$ and weight matrix $w$ with random values.
Present input to the first layer.
Each neuron in layer $i$ computes its output, which is passed to the next layer.
The final output is compared to the desired output.
The error for each output neuron is calculated.
- Example Calculation:
Forward Computation Example
- For input values:
- $\Large ( X_0 = 1 )$
- $\Large ( X_1 = 0.43 )$
- $\Large ( X_2 = 0.78 )$
- And example weights:
- $\Large ( w^{(1)}_{00} = 0.45 )$
- $\Large ( w^{(1)}_{01} = 0.89 )$
- etc...
Compute the activations and outputs for each layer using an activation function (e.g., tanh):
$$ u_j^{(1)} = \sum_i X_i \cdot w_{ji}^{(1)} $$
- Compute activation (output from each hidden neuron):
$y^{(1)}_j = \tanh(u^{(1)}_j)$
Compute output layer pre-activation:
$u^{(2)} = \sum_j y^{(1)}_j w^{(2)}_j$
Output of network:
$y^{(2)} = \tanh(u^{(2)})$
Calculate error:
$e = d - y^{(2)}$
$E = \frac{1}{2} e^2$
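The forward computation can be reproduced numerically. The inputs and the two weights $w^{(1)}_{00}$ and $w^{(1)}_{01}$ come from the example above. The text elides the remaining weights. Every other weight, the use of two hidden neurons, and the desired output $d$ below are therefore hypothetical placeholders, chosen only to make the sketch runnable.

```python
import numpy as np

# Inputs from the example above (X0 = 1 acts as the bias input)
X = np.array([1.0, 0.43, 0.78])

# Hidden-layer weights w_ji^(1): row j = hidden neuron, column i = input.
# Only w_00 = 0.45 and w_01 = 0.89 come from the text; every other value,
# and the choice of two hidden neurons, is a HYPOTHETICAL placeholder.
W1 = np.array([[0.45, 0.89, -0.30],
               [0.20, -0.50, 0.60]])
w2 = np.array([0.70, -0.40])   # hypothetical output weights w_j^(2)
d = 1.0                        # hypothetical desired output

u1 = W1 @ X            # u_j^(1) = sum_i X_i w_ji^(1)
y1 = np.tanh(u1)       # y_j^(1) = tanh(u_j^(1))
u2 = w2 @ y1           # u^(2) = sum_j y_j^(1) w_j^(2)
y2 = np.tanh(u2)       # network output y^(2)
e = d - y2             # error e = d - y^(2)
E = 0.5 * e**2         # cost E = e^2 / 2

print("u1:", u1, "y1:", y1)
print("y2:", y2, "e:", e, "E:", E)
```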
2. Backward Phase (Backpropagation)
Start from the output layer.
Each node adjusts its weight to reduce its error.
For hidden layers, the error is determined by the weighted errors of the next layer (chain rule).
Output layer weight update:
$w^{(2)}(t+1) = w^{(2)}(t) + \eta \delta^{(2)} y^{(1)}(t)$
where $\delta^{(2)}(t) = (d(t) - y(t)) \cdot f'^{(2)}(u)$
- Hidden layer delta:
$\delta^{(1)}_j(t) = \left( \sum_k \delta^{(2)}_k w^{(2)}_{kj} \right) \cdot f'^{(1)}(u_j)$
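A quick way to check the delta formulas is to compare the gradients they imply with central finite differences. This sketch makes several assumptions:

- a single tanh output and two tanh hidden neurons;
- randomly drawn (hypothetical) weights;
- the hidden-weight gradient $\partial E / \partial w^{(1)}_{ji} = -\delta^{(1)}_j X_i$, which follows from the chain rule.

Gradient descent on these gradients gives exactly the update $w \leftarrow w + \eta\,\delta\,y$ used above.

```python
import numpy as np

rng = np.random.default_rng(42)
X = np.array([1.0, 0.43, 0.78])          # input with bias term X0 = 1
W1 = rng.normal(scale=0.5, size=(2, 3))  # hypothetical hidden weights w_ji^(1)
w2 = rng.normal(scale=0.5, size=2)       # hypothetical output weights w_j^(2)
d = 0.5                                  # hypothetical desired output

def loss(W1, w2):
    y1 = np.tanh(W1 @ X)
    y2 = np.tanh(w2 @ y1)
    return 0.5 * (d - y2) ** 2

# Forward pass
u1 = W1 @ X
y1 = np.tanh(u1)
u2 = w2 @ y1
y2 = np.tanh(u2)

# Backward pass with the delta formulas, using f'(u) = 1 - tanh(u)^2
delta2 = (d - y2) * (1 - np.tanh(u2) ** 2)     # output delta
delta1 = delta2 * w2 * (1 - np.tanh(u1) ** 2)  # hidden deltas (one output, so the sum over k has one term)
grad_w2 = -delta2 * y1                         # dE/dw_j^(2)
grad_W1 = -np.outer(delta1, X)                 # dE/dw_ji^(1)

# Central finite differences as an independent check
eps = 1e-6
num_W1 = np.zeros_like(W1)
for j in range(W1.shape[0]):
    for i in range(W1.shape[1]):
        Wp, Wm = W1.copy(), W1.copy()
        Wp[j, i] += eps
        Wm[j, i] -= eps
        num_W1[j, i] = (loss(Wp, w2) - loss(Wm, w2)) / (2 * eps)
num_w2 = np.array([(loss(W1, w2 + eps * np.eye(2)[j]) - loss(W1, w2 - eps * np.eye(2)[j])) / (2 * eps)
                   for j in range(2)])

print("max |analytic - numeric| (hidden):", np.max(np.abs(grad_W1 - num_W1)))
print("max |analytic - numeric| (output):", np.max(np.abs(grad_w2 - num_w2)))
```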
Example: Training a Two-Layer Perceptron
1. Initialize all weights randomly.
2. Present an input vector $X$.
3. Compute outputs for the first (hidden) layer:
$u_j^{(1)} = \sum_i X_i w_{ji}^{(1)}$
$y_j^{(1)} = \tanh(u_j^{(1)})$
4. Compute output for the second (output) layer:
$u^{(2)} = \sum_j y^{(1)}_j \cdot w^{(2)}_j$
$y^{(2)} = \tanh(u^{(2)})$
5. Calculate error:
$e = d - y^{(2)}$
$E = \frac{1}{2} e^2$
6. Backward phase:
- Compute $\delta^{(2)}$ and update output weights.
- Compute $\delta^{(1)}$ for each hidden neuron and update hidden weights.
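The six steps can be combined into a small training loop. As an illustration, not the course's own code, the sketch below trains a two-layer tanh network on XOR, a classic problem that a single perceptron cannot solve. The following are all assumptions: the $\pm 1$ target encoding, three hidden neurons, bias units in both layers, the learning rate and the epoch count.

```python
import numpy as np

rng = np.random.default_rng(1)

# XOR with targets in {-1, +1} to match the tanh output range (encoding is an assumption)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
d = np.array([-1.0, 1.0, 1.0, -1.0])
X_b = np.hstack([np.ones((4, 1)), X])          # bias input X0 = 1

n_hidden = 3                                   # hypothetical hidden-layer size
W1 = rng.normal(scale=0.5, size=(n_hidden, 3))
w2 = rng.normal(scale=0.5, size=n_hidden + 1)  # +1 for a hidden-layer bias
eta = 0.1

def forward(x, W1, w2):
    u1 = W1 @ x
    y1 = np.concatenate(([1.0], np.tanh(u1)))  # prepend bias unit for the output layer
    u2 = w2 @ y1
    return u1, y1, u2, np.tanh(u2)

def total_loss(W1, w2):
    return sum(0.5 * (d_n - forward(x_n, W1, w2)[3]) ** 2 for x_n, d_n in zip(X_b, d))

initial_loss = total_loss(W1, w2)
for epoch in range(2000):
    for x_n, d_n in zip(X_b, d):                # online (per-sample) updates
        u1, y1, u2, y2 = forward(x_n, W1, w2)    # steps 2-4: forward phase
        delta2 = (d_n - y2) * (1 - y2 ** 2)      # step 6: output delta
        delta1 = delta2 * w2[1:] * (1 - np.tanh(u1) ** 2)  # hidden deltas (skip bias weight)
        w2 += eta * delta2 * y1                  # output weight update
        W1 += eta * np.outer(delta1, x_n)        # hidden weight update
final_loss = total_loss(W1, w2)

print(f"loss: {initial_loss:.4f} -> {final_loss:.4f}")
print("outputs:", [round(float(forward(x, W1, w2)[3]), 3) for x in X_b])
```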
Why Two Layers and 35 Neurons per Layer?
- Theoretical Power: Two-layer MLPs can approximate any continuous function (universal approximation theorem).
- Practical Simplicity: Most real-world problems rarely require more than two layers.
- Cost-Benefit: 35 neurons per layer often provide sufficient capacity for generalization without excessive computational cost.
Local Maximum (Local Maxima)
In gradient descent training, the algorithm updates weights to reduce error by moving against the gradient of the cost function. However, the cost function may have multiple local minima (and local maxima).
Local Maximum: A point where the cost function has a peak relative to nearby points but is not the absolute highest point globally.
Gradient descent can get "stuck" in a local minimum, or stall on flat regions and saddle points, preventing the network from reaching the best possible solution. When a function is maximized instead, gradient ascent can likewise get stuck in a local maximum.
Techniques such as random restarts, momentum, or advanced optimization algorithms help mitigate this problem.
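The effect of local minima, and of restarts as a mitigation, can be seen on a one-dimensional cost. The function $f(x) = x^4 - 3x^2 + x$ below is a hypothetical example. It has a global minimum near $x \approx -1.30$ and a local minimum near $x \approx 1.13$. The starting points are a fixed grid rather than random draws, to keep the output reproducible.

```python
import numpy as np

def f(x):
    return x**4 - 3 * x**2 + x      # non-convex: one global and one local minimum

def df(x):
    return 4 * x**3 - 6 * x + 1     # derivative f'(x)

def descend_1d(x0, eta=0.01, n_steps=1000):
    x = x0
    for _ in range(n_steps):
        x -= eta * df(x)            # step against the gradient
    return x

# Plain gradient descent ends in whichever basin it starts in
print("start -2.0 ->", descend_1d(-2.0))   # global minimum near x = -1.30
print("start  2.0 ->", descend_1d(2.0))    # local minimum near x = 1.13

# Restarts: run from several starting points and keep the lowest-cost result
candidates = [descend_1d(x0) for x0 in np.linspace(-2.5, 2.5, 11)]
best = min(candidates, key=f)
print("best after restarts:", best, "f =", f(best))
```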
Usage
Artificial Neural Networks, especially perceptrons and MLPs, are widely used in various domains due to their adaptability and ability to model complex nonlinear relationships.
Strengths
Ability to learn from examples and generalize to unseen data.
Fault tolerance and robustness to noisy inputs.
Flexibility to model complex, nonlinear functions.
Parallel processing capability.
Weaknesses
Training can be computationally expensive, especially for large networks.
Susceptible to getting stuck in local minima or maxima.
Requires careful tuning of hyperparameters (learning rate, number of neurons, layers).
Lack of interpretability compared to simpler models.
Additional Relevant Points
Learning Rate (η) Importance
The learning rate $\eta$ controls the step size during weight updates:
If $\eta$ is too large, the training may overshoot minima and fail to converge.
If $\eta$ is too small, training will be very slow and may get stuck in local minima.
Adaptive learning rate methods (e.g., learning rate decay, Adam optimizer) can improve convergence.
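The effect of $\eta$ is easy to see on the toy cost $f(x) = x^2$, whose gradient is $2x$. Each step multiplies $x$ by $(1 - 2\eta)$, so training converges only when $|1 - 2\eta| < 1$. The three learning rates below are illustrative values.

```python
def run_gd(eta, x0=1.0, n_steps=50):
    """Minimize f(x) = x^2 (gradient 2x) with a fixed learning rate."""
    x = x0
    for _ in range(n_steps):
        x -= eta * 2 * x      # each step multiplies x by (1 - 2*eta)
    return x

for eta in (0.01, 0.1, 1.1):
    print(f"eta={eta}: x after 50 steps = {run_gd(eta):.3e}")
# eta = 0.01: slow (factor 0.98 per step); eta = 0.1: fast (0.8); eta = 1.1: diverges (|1 - 2.2| = 1.2 > 1)
```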
Activation Functions
While the document mentions sigmoid and tanh, it is useful to note:
ReLU (Rectified Linear Unit):
Widely used in modern neural networks for faster convergence and to mitigate vanishing gradient problems.
Softmax:
Commonly used in output layers for multi-class classification problems.
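Softmax turns a vector of output-layer scores $z$ into probabilities, $\mathrm{softmax}(z)_i = e^{z_i} / \sum_j e^{z_j}$. A minimal NumPy sketch follows. Subtracting $\max(z)$ before exponentiating is a standard trick that avoids overflow without changing the result. The logits are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: subtracting max(z) leaves the result unchanged."""
    z = np.asarray(z, dtype=float)
    exp_z = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return exp_z / np.sum(exp_z, axis=-1, keepdims=True)

logits = np.array([2.0, 1.0, 0.1])   # hypothetical output-layer pre-activations
probs = softmax(logits)
print("softmax:", probs, "sum:", probs.sum())
print("large logits stay finite:", softmax([1000.0, 1000.0]))
```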
Overfitting and Regularization
Neural networks with too many parameters can overfit training data, performing poorly on unseen data.
Techniques such as early stopping, dropout, and L2 regularization help improve generalization.
Batch vs. Online Learning
The document discusses iterative weight updates per sample (online/stochastic gradient descent).
In practice, batch or mini-batch gradient descent is often used for computational efficiency and stability.
Practical Considerations
Data preprocessing (normalization, encoding) is crucial for effective training.
Initialization of weights affects convergence speed and final performance.
Monitoring training with validation sets helps detect overfitting.
# Algorithms Used to Train Machine Learning Models
1. Gradient Descent
Gradient Descent is a mathematical optimization method primarily used for minimizing differentiable multivariate functions. It is a first-order iterative algorithm that adjusts model parameters to find the minimum value of a function, typically representing an error or cost to minimize.
The way gradient descent works can be explained as follows: Imagine standing on top of a hill wanting to reach the lowest point in a valley. In algorithm terms, you start with initial parameter values and calculate the slope (gradient) of the cost function with respect to these parameters. This slope shows the steepest ascent direction. To minimize the function, you take a step in the opposite direction, "descending the slope" toward the lowest point.
These steps are repeated iteratively, adjusting the model parameters opposite to the gradient direction until the algorithm converges to the minimum. The step size is controlled by a learning rate that defines how big the adjustments are at each iteration.
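The descent loop described above fits in a few lines. This is a generic sketch that assumes the gradient is available as a Python function. The quadratic cost used to exercise it is a hypothetical example whose minimum is known to be at $\theta = (3, -1)$.

```python
import numpy as np

def gradient_descent(grad, theta0, learning_rate=0.1, n_iterations=100):
    """Generic first-order descent: repeatedly step opposite to the gradient."""
    theta = np.array(theta0, dtype=float)
    for _ in range(n_iterations):
        theta = theta - learning_rate * grad(theta)
    return theta

def grad_J(theta):
    # Gradient of the example cost J(theta) = (theta0 - 3)^2 + (theta1 + 1)^2
    return np.array([2 * (theta[0] - 3), 2 * (theta[1] + 1)])

theta = gradient_descent(grad_J, theta0=[0.0, 0.0])
print("theta:", theta)
```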
1. Gradient Descent (Batch)
Gradient Descent is an iterative algorithm to minimize a cost function by adjusting parameters opposite to the gradient direction. Batch Gradient Descent calculates the gradient using the entire dataset each step, resulting in stable but sometimes slow parameter updates.
2. Stochastic Gradient Descent (SGD)
Stochastic Gradient Descent updates parameters based on a single random sample per iteration. This yields noisier but faster updates, suitable for large datasets and deep learning models.
3. Elastic Net Regularization
Elastic Net combines L1 (Lasso) and L2 (Ridge) penalties to improve the performance of linear regression models, particularly when many variables are correlated. It helps prevent overfitting and performs automatic feature selection, making it a powerful tool for machine learning modeling.
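To make the combined penalty concrete, the function below evaluates the Elastic Net objective in the parameterization documented by scikit-learn, the library used later in this guide:

$\frac{1}{2n}\lVert y - Xw \rVert_2^2 + \alpha\,\rho\,\lVert w \rVert_1 + \frac{\alpha (1-\rho)}{2}\lVert w \rVert_2^2$

Here $\rho$ is `l1_ratio`. The intercept is omitted and the data are hypothetical.

```python
import numpy as np

def elastic_net_objective(X, y, w, alpha=0.1, l1_ratio=0.7):
    """Elastic Net cost in scikit-learn's parameterization (intercept omitted)."""
    n = len(y)
    data_fit = np.sum((y - X @ w) ** 2) / (2 * n)        # squared-error term
    l1 = alpha * l1_ratio * np.sum(np.abs(w))             # Lasso part: can drive weights to exactly 0
    l2 = 0.5 * alpha * (1 - l1_ratio) * np.sum(w ** 2)    # Ridge part: shrinks weights smoothly
    return data_fit + l1 + l2

w = np.array([1.0, -2.0])
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = X @ w                      # perfect fit, so only the penalty remains
print(elastic_net_objective(X, y, w))
```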
4. Mini-batch Gradient Descent
Mini-batch Gradient Descent is a compromise between batch and stochastic descent. It updates parameters using small random batches, giving faster updates than batch descent with less noise than SGD.
5. Adam (Adaptive Moment Estimation)
An algorithm combining momentum and adaptive learning rates to improve convergence and training efficiency, especially in deep neural networks.
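The Adam update can be sketched in plain NumPy following the rule from Kingma & Ba (2015). It keeps running averages of the gradient (momentum) and of its square (adaptive scale), both bias-corrected. The β and ε values are the commonly used defaults. The learning rate of 0.1 is larger than the usual default of 0.001 so that the toy problem converges quickly. The badly scaled quadratic is a hypothetical test problem.

```python
import numpy as np

def adam_minimize(grad, theta0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, n_steps=1000):
    """Adam update rule in plain NumPy."""
    theta = np.array(theta0, dtype=float)
    m = np.zeros_like(theta)   # first moment: running mean of gradients
    v = np.zeros_like(theta)   # second moment: running mean of squared gradients
    for t in range(1, n_steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g**2
        m_hat = m / (1 - beta1**t)     # bias correction for the zero initialization
        v_hat = v / (1 - beta2**t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)   # per-parameter adaptive step
    return theta

def scaled_grad(theta):
    # Gradient of the hypothetical cost theta0^2 + 100 * theta1^2
    return np.array([2 * theta[0], 200 * theta[1]])

result = adam_minimize(scaled_grad, [5.0, 5.0])
print("Adam result:", result)
```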
6. RMSProp
Adapts the learning rate for each parameter, useful to accelerate training and avoid oscillations.
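RMSProp keeps a running average $s$ of squared gradients for each parameter and divides the step by $\sqrt{s}$, so parameters with large gradients take proportionally smaller steps. A minimal sketch on a hypothetical badly scaled quadratic follows. The decay $\rho = 0.9$ and the learning rate are assumed values.

```python
import numpy as np

def rmsprop_minimize(grad, theta0, lr=0.01, rho=0.9, eps=1e-8, n_steps=2000):
    """RMSProp: divide each step by a running RMS of that parameter's recent gradients."""
    theta = np.array(theta0, dtype=float)
    s = np.zeros_like(theta)                 # running average of squared gradients
    for _ in range(n_steps):
        g = grad(theta)
        s = rho * s + (1 - rho) * g**2
        theta = theta - lr * g / (np.sqrt(s) + eps)
    return theta

def scaled_grad(theta):
    # Gradient of the hypothetical cost theta0^2 + 100 * theta1^2
    return np.array([2 * theta[0], 200 * theta[1]])

result = rmsprop_minimize(scaled_grad, [5.0, 5.0])
print("RMSProp result:", result)
```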
II - Artificial Neural Networks (ANN) - Comprehensive Theory, Use Cases, and Python Code Guide
Required Libraries Installation
Before running any code, install the necessary Python libraries appropriate for your operating system and environment:
Cell 1 - Installation Commands
macOS or Linux Terminal:
```shell
pip install numpy matplotlib tensorflow scikit-learn tensorflow-datasets
```
Windows Command Prompt or PowerShell:
```shell
pip install numpy matplotlib tensorflow scikit-learn tensorflow-datasets
```
Jupyter Notebook or IPython (any operating system; `%pip` installs into the running kernel):
```python
%pip install numpy matplotlib tensorflow scikit-learn tensorflow-datasets
```
This will install:
- numpy: Numerical computations
- matplotlib: Plotting and visualization
- tensorflow: Deep learning framework
- scikit-learn: Machine learning utilities
- tensorflow-datasets: Loading datasets like MNIST easily
0. Understanding Tensors and Loading MNIST Dataset
0.1 What is a Tensor?
- Concept:
A tensor generalizes scalars (0-D), vectors (1-D), and matrices (2-D) to n-dimensional arrays. Data in neural networks (inputs, weights, activations) are represented as tensors.
Understanding tensors is essential for deep learning frameworks.
- Use Case:
Manages multi-dimensional data like images (3D tensors with height, width, channels) or batches of images (4D tensors).
- Code:
```python
import tensorflow as tf

# 0-D scalar
scalar = tf.constant(42)
print("Scalar:", scalar, "Shape:", scalar.shape)

# 1-D vector (illustrative values)
vector = tf.constant([1, 2, 3])
print("Vector:", vector, "Shape:", vector.shape)

# 2-D matrix (illustrative values)
matrix = tf.constant([[1, 2], [3, 4]])
print("Matrix:\n", matrix.numpy())
print("Shape:", matrix.shape)

# 3-D tensor (example: color image 2x2 pixels with 3 color channels; illustrative RGB values)
tensor_3d = tf.constant([[[255, 0, 0], [0, 255, 0]],
                         [[0, 0, 255], [255, 255, 255]]])
print("3D tensor:\n", tensor_3d.numpy())
print("Shape:", tensor_3d.shape)
```
0.2 Loading MNIST Dataset from TensorFlow Datasets
- Concept:
MNIST can be loaded with TensorFlow Datasets, which downloads the data on first use and caches it locally, so no manual download management is needed.
- Use Case:
Practice and benchmark image classification models.
- Code:
```python
import tensorflow_datasets as tfds
import tensorflow as tf
import matplotlib.pyplot as plt

# Load MNIST as (image, label) pairs; the data is downloaded and cached on first use
ds_train = tfds.load('mnist', split='train', shuffle_files=True, as_supervised=True)
ds_test = tfds.load('mnist', split='test', as_supervised=True)

def normalize_img(image, label):
    """Convert uint8 pixels to float32 values in [0, 1]."""
    image = tf.cast(image, tf.float32) / 255.0
    return image, label

ds_train = ds_train.map(normalize_img).shuffle(10000).batch(32).prefetch(tf.data.AUTOTUNE)
ds_test = ds_test.map(normalize_img).batch(32).prefetch(tf.data.AUTOTUNE)

# Each element is now a batch of 32 images; show the first image of the first batch
for images, labels in ds_train.take(1):
    plt.imshow(tf.squeeze(images[0]), cmap='gray')
    plt.title(f"Label: {labels[0].numpy()}")
    plt.axis('off')
    plt.show()
```
1. Artificial Neuron Model
- Concept:
Computes weighted sum of inputs followed by a nonlinear activation function (e.g., sigmoid).
- Use Case:
Fundamental computation unit for classification.
- Code (MNIST-like input, simplified to a flattened vector):
```python
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Simulating a single flattened MNIST image input (784 pixels normalized)
X = np.random.rand(784)  # Normally we'd flatten an image; here random for example
w = np.random.rand(784)  # weights vector

a = np.dot(w, X)  # weighted sum; with 784 positive inputs and weights it is large,
y = sigmoid(a)    # so the sigmoid output saturates close to 1

print("Activation:", a)
print("Output:", y)

# Plot the sigmoid curve
x_vals = np.linspace(-10, 10, 100)
plt.plot(x_vals, sigmoid(x_vals))
plt.title("Sigmoid Activation")
plt.xlabel("Input")
plt.ylabel("Output")
plt.grid(True)
plt.show()
```
2. Gradient Descent (Batch Gradient Descent)
- Concept:
Updates weights by calculating gradients over the entire training dataset.
- Use Case:
Stable, but computationally expensive for large datasets.
- Code:
```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic data generation for demonstration: y = 4 + 3x + noise
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

X_b = np.c_[np.ones((100, 1)), X]  # add bias term

learning_rate = 0.1
n_iterations = 1000
m = 100
theta = np.random.randn(2, 1)  # initial weights

for iteration in range(n_iterations):
    # Gradient of the mean squared error over the whole dataset
    gradients = 2 / m * X_b.T.dot(X_b.dot(theta) - y)
    theta = theta - learning_rate * gradients

print("Theta (Batch GD):", theta)

plt.plot(X, y, "b.")
X_new = np.array([[0], [2]])
X_new_b = np.c_[np.ones((2, 1)), X_new]
y_predict = X_new_b.dot(theta)
plt.plot(X_new, y_predict, "r-")
plt.title("Batch Gradient Descent")
plt.show()
```
3. Stochastic Gradient Descent (SGD)
- Concept:
Updates weights using gradients from one training sample at a time, adding noise but enabling faster updates.
- Use Case:
Useful for large datasets or online learning.
- Code:
```python
import numpy as np

# Reuses X_b and y from the Batch Gradient Descent cell
theta = np.random.randn(2, 1)
n_epochs = 50
t = 0
m = len(X_b)

for epoch in range(n_epochs):
    for i in range(m):
        random_index = np.random.randint(m)
        xi = X_b[random_index:random_index + 1]
        yi = y[random_index:random_index + 1]
        gradients = 2 * xi.T.dot(xi.dot(theta) - yi)  # gradient from a single sample
        eta = 0.1 / (1 + t * 0.01)  # decaying learning rate
        theta = theta - eta * gradients
        t += 1

print("Theta (SGD):", theta)
```
4. Mini-batch Gradient Descent
- Concept:
Updates weights on small subsets (mini-batches), balancing stability and speed.
- Use Case:
Common in deep learning training.
- Code:
```python
import numpy as np

# Reuses X_b and y from the Batch Gradient Descent cell
theta = np.random.randn(2, 1)
n_iterations = 50   # passes (epochs) over the data
batch_size = 20
m = 100
learning_rate = 0.1

for iteration in range(n_iterations):
    indices = np.random.permutation(m)  # reshuffle every epoch
    for start_idx in range(0, m, batch_size):
        end_idx = start_idx + batch_size
        X_batch = X_b[indices[start_idx:end_idx]]
        y_batch = y[indices[start_idx:end_idx]]
        gradients = 2 / len(X_batch) * X_batch.T.dot(X_batch.dot(theta) - y_batch)
        theta = theta - learning_rate * gradients

print("Theta (Mini-batch):", theta)
```
5. Elastic Net Regularization with Scikit-learn
- Concept:
Combines L1 and L2 regularization to prevent overfitting.
- Use Case:
Regression with regularization on MNIST features or other datasets.
- Code:
```python
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Example with the synthetic data from the Batch Gradient Descent cell
# (flattened MNIST features could be used the same way)
X_train, X_test, y_train, y_test = train_test_split(
    X_b[:, 1].reshape(-1, 1), y, test_size=0.2, random_state=42)

elastic_net = ElasticNet(alpha=0.1, l1_ratio=0.7, max_iter=1000)
elastic_net.fit(X_train, y_train.ravel())
y_pred = elastic_net.predict(X_test)
mse = mean_squared_error(y_test, y_pred)

print(f"ElasticNet Coefs: {elastic_net.coef_}")
print(f"Intercept: {elastic_net.intercept_}")
print(f"MSE: {mse}")
```
6. Multilayer Perceptron (MLP) Forward Pass
- Use Case:
General-purpose nonlinear function approximation with hidden layers.
- Code:
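```python
import numpy as np

X = np.random.rand(785)  # example input: 784 pixels plus a bias entry
X[0] = 1.0               # bias input X0 = 1

# Small random weights keep tanh out of saturation; large weighted sums would give
# outputs of almost exactly +/-1 and near-zero derivatives in the backpropagation cell
w_hidden = np.random.randn(2, 785) * 0.01  # two neurons in hidden layer

u_hidden = np.dot(w_hidden, X)
y_hidden = np.tanh(u_hidden)

w_output = np.random.randn(2) * 0.5
u_output = np.dot(w_output, y_hidden)
y_output = np.tanh(u_output)

print("Hidden outputs:", y_hidden)
print("Network output:", y_output)
```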
7. Backpropagation and Weight Updates
- Use Case:
Train neural networks by propagating error gradients backward.
- Code:
```python
import numpy as np

# Continues from the MLP forward-pass cell (uses X, u_hidden, y_hidden, u_output, y_output, w_hidden, w_output)
desired = 0.5
error = desired - y_output
E = 0.5 * error**2

def tanh_derivative(x):
    return 1 - np.tanh(x)**2

# Both deltas are computed before any weight changes
delta_output = error * tanh_derivative(u_output)
delta_hidden = delta_output * w_output * tanh_derivative(u_hidden)

learning_rate = 0.1

w_output += learning_rate * delta_output * y_hidden
w_hidden += learning_rate * np.outer(delta_hidden, X)

print("Updated weights")
print("Output weights:", w_output)
print("Hidden weights:", w_hidden)
print("Loss:", E)
```
8. Momentum Optimization
- Use Case:
Speeds convergence and helps avoid local minima during training.
- Code:
```python
import numpy as np

grad = np.array([0.1, -0.2, 0.05])
learning_rate = 0.1
momentum = 0.9
velocity = np.zeros_like(grad)
weights = np.array([0.5, -0.3, 0.8])

# One momentum step: velocity accumulates past gradients, then moves the weights
velocity = momentum * velocity - learning_rate * grad
weights += velocity

print("Weights updated:", weights)
```
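A single momentum step does not show why momentum helps. The sketch below is an illustrative extension of the cell above. It runs many steps with the same velocity update on a hypothetical ill-conditioned quadratic (curvatures 1 and 25). It then compares the final loss against plain gradient descent, which is the same loop with momentum = 0.

```python
import numpy as np

# Ill-conditioned quadratic E(w) = 0.5 * (1*w1^2 + 25*w2^2), minimum at w = (0, 0)
curvature = np.array([1.0, 25.0])

def loss(w):
    return 0.5 * np.sum(curvature * w**2)

def grad(w):
    return curvature * w

def train(momentum, learning_rate=0.03, n_steps=200):
    w = np.array([1.0, 1.0])
    velocity = np.zeros_like(w)
    for _ in range(n_steps):
        velocity = momentum * velocity - learning_rate * grad(w)   # same update as the cell above
        w = w + velocity
    return loss(w)

print("plain GD loss:", train(momentum=0.0))
print("momentum loss:", train(momentum=0.9))
```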
9. Activation Functions: ReLU Example
- Use Case:
Effective activation to accelerate deep neural network training.
- Code:
```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

inputs = np.array([-2, -1, 0, 1, 2])
outputs = relu(inputs)

print("ReLU outputs:", outputs)
```
10. Regularization & Dropout Example
- Use Case:
Combat overfitting by randomly disabling neurons during training.
- Code:
```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential([
    Dense(64, activation='relu', input_shape=(20,)),
    Dropout(0.5),  # during training, zeroes each output of the previous layer with probability 0.5
    Dense(1, activation='sigmoid')
])

model.summary()
```
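Keras's `Dropout` layer uses so-called inverted dropout. During training, each input unit is zeroed with probability `rate` and the surviving units are scaled by `1 / (1 - rate)`, so no rescaling is needed at inference. The NumPy sketch below reproduces that idea for illustration. It is not Keras's implementation, and the activations are hypothetical.

```python
import numpy as np

def dropout(activations, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest by 1/(1-rate)."""
    if not training or rate == 0.0:
        return activations                                  # no-op at inference time
    keep_mask = rng.random(activations.shape) >= rate       # True = unit is kept
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
h = np.ones(10_000)                       # hypothetical hidden-layer activations
h_train = dropout(h, rate=0.5, training=True, rng=rng)
print("fraction dropped:", np.mean(h_train == 0))       # close to 0.5
print("mean activation:", h_train.mean())               # close to 1.0: expectation preserved
print("inference unchanged:", np.array_equal(dropout(h, 0.5, False, rng), h))
```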
11. Batch vs Mini-batch vs Online Learning
- Use Case:
Trade-offs in training efficiency and noise introduced.
- Code:
```python
import numpy as np

batch_size = 32
dataset_size = 1000
indices = np.arange(dataset_size)

for epoch in range(5):
    np.random.shuffle(indices)  # new sample order every epoch
    for start in range(0, dataset_size, batch_size):
        end = start + batch_size
        batch_indices = indices[start:end]
        # Perform training step on batch_data
        # (batch_size = dataset_size gives batch learning; batch_size = 1 gives online learning)
```
12. Data Preprocessing: Feature Normalization
- Use Case:
Normalize features to accelerate and stabilize training.
- Code:
```python
from sklearn.preprocessing import StandardScaler
import numpy as np

X = np.random.rand(100, 5) * 10

scaler = StandardScaler()
X_normalized = scaler.fit_transform(X)  # zero mean, unit variance per feature

print("Means after scaling:", np.mean(X_normalized, axis=0))
print("Stds after scaling:", np.std(X_normalized, axis=0))
```
13. Adam Optimizer Example with MNIST
- Use Case:
Efficient training with adaptive gradients.
- Code:
```python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

history = model.fit(x_train, y_train, epochs=5, validation_split=0.1, verbose=2)
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=2)
print(f"Test accuracy: {test_acc:.4f}")
```
14. RMSProp Optimizer Example with MNIST
- Use Case:
Alternative adaptive optimizer for neural networks.
- Code:
```python
# Reuses the imports and the normalized x_train/x_test from the Adam cell
model_rms = Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])

model_rms.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

model_rms.fit(x_train, y_train, epochs=5, validation_split=0.1, verbose=2)
test_loss, test_acc = model_rms.evaluate(x_test, y_test, verbose=2)
print(f"Test accuracy with RMSProp: {test_acc:.4f}")
```
15. Demonstrating Accuracy with MNIST Dataset
To demonstrate the accuracy of your trained model on the MNIST dataset using TensorFlow, you should use the model's evaluate() method. This method returns both the loss and the evaluation metrics such as accuracy.
- Code Example:
```python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense

# Load MNIST dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Normalize pixel values to [0, 1]
x_train, x_test = x_train / 255.0, x_test / 255.0

# Build the neural network model
model = Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])

# Compile the model specifying optimizer, loss, and metrics
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(x_train, y_train, epochs=5, validation_split=0.1, verbose=2)

# Evaluate the model on the test dataset
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=2)

# Print the accuracy as a percentage
print(f"Test accuracy: {test_acc * 100:.2f}%")
```
Explanation
The `evaluate()` function runs the model on the test data and reports the loss and accuracy. The accuracy is printed as a percentage for easier interpretation.
This approach is standard for classification tasks and directly shows how well your model performs on unseen data.
16. Overfitting and Early Stopping
- Concept:
Overfitting means a model fits the training data too closely, including its noise, which harms generalization to new data.
- Use Case:
Early Stopping monitors validation loss and stops training when improvements cease, mitigating overfitting. This improves generalization in MNIST and other datasets.
Important: MNIST Loading and Normalization (Place This at the Start of Your Notebook/Script)
Before running the training code below, ensure you have loaded and normalized the MNIST dataset so that the variables x_train and y_train exist and are ready for use.
```python
import tensorflow as tf

# Load MNIST dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Normalize pixel values to [0, 1]
x_train = x_train / 255.0
x_test = x_test / 255.0
```
Early Stopping Model Training Code:
```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense
from tensorflow.keras.callbacks import EarlyStopping

# Stop when validation loss has not improved for 3 epochs and keep the best weights seen
early_stopping = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

model = Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

model.fit(x_train, y_train, epochs=50, validation_split=0.1,
          callbacks=[early_stopping], verbose=2)
```
Summary of Placement:
MNIST loading and normalization block should be inserted at the very start of your notebook or script (immediately after imports).
The Early Stopping model training code block should be placed later in the training section, near the end of your model building and training workflow, but before evaluation.
This arrangement ensures that your model training example with early stopping is fully functional using the MNIST dataset, and provides a clear, logical flow for this script.
References
- Content derived from Decreasing-Gradient.pdf.
- Classic works by McCulloch & Pitts, Hebb, Rosenblatt, Hopfield, Rumelhart, Hinton & Williams, and Cybenko.
- NVIDIA Building a Brain Course
- Neuralearn Courses
See also our project:
Predictive, PI, and Gradient Descent Control in TAB Converters for Electric Vehicles
(Under Construction)
Meet the Crew Under Jah's Vibes!
https://github.com/user-attachments/assets/f8ad6d7a-6d85-4f2c-b9cc-ce8230ba3b9b
United by Vision
Guided by Jah
Strength in Unity
Reference
Application of MPC controls with descending gradient and PI in a TAB converter used in electric vehicle powertrains by Atílio Caliari de Lima, PhD.
Feel Free to Reach Out:
Email Me
My Contacts Hub
Copyright 2025 Mindful-AI-Assistants. Code released under the MIT license.
Owner
- Name: 𖤐 Mindful AI ॐ
- Login: Mindful-AI-Assistants
- Kind: organization
- Email: fabicampanari@proton.me
- Location: Brazil
- Website: https://github.com/Mindful-AI-Assistants
- Repositories: 4
- Profile: https://github.com/Mindful-AI-Assistants
𖤐 Empowering businesses with AI-driven technologies like Copilots, Agents, Bots and Predictions, alongside intelligent Decision-Making Support 𖤐
GitHub Events
Total
- Delete event: 25
- Push event: 30
- Pull request event: 55
- Create event: 35
Last Year
- Delete event: 25
- Push event: 30
- Pull request event: 55
- Create event: 35
Issues and Pull Requests
Last synced: 9 months ago
All Time
- Total issues: 0
- Total pull requests: 40
- Average time to close issues: N/A
- Average time to close pull requests: about 2 hours
- Total issue authors: 0
- Total pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 26
- Bot issues: 0
- Bot pull requests: 4
Past Year
- Issues: 0
- Pull requests: 40
- Average time to close issues: N/A
- Average time to close pull requests: about 2 hours
- Issue authors: 0
- Pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 26
- Bot issues: 0
- Bot pull requests: 4
Top Authors
Issue Authors
Pull Request Authors
- FabianaCampanari (36)
- dependabot[bot] (4)