QTW 7333 Module 12: Convolutional Neural Networks Study Guide
1. What Is a Convolution?
- A convolution is the integral of the product of two
functions after one is reversed and shifted.
- In discrete terms, we “slide” a small function (called a
filter or kernel) across an input
(like an image or signal), and compute an output value based on their
overlap.
Mathematical Representation:
For continuous functions:
y(t) = ∫ f(τ) g(t - τ) dτ
For discrete functions (used in images and signals):
y[n] = Σ f[k] * g[n - k]
In neural networks, this becomes:
output = Σ (input * filter)
This is a dot product between the filter and the
part of the input it overlaps.
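To make the discrete formula concrete, here is a minimal sketch in Python that computes y[n] = Σ f[k] * g[n - k] with explicit loops and compares it against NumPy's built-in np.convolve. The signal and kernel values are made up for illustration, and the helper name conv1d_full is hypothetical.

```python
import numpy as np

def conv1d_full(f, g):
    """Discrete convolution y[n] = sum_k f[k] * g[n - k] (full-length output)."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    y = np.zeros(len(f) + len(g) - 1)
    for n in range(len(y)):
        for k in range(len(f)):
            # Only add terms where the shifted index n - k falls inside g
            if 0 <= n - k < len(g):
                y[n] += f[k] * g[n - k]
    return y

signal = [1, 2, 3]      # hypothetical input values
kernel = [0, 1, 0.5]    # hypothetical filter values

manual = conv1d_full(signal, kernel)
print(manual)                                            # [0.  1.  2.5 4.  1.5]
print(np.allclose(manual, np.convolve(signal, kernel)))  # True
```

The index n - k is what "reversed and shifted" means in practice: as k increases, the position read from g decreases.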
2. Convolution in Neural Networks
- We slide the filter (e.g., a 3x3 matrix) over the input (e.g., an
image matrix).
- At each position, we take an element-wise multiplication of
overlapping values, then sum to get a single number.
- That number becomes a feature in the output (also
called a feature map or activation
map).
Example:
If you convolve a 5x5 image with a 3x3 filter:
- Output size becomes (5 - 3) + 1 = 3 in each dimension (with stride 1), unless you apply padding.
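The 5x5 and 3x3 example can be sketched directly in NumPy. This is an illustrative implementation of the slide, multiply, and sum steps with no padding and stride 1. The image values, the averaging filter, and the function name slide_filter are assumptions chosen for the demo.

```python
import numpy as np

def slide_filter(image, kernel):
    """Slide kernel over image (no padding, stride 1): multiply element-wise, then sum."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]   # the part of the input under the filter
            out[i, j] = np.sum(patch * kernel)  # one number per filter position
    return out

image = np.arange(25, dtype=float).reshape(5, 5)  # hypothetical 5x5 input
kernel = np.ones((3, 3)) / 9.0                    # hypothetical 3x3 averaging filter

feature_map = slide_filter(image, kernel)
print(feature_map.shape)  # (3, 3)
```

Each entry of feature_map is the average of one 3x3 patch, so the 5x5 input yields a 3x3 feature map.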
3. Padding
- Padding helps maintain the original size of the
image after convolution.
- We add extra pixels (usually 0s) around the border.
Without Padding (stride 1): Output size = W - F + 1
With Padding (P) and Stride (S): Output size = floor((W - F + 2P) / S) + 1
Where:
- W = input width/height
- F = filter size
- P = padding (pixels added on each side)
- S = stride
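The output-size formula is easy to check with a small helper. This sketch assumes the division is rounded down when it does not come out evenly, which is how common frameworks handle an unpadded ("valid") convolution. The function name conv_output_size is hypothetical.

```python
def conv_output_size(w, f, p=0, s=1):
    """Output width/height of a convolution: floor((W - F + 2P) / S) + 1."""
    if f > w + 2 * p:
        raise ValueError("filter is larger than the padded input")
    return (w - f + 2 * p) // s + 1

print(conv_output_size(5, 3))              # 3  (5x5 image, 3x3 filter, no padding)
print(conv_output_size(28, 3, p=1))        # 28 (padding of 1 preserves the size)
print(conv_output_size(28, 3, p=0, s=2))   # 13 (stride 2 roughly halves the size)
```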
4. Stride
- Stride controls how many pixels we move the filter at a time.
- Stride = 1 → move filter 1 pixel at a time.
- Larger stride → smaller output and less
overlap.
5. Filter Examples
Each of these is used to extract different features (edges, textures,
etc.).
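As an illustration, the sketch below applies two commonly used kernels, a 3x3 averaging (blur) filter and a 3x3 edge detection filter, to a tiny synthetic image with one vertical edge. The image and the helper apply_filter are assumptions for the demo and are not taken from the class slides.

```python
import numpy as np

def apply_filter(image, kernel):
    """Slide kernel over image with no padding and stride 1."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hypothetical 5x6 image: dark left half (0), bright right half (10)
image = np.zeros((5, 6))
image[:, 3:] = 10.0

averaging = np.ones((3, 3)) / 9.0
edge = np.array([[0, -1, 0],
                 [-1, 4, -1],
                 [0, -1, 0]], dtype=float)

blurred = apply_filter(image, averaging)
edges = apply_filter(image, edge)
print(blurred[0])  # smooth ramp from 0 up to 10
print(edges[0])    # [  0. -10.  10.   0.]
```

The averaging filter turns the sharp jump into a gradual ramp, while the edge filter outputs zero in flat regions and nonzero values only next to the boundary.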
6. Neural Network Architecture Recap
From Module 11 (transcript):
Each neuron is essentially a regression
function:
z = Wx + b → σ(z)
Where:
- W = weights
- x = inputs
- b = bias
- σ = activation function (e.g., sigmoid, ReLU)
In CNNs, the matrix product Wx is replaced with a convolution of the input with learned filters; the bias and activation function are still applied.
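A single dense neuron can be sketched in a few lines of NumPy. The weights, inputs, and bias below are made-up numbers chosen so the arithmetic is easy to follow.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

W = np.array([[0.5, -1.0, 2.0]])  # hypothetical weights: 1 neuron, 3 inputs
x = np.array([1.0, 2.0, 0.5])     # hypothetical inputs
b = np.array([0.5])               # hypothetical bias

z = W @ x + b    # linear (regression-like) part: 0.5 - 2.0 + 1.0 + 0.5 = 0.0
a = sigmoid(z)   # activation: sigmoid(0) = 0.5
print(z, a)
```

In a convolutional layer the same pattern holds, except that W @ x is computed patch by patch with a small shared filter instead of one weight per input.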
7. Matrix Dimensions & Output Size
(Important)
If:
- Input = 28x28
- Filter = 3x3
- Padding = 1
- Stride = 1
Then:
- Output = ((28 - 3 + 2×1) / 1) + 1 = 28
→ Output has the same dimensions as the input.
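This result can be verified by zero-padding an array with NumPy. The sketch uses a random 28x28 array as a stand-in for an image, which is an assumption made only for illustration.

```python
import numpy as np

image = np.random.default_rng(0).random((28, 28))  # hypothetical 28x28 input

# Padding = 1 adds one row/column of zeros on every side: 28 + 2*1 = 30
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)

f, s = 3, 1                                  # filter size and stride
out_size = (padded.shape[0] - f) // s + 1    # (30 - 3) / 1 + 1
print(padded.shape, out_size)                # (30, 30) 28
```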
8. Summary of Key Concepts
- Convolution: Overlapping of input and filter to
extract features.
- Padding: Adds border to control output size.
- Stride: Controls step size of the filter
movement.
- Filters/Kernels: Detect edges, textures,
patterns.
- Output: Feature maps used in downstream
layers.
- Multiple filters: Used to extract different feature
types.
9. Key Takeaways
- Convolutions are powerful because they preserve spatial
relationships.
- CNNs learn filters automatically during training.
- Padding + stride lets you control the size of your
output feature maps.
- Edge detection and blurring are
simple real-world examples of convolution filters.
10. Practice Questions
- What does a convolution operation do in image processing?
- Why do we use padding in convolutional neural networks?
- How does the stride affect the output size?
- What happens when you apply an edge detection filter to an
image?
- Derive the output size of a convolutional layer given the input
size, filter size, padding, and stride.
- What is the role of the bias vector in a convolutional layer?
Suggested Reading
From the Elements of Statistical Learning (Hastie, Tibshirani,
Friedman):
- Chapter 11: Neural Networks
- Chapter 5: Basis Expansions and Regularization (for understanding
convolution-like operations)
URL for textbook:
https://web.stanford.edu/~hastie/ElemStatLearn/
🧠 Module 12: Convolutional Neural Networks — Study Guide
1. Core Concept: What is a Convolution?
Definition:
A convolution is the overlap between two functions,
calculated by sliding a function (called a filter or
kernel) over input data (usually an image), multiplying
overlapping values, and summing the results.
This helps extract important features like edges, textures, or color
gradients from images.
2. Mathematical Representation
Continuous form:
y(t) = ∫ f(τ)·g(t−τ) dτ
Discrete (for digital images and neural
networks):
y[i, j] = Σ Σ input[m+i, n+j] * filter[m, n]
Where:
- input is your image matrix
- filter is the kernel (e.g., edge detection)
- y is the output feature map
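One subtlety worth seeing in code: the discrete formula with input[m+i, n+j] does not flip the kernel, so strictly speaking it is a cross-correlation. A true convolution (for example scipy.signal.convolve2d) flips the kernel first. The two agree whenever the kernel is unchanged by a 180-degree rotation. The sketch below uses only NumPy, and the helper names are hypothetical.

```python
import numpy as np

def cross_correlate(image, kernel):
    """y[i, j] = sum_m sum_n image[m + i, n + j] * kernel[m, n] (no kernel flip)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def convolve(image, kernel):
    """True convolution: rotate the kernel by 180 degrees, then cross-correlate."""
    return cross_correlate(image, np.flip(kernel))

image = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]], dtype=float)
asymmetric = np.array([[1, 0],
                       [0, 0]], dtype=float)
symmetric = np.array([[0, 1],
                      [1, 0]], dtype=float)

print(cross_correlate(image, asymmetric))  # [[1. 2.] [4. 5.]]
print(convolve(image, asymmetric))         # [[5. 6.] [8. 9.]]
print(np.array_equal(cross_correlate(image, symmetric),
                     convolve(image, symmetric)))  # True
```

Because CNN filters are learned, the distinction rarely matters in practice: the network simply learns the flipped filter if it needs it.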
3. CNNs in Action: Python Code Example (Beginner-Friendly)
import numpy as np
from scipy.signal import convolve2d
import matplotlib.pyplot as plt
# Example input image (5x5)
image = np.array([
[1, 2, 3, 0, 1],
[0, 1, 2, 3, 0],
[1, 0, 1, 2, 1],
[2, 1, 0, 1, 2],
[1, 2, 1, 0, 1]
])
# Example edge detection filter (3x3)
filter_kernel = np.array([
[0, -1, 0],
[-1, 4, -1],
[0, -1, 0]
])
# Apply convolution
output = convolve2d(image, filter_kernel, mode='valid')
print("Output after convolution:")
print(output)
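Output: A 3x3 matrix (smaller than the 5x5 input, because mode='valid' applies no padding) whose large-magnitude values mark where the filter detected edges or other sharp changes.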
4. CNN Layers Breakdown
- Convolutional Layer: Applies filters to extract
features.
- Padding: Adds zeros around the image to preserve
size.
- Stride: How many steps the filter moves.
- Activation (ReLU): Applies a function like max(0, x) to make the model non-linear.
- Pooling: Reduces dimensionality (MaxPool,
AvgPool).
5. Visual Example from Class Slides
Averaging filter smooths the image using a
uniform kernel.
Edge detection filter highlights boundaries
using:
[[ 0, -1, 0],
[-1, 4, -1],
[ 0, -1, 0]]
6. Key Takeaways
- Convolution = Sliding + Multiplying + Summing
- Padding helps maintain the input size
- Stride affects how much we reduce dimensionality
- Filters are learned during training to extract
useful features
- Convolutions are a major reason CNNs typically outperform traditional methods on image tasks
7. Relevant Questions to Test Yourself
- What does a convolution do in a neural network?
- Why is padding used in convolution layers?
- What effect does increasing the stride have?
- How does an edge detection filter work?
- What is the output size if you apply a 3x3 filter to a 5x5 image
with no padding and stride 1?
8. Layman’s Explanation – Pizza Cutter Analogy
Imagine you’re cutting a large pizza with a stencil
that is 3x3 inches in size. You place your stencil on the pizza, look at
just that square, and rate it from 1 to 10 based on how much pepperoni
it has.
Then, you move the stencil over a little (by 1 inch = stride), and do
it again. You’re scanning the entire pizza, one small patch at a
time.
Your stencil is the filter. The pizza is the
image. Your rating is the output
feature. If you want the edges to also be analyzed (not just
the center), you place napkins (zeros) around the edge — that’s
padding.
This is what convolution does: it helps see the important
parts of an image (like cheese, crust, or pepperoni) and
condense it into useful info — which is what a neural network needs to
make a decision.
9. Textbook References for Deeper Reading
From “The Elements of Statistical Learning” by Hastie et
al.
- Chapter 11: Neural Networks
URL: https://web.stanford.edu/~hastie/ElemStatLearn/
From “The Statistical Sleuth” by Ramsey &
Schafer
- Not image-focused but useful for regression background
QTW 7333 – Module 12
Study Guide: Convolutional Neural Networks (CNNs)
🧠 Overview of CNNs
A Convolutional Neural Network is a type of neural network
architecture optimized for processing images. It is loosely inspired by
the visual cortex, where each neuron responds only to a small region of
the visual field: filters (kernels) focus on local patterns (edges,
corners, etc.) instead of the entire image at once.
🧩 Layers in a CNN
- Convolutional Layer
- Activation Function (usually ReLU)
- Pooling Layer
- Flatten Layer
- Dense (Fully Connected) Layer
- Output Layer (often Softmax for
classification)
🧮 Mathematical Concepts
1. Convolution Layer:
Applies a filter (kernel) across the image.
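Equation (2D discrete convolution as implemented in deep learning libraries; strictly a cross-correlation, since the kernel is not flipped):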
Y(i, j) = Σ_m Σ_n X(i+m, j+n) · K(m, n)
Where:
- X = input image
- K = kernel
- Y = output feature map
2. Pooling Layer:
Reduces spatial size by selecting max or
average values.
- Max Pooling (2×2 with stride 2):
From:
[[1, 1],
[5, 6]] → max = 6
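A minimal NumPy sketch of 2×2 max pooling with stride 2 follows. The 4x4 feature map is made up, with its top-left window matching the example above. The reshape trick assumes the height and width are even.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a 2D array with even height and width."""
    h, w = x.shape
    # Split rows and columns into pairs, then take the max inside each 2x2 block
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

feature_map = np.array([[1, 1, 2, 4],
                        [5, 6, 7, 8],
                        [3, 2, 1, 0],
                        [1, 2, 3, 4]])  # hypothetical 4x4 feature map

pooled = max_pool_2x2(feature_map)
print(pooled)
# [[6 8]
#  [3 4]]
```

The 4x4 input shrinks to 2x2, and each output value is the strongest activation in its window.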
3. Flatten Layer:
Converts multi-dimensional tensor into 1D vector for dense
layers.
Example: [[0, -1, 0],
[-1, 4, -1],
[0, -1, 0]]
→ [0, -1, 0, -1, 4, -1, 0, -1, 0]
💻 Python Code Example (Simple CNN with Keras)
import tensorflow as tf
from tensorflow.keras import layers, models
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
# Flatten and Dense
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))
model.summary()
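To see what model.summary() should report without running TensorFlow, the sketch below traces the output shapes and parameter counts by hand. It assumes the Keras defaults used in the model above (no padding, stride 1 for Conv2D, 2x2 pooling with stride 2). The helper functions are hypothetical and not part of Keras.

```python
def conv2d(shape, filters, kernel):
    """Shape and parameter count of an unpadded, stride-1 convolution."""
    h, w, c = shape
    kh, kw = kernel
    params = kh * kw * c * filters + filters  # weights + one bias per filter
    return (h - kh + 1, w - kw + 1, filters), params

def max_pool(shape, size=2):
    """Shape after size x size max pooling (no trainable parameters)."""
    h, w, c = shape
    return (h // size, w // size, c), 0

def dense(units_in, units_out):
    """Output width and parameter count of a fully connected layer."""
    return units_out, units_in * units_out + units_out

shape = (28, 28, 1)
total_params = 0
layers_spec = [("conv", 32), ("pool", None), ("conv", 64), ("pool", None), ("conv", 64)]
for kind, filters in layers_spec:
    if kind == "conv":
        shape, params = conv2d(shape, filters, (3, 3))
    else:
        shape, params = max_pool(shape)
    total_params += params
    print(kind, shape, params)

flat = shape[0] * shape[1] * shape[2]   # Flatten: 3 * 3 * 64
units, params = dense(flat, 64)
total_params += params
units, params = dense(units, 10)
total_params += params
print(flat, total_params)               # 576 93322
```

The spatial size shrinks 28 → 26 → 13 → 11 → 5 → 3 while the number of channels grows, which is the usual pattern in CNNs.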
🎨 Layman’s Analogy
Imagine you’re looking at a Where’s Waldo book:
- Your eyes scan small sections of the image (like a filter).
- You’re checking patterns like hats, glasses, red-white shirts.
- When something stands out, your brain stores that info.
- Once enough features are collected, you decide where Waldo is.
Similarly:
- Convolutional layers scan for patterns.
- Pooling summarizes those patterns (shrinks detail).
- Flattening stacks everything into a list.
- Dense layers make a final decision (e.g., this is Waldo!).
📘 Textbook Support
From Elements of Statistical Learning:
- Chapter 11 – Neural Networks
https://web.stanford.edu/~hastie/ElemStatLearn/
Also explore CS231n by Stanford (Visual CNN
walkthrough):
http://cs231n.stanford.edu/slides/2019/cs231n_2019_lecture5.pdf
🧠 Key Takeaways
- CNNs are ideal for image classification due to spatial
awareness.
- Convolutional layers extract local patterns.
- Pooling layers reduce spatial dimensions.
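- Flattening reshapes the feature maps into a 1D vector for Dense
layers.
- CNNs use far fewer parameters than comparable dense networks, because
each filter's weights are shared across the whole image, and they usually
achieve better accuracy on image tasks.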
❓ Practice Questions
- What is the purpose of a convolutional layer?
- What does max pooling do? Why is it useful?
- Why do we flatten data before sending it to a dense layer?
- In what way is a CNN biologically inspired?
- How does increasing the number of filters affect the output?
📘 QTW 7333 – Module 12: Transfer Learning Study
Guide
🔍 1. Concept Overview: What Is Transfer Learning?
Definition:
Transfer Learning is a technique where we take a pre-trained
model (often trained on large datasets using powerful hardware)
and reuse its early layers while retraining
only the final layers for a new, often smaller, task.
Instead of training everything from scratch, we “transfer”
the learning from one task to another.
🧠 2. Why Does It Work?
From your transcript:
- Early layers (closest to the input) learn general features (edges, colors, shapes).
- Later layers (dense layers) learn specific task-related features.
- General features are reusable, even across different tasks (e.g., dogs vs. cats, or cars vs. planes).
- Reusing them saves time, data, and compute resources.
💻 4. Python Code Example (Using Keras)
from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras import Input
# Load pre-trained VGG16 without top layers
base_model = VGG16(weights='imagenet', include_top=False, input_tensor=Input(shape=(224, 224, 3)))
# Freeze base model layers
for layer in base_model.layers:
    layer.trainable = False
# Add new dense layers on top
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(128, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x) # for 10-class problem
# Final model
model = Model(inputs=base_model.input, outputs=predictions)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
This uses ImageNet-trained features and trains only
the classifier on your custom data.
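One practical detail: loss='categorical_crossentropy' expects one-hot encoded targets. If your labels are plain integers (as in CIFAR-10), you would either one-hot encode them or switch to the sparse variant of the loss. Below is a small NumPy sketch of one-hot encoding. The labels and the helper name one_hot are made up for illustration.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows."""
    labels = np.asarray(labels).reshape(-1)          # accept shape (N,) or (N, 1)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    encoded[np.arange(labels.size), labels] = 1.0    # set a single 1 per row
    return encoded

labels = np.array([[3], [0], [9]])   # hypothetical integer labels, shape (N, 1)
targets = one_hot(labels, num_classes=10)
print(targets.shape)   # (3, 10)
print(targets[0])      # 1.0 at index 3, zeros elsewhere
```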
📚 5. Reference from Textbooks
The Elements of Statistical Learning
(Hastie, Tibshirani, Friedman)
- Chapter 11 (Neural Networks): background on how neural networks are fit
and how hidden layers learn feature representations.
Link: https://web.stanford.edu/~hastie/ElemStatLearn/
✅ 6. Key Takeaways
- You can reuse expensive, pre-trained models to solve your own
problem.
- You only need to train a few layers (usually the last dense
layers).
- Great for limited data, faster
training, and cross-domain tasks.
- Pre-trained models are available via TensorFlow
Hub, PyTorch Hub, Hugging
Face, etc.
❓ 7. Relevant Questions
- Why is transfer learning useful in deep learning?
- Which parts of the model are typically frozen and which are
retrained?
- Can transfer learning work across domains (e.g., vision to
language)?
- What role does the vanishing gradient play in motivating transfer
learning?
🧸 8. Layman Explanation (Real-World Analogy)
Imagine you’re learning to play music.
- You trained 10 years on the piano.
- Now you want to learn the guitar.
You don’t need to relearn music theory, rhythm, notes, or
timing. You just need to learn how to hold and strum
the guitar.
That’s transfer learning:
- Your music theory = pre-trained layers (frozen).
- Learning guitar specifics = final layers (retrained).
- You’re not starting from scratch; you’re adapting.
Just like your brain doesn’t relearn how sound works each time, a CNN
doesn’t need to relearn edge detection or textures when classifying new
images.
📘 QTW 7333 – Module 12: CNN Part I – Set Up Your Data Study
Guide
🔍 1. What Are We Doing?
We’re:
- Importing the CIFAR-10 image dataset
- Visualizing the dataset
- Normalizing the pixel values
- Preparing everything to feed into a Convolutional Neural Network (CNN)
🧪 2. Why Normalize and Visualize?
- Normalization scales pixel values from [0, 255] to [0.0, 1.0] → helps the neural network converge faster.
- Visualization lets us verify that the images and
labels are correctly aligned before training.
📚 3. Textbook Support
From The Elements of Statistical Learning by Hastie,
Tibshirani & Friedman:
- Chapter 11 (Neural Networks)
- Chapter 7 (Model Assessment and Selection)
Link: https://web.stanford.edu/~hastie/ElemStatLearn/
Also helpful: TensorFlow official docs:
https://www.tensorflow.org/tutorials/images/cnn
📊 4. Mathematical Explanation
Each image is a tensor of shape (32, 32, 3)
(height, width, RGB channels).
Normalization:
For each pixel value \(x\) in the image:
\(x_{\text{normalized}} = \frac{x}{255.0}\)
Labels are integer-encoded:
e.g., 0 = airplane, 1 = car, …, 9 = truck
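The normalization step can be checked on a tiny array before applying it to the full dataset. The single-pixel "image" below is made up for illustration.

```python
import numpy as np

# Hypothetical 1x1 RGB image stored as 8-bit integers, like CIFAR-10 pixels
image = np.array([[[0, 128, 255]]], dtype=np.uint8)

normalized = image / 255.0   # division by a float converts the array to float64
print(normalized.dtype)                    # float64
print(normalized.min(), normalized.max())  # 0.0 1.0
```

Dividing by 255.0 (a float) matters: the result is a floating-point array in [0, 1] rather than an integer array.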
💻 5. Python Code (Beginner-Friendly Setup)
# Step 1: Import libraries
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt
# Step 2: Load CIFAR-10 dataset
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
# Step 3: Normalize pixel values to [0, 1]
train_images, test_images = train_images / 255.0, test_images / 255.0
# Step 4: Define class names
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
'dog', 'frog', 'horse', 'ship', 'truck']
# Step 5: Plot first 25 images
plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([]) # Remove x-axis
    plt.yticks([]) # Remove y-axis
    plt.grid(False)
    plt.imshow(train_images[i])
    plt.xlabel(class_names[train_labels[i][0]])
plt.show()
✅ 6. Key Takeaways
- CIFAR-10 is a great starter dataset for image
classification with 10 common object classes.
- Normalization is essential in neural networks to
ensure pixel values are in a standard range.
- Plotting images helps validate data quality before training.
- This setup phase is crucial. Garbage in = garbage out.
❓ 7. Questions to Test Yourself
- Why do we divide pixel values by 255 in image datasets?
- What are the dimensions of each CIFAR-10 image?
- Why is it useful to visualize the images before training a CNN?
- What does train_labels[i][0] mean in the plotting loop?
🧸 8. Layman’s Analogy: Filing Photos for an Album
Imagine you’re creating a photo album:
- Each photo = an image
- The photo label = the name written below it (dog, truck, etc.)
- But all your photos are in different formats and brightness levels.
Before pasting them in:
1. You resize all to the same dimensions (32x32 pixels).
2. You adjust brightness (normalization).
3. You label them (class names).
4. Then you lay them out in the album (plotting).
This setup makes your album (dataset) organized and easy to interpret
— just like it makes the CNN’s job easier during training.
📘 QTW 7333 – Module 12: CNN Part II – Building the
Model
🔍 1. What Are We Doing in This Module?
We’re:
- Building a CNN architecture using Keras’ Sequential API.
- Adding convolution, pooling, flattening, and dense layers.
- Compiling the model with an optimizer and loss function.
- Training the model and validating performance.
- Making predictions and visualizing results.
🧠 2. Key Concepts
- Conv2D: Applies convolutional filters to extract
patterns.
- MaxPooling2D: Reduces spatial dimensions, retains
important features.
- Flatten: Converts the 3D output (height × width × channels) to 1D for
dense layer input.
- Dense (fully connected layer): Final decision
layers.
- Softmax: Output layer for classification across
multiple classes.
🧮 3. Mathematical Representation
Each Conv2D layer:
output = ReLU(W * input + b)
Where:
- * = convolution operation
- W = filter/kernel weights
- b = bias
- ReLU = max(0, x), element-wise activation
MaxPooling2D:
Reduces matrix by selecting the max value in each window (e.g.,
2x2).
Dense layer:
output = Softmax(Wx + b)
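The two activation functions named above can be sketched in NumPy. The input values are made up. The softmax subtracts the maximum before exponentiating, a common trick for numerical stability that does not change the result.

```python
import numpy as np

def relu(x):
    """ReLU: element-wise max(0, x)."""
    return np.maximum(0, x)

def softmax(z):
    """Softmax: turns raw scores into probabilities that sum to 1."""
    shifted = z - np.max(z, axis=-1, keepdims=True)  # stability shift
    e = np.exp(shifted)
    return e / np.sum(e, axis=-1, keepdims=True)

activations = relu(np.array([-2.0, 0.0, 3.0]))  # hypothetical pre-activations
logits = np.array([2.0, 1.0, 0.1])              # hypothetical raw class scores
probs = softmax(logits)

print(activations)         # [0. 0. 3.]
print(probs, probs.sum())  # largest probability at index 0; sum is 1.0
```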
💻 4. Complete Python Code
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt
import numpy as np
# Load and normalize CIFAR-10
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
train_images, test_images = train_images / 255.0, test_images / 255.0
# Define CNN model
my_model = models.Sequential()
my_model.add(layers.Conv2D(32, (2, 2), activation='relu', input_shape=(32, 32, 3)))
my_model.add(layers.Conv2D(64, (3, 3), activation='relu'))
my_model.add(layers.MaxPooling2D((2, 2)))
my_model.add(layers.Conv2D(17, (2, 3), activation='relu'))
my_model.add(layers.Conv2D(14, (4, 4), activation='relu'))
my_model.add(layers.MaxPooling2D((2, 2)))
my_model.add(layers.Flatten())
my_model.add(layers.Dense(100, activation='relu'))
my_model.add(layers.Dense(10, activation='softmax'))
# Compile the model
my_model.compile(optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
metrics=['accuracy'])
# Train the model
history = my_model.fit(train_images, train_labels, epochs=5, batch_size=50,
validation_data=(test_images, test_labels))
# Predict and visualize results
results = my_model.predict(test_images)
# Show predictions vs true labels
plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([]); plt.yticks([]); plt.grid(False)
    plt.imshow(test_images[i], cmap=plt.cm.binary)
    true_label = test_labels[i][0]
    pred_label = np.argmax(results[i])
    plt.xlabel(f"True: {true_label}, Pred: {pred_label}")
plt.show()
📊 5. Accuracy Output Example
Training output (sample):
- val_accuracy: 0.6112 → 0.6368 → 0.6776
- In this sample, validation accuracy improves with each epoch shown.
- Final prediction probabilities have shape (10000, 10): one row per
test image, one column per class.
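Turning the probability matrix into predicted labels and an accuracy score takes two lines of NumPy. The small results array below is a made-up stand-in for the output of my_model.predict, with 4 images and 3 classes instead of 10000 and 10.

```python
import numpy as np

# Hypothetical stand-in for my_model.predict(...): 4 images, 3 classes
results = np.array([[0.7, 0.2, 0.1],
                    [0.1, 0.1, 0.8],
                    [0.3, 0.4, 0.3],
                    [0.6, 0.3, 0.1]])
true_labels = np.array([[0], [2], [0], [0]])  # shape (N, 1), like CIFAR-10 labels

pred_labels = np.argmax(results, axis=1)      # most probable class per row
accuracy = np.mean(pred_labels == true_labels.reshape(-1))
print(pred_labels, accuracy)                  # [0 2 1 0] 0.75
```

The reshape(-1) matters: comparing a (4,) array with a (4, 1) array would broadcast to a 4x4 matrix and give a misleading accuracy.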
✅ 6. Key Takeaways
- CNNs excel at image recognition because they learn spatial
hierarchies (edges → shapes → objects).
- ReLU is the most common activation choice for convolutional layers (Keras applies no activation unless you specify one).
- Softmax turns raw outputs into probabilities for multi-class
classification.
- You must flatten the convolutional output before
connecting it to a dense layer.
- Model performance often improves with deeper
architectures, good normalization, and
more epochs, up to the point where the model starts to overfit.
❓ 7. Practice Questions
- What does the Flatten layer do?
- Why do we use MaxPooling in a CNN?
- What does the final Dense(10, activation='softmax') layer
represent?
- Why is the Adam optimizer commonly used?
- What shape do the CNN predictions have and why?
🧸 8. Layman Explanation – The “Photo Sorting Machine”
Imagine building a smart photo sorter:
- The first machine part looks for edges (where colors change suddenly).
- The next part identifies shapes (cars, cats, etc.).
- Finally, a decision-maker reads everything and says, “This is a truck!”
Each “part” is a layer in your neural network:
- Convolution layers = find patterns (like textures)
- Pooling layers = zoom out to reduce clutter
- Dense layers = make decisions based on what was found
- Softmax = gives the final vote (percent chance it’s each object)
Just like sorting your photos by eye, CNNs do this automatically—but
with math and training!
📚 Textbook Support
The Elements of Statistical Learning
- Chapter 11: Neural Networks (pg. 392+)
https://web.stanford.edu/~hastie/ElemStatLearn/
TensorFlow Tutorial (Official)
https://www.tensorflow.org/tutorials/images/cnn
The consolidated guide below covers the following topics:
- Convolutional Layer Fundamentals
- Transfer Learning Concepts
- CNN Part I: Dataset Setup
- CNN Part II: Model Building
- Transfer Learning: VGG16 + Custom Classifier
Convolutional Neural Networks: Beginner Study Guide
Section 1: What is a Convolution?
Concept:
A convolution is when we slide a filter (small matrix) over data (like
an image) to extract meaningful features (like edges or patterns). Each
overlap is multiplied element-wise and summed into a new output
value.
Math:
If I is an image matrix and K is a
filter:
S(i,j) = ∑∑ K(m,n) * I(i+m, j+n)
This is done for all valid positions of the filter.
Python Example:
import numpy as np
from scipy.signal import convolve2d
image = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
kernel = np.array([[0, 1],
[1, 0]])
result = convolve2d(image, kernel, mode='valid')
print(result)
Key Terms:
- Padding: Adds extra borders to maintain original size.
- Stride: How many steps the filter moves.
- Filter: Also called a kernel.
Takeaways:
- Filters are learned during training.
- Convolutions detect patterns like edges, color, and textures.
Relevant Questions:
- Why do we use padding?
- What happens when you increase the stride?
Layman Analogy:
Imagine using a small window to look at different parts of a big
painting. Each time, you write down a summary of what you see (like
color intensity or sharpness). You repeat this until you’ve scanned the
whole picture.
Section 2: Transfer Learning Explained
Concept:
Transfer learning lets us reuse a powerful pretrained model (like VGG16)
and just train the final layers on our specific task.
Why?
Advanced models take weeks and lots of compute. Transfer learning lets
us use pretrained features and train only the “head” (final layers).
Diagram Summary:
Left = pretrained convolution layers
Right = new dense layers
We keep the left side “frozen” and train only the right.
Mathematics:
If F is the frozen feature extractor and W are the trainable weights of the new head:
Prediction = softmax(W · F(x))
Here · is an ordinary matrix product, not a convolution.
Python Example:
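import tensorflow as tf
# Load the VGG16 convolutional base pretrained on ImageNet, without its classifier head
base_model = tf.keras.applications.VGG16(input_shape=(160,160,3),
                                         include_top=False,
                                         weights='imagenet')
base_model.trainable = False  # freeze the pretrained weights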
Takeaways:
- Saves compute and time.
- Works with few samples.
- Can be used across domains.
Relevant Questions:
- What layers should you freeze?
- Why is softmax used in the final layer?
Layman Analogy:
Imagine you’re learning to bake cakes. Instead of learning everything
from scratch, you buy a cake mix (pretrained model) and just focus on
decorating it to your style (custom layers).
Section 3: CNN, Part I: Set Up Your Data
Code Snippet:
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
train_images = train_images / 255.0 # normalize pixel values
test_images = test_images / 255.0
Visualization:
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
'dog', 'frog', 'horse', 'ship', 'truck']
plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([]), plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i], cmap=plt.cm.binary)
    plt.xlabel(class_names[train_labels[i][0]])
plt.show()
Takeaways:
- Normalizing image data (0–255 → 0–1) is crucial.
- Visualizing helps verify your labels and structure.
Relevant Questions:
- Why normalize pixel values?
- What does CIFAR-10 contain?
Section 4: CNN, Part II: Build the Model
Architecture Overview:
model = models.Sequential()
model.add(layers.Conv2D(32, (2,2), activation='relu', input_shape=(32,32,3)))
model.add(layers.Conv2D(64, (3,3), activation='relu'))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(128, (4,3), activation='relu'))
model.add(layers.Conv2D(128, (4,4), activation='relu'))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Flatten())
model.add(layers.Dense(100, activation='relu'))
model.add(layers.Dense(10, activation='softmax')) # For 10 classes
Compile and Train:
model.compile(optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=5, batch_size=50, validation_data=(test_images, test_labels))
Takeaways:
- Conv2D layers extract features.
- MaxPooling2D reduces size and keeps important features.
- Dense layers make final predictions.
Relevant Questions:
- What does the flatten layer do?
- Why is softmax good for multiclass classification?
Layman Analogy:
Imagine your model is like a detective. The convolutional layers are
like the detective gathering clues (edges, colors), the flatten layer
organizes all the clues in a single file, and the dense layers use that
file to figure out: “Is this a cat or a truck?”
Section 5: Transfer Learning in Practice (VGG16 + Cats vs
Dogs)
Steps Summary:
1. Load dataset with image_dataset_from_directory
2. Normalize with Rescaling
3. Load VGG16 base (exclude top)
4. Freeze base model
5. Add global average pooling, dropout, and dense output
6. Compile and train
Key Code:
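import tensorflow as tf
# train_dataset and validation_dataset are assumed to come from step 1
# (image_dataset_from_directory with 160x160 images).
base_model = tf.keras.applications.VGG16(include_top=False, weights='imagenet', input_shape=(160,160,3))
base_model.trainable = False  # freeze the pretrained convolutional base
model = tf.keras.Sequential([
    # Scale pixels from [0, 255] to [-1, 1]. VGG16 also ships its own
    # preprocessing (tf.keras.applications.vgg16.preprocess_input).
    tf.keras.layers.Rescaling(1./127.5, offset=-1),
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1)  # binary classification; outputs a single logit
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.fit(train_dataset, epochs=10, validation_data=validation_dataset)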
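Because the final Dense(1) layer has no activation and the loss uses from_logits=True, the model outputs raw logits rather than probabilities. The NumPy sketch below shows how a logit maps to a probability through the sigmoid and how binary cross-entropy is computed from logits. The logit and label values are made up, and which class counts as 1 is an assumption.

```python
import numpy as np

def sigmoid(z):
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy_from_logits(y_true, logits):
    """Mean binary cross-entropy, computed from raw logits."""
    p = sigmoid(logits)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

logits = np.array([-2.0, 0.0, 3.0])   # hypothetical Dense(1) outputs for 3 images
y_true = np.array([0.0, 1.0, 1.0])    # hypothetical labels (0 or 1)

probs = sigmoid(logits)               # probability of class 1
pred = (probs > 0.5).astype(int)      # same as thresholding the logit at 0
loss = binary_crossentropy_from_logits(y_true, logits)
print(probs, pred, loss)
```

A logit of 0 corresponds to a probability of exactly 0.5, so thresholding probabilities at 0.5 is equivalent to thresholding logits at 0.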
Takeaways:
- The VGG16 convolutional base has about 14.7 million parameters; we train only the 513 in the new Dense(1) head (512 weights + 1 bias).
- Fast training, even on small datasets.
- Accuracy can improve in just a few epochs.
Relevant Questions:
- What is the difference between categorical and binary crossentropy?
- What does dropout do?
Layman Analogy:
Think of the VGG16 model as an expert art critic who has studied
millions of paintings. You’re training a small assistant to only tell
apart cats and dogs. You don’t retrain the expert, just use their
opinions (features) and teach your assistant how to interpret them for a
very specific task.
Three General Takeaways
1. Transfer Learning is a Game-Changer for Resource-Limited
Training
By freezing the convolutional base of pretrained models (like VGG16) and
training only the classifier head, we unlock the power of highly complex
models without needing massive compute. This approach significantly
accelerates training while maintaining strong performance, even with
limited data.
2. CNNs Rely Heavily on Proper Data Handling and
Architectural Balance
Performance is strongly influenced by seemingly simple preprocessing
steps (e.g., normalization by dividing by 255, resizing images, proper
label encoding). Additionally, the depth of convolution and pooling
layers must be balanced with memory constraints and task complexity.
3. The Shift from Dense to Convolutional Networks Reflects a
Change in How Models “See” Data
Dense networks treat input as flat vectors; CNNs treat it spatially.
This is crucial for images, where structure and locality matter. CNNs
learn hierarchical features—from edges in early layers to complex shapes
in deeper ones—mimicking how visual cortex neurons operate.
Three Thought-Provoking Questions for
Dr. Slater
1. Given the benefits of transfer learning, how do we assess
when it’s more appropriate to fine-tune versus freeze layers, especially
when working with scientific or niche datasets like medical scans or
satellite imagery?
2. In real-world deployment, how do we mitigate the risks of
CNN misclassification (e.g., frog vs. dog) in high-stakes applications
such as autonomous vehicles, military surveillance, or healthcare
diagnostics?
3. From a pedagogical or research standpoint, where do you
see the biggest conceptual gaps in student understanding when
transitioning from dense to convolutional neural networks—and how can we
better bridge that leap, perhaps with visual tools or hands-on
demos?
\section*{General Takeaways from CNNs and Transfer Learning}
\begin{enumerate}
\item \textbf{Transfer Learning is a Game-Changer for Resource-Limited Training} \\
Freezing the convolutional base of pretrained models like VGG16 and training only the dense classifier head allows us to leverage powerful models without the need for massive compute resources. This greatly reduces training time while maintaining strong performance, especially when working with smaller datasets.
\item \textbf{CNNs Rely Heavily on Proper Data Handling and Architectural Balance} \\
Performance is significantly influenced by preprocessing steps such as normalization (e.g., dividing pixel values by 255), image resizing, and label formatting. The architecture’s depth (number and size of convolution and pooling layers) must be tuned in relation to the complexity of the task and available memory.
\item \textbf{The Shift from Dense to Convolutional Networks Reflects a Change in How Models ``See'' Data} \\
Dense networks treat data as flat vectors; CNNs process it spatially. This is essential for image tasks where spatial locality matters. CNNs build hierarchical feature maps—from simple edges to complex objects—similar to the human visual system.
\end{enumerate}
\vspace{0.5cm}
\section*{Thought-Provoking Questions for Dr. Slater}
\begin{enumerate}
\item \textbf{How should we decide whether to freeze or fine-tune pretrained convolutional layers when working with domain-specific datasets (e.g., medical imaging, satellite data, or environmental analysis)?}
\item \textbf{What safeguards or model evaluation strategies would you recommend when deploying CNNs in high-stakes environments (e.g., defense, transportation, healthcare) to reduce risks from misclassifications like mistaking a frog for a dog?}
\item \textbf{From your experience teaching deep learning, where do students typically struggle most when transitioning from dense to convolutional networks, and how could we better support that shift using visual or interactive resources?}
\end{enumerate}
Jessica McPhaul 7333 – QTW Module 12 Presession Week 12
