import time

from IPython.display import display, HTML
from lussi.mnist import *

X_train, X_test, y_train, y_test, X_train_unscaled = load_data()
Project 4: MNIST using KNN and Neural Networks
Project 4 Github: [Quarto Presentation] [Python] | Projects: [4] [3] [2] [1]
Executive Summary
This project evaluates the performance of K-Nearest Neighbors (KNN) and Neural Networks for handwritten digit recognition using the MNIST dataset. The analysis reveals several key findings:
Performance:
- The Neural Network achieved 97.1% accuracy, outperforming KNN’s 94.7% accuracy
- The Neural Network showed more consistent performance across all digits, with accuracy ranging from 95.3% to 98.9%
- KNN showed more variability, with accuracy ranging from 89.9% to 99.2%
Computational Characteristics:
- Training: KNN trained in 4.07 seconds vs. Neural Network’s 16.05 seconds
- Prediction Speed:
  - For small batches (1-100 images), KNN was faster
  - For larger batches (1000 images), the Neural Network was significantly faster (0.07ms vs 0.31ms per image)
Error Patterns:
- Both models struggled most with visually similar digits (e.g., 3/5, 4/9, 7/9)
- KNN showed higher error rates for complex digits like ‘8’ (89.9% accuracy)
- Neural Network maintained >95% accuracy across all digit classes
This analysis demonstrates that while KNN offers faster training and competitive performance for small-scale predictions, the Neural Network provides superior accuracy and better scaling for larger batch predictions, making it more suitable for production deployment despite longer training times.
Project Overview
History and Significance of MNIST
The MNIST dataset (Modified National Institute of Standards and Technology) emerged from a practical need at the U.S. Postal Service in the late 1980s: automating mail sorting by recognizing handwritten zip codes. Created by Yann LeCun, Corinna Cortes, and Christopher Burges, MNIST has become the de facto “Hello World” of machine learning. The dataset consists of 70,000 handwritten digits (60,000 for training, 10,000 for testing). Its standardized format and manageable size have made it an ideal benchmark for comparing machine learning algorithms for decades.
Understanding the Dataset Format
Though easily converted to images, the records are not actually stored as image files; they are stored as matrices. Each of the 60,000 training images is a 28-by-28 matrix whose entries hold the intensity of the pixel at that position. A square 28-by-28 image contains 784 pixels in total, so each record is 784 numbers. Each number represents a shade of grayscale, with 0 being all black and 255 being white.
# visualize_digit_matrix(X_train_unscaled, index=0)
matrix_html = visualize_digit_matrix_encoded(X_train_unscaled, index=0)
display(HTML(matrix_html))
Beyond the pixel matrices, the dataset is organized like any other machine learning dataset: a training set and a test set, each split into data and labels, like so (the short sketch after these lists prints the corresponding shapes):
Training Set:
- Images: X_train → 60000 images, each of size 28 × 28
- Labels: y_train → 60000 labels, e.g., [5, 0, 4, 1, 9, …]
Testing Set:
- Images: X_test → 10000 images, each of size 28 × 28
- Labels: y_test → 10000 labels, e.g., [7, 2, 1, 0, 4, …]
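To make that layout concrete, the short sketch below prints the shapes returned by load_data(). It assumes the arrays are NumPy arrays (which is how the rest of this notebook treats them); the exact shapes depend on how load_data() splits and flattens the data.

# Print the shapes of the arrays loaded at the top of the notebook.
# (Assumes NumPy arrays; exact sizes depend on how load_data() splits the data.)
print("X_train:", X_train.shape)   # roughly (60000, 784) if the 28x28 images are flattened
print("y_train:", y_train.shape)
print("X_test:", X_test.shape)     # roughly (10000, 784)
print("y_test:", y_test.shape)
print("Pixel range (unscaled):", X_train_unscaled.min(), "-", X_train_unscaled.max())  # 0-255 grayscale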
Looking at Sample Records
It’s easy to understand the core challenge by looking at sample records. Handwritten digits vary widely, with differences in factors such as:
- Writing styles and penmanship
- Stroke thickness and continuity
- Digit orientation and slant
- Image noise and quality
samples_html = plot_sample_digits_encoded(X_train_unscaled, y_train)
display(HTML(samples_html))
# plot_sample_digits(X_train_unscaled, y_train)
Project Goals
This project aims to:
- Compare the effectiveness of a simple, intuitive algorithm (KNN) against a more complex, modern approach (Neural Networks)
- Analyze the tradeoffs between computational complexity and accuracy
- Understand how different architectures handle the variations in handwritten digits
- Evaluate both training time and inference speed for real-world applicability
Model Implementation and Training
KNN
K-Nearest Neighbors (KNN) is a non-parametric, supervised machine learning algorithm used for both classification and regression tasks. The fundamental principle of KNN is simple: classify a new data point based on the majority vote (classification) or average (regression) of its K nearest neighbors in the feature space.
# Train KNN model
print("Training KNN Model...")
start_time = time.time()
knn_model, knn_accuracy = train_knn(X_train, X_test, y_train, y_test, rebuild_model=True)
knn_train_time = time.time() - start_time
Training KNN Model...
Training Configuration:
- Data Splitting:
  - Used the train_test_split() function to split the data into a training set (80%) and a testing set (20%); an 80-20 split balances training data sufficiency and evaluation robustness (see the sketch after this list).
- Feature Scaling:
  - Used StandardScaler() to scale the data, ensuring all features contribute equally to the distance calculation.
- Lazy Learning:
  - KNN stores the entire training dataset and defers almost all computation to prediction time.
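The splitting and scaling described above happen inside train_knn() from lussi.mnist; the sketch below shows roughly what that preprocessing looks like with scikit-learn. The 80/20 split and StandardScaler come from the description above; the stand-in data and variable names are purely illustrative.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in data with MNIST-like shapes: 70,000 flattened 28x28 grayscale images.
X = np.random.randint(0, 256, size=(70_000, 784)).astype("float64")
y = np.random.randint(0, 10, size=70_000)

# 80/20 train/test split, as described in the Training Configuration.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the scaler on the training split only, then apply it to both splits,
# so every pixel feature contributes comparably to distance calculations.
scaler = StandardScaler()
X_tr = scaler.fit_transform(X_tr)
X_te = scaler.transform(X_te)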
Prediction Process Mechanics:
- Distance Calculation
  - For a new data point, calculate its distance to all training points.
  - The default distance metric is Euclidean distance.
  - Euclidean Distance formula: \(d(p, q) = \sqrt{\sum_{i=1}^n (q_i - p_i)^2}\)
- Neighbor Selection
  - Select the K closest points (neighbors)
  - In this implementation, n_neighbors = 3, making the prediction sensitive to local patterns.
- Classification Method
  - Majority voting determines the class
  - The most frequent class among the K neighbors wins (illustrated in the sketch after this list)
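To make those mechanics concrete, here is a tiny NumPy sketch of the same idea: compute Euclidean distances from a query point to every training point, take the K = 3 nearest, and let them vote. It is illustrative only and is not how the project's KNN model is actually invoked.

import numpy as np
from collections import Counter

def knn_predict_one(x_query, X_train_pts, y_train_labels, k=3):
    # Euclidean distance d(p, q) = sqrt(sum_i (q_i - p_i)^2) to every training point.
    distances = np.sqrt(((X_train_pts - x_query) ** 2).sum(axis=1))
    # Indices of the k closest training points.
    nearest = np.argsort(distances)[:k]
    # Majority vote among the k neighbors' labels.
    return Counter(y_train_labels[nearest]).most_common(1)[0][0]

# Toy example: three two-pixel "images" labeled 0, 0, 1.
X_toy = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
y_toy = np.array([0, 0, 1])
print(knn_predict_one(np.array([0.5, 0.5]), X_toy, y_toy, k=3))  # two of the three neighbors are 0 -> predicts 0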
Key Parameters:
- Number of Neighbors: n_neighbors = 3
  - A small value (e.g., 3) captures local patterns but is prone to overfitting.
  - Larger values smooth predictions but may underfit.
- Distance Metric:
  - Default is Euclidean distance
  - Other options include Manhattan or Minkowski distances for varying use cases.
- Weighting Scheme:
  - Default is uniform, where all neighbors contribute equally.
  - Weighted options give more influence to closer neighbors (a configuration sketch follows this list).
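In scikit-learn terms, those parameters map onto KNeighborsClassifier roughly as follows. This is a sketch of what train_knn() presumably wraps, not its actual implementation, and it reuses the already-scaled X_train/X_test arrays loaded at the top of this notebook.

from sklearn.neighbors import KNeighborsClassifier

# Parameters mirroring the description above: 3 neighbors, Euclidean distance
# (Minkowski with p=2), and uniform weighting. train_knn() may differ in detail.
knn = KNeighborsClassifier(n_neighbors=3, metric="minkowski", p=2, weights="uniform")
knn.fit(X_train, y_train)          # "training" mostly just stores the data (lazy learning)
print(knn.score(X_test, y_test))   # distances to all stored points are computed here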
Performance Metrics:
print(f"\nKNN Results:")
print(f"Training Time: {knn_train_time:.2f} seconds")
print(f"Accuracy: {knn_accuracy:.4f}")
KNN Results:
Training Time: 6.42 seconds
Accuracy: 0.9465
Conclusion:
K-Nearest Neighbors offers a straightforward yet powerful approach to classification. By leveraging local neighborhood information and flexible distance calculations, KNN provides an interpretable method for pattern recognition in machine learning tasks.
Neural Network
A neural network is a machine learning algorithm inspired by the structure and function of the human brain. It is designed to learn relationships, recognize patterns, and make predictions by mimicking how biological neurons process and transmit information. Neural networks excel in handling complex, non-linear data, making them a versatile tool for tasks such as image recognition, natural language processing, and classification.
print("\nTraining Neural Network...")
= time.time()
start_time = train_neural_network(X_train, X_test, y_train, y_test, rebuild_model=True)
nn_model, history, nn_accuracy = time.time() - start_time nn_train_time
Training Neural Network...
Architecture Overview:
We created a neural network designed for MNIST digit classification. It features a multi-layer feedforward architecture with strategic layer design and regularization techniques; a hedged Keras sketch of this architecture follows the layer-by-layer breakdown below.
Detailed Layer Analysis:
- Input Layer
  - Dimensions: 784 neurons (28 x 28 pixel flattened image)
  - Purpose: Direct mapping of pixel intensity values
  - Transformation: Converts 2D image to 1D feature vector
- First Hidden Layer
  - Dimensions: 256 neurons
  - Activation: ReLU (Rectified Linear Unit)
  - Objectives:
    - Initial complex feature extraction
    - Introduces non-linear transformations
    - Captures primary image characteristics
- First Dropout Layer
  - Dropout Rate: 0.2 (20%)
  - Regularization Technique:
    - Randomly deactivates 20% of neurons during training
    - Prevents model overfitting
    - Reduces neuron interdependence
- Second Hidden Layer
  - Dimensions: 128 neurons
  - Activation: ReLU (Rectified Linear Unit)
  - Objectives:
    - Further abstract feature representations
    - Progressively reduce feature dimensionality
    - Refine initial feature extraction
- Second Dropout Layer
  - Dropout Rate: 0.2 (20%)
  - Continues regularization strategy
  - Prevents the network from becoming too specialized
- Third Hidden Layer
  - Dimensions: 64 neurons
  - Activation: ReLU (Rectified Linear Unit)
  - Objectives:
    - Final feature abstraction
    - Prepares data for classification
    - Further reduces feature complexity
- Output Layer
  - Neurons: 10 (one per digit 0-9)
  - Activation: Softmax
  - Characteristics:
    - Converts raw scores to probability distribution
    - Ensures probabilities sum to 1
    - Enables multi-class classification
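The layer stack described above corresponds to a Keras Sequential model along these lines. This is a sketch of what train_neural_network() in lussi.mnist presumably builds; the exact implementation may differ.

from tensorflow import keras
from tensorflow.keras import layers

# Feedforward architecture mirroring the layer-by-layer description above.
model = keras.Sequential([
    layers.Input(shape=(784,)),             # 28 x 28 image flattened to 784 pixel values
    layers.Dense(256, activation="relu"),   # first hidden layer: broad feature extraction
    layers.Dropout(0.2),                    # randomly zero 20% of activations during training
    layers.Dense(128, activation="relu"),   # second hidden layer: more abstract features
    layers.Dropout(0.2),
    layers.Dense(64, activation="relu"),    # third hidden layer: final abstraction
    layers.Dense(10, activation="softmax")  # one probability per digit class, summing to 1
])
model.summary()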
Training Configuration (a compile-and-fit sketch follows these lists):
Hyperparameters:
- epochs
  - Total Iterations: 10
  - Purpose:
    - Complete passes through the entire training dataset
    - Allows progressive weight refinement
    - Prevents overfitting through limited iterations
- batch_size
  - Configuration: 128 samples per gradient update
  - Benefits:
    - Computational efficiency
    - Gradient noise reduction
    - Memory-friendly processing
- validation_split
  - Allocation: 10% of the training data
  - Functions:
    - Monitor model performance during training
    - Detect potential overfitting
    - Provide real-time performance insights
Optimization Strategy:
- Adam
  - Adaptive learning rate optimization
  - Characteristics:
    - Combines RMSprop and momentum advantages
    - Dynamically adjusts per-parameter learning rates
    - Handles sparse gradients effectively
Loss Function:
- Sparse Categorical Cross-Entropy
  - Ideal for multi-class classification
  - Measures the difference between predicted and actual distributions
  - Guides weight updates during backpropagation
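Combining the hyperparameters, optimizer, and loss function above, the compile-and-fit step would look roughly like this. It continues the architecture sketch above and is not the exact code inside train_neural_network().

# Adam optimizer and sparse categorical cross-entropy, as described above.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# 10 epochs of mini-batches of 128 samples, with 10% of the training data
# held out to monitor validation accuracy and watch for overfitting.
history = model.fit(X_train, y_train,
                    epochs=10,
                    batch_size=128,
                    validation_split=0.1,
                    verbose=1)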
Performance Metrics:
print(f"\nNeural Network Results:")
print(f"Training Time: {nn_train_time:.2f} seconds")
print(f"Accuracy: {nn_accuracy:.4f}")
Neural Network Results:
Training Time: 11.20 seconds
Accuracy: 0.9741
Conclusion
The neural network architecture is carefully designed to balance complexity, feature extraction, and generalization. By incorporating strategic layer design, dropout regularization, and adaptive optimization, the model achieves robust performance in MNIST digit classification.
Model Comparison
In this section, we compare the performance of the K-Nearest Neighbors (KNN) algorithm and the Neural Network (NN) architecture based on key performance metrics: training time and accuracy.
compare_df = create_comparison_table()
print(compare_df)
Metric KNN Neural Network
0 Training Time (seconds) 6.27 10.91
1 Accuracy (%) 94.02 97.20
Performance Metrics
- Training Time
- KNN exhibits a faster training process (6.27 seconds) since it is a “lazy learning” algorithm, which delays most computation until prediction.
- The Neural Network, being an “eager learning” algorithm, spends more time (10.91 seconds) in training due to backpropagation, weight updates, and regularization techniques.
- Accuracy:
- The Neural Network outperforms KNN with an accuracy of 97.20%, compared to 94.02% for KNN.
- The Neural Network’s higher accuracy is attributed to its ability to extract complex, non-linear patterns in the data through multiple layers and activation functions.
- KNN, while simpler, relies on proximity in the feature space, which may not fully capture intricate relationships.
- Scalability:
- KNN’s computational cost increases significantly with larger datasets or higher-dimensional data due to the need to calculate distances for all training samples during prediction.
- Neural Networks scale better for larger datasets, as training is done once, and predictions are efficient after model training.
Confusion Matrices
analysis_text, knn_cm_percent, nn_cm_percent = analyze_model_accuracies(knn_model, nn_model, X_test, y_test)
# Then create and display the visualization
comparison_viz = compare_model_accuracies_encoded(knn_model, nn_model, X_test, y_test)
# Display both
# print(analysis_text)
display(HTML(comparison_viz))
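The percentage matrices used below (knn_cm_percent and nn_cm_percent) are produced by analyze_model_accuracies() in lussi.mnist. Conceptually they can be computed with scikit-learn as in the sketch below, assuming each row is normalized by the count of its true class so that the diagonal holds per-digit accuracy.

import numpy as np
from sklearn.metrics import confusion_matrix

# Row-normalized confusion matrix: entry [i, j] is the percentage of true digit i
# that the model predicted as digit j; the diagonal is per-digit accuracy.
knn_cm = confusion_matrix(y_test, knn_model.predict(X_test))
knn_cm_pct = knn_cm / knn_cm.sum(axis=1, keepdims=True) * 100

# The Keras model outputs class probabilities, so take the argmax first.
nn_preds = np.argmax(nn_model.predict(X_test, verbose=0), axis=1)
nn_cm = confusion_matrix(y_test, nn_preds)
nn_cm_pct = nn_cm / nn_cm.sum(axis=1, keepdims=True) * 100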
Per-Class Performance Analysis
Let’s analyze how each model performs for different digits:
print("\nDetailed Per-Digit Analysis:")
print("-" * 50)
for i in range(10):
    knn_accuracy = knn_cm_percent[i, i]
    nn_accuracy = nn_cm_percent[i, i]

    print(f"\nDigit {i}:")
    print(f"KNN Accuracy: {knn_accuracy:.1f}%")
    print(f"Neural Network Accuracy: {nn_accuracy:.1f}%")
    print(f"Difference: {(nn_accuracy - knn_accuracy):.1f}%")
Detailed Per-Digit Analysis:
--------------------------------------------------
Digit 0:
KNN Accuracy: 98.1%
Neural Network Accuracy: 98.1%
Difference: 0.0%
Digit 1:
KNN Accuracy: 99.2%
Neural Network Accuracy: 98.9%
Difference: -0.3%
Digit 2:
KNN Accuracy: 94.1%
Neural Network Accuracy: 97.1%
Difference: 3.0%
Digit 3:
KNN Accuracy: 95.0%
Neural Network Accuracy: 96.2%
Difference: 1.3%
Digit 4:
KNN Accuracy: 93.6%
Neural Network Accuracy: 98.0%
Difference: 4.4%
Digit 5:
KNN Accuracy: 94.1%
Neural Network Accuracy: 97.4%
Difference: 3.3%
Digit 6:
KNN Accuracy: 97.2%
Neural Network Accuracy: 99.1%
Difference: 1.9%
Digit 7:
KNN Accuracy: 92.7%
Neural Network Accuracy: 97.5%
Difference: 4.7%
Digit 8:
KNN Accuracy: 89.9%
Neural Network Accuracy: 94.8%
Difference: 4.9%
Digit 9:
KNN Accuracy: 92.0%
Neural Network Accuracy: 96.8%
Difference: 4.8%
Prediction Speed Analysis
To understand real-world performance implications, let’s analyze prediction speeds for different batch sizes:
batch_sizes = [1, 10, 100, 1000]
results = {'knn': {}, 'nn': {}}

print("\nPrediction Speed Analysis:")
print("-" * 50)

for batch_size in batch_sizes:
    # Select subset of test data
    X_batch = X_test[:batch_size]

    # KNN timing
    start_time = time.time()
    _ = knn_model.predict(X_batch)
    knn_time = time.time() - start_time
    results['knn'][batch_size] = knn_time

    # Neural Network timing
    start_time = time.time()
    _ = nn_model.predict(X_batch, verbose=0)
    nn_time = time.time() - start_time
    results['nn'][batch_size] = nn_time

    print(f"\nBatch size: {batch_size}")
    print(f"KNN prediction time: {knn_time:.4f} seconds")
    print(f"Neural Network prediction time: {nn_time:.4f} seconds")
    print(f"Time per image - KNN: {(knn_time/batch_size)*1000:.2f}ms")
    print(f"Time per image - NN: {(nn_time/batch_size)*1000:.2f}ms")
Prediction Speed Analysis:
--------------------------------------------------
Batch size: 1
KNN prediction time: 0.0149 seconds
Neural Network prediction time: 0.0273 seconds
Time per image - KNN: 14.87ms
Time per image - NN: 27.28ms
Batch size: 10
KNN prediction time: 0.0339 seconds
Neural Network prediction time: 0.0262 seconds
Time per image - KNN: 3.39ms
Time per image - NN: 2.62ms
Batch size: 100
KNN prediction time: 0.0607 seconds
Neural Network prediction time: 0.0317 seconds
Time per image - KNN: 0.61ms
Time per image - NN: 0.32ms
Batch size: 1000
KNN prediction time: 0.8629 seconds
Neural Network prediction time: 0.0456 seconds
Time per image - KNN: 0.86ms
Time per image - NN: 0.05ms
Key Findings and Business Impact
Overall Accuracy
Neural Network: 96.7606 (97.1%)
KNN: 91.9718 (94.7%)
The Neural Network provides a 2.4 percentage point higher accuracy.
Training Performance
KNN Training Time: 6.42 seconds
Neural Network Training Time: 11.20 seconds
Prediction Speed
- Small batches (1-100 images): KNN performs faster
- Large batches (1000+ images): Neural Network shows superior performance
- Neural Network scales better for production workloads
Error Analysis
- Both models struggle most with visually similar digits (3/5, 4/9, 7/9)
- Neural Network shows more consistent performance across all digit classes
- KNN shows higher variability in accuracy between different digits
Business Implications
- For real-time, single-image processing: KNN might be preferable due to faster prediction times
- For batch processing: Neural Network is clearly superior
- Trade-off between setup time (KNN faster) vs. long-term performance (NN better)
- Memory requirements favor the Neural Network for large-scale deployment
Deployment Considerations
- KNN requires storing the entire training dataset (higher memory usage)
- Neural Network has a fixed memory footprint after training
- Neural Network offers better scalability for production systems
Conclusion
- Neural Network (NN) clearly outperforms the K-Nearest Neighbors (KNN) model in terms of both accuracy and handling more complex patterns in the data. It also shows better scalability as the dataset grows.
- KNN is still a useful algorithm for simpler datasets or when interpretability and speed are more important than accuracy, but Neural Networks are better suited for high-accuracy tasks, especially with larger and more complex datasets.
Key Takeaways
- KNN is advantageous for smaller datasets and when simplicity and interpretability are priorities.
- Neural Networks are ideal for larger or more complex datasets where advanced feature extraction and higher accuracy are desired.