PROJECT RECAP

Objective: Develop an image classification ML model for diverse categories such as sunsets, mountains, and beaches.
Functionality: Categorize and identify images with high accuracy.
Added Feature: Offer travel recommendations based on user preferences.
Purpose: Address the need for a dependable image classifier in travel planning.
Impact: Enhance the overall travel planning experience.

FEEDBACK SESSION:

Peers’ Comments:

- Model Processing: Curiosity about the choice of CNN for image classification, and the metrics for gauging user engagement and model performance.
- Recommendation: How the image classification and recommendation systems would be integrated, and how biases would be addressed.
- Data Scraping: Ensuring data quality from user-generated content on Flickr, and the specific steps taken for data handling and preprocessing.

Professor’s Feedback:

- Project’s Scale: Suggestion to scale the project down by dropping the integration of other systems.
- Recommendation Feedback: Inquiry on the validity and possibility of evaluation for a Recommendation Model.

~~Travel Recommendation System~~ ==> Image Recommendation System

Data Summary

Flickr API Consideration:

- Authentication: Setting up the Flickr API key using setFlickrAPIKey
- Rate Limiting: Understand and handle rate limits to ensure uninterrupted data scraping.
- Ethical Considerations: Consider privacy implications and attribution requirements.

Challenges of Data Scraping From Flickr:

- Conversion between API Calls and Images Downloaded: API calls return character (text) records with the ID, URL, tags, etc. of each image, not the actual image files.
- Data Duplication: API calls to Flickr return the same image multiple times, for backend reasons we could not determine.
- User-Generated Data: Inherent variability, potential biases, and lots and lots of noise.
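
The first two challenges can be handled before anything is downloaded. A minimal sketch, written in Python for illustration only (the project itself used R): the record fields `id`, `server`, and `secret` and both helper names are assumptions, and the URL pattern follows Flickr's documented static photo URL format rather than the group's actual code.

```python
def dedupe_by_id(records):
    """Keep the first record seen for each photo ID, preserving order."""
    seen = set()
    unique = []
    for rec in records:
        if rec["id"] not in seen:
            seen.add(rec["id"])
            unique.append(rec)
    return unique


def photo_url(rec, size_suffix=None):
    """Build a static image URL from one metadata record (hypothetical field names)."""
    base = f"https://live.staticflickr.com/{rec['server']}/{rec['id']}_{rec['secret']}"
    return f"{base}_{size_suffix}.jpg" if size_suffix else f"{base}.jpg"


# Made-up records standing in for API results; the second one is a duplicate.
records = [
    {"id": "111", "server": "65535", "secret": "abc"},
    {"id": "111", "server": "65535", "secret": "abc"},
    {"id": "222", "server": "65535", "secret": "def"},
]
urls = [photo_url(r) for r in dedupe_by_id(records)]
```

Deduplicating on the photo ID rather than on the URL keeps the check independent of which image size is requested.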

Steps to Collect Images from the Flickr API

Set Up Environment: Load the necessary libraries (FlickrAPI, dplyr, readr, httr) and set Flickr API key using setFlickrAPIKey().
API Call Function: Create a function to fetch photos with given tags, using pagination to avoid duplicates and limit tag words to reduce spam.
Filter and Select: From the fetched photos, select the relevant columns and filter out photos whose tags contain unwanted words.
Download Images: Execute the download_images function with the filtered URLs to download and save images to a specified folder on your local machine.
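
The paging and tag-filtering steps above can be sketched as follows. This is Python for illustration, not the project's R code: `fetch_page` is a hypothetical callable standing in for the real API call, the space-separated `tags` field is an assumption, and reading "limit tag words" as a cap on the number of tags per photo is our interpretation.

```python
def collect_photos(fetch_page, max_pages, blocked_words, max_tags=20):
    """Walk result pages and keep only records that pass the tag filters.

    fetch_page(page) is assumed to return a list of dicts with a
    space-separated "tags" string, or an empty list when no results remain.
    """
    blocked = {w.lower() for w in blocked_words}
    kept = []
    for page in range(1, max_pages + 1):
        page_records = fetch_page(page)
        if not page_records:
            break  # no more results
        for rec in page_records:
            tags = rec["tags"].lower().split()
            if len(tags) > max_tags:
                continue  # heavily tag-stuffed photos are treated as likely spam
            if blocked & set(tags):
                continue  # drop photos carrying an unwanted tag
            kept.append(rec)
    return kept


# Fake two-page result set used only to exercise the function.
fake_pages = {
    1: [{"id": "1", "tags": "mountain snow"}, {"id": "2", "tags": "mountain wedding"}],
    2: [{"id": "3", "tags": "river " + " ".join(f"t{i}" for i in range(30))}],
}
photos = collect_photos(lambda p: fake_pages.get(p, []), 5, {"wedding"})
```

Passing the fetch function in as an argument keeps the filtering logic testable without touching the network or an API key.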

Final AI/ML Procedures

Download 7,000 images from Drive, allocating them into ‘train,’ ‘validation,’ and ‘test’ folders, each with ‘city,’ ‘mountain,’ and ‘river’ subfolders.
Distribute images into these subfolders: 600 for training, 200 for validation, and 100 for testing in each category.
This gives 900 images per category (2,700 across the three categories), divided among the training, validation, and testing folders.
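
One way to produce the 600/200/100 allocation for a category is sketched below, in Python for illustration. The function name, the fixed seed, and the shuffle-then-slice approach are assumptions; the source does not say how the images were assigned to folders.

```python
import random

SPLIT_SIZES = {"train": 600, "validation": 200, "test": 100}


def split_category(filenames, sizes=SPLIT_SIZES, seed=0):
    """Shuffle one category's files reproducibly and cut them into disjoint splits."""
    needed = sum(sizes.values())
    if len(filenames) < needed:
        raise ValueError(f"need {needed} files, got {len(filenames)}")
    shuffled = sorted(filenames)           # sort first so the result depends only on the seed
    random.Random(seed).shuffle(shuffled)
    splits, start = {}, 0
    for name, count in sizes.items():
        splits[name] = shuffled[start:start + count]
        start += count
    return splits


# Placeholder file names; any files beyond the first 900 after shuffling are left unused.
mountain_files = [f"mountain_{i:04d}.jpg" for i in range(1000)]
mountain_split = split_category(mountain_files)
```

Shuffling before slicing avoids putting, say, all images from one search page into the same split.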
Implement the extract_features() and reshape_features() functions, built on the VGG16 application, for data preprocessing.
Construct a 7-layer densely connected classifier (four dense and three dropout layers) with 2,138,755 parameters, and fit it to the processed features for training and validation.

Final AI/ML Result

Utilized the pre-trained VGG16 model to create features and defined a densely connected classifier for data processing.
Analysis of performance graphs indicates that the VGG16-based model (model2) demonstrates superior performance.
The performance of model2 on both the training and validation sets is satisfactory for finalization.

Final AI/ML Summary

The VGG16-based model improves accuracy to ~82%, which is considered good for an image classification model.
The advantage of using the pre-trained VGG16 over training a CNN from scratch is visible.
Because of the limited quality and quantity of the data, results for this project vary and the model slightly overfits the training data.

## Model: "sequential_1"
## ________________________________________________________________________________
##  Layer (type)                       Output Shape                    Param #     
## ================================================================================
##  dense_6 (Dense)                    (None, 256)                     2097408     
##  dropout_4 (Dropout)                (None, 256)                     0           
##  dense_5 (Dense)                    (None, 128)                     32896       
##  dropout_3 (Dropout)                (None, 128)                     0           
##  dense_4 (Dense)                    (None, 64)                      8256        
##  dropout_2 (Dropout)                (None, 64)                      0           
##  dense_3 (Dense)                    (None, 3)                       195         
## ================================================================================
## Total params: 2138755 (8.16 MB)
## Trainable params: 2138755 (8.16 MB)
## Non-trainable params: 0 (0.00 Byte)
## ________________________________________________________________________________
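
The parameter counts in the summary above can be checked by hand: a dense layer with n_in inputs and n_out units has (n_in + 1) * n_out parameters (weights plus one bias per unit), and dropout layers add none. A short Python check follows; the input width of 8192 is not stated in the source and is inferred here from the first layer's count (2,097,408 / 256 = 8193, minus 1 for the bias), so treat it as an assumption.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return (n_in + 1) * n_out


# Inferred input width followed by the four dense layer widths from the summary.
layer_widths = [8192, 256, 128, 64, 3]
per_layer = [dense_params(a, b) for a, b in zip(layer_widths, layer_widths[1:])]
total = sum(per_layer)

print(per_layer)  # [2097408, 32896, 8256, 195]
print(total)      # 2138755
```

An input of 8192 values would be consistent with a flattened 4 x 4 x 512 feature map, though the summary alone does not confirm that shape.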

## 10/10 - 0s - loss: 0.5228 - accuracy: 0.7767 - 126ms/epoch - 13ms/step

##      loss  accuracy 
## 0.5227583 0.7766666
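
As a consistency check on the evaluation output, the Python sketch below assumes the evaluation ran on the 300 test images (100 per category, three categories) with the Keras default batch size of 32. Neither assumption is stated next to the output.

```python
import math

test_images = 3 * 100           # assumed: 100 test images for each of 3 categories
batch_size = 32                 # assumed: Keras default for evaluate()
reported_accuracy = 0.7766666   # taken from the output above

steps = math.ceil(test_images / batch_size)
correct = round(reported_accuracy * test_images)

print(steps)                   # 10, matching the "10/10" progress counter
print(correct)                 # 233
print(correct / test_images)   # approximately 0.7767
```

Under these assumptions the reported accuracy corresponds to 233 of 300 test images classified correctly.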

Group 1 Flickr API - Image Classification Project

Group Members: Amruta Habbu; Bach Hoang Pham; Dang Khoa Tran; Khyati Sharma; Ragini Jakkam