Red Robin

1. INTRODUCTION

“Ever seen a bird in your garden and wondered what kind it is? Just take a picture of it and upload it to our webpage. It’ll tell you what you’re looking at.”

This project aims to function as a browser-based version of the well-known PlantNet app, but for the classification of birds. A first release of this application has been realized in a hands-on approach as part of our LAB. Currently, only three kinds of birds are detected. This ILV portfolio, however, is not completely connected to the LAB and aims at theoretically setting up this project on a larger scale. It also assumes that we have a client with whom we formulate requirements while delving into data science methodology, architecture, design patterns and other contents we discussed in class.

The requirements analysis documents can be found at the GitHub project “Bird_Requirements” (https://github.com/EliReckless/Bird_Requirements) and the first release draft from the LAB can be found at the GitLab project “My Bird Classifier” (https://gitlab.web.fh-kufstein.ac.at/2010837035/my-bird-classifier). The deployed release draft webpage can be found at: https://birdyclassifier.herokuapp.com/

1.1 Purpose

The aim of the project is to build a webpage where users can upload their bird pictures in order to classify them. (At first, only three categories of birds will be detected: great tit, robin and sparrow.) Our fictitious client is interested in creating a service similar to PlantNet; the commercial benefit is supposed to be derived from advertising. It can be assumed that the client brings sufficient domain knowledge to the table (e.g. for the creation of expert labels if necessary).

1.2 Intended Audience

The intended users are ornithologists and people interested in birds, as well as casual users. Since the client wanted to keep an open mind towards use cases in science, these are also briefly mentioned in the explanations below. However, such use cases might need additional functionality that is not in the scope of this portfolio.

1.3 Intended Use

The app classifies birds based on pictures uploaded to our website. Usage will mostly be casual in order to make the app commercially viable, but ornithologists and possibly some scientific users are also expected. The latter might make additional requirements necessary (e.g. detection based not on an uploaded image but on images automatically taken from a camera live stream).

1.4 Scope

The scope of the first release is a working webpage where pictures of great tits, robins and sparrows in jpg and png format can be uploaded, rotated and sharpened by the user in order to classify the bird species shown in the picture. This is the LAB part of the present project, which can be understood as the MVP that serves as a basis for further exchange with the client. Starting from this MVP, it will be easier to explicitly work out the client's wishes and expectations, which will of course include a plethora of different bird species, localization and so forth.

1.5 Risks and further steps

There are potential risks in the development and deployment of the product: for example, the classification could be wrong, the upload of the user's picture could fail due to an unsupported picture format, or the webpage could be offline. In order to minimize these risks, testing will be implemented and the webserver will be hosted with an external company that has the competence to maintain the web app 24/7.

In later releases, when more birds can be classified and the tool might be used in an international context, new and rare bird species will have to be classified in order to provide an exhaustive classification. However, a certain number of pictures is needed to train the model. If these pictures are not available on the internet, or if training the model would take too long, the tool will not be able to classify those birds. Thus, as described in the section on CRISP-DM, the client themselves might be able to provide pictures based on their expertise.

2. OVERALL DESCRIPTION

2.1 User Needs

The target group for our web app are ornithologists as well as people interested in birds who would like to know which birds they have seen and photographed. In later releases, scientists, environmentalists and city officials might also be interested in the app, once the location and time of the bird sighting can be provided.

2.2 Assumptions and Dependencies

Factors that impact our ability to fulfill the requirements outlined in this paper are hardware limitations, in particular GPU capacity for training the classification model, as well as manpower limitations: the pictures used to train the model can be scraped from the internet, but they need to be sorted manually in order to remove the non-bird pictures the scraper finds.

Another limitation is cloud storage space. Since this is a student project that should be hosted and supplied for free, the available cloud space is limited, which makes it difficult to deploy a bigger model with more bird classes.

For this project, a model trained for the machine learning LAB has been reused; it needs to be trained more intensively to achieve better accuracy.

3. SYSTEM FEATURES AND REQUIREMENTS

[Figure: overview of system features and requirements]

3.1 Functional Requirements

The user must be able to upload a bird picture in jpg or png format, apply certain preprocessing steps and press a button in order to receive a classification for the picture. For the first release, the classification is limited to three birds: great tit, robin and sparrow.

The preprocessing steps the user can apply to the image are as follows (a sketch of a possible implementation follows the list):

  • rotate image left
  • rotate image right
  • flip image
  • sharpen image
  • contrast image
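
These operations map directly onto functions of the Pillow (PIL) library. The following is a minimal sketch assuming Pillow is used for preprocessing; the helper names are our own and not part of the release code:

    # Sketch of the user-selectable preprocessing steps using Pillow.
    # Assumption: Pillow is the preprocessing library; helpers are illustrative.
    from PIL import Image, ImageEnhance, ImageFilter, ImageOps

    def rotate_left(img: Image.Image) -> Image.Image:
        return img.rotate(90, expand=True)      # 90 degrees counter-clockwise

    def rotate_right(img: Image.Image) -> Image.Image:
        return img.rotate(-90, expand=True)     # 90 degrees clockwise

    def flip(img: Image.Image) -> Image.Image:
        return ImageOps.mirror(img)             # horizontal flip

    def sharpen(img: Image.Image) -> Image.Image:
        return img.filter(ImageFilter.SHARPEN)

    def adjust_contrast(img: Image.Image, factor: float = 1.5) -> Image.Image:
        return ImageEnhance.Contrast(img).enhance(factor)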

[Figure: functional flow: Upload -> User Preprocess -> Show Prediction]

Future functional requirements might include more categories of birds as well as time and location of bird sightings in order to gain a deeper understanding of the bird population in certain regions.

3.2 Technical Requirements

Common technical requirements

In order to maintain a common programming platform, a Docker development container shall be used with Visual Studio Code. The dependencies shall be documented in a requirements.txt file.
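
A plausible first-release requirements.txt, based on the tools named in this document, could look as follows (the exact packages and version pins are assumptions to be fixed by the team):

    # Illustrative only; versions to be pinned by the team.
    streamlit
    tensorflow
    numpy
    Pillow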

As a collaboration and CI/CD platform, GitLab is used, hosted by FH Kufstein. The web app is programmed with Streamlit for the first release; however, a change to the more flexible Flask might be intended at a later stage. The deployment for now will be on Heroku, a platform owned by Salesforce that provides some limited free space for small applications. As the model grows, this deployment strategy might also be subject to change, depending on the required capacity of the app and the available offers of cloud space providers.

Logging has to be used for recording and tracing error states in the software process. Since an evaluation by administrators and software developers is planned and the content might be subject to legal restrictions, the logs must not contain any confidential information. The bird classifier should log the usage of the webpage as well as upload errors, including the format of the rejected pictures. If many pictures of a certain format are uploaded, that format should be added to the allowed file types. Any other error occurring on the webpage should be logged as well.

Currently, each image uploaded by the user replaces the previously uploaded picture. In the future, the pictures should be persisted together with the logs and the predicted accuracies; a database connection will therefore be needed in a later release (a sketch of what this could look like follows).
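
As a minimal sketch of what such persistence could look like, assuming SQLite as a stand-in (the actual database technology has not yet been chosen):

    # Hypothetical persistence layer; SQLite is an assumption, not a decision.
    import sqlite3
    from datetime import datetime, timezone

    def save_prediction(db_path: str, filename: str,
                        predicted_class: str, confidence: float) -> None:
        """Store one prediction record for later evaluation."""
        with sqlite3.connect(db_path) as conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS predictions ("
                "filename TEXT, predicted_class TEXT, "
                "confidence REAL, created_at TEXT)"
            )
            conn.execute(
                "INSERT INTO predictions VALUES (?, ?, ?, ?)",
                (filename, predicted_class, confidence,
                 datetime.now(timezone.utc).isoformat()),
            )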

Logging

Additional common requirements include logging for the prediction process itself as well as for errors that occur.

[Figure: Prediction Logger and Error Logger components]
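
A minimal sketch of these two loggers using Python's standard logging module (logger names and log format are assumptions):

    # Sketch only: separate loggers for predictions and errors.
    import logging

    logging.basicConfig(
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
        level=logging.INFO,
    )
    prediction_logger = logging.getLogger("bird_classifier.prediction")
    error_logger = logging.getLogger("bird_classifier.error")

    # Example usage; no confidential user data is logged.
    prediction_logger.info("prediction=%s confidence=%.2f", "robin", 0.91)
    error_logger.error("upload failed: unsupported format %s", ".bmp")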

Specific technical requirements

For the classification of the bird image, we need to resize and reshape the uploaded and preprocessed user image into a NumPy array of dtype float32. This array is handed over to the model, which has been trained and saved in advance in a Jupyter notebook and is loaded to predict the user image.
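
A minimal sketch of this conversion, assuming a Keras model with 224x224 RGB input (the input size of VGG16; the exact preprocessing, scaling and model path in the release code may differ):

    # Sketch: convert a PIL image into the float32 batch the model expects.
    import numpy as np
    from PIL import Image
    from tensorflow import keras

    def to_model_input(img: Image.Image, size=(224, 224)) -> np.ndarray:
        img = img.convert("RGB").resize(size)
        arr = np.asarray(img, dtype=np.float32) / 255.0  # scale to [0, 1]
        return arr.reshape(1, size[0], size[1], 3)       # add batch dimension

    model = keras.models.load_model("bird_model.h5")     # path is an assumption
    probabilities = model.predict(to_model_input(Image.open("upload.jpg")))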

For the webpage, the Streamlit app.py needs to be created, and a Procfile has to be set up for the deployment to Heroku.
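
Minimal sketches of both files; the labels, file names and Procfile command are illustrative assumptions, not the final release code:

    # app.py -- minimal Streamlit sketch.
    import streamlit as st
    from PIL import Image

    st.title("Red Robin - Bird Classifier")
    uploaded = st.file_uploader("Upload a bird picture", type=["jpg", "png"])
    if uploaded is not None:
        image = Image.open(uploaded)
        st.image(image, caption="Your upload")
        if st.button("Classify"):
            st.write("Prediction: ...")  # hand the image to the model here

A Procfile for running Streamlit on Heroku could then contain a single line such as:

    web: streamlit run app.py --server.port=$PORT --server.address=0.0.0.0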

[Figure: Image Converter, Model Loader and Streamlit App components]

3.3 Non-Functional requirements

Non-functional requirements are important for defining a framework for the software architecture, because different software architecture styles have different impacts on the non-functional requirements and vice versa. For example, according to an overview on dev.to, a Batch Sequential architecture can have a negative impact on usability, while an MVC architecture performs very well in this area. This example once again highlights the ever-present trade-offs within the fields of software development and data science.

The main non-functional requirements comprise the following:

  • usability
  • life time
  • cost
  • scalability
  • reliability
  • concurrency
  • performance
  • security
  • availability
  • simplicity
  • reusability
  • portability
  • testability

The primary goal of the bird classifier is attracting as many happy customers as possible, and the key to customer happiness is a smooth customer experience. For a classification tool such as the bird classifier, the key aspects of this smooth experience are clearly the algorithm's accuracy and the duration of the prediction. User experience research consistently shows that customers expect faster responses from their applications than ever.

To make our customers happy, we therefore decided to go with the following non-functional requirements:

  • performance: the accuracy of our model's prediction is supposed to be at least 80% in a first step. To achieve this, the model needs to be trained with more pictures, and the layers on top of the VGG16 base model might need adaptation (a sketch of this setup follows the list). For local model training, the Jupyter notebook containing the model could use Google Colab GPU resources. For future releases, however, the manual training process should be switched to a regularly monitored, automated training pipeline including more bird pictures and species. Furthermore, the 80% accuracy shall be combined with a maximum prediction time of 7 seconds.

  • scalability: the scalability of the bird classifier is key to handling the requests of the constantly growing, bird-loving bird-classifier community. The current server might run out of resources, e.g. CPU or RAM. More resources could be bought while the architecture remains the same; however, with ever-growing user numbers, this is not a permanent solution. A second possibility is to take advantage of a so-called load balancer to evenly distribute the requests across multiple servers. There are multiple ways to deal with this non-functional requirement, all of which have their advantages and disadvantages.
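
A sketch of the intended transfer-learning setup, assuming Keras; the size and number of the adaptation layers are illustrative, not final:

    # Sketch: VGG16 base with a small custom classification head (3 classes).
    from tensorflow import keras
    from tensorflow.keras.applications import VGG16

    base = VGG16(weights="imagenet", include_top=False,
                 input_shape=(224, 224, 3))
    base.trainable = False  # keep the pre-trained features frozen at first

    model = keras.Sequential([
        base,
        keras.layers.Flatten(),
        keras.layers.Dense(256, activation="relu"),
        keras.layers.Dense(3, activation="softmax"),  # great tit, robin, sparrow
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])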

[Figure: accuracy, speed and scalability requirements]

3.4 User Interface Requirements

The user must be able to access the webpage via the internet and upload his or her picture to it. A further processing step to rotate or sharpen the picture is nice to have and shall be integrated. A button shall be supplied so the user can start the prediction after he or she has finished preparing the picture. The prediction shall be shown to the user after the button has been pressed.

[Figure: interface design and wireframes]

3.5 External Deliverables

Similar projects, such as Eagle Vision (https://github.com/Joshmantova/Eagle-Vision), have been set up to classify birds. However, our classification model strives to help users worldwide, and ornithologists in particular, to identify more species of birds with higher accuracy. To achieve this, the product might have to move from a free cloud platform to a paid platform that offers more performance. The fee could be refinanced via donations and/or public funding programs.

3.6 Activity Requirements

In order to keep track of the ongoing programming and to be able to eliminate errors and problems at short notice, the programming activity must be documented. This will also make it easier for new employees to join, implement changed requirements appropriately and react to changes in the technical environment. It will also help the stakeholders to understand the architecture decisions and give developers specifications regarding the architecture. The documentation shall be made using Sphinx.

3.7 Security Requirements

Since no sensitive data is stored, security requirements do not have the highest priority. However, it needs to be ensured that the service is not misused and that personal or compromising images are not stored by the model.

4. DATA SCIENCE METHODOLOGY CONCEPT – CRISP-DM

4.1 Why CRISP-DM

Besides the respective FH courses, which already gave us an introduction to the process, our decision to use CRISP-DM also relies on an online source (Datascience-PM; https://www.datascience-pm.com/crisp-dm-2/) which sums up its pros and cons in a simple fashion. Although CRISP-DM was developed specifically for data mining projects, it can be, and has been, applied to all sorts of projects. Bill Vorhies claims that since data science projects “start with business understanding” and involve “data that must be gathered, explored, and prepped in some way”, CRISP-DM is applicable to “even the most advanced of today’s data science activities” (https://www.datasciencecentral.com/profiles/blogs/crisp-dm-a-standard-methodology-to-ensure-a-good-outcome).

Despite some criticism that it is too rigid, it is also possible to employ a more “loose CRISP-DM implementation” (Datascience-PM), which offers flexibility. Looking at the other reasons not to use CRISP-DM (it is documentation-heavy, not modern in the sense that it predates big data, and not suited for the coordination of bigger projects), we argue that a strong emphasis on documentation will have a positive effect on our learning as part of our first software project. The second and third inhibiting factors do not apply, because we are not working with big data per se and we are working as the “small, tight-knit team” (Datascience-PM) which CRISP-DM assumes.

4.2 CRISP-DM in our project

[Figure: the CRISP-DM cycle in our project]

Business Understanding

The first step in our project would be to sit down with the (fictitious) client in the form of a workshop, to try to understand how their business works and what they aim to accomplish: what they expect the project to do for them in the long run, what insights they want to gain, etc. As mentioned above, however, we are not in a situation where the client simply provides a lot of data and needs us to make sense of it. In fact, the expected outcome is quite clear: the classification of different kinds of birds based on images. It is therefore more important for us to understand the most likely use cases, e.g. whether private persons will use the final product to classify birds in their garden (likely cell phone pictures) or whether we are dealing with a scientific purpose, where better image quality is likely.

Data Understanding

It is necessary to establish an idea of what the pictures uploaded to the final product (by the user) will look like. What image quality can be expected? Which formats? Since we are working with images of birds, it may be that users upload cell phone pictures where only a very small portion of the image shows the object of interest. Instead of collecting data ourselves, it should in this case be provided by the client, including expert labels. In our practical realization of the classifier, however, we work with quite unproblematic, high-quality images from a search engine. It may become necessary to go back to the previous step of the CRISP-DM cycle during this stage.

Data Preparation

Data preparation in our project focuses especially on preparing the images for classification by our ML model (preprocessing such as reformatting, cropping, resizing, etc.). One very relevant task will be the cleaning of the data: it is possible that not all data provided by the client (or gathered through an image downloader in the real-world scenario) will be labeled by experts, or that some of it will be labeled incorrectly. The client will once again be included in this process to ensure quality and consolidate expectations.

Modeling

The modeling technique of choice for our classification problem is quite clearly a convolutional neural network, and since training from scratch would take up too many resources, a pre-trained base model is used. A test design should be agreed upon with the client before the training and tweaking begins. Assessment of the model has to be put into relation to the client's expectations: we are looking for a model that is good enough given the resources, not the best possible model.

Evaluation

The evaluation will not only include evaluation metrics of our model and its prediction results, but also needs to include a test ‘in the field’. Does the project deliver the expected result when employed by users? Are the users (and the client) satisfied with both the results AND the usability of the (preliminary) user interface?

Deployment

Deployment in this project will include an approach to monitoring and maintenance, as well as a final report delivered to the customer. It might become necessary to conduct additional model tuning later on and deploy the updated model or model weights, for example when additional birds are added to the classification scope or in response to user feedback.

Dataflow

Schematic of data flow in this project:
Client’s sample images -> expert labelling -> additional data collection/delivery by client -> (more expert labelling) -> data cleaning -> checking with experts, agreeing on dataset for modeling -> preprocessing for use in model -> training of model -> evaluation of model -> exporting model/weights for use in final product

Dataflow of an image uploaded by a user: upload of image -> application of the preprocessing steps chosen by the user -> saving of image specifics in the DB on prediction (including logging)

[Figure: dataflow graphs]

5. TESTING - WHY WE USE UNIT TESTING AND HOW WE WOULD POSSIBLY IMPLEMENT IT

In order to improve the quality of our software, we introduce unit testing and write tests that automatically check whether vital parts of our project are working as intended. Besides testing the initial code base before we deploy the project, we will use testing to ensure that it continues to work after deployment. This is essential as we plan to expand the platform and hope to classify a wide range of birds in the future. If something stops working properly, we want to find out about it as soon as possible; in combination with continuous integration (CI), testing provides the necessary tools to do that.

As we improve the bird classifier's accuracy, we will need to retrain the model using additional training data. Testing can be used to validate that the model receives training data in the expected form. Since our resources are limited (especially GPUs for training our CNN), validating training data with tests could save us a lot of valuable run time down the line.
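
A minimal sketch of such a validation test, assuming pytest as the test runner and the to_model_input helper sketched in section 3.2 (the module name is an assumption):

    # test_training_data.py -- sketch of a training-data validation test.
    import numpy as np
    from PIL import Image

    from image_converter import to_model_input  # module name is an assumption

    def test_model_input_shape_and_dtype():
        img = Image.new("RGB", (640, 480))  # synthetic stand-in training image
        batch = to_model_input(img)
        assert batch.shape == (1, 224, 224, 3)
        assert batch.dtype == np.float32
        assert 0.0 <= batch.min() and batch.max() <= 1.0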

6. DATA SCIENCE ARCHITECTURE & WEB TECHNOLOGIES

6.1 Why is a data science architecture essential for us?

Data scientists tend to use ad hoc tactics to achieve their goals. We have a lot of creative hacking methods in different programming languages and machine learning frameworks, and complete freedom of choice regarding programming languages, tools, and frameworks, which improves creative thinking and development. However, data scientists must shape their assets thoroughly before delivery, because there are many pitfalls if they do not. (https://developer.ibm.com/technologies/artificial-intelligence/articles/architectural-thinking-in-the-wild-west-of-data-science/)

6.2 Which data science architecture is the best choice for us?

Before we can answer this question, we have to explain our project briefly in one sentence: the project aims to build a web page where users can upload their bird pictures to classify them. For this project goal, we have to meet the following two conditions: we need a website where a user can upload his or her photo, and we need a model to classify this picture.

6.3 To achieve these conditions, we decided to use the following tools:

Tensorflow:

TensorFlow is an open-source library created by the Google Brain team for numerical computation and large-scale machine learning. TensorFlow bundles together a slew of machine learning and deep learning (aka neural networking) models and algorithms and makes them valuable. (https://www.infoworld.com/article/3278008/what-is-tensorflow-the-machine-learning-library-explained.html)

For the picture classification, we use VGG16, a very deep convolutional network for large-scale image recognition with 16 weight layers, and add some custom bird classification layers on top of it. In order to deploy the model online for free, we need to use a smaller TensorFlow runtime called TensorFlow Lite.
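
A sketch of this conversion step using TensorFlow's standard converter (the model and file names are assumptions):

    # Sketch: convert the trained Keras model to a smaller TensorFlow Lite model.
    import tensorflow as tf

    model = tf.keras.models.load_model("bird_model.h5")  # path is an assumption
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enables quantization
    tflite_model = converter.convert()

    with open("bird_model.tflite", "wb") as f:
        f.write(tflite_model)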

Streamlit:

Streamlit is an open-source Python library that makes it easy to create and share custom web apps for machine learning and data science.(https://docs.streamlit.io/en/stable/)

With Streamlit, we can create an interactive environment and build a web app with rich content for our customers. We can take values from users and change the result interactively according to their inputs. Installing Streamlit and creating a web app with it is very easy.

Docker Container:

Container images become containers at runtime, and in the case of Docker containers - images become containers when they run on Docker Engine. Available for Linux and Windows-based applications, containerized software will always run the same, regardless of the infrastructure. Thus, containers isolate software from its environment and ensure that it works uniformly despite differences between development and staging. (https://www.docker.com/resources/what-container)

Docker uses a client-server architecture. The Docker client talks to the Docker daemon, which does the heavy lifting of building, running, and distributing our Docker containers. (https://docs.docker.com/get-started/overview/#:~:text=Docker%20uses%20a%20client%2Dserver,to%20a%20remote%20Docker%20daemon.)

Heroku:

Heroku is a cloud platform that lets us build, deliver, monitor and scale our app. Also, Heroku focuses relentlessly on apps and the developer experience around apps. (https://www.heroku.com/what)

Heroku lets us deploy, run and manage applications written in different programming languages. A Heroku application consists of source code, perhaps a framework, and some dependency description that instructs a build system as to which additional dependencies are needed to build and run the application.

When Heroku receives the application source, it initiates a build of the source application. The build mechanism is typically language-specific, but follows the same pattern, typically retrieving the specified dependencies, and creating any necessary assets (whether as simple as processing style sheets or as complex as compiling code). (https://devcenter.heroku.com/articles/how-heroku-works)

This platform was chosen because it is free to use up to a certain amount of space and capacity, and also very self-explanatory.

6.4 Architecture Overview

In order to gain a better understanding of our proposed architecture, an overview with the process steps and the toolkit used is provided:

[Figure: architecture overview]

7. APPROVAL

Once all the requirements are defined, it is important to request approval from all stakeholders for the latest version of the document in order to avoid misunderstandings and incomplete coverage of the user requirements. In case of costs for a non-free deployment, it must be clarified who will be charged for them and on what basis.