Images are uploaded to a GitHub image folder in case they render too small here; attempt to zoom in on the RPubs document if possible.

Setup software on VM

  • Set up a virtual machine on GCP Compute Engine
    • I chose a Windows VM with a desktop
  • Install Google Chrome
  • The first step was to initialize the VM; Google Chrome has an RDP add-on that allows you to access the VM easily
    • I chose not to load the VM with a container.
      • This could cause workflow problems within a team if other members were using Linux or some other version/platform.
    • Issues aside, loading up the VM and installing Anaconda is extremely easy.
      • Anaconda comes with Python 3, pip, Jupyter, and some base packages
      • This allows for pipenv setup as well as testing within Jupyter for quick editing

Accessing data

  • My initial CSV files were loaded into my scripts from GitHub
    • Since this project had us working with the cloud, I decided all files would instead be loaded from Google Cloud Storage buckets
  • The initial setup of the buckets was easy, but sorting out the authorizations was not.
    • That is why I decided to just make my bucket public
      • This can be accomplished through gsutil within the Google Cloud command prompt and through the web browser as in the image below; a Python sketch of the equivalent follows this list
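As a rough sketch (not the exact steps I ran), the bucket can also be made publicly readable from Python with the google-cloud-storage client; the bucket name below is a placeholder, and the comment notes the roughly equivalent gsutil command.

```python
from google.cloud import storage

BUCKET_NAME = "my-project-bucket"  # placeholder; substitute the real bucket name

client = storage.Client()
bucket = client.get_bucket(BUCKET_NAME)

# Grant read access on all objects in the bucket to everyone.
# Roughly equivalent to: gsutil iam ch allUsers:objectViewer gs://my-project-bucket
policy = bucket.get_iam_policy(requested_policy_version=3)
policy.bindings.append({"role": "roles/storage.objectViewer", "members": {"allUsers"}})
bucket.set_iam_policy(policy)
```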

Initial problems

  • Writing to buckets or accessing other Google APIs from Compute Engine requires permissions, which need to be set on the Compute Engine instance when it is created.
    • Luckily, Google has updated Compute Engine to allow you to change these permissions later, but the instance needs to be stopped in order to accomplish this
  • Writing from Python to buckets within the virtual machine required a helper function taken from the Google Cloud documentation, sketched below
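For reference, the upload helper in the Google Cloud documentation looks roughly like this (the exact version may differ):

```python
from google.cloud import storage


def upload_blob(bucket_name, source_file_name, destination_blob_name):
    """Upload a local file to the given Cloud Storage bucket."""
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(source_file_name)
    print(f"File {source_file_name} uploaded to {destination_blob_name}.")
```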

End Initial Setup

  • We now have a virtual machine with Anaconda loaded and access to read and write to Google Cloud Storage buckets

Create testing environment

  • Through experimentation within Jupyter, I was able to edit my original scripts and begin testing in a clean environment.
  • This highlights a problem with the data flow process.
    • Building a requirements.txt file is more of an iterative process.
    • I wanted to make my project flow purely Pythonic
      • My requirements.txt file needs to exist outside of my Python code.
        • Perhaps this is semantics, as several other aspects of the project were done through the web browser, but it stuck with me that I couldn't build a requirements file from within an empty Python environment
        • Therefore, I chose to load a preconstructed requirements.txt file into my virtual environment, which I document below

Workflow

  • All steps below occur in the new virtual environment on the Compute Engine VM

Step 1 - create pipenv

  • Easy with Anaconda

Step 2 - load requirements

  • Load in the requirements.txt file
  • Below I show the packages in the clean environment; a pythonic sketch of the install step follows
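A minimal, pythonic sketch of this step, assuming requirements.txt sits in the current working directory:

```python
import subprocess
import sys

# Install the preconstructed requirements.txt into the active environment,
# then list what is now installed in the previously clean environment.
subprocess.check_call([sys.executable, "-m", "pip", "install", "-r", "requirements.txt"])
subprocess.check_call([sys.executable, "-m", "pip", "list"])
```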

Step 3 - load CSV files from cloud bucket

  • Creates local DataFrames of the train/test CSVs from the original project; a sketch of this step follows
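A sketch of this step; the bucket and object names (train.csv, test.csv) are placeholders, and it assumes pandas and google-cloud-storage are installed:

```python
import pandas as pd
from google.cloud import storage

BUCKET_NAME = "my-project-bucket"  # placeholder


def read_csv_from_bucket(blob_name):
    """Download a CSV object from the bucket and return it as a DataFrame."""
    client = storage.Client()
    blob = client.bucket(BUCKET_NAME).blob(blob_name)
    blob.download_to_filename(blob_name)  # keep a local copy in the cwd
    return pd.read_csv(blob_name)


train_df = read_csv_from_bucket("train.csv")  # placeholder object names
test_df = read_csv_from_bucket("test.csv")
```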

Step 4 - begin loading scripts

  • Train_model.py does the following:
    • Writes a classification report CSV and the fitted pipeline locally to the cwd
    • Writes the same classification report CSV and pipeline to the Google Cloud Storage bucket
  • Images of the new bucket are displayed below; a sketch of these write steps follows this list
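A rough sketch of the output side of Train_model.py; the model, the target column name, and the artifact names are placeholders standing in for the original project's code, and it reuses the DataFrames and upload helper sketched earlier:

```python
import pickle

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder pipeline and target column; the real script builds its own
# pipeline and uses the train/test DataFrames loaded in Step 3.
pipeline = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])
X_train, y_train = train_df.drop(columns="target"), train_df["target"]
X_test, y_test = test_df.drop(columns="target"), test_df["target"]
pipeline.fit(X_train, y_train)

# Write the classification report CSV and the pickled pipeline locally to the cwd ...
report = classification_report(y_test, pipeline.predict(X_test), output_dict=True)
pd.DataFrame(report).transpose().to_csv("classification_report.csv")
with open("pipeline.pkl", "wb") as f:
    pickle.dump(pipeline, f)

# ... then push both artifacts to the Google Cloud Storage bucket.
upload_blob(BUCKET_NAME, "classification_report.csv", "classification_report.csv")
upload_blob(BUCKET_NAME, "pipeline.pkl", "pipeline.pkl")
```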

Step 5 - update old score script

  • Load the pipeline from the cloud bucket
    • New, updated code posted below
  • Incorporate the above code into score.py
  • Incorporate a write-to-cloud step for the results; a sketch of the full scoring flow follows this list
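A hedged sketch of the updated scoring flow, reusing the placeholder bucket, pipeline.pkl name, DataFrames, and upload helper from the earlier sketches; the real score.py differs in its details:

```python
import pickle

import pandas as pd
from google.cloud import storage

BUCKET_NAME = "my-project-bucket"  # placeholder, as before

# Pull the fitted pipeline down from the bucket and unpickle it.
client = storage.Client()
client.bucket(BUCKET_NAME).blob("pipeline.pkl").download_to_filename("pipeline.pkl")
with open("pipeline.pkl", "rb") as f:
    pipeline = pickle.load(f)

# Score new data (here the test DataFrame from Step 3 stands in),
# write the results locally, then push them back to the bucket.
scores = pd.DataFrame({"prediction": pipeline.predict(test_df.drop(columns="target"))})
scores.to_csv("score_results.csv", index=False)
upload_blob(BUCKET_NAME, "score_results.csv", "score_results.csv")
```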