Assignment 3

Write a software program that does all of the following:

  1. Loads in the data from cars.data.csv. The data can be stored any way you choose, in any data structure you choose (probably a list of some kind). The data should load on startup by referencing a file path or, even better, a file picker dialog box.

  2. In the main portion of your program, run the following operations and print each result to the console (except number 4). How you achieve this is up to you. However, the operations need to be performed on the data itself (don’t hard-code the solution).
    • Print to the console the top 10 rows of the data sorted by safety in descending order
    • Print to the console the bottom 15 rows of the data sorted by maint in ascending order
    • Print to the console all rows that are high or vhigh in fields buying, maint, and safety, sorted by doors in ascending order. Find these matches using regular expressions.
    • Save to a file all rows (in any order) that are: buying: vhigh, maint: med, doors: 4, and persons: 4 or more. The file path can be a hardcoded location (name it output.txt) or use a dialog box.
  3. Your code needs to be able to handle exceptions. It should handle all data as specified by the data definition document from Lesson 2, and throw some kind of error when it encounters data that doesn’t match that format. To test this, I will add the line vlow, vlow, 1, 1, vbig, vhigh to the .csv file. Your program should gracefully handle this line in all cases from the previous part.
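
A minimal sketch of how the validation in item 3 could work. It assumes seven columns with the allowed values used by the patterns in the solution code; the data definition document itself is not reproduced here. The names `ALLOWED`, `InvalidRowError`, `validate_row`, and `load_rows` are hypothetical, the first sample row is made up for illustration, and the sketch targets Python 3 rather than the Python 2 used in the solution.

```python
import csv
import io
import re

# Assumed allowed values per column, in file order (hypothetical names).
ALLOWED = [
    ("buying", r"vhigh|high|med|low"),
    ("maint", r"vhigh|high|med|low"),
    ("doors", r"2|3|4|5more"),
    ("persons", r"2|4|more"),
    ("lug_boot", r"small|med|big"),
    ("safety", r"low|med|high"),
    ("class_val", r"unacc|acc|good|vgood"),
]


class InvalidRowError(ValueError):
    """Raised when a row does not match the assumed data definition."""


def validate_row(row):
    """Return the row with whitespace trimmed, or raise InvalidRowError."""
    fields = [value.strip() for value in row]
    if len(fields) != len(ALLOWED):
        raise InvalidRowError(
            "expected %d fields, got %d" % (len(ALLOWED), len(fields)))
    for (name, pattern), value in zip(ALLOWED, fields):
        # fullmatch anchors the pattern at both ends, so "vlow" is rejected
        if re.fullmatch(pattern, value) is None:
            raise InvalidRowError("bad %s value: %r" % (name, value))
    return fields


def load_rows(lines):
    """Split rows into (valid, rejected) instead of stopping at the first error."""
    valid, rejected = [], []
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        try:
            valid.append(validate_row(row))
        except InvalidRowError as error:
            rejected.append((row, str(error)))
    return valid, rejected


sample = io.StringIO(
    "vhigh,med,4,more,big,high,acc\n"   # made-up valid row
    "vlow, vlow, 1, 1, vbig, vhigh\n"   # the test line from the assignment
)
valid_rows, rejected_rows = load_rows(sample)
print(len(valid_rows), "valid;", len(rejected_rows), "rejected")
```

Raising a dedicated exception per row and catching it in the loader keeps the "throw some kind of error" requirement while still letting every later operation run on the rows that passed.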

import Tkinter, tkFileDialog, csv, re

root = Tkinter.Tk()
root.withdraw()  # hide the empty Tk main window; only the dialogs are shown
filePath = tkFileDialog.askopenfilename(parent=root)

# One anchored pattern per column. Anchoring with ^...$ rejects values such
# as "vlow" or "vbig" outright instead of relying on a prefix match.
PATTERNS = [re.compile(p) for p in (
    r"^(vhigh|high|med|low)$",      # buying
    r"^(vhigh|high|med|low)$",      # maint
    r"^(2|3|4|5more)$",             # doors
    r"^(2|4|more)$",                # persons
    r"^(small|med|big)$",           # lug_boot
    r"^(low|med|high)$",            # safety
    r"^(unacc|acc|good|vgood)$")]   # class_val

def is_valid(row):
    # A valid row has exactly seven fields, each matching its column pattern
    return (len(row) == len(PATTERNS)
            and all(p.match(v) for p, v in zip(PATTERNS, row)))

def load(path):
    # Read every non-blank row, trimming whitespace around each field
    with open(path) as infile:
        rows = [[v.strip() for v in row] for row in csv.reader(infile) if row]
    valid = [row for row in rows if is_valid(row)]
    removed = len(rows) - len(valid)
    if removed:
        print "Encountered and removed", removed, "row(s) of invalid data."
    return valid

def program(case):
    cars_data = load(filePath)
    if case == 1:
        # Values are compared as strings, so descending order is med, low, high
        return sorted(cars_data, key=lambda col: col[5], reverse=True)[:10]
    elif case == 2:
        # [-15:] keeps all 15 trailing rows; [-15:-1] would drop the last one
        return sorted(cars_data, key=lambda col: col[1])[-15:]
    elif case == 3:
        # All three fields must match, hence "and" rather than "or"
        data = [col for col in cars_data
                if re.match(r"^v?high$", col[0])
                and re.match(r"^v?high$", col[1])
                and re.match(r"^v?high$", col[5])]
        return sorted(data, key=lambda col: col[2])
    else:
        # Every condition must hold; persons may be "4" or "more"
        data = [col for col in cars_data
                if col[0] == "vhigh" and col[1] == "med"
                and col[2] == "4" and col[3] in ("4", "more")]
        outfile = tkFileDialog.asksaveasfile(mode='wb', defaultextension=".txt")
        if outfile is None:  # the user cancelled the dialog
            return "Save cancelled."
        try:
            csv.writer(outfile).writerows(data)
        finally:
            outfile.close()  # close the file even if writing fails
        return "Saved:" + outfile.name

if __name__ == "__main__":
    print program(1)
    print program(2)
#   print program(3) # Works. Outputs pages of data.
    print program(4)
## [['vhigh', 'vhigh', '2', '2', 'small', 'med', 'unacc'], ['vhigh', 'vhigh', '2', '2', 'med', 'med', 'unacc'], ['vhigh', 'vhigh', '2', '2', 'big', 'med', 'unacc'], ['vhigh', 'vhigh', '2', '4', 'small', 'med', 'unacc'], ['vhigh', 'vhigh', '2', '4', 'med', 'med', 'unacc'], ['vhigh', 'vhigh', '2', '4', 'big', 'med', 'unacc'], ['vhigh', 'vhigh', '2', 'more', 'small', 'med', 'unacc'], ['vhigh', 'vhigh', '2', 'more', 'med', 'med', 'unacc'], ['vhigh', 'vhigh', '2', 'more', 'big', 'med', 'unacc'], ['vhigh', 'vhigh', '3', '2', 'small', 'med', 'unacc']]
## [['low', 'vhigh', '4', '4', 'small', 'med', 'unacc'], ['low', 'vhigh', '4', '4', 'med', 'med', 'acc'], ['low', 'vhigh', '4', '4', 'big', 'med', 'acc'], ['low', 'vhigh', '4', 'more', 'small', 'med', 'unacc'], ['low', 'vhigh', '4', 'more', 'med', 'med', 'acc'], ['low', 'vhigh', '4', 'more', 'big', 'med', 'acc'], ['low', 'vhigh', '5more', '2', 'small', 'med', 'unacc'], ['low', 'vhigh', '5more', '2', 'med', 'med', 'unacc'], ['low', 'vhigh', '5more', '2', 'big', 'med', 'unacc'], ['low', 'vhigh', '5more', '4', 'small', 'med', 'unacc'], ['low', 'vhigh', '5more', '4', 'med', 'med', 'acc'], ['low', 'vhigh', '5more', '4', 'big', 'med', 'acc'], ['low', 'vhigh', '5more', 'more', 'small', 'med', 'unacc'], ['low', 'vhigh', '5more', 'more', 'med', 'med', 'acc']]
## Saved:C:/Users/josez/Desktop/output.txt
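
The sorts above compare field values as plain strings, which is why the first result lists safety `med` at the top: in descending alphabetical order the levels come out as med, low, high. If the assignment intends the ordinal order instead (high above med above low), one option is to sort on a rank lookup. A minimal sketch under that assumption, in Python 3; `RANK` and `sort_by_rank` are hypothetical names, the first row is taken from the output above, and the other two are made-up variants of it.

```python
# Assumed ordinal ranking of the level names (hypothetical helper names).
RANK = {"low": 0, "med": 1, "high": 2, "vhigh": 3}


def sort_by_rank(rows, column, descending=False):
    """Sort rows by the ordinal rank of the value in the given column."""
    return sorted(rows, key=lambda row: RANK[row[column]], reverse=descending)


rows = [
    ["vhigh", "vhigh", "2", "2", "small", "med", "unacc"],
    ["vhigh", "vhigh", "2", "2", "small", "low", "unacc"],
    ["vhigh", "vhigh", "2", "2", "small", "high", "unacc"],
]

by_string = sorted(rows, key=lambda row: row[5], reverse=True)
by_rank = sort_by_rank(rows, column=5, descending=True)

print([row[5] for row in by_string])  # ['med', 'low', 'high']
print([row[5] for row in by_rank])    # ['high', 'med', 'low']
```

The same lookup would apply to buying and maint, where `vhigh` otherwise sorts after `med` only by alphabetical coincidence.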

Assignment 4

This assignment will give you a chance to do a few important analytical tasks. There are two parts below; do them both.

  1. Design and implement a system that takes a webpage URL as input. The program will read the page and extract the important text (news story, blog post, etc.) from the page’s source. Writing a program that can do this for any webpage is a major undertaking, so we will just focus on a single page. You can hardcode the link into the program. Take this text and store it in the program to use in the next step.

  2. Take the important text that you extracted from the page and submit it to the Alchemy API for analysis. Specifically, obtain the Ranked Keywords. Once you have the keywords, print to the console the top ten results. Below are the detailed steps:
    • Get an API key from IBM Bluemix.
    • Download the Python SDK from the site.
    • Look at the example provided in the SDK.
    • Import the Alchemy module into your code.
    • Call the function to get Ranked Keywords.
    • The result will be in JSON. Process that JSON and get the top ten keywords, and their relevance.
    • Print those results to the console.
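
Step 1 can be tried without a network call. The following is a minimal sketch that uses the standard library `html.parser` in place of a third-party parser, purely to stay self-contained. It assumes the important text sits inside a `<main>` element; the class `MainTextExtractor`, the function `extract_main_text`, and the sample HTML are hypothetical, and the code targets Python 3.

```python
from html.parser import HTMLParser


class MainTextExtractor(HTMLParser):
    """Collect the text found inside a <main> element."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # greater than 0 while inside <main>
        self.skipping = 0   # greater than 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "main":
            self.depth += 1
        elif tag in ("script", "style"):
            self.skipping += 1

    def handle_endtag(self, tag):
        if tag == "main" and self.depth:
            self.depth -= 1
        elif tag in ("script", "style") and self.skipping:
            self.skipping -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is inside <main> and not script/style
        if self.depth and not self.skipping and data.strip():
            self.chunks.append(data.strip())


def extract_main_text(html):
    """Return the visible text of the <main> element, one chunk per line."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


sample_html = """
<html><body>
<nav>Home | About</nav>
<main><h1>Windows 95 in your browser</h1>
<p>Fancy a blast from the past?</p>
<script>var ignored = true;</script></main>
<footer>Contact</footer>
</body></html>
"""
text = extract_main_text(sample_html)
print(text)
```

Navigation, footer, and script content are dropped, which is the main reason to target one container element instead of taking the text of the whole page.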

import bs4, urllib2, json
from watson_developer_cloud import AlchemyLanguageV1

# BEAUTIFUL SOUP
url = "https://win95.ajf.me/"
page = urllib2.urlopen(url)
soup = bs4.BeautifulSoup(page.read(), 'html.parser')
main = soup.find("main").get_text()
print(main.encode('utf-8'))

# IBM BLUEMIX ALCHEMY API
alchemy_language = AlchemyLanguageV1(
    api_key = '61179a3cb49f6f43097388765c07acde60c635bd')
response = json.dumps(
    alchemy_language.combined(
        text = main,
        extract = 'keywords',
        sentiment = 0,
        max_items = 10),
    indent = 2)

# JSON
keywords = json.loads(response)['keywords']
print("Relevance  Keywords\n=========  ========")
# Slicing avoids an IndexError when fewer than ten keywords are returned
for keyword in keywords[:10]:
    print keyword['relevance'], " ", keyword['text']
## 
## Happy Birthday Windows 95!20 years of Windows 95!August 24, 1995–August 24, 2015
## 
## Windows 95 in your browser
## Fancy a blast from the past? Curious about the Microsoft Windows release that introduced such familiar concepts as the Desktop, Start Menu, Taskbar and Notifications Area? Well, now thanks to the wonders of Emscripten and DOSBox, and modern JavaScript runtimes, you can try it out in your browser.
## This is strictly for educational purposes. Windows 95 is a copyrighted piece of software, and Microsoft (and others) have not had their rights expire yet, in fact they probably never will. While Microsoft no longer sell Windows 95 as a retail product, nor do they still sell licenses or support for it (which ended on December 31, 2001), it is still very much protected by copyright law, and you may be infringing it. I would argue that this might (and that is quite tenuous) be protected under United States fair use and United Kingdom fair dealing provisions, given that this is a non-commercial use for the purposes of research and education, which should have no effect on the potential market value of Windows 95. However, I am not a lawyer, and even if I was one, there's no guarantee Microsoft's (or the other copyright holders') would agree with me here. If I am sent a Cease and Desist letter, I shall take this down as soon as possible.
## Also, I do not own any of the trademarks used herein, and this website has no official connection to or endorsement from Microsoft, or any other trademark holder.
## With all that being said: yes, you are proceeding at your own risk by using this, and I am proceeding at my own risk by choosing to host this. If you're willing to do that, then click below:
## Start Windows 95
## If you have a slow connection, the download might take a long time. The disk image is 47MB gzipped (131MB uncompressed), so you'll need to be patient.
## You can fiddle around and have a bit of nostalgia (or, if you are one of the newer generations, a learning experience), but anything you do won't be saved, it's entirely ephemeral. This is because the disk image resides in a temporary filesystem (i.e. your device's RAM) and will be lost once you leave the page.
## FAQ
## Why is it so slow?
## A factor of a few things:
##         
## Windows 95 isn't being run directly, rather it is running on an emulated CPU
## DOSBox isn't really optimised for Windows 95, it's really for DOS games and, at a stretch, possibly Windows 3.x
## Because DOSBox isn't optimised for Windows 95, it doesn't have native disk drivers, and instead Windows 95 has to go via DOS ("real-mode disk access") to read the hard disk
## DOSBox isn't running natively on your machine, it's been compiled to JavaScript using Emscripten - even with asm.js support, this won't be quite as fast as DOSBox natively
## Em-DOSBox uses Emscripten's "emterpreter" rather than compiling directly to asm.js, because it needs to be able to pause and resume execution, and the emterpreter interpreting bytecode has worse performance than normal asm.js output - unfortunately, turning off emterpreter breaks everything
## Moore's Law is ending
## 
## 
## Why do I keep getting "Emulation aborted due to nested emulation timeout."?
## In some cases, Em-DOSBox will abort when the emulator is taking too long, to avoid freezing the browser. There's not much that can be done about this, unfortunately. Even with a higher timeout, you still see this message and have it abort on you when trying to do certain things (open Internet Explorer, for instance). That being said, I am looking into this.
## If you're never able to get through startup, the one piece of advice I can give here is to use Firefox. It seems to work better here, presumably because of its asm.js support.
## Can I load or save my own files or software?
## The only thing the emulator has access to is the disk image of Windows 95 temporarily stored in memory. Changes to that image aren't saved anywhere, so anything you do in Windows 95 – changing settings, writing poetry in Notepad, defragmenting the hard drive, deleting everything – will be lost once the emulator is stopped. As configured right now, there's no way to attach other drives to the emulator yourself, and there's no networking, so you have no way to get files in or out.
## If you need to run old software that only works under Windows 95, this site can't help you. I suggest installing Windows 95 under DOSBox, or another piece of emulation or virtualisation software (such as VMWare or VirtualBox), on your computer. Alternatively, obtain an old computer which runs Windows 95.
## How was this done?
## I installed Windows 95 in DOSBox using this guide from a virtualised CD, then packaged up the disk image, along with an AUTOEXEC.BAT file and a custom dosbox.conf using Em-DOSBox. Really, all the hard work was done by the Emscripten, DOSBox and Em-DOSBox people. And, of course, the browser vendors and other people who have worked tirelessly to make the modern web platform what it is today. In the process of making this, I never once had to touch the DOSBox source code!
## What version of Windows 95 is it?
## Aha, someone's aware that Windows 95 didn't just have a single version! In this case, it's Windows 95 OSR2. That version had FAT32 and Internet Explorer 3.0, but didn't support the Pentium properly and lacked USB support. It's a CD-ROM install. Well... I think it's OSR2: the install disc has a 1996 timestamp and it has IE3 (like OSR2), yet it reports itself as "4.00.950 C" in System Properties, and the CD-ROM label was WIN_95C... like OSR 2.5. Hmm. Something's weird about that install disc. Or Wikipedia is lying to me.
## Why did you make this?
## Nostalgia! I was watching Politics Unboringed. In it, Mr. Foreman shows us the Internet websites of the big three political parties in Britain... in 1996. On Windows 3.1. - and CANYON.MID played in the background. That tune... it's magical to me, so I started listening to it on YouTube. And that gave me even more nostalgia, and I really wanted to recreate the experience of using Windows 95 from my childhood.
## Now, an astute observer might notice that I'm only 19 at the time of writing. When I was 10, it would have been 2006, eleven years after Windows 95's release, and that is true. But I grew up with ancient computers. My Dad stubbornly refused to upgrade to Windows XP for quite a long time, so the family computer kept running Windows 98SE. As for myself, my first computer was a thrown-out business machine by Dan (a British computer make that went bust before I got that machine, unbeknownst to me). It had 16MB of RAM, a 486DX2 66MHz processor, and had Windows 3.1 and MS-DOS 5 loaded on it. It had no soundcard, but a SoundBlaster 16 was bought off eBay. I loved that thing. After a while messing about on Windows 3.1, I upgraded it to Windows 95 with a disc a friend gave me... I'm fairly certain it wasn't obtained legitimately (no, this isn't the disc this site used), and I think it had OSR2 on it. Starting 95 up for the first time (after almost crying after it failed to correctly write the boot record, which I didn't initially realise how to fix), was pretty magical. And CANYON.MID, rendered both by 3.1 and 95, is pretty magical to me.
## As for why Windows 95 over 3.1? Well, if you want to relive 3.1, there's already michaelv.org. Also, honestly, Windows 95 is more fun. It feels, in a way Windows 3.1 never did and never will, like a "real" operating system. I can't really explain it.
## Oh, and the other reason I did it: because it's fun, duh.
## Who are you?
## My name is Andrea, and I have a website here, if you want to know more about me. If you want to contact me, why not email me?
## 
## 
## 
##         Windows 95 Copyright © 1981–1996 Microsoft Corp.
##         This uses Em-DOSBox, an Emscripten port of DOSBox, Copyright © its respective authors, and licensed under the GNU GPLv2.
##         Background to this page taken from here.
##     
## 
## Relevance  Keywords
## =========  ========
## 0.924588   Windows
## 0.738673   disk image
## 0.700018   DOSBox
## 0.622711   DOSBox source code
## 0.616678   Microsoft Windows release
## 0.606061   Happy Birthday Windows
## 0.536944   modern JavaScript runtimes
## 0.536238   potential market value
## 0.526867   real-mode disk access
## 0.526308   native disk drivers
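
The JSON step can also be tried offline. A minimal sketch, assuming the response has the shape used in the code above (a `keywords` list whose items carry `text` and `relevance`) and that relevance may arrive as a string; the function name `top_keywords` is hypothetical, the three sample entries reuse values from the output above, and the code targets Python 3.

```python
import json

# Sample payload shaped like the response handled above. The entries reuse
# values from the recorded output; storing relevance as a string is an assumption.
payload = json.dumps({
    "keywords": [
        {"text": "DOSBox", "relevance": "0.700018"},
        {"text": "Windows", "relevance": "0.924588"},
        {"text": "disk image", "relevance": "0.738673"},
    ]
})


def top_keywords(raw_json, limit=10):
    """Return up to `limit` (relevance, text) pairs, most relevant first."""
    keywords = json.loads(raw_json).get("keywords", [])
    pairs = [(float(item["relevance"]), item["text"]) for item in keywords]
    return sorted(pairs, reverse=True)[:limit]


print("Relevance  Keywords")
print("=========  ========")
for relevance, text in top_keywords(payload):
    print("%-9.6f  %s" % (relevance, text))
```

Sorting explicitly means the result does not depend on the service returning keywords already ranked, and the slice copes with responses shorter than ten items.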

Assignment 5

This is a small exploration into Data Mining.

  1. Download the new data set on the Lesson 5 page called brainandbody.csv. This file is a small set of average brain weights and average body weights for a number of animals. We want to see if a relationship exists between the two. (This is the data set acquired above.)

  2. Perform a linear regression using the least squares method on the relationship of brain weight br to body weight bo. Do this using just the built-in Python functions (this is really easy using scipy, but we’re not there yet). We are looking for a model in the form \(bo = X \cdot br + Y\). Find values for \(X\) and \(Y\) and print out the entire model to the console.
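
For reference, the least squares estimates for a model of this form have a closed form. Writing \(n\) for the number of animals and \(\overline{br}\), \(\overline{bo}\) for the sample means (notation introduced here, not part of the assignment text), define

\[
S_{xx} = \sum_{i=1}^{n} br_i^2 - n\,\overline{br}^{\,2}, \qquad
S_{xy} = \sum_{i=1}^{n} br_i\, bo_i - n\,\overline{br}\,\overline{bo}.
\]

The slope and intercept are then

\[
X = \frac{S_{xy}}{S_{xx}}, \qquad Y = \overline{bo} - X\,\overline{br}.
\]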

# IMPORT CSV FILE
import Tkinter, tkFileDialog, csv
root = Tkinter.Tk()
root.withdraw()
filePath = tkFileDialog.askopenfilename(parent = root)
infileName = open(filePath)
brainandbody = csv.reader(infileName)

# REGRESSION
next(brainandbody) # Skips first row of CSV file
br = [] # X-variable (predictor): brain weight
bo = [] # Y-variable (response): body weight
n = 0

for col in brainandbody:
    n += 1
    bo.append(float(col[1]))
    br.append(float(col[2]))

infileName.close()

mu_br =  sum(br) / n
mu_bo =  sum(bo) / n
sum_XX = sum([i ** 2 for i in br])
sum_YY = sum([i ** 2 for i in bo])
sum_XY = sum([a * b for a, b in zip(br, bo)])
ss_x = sum_XX - n * mu_br ** 2
ss_y = sum_YY - n * mu_bo ** 2
ss_xy = sum_XY - n * mu_br * mu_bo
beta_1 = ss_xy / ss_x
beta_0 = mu_bo - beta_1 * mu_br

print "Model: bo = ", beta_1, "* br ", "+" if beta_0 > 0 else "-", abs(beta_0)
## Model: bo =  0.902912947729 * br  - 56.8555454286
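
A quick way to check a least squares routine is to feed it points that lie exactly on a known line. The sketch below is one possible reusable version in Python 3; it uses the centered sums of squares, which are algebraically the same as the shortcut sums above and tend to lose less precision with large values. The function name `least_squares` is hypothetical and the four points are made up, chosen to lie on bo = 2 * br + 1.

```python
def least_squares(xs, ys):
    """Fit ys = slope * xs + intercept by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centered sums: ss_xx = sum((x - mean_x)^2), ss_xy = sum((x - mean_x)(y - mean_y))
    ss_xx = sum((x - mean_x) ** 2 for x in xs)
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if ss_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = ss_xy / ss_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points lying exactly on bo = 2 * br + 1, used only as a sanity check.
br = [1.0, 2.0, 3.0, 4.0]
bo = [3.0, 5.0, 7.0, 9.0]
slope, intercept = least_squares(br, bo)
print("Model: bo = %.3f * br %s %.3f"
      % (slope, "+" if intercept >= 0 else "-", abs(intercept)))
```

If the routine recovers a slope of 2 and an intercept of 1 here, the same function can be applied to the two columns read from brainandbody.csv.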
