Plot everything! In this assignment we will explore the matplotlib library and its features by plotting the results of previous assignments. Please do all of the following:
[...] buying, maint, safety, and doors fields, with one plot for each for a total of four. Make each graph a subplot of a single output.
[...] .csv file [...]
[...] objects.png from homework 8. The image should be in the background, and the object centers can be small circles or points at or around the center points.

As with previous assignments, many of the details of the implementation are up to you. However, keep this in mind: much of the purpose of plotting is to communicate data, and the information therein, effectively and efficiently. Your plots should be easy to interpret and robust enough to express the data used to create them. This means including things like labels, legends, and proper scaling. Also, you don’t need to perform the operations themselves from the previous homework. You can just use a static list of data; for example, the center points in part 3 can be hard-coded in a list or read in from somewhere.
import matplotlib.pyplot as plt
# PLOT CARS DATA
import pandas, numpy
url_cars = "https://raw.githubusercontent.com/jzuniga123/SPS/master/DATA%20602/cars.data.csv"
cars_vars = ['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'class_val']
cars_data = pandas.read_table(url_cars, sep=',', header = None, names = cars_vars)
variable = ['buying', 'maint', 'safety', 'doors']
plt.figure(1)
for i in range(0, len(variable)):
    plt.subplot(221 + i)
    frequency = cars_data.groupby(variable[i])[variable[i]].count()
    features = list(frequency.index)
    frequencies = list(frequency)
    y_pos = numpy.arange(len(features))
    plt.bar(y_pos, frequencies, align='center', alpha=0.5)
    plt.xticks(y_pos, features)
    plt.ylabel('Frequency')
    plt.title(variable[i].title())
plt.savefig(".\DATA_602_HW10_1.png")
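The `groupby(...).count()` call in the loop above amounts to tallying how often each category appears in a column. As a rough sketch of that idea in plain Python, the snippet below counts a handful of made-up `buying` values; the sample list and the `category_counts` helper are assumptions for illustration and are not part of the original homework.

```python
from collections import Counter


def category_counts(values):
    """Return (categories, counts) sorted by category name, like a grouped count."""
    tally = Counter(values)
    categories = sorted(tally)
    counts = [tally[c] for c in categories]
    return categories, counts


# Hypothetical sample, not the actual contents of cars.data.csv.
sample_buying = ["vhigh", "high", "med", "low", "low", "med", "vhigh", "low"]
features, frequencies = category_counts(sample_buying)
```

The resulting `features` and `frequencies` lists play the same role as the ones passed to `plt.xticks` and `plt.bar` in the loop.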
# PLOT REGRESSION
import pandas, numpy
from scipy import stats
url_brainbody = "https://raw.githubusercontent.com/jzuniga123/SPS/master/DATA%20602/brainandbody.csv"
brainandbody = pandas.read_csv(url_brainbody)
br = brainandbody["brain"] # X-data
bo = brainandbody["body"] # Y-data
slope, intercept, r_value, p_value, std_err = stats.linregress(br, bo)
plt.figure(2)
plt.plot(br, bo, '.')
plt.plot(br, slope*br + intercept, 'r-')
plt.title('$bo = %.4f*br %+.4f$' % (slope, intercept))  # %+.4f always shows the intercept's sign
plt.savefig(".\DATA_602_HW10_2.png")
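For readers curious what `stats.linregress` computes for the slope and intercept, here is a minimal sketch of ordinary least squares in plain Python. The `least_squares` helper and the four sample points are assumptions made for illustration (the points are invented and lie exactly on a known line); this is not the brain and body data.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / float(n)
    mean_y = sum(ys) / float(n)
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points that lie exactly on y = 2x + 1, not the brainandbody.csv data.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = least_squares(xs, ys)
# A signed format keeps the intercept readable whether it is positive or negative.
label = "y = %.4f*x %+.4f" % (slope, intercept)
```

A label built this way can be used as a plot title so that the fitted equation is visible on the figure itself.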
# PLOT ON IMAGE
import scipy.ndimage as ndimage
import scipy.misc as misc
import urllib2, cStringIO, numpy
url = "https://raw.githubusercontent.com/jzuniga123/SPS/master/DATA%20602/objects.png"
infile = cStringIO.StringIO(urllib2.urlopen(url).read())
raw = misc.imread(infile)
img = ndimage.gaussian_filter(raw, 2)
thres = img > img.mean()
labels, count = ndimage.label(thres)
index = list(range(1, count + 1))  # ndimage.label numbers objects 1..count; 0 is background
center = ndimage.center_of_mass(img, labels, index)
x = []; y = []
for i in range(0, count):
    x.append(center[i][1])  # column -> x
    y.append(center[i][0])  # row -> y
plt.figure(3)
infile = plt.imread(url)
implot = plt.imshow(infile)
plt.scatter(x, y, c='r', s=10)
plt.savefig(".\DATA_602_HW10_3.png")
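`scipy.ndimage.label` numbers the objects it finds starting at 1 and reserves 0 for the background, and a center of mass is returned in (row, column) order. The sketch below illustrates both points in plain Python on a tiny grid; the `centers_of_mass` helper, the 4x5 image, and its labels are all made up for illustration and stand in for the real `objects.png` processing.

```python
def centers_of_mass(image, labels, count):
    """Intensity-weighted (row, col) center for each label 1..count; 0 is background."""
    centers = []
    for label in range(1, count + 1):
        total = row_sum = col_sum = 0.0
        for r, (img_row, lab_row) in enumerate(zip(image, labels)):
            for c, (value, lab) in enumerate(zip(img_row, lab_row)):
                if lab == label:
                    total += value
                    row_sum += r * value
                    col_sum += c * value
        centers.append((row_sum / total, col_sum / total))
    return centers


# Hypothetical 4x5 image with two already-labelled objects (not objects.png).
image = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 2, 2],
]
labels = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 2, 2],
]
centers = centers_of_mass(image, labels, 2)
# A scatter plot expects x (column) first, then y (row).
x = [c for r, c in centers]
y = [r for r, c in centers]
```

Swapping the order when building `x` and `y` is what makes the points land on the objects when they are drawn over the image.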
# PLOT SERVER REQUESTS
import pandas, datetime
url = "https://raw.githubusercontent.com/jzuniga123/SPS/master/DATA%20602/epa-http.txt"
dirty = pandas.read_table(url, header = None, names = ['raw'])
clean = dirty.replace(r'="\sH', '= H', regex=True)  # drop the stray quote; \s is not valid in a replacement string
data = pandas.DataFrame(clean, columns = ['raw'])
data['date'] = data['raw'].str.extract('(\[\S+\])', expand=True)
data['date'] = pandas.to_datetime(data['date']+"081995", format='[%d:%H:%M:%S]%m%Y')
data['hour'] = pandas.DatetimeIndex(data['date']).hour
frequency = data.groupby('hour')['hour'].count()
hours = list(frequency.index)
frequencies = list(frequency)
plt.figure(4)
plt.plot(hours, frequencies, 'ro-')
plt.ylabel('Frequency')
plt.title("Requests Per Hour")
plt.savefig(".\DATA_602_HW10_4.png")
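The hourly counts above come from pulling the hour out of each `[DD:HH:MM:SS]` stamp and tallying it. As a rough sketch of the same step without pandas, the snippet below uses a regular expression and a counter. The `requests_per_hour` helper and the sample log lines are assumptions for illustration; the lines only imitate the layout of `epa-http.txt` and are not taken from it.

```python
import re
from collections import Counter

# Matches the bracketed [DD:HH:MM:SS] stamp; group 2 is the hour field.
STAMP = re.compile(r"\[(\d{2}):(\d{2}):(\d{2}):(\d{2})\]")


def requests_per_hour(lines):
    """Count log lines per hour of day, skipping lines without a timestamp."""
    counts = Counter()
    for line in lines:
        match = STAMP.search(line)
        if match:
            counts[int(match.group(2))] += 1
    return sorted(counts.items())


# Hypothetical log lines in the same layout as epa-http.txt.
sample = [
    'host1.example.com [29:23:53:25] "GET /index.html HTTP/1.0" 200 1497',
    'host2.example.com [29:23:59:01] "GET /logo.gif HTTP/1.0" 200 4889',
    'host3.example.com [30:00:00:12] "GET /docs/ HTTP/1.0" 304 0',
    'malformed line without a timestamp',
]
hourly = requests_per_hour(sample)
```

Each `(hour, count)` pair in `hourly` corresponds to one point on the "Requests Per Hour" line plot.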
[Figures: Plot Cars Data | Plot Regression | Plot on Image | Plot Server Requests]
Perform a Monte Carlo simulation to calculate Value at Risk (VaR) for the Apple stock price using the file attached to this lesson. There exist a number of ways to do this type of analysis, but you can follow this basic procedure (refer to this file for a more rigorous mathematical overview):
apple.2011.csv has 3 columns: date, price, and percent change. The information you are really interested in is the percent change. This value is the percent change in the price from the previous date to the date on the corresponding row.

The other requirement for this assignment is to use an IPython Notebook. Include in the notebook all the code, the results, and any other information you feel is needed (charts, graphs, plots, etc.). Rather than submitting .py files, give me the .ipynb file for your notebook.
# To create a Jupyter Notebook, start Anaconda,
# open Jupyter Notebook from Anaconda; a browser window will open.
# From the Files tab, select New Python 2 Notebook.
import pandas, numpy, random
url = "https://raw.githubusercontent.com/jzuniga123/SPS/master/DATA%20602/apple.2011.csv"
apple_vars = ['date', 'price', 'change']
apple_data = pandas.read_csv(url, sep=',', header = None, names = apple_vars, skiprows = 1)
change = apple_data['change']
change.pop(0)
change = pandas.to_numeric(change)
mu = change.mean()
sigma = change.std()
print "mu =", mu, "\nsigma =", sigma, "\n\nTRIAL RUN"
# TRIAL RUN
p = [apple_data['price'][0]]
d = [0]
for i in range(20 + 1):
    if i != 0:
        d.append(random.gauss(mu, sigma))
        p.append(p[i - 1] * (1 + d[i]))
    print "Day", i, "Price:", p[i], "Change:", d[i]
# SIMULATION
sim_price = []
for j in range(0, 10000):
    p = [apple_data['price'][0]]
    d = [0]
    for i in range(20 + 1):
        if i != 0:
            d.append(random.gauss(mu, sigma))
            p.append(p[i - 1] * (1 + d[i]))
    sim_price.append(p[len(p) - 1])
VaR = numpy.percentile(sim_price, 1)
print "\nValue at Risk:", VaR
## mu = 0.000957355207171
## sigma = 0.0165205562984
##
## TRIAL RUN
## Day 0 Price: 329.57 Change: 0
## Day 1 Price: 329.681262543 Change: 0.000337599123228
## Day 2 Price: 338.203725213 Change: 0.0258506128124
## Day 3 Price: 333.233213693 Change: -0.0146967970761
## Day 4 Price: 325.743593696 Change: -0.0224756107396
## Day 5 Price: 334.556182337 Change: 0.0270537588812
## Day 6 Price: 337.579207861 Change: 0.00903592784485
## Day 7 Price: 335.522798413 Change: -0.00609163538528
## Day 8 Price: 342.772031742 Change: 0.0216057846557
## Day 9 Price: 332.760843361 Change: -0.029206549701
## Day 10 Price: 336.873303444 Change: 0.012358605783
## Day 11 Price: 339.611199999 Change: 0.00812737764146
## Day 12 Price: 343.981604379 Change: 0.0128688464342
## Day 13 Price: 345.970091141 Change: 0.00578079390481
## Day 14 Price: 346.497524283 Change: 0.001524505024
## Day 15 Price: 359.932835263 Change: 0.0387746233049
## Day 16 Price: 372.470976744 Change: 0.0348346698403
## Day 17 Price: 369.547412869 Change: -0.00784910518565
## Day 18 Price: 373.823718252 Change: 0.0115717367627
## Day 19 Price: 382.933973097 Change: 0.0243704569818
## Day 20 Price: 370.681369365 Change: -0.0319966484893
##
## Value at Risk: 281.113617852
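The run above is not seeded, so its numbers change on every execution. As a minimal sketch of how the same simulation could be made repeatable, the snippet below wraps the steps in functions and draws from a seeded `random.Random` instance. The helper names (`percentile`, `simulate_final_price`, `value_at_risk`), the default of 2000 trials, and the seed are assumptions for illustration; the `percentile` helper uses linear interpolation, which is the default behavior of `numpy.percentile`.

```python
import random


def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between sorted values."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def simulate_final_price(start, mu, sigma, days, rng):
    """Compound `days` Gaussian daily changes onto the starting price."""
    price = start
    for _ in range(days):
        price *= 1 + rng.gauss(mu, sigma)
    return price


def value_at_risk(start, mu, sigma, days=20, trials=2000, q=1, seed=0):
    """q-th percentile of simulated final prices; `seed` makes runs repeatable."""
    rng = random.Random(seed)
    finals = [simulate_final_price(start, mu, sigma, days, rng)
              for _ in range(trials)]
    return percentile(finals, q)


# Starting price, mu and sigma are the values printed in the run above.
var_a = value_at_risk(329.57, 0.000957355207171, 0.0165205562984)
var_b = value_at_risk(329.57, 0.000957355207171, 0.0165205562984)
```

Because both calls use the same seed, `var_a` and `var_b` are identical, which makes it easier to compare results between notebook runs.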
This homework will get your feet wet with some basic parallel computing approaches. Do both of the following:
As with the last homework, you will submit this to me as an IPython notebook. Include the results of your comparison there, along with everything else (code, charts, graphs, etc.)
# Install ipyparallel: $ pip install ipyparallel
# To enable the IPython Clusters tab in Jupyter Notebook: $ ipcluster nbextension enable
# Go to IPython Clusters tab in Jupyter Notebook, select # of engines, click "Start"
# To disable it again: $ ipcluster nbextension disable
# To Begin Parallel Computer Cluster of 4 in Regular Python: $ ipcluster start -n 4
# To End Parallel Computer Cluster in Regular Python: $ ipcluster stop
import timeit, ipyparallel as ipp
def MC_Simulation(input):
    # Imports live inside the function so they are available on a remote engine.
    import pandas, numpy, random
    url = "https://raw.githubusercontent.com/jzuniga123/SPS/master/DATA%20602/apple.2011.csv"
    apple_vars = ['date', 'price', 'change']
    apple_data = pandas.read_csv(url, sep=',', header = None, names = apple_vars, skiprows = 1)
    change = apple_data['change']
    change.pop(0)
    change = pandas.to_numeric(change)
    mu = change.mean()
    sigma = change.std()
    sim_price = []
    for j in range(0, input):
        p = [apple_data['price'][0]]
        d = [0]
        for i in range(20 + 1):
            if i != 0:
                d.append(random.gauss(mu, sigma))
                p.append(p[i - 1] * (1 + d[i]))
        sim_price.append(p[len(p) - 1])
    VaR = numpy.percentile(sim_price, 1)
    return "\nValue at Risk:", VaR
def parrallel_computing(input):
    direct_view = clients[:] # use all engines
    # apply_async on a DirectView runs MC_Simulation(input) once on every engine,
    # so each engine performs all `input` simulations.
    async_result = direct_view.apply_async(MC_Simulation, input)
    return async_result.get()
# PARALLEL COMPUTING
clients = ipp.Client()
clients.block = True
print "Clients:", clients.ids
# COMPARE TIMES
n = 10**2  # number of timing loops (100, as in the reported results)
t = timeit.Timer(lambda: MC_Simulation(10000))
print "Processed Locally:", n, "loops =", t.timeit(n), "seconds"
t = timeit.Timer(lambda: parrallel_computing(10000))
print "Parallel Computing:", n, "loops =", t.timeit(n), "seconds"
clients.shutdown(hub=True)
USING PYTHON LOCALLY
Clients: [0, 1, 2, 3]
Processed Locally: 100 loops = 119.301532646 seconds
Parallel Computing: 100 loops = 483.29072373 seconds
USING JUPYTER NOTEBOOK
Clients: [0, 1, 2, 3]
Processed Locally: 100 loops = 148.152401799 seconds
Parallel Computing: 100 loops = 357.975700053 seconds
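One thing that may help explain these timings: `apply_async` on a `DirectView` runs the same call on every engine, so in the listing above each of the four engines performs all 10,000 simulations instead of a quarter of them. A possible alternative is to split the trial count into per-engine chunks and merge the simulated prices before taking the percentile. The sketch below shows only the splitting and merging in plain Python; `split_trials` and `merge_results` are hypothetical helpers, and wiring them into ipyparallel (for example with a map-style call over the chunk sizes) is left as an untested assumption.

```python
def split_trials(total, workers):
    """Divide `total` trials into `workers` chunk sizes that differ by at most one."""
    base, extra = divmod(total, workers)
    return [base + 1 if i < extra else base for i in range(workers)]


def merge_results(chunks):
    """Flatten per-worker lists of simulated prices into one list."""
    merged = []
    for chunk in chunks:
        merged.extend(chunk)
    return merged


sizes = split_trials(10000, 4)   # one chunk size per engine
uneven = split_trials(10, 4)     # remainder spread over the first chunks
# Stand-in for what each worker would return: a list of simulated final prices.
fake_chunks = [[300.0] * n for n in uneven]
all_prices = merge_results(fake_chunks)
```

The percentile should be taken once over the merged list, since averaging per-worker percentiles does not in general give the percentile of the combined sample.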