Final Presentation

Brett Burk
December 10th, 2014

Introduction

  • 1usagov
  • Bit.ly
  • 12/10/2012-12/11/2012

Loading in

  • Downloading data
  • Python Munging
  • Combining datasets
  • Processing JSON
  • Exporting to CSV
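
The munging steps above can be sketched roughly as follows. This is a minimal sketch, not the script used for the talk: it assumes one JSON object per line, and it assumes the feed's short field names (`cy` for city, `c` for country code, `u` for the long URL, `t` for a Unix timestamp) and that heartbeat lines carry no `t` field. The helper names `record_to_row` and `json_lines_to_csv` are hypothetical.

```python
import csv
import io
import json
from datetime import datetime, timezone
from urllib.parse import urlparse


def record_to_row(record):
    """Map one decoded record to (city, country, totsite, mainsite, date, time)."""
    url = record.get("u", "")  # assumed field name for the long URL
    stamp = datetime.fromtimestamp(record["t"], tz=timezone.utc)
    return [
        record.get("cy", ""),          # assumed field name for the city
        record.get("c", ""),           # assumed field name for the country code
        url,                           # full link ("totsite")
        urlparse(url).netloc,          # host only ("mainsite")
        stamp.strftime("%Y-%m-%d"),
        stamp.strftime("%H:%M:%S"),
    ]


def json_lines_to_csv(lines, out):
    """Write one CSV row per JSON line; return the number of rows written."""
    writer = csv.writer(out)
    written = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if "t" not in record:  # assumption: heartbeat lines have no timestamp
            continue
        writer.writerow(record_to_row(record))
        written += 1
    return written


# Two made-up input lines: one click record and one heartbeat.
sample = [
    '{"cy": "Guatemala City", "c": "GT", "u": "https://petitions.whitehouse.gov/petition/example", "t": 1355097600}',
    '{"_heartbeat_": 1355097601}',
]
buffer = io.StringIO()
count = json_lines_to_csv(sample, buffer)
print(count)
print(buffer.getvalue())
```

The output has no header row, which matches reading it in R with the header turned off and naming the columns afterwards.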

Exploration in R

library(ggplot2)
bitly <- read.csv('csvconv.csv', header=FALSE, stringsAsFactors=FALSE)
names(bitly) <- c('city', 'country', 'totsite', 'mainsite', 'date', 'time')
nrow(bitly)
[1] 159328

Main Sites

[Plot: main sites]

Petitions

[Plot: petitions]

Top Five Petitions

[1] "grant temporary protected status guatemalans"                                        
[2] "remove tax exempt status enjoyed sea shepherd conservation society"                  
[3] "secure resources and funding and begin construction death star 2016"                 
[4] "remove 10 year ban placed christopher nicholas nick bertke prevents him entering usa"
[5] "open honest dialog cuba government secure release american operative alan gross"     

NASA

[Plot: NASA]

Time for Sunday

[1] "Mean:  20.9681572594551"
[1] "Standard Deviation:  1.35756143388699"

Time for Monday

[1] "Mean:  13.5097892590075"
[1] "Standard Deviation:  5.3425945169581"
  0%  25%  50%  75% 100% 
   0   10   13   17   23 

Foreigners

countries <- table(bitly$country)
countriesnous <- countries[countries != max(countries)]
sum(countriesnous) / (max(countries) + sum(countriesnous))
[1] 0.3346639
  • About 33% of the links came from outside the US (by country code, which reflects location rather than citizenship)

Guatemala!

countriesnous[countriesnous == max(countriesnous)]
  GT 
6551 

The most frequent non-US country code is GT (Guatemala), which is unsurprising given the number of links to the Guatemalan temporary protected status petition.

Live Data

More Processing

  • Process in similar way with Python
  • Dealing with JSON and struggling with the text file (partial outputs, etc.)
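
One way to cope with partial outputs is to parse the captured text line by line and skip anything that does not decode as a complete JSON object. The sketch below assumes one record per line; the function name `parse_complete_records` and the sample records are hypothetical, and this is not necessarily how the live file was actually handled.

```python
import json


def parse_complete_records(text):
    """Return (records, skipped): decoded JSON objects and the count of unusable lines."""
    records = []
    skipped = 0
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            value = json.loads(line)
        except json.JSONDecodeError:
            skipped += 1  # truncated or corrupted line
            continue
        if isinstance(value, dict):
            records.append(value)
        else:
            skipped += 1  # a fragment that happens to parse, e.g. a bare number
    return records, skipped


# Made-up capture in which the second line was cut off mid-write.
captured = (
    '{"c": "US", "t": 1418200000}\n'
    '{"c": "GT", "t": 14182\n'
    '{"c": "CA", "t": 1418200002}\n'
)
records, skipped = parse_complete_records(captured)
print(len(records), skipped)
```

Counting the skipped lines rather than silently dropping them makes it easy to check how much of the live capture was lost.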

Loading into R

bitlycurr <- read.csv('livejsonn.csv', header=FALSE, stringsAsFactors=FALSE)
names(bitlycurr) <- c('city', 'country', 'totsite', 'mainsite', 'date', 'time')
nrow(bitlycurr)
[1] 14848

First Graph of New Data

[Plot: main sites, live data]

  • Healthcare.gov is the second most popular current site
  • NASA still popular

NASA

[Plot: NASA, live data]

  • Space Shuttle

Petitions

[Plot: petitions, live data]

Conclusions

  • Interesting data set
  • Live interactivity?
  • Challenges with parsing JSON and text files in Python
  • Challenges with live data set
  • Creating this slideshow