Synopsis

For Assignment 2, five data-importing exercises were completed following the instructions in the Week 2 assignment section of Data Wrangling with R. The exercises import the following data sets:

  1. Reddit User Data
  2. HUD 2017 Fair Market Rents Data
  3. Cincinnati, OH Average Temperature Data (the “OHCINCIN.txt” file listed under Ohio)

Packages

The following packages were used in completing the assignment:

library(readxl) # imports the downloaded FMR Excel file in Exercise 3
library(gdata) # reads the FMR Excel file directly from its URL in Exercise 4
library(DT) # displays R data objects (matrices or data frames) as tables on HTML pages

Homework

My solutions to the homework problems are shown below, together with the code used to obtain them. For each exercise, datatable() displays the first 10 rows of the imported data frame and str() displays its structure.

Exercise 1:

Download & import the csv file located at: https://bradleyboehmke.github.io/public/data/reddit.csv

library(DT)
redditdata <- read.csv("reddit.csv") # import the downloaded file from the working directory
# display first 10 rows of the data frame
datatable(head(redditdata, 10), options = list(scrollX = TRUE, pageLength = 5))
str(redditdata) # display structure of the data frame
## 'data.frame':    32754 obs. of  14 variables:
##  $ id               : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ gender           : int  0 0 1 0 1 0 0 0 0 0 ...
##  $ age.range        : Factor w/ 7 levels "18-24","25-34",..: 2 2 1 2 2 2 2 1 3 2 ...
##  $ marital.status   : Factor w/ 6 levels "Engaged","Forever Alone",..: NA NA NA NA NA 4 3 4 4 3 ...
##  $ employment.status: Factor w/ 6 levels "Employed full time",..: 1 1 2 2 1 1 1 4 1 2 ...
##  $ military.service : Factor w/ 2 levels "No","Yes": NA NA NA NA NA 1 1 1 1 1 ...
##  $ children         : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ education        : Factor w/ 7 levels "Associate degree",..: 2 2 5 2 2 2 5 2 2 5 ...
##  $ country          : Factor w/ 439 levels " Canada"," Canada eh",..: 394 394 394 394 394 394 125 394 394 125 ...
##  $ state            : Factor w/ 53 levels "","Alabama","Alaska",..: 33 33 48 33 6 33 1 6 33 1 ...
##  $ income.range     : Factor w/ 8 levels "$100,000 - $149,999",..: 2 2 8 2 7 2 NA 7 2 7 ...
##  $ fav.reddit       : Factor w/ 1834 levels "","'home' page (or front page if you prefer)",..: 720 691 1511 1528 188 691 1318 571 1629 1 ...
##  $ dog.cat          : Factor w/ 3 levels "I like cats.",..: NA NA NA NA NA 2 2 2 1 1 ...
##  $ cheese           : Factor w/ 11 levels "American","Brie",..: NA NA NA NA NA 3 3 1 10 7 ...
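
The structure above shows the text columns imported as factors, which is the read.csv() default before R 4.0.0; from R 4.0.0 onward the default is stringsAsFactors = FALSE and the same columns arrive as character vectors. The sketch below illustrates the difference on a three-row excerpt built from the values shown above. The object names csv_text, as_factor, and as_char are made up for illustration.

```r
# Three-row stand-in for reddit.csv, using values visible in the str() output
csv_text <- "id,gender,age.range
1,0,25-34
2,0,25-34
3,1,18-24"

# Same text, imported with and without factor conversion
as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)
as_char   <- read.csv(text = csv_text, stringsAsFactors = FALSE)

class(as_factor$age.range)  # "factor"
class(as_char$age.range)    # "character"
```

Setting stringsAsFactors explicitly keeps the result the same across R versions.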

Exercise 2:

Import the reddit csv file directly from the URL provided.

redditsite <- read.csv("http://bradleyboehmke.github.io/public/data/reddit.csv")
# display first 10 rows of the data frame
datatable(head(redditsite,10),options = list(scrollX=TRUE, pageLength=5))
str(redditsite) # display structure of the data frame
## 'data.frame':    32754 obs. of  14 variables:
##  $ id               : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ gender           : int  0 0 1 0 1 0 0 0 0 0 ...
##  $ age.range        : Factor w/ 7 levels "18-24","25-34",..: 2 2 1 2 2 2 2 1 3 2 ...
##  $ marital.status   : Factor w/ 6 levels "Engaged","Forever Alone",..: NA NA NA NA NA 4 3 4 4 3 ...
##  $ employment.status: Factor w/ 6 levels "Employed full time",..: 1 1 2 2 1 1 1 4 1 2 ...
##  $ military.service : Factor w/ 2 levels "No","Yes": NA NA NA NA NA 1 1 1 1 1 ...
##  $ children         : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ education        : Factor w/ 7 levels "Associate degree",..: 2 2 5 2 2 2 5 2 2 5 ...
##  $ country          : Factor w/ 439 levels " Canada"," Canada eh",..: 394 394 394 394 394 394 125 394 394 125 ...
##  $ state            : Factor w/ 53 levels "","Alabama","Alaska",..: 33 33 48 33 6 33 1 6 33 1 ...
##  $ income.range     : Factor w/ 8 levels "$100,000 - $149,999",..: 2 2 8 2 7 2 NA 7 2 7 ...
##  $ fav.reddit       : Factor w/ 1834 levels "","'home' page (or front page if you prefer)",..: 720 691 1511 1528 188 691 1318 571 1629 1 ...
##  $ dog.cat          : Factor w/ 3 levels "I like cats.",..: NA NA NA NA NA 2 2 2 1 1 ...
##  $ cheese           : Factor w/ 11 levels "American","Brie",..: NA NA NA NA NA 3 3 1 10 7 ...
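
The structure matches the one from Exercise 1, which suggests the local and URL imports are the same data. One way to confirm this in code is sketched below, with toy data frames standing in for redditdata and redditsite. The helper name same_import is hypothetical.

```r
# Hypothetical helper: TRUE when two imports share shape, names, and values
same_import <- function(a, b) {
  identical(dim(a), dim(b)) &&
    identical(names(a), names(b)) &&
    isTRUE(all.equal(a, b))
}

# Toy data frames standing in for redditdata and redditsite
local_copy <- data.frame(id = 1:3, gender = c(0L, 0L, 1L))
url_copy   <- data.frame(id = 1:3, gender = c(0L, 0L, 1L))

same_import(local_copy, url_copy)
```

With the real objects, the call would be same_import(redditdata, redditsite).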

Exercise 3:

Download & import the Excel file located at: http://www.huduser.gov/portal/datasets/fmr/fmr2017/FY2017_4050_FMR.xlsx

library(readxl)
FMRdata <- read_excel("FY2017_4050_FMR.xlsx")
# display first 10 rows of the data frame
datatable(head(FMRdata,10),options = list(scrollX=TRUE, pageLength=5))
str(FMRdata) # display structure of the data frame
## Classes 'tbl_df', 'tbl' and 'data.frame':    4769 obs. of  21 variables:
##  $ fips2010         : chr  "2300512300" "6099999999" "6999999999" "0100199999" ...
##  $ fips2000         : chr  NA NA NA "0100199999" ...
##  $ fmr2             : num  1078 677 666 822 977 ...
##  $ fmr0             : num  755 502 411 587 807 501 665 665 491 464 ...
##  $ fmr1             : num  851 506 498 682 847 505 751 751 494 467 ...
##  $ fmr3             : num  1454 987 961 1054 1422 ...
##  $ fmr4             : num  1579 1038 1158 1425 1634 ...
##  $ State            : num  23 60 69 1 1 1 1 1 1 1 ...
##  $ Metro_code       : chr  "METRO38860MM6400" "NCNTY60999N60999" "NCNTY69999N69999" "METRO33860M33860" ...
##  $ areaname         : chr  "Portland, ME HUD Metro FMR Area" "American Samoa" "Northern Mariana Islands" "Montgomery, AL MSA" ...
##  $ county           : num  NA 999 999 1 3 5 7 9 11 13 ...
##  $ CouSub           : chr  "12300" "99999" "99999" "99999" ...
##  $ countyname       : chr  "Cumberland County" "American Samoa" "Northern Mariana Islands" "Autauga County" ...
##  $ county_town_name : chr  "Chebeague Island town" "American Samoa" "Northern Mariana Islands" "Autauga County" ...
##  $ pop2010          : num  341 55519 53883 54571 182265 ...
##  $ acs_2016_2       : num  1109 653 642 788 873 ...
##  $ state_alpha      : chr  "ME" "AS" "MP" "AL" ...
##  $ fmr_type         : num  40 40 40 40 40 40 40 40 40 40 ...
##  $ metro            : num  1 0 0 1 1 0 1 1 0 0 ...
##  $ FMR_PCT_Change   : num  0.972 1.037 1.037 1.043 1.119 ...
##  $ FMR_Dollar_Change: num  -31 24 24 34 104 35 26 26 52 52 ...
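
The code above assumes the workbook has already been saved to the working directory. As a rough sketch, the download step could also be scripted. The helper name fetch_xlsx is hypothetical, and the usage line is left commented out because it needs network access and the readxl package.

```r
# Hypothetical helper: download a workbook to disk, then read it with readxl
fetch_xlsx <- function(url, destfile = basename(url)) {
  # mode = "wb" writes the file in binary mode, which matters on Windows
  download.file(url, destfile, mode = "wb")
  readxl::read_excel(destfile)
}

# Usage (requires network access and the readxl package):
# FMRdata <- fetch_xlsx("http://www.huduser.gov/portal/datasets/fmr/fmr2017/FY2017_4050_FMR.xlsx")
```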

Exercise 4:

Import the FMR Excel file directly from the URL provided.

library(gdata)
FMRsite <- read.xls("http://www.huduser.gov/portal/datasets/fmr/fmr2017/FY2017_4050_FMR.xlsx")
# display first 10 rows of the data frame
datatable(head(FMRsite,10),options = list(scrollX=TRUE, pageLength=5))
str(FMRsite) # display structure of the data frame
## 'data.frame':    4769 obs. of  21 variables:
##  $ fips2010         : num  2.3e+09 6.1e+09 7.0e+09 1.0e+08 1.0e+08 ...
##  $ fips2000         : num  NA NA NA 1e+08 1e+08 ...
##  $ fmr2             : int  1078 677 666 822 977 671 866 866 621 621 ...
##  $ fmr0             : int  755 502 411 587 807 501 665 665 491 464 ...
##  $ fmr1             : int  851 506 498 682 847 505 751 751 494 467 ...
##  $ fmr3             : int  1454 987 961 1054 1422 839 1163 1163 853 849 ...
##  $ fmr4             : int  1579 1038 1158 1425 1634 958 1298 1298 856 1094 ...
##  $ State            : int  23 60 69 1 1 1 1 1 1 1 ...
##  $ Metro_code       : Factor w/ 2598 levels "METRO10180M10180",..: 451 2592 2594 384 160 625 55 55 626 627 ...
##  $ areaname         : Factor w/ 2598 levels " Santa Ana-Anaheim-Irvine, CA HUD Metro FMR Area",..: 1903 52 1723 1633 571 122 186 186 263 271 ...
##  $ county           : int  NA 999 999 1 3 5 7 9 11 13 ...
##  $ CouSub           : int  12300 99999 99999 99999 99999 99999 99999 99999 99999 99999 ...
##  $ countyname       : Factor w/ 1961 levels "Abbeville County",..: 462 41 1265 92 99 110 163 178 239 249 ...
##  $ county_town_name : Factor w/ 3175 levels "Abbeville County",..: 533 60 2024 136 149 165 254 277 386 401 ...
##  $ pop2010          : int  341 55519 53883 54571 182265 27457 22915 57322 10914 20947 ...
##  $ acs_2016_2       : int  1109 653 642 788 873 636 840 840 569 569 ...
##  $ state_alpha      : Factor w/ 56 levels "AK","AL","AR",..: 24 4 28 2 2 2 2 2 2 2 ...
##  $ fmr_type         : int  40 40 40 40 40 40 40 40 40 40 ...
##  $ metro            : int  1 0 0 1 1 0 1 1 0 0 ...
##  $ FMR_PCT_Change   : num  0.972 1.037 1.037 1.043 1.119 ...
##  $ FMR_Dollar_Change: int  -31 24 24 34 104 35 26 26 52 52 ...
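
Comparing this structure with the one from Exercise 3 shows a practical difference: read_excel() kept fips2010 as character ("0100199999"), while read.xls() converted it to a number, which drops the leading zero. The sketch below shows one possible way to restore the padded codes, assuming every code is meant to be 10 digits long, as the Exercise 3 output suggests. The names fips_num and fips_chr are made up for illustration.

```r
# Codes as they look after a numeric import (leading zero already lost)
fips_num <- c(2300512300, 6099999999, 6999999999, 100199999)

# Assumption: every code should be 10 digits, so pad with zeros on the left.
# "%010.0f" formats a double with no decimals, zero-padded to width 10.
fips_chr <- sprintf("%010.0f", fips_num)
fips_chr
```

The values are kept as doubles rather than integers because some codes exceed the largest 32-bit integer R can store.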

Exercise 5:

Go to the University of Dayton weather data site http://academic.udayton.edu/kissock/http/Weather/citylistUS.htm, scroll down to Ohio and import the Cincinnati (OHCINCIN.txt) file

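# read.table() is part of base R, so no additional package is needed
OHurl <- "http://academic.udayton.edu/kissock/http/Weather/gsod95-current/OHCINCIN.txt"
OH_import <- read.table(OHurl) # whitespace-delimited file with no header row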
# display first 10 rows of the data frame
datatable(head(OH_import,10),options = list(scrollX=TRUE, pageLength=5))
str(OH_import) # display structure of the data frame
## 'data.frame':    7963 obs. of  4 variables:
##  $ V1: int  1 1 1 1 1 1 1 1 1 1 ...
##  $ V2: int  1 2 3 4 5 6 7 8 9 10 ...
##  $ V3: int  1995 1995 1995 1995 1995 1995 1995 1995 1995 1995 ...
##  $ V4: num  41.1 22.2 22.8 14.9 9.5 23.8 31.1 26.9 31.3 31.5 ...
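
Because the file has no header row, the columns receive the default names V1 to V4. Judging from the values, they appear to hold month, day, year, and average temperature; that reading is an assumption, not something the file states. A minimal sketch of naming the columns at import time and adding a date column, using the first three rows shown above as inline sample data:

```r
# First three rows of the file, taken from the str() output above
sample_lines <- "1 1 1995 41.1
1 2 1995 22.2
1 3 1995 22.8"

# Assumed column meanings: month, day, year, average temperature
temps <- read.table(text = sample_lines,
                    col.names = c("month", "day", "year", "avg_temp"))

# Combine the three date parts into a single Date column
temps$date <- as.Date(ISOdate(temps$year, temps$month, temps$day))
str(temps)
```

For the full file, the same col.names argument could be passed to read.table(OHurl).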