This is my homework report for week 2, produced with R Markdown. In this homework I perform the five data importing exercises listed under Week 2’s Assignment section, which involve importing the following three data sets: reddit.csv, FY2017_4050_FMR.xlsx, and OHCINCIN.txt.
To reproduce the code and results throughout this homework assignment you will need the following packages, which I load here:
library(readxl) # for reading in the .xlsx file in exercise #3
library(gdata) # for scraping the .xlsx file in exercise #4
For each problem I imported the data and saved it as a data frame. I then used head() to display the first few rows and str() to display the structure of each data frame. In this example I do not display the code so that you can have the enjoyment of finding the required code on your own; however, in your homework I expect you to show all your code.
1. Download & import the csv file located at: https://bradleyboehmke.github.io/public/data/reddit.csv
'data.frame': 32754 obs. of 14 variables:
$ id : int 1 2 3 4 5 6 7 8 9 10 ...
$ gender : int 0 0 1 0 1 0 0 0 0 0 ...
$ age.range : Factor w/ 7 levels "18-24","25-34",..: 2 2 1 2 2 2 2 1 3 2 ...
$ marital.status : Factor w/ 6 levels "Engaged","Forever Alone",..: NA NA NA NA NA 4 3 4 4 3 ...
$ employment.status: Factor w/ 6 levels "Employed full time",..: 1 1 2 2 1 1 1 4 1 2 ...
$ military.service : Factor w/ 2 levels "No","Yes": NA NA NA NA NA 1 1 1 1 1 ...
$ children : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
$ education : Factor w/ 7 levels "Associate degree",..: 2 2 5 2 2 2 5 2 2 5 ...
$ country : Factor w/ 439 levels " Canada"," Canada eh",..: 394 394 394 394 394 394 125 394 394 125 ...
$ state : Factor w/ 53 levels "","Alabama","Alaska",..: 33 33 48 33 6 33 1 6 33 1 ...
$ income.range : Factor w/ 8 levels "$100,000 - $149,999",..: 2 2 8 2 7 2 NA 7 2 7 ...
$ fav.reddit : Factor w/ 1834 levels "","___","-","?",..: 720 691 1511 1528 188 691 1318 571 1629 1 ...
$ dog.cat : Factor w/ 3 levels "I like cats.",..: NA NA NA NA NA 2 2 2 1 1 ...
$ cheese : Factor w/ 11 levels "American","Brie",..: NA NA NA NA NA 3 3 1 10 7 ...
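For readers who want a starting point, the download-then-import step in exercise 1 could be sketched as below. This is one possible approach and not necessarily the code that produced the output above. The helper name `import_local_csv` and the local file name `reddit.csv` are assumptions made for illustration.

```r
# Hypothetical helper: download a csv once, then read the local copy.
import_local_csv <- function(url, destfile) {
  # Skip the download when the file is already on disk
  if (!file.exists(destfile)) {
    download.file(url, destfile = destfile, mode = "wb")
  }
  # stringsAsFactors = TRUE mirrors the Factor columns in the str() output;
  # it was the default before R 4.0.0
  read.csv(destfile, stringsAsFactors = TRUE)
}

# Usage (needs network access; "reddit.csv" is an assumed local file name):
# reddit <- import_local_csv(
#   "https://bradleyboehmke.github.io/public/data/reddit.csv",
#   "reddit.csv"
# )
# head(reddit)
# str(reddit)
```

Keeping the download separate from the read means the file is fetched only once, so later runs work from the local copy.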
2. Now import the above csv file directly from the url provided (without downloading to your local hard drive)
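Because `read.csv()` accepts a URL in place of a file path, exercise 2 can plausibly be handled without a separate download step. The sketch below assumes an R build with https support; the wrapper name `import_csv_from_url` is hypothetical.

```r
# Hypothetical wrapper: read a csv straight from a URL, with no local copy.
import_csv_from_url <- function(url) {
  # read.csv() opens the URL as a connection and parses it like a local file
  read.csv(url, stringsAsFactors = TRUE)
}

# Usage (needs network access):
# reddit <- import_csv_from_url(
#   "https://bradleyboehmke.github.io/public/data/reddit.csv"
# )
# head(reddit)
# str(reddit)
```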
3. Import the .xlsx file located at: http://www.huduser.gov/portal/datasets/fmr/fmr2017/FY2017_4050_FMR.xlsx
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 4769 obs. of 21 variables:
$ fips2010 : chr "2300512300" "6099999999" "6999999999" "0100199999" ...
$ fips2000 : chr NA NA NA "0100199999" ...
$ fmr2 : num 1078 677 666 822 977 ...
$ fmr0 : num 755 502 411 587 807 501 665 665 491 464 ...
$ fmr1 : num 851 506 498 682 847 505 751 751 494 467 ...
$ fmr3 : num 1454 987 961 1054 1422 ...
$ fmr4 : num 1579 1038 1158 1425 1634 ...
$ State : num 23 60 69 1 1 1 1 1 1 1 ...
$ Metro_code : chr "METRO38860MM6400" "NCNTY60999N60999" "NCNTY69999N69999" "METRO33860M33860" ...
$ areaname : chr "Portland, ME HUD Metro FMR Area" "American Samoa" "Northern Mariana Islands" "Montgomery, AL MSA" ...
$ county : num NA 999 999 1 3 5 7 9 11 13 ...
$ CouSub : chr "12300" "99999" "99999" "99999" ...
$ countyname : chr "Cumberland County" "American Samoa" "Northern Mariana Islands" "Autauga County" ...
$ county_town_name : chr "Chebeague Island town" "American Samoa" "Northern Mariana Islands" "Autauga County" ...
$ pop2010 : num 341 55519 53883 54571 182265 ...
$ acs_2016_2 : num 1109 653 642 788 873 ...
$ state_alpha : chr "ME" "AS" "MP" "AL" ...
$ fmr_type : num 40 40 40 40 40 40 40 40 40 40 ...
$ metro : num 1 0 0 1 1 0 1 1 0 0 ...
$ FMR_PCT_Change : num 0.972 1.037 1.037 1.043 1.119 ...
$ FMR_Dollar_Change: num -31 24 24 34 104 35 26 26 52 52 ...
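The `tbl_df` class in the output above is consistent with `readxl::read_excel()`, so exercise 3 might look something like the following sketch. The helper name `import_local_xlsx` and the local file name are assumptions; the actual code used for this report is not shown.

```r
library(readxl)

# Hypothetical helper: download an .xlsx file once, then read the local copy.
import_local_xlsx <- function(url, destfile) {
  if (!file.exists(destfile)) {
    # mode = "wb" keeps the binary workbook intact, which matters on Windows
    download.file(url, destfile = destfile, mode = "wb")
  }
  # read_excel() reads the first sheet by default and returns a tibble
  read_excel(destfile)
}

# Usage (needs network access; the local file name is an assumption):
# fmr <- import_local_xlsx(
#   "http://www.huduser.gov/portal/datasets/fmr/fmr2017/FY2017_4050_FMR.xlsx",
#   "FY2017_4050_FMR.xlsx"
# )
# head(fmr)
# str(fmr)
```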
4. Now import the above .xlsx file directly from the url provided (without downloading to your local hard drive)
trying URL 'http://www.huduser.gov/portal/datasets/fmr/fmr2017/FY2017_4050_FMR.xlsx'
Content type 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet' length 615031 bytes (600 KB)
==================================================
downloaded 600 KB
'data.frame': 4769 obs. of 21 variables:
$ fips2010 : num 2.3e+09 6.1e+09 7.0e+09 1.0e+08 1.0e+08 ...
$ fips2000 : num NA NA NA 1e+08 1e+08 ...
$ fmr2 : int 1078 677 666 822 977 671 866 866 621 621 ...
$ fmr0 : int 755 502 411 587 807 501 665 665 491 464 ...
$ fmr1 : int 851 506 498 682 847 505 751 751 494 467 ...
$ fmr3 : int 1454 987 961 1054 1422 839 1163 1163 853 849 ...
$ fmr4 : int 1579 1038 1158 1425 1634 958 1298 1298 856 1094 ...
$ State : int 23 60 69 1 1 1 1 1 1 1 ...
$ Metro_code : Factor w/ 2598 levels "METRO10180M10180",..: 451 2592 2594 384 160 625 55 55 626 627 ...
$ areaname : Factor w/ 2598 levels " Santa Ana-Anaheim-Irvine, CA HUD Metro FMR Area",..: 1903 52 1723 1633 571 122 186 186 263 271 ...
$ county : int NA 999 999 1 3 5 7 9 11 13 ...
$ CouSub : int 12300 99999 99999 99999 99999 99999 99999 99999 99999 99999 ...
$ countyname : Factor w/ 1961 levels "A\xf1asco Municipio",..: 462 42 1265 92 99 110 163 178 239 249 ...
$ county_town_name : Factor w/ 3175 levels "A\xf1asco Municipio",..: 533 61 2024 136 149 165 254 277 386 401 ...
$ pop2010 : int 341 55519 53883 54571 182265 27457 22915 57322 10914 20947 ...
$ acs_2016_2 : int 1109 653 642 788 873 636 840 840 569 569 ...
$ state_alpha : Factor w/ 56 levels "AK","AL","AR",..: 24 4 28 2 2 2 2 2 2 2 ...
$ fmr_type : int 40 40 40 40 40 40 40 40 40 40 ...
$ metro : int 1 0 0 1 1 0 1 1 0 0 ...
$ FMR_PCT_Change : num 0.972 1.037 1.037 1.043 1.119 ...
$ FMR_Dollar_Change: int -31 24 24 34 104 35 26 26 52 52 ...
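The "trying URL" message and the plain `data.frame` result above match what `gdata::read.xls()` does when handed a URL: it downloads the workbook to a temporary file and converts it with a Perl script. A minimal sketch, assuming Perl is installed and using a hypothetical wrapper name:

```r
library(gdata)

# Hypothetical wrapper: let read.xls() fetch and parse the workbook itself.
import_xlsx_from_url <- function(url) {
  # Extra arguments are passed on to read.csv(); stringsAsFactors = TRUE
  # mirrors the Factor columns shown in the str() output
  read.xls(url, stringsAsFactors = TRUE)
}

# Usage (needs network access and a working Perl installation):
# fmr <- import_xlsx_from_url(
#   "http://www.huduser.gov/portal/datasets/fmr/fmr2017/FY2017_4050_FMR.xlsx"
# )
# head(fmr)
# str(fmr)
```

Note that this route types columns differently from exercise 3: fips2010 comes back as a number here, so its leading zeros are lost, whereas `read_excel()` kept it as a character string.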
5. Go to this University of Dayton webpage http://academic.udayton.edu/kissock/http/Weather/citylistUS.htm, scroll down to Ohio and import the Cincinnati (OHCINCIN.txt) file
'data.frame': 7948 obs. of 4 variables:
$ V1: int 1 1 1 1 1 1 1 1 1 1 ...
$ V2: int 1 2 3 4 5 6 7 8 9 10 ...
$ V3: int 1995 1995 1995 1995 1995 1995 1995 1995 1995 1995 ...
$ V4: num 41.1 22.2 22.8 14.9 9.5 23.8 31.1 26.9 31.3 31.5 ...
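The default column names V1 to V4 suggest a whitespace-delimited file read without a header row, which is what `read.table()` does by default. The sketch below is one way to do that; the wrapper name `import_weather_txt` is hypothetical, and the file's link has to be copied from the Ohio section of the page since the exercise does not spell it out.

```r
# Hypothetical wrapper: read a whitespace-delimited text file with no header.
import_weather_txt <- function(path_or_url) {
  # header = FALSE yields the default column names V1, V2, V3, V4
  read.table(path_or_url, header = FALSE)
}

# Usage (needs network access; paste the OHCINCIN.txt link from the page):
# cincinnati <- import_weather_txt("<link to OHCINCIN.txt>")
# head(cincinnati)
# str(cincinnati)
```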