Synopsis

There are various formats in which data is available to us now.Many a time we need data which might be available only in a particular format.The most common formats in whcih data is available are:.csv, .txt and .xlsx (in some cases .xls).This R markdown file shows the various ways to import data in different formats,namely .csv .txt and .xls using data from certain websites.

Packages Required

The following packages are required to perform the exercises:

# To import the downloaded excel file into R 
# Also to import url data via temp file (In my case scraping data using gdata was not possible due to inability to install perl)

library(readxl) 

#To display the HTML R markdown file in a tabular format

library(DT)

Homework Problems

As asked, I have performed the 5 exercises and saved the data as a dataframe and am displaying the first few rows using head() and structure of the data using str().

1.Download & import the csv file located at: https://bradleyboehmke.github.io/public/data/reddit.csv

# Downloading and importing Github data in .csv format
GIT<-read.csv("reddit.csv", na=TRUE);
datatable(head(GIT), class = 'cell-border stripe')
str(GIT);
## 'data.frame':    32754 obs. of  14 variables:
##  $ id               : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ gender           : Factor w/ 3 levels "0","1","NA": 1 1 2 1 2 1 1 1 1 1 ...
##  $ age.range        : Factor w/ 8 levels "18-24","25-34",..: 2 2 1 2 2 2 2 1 3 2 ...
##  $ marital.status   : Factor w/ 7 levels "Engaged","Forever Alone",..: 5 5 5 5 5 4 3 4 4 3 ...
##  $ employment.status: Factor w/ 7 levels "Employed full time",..: 1 1 2 2 1 1 1 5 1 2 ...
##  $ military.service : Factor w/ 3 levels "NA","No","Yes": 1 1 1 1 1 2 2 2 2 2 ...
##  $ children         : Factor w/ 3 levels "NA","No","Yes": 2 2 2 2 2 2 2 2 2 2 ...
##  $ education        : Factor w/ 8 levels "Associate degree",..: 2 2 6 2 2 2 6 2 2 6 ...
##  $ country          : Factor w/ 440 levels " Canada"," Canada eh",..: 395 395 395 395 395 395 125 395 395 125 ...
##  $ state            : Factor w/ 53 levels "","Alabama","Alaska",..: 33 33 48 33 6 33 1 6 33 1 ...
##  $ income.range     : Factor w/ 9 levels "$100,000 - $149,999",..: 2 2 9 2 7 2 8 7 2 7 ...
##  $ fav.reddit       : Factor w/ 1835 levels "","'home' page (or front page if you prefer)",..: 720 691 1512 1529 188 691 1319 571 1630 1 ...
##  $ dog.cat          : Factor w/ 4 levels "I like cats.",..: 4 4 4 4 4 2 2 2 1 1 ...
##  $ cheese           : Factor w/ 12 levels "American","Brie",..: 8 8 8 8 8 3 3 1 11 7 ...

2.Import the above csv file directly from the url provided (without downloading to your local hard drive)

#Importing data from url

url<-"https://bradleyboehmke.github.io/public/data/reddit.csv";
Giturl <- read.csv(url, na=TRUE);
datatable(head(Giturl), class = 'cell-border stripe')
str(Giturl);
## 'data.frame':    32754 obs. of  14 variables:
##  $ id               : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ gender           : Factor w/ 3 levels "0","1","NA": 1 1 2 1 2 1 1 1 1 1 ...
##  $ age.range        : Factor w/ 8 levels "18-24","25-34",..: 2 2 1 2 2 2 2 1 3 2 ...
##  $ marital.status   : Factor w/ 7 levels "Engaged","Forever Alone",..: 5 5 5 5 5 4 3 4 4 3 ...
##  $ employment.status: Factor w/ 7 levels "Employed full time",..: 1 1 2 2 1 1 1 5 1 2 ...
##  $ military.service : Factor w/ 3 levels "NA","No","Yes": 1 1 1 1 1 2 2 2 2 2 ...
##  $ children         : Factor w/ 3 levels "NA","No","Yes": 2 2 2 2 2 2 2 2 2 2 ...
##  $ education        : Factor w/ 8 levels "Associate degree",..: 2 2 6 2 2 2 6 2 2 6 ...
##  $ country          : Factor w/ 440 levels " Canada"," Canada eh",..: 395 395 395 395 395 395 125 395 395 125 ...
##  $ state            : Factor w/ 53 levels "","Alabama","Alaska",..: 33 33 48 33 6 33 1 6 33 1 ...
##  $ income.range     : Factor w/ 9 levels "$100,000 - $149,999",..: 2 2 9 2 7 2 8 7 2 7 ...
##  $ fav.reddit       : Factor w/ 1835 levels "","'home' page (or front page if you prefer)",..: 720 691 1512 1529 188 691 1319 571 1630 1 ...
##  $ dog.cat          : Factor w/ 4 levels "I like cats.",..: 4 4 4 4 4 2 2 2 1 1 ...
##  $ cheese           : Factor w/ 12 levels "American","Brie",..: 8 8 8 8 8 3 3 1 11 7 ...

3.Download & import the .xlsx file located at: http://www.huduser.gov/portal/datasets/fmr/fmr2017/FY2017_4050_FMR.xlsx

#Downloading and importingdata from  Huduser .xlsx file

library(readxl);
Huduser<-read_excel("FY2017_4050_FMR.xlsx", sheet= "EXCEL_DATA");
datatable(head(Huduser), class = 'cell-border stripe');
str(Huduser);
## Classes 'tbl_df', 'tbl' and 'data.frame':    4769 obs. of  21 variables:
##  $ fips2010         : chr  "2300512300" "6099999999" "6999999999" "0100199999" ...
##  $ fips2000         : chr  NA NA NA "0100199999" ...
##  $ fmr2             : num  1078 677 666 822 977 ...
##  $ fmr0             : num  755 502 411 587 807 501 665 665 491 464 ...
##  $ fmr1             : num  851 506 498 682 847 505 751 751 494 467 ...
##  $ fmr3             : num  1454 987 961 1054 1422 ...
##  $ fmr4             : num  1579 1038 1158 1425 1634 ...
##  $ State            : num  23 60 69 1 1 1 1 1 1 1 ...
##  $ Metro_code       : chr  "METRO38860MM6400" "NCNTY60999N60999" "NCNTY69999N69999" "METRO33860M33860" ...
##  $ areaname         : chr  "Portland, ME HUD Metro FMR Area" "American Samoa" "Northern Mariana Islands" "Montgomery, AL MSA" ...
##  $ county           : num  NA 999 999 1 3 5 7 9 11 13 ...
##  $ CouSub           : chr  "12300" "99999" "99999" "99999" ...
##  $ countyname       : chr  "Cumberland County" "American Samoa" "Northern Mariana Islands" "Autauga County" ...
##  $ county_town_name : chr  "Chebeague Island town" "American Samoa" "Northern Mariana Islands" "Autauga County" ...
##  $ pop2010          : num  341 55519 53883 54571 182265 ...
##  $ acs_2016_2       : num  1109 653 642 788 873 ...
##  $ state_alpha      : chr  "ME" "AS" "MP" "AL" ...
##  $ fmr_type         : num  40 40 40 40 40 40 40 40 40 40 ...
##  $ metro            : num  1 0 0 1 1 0 1 1 0 0 ...
##  $ FMR_PCT_Change   : num  0.972 1.037 1.037 1.043 1.119 ...
##  $ FMR_Dollar_Change: num  -31 24 24 34 104 35 26 26 52 52 ...

4.Import the above .xlsx file directly from the url provided (without downloading to your local hard drive)

#Importing .xlsx data from the url into a temp file and then reading the data from the temp file

temp<-tempfile( ,fileext=".xlsx");
download.file("http://www.huduser.gov/portal/datasets/fmr/fmr2017/FY2017_4050_FMR.xlsx",temp,mode="wb");
exceldata<-read_excel("C:/Users/kolliprk/AppData/Local/Temp/Rtmpc9rHUY/file27245dca19ad.xlsx");
datatable(head(exceldata), class = 'cell-border stripe');
str(exceldata);
## Classes 'tbl_df', 'tbl' and 'data.frame':    4769 obs. of  21 variables:
##  $ fips2010         : chr  "2300512300" "6099999999" "6999999999" "0100199999" ...
##  $ fips2000         : chr  NA NA NA "0100199999" ...
##  $ fmr2             : num  1078 677 666 822 977 ...
##  $ fmr0             : num  755 502 411 587 807 501 665 665 491 464 ...
##  $ fmr1             : num  851 506 498 682 847 505 751 751 494 467 ...
##  $ fmr3             : num  1454 987 961 1054 1422 ...
##  $ fmr4             : num  1579 1038 1158 1425 1634 ...
##  $ State            : num  23 60 69 1 1 1 1 1 1 1 ...
##  $ Metro_code       : chr  "METRO38860MM6400" "NCNTY60999N60999" "NCNTY69999N69999" "METRO33860M33860" ...
##  $ areaname         : chr  "Portland, ME HUD Metro FMR Area" "American Samoa" "Northern Mariana Islands" "Montgomery, AL MSA" ...
##  $ county           : num  NA 999 999 1 3 5 7 9 11 13 ...
##  $ CouSub           : chr  "12300" "99999" "99999" "99999" ...
##  $ countyname       : chr  "Cumberland County" "American Samoa" "Northern Mariana Islands" "Autauga County" ...
##  $ county_town_name : chr  "Chebeague Island town" "American Samoa" "Northern Mariana Islands" "Autauga County" ...
##  $ pop2010          : num  341 55519 53883 54571 182265 ...
##  $ acs_2016_2       : num  1109 653 642 788 873 ...
##  $ state_alpha      : chr  "ME" "AS" "MP" "AL" ...
##  $ fmr_type         : num  40 40 40 40 40 40 40 40 40 40 ...
##  $ metro            : num  1 0 0 1 1 0 1 1 0 0 ...
##  $ FMR_PCT_Change   : num  0.972 1.037 1.037 1.043 1.119 ...
##  $ FMR_Dollar_Change: num  -31 24 24 34 104 35 26 26 52 52 ...

5.Go to this University of Dayton webpage http://academic.udayton.edu/kissock/http/Weather/citylistUS.htm, scroll down to Ohio and import the Cincinnati (OHCINCIN.txt) file

# Importing the .txt data

ct<-read.table("OHCINCIN.txt",col.names= c("V1","V2","V3","V4"),stringsAsFactors = FALSE);
datatable(head(ct), class = 'cell-border stripe');
str(ct);
## 'data.frame':    7963 obs. of  4 variables:
##  $ V1: int  1 1 1 1 1 1 1 1 1 1 ...
##  $ V2: int  1 2 3 4 5 6 7 8 9 10 ...
##  $ V3: int  1995 1995 1995 1995 1995 1995 1995 1995 1995 1995 ...
##  $ V4: num  41.1 22.2 22.8 14.9 9.5 23.8 31.1 26.9 31.3 31.5 ...