There are various formats in which data is available to us now.Many a time we need data which might be available only in a particular format.The most common formats in whcih data is available are:.csv, .txt and .xlsx (in some cases .xls).This R markdown file shows the various ways to import data in different formats,namely .csv .txt and .xls using data from certain websites.
The following packages are required to perform the exercises:
# To import the downloaded excel file into R
# Also to import url data via temp file (In my case scraping data using gdata was not possible due to inability to install perl)
library(readxl)
#To display the HTML R markdown file in a tabular format
library(DT)
As asked, I have performed the 5 exercises and saved the data as a dataframe and am displaying the first few rows using head() and structure of the data using str().
1.Download & import the csv file located at: https://bradleyboehmke.github.io/public/data/reddit.csv
# Downloading and importing Github data in .csv format
GIT<-read.csv("reddit.csv", na=TRUE);
datatable(head(GIT), class = 'cell-border stripe')
str(GIT);
## 'data.frame': 32754 obs. of 14 variables:
## $ id : int 1 2 3 4 5 6 7 8 9 10 ...
## $ gender : Factor w/ 3 levels "0","1","NA": 1 1 2 1 2 1 1 1 1 1 ...
## $ age.range : Factor w/ 8 levels "18-24","25-34",..: 2 2 1 2 2 2 2 1 3 2 ...
## $ marital.status : Factor w/ 7 levels "Engaged","Forever Alone",..: 5 5 5 5 5 4 3 4 4 3 ...
## $ employment.status: Factor w/ 7 levels "Employed full time",..: 1 1 2 2 1 1 1 5 1 2 ...
## $ military.service : Factor w/ 3 levels "NA","No","Yes": 1 1 1 1 1 2 2 2 2 2 ...
## $ children : Factor w/ 3 levels "NA","No","Yes": 2 2 2 2 2 2 2 2 2 2 ...
## $ education : Factor w/ 8 levels "Associate degree",..: 2 2 6 2 2 2 6 2 2 6 ...
## $ country : Factor w/ 440 levels " Canada"," Canada eh",..: 395 395 395 395 395 395 125 395 395 125 ...
## $ state : Factor w/ 53 levels "","Alabama","Alaska",..: 33 33 48 33 6 33 1 6 33 1 ...
## $ income.range : Factor w/ 9 levels "$100,000 - $149,999",..: 2 2 9 2 7 2 8 7 2 7 ...
## $ fav.reddit : Factor w/ 1835 levels "","'home' page (or front page if you prefer)",..: 720 691 1512 1529 188 691 1319 571 1630 1 ...
## $ dog.cat : Factor w/ 4 levels "I like cats.",..: 4 4 4 4 4 2 2 2 1 1 ...
## $ cheese : Factor w/ 12 levels "American","Brie",..: 8 8 8 8 8 3 3 1 11 7 ...
2.Import the above csv file directly from the url provided (without downloading to your local hard drive)
#Importing data from url
url<-"https://bradleyboehmke.github.io/public/data/reddit.csv";
Giturl <- read.csv(url, na=TRUE);
datatable(head(Giturl), class = 'cell-border stripe')
str(Giturl);
## 'data.frame': 32754 obs. of 14 variables:
## $ id : int 1 2 3 4 5 6 7 8 9 10 ...
## $ gender : Factor w/ 3 levels "0","1","NA": 1 1 2 1 2 1 1 1 1 1 ...
## $ age.range : Factor w/ 8 levels "18-24","25-34",..: 2 2 1 2 2 2 2 1 3 2 ...
## $ marital.status : Factor w/ 7 levels "Engaged","Forever Alone",..: 5 5 5 5 5 4 3 4 4 3 ...
## $ employment.status: Factor w/ 7 levels "Employed full time",..: 1 1 2 2 1 1 1 5 1 2 ...
## $ military.service : Factor w/ 3 levels "NA","No","Yes": 1 1 1 1 1 2 2 2 2 2 ...
## $ children : Factor w/ 3 levels "NA","No","Yes": 2 2 2 2 2 2 2 2 2 2 ...
## $ education : Factor w/ 8 levels "Associate degree",..: 2 2 6 2 2 2 6 2 2 6 ...
## $ country : Factor w/ 440 levels " Canada"," Canada eh",..: 395 395 395 395 395 395 125 395 395 125 ...
## $ state : Factor w/ 53 levels "","Alabama","Alaska",..: 33 33 48 33 6 33 1 6 33 1 ...
## $ income.range : Factor w/ 9 levels "$100,000 - $149,999",..: 2 2 9 2 7 2 8 7 2 7 ...
## $ fav.reddit : Factor w/ 1835 levels "","'home' page (or front page if you prefer)",..: 720 691 1512 1529 188 691 1319 571 1630 1 ...
## $ dog.cat : Factor w/ 4 levels "I like cats.",..: 4 4 4 4 4 2 2 2 1 1 ...
## $ cheese : Factor w/ 12 levels "American","Brie",..: 8 8 8 8 8 3 3 1 11 7 ...
3.Download & import the .xlsx file located at: http://www.huduser.gov/portal/datasets/fmr/fmr2017/FY2017_4050_FMR.xlsx
#Downloading and importingdata from Huduser .xlsx file
library(readxl);
Huduser<-read_excel("FY2017_4050_FMR.xlsx", sheet= "EXCEL_DATA");
datatable(head(Huduser), class = 'cell-border stripe');
str(Huduser);
## Classes 'tbl_df', 'tbl' and 'data.frame': 4769 obs. of 21 variables:
## $ fips2010 : chr "2300512300" "6099999999" "6999999999" "0100199999" ...
## $ fips2000 : chr NA NA NA "0100199999" ...
## $ fmr2 : num 1078 677 666 822 977 ...
## $ fmr0 : num 755 502 411 587 807 501 665 665 491 464 ...
## $ fmr1 : num 851 506 498 682 847 505 751 751 494 467 ...
## $ fmr3 : num 1454 987 961 1054 1422 ...
## $ fmr4 : num 1579 1038 1158 1425 1634 ...
## $ State : num 23 60 69 1 1 1 1 1 1 1 ...
## $ Metro_code : chr "METRO38860MM6400" "NCNTY60999N60999" "NCNTY69999N69999" "METRO33860M33860" ...
## $ areaname : chr "Portland, ME HUD Metro FMR Area" "American Samoa" "Northern Mariana Islands" "Montgomery, AL MSA" ...
## $ county : num NA 999 999 1 3 5 7 9 11 13 ...
## $ CouSub : chr "12300" "99999" "99999" "99999" ...
## $ countyname : chr "Cumberland County" "American Samoa" "Northern Mariana Islands" "Autauga County" ...
## $ county_town_name : chr "Chebeague Island town" "American Samoa" "Northern Mariana Islands" "Autauga County" ...
## $ pop2010 : num 341 55519 53883 54571 182265 ...
## $ acs_2016_2 : num 1109 653 642 788 873 ...
## $ state_alpha : chr "ME" "AS" "MP" "AL" ...
## $ fmr_type : num 40 40 40 40 40 40 40 40 40 40 ...
## $ metro : num 1 0 0 1 1 0 1 1 0 0 ...
## $ FMR_PCT_Change : num 0.972 1.037 1.037 1.043 1.119 ...
## $ FMR_Dollar_Change: num -31 24 24 34 104 35 26 26 52 52 ...
4.Import the above .xlsx file directly from the url provided (without downloading to your local hard drive)
#Importing .xlsx data from the url into a temp file and then reading the data from the temp file
temp<-tempfile( ,fileext=".xlsx");
download.file("http://www.huduser.gov/portal/datasets/fmr/fmr2017/FY2017_4050_FMR.xlsx",temp,mode="wb");
exceldata<-read_excel("C:/Users/kolliprk/AppData/Local/Temp/Rtmpc9rHUY/file27245dca19ad.xlsx");
datatable(head(exceldata), class = 'cell-border stripe');
str(exceldata);
## Classes 'tbl_df', 'tbl' and 'data.frame': 4769 obs. of 21 variables:
## $ fips2010 : chr "2300512300" "6099999999" "6999999999" "0100199999" ...
## $ fips2000 : chr NA NA NA "0100199999" ...
## $ fmr2 : num 1078 677 666 822 977 ...
## $ fmr0 : num 755 502 411 587 807 501 665 665 491 464 ...
## $ fmr1 : num 851 506 498 682 847 505 751 751 494 467 ...
## $ fmr3 : num 1454 987 961 1054 1422 ...
## $ fmr4 : num 1579 1038 1158 1425 1634 ...
## $ State : num 23 60 69 1 1 1 1 1 1 1 ...
## $ Metro_code : chr "METRO38860MM6400" "NCNTY60999N60999" "NCNTY69999N69999" "METRO33860M33860" ...
## $ areaname : chr "Portland, ME HUD Metro FMR Area" "American Samoa" "Northern Mariana Islands" "Montgomery, AL MSA" ...
## $ county : num NA 999 999 1 3 5 7 9 11 13 ...
## $ CouSub : chr "12300" "99999" "99999" "99999" ...
## $ countyname : chr "Cumberland County" "American Samoa" "Northern Mariana Islands" "Autauga County" ...
## $ county_town_name : chr "Chebeague Island town" "American Samoa" "Northern Mariana Islands" "Autauga County" ...
## $ pop2010 : num 341 55519 53883 54571 182265 ...
## $ acs_2016_2 : num 1109 653 642 788 873 ...
## $ state_alpha : chr "ME" "AS" "MP" "AL" ...
## $ fmr_type : num 40 40 40 40 40 40 40 40 40 40 ...
## $ metro : num 1 0 0 1 1 0 1 1 0 0 ...
## $ FMR_PCT_Change : num 0.972 1.037 1.037 1.043 1.119 ...
## $ FMR_Dollar_Change: num -31 24 24 34 104 35 26 26 52 52 ...
5.Go to this University of Dayton webpage http://academic.udayton.edu/kissock/http/Weather/citylistUS.htm, scroll down to Ohio and import the Cincinnati (OHCINCIN.txt) file
# Importing the .txt data
ct<-read.table("OHCINCIN.txt",col.names= c("V1","V2","V3","V4"),stringsAsFactors = FALSE);
datatable(head(ct), class = 'cell-border stripe');
str(ct);
## 'data.frame': 7963 obs. of 4 variables:
## $ V1: int 1 1 1 1 1 1 1 1 1 1 ...
## $ V2: int 1 2 3 4 5 6 7 8 9 10 ...
## $ V3: int 1995 1995 1995 1995 1995 1995 1995 1995 1995 1995 ...
## $ V4: num 41.1 22.2 22.8 14.9 9.5 23.8 31.1 26.9 31.3 31.5 ...