Setting up the defaults:
knitr::opts_chunk$set(echo = TRUE, results = "asis")
setInternet2(TRUE)
Load the data into R. The code book describing the variable names is here.
Apply strsplit() to split all the names of the data frame on the characters "wgtp". What is the value of the 123rd element of the resulting list?
d <- read.csv("getdata-data-ss06hid.csv", stringsAsFactors = FALSE)
strsplit(names(d),"wgtp")[[123]]
[1] ""   "15"
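The empty first element appears because the column name begins with the separator itself, so nothing precedes the match. A minimal sketch on single strings; the name "wgtp15" is inferred from the output above, and "SERIALNO" is used only as an illustrative name that does not contain the separator:

```r
# Assumed column name, inferred from the printed result c("", "15").
parts <- strsplit("wgtp15", "wgtp")[[1]]
print(parts)

# Illustrative name without the separator: it comes back as one piece.
untouched <- strsplit("SERIALNO", "wgtp")[[1]]
print(untouched)
```

Because strsplit() returns a list with one element per input string, `[[123]]` picks out the pieces of the 123rd column name.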
Remove the commas from the GDP numbers in millions of dollars and average them. What is the average?
Original data sources: http://data.worldbank.org/data-catalog/GDP-ranking-table
gdpdata <- read.csv("getdata-data-GDP.csv",stringsAsFactors = FALSE, skip = 4, nrows = 215)
gdp <- as.numeric(gsub(",","",gdpdata[,c("X.4")], fixed = TRUE))
## Warning: NAs introduced by coercion
mean(gdp,na.rm = TRUE)
[1] 377652.4
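The coercion warning comes from cells that are still not numeric after the commas are stripped; as.numeric() turns them into NA, and na.rm = TRUE drops them from the mean. A small sketch with made-up values (not taken from the GDP file), where ".." stands in for such a cell:

```r
# Hypothetical values for illustration only.
raw <- c("16,244,600", "5,959,718", "..")

# fixed = TRUE treats "," as a literal character rather than a regex.
clean <- suppressWarnings(as.numeric(gsub(",", "", raw, fixed = TRUE)))
print(clean)

# Average over the values that parsed successfully.
avg <- mean(clean, na.rm = TRUE)
print(avg)
```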
grep("^United", gdpdata[,c("X.3")])
[1] 1 6 32
length(gdpdata[grep("^United", gdpdata[,c("X.3")]),c("X.3")])
[1] 3
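The `^` anchor restricts the match to names that begin with "United", so a name that only contains the word later on is not counted. A minimal sketch on a hand-written vector (the ordering is arbitrary and not that of the GDP file):

```r
# Illustrative names; the second one contains "United" but does not start with it.
countries <- c("United Arab Emirates",
               "Tanzania, United Republic of",
               "United Kingdom",
               "United States")

idx <- grep("^United", countries)       # positions of the matching names
n <- sum(grepl("^United", countries))   # count of matching names
print(idx)
print(n)
```

Using sum(grepl(...)) gives the count directly, which is equivalent to taking the length of the grep() result.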
Load the educational data from this data set.
Match the data based on the country shortcode. Of the countries for which the end of the fiscal year is available, how many end in June?
Original data sources: http://data.worldbank.org/data-catalog/GDP-ranking-table http://data.worldbank.org/data-catalog/ed-stats
library(data.table)
## Warning: package 'data.table' was built under R version 3.2.4
eddata <- read.csv("getdata-data-EDSTATS_Country.csv", stringsAsFactors = FALSE)
dtGDP <- data.table(read.csv("getdata-data-GDP.csv", skip = 4, nrows = 215))
dtGDP <- dtGDP[X != ""]
dtGDP <- dtGDP[, list(X, X.1, X.3, X.4)]
setnames(dtGDP, c("X", "X.1", "X.3", "X.4"), c("CountryCode", "rankingGDP", "Long.Name", "gdp"))
dt <- merge(dtGDP, eddata, all = TRUE, by = c("CountryCode"))
length(grep("Fiscal year end: June 30",dt$Special.Notes))
[1] 13
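The merge-then-match step can be sketched on toy data. The frames, country codes, and note texts below are hypothetical; only the column names CountryCode and Special.Notes come from the code above. With all = TRUE, countries missing from one side are kept with NA in the other side's columns, and grepl() treats those NA notes as non-matches:

```r
# Hypothetical toy tables standing in for the GDP and education data.
gdp_toy <- data.frame(CountryCode = c("AAA", "BBB", "CCC"),
                      gdp = c(10, 20, 30),
                      stringsAsFactors = FALSE)
ed_toy <- data.frame(CountryCode = c("AAA", "BBB", "DDD"),
                     Special.Notes = c("Fiscal year end: June 30; other notes.",
                                       "Fiscal year end: March 31.",
                                       ""),
                     stringsAsFactors = FALSE)

# Full outer join on the country short code.
merged <- merge(gdp_toy, ed_toy, by = "CountryCode", all = TRUE)

# Count rows whose notes mention a June fiscal year end; NA notes count as FALSE.
june <- sum(grepl("Fiscal year end: June", merged$Special.Notes, fixed = TRUE))
print(merged)
print(june)
```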
How many values were collected in 2012? How many values were collected on Mondays in 2012?
library(quantmod)
## Warning: package 'quantmod' was built under R version 3.2.4
## Loading required package: xts
## Warning: package 'xts' was built under R version 3.2.4
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 3.2.4
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
## Attaching package: 'xts'
## The following object is masked from 'package:data.table':
##
## last
## Loading required package: TTR
## Warning: package 'TTR' was built under R version 3.2.4
## Version 0.4-0 included new data defaults. See ?getSymbols.
amzn = getSymbols("AMZN",auto.assign=FALSE)
## As of 0.4-0, 'getSymbols' uses env=parent.frame() and
## auto.assign=TRUE by default.
##
## This behavior will be phased out in 0.5-0 when the call will
## default to use auto.assign=FALSE. getOption("getSymbols.env") and
## getOptions("getSymbols.auto.assign") are now checked for alternate defaults
##
## This message is shown once per session and may be disabled by setting
## options("getSymbols.warning4.0"=FALSE). See ?getSymbols for more details.
## Warning in download.file(paste(yahoo.URL, "s=", Symbols.name, "&a=",
## from.m, : downloaded length 164296 != reported length 200
sampleTimes = index(amzn)
library(lubridate)
## Warning: package 'lubridate' was built under R version 3.2.4
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:data.table':
##
## hour, mday, month, quarter, wday, week, yday, year
addmargins(table(year(sampleTimes), weekdays(sampleTimes)))
       Friday Monday Thursday Tuesday Wednesday  Sum
  2007     51     48       51      50        51  251
  2008     50     48       50      52        53  253
  2009     49     48       51      52        52  252
  2010     50     47       51      52        52  252
  2011     51     46       51      52        52  252
  2012     51     47       51      50        51  250
  2013     51     48       50      52        51  252
  2014     50     48       50      52        52  252
  2015     49     48       51      52        52  252
  2016     17     17       18      19        19   90
  Sum     469    445      474     483       485 2356
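Reading the 2012 row of the table gives 250 values in total, 47 of them on Mondays. The same two counts can also be computed directly rather than read off a table. The sketch below uses a short hand-written vector of dates in place of index(amzn), and uses as.POSIXlt() instead of weekdays() so the result does not depend on the locale's day names; both choices are assumptions made for illustration:

```r
# Hypothetical sample dates standing in for index(amzn).
dates <- as.Date(c("2011-12-30", "2012-01-03", "2012-01-09",
                   "2012-01-16", "2012-06-15", "2013-01-02"))

lt <- as.POSIXlt(dates)
in_2012 <- (lt$year + 1900) == 2012   # $year counts from 1900
is_monday <- lt$wday == 1             # $wday: 0 = Sunday, 1 = Monday

n_2012 <- sum(in_2012)
n_monday_2012 <- sum(in_2012 & is_monday)
print(n_2012)
print(n_monday_2012)
```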