Question 1

The American Community Survey distributes downloadable data about United States communities. Download the 2006 microdata survey about housing for the state of Idaho using download.file() from here:

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv

and load the data into R. The code book, which describes the variable names, is here:

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FPUMSDataDict06.pdf

Apply strsplit() to split all the names of the data frame on the characters “wgtp”.

What is the value of the 123rd element of the resulting list?


Answer


Download file…

Q1Url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
Q1 <- read.csv(Q1Url)
Q1


Computing solution…

# Split every column name on "wgtp"; a match at the start of a name leaves a leading "".
Q1_colnames <- names(Q1)
strsplit(Q1_colnames, "wgtp")[[123]]
[1] ""   "15"


Options:

  1. "wgt" "15"

  2. "wgtp"

  3. "" "15"

  4. "wgtp" "15"



Question 2

Load the Gross Domestic Product data for the 190 ranked countries in this data set:

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FGDP.csv

Remove the commas from the GDP numbers in millions of dollars and average them. What is the average?

Original data sources:

http://data.worldbank.org/data-catalog/GDP-ranking-table


Answer


Downloading file…

Q2_Url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FGDP.csv"
Q2_Path <- "/home/cabunic/Data Science/Coursera/3 - Getting and Cleaning Data/Week 4/Q3GDP.csv"
download.file(Q2_Url, Q2_Path, method = "curl")

Loading and tidying data…

Q2_File <- read.csv(Q2_Path, nrows = 190, skip = 4)
Q2_File <- Q2_File[,c(1, 2, 4, 5)]
colnames(Q2_File) <- c("CountryCode", "Rank", "Country", "Total")
Q2_File
Q2_File$Total <- as.integer(gsub(",", "", Q2_File$Total))
mean(Q2_File$Total, na.rm = TRUE)
[1] 377652.4

Options:

  1. 377652.4

  2. 381668.9

  3. 387854.4

  4. 293700.3




Question 3

In the data set from Question 2 what is a regular expression that would allow you to count the number of countries whose name begins with “United”? Assume that the variable with the country names in it is named countryNames. How many countries begin with United?


Answer


Fixing country names:

Q2_File$Country <- as.character(Q2_File$Country)
Q2_File$Country[99] <- "Côte d’Ivoire"
Q2_File$Country[186] <- "São Tomé and Príncipe"

Generating solution…

Q2_File$Country[grep("^United", Q2_File$Country)]
[1] "United States"        "United Kingdom"       "United Arab Emirates"


Options:

  1. grep("*United",countryNames), 2

  2. grep("^United",countryNames), 4

  3. grep("^United",countryNames), 3

  4. grep("United$",countryNames), 3




Question 4

Load the Gross Domestic Product data for the 190 ranked countries in this data set:

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FGDP.csv

Load the educational data from this data set:

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FEDSTATS_Country.csv

Match the data based on the country shortcode. Of the countries for which the end of the fiscal year is available, how many end in June?

Original data sources:

http://data.worldbank.org/data-catalog/GDP-ranking-table

http://data.worldbank.org/data-catalog/ed-stats


Answer


Loading packages…

library(data.table)


Download file…

Q4GDP_Url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FGDP.csv"
Q4GDP_Path <- "/home/cabunic/Data Science/Coursera/3 - Getting and Cleaning Data/Week 3/Q3GDP.csv"
Q4Edu_Url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FEDSTATS_Country.csv"
Q4Edu_Path <- "/home/cabunic/Data Science/Coursera/3 - Getting and Cleaning Data/Week 3/Q3Edu.csv"

download.file(Q4GDP_Url, Q4GDP_Path, method = "curl")
download.file(Q4Edu_Url, Q4Edu_Path, method = "curl")


Merging the data…

Q4GDP <- fread(Q4GDP_Path, skip = 5, nrows = 190,
               select = c(1, 2, 4, 5),
               col.names = c("CountryCode", "Rank", "Economy", "Total"))
Q4Edu <- fread(Q4Edu_Path)
Q4_Merge <- merge(Q4GDP, Q4Edu, by = 'CountryCode')
Q4_Merge


Computing solution…

FiscalJune <- grep("Fiscal year end: June", Q4_Merge$`Special Notes`)
NROW(FiscalJune)
[1] 13


Options:

  1. 13

  2. 7

  3. 16

  4. 8




Question 5

You can use the quantmod (http://www.quantmod.com/) package to get historical stock prices for publicly traded companies on the NASDAQ and NYSE. Use the following code to download data on Amazon’s stock price and get the times the data was sampled.

library(quantmod)
amzn = getSymbols("AMZN", auto.assign=FALSE)
sampleTimes = index(amzn)


How many values were collected in 2012? How many values were collected on Mondays in 2012?


Answer


Loading package…

library(quantmod)
amzn = getSymbols("AMZN", auto.assign=FALSE)
sampleTimes = index(amzn)


How many values were collected in 2012?

amzn2012 <- sampleTimes[grep("^2012", sampleTimes)]
NROW(amzn2012)
[1] 250


How many values were collected on Mondays in 2012?

# weekdays() returns locale-dependent names; this comparison assumes an English locale.
NROW(amzn2012[weekdays(amzn2012) == "Monday"])
[1] 47


Options:

  1. 252, 50

  2. 250, 51

  3. 251, 47

  4. 250, 47



END

---
title: "Quiz 4"
output: html_notebook
---

<br />

---

## Question 1

The American Community Survey distributes downloadable data about United States communities. Download the 2006 microdata survey about housing for the state of Idaho using **download.file()** from here:

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv

and load the data into R. The code book, which describes the variable names, is here:

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FPUMSDataDict06.pdf

Apply **strsplit()** to split all the names of the data frame on the characters *"wgtp"*.

What is the value of the 123rd element of the resulting list?

<br />

### Answer

<br/>
Download file...
```{r}
Q1Url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
Q1 <- read.csv(Q1Url)
Q1
```
<br/>
Computing solution...
```{r}
# Split every column name on "wgtp"; a match at the start of a name leaves a leading "".
Q1_colnames <- names(Q1)
strsplit(Q1_colnames, "wgtp")[[123]]
```
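
To see why the answer starts with an empty string, here is a minimal sketch on a few made-up column names. The vector `example_names` is an assumption for illustration, not the real header of the survey file. When the split pattern matches at the very start of a string, `strsplit()` reports an empty first piece.

```{r}
# Hypothetical column names, chosen to mimic the survey's naming pattern.
example_names <- c("SERIALNO", "wgtp1", "wgtp15")

# strsplit() returns a list with one character vector per input string.
parts <- strsplit(example_names, "wgtp")

parts[[1]]  # no match: the name comes back unchanged
parts[[3]]  # match at the start: an empty string, then the remainder
```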

<br/>

<div style= "border: 5px dotted gray; padding: 10px 20px; background-color:#e8e8e8; box-shadow: 0 1px 5px rgba(0, 0, 0, 0.25);">
**Options:**

a. "wgt" "15"

b. "wgtp"

c. <u>**"" "15"**</u>

d. "wgtp" "15"

</div>


<br/>

---

<br />

## Question 2

Load the Gross Domestic Product data for the 190 ranked countries in this data set:

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FGDP.csv

Remove the commas from the GDP numbers in millions of dollars and average them. What is the average?

Original data sources:

http://data.worldbank.org/data-catalog/GDP-ranking-table 

<br />

### Answer

<br/>
Downloading file...
```{r}
Q2_Url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FGDP.csv"
Q2_Path <- "/home/cabunic/Data Science/Coursera/3 - Getting and Cleaning Data/Week 4/Q3GDP.csv"
download.file(Q2_Url, Q2_Path, method = "curl")
```

Loading and tidying data...
```{r}
Q2_File <- read.csv(Q2_Path, nrows = 190, skip = 4)
Q2_File <- Q2_File[,c(1, 2, 4, 5)]
colnames(Q2_File) <- c("CountryCode", "Rank", "Country", "Total")
Q2_File
```
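
The `skip` and `nrows` arguments are what isolate the ranked rows. The sketch below reproduces the idea on an in-memory toy file. The contents of `raw_lines` are invented and only imitate a CSV with title rows above the data and a footnote below. Because `read.csv()` treats the first line it reads as the header, `skip` is set so that the line just before the data is consumed as a (blank) header.

```{r}
# Invented file contents: three title rows, a blank separator row,
# two data rows, and a trailing footnote row.
raw_lines <- c(
  ",Title row,,,",
  ",,,,",
  ",Ranking,,Economy,US dollars)",
  ",,,,",
  "AAA,1,,Alpha,\" 1,000 \"",
  "BBB,2,,Beta,\" 2,500 \"",
  "Footnote row,,,,"
)

# Skip the three title rows; the blank row becomes the header,
# and nrows = 2 stops reading before the footnote.
toy <- read.csv(text = paste(raw_lines, collapse = "\n"), skip = 3, nrows = 2)

# Keep the informative columns and give them readable names.
toy <- toy[, c(1, 2, 4, 5)]
colnames(toy) <- c("CountryCode", "Rank", "Country", "Total")
toy
```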
```{r}
# Drop the thousands separators, convert to integer, then average (ignoring NAs).
Q2_File$Total <- as.integer(gsub(",", "", Q2_File$Total))
mean(Q2_File$Total, na.rm = TRUE)
```
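
The comma-stripping step can be checked in isolation. This is a small sketch on invented values (the vector `raw_totals` is an assumption, styled after a column with padded numbers and placeholder entries), not the real GDP figures.

```{r}
# Invented sample values: padded with spaces, comma-separated thousands,
# plus two placeholder entries that are not numbers.
raw_totals <- c(" 1,500 ", " 2,500 ", "..", "")

# Strip the commas, then convert; entries that are not numbers become NA.
clean_totals <- suppressWarnings(as.numeric(gsub(",", "", raw_totals)))

# na.rm = TRUE drops the NA entries before averaging.
mean(clean_totals, na.rm = TRUE)
```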


<br/>
<div style= "border: 5px dotted gray; padding: 10px 20px; background-color:#e8e8e8; box-shadow: 0 1px 5px rgba(0, 0, 0, 0.25);">
**Options:**

a. <u>**377652.4**</u>

b. 381668.9

c. 387854.4

d. 293700.3

</div>

<br/>

---

<br />

## Question 3

In the data set from Question 2 what is a regular expression that would allow you to count the number of countries whose name begins with *"United"*? Assume that the variable with the country names in it is named *countryNames*. How many countries begin with United?

<br />

### Answer

<br/>
Fixing country names:
```{r}
Q2_File$Country <- as.character(Q2_File$Country)
Q2_File$Country[99] <- "Côte d’Ivoire"
Q2_File$Country[186] <- "São Tomé and Príncipe"
```

Generating solution...
```{r}
Q2_File$Country[grep("^United", Q2_File$Country)]
```
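
The difference between the answer options comes down to the anchors `^` (start of string) and `$` (end of string). A short sketch on made-up names, where `toy_names` is a hypothetical vector and not the quiz data:

```{r}
# Made-up names; only the position of "United" matters here.
toy_names <- c("United States", "United Kingdom",
               "Tanzania, United Republic", "Nations United")

starts_with <- grep("^United", toy_names, value = TRUE)  # anchored at the start
ends_with   <- grep("United$", toy_names, value = TRUE)  # anchored at the end
anywhere    <- grep("United",  toy_names, value = TRUE)  # no anchor

length(starts_with)
length(ends_with)
length(anywhere)
```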

<br/>

<div style= "border: 5px dotted gray; padding: 10px 20px; background-color:#e8e8e8; box-shadow: 0 1px 5px rgba(0, 0, 0, 0.25);">
**Options:**

a. grep("*United",countryNames), 2

b. grep("^United",countryNames), 4

c. <u>**grep("^United",countryNames), 3**</u>

d. grep("United$",countryNames), 3

</div>

<br/>

---

<br />

## Question 4

Load the Gross Domestic Product data for the 190 ranked countries in this data set:

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FGDP.csv

Load the educational data from this data set:

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FEDSTATS_Country.csv

Match the data based on the country shortcode.
Of the countries for which the end of the fiscal year is available, how many end in June?

Original data sources:

http://data.worldbank.org/data-catalog/GDP-ranking-table

http://data.worldbank.org/data-catalog/ed-stats

<br />

### Answer

<br/>
Loading packages...
```{r}
library(data.table)
```

<br/>
Download file...
```{r}
Q4GDP_Url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FGDP.csv"
Q4GDP_Path <- "/home/cabunic/Data Science/Coursera/3 - Getting and Cleaning Data/Week 3/Q3GDP.csv"
Q4Edu_Url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FEDSTATS_Country.csv"
Q4Edu_Path <- "/home/cabunic/Data Science/Coursera/3 - Getting and Cleaning Data/Week 3/Q3Edu.csv"

download.file(Q4GDP_Url, Q4GDP_Path, method = "curl")
download.file(Q4Edu_Url, Q4Edu_Path, method = "curl")
```
<br/>
Merging the data...
```{r}
Q4GDP <- fread(Q4GDP_Path, skip = 5, nrows = 190,
               select = c(1, 2, 4, 5),
               col.names = c("CountryCode", "Rank", "Economy", "Total"))
Q4Edu <- fread(Q4Edu_Path)

Q4_Merge <- merge(Q4GDP, Q4Edu, by = 'CountryCode')
Q4_Merge
```
<br/>
Computing solution...
```{r}
FiscalJune <- grep("Fiscal year end: June", Q4_Merge$`Special Notes`)
NROW(FiscalJune)
```
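
The merge-then-count pattern can be tried on two tiny tables. This sketch uses base R's `merge()` on plain data frames to stay dependency-free, and every value in `toy_gdp` and `toy_edu` (including the `Notes` column name) is invented for illustration.

```{r}
# Two invented tables sharing a CountryCode key.
toy_gdp <- data.frame(CountryCode = c("AAA", "BBB", "CCC"), Rank = 1:3)
toy_edu <- data.frame(
  CountryCode = c("AAA", "BBB", "DDD"),
  Notes = c("Fiscal year end: June 30; reporting period: FY.",
            "Fiscal year end: March 31.",
            "Fiscal year end: June 30."),
  stringsAsFactors = FALSE
)

# merge() keeps only codes present in both tables by default (inner join).
toy_merge <- merge(toy_gdp, toy_edu, by = "CountryCode")

# grepl() gives one TRUE/FALSE per row, so sum() counts the matches.
sum(grepl("Fiscal year end: June", toy_merge$Notes))
```

Only `AAA` and `BBB` survive the merge, so the `DDD` row never reaches the count even though its note mentions June.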

<br/>

<div style= "border: 5px dotted gray; padding: 10px 20px; background-color:#e8e8e8; box-shadow: 0 1px 5px rgba(0, 0, 0, 0.25);">
**Options:**

a. <u>**13**</u>

b. 7

c. 16

d. 8

</div>

<br/>

---

<br />

## Question 5

You can use the quantmod (http://www.quantmod.com/) package to get historical stock prices for publicly traded companies on the NASDAQ and NYSE. Use the following code to download data on Amazon's stock price and get the times the data was sampled.

```{r}
library(quantmod)
amzn = getSymbols("AMZN", auto.assign=FALSE)
sampleTimes = index(amzn)
```
<br/>
How many values were collected in 2012?
How many values were collected on Mondays in 2012?

<br />

### Answer

<br/>
Loading package...
```{r}
library(quantmod)
amzn = getSymbols("AMZN", auto.assign=FALSE)
sampleTimes = index(amzn)
```

<br/>
How many values were collected in 2012?
```{r}
amzn2012 <- sampleTimes[grep("^2012", sampleTimes)]
NROW(amzn2012)
```

<br/>
How many values were collected on Mondays in 2012?
```{r}
# weekdays() returns locale-dependent names; this comparison assumes an English locale.
NROW(amzn2012[weekdays(amzn2012) == "Monday"])
```
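
As an alternative that does not depend on the session locale, the same filtering can be sketched with `format()`. The dates in `toy_dates` are plain calendar days made up for illustration (not trading days), so the counts differ from the quiz answer.

```{r}
# Plain calendar days (not trading days), spanning a year boundary.
toy_dates <- seq(as.Date("2011-12-26"), as.Date("2012-01-31"), by = "day")

# format() extracts the year as text, which avoids pattern matching on dates.
toy_2012 <- toy_dates[format(toy_dates, "%Y") == "2012"]

# "%u" gives the weekday as a number (1 = Monday), independent of locale.
toy_mondays <- toy_2012[format(toy_2012, "%u") == "1"]

length(toy_2012)
length(toy_mondays)
```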

<br/>

<div style= "border: 5px dotted gray; padding: 10px 20px; background-color:#e8e8e8; box-shadow: 0 1px 5px rgba(0, 0, 0, 0.25);">
**Options:**

a. 252, 50

b. 250, 51

c. 251, 47

d. <u>**250, 47**</u>

</div>

<br/>

---

<center>**END**</center>

---