Question 1

The American Community Survey distributes downloadable data about United States communities. Download the 2006 microdata survey about housing for the state of Idaho using download.file() from here:

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv

and load the data into R. The code book, describing the variable names, is here:

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FPUMSDataDict06.pdf

Create a logical vector that identifies the households on greater than 10 acres who sold more than $10,000 worth of agriculture products. Assign that logical vector to the variable agricultureLogical. Apply the which() function like this to identify the rows of the data frame where the logical vector is TRUE.

which(agricultureLogical)

What are the first 3 values that result?


Answer


Reading the file straight from the URL…

Q1Url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
Q1 <- read.csv(Q1Url)
Q1


Computing solution…

agricultureLogical <- Q1$ACR == 3 & Q1$AGS == 6
which(agricultureLogical)
 [1]  125  238  262  470  555  568  608  643  787  808  824  849  952  955 1033 1265 1275 1315 1388 1607
[21] 1629 1651 1856 1919 2101 2194 2403 2443 2539 2580 2655 2680 2740 2838 2965 3131 3133 3163 3291 3370
[41] 3402 3585 3652 3852 3862 3912 4023 4045 4107 4113 4117 4185 4198 4310 4343 4354 4448 4453 4461 4718
[61] 4817 4835 4910 5140 5199 5236 5326 5417 5531 5574 5894 6033 6044 6089 6275 6376 6420


Options:

  1. 125, 238, 262

  2. 403, 756, 798

  3. 236, 238, 262

  4. 59, 460, 474



Question 2

Using the jpeg package, read the following picture of your instructor into R:

https://d396qusza40orc.cloudfront.net/getdata%2Fjeff.jpg

Use the parameter native=TRUE. What are the 30th and 80th quantiles of the resulting data? (some Linux systems may produce an answer 638 different for the 30th quantile)


Answer


Loading package…

library(jpeg)


Downloading file…

Q2Url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fjeff.jpg"
Q2Path <- '/home/cabunic/Data Science/Coursera/3 - Getting and Cleaning Data/Week 3/Q2.jpg'
download.file(Q2Url, Q2Path, mode = 'wb')
Q2 <- readJPEG(Q2Path, native = TRUE)


Computing solution…

quantile(Q2, probs = c(0.3, 0.8))
      30%       80% 
-15258512 -10575416 


‘some Linux systems may produce an answer 638 different for the 30th quantile.’

My system is Linux, so I subtract 638 from the 30th quantile:

paste(quantile(Q2, probs = 0.3) - 638, quantile(Q2, probs = 0.8))
[1] "-15259150 -10575416"

Options:

  1. -15259150 -10575416

  2. -10904118 -10575416

  3. 10904118 -594524

  4. -16776430 -15390165




Question 3

Load the Gross Domestic Product data for the 190 ranked countries in this data set:

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FGDP.csv

Load the educational data from this data set:

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FEDSTATS_Country.csv

Match the data based on the country shortcode. How many of the IDs match?

Sort the data frame in descending order by GDP rank (so United States is last). What is the 13th country in the resulting data frame?

Original data sources:

http://data.worldbank.org/data-catalog/GDP-ranking-table

http://data.worldbank.org/data-catalog/ed-stats


Answer


Loading packages…

library(dplyr)
library(data.table)


Download file…

Q3GDP_Url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FGDP.csv"
Q3GDP_Path <- "/home/cabunic/Data Science/Coursera/3 - Getting and Cleaning Data/Week 3/Q3GDP.csv"
Q3Edu_Url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FEDSTATS_Country.csv"
Q3Edu_Path <- "/home/cabunic/Data Science/Coursera/3 - Getting and Cleaning Data/Week 3/Q3Edu.csv"

download.file(Q3GDP_Url, Q3GDP_Path, method = "curl")
download.file(Q3Edu_Url, Q3Edu_Path, method = "curl")


Analyze the data…

Q3GDP <- fread(Q3GDP_Path, skip = 5, nrows = 190, select = c(1, 2, 4, 5), col.names = c("CountryCode", "Rank", "Economy", "Total"))
Q3Edu <- fread(Q3Edu_Path)
Q3GDP
Q3Edu


Merging and sorting data…

Q3_Merge <- merge(Q3GDP, Q3Edu, by = 'CountryCode')
Q3_Merge <- Q3_Merge %>% arrange(desc(Rank))
Q3_Merge


Generating solution…

paste(nrow(Q3_Merge), "matches, 13th country is", Q3_Merge$Economy[13])
[1] "189 matches, 13th country is St. Kitts and Nevis"


Options:

  1. 189 matches, 13th country is Spain

  2. 234 matches, 13th country is St. Kitts and Nevis

  3. 190 matches, 13th country is St. Kitts and Nevis

  4. 190 matches, 13th country is Spain

  5. 189 matches, 13th country is St. Kitts and Nevis

  6. 234 matches, 13th country is Spain




Question 4

What is the average GDP ranking for the “High income: OECD” and “High income: nonOECD” group?


Answer


Computing solution…

# Keep the two high-income groups, then average the GDP rank within each
Q3_Merge %>%
    filter(`Income Group` %in% c("High income: OECD", "High income: nonOECD")) %>%
    group_by(`Income Group`) %>%
    summarize(Average = mean(Rank, na.rm = TRUE)) %>%
    arrange(Average)


Options:

  1. 32.96667, 91.91304

  2. 133.72973, 32.96667

  3. 23, 45

  4. 30, 37

  5. 23.966667, 30.91304

  6. 23, 30




Question 5

Cut the GDP ranking into 5 separate quantile groups. Make a table versus Income.Group. How many countries are Lower middle income but among the 38 nations with highest GDP?


Answer


Computing solution…

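# cut(breaks = 5) makes five equal-width bins; GDP ranks are spread almost evenly
# over 1 to 190, so the first bin still holds the 38 top-ranked countries
Q3_Merge$RankGroups <- cut(Q3_Merge$Rank, breaks = 5)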
vs <- table(Q3_Merge$RankGroups, Q3_Merge$`Income Group`)
vs
              
               High income: nonOECD High income: OECD Lower middle income Low income Upper middle income
  (0.811,38.8]                    4                18                   5          0                  11
  (38.8,76.6]                     5                10                  13          1                   9
  (76.6,114]                      8                 1                  12          9                   8
  (114,152]                       4                 1                   8         16                   8
  (152,190]                       2                 0                  16         11                   9
vs[1, "Lower middle income"]
[1] 5


Options:

  1. 0

  2. 18

  3. 3

  4. 5



END

---
title: "Quiz 3"
output: html_notebook
---

<br />

---

## Question 1

The American Community Survey distributes downloadable data about United States communities. Download the 2006 microdata survey about housing for the state of Idaho using **download.file()** from here:

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv

and load the data into R. The code book, describing the variable names, is here:

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FPUMSDataDict06.pdf

Create a logical vector that identifies the households on greater than 10 acres who sold more than $10,000 worth of agriculture products. Assign that logical vector to the variable *agricultureLogical.* Apply the **which()** function like this to identify the rows of the data frame where the logical vector is *TRUE*.

*which(agricultureLogical)*

What are the first 3 values that result?

<br />

### Answer

<br/>
Reading the file straight from the URL...
```{r}
Q1Url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
Q1 <- read.csv(Q1Url)
Q1
```
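
The question asks for **download.file()**, whereas the chunk above lets `read.csv()` fetch the URL itself. A minimal sketch of the two-step version could look like this; `fetch_csv` and its `dest` argument are hypothetical names introduced here, and a temporary file stands in for a fixed local path.

```r
# Hypothetical helper (not part of the original notebook): download a CSV to a
# local file first, then read it, as the question text suggests.
fetch_csv <- function(url, dest = tempfile(fileext = ".csv")) {
    # mode = "wb" keeps the bytes unchanged on every platform
    download.file(url, destfile = dest, mode = "wb", quiet = TRUE)
    read.csv(dest)
}

# Usage with the quiz URL (needs network access, so it is left commented out):
# Q1 <- fetch_csv("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv")
```

Writing to `tempfile()` avoids depending on a directory that exists only on one machine.
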
<br/>
Computing solution...
```{r}
agricultureLogical <- Q1$ACR == 3 & Q1$AGS == 6
which(agricultureLogical)
```
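
Survey fields such as ACR and AGS contain missing values, so the `&` comparison returns `NA` for some rows. The sketch below uses a made-up five-row data frame (`toy` is not the survey data) to show that **which()** reports only the rows that are definitely `TRUE`.

```r
# Made-up stand-in for the survey columns; NA marks a missing answer.
toy <- data.frame(ACR = c(3, 1, NA, 3, 3),
                  AGS = c(6, 6, 6, NA, 6))

# Same condition as in the chunk above, applied to the toy data.
toyLogical <- toy$ACR == 3 & toy$AGS == 6
toyLogical         # TRUE FALSE NA NA TRUE

# which() drops both FALSE and NA and returns the row numbers of the TRUEs.
which(toyLogical)  # 1 5
```
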
<br/>

<div style= "border: 5px dotted gray; padding: 10px 20px; background-color:#e8e8e8; box-shadow: 0 1px 5px rgba(0, 0, 0, 0.25);">
**Options:**

a. <u>**125, 238, 262**</u>

b. 403, 756, 798

c. 236, 238, 262

d. 59, 460, 474

</div>


<br/>

---

<br />

## Question 2

Using the jpeg package, read the following picture of your instructor into R:

https://d396qusza40orc.cloudfront.net/getdata%2Fjeff.jpg

Use the parameter *native=TRUE*. What are the 30th and 80th quantiles of the resulting data? (some Linux systems may produce an answer 638 different for the 30th quantile)

<br />

### Answer

<br/>
Loading package...
```{r}
library(jpeg)
```
<br/>
Downloading file...
```{r}
Q2Url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fjeff.jpg"
Q2Path <- '/home/cabunic/Data Science/Coursera/3 - Getting and Cleaning Data/Week 3/Q2.jpg'
download.file(Q2Url, Q2Path, mode = 'wb')
Q2 <- readJPEG(Q2Path, native = TRUE)
```
<br/>
Computing solution...
```{r}
quantile(Q2, probs = c(0.3, 0.8))
```
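
The quantiles are large negative numbers because, with *native=TRUE*, each pixel is stored as one packed 32-bit integer rather than as separate colour channels. The sketch below assumes the usual layout of R native rasters (red in the lowest byte, then green, blue and alpha); `unpack_pixel` is a hypothetical helper, not a function of the jpeg package.

```r
# Hypothetical helper: split one packed native-raster value into its channels,
# assuming the layout R | G << 8 | B << 16 | A << 24.
unpack_pixel <- function(x) {
    # Reinterpret the signed 32-bit value as an unsigned number
    u <- if (x < 0) x + 2^32 else x
    c(red   = u %% 256,
      green = (u %/% 256) %% 256,
      blue  = (u %/% 65536) %% 256,
      alpha = (u %/% 16777216) %% 256)
}

# Decode the 30% value obtained before the Linux adjustment
unpack_pixel(-15258512)
```

Under that assumption the value decodes to red 112, green 44, blue 23 and alpha 255; the fully opaque alpha byte is what makes the signed integer negative. Because `quantile()` interpolates by default, a quantile need not correspond to an actual pixel.
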
<br/>
*'some Linux systems may produce an answer 638 different for the 30th quantile.'*

My system is Linux, so I subtract 638 from the 30th quantile:
```{r}
paste(quantile(Q2, probs = 0.3) - 638, quantile(Q2, probs = 0.8))
```
<br/>
<div style= "border: 5px dotted gray; padding: 10px 20px; background-color:#e8e8e8; box-shadow: 0 1px 5px rgba(0, 0, 0, 0.25);">
**Options:**

a. <u>**-15259150 -10575416**</u>

b. -10904118 -10575416

c. 10904118 -594524

d. -16776430 -15390165

</div>

<br/>

---

<br />

## Question 3

Load the Gross Domestic Product data for the 190 ranked countries in this data set:

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FGDP.csv

Load the educational data from this data set:

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FEDSTATS_Country.csv

Match the data based on the country shortcode. How many of the IDs match?

Sort the data frame in descending order by GDP rank (so United States is last). What is the 13th country in the resulting data frame?

Original data sources:

http://data.worldbank.org/data-catalog/GDP-ranking-table

http://data.worldbank.org/data-catalog/ed-stats

<br />

### Answer

<br/>
Loading packages...
```{r}
library(dplyr)
library(data.table)
```

<br/>
Download file...
```{r}
Q3GDP_Url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FGDP.csv"
Q3GDP_Path <- "/home/cabunic/Data Science/Coursera/3 - Getting and Cleaning Data/Week 3/Q3GDP.csv"
Q3Edu_Url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FEDSTATS_Country.csv"
Q3Edu_Path <- "/home/cabunic/Data Science/Coursera/3 - Getting and Cleaning Data/Week 3/Q3Edu.csv"

download.file(Q3GDP_Url, Q3GDP_Path, method = "curl")
download.file(Q3Edu_Url, Q3Edu_Path, method = "curl")
```
<br/>
Analyze the data...
```{r}
Q3GDP <- fread(Q3GDP_Path, skip = 5, nrows = 190, select = c(1, 2, 4, 5), col.names = c("CountryCode", "Rank", "Economy", "Total"))
Q3Edu <- fread(Q3Edu_Path)

Q3GDP
Q3Edu
```
<br/>
Merging and sorting data...
```{r}
Q3_Merge <- merge(Q3GDP, Q3Edu, by = 'CountryCode')
Q3_Merge <- Q3_Merge %>% arrange(desc(Rank))
Q3_Merge
```
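
By default `merge()` keeps only the codes found in both tables, which is why the row count of the result answers "how many IDs match". A small base-R sketch with made-up tables (`gdp`, `edu` and their values are hypothetical) shows the same two steps, using `order()` in place of `arrange()`.

```r
# Toy stand-ins for the GDP and education tables (values are made up).
gdp <- data.frame(CountryCode = c("AAA", "BBB", "CCC"),
                  Rank = c(1, 2, 3))
edu <- data.frame(CountryCode = c("BBB", "CCC", "DDD"),
                  IncomeGroup = c("Low income", "High income: OECD", "Low income"))

# merge() performs an inner join by default: AAA and DDD have no partner.
toyMerge <- merge(gdp, edu, by = "CountryCode")

# Sort by rank in descending order, so the best-ranked country comes last.
toyMerge <- toyMerge[order(toyMerge$Rank, decreasing = TRUE), ]

nrow(toyMerge)           # number of matching codes
toyMerge$CountryCode[1]  # first country after the descending sort
```
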
<br/>
Generating solution...
```{r}
paste(nrow(Q3_Merge), "matches, 13th country is", Q3_Merge$Economy[13])
```

<br/>

<div style= "border: 5px dotted gray; padding: 10px 20px; background-color:#e8e8e8; box-shadow: 0 1px 5px rgba(0, 0, 0, 0.25);">
**Options:**

a. 189 matches, 13th country is Spain

b. 234 matches, 13th country is St. Kitts and Nevis

c. 190 matches, 13th country is St. Kitts and Nevis

d. 190 matches, 13th country is Spain

e. <u>**189 matches, 13th country is St. Kitts and Nevis**</u>

f. 234 matches, 13th country is Spain

</div>

<br/>

---

<br />

## Question 4

What is the average GDP ranking for the "High income: OECD" and "High income: nonOECD" group? 

<br />

### Answer

<br/>
Computing solution...
```{r}
# Keep the two high-income groups, then average the GDP rank within each
Q3_Merge %>%
    filter(`Income Group` %in% c("High income: OECD", "High income: nonOECD")) %>%
    group_by(`Income Group`) %>%
    summarize(Average = mean(Rank, na.rm = TRUE)) %>%
    arrange(Average)
```
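
As a cross-check, the same grouped mean can be computed in base R with `tapply()`. The sketch below runs on a made-up table (`toy` and its ranks are hypothetical) and returns the two groups in the order the question names them.

```r
# Made-up data: two OECD rows, one nonOECD row, one row from another group.
toy <- data.frame(
    IncomeGroup = c("High income: OECD", "High income: OECD",
                    "High income: nonOECD", "Low income"),
    Rank = c(10, 20, 40, NA))

wanted <- c("High income: OECD", "High income: nonOECD")

# Filter first, then average Rank within each remaining group.
sub <- toy[toy$IncomeGroup %in% wanted, ]
toyMeans <- tapply(sub$Rank, sub$IncomeGroup, mean, na.rm = TRUE)

# Index by name so the output order does not depend on sorting rules.
toyMeans[wanted]
```

Indexing by name keeps the result independent of how group labels happen to sort in a given locale.
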


<br/>

<div style= "border: 5px dotted gray; padding: 10px 20px; background-color:#e8e8e8; box-shadow: 0 1px 5px rgba(0, 0, 0, 0.25);">
**Options:**

a. <u>**32.96667, 91.91304**</u>

b. 133.72973, 32.96667

c. 23, 45

d. 30, 37

e. 23.966667, 30.91304

f. 23, 30

</div>

<br/>

---

<br />

## Question 5

Cut the GDP ranking into 5 separate quantile groups. Make a table versus Income.Group. How many countries are Lower middle income but among the 38 nations with highest GDP?

<br />

### Answer

<br/>
Computing solution...
```{r}
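# cut(breaks = 5) makes five equal-width bins; GDP ranks are spread almost evenly
# over 1 to 190, so the first bin still holds the 38 top-ranked countries
Q3_Merge$RankGroups <- cut(Q3_Merge$Rank, breaks = 5)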
vs <- table(Q3_Merge$RankGroups, Q3_Merge$`Income Group`)
vs
```
```{r}
vs[1, "Lower middle income"]
```
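
The question asks for quantile groups, while `cut(x, breaks = 5)` builds equal-width bins. The sketch below compares the two on an idealised rank vector `1:190` (an assumption; the merged data has 189 rows) to illustrate why both give the same grouping when ranks are evenly spread.

```r
# Idealised ranks: every value from 1 to 190 exactly once (an assumption).
ranks <- 1:190

# Equal-width bins, as in the chunk above.
equalWidth <- cut(ranks, breaks = 5)

# Quantile-based bins: break points at the 0%, 20%, ..., 100% quantiles.
quintiles <- cut(ranks,
                 breaks = quantile(ranks, probs = seq(0, 1, 0.2)),
                 include.lowest = TRUE)

table(equalWidth)
table(quintiles)

# Both schemes put each rank into the same group number.
all(as.integer(equalWidth) == as.integer(quintiles))
```

With skewed values the two schemes would differ, so the quantile-based breaks are the closer match to the wording of the question.
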
<br/>

<div style= "border: 5px dotted gray; padding: 10px 20px; background-color:#e8e8e8; box-shadow: 0 1px 5px rgba(0, 0, 0, 0.25);">
**Options:**

a. 0

b. 18

c. 3

d. <u>**5**</u>

</div>

<br/>

---

<center>**END**</center>

---