Main topic: mapping data onto administrative boundaries

Sys.setlocale('LC_ALL','C')
[1] "C"
library(dplyr)
library(ggplot2)
library(maps)
library(ggmap)
library(caTools)


1. Drawing a Map of the US

1.1

If you look at the structure of the statesMap data frame using the str function, you should see that there are 6 variables. One of the variables, group, defines the different shapes or polygons on the map. Sometimes a state may have multiple groups, for example, if it includes islands. How many different groups are there?

# 1.1
# Map of the US states
statesMap = map_data('state')
table(statesMap$group)

   1    2    3    4    5    6    7    8    9   10   11   12   13   14   15 
 202  149  312  516   79   91   94   10  872  381  233  329  257  256  113 
  16   17   18   19   20   21   22   23   24   25   26   27   28   29   30 
 397  650  399  566   36  220   30  460  370  373  382  315  238  208   70 
  31   32   33   34   35   36   37   38   39   40   41   42   43   44   45 
 125  205   78   16  290   21  168   37  733   12  105  238  284  236  172 
  46   47   48   49   50   51   52   53   54   55   56   57   58   59   60 
  66  304  166  289 1088   59  129   96   15  623   17   17   19   44  448 
  61   62   63 
 373  388   68 
table(statesMap$group) %>% length
[1] 63
1.2

You can draw a map of the United States by typing the following in your R console:

ggplot(statesMap, aes(x=long, y=lat, group=group)) + 
  geom_polygon(fill="white", color="black")

We specified two colors in geom_polygon – fill and color. Which one defined the color of the outline of the states?

  • color
  • color='black' draws the state outlines in black; fill='white' sets the color inside each state


2 Coloring the States by Predictions

2.1 Predictive Model

Now, let’s color the map of the US according to our 2012 US presidential election predictions from the Unit 3 Recitation. We’ll rebuild the model here, using the dataset PollingImputed.csv. Be sure to use this file so that you don’t have to redo the imputation to fill in the missing values, like we did in the Unit 3 Recitation.

Load the data using the read.csv function, and call it “polling”. Then split the data using the subset function into a training set called “Train” that has observations from 2004 and 2008, and a testing set called “Test” that has observations from 2012.

Note that we only have 45 states in our testing set, since we are missing observations for Alaska, Delaware, Alabama, Wyoming, and Vermont, so these states will not appear colored in our map.

Then, create a logistic regression model and make predictions on the test set using the following commands:

polling = read.csv('data/PollingImputed.csv')
# 2004 and 2008 are the training data; 2012 is the test data
trn = subset(polling, Year != 2012)
tst = subset(polling, Year == 2012)
# Republican: 1 if the state was won by the Republican party, 0 otherwise
mod2 = glm(Republican~SurveyUSA+DiffCount, trn, family=binomial)
pred = predict(mod2,tst,type='response')
repub = as.numeric(pred > 0.5)
df = data.frame(pred, repub, state=tst$State)
head(df)

For how many states is our binary prediction 1 (for 2012), corresponding to Republican?

# repub: states with a predicted probability > 0.5 are classified as Republican
sum(repub)
[1] 22

What is the average predicted probability of our model (on the Test set, for 2012)?

mean(pred) 
[1] 0.4852626
2.2 Merge Data into Map

Now, we need to merge the predictions (“predictionDataFrame”, df in this notebook) with the map data “statesMap”, like we did in lecture. Before doing so, we need to convert the state names (Test$State, stored as df$state) to lowercase, so that they match the region variable in statesMap. Do this by typing the following in your R console:

df$region = tolower(df$state)
pmap = merge(statesMap, df, by='region') 

How many observations are there in predictionMap (pmap in this notebook)?

nrow(pmap)       # 15034
[1] 15034

How many observations are there in statesMap?

nrow(statesMap)  # 15537
[1] 15537
2.3 The Rule of merge()

When we merged the data in the previous problem, it caused the number of observations to change. Why? Check out the help page for merge by typing ?merge to help you answer this question.

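  • Because we only make predictions for 45 states, we no longer have observations for some of the states. By default (all = FALSE) merge() keeps only rows whose key appears in both data frames, so the map rows for states without a prediction were removed in the merging process.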
2.4 Plot the color map

Now we are ready to color the US map with our predictions! You can color the states according to our binary predictions by typing the following in your R console:

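# merge() sorts the rows by region, so restore each polygon's drawing order first
pmap = pmap[order(pmap$group, pmap$order), ]
# fill = repub: color each state by whether it is predicted Republican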
ggplot(pmap, aes(x=long, y=lat, group=group, fill=repub)) +
  geom_polygon(color='black')

The states appear light blue and dark blue in this map. Which color represents a Republican prediction?

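  • Light blue
  • repub = 1 means a predicted Republican win. These states are light blue because ggplot2’s default continuous fill scale runs from dark blue (low values) to light blue (high values).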
2.5

We see that the legend displays a blue gradient for outcomes between 0 and 1. However, when plotting the binary predictions there are only two possible outcomes: 0 or 1. Let’s replot the map with discrete outcomes. We can also change the color scheme to blue and red, to match the blue color associated with the Democratic Party in the US and the red color associated with the Republican Party in the US. This is done by adding a blue-to-red scale_fill_gradient() with breaks at 0 and 1 and the labels “Democrat” and “Republican”, the same grad scale defined in section 4.

Alternatively, we could plot the predicted probabilities instead of the binary predictions. Change the plot command to color the states by the variable TestPrediction (pred in this notebook) instead.

# Color the states by predicted probability instead of the 0/1 prediction
ggplot(pmap, aes(x=long, y=lat, group=group, fill=pred)) +
  geom_polygon(color='black') +
  scale_fill_gradient(
    low="blue", high="red", 
    guide="legend", breaks= c(0,1), 
    labels=c("Democrat", "Republican"), name="Prediction 2012")

You should see a gradient of colors ranging from red to blue. Do the colors of the states in the map for TestPrediction look different from the colors of the states in the map with TestPredictionBinary? Why or why not?

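  • TestPredictionBinary (repub here) takes only two values, 0 and 1, so the map uses only two colors.

  • TestPrediction (pred here) colors the states by predicted probability, which varies continuously between 0 and 1, so more shades can appear.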



3. Understanding the Predictions

3.1

In the 2012 election, the state of Florida ended up being a very close race. It was ultimately won by the Democratic party.

df$pred[ df$state == 'Florida'] # 0.96404
[1] 0.9640395

Did we predict this state correctly or incorrectly?

  • We incorrectly predicted this state by predicting that it would be won by the Republican party.
  • Florida was ultimately won by the Democratic party in 2012, while our model gave a Republican win a probability of about 0.96.
3.2

What was our predicted probability for the state of Florida?

df$pred[ df$state == 'Florida'] # 0.96404
[1] 0.9640395

What does this imply?

  • Our prediction model did not do a very good job of correctly predicting the state of Florida, and we were very confident in our incorrect prediction.


4. Parameter Settings

In this part, we’ll explore what the different parameter settings of geom_polygon do. Throughout the problem, use the help page for geom_polygon, which can be accessed by ?geom_polygon. To see more information about a certain parameter, just type a question mark and then the parameter name to get the help page for that parameter. Experiment with different parameter settings to try and replicate the plots!

We’ll be asking questions about the following three plots:

grad = scale_fill_gradient(
  low="blue", high="red", 
  guide="legend", breaks= c(0,1), 
  labels=c("Democrat", "Republican"), name="Prediction 2012")
ggplot(pmap, aes(x=long, y=lat, group=group, fill=repub)) + grad +
  geom_polygon(color='black',linetype=3,size=1) + ggtitle("Plot(1)")

ggplot(pmap, aes(x=long, y=lat, group=group, fill=repub)) + grad +
  geom_polygon(color='black',linetype=1,size=3) + ggtitle("Plot(2)")

ggplot(pmap, aes(x=long, y=lat, group=group, fill=repub)) + grad +
  geom_polygon(color='black',linetype=1,size=1,alpha=0.3) + ggtitle("Plot(3)")

4.1

Plots (1) and (2) were created by changing different parameters of geom_polygon from their default values.

What is the name of the parameter we changed to create plot (1)?

  • linetype
  • linetype=3 draws the outlines as dotted lines (the default, 1, is solid)

What is the name of the parameter we changed to create plot (2)?

  • size (the outline width; ggplot2 3.4.0 and later use linewidth for this instead)
4.2

Plot (3) was created by changing the value of a different geom_polygon parameter to have value 0.3. Which parameter did we use?

  • alpha







---
title: "AS7-1 US Presidential Election Maps"
author: "劉育銘, M064020025, 2018/07/25"
output: html_notebook
---

<br>

**Main topic: mapping data onto administrative boundaries**


```{r echo=TRUE, message=FALSE, cache=FALSE, warning=FALSE}
Sys.setlocale('LC_ALL','C')
library(dplyr)
library(ggplot2)
library(maps)
library(ggmap)
library(caTools)
```

<br><hr>

### 1. Drawing a Map of the US

##### 1.1 
If you look at the structure of the statesMap data frame using the str function, you should see that there are 6 variables. One of the variables, group, defines the different shapes or polygons on the map. Sometimes a state may have multiple groups, for example, if it includes islands. _How many different groups are there?_

```{r}
# 1.1

# Map of the US states
statesMap = map_data('state')
table(statesMap$group)
table(statesMap$group) %>% length
```
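Counting distinct values does not require building the full frequency table first. A minimal base-R sketch on a made-up polygon table (`toy_map` is hypothetical, standing in for `statesMap`) shows that `length(table(x))` and `length(unique(x))` agree, and how to count polygons per region, which is where multi-group states such as those with islands show up:

```{r}
# Hypothetical stand-in for statesMap: region "a" is drawn as two polygons
# (say, a mainland and an island), region "b" as one polygon
toy_map = data.frame(
  long   = c(0, 1, 1, 2, 3, 3, 5, 6, 6),
  lat    = c(0, 0, 1, 0, 0, 1, 0, 0, 1),
  group  = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  region = c("a", "a", "a", "a", "a", "a", "b", "b", "b")
)

# Two equivalent ways to count distinct polygons
n_table  = length(table(toy_map$group))
n_unique = length(unique(toy_map$group))
c(n_table, n_unique)

# Number of distinct groups within each region
groups_per_region = tapply(toy_map$group, toy_map$region,
                           function(g) length(unique(g)))
groups_per_region
```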

##### 1.2
You can draw a map of the United States by typing the following in your R console:
```{r}
ggplot(statesMap, aes(x=long, y=lat, group=group)) + 
  geom_polygon(fill="white", color="black")
```
We specified two colors in geom_polygon -- `fill` and `color`. _Which one defined the color of the outline of the states?_

+ color
+ `color='black'` draws the state outlines in black; `fill='white'` sets the color inside each state
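One way to confirm which argument controls what is to inspect the aesthetics ggplot2 actually computes for a layer. A small sketch, assuming ggplot2's `layer_data()` and a hypothetical unit square (`toy_square`) in place of a state:

```{r}
library(ggplot2)

# Hypothetical unit square standing in for one state outline
toy_square = data.frame(long = c(0, 1, 1, 0), lat = c(0, 0, 1, 1), group = 1)

toy_plot = ggplot(toy_square, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "white", color = "black")

# layer_data() returns the per-row aesthetics used for drawing:
# `fill` is the interior color, `colour` is the outline color
toy_aes = layer_data(toy_plot)
toy_aes[1, c("fill", "colour")]
```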

<br><hr>

### 2 Coloring the States by Predictions

##### 2.1 Predictive Model

Now, let's color the map of the US according to our 2012 US presidential election predictions from the Unit 3 Recitation. We'll rebuild the model here, using the dataset PollingImputed.csv. Be sure to use this file so that you don't have to redo the imputation to fill in the missing values, like we did in the Unit 3 Recitation.

Load the data using the read.csv function, and call it "polling". Then split the data using the subset function into a training set called "Train" that has observations from 2004 and 2008, and a testing set called "Test" that has observations from 2012.

Note that we only have 45 states in our testing set, since we are missing observations for Alaska, Delaware, Alabama, Wyoming, and Vermont, so these states will not appear colored in our map.

Then, create a logistic regression model and make predictions on the test set using the following commands:


```{r}
polling = read.csv('data/PollingImputed.csv')

# 2004 and 2008 are the training data; 2012 is the test data
trn = subset(polling, Year != 2012)
tst = subset(polling, Year == 2012)

# Republican: 1 if the state was won by the Republican party, 0 otherwise
mod2 = glm(Republican~SurveyUSA+DiffCount, trn, family=binomial)
pred = predict(mod2,tst,type='response')
repub = as.numeric(pred > 0.5)
df = data.frame(pred, repub, state=tst$State)
head(df)
```
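The chunk above fits a logistic regression and turns probabilities into 0/1 labels. A self-contained sketch of the same steps on simulated data (the predictors `x1`, `x2` and the coefficients are made up; they are not the polling variables) also shows that `type = 'response'` is the logistic function applied to the linear predictor, so a 0.5 cutoff on the probability is the same as a 0 cutoff on the link scale:

```{r}
set.seed(1)
# Simulated stand-in for the polling data: two predictors, binary outcome
sim = data.frame(x1 = rnorm(200), x2 = rnorm(200))
sim$y = rbinom(200, 1, plogis(-0.5 + 1.5 * sim$x1 - 1 * sim$x2))

sim_trn = sim[1:150, ]
sim_tst = sim[151:200, ]

sim_mod = glm(y ~ x1 + x2, data = sim_trn, family = binomial)

# Probabilities on the held-out rows
sim_prob = predict(sim_mod, sim_tst, type = "response")
# Linear predictor eta; the probability is plogis(eta) = 1 / (1 + exp(-eta))
sim_eta = predict(sim_mod, sim_tst, type = "link")

# Threshold at 0.5, as with `repub` above; equivalently, eta > 0
sim_class = as.numeric(sim_prob > 0.5)
table(sim_class)
```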

_For how many states is our binary prediction 1 (for 2012), corresponding to Republican?_
```{r}
# repub: states with a predicted probability > 0.5 are classified as Republican
sum(repub)
```
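Because `repub` is a 0/1 vector, its sum is the number of predicted Republican states. A sketch with made-up probabilities (`toy_prob` is hypothetical) showing equivalent ways to get the count:

```{r}
# Hypothetical predicted probabilities for five states
toy_prob  = c(0.02, 0.97, 0.61, 0.45, 0.88)
toy_repub = as.numeric(toy_prob > 0.5)   # 0 1 1 0 1

sum(toy_repub)          # number of predicted Republican states
sum(toy_prob > 0.5)     # same count, summing the logical vector directly
table(toy_repub)        # counts for both classes
```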

_What is the average predicted probability of our model (on the Test set, for 2012)?_
```{r}
mean(pred) 
```
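A property worth knowing when reading an average predicted probability: for a logistic regression with an intercept, the maximum-likelihood fit makes the mean fitted probability on the training data equal the observed share of 1s. Nothing forces this on a test set such as the 2012 rows, so `mean(pred)` is an out-of-sample summary. A sketch on simulated data (all values made up):

```{r}
set.seed(42)
sim2 = data.frame(x = rnorm(300))
sim2$y = rbinom(300, 1, plogis(0.3 + 2 * sim2$x))

sim2_mod = glm(y ~ x, data = sim2, family = binomial)

# The intercept's score equation forces sum(y - p_hat) = 0 at the MLE,
# so these two numbers agree up to the convergence tolerance
c(mean_fitted = mean(fitted(sim2_mod)), share_of_ones = mean(sim2$y))
```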

##### 2.2 Merge Data into Map
Now, we need to merge the predictions ("predictionDataFrame", `df` in this notebook) with the map data `statesMap`, like we did in lecture. Before doing so, we need to convert the state names (`Test$State`, stored as `df$state`) to lowercase, so that they match the `region` variable in `statesMap`. Do this by typing the following in your R console:

```{r}
df$region = tolower(df$state)
pmap = merge(statesMap, df, by='region') 
```
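The `tolower()` step matters because `merge()` matches keys exactly, including letter case. A sketch with two hypothetical three-row tables:

```{r}
# Hypothetical miniature versions of the map table and the prediction table
toy_states = data.frame(region = c("florida", "iowa", "ohio"), group = 1:3)
toy_pred   = data.frame(state = c("Florida", "Iowa", "Ohio"), p = c(0.96, 0.40, 0.30))

# Keys differ only in case, so an exact-match join finds nothing
nrow(merge(toy_states, toy_pred, by.x = "region", by.y = "state"))  # 0

# Lower-casing the key, as done with df$region above, makes them match
toy_pred$region = tolower(toy_pred$state)
toy_merged = merge(toy_states, toy_pred, by = "region")
nrow(toy_merged)  # 3
```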

_How many observations are there in predictionMap (`pmap` in this notebook)?_
```{r}
nrow(pmap)       # 15034
```

_How many observations are there in statesMap?_
```{r}
nrow(statesMap)  # 15537
```

##### 2.3 The Rule of `merge()`
_When we merged the data in the previous problem, it caused the number of observations to change. Why?_ Check out the help page for merge by typing ?merge to help you answer this question.

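+ Because we only make predictions for 45 states, we no longer have observations for some of the states. By default (`all = FALSE`) `merge()` keeps only rows whose key appears in both data frames, so the map rows for states without a prediction were removed in the merging process.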
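The effect of the default inner join, and what keeping the unmatched rows would look like, can be seen on a toy example. The tables below are hypothetical; with `all.x = TRUE` the missing states would stay in the map with `NA` predictions, which ggplot2 fills with its `na.value` color (grey by default) instead of dropping them:

```{r}
# Hypothetical map rows: three vertices each for two states
toy_map2  = data.frame(region = rep(c("iowa", "vermont"), each = 3), order = 1:6)
# Predictions exist only for Iowa (Vermont is missing, as in the 2012 test set)
toy_pred2 = data.frame(region = "iowa", p = 0.40)

# Default all = FALSE: inner join, Vermont's rows are dropped
toy_inner = merge(toy_map2, toy_pred2, by = "region")
# all.x = TRUE: left join, Vermont's rows are kept with p = NA
toy_left  = merge(toy_map2, toy_pred2, by = "region", all.x = TRUE)

c(inner = nrow(toy_inner), left = nrow(toy_left))
```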

##### 2.4 Plot the color map
Now we are ready to color the US map with our predictions! You can color the states according to our binary predictions by typing the following in your R console:
```{r}
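# merge() sorts the rows by region, so restore each polygon's drawing order first
pmap = pmap[order(pmap$group, pmap$order), ]
# fill = repub: color each state by whether it is predicted Republican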
ggplot(pmap, aes(x=long, y=lat, group=group, fill=repub)) +
  geom_polygon(color='black')
```
The states appear light blue and dark blue in this map. _Which color represents a Republican prediction?_

+ Light blue
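+ `repub = 1` means a predicted Republican win. These states are light blue because ggplot2's default continuous fill scale runs from dark blue (low values) to light blue (high values).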
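The reordering line in the 2.4 chunk is needed because `merge()` sorts its output by the key column, while `geom_polygon()` joins vertices in the order the rows appear. A sketch on a hypothetical two-polygon table (`toy_poly`):

```{r}
# Hypothetical polygon table already in drawing order: region "b" is group 1
toy_poly = data.frame(
  region = c("b", "b", "b", "a", "a", "a"),
  group  = c(1, 1, 1, 2, 2, 2),
  order  = 1:6
)
toy_info = data.frame(region = c("a", "b"), p = c(0.2, 0.9))

# merge() sorts the result by the key (region), so the "a" rows now come first
toy_m = merge(toy_poly, toy_info, by = "region")
toy_m$order

# Restore the drawing order before handing the data to geom_polygon()
toy_m = toy_m[order(toy_m$group, toy_m$order), ]
toy_m$order   # 1 2 3 4 5 6
```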

##### 2.5
We see that the legend displays a blue gradient for outcomes between 0 and 1. However, when plotting the binary predictions there are only two possible outcomes: 0 or 1. Let's replot the map with discrete outcomes. We can also change the color scheme to blue and red, to match the blue color associated with the Democratic Party in the US and the red color associated with the Republican Party in the US. This is done by adding a blue-to-red `scale_fill_gradient()` with breaks at 0 and 1 and the labels "Democrat" and "Republican", the same `grad` scale defined in section 4.
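A gradient scale still treats `repub` as a number. A minimal sketch of a genuinely discrete alternative (a swapped-in technique, not the assignment's command): convert the 0/1 prediction to a factor and map its levels to colors with `scale_fill_manual()`. The two squares in `toy_states2` are made-up stand-ins for states:

```{r}
library(ggplot2)

# Two hypothetical square "states": one predicted Democrat (0), one Republican (1)
toy_states2 = data.frame(
  long  = c(0, 1, 1, 0, 2, 3, 3, 2),
  lat   = c(0, 0, 1, 1, 0, 0, 1, 1),
  group = rep(1:2, each = 4),
  repub = rep(c(0, 1), each = 4)
)

# factor() makes the fill discrete, so the legend shows exactly two keys
toy_discrete = ggplot(toy_states2,
                      aes(x = long, y = lat, group = group, fill = factor(repub))) +
  geom_polygon(color = "black") +
  scale_fill_manual(values = c("0" = "blue", "1" = "red"),
                    labels = c("Democrat", "Republican"),
                    name = "Prediction 2012")

toy_fills = unique(layer_data(toy_discrete)$fill)
toy_fills
```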

Alternatively, we could plot the predicted probabilities instead of the binary predictions. Change the plot command to color the states by the variable TestPrediction (`pred` in this notebook) instead.

```{r}
# Color the states by predicted probability instead of the 0/1 prediction

ggplot(pmap, aes(x=long, y=lat, group=group, fill=pred)) +
  geom_polygon(color='black') +
  scale_fill_gradient(
    low="blue", high="red", 
    guide="legend", breaks= c(0,1), 
    labels=c("Democrat", "Republican"), name="Prediction 2012")
```
You should see a gradient of colors ranging from red to blue. _Do the colors of the states in the map for TestPrediction look different from the colors of the states in the map with TestPredictionBinary? Why or why not?_

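+ TestPredictionBinary (`repub`) takes only two values, 0 and 1, so the map uses only two colors.

+ TestPrediction (`pred`) colors the states by predicted probability, which varies continuously between 0 and 1, so more shades can appear.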
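The difference between the two maps can be checked by counting the distinct fill colors ggplot2 assigns. A sketch with four hypothetical squares and made-up probabilities, using the same blue-to-red gradient:

```{r}
library(ggplot2)

# Four hypothetical squares side by side with made-up predicted probabilities
toy_four = data.frame(
  long  = rep(c(0, 1, 1, 0), 4) + rep(0:3 * 2, each = 4),
  lat   = rep(c(0, 0, 1, 1), 4),
  group = rep(1:4, each = 4),
  prob  = rep(c(0.10, 0.40, 0.60, 0.90), each = 4)
)
toy_four$binary = as.numeric(toy_four$prob > 0.5)

toy_binary_plot = ggplot(toy_four, aes(x = long, y = lat, group = group, fill = binary)) +
  geom_polygon(color = "black") + scale_fill_gradient(low = "blue", high = "red")
toy_prob_plot = ggplot(toy_four, aes(x = long, y = lat, group = group, fill = prob)) +
  geom_polygon(color = "black") + scale_fill_gradient(low = "blue", high = "red")

n_binary = length(unique(layer_data(toy_binary_plot)$fill))  # 2 colors
n_prob   = length(unique(layer_data(toy_prob_plot)$fill))    # 4 colors
c(binary = n_binary, probability = n_prob)
```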

<br><hr>

### 3. Understanding the Predictions

##### 3.1 
In the 2012 election, the state of Florida ended up being a very close race. It was ultimately won by the Democratic party. 
```{r}
df$pred[ df$state == 'Florida'] # 0.96404
```
_Did we predict this state correctly or incorrectly?_

+ We incorrectly predicted this state by predicting that it would be won by the Republican party. 
+ Florida was ultimately won by the Democratic party in 2012, while our model gave a Republican win a probability of about 0.96.

##### 3.2
_What was our predicted probability for the state of Florida?_
```{r}
df$pred[ df$state == 'Florida'] # 0.96404
```

_What does this imply?_

+ Our prediction model did not do a very good job of correctly predicting the state of Florida, and we were very confident in our incorrect prediction.
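One way to put a number on "confident and wrong" is the log loss: minus the log of the probability the model gave to what actually happened. This is a standard scoring rule, not something used in the assignment. The sketch plugs in the Florida probability printed above and compares it with a hypothetical, more hesitant forecast of 0.4:

```{r}
# Log loss for one binary outcome: -log(p) if y = 1, -log(1 - p) if y = 0
log_loss = function(p, y) -(y * log(p) + (1 - y) * log(1 - p))

fl_prob   = 0.9640395  # predicted P(Republican) for Florida, from the chunk above
fl_actual = 0          # Florida was won by the Democratic party (Republican = 0)

log_loss(fl_prob, fl_actual)  # about 3.33: heavily penalized
log_loss(0.40, fl_actual)     # hypothetical hesitant forecast: about 0.51
```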

<br><hr>

### 4. Parameter Settings
In this part, we'll explore what the different parameter settings of geom_polygon do. Throughout the problem, use the help page for geom_polygon, which can be accessed by ?geom_polygon. To see more information about a certain parameter, just type a question mark and then the parameter name to get the help page for that parameter. Experiment with different parameter settings to try and replicate the plots!

We'll be asking questions about the following three plots:
```{r}
grad = scale_fill_gradient(
  low="blue", high="red", 
  guide="legend", breaks= c(0,1), 
  labels=c("Democrat", "Republican"), name="Prediction 2012")
```

```{r}
ggplot(pmap, aes(x=long, y=lat, group=group, fill=repub)) + grad +
  geom_polygon(color='black',linetype=3,size=1) + ggtitle("Plot(1)")
```

```{r}
ggplot(pmap, aes(x=long, y=lat, group=group, fill=repub)) + grad +
  geom_polygon(color='black',linetype=1,size=3) + ggtitle("Plot(2)")
```


```{r}
ggplot(pmap, aes(x=long, y=lat, group=group, fill=repub)) + grad +
  geom_polygon(color='black',linetype=1,size=1,alpha=0.3) + ggtitle("Plot(3)")
```

##### 4.1 
Plots (1) and (2) were created by changing different parameters of geom_polygon from their default values.

_What is the name of the parameter we changed to create plot (1)?_

+ linetype
+ `linetype=3` draws the outlines as dotted lines (the default, 1, is solid)

_What is the name of the parameter we changed to create plot (2)?_

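+ size (the outline width; in ggplot2 3.4.0 and later this argument is called `linewidth`, and `size` is deprecated for it)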
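The numeric line types follow R's base graphics codes (0 blank, 1 solid, 2 dashed, 3 dotted, and so on). A sketch on a hypothetical square confirming the value ggplot2 stores; the outline width is left out because its argument name depends on the ggplot2 version (`size` before 3.4.0, `linewidth` from 3.4.0 on):

```{r}
library(ggplot2)

# Hypothetical unit square standing in for a state
toy_sq = data.frame(long = c(0, 1, 1, 0), lat = c(0, 0, 1, 1), group = 1)

# linetype = 3 is the "dotted" line type
toy_dotted = ggplot(toy_sq, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "white", color = "black", linetype = 3)

layer_data(toy_dotted)$linetype[1]
```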


##### 4.2 
Plot (3) was created by changing the value of a different geom_polygon parameter to have value 0.3. _Which parameter did we use?_

+ alpha
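A sketch of what `alpha` does, assuming a hypothetical square filled red: ggplot2 records `alpha = 0.3` alongside an unchanged `fill`, and `geom_polygon()` applies it to the fill when drawing, leaving the outline opaque:

```{r}
library(ggplot2)

# Hypothetical unit square standing in for a state
toy_sq2 = data.frame(long = c(0, 1, 1, 0), lat = c(0, 0, 1, 1), group = 1)

# alpha = 0.3: the fill is drawn at 30% opacity; for geom_polygon the
# outline colour is not made transparent
toy_faded = ggplot(toy_sq2, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "red", color = "black", alpha = 0.3)

toy_faded_aes = layer_data(toy_faded)
toy_faded_aes[1, c("fill", "colour", "alpha")]
```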

<br><hr>

<br><br><br><br><br>

<style>
.caption {
  color: #777;
  margin-top: 10px;
}
p code {
  white-space: inherit;
}
pre {
  word-break: normal;
  word-wrap: normal;
  line-height: 1;
}
pre code {
  white-space: inherit;
}
p,li {
  font-family: "Trebuchet MS", "微軟正黑體", "Microsoft JhengHei";
}

.r{
  line-height: 1.2;
}

title{
  color: #cc0000;
  font-family: "Trebuchet MS", "微軟正黑體", "Microsoft JhengHei";
}

body{
  font-family: "Trebuchet MS", "微軟正黑體", "Microsoft JhengHei";
}

h1,h2,h3,h4,h5{
  color: #008800;
  font-family: "Trebuchet MS", "微軟正黑體", "Microsoft JhengHei";
}

h3{
  color: #b36b00;
  background: #ffe0b3;
  line-height: 2;
  font-weight: bold;
}

h5{
  color: #006000;
  background: #ffffe0;
  line-height: 2;
  font-weight: bold;
}

em{
  color: #0000c0;
  background: #f0f0f0;
  }

</style>

