課本第四章習題:第7題、第8題、第9題、第12題、第13題

4.7

Suppose that we wish to predict whether a given stock will issue a dividend this year (“Yes” or “No”) based on \(X\), last year’s percent profit. We examine a large number of companies and discover that the mean value of \(X\) for companies that issued a dividend was \(\overline{X}\) = 10, while the mean for those that didn’t was \(\overline{X}\) = 0. In addition, the variance of X for these two sets of companies was \(\sigma^2\) = 36. Finally, 80 % of companies issued dividends. Assuming that X follows a normal distribution, predict the probability that a company will issue a dividend this year given that its percentage profit was \(X\) = 4 last year.

Hint: Recall that the density function for a normal random variable is \[f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{\frac{-(x-\mu)^2}{2\sigma^2}}\] You will need to use Bayes’ theorem.

(4.12)

x = 4
mean1 = 10
mean2 = 0
variance = 36
prior1 = 0.8
prior2 = 0.2
#fk1 = 1/sqrt(2*prior*variance)*exp(-(x-mean)^2 / 2*variance);fk1
fk1 = (prior1*exp(-(x-mean1)^2 / 2*variance))/(prior1*exp(-(x-mean1)^2 / 2*variance) + prior2*exp(-(x-mean2)^2 / 2*variance)) ; fk1
[1] 1.803211e-156

4.8

Suppose that we take a data set, divide it into equally-sized training and test sets, and then try out two different classification procedures. First we use logistic regression and get an error rate of 20 % on the training data and 30 % on the test data. Next we use 1-nearest neigh- bors (i.e. K = 1) and get an average error rate (averaged over both test and training data sets) of 18 %. Based on these results, which method should we prefer to use for classification of new observations? Why?

logistic是線性的、paramatric的,knn是非線性、non-paramatric的,knn不論在test或train中error rate都比logistic小,代表這份資料是適合使用knn這類型的方式。

4.9

This problem has to do with odds.

  1. On average, what fraction of people with an odds of 0.37 of defaulting on their credit card payment will in fact default?

\[odds = \frac {P(X)}{1-P(X)}\]

p = 0.37/(1+0.37);p
[1] 0.270073

由odds算出約有27%的人會違約

  1. Suppose that an individual has a 16 % chance of defaulting on her credit card payment. What are the odds that she will de- fault?
odds = 0.16/(1-0.16);odds
[1] 0.1904762

4.12

This problem involves writing functions. (a) Write a function, Power(), that prints out the result of raising 2 to the 3rd power. In other words, your function should compute \(2^3\) and print out the results. Hint: Recall that x^a raises x to the power a. Use the print() function to output the result.

power <- function(x){
  print (x^3)
}
  1. Create a new function, Power2(), that allows you to pass any two numbers, x and a, and prints out the value of x^a. You can do this by beginning your function with the line

    Power2=function(x,a){

You should be able to call your function by entering, for instance,

Power2(3,8)

on the command line. This should output the value of 38, namely, 6, 561.

power2 <- function(x,a){
  print(x^a)
}
  1. Using the Power2() function that you just wrote, compute \(10^3\), \(8^17\), and \(131^3\).
power2(10,3)
[1] 1000
power2(8,17)
[1] 2.2518e+15
power2(131,3)
[1] 2248091
  1. Now create a new function, Power3(), that actually returns the result x^a as an R object, rather than simply printing it to the screen. That is, if you store the value x^a in an object called result within your function, then you can simply return() this

    return()

result, using the following line:

return(result)

The line above should be the last line in your function, before the } symbol.

power3 <- function(x,a){
  result = x^a
  return(result)
}
  1. Now using the Power3() function, create a plot of f(x) = \(x^2\). The x-axis should display a range of integers from 1 to 10, and the y-axis should display \(x^2\). Label the axes appropriately, and use an appropriate title for the figure. Consider displaying either the x-axis, the y-axis, or both on the log-scale. You can do this by using log=‘‘x’’, log=‘‘y’’, or log=‘‘xy’’ as arguments to the plot() function.
x = c(1:10)
plot(x,power3(x, 2),xlab="Log of x",ylab="Log of x^2", main = "Log of x^2 vs Log of x",log = "xy")

  1. Create a function, PlotPower(), that allows you to create a plot of x against x^a for a fixed a and for a range of values of x. For instance, if you call

    PlotPower (1:10,3)

then a plot should be created with an x-axis taking on values 1, 2, . . . , 10, and a y-axis taking on values \(1^3\), \(2^3\), . . . , \(10^3\).

PlotPower<- function(x,a){
  yt = paste("Log of x^",a)
  maint = paste(yt,"vs Log of x")
  plot(x,power3(x, a),xlab="Log of x",ylab=yt, main = maint,log = "xy")
}
PlotPower(c(1:10),3)

4.13

Using the Boston data set, fit classification models in order to predict whether a given suburb has a crime rate above or below the median. Explore logistic regression, LDA, and KNN models using various sub- sets of the predictors. Describe your findings.

library(ISLR)
data(Boston)
summary(Boston$crim)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
 0.00632  0.08204  0.25651  3.61352  3.67708 88.97620 
#median = 0.25651
Boston$below <- ifelse(Boston$crim > 0.25651,1,0)
Boston$below <-  as.factor(Boston$below)
# below = 1 :高於平均 / 0 : 低於平均
library(corrplot)
Boston$below <- as.numeric(Boston$below)
cor <- cor(Boston)
corrplot(cor)

set.seed(144)
library(caTools)
split = sample.split(Boston$below,SplitRatio = 0.7)
TR = subset(Boston,split == TRUE)
TS = subset(Boston,split == FALSE)
glm1 <- glm(below~zn+indus+nox+age+dis+rad+tax+black+lstat,TR,family = "binomial")#correlation高的
pred = predict(glm1, newdata=TS, type="response")
x = table(actual = TS$below,predict = pred > 0.5);x
      predict
actual FALSE TRUE
     0    67    9
     1     9   67
acc = sum(diag(x))/sum(x);acc
[1] 0.8815789
library(MASS)
lda1 <- lda(below~zn+indus+nox+age+dis+rad+tax+black+lstat,TR)
pred <- predict(lda1, TS)
x <- table(actual = TS$below , pred = pred$class);x
      pred
actual  0  1
     0 68  8
     1 16 60
acc = sum(diag(x))/sum(x) ; acc
[1] 0.8421053
library(class)
TR.X <- as.matrix(TR$below)
TS.X <- as.matrix(TS$below)
set.seed(1)
pred <- knn(TR.X,TS.X,TR$below, k = 1)
x <- table(pred = pred,actual = TS$below);x
    actual
pred  0  1
   0 76  0
   1  0 76
acc = sum(diag(x))/sum(x) ; acc
[1] 1

在同樣的predictor組合下,準確率是 knn(k=1) > logistic > lda,代表這份資料適合使用non-parametric,非線性的方法

---
title: "Statistical learning HW8"
author: "M074020010 陳韻卉"
output:
  html_notebook: default
  word_document: default
---

課本第四章習題：第7題、第8題、第9題、第12題、第13題


## 4.7
Suppose that we wish to predict whether a given stock will issue a
dividend this year (“Yes” or “No”) based on $X$, last year’s percent
proﬁt. We examine a large number of companies and discover that the
mean value of $X$ for companies that issued a dividend was $\overline{X}$ = 10,
while the mean for those that didn’t was $\overline{X}$ = 0. In addition, the variance of X for these two sets of companies was $\sigma^2$ = 36. Finally, 80 % of companies issued dividends. Assuming that X follows a normal distribution, predict the probability that a company will issue
a dividend this year given that its percentage proﬁt was $X$ = 4 last
year.

Hint: Recall that the density function for a normal random variable
is 
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{\frac{-(x-\mu)^2}{2\sigma^2}}$$ 
You will need to use Bayes’ theorem.<br>

![](./1.PNG)
(4.12)

```{r}
x = 4
mean1 = 10
mean2 = 0
variance = 36
prior1 = 0.8
prior2 = 0.2
#fk1 = 1/sqrt(2*prior*variance)*exp(-(x-mean)^2 / 2*variance);fk1
fk1 = (prior1*exp(-(x-mean1)^2 / 2*variance))/(prior1*exp(-(x-mean1)^2 / 2*variance) + prior2*exp(-(x-mean2)^2 / 2*variance)) ; fk1
```


## 4.8

Suppose that we take a data set, divide it into equally-sized training
and test sets, and then try out two diﬀerent classiﬁcation procedures.
First we use logistic regression and get an error rate of 20 % on the
training data and 30 % on the test data. Next we use 1-nearest neigh-
bors (i.e. K = 1) and get an average error rate (averaged over both
test and training data sets) of 18 %. Based on these results, which
method should we prefer to use for classiﬁcation of new observations?
Why?

logistic是線性的、paramatric的，knn是非線性、non-paramatric的，knn不論在test或train中error rate都比logistic小，代表這份資料是適合使用knn這類型的方式。

## 4.9
This problem has to do with odds.

(a) On average, what fraction of people with an odds of 0.37 of
defaulting on their credit card payment will in fact default?

$$odds = \frac {P(X)}{1-P(X)}$$

```{r}
p = 0.37/(1+0.37);p
```
由odds算出約有27%的人會違約

(b) Suppose that an individual has a 16 % chance of defaulting on
her credit card payment. What are the odds that she will de-
fault?

```{r}
odds = 0.16/(1-0.16);odds
```


## 4.12

 This problem involves writing functions.
(a) Write a function, Power(), that prints out the result of raising 2
to the 3rd power. In other words, your function should compute
$2^3$ and print out the results.
Hint: Recall that x^a raises x to the power a. Use the print()
function to output the result.

```{r}
power <- function(x){
  print (x^3)
}
```


(b) Create a new function, Power2(), that allows you to pass any
two numbers, x and a, and prints out the value of x^a. You can
do this by beginning your function with the line

    Power2=function(x,a){
    
You should be able to call your function by entering, for instance,

    Power2(3,8)

on the command line. This should output the value of 38, namely,
6, 561.

```{r}
power2 <- function(x,a){
  print(x^a)
}
```

(c) Using the Power2() function that you just wrote, compute $10^3$,
$8^17$, and $131^3$.

```{r}
power2(10,3)
power2(8,17)
power2(131,3)
```

(d) Now create a new function, Power3(), that actually returns the
result x^a as an R object, rather than simply printing it to the
screen. That is, if you store the value x^a in an object called
result within your function, then you can simply return() this

    return()

result, using the following line:

    return(result)

The line above should be the last line in your function, before
the } symbol.

```{r}
power3 <- function(x,a){
  result = x^a
  return(result)
}
```

(e) Now using the Power3() function, create a plot of f(x) = $x^2$.
The x-axis should display a range of integers from 1 to 10, and
the y-axis should display $x^2$. Label the axes appropriately, and
use an appropriate title for the ﬁgure. Consider displaying either
the x-axis, the y-axis, or both on the log-scale. You can do this
by using log=‘‘x’’, log=‘‘y’’, or log=‘‘xy’’ as arguments to
the plot() function.

```{r}
x = c(1:10)
plot(x,power3(x, 2),xlab="Log of x",ylab="Log of x^2", main = "Log of x^2 vs Log of x",log = "xy")

```


(f) Create a function, PlotPower(), that allows you to create a plot
of x against x^a for a ﬁxed a and for a range of values of x. For
instance, if you call

    PlotPower (1:10,3)

then a plot should be created with an x-axis taking on values
1, 2, . . . , 10, and a y-axis taking on values $1^3$, $2^3$, . . . , $10^3$.

```{r}
PlotPower<- function(x,a){
  yt = paste("Log of x^",a)
  maint = paste(yt,"vs Log of x")
  plot(x,power3(x, a),xlab="Log of x",ylab=yt, main = maint,log = "xy")
}
```


```{r}
PlotPower(c(1:10),3)
```

## 4.13

Using the Boston data set, ﬁt classiﬁcation models in order to predict
whether a given suburb has a crime rate above or below the median.
Explore logistic regression, LDA, and KNN models using various sub-
sets of the predictors. Describe your ﬁndings.

```{r}
library(ISLR)
data(Boston)

summary(Boston$crim)

```

```{r}
#median = 0.25651
Boston$below <- ifelse(Boston$crim > 0.25651,1,0)
# below = 1 :高於平均 / 0 : 低於平均
```

```{r}
library(corrplot)
Boston$below <- as.numeric(Boston$below)
cor <- cor(Boston)
corrplot(cor)
```


```{r}
set.seed(144)
library(caTools)
Boston$below <-  as.factor(Boston$below)
split = sample.split(Boston$below,SplitRatio = 0.7)
TR = subset(Boston,split == TRUE)
TS = subset(Boston,split == FALSE)
```

```{r}

glm1 <- glm(below~zn+indus+nox+age+dis+rad+tax+black+lstat,TR,family = "binomial")#correlation高的
```


```{r}
pred = predict(glm1, newdata=TS, type="response")
x = table(actual = TS$below,predict = pred > 0.5);x
acc = sum(diag(x))/sum(x);acc
```


```{r}
library(MASS)
lda1 <- lda(below~zn+indus+nox+age+dis+rad+tax+black+lstat,TR)
```


```{r}
pred <- predict(lda1, TS)
x <- table(actual = TS$below , pred = pred$class);x
acc = sum(diag(x))/sum(x) ; acc
```

```{r}
library(class)
TR.X <- as.matrix(TR$below)
TS.X <- as.matrix(TS$below)
set.seed(1)
pred <- knn(TR.X,TS.X,TR$below, k = 1)
x <- table(pred = pred,actual = TS$below);x
acc = sum(diag(x))/sum(x) ; acc
```

在同樣的predictor組合下，準確率是 knn(k=1) > logistic > lda，代表這份資料適合使用non-parametric,非線性的方法

