library(tidyverse)
library(openintro)
library(infer)
library("plotrix")

Exercise 1

What percent of the adults in your sample think climate change affects their local community? Hint: Just like we did with the population, we can calculate the proportion of those in this sample who think climate change affects their local community.

In my sample, 36 of the 60 adults (60%) said yes, which is close to the population proportion of 62%.

us_adults <- tibble(
  climate_change_affects = c(rep("Yes", 62000), rep("No", 38000))
)
ggplot(us_adults, aes(x = climate_change_affects)) + 
  geom_bar() +
  labs(
    x = "", y = "",
    title = "Do you think climate change is affecting your local community?"
  ) +
  coord_flip()

us_adults %>%
  count(climate_change_affects) %>%
  mutate(p = n /sum(n))
## # A tibble: 2 × 3
##   climate_change_affects     n     p
##   <chr>                  <int> <dbl>
## 1 No                     38000  0.38
## 2 Yes                    62000  0.62
n <- 60
samp <- us_adults %>%
  sample_n(size = n)
summary(us_adults)
##  climate_change_affects
##  Length:100000         
##  Class :character      
##  Mode  :character
table(samp$climate_change_affects)
## 
##  No Yes 
##  24  36

Exercise 2

Would you expect another student’s sample proportion to be identical to yours? Would you expect it to be similar? Why or why not?

No. I would not expect another student’s sample proportion to be identical to mine, because each student draws a different random sample and the sample proportion (the point estimate) varies from one sample to the next. I would expect it to be similar, though, since every sample of 60 comes from the same population, in which 62% say yes. …

Exercise 3

In the interpretation above, we used the phrase “95% confident”. What does “95% confidence” mean?

“95% confidence” describes the method rather than any single interval: if we drew many random samples and built an interval from each one in the same way, about 95% of those intervals would contain the true population proportion and about 5% would miss it. The interval itself is a range of plausible values around the point estimate, so its width conveys the precision of the estimate.

samp %>%
  specify(response = climate_change_affects, success = "Yes") %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "prop") %>%
  get_ci(level = 0.95)
## # A tibble: 1 × 2
##   lower_ci upper_ci
##      <dbl>    <dbl>
## 1    0.467    0.717

Exercise 4

Does your confidence interval capture the true value proportion of US adults who think climate change affects their local community? If you are working on this lab in a classroom, does your neighbor’s interval capture this value?

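Yes. My 95% confidence interval runs from about 46.7% to 71.7%, so it contains the true population proportion of 62%. A neighbor’s interval will most likely contain it as well, since about 95% of intervals built this way do.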

Exercise 5

Each student should have gotten a slightly different confidence interval. What proportion of those intervals would you expect to capture the true population mean? Why?

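I would expect about 95% of those intervals to capture the true population proportion. Each interval is built from a different random sample, so the endpoints differ from student to student, but the method is designed so that roughly 95 out of every 100 such intervals contain the true value (0.62 here).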

Exercise 6

Given a sample size of 60, 1000 bootstrap samples for each interval, and 50 confidence intervals constructed (the default values for the above app), what proportion of your confidence intervals include the true population proportion? Is this proportion exactly equal to the confidence level? If not, explain, why. Make sure to include your plot in your answer.

samp %>%
  specify(response = climate_change_affects, success = "Yes") %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "prop") %>%
  get_ci(level = 0.50)
## # A tibble: 1 × 2
##   lower_ci upper_ci
##      <dbl>    <dbl>
## 1    0.567     0.65
ggplot(samp, aes(x = climate_change_affects, y = "")) + 
  geom_point(colour = "black") +
  geom_smooth(method = "lm", se = TRUE, level = 0.50) +
  labs(title = "50% Confidence Interval") +
  theme_bw() +
  theme(plot.title = element_text(face = "bold",hjust = 0.5))
## `geom_smooth()` using formula = 'y ~ x'

Exercise 7

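Choose a different confidence level than 95%. Would you expect a confidence interval at this level to be wider or narrower than the confidence interval you calculated at the 95% confidence level? Explain your reasoning.

I chose 90%. I expect the 90% interval to be narrower than the 95% interval: a lower confidence level only needs to capture the true p less often, so the interval covers a smaller middle portion of the bootstrap distribution. Conversely, a higher confidence level requires a wider interval.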

samp %>%
  specify(response = climate_change_affects, success = "Yes") %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "prop") %>%
  get_ci(level = 0.90)
## # A tibble: 1 × 2
##   lower_ci upper_ci
##      <dbl>    <dbl>
## 1      0.5      0.7

Exercise 8

Using code from the infer package and data from the one sample you have (samp), find a confidence interval for the proportion of US Adults who think climate change is affecting their local community with a confidence level of your choosing (other than 95%) and interpret it.

us_adults <- tibble(
  climate_change_affects = c(rep("Yes", 62000), rep("No", 38000))
)
us_adults %>%
  specify(response = climate_change_affects, success = "Yes") %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "prop") %>%
  get_ci(level = 0.95)
## # A tibble: 1 × 2
##   lower_ci upper_ci
##      <dbl>    <dbl>
## 1    0.617    0.623

Exercise 9

Using the app, calculate 50 confidence intervals at the confidence level you chose in the previous question, and plot all intervals on one plot, and calculate the proportion of intervals that include the true population proportion. How does this percentage compare to the confidence level selected for the intervals?

us_adults <- tibble(
  climate_change_affects = c(rep("Yes", 62000), rep("No", 38000))
)
us_adults %>%
  specify(response = climate_change_affects, success = "Yes") %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "prop") %>%
  get_ci(level = 0.50)
## # A tibble: 1 × 2
##   lower_ci upper_ci
##      <dbl>    <dbl>
## 1    0.619    0.621
us_adults<-round(data.frame(x = 1:20,
                      y = runif(20, 20, 40),
                      lower = runif(20, 0, 20),
                      upper = runif(20, 40, 50)), 4)
plotCI(x = us_adults$x,           
       y = us_adults$y,
       li = us_adults$lower,
       ui = us_adults$upper)

Exercise 10

Lastly, try one or more (different) confidence levels. First, state how you expect the width of this interval to compare to previous ones you calculated, then calculate the bounds of the interval using the infer package and data from samp and interpret it. Finally, use the app to generate many intervals and calculate the proportion of intervals that capture the true proportion.

samp <- tibble(
  climate_change_affects = c(rep("Yes", 62000), rep("No", 38000))
)
samp %>%
  specify(response = climate_change_affects, success = "Yes") %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "prop") %>%
  get_ci(level = 0.90)
## # A tibble: 1 × 2
##   lower_ci upper_ci
##      <dbl>    <dbl>
## 1    0.617    0.623
samp<-round(data.frame(x = 1:10,
                      y = runif(10, 10, 20),
                      lower = runif(10, 0, 10),
                      upper = runif(10, 20, 40)), 4)
ggplot(samp, aes(x, y)) + geom_point() + 
geom_errorbar(aes(ymin = lower, ymax = upper))

Exercise 11

Using the app, experiment with different sample sizes and comment on how the widths of intervals change as sample size changes (increases and decreases).

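As the sample size increases, the intervals become narrower; as the sample size decreases, they become wider. A larger sample gives a smaller standard error, so the sample proportion is estimated more precisely.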

Exercise 12

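Finally, given a sample size (say, 60), how does the width of the interval change as you increase the number of bootstrap samples? Hint: Does changing the number of bootstrap samples affect the standard error?

The width of the interval does not change in any systematic way as the number of bootstrap samples increases. The standard error depends on the sample size (60 here), not on the number of bootstrap resamples; more resamples only make the estimated endpoints more stable from run to run.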

---
title: "Lab 5B: Foundations for statistical inference - Confidence Intervals"
author: "Lwin Nandar Shwe"
date: "April-6"
output: openintro::lab_report
---

```{r load-packages, message=FALSE}
library(tidyverse)
library(openintro)
library(infer)
library("plotrix")
```

### Exercise 1

What percent of the adults in your sample think climate change affects their local community?
Hint: Just like we did with the population, we can calculate the proportion of those in this sample who think climate change affects their local community.

In my sample, 36 of the 60 adults (60%) said yes, which is close to the population proportion of 62%.

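```{r sample-proportion}
# Population: 100,000 US adults, 62% of whom answer "Yes"
us_adults <- tibble(
  climate_change_affects = c(rep("Yes", 62000), rep("No", 38000))
)
ggplot(us_adults, aes(x = climate_change_affects)) + 
  geom_bar() +
  labs(
    x = "", y = "",
    title = "Do you think climate change is affecting your local community?"
  ) +
  coord_flip()

# Population proportions
us_adults %>%
  count(climate_change_affects) %>%
  mutate(p = n / sum(n))

# Draw a simple random sample of 60 adults
n <- 60
samp <- us_adults %>%
  sample_n(size = n)

# Sample counts and sample proportions (the answer to this exercise)
table(samp$climate_change_affects)
samp %>%
  count(climate_change_affects) %>%
  mutate(p_hat = n / sum(n))
```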

### Exercise 2
Would you expect another student's sample proportion to be identical to yours? Would you expect it to be similar? Why or why not?

No. I would not expect another student's sample proportion to be identical to mine, because each student draws a different random sample and the sample proportion (the point estimate) varies from one sample to the next. I would expect it to be similar, though, since every sample of 60 comes from the same population, in which 62% say yes.
...
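
The sample-to-sample variation discussed in this exercise can be illustrated with a short base-R sketch. It assumes a binomial draw is an adequate stand-in for sampling 60 adults without replacement from the 100,000-row population (the sample is tiny relative to the population); the seed and the choice of five students are hypothetical.

```r
# Sketch: five hypothetical students each sample 60 adults from a
# population in which 62% answer "Yes".
set.seed(1)  # hypothetical seed, only so the sketch is reproducible
n <- 60
p <- 0.62

# Number of "Yes" answers in each student's sample
yes_counts <- rbinom(5, size = n, prob = p)

# One sample proportion per student
p_hats <- yes_counts / n
p_hats

# Spread of the five point estimates
range(p_hats)
```

The proportions generally differ from student to student but stay in the neighborhood of 0.62.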

### Exercise 3
In the interpretation above, we used the phrase "95% confident". What does "95% confidence" mean? 
"95% confidence" describes the method rather than any single interval: if we drew many random samples and built an interval from each one in the same way, about 95% of those intervals would contain the true population proportion and about 5% would miss it. The interval itself is a range of plausible values around the point estimate, so its width conveys the precision of the estimate.

```{r confidence-interpretation}
samp %>%
  specify(response = climate_change_affects, success = "Yes") %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "prop") %>%
  get_ci(level = 0.95)
```
### Exercise 4
Does your confidence interval capture the true value proportion of US adults who think climate change affects their local community? If you are working on this lab in a classroom, does your neighbor's interval capture this value?

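Yes. In the knitted run, my 95% confidence interval runs from about 46.7% to 71.7%, so it contains the true population proportion of 62%. A neighbor's interval will most likely contain it as well, since about 95% of intervals built this way do.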

### Exercise 5
Each student should have gotten a slightly different confidence interval. What proportion of those intervals would you expect to capture the true population mean? Why?

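I would expect about 95% of those intervals to capture the true population proportion. Each interval is built from a different random sample, so the endpoints differ from student to student, but the method is designed so that roughly 95 out of every 100 such intervals contain the true value (0.62 here).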
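
The question of how many intervals capture the true proportion can be checked with a small simulation. The sketch below is not the lab's bootstrap procedure: to stay in base R it swaps in the normal-approximation (Wald) interval, draws samples with `rbinom()`, and uses a hypothetical seed and number of intervals.

```r
# Sketch: empirical coverage of 95% intervals for a proportion.
# Assumption: Wald (normal-approximation) intervals stand in for the
# bootstrap percentile intervals used in the lab.
set.seed(42)            # hypothetical seed
p_true <- 0.62          # population proportion from the lab
n <- 60                 # sample size from the lab
n_intervals <- 10000    # hypothetical number of repeated samples
z <- qnorm(0.975)       # critical value for 95% confidence

# One sample proportion and one interval per simulated sample
p_hat <- rbinom(n_intervals, size = n, prob = p_true) / n
se <- sqrt(p_hat * (1 - p_hat) / n)
lower <- p_hat - z * se
upper <- p_hat + z * se

# Fraction of intervals that contain the true proportion
coverage <- mean(lower <= p_true & p_true <= upper)
coverage
```

The coverage should land in the general vicinity of 0.95; with a sample of 60 the normal approximation can fall somewhat short of the nominal level.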

### Exercise 6
Given a sample size of 60, 1000 bootstrap samples for each interval, and 50 confidence intervals constructed (the default values for the above app), what proportion of your confidence intervals include the true population proportion? Is this proportion exactly equal to the confidence level? If not, explain, why. Make sure to include your plot in your answer.


```{r confidence-level}
# Bootstrap distribution of the sample proportion
boot_dist <- samp %>%
  specify(response = climate_change_affects, success = "Yes") %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "prop")

# 50% percentile confidence interval
ci_50 <- boot_dist %>%
  get_ci(level = 0.50)
ci_50

# Plot the bootstrap distribution with the 50% interval shaded
# (geom_smooth() cannot fit a line to a single categorical variable)
visualize(boot_dist) +
  shade_confidence_interval(endpoints = ci_50) +
  labs(title = "50% Confidence Interval") +
  theme_bw() +
  theme(plot.title = element_text(face = "bold", hjust = 0.5))
```

### Exercise 7
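Choose a different confidence level than 95%. Would you expect a confidence interval at this level to be wider or narrower than the confidence interval you calculated at the 95% confidence level? Explain your reasoning.

I chose 90%. I expect the 90% interval to be narrower than the 95% interval: a lower confidence level only needs to capture the true p less often, so the interval covers a smaller middle portion of the bootstrap distribution. Conversely, a higher confidence level requires a wider interval.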
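
The relationship between confidence level and interval width can be sketched in base R with percentile intervals taken from a single bootstrap distribution. The sketch assumes the 36 "Yes" / 24 "No" split from the knitted sample table; the seed, the 0/1 coding, and the set of levels are hypothetical choices.

```r
# Sketch: percentile bootstrap intervals at several confidence levels.
set.seed(7)  # hypothetical seed
samp_yes <- c(rep(1, 36), rep(0, 24))  # 1 = "Yes", 0 = "No" (assumed sample)

# Bootstrap distribution of the sample proportion (1000 resamples)
boot_props <- replicate(1000, mean(sample(samp_yes, replace = TRUE)))

# Lower and upper percentile bounds for each confidence level
conf_levels <- c(0.50, 0.90, 0.95, 0.99)
ci_table <- data.frame(
  level = conf_levels,
  lower = quantile(boot_props, (1 - conf_levels) / 2, names = FALSE),
  upper = quantile(boot_props, 1 - (1 - conf_levels) / 2, names = FALSE)
)
ci_table$width <- ci_table$upper - ci_table$lower
ci_table
```

Because every interval is cut from the same bootstrap distribution, each higher level reaches further into the tails, so the widths never shrink as the level rises.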

```{r different-interval}
samp %>%
  specify(response = climate_change_affects, success = "Yes") %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "prop") %>%
  get_ci(level = 0.90)
```
### Exercise 8
Using code from the infer package and data from the one sample you have (samp), find a confidence interval for the proportion of US Adults who think climate change is affecting their local community with a confidence level of your choosing (other than 95%) and interpret it.

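```{r us-adult}
# 90% bootstrap confidence interval computed from the sample of 60 (samp),
# not from the full population
samp %>%
  specify(response = climate_change_affects, success = "Yes") %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "prop") %>%
  get_ci(level = 0.90)
# Interpretation: we are 90% confident that the proportion of US adults who
# think climate change affects their local community lies between
# lower_ci and upper_ci.
```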
### Exercise 9
Using the app, calculate 50 confidence intervals at the confidence level you chose in the previous question, and plot all intervals on one plot, and calculate the proportion of intervals that include the true population proportion. How does this percentage compare to the confidence level selected for the intervals?

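```{r us-adult2}
# Stand-in for the app: 50 samples of size 60, one 90% bootstrap interval each
us_adults <- tibble(
  climate_change_affects = c(rep("Yes", 62000), rep("No", 38000))
)
one_ci <- function(i) {
  us_adults %>%
    sample_n(size = 60) %>%
    specify(response = climate_change_affects, success = "Yes") %>%
    generate(reps = 1000, type = "bootstrap") %>%
    calculate(stat = "prop") %>%
    get_ci(level = 0.90)
}
intervals <- map_dfr(1:50, one_ci) %>%
  mutate(x = row_number(), y = (lower_ci + upper_ci) / 2)

# Proportion of the 50 intervals that capture the true proportion (0.62)
mean(intervals$lower_ci <= 0.62 & intervals$upper_ci >= 0.62)

# Plot all intervals, with a dashed line at the true proportion
plotCI(x = intervals$x,
       y = intervals$y,
       li = intervals$lower_ci,
       ui = intervals$upper_ci)
abline(h = 0.62, lty = 2)
```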

### Exercise 10
Lastly, try one or more (different) confidence levels. First, state how you expect the width of this interval to compare to previous ones you calculated, then calculate the bounds of the interval using the infer package and data from samp and interpret it. Finally, use the app to generate many intervals and calculate the proportion of intervals that capture the true proportion.

```{r confidence-level-99}
# Expectation: a 99% interval should be wider than the 50%, 90%, and 95% ones
boot_dist <- samp %>%
  specify(response = climate_change_affects, success = "Yes") %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "prop")
boot_dist %>%
  get_ci(level = 0.99)

# Compare interval widths across confidence levels for the same sample
conf_levels <- c(0.50, 0.90, 0.95, 0.99)
cis <- map_dfr(conf_levels, ~ get_ci(boot_dist, level = .x)) %>%
  mutate(level = factor(conf_levels),
         p_hat = mean(samp$climate_change_affects == "Yes"))
ggplot(cis, aes(x = level, y = p_hat)) + geom_point() + 
  geom_errorbar(aes(ymin = lower_ci, ymax = upper_ci), width = 0.2)
```

### Exercise 11
Using the app, experiment with different sample sizes and comment on how the widths of intervals change as sample size changes (increases and decreases).

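As the sample size increases, the intervals become narrower; as the sample size decreases, they become wider. A larger sample gives a smaller standard error, so the sample proportion is estimated more precisely.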
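
The effect of sample size on interval width can be made concrete with the standard-error formula for a proportion, sqrt(p(1 - p)/n). The sketch below uses the lab's population proportion of 0.62; the sample sizes and the use of a normal-approximation width (rather than a bootstrap interval) are assumptions made for illustration.

```r
# Sketch: approximate 95% interval width at several sample sizes.
p <- 0.62                           # population proportion from the lab
sample_sizes <- c(15, 60, 240, 960) # hypothetical sizes, each 4x the last
z <- qnorm(0.975)                   # critical value for 95% confidence

# Standard error of the sample proportion and the resulting width
se <- sqrt(p * (1 - p) / sample_sizes)
width <- 2 * z * se

data.frame(n = sample_sizes, se = round(se, 4), width_95 = round(width, 4))
```

Each fourfold increase in sample size halves the standard error, and therefore the width, because the standard error shrinks with the square root of n.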

### Exercise 12
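Finally, given a sample size (say, 60), how does the width of the interval change as you increase the number of bootstrap samples? Hint: Does changing the number of bootstrap samples affect the standard error?

The width of the interval does not change in any systematic way as the number of bootstrap samples increases. The standard error depends on the sample size (60 here), not on the number of bootstrap resamples; more resamples only make the estimated endpoints more stable from run to run.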
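
A base-R sketch can check whether the number of bootstrap samples affects the standard error. It assumes the 36 "Yes" / 24 "No" split from the knitted sample table; the seed, the helper name `boot_se`, and the grid of resample counts are hypothetical.

```r
# Sketch: bootstrap standard error at different numbers of resamples.
set.seed(123)  # hypothetical seed
samp_yes <- c(rep(1, 36), rep(0, 24))  # 1 = "Yes", 0 = "No" (assumed sample)

# Hypothetical helper: SD of the bootstrap proportions for a given reps
boot_se <- function(reps) {
  boot_props <- replicate(reps, mean(sample(samp_yes, replace = TRUE)))
  sd(boot_props)
}

reps_grid <- c(100, 1000, 10000)
se_by_reps <- sapply(reps_grid, boot_se)

# Formula-based standard error for p_hat = 0.6 and n = 60
formula_se <- sqrt(0.6 * 0.4 / 60)

data.frame(reps = reps_grid, boot_se = se_by_reps, formula_se = formula_se)
```

All three bootstrap estimates should sit close to the formula value; adding resamples reduces the run-to-run noise in the estimate rather than shrinking the standard error itself.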
