Getting Started

Loading packages

library(tidyverse)
library(openintro)
library(infer)
set.seed(99)

Creating the data

We will assume a total population size of 100,000, even though that’s much smaller than the population of all US adults, to keep our computations simple. The proportion of interest is: Roughly six-in-ten U.S. adults (62%) say climate change is currently affecting their local community either a great deal or some, according to a new Pew Research Center survey.

us_adults <- tibble(
  climate_change_affects = c(rep("Yes", 62000), rep("No", 38000))
)

Visualization of distribution of responses:

ggplot(us_adults, aes(x = climate_change_affects)) +
  geom_bar() +
  labs(
    x = "", y ="",
    title = "Do you think climate change is affecting your local community?"
  ) +
  coord_flip()

Obtaining summary statistics:

us_adults %>% count(climate_change_affects) %>% mutate(p = n/sum(n))
## # A tibble: 2 × 3
##   climate_change_affects     n     p
##   <chr>                  <int> <dbl>
## 1 No                     38000  0.38
## 2 Yes                    62000  0.62

Starting with a sample size of 60

n <- 60
samp <- us_adults %>%
  sample_n(size = n)

1. What percent of the adults in your sample think climate change affects their local community?

samp %>% count(climate_change_affects) %>% mutate(p = n/sum(n))
## # A tibble: 2 × 3
##   climate_change_affects     n     p
##   <chr>                  <int> <dbl>
## 1 No                        21  0.35
## 2 Yes                       39  0.65

About 65% of adults in this sample (39 of 60) think climate change affects their local community.

2. Would you expect another student’s sample proportion to be identical to yours? Would you expect it to be similar? Why or why not?

I would expect it to be similar, but not identical. We are sampling from the same population, but each random sample contains different people, so the sample proportions will vary around the population proportion of 62%.

Confidence Intervals

Finding the 95% confidence interval for the proportion of US adults who think climate change affects their local community

samp %>%
  specify(response = climate_change_affects, success = "Yes") %>% 
  generate(reps = 1000, type = "bootstrap") %>% 
  calculate(stat = "prop") %>%
  get_ci(level = 0.95)
## # A tibble: 1 × 2
##   lower_ci upper_ci
##      <dbl>    <dbl>
## 1    0.533    0.767

Confidence levels

1. In the interpretation above, we used the phrase “95% confident”. What does “95% confidence” mean?

We are 95% confident that the true proportion of U.S. adults who think that climate change is affecting their local community is between 0.533 and 0.767. “95% confidence” describes the method rather than this one interval: if we repeatedly drew samples of the same size and built an interval from each one in the same way, about 95% of those intervals would contain the true population proportion.

2. Does your confidence interval capture the true population proportion of US adults who think climate change affects their local community? If you are working on this lab in a classroom, does your neighbor’s interval capture this value?

Yes. The true population proportion is 0.62, which lies inside our interval of 0.533 to 0.767. A neighbor’s interval will most likely capture this value as well, but it is not guaranteed: about 95% of intervals built this way capture the true proportion, so roughly 1 in 20 will miss it.

3. Each student should have gotten a slightly different confidence interval. What proportion of those intervals would you expect to capture the true population mean? Why?

I would expect about 95% of these intervals to capture the true population proportion, because the 95% confidence level is the long-run capture rate of the method used to build each interval.

Given a sample size of 60, 1000 bootstrap samples for each interval, and 50 confidence intervals constructed (the default values for the above app), what proportion of your confidence intervals include the true population proportion? Is this proportion exactly equal to the confidence level? If not, explain why. Make sure to include your plot in your answer.

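In this simulation, 48 out of 50 (96%) of the confidence intervals constructed included the true population proportion. This is consistent with our 95% confidence level but not exactly equal to it. With 50 intervals, exactly 95% would mean 47.5 intervals, which is impossible, and the observed count also varies from run to run because each interval comes from a different random sample.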

More Practice

1. If we chose a confidence level lower than 95%, the interval would be narrower. A lower confidence level means we accept that the interval captures the true proportion less often than 95% of the time, so it only needs to cover a smaller middle portion of the bootstrap distribution.

2. Using code from the infer package and data from the one sample you have (samp), find a confidence interval for the proportion of US Adults who think climate change is affecting their local community with a confidence level of your choosing (other than 95%) and interpret it.

samp %>%
  specify(response = climate_change_affects, success = "Yes") %>% 
  generate(reps = 1000, type = "bootstrap") %>% 
  calculate(stat = "prop") %>%
  get_ci(level = 0.90)
## # A tibble: 1 × 2
##   lower_ci upper_ci
##      <dbl>    <dbl>
## 1    0.533     0.75

We are 90% confident that the true proportion of US adults who think that climate change is affecting their local communities is between 0.533 and 0.750.

3. Using the app, calculate 50 confidence intervals at the confidence level you chose in the previous question, and plot all intervals on one plot, and calculate the proportion of intervals that include the true population proportion. How does this percentage compare to the confidence level selected for the intervals?

I used a 90% confidence level. The proportion of intervals that included the true proportion was 44/50, which is 0.88, or 88%. This is very close to our confidence level.

4. Lastly, try one more (different) confidence level. First, state how you expect the width of this interval to compare to previous ones you calculated. Then, calculate the bounds of the interval using the infer package and data from samp and interpret it. Finally, use the app to generate many intervals and calculate the proportion of intervals that capture the true population proportion.

samp %>%
  specify(response = climate_change_affects, success = "Yes") %>% 
  generate(reps = 1000, type = "bootstrap") %>% 
  calculate(stat = "prop") %>%
  get_ci(level = 0.80)
## # A tibble: 1 × 2
##   lower_ci upper_ci
##      <dbl>    <dbl>
## 1    0.567    0.733

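I chose an 80% confidence level and expected this interval to be narrower than the 95% and 90% intervals. We are 80% confident that the true proportion of US adults who think that climate change is affecting their local community is between 0.567 and 0.733. The app is consistent with this confidence level: 41 out of 50 confidence intervals (82%) captured the true population proportion.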

5. Using the app, experiment with different sample sizes and comment on how the widths of intervals change as sample size changes (increases and decreases).

We notice that as we increase the sample size, the width of the intervals decreases. The opposite occurs when we decrease the sample size.

6. Finally, given a sample size (say, 60), how does the width of the interval change as you increase the number of bootstrap samples?

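The widths do not seem to change as we adjust the number of bootstrap samples. More bootstrap samples make the interval endpoints more stable, but the width itself is driven by the sample size and the confidence level.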

---
title: "MATH217 Lab Hw"
author: "Ibrahim Pinzon Perez"
date: "`r Sys.Date()`"
output: openintro::lab_report
---

## Getting Started

### Loading packages

```{r, message=FALSE}
library(tidyverse)
library(openintro)
library(infer)
set.seed(99)
```

### Creating the data

We will assume a total population size of 100,000, even though that's much smaller than the population of all US adults, to keep our computations simple. The proportion of interest is: Roughly six-in-ten U.S. adults (62%) say climate change is currently affecting their local community either a great deal or some, according to a new Pew Research Center survey.

```{r}
us_adults <- tibble(
  climate_change_affects = c(rep("Yes", 62000), rep("No", 38000))
)
```

Visualization of distribution of responses: 

```{r}
ggplot(us_adults, aes(x = climate_change_affects)) +
  geom_bar() +
  labs(
    x = "", y ="",
    title = "Do you think climate change is affecting your local community?"
  ) +
  coord_flip()
```

Obtaining summary statistics: 

```{r}
us_adults %>% count(climate_change_affects) %>% mutate(p = n/sum(n))
```

Starting with a sample size of 60

```{r}
n <- 60
samp <- us_adults %>%
  sample_n(size = n)
```

#### 1. What percent of the adults in your sample think climate change affects their local community?

```{r}
samp %>% count(climate_change_affects) %>% mutate(p = n/sum(n))
```

About 65% of adults in this sample (39 of 60) think climate change affects their local community.

#### 2. Would you expect another student’s sample proportion to be identical to yours? Would you expect it to be similar? Why or why not?

I would expect it to be similar, but not identical. We are sampling from the same population, but each random sample contains different people, so the sample proportions will vary around the population proportion of 62%.
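
The variation between students' samples can be illustrated with a small simulation. This is only a sketch under assumptions: the seed, the number of repeated samples (1,000), and the names `population`, `n_obs`, and `sample_props` are illustrative and not part of the lab.

```r
# Illustrative sketch: draw many samples of size 60 from the same population
# and see how much the sample proportion of "Yes" varies between samples.
set.seed(1)  # hypothetical seed, chosen only so the sketch is reproducible

population <- c(rep("Yes", 62000), rep("No", 38000))
n_obs <- 60

# Proportion of "Yes" in each of 1,000 independent samples (assumed count)
sample_props <- replicate(1000, mean(sample(population, size = n_obs) == "Yes"))

mean(sample_props)          # centre of the simulated sample proportions
sd(sample_props)            # spread observed in the simulation
sqrt(0.62 * 0.38 / n_obs)   # theoretical standard error, for comparison
```

The simulated proportions should cluster around 0.62 with a spread close to the theoretical standard error, which is why two students' results tend to be similar without being identical.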

## Confidence Intervals

Finding the 95% confidence interval for the proportion of US adults who think climate change affects their local community

```{r}
samp %>%
  specify(response = climate_change_affects, success = "Yes") %>% 
  generate(reps = 1000, type = "bootstrap") %>% 
  calculate(stat = "prop") %>%
  get_ci(level = 0.95)
```

### Confidence levels

#### 1. In the interpretation above, we used the phrase “95% confident”. What does “95% confidence” mean?

We are 95% confident that the true proportion of U.S. adults who think that climate change is affecting their local community is between 0.533 and 0.767. “95% confidence” describes the method rather than this one interval: if we repeatedly drew samples of the same size and built an interval from each one in the same way, about 95% of those intervals would contain the true population proportion.
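
The long-run meaning of a confidence level can be checked with a rough simulation. The sketch below uses base R rather than the infer pipeline, and it makes two simplifying assumptions that are labeled in the comments: each new sample is drawn with `rbinom()` (a close approximation to sampling 60 people from 100,000), and 500 repeated samples are used. The helper `boot_ci()` is hypothetical and written only for this illustration.

```r
# Sketch: how often does a 95% percentile bootstrap interval capture the truth?
set.seed(2)  # hypothetical seed

p_true <- 0.62       # population proportion used in this lab
n_obs <- 60          # sample size
n_boot <- 1000       # bootstrap resamples per interval
n_intervals <- 500   # number of repeated samples (assumed value)

# Percentile bootstrap interval for one sample proportion. Resampling n_obs
# Yes/No values with replacement gives a Binomial(n_obs, p_hat) count of "Yes",
# so rbinom() is used as a shortcut for the resampling step.
boot_ci <- function(p_hat, level = 0.95) {
  boot_props <- rbinom(n_boot, size = n_obs, prob = p_hat) / n_obs
  alpha <- 1 - level
  quantile(boot_props, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

captured <- replicate(n_intervals, {
  # Assumption: a binomial draw approximates one sample from the population
  p_hat <- rbinom(1, size = n_obs, prob = p_true) / n_obs
  ci <- boot_ci(p_hat)
  ci[1] <= p_true && p_true <= ci[2]
})

mean(captured)  # share of intervals that contain p_true
```

The share of capturing intervals is expected to land near the stated confidence level, although bootstrap percentile intervals on small samples can fall slightly short of it.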

#### 2. Does your confidence interval capture the true population proportion of US adults who think climate change affects their local community? If you are working on this lab in a classroom, does your neighbor’s interval capture this value?

Yes. The true population proportion is 0.62, which lies inside our interval of 0.533 to 0.767. A neighbor's interval will most likely capture this value as well, but it is not guaranteed: about 95% of intervals built this way capture the true proportion, so roughly 1 in 20 will miss it.

#### 3. Each student should have gotten a slightly different confidence interval. What proportion of those intervals would you expect to capture the true population mean? Why?

I would expect about 95% of these intervals to capture the true population proportion, because the 95% confidence level is the long-run capture rate of the method used to build each interval.

#### Given a sample size of 60, 1000 bootstrap samples for each interval, and 50 confidence intervals constructed (the default values for the above app), what proportion of your confidence intervals include the true population proportion? Is this proportion exactly equal to the confidence level? If not, explain why. Make sure to include your plot in your answer.

```{r, echo = FALSE}
knitr::include_graphics("/Users/ibrahimpinzon/Desktop/CI_Math217.png")
```

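In this simulation, 48 out of 50 (96%) of the confidence intervals constructed included the true population proportion. This is consistent with our 95% confidence level but not exactly equal to it. With 50 intervals, exactly 95% would mean 47.5 intervals, which is impossible, and the observed count also varies from run to run because each interval comes from a different random sample.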
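
To see why a count such as 48 out of 50 is unsurprising, the number of capturing intervals can be modeled as a binomial count. This is an idealized sketch: it assumes the 50 intervals are independent and that each one captures the true proportion with probability exactly 0.95, which the bootstrap only approximates.

```r
# Sketch: distribution of the number of capturing intervals out of 50,
# assuming independent intervals with a capture probability of 0.95.
n_intervals <- 50
level <- 0.95

expected_hits <- n_intervals * level   # 47.5, which is not a whole number
prob_48 <- dbinom(48, size = n_intervals, prob = level)
prob_45_to_50 <- sum(dbinom(45:50, size = n_intervals, prob = level))

expected_hits    # long-run average number of capturing intervals
prob_48          # chance of seeing exactly 48 capturing intervals
prob_45_to_50    # chance of seeing between 45 and 50 capturing intervals
```

Under this model, any single run of the app is expected to give a count near 47 or 48, but never exactly 95%.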

## More Practice

#### 1. If we chose a confidence level lower than 95%, the interval would be narrower. A lower confidence level means we accept that the interval captures the true proportion less often than 95% of the time, so it only needs to cover a smaller middle portion of the bootstrap distribution.
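
The link between confidence level and width can be seen by cutting one bootstrap distribution at different percentiles. The sketch below is an illustration in base R, not the infer pipeline used in this lab; the seed and the helper `interval_width()` are hypothetical, and the sample proportion 39/60 is taken from the sample summary output.

```r
# Sketch: one bootstrap distribution, several confidence levels.
set.seed(3)  # hypothetical seed

n_obs <- 60
p_hat <- 39 / 60  # sample proportion of "Yes" from the sample summary

# 1,000 bootstrap proportions (binomial shortcut for resampling Yes/No data)
boot_props <- rbinom(1000, size = n_obs, prob = p_hat) / n_obs

# Width of the percentile interval at a given confidence level
interval_width <- function(level) {
  alpha <- 1 - level
  bounds <- quantile(boot_props, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  bounds[2] - bounds[1]
}

widths <- sapply(c(0.95, 0.90, 0.80), interval_width)
widths  # widths for the 95%, 90% and 80% levels, in that order
```

Because each lower-level interval is cut from inside the higher-level one, the widths can only shrink or stay the same as the confidence level drops.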

#### 2. Using code from the infer package and data from the one sample you have (samp), find a confidence interval for the proportion of US Adults who think climate change is affecting their local community with a confidence level of your choosing (other than 95%) and interpret it.

```{r}
samp %>%
  specify(response = climate_change_affects, success = "Yes") %>% 
  generate(reps = 1000, type = "bootstrap") %>% 
  calculate(stat = "prop") %>%
  get_ci(level = 0.90)
```

We are 90% confident that the true proportion of US adults who think that climate change is affecting their local communities is between 0.533 and 0.750.

#### 3. Using the app, calculate 50 confidence intervals at the confidence level you chose in the previous question, and plot all intervals on one plot, and calculate the proportion of intervals that include the true population proportion. How does this percentage compare to the confidence level selected for the intervals?

I used a 90% confidence level. The proportion of intervals that included the true proportion was 44/50, which is 0.88, or 88%. This is very close to our confidence level.

#### 4. Lastly, try one more (different) confidence level. First, state how you expect the width of this interval to compare to previous ones you calculated. Then, calculate the bounds of the interval using the infer package and data from samp and interpret it. Finally, use the app to generate many intervals and calculate the proportion of intervals that capture the true population proportion.

```{r}
samp %>%
  specify(response = climate_change_affects, success = "Yes") %>% 
  generate(reps = 1000, type = "bootstrap") %>% 
  calculate(stat = "prop") %>%
  get_ci(level = 0.80)
```

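I chose an 80% confidence level and expected this interval to be narrower than the 95% and 90% intervals. We are 80% confident that the true proportion of US adults who think that climate change is affecting their local community is between 0.567 and 0.733. The app is consistent with this confidence level: 41 out of 50 confidence intervals (82%) captured the true population proportion.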

#### 5. Using the app, experiment with different sample sizes and comment on how the widths of intervals change as sample size changes (increases and decreases).

We notice that as we increase the sample size, the width of the intervals decreases. The opposite occurs when we decrease the sample size.
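
The size of this effect can be approximated with the normal-approximation formula for the width of an interval for a proportion, 2 * z * sqrt(p * (1 - p) / n). This is a sketch under assumptions: the bootstrap percentile intervals from the app are not computed this way, and the sample sizes below are chosen only for illustration.

```r
# Sketch: approximate width of a 95% interval at several sample sizes,
# using the normal approximation 2 * z * sqrt(p * (1 - p) / n).
p <- 0.62
z <- qnorm(0.975)                 # about 1.96
sample_sizes <- c(60, 240, 960)   # each is four times the previous one

approx_width <- 2 * z * sqrt(p * (1 - p) / sample_sizes)
data.frame(n = sample_sizes, width = approx_width)
```

Since the width scales with 1 / sqrt(n), quadrupling the sample size roughly halves the width of the interval.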

#### 6. Finally, given a sample size (say, 60), how does the width of the interval change as you increase the number of bootstrap samples?

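The widths do not seem to change as we adjust the number of bootstrap samples. More bootstrap samples make the interval endpoints more stable, but the width itself is driven by the sample size and the confidence level.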
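
A quick way to check this outside the app is to compute the interval width for several numbers of bootstrap resamples. The sketch below is an illustration in base R under assumptions: the seed, the grid of resample counts, and the helper `width_for_reps()` are hypothetical, and the binomial shortcut stands in for resampling the 60 observed responses.

```r
# Sketch: width of a 95% percentile interval for different numbers of
# bootstrap resamples, holding the sample (39 "Yes" out of 60) fixed.
set.seed(4)  # hypothetical seed

n_obs <- 60
p_hat <- 39 / 60

width_for_reps <- function(reps, level = 0.95) {
  # Binomial shortcut for resampling Yes/No data with replacement
  boot_props <- rbinom(reps, size = n_obs, prob = p_hat) / n_obs
  alpha <- 1 - level
  bounds <- quantile(boot_props, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  bounds[2] - bounds[1]
}

reps_grid <- c(100, 1000, 10000)
widths_by_reps <- sapply(reps_grid, width_for_reps)
data.frame(reps = reps_grid, width = widths_by_reps)
```

The widths should stay in the same neighborhood across the grid; what changes with more resamples is how much the endpoints wobble from one run to the next.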