6a) Generate a boxplot of poverty rate at the county level (2 points). Based on the boxplot, what is the median poverty rate and the interquartile range (IQR) of the poverty rate? (2 points) What are the minimum and maximum values for the poverty rate? (4 points) Note: the function to generate a boxplot in R is boxplot(data$var, main="title of boxplot")

boxplot(pamort$avemort, main="Poverty Rate at County Level")

summary(pamort$avemort)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.301   8.175   8.563   8.573   8.858  11.113
#The median poverty rate at the county level is 8.563 and the IQR is 0.683 (Q3 - Q1 = 8.858 - 8.175). The minimum and maximum values for the poverty rate are 7.301 and 11.113, respectively.
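The quartiles read off summary() can also be computed directly. The following is a minimal sketch using a small made-up vector, `rates` (hypothetical values, not the `pamort` data), to show `quantile()`, `IQR()` and `boxplot.stats()`. It also illustrates that the whisker ends drawn by `boxplot()` need not equal the minimum and maximum when points are flagged as outliers.

```r
# Hypothetical values for illustration only; substitute pamort$avemort in practice.
rates <- c(7.3, 7.9, 8.2, 8.4, 8.6, 8.7, 8.9, 9.1, 11.1)

q <- quantile(rates, probs = c(0.25, 0.5, 0.75))  # Q1, median, Q3
iqr_manual  <- unname(q[3] - q[1])                # IQR computed as Q3 - Q1
iqr_builtin <- IQR(rates)                         # same quantity via IQR()

# boxplot.stats() returns the five numbers that boxplot() draws
# (lower whisker, lower hinge, median, upper hinge, upper whisker)
# and the points lying more than 1.5 * IQR beyond the hinges.
bs <- boxplot.stats(rates)

print(q)
print(c(manual = iqr_manual, builtin = iqr_builtin))
print(bs$stats)
print(bs$out)     # values drawn as individual outlier points
print(range(rates))  # true minimum and maximum, outliers included
```

If `bs$out` is non-empty, the minimum or maximum reported by summary() corresponds to an outlier point on the plot rather than to a whisker end.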

6b) Is the distribution of poverty rate normally distributed? Why or why not? Describe how you reached your conclusion. (4 points)

hist(pamort$avemort)

mean(pamort$avemort)
## [1] 8.573133
median(pamort$avemort)
## [1] 8.5626
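#A characteristic of a normal distribution is that the data are symmetrical around the mean, so the mean and median coincide. Here the mean (8.573) is slightly greater than the median (8.563), and the maximum (11.113) lies much farther above the median than the minimum (7.301) lies below it. The distribution is therefore positively (right) skewed rather than exactly normal.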
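Comparing the mean and median is only a rough check. The sketch below shows three complementary checks on a simulated right-skewed sample (`x` is a hypothetical stand-in, not the `pamort` data): a hand-computed sample skewness, the Shapiro-Wilk test from base R's stats package, and a normal Q-Q plot.

```r
set.seed(1)
# Simulated right-skewed sample; replace with pamort$avemort in practice.
x <- rexp(200, rate = 1)

# Sample skewness: third central moment divided by the 1.5 power of the
# second central moment. Values near 0 indicate symmetry; positive values
# indicate a longer right tail.
centered <- x - mean(x)
skew <- mean(centered^3) / mean(centered^2)^(3 / 2)

# Shapiro-Wilk test: a small p-value is evidence against normality.
sw <- shapiro.test(x)

print(skew)
print(sw$p.value)

# Normal Q-Q plot: points that bend away from the reference line
# indicate departures from normality.
qqnorm(x)
qqline(x)
```

With only a few dozen counties, a formal test has limited power, so the histogram and Q-Q plot are worth reading alongside the p-value.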

6c) Please create two binary variables based on avemort and gini. For the former, please recode those less than or equal to 8 as “Low Mortality”, otherwise “High Mortality.” For the latter, those less than or equal to 0.4 should be coded as “Equal”, otherwise, “Unequal.” (8 points)

pam2 <-  pamort %>%
  transmute(
      avemort = ifelse(avemort <= 8, "Low Mortality", "High Mortality"),
      gini = ifelse(gini <= 0.4, "Equal", "Unequal"))

pam2 <- na.omit(pam2)
head(pam2)
## # A tibble: 6 x 2
##   avemort        gini   
##   <chr>          <chr>  
## 1 High Mortality Equal  
## 2 High Mortality Unequal
## 3 High Mortality Unequal
## 4 High Mortality Unequal
## 5 Low Mortality  Unequal
## 6 High Mortality Unequal
View(pam2)
#Alternatively, use mutate() to add the two recoded columns while keeping the original ones

pam45 <-  pamort %>%
  mutate(
      avemort2 = ifelse(avemort <= 8, "Low Mortality", "High Mortality"),
      gini2 = ifelse(gini <= 0.4, "Equal", "Unequal"))
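The same recoding can also be written in base R with `cut()`, which returns a factor with an explicit level order. This is a minimal sketch on a hypothetical four-row data frame `df` (not the `pamort` data); the column names `mort_level` and `gini_level` are made up for illustration.

```r
# Hypothetical mini data frame; the real pamort columns are assumed numeric.
df <- data.frame(avemort = c(7.5, 8.0, 8.6, NA),
                 gini    = c(0.38, 0.40, 0.45, 0.41))

# right = TRUE (the default) closes each interval on the right, so a value
# of exactly 8 falls in (-Inf, 8] and is labelled "Low Mortality".
df$mort_level <- cut(df$avemort, breaks = c(-Inf, 8, Inf),
                     labels = c("Low Mortality", "High Mortality"))
df$gini_level <- cut(df$gini, breaks = c(-Inf, 0.4, Inf),
                     labels = c("Equal", "Unequal"))

print(df)
# Missing values stay NA instead of being silently assigned to a category.
print(table(df$mort_level, useNA = "ifany"))
```

A factor keeps both categories in counts and plots even when one of them happens to be empty, which a character column produced by ifelse() does not.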

6d) How many counties have high mortality? And how many counties have “unequal” gini coefficient? (8 points)

pam2 %>%
  count(avemort)
## # A tibble: 2 x 2
##   avemort            n
##   <chr>          <int>
## 1 High Mortality    52
## 2 Low Mortality     15
pam2 %>%
  count(gini)
## # A tibble: 2 x 2
##   gini        n
##   <chr>   <int>
## 1 Equal      11
## 2 Unequal    56
#52 counties have high mortality and 56 counties have an "unequal" gini coefficient.

6e) Show the confidence intervals for gini coefficients when county mortality level is low and high, respectively. i) Do these confidence intervals overlap? (4 points) ii) Interpret the confidence intervals from e). (8 points) iii) What conclusion(s) can you draw with regard to the county’s mortality levels and gini coefficients? (4 points)

#First I created a subset 
ginmo<- subset(x=pamort, select=c('avemort', 'gini')) 
View(ginmo)

#Then, I created a data frame
gindf <- data.frame(ginmo)
View(gindf)

#Then, I used the data frame to get the mean and sd
gindf %>%
  group_by(avemort <=8) %>%
   summarise(mean(gini), sd(gini))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 2 x 3
##   `avemort <= 8` `mean(gini)` `sd(gini)`
##   <lgl>                 <dbl>      <dbl>
## 1 FALSE                 0.420     0.0234
## 2 TRUE                  0.422     0.0234
#Finally I got the CI for Gini when Low Mort
a <- 0.4218000
s <- 0.02341612
n <- 15
error <- qnorm(0.975)*s/sqrt(n)
left95 <- a-error
right95 <- a+error
left95
## [1] 0.40995
right95
## [1] 0.43365
print (c(left95, right95))
## [1] 0.40995 0.43365
#The 95% CI for gini in low-mortality counties is 0.40995 to 0.43365

#And the CI for Gini when High Mort
a <- 0.4200577
s <- 0.02342817
n <- 52
error <- qnorm(0.975)*s/sqrt(n)
left95 <- a-error
right95 <- a+error
left95
## [1] 0.41369
right95
## [1] 0.4264254
print (c(left95, right95))
## [1] 0.4136900 0.4264254
#The 95% CI for gini in high-mortality counties is 0.4136900 to 0.4264254
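The two interval calculations repeat the same steps, so they can be wrapped in one function. The sketch below uses a hypothetical helper, `ci_mean()` (not part of any package), fed with the group means, standard deviations and counts reported above. It also offers a t-based critical value, which may be the safer choice for the low-mortality group because it has only 15 counties.

```r
# Hypothetical helper: confidence interval for a mean from summary statistics.
# use_t = TRUE uses the t distribution with n - 1 degrees of freedom;
# use_t = FALSE reproduces the normal (qnorm) approximation.
ci_mean <- function(m, s, n, level = 0.95, use_t = TRUE) {
  alpha <- 1 - level
  crit <- if (use_t) qt(1 - alpha / 2, df = n - 1) else qnorm(1 - alpha / 2)
  half_width <- crit * s / sqrt(n)
  c(lower = m - half_width, upper = m + half_width)
}

# Summary statistics taken from the grouped summarise() output above.
low_z  <- ci_mean(0.4218000, 0.02341612, 15, use_t = FALSE)
low_t  <- ci_mean(0.4218000, 0.02341612, 15)
high_z <- ci_mean(0.4200577, 0.02342817, 52, use_t = FALSE)
high_t <- ci_mean(0.4200577, 0.02342817, 52)

print(rbind(low_z, low_t, high_z, high_t))
```

The t-based intervals are somewhat wider than the normal-approximation ones, and the difference is larger for the smaller group.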
#i) Yes, they overlap
#ii) We are 95% confident that the true population mean of the gini index for counties with low mortality is between 0.40995 and 0.43365. We are 95% confident that the true population mean of the gini index for counties with high mortality is between 0.4136900 and 0.4264254.
#iii) The confidence interval for the gini index in counties with low mortality is slightly wider than the interval for counties with high mortality, which is expected given the smaller group (15 versus 52 counties). The two groups have very similar means (0.422 and 0.420). Because the confidence intervals overlap, with the high-mortality interval lying entirely inside the low-mortality interval, there is not sufficient evidence to conclude that the mean gini index differs between low-mortality and high-mortality counties.
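Checking whether two intervals overlap is an informal comparison. A more direct check is a two-sample test on the difference in means. The sketch below computes a Welch t-test by hand from the summary statistics reported above, so it runs without the `pamort` data; all variable names here are made up for illustration.

```r
# Group summaries taken from the grouped summarise() output above.
m_low  <- 0.4218000; s_low  <- 0.02341612; n_low  <- 15
m_high <- 0.4200577; s_high <- 0.02342817; n_high <- 52

# Squared standard error of each group mean.
v_low  <- s_low^2  / n_low
v_high <- s_high^2 / n_high

# Welch t statistic for the difference in means.
se_diff <- sqrt(v_low + v_high)
t_stat  <- (m_low - m_high) / se_diff

# Welch-Satterthwaite approximation to the degrees of freedom.
df_welch <- (v_low + v_high)^2 /
  (v_low^2 / (n_low - 1) + v_high^2 / (n_high - 1))

# Two-sided p-value.
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

print(c(t = t_stat, df = df_welch, p = p_value))
```

With the county-level data loaded, a call such as t.test(gini ~ avemort2, data = pam45) should give the same kind of result directly, assuming `avemort2` has exactly two groups. A large p-value here is consistent with the conclusion drawn from the overlapping intervals.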