Load Libraries

library(haven)
library(readr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(knitr)
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2     v purrr   0.3.4
## v tibble  3.0.3     v stringr 1.4.0
## v tidyr   1.1.1     v forcats 0.5.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(ggplot2)
library(ipumsr)
library(psych)
## 
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha

Load Data

wad <- read_dta("C:/Users/chris/Downloads/PA_Mortality.dta")
View(wad)

Question 6. a) Generate a boxplot of poverty rate at the county level (2 points). Based on the boxplot, what is the median poverty rate and the interquartile range (IQR) of the poverty rate? (2 points) What are the minimum and maximum values for the poverty rate?

boxplot(wad$povrate, main="Poverty Rate at the county level")

summary(wad$povrate)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.04873 0.09667 0.12455 0.12110 0.14199 0.24159

Median poverty rate = 0.12455

Interquartile range (IQR) = 3rd Qu. - 1st Qu. = 0.14199 - 0.09667 = 0.04532

Minimum value = 0.04873

Maximum value = 0.24159
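
The IQR is the distance between the first and third quartiles, so it can be checked directly from the summary() output. The sketch below is only an illustration: it uses the rounded quartiles printed above rather than the raw data, and the variable names (q1, q3, iqr, lower_fence, upper_fence) are introduced here. It also computes the 1.5 x IQR fences that boxplot() uses by default for its whiskers. boxplot() works from hinges, which can differ slightly from the summary() quartiles, so the fences are approximate.

```r
# Quartiles and extremes copied from the summary(wad$povrate) output (rounded values)
q1 <- 0.09667
q3 <- 0.14199
min_rate <- 0.04873
max_rate <- 0.24159

# Interquartile range: third quartile minus first quartile
iqr <- q3 - q1
print(iqr)

# Approximate whisker fences (boxplot default: range = 1.5)
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# TRUE means at least one county lies beyond the fence and is drawn as a separate point
print(max_rate > upper_fence)
print(min_rate < lower_fence)
```

With the raw data loaded, IQR(wad$povrate, na.rm = TRUE) should give essentially the same value.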

Question 6. b) Is the distribution of poverty rate normally distributed? Why or why not? Describe how you reach your conclusion.

mean(wad$povrate, na.rm = TRUE)
## [1] 0.1210957
median(wad$povrate, na.rm = TRUE)
## [1] 0.1245455
hist(wad$povrate)

## In a normal distribution the mean and median coincide, so similar values are consistent with normality, although they do not prove it. Here the mean (0.1210957) and the median (0.1245455) are close, and the histogram shows a nearly normal shape, so the distribution is approximately normal. It is not perfectly symmetric: in the boxplot the median line is not exactly in the middle of the box, and the whiskers are not the same length on both sides. This conclusion is based on the shape of the histogram and on the mean and median being approximately equal.
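
Comparing the mean and median is an informal check. It could be supplemented with a numeric skewness estimate, a Shapiro-Wilk test, and a normal Q-Q comparison. The sketch below is not part of the original analysis: it runs on simulated data (x is a hypothetical stand-in for wad$povrate, with an assumed sample size, mean, and standard deviation), so its results say nothing about the real poverty rates.

```r
# Hypothetical stand-in for wad$povrate: simulated values, not the real data
set.seed(1)
x <- rnorm(67, mean = 0.12, sd = 0.035)

# Moment-based sample skewness: values near 0 suggest a symmetric distribution
skewness <- mean((x - mean(x))^3) / sd(x)^3

# Shapiro-Wilk test: a small p-value is evidence against normality
sw <- shapiro.test(x)

# Q-Q coordinates without plotting; a correlation near 1 means the points
# fall close to a straight line, as expected for normal data
qq <- qqnorm(x, plot.it = FALSE)
qq_cor <- cor(qq$x, qq$y)

print(skewness)
print(sw$p.value)
print(qq_cor)
```

Replacing x with wad$povrate would apply the same checks to the county data.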

Question 6. c) Please create two binary variables based on avemort and gini. For the former, please recode those less than or equal to 8 as “Low Mortality”, otherwise “High Mortality.” For the latter, those less than or equal to 0.4 should be coded as “Equal”, otherwise, “Unequal.”

chrs <- subset(wad, select = c("avemort","gini"))
chrs$avemort = ifelse(wad$avemort <= 8, "Low Mortality", "High Mortality")
chrs$gini = ifelse(wad$gini <= 0.4, "Equal", "Unequal")
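
The recoding rule can be illustrated on a small made-up data frame. Everything in the sketch below is hypothetical: toy, mort_cat, and gini_cat are names invented for the example, and the values were chosen to sit on both sides of each cutoff. Unlike the code above, it stores the categories in new columns, so the numeric values stay available for later steps.

```r
# Made-up values on both sides of the cutoffs (8 for avemort, 0.4 for gini)
toy <- data.frame(
  avemort = c(7.5, 8.0, 8.1, 9.3),
  gini    = c(0.38, 0.40, 0.41, 0.45)
)

# Values less than or equal to the cutoff get the first label, all others the second
toy$mort_cat <- ifelse(toy$avemort <= 8, "Low Mortality", "High Mortality")
toy$gini_cat <- ifelse(toy$gini <= 0.4, "Equal", "Unequal")

print(toy)
```

Rows exactly at the cutoff (8.0 and 0.40) fall in the "Low Mortality" and "Equal" groups, matching the "less than or equal to" wording of the question.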

Question 6. d) How many counties have high mortality? And how many counties have “unequal” gini coefficient?

tokpa <- wad %>% 
filter(chrs$avemort=="High Mortality")
nrow(tokpa)
## [1] 52
tokpa <- wad %>% 
filter(chrs$gini=="Unequal")
nrow(tokpa)
## [1] 56
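
The same counts can be obtained without building a filtered data frame, by tabulating the category vector or summing a logical comparison. The sketch below uses a short hypothetical vector (mort_cat) in place of chrs$avemort.

```r
# Hypothetical category labels standing in for chrs$avemort
mort_cat <- c("High Mortality", "Low Mortality", "High Mortality", "High Mortality")

# Frequency of every category at once
counts <- table(mort_cat)
print(counts)

# Count of a single category: TRUE counts as 1 when summed
n_high <- sum(mort_cat == "High Mortality")
print(n_high)
```

Applied to the real data, table(chrs$avemort) and table(chrs$gini) would report both groups of each variable in a single call.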

Question 6. e) Show the confidence intervals for gini coefficients when county mortality level is low and high, respectively.

wad$avemort <- chrs$avemort
high = subset(wad, avemort == "High Mortality")
low = subset(wad, avemort == "Low Mortality")
length(high$gini)
## [1] 52
mean(high$gini)
## [1] 0.4200577
sd(high$gini)
## [1] 0.02342817
a <- 0.4200577
s <- 0.02342817
n <- 52

error <- qnorm(0.975)*s/sqrt(n)
leftn <- a-error
rightn <- a+error
print(c(leftn,rightn))
## [1] 0.4136900 0.4264254
length(low$gini)
## [1] 15
mean(low$gini)
## [1] 0.4218
sd(low$gini)
## [1] 0.02341612
a <- 0.4218
s <- 0.02341612
n <- 15

error <- qnorm(0.975)*s/sqrt(n)
leftn <- a-error
rightn <- a+error
print(c(leftn,rightn))
## [1] 0.40995 0.43365
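
The two interval calculations above repeat the same steps, so they can be wrapped in a small helper. The function below (ci_mean, a name introduced here) is a sketch that reproduces the normal-based intervals from the printed means, standard deviations, and group sizes. As an assumption beyond the original analysis, it also offers a t-based critical value, which is often preferred for a group as small as n = 15 and gives a somewhat wider interval.

```r
# Confidence interval for a mean from summary statistics.
# use_t = FALSE reproduces the qnorm-based calculation above;
# use_t = TRUE swaps in a t critical value with n - 1 degrees of freedom.
ci_mean <- function(m, s, n, level = 0.95, use_t = FALSE) {
  alpha <- 1 - level
  crit <- if (use_t) qt(1 - alpha / 2, df = n - 1) else qnorm(1 - alpha / 2)
  half_width <- crit * s / sqrt(n)
  c(lower = m - half_width, upper = m + half_width)
}

# Summary statistics copied from the output above
ci_high_z <- ci_mean(0.4200577, 0.02342817, 52)
ci_low_z  <- ci_mean(0.4218, 0.02341612, 15)
ci_low_t  <- ci_mean(0.4218, 0.02341612, 15, use_t = TRUE)

print(ci_high_z)
print(ci_low_z)
print(ci_low_t)
```

Passing mean(high$gini), sd(high$gini), and length(high$gini) directly, instead of retyped numbers, would avoid copying errors.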

Question 6. e) (i) Do these confidence intervals overlap? (4 points)

Yes, the two confidence intervals overlap: the high-mortality interval (0.41369, 0.42643) lies entirely within the low-mortality interval (0.40995, 0.43365).
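
Overlap can also be checked programmatically: two intervals overlap when the larger of the lower bounds does not exceed the smaller of the upper bounds. The sketch below uses the interval endpoints printed in part e). The helper intervals_overlap is a name introduced here for illustration.

```r
# Interval endpoints copied from the output in part e)
ci_high <- c(0.4136900, 0.4264254)  # high-mortality counties
ci_low  <- c(0.40995, 0.43365)      # low-mortality counties

# Two intervals overlap when the larger lower bound is <= the smaller upper bound
intervals_overlap <- function(a, b) max(a[1], b[1]) <= min(a[2], b[2])
overlap <- intervals_overlap(ci_high, ci_low)

# Stronger condition: is the high-mortality interval entirely inside the low-mortality one?
nested <- ci_low[1] <= ci_high[1] && ci_high[2] <= ci_low[2]

print(overlap)
print(nested)
```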

Question 6. e) (ii) Interpret the confidence intervals from e).

The confidence intervals mean that we are 95% confident that the true mean Gini coefficient lies between 0.41369 and 0.42643 for counties with high mortality, and between 0.40995 and 0.43365 for counties with low mortality.

Question 6. e) (iii) What conclusion(s) can you draw with regard to the county’s mortality levels and gini coefficients?

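Based on the mortality levels and the Gini coefficients, I conclude that there is no evidence of a significant relationship between the Gini coefficient, which measures income inequality, and mortality: the mean Gini coefficients of the two groups are nearly identical (0.4201 vs. 0.4218), and their 95% confidence intervals overlap.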
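
A conclusion about the two groups can be checked more directly with a two-sample test than by comparing intervals by eye. The sketch below applies Welch's t-test to the summary statistics printed in part e). This is a technique substituted here for illustration, not part of the original analysis, and the helper welch_from_summary is a name introduced for the example.

```r
# Welch two-sample t-test computed from summary statistics
# (means, standard deviations, and group sizes copied from part e)
welch_from_summary <- function(m1, s1, n1, m2, s2, n2) {
  v1 <- s1^2 / n1
  v2 <- s2^2 / n2
  t_stat <- (m1 - m2) / sqrt(v1 + v2)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  p_value <- 2 * pt(-abs(t_stat), df)
  list(t = t_stat, df = df, p_value = p_value)
}

# Group 1: high-mortality counties; group 2: low-mortality counties
res <- welch_from_summary(0.4200577, 0.02342817, 52,
                          0.4218,    0.02341612, 15)
print(res)
```

A p-value above 0.05 would indicate that the difference in mean Gini coefficient between the groups is not statistically significant at the 5% level. With the data loaded and avemort recoded as in part e), t.test(gini ~ avemort, data = wad) should run the same Welch test directly.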