library(haven)
library(readr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(knitr)
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2 v purrr 0.3.4
## v tibble 3.0.3 v stringr 1.4.0
## v tidyr 1.1.1 v forcats 0.5.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(ggplot2)
library(ipumsr)
library(psych)
##
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
##1
wad <- read_dta("C:/Users/chris/Downloads/PA_Mortality.dta")
View(wad)
boxplot(wad$povrate, main="Poverty Rate at the county level")
summary(wad$povrate)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.04873 0.09667 0.12455 0.12110 0.14199 0.24159
mean(wad$povrate, na.rm = T)
## [1] 0.1210957
median(wad$povrate, na.rm = T)
## [1] 0.1245455
hist(wad$povrate)
## A normal distribution is symmetric, so its mean and median are equal or nearly equal. Here the mean (0.1210957) and the median (0.1245455) are close, and the histogram is roughly bell-shaped, so the distribution of povrate is approximately normal. It is not perfectly symmetric: in the boxplot the median line is not centered in the box and the two whiskers differ in length.
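The visual check above can be supplemented with a numeric one. The following is a minimal sketch, not part of the original analysis: the vector `x` is simulated and only stands in for `wad$povrate`, so its printed values say nothing about the real data. It computes the sample skewness, runs a Shapiro-Wilk test, and draws a Q-Q plot.

```r
# Hypothetical data: simulated values standing in for wad$povrate (NOT the real data)
set.seed(1)
x <- rnorm(67, mean = 0.12, sd = 0.035)

# Sample skewness: values near 0 indicate a roughly symmetric distribution
skew <- mean((x - mean(x))^3) / sd(x)^3

# Shapiro-Wilk test: a small p-value is evidence against normality
sw <- shapiro.test(x)

# Q-Q plot: points lying close to the line suggest approximate normality
qqnorm(x)
qqline(x)

print(c(mean = mean(x), median = median(x), skewness = skew, shapiro_p = sw$p.value))
```

To apply this to the real data, replace `x` with `wad$povrate` after dropping any missing values.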
# Recode mortality and inequality into two-level categories
chrs <- subset(wad, select = c("avemort","gini"))
chrs$avemort <- ifelse(wad$avemort <= 8, "Low Mortality", "High Mortality")
chrs$gini <- ifelse(wad$gini <= 0.4, "Equal", "Unequal")
tokpa <- wad %>%
filter(chrs$avemort=="High Mortality")
nrow(tokpa)
## [1] 52
tokpa <- wad %>%
filter(chrs$gini=="Unequal")
nrow(tokpa)
## [1] 56
wad$avemort <- chrs$avemort
high = subset(wad, avemort == "High Mortality")
low = subset(wad, avemort == "Low Mortality")
length(high$gini)
## [1] 52
mean(high$gini)
## [1] 0.4200577
sd(high$gini)
## [1] 0.02342817
# 95% normal-approximation CI for the mean gini in high-mortality counties
a <- mean(high$gini)
s <- sd(high$gini)
n <- length(high$gini)
error <- qnorm(0.975)*s/sqrt(n)
leftn <- a-error
rightn <- a+error
print(c(leftn,rightn))
## [1] 0.4136900 0.4264254
length(low$gini)
## [1] 15
mean(low$gini)
## [1] 0.4218
sd(low$gini)
## [1] 0.02341612
# 95% normal-approximation CI for the mean gini in low-mortality counties
a <- mean(low$gini)
s <- sd(low$gini)
n <- length(low$gini)
error <- qnorm(0.975)*s/sqrt(n)
leftn <- a-error
rightn <- a+error
print(c(leftn,rightn))
## [1] 0.40995 0.43365
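The two interval calculations above repeat the same steps, so they could be wrapped in a small helper. This is a sketch under stated assumptions: `ci_mean` and its `use_t` argument are hypothetical names, not part of the original script. The inputs are the summary statistics printed above. The `use_t = TRUE` option swaps the normal quantile for a t quantile, which is often preferred for a small group such as the 15 low-mortality counties.

```r
# Confidence interval for a mean, computed from summary statistics.
# use_t = TRUE uses a t quantile with n - 1 degrees of freedom
# instead of the normal quantile used in the script above.
ci_mean <- function(m, s, n, level = 0.95, use_t = FALSE) {
  p <- 1 - (1 - level) / 2
  crit <- if (use_t) qt(p, df = n - 1) else qnorm(p)
  error <- crit * s / sqrt(n)
  c(lower = m - error, upper = m + error)
}

# Summary statistics taken from the output above
z_high <- ci_mean(0.4200577, 0.02342817, 52)              # high mortality, normal quantile
z_low  <- ci_mean(0.4218, 0.02341612, 15)                 # low mortality, normal quantile
t_low  <- ci_mean(0.4218, 0.02341612, 15, use_t = TRUE)   # low mortality, t quantile

print(rbind(z_high, z_low, t_low))
```

The first two rows reproduce the intervals computed above; the t-based interval is somewhat wider because it accounts for the extra uncertainty in a sample of 15.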
Yes, the two confidence intervals overlap: the interval for high-mortality counties (0.41369 to 0.42643) lies entirely inside the interval for low-mortality counties (0.40995 to 0.43365).
The intervals mean that we are 95% confident that the true mean gini coefficient lies between 0.41369 and 0.42643 for counties with high mortality, and between 0.40995 and 0.43365 for counties with low mortality.
Because the two intervals overlap almost completely and the group means are nearly identical (0.4201 for high mortality versus 0.4218 for low mortality), these results do not show a significant relationship between mortality level and the gini coefficient, which measures income inequality.
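Comparing confidence intervals by eye is only an informal check. A two-sample test gives a more direct answer about whether the mean gini coefficient differs between the two mortality groups. The sketch below is an addition, not the original method: `welch_from_summary` is a hypothetical helper that computes Welch's t statistic, its approximate degrees of freedom, and a two-sided p-value from the summary statistics printed above.

```r
# Welch two-sample t-test from summary statistics (unequal variances allowed)
welch_from_summary <- function(m1, s1, n1, m2, s2, n2) {
  v1 <- s1^2 / n1                      # squared standard error, group 1
  v2 <- s2^2 / n2                      # squared standard error, group 2
  se <- sqrt(v1 + v2)                  # standard error of the difference
  t_stat <- (m1 - m2) / se
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  p_value <- 2 * pt(-abs(t_stat), df)  # two-sided p-value
  c(t = t_stat, df = df, p_value = p_value)
}

# High-mortality vs. low-mortality counties, using the values reported above
res <- welch_from_summary(0.4200577, 0.02342817, 52,
                          0.4218,    0.02341612, 15)
print(res)
```

With the full data frame available, `t.test(gini ~ avemort, data = wad)` runs the same Welch test directly, since `avemort` has already been recoded into two groups.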