Gun death research

Who is more likely to die from a gunshot?

Qiushun Liang s3584868

Last updated: 22 October, 2017

Introduction

Problem Statement

Data

Data Cont.

Descriptive Statistics and Visualisation

sex_age<-guns%>%dplyr::select(sex, age)
f_age <- sex_age %>% filter(sex=="F")
m_age <- sex_age %>% filter(sex=="M")
summary(f_age,na.rm=T);sd(f_age$age,na.rm=T)
##      sex                 age       
##  Length:14449       Min.   :  0.0  
##  Class :character   1st Qu.: 29.0  
##  Mode  :character   Median : 44.0  
##                     Mean   : 43.7  
##                     3rd Qu.: 56.0  
##                     Max.   :101.0  
##                     NA's   :3
## [1] 17.78038
summary(m_age,na.rm=T);sd(m_age$age,na.rm=T)
##      sex                 age        
##  Length:86349       Min.   :  0.00  
##  Class :character   1st Qu.: 26.00  
##  Mode  :character   Median : 41.00  
##                     Mean   : 43.88  
##                     3rd Qu.: 58.00  
##                     Max.   :107.00  
##                     NA's   :15
## [1] 19.76871

Descriptive Statistics Cont.

hist(f_age$age, main="age distribution of female gun deaths", xlab="age (years)",freq=FALSE)
x=seq(0,101,by=0.01)
y=dnorm(x,43.7,17.78)
lines(x,y,col="blue",lwd=2)

Descriptive Statistics Cont1.

hist(m_age$age, main="age distribution of male gun deaths", xlab="age (years)",freq=FALSE)
x=seq(0,107,by=0.01)
# normal curve using the male mean (43.88) and sd (19.77) reported above
y=dnorm(x,43.88,19.77)
lines(x,y,col="blue",lwd=2)

- The distribution is bimodal, resembling a mixture of two normal distributions.

Descriptive Statistics Cont2.

month<-c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")
color<-c("blue","pink","red","orange","purple")
# percentage of each race's deaths that fall in each month (each race sums to 100)
perc<-table(guns$month,guns$race)%>%prop.table(margin=2)*100
# transpose so that rows are races and columns are months
value<-t(perc)
barplot(value,main = "monthly gun death by race",names.arg = month,xlab = "month",ylab = "percentage of deaths within each race",col = color)
# take the legend labels from the table itself so that names and colours stay matched
legend("bottom", rownames(value), cex = 0.3, fill = color,horiz=TRUE)

Hypothesis Testing: one-sample t-test

age<- guns$age
age%>% summary(na.rm=TRUE)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    0.00   27.00   42.00   43.86   58.00  107.00      18
sd(age, na.rm=TRUE)
## [1] 19.49618
t.test(age,mu=40,conf.level = .95,alternative = "two.sided")
## 
##  One Sample t-test
## 
## data:  age
## t = 62.814, df = 100780, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 40
## 95 percent confidence interval:
##  43.73723 43.97797
## sample estimates:
## mean of x 
##   43.8576

Hypothesis Testing Cont.: one-sample t-test

1. The 95% CI [43.73723, 43.97797] for the mean age does not contain the hypothesised mean of 40.
2. p < 2.2e-16, so the p-value is below the significance level α = 0.05.

- Thus the one-sample t-test rejects the null hypothesis.
- Conclusion: the mean age at gun death is significantly different from the previously assumed mean of 40.

\[H_0: \mu = 40 \qquad H_A: \mu \neq 40\]
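As a rough check on the t-test output, the statistic and confidence interval can be recomputed by hand from the summary values printed earlier. This is only a sketch: the variable names are illustrative, the inputs are the rounded figures from the output (so results agree only up to rounding), and the sample size is taken as df + 1.

```r
# Summary values copied from the printed output (rounded)
n    <- 100781      # degrees of freedom (100780) + 1
xbar <- 43.8576     # sample mean age
s    <- 19.49618    # sample standard deviation
mu0  <- 40          # hypothesised mean under H0

# Standard error of the mean and the t statistic
se     <- s / sqrt(n)
t_stat <- (xbar - mu0) / se

# Two-sided 95% confidence interval and p-value
ci    <- xbar + c(-1, 1) * qt(0.975, df = n - 1) * se
p_val <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)

print(round(t_stat, 2))
print(round(ci, 2))
```

The recomputed statistic and interval should match the values reported by `t.test` to about two decimal places.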

Hypothesis Testing: Chi-square Goodness of Fit Test

Hypothesis:
- Null hypothesis: gun deaths are unrelated to education level, i.e. they are spread equally across the five levels.
- Alternative hypothesis: gun deaths are concentrated in one or several education levels.

pr<-c(0.2,0.2,0.2,0.2,0.2)
chi<-chisq.test(table(guns$education),p=pr)
chi$expected
##     1     2     3     4     5 
## 20149 20149 20149 20149 20149
chi$observed
## 
##     1     2     3     4     5 
## 21823 42927 21680 12946  1369
chi
## 
##  Chi-squared test for given probabilities
## 
## data:  table(guns$education)
## X-squared = 46084, df = 4, p-value < 2.2e-16
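The goodness-of-fit statistic above can also be reproduced directly from the observed counts, which may make it clearer where the value comes from. The sketch below hard-codes the counts from the printed output and uses illustrative variable names; it assumes equal expected proportions of 0.2, as in the call to `chisq.test`.

```r
# Observed deaths per education level, copied from chi$observed
observed <- c(21823, 42927, 21680, 12946, 1369)

# Expected counts under H0: equal share (0.2) for each of the five levels
expected <- sum(observed) * rep(0.2, 5)

# Pearson chi-square statistic: sum of (O - E)^2 / E
x2 <- sum((observed - expected)^2 / expected)

# Upper-tail p-value with (number of categories - 1) degrees of freedom
p_val <- pchisq(x2, df = length(observed) - 1, lower.tail = FALSE)

print(expected[1])
print(round(x2))
```

Most of the statistic comes from the second level (far more deaths than expected) and the fifth level (far fewer).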

Hypothesis Testing Cont.: Chi-square Goodness of Fit Test