Cinita Mary Varghese (s3797635)
Last updated: 27 October, 2019
The dataset used for this assignment was collected from Kaggle.com and contains information about gun deaths in the US between 2012 and 2014.
It records the victim's age, sex, race and education, the intent, the time (month and year) and place of death, and whether or not police were at the place of death.
The dataset contains 100,798 observations of 11 variables.
The main variables used in this assignment are:
-Sex: the victim's gender, coded as M (male) or F (female).
-Intent: the intent behind the gun death, one of 'Suicide', 'Accidental', 'Homicide', 'Undetermined', or NA (missing).
-Place: the place where the death occurred: 'Farm', 'Home', 'Industrial/construction', 'Other specified', 'Other unspecified', 'Residential institution', 'School/institution', 'Sports', 'Street', or 'Trade/service area'.
library(tidyverse) # read_csv (readr), select/filter and %>% (dplyr), ggplot2
guns <- read_csv("guns.csv")
View(guns)
guns$sex <- as.factor(guns$sex)
guns$place <- as.factor(guns$place)
gun1 <- guns %>% select(sex, place, intent)
# Keep only suicides, then drop the intent column
gun2 <- gun1 %>% filter(intent == "Suicide") %>% select(sex, place)
str(gun2)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 63175 obs. of 2 variables:
## $ sex : Factor w/ 2 levels "F","M": 2 1 2 2 2 2 2 2 2 2 ...
## $ place: Factor w/ 10 levels "Farm","Home",..: 2 9 4 2 4 2 2 2 2 2 ...
## - attr(*, "spec")=
## .. cols(
## .. X1 = col_double(),
## .. year = col_double(),
## .. month = col_character(),
## .. intent = col_character(),
## .. police = col_double(),
## .. sex = col_character(),
## .. age = col_double(),
## .. race = col_character(),
## .. hispanic = col_double(),
## .. place = col_character(),
## .. education = col_double()
## .. )
## [1] 0
## sex place
## F: 0 Home :38691
## M:54486 Other specified : 7182
## Other unspecified : 4174
## Street : 1972
## Trade/service area: 1518
## Farm : 345
## (Other) : 604
## sex place
## F:8689 Home :6724
## M: 0 Other specified : 849
## Other unspecified : 600
## Street : 209
## Trade/service area: 205
## Farm : 33
## (Other) : 69
## place
## sex Farm Home Industrial/construction Other specified
## F 0.0038 0.7739 0.0014 0.0977
## M 0.0063 0.7101 0.0026 0.1318
## place
## sex Other unspecified Residential institution School/instiution Sports
## F 0.0691 0.0018 0.0035 0.0013
## M 0.0766 0.0020 0.0049 0.0016
## place
## sex Street Trade/service area
## F 0.0241 0.0236
## M 0.0362 0.0279
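The code that creates gun3 is not shown in this report. A minimal sketch follows, assuming gun3 holds the row proportions of the sex-by-place table rounded to four decimals, which matches the printed values; on the real data this would be round(prop.table(table(gun2), margin = 1), 4). To keep the example self-contained it rebuilds the table from counts recovered from the outputs above: the summary counts, plus the four places grouped under (Other), which are inferred from the rounded proportions and the column totals of the expected counts.

```r
# Suicide counts by sex and place, recovered from the outputs in this report.
# On the real data, table(gun2) would produce this table directly.
# "School/instiution" keeps the spelling used in the dataset.
places <- c("Farm", "Home", "Industrial/construction", "Other specified",
            "Other unspecified", "Residential institution", "School/instiution",
            "Sports", "Street", "Trade/service area")
counts <- as.table(matrix(
  c(33, 6724, 12, 849, 600, 16, 30, 11, 209, 205,             # F
    345, 38691, 143, 7182, 4174, 109, 265, 87, 1972, 1518),   # M
  nrow = 2, byrow = TRUE,
  dimnames = list(sex = c("F", "M"), place = places)
))

# Row proportions: each sex's suicides are split across places (rows sum to 1),
# then rounded to four decimals as in the printed table
gun3 <- round(prop.table(counts, margin = 1), 4)
print(gun3)

# Long format with columns sex, place and Freq, as used by ggplot() and summarise()
gun4 <- as.data.frame(gun3)
head(gun4)
```

Using margin = 1 makes each sex's proportions sum to 1, so females and males can be compared even though there are about six times as many male suicides in gun2.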
gun4 <- as.data.frame(gun3)
ggplot(gun4, aes(y = Freq, x = place, fill = place)) + geom_bar(stat = "identity") +
facet_grid(~sex) + ylab("proportion") + theme(axis.text.x = element_blank()) +
labs(title = "Distribution of Places for Each Gender")
Descriptive statistics of these proportions for each sex are computed below and printed as an HTML table with the knitr::kable function:
gun4 %>% group_by(sex) %>% summarise(Min = min(Freq,na.rm = TRUE),
Q1 = quantile(Freq,probs = .25,na.rm = TRUE),
Median = median(Freq, na.rm = TRUE),
Q3 = quantile(Freq,probs = .75,na.rm = TRUE),
Max = max(Freq,na.rm = TRUE),
Mean = mean(Freq, na.rm = TRUE),
SD = sd(Freq, na.rm = TRUE),
n = n(),
Missing = sum(is.na(Freq))) -> table1
knitr::kable(table1)

| sex | Min | Q1 | Median | Q3 | Max | Mean | SD | n | Missing |
|---|---|---|---|---|---|---|---|---|---|
| F | 0.0013 | 0.002225 | 0.0137 | 0.05785 | 0.7739 | 0.10002 | 0.2390659 | 10 | 0 |
| M | 0.0016 | 0.003175 | 0.0171 | 0.06650 | 0.7101 | 0.10000 | 0.2184915 | 10 | 0 |
\[H_0: \text{there is no association between sex and the place chosen for suicide.}\]
\[H_A: \text{there is an association between sex and the place chosen for suicide.}\]
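For reference, Pearson's chi-squared test compares the observed counts O_ij with the counts E_ij expected under H0, where R_i is the total for sex i, C_j the total for place j and N the number of suicides in gun2. As a worked check using the totals from the outputs in this report (8689 female suicides, 45415 suicides at home, 63175 in total), the female-at-home cell reproduces the expected count printed below and contributes about 36.5 to the statistic.

\[E_{ij} = \frac{R_i \, C_j}{N}, \qquad \chi^2 = \sum_{i=1}^{2}\sum_{j=1}^{10} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad df = (2-1)(10-1) = 9\]
\[E_{F,\,\text{Home}} = \frac{8689 \times 45415}{63175} \approx 6246.3, \qquad \frac{(6724 - 6246.3)^2}{6246.3} \approx 36.5\]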
##
## Farm Home Industrial/construction Other specified
## F 51.98958 6246.315 21.31848 1104.572
## M 326.01042 39168.685 133.68152 6926.428
##
## Other unspecified Residential institution School/instiution Sports
## F 656.6092 17.19232 40.57388 13.47878
## M 4117.3908 107.80768 254.42612 84.52122
##
## Street Trade/service area
## F 299.9717 236.979
## M 1881.0283 1486.021
##
## Pearson's Chi-squared test
##
## data: table(gun2$sex, gun2$place)
## X-squared = 170.16, df = 9, p-value < 2.2e-16
## [1] 16.91898
## [1] 5.714766e-32
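The Pearson chi-squared test gives χ² = 170.16 with 9 degrees of freedom, far above the 5% critical value of 16.92, and a p-value below 2.2 × 10⁻¹⁶ (well under 0.001). We therefore reject H0: there is a statistically significant association between sex and the place where suicide occurs. The proportions suggest that females are more likely than males to die by suicide at home: 77.39% of female gun suicides occurred at home compared with 71.01% of male gun suicides, and the observed number of female suicides at home (6724) exceeds the number expected under H0 (about 6246).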
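The code behind the test output is not shown. A minimal sketch of the same steps follows, assuming the test was run as chisq.test(table(gun2$sex, gun2$place)), as the "data:" line suggests, and that the 5% critical value was obtained with qchisq; those calls are assumptions rather than the report's confirmed code. To stay self-contained, the sketch uses the counts recovered from the summary and expected-count outputs instead of reading guns.csv.

```r
# Suicide counts by sex and place, recovered from the outputs in this report;
# on the real data this is table(gun2$sex, gun2$place)
counts <- matrix(
  c(33, 6724, 12, 849, 600, 16, 30, 11, 209, 205,
    345, 38691, 143, 7182, 4174, 109, 265, 87, 1972, 1518),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("F", "M"),
                  c("Farm", "Home", "Industrial/construction", "Other specified",
                    "Other unspecified", "Residential institution",
                    "School/instiution", "Sports", "Street", "Trade/service area"))
)

# Pearson's chi-squared test of independence between sex and place
test <- chisq.test(counts)
print(test$expected)    # expected counts under H0: row total * column total / N
print(test$statistic)   # X-squared
print(test$parameter)   # df = (2 - 1) * (10 - 1) = 9

# Decision at the 5% level: reject H0 if X-squared exceeds the critical value
critical <- qchisq(0.95, df = test$parameter)
print(test$statistic > critical)
print(test$p.value)
```

All expected counts are above 5 (the smallest is about 13.5), so the chi-squared approximation is appropriate here and chisq.test does not warn.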
Limitation: The place variable includes the vague categories 'Other specified' and 'Other unspecified'. More detailed place categories would improve the accuracy of the result.
Strength: The sample size is large (63,175 suicides), and the data were collected from different sources, which increases the objectivity of the data.
Future direction: These results alone cannot explain why females are more likely to die by suicide at home. Biological factors might play a role, but the evidence is not clear. Further analysis should involve more factors, such as family status, whether the victim had children, and family violence.