The data comes from Heuer (1979) and covers suicides in West Germany, classified by age, sex and method. The variables are: X, the subject (row) index; Freq, the frequency of suicides; sex, a factor indicating male or female; method, the method of suicide; age, the age; and age.group, the age classified into 5 groups.
Question: I wanted to observe the relationships between people's ages and the method of suicide they used, which method was most common for each gender, and which age group committed the most suicides.
Looking at the data, I can see that the maximum frequency is 1381 deaths and the median frequency is 59. The youngest age recorded in the data is 10 and the oldest is 90, which is really tragic, considering that West Germany was aligned with the US, UK and France in that period, which leads most people to believe that people in West Germany would be considerably happier than those in the East.
## Get an overview of the data set
head(sucide)
## X Freq sex method age age.group method2
## 1 1 4 male poison 10 10-20 poison
## 2 2 0 male cookgas 10 10-20 gas
## 3 3 0 male toxicgas 10 10-20 gas
## 4 4 247 male hang 10 10-20 hang
## 5 5 1 male drown 10 10-20 drown
## 6 6 17 male gun 10 10-20 gun
Looking at the data, the row with X = 1 shows 4 males aged 10 who killed themselves with poison, and further down, the row with X = 4 shows 247 males aged 10 who committed suicide by hanging.
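Each row is a count for one combination of sex, method and age rather than a single person. The small sketch below shows one way to read that. It rebuilds by hand only the six rows printed above (the name `head_rows` is mine, not part of the original analysis) and adds up `Freq` to get the number of deaths those rows represent.

```r
# Rebuild the six rows shown by head(sucide); head_rows is an illustrative name
head_rows <- data.frame(
  X      = 1:6,
  Freq   = c(4, 0, 0, 247, 1, 17),
  sex    = "male",
  method = c("poison", "cookgas", "toxicgas", "hang", "drown", "gun"),
  age    = 10,
  stringsAsFactors = FALSE
)

# Number of rows (cells) versus number of deaths those cells represent
n_cells  <- nrow(head_rows)
n_deaths <- sum(head_rows$Freq)

# The method with the largest count among these rows
top_method <- head_rows$method[which.max(head_rows$Freq)]

print(c(cells = n_cells, deaths = n_deaths))
print(top_method)
```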
## Create a line plot of the subject and the frequency
library(ggplot2)
# Plot Freq against the subject index X straight from the data frame
ggplot(sucide, aes(x = X, y = Freq)) + geom_line() + ggtitle("Subjects and the frequencies of suicide in that category") + labs(x = "Subject", y = "Frequency")
Here I displayed an overview of the data; the highest frequencies of suicide appear for subjects between 0 and 100.
library(ggplot2)
ggplot(sucide, aes(x = method, y = Freq, fill = sex)) + geom_bar(stat = "identity", position = position_dodge(0.9)) + labs(y = "Frequency of Suicide", x = "Method of Suicide")
From this bar graph we can see that the most frequent method of suicide is hanging. The data shows far more suicides among males than among females. For females, however, the most popular method is poisoning.
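One caveat with the bar graph: as far as I understand, with `stat = "identity"` and `position_dodge()`, several rows with the same method and sex are drawn on top of each other rather than added together, so a bar shows the largest single count, not the total. A minimal sketch of summing `Freq` first is shown below. The data frame `toy` and its numbers are made up purely for illustration and are not taken from the Heuer data.

```r
# Made-up toy counts for illustration only; these are NOT values from the Heuer data
toy <- data.frame(
  sex    = c("male", "male", "male", "male", "female", "female", "female", "female"),
  method = c("hang", "hang", "poison", "poison", "hang", "hang", "poison", "poison"),
  age    = c(30, 60, 30, 60, 30, 60, 30, 60),
  Freq   = c(50, 70, 20, 10, 15, 25, 40, 30)
)

# Sum Freq over all ages for every method and sex combination
totals <- aggregate(Freq ~ method + sex, data = toy, FUN = sum)
print(totals)

# The same totals as a method-by-sex table
totals_table <- xtabs(Freq ~ method + sex, data = toy)
print(totals_table)
```

A data frame like `totals` could then be passed to the same `ggplot()` call, so that each bar shows a sum over all ages.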
## Create a new data frame that keeps only the columns I need, since some of the columns in the data frame seem meaningless to me. I also changed the cell values from cookgas to cook and from toxicgas to toxin to make the graph readable
suicide_sub <- subset(sucide, select=c(Freq,sex,method,age,age.group))
suicide_sub[suicide_sub =="cookgas"] <- "cook"
suicide_sub[suicide_sub == "toxicgas"] <- "toxin"
summary(suicide_sub)
## Freq sex method age
## Min. : 0.00 Length:306 Length:306 Min. :10
## 1st Qu.: 10.25 Class :character Class :character 1st Qu.:30
## Median : 59.00 Mode :character Mode :character Median :50
## Mean : 173.80 Mean :50
## 3rd Qu.: 178.75 3rd Qu.:70
## Max. :1381.00 Max. :90
## age.group
## Length:306
## Class :character
## Mode :character
##
##
##
ggplot(suicide_sub, aes(x = age.group, y = Freq)) +
  geom_point(aes(colour = age), size = 2) +
  ggtitle("Age group Analysis with the Frequency of Suicide") +
  labs(x = "Age-Group", y = "Frequency of Suicide")
Here I plotted the age group against the suicide frequencies. According to the data, people in the 40 to 50 group are committing the most suicides in West Germany, with figures reaching over 1000, which is pretty shocking. The age group committing the fewest suicides is the young folks in the 10 to 20 group, which would make sense given that young people still have most of their life ahead of them compared to the older folks.
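The scatter plot shows one point per row, so to compare age groups as a whole it helps to add up `Freq` within each group. The sketch below does that with `tapply()`. It uses only the twelve rows that are printed in this report (the `head()` output and rows 77 to 82), under the illustrative name `shown`, so the sums are partial and are not the real group totals.

```r
# Only the rows printed in this report (head output and rows 77 to 82),
# so the sums below are partial and are not the real group totals
shown <- data.frame(
  Freq      = c(4, 0, 0, 247, 1, 17, 87, 229, 62, 63, 146, 502),
  age.group = c(rep("10-20", 6), rep("40-50", 5), "55-65"),
  stringsAsFactors = FALSE
)

# Add up Freq within each age group
group_totals <- tapply(shown$Freq, shown$age.group, sum)
print(group_totals)

# Age group with the largest total among these rows
print(names(which.max(group_totals)))
```

Running the same `tapply()` call on `suicide_sub` would give the totals for the full data set.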
ggplot(suicide_sub,aes(age,method)) + geom_boxplot(aes(colour=age)) + ggtitle("Boxplot of the age and the method of suicide")+ labs(x="Age of people", y = "Method of suicide")
I noticed that in the box plot the summary statistics of age were the same for every method: the median age for all methods was 50, and the lower and upper quartiles were the same as well. This led me to wonder whether I should make another subset of the data and graph its box plot to see what happens. It also led me to suspect that the data set was false or manipulated, because the values are all the same numbers. A likely explanation, though, is that every method has one row for each age and sex and the box plot ignores Freq, so it only summarizes the same grid of ages each time.
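To illustrate why identical medians can come out of a table of counts, here is a minimal sketch that compares an unweighted median of age with one where each row is repeated `Freq` times. The data frame `toy` and its numbers are made up for illustration only and are not values from the Heuer data; repeating rows is just one simple way to weight by `Freq`.

```r
# Made-up toy counts for illustration only; NOT values from the Heuer data.
# Every method has exactly one row per age, as in a count table.
toy <- data.frame(
  method = rep(c("hang", "poison"), each = 3),
  age    = rep(c(10, 50, 90), times = 2),
  Freq   = c(1, 2, 7, 7, 2, 1)
)

# Unweighted: each row counts once, so both methods see the ages 10, 50, 90
unweighted <- tapply(toy$age, toy$method, median)

# Weighted: repeat each row Freq times so every death counts once
expanded <- toy[rep(seq_len(nrow(toy)), times = toy$Freq), ]
weighted <- tapply(expanded$age, expanded$method, median)

print(unweighted)  # identical medians for both methods
print(weighted)    # medians now differ between methods
```

An expanded data frame like this could be given to the same `geom_boxplot()` call to draw box plots in which every death counts once.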
## I created another subset from a range of rows, since the original boxplot felt off to me
## A subset of rows 77 to 185
Sucide_subset2 <- suicide_sub[c(77:185),]
head(Sucide_subset2)
## Freq sex method age age.group
## 77 87 male drown 50 40-50
## 78 229 male gun 50 40-50
## 79 62 male knife 50 40-50
## 80 63 male jump 50 40-50
## 81 146 male other 50 40-50
## 82 502 male poison 55 55-65
I created a smaller, abridged data frame so that I could graph a box plot with more variability in the data.
ggplot(Sucide_subset2,aes(age,method)) + geom_boxplot(aes(color=age)) + ggtitle("Boxplot for a subset of the people")+ labs(x="Age", y= "Method")
Here the data seems a bit more varied. Because I only took a range of rows, the balance of males and females in this data frame may be skewed. The median age seems to be around 70, and it seems that the older people are, the more they tend to resort to cooking gas, toxic gas, or poison. This may be inaccurate, though, since I only took a subset of the rows.
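One way to check how balanced a subset is, and to avoid depending on row order, is to subset by a condition instead of by row position. The sketch below is only an illustration: the data frame `toy`, its numbers, and the age cutoff of 70 are made up and are not taken from the Heuer data or from the analysis above.

```r
# Made-up toy rows for illustration only; NOT values from the Heuer data
toy <- data.frame(
  sex    = c("male", "male", "male", "female", "female", "female"),
  method = c("hang", "gun", "poison", "hang", "gun", "poison"),
  age    = c(30, 70, 80, 30, 70, 80),
  Freq   = c(30, 12, 9, 14, 1, 22)
)

# Subsetting by row position: the sex balance depends on how the rows are ordered
by_position <- toy[2:4, ]
print(table(by_position$sex))

# Subsetting by condition: keep every row for ages 70 and over, for both sexes
by_condition <- subset(toy, age >= 70)
print(table(by_condition$sex))
```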
Conclusion:
The data is really interesting in regard to suicide. Some key observations: the most frequent method overall was hanging, used far more often by males than by females, while the most popular method for women was to ingest poison. The least popular method of suicide was jumping from heights. The scatter plot indicated that the age group with the most suicides in West Germany was 40 to 50, while the younger group had fewer suicides. When creating the box plot, I noticed that the data had the same statistical values for every method when comparing age to the method of suicide. After taking a subset of the rows, the data appeared more spread out; in that subset the top 3 methods among the older folks were cooking gas, toxic gas and poison, with a median age of around 70.
Final Thoughts: I felt like this data is not legit. Most of the time when I was computing on the data I seemed to get the same numbers; for instance, I got the same numbers when looking at suicide by method, and in the box plot above for age and method the median, the quartiles, and the low and high values were all the same, which doesn't really seem right for a data set of 306 records. Even when graphing the line plot, the data seemed to appear the same no matter how many times I tried to change it around. This may simply reflect how the table is laid out, with one row per age, sex and method, rather than a problem with the data itself. Regardless, it was definitely interesting looking at this data in R.