Data Overview

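The data come from Heuer (1979) and record suicides in West Germany, classified by age, sex and method. Each row is one cell of that classification rather than one person. X is the row number, Freq is the number of suicides in that cell, sex is a factor indicating male or female, method is the method of suicide, age is the age class, age.group is the age classified into 5 groups, and method2 is a coarser grouping of method (for example, cookgas and toxicgas both become gas).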

Question:

My question was to observe the relationships between people's ages and the method of suicide, which method of suicide was most common for each gender, and which age group committed the most suicides.

Looking at the data, the largest count in a single row is 1381 suicides and the median count per row is 59. The youngest age class recorded is 10 and the oldest is 90. This is really tragic, considering that West Germany was allied with the US, UK and France at the time; one might expect people in West Germany to be considerably happier than those in the East.

## Get an overview of the data set 
head(sucide)
##   X Freq  sex   method age age.group method2
## 1 1    4 male   poison  10     10-20  poison
## 2 2    0 male  cookgas  10     10-20     gas
## 3 3    0 male toxicgas  10     10-20     gas
## 4 4  247 male     hang  10     10-20    hang
## 5 5    1 male    drown  10     10-20   drown
## 6 6   17 male      gun  10     10-20     gun

Each row is one combination of sex, age class and method. For example, row 1 (X = 1) records 4 males in the age-10 class who died by poison, and row 4 (X = 4) records 247 males in the same age class who died by hanging.
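Because each row is a cell of a table, counting rows and counting suicides are different things. A minimal base-R sketch, assuming a data frame with the Freq, sex and method columns shown by head() above; the helper name summarise_cells is hypothetical:

```r
# Minimal sketch: each row is one cell (sex x age class x method),
# so Freq must be summed to count suicides. Assumes columns Freq, sex
# and method, like `sucide` above; the helper name is hypothetical.
summarise_cells <- function(df) {
  list(
    n_cells = nrow(df),                              # rows = table cells, not people
    total   = sum(df$Freq),                          # suicides actually recorded
    table   = xtabs(Freq ~ sex + method, data = df)  # counts summed over all ages
  )
}
```

Calling `summarise_cells(sucide)` would return the sex-by-method table; given the summary further down (306 rows with a mean Freq of 173.8), the total should come out at roughly 53,000.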

Line Graph of the Row Number and the Frequency

## Line graph of each row's suicide count, in row order (X is the row number)
library(ggplot2)
ggplot(sucide, aes(x = X, y = Freq)) + geom_line() +
  ggtitle("Suicide frequency for each row of the table") + labs(x = "Row (X)", y = "Frequency")

Here I plotted each row's frequency against its row number as an overview of the data. The highest counts seem to fall roughly between rows 0 and 100.

Bar Graph of Suicide Method and Frequency By Gender

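library(ggplot2)
## Sum Freq over all ages first so each bar is a total rather than overlapping per-age bars
sucide_totals <- aggregate(Freq ~ method + sex, data = sucide, FUN = sum)
ggplot(sucide_totals, aes(x = method, y = Freq, fill = sex)) + geom_col(position = position_dodge(0.9)) + labs(y = "Frequency of Suicide", x = "Method of Suicide")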

From this bar graph, the most frequent method of suicide is hanging. The data also show far more suicides among males than among females. For females, however, the most common method appears to be poisoning.
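To check the bar graph's reading with numbers rather than bar heights, here is a minimal sketch that totals Freq per method and sex and picks the largest method for each sex. It assumes the Freq, sex and method columns of `sucide`; method_totals_by_sex is a hypothetical helper:

```r
# Minimal sketch: total suicides per method and sex (summed over ages), and
# the method with the largest total for each sex. Assumes columns Freq, sex
# and method as in `sucide`; the helper name is hypothetical.
method_totals_by_sex <- function(df) {
  totals <- aggregate(Freq ~ method + sex, data = df, FUN = sum)
  # Split by sex and keep the row with the largest total in each part
  top <- do.call(rbind, lapply(split(totals, totals$sex),
                               function(d) d[which.max(d$Freq), ]))
  list(totals = totals, top = top)
}
```

`method_totals_by_sex(sucide)$top` would then list one row per sex; if two methods tie, which.max() keeps the first.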

Scatterplot of Age Group and Frequency

## Create a new data frame that keeps only the columns needed (dropping X and method2), and shorten the labels cookgas to cook and toxicgas to toxin so the graph is readable

suicide_sub <- subset(sucide, select = c(Freq, sex, method, age, age.group))
suicide_sub$method[suicide_sub$method == "cookgas"] <- "cook"
suicide_sub$method[suicide_sub$method == "toxicgas"] <- "toxin"

summary(suicide_sub)
##       Freq             sex               method               age    
##  Min.   :   0.00   Length:306         Length:306         Min.   :10  
##  1st Qu.:  10.25   Class :character   Class :character   1st Qu.:30  
##  Median :  59.00   Mode  :character   Mode  :character   Median :50  
##  Mean   : 173.80                                         Mean   :50  
##  3rd Qu.: 178.75                                         3rd Qu.:70  
##  Max.   :1381.00                                         Max.   :90  
##   age.group        
##  Length:306        
##  Class :character  
##  Mode  :character  
##                    
##                    
## 
ggplot(suicide_sub, aes(x = age.group, y = Freq)) +
  geom_point(aes(colour = age), size = 2) +
  ggtitle("Age group Analysis with the Frequency of Suicide") +
  labs(x = "Age-Group", y = "Frequency of Suicide")

Here I plotted each age group against the suicide counts of its rows. The highest individual counts, some over 1000, are in the 40-50 age group, which is pretty shocking. The lowest counts are in the youngest group, 10 to 20, which would make sense given that young people still have most of their life ahead of them compared to older people. Note that each point is a single row (one sex, age class and method), so this plot compares individual counts rather than total suicides per age group.
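Since the scatter plot shows single rows, a total per age group answers the "which group committed the most" question more directly. A minimal sketch, assuming the Freq, age and age.group columns of `suicide_sub` and a hypothetical helper name; it also divides each total by the number of age classes in the group, because the labels suggest the groups span different ranges (70-90 versus 10-20):

```r
# Minimal sketch: total suicides per age group, plus the average per age class.
# Assumes columns Freq, age and age.group as in `suicide_sub`;
# the helper name is hypothetical.
age_group_totals <- function(df) {
  total     <- tapply(df$Freq, df$age.group, sum)
  # Number of distinct age classes recorded in each group
  n_classes <- tapply(df$age, df$age.group, function(a) length(unique(a)))
  data.frame(age.group = names(total),
             total     = as.numeric(total),
             n_classes = as.numeric(n_classes),
             per_class = as.numeric(total) / as.numeric(n_classes))
}
```

Comparing the total and per_class columns of `age_group_totals(suicide_sub)` shows whether a group ranks high only because it covers more ages.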

Boxplot of All Records

## colour is not mapped to age: age varies within each box, so ggplot2 would drop that mapping anyway
ggplot(suicide_sub, aes(age, method)) + geom_boxplot() + ggtitle("Boxplot of the age and the method of suicide") + labs(x = "Age of people", y = "Method of suicide")

When I created the box plot, the summary values for age were the same for every method: the median age was 50 for all methods, and the lower and upper quartiles were identical as well. At first this made me suspect the data set was false or manipulated, and it led me to graph the box plot of another subset of the data to see what happened. The identical values are, however, a consequence of how the data is stored: every method has exactly one row for each sex and age class from 10 to 90, so each box summarises the same list of ages. The box plot also ignores Freq, so it shows which ages appear in the table rather than the ages of the people who died.
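One way to make the box plot reflect the suicide counts is to repeat each row Freq times, giving one row per recorded death. A minimal base-R sketch, assuming the Freq, age and method columns of `suicide_sub`; expand_counts and median_age_by_method are hypothetical helpers:

```r
# Minimal sketch: repeat each row Freq times so there is one row per recorded
# death, then compare median ages with and without that weighting.
# Assumes columns Freq, age and method as in `suicide_sub`; helper names are hypothetical.
expand_counts <- function(df) {
  # rep() repeats each row index Freq times; rows with Freq == 0 drop out
  df[rep(seq_len(nrow(df)), times = df$Freq), , drop = FALSE]
}

median_age_by_method <- function(df) {
  deaths <- expand_counts(df)
  list(
    per_row   = tapply(df$age, df$method, median),          # one value per table cell
    per_death = tapply(deaths$age, deaths$method, median)   # weighted by Freq
  )
}
```

The expanded data can be plotted the same way as before, e.g. `ggplot(expand_counts(suicide_sub), aes(age, method)) + geom_boxplot()`, and it has one row per death (roughly 53,000 rows here) instead of 306.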

## I created another subset by selecting a range of rows, since the values in the original boxplot felt off to me

## A subset of rows 77 to 185
Sucide_subset2 <- suicide_sub[77:185, ]
head(Sucide_subset2)
##    Freq  sex method age age.group
## 77   87 male  drown  50     40-50
## 78  229 male    gun  50     40-50
## 79   62 male  knife  50     40-50
## 80   63 male   jump  50     40-50
## 81  146 male  other  50     40-50
## 82  502 male poison  55     55-65

I created a smaller data frame so that I could graph a boxplot with more variability in the data.

Boxplot of a Subset of the Rows

ggplot(Sucide_subset2, aes(age, method)) + geom_boxplot() + ggtitle("Boxplot for a subset of the people") + labs(x = "Age", y = "Method")

Here the data seem a bit more varied. Because I took a contiguous block of rows, the subset is not a random sample: the rows are ordered by sex, then age, then method, so rows 77 to 185 contain mostly males aged 50 and over, followed by a smaller block of females aged 10 to 25. In this subset, older people appear to resort to some form of gas, such as cooking gas or toxic gas, or to poison. This may well be inaccurate, since the differences between methods here largely reflect which rows happened to be included.
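A contiguous range of rows picks up whole blocks of particular sexes and ages (the head() output shows row 77 as a male aged 50 and row 82 as a male aged 55). A minimal sketch to check what a subset actually contains before reading its boxplot, assuming the sex and age columns of `Sucide_subset2`; describe_rows is a hypothetical helper:

```r
# Minimal sketch: number of rows and age range per sex in a subset.
# Assumes columns sex and age as in `Sucide_subset2`; the helper name is hypothetical.
describe_rows <- function(df) {
  n_rows  <- table(df$sex)
  min_age <- tapply(df$age, df$sex, min)
  max_age <- tapply(df$age, df$sex, max)
  data.frame(sex     = names(n_rows),
             n_rows  = as.integer(n_rows),
             min_age = as.numeric(min_age[names(n_rows)]),
             max_age = as.numeric(max_age[names(n_rows)]))
}
```

If a smaller data frame is still wanted, sampling rows at random, e.g. `suicide_sub[sample(nrow(suicide_sub), 109), ]`, avoids tying the subset to the row order, although it still ignores Freq.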

Conclusion:

The data is really interesting with regard to suicide. Some key observations I noticed: hanging was by far the most frequent method among males, and there were far more suicides among men than among women. The most popular method for women was poison, whereas the least popular method of suicide was jumping from a height. The scatter plot indicated that the largest counts in West Germany were in the 40 to 50 age group, while the youngest age group had fewer suicides. When creating the box plot, I noticed that age had the same summary values for every method of suicide, because every method appears once for each sex and age class. Taking a subset of the rows made the data look more spread out; in that subset, cooking gas, toxic gas and poison appeared to be associated with older people, although this likely reflects which rows were included rather than a real pattern.

Final Thoughts:

At first I felt like this data was not legitimate, because when computing summaries I kept getting the same numbers. For instance, I got the same count for every method of suicide, and in the box plot of age by method the median, quartiles, minimum and maximum were all the same values. This turns out to be expected: the 306 rows are not 306 individual suicides but the cells of a table, and adding up Freq gives roughly 306 × 173.8 ≈ 53,000 suicides. Even when graphing the line graph, the data seemed to look the same no matter how I changed it around. Regardless, it was definitely interesting looking at this data in R.