# load data
library(dplyr)
# The cases are limited to 2010 statistics for males only
# GitHub repo: https://github.com/robertwelk/DATA606.git
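# fileEncoding = 'UTF-8-BOM' drops the byte-order mark that otherwise turns the
# first column name into 'ï..country'; stringsAsFactors = TRUE keeps age a factor
dat <- read.csv('DATA606_project.csv', fileEncoding = 'UTF-8-BOM',
                stringsAsFactors = TRUE) %>%
  as_tibble() %>%
  filter(year == 2010, sex == 'male') %>%
  select(country, year, age,
         suicide.rate = suicides.100k.pop, gdp = gdp_per_capita....)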
head(dat)
Was the suicide rate for males in 2010 dependent on age group and GDP per capita?
The cases are the age groups of males for each country in the study. There are 528 cases (88 countries × 6 age groups).
This dataset was compiled by Kaggle from other sources, including the United Nations Development Programme, the World Bank, and the World Health Organization.
This is an observational study.
The response variable is the suicide rate per 100,000 people and is quantitative.
Quantitative independent variable: GDP per capita
Qualitative independent variable: age group
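With one quantitative and one qualitative predictor, the research question could be framed as a multiple linear regression. The sketch below is only an assumption about how the later analysis might look, not part of this proposal. It uses a small made-up data frame (`toy`, a hypothetical name) that borrows the column names of `dat` so it runs on its own.

```r
# Hypothetical toy data with the same column names as `dat`.
# The values are invented for illustration and are NOT from the dataset.
toy <- data.frame(
  age = factor(rep(c("15-24 years", "35-54 years", "75+ years"), times = 4)),
  gdp = rep(c(2000, 10000, 30000, 60000), each = 3),
  suicide.rate = c(5, 12, 20, 7, 15, 26, 6, 14, 30, 8, 13, 24)
)

# Suicide rate modelled on a categorical predictor (age) and a numeric one (gdp)
fit <- lm(suicide.rate ~ age + gdp, data = toy)

# Intercept, one coefficient per non-reference age level, and the gdp slope
coef(fit)
```

Running the same formula on `dat` would give one coefficient for each age group other than the reference level, plus a single slope for `gdp`.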
Provide summary statistics for each of the variables. Also include appropriate visualizations related to your research question (e.g. scatter plot, boxplots, etc). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.
# summary stats for each variable
library(ggplot2)
summary(dat$age)
## 15-24 years 25-34 years 35-54 years  5-14 years 55-74 years   75+ years
##          88          88          88          88          88          88
summary(dat$gdp)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##     991    7008   13818   23857   36326  111328
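# variation in gdp by country
# gdp repeats across the six age-group rows of each country, so keep one row per
# country; otherwise the stacked bars would show six times the actual value
dat %>% distinct(country, gdp) %>%
  ggplot(aes(x = country, y = gdp)) + geom_col() +
  coord_flip()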
#distribution of suicide rate by age
dat %>% ggplot(aes(x=age,y=suicide.rate)) + geom_boxplot()
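The `summary(dat$age)` output lists "5-14 years" after "35-54 years" because factor levels are sorted alphabetically, so the boxplot's x axis is not in age order. A minimal sketch of one way to put the levels in chronological order, shown on a short stand-alone vector rather than on `dat` (`age_levels` and `age` are illustrative names):

```r
# Age-group labels as they appear in the summary output, in chronological order
age_levels <- c("5-14 years", "15-24 years", "25-34 years",
                "35-54 years", "55-74 years", "75+ years")

# Stand-alone example vector; in the report this would be dat$age
age <- factor(c("15-24 years", "5-14 years", "75+ years", "35-54 years"),
              levels = age_levels)

# Levels now follow age order instead of alphabetical order
levels(age)
```

Applied to the data, the same idea would be `mutate(age = factor(age, levels = age_levels))` before the call to `ggplot()`.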
# suicide rate as a function of gdp - not much of a relationship
dat %>% ggplot(aes(x=gdp,y=suicide.rate)) + geom_point()
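The visual impression of a weak relationship could be backed by a correlation coefficient. Since the mean of `gdp` (23857) is well above its median (13818), the variable is right-skewed, so a log scale or a rank-based coefficient may be worth checking as well. The sketch below uses invented values in a hypothetical `toy` data frame, not the project data.

```r
# Hypothetical toy data: values are invented for illustration only
toy <- data.frame(
  gdp = c(1000, 5000, 14000, 36000, 111000),
  suicide.rate = c(12, 20, 9, 15, 11)
)

# Pearson correlation on the raw and on the log10 GDP scale
r_raw <- cor(toy$gdp, toy$suicide.rate)
r_log <- cor(log10(toy$gdp), toy$suicide.rate)

# Spearman correlation uses ranks, so it is unaffected by the log transform
rho_raw <- cor(toy$gdp, toy$suicide.rate, method = "spearman")
rho_log <- cor(log10(toy$gdp), toy$suicide.rate, method = "spearman")
```

On the real data, `cor.test(dat$gdp, dat$suicide.rate)` would additionally report a confidence interval and p-value.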