I have chosen the CSV file “Suicide.csv” to work through the problems below. It contains suicide counts for West Germany, with columns for sex, age, age group, method of suicide, a second method variable (method2), and the frequency of suicides (Freq).
suicide <- read.csv(file = "Suicide.csv", header = TRUE, sep = ",", stringsAsFactors = FALSE)
suicide <- suicide[-1]  # drop the first column
head(suicide)
##   Freq  sex   method age age.group method2
## 1    4 male   poison  10    20-Oct  poison
## 2    0 male  cookgas  10    20-Oct     gas
## 3    0 male toxicgas  10    20-Oct     gas
## 4  247 male     hang  10    20-Oct    hang
## 5    1 male    drown  10    20-Oct   drown
## 6   17 male      gun  10    20-Oct     gun
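The `age.group` value `20-Oct` in the output above looks like what a spreadsheet produces when it reads the range `10-20` as a date. That is an assumption about how the file was prepared, not something the output confirms. Under that assumption, a minimal sketch of mapping the label back could look like this; the toy vector and the lookup `fix_labels` are hypothetical and only illustrate the idea.

```r
# Toy stand-in for suicide$age.group. Only "20-Oct" appears in the output
# above; the value "25-35" is made up for illustration.
age_group <- c("20-Oct", "20-Oct", "25-35")

# Hypothetical lookup: damaged label -> label assumed to be intended.
fix_labels <- c("20-Oct" = "10-20")

# Replace only the entries that appear in the lookup, leave the rest alone.
damaged <- age_group %in% names(fix_labels)
age_group[damaged] <- unname(fix_labels[age_group[damaged]])
age_group
```

Opening the raw CSV in a text editor rather than a spreadsheet would show whether the file itself already contains `20-Oct`.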
Problem 1
Summary of data. Display mean and median for at least 2 factors.
summary(suicide)
##       Freq             sex               method               age
##  Min.   :   0.00   Length:306         Length:306         Min.   :10
##  1st Qu.:  10.25   Class :character   Class :character   1st Qu.:30
##  Median :  59.00   Mode  :character   Mode  :character   Median :50
##  Mean   : 173.80                                         Mean   :50
##  3rd Qu.: 178.75                                         3rd Qu.:70
##  Max.   :1381.00                                         Max.   :90
##   age.group           method2
##  Length:306         Length:306
##  Class :character   Class :character
##  Mode  :character   Mode  :character
##
##
##
# matrix() fills column by column: the two means first, then the two medians
mm = matrix(c(mean(suicide$Freq),
              mean(suicide$age),
              median(suicide$Freq),
              median(suicide$age)), ncol = 2)
colnames(mm) = c("mean", "median")
rownames(mm) = c("Frequency", "Age")
mm = as.table(mm)
mm
##               mean  median
## Frequency 173.7974 59.0000
## Age        50.0000 50.0000
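The table above depends on listing the four values in the right order for `matrix()`. As an alternative sketch, `sapply()` can compute both statistics per column and carry the labels along, so nothing has to be ordered by hand. The toy data frame holds only the six rows shown by `head()` above, so its numbers differ from the full-data table; `toy` and `stat_table` are illustrative names.

```r
# Freq and age from the six rows printed by head(suicide).
toy <- data.frame(Freq = c(4, 0, 0, 247, 1, 17),
                  age  = c(10, 10, 10, 10, 10, 10))

# sapply() returns one column per variable; t() turns variables into rows.
stat_table <- t(sapply(toy[c("Freq", "age")],
                       function(x) c(mean = mean(x), median = median(x))))
stat_table
```

Adding another statistic, such as `sd = sd(x)`, only needs one more entry in the inner `c()`.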
Problem 2
Create new data frame (subset of columns and rows). Rename it.
minisuic = subset(suicide,
                  Freq > 10 & sex == "female" & method2 == "gas" & age < 90,
                  select = c(Freq, sex, method2, age))
minisuic
##     Freq    sex method2 age
## 165   11 female     gas  15
## 174   20 female     gas  20
## 183   27 female     gas  25
## 192   29 female     gas  30
## 201   44 female     gas  35
## 210   24 female     gas  40
## 219   24 female     gas  45
## 228   26 female     gas  50
## 237   14 female     gas  55
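The same row and column selection can also be written with bracket indexing, which is sometimes preferred inside functions because `subset()` evaluates column names non-standardly. The sketch below uses a small made-up data frame (`toy`) with the same column names, so the values are illustrative only.

```r
# Made-up rows with the same columns as the suicide data.
toy <- data.frame(
  Freq    = c(4, 11, 20, 5, 27),
  sex     = c("male", "female", "female", "female", "female"),
  method2 = c("poison", "gas", "gas", "gas", "hang"),
  age     = c(10, 15, 20, 90, 25),
  stringsAsFactors = FALSE
)

# Logical vector marking the rows that satisfy every condition.
keep <- toy$Freq > 10 & toy$sex == "female" &
  toy$method2 == "gas" & toy$age < 90

# Rows selected by the logical vector, columns selected by name.
mini_toy <- toy[keep, c("Freq", "sex", "method2", "age")]
mini_toy
```

One difference to keep in mind: if a condition evaluates to `NA`, `subset()` drops that row, while logical bracket indexing returns a row of `NA`s for it.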
Problem 3
Create new column names for new data frame.
colnames(minisuic) = c("Attempts","M/F","How","Old")
head(minisuic)
##     Attempts    M/F How Old
## 165       11 female gas  15
## 174       20 female gas  20
## 183       27 female gas  25
## 192       29 female gas  30
## 201       44 female gas  35
## 210       24 female gas  40
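Assigning all names at once works, but it relies on the column order, and a name such as `M/F` is not a syntactic R name, so it has to be wrapped in backticks later on. A possible alternative, sketched on a toy data frame, renames by old name through a lookup vector and uses `Sex` in place of `M/F`; both the lookup `new_names` and the choice of `Sex` are assumptions for illustration.

```r
# Two rows in the shape of the subset from Problem 2.
toy <- data.frame(Freq = c(11, 20),
                  sex = c("female", "female"),
                  method2 = c("gas", "gas"),
                  age = c(15, 20),
                  stringsAsFactors = FALSE)

# Hypothetical lookup: old name -> new name.
new_names <- c(Freq = "Attempts", sex = "Sex", method2 = "How", age = "Old")

# Look each current name up, so the result does not depend on column order.
names(toy) <- unname(new_names[names(toy)])
names(toy)

# A syntactic name can be used with $ without backticks.
toy$Sex
```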
Problem 4
Summary of new data frame. Print mean and median for same factors. Compare.
summary(minisuic)
##     Attempts         M/F                How                 Old
##  Min.   :11.00   Length:9           Length:9           Min.   :15
##  1st Qu.:20.00   Class :character   Class :character   1st Qu.:25
##  Median :24.00   Mode  :character   Mode  :character   Median :35
##  Mean   :24.33                                         Mean   :35
##  3rd Qu.:27.00                                         3rd Qu.:45
##  Max.   :44.00                                         Max.   :55
# matrix() fills column by column: the four means first, then the four medians
mm2 = matrix(c(mean(suicide$Freq),
               mean(suicide$age),
               mean(minisuic$Attempts),
               mean(minisuic$Old),
               median(suicide$Freq),
               median(suicide$age),
               median(minisuic$Attempts),
               median(minisuic$Old)), ncol = 2)
colnames(mm2) = c("mean", "median")
rownames(mm2) = c("Frequency", "Age", "Attempts", "Old")
mm2 = as.table(mm2)
mm2
##                mean   median
## Frequency 173.79739 59.00000
## Age        50.00000 50.00000
## Attempts   24.33333 24.00000
## Old        35.00000 35.00000
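The prompt asks for a comparison, so one way to put the two sets of frequency statistics side by side is sketched here. The full-data values are copied from the table above rather than recomputed, and the subset is rebuilt from the nine rows printed in Problem 2; the object names are illustrative.

```r
# Subset rebuilt from the nine rows printed in Problem 2.
mini <- data.frame(Attempts = c(11, 20, 27, 29, 44, 24, 24, 26, 14),
                   Old = seq(15, 55, by = 5))

# Full-data values copied from the printed table, not recomputed here.
full_freq <- c(mean = 173.7974, median = 59)
sub_freq  <- c(mean = mean(mini$Attempts), median = median(mini$Attempts))

# One row per data set, one column per statistic.
comparison <- rbind(full = full_freq, subset = sub_freq)
comparison

# Ratio of mean to median as a rough indicator of skew.
comparison[, "mean"] / comparison[, "median"]
```

On these numbers the subset's mean frequency (about 24.3) is far below that of the full data (about 173.8). In the subset the mean and median almost coincide, whereas in the full data the mean is roughly three times the median, which suggests that a few very large counts pull the mean up.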
Problem 5
Rename a value in a column. Do this three times.
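minisuic$`M/F`[minisuic$`M/F` == "female"] = "F"
# Assigning a string to a numeric column converts the whole column to character.
# The label ">35" is applied to ages 35 and above, so it includes 35 itself.
minisuic$Old[minisuic$Old >= 35] = ">35"
minisuic$Attempts[minisuic$Attempts >= 20 & minisuic$Attempts < 30] = "twenties"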
minisuic
##     Attempts M/F How Old
## 165       11   F gas  15
## 174 twenties   F gas  20
## 183 twenties   F gas  25
## 192 twenties   F gas  30
## 201       44   F gas >35
## 210 twenties   F gas >35
## 219 twenties   F gas >35
## 228 twenties   F gas >35
## 237       14   F gas >35
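Overwriting values in place, as above, turns the numeric columns `Old` and `Attempts` into character columns, so later calls such as `mean()` no longer work on them. A possible alternative, sketched here, writes the labels into new columns and leaves the numbers untouched. The data frame is rebuilt from the nine rows printed in Problem 2; the column names `OldGroup` and `AttemptsGroup` and the label `"35+"` are assumptions for illustration.

```r
# Numeric columns rebuilt from the nine rows printed in Problem 2.
mini <- data.frame(Attempts = c(11, 20, 27, 29, 44, 24, 24, 26, 14),
                   Old = seq(15, 55, by = 5))

# Labels go into new columns; other values are kept as text copies.
mini$OldGroup <- ifelse(mini$Old >= 35, "35+", as.character(mini$Old))
mini$AttemptsGroup <- ifelse(mini$Attempts >= 20 & mini$Attempts < 30,
                             "twenties", as.character(mini$Attempts))
mini

# The original columns are still numeric and can be summarised.
is.numeric(mini$Old)
mean(mini$Attempts)
```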