Assignment 2
Name: [Joseph Kehoe] ID: [920697315] Date: [10_11_2024]
For this assignment, you will use various summary statistics, or descriptions of central tendency, to describe and summarize data. Statistics of central tendency are another way to summarize and get information from your data. Along with different forms of visualization, they are often a good first step when working with data. We will be using the same LA crime data as we did last week.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
#load in the LA crimes data (1 point)
crime_stats <- read.csv("C:/Users/Joey/Downloads/LA_Crime_V2.csv")
#divide the data into male and female (1 point)
crime_male <- crime_stats %>%
  filter(Vict.Sex == "M")
crime_female <- crime_stats %>%
  filter(Vict.Sex == "F")
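As a side note, the same split can be sketched with base R's split(), which returns one data frame per value of the grouping column. The toy data frame below is invented for illustration; only the column names Vict.Sex and Vict.Age come from the assignment. Whether the real file contains values other than "M" and "F" is an assumption worth checking, since such rows would land in neither subset.

```r
# Toy data frame; the values are invented for illustration only.
toy <- data.frame(
  Vict.Sex = c("M", "F", "F", "M", "X"),
  Vict.Age = c(30, 25, 41, 52, 19)
)

# split() returns a named list with one data frame per value of Vict.Sex.
by_sex <- split(toy, toy$Vict.Sex)

names(by_sex)      # every distinct value found in the column
nrow(by_sex$M)     # number of rows where Vict.Sex == "M"
nrow(by_sex$F)     # number of rows where Vict.Sex == "F"
```

Looking at names(by_sex) is a quick way to see whether any rows fall outside the two subsets.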
PART 1: Mean, Median and Mode
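#first we will calculate the mean arithmetically. Using the sum() function, calculate the mean victim age for the whole data set, then the male and female subsets. (1 point)
sum(crime_stats$Vict.Age)
## [1] 22485
sum(crime_stats$Vict.Age) / length(crime_stats$Vict.Age) #mean = sum / number of observations
sum(crime_male$Vict.Age) / length(crime_male$Vict.Age)
sum(crime_female$Vict.Age) / length(crime_female$Vict.Age)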
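The arithmetic mean is the sum of the values divided by how many values there are. The sketch below works through that on a small invented vector and checks it against mean(). It also assumes, purely for illustration, that some ages could be missing (NA); whether the LA crime file has missing ages is not known here.

```r
# Toy ages, invented for illustration; one value is missing.
ages <- c(20, 30, 40, NA)

# Mean "by hand": sum of the observed values over the count of observed values.
manual_mean <- sum(ages, na.rm = TRUE) / sum(!is.na(ages))

# Should agree with the built-in function when it also drops NA.
all.equal(manual_mean, mean(ages, na.rm = TRUE))
```

Dividing by length(ages) instead would count the missing value in the denominator, which is why the count of non-missing values is used here.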
#now calculate the mean age for the three data sets using the mean() function in R (1 point)
mean_all <- mean(crime_stats$Vict.Age)
mean_female <- mean(crime_female$Vict.Age)
mean_male <- mean(crime_male$Vict.Age)
#calculate the median age for the three data sets using the median() function in R (1 point)
median_all <- median(crime_stats$Vict.Age)
median_female <- median(crime_female$Vict.Age)
median_male <- median(crime_male$Vict.Age)
#R doesn't have a function built in to calculate the mode, so we are going to have to make one
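Mode <- function(x) {
  #ux holds each distinct value once, in order of first appearance
  ux <- unique(x)
  #count how often each distinct value occurs and return the most frequent one
  ux[which.max(tabulate(match(x, ux)))]
}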
#after you have run the above lines, use the Mode() function we just made to calculate the mode for age in all three data sets (1 point)
mode_all <- Mode(crime_stats$Vict.Age)
mode_male <- Mode(crime_male$Vict.Age)
mode_female <- Mode(crime_female$Vict.Age)
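One thing to keep in mind about this Mode() helper is how it treats ties. The sketch below repeats the function so it runs on its own and applies it to an invented vector where two ages are equally common. Because which.max() returns the first maximum, the function reports only the tied value that appears first in the data.

```r
# Same helper as in the assignment, repeated so this block is self-contained.
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Toy ages, invented for illustration: 30 and 25 both appear twice.
ages <- c(30, 25, 25, 30, 41)

Mode(ages)    # reports 30, the tied value that appears first

# table() lists the count for every value, which makes a tie visible.
table(ages)
```

If the counts from table() show a tie, it may be worth reporting both values rather than relying on the single one that Mode() returns.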
1. For each dataset, how do the mean, median and mode differ? What does each description of central tendency tell you about the data? (1 point)
- The mean victim age is higher for men than for women, and the median is likewise higher for men. The mode for men is the same as the mode for the combined data set, and higher than the mode for women. The mean is the sum of every individual’s age divided by the number of individuals, so each age counts equally toward one representative value, and a few very high or very low ages can pull it up or down. The median is the middle age when everyone is lined up from youngest to oldest. The mode is the age that comes up most often.
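A small illustration of why the mean and median can disagree: the sketch below uses invented ages (not the LA data) and adds one very old victim to show that the mean shifts much more than the median does.

```r
# Toy ages, invented for illustration.
ages <- c(22, 25, 25, 31, 38)

# The same ages with one extreme value added.
ages_outlier <- c(ages, 95)

mean(ages)             # 28.2
median(ages)           # 25

mean(ages_outlier)     # rises to roughly 39.3
median(ages_outlier)   # 28, the midpoint of the two middle values 25 and 31
```

A mean that sits well above the median, as in the second case, is a hint that a few high values are stretching the distribution to the right.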
PART 2: Variance, Standard Deviation and Coefficient of Variation
#calculate the variance for all three datasets (1 point)
var_all <- var(crime_stats$Vict.Age)
var_male <- var(crime_male$Vict.Age)
var_female <- var(crime_female$Vict.Age)
#calculate the standard deviation for all three datasets (1 point)
sd_all <- sd(crime_stats$Vict.Age)
sd_male <- sd(crime_male$Vict.Age)
sd_female <- sd(crime_female$Vict.Age)
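To make the link between the two measures concrete, the sketch below computes the sample variance by hand on an invented vector and checks it against var() and sd(). Both functions use the n - 1 denominator, and the standard deviation is the square root of the variance.

```r
# Toy ages, invented for illustration.
ages <- c(20, 30, 40, 50)
n <- length(ages)

# Sample variance: squared deviations from the mean, summed, divided by n - 1.
manual_var <- sum((ages - mean(ages))^2) / (n - 1)

# Standard deviation: square root of the variance, back in the original units (years).
manual_sd <- sqrt(manual_var)

all.equal(manual_var, var(ages))
all.equal(manual_sd, sd(ages))
```

Because the variance is in squared units, the standard deviation is usually the easier of the two to interpret next to the mean.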
#calculate the coefficient of variation for all three datasets (1 point)
#coefficient of variation = standard deviation / mean
cv_all <- sd_all / mean_all
cv_male <- sd_male / mean_male
cv_female <- sd_female / mean_female
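The coefficient of variation is the standard deviation divided by the mean, which makes it a unitless measure of spread relative to the typical value. A minimal sketch of a reusable helper follows; the function name cv() and the toy vector are made up for this example and are not part of the assignment.

```r
# Hypothetical helper: coefficient of variation = standard deviation / mean.
cv <- function(x, na.rm = FALSE) {
  sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
}

# Toy ages, invented for illustration.
ages <- c(20, 30, 40, 50)

cv(ages)

# Rescaling the data (for example years to months) leaves the CV unchanged,
# because the standard deviation and the mean scale by the same factor.
all.equal(cv(ages), cv(ages * 12))
```

Computing the ratio from the stored sd and mean objects, rather than retyping rounded numbers, keeps the result in sync if the data change.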
#calculate the quantiles for all three datasets (1 point)
quant_all <- quantile(crime_stats$Vict.Age)
quant_male <- quantile(crime_male$Vict.Age)
quant_female <- quantile(crime_female$Vict.Age)
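By default quantile() returns the minimum, the three quartiles, and the maximum. The sketch below, on an invented vector, shows the default output, how the probs argument selects specific quantiles, and how the interquartile range relates to the 25% and 75% values.

```r
# Toy ages, invented for illustration.
ages <- c(18, 22, 25, 31, 38, 44, 60)

# Default call: 0%, 25%, 50%, 75% and 100% quantiles.
quantile(ages)

# Ask for specific quantiles with probs.
q <- quantile(ages, probs = c(0.25, 0.75))

# The interquartile range is the distance between the 75% and 25% quantiles.
iqr_manual <- unname(q[2] - q[1])
all.equal(iqr_manual, IQR(ages))
```

The 50% quantile is the median, so it should match the value from median() for the same data.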