Assignment 2

Name: [Joseph Kehoe] ID: [920697315] Date: [10_11_2024]

For this assignment, you will be using various summary statistics, or descriptions of central tendency, to describe and summarize data. Statistics of central tendency are another way to summarize and get information from your data. Along with different forms of visualization, central tendency is often a good first step when working with data. We will be using the same LA crime data as we did last week.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
#load in the LA crimes data (1 point)
crime_stats <- read.csv("C:/Users/Joey/Downloads/LA_Crime_V2.csv")
#divide the data into male and female (1 point)
crime_male <- crime_stats %>%
  filter(Vict.Sex == "M")
crime_female <- crime_stats %>%
  filter(Vict.Sex == "F")

PART 1: Mean, Median and Mode

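#first we will calculate the mean arithmetically. Using the sum function, calculate the mean victim age for the whole data set, then the male and female subsets. (1 point)
sum(crime_stats$Vict.Age)
## [1] 22485
#the sum alone is not the mean: divide it by the number of observations
sum(crime_stats$Vict.Age) / length(crime_stats$Vict.Age)
sum(crime_male$Vict.Age) / length(crime_male$Vict.Age)
sum(crime_female$Vict.Age) / length(crime_female$Vict.Age)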
#now calculate the mean age for the three data sets using the mean() function in R (1 point)
mean_all <- mean(crime_stats$Vict.Age)
mean_female <- mean(crime_female$Vict.Age)
mean_male <- mean(crime_male$Vict.Age)
#calculate the median age for the three data sets using the median() function in R (1 point)
median_all <- median(crime_stats$Vict.Age)
median_female <- median(crime_female$Vict.Age)
median_male <- median(crime_male$Vict.Age)
#R doesn't have a function built in to calculate the mode, so we are going to have to make one 

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

#after you have run the above line, use the Mode() function we just made to calculate the mode for age in all three data sets (1 point)
mode_all <- Mode(crime_stats$Vict.Age)
mode_male <- Mode(crime_male$Vict.Age)
mode_female <- Mode(crime_female$Vict.Age)

  1. For each dataset, how do the mean, median and mode differ? What does each description of central tendency tell you about the data? (1 point)
  - The mean victim age is higher for men than for women. The mode for men is the same as the mode for the combined data set, and higher than the mode for women. The median for men is similarly higher than it is for women.
  - The mean is the age you get by adding up every individual's age and dividing by the number of individuals, so each victim counts equally toward one representative age. The median is the age in the middle if you line up every person from lowest age to highest. The mode is the age that comes up most often.

  2. How do the measures of central tendency differ between the datasets? What inferences can you draw from these differences? (1 point)
  - See the answer to the previous question.
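
To make the difference between the three measures concrete, here is a minimal sketch on a small made-up vector of ages. The values in `ages` are hypothetical and are not taken from the LA crime data; the `Mode()` helper is the same one defined earlier in this assignment.

```r
# Hypothetical ages, made up for illustration (not from the LA crime data)
ages <- c(22, 25, 25, 31, 40, 47, 90)

# Same Mode() helper as defined earlier in this assignment
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

mean_by_hand <- sum(ages) / length(ages)  # arithmetic mean: total divided by count
mean_builtin <- mean(ages)                # built-in mean, should match mean_by_hand
middle_age   <- median(ages)              # middle value once the ages are sorted
common_age   <- Mode(ages)                # most frequent value

print(c(mean = mean_by_hand, median = middle_age, mode = common_age))
```

In this toy vector the mean is 40, the median is 31 and the mode is 25. The single age of 90 pulls the mean upward but leaves the median and mode where they are, which is why comparing the three can hint at skew in the data.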

PART 2: Variance, Standard Deviation and Coefficient of Variation

#calculate the variance for all three datasets (1 point)
var_all <- var(crime_stats$Vict.Age)
var_male <- var(crime_male$Vict.Age)
var_female <- var(crime_female$Vict.Age)

#calculate the standard deviation for all three datasets (1 point)
sd_all <- sd(crime_stats$Vict.Age)
sd_male <- sd(crime_male$Vict.Age)
sd_female <- sd(crime_female$Vict.Age)
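#calculate the coefficient of variation for all three datasets (1 point)
#the coefficient of variation is the standard deviation divided by the mean (not by the variance)
cv_all <- sd_all / mean_all
cv_male <- sd_male / mean_male
cv_female <- sd_female / mean_female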
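
As a rough sketch of how the three variability statistics relate to each other, the example below uses a small made-up vector of ages. The values in `ages` and the variable names are hypothetical and are not taken from the LA crime data.

```r
# Hypothetical ages, made up for illustration (not from the LA crime data)
ages <- c(20, 30, 40, 50, 60)

age_var <- var(ages)            # sample variance, in squared years
age_sd  <- sd(ages)             # standard deviation, in years
age_cv  <- age_sd / mean(ages)  # coefficient of variation, unitless

# sd() is the square root of var()
print(all.equal(age_sd, sqrt(age_var)))

# CV expressed as a percentage of the mean
print(round(100 * age_cv, 1))
```

Here the variance is 250 and the standard deviation is its square root. Dividing the standard deviation by the mean (40) gives a CV that can be compared across groups with different average ages.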
  1. Each of these statistics describes the variability of the data. Describe what this means and what information you can gather from describing variation. (1 point)
  - SD: The standard deviation describes how far from the mean your data points tend to lie. If it is higher, the data points tend to lie farther from the mean, and vice versa.
  - Variance: The variance also describes how spread out the data points are, both from the mean and from each other. It is the square of the standard deviation, so it is expressed in squared units.
  - CV: The coefficient of variation expresses the standard deviation as a proportion (or percent) of the mean, instead of in whatever units the data were recorded in.
  2. How do these variability statistics change between the three datasets? (1 point)
  - SD: There is a wider spread around the mean in the age of female victims than in the age of male victims.
  - Variance: There is a wider spread of ages among female victims, both from the mean and from each other.

PART 3: Quantiles

#calculate the quantiles for all three datasets (1 point)
quant_all <- quantile(crime_stats$Vict.Age)
quant_male <- quantile(crime_male$Vict.Age)
quant_female <- quantile(crime_female$Vict.Age)
  1. For each dataset, what are the ages at the first and third quartiles? What can you infer from this information? (1 point)
  - For all: 29, 51
  - For male: 30, 52
  - For female: 28, 49
  - It means that about 50% of all victims of crime are between the ages of 29 and 51, about 50% of male victims are between 30 and 52, and about 50% of female victims are between the ages of 28 and 49.
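
As a small illustration of how the first and third quartiles can be pulled out of the `quantile()` output, here is a sketch on a made-up vector. The values in `ages` and the variable names are hypothetical and are not taken from the LA crime data.

```r
# Hypothetical ages, made up for illustration (not from the LA crime data)
ages <- c(18, 22, 27, 31, 36, 44, 52, 60, 71)

quartiles <- quantile(ages)   # default probs: 0%, 25%, 50%, 75%, 100%
q1 <- quartiles[["25%"]]      # first quartile
q3 <- quartiles[["75%"]]      # third quartile
iqr_by_hand <- q3 - q1        # width of the middle half of the data

print(c(Q1 = q1, Q3 = q3, IQR = iqr_by_hand))
```

The difference between the third and first quartiles is the interquartile range, which base R also provides directly as `IQR()`. A narrower range means the middle half of the ages is more tightly clustered.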