This assignment is designed to simulate a scenario where you are taking over someone’s existing work, and continuing with it to draw some further insights.
This is a real-world dataset taken from the Crime Statistics Agency Victoria (https://www.crimestatistics.vic.gov.au/crime-statistics/latest-crime-data/download-data), specifically the data called “Data Tables - LGA Criminal Incidents Visualisation - year ending September 2019 (XLSX, 15.96 MB)”. The data for this assignment is from Table 3, but is stored as a compressed csv file (“.csv.gz”) to make it easier to read and manage.
You have just joined a consulting company as a data scientist. To give you some experience and guidance, you are performing a quick summary of the data, following guidance from Amelia and answering some of the questions your manager has. This is not a formal report, but rather something you are giving to your manager that describes the data and some interesting insights. We have written example text for the first section on Monash, and would like you to explore another area. Our example writings are a good example of how to get full marks.
Your colleague at your consulting firm, Amelia (in the text treatment below) has written some helpful hints throughout the assignment to help guide you.
Questions that are worth marks are indicated with ** at the start and end of the question, as well as the number of marks in parentheses.
This assignment will be worth 6% of your total grade, and is marked out of 18 marks total.
3 Marks for grammar and clarity. You must write in complete sentences and do a spell check.
5 Marks for presentation of the data visualisations
10 marks for the questions
Your marks will be weighted according to peer evaluation.
Sections that contain marks are indicated with **, and will have the number of marks indicated in parentheses. For example:
# `**` What are the types of item divisions? How many are there? (0.5 Mark) `**`
Remember, you can look up the help file for functions by typing ?function_name. For example, ?mean. Feel free to google questions you have about how to do other kinds of plots, and post on the ED if you have any questions about the assignment.
To complete the assignment you will need to fill in the blanks for function names, arguments, or other names. These sections are marked with *** or ___. At a minimum, your assignment should be able to be “knitted” using the knit button for your Rmarkdown document.
If you want to look at what the assignment looks like in progress, but you do not have valid R code in all the R code chunks, remember that you can set the chunk options to eval = FALSE like so:
```{r this-chunk-will-not-run, eval = FALSE}
ggplot()
```
If you do this, please remember to ensure that you remove this chunk option or set it to eval = TRUE when you submit the assignment, to ensure all your R code runs.
You will be completing this assignment in your assigned groups. A reminder regarding our recommendations for completing group assignments:
Your assignments will be peer reviewed, and results checked for reproducibility. This means:
Each student will be randomly assigned another team’s submission to provide feedback on three things:
This assignment is due by close of business (5pm) on Wednesday 1st April. You will submit the assignment via ED. Please change the file name to include your team’s name. For example, if you are team dplyr, your assignment file name could read: “assignment-1-2020-s1-team-dplyr.Rmd”
You work as a data scientist in the well named consulting company, “Consulting for You”.
It’s your second day at the company, and you’re taken to your desk. Your boss says to you:
Amelia has managed to find this treasure trove of data - get this: crime statistics in Victoria for the past years! Unfortunately, Amelia just left on holiday to New Zealand, and won’t be back for a while. They discovered this dataset the afternoon before they left on holiday, and got started on doing some data analysis.
We’ve got a meeting coming up soon where we need to discuss some new directions for the company, and we want you to tell us about this dataset and what we can do with it. We want to focus on Monash, since we have a few big customers in that area, and then we want you to help us compare that with whatever area has the highest burglary.
You’re in with the new hires of data scientists here. We’d like you to take a look at the data and tell me what the spreadsheet tells us. I’ve written some questions on the report for you to answer, and there are also some questions from Amelia I would like you to look at as well.
Most importantly, can you get this to me by Wednesday 1st April, COB (COB = Close of Business at 5pm)?
I’ve given this dataset to some of the other new hire data scientists as well, you’ll all be working as a team on this dataset. I’d like you to all try and work on the questions separately, and then combine your answers together to provide the best results.
From here, you are handed a USB stick. You load this into your computer, and you see a folder called “vic-crime”. In it is a folder called “data-raw”, and an Rmarkdown file. It contains the start of a data analysis. Your job is to explore the data and answer the questions in the document.
Note that the text that is written was originally written by Amelia, and you need to make sure that their name is kept up top, and to pay attention to what they have to say in the document!
Amelia: First, let’s read in the data using the function `read_csv()` from the `readr` package, and clean up the names using the `rename` function from `dplyr`. (I’ve also got some code there which shows you how to read in the excel sheet, if you want to do that, but there is absolutely no need to!)
# library(readxl)
# read in the data with read_excel
# crime_raw <- read_excel(here::here("2020/assignment-1/data-raw/Data_Tables_LGA_Criminal_Incidents_Year_Ending_September_2019.xlsx"),
# sheet = "Table 03")
# then write that as a compressed csv, - csv-gz file.
# readr::write_csv(crime_raw,
# here::here("2020/assignment-1/data-raw/crime-raw-table-3.csv.gz"))
library(readr)
library(here)
## here() starts at /cloud/project
crime_raw <- read_csv(here("data-raw/crime-raw-table-3.csv.gz"))
## Parsed with column specification:
## cols(
## Year = col_double(),
## `Year ending` = col_character(),
## `Local Government Area` = col_character(),
## Postcode = col_double(),
## `Suburb/Town Name` = col_character(),
## `Offence Division` = col_character(),
## `Offence Subdivision` = col_character(),
## `Offence Subgroup` = col_character(),
## `Incidents Recorded` = col_double()
## )
# let's explore the names of the variables in this data
names(crime_raw)
## [1] "Year" "Year ending" "Local Government Area"
## [4] "Postcode" "Suburb/Town Name" "Offence Division"
## [7] "Offence Subdivision" "Offence Subgroup" "Incidents Recorded"
# OK, we are going to need to rename these to have better names in R
# this means that the names are all lowercase and separated by an
# underscore
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
crime <- crime_raw %>%
# we don't need the "year ending" variable
select(-`Year ending`) %>%
# the variable names have spaces in them, so we need to refer to them
# in R using the special back tick, ` , around the names, otherwise R
# won't let us use them.
rename(year = Year,
local_gov_area = `Local Government Area`,
postcode = Postcode,
suburb_town = `Suburb/Town Name`,
offence_division = `Offence Division`,
offence_subdivision = `Offence Subdivision`,
offence_subgroup = `Offence Subgroup`,
n_incidents = `Incidents Recorded`)
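The renaming step above can be tried on a small scale first. The following is a minimal sketch using a made-up two-row tibble (`toy_raw` and its values are hypothetical, not taken from the crime data) to show how back ticks let `select()` and `rename()` refer to column names that contain spaces.

```r
library(dplyr)

# Hypothetical toy data with the same kind of awkward column names as the raw file
toy_raw <- tibble(
  Year = c(2018, 2019),
  `Year ending` = c("September", "September"),
  `Incidents Recorded` = c(3, 5)
)

toy <- toy_raw %>%
  # drop a column whose name contains a space
  select(-`Year ending`) %>%
  # rename() takes new_name = old_name
  rename(year = Year,
         n_incidents = `Incidents Recorded`)

names(toy)
```

Checking `names()` after a rename like this is a quick way to confirm that every column ended up lowercase and separated by underscores.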
Amelia: Let’s print the data and look at the first few rows.
crime
## # A tibble: 307,529 x 8
## year local_gov_area postcode suburb_town offence_division offence_subdivi…
## <dbl> <chr> <dbl> <chr> <chr> <chr>
## 1 2019 Alpine 3691 Dederang A Crimes agains… A20 Assault and…
## 2 2019 Alpine 3691 Dederang A Crimes agains… Other crimes ag…
## 3 2019 Alpine 3691 Dederang B Property and … B20 Property da…
## 4 2019 Alpine 3691 Dederang B Property and … B30 Burglary/Br…
## 5 2019 Alpine 3691 Dederang B Property and … B40 Theft
## 6 2019 Alpine 3691 Dederang D Public order … D20 Disorderly …
## 7 2019 Alpine 3691 Dederang D Public order … D20 Disorderly …
## 8 2019 Alpine 3691 Dederang E Justice proce… E10 Justice pro…
## 9 2019 Alpine 3691 Glen Creek B Property and … B50 Deception
## 10 2019 Alpine 3691 Gundowring B Property and … B10 Arson
## # … with 307,519 more rows, and 2 more variables: offence_subgroup <chr>,
## # n_incidents <dbl>
Amelia: And what are the names of the columns in the dataset?
names(crime)
## [1] "year" "local_gov_area" "postcode"
## [4] "suburb_town" "offence_division" "offence_subdivision"
## [7] "offence_subgroup" "n_incidents"
Amelia: How many years of data are there?
summary(crime$year)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2010 2012 2015 2015 2017 2019
Amelia: We have data that goes from 2010 until 2019. Counting both 2010 and 2019, that’s ten years of data!
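To count the years directly rather than reading them off the summary, one option is `n_distinct()`, the same function used for the LGAs below. This is only a sketch: `toy_years` is a hypothetical stand-in for `crime$year`.

```r
library(dplyr)

# Hypothetical stand-in for crime$year: one value per row, with repeats
toy_years <- c(2010, 2010, 2011, 2012, 2012, 2012)

# range() gives the first and last year
range(toy_years)

# n_distinct() counts how many different years appear
n_distinct(toy_years)
```

On the real data the equivalent call would be `n_distinct(crime$year)`.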
How many Local Government Areas (LGAs) are there? And what are the LGAs called?
n_distinct(crime$local_gov_area)
## [1] 79
unique(crime$local_gov_area)
## [1] "Alpine" "Ararat" "Ballarat"
## [4] "Banyule" "Bass Coast" "Baw Baw"
## [7] "Bayside" "Benalla" "Boroondara"
## [10] "Brimbank" "Buloke" "Campaspe"
## [13] "Cardinia" "Casey" "Central Goldfields"
## [16] "Colac-Otway" "Corangamite" "Darebin"
## [19] "East Gippsland" "Frankston" "Gannawarra"
## [22] "Glen Eira" "Glenelg" "Golden Plains"
## [25] "Greater Bendigo" "Greater Dandenong" "Greater Geelong"
## [28] "Greater Shepparton" "Hepburn" "Hindmarsh"
## [31] "Hobsons Bay" "Horsham" "Hume"
## [34] "Indigo" "Kingston" "Knox"
## [37] "Latrobe" "Loddon" "Macedon Ranges"
## [40] "Manningham" "Mansfield" "Maribyrnong"
## [43] "Maroondah" "Melbourne" "Melton"
## [46] "Mildura" "Mitchell" "Moira"
## [49] "Monash" "Moonee Valley" "Moorabool"
## [52] "Moreland" "Mornington Peninsula" "Mount Alexander"
## [55] "Moyne" "Murrindindi" "Nillumbik"
## [58] "Northern Grampians" "Port Phillip" "Pyrenees"
## [61] "Queenscliffe" "South Gippsland" "Southern Grampians"
## [64] "Stonnington" "Strathbogie" "Surf Coast"
## [67] "Swan Hill" "Towong" "Wangaratta"
## [70] "Warrnambool" "Wellington" "West Wimmera"
## [73] "Whitehorse" "Whittlesea" "Wodonga"
## [76] "Wyndham" "Yarra" "Yarra Ranges"
## [79] "Yarriambiack"
Amelia: That’s a lot of areas: 79!
What are the types of offence divisions? How many are there?
unique(crime$offence_division)
## [1] "A Crimes against the person"
## [2] "B Property and deception offences"
## [3] "D Public order and security offences"
## [4] "E Justice procedures offences"
## [5] "C Drug offences"
## [6] "F Other offences"
n_distinct(crime$offence_division)
## [1] 6
!> Answer: There are six types of offence division. Their names are “Crimes against the person”, “Property and deception offences”, “Public order and security offences”, “Justice procedures offences”, “Drug offences” and “Other offences”.
Amelia: Remember that you can learn more about what these functions do by typing `?unique` or `?n_distinct` into the console.
** What are the types of offence subdivisions? How many are there? (0.5 Mark) **
unique(crime$offence_subdivision)
## [1] "A20 Assault and related offences"
## [2] "Other crimes against the person"
## [3] "B20 Property damage"
## [4] "B30 Burglary/Break and enter"
## [5] "B40 Theft"
## [6] "D20 Disorderly and offensive conduct"
## [7] "E10 Justice procedures"
## [8] "B50 Deception"
## [9] "B10 Arson"
## [10] "A70 Stalking, harassment and threatening behaviour"
## [11] "C30 Drug use and possession"
## [12] "A80 Dangerous and negligent acts endangering people"
## [13] "E20 Breaches of orders"
## [14] "D10 Weapons and explosives offences"
## [15] "C10 Drug dealing and trafficking"
## [16] "C20 Cultivate or manufacture drugs"
## [17] "D30 Public nuisance offences"
## [18] "F90 Miscellaneous offences"
## [19] "F30 Other government regulatory offences"
## [20] "A50 Robbery"
## [21] "D40 Public security offences"
## [22] "C90 Other drug offences"
## [23] "F20 Transport regulation offences"
## [24] "F10 Regulatory driving offences"
## [25] "B60 Bribery"
n_distinct(crime$offence_subdivision)
## [1] 25
!> Answer: There are 25 types of offence subdivision. Their names are:
* “A20 Assault and related offences”
* “Other crimes against the person”
* “B20 Property damage”
* “B30 Burglary/Break and enter”
* “B40 Theft”
* “D20 Disorderly and offensive conduct”
* “E10 Justice procedures”
* “B50 Deception”
* “B10 Arson”
* “A70 Stalking, harassment and threatening behaviour”
* “C30 Drug use and possession”
* “A80 Dangerous and negligent acts endangering people”
* “E20 Breaches of orders”
* “D10 Weapons and explosives offences”
* “C10 Drug dealing and trafficking”
* “C20 Cultivate or manufacture drugs”
* “D30 Public nuisance offences”
* “F90 Miscellaneous offences”
* “F30 Other government regulatory offences”
* “A50 Robbery”
* “D40 Public security offences”
* “C90 Other drug offences”
* “F20 Transport regulation offences”
* “F10 Regulatory driving offences”
* “B60 Bribery”
** How many types of offence_subgroup are there? (0.5 Mark) **
unique(crime$offence_subgroup)
## [1] "A212 Non-FV Serious assault"
## [2] "Other crimes against the person"
## [3] "B21 Criminal damage"
## [4] "B321 Residential non-aggravated burglary"
## [5] "B49 Other theft"
## [6] "D22 Drunk and disorderly in public"
## [7] "D23 Offensive conduct"
## [8] "E14 Pervert the course of justice or commit perjury"
## [9] "B53 Obtain benefit by deception"
## [10] "B11 Cause damage by fire"
## [11] "A211 FV Serious assault"
## [12] "A731 FV Threatening behaviour"
## [13] "C32 Drug possession"
## [14] "A231 FV Common assault"
## [15] "A232 Non-FV Common assault"
## [16] "A81 Dangerous driving"
## [17] "B22 Graffiti"
## [18] "B311 Residential aggravated burglary"
## [19] "B322 Non-residential non-aggravated burglary"
## [20] "B19 Other fire related offences"
## [21] "E21 Breach family violence order"
## [22] "B42 Steal from a motor vehicle"
## [23] "D12 Prohibited and controlled weapons offences"
## [24] "A711 FV Stalking"
## [25] "A712 Non-FV Stalking"
## [26] "A732 Non-FV Threatening behaviour"
## [27] "A83 Throw or discharge object endangering people"
## [28] "A89 Other dangerous or negligent acts endangering people"
## [29] "B41 Motor vehicle theft"
## [30] "B43 Steal from a retail store"
## [31] "B45 Receiving or handling stolen goods"
## [32] "B51 Forgery and counterfeiting"
## [33] "C12 Drug trafficking"
## [34] "C21 Cultivate drugs"
## [35] "D11 Firearms offences"
## [36] "D24 Offensive language"
## [37] "D36 Other public nuisance offences"
## [38] "E13 Resist or hinder officer"
## [39] "E22 Breach intervention order"
## [40] "E23 Breach bail conditions"
## [41] "E29 Breach of other orders"
## [42] "F93 Cruelty to animals"
## [43] "D25 Criminal intent"
## [44] "A22 Assault police, emergency services or other authorised officer"
## [45] "B44 Theft of a bicycle"
## [46] "D35 Improper movement on public or private space"
## [47] "F33 Liquor and tobacco licensing offences"
## [48] "F39 Other government regulatory offences"
## [49] "A722 Non-FV Harassment and private nuisance"
## [50] "A51 Aggravated robbery"
## [51] "A52 Non-Aggravated robbery"
## [52] "A721 FV Harassment and private nuisance"
## [53] "B46 Fare evasion"
## [54] "B54 State false information"
## [55] "C31 Drug use"
## [56] "D13 Explosives offences"
## [57] "D21 Riot and affray"
## [58] "D32 Hoaxes"
## [59] "E11 Escape custody"
## [60] "E15 Prison regulation offences"
## [61] "E19 Other justice procedures offences"
## [62] "F92 Public health and safety offences"
## [63] "D43 Hacking"
## [64] "B29 Other property damage offences"
## [65] "C99 Other drug offences"
## [66] "D26 Disorderly conduct"
## [67] "F21 Public transport"
## [68] "F99 Other miscellaneous offences"
## [69] "B12 Cause a bushfire"
## [70] "B312 Non-residential aggravated burglary"
## [71] "F19 Other regulatory driving offences"
## [72] "B56 Professional malpractice and misrepresentation"
## [73] "F91 Environmental offences"
## [74] "C23 Possess drug manufacturing equipment or precursor"
## [75] "B59 Other deception offences"
## [76] "C11 Drug dealing"
## [77] "D33 Begging"
## [78] "F94 Dangerous substance offences"
## [79] "F34 Pornography and censorship offences"
## [80] "B329 Unknown non-aggravated burglary"
## [81] "D31 Privacy offences"
## [82] "F29 Other transport regulation offences"
## [83] "F23 Maritime regulations offences"
## [84] "B61 Bribery of officials"
## [85] "A82 Neglect or ill treatment of people"
## [86] "F36 Prostitution offences"
## [87] "F32 Commercial regulation offences"
## [88] "F24 Pedestrian offences"
## [89] "B55 Deceptive business practices"
## [90] "F31 Betting and gaming offences"
## [91] "E12 Fail to appear"
## [92] "F22 Aviation regulations offences"
## [93] "B52 Possess equipment to make false instrument"
## [94] "C22 Manufacture drugs"
## [95] "F16 Registration and roadworthiness offences"
## [96] "D44 Terrorism offences"
## [97] "F11 Drink driving"
## [98] "F35 Intellectual property"
## [99] "B319 Unknown aggravated burglary"
## [100] "F14 Parking offences"
## [101] "D49 Other public security offences"
## [102] "F12 Drug driving"
## [103] "D41 Immigration offences"
## [104] "F15 Licensing offences"
## [105] "D34 Defamation and libel"
## [106] "D42 Sabotage"
n_distinct(crime$offence_subgroup)
## [1] 106
!> Answer: There are 106 types of offence subgroup.
** What is the summary of the number of incidents? (What function gives you the minimum, median, and maximum, and 1st and third quartiles?) (0.5 Mark) **
summary(crime$n_incidents)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.0 1.0 2.0 11.6 7.0 2837.0
** Can you tell me what each row represents, and what each of the columns measure? (1 Mark) **
Amelia: We need to describe what each row of the data represents, and take our best guess at what we think each column measures. It might be worthwhile looking through the excel sheet in the `data` folder, or on the website where the data was extracted: https://www.crimestatistics.vic.gov.au/crime-statistics/latest-crime-data/download-data - the file called “Data Tables - LGA Offence Visualisation - year ending September 2019 (XLSX, 17.13 MB)”. This data is also provided in the data-raw folder.
!> Answer: For Table 01, each row represents the number of cases and the rate per 100,000 population during that year for a given police region and local government area. The columns measure the year, police region, local government area, number of cases, and rate per 100,000.
Amelia: Let’s group by year and then sum up the number of incidents. Then we can take this information and use `ggplot` to plot the year on the x axis and `n_incidents` on the y axis, drawing the number of incidents as a column with `geom_col()`.
crime_year_n_incidents <- crime %>%
group_by(year) %>%
summarise(n_incidents = sum(n_incidents))
library(ggplot2)
ggplot(crime_year_n_incidents,
aes(x = factor(year),
y = n_incidents)) +
geom_col()
Amelia: I try to write three complete sentences about what I learn from a graphic. You should start with a quick summary of what the graphic shows you. Then, describe what is on the x axis, the y axis, and any colours used to separate the data. You then need to describe what you learn. So, I would say:
“A summary of the number of incidents recorded for each year from 2010 until 2019. On the x axis is each year, and on the y axis is the number of offences. We learn that the number of offences starts from 300,000 (3e+05 means the number 3 with 5 zeros after it), and increases to about 400,000, with what seems to be a somewhat steady increase.”
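The group-and-sum step behind this plot can be checked by hand on a tiny example. The sketch below uses a hypothetical toy tibble (the values are made up) and the same `group_by()` and `summarise()` calls as the code above.

```r
library(dplyr)

# Hypothetical toy data: several rows per year, each with an incident count
toy <- tibble(
  year = c(2018, 2018, 2019, 2019, 2019),
  n_incidents = c(1, 4, 2, 2, 6)
)

# One row per year, holding the sum of n_incidents within that year
toy_year_total <- toy %>%
  group_by(year) %>%
  summarise(n_incidents = sum(n_incidents))

toy_year_total
```

Here 2018 should total 1 + 4 and 2019 should total 2 + 2 + 6, which makes it easy to confirm that the summary is doing what we expect before applying it to the full data.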
Amelia: Let’s filter the data down to the ‘Monash’ LGA.
crime_monash <- crime %>% filter(local_gov_area == "Monash")
Amelia: Let’s count the number of crimes per year.
crime_count_monash <- crime_monash %>% count(year)
ggplot(crime_count_monash,
aes(x = year,
y = n)) +
geom_col()
Amelia: This plot shows the number of offences per year in the Monash LGA, in Victoria. The x axis shows the year, and the y axis shows the number of offences for that year. There appears to be a slight upwards trend, perhaps slowly flattening out over the past 5 years.
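Note that `count(year)` counts rows, and a single row in this data can stand for more than one incident. The sketch below, on a hypothetical toy tibble, shows the difference between counting rows and summing `n_incidents` with the `wt` argument of `count()`, the same argument used for the subgroup plot further down.

```r
library(dplyr)

# Hypothetical toy data: three rows covering two years
toy <- tibble(
  year = c(2018, 2018, 2019),
  n_incidents = c(1, 4, 2)
)

# Number of rows per year
rows_per_year <- toy %>% count(year)

# Total incidents per year: sums n_incidents within each year
incidents_per_year <- toy %>% count(year, wt = n_incidents)

rows_per_year
incidents_per_year
```

Which of the two is appropriate depends on the question being asked, so it is worth stating in the write-up whether a plot shows rows or incidents.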
Amelia: We count the number of observations in each `offence_division` to tell us the most common general category. Let’s count them and sort from most to least common.
crime_monash %>% count(offence_division, sort = TRUE)
## # A tibble: 6 x 2
## offence_division n
## <chr> <int>
## 1 B Property and deception offences 2032
## 2 A Crimes against the person 1567
## 3 D Public order and security offences 697
## 4 E Justice procedures offences 502
## 5 C Drug offences 417
## 6 F Other offences 86
Amelia: The top divisions are “B Property and deception offences”, at 2032, followed by “A Crimes against the person” at 1567.
Amelia: We take the crime data, then group by year, and count the number of offences in each `offence_division` in each year. We then plot this data. On the x axis we have year. On the y axis we have n, the number of crimes that take place in a division in a year, and we are colouring according to the offence division, and drawing this with a line, then making sure that the limits go from 0 to 250.
crime_year_offence_monash <- crime_monash %>%
group_by(year) %>%
count(offence_division)
ggplot(crime_year_offence_monash,
aes(x = year,
y = n,
colour = offence_division)) +
geom_line() +
lims(y = c(0, 250)) # Makes sure the y axis goes to zero
Amelia: This shows us that the most common offence is “Property and Deception offences”. It looks like most crime types are increasing over time except for maybe type C.
Amelia: We count up the incidents in each offence subgroup, which is the smallest category of offences. We then plot the number of incidents for each subgroup, and reorder the y axis so that the subgroups are in order of most to least.
crime_offence_monash <- crime_monash %>%
count(offence_subgroup,
wt = n_incidents,
sort = TRUE) %>%
top_n(n = 20,
wt = n)
# save an object of the maximum number of incidents
# to help construct the plot below.
max_offences <- max(crime_offence_monash$n)
ggplot(crime_offence_monash,
aes(x = n,
y = reorder(offence_subgroup, n))) +
geom_point() +
lims(x = c(0, max_offences)) # make sure x axis goes from 0
Amelia:
Amelia: This could be where we focus our next marketing campaign! Let’s take the crime data, then count the number of rows in each local_gov_area, take the top 5 results using `top_n`, and arrange in descending order by the column “n”.
crime %>%
count(local_gov_area) %>%
top_n(n = 5) %>%
arrange(desc(n))
## Selecting by n
## # A tibble: 5 x 2
## local_gov_area n
## <chr> <int>
## 1 Greater Geelong 13577
## 2 Yarra Ranges 10018
## 3 Greater Bendigo 9295
## 4 Casey 8948
## 5 Mornington Peninsula 8590
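As a side note, `top_n()` has been superseded by `slice_max()` in more recent versions of dplyr. The sketch below assumes dplyr 1.0.0 or later is installed and uses a hypothetical toy tibble; naming the ordering column explicitly also avoids the “Selecting by n” message.

```r
library(dplyr)

# Hypothetical toy counts per area
toy <- tibble(
  area = c("a", "b", "c", "d"),
  n = c(10, 40, 30, 20)
)

# Keep the 2 rows with the largest n, then sort explicitly in descending order
toy_top <- toy %>%
  slice_max(order_by = n, n = 2) %>%
  arrange(desc(n))

toy_top
```

Either function is fine for this assignment; the existing `top_n()` code does not need to be changed.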
** Which LGA had the most crime? (0.5 Mark) **
!> Answer: Greater Geelong had the most crime.
** Subset the data to be the LGA with the most crime. (0.5 Mark) **
crime_top_lga <- crime %>%
filter(local_gov_area == "Greater Geelong")
** Is crime in Greater Geelong increasing? (1 Mark) **
crime_count_top <- crime_top_lga %>% count(year)
ggplot(crime_count_top,
aes(x = year,
y = n)) +
geom_col()
** What are the most common offences (offence division) at the top LGA Greater Geelong across all years? (1 Mark) **
crime_top_lga %>%
count(offence_division, sort = TRUE)
## # A tibble: 6 x 2
## offence_division n
## <chr> <int>
## 1 B Property and deception offences 5785
## 2 A Crimes against the person 3631
## 3 D Public order and security offences 1698
## 4 E Justice procedures offences 1297
## 5 C Drug offences 964
## 6 F Other offences 202
“B Property and deception offences” are the most common offences, at 5785.
** Are any of these offences (offence division) increasing over time? (1 Mark) **
crime_year_offence_top <- crime_top_lga %>%
group_by(year) %>%
count(offence_division)
ggplot(crime_year_offence_top,
aes(x = year,
y = n,
group = offence_division)) +
geom_line()
All of these offence divisions have increased over time, some more than others. Most fell in 2011, but by 2019 all of them had increased.
Amelia: I would write three complete sentences about what I learn from this graphic. You should start with a quick summary of what the graphic shows you. Then, describe what is on the x axis, the y axis, and any colours used to separate the data. You then need to describe what you learn.
crime_items_top <- crime_top_lga %>%
count(offence_division)
ggplot(crime_items_top,
aes(x = n,
y = reorder(offence_division, n))) +
geom_point()
Amelia: You can combine the data together using `bind_rows()`.
top_crimes <- bind_rows(crime_monash,
crime_top_lga)
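To see what `bind_rows()` does, here is a minimal sketch with two hypothetical toy tibbles (`toy_a` and `toy_b` are made up for illustration). Rows from the second table are stacked underneath the first, matching columns by name.

```r
library(dplyr)

# Two hypothetical tables with the same columns, one per area
toy_a <- tibble(local_gov_area = "A", year = c(2018, 2019), n = c(1, 2))
toy_b <- tibble(local_gov_area = "B", year = c(2018, 2019), n = c(3, 4))

# Stack the rows of toy_b underneath toy_a
toy_both <- bind_rows(toy_a, toy_b)

nrow(toy_both)
unique(toy_both$local_gov_area)
```

Because `local_gov_area` is kept as a column, the combined data can still be split by area later, for example when faceting a plot.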
Amelia: Create two separate plots, one for each local government area, using `facet_wrap()` on the local government area in ggplot.
crime_year_offence_both <- top_crimes %>%
group_by(year, local_gov_area) %>%
count(offence_division)
gg_crime_offence <- ggplot(
data = crime_year_offence_both,
aes(x = year,
y = n,
colour = offence_division)) +
geom_line() +
facet_wrap(~ local_gov_area)
gg_crime_offence
crime_both <- top_crimes %>%
group_by(local_gov_area) %>%
count(offence_division)
ggplot(crime_both,
aes(x = n,
y = reorder(offence_division, n), # reorder the points
colour = local_gov_area)) +
geom_point()
** Do you have any recommendations about future directions with this dataset? Is there anything else in the excel spreadsheet we could look at? (2 Marks) **
!> Answer: I think future analysis of this dataset should take into account the change in the crime rate and the population of Greater Geelong and Monash. We could also look at Offence Subdivision and Offence Subgroup in the excel spreadsheet.
** For our presentation to stakeholders, you get to pick one figure to show them. Which of the ones above would you choose? Why? Recreate the figure below here and write 3 sentences about it (2.5 Marks) **
Amelia: Remember, when you are describing a data visualisation, you should start with a quick summary of what the graphic shows you. Then, describe what is on the x axis, the y axis, and any colours used to separate the data. You then need to describe what you learn.
I would include the following figure:
gg_crime_offence
!> Answer: This figure shows, as lines over time, the number of offences in each offence division for Greater Geelong and Monash. The x axis shows the year, the y axis shows the number of offences, and the colours show the different offence divisions. The figure illustrates well the difference between Monash and the LGA with the most crime, although the number of offences in both areas is increasing year by year.
Data downloaded from https://www.crimestatistics.vic.gov.au/crime-statistics/latest-crime-data/download-data from the dataset called “Data Tables - LGA Offence Visualisation - year ending September 2019 (XLSX, 17.13 MB)”
Packages used (look for things which were loaded with library()):
* ggplot2
* dplyr
* readr
* here