I decided to work with my data on 2015 FBI crime statistics for all NY counties. To take a first look at the data, I need to import it like so:
library(readr)
FBI_NY_2015 <- read_csv("FBI_NY_2015.csv")
Parsed with column specification:
cols(
.default = col_integer(),
Geo_FIPS = col_character(),
Geo_NAME = col_character(),
Geo_QNAME = col_character(),
Geo_NATION = col_character(),
Geo_COUNTY = col_character()
)
See spec(...) for full column specifications.
Now I want to see exactly what it looks like:
library(tibble)
glimpse(FBI_NY_2015, width = 75)
Observations: 54
Variables: 21
$ Geo_FIPS <chr> "36001", "36003", "36007", "36009", "36011", "3601...
$ Geo_NAME <chr> "Albany County", "Allegany County", "Broome County...
$ Geo_QNAME <chr> "Albany County, New York", "Allegany County, New Y...
$ Geo_NATION <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
$ Geo_STATE <int> 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36...
$ Geo_COUNTY <chr> "001", "003", "007", "009", "011", "013", "015", "...
$ SE_T001_001 <int> 309381, 47462, 196567, 77922, 78288, 130779, 87071...
$ SE_T002_001 <int> 127, 2, 785, 330, 201, 626, 470, 339, 18, 351, 149...
$ SE_T002_002 <int> 17, 0, 58, 37, 24, 33, 21, 32, 2, 32, 21, 53, 77, ...
$ SE_T002_003 <int> 110, 2, 727, 293, 177, 593, 449, 307, 16, 319, 128...
$ SE_T004_001 <int> 17, 0, 58, 37, 24, 33, 21, 32, 2, 32, 21, 53, 77, ...
$ SE_T004_002 <int> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,...
$ SE_T004_003 <int> 4, 0, 20, 24, 10, 11, 2, 18, 0, 9, 13, 21, 2, NA, ...
$ SE_T004_004 <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
$ SE_T004_005 <int> 1, 0, 5, 3, 0, 2, 2, 1, 0, 5, 1, 2, 12, 0, 0, 1, 0...
$ SE_T004_006 <int> 12, 0, 33, 10, 14, 19, 17, 13, 2, 18, 7, 30, 63, 0...
$ SE_T006_001 <int> 110, 2, 727, 293, 177, 593, 449, 307, 16, 319, 128...
$ SE_T006_002 <int> 34, 0, 111, 75, 36, 175, 39, 61, 3, 81, 39, 99, 18...
$ SE_T006_003 <int> 72, 2, 600, 205, 136, 392, 399, 234, 13, 232, 88, ...
$ SE_T006_004 <int> 4, 0, 16, 13, 5, 26, 11, 12, 0, 6, 1, 12, 22, 0, 5...
$ SE_T008_001 <int> 0, 0, 0, 0, 1, 0, 1, 2, 0, 1, 2, 1, 3, 0, 0, 4, 0,...
Now I want to get rid of some of the variables that I do not need:
# dplyr provides select(), rename() and the %>% pipe
library(dplyr)
FBI_NY_2015 <- FBI_NY_2015 %>% select(-one_of("Geo_COUNTY", "Geo_NATION", "Geo_NAME"))
FBI_NY_2015 <- FBI_NY_2015 %>% select(-one_of(c("Geo_FIPS", "Geo_STATE")))
Now I want to rename the variables so they are easier to read:
FBI_NY_2015 <- FBI_NY_2015 %>% rename(
  CountyandState = Geo_QNAME,
  TotalPopulation = SE_T001_001,
  TotalViolentandPropertyCrimes = SE_T002_001,
  ViolentCrimes = SE_T002_002,
  PropertyCrimes = SE_T002_003,
  TotalViolentCrimes = SE_T004_001,
  Murders = SE_T004_002,
  Rape = SE_T004_003,
  RapeLegacy = SE_T004_004,
  Robberies = SE_T004_005,
  AggravatedAssaults = SE_T004_006,
  TotalPropertyCrimes = SE_T006_001,
  Burglaries = SE_T006_002,
  Larcenies = SE_T006_003,
  MotorVehicleThefts = SE_T006_004,
  Arson = SE_T008_001
)
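A long rename() call like the one above can also be driven by a named lookup vector, which keeps the old and new names side by side and makes them easy to reuse. This is only a sketch: the `raw` tibble is a small stand-in built from the first two rows of the glimpse() output, and `lookup` and `renamed` are names I made up for the example.

```r
library(dplyr)
library(tibble)

# Stand-in data: two rows and three columns copied from the glimpse() output
raw <- tibble(
  Geo_QNAME   = c("Albany County, New York", "Allegany County, New York"),
  SE_T001_001 = c(309381L, 47462L),
  SE_T004_002 = c(0L, 0L)
)

# Named vector in the form new_name = "old_name" (hypothetical helper)
lookup <- c(
  CountyandState  = "Geo_QNAME",
  TotalPopulation = "SE_T001_001",
  Murders         = "SE_T004_002"
)

# all_of() tells rename() to treat the vector as a set of column names
renamed <- raw %>% rename(all_of(lookup))
names(renamed)
```

The same `lookup` vector could then be extended with the remaining SE_T codes instead of typing them inside rename().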
Now we can take a look at what we have so far:
head(FBI_NY_2015, 10)
Now I just want to select the variables that I wish to use:
FBI_NY_2015 <- select(FBI_NY_2015, CountyandState, TotalPopulation, TotalViolentandPropertyCrimes, ViolentCrimes, PropertyCrimes, TotalViolentCrimes, Murders, Rape, Robberies, AggravatedAssaults, TotalPropertyCrimes, Burglaries, Larcenies, MotorVehicleThefts, Arson)
head(FBI_NY_2015, 10)
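The selection above leaves out RapeLegacy, which showed only NA in the rows visible in the glimpse() output. One way to check for columns like that before deciding what to keep is to count the missing values per column. This is a sketch on a small stand-in tibble (the first three counties and three of the crime columns from the glimpse() output); `crimes`, `na_counts`, `all_na_cols` and `crimes_clean` are names I made up.

```r
library(dplyr)
library(tibble)

# Stand-in data: first three counties from the glimpse() output
crimes <- tibble(
  CountyandState = c("Albany County, New York",
                     "Allegany County, New York",
                     "Broome County, New York"),
  Rape       = c(4L, 0L, 20L),
  RapeLegacy = c(NA_integer_, NA_integer_, NA_integer_),
  Robberies  = c(1L, 0L, 5L)
)

# Number of missing values in each column
na_counts <- colSums(is.na(crimes))
print(na_counts)

# Columns where every single value is missing
all_na_cols <- names(na_counts)[na_counts == nrow(crimes)]

# Drop those columns and keep everything else
crimes_clean <- crimes %>% select(-all_of(all_na_cols))
names(crimes_clean)
```

On the full data set the same colSums(is.na(...)) call would show whether RapeLegacy is empty for every county or only for the first few.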
Now I will try to graph the information for Murders:
library(ggplot2)
ggplot(data = FBI_NY_2015) +
  geom_col(aes(x = CountyandState, y = Murders, fill = Murders)) +
  coord_flip()
After playing around with this and learning more about R, cleaning data, and graphing, here are some things I would like to improve: learn how to condense the information for each county, maybe by grouping the counties into regions (Northern, Eastern, Western, etc.), or by just picking a few counties to focus on instead of all the counties of NY.
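Both ideas can be sketched with dplyr. The code below is a rough starting point, not a finished solution: it uses a stand-in tibble with the first three counties from the glimpse() output, and the region assignments in `region_lookup` are my own illustrative guesses, not an official grouping. The names `ny`, `top_counties`, `region_lookup`, `by_region` and `p` are made up for the example.

```r
library(dplyr)
library(tibble)
library(ggplot2)

# Stand-in data: first three counties from the glimpse() output
ny <- tibble(
  CountyandState = c("Albany County, New York",
                     "Allegany County, New York",
                     "Broome County, New York"),
  TotalPopulation = c(309381L, 47462L, 196567L),
  TotalViolentandPropertyCrimes = c(127L, 2L, 785L)
)

# Option 1: focus on the counties with the most reported crimes
top_counties <- ny %>%
  arrange(desc(TotalViolentandPropertyCrimes)) %>%
  head(2)

# Option 2: condense counties into regions.
# ASSUMPTION: these region labels are illustrative only.
region_lookup <- tribble(
  ~CountyandState,             ~Region,
  "Albany County, New York",   "Capital Region",
  "Allegany County, New York", "Southern Tier",
  "Broome County, New York",   "Southern Tier"
)

by_region <- ny %>%
  left_join(region_lookup, by = "CountyandState") %>%
  group_by(Region) %>%
  summarise(
    Counties    = n(),
    Population  = sum(TotalPopulation),
    TotalCrimes = sum(TotalViolentandPropertyCrimes, na.rm = TRUE)
  )

# Bar chart of the selected counties, sorted by crime count
p <- ggplot(top_counties) +
  geom_col(aes(x = reorder(CountyandState, TotalViolentandPropertyCrimes),
               y = TotalViolentandPropertyCrimes)) +
  coord_flip() +
  labs(x = NULL, y = "Total violent and property crimes")
```

Calling print(p) draws the chart. With the real data, head(2) would become something like head(10), and `region_lookup` would need one row per county.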