The Coursera JHU Week 2 assignment requires creating a web page with R Markdown that features a map built with Leaflet.
The dataset comes from TidyTuesday, a project of the R4DS Online Learning Community. A new dataset is published on the site each week, inviting R enthusiasts to develop their data exploration and visualization skills.
For the assignment, I chose the UFO Sightings around the World dataset, published on June 25, 2019. This dataset includes more than 80,000 recorded UFO “sightings” around the world, including the UFO shape, the latitude/longitude and state/country where the sighting occurred, the duration of the “event”, and the date_time when it occurred.
Let’s do this!
First order of business: read the data.
library(readr)  # read_csv() comes from readr
ufo_sightings <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-06-25/ufo_sightings.csv", col_names = TRUE,
                          col_types = "ccfffdcccdd")
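The compact col_types string assigns one letter per column, in file order: c for character, f for factor, d for double. As a quick sanity check, the sketch below pairs each letter with its column name. The column order is taken from the head() output shown later in this post, and type_names is a lookup introduced here for illustration only.

```r
# Column names in file order, as listed in the head() output of this post
cols <- c("date_time", "city_area", "state", "country", "ufo_shape",
          "encounter_length", "described_encounter_length", "description",
          "date_documented", "latitude", "longitude")

# Split the compact specification into one letter per column
spec <- strsplit("ccfffdcccdd", "")[[1]]

# Hypothetical lookup (illustration only) for the three one-letter codes used
type_names <- c(c = "character", f = "factor", d = "double")

# Named vector: column name -> type it will be read as
col_spec <- setNames(unname(type_names[spec]), cols)
print(col_spec)
```

If the number of letters and the number of columns ever disagree, this is a cheap place to notice it before reading the full file.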
Second, inspect the dataset.
dim(ufo_sightings)
## [1] 80332 11
library(skimr)  # skim() comes from skimr
skim(ufo_sightings)
## Skim summary statistics
## n obs: 80332
## n variables: 11
##
## ── Variable type:character ──────────────────────────────────────────────────────────
## variable missing complete n min max empty n_unique
## city_area 0 80332 80332 1 69 0 19900
## date_documented 0 80332 80332 8 10 0 317
## date_time 0 80332 80332 14 16 0 69586
## described_encounter_length 0 80332 80332 2 31 0 8349
## description 15 80317 80332 1 246 0 79996
##
## ── Variable type:factor ─────────────────────────────────────────────────────────────
## variable missing complete n n_unique
## country 9670 70662 80332 5
## state 5797 74535 80332 67
## ufo_shape 1932 78400 80332 29
## top_counts ordered
## us: 65114, NA: 9670, ca: 3000, gb: 1905 FALSE
## ca: 9655, NA: 5797, wa: 4268, fl: 4200 FALSE
## lig: 16565, tri: 7865, cir: 7608, fir: 6208 FALSE
##
## ── Variable type:numeric ────────────────────────────────────────────────────────────
## variable missing complete n mean sd p0
## encounter_length 3 80329 80332 9017.23 620228.37 0.001
## latitude 1 80331 80332 38.12 10.47 -82.86
## longitude 0 80332 80332 -86.77 39.7 -176.66
## p25 p50 p75 p100 hist
## 30 180 600 9.8e+07 ▇▁▁▁▁▁▁▁
## 34.13 39.41 42.79 72.7 ▁▁▁▁▁▂▇▁
## -112.07 -87.9 -78.75 178.44 ▁▇▇▁▁▁▁▁
head(ufo_sightings)
## # A tibble: 6 x 11
## date_time city_area state country ufo_shape encounter_length
## <chr> <chr> <fct> <fct> <fct> <dbl>
## 1 10/10/19… san marc… tx us cylinder 2700
## 2 10/10/19… lackland… tx <NA> light 7200
## 3 10/10/19… chester … <NA> gb circle 20
## 4 10/10/19… edna tx us circle 20
## 5 10/10/19… kaneohe hi us light 900
## 6 10/10/19… bristol tn us sphere 300
## # … with 5 more variables: described_encounter_length <chr>,
## # description <chr>, date_documented <chr>, latitude <dbl>,
## # longitude <dbl>
Convert the date_time variable to POSIXct and create a year variable for subsequent filtering.
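library(dplyr)      # mutate(), filter(), select() and the %>% pipe
library(lubridate)  # parse_date_time() and year()
ufo_sightings <- ufo_sightings %>%
  mutate(date_time = date_time %>% parse_date_time('mdy_HM'),
         year = date_time %>% year())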
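For readers without lubridate, roughly the same conversion can be sketched in base R. This assumes the timestamps follow a month/day/year hour:minute layout, which is what the 'mdy_HM' order implies. The sample values are made up for illustration, and UTC is an assumed time zone.

```r
# Made-up sample timestamps in month/day/year hour:minute layout
raw <- c("6/25/2014 21:15", "10/3/1998 04:05")

# Parse to POSIXct; tz = "UTC" is an assumption to keep the result reproducible
parsed <- as.POSIXct(raw, format = "%m/%d/%Y %H:%M", tz = "UTC")

# Extract the year as an integer for later filtering
yr <- as.integer(format(parsed, "%Y"))

print(data.frame(raw, parsed, yr))
```

Any string that does not match the format comes back as NA, so checking anyNA(parsed) after the conversion is a reasonable safeguard.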
For this assignment, I will use the UFO sightings located in the state of Florida, US, in 2014 (the last year of documented sightings).
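# Normalize case: upper-case state/country codes, title-case city names
ufo_sightings$state <- toupper(ufo_sightings$state)
ufo_sightings$country <- toupper(ufo_sightings$country)
ufo_sightings$city_area <- tools::toTitleCase(ufo_sightings$city_area)
# Fill in a missing country as "US" when the state is a US state abbreviation;
# keep the recorded country otherwise (without a default, case_when() returns NA)
ufo_sightings_us <- ufo_sightings %>%
  mutate(country = case_when(state %in% state.abb & is.na(country) ~ "US",
                             TRUE ~ country))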
ufo_sightings_fl <- ufo_sightings_us %>%
  filter(state == "FL" & year == 2014)
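A common pitfall in this kind of recoding is overwriting values that were already present. Here is a minimal base R sketch on a made-up data frame (toy is hypothetical, not part of the dataset), assuming the goal is to fill in only missing country codes for rows whose state is a US state abbreviation:

```r
# Made-up rows for illustration; not taken from the dataset
toy <- data.frame(
  state   = c("FL", "FL", "ON", NA),
  country = c("US", NA,   "CA", "GB"),
  year    = c(2014, 2014, 2014, 2013),
  stringsAsFactors = FALSE
)

# state.abb is a built-in vector of the 50 US state abbreviations
needs_fill <- toy$state %in% state.abb & is.na(toy$country)

# Fill only the missing codes; every other row keeps its recorded country
toy$country <- ifelse(needs_fill, "US", toy$country)

# Same filter as in the post: Florida sightings from 2014
toy_fl <- toy[which(toy$state == "FL" & toy$year == 2014), ]
print(toy_fl)
```

The Canadian and British rows keep their country codes, and only the Florida row with a missing country is changed.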
Check for missing data in ufo_sightings_fl.
colSums(is.na(ufo_sightings_fl))
## date_time city_area
## 0 0
## state country
## 0 0
## ufo_shape encounter_length
## 7 0
## described_encounter_length description
## 0 0
## date_documented latitude
## 0 0
## longitude year
## 0 0
There are seven missing values in the ufo_shape variable. The complete.cases() function could drop those rows. However, let’s keep them so that the map shows as many points as possible.
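To see what is being traded off, here is a small sketch of both options on a made-up data frame (toy_sightings and its values are hypothetical): colSums(is.na()) counts the gaps per column, and complete.cases() flags the rows that would survive if the gaps were dropped.

```r
# Made-up rows for illustration; coordinates are approximate placeholders
toy_sightings <- data.frame(
  city_area = c("Miami", "Tampa", "Orlando"),
  ufo_shape = c("light", NA, "circle"),
  latitude  = c(25.76, 27.95, 28.54),
  stringsAsFactors = FALSE
)

# Number of missing values in each column
missing_per_column <- colSums(is.na(toy_sightings))
print(missing_per_column)

# TRUE for rows with no missing value in any column
keep <- complete.cases(toy_sightings)
dropped <- toy_sightings[keep, ]

# Rows before and after dropping incomplete cases
print(nrow(toy_sightings))
print(nrow(dropped))
```

Dropping incomplete rows would remove a sighting that still has valid coordinates, which is why keeping them makes sense for a map.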
Select the variables needed for mapping the sightings.
ufo_sightings_fl_map <- ufo_sightings_fl %>%
  select(date_time, city_area, ufo_shape, description, latitude,
         longitude)
Now, we’re ready to use leaflet to mark the locations of UFO sightings.
library(leaflet)
leaflet() %>%
  addTiles() %>%  # default OpenStreetMap tiles
  addMarkers(lat = ufo_sightings_fl_map$latitude,
             lng = ufo_sightings_fl_map$longitude,
             # HTML shown when a marker is clicked
             popup = paste("City: ", ufo_sightings_fl_map$city_area, "<br>",
                           "Date/Time: ", ufo_sightings_fl_map$date_time, "<br>",
                           "UFO Shape: ", ufo_sightings_fl_map$ufo_shape, "<br>",
                           "Description: ", ufo_sightings_fl_map$description),
             # Group nearby markers into clusters that expand on zoom
             clusterOptions = markerClusterOptions())
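Because rows with a missing ufo_shape were kept, paste() will print the literal text NA in those popups. Below is a minimal sketch of one way to handle that, using made-up rows and a hypothetical build_popup() helper that is not part of leaflet. The resulting character vector could then be passed as the popup argument to addMarkers().

```r
# Hypothetical helper (illustration only): build one HTML popup string per row
build_popup <- function(city, when, shape, description) {
  # Show a readable placeholder instead of the literal text "NA"
  shape <- ifelse(is.na(shape), "Not reported", as.character(shape))
  paste0("City: ", city, "<br>",
         "Date/Time: ", when, "<br>",
         "UFO Shape: ", shape, "<br>",
         "Description: ", description)
}

# Made-up rows for illustration; not taken from the dataset
popups <- build_popup(
  city        = c("Miami", "Tampa"),
  when        = c("2014-01-05 21:00", "2014-02-11 19:30"),
  shape       = factor(c("light", NA)),
  description = c("Bright light over the bay", "Silent object moving north")
)
print(popups)
```

Building the strings ahead of time also keeps the addMarkers() call short and makes the popup text easy to inspect before the map is rendered.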