Introduction

The Coursera JHU Week 2 assignment requires creating a web page with R Markdown that features a map created with Leaflet.

The dataset comes from TidyTuesday, a project of the R4DS Online Learning Community. A new dataset is published on the project's site every week, inviting R enthusiasts to practice data exploration and visualization.

For the assignment, I chose the UFO Sightings around the World dataset, published on June 25, 2019. It contains more than 80,000 recorded UFO “sightings” around the world, with the UFO shape, the latitude/longitude and state/country where the sighting occurred, the duration of the “event”, and the date and time (date_time) when it occurred.

Let’s do this!

First order of the day: read the data.

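library(readr)  # read_csv()
# col_types: one letter per column (c = character, f = factor, d = double)
ufo_sightings <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-06-25/ufo_sightings.csv",
                          col_names = TRUE, col_types = "ccfffdcccdd")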
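The compact col_types string assigns one type letter per column, in column order. A small base-R sketch that spells out which column receives which type; the column order is taken from the head() output further down and is assumed to match the CSV header.

```r
# Column names in the order shown by head(); assumed to match the CSV header
ufo_cols <- c("date_time", "city_area", "state", "country", "ufo_shape",
              "encounter_length", "described_encounter_length", "description",
              "date_documented", "latitude", "longitude")

# Split the readr shorthand into one letter per column
codes <- strsplit("ccfffdcccdd", "")[[1]]
stopifnot(length(codes) == length(ufo_cols))

# Translate each letter into the type readr applies to that column
type_names <- c(c = "character", f = "factor", d = "double")
col_map <- setNames(unname(type_names[codes]), ufo_cols)
print(col_map)
```

The result agrees with the skim() summary: state, country and ufo_shape come back as factors, and encounter_length, latitude and longitude as numbers.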

Second, inspect the dataset.

dim(ufo_sightings)
## [1] 80332    11
library(skimr)  # skim()
skim(ufo_sightings)
## Skim summary statistics
##  n obs: 80332 
##  n variables: 11 
## 
## ── Variable type:character ──────────────────────────────────────────────────────────
##                    variable missing complete     n min max empty n_unique
##                   city_area       0    80332 80332   1  69     0    19900
##             date_documented       0    80332 80332   8  10     0      317
##                   date_time       0    80332 80332  14  16     0    69586
##  described_encounter_length       0    80332 80332   2  31     0     8349
##                 description      15    80317 80332   1 246     0    79996
## 
## ── Variable type:factor ─────────────────────────────────────────────────────────────
##   variable missing complete     n n_unique
##    country    9670    70662 80332        5
##      state    5797    74535 80332       67
##  ufo_shape    1932    78400 80332       29
##                                   top_counts ordered
##      us: 65114, NA: 9670, ca: 3000, gb: 1905   FALSE
##       ca: 9655, NA: 5797, wa: 4268, fl: 4200   FALSE
##  lig: 16565, tri: 7865, cir: 7608, fir: 6208   FALSE
## 
## ── Variable type:numeric ────────────────────────────────────────────────────────────
##          variable missing complete     n    mean        sd       p0
##  encounter_length       3    80329 80332 9017.23 620228.37    0.001
##          latitude       1    80331 80332   38.12     10.47  -82.86 
##         longitude       0    80332 80332  -86.77     39.7  -176.66 
##      p25    p50    p75      p100     hist
##    30    180    600      9.8e+07 ▇▁▁▁▁▁▁▁
##    34.13  39.41  42.79  72.7     ▁▁▁▁▁▂▇▁
##  -112.07 -87.9  -78.75 178.44    ▁▇▇▁▁▁▁▁
head(ufo_sightings)
## # A tibble: 6 x 11
##   date_time city_area state country ufo_shape encounter_length
##   <chr>     <chr>     <fct> <fct>   <fct>                <dbl>
## 1 10/10/19… san marc… tx    us      cylinder              2700
## 2 10/10/19… lackland… tx    <NA>    light                 7200
## 3 10/10/19… chester … <NA>  gb      circle                  20
## 4 10/10/19… edna      tx    us      circle                  20
## 5 10/10/19… kaneohe   hi    us      light                  900
## 6 10/10/19… bristol   tn    us      sphere                 300
## # … with 5 more variables: described_encounter_length <chr>,
## #   description <chr>, date_documented <chr>, latitude <dbl>,
## #   longitude <dbl>

Convert the date_time variable to POSIXct and create a year variable for subsequent filtering.

library(dplyr)      # %>%, mutate(), filter(), select()
library(lubridate)  # parse_date_time(), year()
ufo_sightings <- ufo_sightings %>% 
     mutate(date_time = date_time %>% parse_date_time("mdy_HM"),
            year = date_time %>% year())
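For readers without lubridate, the same conversion can be sketched in base R. This assumes the date_time strings follow the month/day/year hour:minute layout implied by the "mdy_HM" order; the two sample strings are hypothetical, not rows from the dataset. parse_date_time() returns UTC by default, so the sketch uses tz = "UTC" as well.

```r
# Hypothetical strings in the assumed m/d/Y H:M layout
raw <- c("10/10/1949 20:30", "1/5/2014 9:05")

# strptime-style format; leading zeros are optional when parsing
parsed <- as.POSIXct(raw, format = "%m/%d/%Y %H:%M", tz = "UTC")

# Extract the calendar year as an integer for later filtering
years <- as.integer(format(parsed, "%Y"))
print(parsed)
print(years)
```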

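For this assignment, I will use the UFO sightings located in the state of Florida, US, from 2014 (the last year of documented sightings). First, state and country codes are converted to upper case and city names to title case; rows with a US state abbreviation or a missing country are then labeled as US.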

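library(tools)  # toTitleCase()
# Standardize case: state/country codes upper case, city names title case
ufo_sightings$state <- toupper(ufo_sightings$state)
ufo_sightings$country <- toupper(ufo_sightings$country)
ufo_sightings$city_area <- toTitleCase(ufo_sightings$city_area)
# US state abbreviation or no recorded country -> "US";
# TRUE ~ country keeps every other recorded country instead of turning it into NA
ufo_sightings_us <- ufo_sightings %>% 
     mutate(country = case_when(state %in% state.abb | is.na(country) ~ "US",
                                TRUE ~ country))
ufo_sightings_fl <- ufo_sightings_us %>% 
     filter(state == "FL" & year == 2014)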
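case_when() returns NA for any row that matches none of its conditions, so the recoding needs a fallback that keeps the recorded country (for example TRUE ~ country) if non-US sightings are to survive. The base-R sketch below mirrors the same rule with ifelse() on a few hypothetical rows, with and without a fallback, and then applies the Florida/2014 filter.

```r
# Hypothetical rows; state and country codes already upper-cased
sightings <- data.frame(
  state   = c("TX", "TX", NA,   "BC", "FL", "FL"),
  country = c("US", NA,   "GB", "CA", NA,   "US"),
  year    = c(2014, 2014, 2013, 2014, 2014, 2012),
  stringsAsFactors = FALSE
)

# Rule from the post: a US state abbreviation or a missing country means "US"
is_us <- sightings$state %in% state.abb | is.na(sightings$country)

# No fallback (like case_when() without a TRUE ~ branch): other countries become NA
no_fallback <- ifelse(is_us, "US", NA)

# With a fallback that keeps the recorded country
sightings$country <- ifelse(is_us, "US", sightings$country)

# Same filter as above: Florida sightings from 2014 only (NA states are dropped)
fl_2014 <- subset(sightings, state == "FL" & year == 2014)
print(no_fallback)
print(sightings$country)
print(fl_2014)
```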

Check for missing data in ufo_sightings_fl:

colSums(is.na(ufo_sightings_fl))
##                  date_time                  city_area 
##                          0                          0 
##                      state                    country 
##                          0                          0 
##                  ufo_shape           encounter_length 
##                          7                          0 
## described_encounter_length                description 
##                          0                          0 
##            date_documented                   latitude 
##                          0                          0 
##                  longitude                       year 
##                          0                          0

Only the ufo_shape variable has missing values (7 sightings). Filtering with complete.cases() would drop those rows, but I keep them so the map shows as many sightings as possible; their popups simply show NA for the shape.
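colSums(is.na(...)) works because is.na() on a data frame returns a logical matrix, and summing each column counts the TRUE values. A small sketch on hypothetical rows shows that count and what complete.cases() would cost: every row with a missing shape is dropped, taking its map point with it. Relabelling the gap is one way to keep the point; the "unknown" label is my own choice, not a value used in the dataset.

```r
# Hypothetical Florida rows; one sighting has no recorded shape
toy <- data.frame(
  city_area = c("Miami", "Tampa", "Orlando"),
  ufo_shape = c("light", NA, "circle"),
  latitude  = c(25.77, 27.95, 28.54),
  longitude = c(-80.19, -82.46, -81.38),
  stringsAsFactors = FALSE
)

# is.na() gives a logical matrix; colSums() counts missing values per column
missing_per_col <- colSums(is.na(toy))
print(missing_per_col)

# complete.cases() keeps only rows with no NA at all, so Tampa disappears
dropped <- toy[complete.cases(toy), ]

# Alternative: keep every row and give the missing shape an explicit label
kept <- toy
kept$ufo_shape[is.na(kept$ufo_shape)] <- "unknown"
print(nrow(dropped))
print(nrow(kept))
```

In the real data ufo_shape was read as a factor, so such a label would first have to be added as a level, or the column converted with as.character().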

Select the variables needed to map the sightings.

ufo_sightings_fl_map <- ufo_sightings_fl %>% 
     select(date_time, city_area, ufo_shape, description, latitude, 
            longitude)

Now we’re ready to use leaflet to mark the locations of the UFO sightings; markerClusterOptions() groups nearby markers into clusters that expand as you zoom in.

library(leaflet)  # leaflet(), addTiles(), addMarkers(), markerClusterOptions()
leaflet() %>% 
     addTiles() %>% 
     addMarkers(lat = ufo_sightings_fl_map$latitude, 
                lng = ufo_sightings_fl_map$longitude, 
                popup = paste("City: ", ufo_sightings_fl_map$city_area, "<br>", 
                              "Date/Time: ", ufo_sightings_fl_map$date_time, "<br>", 
                              "UFO Shape: ", ufo_sightings_fl_map$ufo_shape, "<br>", 
                              "Description: ", ufo_sightings_fl_map$description), 
                clusterOptions = markerClusterOptions())
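The popup argument relies on paste() being vectorised: each argument is recycled element-wise, giving one HTML string per sighting, with <br> tags as line breaks inside the popup. A base-R sketch on two hypothetical sightings; it uses paste0() to avoid the extra spaces that paste()'s default separator adds, and shows that a missing ufo_shape appears as the text "NA".

```r
# Two hypothetical sightings with the same columns as ufo_sightings_fl_map
map_df <- data.frame(
  city_area   = c("Miami", "Tampa"),
  date_time   = as.POSIXct(c("2014-01-05 21:00", "2014-03-12 20:15"),
                           format = "%Y-%m-%d %H:%M", tz = "UTC"),
  ufo_shape   = c("light", NA),
  description = c("Bright orange orb moving north", "Three lights in a triangle"),
  stringsAsFactors = FALSE
)

# One popup string per row; <br> starts a new line when leaflet renders the HTML
popups <- paste0("City: ", map_df$city_area, "<br>",
                 "Date/Time: ", format(map_df$date_time, "%Y-%m-%d %H:%M"), "<br>",
                 "UFO Shape: ", map_df$ufo_shape, "<br>",
                 "Description: ", map_df$description)
print(popups)
```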