library(tidyverse)
library(lubridate)
ufo_sightings <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-06-25/ufo_sightings.csv") %>%
  # Extract year from date_time, and calculate decade from year
  mutate(year = year(parse_date_time(date_time, 'mdy_hm')),
         decade = as.factor(year %/% 10 * 10))

It would be interesting to find out why UFO sightings have risen so sharply since the late 1990s. Are there any news articles that explain it?

ufo_sightings %>%
  count(decade, sort = TRUE) %>% 
  ggplot(aes(decade, n)) +
  geom_col() +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "UFO Sightings by Decade",
       y = "number of sightings",
       x = NULL)
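
To narrow down when the increase began, one option is to count sightings per year rather than per decade. The following is a minimal sketch on a hand-made toy data frame, not the real data; the `toy`, `yearly`, and `change` names are illustrative assumptions. It is written in base R so it runs without the packages loaded above.

```r
# Toy stand-in for ufo_sightings$date_time (same "m/d/Y H:M" text format)
toy <- data.frame(
  date_time = c("10/10/1949 20:30", "6/1/1995 21:00", "7/4/1998 22:15",
                "7/4/1998 23:00", "1/15/2003 19:45", "8/20/2003 02:10",
                "9/9/2003 21:30"),
  stringsAsFactors = FALSE
)

# Parse the timestamp and extract the calendar year
parsed <- as.POSIXct(toy$date_time, format = "%m/%d/%Y %H:%M", tz = "UTC")
toy$year <- as.integer(format(parsed, "%Y"))

# Count sightings per year
yearly <- as.data.frame(table(year = toy$year), responseName = "n")
yearly$year <- as.integer(as.character(yearly$year))

# Change in count relative to the previous row, i.e. the previous year
# that has at least one sighting; large positive values mark a jump
yearly$change <- c(NA, diff(yearly$n))
yearly
```

On the real data, the same steps applied to `ufo_sightings` would show which years carry the jump, which could then be compared against news coverage from that period.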

The number of sightings is higher in populous states (e.g., California, Texas, and New York). What does this mean? Are these states more popular among aliens, or do they simply have more people available to report sightings?

# the code from the textbook: 6.2.2 Data by US state
library(choroplethr)
library(openintro) # for abbr2state()
data(continental_us_states)

# Create region and value columns to pass to state_choropleth() to create map
ufo_map <-
  ufo_sightings %>%
  count(state, sort = TRUE) %>%
  mutate(region = tolower(abbr2state(state))) %>%
  rename(value = n) %>%
  filter(!is.na(region))

# Pass to state_choropleth() to create map
state_choropleth(ufo_map, 
                 num_colors=9,
                 zoom = continental_us_states) +
  scale_fill_brewer(palette="YlOrBr") +
  labs(title = "UFO Sightings in the United States",
       caption = "source: THE NATIONAL UFO REPORTING CENTER",
       fill = "Number of Sightings")
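
One way to probe the question about populous states is to divide each state's count by its population. The sketch below uses made-up sighting counts and R's built-in `state.x77` population estimates, which date from the 1970s and are only a rough stand-in for current figures. The `toy_counts`, `pop`, and `rates` names are assumptions for illustration.

```r
# Hypothetical sighting counts for three states (not taken from the data)
toy_counts <- data.frame(state = c("ca", "tx", "vt"),
                         n = c(900, 400, 30))

# Built-in state population estimates from the 1970s, stored in thousands
pop <- data.frame(state = tolower(state.abb),
                  population = state.x77[, "Population"] * 1000)

# Join counts to populations and compute sightings per 100,000 residents
rates <- merge(toy_counts, pop, by = "state")
rates$per_100k <- rates$n / rates$population * 1e5

# Sort so the highest per-capita rate comes first
rates[order(-rates$per_100k), ]
```

With the real counts from `ufo_map` and up-to-date population figures, the `per_100k` column could be passed to `state_choropleth()` as `value` in place of the raw count.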

Import data

Hint: You can choose any data you like but can’t take one that is already taken by other groups.

ufo_sightings <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-06-25/ufo_sightings.csv")

Description of the data and definition of variables

The data set contains roughly 80,000 reports of UFO sightings from around the world. Its variables are the date and time of the sighting, city or area, state or region, country, UFO shape, length of the encounter in seconds, described length of the encounter, description of the encounter, date documented, latitude, and longitude.

Visualize data

Hint: Create at least two plots.

library(ggplot2)
# ufo_sightings was already loaded with read_csv() above; it is not a
# packaged data set, so no data() call is needed

ggplot(ufo_sightings, 
       aes(x = country,
           y = encounter_length)) +
  geom_point()

library(dplyr)
# filter() with no conditions keeps every row, so plotdata is a copy of ufo_sightings
plotdata <- filter(ufo_sightings)


ggplot(plotdata,
       aes(x = encounter_length,
           y = latitude)) + 
  geom_line()

# plotdata was already created above, so it is reused here


ggplot(plotdata,
       aes(x = encounter_length,
           y = longitude)) + 
  geom_line()

Correlation and regression analysis

# Keep only the numeric columns (encounter_length, latitude, longitude)
df <- dplyr::select_if(ufo_sightings, is.numeric)

r <- cor(df, use = "complete.obs")
round(r,2)
##                  encounter_length latitude longitude
## encounter_length             1.00     0.00      0.01
## latitude                     0.00     1.00     -0.39
## longitude                    0.01    -0.39      1.00
# Regress encounter length (seconds) on country
ufo_lm <- lm(encounter_length ~ country, 
             data = ufo_sightings)
summary(ufo_lm)
## 
## Call:
## lm(formula = encounter_length ~ country, data = ufo_sightings)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
##   -66061    -5785    -5620    -5200 97769939 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)     3806      27002   0.141   0.8879  
## countryca      25053      29323   0.854   0.3929  
## countryde      20450      66820   0.306   0.7596  
## countrygb      62255      30578   2.036   0.0418 *
## countryus       1994      27113   0.074   0.9414  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 626300 on 70655 degrees of freedom
##   (9672 observations deleted due to missingness)
## Multiple R-squared:  0.000291,   Adjusted R-squared:  0.0002344 
## F-statistic: 5.142 on 4 and 70655 DF,  p-value: 0.0003864
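
The residuals above run from -66061 to 97769939 with a median of -5620, which points to a heavily right-skewed outcome. A common response, offered here only as a sketch and not as part of the original analysis, is to model the logarithm of the duration. The example uses simulated data, and the `toy`, `raw_lm`, and `log_lm` names are assumptions.

```r
set.seed(1)

# Simulated, heavily right-skewed durations in seconds (not the real data)
toy <- data.frame(
  country = rep(c("au", "gb", "us"), each = 50),
  encounter_length = exp(rnorm(150, mean = 5, sd = 2))
)

# Same model form as above, fitted on the raw scale and on the log scale
raw_lm <- lm(encounter_length ~ country, data = toy)
log_lm <- lm(log(encounter_length) ~ country, data = toy)

# Compare the residual distributions; on the log scale the extreme
# values no longer dominate the spread
summary(residuals(raw_lm))
summary(residuals(log_lm))
```

On the real data, rows with a missing or zero `encounter_length` would have to be dropped first, because `log(0)` is `-Inf`.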

Share interesting stories you found from the data

From the scatter plot, the United States appeared to have the most recorded UFO encounters. The line plots suggested that encounters were longer at higher latitudes and at lower longitudes, which would place the longest recorded encounters in the northwestern part of the globe, where the U.S. is located. However, the correlation matrix shows that encounter length is essentially unrelated to latitude (r = 0.00) and longitude (r = 0.01), and country explains almost none of the variation in encounter length (R-squared = 0.000291), so this pattern should be read with caution.
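
Claims about latitude and encounter length can be checked with a formal correlation test. The sketch below runs on simulated data in which duration does not depend on latitude. It compares Pearson's correlation with Spearman's rank correlation, which is less affected by a few extreme durations. The `toy`, `pearson`, and `spearman` names are illustrative assumptions.

```r
set.seed(42)

# Simulated stand-in: skewed durations that do not depend on latitude
toy <- data.frame(
  latitude = runif(200, min = 25, max = 50),
  encounter_length = exp(rnorm(200, mean = 5, sd = 2))
)

# Pearson works on raw values; Spearman works on ranks only
pearson  <- cor.test(toy$latitude, toy$encounter_length, method = "pearson")
spearman <- cor.test(toy$latitude, toy$encounter_length, method = "spearman")

# Put the two estimates side by side; p-values are in $p.value
c(pearson = unname(pearson$estimate), spearman = unname(spearman$estimate))
```

The same two calls on `ufo_sightings$latitude` and `ufo_sightings$encounter_length` would show whether any association remains once outliers carry less weight.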

Hide the messages, but display the code and its results on the webpage.

List names of all group members (both first and last name) at the top of the webpage.

Use the correct slug.