library(tidyverse)
library(scales)
options(scipen=999)
library(lubridate)
ufo_sightings <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-06-25/ufo_sightings.csv") %>%
  # Extract year from date_time, and calculate decade from year
  mutate(year = year(parse_date_time(date_time, 'mdy_hm')),
         decade = as.factor(year %/% 10 * 10))

It would be interesting to see why UFO sightings rose so sharply from the late 1990s onward. Are there any news articles that explain it?

The CIA claims that UFO sightings could be a psychological effect among people who spend time outdoors. Cultural influences such as TV shows and movies could also lead people to believe they are seeing aliens or UFOs when the object is just a plane or a shooting star. Shows like The X-Files or The Twilight Zone can make people interpret what they see differently from what is really happening. The growth of these cultural factors, together with social media in later decades, could explain an increase in reported sightings.

ufo_sightings %>%
  count(decade, sort = TRUE) %>% 
  ggplot(aes(decade, n)) +
  geom_col() +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "UFO Sightings by Decade",
       y = "number of sightings",
       x = NULL)
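Decade bins are coarse, so they cannot show exactly when the increase began. A minimal sketch of a per-year count follows; it assumes a numeric `year` column like the one created with mutate() above. The helper name `sightings_per_year` and the toy input are hypothetical and for illustration only, not real NUFORC counts.

```r
library(dplyr)

# Count sightings per year, dropping rows whose date could not be parsed.
# `year` is assumed to be a numeric column, as created with mutate() above.
sightings_per_year <- function(data) {
  data %>%
    filter(!is.na(year)) %>%
    count(year, name = "sightings") %>%
    arrange(year)
}

# Toy input for illustration only (not real NUFORC counts)
toy <- data.frame(year = c(1995, 1995, 1999, 2000, 2000, 2000, NA))
yearly <- sightings_per_year(toy)
print(yearly)

# With the real data, the result could be plotted as a line, for example:
# sightings_per_year(ufo_sightings) %>%
#   ggplot(aes(year, sightings)) +
#   geom_line()
```

A yearly series makes it easier to compare the start of the rise with specific events, such as the release of a TV show or the spread of online reporting.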

The number of sightings is higher in populous states (e.g., California, Texas and New York). What does this mean? Are these states more popular among aliens?

These states probably report more sightings simply because more people live there. With more people, there is a larger chance that any given object in the sky gets seen and reported. A witness may also post about it on social media right away, prompting others to go outside and see the same thing at the same time. By contrast, a UFO that appears over an area with low population density might not be noticed at all.

# the code from the textbook: 6.2.2 Data by US state
library(choroplethr)
library(openintro) # for abbr2state()
data(continental_us_states)

# Create region and value columns to pass to state_choropleth() to create map
ufo_map <-
  ufo_sightings %>%
  count(state, sort = TRUE) %>%
  mutate(region = tolower(abbr2state(state))) %>%
  rename(value = n) %>%
  filter(!is.na(region))

# Pass to state_choropleth() to create map
state_choropleth(ufo_map, 
                 num_colors=9,
                 zoom = continental_us_states) +
  scale_fill_brewer(palette="YlOrBr") +
  labs(title = "UFO Sightings in the United States",
       caption = "source: THE NATIONAL UFO REPORTING CENTER",
       fill = "Number of Sightings") 

Darker states have more sightings than lighter states, most likely because more people live in them. The increased light pollution in these states could also draw attention to planes and other objects in the sky, leading to more reported sightings.
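One way to check the population explanation is to divide each state's count by its population. The sketch below is an assumption-laden illustration and not part of the original analysis. It uses the `state.abb` and `state.x77` datasets that ship with base R, whose population figures are 1975 estimates in thousands, so current Census figures should be substituted for a real comparison. The helper name `sightings_per_100k` and the toy input are hypothetical.

```r
library(dplyr)

# Reference populations from base R: 1975 estimates, in thousands.
# ASSUMPTION: these dated figures stand in for current population data.
state_pop <- data.frame(
  state = tolower(state.abb),
  pop_thousands = unname(state.x77[, "Population"]),
  stringsAsFactors = FALSE
)

# Sightings per 100,000 residents; `state` is assumed to hold lowercase
# two-letter abbreviations. Rows that match no US state are dropped.
sightings_per_100k <- function(data) {
  data %>%
    count(state, name = "sightings") %>%
    inner_join(state_pop, by = "state") %>%
    mutate(per_100k = sightings / (pop_thousands * 1000) * 100000) %>%
    arrange(desc(per_100k))
}

# Toy input for illustration only ("zz" is not a state and is dropped)
toy <- data.frame(
  state = c("ca", "ca", "ca", "wy", "wy", "zz"),
  stringsAsFactors = FALSE
)
rates <- sightings_per_100k(toy)
print(rates)
```

If the per-capita rates turn out to be similar across states, population alone explains the map. Large differences would point to other factors.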

Import data

Hint: You can choose any data you like but can’t take one that is already taken by other groups.

library(readr)

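# stringsAsFactors = TRUE reproduces the factor summary below on R >= 4.0,
# where read.csv() no longer converts strings to factors by default
ufo_sightings <- read.csv("ufo_sightings.csv", stringsAsFactors = TRUE) %>%
  # Keep the 10 most frequent states and lump the rest into "Other"
  mutate(state = fct_lump(state, 10)) %>%
  filter(!is.na(state)) %>%
  # Do the same for the 10 most frequent shapes
  mutate(ufo_shape = fct_lump(ufo_shape, 10)) %>%
  filter(!is.na(ufo_shape))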

summary(ufo_sightings)
##             date_time           city_area         state       country     
##  7/4/2010 22:00  :   36   seattle    :  473   Other  :36319   au  :   10  
##  7/4/2012 22:00  :   31   phoenix    :  438   ca     : 9405   ca  : 2942  
##  11/16/1999 19:00:   26   las vegas  :  357   fl     : 4109   de  :    0  
##  9/19/2009 20:00 :   26   portland   :  355   wa     : 3989   gb  :   11  
##  7/4/2011 22:00  :   24   los angeles:  348   tx     : 3623   us  :63561  
##  10/31/2004 20:00:   23   san diego  :  328   ny     : 3150   NA's: 6218  
##  (Other)         :72576   (Other)    :70443   (Other):12147               
##     ufo_shape     encounter_length   described_encounter_length
##  light   :15438   Min.   :       0   5 minutes : 4426          
##  Other   :11211   1st Qu.:      30   2 minutes : 3262          
##  triangle: 7395   Median :     180   10 minutes: 3120          
##  circle  : 6946   Mean   :    6404   1 minute  : 2764          
##  fireball: 5825   3rd Qu.:     600   3 minutes : 2358          
##  unknown : 5239   Max.   :82800000   30 seconds: 2122          
##  (Other) :20688                      (Other)   :54690          
##                                                     description   
##  Fireball                                                 :   10  
##  ((NUFORC Note:  No information provided by witness.  PD)):    8  
##  UFO                                                      :    7  
##  ((NUFORC Note:  Witness provides no information.  PD))   :    6  
##  Bright Light                                             :    6  
##  (Other)                                                  :72697  
##  NA's                                                     :    8  
##    date_documented     latitude        longitude      
##  12/12/2009: 1374   Min.   :-46.16   Min.   :-176.66  
##  10/30/2006: 1243   1st Qu.: 34.21   1st Qu.:-114.08  
##  11/21/2010: 1167   Median : 39.28   Median : -89.40  
##  10/31/2008: 1078   Mean   : 38.71   Mean   : -95.13  
##  8/30/2013 : 1015   3rd Qu.: 42.37   3rd Qu.: -80.25  
##  3/19/2009 :  973   Max.   : 72.70   Max.   : 169.88  
##  (Other)   :65892   NA's   :1

Explain data

Hint: Source and description of data, and definition of variables.

The data comes from the TidyTuesday repository at https://github.com/rfordatascience/tidytuesday. It records UFO sightings from around the world and is sorted by the date and time of each sighting. The variables are: date_time, when the sighting happened; city_area, state, and country, where the sighting happened; ufo_shape, the shape of the object the person saw; encounter_length, the duration of the sighting in seconds; described_encounter_length, the duration of the sighting as the witness described it; description, a description of the sighting; date_documented, the day the sighting was documented rather than the date it happened; and latitude and longitude, which give the location of the sighting.

Visualize data

Hint: Create at least two plots.

ggplot(ufo_sightings, aes(x = state, fill= state)) + 
  geom_bar()

library(dplyr)
mean_time <- ufo_sightings %>%
  group_by(ufo_shape) %>%
  summarize(mean_time = mean(encounter_length))

ggplot(mean_time, 
       aes(x = ufo_shape, 
           y = mean_time, fill = ufo_shape)) +
  geom_bar(stat = "identity") +
  # labs() must be added to the plot with `+`; on its own it only prints a labels object
  labs(title = "Mean Time by UFO Shape", 
       x = NULL,
       y = "Mean encounter length (seconds)")
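The summary above shows that encounter_length is heavily skewed: the median is 180 seconds, while the mean is 6404 seconds and the maximum is 82,800,000 seconds. A few extreme values can therefore dominate the mean for a shape. The following is a minimal sketch, not part of the original analysis, that reports the median next to the mean. The helper name `summarise_length_by_shape` and the toy input are hypothetical.

```r
library(dplyr)

# Mean and median encounter length per shape, plus the group size.
# Column names are assumed to match the ufo_sightings data used above.
summarise_length_by_shape <- function(data) {
  data %>%
    group_by(ufo_shape) %>%
    summarise(
      mean_time = mean(encounter_length, na.rm = TRUE),
      median_time = median(encounter_length, na.rm = TRUE),
      n = n(),
      .groups = "drop"
    )
}

# Toy input for illustration only; one extreme value inflates the mean
toy <- data.frame(
  ufo_shape = c("light", "light", "light", "circle", "circle"),
  encounter_length = c(30, 60, 82800000, 120, 180),
  stringsAsFactors = FALSE
)
by_shape <- summarise_length_by_shape(toy)
print(by_shape)
```

In the toy data the single extreme value pushes the mean for "light" into the millions while its median stays at 60 seconds, which is why a median bar chart may describe typical encounters better.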

Correlation and regression analysis

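# ufo_sightings is already in the workspace from the import step, so no
# data() call is needed (data() only loads datasets bundled with packages)

# Keep only the numeric columns for the correlation matrix
ufonumeric <- dplyr::select_if(ufo_sightings, is.numeric)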

r <- cor(ufonumeric, use="complete.obs")
round(r,2)
##                  encounter_length latitude longitude
## encounter_length                1      0.0       0.0
## latitude                        0      1.0      -0.1
## longitude                       0     -0.1       1.0
library(ggplot2)
library(ggcorrplot)

# visualize the correlations
ggcorrplot(r, 
           hc.order = TRUE,
           type = "lower",
           lab = TRUE)