library(tidyverse)
library(lubridate)
ufo_sightings <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-06-25/ufo_sightings.csv") %>%
  # Extract year from date_time, and calculate decade from year
  mutate(year = year(parse_date_time(date_time, 'mdy_hm')),
         decade = as.factor(year %/% 10 * 10))
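The `year %/% 10 * 10` step relies on integer division: `%/%` drops the remainder, so multiplying back by 10 rounds each year down to the first year of its decade, and `as.factor()` then turns those decades into discrete bar categories. A small check on a few hand-picked date strings, assumed to follow the same month/day/year hour:minute layout as `date_time`, makes the arithmetic concrete:

```r
library(lubridate)

# Hypothetical sample strings in the month/day/year hour:minute layout
sample_dates <- c("10/10/1949 20:30", "7/4/1997 22:15", "1/1/2014 00:05")

# Parse to date-times and pull out the calendar year
years <- year(parse_date_time(sample_dates, "mdy_hm"))

# Integer division by 10 drops the last digit; multiplying by 10 restores the scale
decades <- years %/% 10 * 10

data.frame(date_time = sample_dates, year = years, decade = decades)
```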
It would be interesting to find out why UFO sightings rose so sharply from the late 1990s onward. Are there any news articles that explain it?
ufo_sightings %>%
  count(decade, sort = TRUE) %>%
  ggplot(aes(decade, n)) +
  geom_col() +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "UFO Sightings by Decade",
       y = "number of sightings",
       x = NULL)
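Decade bars are coarse, so pinning down when the late-1990s rise actually starts needs year-level counts. A minimal sketch, assuming a data frame with a numeric `year` column like the one created by the `mutate()` step above; the helper name `yearly_change()` and the toy counts are hypothetical:

```r
library(dplyr)

# Count sightings per year and the change from the previous year.
# Assumes a numeric `year` column (e.g. ufo_sightings after the mutate() step).
yearly_change <- function(df) {
  df %>%
    filter(!is.na(year)) %>%
    count(year) %>%
    arrange(year) %>%
    mutate(change = n - dplyr::lag(n))
}

# Toy data standing in for the real sightings (hypothetical counts)
toy <- data.frame(year = c(1995, 1995, 1996, 1997, 1997, 1997, 1997))
yearly_change(toy)
```

On the real data, `yearly_change(ufo_sightings) %>% arrange(desc(change))` would list the years with the largest jumps, which could then guide a search for news coverage.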
The number of sightings is higher in populous states (e.g., California, Texas and New York). What does that mean? Are these states more popular among aliens, or do more residents simply mean more people reporting what they see?
# the code from the textbook: 6.2.2 Data by US state
library(choroplethr)
library(openintro) # for abbr2state()
data(continental_us_states)
# Create region and value columns to pass to state_choropleth() to create map
ufo_map <-
  ufo_sightings %>%
  count(state, sort = TRUE) %>%
  mutate(region = tolower(abbr2state(state))) %>%
  rename(value = n) %>%
  filter(!is.na(region))
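`state_choropleth()` expects a data frame with a `region` column of lowercase full state names and a numeric `value` column, which is what the pipeline above builds. If the openintro dependency is unwanted, base R's built-in `state.abb` and `state.name` vectors can do the same lookup. This is a swapped-in technique rather than the textbook's, it only covers the 50 states, and the helper `abbr_to_region()` is hypothetical:

```r
library(dplyr)

# Map two-letter abbreviations (any case) to lowercase full state names
# using base R's state.abb / state.name; unmatched codes become NA.
abbr_to_region <- function(abbr) {
  tolower(state.name[match(toupper(abbr), state.abb)])
}

# Hypothetical counts in the same shape that count(state) produces
toy_counts <- data.frame(state = c("ca", "tx", "ny", "bc"), n = c(30, 20, 10, 5))

toy_map <- toy_counts %>%
  mutate(region = abbr_to_region(state)) %>%  # "bc" is a Canadian province, so NA
  rename(value = n) %>%
  filter(!is.na(region))
toy_map
```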
# Pass to state_choropleth() to create map
state_choropleth(ufo_map,
                 num_colors = 9,
                 zoom = continental_us_states) +
  scale_fill_brewer(palette = "YlOrBr") +
  labs(title = "UFO Sightings in the United States",
       caption = "source: THE NATIONAL UFO REPORTING CENTER",
       fill = "Number of Sightings")
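The map shows raw counts, so populous states will tend to stand out whether or not aliens prefer them. One way to test that is to divide by population and map sightings per 100,000 residents instead. The sketch below assumes base R's `state.x77` population column (1975 estimates, in thousands), which is dated and only illustrates the calculation; a current census table would be needed for a real comparison. The helper `per_capita()` and the toy counts are hypothetical:

```r
library(dplyr)

# Population estimates bundled with base R (state.x77, 1975, in thousands).
# These are dated and only stand in for a current census table.
state_pop <- data.frame(
  region = tolower(rownames(state.x77)),
  population = unname(state.x77[, "Population"]) * 1000
)

# Convert a region/value table of counts into sightings per 100,000 residents
per_capita <- function(map_df) {
  map_df %>%
    inner_join(state_pop, by = "region") %>%
    mutate(value = value / population * 1e5) %>%
    select(region, value)
}

# Hypothetical counts in the same region/value shape as ufo_map
toy_map <- data.frame(region = c("california", "texas", "vermont"),
                      value = c(100, 50, 5))
per_capita(toy_map)
```

Passing `per_capita(ufo_map)` to `state_choropleth()` in place of `ufo_map` would map rates rather than raw counts.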
Hint: You can choose any dataset you like, but you can't use one that another group has already taken.
ufo_sightings <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-06-25/ufo_sightings.csv")
The dataset contains roughly 80,000 UFO sightings from around the world. Its variables are the date and time of the sighting, the city or area, the state or region, the country, the UFO shape, the length of the encounter in seconds, the described length of the encounter, a description of the encounter, the date it was documented, and the latitude and longitude.
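Before plotting, it can help to confirm each variable's type and how many values are missing, since rows with missing values are silently dropped later (the regression output reports 9,672 deleted observations). A minimal sketch with a hypothetical helper `column_overview()`, demonstrated on a toy data frame that borrows a few of the real column names:

```r
# Summarise each column's class and number of missing values
column_overview <- function(df) {
  data.frame(
    column = names(df),
    class = unname(vapply(df, function(col) class(col)[1], character(1))),
    n_missing = unname(vapply(df, function(col) sum(is.na(col)), integer(1))),
    row.names = NULL
  )
}

# Hypothetical values; only the column names come from the real data
toy <- data.frame(
  country = c("us", NA, "gb"),
  encounter_length = c(300, 60, NA),
  latitude = c(NA, 51.5, NA)
)
column_overview(toy)
```

Running `column_overview(ufo_sightings)` would produce the same table for the real columns.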
Hint: Create at least two plots.
library(ggplot2)
# Encounter length (seconds) for each country
ggplot(ufo_sightings,
       aes(x = country,
           y = encounter_length)) +
  geom_point()
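Encounter lengths are extremely right-skewed (the regression output further down shows residuals of up to about 98 million seconds), so on a linear axis nearly every point is squashed against zero. A sketch of the same country plot with a log-scaled y-axis, shown on hypothetical toy values; zero lengths are filtered out because log10 is undefined there, and horizontal jitter is a swapped-in choice to reduce overplotting:

```r
library(ggplot2)
library(dplyr)

# Hypothetical toy data standing in for ufo_sightings
toy <- data.frame(
  country = c("us", "us", "gb", "ca", "ca"),
  encounter_length = c(30, 600, 97000000, 0, 1800)
)

p <- toy %>%
  filter(!is.na(country), encounter_length > 0) %>%  # log10 is undefined at 0
  ggplot(aes(x = country, y = encounter_length)) +
  geom_jitter(width = 0.2, height = 0, alpha = 0.3) + # spread points sideways only
  scale_y_log10(labels = scales::comma) +
  labs(x = "country", y = "encounter length (seconds, log scale)")
p
```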
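library(dplyr)
# Drop rows missing either variable before drawing the line
plotdata <- filter(ufo_sightings, !is.na(encounter_length), !is.na(latitude))
ggplot(plotdata,
       aes(x = encounter_length,
           y = latitude)) +
  geom_line()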
# Same plot with longitude on the y-axis
plotdata <- filter(ufo_sightings, !is.na(encounter_length), !is.na(longitude))
ggplot(plotdata,
       aes(x = encounter_length,
           y = longitude)) +
  geom_line()
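Latitude and longitude are more naturally read together, as a location, than each one against encounter length. A sketch that plots them as a point map (a swapped-in plot type, not the one above), using `coord_quickmap()` to keep a roughly correct aspect ratio; the coordinates below are hypothetical:

```r
library(ggplot2)

# Hypothetical coordinates standing in for ufo_sightings$latitude / $longitude
toy <- data.frame(
  latitude = c(34.05, 40.71, 51.51),
  longitude = c(-118.24, -74.01, -0.13)
)

p <- ggplot(toy, aes(x = longitude, y = latitude)) +
  geom_point(alpha = 0.2, size = 0.5, na.rm = TRUE) +  # faint points reveal density
  coord_quickmap() +
  labs(x = "longitude", y = "latitude")
p
```

With roughly 80,000 real points, the low `alpha` lets dense areas show up as darker patches.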
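# Keep only the numeric columns (encounter_length, latitude and longitude)
df <- dplyr::select_if(ufo_sightings, is.numeric)
# Pairwise correlations, using only rows with no missing values in any column
r <- cor(df, use = "complete.obs")
round(r, 2)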
##                  encounter_length latitude longitude
## encounter_length             1.00     0.00      0.01
## latitude                     0.00     1.00     -0.39
## longitude                    0.01    -0.39      1.00
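The near-zero correlations involving `encounter_length` are Pearson coefficients, which a handful of extreme values can dominate. A rank-based Spearman correlation (`cor(..., method = "spearman")`) is one swapped-in check. The toy vectors below show how a single outlier pulls Pearson's r well below 1 while Spearman's rho, which only sees ranks, still reports a perfectly monotone relationship:

```r
# Perfectly monotone toy data with one extreme value at the end
x <- 1:10
y <- c(1:9, 1e6)

pearson  <- cor(x, y)                       # sensitive to the size of the outlier
spearman <- cor(x, y, method = "spearman")  # uses ranks, so still perfectly monotone

round(c(pearson = pearson, spearman = spearman), 2)
```

On the real data, `cor(df, use = "complete.obs", method = "spearman")` would give the rank-based version of the matrix above.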
# Linear regression of encounter length (seconds) on country
encounter_lm <- lm(encounter_length ~ country,
                   data = ufo_sightings)
summary(encounter_lm)
##
## Call:
## lm(formula = encounter_length ~ country, data = ufo_sightings)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
##   -66061    -5785    -5620    -5200 97769939
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)     3806      27002   0.141   0.8879
## countryca      25053      29323   0.854   0.3929
## countryde      20450      66820   0.306   0.7596
## countrygb      62255      30578   2.036   0.0418 *
## countryus       1994      27113   0.074   0.9414
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 626300 on 70655 degrees of freedom
## (9672 observations deleted due to missingness)
## Multiple R-squared: 0.000291, Adjusted R-squared: 0.0002344
## F-statistic: 5.142 on 4 and 70655 DF, p-value: 0.0003864
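With a single categorical predictor, `lm()` treats the first factor level as the reference (here `au`, the only country without its own row), so the intercept is that country's mean encounter length (about 3,806 seconds) and each other coefficient is a difference from it (e.g. `gb` sightings last about 62,255 seconds longer on average). "Multiple R-squared" is 1 minus the residual sum of squares over the total sum of squares, so 0.000291 means country explains well under 0.1% of the variation. The toy sketch below, with hypothetical groups and values, checks both readings:

```r
# Toy data: three hypothetical groups
toy <- data.frame(
  g = factor(c("a", "a", "b", "b", "c", "c")),
  y = c(10, 14, 20, 22, 5, 9)
)

fit <- lm(y ~ g, data = toy)

# Group means: a = 12, b = 21, c = 7
group_means <- tapply(toy$y, toy$g, mean)

# Intercept is the reference-level mean; other coefficients are differences from it
coef_check <- c(group_means["a"],
                group_means["b"] - group_means["a"],
                group_means["c"] - group_means["a"])

# Multiple R-squared is 1 - residual sum of squares / total sum of squares
rss <- sum(residuals(fit)^2)
tss <- sum((toy$y - mean(toy$y))^2)
r2_manual <- 1 - rss / tss

all.equal(unname(coef(fit)), unname(coef_check))
all.equal(r2_manual, summary(fit)$r.squared)
```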