library(tidyverse)
library(scales)
options(scipen=999)
library(lubridate)
ufo_sightings <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-06-25/ufo_sightings.csv") %>%
# Extract year from date_time, and calculate decade from year
mutate(year = year(parse_date_time(date_time, 'mdy_hm')),
decade = as.factor(year %/% 10 * 10))
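The decade column above relies on integer division: `%/%` drops the last digit of the year, and multiplying by 10 restores the scale. A minimal base R sketch of that arithmetic, using a few illustrative years that are not taken from the data set:

```r
# Illustrative years only (hypothetical, not drawn from ufo_sightings)
years <- c(1947, 1999, 2000, 2014)

# %/% is integer division and binds tighter than *,
# so this reads as (years %/% 10) * 10
decades <- years %/% 10 * 10

decades
```

Wrapping the result in `as.factor()`, as the pipeline above does, makes ggplot2 treat each decade as a discrete category rather than a continuous number.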
It would be interesting to see why there was a huge hike in UFO sightings since the late 1990s. Any news articles?
The CIA claims that UFO sightings could stem from psychological factors among people who spend time outdoors. Cultural influences such as TV shows and movies could also lead people to think they are seeing aliens or UFOs when it is just a plane or a shooting star. Shows like The X-Files or The Twilight Zone can make people interpret what they see differently from what is really happening. The growth of these cultural factors, together with social media in the later decades, could explain the increase in reported sightings.
ufo_sightings %>%
count(decade, sort = TRUE) %>%
ggplot(aes(decade, n)) +
geom_col() +
scale_y_continuous(labels = scales::comma) +
labs(title = "UFO Sightings by Decade",
y = "number of sightings",
x = NULL)
The number of sightings is higher in populous states (e.g., California, Texas and New York). What does this mean? Are these states more popular among aliens?
These states probably report more sightings simply because they have more people. With more residents, there is a greater chance that someone sees and reports any given object in the sky. A witness may also post about it on social media right away, prompting others to go outside and report the same event. By contrast, a UFO that appears over an area with low population density might not be noticed at all.
# the code from the textbook: 6.2.2 Data by US state
library(choroplethr)
library(openintro) # for abbr2state()
data(continental_us_states)
# Create region and value columns to pass to state_choropleth() to create map
ufo_map <-
ufo_sightings %>%
count(state, sort = TRUE) %>%
mutate(region = tolower(abbr2state(state))) %>%
rename(value = n) %>%
filter(!is.na(region))
# Pass to state_choropleth() to create map
state_choropleth(ufo_map,
num_colors=9,
zoom = continental_us_states) +
scale_fill_brewer(palette="YlOrBr") +
labs(title = "UFO Sightings in the United States",
caption = "source: THE NATIONAL UFO REPORTING CENTER",
fill = "Number of Sightings")
The darker states have more sightings than the lighter states. This could be because these states have larger and denser populations. The increased light pollution in these states could also make it easier to see planes and other objects in the sky, leading to more sightings.
Hint: You can choose any data you like but can’t take one that is already taken by other groups.
library(readr)
ufo_sightings <- read.csv("ufo_sightings.csv") %>%
mutate(state = fct_lump(state, 10)) %>%
filter(!is.na(state)) %>%
mutate(ufo_shape = fct_lump(ufo_shape, 10)) %>%
filter(!is.na(ufo_shape))
summary(ufo_sightings)
## date_time city_area state country
## 7/4/2010 22:00 : 36 seattle : 473 Other :36319 au : 10
## 7/4/2012 22:00 : 31 phoenix : 438 ca : 9405 ca : 2942
## 11/16/1999 19:00: 26 las vegas : 357 fl : 4109 de : 0
## 9/19/2009 20:00 : 26 portland : 355 wa : 3989 gb : 11
## 7/4/2011 22:00 : 24 los angeles: 348 tx : 3623 us :63561
## 10/31/2004 20:00: 23 san diego : 328 ny : 3150 NA's: 6218
## (Other) :72576 (Other) :70443 (Other):12147
## ufo_shape encounter_length described_encounter_length
## light :15438 Min. : 0 5 minutes : 4426
## Other :11211 1st Qu.: 30 2 minutes : 3262
## triangle: 7395 Median : 180 10 minutes: 3120
## circle : 6946 Mean : 6404 1 minute : 2764
## fireball: 5825 3rd Qu.: 600 3 minutes : 2358
## unknown : 5239 Max. :82800000 30 seconds: 2122
## (Other) :20688 (Other) :54690
## description
## Fireball : 10
## ((NUFORC Note: No information provided by witness. PD)): 8
## UFO : 7
## ((NUFORC Note: Witness provides no information. PD)) : 6
## Bright Light : 6
## (Other) :72697
## NA's : 8
## date_documented latitude longitude
## 12/12/2009: 1374 Min. :-46.16 Min. :-176.66
## 10/30/2006: 1243 1st Qu.: 34.21 1st Qu.:-114.08
## 11/21/2010: 1167 Median : 39.28 Median : -89.40
## 10/31/2008: 1078 Mean : 38.71 Mean : -95.13
## 8/30/2013 : 1015 3rd Qu.: 42.37 3rd Qu.: -80.25
## 3/19/2009 : 973 Max. : 72.70 Max. : 169.88
## (Other) :65892 NA's :1
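The `Other` rows in the `state` and `ufo_shape` columns of the summary come from `fct_lump()`, which keeps the most frequent levels and collapses the rest into a single level. A small sketch of that behavior, assuming the forcats package is installed and using a made-up vector of state codes:

```r
library(forcats)

# Hypothetical state codes, not taken from ufo_sightings
states <- c("ca", "ca", "ca", "tx", "tx", "ny", "wa")

# Keep the 2 most frequent levels and lump everything else into "Other"
lumped <- fct_lump(states, n = 2)

table(lumped)
```

The report uses `n = 10`, so the ten most common states and shapes keep their own labels and all remaining values are counted under `Other`.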
Hint: Source and description of data, and definition of variables.
The data comes from the TidyTuesday repository at https://github.com/rfordatascience/tidytuesday. It records UFO sightings from around the world and is sorted by the date and time of each sighting. The variables are: date_time, when the sighting happened; city_area, state, and country, where the sighting happened; ufo_shape, the shape of the object the person saw; encounter_length, the duration of the sighting in seconds; described_encounter_length, the duration of the sighting as described by the witness; description, a description of the sighting; date_documented, the day the sighting was documented rather than the date it happened; and latitude and longitude, which give the location of the sighting.
Hint: Create at least two plots.
ggplot(ufo_sightings, aes(x = state, fill= state)) +
geom_bar()
library(dplyr)
# Mean encounter length (in seconds) for each UFO shape
mean_time <- ufo_sightings %>%
group_by(ufo_shape) %>%
summarize(mean_time = mean(encounter_length))
ggplot(mean_time,
aes(x = ufo_shape,
y = mean_time, fill= ufo_shape)) +
geom_bar(stat = "identity") +
labs(title = "Mean Time by UFO Shape",
x = "",
y = "")
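The summary of `encounter_length` shows a median of 180 seconds but a mean of 6404 seconds and a maximum of 82,800,000 seconds, so a handful of extreme reports can dominate a per-shape mean. One possible check, sketched here with made-up durations rather than the real data, is to compute the median alongside the mean:

```r
library(dplyr)

# Hypothetical toy data: durations in seconds, with one extreme value
toy <- data.frame(
  ufo_shape = c("light", "light", "light", "circle", "circle", "circle"),
  encounter_length = c(60, 180, 600, 120, 300, 82800000)
)

# Compare mean and median duration for each shape
shape_summary <- toy %>%
  group_by(ufo_shape) %>%
  summarize(
    mean_time = mean(encounter_length, na.rm = TRUE),
    median_time = median(encounter_length, na.rm = TRUE)
  )

shape_summary
```

In this toy example the single extreme value inflates the mean for one shape while its median stays close to the typical duration. Whether the same holds for the real data would need to be checked.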
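# ufo_sightings is already in memory; keep only its numeric columns
ufonumeric <- dplyr::select_if(ufo_sightings, is.numeric)
# Pairwise correlations, using only rows with no missing values
r <- cor(ufonumeric, use="complete.obs")
round(r,2)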
## encounter_length latitude longitude
## encounter_length 1 0.0 0.0
## latitude 0 1.0 -0.1
## longitude 0 -0.1 1.0
library(ggplot2)
library(ggcorrplot)
# visualize the correlations
ggcorrplot(r,
hc.order = TRUE,
type = "lower",
lab = TRUE)
# Linear model of encounter length (seconds) on state and country
UFO_lm <- lm(encounter_length ~ state + country,
data = ufo_sightings)
summary(UFO_lm)
##
## Call:
## lm(formula = encounter_length ~ state + country, data = ufo_sightings)
##
## Residuals:
## Min 1Q Median 3Q Max
## -29175 -5491 -4901 -2563 82770825
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -254.6 161919.4 -0.002 0.999
## stateca -3679.1 11855.4 -0.310 0.756
## statefl 8348.2 13417.1 0.622 0.534
## stateil -5670.5 14736.1 -0.385 0.700
## statemi -5478.5 16032.4 -0.342 0.733
## stateny -3490.0 14143.2 -0.247 0.805
## stateoh -5081.4 15044.7 -0.338 0.736
## statepa -2614.0 14934.6 -0.175 0.861
## statetx -4058.0 13685.1 -0.297 0.767
## statewa 8722.4 13448.1 0.649 0.517
## stateOther -1041.8 10918.5 -0.095 0.924
## countryca 30471.7 161841.4 0.188 0.851
## countrygb 9640.0 223227.3 0.043 0.966
## countryus 6797.1 161577.9 0.042 0.966
##
## Residual standard error: 510900 on 66510 degrees of freedom
## (6218 observations deleted due to missingness)
## Multiple R-squared: 0.0001439, Adjusted R-squared: -5.149e-05
## F-statistic: 0.7365 on 13 and 66510 DF, p-value: 0.7283