Context

This dataset contains more than 80,000 reports of UFO sightings from the last century.

Content

There are two versions of this dataset: scrubbed and complete. The complete data includes entries where the location of the sighting was not found or is blank (0.8146%) or where the time is erroneous or blank (8.0237%). Since the reports date back to the 20th century, some older data might be obscured. The data includes the city, state, time, description, and duration of each sighting. We have chosen to work with the scrubbed dataset.

While exploring the data, we noticed that most of the reports came from English-speaking countries and that large parts of the world were missing. We therefore looked at the website of the National UFO Reporting Center (NUFORC) and found that it is only in English, which could explain the geographical concentration of the data. Further information on NUFORC and up-to-date datasets are available at http://www.nuforc.org/.

Questions

The data compilers suggested the following questions: What areas of the country are most likely to have UFO sightings? Are there any trends in UFO sightings over time? Do they tend to be clustered or seasonal? Do clusters of UFO sightings correlate with landmarks, such as airports or government research centers? What are the most common UFO descriptions?

We decided to look at the data and formulate our own questions, many of which coincide with those the compilers suggested. I will explore two questions: 1) are sightings clustered in time, and 2) are there more sightings in certain countries, and in certain areas of those countries?

Acknowledgement

This dataset was scraped, geolocated, and time-standardized from NUFORC data by Sigmond Axel (https://github.com/planetsig/ufo-reports). We accessed it from Kaggle: https://www.kaggle.com/NUFORC/ufo-sightings?select=scrubbed.csv.

# load tidyverse to read in data

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2     ✓ purrr   0.3.4
## ✓ tibble  3.0.3     ✓ dplyr   1.0.0
## ✓ tidyr   1.1.0     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0
## ── Conflicts ────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
ufo_sightings <- read_csv("scrubbed.csv")
## Parsed with column specification:
## cols(
##   datetime = col_character(),
##   city = col_character(),
##   state = col_character(),
##   country = col_character(),
##   shape = col_character(),
##   `duration (seconds)` = col_double(),
##   `duration (hours/min)` = col_character(),
##   comments = col_character(),
##   `date posted` = col_character(),
##   latitude = col_double(),
##   longitude = col_double()
## )
## Warning: 4 parsing failures.
##   row                col               expected   actual           file
## 27823 duration (seconds) no trailing characters `        'scrubbed.csv'
## 35693 duration (seconds) no trailing characters `        'scrubbed.csv'
## 43783 latitude           no trailing characters q.200088 'scrubbed.csv'
## 58592 duration (seconds) no trailing characters `        'scrubbed.csv'

Examine the parsing errors

errors <- ufo_sightings[c(27823, 35693, 43783, 58592), ]
errors
## # A tibble: 4 x 11
##   datetime city  state country shape `duration (seco… `duration (hour… comments
##   <chr>    <chr> <chr> <chr>   <chr>            <dbl> <chr>            <chr>   
## 1 2/2/200… bouse az    us      <NA>                NA each a few seco… Driving…
## 2 4/10/20… sant… ca    us      <NA>                NA eight seconds    2 red l…
## 3 5/22/19… mesc… nm    <NA>    rect…              180 two hours        Huge re…
## 4 7/21/20… ibag… <NA>  <NA>    circ…               NA 1/2 segundo      Viajaba…
## # … with 3 more variables: `date posted` <chr>, latitude <dbl>, longitude <dbl>

These four values are now missing (NA) because the raw fields contain stray characters that readr could not parse as numbers: a backtick in three of the duration (seconds) values and a “q” in one latitude value.

We should drop the row with the missing latitude before mapping the data points. If we need to work with duration, we will have to decide whether to estimate a value for “each a few seconds” or simply drop that entry.

We can repair two of the three missing durations because the value is spelled out in the duration (hours/min) column: “eight seconds” and “1/2 segundo” (Spanish for “half a second”).

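# fill in the durations that are spelled out in the duration (hours/min) column
ufo_sightings[35693, "duration (seconds)"] <- 8    # "eight seconds"
ufo_sightings[58592, "duration (seconds)"] <- 0.5  # "1/2 segundo" (half a second)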

ufo_sightings[c(35693, 58592), ]
## # A tibble: 2 x 11
##   datetime city  state country shape `duration (seco… `duration (hour… comments
##   <chr>    <chr> <chr> <chr>   <chr>            <dbl> <chr>            <chr>   
## 1 4/10/20… sant… ca    us      <NA>               8   eight seconds    2 red l…
## 2 7/21/20… ibag… <NA>  <NA>    circ…              0.5 1/2 segundo      Viajaba…
## # … with 3 more variables: `date posted` <chr>, latitude <dbl>, longitude <dbl>
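Hard-coding row and column positions works for these four rows but would not scale if a newer download contained more malformed values. One alternative, sketched here in base R under the assumption that the affected columns are first read as text (for example with col_types = cols(latitude = col_character()) in read_csv()), is to strip the stray characters and convert again. The helper clean_numeric() is hypothetical, and the input strings are made up in the style of the flagged values. Stripping characters is a guess about what the reporter meant, so cleaned values, especially coordinates, should be checked against the listed city before use.

```r
# Hypothetical helper, not part of the original analysis: drop every
# character that cannot appear in a plain decimal number, then convert.
# A field with nothing left (e.g. a lone backtick) becomes NA.
clean_numeric <- function(x) {
  stripped <- gsub("[^0-9.+-]", "", x)
  stripped[stripped == ""] <- NA
  as.numeric(stripped)
}

# Made-up raw strings in the style of the values readr flagged
raw_values <- c("20", "`", "8`", "-97.94", "33q.2")
clean_numeric(raw_values)
```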

Structure of the data

str(ufo_sightings)
## tibble [80,332 × 11] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ datetime            : chr [1:80332] "10/10/1949 20:30" "10/10/1949 21:00" "10/10/1955 17:00" "10/10/1956 21:00" ...
##  $ city                : chr [1:80332] "san marcos" "lackland afb" "chester (uk/england)" "edna" ...
##  $ state               : chr [1:80332] "tx" "tx" NA "tx" ...
##  $ country             : chr [1:80332] "us" NA "gb" "us" ...
##  $ shape               : chr [1:80332] "cylinder" "light" "circle" "circle" ...
##  $ duration (seconds)  : num [1:80332] 2700 7200 20 20 900 300 180 1200 180 120 ...
##  $ duration (hours/min): chr [1:80332] "45 minutes" "1-2 hrs" "20 seconds" "1/2 hour" ...
##  $ comments            : chr [1:80332] "This event took place in early fall around 1949-50. It occurred after a Boy Scout meeting in the Baptist Church"| __truncated__ "1949 Lackland AFB&#44 TX.  Lights racing across the sky &amp; making 90 degree turns on a dime." "Green/Orange circular disc over Chester&#44 England" "My older brother and twin sister were leaving the only Edna theater at about 9 PM&#44...we had our bikes and I "| __truncated__ ...
##  $ date posted         : chr [1:80332] "4/27/2004" "12/16/2005" "1/21/2008" "1/17/2004" ...
##  $ latitude            : num [1:80332] 29.9 29.4 53.2 29 21.4 ...
##  $ longitude           : num [1:80332] -97.94 -98.58 -2.92 -96.65 -157.8 ...
##  - attr(*, "problems")= tibble [4 × 5] (S3: tbl_df/tbl/data.frame)
##   ..$ row     : int [1:4] 27823 35693 43783 58592
##   ..$ col     : chr [1:4] "duration (seconds)" "duration (seconds)" "latitude" "duration (seconds)"
##   ..$ expected: chr [1:4] "no trailing characters" "no trailing characters" "no trailing characters" "no trailing characters"
##   ..$ actual  : chr [1:4] "`" "`" "q.200088" "`"
##   ..$ file    : chr [1:4] "'scrubbed.csv'" "'scrubbed.csv'" "'scrubbed.csv'" "'scrubbed.csv'"
##  - attr(*, "spec")=
##   .. cols(
##   ..   datetime = col_character(),
##   ..   city = col_character(),
##   ..   state = col_character(),
##   ..   country = col_character(),
##   ..   shape = col_character(),
##   ..   `duration (seconds)` = col_double(),
##   ..   `duration (hours/min)` = col_character(),
##   ..   comments = col_character(),
##   ..   `date posted` = col_character(),
##   ..   latitude = col_double(),
##   ..   longitude = col_double()
##   .. )
head(ufo_sightings)
## # A tibble: 6 x 11
##   datetime city  state country shape `duration (seco… `duration (hour… comments
##   <chr>    <chr> <chr> <chr>   <chr>            <dbl> <chr>            <chr>   
## 1 10/10/1… san … tx    us      cyli…             2700 45 minutes       This ev…
## 2 10/10/1… lack… tx    <NA>    light             7200 1-2 hrs          1949 La…
## 3 10/10/1… ches… <NA>  gb      circ…               20 20 seconds       Green/O…
## 4 10/10/1… edna  tx    us      circ…               20 1/2 hour         My olde…
## 5 10/10/1… kane… hi    us      light              900 15 minutes       AS a Ma…
## 6 10/10/1… bris… tn    us      sphe…              300 5 minutes        My fath…
## # … with 3 more variables: `date posted` <chr>, latitude <dbl>, longitude <dbl>
tail(ufo_sightings)
## # A tibble: 6 x 11
##   datetime city  state country shape `duration (seco… `duration (hour… comments
##   <chr>    <chr> <chr> <chr>   <chr>            <dbl> <chr>            <chr>   
## 1 9/9/201… wood… ga    us      sphe…               20 20 seconds       Driving…
## 2 9/9/201… nash… tn    us      light              600 10 minutes       Round f…
## 3 9/9/201… boise id    us      circ…             1200 20 minutes       Boise&#…
## 4 9/9/201… napa  ca    us      other             1200 hour             Napa UF…
## 5 9/9/201… vien… va    us      circ…                5 5 seconds        Saw a f…
## 6 9/9/201… edmo… ok    us      cigar             1020 17 minutes       2 witne…
## # … with 3 more variables: `date posted` <chr>, latitude <dbl>, longitude <dbl>

Description

This dataset includes 80,332 rows and 11 columns. The columns record the date and time of the sighting; its location (city, state, and country); the shape of the object; the duration of the sighting, both as a number of seconds and as free text in hours/minutes; comments describing the sighting; the date the report was posted; and the latitude and longitude of the sighting.

Because the data includes columns on datetime, city, state, and country, we should be able to organize it and determine if there are any patterns around location and timing.

To look for patterns, I will extract the year and the time of year (month) of each sighting. To make this easier, I will first convert the datetime column, which is currently stored as character text, into date-time format using lubridate’s mdy_hm() function.

Code to change datetime format

To more easily work with the dates and times in our data, we will convert the datetime column from “character” format to “datetime” format using the lubridate package.

library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
# convert the datetime column from character text to date-time (POSIXct)
newufo <- mutate(ufo_sightings, datetime = mdy_hm(datetime))
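For readers without lubridate, the conversion performed by mdy_hm() can be approximated in base R with as.POSIXct() and an explicit format string. This is a sketch, not the method used above: the example strings follow the month/day/year hour:minute layout of the datetime column (only the first is taken from the data; the others are made up), and UTC is assumed because the dataset records no time zone (UTC is also mdy_hm()’s default). A strict format string returns NA for any value that does not match, so unparseable timestamps can be found with is.na().

```r
# Base-R sketch of mdy_hm(): month/day/year followed by hour:minute.
# Leading zeros are optional, so "9/9/2013" parses as well as "10/10/1949".
raw_datetimes <- c("10/10/1949 20:30", "9/9/2013 21:15", "2/2/2005 07:05")
parsed <- as.POSIXct(raw_datetimes, format = "%m/%d/%Y %H:%M", tz = "UTC")
parsed

# Any string that did not match the format would show up here
sum(is.na(parsed))
```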

To look for patterns by date and by time of day, we create separate columns for the year, month, hour, and minute of each sighting.

Code to create new “year”, “month”, “hour”, and “minute” columns

newufo <- mutate(newufo,
                 year = year(datetime),
                 month = month(datetime, label = TRUE),
                 hour = hour(datetime),
                 minute = minute(datetime))
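The same four columns can be derived without lubridate by formatting the date-time and converting the pieces. A minimal base-R sketch using two made-up timestamps; month.abb is R’s built-in vector of English month abbreviations, so, unlike month(label = TRUE), the labels here do not depend on the system locale. One small difference: lubridate’s year() returns a double, whereas this sketch stores integers.

```r
# Two example date-times (made up for illustration)
dt <- as.POSIXct(c("1949-10-10 20:30", "2013-09-09 21:15"), tz = "UTC")

# Split each date-time into year, month, hour and minute columns.
# The month is an ordered factor "Jan" < "Feb" < ... < "Dec".
parts <- data.frame(
  datetime = dt,
  year   = as.integer(format(dt, "%Y")),
  month  = factor(month.abb[as.integer(format(dt, "%m"))],
                  levels = month.abb, ordered = TRUE),
  hour   = as.integer(format(dt, "%H")),
  minute = as.integer(format(dt, "%M"))
)
parts
```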

The dataset already includes columns for city, state, and country, so there is no need to change the table to reflect those values.

Time Patterns

The quickest way to see if there is a pattern to the years when sightings took place is to draw a histogram.

Drawing a histogram of sightings per year

ggplot(data = newufo, mapping = aes(x = year)) + geom_bar(fill = "blue")

Here we can see that reports of UFO sightings increase sharply as we move toward the present.

We can verify this by counting the number of occurrences per year:

Code counting sightings per year

newufo %>% count(year)
## # A tibble: 87 x 2
##     year     n
##    <dbl> <int>
##  1  1906     1
##  2  1910     2
##  3  1916     1
##  4  1920     1
##  5  1925     1
##  6  1929     1
##  7  1930     1
##  8  1931     2
##  9  1933     1
## 10  1934     1
## # … with 77 more rows
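Because the table lists 87 separate years, many with only one or two reports, grouping the years into decades can make the growth easier to read. A minimal base-R sketch with a hypothetical helper, decade_counts(), applied to made-up years; on the real data, something like newufo %>% count(decade = year %/% 10 * 10) should give the equivalent tally.

```r
# Hypothetical helper: round each year down to its decade and count.
decade_counts <- function(years) {
  decades <- (years %/% 10) * 10
  table(decade = decades)
}

# Made-up years for illustration only
example_years <- c(1906, 1910, 1931, 1949, 1995, 1999, 2004, 2008, 2012, 2013)
decade_counts(example_years)
```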

To see if there is a pattern to the time of year sightings are reported, we can draw a histogram of months.

Drawing a histogram of sightings per month

ggplot(data = newufo, mapping = aes(x = month)) + geom_bar(fill = "blue")

Sightings are most frequent in the summer months (July is the busiest, with 9,542) and least frequent in winter and spring (February is the quietest, with 4,667).

We can also verify this by counting the number of occurrences per month:

Code to count sightings (n) per month

newufo %>% count(month)
## # A tibble: 12 x 2
##    month     n
##    <ord> <int>
##  1 Jan    5689
##  2 Feb    4667
##  3 Mar    5449
##  4 Apr    5527
##  5 May    5292
##  6 Jun    8130
##  7 Jul    9542
##  8 Aug    8638
##  9 Sep    7588
## 10 Oct    7406
## 11 Nov    6740
## 12 Dec    5664
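One caveat when comparing raw monthly counts is that months differ in length, so February has a few fewer days in which sightings can occur. The sketch below divides the counts from the table above by the number of days in each month, treating February as 28 days (an assumption that ignores leap years). With these figures, July remains the busiest month and February the quietest on a per-day basis, so month length alone does not explain the seasonal pattern.

```r
# Monthly counts copied from the count(month) table above
n <- c(Jan = 5689, Feb = 4667, Mar = 5449, Apr = 5527, May = 5292, Jun = 8130,
       Jul = 9542, Aug = 8638, Sep = 7588, Oct = 7406, Nov = 6740, Dec = 5664)

# Days per month; February taken as 28 days (leap years ignored)
days <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)

# Average sightings per calendar day in each month
per_day <- round(n / days, 1)
per_day
```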

Next we calculate the mean and median number of sightings per month and keep only the months above each.

Code for mean, median and for selecting those months greater than the mean and median

# count sightings per month, then keep the months above the mean and the median
permonth <- count(newufo, month)
avgsightings <- mean(permonth$n)
mediansightings <- median(permonth$n)
highmonths <- filter(permonth, n > avgsightings)
highmonthsbymedian <- filter(permonth, n > mediansightings)
highmonths
## # A tibble: 6 x 2
##   month     n
##   <ord> <int>
## 1 Jun    8130
## 2 Jul    9542
## 3 Aug    8638
## 4 Sep    7588
## 5 Oct    7406
## 6 Nov    6740
highmonthsbymedian
## # A tibble: 6 x 2
##   month     n
##   <ord> <int>
## 1 Jun    8130
## 2 Jul    9542
## 3 Aug    8638
## 4 Sep    7588
## 5 Oct    7406
## 6 Nov    6740
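The two filters agree because no month falls between the median and the mean. Working the numbers from the table: the counts sum to 80,332, so the mean is 80,332 / 12 ≈ 6,694.3, while the median is the average of the sixth and seventh largest counts, (5,689 + 6,740) / 2 = 6,214.5. The base-R sketch below, with a hypothetical helper above_center(), checks this for the full dataset and for the Australian counts reported later in this analysis.

```r
# Hypothetical helper: months whose count exceeds the mean and the median
# (a base-R counterpart of the two filter() calls above)
above_center <- function(counts) {
  list(
    mean = mean(counts),
    median = median(counts),
    above_mean = names(counts)[counts > mean(counts)],
    above_median = names(counts)[counts > median(counts)]
  )
}

# Monthly counts copied from the count(month) tables in this analysis
n <- c(Jan = 5689, Feb = 4667, Mar = 5449, Apr = 5527, May = 5292, Jun = 8130,
       Jul = 9542, Aug = 8638, Sep = 7588, Oct = 7406, Nov = 6740, Dec = 5664)
au <- c(Jan = 60, Feb = 32, Mar = 50, Apr = 53, May = 47, Jun = 66,
        Jul = 49, Aug = 37, Sep = 29, Oct = 28, Nov = 43, Dec = 44)

res <- above_center(n)
res$mean     # 80332 / 12, about 6694.3
res$median   # (5689 + 6740) / 2 = 6214.5
res$above_mean

res_au <- above_center(au)
res_au$above_mean
```

For Australia, the mean is 538 / 12 ≈ 44.8 and the median (44 + 47) / 2 = 45.5, and again both select the same six months.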

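Both the mean (about 6,694 sightings) and the median (6,214.5) pick out the same six months, June through November, which are summer and autumn in the northern hemisphere. Sightings are therefore concentrated in roughly the second half of the year.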

Geographical Patterns

Some entries have no country listed. A quick check of not-available (NA) values shows that 9,670 entries lack a country identifier, which leaves 70,662 entries that do have one. With this many NAs, it is worth looking at what those entries are. Filtering for them and scanning the head and tail of the result shows that many entries without a country do have a state listed. Some of those “states” are Canadian provinces, but the majority are US states that simply lack a country identifier.

Analyzing the NA entries

missingcountry <- newufo %>% filter(is.na(country))
head(missingcountry)
## # A tibble: 6 x 15
##   datetime            city  state country shape `duration (seco…
##   <dttm>              <chr> <chr> <chr>   <chr>            <dbl>
## 1 1949-10-10 21:00:00 lack… tx    <NA>    light             7200
## 2 1973-10-10 23:00:00 berm… <NA>  <NA>    light               20
## 3 1979-10-10 22:00:00 sadd… ab    <NA>    tria…              270
## 4 1982-10-10 07:00:00 gisb… <NA>  <NA>    disk               120
## 5 1986-10-10 20:00:00 holm… ny    <NA>    chev…              180
## 6 1989-10-10 21:00:00 kran… ky    <NA>    tria…              180
## # … with 9 more variables: `duration (hours/min)` <chr>, comments <chr>, `date
## #   posted` <chr>, latitude <dbl>, longitude <dbl>, year <dbl>, month <ord>,
## #   hour <int>, minute <int>
str(missingcountry)
## tibble [9,670 × 15] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ datetime            : POSIXct[1:9670], format: "1949-10-10 21:00:00" "1973-10-10 23:00:00" ...
##  $ city                : chr [1:9670] "lackland afb" "bermuda nas" "saddle lake (canada)" "gisborne (new zealand)" ...
##  $ state               : chr [1:9670] "tx" NA "ab" NA ...
##  $ country             : chr [1:9670] NA NA NA NA ...
##  $ shape               : chr [1:9670] "light" "light" "triangle" "disk" ...
##  $ duration (seconds)  : num [1:9670] 7200 20 270 120 180 180 1200 3600 300 60 ...
##  $ duration (hours/min): chr [1:9670] "1-2 hrs" "20 sec." "4.5 or more min." "2min" ...
##  $ comments            : chr [1:9670] "1949 Lackland AFB&#44 TX.  Lights racing across the sky &amp; making 90 degree turns on a dime." "saw fast moving blip on the radar scope thin went outside and saw it again." "Lights far above&#44  that glance; then flee from the celestrialhavens&#44 only to appear again." "gisborne nz 1982 wainui beach to sponge bay" ...
##  $ date posted         : chr [1:9670] "12/16/2005" "1/11/2002" "1/19/2005" "1/11/2002" ...
##  $ latitude            : num [1:9670] 29.4 32.4 54 -38.7 41.5 ...
##  $ longitude           : num [1:9670] -98.6 -64.7 -111.7 178 -73.6 ...
##  $ year                : num [1:9670] 1949 1973 1979 1982 1986 ...
##  $ month               : Ord.factor w/ 12 levels "Jan"<"Feb"<"Mar"<..: 10 10 10 10 10 10 10 10 10 10 ...
##  $ hour                : int [1:9670] 21 23 22 7 20 21 3 15 20 20 ...
##  $ minute              : int [1:9670] 0 0 0 0 0 0 0 0 0 30 ...
##  - attr(*, "problems")= tibble [4 × 5] (S3: tbl_df/tbl/data.frame)
##   ..$ row     : int [1:4] 27823 35693 43783 58592
##   ..$ col     : chr [1:4] "duration (seconds)" "duration (seconds)" "latitude" "duration (seconds)"
##   ..$ expected: chr [1:4] "no trailing characters" "no trailing characters" "no trailing characters" "no trailing characters"
##   ..$ actual  : chr [1:4] "`" "`" "q.200088" "`"
##   ..$ file    : chr [1:4] "'scrubbed.csv'" "'scrubbed.csv'" "'scrubbed.csv'" "'scrubbed.csv'"
##  - attr(*, "spec")=
##   .. cols(
##   ..   datetime = col_character(),
##   ..   city = col_character(),
##   ..   state = col_character(),
##   ..   country = col_character(),
##   ..   shape = col_character(),
##   ..   `duration (seconds)` = col_double(),
##   ..   `duration (hours/min)` = col_character(),
##   ..   comments = col_character(),
##   ..   `date posted` = col_character(),
##   ..   latitude = col_double(),
##   ..   longitude = col_double()
##   .. )
missingcountry_notUS <- missingcountry %>% filter(is.na(state))
head(missingcountry_notUS)
## # A tibble: 6 x 15
##   datetime            city  state country shape `duration (seco…
##   <dttm>              <chr> <chr> <chr>   <chr>            <dbl>
## 1 1973-10-10 23:00:00 berm… <NA>  <NA>    light               20
## 2 1982-10-10 07:00:00 gisb… <NA>  <NA>    disk               120
## 3 1993-10-10 03:00:00 zlat… <NA>  <NA>    sphe…             1200
## 4 1996-10-10 20:00:00 lake… <NA>  <NA>    light              300
## 5 2003-10-10 23:00:00 bick… <NA>  <NA>    unkn…             2700
## 6 2004-10-10 15:20:00 keda… <NA>  <NA>    oval               240
## # … with 9 more variables: `duration (hours/min)` <chr>, comments <chr>, `date
## #   posted` <chr>, latitude <dbl>, longitude <dbl>, year <dbl>, month <ord>,
## #   hour <int>, minute <int>
str(missingcountry_notUS)
## tibble [3,256 × 15] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ datetime            : POSIXct[1:3256], format: "1973-10-10 23:00:00" "1982-10-10 07:00:00" ...
##  $ city                : chr [1:3256] "bermuda nas" "gisborne (new zealand)" "zlatoust (russia)" "lake macquarie (nsw&#44 australia)" ...
##  $ state               : chr [1:3256] NA NA NA NA ...
##  $ country             : chr [1:3256] NA NA NA NA ...
##  $ shape               : chr [1:3256] "light" "disk" "sphere" "light" ...
##  $ duration (seconds)  : num [1:3256] 20 120 1200 300 2700 240 600 300 1200 3600 ...
##  $ duration (hours/min): chr [1:3256] "20 sec." "2min" "20 minutes" "5 min" ...
##  $ comments            : chr [1:3256] "saw fast moving blip on the radar scope thin went outside and saw it again." "gisborne nz 1982 wainui beach to sponge bay" "I woke up at night and looked out the window near my bed. There was a huge sphere of shining light in front of "| __truncated__ "RED LIGHT WITH OTHER RED FLASHING LIGHT&#44 ONE OBJECT" ...
##  $ date posted         : chr [1:3256] "1/11/2002" "1/11/2002" "12/14/2004" "5/24/1999" ...
##  $ latitude            : num [1:3256] 32.4 -38.7 55.2 -33.1 53.1 ...
##  $ longitude           : num [1:3256] -64.68 178.02 59.65 151.59 -2.74 ...
##  $ year                : num [1:3256] 1973 1982 1993 1996 2003 ...
##  $ month               : Ord.factor w/ 12 levels "Jan"<"Feb"<"Mar"<..: 10 10 10 10 10 10 10 10 10 10 ...
##  $ hour                : int [1:3256] 23 7 3 20 23 15 23 22 10 19 ...
##  $ minute              : int [1:3256] 0 0 0 0 0 20 20 0 0 28 ...
##  - attr(*, "problems")= tibble [4 × 5] (S3: tbl_df/tbl/data.frame)
##   ..$ row     : int [1:4] 27823 35693 43783 58592
##   ..$ col     : chr [1:4] "duration (seconds)" "duration (seconds)" "latitude" "duration (seconds)"
##   ..$ expected: chr [1:4] "no trailing characters" "no trailing characters" "no trailing characters" "no trailing characters"
##   ..$ actual  : chr [1:4] "`" "`" "q.200088" "`"
##   ..$ file    : chr [1:4] "'scrubbed.csv'" "'scrubbed.csv'" "'scrubbed.csv'" "'scrubbed.csv'"
##  - attr(*, "spec")=
##   .. cols(
##   ..   datetime = col_character(),
##   ..   city = col_character(),
##   ..   state = col_character(),
##   ..   country = col_character(),
##   ..   shape = col_character(),
##   ..   `duration (seconds)` = col_double(),
##   ..   `duration (hours/min)` = col_character(),
##   ..   comments = col_character(),
##   ..   `date posted` = col_character(),
##   ..   latitude = col_double(),
##   ..   longitude = col_double()
##   .. )
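The split between entries that lack only a country and entries that lack both a country and a state can be summarized in one cross-tabulation. The sketch uses a small made-up data frame in place of newufo; on the real data, table(is.na(newufo$country), is.na(newufo$state)) should show the 9,670 rows without a country divided into 3,256 that also lack a state and 6,414 that have one.

```r
# Toy data standing in for newufo (made up for illustration)
toy <- data.frame(
  country = c("us", NA, NA, "gb", NA, "ca"),
  state   = c("tx", "tx", NA, NA, "ab", "on")
)

# Rows: is the country missing?  Columns: is the state missing?
missing_tab <- table(country_missing = is.na(toy$country),
                     state_missing   = is.na(toy$state))
missing_tab
```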

Plotting the data missing both a country and state identifier on a map using latitude and longitude shows where most of these missing data are actually from.

Map of Sightings with Neither State nor Country Identified

From this map it is clear that this group contains most of the non-English-speaking countries we found missing earlier. It also shows that much of the data missing both a state and a country still comes from the United States.
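The impression that much of this group lies in the United States could be quantified by counting the points that fall inside a rough latitude/longitude box. The sketch below uses a hypothetical helper, in_contiguous_us_box(), with an approximate box for the contiguous United States (24°N to 50°N, 125°W to 66°W). The box is an assumption: it also captures parts of southern Canada and northern Mexico, and it misses Alaska and Hawaii. The test coordinates are the rounded values printed above (Lackland AFB from missingcountry, the rest from missingcountry_notUS).

```r
# Hypothetical rough bounding box for the contiguous United States.
# Rows with a missing coordinate are counted as outside the box.
in_contiguous_us_box <- function(lat, lon) {
  !is.na(lat) & !is.na(lon) &
    lat >= 24 & lat <= 50 & lon >= -125 & lon <= -66
}

# Lackland AFB, Bermuda NAS, Gisborne, Zlatoust, Lake Macquarie, and a UK point
lat <- c(29.4, 32.4, -38.7, 55.2, -33.1, 53.1)
lon <- c(-98.58, -64.68, 178.02, 59.65, 151.59, -2.74)
in_contiguous_us_box(lat, lon)
```

Applied to the real data, mean(in_contiguous_us_box(missingcountry_notUS$latitude, missingcountry_notUS$longitude)) would give the approximate share of that group falling inside the box.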

Plotting these worldwide points by year reveals a pattern very similar to that of the full dataset: reports increase dramatically in the 2000s.

ggplot(missingcountry_notUS, mapping = aes(x = year)) + geom_bar(fill = "blue")

Among sightings with a valid country field, only five countries appear: Australia (au), Canada (ca), Germany (de), Great Britain (gb), and the United States (us). The US is by far the most common source of reports.

Code for counting number of sightings (n) per country

countrysightings <- newufo %>% filter(!is.na(country))
countrysightings %>% count(country)
## # A tibble: 5 x 2
##   country     n
##   <chr>   <int>
## 1 au        538
## 2 ca       3000
## 3 de        105
## 4 gb       1905
## 5 us      65114
ggplot(data = countrysightings, mapping = aes(x = country)) + geom_bar(mapping = aes(fill = country))

ggplot(data = countrysightings, mapping = aes(x = year, y = country)) + geom_point(mapping = aes(color = month))
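Converting the counts from count(country) into percentages makes the imbalance concrete. The US accounts for roughly 92% of the sightings with a country, and Germany’s 105 reports for about 0.15%. The counts below are copied from that table.

```r
# Sightings per country, copied from the count(country) output above
by_country <- c(au = 538, ca = 3000, de = 105, gb = 1905, us = 65114)

total <- sum(by_country)                     # rows with a country
share <- round(100 * by_country / total, 1)  # percentage of those rows
total
share
```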

Why Germany?

Interestingly, all but 105 of the sightings with an identified country were reported in English-speaking countries. It is worth exploring the reports from Germany to see whether anything explains why it is the only non-English-speaking country in these data.

Code pulling out the Germany sightings from the data

germanysightings <- countrysightings %>% filter(country == "de")
head(germanysightings)
## # A tibble: 6 x 15
##   datetime            city  state country shape `duration (seco…
##   <dttm>              <chr> <chr> <chr>   <chr>            <dbl>
## 1 2006-10-13 00:02:00 berl… <NA>  de      fire…              120
## 2 2012-10-20 18:00:00 berl… <NA>  de      unkn…             1500
## 3 2012-10-08 17:10:00 ober… <NA>  de      tria…                2
## 4 2011-01-10 18:38:00 otte… <NA>  de      tria…              240
## 5 1990-11-15 22:30:00 brem… <NA>  de      unkn…               30
## 6 2005-11-15 15:00:00 semb… <NA>  de      egg                120
## # … with 9 more variables: `duration (hours/min)` <chr>, comments <chr>, `date
## #   posted` <chr>, latitude <dbl>, longitude <dbl>, year <dbl>, month <ord>,
## #   hour <int>, minute <int>
tail(germanysightings)
## # A tibble: 6 x 15
##   datetime            city  state country shape `duration (seco…
##   <dttm>              <chr> <chr> <chr>   <chr>            <dbl>
## 1 2009-09-12 19:00:00 graf… <NA>  de      diam…               60
## 2 2011-09-13 12:00:00 heil… <NA>  de      sphe…                5
## 3 2007-09-16 08:15:00 gels… <NA>  de      light               20
## 4 2007-09-16 18:15:00 neck… <NA>  de      other               30
## 5 2011-09-04 05:00:00 mann… <NA>  de      light             1800
## 6 2009-09-09 21:38:00 kais… <NA>  de      light               40
## # … with 9 more variables: `duration (hours/min)` <chr>, comments <chr>, `date
## #   posted` <chr>, latitude <dbl>, longitude <dbl>, year <dbl>, month <ord>,
## #   hour <int>, minute <int>

There is no immediate pattern apparent in the Germany sightings. Some of the sightings are at locations of US military bases, so that may explain some of the reports. At least one of the comments, however, is written in French, suggesting it was not written by an American servicemember.

Map of Sightings in Germany

By examining the year column, we can see if there is a pattern to the time period these were reported.

Code counting the sightings (n) per year in Germany

germanysightings %>% count(year)
## # A tibble: 35 x 2
##     year     n
##    <dbl> <int>
##  1  1962     1
##  2  1968     2
##  3  1969     2
##  4  1970     1
##  5  1971     1
##  6  1973     1
##  7  1974     1
##  8  1975     1
##  9  1979     1
## 10  1981     1
## # … with 25 more rows
ggplot(data = germanysightings, mapping = aes(x = year)) + geom_bar(fill = "blue")

This bar chart and table show that the sightings were spread fairly evenly (one or two every couple of years) until the 2000s, when the numbers rose, reaching an unusual high of 15 in 2008. This rise matches the overall increase in reported UFO sightings worldwide in the late 1990s and 2000s.

Difference between Northern and Southern Hemispheres?

The overall data showed that most sightings took place between June and November, which is summer and autumn in the northern hemisphere. Do sightings reported from Australia, in the southern hemisphere, follow the same seasonal pattern?

Code pulling out sightings per month in Australia

australiasightings <- countrysightings %>% filter(country == "au")
australiasightings %>% count(month)
## # A tibble: 12 x 2
##    month     n
##    <ord> <int>
##  1 Jan      60
##  2 Feb      32
##  3 Mar      50
##  4 Apr      53
##  5 May      47
##  6 Jun      66
##  7 Jul      49
##  8 Aug      37
##  9 Sep      29
## 10 Oct      28
## 11 Nov      43
## 12 Dec      44
ggplot(data = australiasightings, mapping = aes(x = month)) + geom_bar(fill = "blue")

Calculating the mean and median number of sightings per month shows which months are above each.

Code calculating the mean, median and months above the mean and median in Australia.

# same steps as for the full dataset, restricted to Australia
permonthAU <- count(australiasightings, month)
avgsightingsAU <- mean(permonthAU$n)
mediansightingsAU <- median(permonthAU$n)
highmonthsAU <- filter(permonthAU, n > avgsightingsAU)
highmonthsbymedianAU <- filter(permonthAU, n > mediansightingsAU)
highmonthsAU
## # A tibble: 6 x 2
##   month     n
##   <ord> <int>
## 1 Jan      60
## 2 Mar      50
## 3 Apr      53
## 4 May      47
## 5 Jun      66
## 6 Jul      49
highmonthsbymedianAU
## # A tibble: 6 x 2
##   month     n
##   <ord> <int>
## 1 Jan      60
## 2 Mar      50
## 3 Apr      53
## 4 May      47
## 5 Jun      66
## 6 Jul      49

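The monthly pattern in the southern hemisphere appears to be roughly the opposite of that in the northern hemisphere. The above-average months in Australia are January and March through July, rather than June through November. January is a southern-hemisphere summer month and March through May are autumn months, although June (the busiest month, with 66 sightings) and July fall in the southern winter. Thus, it seems that UFO sightings tend to be reported more often in summer and autumn in both hemispheres, with the caveat that the Australian sample is small (538 reports) and does not fit the pattern perfectly.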

Possible explanations for this are that there are more people outside in the summer to see and report UFOs or, of course, that extra-terrestrial beings prefer to visit Earth (or at least the English-speaking parts of Earth and Germany) during the summer season.
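The seasonal comparison depends on assigning months to seasons differently in each hemisphere. A minimal base-R sketch, assuming meteorological seasons (December to February as northern winter) and treating any negative latitude as southern; season_for() is a hypothetical helper. The monthly counts are copied from the two count(month) tables above, and the full dataset is treated as northern-hemisphere because the overwhelming majority of its reports come from the United States.

```r
# Meteorological seasons by month number in the northern hemisphere:
# Dec-Feb winter, Mar-May spring, Jun-Aug summer, Sep-Nov autumn
north_season <- c("winter", "winter", "spring", "spring", "spring", "summer",
                  "summer", "summer", "autumn", "autumn", "autumn", "winter")

# Hypothetical helper: season for month m (1-12) at latitude lat.
# South of the equator the calendar is shifted by six months.
season_for <- function(m, lat) {
  south <- rep_len(lat < 0, length(m))
  shifted <- ifelse(south, (m + 5) %% 12 + 1, m)
  north_season[shifted]
}

# Monthly counts (Jan-Dec) copied from the count(month) tables
all_counts <- c(5689, 4667, 5449, 5527, 5292, 8130, 9542, 8638, 7588, 7406, 6740, 5664)
au_counts  <- c(60, 32, 50, 53, 47, 66, 49, 37, 29, 28, 43, 44)

# Total sightings per season; lat = 40 and lat = -30 only select the hemisphere
all_by_season <- tapply(all_counts, season_for(1:12, lat = 40), sum)
au_by_season  <- tapply(au_counts, season_for(1:12, lat = -30), sum)
all_by_season
au_by_season
```

On these counts, summer clearly leads in the full dataset (26,310 of 80,332 sightings), while in Australia autumn (150) and winter (152) are nearly level and both ahead of summer (136). The seasonal mirror image is therefore clearer in the northern data than in the small Australian sample.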