UFO sightings have been a part of American pop culture for decades; however, the spatial distribution of UFO sighting reports has often been overlooked. This project investigates whether there is a meaningful relationship between population density and the frequency of UFO sighting reports at the county level across the United States. Using UFO sighting data from the National UFO Reporting Center (NUFORC) and population data from the American Community Survey, this project applied several spatial analysis methods to investigate this relationship: Global Moran’s I and Local Indicators of Spatial Association (LISA) to assess the spatial autocorrelation of the data, Geographically Weighted Regression to examine how the relationship between population density and sighting counts varies across the study area, and Kernel Density Estimation to generate a heat map of sighting report intensity across the United States. The results confirmed significant clustering of UFO sighting reports, with the highest raw counts of UFO sightings occurring in densely populated counties such as Los Angeles County, California. The Geographically Weighted Regression ran against this trend and indicated geographically inconsistent patterns in the relationship between population density and UFO sightings. Finally, culturally prominent UFO areas, such as Area 51 and Roswell, New Mexico, had higher sighting rates per 100,000 residents than the national average despite their low population densities, indicating that additional cultural or military factors could be affecting UFO sighting reports.
Where do UFO sightings actually occur? UFO (Unidentified Flying Object) sightings are often associated with the late night hours, perhaps on a long drive home on a highway out in the middle of nowhere. Pop culture media such as X-Files and Fire in the Sky have developed this trope into a prominent storytelling device. But how accurate is this perception? UFOs, which the military has started calling UAP (Unidentified Aerial Phenomenon) to differentiate the phenomenon from the UFO label, caught the public’s attention in 1940 (Mark, 2019), but there has been little research into the geographical or spatial trends of UFO sightings. Uncovering geographical trends in UFO sightings can provide insights into fields such as psychology, sociology, and geography, among others.
The aim of this study is to fill the gap in the research on geographical or spatial trends of UFO sightings. Pop culture would tell us that certain areas of the United States are more prone to UFO sightings than others. Places such as Roswell and the area outside of Area 51 have built a tourism industry that caters to visitors drawn in by the UFO lore and the chance to have a sighting of their own. Does the data support this reputation? This study will investigate whether any relationships exist between UFO sightings and other variables such as population density, urban and rural areas, and east and west coast areas of the country. In short, this study will investigate whether there is a statistically significant relationship between UFO sightings and population density across the United States. Building upon this main research question, we will examine whether any counties or states deviate from the expected population density relationship and what commonalities exist between these areas, geographically and culturally.
The data for this project was obtained from two main sources. The UFO data is National UFO Reporting Center data obtained from kaggle.com (Nugent, 2019). This dataset contains around 76,000 rows, each associated with a sighting. The dataset has 11 columns that cover geographic location (city, state, and latitude and longitude coordinates), duration of the encounter, date and time of the encounter, and comments about the encounter. To ensure usability of the data, I cleaned and prepped the data. This included checking that the latitude and longitude coordinates were usable by filtering out coordinates that are impossible or fall outside of North America. There are some limitations with the data. The dataset contains sighting information from a small batch of countries and therefore doesn’t represent worldwide sightings accurately. The dataset also carries an implicit bias: it only contains information on reported sightings, so it isn’t a representative or random sample of UFO sightings. Lastly, the age of the dataset is another limitation, as it was last updated in 2023.
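The coordinate filter described above can be sketched in base R. This is a minimal illustration on toy rows, not the project's actual cleaning code: the column names `latitude` and `longitude` and the bounding box used for North America are assumptions chosen for the example.

```r
# Toy sightings table; column names are hypothetical stand-ins for the dataset's fields.
sightings <- data.frame(
  city      = c("seattle", "phoenix", "nowhere", "london", "bad row"),
  latitude  = c(47.61, 33.45, 0, 51.51, 123.4),
  longitude = c(-122.33, -112.07, 0, -0.13, -75.0),
  stringsAsFactors = FALSE
)

# Rough North America bounding box (an assumed cutoff, not the author's exact one).
lat_range <- c(5, 85)
lon_range <- c(-170, -50)

# Keep rows whose coordinates are present and fall inside the box.
# A latitude above 90 is impossible and is rejected by the same test.
is_valid <- !is.na(sightings$latitude) & !is.na(sightings$longitude) &
  sightings$latitude  >= lat_range[1] & sightings$latitude  <= lat_range[2] &
  sightings$longitude >= lon_range[1] & sightings$longitude <= lon_range[2]

clean <- sightings[is_valid, ]
print(clean$city)
```

A bounding box is only a coarse first pass; the later join to county polygons is what finally restricts the data to U.S. counties.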
The second set of data I will be using is U.S. Census data obtained via the tidycensus R package. This data is from the American Community Survey and will be used to generate population densities. A limitation of the census data is that it is aggregated and could therefore suffer from the ecological fallacy: a state or county can show a higher population density than many of its parts would at a more granular level. I went with the county level because it is granular enough to provide a large number of study areas and meaningful insights. The state level would have been too broad and more strongly affected by the ecological fallacy. States like New Jersey, while considered the most densely populated state in the U.S., contain rural or sparsely populated areas that would be overlooked at the state level.
The methodology for this analysis will begin by loading the proper packages into RStudio. The packages to install will be tidyr, ggplot2, dplyr, readr, tidycensus, sf, spdep, GWmodel, skimr, MASS, and tmap. After installing the packages and loading the data into RStudio using readr, the next step will be to wrangle the data so that it meets the needs of the analysis. This includes filtering the UFO sighting data down to U.S. sightings only. To confirm that the data was filtered correctly, a plot of the UFO sightings was created (Figure 1). Next, we will create a spatial data frame from the filtered data. We will also obtain the U.S. Census data via tidycensus at this step to get population and county boundary data (Walker and Herman, 2026). The final portion of the data wrangling will be to join the U.S. Census data to the sighting data by geographic location.
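The join step can be illustrated with a small point-in-polygon example using sf. This is a sketch under stated assumptions: the two square "counties", their populations, and the column names are hypothetical toy values, and no coordinate reference system is set, whereas real census polygons from tidycensus would carry one.

```r
library(sf)

# Helper that builds a square polygon; stands in for a county boundary.
square <- function(xmin, ymin, size = 1) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmin + size, ymin), c(xmin + size, ymin + size),
    c(xmin, ymin + size), c(xmin, ymin)
  )))
}

# Two hypothetical counties with made-up populations.
counties <- st_sf(
  NAME = c("County A", "County B"),
  population = c(50000, 20000),
  geometry = st_sfc(square(0, 0), square(1, 0))
)

# Toy sighting coordinates turned into a spatial data frame of points.
sightings <- data.frame(
  longitude = c(0.2, 0.7, 0.5, 1.5),
  latitude  = c(0.3, 0.6, 0.5, 0.5)
)
points <- st_as_sf(sightings, coords = c("longitude", "latitude"))

# Attach the containing county to each point, then count points per county.
joined <- st_join(points, counties, join = st_within)
counts <- table(factor(joined$NAME, levels = counties$NAME))
counties$sighting_count <- as.integer(counts)

print(st_drop_geometry(counties))
```

Counting through a factor with fixed levels keeps counties with zero sightings in the result, which matters for the later autocorrelation tests.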
The next portion of the methodology will consist of generating descriptive and exploratory statistics. Here we will get values for the number of sightings per state, average number of sightings for the states, and average duration of sightings among other values. Packages such as skimr will be used to generate the descriptive and exploratory statistics for the data frame (Waring et al., 2026).
After generating the descriptive and exploratory statistics, we will begin the spatial portion of the methodology. At this point we will use a Global Moran’s I from the spdep package to measure the spatial autocorrelation of sighting counts (Bivand and Wong, 2018), a Local Moran’s I (LISA) to identify hot and cold spots of clustering, and geographically weighted regression to identify how the relationship between UFO sighting counts and population density changes across geographic areas. The geographically weighted regression will be conducted with functions from the GWmodel package (Liu, 2024). Another method explored is Kernel Density Estimation (KDE) to generate a heat map. The kernel density estimate was created with functions from the MASS package (Rdocumentation, n.d.). Additionally, map visualizations will be created at this step. Examples include LISA maps illustrating cluster areas and composite maps showing heat maps of sightings.
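To make the Global Moran’s I concrete, the statistic can be computed by hand in base R on a toy example. This is a sketch of the formula only, not the spdep workflow used in the project: the five regions arranged in a line, their values, and the helper name `moran_i` are all hypothetical.

```r
# Global Moran's I: I = (n / S0) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i^2),
# where z are deviations from the mean and S0 is the sum of all weights.
moran_i <- function(x, w) {
  z  <- x - mean(x)
  n  <- length(x)
  s0 <- sum(w)
  (n / s0) * sum(w * outer(z, z)) / sum(z^2)
}

# Five hypothetical regions in a row; each borders the region before and after it.
n   <- 5
adj <- matrix(0, n, n)
for (i in seq_len(n - 1)) {
  adj[i, i + 1] <- 1
  adj[i + 1, i] <- 1
}

# Row-standardise so each region's neighbour weights sum to 1.
w <- adj / rowSums(adj)

# Values rise steadily from one end to the other, so neighbours are similar.
x <- c(1, 2, 3, 4, 5)

observed <- moran_i(x, w)
expected <- -1 / (n - 1)  # expectation under no spatial autocorrelation

print(c(observed = observed, expected = expected))
```

An observed value well above the expectation, as in this toy case, is what positive spatial autocorrelation looks like: similar values sit next to each other.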
To address the secondary research question, residuals from the geographically weighted regression will be examined to identify areas where the model’s predictions diverged significantly from observed sighting counts. These areas will be categorized as ‘counter-trend’ and examined further for commonalities. Additionally, Lincoln County, NV (home to Area 51) and Roswell, NM will be examined.
The exploratory analysis revealed that sighting_count, sightings_per_100k, and total_sightings were right-skewed with long tails. These histograms can be viewed in Figures 2, 3, and 4. A look at the values revealed that a few counties and states were outliers. California had the highest number of total sightings with 8912, as shown in Table 1. The next state, Washington, had 3966, less than half of California’s total. Los Angeles County had the highest number of sightings on a county basis with 1858, as displayed in Table 3. The next highest county, Maricopa County, AZ, had 1324 sightings. The top 10 and lowest 10 states by UFO sightings are listed in Table 2. The top 10 counties by UFO sightings are listed in Table 3. Many of these states and counties had higher population densities (California, Florida, Illinois, and New York), which hinted at a positive relationship between population density and UFO sightings. When the data was normalized to sightings per 100,000 residents and counties with fewer than 10,000 residents were excluded, several counties in less densely populated western states rose to the top. These results are shown in Table 4. This was somewhat inconsistent with the total sighting counts, which suggested a positive relationship between UFO sightings and population density.
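The normalization described above can be reproduced on a few rows in base R. This is a minimal sketch: the counts and populations are copied from the tables in this report, while the data frame layout and variable names are assumptions.

```r
# A few counties with counts and populations taken from the report's tables.
counties <- data.frame(
  NAME = c("Los Angeles County, California", "King County, Washington",
           "Inyo County, California", "Lincoln County, Nevada"),
  sighting_count = c(1858, 1315, 32, 17),
  population     = c(10040682, 2225064, 17930, 5177),
  stringsAsFactors = FALSE
)

# Sightings per 100,000 residents.
counties$sightings_per_100k <-
  counties$sighting_count / counties$population * 1e5

# Exclude counties with fewer than 10,000 residents, then rank by rate.
eligible <- counties[counties$population >= 10000, ]
ranked   <- eligible[order(-eligible$sightings_per_100k), ]

print(ranked[, c("NAME", "sightings_per_100k")])
```

The population threshold matters because a handful of reports in a very small county produces an extreme rate, as the Lincoln County, Nevada row shows before it is filtered out.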
| County | Sighting Count | Population | Sightings per 100k |
|---|---|---|---|
| Los Angeles County, California | 1858 | 10040682 | 18.5 |
| Maricopa County, Arizona | 1324 | 4412779 | 30.0 |
| King County, Washington | 1315 | 2225064 | 59.1 |
| Cook County, Illinois | 860 | 5169517 | 16.6 |
| San Diego County, California | 846 | 3323970 | 25.5 |
| Orange County, California | 686 | 3170345 | 21.6 |
| New York County, New York | 529 | 1629153 | 32.5 |
| Clark County, Nevada | 496 | 2228866 | 22.3 |
| Riverside County, California | 485 | 2437864 | 19.9 |
| Snohomish County, Washington | 436 | 811572 | 53.7 |
| State | Total Sightings |
|---|---|
| CA | 8912 |
| WA | 3966 |
| FL | 3835 |
| TX | 3447 |
| NY | 2980 |
| IL | 2499 |
| AZ | 2414 |
| PA | 2366 |
| OH | 2275 |
| MI | 1836 |
| State | Total Sightings |
|---|---|
| ND | 129 |
| DE | 166 |
| WY | 175 |
| SD | 183 |
| RI | 228 |
| VT | 260 |
| HI | 262 |
| AK | 319 |
| MS | 375 |
| NE | 381 |
| County | Sighting Count | Population | Pop. Density (per sq mi) |
|---|---|---|---|
| Los Angeles County, California | 1858 | 10040682 | 2453.1 |
| Maricopa County, Arizona | 1324 | 4412779 | 478.1 |
| King County, Washington | 1315 | 2225064 | 1018.1 |
| Cook County, Illinois | 860 | 5169517 | 5406.8 |
| San Diego County, California | 846 | 3323970 | 780.3 |
| Orange County, California | 686 | 3170345 | 3969.8 |
| New York County, New York | 529 | 1629153 | 51437.7 |
| Clark County, Nevada | 496 | 2228866 | 276.6 |
| Riverside County, California | 485 | 2437864 | 333.7 |
| Snohomish County, Washington | 436 | 811572 | 385.7 |
| County | Sighting Count | Population | Sightings per 100k |
|---|---|---|---|
| Inyo County, California | 32 | 17930 | 178.5 |
| Lincoln County, Washington | 19 | 10732 | 177.0 |
| Dare County, North Carolina | 61 | 36698 | 166.2 |
| Klickitat County, Washington | 36 | 22055 | 163.2 |
| La Paz County, Arizona | 33 | 21035 | 156.9 |
| Mono County, California | 22 | 14395 | 152.8 |
| Lincoln County, Oregon | 66 | 49336 | 133.8 |
| Plumas County, California | 24 | 18844 | 127.4 |
| Emery County, Utah | 12 | 10099 | 118.8 |
| Williamsburg city, Virginia | 17 | 15034 | 113.1 |
The spatial analysis provided insight into how spatially autocorrelated the data was. The Global Moran’s I produced a value of 0.3569 with a p-value below 2.2e-16 (Figure 5), which indicated that the UFO sighting data is positively spatially autocorrelated and not randomly distributed across the United States. Areas of similar values tended to cluster together. In other words, counties with higher UFO sighting counts tended to be near other counties with higher counts, and counties with low counts tended to be near other counties with low counts. This led to the LISA analysis to identify the regions of high-high clustering. The LISA analysis displayed areas of high-high clustering in Southern California, the Pacific Northwest, Florida, Chicago, Detroit, and the Northeast. The results of the LISA analysis are displayed in Figure 6.
| Quantity | Value |
|---|---|
| Test | Moran I test under randomisation |
| Data | county_sf$sighting_count |
| Weights | lw (n reduced by no-neighbour observations) |
| Moran I statistic | 0.3569005498 |
| Expectation | -0.0003115265 |
| Variance | 0.0001034340 |
| Moran I statistic standard deviate | 35.123 |
| p-value | < 2.2e-16 |
| Alternative hypothesis | greater |
Table 4: Global Moran's I results for UFO sighting counts at the county level.
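The reported standard deviate can be checked directly from the three sample estimates. The short base R sketch below uses only the numbers printed in the test output; the variable names are assumptions, and the normal approximation is the usual one for this kind of test rather than a claim about how the output was produced internally.

```r
# Sample estimates copied from the Moran's I test output.
moran_i     <- 0.3569005498
expectation <- -0.0003115265
variance    <- 0.0001034340

# Standard deviate: how many standard deviations the observed I sits
# above its expectation under the null hypothesis.
z <- (moran_i - expectation) / sqrt(variance)

# One-sided p-value for the alternative "greater", via the normal approximation.
p_value <- pnorm(z, lower.tail = FALSE)

# The expectation equals -1 / (n - 1), so it implies the number of
# counties that remained after no-neighbour observations were dropped.
n_implied <- round(1 - 1 / expectation)

print(c(z = z, n_implied = n_implied))
print(p_value < 2.2e-16)
```

The recomputed deviate agrees with the 35.123 shown in the output, and the implied number of observations gives a quick sanity check on how many counties entered the test.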
The results of the Geographically Weighted Regression (GWR) showed an inconsistent relationship between UFO sightings and population density, with no clear geographic pattern. Two standouts from the GWR were Los Angeles County and Orange County, both in California. The former had more UFO sightings than would be expected given its population density, and the latter had fewer UFO sightings than would be expected given its population density.
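For readers unfamiliar with GWR, the core idea is a separate weighted regression at each location, with nearby observations weighted more heavily. The base R sketch below illustrates this on a one-dimensional toy study area. It is not the GWmodel workflow used in the project: the data, the Gaussian kernel, and the fixed bandwidth of 0.1 are all assumptions chosen for illustration.

```r
# Toy 1-D "study area": 40 locations running from west (0) to east (1).
n         <- 40
location  <- seq(0, 1, length.out = n)
density   <- rep(c(1, 2, 3, 4), length.out = n)  # hypothetical predictor
sightings <- (1 + 2 * location) * density         # true slope drifts from 1 to 3

bandwidth <- 0.1  # assumed Gaussian kernel bandwidth

# Fit one weighted least-squares regression per location.
local_fit <- lapply(seq_len(n), function(i) {
  d <- abs(location - location[i])         # distance to the focal location
  w <- exp(-0.5 * (d / bandwidth)^2)       # Gaussian distance-decay weights
  lm(sightings ~ density, weights = w)
})

# Local slope at each location, and the residual at the focal observation.
local_slope <- vapply(local_fit, function(f) unname(coef(f)["density"]), numeric(1))
local_resid <- vapply(seq_len(n), function(i) {
  unname(sightings[i] - fitted(local_fit[[i]])[i])
}, numeric(1))

print(round(range(local_slope), 2))
```

Mapping the local slopes shows where the relationship strengthens or weakens, and large local residuals flag the kind of ‘counter-trend’ locations discussed in the methodology.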
The Kernel Density Estimation showed similar results to the LISA analysis. There were areas of high UFO sighting density along the west coast, particularly in Southern California, as well as in the Northeast, the Chicagoland area, and the Florida Panhandle. These areas are among the most densely populated in the United States, which is consistent with a positive relationship between the total number of UFO sightings and population density.
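A kernel density surface of this kind can be sketched with `kde2d` from the MASS package. The points below are simulated, not the NUFORC data: a dense cluster and a sparse scatter at hypothetical coordinates, with the grid size and default bandwidth chosen only for illustration.

```r
library(MASS)

set.seed(42)

# Simulated sighting coordinates: a dense cluster plus a sparse scatter.
lon <- c(rnorm(200, mean = -118, sd = 0.5), rnorm(40, mean = -100, sd = 2))
lat <- c(rnorm(200, mean = 34,   sd = 0.5), rnorm(40, mean = 40,   sd = 2))

# Two-dimensional kernel density estimate on a 50 x 50 grid.
dens <- kde2d(lon, lat, n = 50)

# Locate the grid cell with the highest estimated density.
# Rows of dens$z index dens$x (longitude); columns index dens$y (latitude).
peak     <- which(dens$z == max(dens$z), arr.ind = TRUE)
peak_lon <- dens$x[peak[1, "row"]]
peak_lat <- dens$y[peak[1, "col"]]

print(c(peak_lon = peak_lon, peak_lat = peak_lat))
```

The list returned by `kde2d` can be passed to `image()` or `contour()` to draw the heat map. Because the surface reflects raw point counts, it will tend to track where people live unless it is normalized by population.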
Among the “Famous Counties” examined, Lincoln County, Nevada and Chaves County, New Mexico, home to Area 51 and Roswell respectively, both had more sightings per 100,000 residents than the national average. This elevated rate of sightings despite their lower population densities may suggest that other factors, such as cultural mythos, could be influencing the number of reported sightings. Area 51 is also an active military base, and military aircraft could be mistaken for and reported as UFOs. The results are shown in Table 5.
| County | Sighting Count | Population | Sightings per 100k |
|---|---|---|---|
| Lincoln County, Nevada | 17 | 5177 | 328.4 |
| Chaves County, New Mexico | 30 | 64912 | 46.2 |
Overall, I feel the term project turned out well. I was able to
deliver meaningful and insightful results using a UFO dataset and
applying spatial analysis methods. The overall finding that raw counts
of reported UFO sightings cluster and are highest in densely populated
areas provides an answer to my initial research question. The strength
of the relationship between population density and UFO sightings, and
what other factors might be affecting sighting counts, is a strong
candidate for future research. Areas that go against the trend, such as
Lincoln County, Nevada and Chaves County, New Mexico, begin to hint at
another question: how strongly cultural mythos and “wanting to see
something” play into the number of reported sightings.
A challenge I encountered during this project was memory usage while
conducting the analysis in RStudio. The program would repeatedly freeze
while attempting to conduct various analyses, particularly the joining
of the UFO and census data. Perhaps using state level data would have
sped up the process but I don’t feel that viewing the data at state
level would have provided the granularity needed for this project. If I
were to attempt this project again I am not sure if I would run the
Geographically Weighted Regression. The results did not provide much
insight, but that wasn’t known until the analysis was completed.
Attempting the project again, I would look into incorporating other
factors that may impact sighting reports such as proximity to military
bases and the amount of light pollution in the area. A temporal
component would be interesting to add to investigate whether UFO reports
have spatially changed over time with shows like X-Files becoming
popular in the 1990s. The UFO dataset also contains other attributes
such as time that could be worth looking into as well.
Bivand R, Wong D (2018). “Comparing implementations of global and local indicators of spatial association.” TEST, 27(3), 716–748. doi:10.1007/s11749-018-0599-x
Liu, Binbin (2024). “GWmodel (version 2.4-1).” Rdocumentation. https://www.rdocumentation.org/packages/GWmodel/versions/2.4-1
Rdocumentation (n.d.). “kde2d: Two-Dimensional Kernel Density Estimation.” Rdocumentation. https://www.rdocumentation.org/packages/MASS/versions/7.3-65/topics/kded
Walker K, Herman M (2026). tidycensus: Load US Census Boundary and Attribute Data as ‘tidyverse’ and ‘sf’-Ready Data Frames. R package version 1.7.5, https://walker-data.com/tidycensus/
Waring E, Quinn M, McNamara A, Arino de la Rubia E, Zhu H, Ellis S (2026). skimr: Compact and Flexible Summaries of Data. R package version 2.2.2, https://docs.ropensci.org/skimr/.
Nugent, Cam (2019). UFO Sightings around the world. National UFO Reporting Center. Accessed March 29th, 2026 from https://www.kaggle.com/datasets/camnugent/ufo-sightings-around-the-world