This R Markdown document demonstrates how to access and analyze data from NASA’s Open Data portal using R. We retrieve Near-Earth Object (NEO) data through the NASA API and perform a basic exploratory analysis.
Data Source: NASA Open APIs
https://api.nasa.gov/
Libraries extend R with functions for specific tasks, so the set you load depends on the data you are working with and on what you want to do with it. Here, httr sends the HTTP request, jsonlite parses the JSON response, dplyr handles data manipulation, ggplot2 produces the plots, and lubridate provides date helpers.
library(httr)
library(jsonlite)
library(dplyr)
library(ggplot2)
library(lubridate)
An API key is needed for this demonstration because it grants access to the data that NASA publishes through its API. We ‘ask’ for the data by sending the key together with the time frame we are interested in, given by the start and end dates. The key used here, DEMO_KEY, is NASA’s shared demonstration key and is rate-limited, so a personal key is the better choice for repeated use.
api_key <- "DEMO_KEY"
start_date <- "2024-01-01"
end_date <- "2024-01-07"
url <- paste0(
  "https://api.nasa.gov/neo/rest/v1/feed?",
  "start_date=", start_date,
  "&end_date=", end_date,
  "&api_key=", api_key
)
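As an alternative to pasting the query string together by hand, the same URL can be sketched with httr's `modify_url`, which assembles the query parameters from a named list. The environment variable name `NASA_API_KEY` and the variable name `feed_url` are assumptions made for this illustration and are not part of the original analysis.

```r
library(httr)

# Assumption: a personal key is stored in an environment variable called
# NASA_API_KEY (a hypothetical name); fall back to DEMO_KEY when it is unset.
api_key <- Sys.getenv("NASA_API_KEY", unset = "DEMO_KEY")

# Build the feed URL from a base address plus a named list of parameters.
feed_url <- modify_url(
  "https://api.nasa.gov/neo/rest/v1/feed",
  query = list(
    start_date = "2024-01-01",
    end_date   = "2024-01-07",
    api_key    = api_key
  )
)

feed_url
```

Keeping the key out of the script this way means the document can be shared without exposing a personal key.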
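With the key and dates in place, we pull the data from the API. GET sends the request, content extracts the response body as text, and fromJSON parses that text into R objects. Setting flatten = TRUE turns nested JSON fields into ordinary columns with dot-separated names, which makes the data easier to digest, transform, and model than a deeply nested structure.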
response <- GET(url)
content_data <- content(response, as = "text", encoding = "UTF-8")
neo_data <- fromJSON(content_data, flatten = TRUE)
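To illustrate what `flatten = TRUE` changes, here is a minimal sketch on a small hand-written JSON string that mimics the nesting of the feed. The values are made up and are not NASA data. The block also defines a hypothetical helper, `fetch_feed`, that adds an HTTP status check with httr's `stop_for_status` before parsing. The helper is an illustration only and is not called here.

```r
library(httr)
library(jsonlite)

# Made-up records that imitate the nesting of the NEO feed.
toy_json <- '[
  {"name": "A", "estimated_diameter": {"meters": {"estimated_diameter_min": 1.5, "estimated_diameter_max": 3.2}}},
  {"name": "B", "estimated_diameter": {"meters": {"estimated_diameter_min": 10.0, "estimated_diameter_max": 22.0}}}
]'

nested <- fromJSON(toy_json, flatten = FALSE)
flat   <- fromJSON(toy_json, flatten = TRUE)

names(nested)  # top-level fields only; estimated_diameter is itself a data frame
names(flat)    # nested fields become dot-separated column names

# Hypothetical helper: request, fail loudly on HTTP errors, then parse.
fetch_feed <- function(url) {
  response <- GET(url)
  stop_for_status(response)  # turns 4xx/5xx responses into an R error
  fromJSON(content(response, as = "text", encoding = "UTF-8"), flatten = TRUE)
}
```

A status check is worth considering with a rate-limited key, since a rejected request would otherwise surface later as a confusing parsing or column error.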
The feed returns the objects grouped by date, so neo_data$near_earth_objects is a list with one data frame per day. We bind those rows into a single data frame that we can manipulate to create a common operating picture and tell a story about near-Earth objects and how hazardous they are.

The combined data has 18 columns, and we only want the ones that relate to our inquiry: velocity, miss distance, and object size. Velocity and miss distance sit inside the nested close_approach_data column and arrive as character strings, so they must be unnested and converted to numeric before they can be analyzed or plotted. The colnames output lists every available column; the transmute call that follows keeps only the ones we care about.
library(tidyr)
asteroids <- bind_rows(neo_data$near_earth_objects)
colnames(asteroids)
## [1] "id"
## [2] "neo_reference_id"
## [3] "name"
## [4] "nasa_jpl_url"
## [5] "absolute_magnitude_h"
## [6] "is_potentially_hazardous_asteroid"
## [7] "close_approach_data"
## [8] "is_sentry_object"
## [9] "links.self"
## [10] "estimated_diameter.kilometers.estimated_diameter_min"
## [11] "estimated_diameter.kilometers.estimated_diameter_max"
## [12] "estimated_diameter.meters.estimated_diameter_min"
## [13] "estimated_diameter.meters.estimated_diameter_max"
## [14] "estimated_diameter.miles.estimated_diameter_min"
## [15] "estimated_diameter.miles.estimated_diameter_max"
## [16] "estimated_diameter.feet.estimated_diameter_min"
## [17] "estimated_diameter.feet.estimated_diameter_max"
## [18] "sentry_data"
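As a small illustration of the binding step, the sketch below applies `bind_rows` to a made-up list that stands in for `neo_data$near_earth_objects`. The `.id` argument is an optional extra, not used in the original analysis, that records which list element (here, which date) each row came from. The names `toy_feed`, `toy_asteroids`, and `feed_date` are hypothetical.

```r
library(dplyr)

# Made-up stand-in for neo_data$near_earth_objects: one data frame per date.
toy_feed <- list(
  "2024-01-01" = data.frame(name = c("A", "B"), absolute_magnitude_h = c(21.3, 24.9)),
  "2024-01-02" = data.frame(name = "C", absolute_magnitude_h = 19.7)
)

# .id stores each list element's name in a new column (here feed_date).
toy_asteroids <- bind_rows(toy_feed, .id = "feed_date")

toy_asteroids
```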
clean_df <- asteroids %>%
  unnest(close_approach_data) %>%
  transmute(
    name,
    close_approach_date = as.Date(close_approach_date),
    velocity_kph = as.numeric(relative_velocity.kilometers_per_hour),
    miss_distance_km = as.numeric(miss_distance.kilometers),
    estimated_diameter_min = estimated_diameter.meters.estimated_diameter_min,
    estimated_diameter_max = estimated_diameter.meters.estimated_diameter_max,
    hazardous = is_potentially_hazardous_asteroid
  )
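To make the unnesting step concrete, here is a minimal sketch on made-up data with the same shape: each object carries a list-column of close approaches, and `unnest` expands it to one row per approach before `transmute` converts the text fields. The object names and values are invented for illustration.

```r
library(dplyr)
library(tidyr)

# Made-up objects; close_approach_data is a list-column of small data frames.
toy <- tibble(
  name = c("A", "B"),
  close_approach_data = list(
    tibble(close_approach_date = "2024-01-01",
           miss_distance.kilometers = "1234.5"),
    tibble(close_approach_date = c("2024-01-02", "2024-01-05"),
           miss_distance.kilometers = c("99.9", "1000"))
  )
)

toy_clean <- toy %>%
  unnest(close_approach_data) %>%   # one row per close approach
  transmute(
    name,
    close_approach_date = as.Date(close_approach_date),        # text to Date
    miss_distance_km    = as.numeric(miss_distance.kilometers) # text to numeric
  )

toy_clean
```

Object B has two approaches, so it contributes two rows after unnesting, which is why the row count of the cleaned data can exceed the number of distinct objects.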
After selecting the columns and converting velocity and miss distance to numeric, we take a look at the product we have so far. summary gives a quick overview of each column. We don’t need to see every row, because it is still a lot of information, so we then group the rows by the hazardous flag (TRUE or FALSE) and compare the average velocity and miss distance of the two groups. The flag itself comes from NASA’s classification; the averages describe how the groups differ in this sample rather than define what makes an object hazardous. Here the 8 hazardous rows average about 55,300 km/h against about 45,100 km/h for the 98 others, while the average miss distances are similar.
summary(clean_df)
## name close_approach_date velocity_kph miss_distance_km
## Length:106 Min. :2024-01-01 Min. : 13413 Min. : 242676
## Class :character 1st Qu.:2024-01-02 1st Qu.: 28202 1st Qu.:16086292
## Mode :character Median :2024-01-03 Median : 43828 Median :31380741
## Mean :2024-01-03 Mean : 45917 Mean :34348710
## 3rd Qu.:2024-01-06 3rd Qu.: 61269 3rd Qu.:54965219
## Max. :2024-01-07 Max. :136268 Max. :73609797
## estimated_diameter_min estimated_diameter_max hazardous
## Min. : 1.756 Min. : 3.927 Mode :logical
## 1st Qu.: 17.091 1st Qu.: 38.218 FALSE:98
## Median : 38.508 Median : 86.108 TRUE :8
## Mean : 80.872 Mean : 180.835
## 3rd Qu.:108.218 3rd Qu.: 241.984
## Max. :620.233 Max. :1386.883
clean_df %>%
  group_by(hazardous) %>%
  summarise(
    avg_velocity = mean(velocity_kph, na.rm = TRUE),
    avg_miss_distance = mean(miss_distance_km, na.rm = TRUE),
    count = n()
  )
## # A tibble: 2 × 4
## hazardous avg_velocity avg_miss_distance count
## <lgl> <dbl> <dbl> <int>
## 1 FALSE 45149. 34427519. 98
## 2 TRUE 55322. 33383307. 8
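With a group as small as the hazardous one, a single fast object can pull the mean noticeably, so it may help to report the median alongside it. The sketch below shows one way to do that with dplyr's `across`, using made-up numbers rather than the NASA data. The name `toy_summary` and the resulting column names are specific to this illustration.

```r
library(dplyr)

# Made-up values with the same column names as clean_df.
toy_df <- data.frame(
  hazardous        = c(FALSE, FALSE, FALSE, TRUE, TRUE),
  velocity_kph     = c(20000, 40000, 90000, 50000, 60000),
  miss_distance_km = c(1e6, 3e7, 5e7, 2e7, 4e7)
)

toy_summary <- toy_df %>%
  group_by(hazardous) %>%
  summarise(
    # Apply both functions to both columns; results are named <column>_<function>.
    across(c(velocity_kph, miss_distance_km),
           list(mean = mean, median = median)),
    count = n()
  )

toy_summary
```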
We visualize the distribution of approach velocities with ggplot, using geom_histogram with 30 bins. Velocity is mapped to the x axis, and the labels make clear that the y axis counts the objects traveling within each range of speeds. We keep the theme minimal for simplicity.
ggplot(clean_df, aes(x = velocity_kph)) +
  geom_histogram(bins = 30) +
  labs(
    title = "Distribution of Asteroid Approach Velocity",
    x = "Velocity (km/h)",
    y = "Count"
  ) +
  theme_minimal()
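The number of bins shapes how a histogram reads. As a variation on the plot above, the sketch below fixes the bin width instead of the bin count, so that every bar covers a 5,000 km/h range starting at a round number. The data are randomly generated for illustration, and the width of 5,000 is an arbitrary choice rather than a recommendation.

```r
library(ggplot2)

# Randomly generated velocities, roughly spanning the range seen in the summary.
set.seed(1)
toy_df <- data.frame(velocity_kph = runif(100, min = 13000, max = 136000))

p_hist <- ggplot(toy_df, aes(x = velocity_kph)) +
  # binwidth sets the width of each bar; boundary aligns bin edges to multiples of it.
  geom_histogram(binwidth = 5000, boundary = 0) +
  labs(
    title = "Distribution of Approach Velocity (toy data)",
    x = "Velocity (km/h)",
    y = "Count"
  ) +
  theme_minimal()

p_hist
```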
To look at two variables together, we can use a scatter plot (geom_point) and map the hazardous flag to color. With ggplot2’s default palette, the potentially hazardous (TRUE) objects appear in blue-green.
ggplot(clean_df, aes(x = velocity_kph, y = miss_distance_km, color = hazardous)) +
  geom_point(alpha = 0.7) +
  labs(
    title = "Miss Distance vs Velocity",
    x = "Velocity (km/h)",
    y = "Miss Distance (km)",
    color = "Potentially Hazardous"
  ) +
  theme_minimal()
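The colors in the scatter plot come from ggplot2's default palette. If the mapping should be explicit, one option is `scale_color_manual`, sketched below on made-up data. The specific colors ("grey60" and "firebrick") are an arbitrary choice for this illustration.

```r
library(ggplot2)

# Made-up points with the same column names as clean_df.
toy_df <- data.frame(
  velocity_kph     = c(20000, 45000, 60000, 90000),
  miss_distance_km = c(5e7, 3e7, 1e7, 2e6),
  hazardous        = c(FALSE, FALSE, TRUE, TRUE)
)

p_scatter <- ggplot(toy_df, aes(x = velocity_kph, y = miss_distance_km, color = hazardous)) +
  geom_point(alpha = 0.7) +
  # Name each value so the color assignment does not depend on level order.
  scale_color_manual(
    values = c("FALSE" = "grey60", "TRUE" = "firebrick"),
    name = "Potentially Hazardous"
  ) +
  labs(x = "Velocity (km/h)", y = "Miss Distance (km)") +
  theme_minimal()

p_scatter
```

Muting the non-hazardous points lets the few hazardous ones stand out, which helps when they make up a small share of the data.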