In this report, I explore the “satellite database” dataset, which comes from the Union of Concerned Scientists Satellite Database. I previously used this dataset in a graduate course to predict satellite life expectancy, the time in years that an artificial satellite is expected to remain operational. The data show each satellite’s life expectancy, as provided by the manufacturer, along with its orbital attributes, which mainly describe the satellite’s position as it orbits the Earth. In this report, I focus on the relationships between life expectancy and the other variables, looking closely at LEOs, or Low Earth Orbit satellites.
Below is a preview of my previous project. If interested, right-click to download and view the full report.
knitr::include_graphics("KQuimzon_Final.pdf")
These are the packages that will be utilized for this report.
library(tidyverse)
library(here)
library(skimr)
library(janitor)
library(psych)
library(cowplot)
First, the satellite data was loaded into RStudio, rows with missing values were dropped using na.omit(), and the variable names were cleaned up using clean_names(). Though the data had been previously cleaned for another study, this was to ensure all variables had consistent naming conventions. Then, three variables were renamed to indicate that they belong to a set of categorical Boolean variables. Each satellite entry can have multiple types, which is why the type is stored as separate Boolean variables rather than as a single category.
sats <- read_csv(here("Lab3", "data", "satellites.csv")) # The data is loaded.
## Rows: 3095 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): orbit_class
## dbl (8): life, geo_longitude, perigee, apogee, eccentricity, inclination, pe...
## lgl (3): gov, com, mil
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
clean_sats <- sats %>% # This is the clean-up step.
  na.omit() %>% # Omit entries with NA values
  clean_names() %>% # Clean variable names
  rename(type_gov = gov, type_com = com, type_mil = mil) # Rename the Boolean variables.
head(clean_sats) #This shows the sample of the cleaned data.
## # A tibble: 6 × 12
## life orbit_class geo_longitude perigee apogee eccentricity inclination period
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 Ellip 0 460 33200 0.706 31 580
## 2 1 Ellip 0 952 1155 0.0137 31 106.
## 3 2 Ellip 0 6292 156833 0.856 54.0 4033.
## 4 2 Ellip 0 461 87304 0.864 15.7 1869.
## 5 2 Ellip 0 467 87260 0.864 15.7 1868.
## 6 2 Ellip 0 474 87526 0.864 15.7 1876.
## # ℹ 4 more variables: launch_mass <dbl>, type_gov <lgl>, type_com <lgl>,
## # type_mil <lgl>
The above table shows a subset of the data. “life” is the life expectancy in years, “orbit_class” is the type of orbit, “geo_longitude” is where a GEO satellite sits in relation to the Earth, “perigee” is the closest distance of the satellite to the Earth in its orbit, “apogee” is the farthest distance of the satellite from the Earth, “eccentricity” is how much the orbit deviates from a perfect circle, “inclination” is the angle of inclination of the orbit, “period” is the time it takes a satellite to complete a full orbit, and “launch_mass” is the mass of the satellite in kilograms. The last three variables, “type_gov,” “type_com,” and “type_mil,” indicate whether the satellite is considered a government, commercial, or military satellite, respectively.
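As a rough sanity check on how these columns relate to one another, the period can be estimated from the perigee and apogee using Kepler's third law. The sketch below is not part of the original analysis. It assumes that “perigee” and “apogee” are altitudes in kilometers above the Earth's surface and that “period” is in minutes, and it uses standard values for the Earth's mean radius and gravitational parameter. `orbital_period_min` is a hypothetical helper name.

```r
# Assumed constants (not taken from the data set)
earth_radius_km <- 6371    # mean Earth radius, km
earth_mu <- 398600.4418    # Earth's gravitational parameter, km^3 / s^2

# Hypothetical helper: orbital period in minutes from perigee/apogee altitudes in km
orbital_period_min <- function(perigee_km, apogee_km) {
  # Semi-major axis measured from the Earth's center
  semi_major_km <- earth_radius_km + (perigee_km + apogee_km) / 2
  # Kepler's third law: T = 2 * pi * sqrt(a^3 / mu), in seconds
  period_s <- 2 * pi * sqrt(semi_major_km^3 / earth_mu)
  period_s / 60
}

# Second row of the preview: perigee 952, apogee 1155, listed period about 106
est_period <- orbital_period_min(952, 1155)
print(round(est_period, 1))
```

For that row the estimate lands close to the listed value of about 106, which is consistent with the assumed units.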
Summary statistics for the data, computed using the “psych” package, are shown below:
describe(clean_sats)
## vars n mean sd median trimmed mad min
## life 1 3066 6.21 4.20 4.0 5.57 1.48 0.25
## orbit_class* 2 3066 2.87 0.46 3.0 2.92 0.00 1.00
## geo_longitude 3 3066 2.36 36.05 0.0 0.00 0.00 -179.80
## perigee 4 3066 6874.57 12919.78 548.0 4089.07 189.77 170.00
## apogee 5 3066 7643.33 16451.28 561.0 4428.84 202.37 280.00
## eccentricity 6 3066 0.01 0.07 0.0 0.00 0.00 0.00
## inclination 7 3066 59.54 30.97 53.0 62.05 43.74 0.00
## period 8 3066 346.71 574.16 95.6 229.00 1.63 91.34
## launch_mass 9 3066 1008.73 1802.85 260.0 576.12 166.05 1.00
## type_gov 10 3066 NaN NA NA NaN NA Inf
## type_com 11 3066 NaN NA NA NaN NA Inf
## type_mil 12 3066 NaN NA NA NaN NA Inf
## max range skew kurtosis se
## life 30.00 29.75 1.42 0.77 0.08
## orbit_class* 4.00 3.00 -1.07 2.93 0.01
## geo_longitude 180.00 359.80 0.53 9.26 0.65
## perigee 37782.00 37612.00 1.67 0.92 233.33
## apogee 330000.00 329720.00 5.57 74.49 297.11
## eccentricity 0.96 0.96 11.26 128.44 0.00
## inclination 143.40 143.40 -0.54 -0.41 0.56
## period 11520.00 11428.66 4.84 63.08 10.37
## launch_mass 22500.00 22499.00 3.71 23.89 32.56
## type_gov -Inf -Inf NA NA NA
## type_com -Inf -Inf NA NA NA
## type_mil -Inf -Inf NA NA NA
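After dropping rows with missing values, n = 3,066 observations remain (out of the 3,095 rows read in), with a total of 12 variables. In absolute terms, the largest spread can be seen in “apogee” and “perigee,” with standard deviations of roughly 16,451 and 12,920, respectively.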
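The rows for “type_gov,” “type_com,” and “type_mil” show NaN, NA, and Inf because these columns are logical, so the numeric summaries do not apply to them. A minimal sketch of an alternative summary follows, counting how often each flag is TRUE. It uses a small made-up data frame (`toy_sats` is hypothetical and does not contain real values) and base R only.

```r
# Hypothetical toy data standing in for clean_sats; the values are made up
toy_sats <- data.frame(
  type_gov = c(TRUE, FALSE, FALSE, TRUE, FALSE),
  type_com = c(FALSE, TRUE, TRUE, TRUE, FALSE),
  type_mil = c(FALSE, FALSE, TRUE, FALSE, TRUE)
)

flag_cols <- c("type_gov", "type_com", "type_mil")

# For a logical vector, sum() counts the TRUE values and mean() gives their share
flag_summary <- data.frame(
  variable = flag_cols,
  n_true = vapply(toy_sats[flag_cols], sum, numeric(1)),
  prop_true = vapply(toy_sats[flag_cols], mean, numeric(1)),
  row.names = NULL
)

print(flag_summary)
```

Because a satellite can carry more than one type, the proportions do not need to add up to one.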
How does eccentricity affect the lifespan of a satellite? Eccentricity measures how far the orbit of the satellite departs from a circle. An orbit that is a perfect circle has an eccentricity of zero, while an ellipse has an eccentricity between zero and one. Read more here: Eccentricity
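To make the definition concrete, eccentricity can be computed from the perigee and apogee as e = (r_a - r_p) / (r_a + r_p), where both radii are measured from the Earth's center. The sketch below assumes that the “perigee” and “apogee” columns are altitudes in kilometers above the surface and uses a mean Earth radius of 6371 km. Both are assumptions on my part, and `orbit_eccentricity` is a hypothetical helper name.

```r
# Assumed mean Earth radius in km (not taken from the data set)
earth_radius_km <- 6371

# Hypothetical helper: eccentricity from perigee/apogee altitudes in km
orbit_eccentricity <- function(perigee_km, apogee_km) {
  r_p <- earth_radius_km + perigee_km  # closest distance from the Earth's center
  r_a <- earth_radius_km + apogee_km   # farthest distance from the Earth's center
  (r_a - r_p) / (r_a + r_p)
}

ecc_elliptical <- orbit_eccentricity(460, 33200)    # first row of the data preview
ecc_near_circular <- orbit_eccentricity(952, 1155)  # second row of the data preview
ecc_circle <- orbit_eccentricity(550, 550)          # equal perigee and apogee

print(round(c(ecc_elliptical, ecc_near_circular, ecc_circle), 4))
```

Under these assumptions, the first two values round to the 0.706 and 0.0137 listed in the data preview, and an orbit with equal perigee and apogee gives exactly zero.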
clean_sats %>% # A plot of eccentricity vs. life expectancy, colored by the orbit class.
  na.omit() %>%
  ggplot(aes(x = life, y = eccentricity, color = orbit_class)) +
  geom_jitter() +
  labs(title = "Eccentricity vs. Life Expectancy",
       subtitle = "Coded by orbit class",
       caption = "Data from https://www.ucsusa.org/resources/satellite-database",
       color = "Orbit Class", # The legend belongs to the color aesthetic, not fill.
       y = "Eccentricity",
       x = "Satellite Life Expectancy")
ggsave("sat_lifespan.png") # Save the plot!
## Saving 7 x 5 in image
When the plot was first created without color, eccentricity did not seem to follow a clear pattern. When colored by orbit class, however, one can see that the LEO satellites, in light blue, which tend to have nearly circular orbits (eccentricity close to zero), have shorter lifespans. LEOs, or Low Earth Orbit satellites, orbit the Earth at an altitude of less than 1000 km. The other main types of orbits are GEO and MEO. Compared to GEO, which orbits at about 35,000 km, and MEO, which orbits at roughly half the distance of GEO, LEOs orbit much closer to the Earth. Find out more about orbit types here: Orbit Types
clean_sats %>% # Histogram created for LEO satellite life expectancy.
  na.omit() %>%
  filter(orbit_class == "LEO") %>% # Filter for LEO
  ggplot(aes(x = life)) +
  geom_histogram(binwidth = 0.75) + # Set bin width
  theme_bw() + # Set theme to black and white
  labs(title = "Histogram of LEO satellites life expectancy",
       caption = "Data from https://www.ucsusa.org/resources/satellite-database",
       y = "Count",
       x = "Satellite Life Expectancy")
Overall, the distribution for LEO satellites looks like it could have a positive skew, though quite a few values fall under 15 years of life expectancy. This could be a default or more generic life expectancy selected by aerospace companies for certain types of satellites.
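The visual impression of skew can be backed up with a number. Below is a minimal sketch of a moment-based skewness, g1 = m3 / m2^(3/2), where a positive value indicates a longer right tail. The helper `sample_skewness` and the values in `toy_life` are made up for illustration, and the “skew” column reported by describe() may use a slightly different small-sample correction.

```r
# Hypothetical helper: moment-based skewness g1 = m3 / m2^(3/2)
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  centered <- x - mean(x)
  m2 <- mean(centered^2)  # second central moment
  m3 <- mean(centered^3)  # third central moment
  m3 / m2^1.5
}

# Made-up life expectancies in years, not the real LEO data
toy_life <- c(1, 2, 2, 3, 3, 3, 4, 5, 7, 15)

toy_skew <- sample_skewness(toy_life)
symmetric_skew <- sample_skewness(c(1, 2, 3, 4, 5))

print(toy_skew)        # positive: one long-lived satellite stretches the right tail
print(symmetric_skew)  # zero for a symmetric sample
```

With the real data, the same function could be applied to the “life” column after filtering for LEO.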
Next, violin plots were created to compare the distribution of LEOs to the other orbit classes.
clean_sats %>% # Violin plots of eccentricity vs. life expectancy, by orbit class.
  na.omit() %>%
  ggplot(aes(x = eccentricity, y = life, color = orbit_class)) +
  geom_violin() +
  facet_wrap(~orbit_class) +
  labs(title = "Violin plots of satellites eccentricity vs. life expectancy",
       caption = "Data from https://www.ucsusa.org/resources/satellite-database",
       color = "Orbit Class", # The legend belongs to the color aesthetic, not fill.
       y = "Life expectancy",
       x = "Eccentricity")
The violin plots show that LEOs do indeed have smaller eccentricity, as shown by the bottom-left plot in the figure above. Their life expectancy varies considerably, but most of the data points seem to fall around the 5-year point, as indicated by the wide area near the bottom of the violin. In contrast, the typical value is higher for MEO and GEO, at around 10 years and 15 years, respectively. The catch-all category of “Ellip,” shown on the top-left, contains any satellite that has an elliptical orbit. This distribution seems to vary greatly, but most of its data also fall around a 5-year lifespan.
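The “typical” values read off the violins can also be computed directly. The sketch below summarizes life expectancy by orbit class with a median and an interquartile range. The data frame `toy_sats` is made up for illustration, so its numbers are not results from the real data set.

```r
# Hypothetical toy data standing in for clean_sats; the values are made up
toy_sats <- data.frame(
  orbit_class = c("LEO", "LEO", "LEO", "MEO", "MEO",
                  "GEO", "GEO", "GEO", "Ellip", "Ellip"),
  life = c(3, 5, 7, 8, 12, 15, 15, 12, 2, 8)
)

# tapply() applies a function to life within each orbit class
median_life <- tapply(toy_sats$life, toy_sats$orbit_class, median)
iqr_life <- tapply(toy_sats$life, toy_sats$orbit_class, IQR)

life_by_class <- data.frame(
  orbit_class = names(median_life),
  median_life = as.vector(median_life),
  iqr_life = as.vector(iqr_life)
)

print(life_by_class)
```

The median marks where the bulk of each class sits, while the interquartile range gives a sense of how wide the violin is around that point.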
Scatterplots were created plotting life expectancy against three different variables: “apogee,” “launch mass,” and “period.” Additionally, a smoothed function line was drawn over each plot. Again, apogee is the point farthest away from Earth within the satellite’s orbit, launch mass is the mass of the satellite in kilograms, and period is the time it takes the satellite to complete one orbit around the Earth, measured in minutes.
p1 <- clean_sats %>% # Scatterplot for apogee and life expectancy.
  na.omit() %>%
  filter(orbit_class == "LEO") %>%
  ggplot(aes(x = apogee, y = life, color = type_gov)) +
  geom_point() +
  geom_smooth() +
  labs(title = "Apogee vs life",
       color = "Government satellite?", # Legend title for the color aesthetic
       y = "Life expectancy",
       x = "Apogee")
p2 <- clean_sats %>% # Scatterplot for launch mass and life expectancy.
  na.omit() %>%
  filter(orbit_class == "LEO") %>%
  ggplot(aes(x = launch_mass, y = life, color = type_gov)) +
  geom_point() +
  geom_smooth() +
  labs(title = "Launch mass vs life",
       color = "Government satellite?", # Legend title for the color aesthetic
       y = "Life expectancy",
       x = "Launch mass")
p3 <- clean_sats %>% # Scatterplot for period and life expectancy.
  na.omit() %>%
  filter(orbit_class == "LEO") %>%
  ggplot(aes(x = period, y = life, color = type_gov)) +
  geom_point() +
  geom_smooth() +
  labs(title = "Period vs life",
       color = "Government satellite?", # Legend title for the color aesthetic
       y = "Life expectancy",
       x = "Period")
A composite plot was created combining the three scatterplots, as shown below.
all_plot <- plot_grid(p1, p2, p3, labels = c('A', 'B', 'C'), label_size = 10) # Puts all the plots together
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
all_plot # show the plot
The last figure shows the three scatterplots. In each plot, the data points are colored by whether or not the satellite is categorized as government-operated, based on the variable “type_gov.” Though none of the variables show a clear trend, each smoothed line seems to “spike” at one point. These spikes could potentially indicate “sweet spots” where life expectancy is highest for each of the variables. Additionally, it’s interesting to note that for non-government satellites the smoothed line rises and then falls at high values, whereas for government satellites it tends to rise with launch mass and period. This could indicate that government satellites have more life expectancy built into them because they are meant to operate for a long period of time.
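One way to put a number on these visual impressions is a rank correlation computed separately for government and non-government satellites. The sketch below uses Spearman's rho on a made-up data frame (`toy_leo` is hypothetical, so the values are not results). Spearman's rho only captures monotone trends, so a line that rises and then falls would produce a value well below one even when the relationship is strong.

```r
# Hypothetical toy data standing in for the LEO subset; the values are made up
toy_leo <- data.frame(
  type_gov = c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE),
  launch_mass = c(100, 200, 300, 400, 50, 150, 250, 350),
  life = c(2, 3, 5, 7, 1, 4, 5, 2)
)

# Split the rows by the government flag and compute Spearman's rho in each group
rho_by_gov <- sapply(split(toy_leo, toy_leo$type_gov), function(d) {
  cor(d$launch_mass, d$life, method = "spearman")
})

print(rho_by_gov)
```

In this toy example the government group rises steadily, giving a rho of one, while the non-government group rises and then falls, giving a much weaker value. The same call could be repeated with “apogee” or “period” in place of “launch_mass.”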
LEOs, or Low Earth Orbit satellites, have small eccentricity, close to zero, meaning their orbits are almost perfect circles. They also seem to have the shortest life expectancy of the satellite orbit classes. The life expectancy of LEOs does not appear to be normally distributed, and the violin plots show that it is generally lower than in the other classes. Finally, there do not seem to be direct relationships between life expectancy and “apogee,” “launch mass,” or “period.” However, government satellites may have slightly better life expectancy than non-government satellites.
Thank you for reading!