Image source: NASA.gov

Introduction

This project explores and compares the characteristics of the planets in our Solar System with those of known exoplanets. The analysis focuses on comparing mass and radius and on identifying exoplanets similar to Earth.

Data sources:
- Planets in the Solar System: NASA Planetary Fact Sheet (https://nssdc.gsfc.nasa.gov/planetary/factsheet/)
- Exoplanets: NASA Exoplanet Archive (https://exoplanetarchive.ipac.caltech.edu/)

Load Libraries

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(plotly)
## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout
library(ggrepel)

Load Data

planets <- read_csv("planets_updated.csv")
## Rows: 8 Columns: 27
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): Planet, Color, Surface Pressure (bars), Ring System?, Global Magne...
## dbl (16): Mass (10^24kg), Diameter (km), Density (kg/m^3), Surface Gravity(m...
## num  (1): Orbital Period (days)
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
exoplanets <- read_csv("exoplanets.csv")
## Rows: 4855 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (11): Name, Mass (MJ), Radius (RJ), Period (days), Semi-major axis (AU),...
## dbl  (1): Disc. Year
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Data Cleaning

# Convert mass and radius to Jupiter units for consistency
jupiter_mass_kg <- 1.898e27
jupiter_radius_km <- 69911

planets <- planets %>%
  mutate(`Mass (MJ)` = (`Mass (10^24kg)` * 1e24) / jupiter_mass_kg,
         `Radius (RJ)` = `Diameter (km)` / (2 * jupiter_radius_km),
         Type = "Solar System Planet")

exoplanets <- exoplanets %>%
  mutate(`Mass (MJ)` = as.numeric(`Mass (MJ)`),  # Convert mass to numeric
         `Radius (RJ)` = as.numeric(`Radius (RJ)`),  # Convert radius to numeric
         Type = "Exoplanet")  # Add Type for exoplanets
## Warning: There were 2 warnings in `mutate()`.
## The first warning was:
## ℹ In argument: `Mass (MJ) = as.numeric(`Mass (MJ)`)`.
## Caused by warning:
## ! NAs introduced by coercion
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
# Removing rows with NA values
exoplanets_before <- nrow(exoplanets)
exoplanets <- drop_na(exoplanets, `Mass (MJ)`, `Radius (RJ)`)
exoplanets_after <- nrow(exoplanets)
cat("Rows removed due to NA values:", exoplanets_before - exoplanets_after, "\n")
## Rows removed due to NA values: 4210
# Combine planets and exoplanets data
combined_data <- bind_rows(
  planets %>% select(Name = Planet, `Mass (MJ)`, `Radius (RJ)`, Type),
  exoplanets %>% select(Name, `Mass (MJ)`, `Radius (RJ)`, Type)
)

Linear Regression Analysis: Mass vs Radius

lm_model <- lm(`Radius (RJ)` ~ `Mass (MJ)`, data = combined_data)
summary(lm_model)
## 
## Call:
## lm(formula = `Radius (RJ)` ~ `Mass (MJ)`, data = combined_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.3026 -0.5528  0.1173  0.4069  2.5294 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.819455   0.022749  36.021  < 2e-16 ***
## `Mass (MJ)` 0.026742   0.004955   5.397 9.51e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5368 on 651 degrees of freedom
## Multiple R-squared:  0.04282,    Adjusted R-squared:  0.04135 
## F-statistic: 29.13 on 1 and 651 DF,  p-value: 9.508e-08

Diagnostic Plot

par(mfrow = c(2, 2))
plot(lm_model)

Solar System Planets vs. Exoplanets

# Define colors
planet_colors <- c(
  "Mercury" = "gray", "Venus" = "goldenrod", "Earth" = "blue", "Mars" = "red", 
  "Jupiter" = "orange", "Saturn" = "gold", "Uranus" = "lightblue", "Neptune" = "darkblue"
)
p1 <- ggplot() +
  geom_point(data = combined_data %>% filter(Type == "Exoplanet"),
             aes(x = `Mass (MJ)`, y = `Radius (RJ)`, fill = `Mass (MJ)`), 
             alpha = 0.5, shape = 21, size = 3, color = "white") +
  # Solar System planet labels; the filter must match the Type value "Solar System Planet" set during cleaning
  geom_text_repel(data = combined_data %>% filter(Type == "Solar System Planet"),
                  aes(x = `Mass (MJ)`, y = `Radius (RJ)`, label = Name),
                  size = 3, color = "white", max.overlaps = 20) +  # Adjust max.overlaps to control text density
  # Solar System planets (fixed colors using the color aesthetic)
  geom_point(data = combined_data %>% filter(Type == "Solar System Planet"),
             aes(x = `Mass (MJ)`, y = `Radius (RJ)`, color = Name), 
             size = 5) +
  geom_text_repel(data = combined_data %>% filter(Type == "Exoplanet"),
                aes(x = `Mass (MJ)`, 
                    y = `Radius (RJ)`, 
                    label = Name),
                size = 3, color = "white", max.overlaps = sqrt(nrow(combined_data))) +
  scale_x_log10() +
  scale_y_log10() +

  scale_color_manual(values = planet_colors, na.translate = FALSE, guide = "legend") +  # Planets' true colors
  scale_fill_gradient(low = "white", high = "red", na.value = "gray", guide = "colorbar") +  # Exoplanets' mass gradient
  labs(title = "Comparison of Solar System Planets vs. Exoplanets",
       x = "Mass (Jupiter Masses)", y = "Radius (Jupiter Radii)",
       fill = "Exoplanet Mass (MJ)", color = "Solar System Planets") +
  theme_minimal(base_family = "Arial") +
  theme(
    panel.background = element_rect(fill = "black", color = NA),
    plot.background = element_rect(fill = "black", color = NA),
    legend.background = element_rect(fill = "black"),
    legend.text = element_text(color = "white"),
    legend.title = element_text(color = "white"),
    axis.text = element_text(color = "white"),
    axis.title = element_text(color = "white"),
    plot.title = element_text(color = "white")
  )

ggplotly(p1)
## Warning in geom2trace.default(dots[[1L]][[1L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomTextRepel() has yet to be implemented in plotly.
##   If you'd like to see this geom implemented,
##   Please open an issue with your example code at
##   https://github.com/ropensci/plotly/issues

Exoplanets Similar to Earth

Data Cleaning for Plot 2

earth_mass_kg <- 5.972e24
earth_radius_km <- 6371

planets <- planets %>%
  mutate(`Mass (ME)` = (`Mass (10^24kg)` * 1e24) / earth_mass_kg,
         `Radius (RE)` = `Diameter (km)` / (2 * earth_radius_km),
         Type = "Solar System Planet")

exoplanets <- exoplanets %>%
  # `Mass (MJ)` and `Radius (RJ)` are already numeric after the first cleaning step
  mutate(`Mass (ME)` = `Mass (MJ)` * 317.8,  # 1 MJ = 317.8 ME
         # 1 RJ = 11.2 RE uses equatorial radii; the mean radii defined above give 69911 / 6371 ≈ 10.97
         `Radius (RE)` = `Radius (RJ)` * 11.2,
         Type = "Exoplanet") %>%
  drop_na(`Mass (ME)`, `Radius (RE)`)

combined_data_earth_exoplanets <- bind_rows(
  planets %>% select(Name = Planet, `Mass (ME)`, `Radius (RE)`, Type),
  exoplanets %>% select(Name, `Mass (ME)`, `Radius (RE)`, Type)
)
earth <- combined_data_earth_exoplanets %>% filter(Name == "Earth")
earth_mass <- earth$`Mass (ME)`  # this table is in Earth units, so use the (ME)/(RE) columns
earth_radius <- earth$`Radius (RE)`
earth
## # A tibble: 1 × 4
##   Name  `Mass (ME)` `Radius (RE)` Type               
##   <chr>       <dbl>         <dbl> <chr>              
## 1 Earth        1.00          1.00 Solar System Planet
earth_mass <- 1
earth_radius <- 1

similar_exoplanets <- exoplanets %>%
  filter(
    between(`Mass (ME)`, earth_mass * 0.6, earth_mass * 1.4),  # Mass within ±40% of Earth
    between(`Radius (RE)`, earth_radius * 0.6, earth_radius * 1.4)  # Radius within ±40% of Earth
  )

similar_exoplanets <- bind_rows(
  planets %>% filter(Planet == "Earth") %>% select(Name = Planet, `Mass (ME)`, `Radius (RE)`),
  similar_exoplanets %>% select(Name, `Mass (ME)`, `Radius (RE)`)
)

print(similar_exoplanets)
## # A tibble: 8 × 3
##   Name        `Mass (ME)` `Radius (RE)`
##   <chr>             <dbl>         <dbl>
## 1 Earth             1.00          1.00 
## 2 Kepler-70c        0.667         0.862
## 3 Kepler-138d       0.639         1.21 
## 4 TRAPPIST-1b       0.849         1.09 
## 5 TRAPPIST-1c       1.38          1.05 
## 6 TRAPPIST-1e       0.620         0.918
## 7 TRAPPIST-1f       0.680         1.04 
## 8 TRAPPIST-1g       1.34          1.13

Earth vs. Exoplanets

p2 <- ggplot(similar_exoplanets, aes(x = `Mass (ME)`, y = `Radius (RE)`, color = Name, text = Name)) +
  geom_point(size = 4) +
  labs(title = "Comparison of Earth with Similar Exoplanets",
       x = "Mass (Earth Masses)", y = "Radius (Earth Radii)") +
  theme_minimal(base_family = "Arial") +
  theme(
    panel.background = element_rect(fill = "black", color = NA),
    plot.background = element_rect(fill = "black", color = NA),
    legend.background = element_rect(fill = "black"),
    legend.text = element_text(color = "white"),
    legend.title = element_text(color = "white"),
    axis.text = element_text(color = "white"),
    axis.title = element_text(color = "white"),
    plot.title = element_text(color = "white")
  )

ggplotly(p2, tooltip = "text")

Essay

Data Cleaning

The dataset required significant cleaning to ensure accurate and consistent comparisons between Solar System planets and exoplanets. First, the mass and radius values for both sets of planets were converted into common units, Jupiter masses (MJ) and Jupiter radii (RJ), to allow for proper scaling and comparison. For Solar System planets, the mass was given in units of 10^24 kg and the size as a diameter in kilometers. I converted the mass to kilograms and divided it by the mass of Jupiter (1.898e27 kg), and I obtained the radius by dividing the planet’s diameter by twice Jupiter’s radius (69,911 km).

For exoplanets, the mass and radius columns were read in as text, so I converted them to numeric types; entries that could not be parsed became NA. I then added a “Type” column to distinguish the exoplanets from Solar System planets. I used the drop_na() function to remove any rows with missing values in mass or radius, which excluded 4,210 of the 4,855 exoplanet rows. The number of rows removed was logged for transparency.

Finally, the two datasets, Solar System planets and exoplanets, were merged into one combined dataset (combined_data) to facilitate direct comparison. This was done using the bind_rows() function after selecting matching columns for name, mass, radius, and type from each dataset.
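The cleaning steps described above can be sketched on a small scale. This is a minimal illustration in base R (so it runs without the tidyverse), not the analysis code itself: the helper names `to_jupiter_mass` and `to_jupiter_radius` and the three-row `toy` table are hypothetical, and Earth's mass (5.97 x 10^24 kg) and diameter (12,756 km) are assumed fact-sheet values.

```r
# Reference values taken from the cleaning step in this report.
jupiter_mass_kg   <- 1.898e27
jupiter_radius_km <- 69911

# Hypothetical helpers (not part of the original analysis).
to_jupiter_mass <- function(mass_1e24_kg) {
  # The fact sheet lists mass in units of 10^24 kg, so scale to kg first.
  (mass_1e24_kg * 1e24) / jupiter_mass_kg
}
to_jupiter_radius <- function(diameter_km) {
  # Radius is half the diameter, expressed as a fraction of Jupiter's radius.
  diameter_km / (2 * jupiter_radius_km)
}

# Assumed Earth values: mass 5.97 (10^24 kg), diameter 12,756 km.
earth_mj <- to_jupiter_mass(5.97)
earth_rj <- to_jupiter_radius(12756)

# Made-up exoplanet-style columns stored as text.
toy <- data.frame(
  Name   = c("A", "B", "C"),
  Mass   = c("0.5", "unknown", "1.2"),
  Radius = c("0.9", "1.1", ""),
  stringsAsFactors = FALSE
)

# Entries that cannot be parsed as numbers become NA.
toy$Mass   <- suppressWarnings(as.numeric(toy$Mass))
toy$Radius <- suppressWarnings(as.numeric(toy$Radius))

# Keep only rows with both values present and log how many were dropped.
rows_before  <- nrow(toy)
toy_clean    <- toy[complete.cases(toy[, c("Mass", "Radius")]), ]
rows_removed <- rows_before - nrow(toy_clean)
cat("Rows removed due to NA values:", rows_removed, "\n")
```

Here `complete.cases()` plays the role that drop_na() plays in the report; both discard any row with a missing mass or radius.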

Visualization

The visualization compares the masses and radii of Solar System planets and exoplanets. Using a scatter plot, I visualized the relationship between mass and radius on a logarithmic scale for both groups of planets. The Solar System planets are colored according to their true colors, while the exoplanets are represented with a mass gradient from white to red, providing a clear visual distinction between the two groups. One of the key patterns that emerged was the stark difference between the sizes of exoplanets and Solar System planets. The exoplanets tend to be more diverse, with some approaching the size and mass of Jupiter-like gas giants and others resembling rocky Earth-like planets. A particularly surprising finding was a small group of exoplanets that closely matched Earth’s mass and radius. This was explored further by filtering for exoplanets with a mass and radius within ±40% of Earth’s, resulting in a list of seven potential “Earth-like” exoplanets, five of them in the TRAPPIST-1 system.
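The ±40% similarity filter can be illustrated with a short base R sketch. The helper `within_fraction` is a hypothetical name introduced here, and the `toy` table combines two rows copied from the printed results with one made-up gas giant for contrast; it is an illustration of the idea rather than the code used in the analysis.

```r
# Hypothetical helper: TRUE where x lies within +/- tol (a fraction) of a
# reference value, with both bounds inclusive.
within_fraction <- function(x, reference, tol) {
  x >= reference * (1 - tol) & x <= reference * (1 + tol)
}

# Two rows taken from the printed results, plus one made-up giant planet.
toy <- data.frame(
  Name      = c("TRAPPIST-1e", "TRAPPIST-1c", "Made-up giant"),
  mass_me   = c(0.620, 1.38, 300),
  radius_re = c(0.918, 1.05, 11),
  stringsAsFactors = FALSE
)

# In Earth units the reference mass and radius are both 1.
keep <- within_fraction(toy$mass_me, 1, 0.4) &
  within_fraction(toy$radius_re, 1, 0.4)
similar <- toy[keep, ]
print(similar$Name)
```

A planet must pass both the mass and the radius test to be kept, so the made-up giant is excluded even though no single threshold is especially strict.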

While the project successfully compared the characteristics of Solar System planets and exoplanets, there were a few challenges and limitations. One of the biggest hurdles was the inconsistent nature of exoplanet data. Some exoplanets had incomplete or uncertain measurements, especially the smaller planets. Despite the cleaning, some inaccuracies likely remain in the dataset, which could affect the precision of the visualization. Additionally, I had hoped to include more detailed information about the atmospheres or potential habitability of the exoplanets, which would have provided a more comprehensive comparison to Earth. However, the dataset did not contain such information, and incorporating it would have required additional data sources. I also attempted to replace the points with images of the planets, but this was unsuccessful.

In the future, I would like to refine the data cleaning process further, possibly including more advanced techniques. I would also explore other types of visualizations, such as 3D plots or heatmaps, to provide additional insights into the data. Finally, integrating real-time data from exoplanet discovery databases could keep the analysis up to date with the latest findings in the field.