Finding Earth-Like Exoplanets in NASA Data

Introduction

Is Earth an anomaly in the galaxy, or are there planets so similar to Earth that they could be considered its close relatives? To answer this question I will use the ‘Transiting Exoplanets’ dataset from the NASA Exoplanet Science Institute, which contains 35,774 observations of planets outside our solar system, often hundreds of light-years away (Christiansen et al. 2025). The dataset compiles each planet’s location, its distance from its star, its orbital period, orbital parameters such as eccentricity and inclination, and its size. Each of these quantities can be compared with Earth’s to find Earth-like exoplanets from the data alone.

Exoplanet Data Analysis

To analyze the Transiting Exoplanets dataset we first have to drop the columns we do not need. We keep the following columns: Planet Name (pl_name), Orbital Period (pl_orbper), Planet Radius (pl_rade), Eccentricity (pl_orbeccen), Inclination (pl_orbincl), Ratio of Planet to Stellar Radius (pl_ratror), and Stellar Effective Temperature (st_teff).

# Load the packages used below; `exoplanets` is assumed to be already read in
library(dplyr)
library(ggplot2)
# Keep only the columns needed for the comparison with Earth
clean_exoplanets <- exoplanets |> 
  select(pl_name, pl_orbper, pl_rade, pl_orbeccen, pl_orbincl, pl_ratror, st_teff)

head(clean_exoplanets)
## # A tibble: 6 × 7
##   pl_name  pl_orbper pl_rade pl_orbeccen pl_orbincl pl_ratror st_teff
##   <chr>        <dbl>   <dbl>       <dbl>      <dbl>     <dbl>   <dbl>
## 1 55 Cnc e     0.737   NA          0.032         NA   NA         5292
## 2 55 Cnc e    NA       NA         NA             NA   NA         5250
## 3 55 Cnc e     0.737    2.17       0             NA   NA         5250
## 4 55 Cnc e     2.80    NA          0.264         NA   NA         5250
## 5 55 Cnc e     2.81    NA          0.174         NA   NA           NA
## 6 55 Cnc e    NA        1.92      NA             83    0.0187    5250

Here we face a problem: the same planet appears in many rows, each with different missing (NA) values, and the measurements vary widely between rows. We therefore group by pl_name and average every column per planet with summarize(across(everything(), ...)), ignoring NA values. A planet with no measurements at all for some column ends up with NaN in that column, because the mean of an empty set of values is undefined. Such rows lack a variable we require, so we drop them right afterward with na.omit().

# Create a dataset of summarized exoplanets with the averages for each planet
summary_exoplanets <- clean_exoplanets |> 
  group_by(pl_name) |> 
  # Anonymous function replaces the deprecated `...` argument of across()
  summarize(across(everything(), \(x) mean(x, na.rm = TRUE))) |> 
  # Drop planets that still have NA/NaN in any column
  na.omit()

head(summary_exoplanets)
## # A tibble: 6 × 7
##   pl_name      pl_orbper pl_rade pl_orbeccen pl_orbincl pl_ratror st_teff
##   <chr>            <dbl>   <dbl>       <dbl>      <dbl>     <dbl>   <dbl>
## 1 55 Cnc e         0.954    1.97      0.0695       86.0    0.0188   5243.
## 2 AU Mic b         8.46     4.15      0.0652       89.2    0.0499   3664.
## 3 AU Mic c        18.9      2.78      0.0709       89.3    0.033    3664 
## 4 BD+20 594 b     41.7      2.39      0            89.5    0.0223   5745 
## 5 BD-14 3065 b     4.29    21.2       0.066        80.8    0.0843   6836.
## 6 CoRoT-1 b        1.51    17.7       0.0267       85.4    0.138    6008
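
The NaN values mentioned above arise because mean() with na.rm = TRUE returns NaN when every value in a group is missing. A minimal sketch on made-up rows (the toy tibble and its values are assumptions for illustration, not archive data) shows the behavior and confirms that na.omit() drops such rows:

```r
library(dplyr)

# Toy data (hypothetical values): planet "B" has no radius measurement at all
toy <- tibble(
  pl_name = c("A", "A", "B", "B"),
  pl_orbper = c(10, NA, 5, 7),
  pl_rade = c(NA, 2, NA, NA)
)

toy_summary <- toy |>
  group_by(pl_name) |>
  summarize(across(everything(), \(x) mean(x, na.rm = TRUE)))

# Planet "B" gets pl_rade = NaN because all of its radius values were NA
print(toy_summary)

# na.omit() treats NaN as missing, so only planet "A" survives
toy_complete <- na.omit(toy_summary)
print(toy_complete)
```

This is why the summarized table is smaller than the number of distinct planet names in the raw data: any planet missing even one of the seven columns entirely is removed.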

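Our summary dataset now has 3,834 individual planets that we can compare to our home planet once we have Earth’s values for the same columns. In column order (name, orbital period in days, radius in Earth radii, eccentricity, inclination in degrees, planet-to-star radius ratio, stellar effective temperature in kelvin), they are: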

[Earth, 365.256, 1, 0.0167, 7.155, 0.0091679, 5778]

(Allen and Cox 2000; Simon et al. 1994; arXiv:1510.07674)

# Adding Earth data as the first row. The result is printed but not assigned,
# so summary_exoplanets itself still contains only exoplanets.
summary_exoplanets |> add_row(pl_name="Earth", pl_orbper=365.256, pl_rade=1, pl_orbeccen=0.0167, pl_orbincl=7.155, pl_ratror=0.0091679, st_teff=5778, .before = 1)
## # A tibble: 3,835 × 7
##    pl_name      pl_orbper pl_rade pl_orbeccen pl_orbincl pl_ratror st_teff
##    <chr>            <dbl>   <dbl>       <dbl>      <dbl>     <dbl>   <dbl>
##  1 Earth          365.       1         0.0167       7.16   0.00917   5778 
##  2 55 Cnc e         0.954    1.97      0.0695      86.0    0.0188    5243.
##  3 AU Mic b         8.46     4.15      0.0652      89.2    0.0499    3664.
##  4 AU Mic c        18.9      2.78      0.0709      89.3    0.033     3664 
##  5 BD+20 594 b     41.7      2.39      0           89.5    0.0223    5745 
##  6 BD-14 3065 b     4.29    21.2       0.066       80.8    0.0843    6836.
##  7 CoRoT-1 b        1.51    17.7       0.0267      85.4    0.138     6008 
##  8 CoRoT-10 b      13.2      9.92      0.527       88.6    0.127     5062.
##  9 CoRoT-11 b       2.99    17.1       0.12        83.2    0.105     6408.
## 10 CoRoT-12 b       2.83    16.0       0.06        85.6    0.149     5675 
## # ℹ 3,825 more rows
# Plot radius against orbital period, colored by the host star's temperature
ggplot(summary_exoplanets, aes(x=pl_orbper, y=pl_rade, color=st_teff)) + 
  geom_point(size=1) +
  labs(
    title="Transiting Exoplanets by Orbital Period, Radius, and Stellar Effective Temperature",
    x="Orbital Period in Days",
    y="Radius in Earth Radii",
    color="Stellar Effective Temperature (K)"
  ) +
  theme_dark()

This graph is zoomed out very far, so let’s limit it to planets with a radius within 0.25 Earth radii of Earth’s and regraph.

# Keep planets whose radius is within 0.25 Earth radii of Earth's
earthlike_exoplanets <- summary_exoplanets |>
  filter(pl_rade > 0.75, pl_rade < 1.25)

# Same plot as before, restricted to the near-Earth-radius planets
ggplot(earthlike_exoplanets, aes(x=pl_orbper, y=pl_rade, color=st_teff)) + 
  geom_point(size=1) +
  labs(
    title="Transiting Exoplanets by Orbital Period, Radius, and Stellar Effective Temperature",
    x="Orbital Period in Days",
    y="Radius in Earth Radii",
    color="Stellar Effective Temperature (K)"
  ) +
  theme_dark()
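
Because orbital periods span several orders of magnitude, an alternative to zooming in is to put the period on a log scale and mark Earth as a reference point. The sketch below is one possible approach, not the plot used in this report. It runs on a small made-up tibble (`demo_planets`, hypothetical values) so that it is self-contained, and the Earth values are the ones listed earlier.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for summary_exoplanets (values are made up)
demo_planets <- tibble(
  pl_name = c("P1", "P2", "P3", "P4"),
  pl_orbper = c(0.9, 8.5, 41.7, 290),
  pl_rade = c(1.9, 4.1, 2.4, 1.1),
  st_teff = c(5200, 3700, 5700, 5600)
)

earth_plot <- ggplot(demo_planets, aes(x = pl_orbper, y = pl_rade, color = st_teff)) +
  geom_point(size = 1) +
  # Earth reference point: 365.256-day period, 1 Earth radius
  annotate("point", x = 365.256, y = 1, color = "white", shape = 4, size = 3) +
  annotate("text", x = 365.256, y = 1, label = "Earth", color = "white", vjust = -1) +
  # Log scale keeps short- and long-period planets visible on one axis
  scale_x_log10() +
  labs(
    x = "Orbital Period in Days (log scale)",
    y = "Radius in Earth Radii",
    color = "Stellar Effective Temperature (K)"
  ) +
  theme_dark()

print(earth_plot)
```

Swapping `demo_planets` for `summary_exoplanets` would apply the same idea to the real data, which makes the gap between the bulk of the sample and Earth's position visible without discarding any planets.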

Results

From the graphs above, we can see that very few exoplanets show characteristics similar to Earth’s. Even after filtering the dataset down to the 386 exoplanets with a near-Earth radius, the majority of the candidates orbit far too close to their star (seen as a much shorter orbital period). Overall, there are nearly zero candidates that are similar to Earth in stellar effective temperature, radius, and orbital period at once. If we wanted to move to a planet in another planetary system, we would have to worry about high temperatures, low temperatures, too much or too little gravity, or seasons that last days instead of months.

These results suggest that Earth is extremely special and almost entirely on its own among the planets observed so far, and that it needs to be cherished because there are no alternatives we can simply get up and go to. In relation to the question posed: yes, Earth is a rarity based on what we have seen, and within this sample it is at best a one-in-3,834 occurrence. That fraction will only shrink as more planets are found without a match, and even if we do find a new Earth-like exoplanet, we cannot say that we could get there, because it could be hundreds or thousands of light-years away.

Obstacles: Issues with Analyzing the Data

The main obstacle in researching Earth-like exoplanets was deciding what should count as Earth-like. I first selected planets based on both their radius and orbital period, which yielded zero results. When I narrowed the criteria to radius alone, I had to widen the range to a full quarter of an Earth radius on either side to get enough planets to graph. This is not the fault of the dataset; rather, it is evidence of how rare a planet like Earth is.
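
The selection described above can be sketched as a small helper that keeps planets whose radius and orbital period both fall within a tolerance of Earth's values. The function name `filter_earthlike`, its tolerance arguments, and the toy rows are assumptions for illustration; the analysis in this report used the fixed 0.75 to 1.25 Earth-radius cut.

```r
library(dplyr)

# Keep planets within a tolerance of Earth's radius and orbital period.
# Hypothetical helper; defaults are illustrative, not values from the report.
# radius_tol is in Earth radii; period_tol is a fraction of Earth's period.
filter_earthlike <- function(planets, radius_tol = 0.25, period_tol = 0.25,
                             earth_period = 365.256) {
  planets |>
    filter(
      abs(pl_rade - 1) < radius_tol,
      abs(pl_orbper - earth_period) < period_tol * earth_period
    )
}

# Made-up rows: only "P3" is close to Earth in both radius and period
toy_planets <- tibble(
  pl_name = c("P1", "P2", "P3"),
  pl_orbper = c(3.2, 380, 350),
  pl_rade = c(1.1, 2.6, 0.9)
)

# Radius cut only (period tolerance switched off)
radius_only <- filter_earthlike(toy_planets, period_tol = Inf)
# Radius and period cuts together
both_cuts <- filter_earthlike(toy_planets)

print(radius_only$pl_name)
print(both_cuts$pl_name)
```

Loosening `period_tol` step by step on the real table would show how wide the period window has to be before any planet passes both cuts.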

Future Steps for Exoplanet Research

In the future, we have to play the long game. With only about 6,150 exoplanets identified so far according to NASA’s Exoplanet Science Institute (Christiansen et al. 2025), there are still many millions more to find. There simply isn’t enough data yet to say for sure that Earth is alone; however, as more data arrives we can narrow down the precise conditions under which a planet like ours forms. All in all, the future needs to hold more data, and with more data, scientists can find better candidates, better constraints on what counts as Earth-like, and potentially, another Earth.

Citations

Christiansen, J. L., et al. 2025, Planet. Sci. J., 6, 186.
Simon, J. L., et al. 1994, Astronomy and Astrophysics, 282 (2), 663–683.
Allen, C. W., and Cox, A. N. 2000, Allen’s Astrophysical Quantities, Springer, p. 294.

Acknowledgement

This research has made use of the NASA Exoplanet Archive, which is operated by the California Institute of Technology, under contract with the National Aeronautics and Space Administration under the Exoplanet Exploration Program.