Exoplanets

In this visualization, I will look for relationships between properties of exoplanets and their host stars.

I am using the Open Exoplanet Catalogue from kaggle.com: https://www.kaggle.com/mrisdal/open-exoplanet-catalogue

Primary Variables used:

PlanetIdentifier: name/code of the planet

PlanetaryMassJpt: Mass of the planet (measured in Jupiter masses)

Eccentricity: measure of how much the planet’s orbit deviates from a circle

SurfaceTempK: Surface Temperature of the Planet (in degrees Kelvin)

DiscoveryMethod: Method by which planet was discovered

HostStarMassSlrMass: Mass of the star (measured in solar masses)

HostStarMetallicity: Measure of the abundance of heavy elements present inside the star (expressed as [Fe/H], a base-10 logarithmic scale relative to the Sun)

If [Fe/H] = 0, the star has the same abundance of iron as the Sun. If [Fe/H] = -1, it has 1/10th of the Sun’s value, and if [Fe/H] = +1, it has 10 times the Sun’s value.

HostStarTempK: Temperature of the Star (measured in degrees Kelvin)

SemiMajorAxisAU: Half of the longest diameter of the planet’s elliptical orbit (measured in AU, where 1 AU is the average distance from Earth to the Sun)
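
Because [Fe/H] is a base-10 logarithm, a metallicity value can be turned back into an iron abundance relative to the Sun by raising 10 to that power. Below is a minimal sketch in R. The helper name `feh_to_solar_ratio` and the example values are my own, chosen for illustration, and are not part of the dataset or any package.

```r
# Convert a metallicity [Fe/H] (log10 scale, Sun = 0) into an
# iron abundance relative to the Sun.
# NOTE: feh_to_solar_ratio is a hypothetical helper name used for illustration.
feh_to_solar_ratio <- function(feh) {
  10^feh
}

# Example values: a Sun-like star, a metal-poor star, a metal-rich star
example_feh <- c(0, -1, 0.3)
example_ratio <- feh_to_solar_ratio(example_feh)
print(example_ratio)
```

Here 0 maps to the solar value, -1 to one tenth of it, and 0.3 to roughly twice the Sun’s iron abundance.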

#install.packages("scatterplot3d")
#install.packages("rgl")
#install.packages("contourPlot")
# Load necessary packages
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.2     v dplyr   1.0.6
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
library(readr)
library(ggplot2)
library(dplyr)
library(ggthemes)
library(ggrepel)
library(contourPlot)
## Loading required package: interp
## Loading required package: RColorBrewer
library(rgl)
library(plot3D)
library(knitr)
library(scatterplot3d)
library(gridExtra)
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
# Set the working directory and read the .csv file into R
setwd("C:/Users/tycho/Desktop/DATA110")
exoplanets <- read_csv("oec.csv")
## 
## -- Column specification --------------------------------------------------------
## cols(
##   .default = col_double(),
##   PlanetIdentifier = col_character(),
##   AgeGyr = col_logical(),
##   DiscoveryMethod = col_character(),
##   LastUpdated = col_character(),
##   RightAscension = col_character(),
##   Declination = col_character(),
##   ListsPlanetIsOn = col_character()
## )
## i Use `spec()` for the full column specifications.
## Warning: 2 parsing failures.
##  row    col           expected actual      file
## 1681 AgeGyr 1/0/T/F/TRUE/FALSE 0.0055 'oec.csv'
## 2916 AgeGyr 1/0/T/F/TRUE/FALSE 3      'oec.csv'
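
The two parsing failures above most likely occur because `read_csv()` guesses each column type from the first rows of the file (the first 1,000 by default), and AgeGyr presumably has no values there, so it is guessed as logical. One way to avoid this is to declare the column type explicitly with `col_types`. The sketch below uses a small made-up file rather than the real oec.csv; the file contents and the name `toy_path` are assumptions for illustration only.

```r
library(readr)

# Write a tiny stand-in CSV (made-up rows, not the real catalogue)
toy_path <- tempfile(fileext = ".csv")
writeLines(c(
  "PlanetIdentifier,AgeGyr",
  "Toy b,",
  "Toy c,0.0055",
  "Toy d,3"
), toy_path)

# Declare AgeGyr as double; every other column is still guessed
toy <- read_csv(toy_path, col_types = cols(AgeGyr = col_double()))
print(toy)
```

For the real file, the equivalent call would be `read_csv("oec.csv", col_types = cols(AgeGyr = col_double()))`. Raising the `guess_max` argument is an alternative if several columns are affected.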
head(exoplanets)
## # A tibble: 6 x 25
##   PlanetIdentifier TypeFlag PlanetaryMassJpt RadiusJpt PeriodDays
##   <chr>               <dbl>            <dbl>     <dbl>      <dbl>
## 1 HD 143761 b             0           1.04      NA         39.8  
## 2 HD 143761 c             0           0.079     NA        103.   
## 3 KOI-1843.03             0           0.0014     0.054      0.177
## 4 KOI-1843.01             0          NA          0.114      4.19 
## 5 KOI-1843.02             0          NA          0.071      6.36 
## 6 Kepler-9 b              0           0.25       0.84      19.2  
## # ... with 20 more variables: SemiMajorAxisAU <dbl>, Eccentricity <dbl>,
## #   PeriastronDeg <dbl>, LongitudeDeg <dbl>, AscendingNodeDeg <dbl>,
## #   InclinationDeg <dbl>, SurfaceTempK <dbl>, AgeGyr <lgl>,
## #   DiscoveryMethod <chr>, DiscoveryYear <dbl>, LastUpdated <chr>,
## #   RightAscension <chr>, Declination <chr>, DistFromSunParsec <dbl>,
## #   HostStarMassSlrMass <dbl>, HostStarRadiusSlrRad <dbl>,
## #   HostStarMetallicity <dbl>, HostStarTempK <dbl>, HostStarAgeGyr <dbl>,
## #   ListsPlanetIsOn <chr>
# View & load the data
view(exoplanets)
exoplanets
## # A tibble: 3,584 x 25
##    PlanetIdentifier TypeFlag PlanetaryMassJpt RadiusJpt PeriodDays
##    <chr>               <dbl>            <dbl>     <dbl>      <dbl>
##  1 HD 143761 b             0           1.04      NA         39.8  
##  2 HD 143761 c             0           0.079     NA        103.   
##  3 KOI-1843.03             0           0.0014     0.054      0.177
##  4 KOI-1843.01             0          NA          0.114      4.19 
##  5 KOI-1843.02             0          NA          0.071      6.36 
##  6 Kepler-9 b              0           0.25       0.84      19.2  
##  7 Kepler-9 c              0           0.17       0.82      39.0  
##  8 Kepler-9 d              0           0.022      0.147      1.59 
##  9 GJ 160.2 b              0           0.0321    NA          5.24 
## 10 Kepler-566 b            0          NA          0.192     18.4  
## # ... with 3,574 more rows, and 20 more variables: SemiMajorAxisAU <dbl>,
## #   Eccentricity <dbl>, PeriastronDeg <dbl>, LongitudeDeg <dbl>,
## #   AscendingNodeDeg <dbl>, InclinationDeg <dbl>, SurfaceTempK <dbl>,
## #   AgeGyr <lgl>, DiscoveryMethod <chr>, DiscoveryYear <dbl>,
## #   LastUpdated <chr>, RightAscension <chr>, Declination <chr>,
## #   DistFromSunParsec <dbl>, HostStarMassSlrMass <dbl>,
## #   HostStarRadiusSlrRad <dbl>, HostStarMetallicity <dbl>, HostStarTempK <dbl>,
## #   HostStarAgeGyr <dbl>, ListsPlanetIsOn <chr>
# Examine data frame structure
str(exoplanets)
## spec_tbl_df [3,584 x 25] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ PlanetIdentifier    : chr [1:3584] "HD 143761 b" "HD 143761 c" "KOI-1843.03" "KOI-1843.01" ...
##  $ TypeFlag            : num [1:3584] 0 0 0 0 0 0 0 0 0 0 ...
##  $ PlanetaryMassJpt    : num [1:3584] 1.045 0.079 0.0014 NA NA ...
##  $ RadiusJpt           : num [1:3584] NA NA 0.054 0.114 0.071 0.84 0.82 0.147 NA 0.192 ...
##  $ PeriodDays          : num [1:3584] 39.846 102.54 0.177 4.195 6.356 ...
##  $ SemiMajorAxisAU     : num [1:3584] 0.2196 0.4123 0.0048 0.039 0.052 ...
##  $ Eccentricity        : num [1:3584] 0.037 0.05 NA NA NA 0.0626 0.0684 NA 0.06 NA ...
##  $ PeriastronDeg       : num [1:3584] 271 190 NA NA NA ...
##  $ LongitudeDeg        : num [1:3584] NA NA NA NA NA NA NA NA NA NA ...
##  $ AscendingNodeDeg    : num [1:3584] NA NA NA NA NA NA NA NA NA NA ...
##  $ InclinationDeg      : num [1:3584] NA NA 72 89.4 88.2 ...
##  $ SurfaceTempK        : num [1:3584] NA NA NA NA NA ...
##  $ AgeGyr              : logi [1:3584] NA NA NA NA NA NA ...
##  $ DiscoveryMethod     : chr [1:3584] "RV" "RV" "transit" "transit" ...
##  $ DiscoveryYear       : num [1:3584] 2016 2016 2012 NA NA ...
##  $ LastUpdated         : chr [1:3584] "16/07/11" "16/07/11" "13/07/15" NA ...
##  $ RightAscension      : chr [1:3584] "16 01 03" "16 01 03" "19 00 03.14" "19 00 03.14" ...
##  $ Declination         : chr [1:3584] "+33 18 13" "+33 18 13" "+40 13 14.7" "+40 13 14.7" ...
##  $ DistFromSunParsec   : num [1:3584] 17.2 17.2 NA NA NA ...
##  $ HostStarMassSlrMass : num [1:3584] 0.889 0.889 0.46 0.46 0.46 1.07 1.07 1.07 0.69 0.83 ...
##  $ HostStarRadiusSlrRad: num [1:3584] 1.36 1.36 0.45 0.45 0.45 ...
##  $ HostStarMetallicity : num [1:3584] -0.31 -0.31 0 0 0 0.12 0.12 0.12 NA -0.01 ...
##  $ HostStarTempK       : num [1:3584] 5627 5627 3584 3584 3584 ...
##  $ HostStarAgeGyr      : num [1:3584] NA NA NA NA NA NA NA NA NA NA ...
##  $ ListsPlanetIsOn     : chr [1:3584] "Confirmed planets" "Confirmed planets" "Controversial" "Controversial" ...
##  - attr(*, "problems")= tibble [2 x 5] (S3: tbl_df/tbl/data.frame)
##   ..$ row     : int [1:2] 1681 2916
##   ..$ col     : chr [1:2] "AgeGyr" "AgeGyr"
##   ..$ expected: chr [1:2] "1/0/T/F/TRUE/FALSE" "1/0/T/F/TRUE/FALSE"
##   ..$ actual  : chr [1:2] "0.0055" "3"
##   ..$ file    : chr [1:2] "'oec.csv'" "'oec.csv'"
##  - attr(*, "spec")=
##   .. cols(
##   ..   PlanetIdentifier = col_character(),
##   ..   TypeFlag = col_double(),
##   ..   PlanetaryMassJpt = col_double(),
##   ..   RadiusJpt = col_double(),
##   ..   PeriodDays = col_double(),
##   ..   SemiMajorAxisAU = col_double(),
##   ..   Eccentricity = col_double(),
##   ..   PeriastronDeg = col_double(),
##   ..   LongitudeDeg = col_double(),
##   ..   AscendingNodeDeg = col_double(),
##   ..   InclinationDeg = col_double(),
##   ..   SurfaceTempK = col_double(),
##   ..   AgeGyr = col_logical(),
##   ..   DiscoveryMethod = col_character(),
##   ..   DiscoveryYear = col_double(),
##   ..   LastUpdated = col_character(),
##   ..   RightAscension = col_character(),
##   ..   Declination = col_character(),
##   ..   DistFromSunParsec = col_double(),
##   ..   HostStarMassSlrMass = col_double(),
##   ..   HostStarRadiusSlrRad = col_double(),
##   ..   HostStarMetallicity = col_double(),
##   ..   HostStarTempK = col_double(),
##   ..   HostStarAgeGyr = col_double(),
##   ..   ListsPlanetIsOn = col_character()
##   .. )
# Box plot for Star Temperature
ggp1 <- ggplot(exoplanets) +
  geom_boxplot(aes(x = HostStarTempK)) +
  xlab("Star Temperature (in degrees K)") +
  ggtitle("Star Temperature Distribution")

# Box plot for Exoplanet Temperature
ggp2 <- ggplot(exoplanets) +
  geom_boxplot(aes(x = SurfaceTempK)) +
  xlab("Exoplanet Temperature (in degrees K)") +
  ggtitle("Exoplanet Temperature Distribution")

# Show both box plots side by side
grid.arrange(ggp1, ggp2, ncol = 2)
## Warning: Removed 129 rows containing non-finite values (stat_boxplot).
## Warning: Removed 2843 rows containing non-finite values (stat_boxplot).

# Box plot for Star Mass
ggp3 <- ggplot(exoplanets) +
  geom_boxplot(aes(x = HostStarMassSlrMass)) +
  xlab("Star Mass (in Solar Masses)") +
  ggtitle("Star Mass Distribution")

# Box plot for Exoplanet Mass
ggp4 <- ggplot(exoplanets) +
  geom_boxplot(aes(x = PlanetaryMassJpt)) +
  xlab("Exoplanet Mass (in Jupiter Masses)") +
  ggtitle("Exoplanet Mass Distribution")

# Show both box plots side by side
grid.arrange(ggp3, ggp4, ncol = 2)
## Warning: Removed 168 rows containing non-finite values (stat_boxplot).
## Warning: Removed 2271 rows containing non-finite values (stat_boxplot).

# Scatter plot comparing Star Mass with Exoplanet Mass
exoplanets_scatter <- ggplot(exoplanets, aes(x = PlanetaryMassJpt, y = HostStarMassSlrMass)) +
  ggtitle("Planet Mass vs Star Mass") +
  xlab("Planet Mass (in Jupiter Masses)") +
  ylab("Star Mass (in Solar Masses)") +
  geom_point(aes(color = DiscoveryMethod), size = 0.5, alpha = 0.5)
ggplotly(exoplanets_scatter)

Now without significant outliers (planets of 5 Jupiter masses or more are dropped):

# Keep only planets below 5 Jupiter masses (rows with missing mass are dropped too)
e <- exoplanets %>%
  filter(PlanetaryMassJpt < 5)

e %>%
  ggplot(aes(x = PlanetaryMassJpt, y = HostStarMassSlrMass)) +
  ggtitle("Planet Mass vs Star Mass") +
  xlab("Planet Mass (in Jupiter Masses)") +
  ylab("Star Mass (in Solar Masses)") +
  geom_point(aes(color = DiscoveryMethod), size = 0.5, alpha = 0.5)
## Warning: Removed 56 rows containing missing values (geom_point).

Now only planets with masses below 0.25 Jupiter masses:

# Keep only planets below 0.25 Jupiter masses
e <- exoplanets %>%
  filter(PlanetaryMassJpt < 0.25)

e %>%
  ggplot(aes(x = PlanetaryMassJpt, y = HostStarMassSlrMass)) +
  ggtitle("Planet Mass vs Star Mass") +
  xlab("Planet Mass (in Jupiter Masses)") +
  ylab("Star Mass (in Solar Masses)") +
  geom_point(aes(color = DiscoveryMethod), size = 1, alpha = 0.5) +
  geom_smooth(method = lm, se = FALSE)
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 43 rows containing non-finite values (stat_smooth).
## Warning: Removed 43 rows containing missing values (geom_point).

# Scatter plot comparing Star Metallicity with Planet Mass
# Keep only planets below 10 Jupiter masses
e_metal <- exoplanets %>%
  filter(PlanetaryMassJpt < 10)

e_metal %>%
  ggplot(aes(x = PlanetaryMassJpt, y = HostStarMetallicity)) +
  ggtitle("Planet Mass vs Star Metallicity") +
  xlab("Planet Mass (in Jupiter Masses)") +
  ylab("Star Metallicity (in [Fe/H])") +
  geom_point(aes(color = DiscoveryMethod), size = 1, alpha = 0.5) +
  geom_smooth(method = lm, se = FALSE)
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 217 rows containing non-finite values (stat_smooth).
## Warning: Removed 217 rows containing missing values (geom_point).

And again for planets with masses below 0.25 Jupiter masses:

# Keep only planets below 0.25 Jupiter masses
e_metal <- exoplanets %>%
  filter(PlanetaryMassJpt < 0.25)

e_metal %>%
  ggplot(aes(x = PlanetaryMassJpt, y = HostStarMetallicity)) +
  ggtitle("Planet Mass vs Star Metallicity") +
  xlab("Planet Mass (in Jupiter Masses)") +
  ylab("Star Metallicity (in [Fe/H])") +
  geom_point(aes(color = DiscoveryMethod), size = 1, alpha = 0.5) +
  geom_smooth(method = lm, se = FALSE)
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 126 rows containing non-finite values (stat_smooth).
## Warning: Removed 126 rows containing missing values (geom_point).

#3d Scatter plot comparing planet and star temperature
library(rgl)
exoplanets %>%
  filter(SemiMajorAxisAU<10)
## # A tibble: 1,384 x 25
##    PlanetIdentifier TypeFlag PlanetaryMassJpt RadiusJpt PeriodDays
##    <chr>               <dbl>            <dbl>     <dbl>      <dbl>
##  1 HD 143761 b             0           1.04      NA         39.8  
##  2 HD 143761 c             0           0.079     NA        103.   
##  3 KOI-1843.03             0           0.0014     0.054      0.177
##  4 KOI-1843.01             0          NA          0.114      4.19 
##  5 KOI-1843.02             0          NA          0.071      6.36 
##  6 Kepler-9 b              0           0.25       0.84      19.2  
##  7 Kepler-9 c              0           0.17       0.82      39.0  
##  8 Kepler-9 d              0           0.022      0.147      1.59 
##  9 GJ 160.2 b              0           0.0321    NA          5.24 
## 10 WASP-124 b              0           0.6        1.24       3.37 
## # ... with 1,374 more rows, and 20 more variables: SemiMajorAxisAU <dbl>,
## #   Eccentricity <dbl>, PeriastronDeg <dbl>, LongitudeDeg <dbl>,
## #   AscendingNodeDeg <dbl>, InclinationDeg <dbl>, SurfaceTempK <dbl>,
## #   AgeGyr <lgl>, DiscoveryMethod <chr>, DiscoveryYear <dbl>,
## #   LastUpdated <chr>, RightAscension <chr>, Declination <chr>,
## #   DistFromSunParsec <dbl>, HostStarMassSlrMass <dbl>,
## #   HostStarRadiusSlrRad <dbl>, HostStarMetallicity <dbl>, HostStarTempK <dbl>,
## #   HostStarAgeGyr <dbl>, ListsPlanetIsOn <chr>
mycolors <- c('royalblue1', 'darkcyan', 'oldlace', 'deeppink')
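# DiscoveryMethod is a character column, so as.numeric() would turn every value into NA.
# Convert it to a factor first, then recycle the palette if there are more methods than colors.
method_index <- as.integer(factor(exoplanets$DiscoveryMethod))
exoplanets$color <- mycolors[(method_index - 1) %% length(mycolors) + 1]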
plot3d(
  x=exoplanets$"SurfaceTempK", y=exoplanets$"HostStarTempK",
  col = exoplanets$color,
  type = "s",
  radius = 100,
  xlab="Planet Temperature (in degrees K)",  
  ylab="Star Temperature (in degrees K)"
)
rglwidget()
options(rgl.printRglwidget = TRUE)
#3d Scatter plot comparing Planet Temperature, Star Temperature and Star Metallicity
library(rgl)

mycolors <- c('royalblue1', 'darkcyan', 'oldlace', 'deeppink')
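# DiscoveryMethod is a character column, so as.numeric() would turn every value into NA.
# Convert it to a factor first, then recycle the palette if there are more methods than colors.
method_index <- as.integer(factor(exoplanets$DiscoveryMethod))
exoplanets$color <- mycolors[(method_index - 1) %% length(mycolors) + 1]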
plot3d(
  x=exoplanets$"SurfaceTempK", y=exoplanets$"HostStarTempK", z=exoplanets$"HostStarMetallicity",
  col = exoplanets$color,
  type = "s",
  radius = 100,
  xlab="Planet Temperature (in degrees K)",  
  ylab="Star Temperature (in degrees K)", 
  zlab="Star Metallicity (in Fe/H)"
)
rglwidget()
options(rgl.printRglwidget = TRUE)

For this data, I first created box plots of the temperature and mass of the exoplanets and their host stars, so I could compare the two and see the general distribution and outliers of each. Judging from these plots, stars and exoplanets appear more closely related in temperature than in mass.

Then I created a scatter plot comparing star mass and planet mass. The trend becomes easier to see when outliers are omitted: planet mass is positively correlated with star mass for planets lighter than 0.25 Jupiter masses. However, the current data is biased in this regard, since it is much easier to detect high-mass planets orbiting low-mass stars. In reality, we would expect far more data points in the lower left section of the graph. I then attempted to correlate planet mass with star metallicity, on the assumption that more metal-rich stars should produce more massive planets. Again, the positive trend was difficult to establish unless outliers were removed.

Next I made two 3D scatter plots, one for star temperature and planet temperature and another that adds star metallicity as a third axis. One overarching issue persisted throughout these visualizations: the bias present in the data. Many of the planets discovered so far are higher-mass planets orbiting small, cooler stars, since they are much easier to detect. If the data contained points for all the stars and planets in the galaxy, it is likely there would be a far greater abundance of low-mass planets. Thus I cannot say for certain that any of the conclusions I drew from these visualizations would hold up.
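
The trends described above were read off the plots by eye. As a rough sketch of how they could be checked numerically, the correlation coefficient can be computed with base R's `cor()`, skipping rows with missing values. The data frame `toy_planets` below is made up purely to keep the example self-contained, and its values are not taken from the catalogue. With the real data, `exoplanets` would take its place.

```r
library(dplyr)

# Made-up stand-in for the catalogue (values are illustrative only)
toy_planets <- data.frame(
  PlanetaryMassJpt    = c(0.01, 0.05, 0.10, 0.20, NA, 3.00),
  HostStarMassSlrMass = c(0.40, 0.60, 0.80, 1.00, 1.10, 0.90)
)

# Keep the low-mass planets, using the same 0.25 Jupiter mass cutoff as above
low_mass <- toy_planets %>%
  filter(PlanetaryMassJpt < 0.25)

# Pearson correlation between planet mass and star mass,
# ignoring any rows where either value is missing
r <- cor(low_mass$PlanetaryMassJpt, low_mass$HostStarMassSlrMass,
         use = "complete.obs")
print(r)
```

A value near +1 or -1 would indicate a strong linear relationship and a value near 0 a weak one. `cor.test()` from base R could be used in the same way to attach a significance test, though the detection bias discussed above would still apply to any number it returns.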