2023-05-05

NASA Exoplanet Archive

5,300 observations as of March 14th and growing…

Research Questions

  • Is there a difference in the type or quantity of exoplanets found with different discovery methods?
  • Is there any relationship between the planet orbital period (in days) and the planet’s radius, mass, or density?

Variables

313 Variables Narrowed Down to 10

Variable          Definition                               Type       Description
planet_name       Planet Name                              character  Planet name most commonly used in the literature
sy_snum           Number of Stars                          numeric    Number of stars in the planetary system
sy_pnum           Number of Planets                        numeric    Number of planets in the planetary system
discovery_method  Discovery Method                         factor     Method by which the planet was first identified
orbital_period    Orbital Period [days]                    numeric    Time the planet takes to make a complete orbit around the host star or system
planet_radius     Planet Radius [Earth Radius]             numeric    Length of a line segment from the center of the planet to its surface, measured in units of radius of the Earth
planet_mass       Planet Mass or Mass*sin(i) [Earth Mass]  numeric    Amount of matter contained in the planet, measured in units of masses of the Earth
planet_density    Planet Density [g/cm**3]                 numeric    Amount of mass per unit of volume of the planet
n_stars           Number of Stars                          ordered    Number of stars in the planetary system
n_planets         Number of Planets                        ordered    Number of planets in the planetary system

Summary Statistics

variable        vars      n       mean            sd   min             max           range         se
orbital_period     1  5,085  82,936.26  5,639,480.84  0.09  402,000,000.00  401,999,999.91  79,084.91
planet_radius      2  5,283       5.66          5.33  0.30           77.34           77.05       0.07
planet_mass        3  5,277     457.20      3,740.90  0.02      239,000.00      238,999.98      51.50
planet_density     4  5,185       4.26         22.07  0.03        1,290.00        1,289.97       0.31
sy_snum            5  5,300       1.10          0.34  1.00            4.00            3.00       0.00
n_stars*           6  5,300       1.10          0.34  1.00            4.00            3.00       0.00
sy_pnum            7  5,300       1.76          1.16  1.00            8.00            7.00       0.02
n_planets*         8  5,300       1.76          1.16  1.00            8.00            7.00       0.02
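
The se column is the standard error of the mean, sd / sqrt(n). As a quick consistency check, the sketch below recomputes it from the sd and n values in the table; the dictionary and function names are illustrative and not part of the original analysis.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)


# (sd, n) pairs copied from the summary statistics table
reported = {
    "orbital_period": (5_639_480.84, 5085),
    "planet_radius": (5.33, 5283),
    "planet_mass": (3_740.90, 5277),
    "planet_density": (22.07, 5185),
}

se_by_variable = {name: standard_error(sd, n) for name, (sd, n) in reported.items()}

for name, se in se_by_variable.items():
    print(f"{name}: se = {se:,.2f}")
```

The recomputed values should agree with the se column up to rounding of the inputs.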

Summary Statistics by discovery_method

planet_radius [Earth Radius]
discovery_method      n   mean    sd    min    max  range    se
Imaging              61  15.77  9.37  10.47  77.34  66.87  1.20
Microlensing        176   9.63  4.70   1.00  14.30  13.30  0.35
Radial Velocity   1,021  10.02  4.66   0.91  15.58  14.67  0.15
Transit           3,970   4.17  4.47   0.30  23.54  23.24  0.07

planet_mass [Earth Mass]
discovery_method      n     mean      sd     min      max    range        se
Imaging              61  7,821.9  30,161  635.66  239,000  238,364  3,861.77
Microlensing        176    666.0   1,143    0.96    5,721    5,720     86.16
Radial Velocity   1,029  1,038.7   1,693    0.70   17,668   17,667     52.79
Transit           3,952    172.8   1,732    0.04   45,700   45,700     27.55

Summary Statistics by discovery_method

planet_density [g/cm**3]
discovery_method      n  mean     sd   min      max     range    se
Imaging              61  9.54   7.57  0.18     33.5     33.31  0.97
Microlensing        176  2.58   3.21  0.25     17.3     17.05  0.24
Radial Velocity   1,019  3.64   5.08  0.23     62.2     61.97  0.16
Transit           3,873  4.29  24.65  0.03  1,290.0  1,289.97  0.40

orbital_period [days]
discovery_method      n          mean             sd       min          max        range             se
Microlensing          9       3,004.2       1,407.99  1,220.00        5,480        4,260         469.33
Imaging              17  24,681,683.4  97,265,430.25  2,090.00  402,000,000  401,997,910  23,590,331.92
Radial Velocity   1,029       1,882.5       5,565.09      0.74       77,114       77,113         173.49
Transit           3,970          25.4          89.97      0.18        3,650        3,650           1.43

Distributions

ANOVA / Tukey HSD

log10(planet_radius) ~ discovery_method

[Figures: planet_radius by discovery_method]

ANOVA / Tukey HSD

log10(planet_mass) ~ discovery_method

[Figures: planet_mass by discovery_method]

ANOVA / Tukey HSD

log10(planet_density) ~ discovery_method

[Figures: planet_density by discovery_method]

ANOVA / Tukey HSD

log10(orbital_period) ~ discovery_method

[Figures: orbital_period by discovery_method]
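
Each of these models compares group means of a log10-transformed variable across discovery methods. A minimal sketch of the one-way ANOVA F statistic is given below, on made-up radii rather than the archive data; it covers only the ANOVA step, not the Tukey HSD pairwise comparisons, and `one_way_anova_f` and `toy_radii` are hypothetical names.

```python
import math


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between-group sum of squares: group size times squared mean offset
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: squared deviations from each group's mean
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical radii (Earth radii) per discovery method, NOT archive data
toy_radii = {
    "Imaging": [11.0, 13.5, 16.0],
    "Radial Velocity": [2.0, 9.0, 12.5],
    "Transit": [1.2, 2.4, 3.1],
}

# Same transformation as the models above: log10 before comparing means
log_groups = [[math.log10(r) for r in radii] for radii in toy_radii.values()]
f_stat, df1, df2 = one_way_anova_f(log_groups)
print(f"F({df1},{df2}) = {f_stat:.2f}")
```

A large F relative to the F(df1, df2) distribution indicates that at least one group mean differs; the pairwise Tukey HSD step then identifies which methods differ from each other.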

Bivariate Linear Regression Models

log10(planet_mass) as a function of log10(planet_radius)

Observations        5210 (28 missing obs. deleted)
Dependent variable  log10(planet_mass)
Type                OLS linear regression

F(1,5208)  31743.932
R²         0.859
Adj. R²    0.859

                        Est.   S.E.   t val.      p
(Intercept)           -0.010  0.009   -1.115  0.265
log10(planet_radius)   2.376  0.013  178.168  0.000

Standard errors: OLS

Confidence Intervals

2.5 % 97.5 %
(Intercept) -0.03 0.01
log10(planet_radius) 2.35 2.40






On the log10 scale, planet radius explains about 86% of the variance in planet mass (R² = 0.859).
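
Because both variables are log10-transformed, the fitted line corresponds to a power law: mass ≈ 10^(-0.010) × radius^2.376. The sketch below back-transforms the reported coefficients to predict mass from radius. The function name is illustrative, and the back-transformed value is best read as a typical (geometric-mean) mass rather than an arithmetic mean.

```python
INTERCEPT = -0.010  # reported intercept of the log-log model
SLOPE = 2.376       # reported coefficient of log10(planet_radius)


def predicted_mass(radius_earth: float) -> float:
    """Back-transform the log-log fit: mass = 10**intercept * radius**slope."""
    return 10 ** INTERCEPT * radius_earth ** SLOPE


for radius in (1.0, 2.0, 10.0):
    mass = predicted_mass(radius)
    print(f"radius {radius:>4} Earth radii -> about {mass:.1f} Earth masses")
```

Under this fit, doubling the radius multiplies the predicted mass by 2^2.376, roughly a factor of five.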

Same Model Without Log Transformation

planet_mass as a function of planet_radius without log transformation

Observations        5210 (28 missing obs. deleted)
Dependent variable  planet_mass
Type                OLS linear regression

F(1,5208)  580.684
R²         0.100
Adj. R²    0.100

                   Est.    S.E.   t val.      p
(Intercept)    -807.773  71.795  -11.251  0.000
planet_radius   223.992   9.295   24.097  0.000

Standard errors: OLS

Confidence Intervals

2.5 % 97.5 %
(Intercept) -948.5 -667.0
planet_radius 205.8 242.2






Without the log transformation, adjusted R² drops to about 0.10: planet radius then explains only about 10% of the variance in planet mass.
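
One way to see why the transformation matters: if mass follows a power law in radius, the relationship is a straight line only on the log-log scale. The sketch below fits a simple OLS line to a made-up, noiseless power law on both scales and compares R². The data and the exponent 2.4 are assumptions for illustration, not archive values, and the gap in the real data is much larger because of skew and extreme values.

```python
import math


def r_squared(x, y):
    """R² of a simple OLS fit of y on x (with intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    # For one predictor, R² equals the squared Pearson correlation
    return sxy ** 2 / (sxx * syy)


# Hypothetical noiseless power law, NOT archive data: mass = radius ** 2.4
radius = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
mass = [r ** 2.4 for r in radius]

r2_raw = r_squared(radius, mass)
r2_log = r_squared([math.log10(r) for r in radius], [math.log10(m) for m in mass])
print(f"raw scale R² = {r2_raw:.3f}, log-log R² = {r2_log:.3f}")
```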

Bivariate Linear Regression Models

log10(orbital_period) as a function of log10(planet_mass)

Observations        5007 (231 missing obs. deleted)
Dependent variable  log10(orbital_period)
Type                OLS linear regression

F(1,5005)  1691.568
R²         0.253
Adj. R²    0.252

                     Est.   S.E.  t val.      p
(Intercept)         0.692  0.018  38.761  0.000
log10(planet_mass)  0.451  0.011  41.129  0.000

Standard errors: OLS

Confidence Intervals

2.5 % 97.5 %
(Intercept) 0.66 0.73
log10(planet_mass) 0.43 0.47
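
These intervals can be reproduced approximately as estimate ± 1.96 × S.E. The sketch below assumes a normal approximation to the t quantile, which is reasonable with about 5,000 residual degrees of freedom; the names are illustrative.

```python
Z_975 = 1.96  # normal approximation to the 97.5% t quantile (assumption)


def wald_interval(estimate: float, std_error: float):
    """Approximate 95% confidence interval: estimate ± 1.96 * SE."""
    half_width = Z_975 * std_error
    return estimate - half_width, estimate + half_width


# (Est., S.E.) copied from the log10(orbital_period) ~ log10(planet_mass) model
coefficients = {
    "(Intercept)": (0.692, 0.018),
    "log10(planet_mass)": (0.451, 0.011),
}

intervals = {name: wald_interval(est, se) for name, (est, se) in coefficients.items()}

for name, (low, high) in intervals.items():
    print(f"{name}: [{low:.2f}, {high:.2f}]")
```

Neither interval contains zero, which is consistent with the small p-values reported for both coefficients.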






On the log10 scale, planet mass explains about 25% of the variance in orbital period (R² = 0.253).

Bivariate Linear Regression Models

log10(orbital_period) as a function of log10(planet_radius)

Observations        5017 (221 missing obs. deleted)
Dependent variable  log10(orbital_period)
Type                OLS linear regression

F(1,5015)  1156.542
R²         0.187
Adj. R²    0.187

                       Est.   S.E.  t val.      p
(Intercept)           0.729  0.019  37.446  0.000
log10(planet_radius)  0.980  0.029  34.008  0.000

Standard errors: OLS

Confidence Intervals

2.5 % 97.5 %
(Intercept) 0.69 0.77
log10(planet_radius) 0.92 1.04






On the log10 scale, planet radius explains about 19% of the variance in orbital period (R² = 0.187).

Bivariate Linear Regression Models

log10(orbital_period) as a function of log10(planet_density)

Observations        4918 (320 missing obs. deleted)
Dependent variable  log10(orbital_period)
Type                OLS linear regression

F(1,4916)  3.823
R²         0.001
Adj. R²    0.001

                         Est.   S.E.  t val.      p
(Intercept)             1.289  0.017  76.286  0.000
log10(planet_density)  -0.059  0.030  -1.955  0.051

Standard errors: OLS

Confidence Intervals

2.5 % 97.5 %
(Intercept) 1.26 1.32
log10(planet_density) -0.12 0.00






Planet density alone explains almost none of the variance in orbital period (R² = 0.001), and its coefficient is not significant at the 0.05 level (p = 0.051).

Check for Multicollinearity

Correlation plot of log10 transformed numeric variables

log10(orbital_period) as a function of number of stars in the system, number of planets and log10() transformed numeric variables

Multivariable Regression Model

log10(orbital_period) as a function of log10(planet_mass) + log10(planet_density) + sy_pnum


Observations        4918 (320 missing obs. deleted)
Dependent variable  log10(orbital_period)
Type                OLS linear regression

F(3,4914)  653.215
R²         0.285
Adj. R²    0.285

                        Est.   S.E.  t val.      p    VIF
(Intercept)            0.368  0.029  12.545  0.000     NA
log10(planet_mass)     0.507  0.011  44.162  0.000  1.116
log10(planet_density)  0.226  0.026   8.575  0.000  1.082
sy_pnum                0.099  0.010  10.208  0.000  1.051

Standard errors: OLS

Confidence Intervals

2.5 % 97.5 %
(Intercept) 0.31 0.43
log10(planet_mass) 0.48 0.53
log10(planet_density) 0.17 0.28
sy_pnum 0.08 0.12
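
The VIF column reports the variance inflation factor, VIF_j = 1 / (1 - R_j²), where R_j² is the R² from regressing predictor j on the other predictors. The sketch below inverts that formula for the reported values; the function name is illustrative.

```python
def r_squared_from_vif(vif: float) -> float:
    """Invert VIF = 1 / (1 - R_j^2) to recover R_j^2 for one predictor."""
    return 1.0 - 1.0 / vif


# VIF values copied from the multivariable model table
vifs = {
    "log10(planet_mass)": 1.116,
    "log10(planet_density)": 1.082,
    "sy_pnum": 1.051,
}

shared_variance = {name: r_squared_from_vif(v) for name, v in vifs.items()}

for name, r2 in shared_variance.items():
    print(f"{name}: R_j² = {r2:.3f}")
```

Each predictor shares only a small fraction of its variance with the other two, so VIFs this close to 1 suggest that multicollinearity is not a concern in this model.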







Model Residuals

Diagnostic Plots

Conclusions

  • There is a statistically significant difference in the group means for planet radius, mass, density and orbital period for different discovery methods.
  • There is a statistically significant association between a planet’s orbital period and its mass or radius.
  • About 25% of the variance in planet orbital_period can be explained by the planet_mass alone.
  • About 19% of the variance in planet orbital_period can be explained by the planet_radius alone.
  • About 28.5% of the variance in planet orbital_period can be explained by a multivariable model that uses planet_mass, planet_density, and sy_pnum (number of planets in the system) as predictors. This is only a modest improvement over planet_mass alone (adjusted R² of 0.285 versus 0.252).
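
The variance-explained figures above can be cross-checked against the reported F statistics, since for an OLS model with an intercept R² = F·df1 / (F·df1 + df2). The sketch below applies that identity, plus the usual adjusted R² formula, to the F values and degrees of freedom from the regression tables; the function and dictionary names are illustrative.

```python
def r_squared_from_f(f_stat: float, df_model: int, df_resid: int) -> float:
    """R² = F*df_model / (F*df_model + df_resid) for an OLS fit with intercept."""
    return f_stat * df_model / (f_stat * df_model + df_resid)


def adjusted_r_squared(r2: float, df_model: int, df_resid: int) -> float:
    """Adjusted R² = 1 - (1 - R²) * (n - 1) / df_resid, where n - 1 = df_model + df_resid."""
    return 1.0 - (1.0 - r2) * (df_model + df_resid) / df_resid


# (F, model df, residual df) copied from the regression tables
models = {
    "period ~ mass": (1691.568, 1, 5005),
    "period ~ radius": (1156.542, 1, 5015),
    "period ~ mass + density + sy_pnum": (653.215, 3, 4914),
}

results = {}
for name, (f_stat, df1, df2) in models.items():
    r2 = r_squared_from_f(f_stat, df1, df2)
    results[name] = (r2, adjusted_r_squared(r2, df1, df2))
    print(f"{name}: R² = {results[name][0]:.3f}, adj. R² = {results[name][1]:.3f}")
```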