library(tsibble)
Rows: 5250 Columns: 13
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): name, planet_type, mass_wrt, radius_wrt, detection_method
dbl (8): distance, stellar_magnitude, discovery_year, mass_multiplier, radiu...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
The dataset used in this project is a processed version of data from the NASA Exoplanet Archive. While the original data was collected through various astronomical observations, this version was reorganized by a third-party contributor on Kaggle to improve clarity and usability for analysis.
Each row in the dataset represents a confirmed exoplanet, while each column contains specific information about that planet's physical and orbital characteristics. The dataset contains 5,250 entries and 13 variables, including distance from Earth, mass, radius, orbital radius, orbital period, eccentricity, and detection method. It covers multiple planet types, such as gas giants, super-Earths, Neptune-like planets, terrestrial planets, and unknown classifications, allowing for comparisons across categories. It also includes planets discovered across a range of years, enabling analysis of how detection methods and discoveries have evolved over time. The cleaned and structured format makes it easier to identify patterns and relationships between variables.
Variable Description
name: The name or designation of the exoplanet
distance: The distance from Earth to the star system where the planet is located
stellar_magnitude: The brightness of the host star as seen from Earth; lower values indicate brighter stars
planet_type: The classification of the planet (e.g., Gas Giant, Terrestrial)
discovery_year: The year the planet was discovered
mass_multiplier: The mass of the planet relative to a reference planet
mass_wrt: The reference unit used for mass comparison
radius_multiplier: The radius of the planet relative to a reference planet
radius_wrt: The reference unit used for radius comparison
orbital_radius: The average distance between the planet and its host star
orbital_period: The time it takes for the planet to complete one orbit around its star
eccentricity: A measure of how elliptical the planet's orbit is, where 0 is a perfect circle and values closer to 1 indicate more elongated orbits
detection_method: The technique used to discover the planet (e.g., Radial Velocity, Direct Imaging, Eclipse Timing Variations)
Audience
This analysis is intended for space researchers and individuals interested in astronomy and exoplanet systems. The purpose of this project is to explore similarities and differences between exoplanet systems using observational data.
The audience may include data science teams at space research organizations or educational astronomy platforms. These users are assumed to have a basic understanding of astronomy and data analysis, but are primarily interested in identifying patterns in exoplanet systems rather than detailed astrophysical theory.
Project Information
The International Astronomical Union developed a standardized system for naming exoplanets in order to keep track of the rapidly growing number of discoveries. As detection methods continue to improve, the number of known planets has increased significantly, making a consistent naming convention essential.
Under this system, planets are named after their host star, followed by a lowercase letter that indicates the order in which the planet was discovered. For example, the first planet discovered in a system is labeled b, the second c, and so on. This naming structure allows astronomers to easily identify planets that belong to the same system.
The image above shows the exoplanet HIP 65426 b, captured by the James Webb Space Telescope. It serves as a clear example of how exoplanets are named in relation to their host star. In this case, the star is called HIP 65426, and the planet orbiting it is labeled HIP 65426 b.
Exploration Question: How do planets within the same system compare in orbital and physical properties?
Project Goals
The goal of this project is to analyze how planets that are closer to their host star compare in their physical and orbital properties, and to determine whether any observed differences can be explained by planetary formation processes.
Current Limitation
A key limitation of this dataset is the uncertainty regarding completeness within individual planetary systems. It is not guaranteed that all planets orbiting a given host star are included, which may introduce observational bias and affect comparisons between planets in the same system. Additionally, detection methods are more likely to identify larger or closer planets, meaning smaller or more distant planets may be underrepresented.
Exploratory Data Analysis
This code creates a new column called system_id by using str_replace() to remove the final lowercase letter (such as b or c) from each planet’s name. This groups planets from the same star system under a single ID, allowing them to be analyzed together.
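The step described above could look roughly like the following. This is a minimal sketch, not the report's actual code: the toy `planets` table and the regular expression `" [a-z]$"` are assumptions about how the names are formatted (a space followed by a single trailing lowercase letter).

```r
library(dplyr)
library(stringr)

# Toy stand-in for the exoplanet table; only the `name` column matters here.
planets <- tibble(
  name = c("TRAPPIST-1 b", "TRAPPIST-1 c", "HD 10180 c", "HIP 65426 b")
)

# Drop the trailing " b", " c", ... designation to recover the host star name.
# The pattern is an assumption: it expects a space plus one lowercase letter
# at the very end of the name.
planets <- planets |>
  mutate(system_id = str_replace(name, " [a-z]$", ""))

planets$system_id
```

Names that do not end in a single lowercase letter are left unchanged by this pattern, so it may be worth checking a sample of the resulting IDs before grouping on them.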
This code groups the data by system_id and counts the number of planets in each system. It then sorts the systems in descending order to identify those with the most planets. To analyze how planets within the same system compare in orbital and physical properties, we focus on the four systems with the highest number of exoplanets.
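One way to express the counting and filtering described above is sketched below. The rows are placeholders rather than values from the dataset, `system_counts` and `top_ids` are hypothetical helper names, and the toy example keeps two systems where the report keeps four.

```r
library(dplyr)

# Placeholder rows: one row per planet, already tagged with a system_id.
planets <- tibble(
  name      = c("A b", "A c", "A d", "B b", "B c", "C b"),
  system_id = c("A", "A", "A", "B", "B", "C")
)

# Count planets per system and sort from most to fewest.
system_counts <- planets |>
  group_by(system_id) |>
  summarise(num_planets = n()) |>
  arrange(desc(num_planets))

# Keep the IDs of the largest systems (n = 4 in the report, 2 for this toy data).
top_ids <- system_counts |>
  slice_head(n = 2) |>
  pull(system_id)

# Retain only the planets that belong to those systems.
top_systems <- planets |>
  filter(system_id %in% top_ids)
```

Note that `slice_head()` cuts ties arbitrarily; if several systems share the same planet count at the cutoff, the selection depends on row order.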
top_systems |>
  group_by(system_id) |>
  summarise(num_planets = n()) |>
  ggplot(aes(x = reorder(system_id, num_planets), y = num_planets)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(title = "Number of Planets per System", x = "System", y = "Planet Count")
The bar graph shows the top four systems and the number of exoplanets within each system. This highlights differences in planetary abundance across systems.
Exploring discovery year and detection method, while not physical or orbital properties, provides useful context for understanding how exoplanets are identified across different systems and may help explain patterns observed in the data.
# A tibble: 4 × 4
system_id min_year max_year span_years
<chr> <dbl> <dbl> <dbl>
1 HD 191939 2020 2022 3
2 TRAPPIST-1 2016 2017 2
3 HD 10180 2010 2010 1
4 HD 219134 2015 2015 1
The data shows that most exoplanets within each system were discovered within a short time span of one another. For example, systems such as HD 191939 and TRAPPIST-1 have discovery spans of only a few years. This suggests that once an initial exoplanet is detected within a system, it often leads to further discoveries in that same system over a relatively short period.
This pattern may be due to increased observational focus after the first discovery, as astronomers continue to study the same star using similar detection methods. It could also indicate that certain systems are more suitable for detection techniques, making it easier to identify multiple planets once one has already been found.
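The discovery-span table could be produced along these lines. This is a sketch on placeholder rows; the `+ 1` is inferred from the table itself, where a system discovered entirely in one year has a span of 1 and 2016 to 2017 has a span of 2, so span_years appears to count calendar years inclusively.

```r
library(dplyr)

# Placeholder rows, not values from the dataset.
top_systems <- tibble(
  system_id      = c("A", "A", "A", "B", "B"),
  discovery_year = c(2016, 2017, 2017, 2010, 2010)
)

discovery_span <- top_systems |>
  group_by(system_id) |>
  summarise(
    min_year = min(discovery_year),
    max_year = max(discovery_year),
    # Inclusive count of calendar years between first and last discovery.
    span_years = max_year - min_year + 1
  ) |>
  arrange(desc(span_years))
```

Because the count is inclusive, a span of 1 means every planet in the system shares a single discovery year.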
# A tibble: 5 × 3
# Groups: system_id [4]
system_id detection_method count
<chr> <chr> <int>
1 HD 10180 Radial Velocity 6
2 HD 191939 Transit 4
3 HD 191939 Radial Velocity 2
4 HD 219134 Radial Velocity 6
5 TRAPPIST-1 Transit 7
The method exploration investigates the different detection techniques used to discover exoplanets within each system. The results show that most systems rely on a small number of detection methods, with some systems using only one method and others using two. This suggests that once a detection method is effective for a system, it is often used repeatedly to identify additional planets.
The second table compares detection methods within each system and shows how many planets were discovered using each method. For example, some systems have most of their planets detected using Radial Velocity or Transit methods. This highlights that certain detection techniques are more dominant within specific systems.
However, this also suggests a limitation in the data, as some systems may not have been explored using multiple detection methods. As a result, there may be undiscovered planets that could potentially be identified if alternative detection techniques were applied.
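A per-system method count like the one tabulated above could be computed as follows. This is a sketch on placeholder rows; `method_counts` is a hypothetical name, and `.groups = "drop"` is a choice made here so that the result comes back ungrouped.

```r
library(dplyr)

# Placeholder rows, not values from the dataset.
top_systems <- tibble(
  system_id        = c("A", "A", "A", "B", "B"),
  detection_method = c("Transit", "Transit", "Radial Velocity",
                       "Radial Velocity", "Radial Velocity")
)

# Number of planets found by each method within each system.
method_counts <- top_systems |>
  group_by(system_id, detection_method) |>
  summarise(count = n(), .groups = "drop") |>
  arrange(system_id, desc(count))
```

Counting rows per `system_id` in `method_counts` then gives the number of distinct methods used in each system.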
ggplot(top_systems, aes(x = system_id, fill = planet_type)) +
  geom_bar(position = "fill") +
  scale_fill_manual(values = c(
    "Gas Giant" = "#A6CEE3",
    "Terrestrial" = "#FDBF6F",
    "Neptune-like" = "#B2DF8A",
    "Super Earth" = "#CAB2D6",
    "Unknown" = "#D9D9D9"
  )) +
  labs(title = "Planet Type Distribution by System", x = "System", y = "Proportion")
This plot shows the distribution of planet types across the four systems. The results suggest that similar planet types may be found within the same system, indicating that planetary composition could be influenced by the conditions present during system formation.
For example, HD 10180 contains a mix of Neptune-like planets and gas giants, which are larger exoplanets. In contrast, TRAPPIST-1 is made up of smaller planets such as terrestrial planets and super-Earths, with no gas giants present. This difference highlights how systems can vary significantly in their planetary structure and composition.
ggplot(top_systems, aes(x = system_id, y = eccentricity)) +
  geom_boxplot(fill = "skyblue") +
  labs(title = "Orbital Eccentricity by System", x = "System", y = "Eccentricity")
The box plot shows the distribution of eccentricity for each system. Overall, most systems display relatively similar eccentricity levels, with only a few outliers. Interestingly, HD 191939 and TRAPPIST-1 show consistently low eccentricity values across their planets, suggesting more uniform and stable orbital patterns within these systems.
This may indicate that eccentricity is influenced by the overall properties and formation conditions of a system. For example, gravitational interactions between planets can alter their orbits over time and change their eccentricity. Additionally, the protoplanetary disk, a rotating disk of gas and dust surrounding a young star where planets form, can influence the structure of a system and how circular or eccentric planetary orbits become.
Analysis and Support
Within-System Variation
This code converts planet mass and radius values into a common unit (Earth equivalents) to allow for direct comparison between exoplanets. Since some values are given relative to Jupiter and others relative to Earth, standardising them ensures consistency across the dataset.
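A conversion of this kind might be sketched as below. The labels "Jupiter" and "Earth" in the `*_wrt` columns are assumptions about how the reference units are encoded, and the conversion factors are approximate (Jupiter is about 317.8 Earth masses and about 11.2 Earth radii); the rows are placeholders.

```r
library(dplyr)

# Approximate size of Jupiter expressed in Earth units.
JUPITER_MASS_IN_EARTHS   <- 317.8
JUPITER_RADIUS_IN_EARTHS <- 11.2

# Placeholder rows, not values from the dataset.
planets <- tibble(
  mass_multiplier   = c(2, 1.5),
  mass_wrt          = c("Jupiter", "Earth"),
  radius_multiplier = c(1, 1.2),
  radius_wrt        = c("Jupiter", "Earth")
)

planets <- planets |>
  mutate(
    # Scale Jupiter-relative values; pass Earth-relative values through.
    mass_est_earth = case_when(
      mass_wrt == "Jupiter" ~ mass_multiplier * JUPITER_MASS_IN_EARTHS,
      mass_wrt == "Earth"   ~ mass_multiplier,
      TRUE                  ~ NA_real_  # unknown or missing reference unit
    ),
    radius_est_earth = case_when(
      radius_wrt == "Jupiter" ~ radius_multiplier * JUPITER_RADIUS_IN_EARTHS,
      radius_wrt == "Earth"   ~ radius_multiplier,
      TRUE                    ~ NA_real_
    )
  )
```

Returning `NA` for unrecognized reference units keeps unexpected labels visible instead of silently treating them as Earth units.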
This code compares the distribution of planet masses within each system by calculating the mean, standard deviation, minimum, maximum, and overall range of mass values.
The results indicate that there are clear differences in mass variation between systems. For example, TRAPPIST-1 has a very small mass range and low standard deviation, suggesting its planets are relatively similar in mass. In contrast, systems such as HD 191939 show a much larger range and higher standard deviation, indicating a wide variation in planet masses within the system.
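The per-system summary described above could be written as follows. This is a sketch on placeholder masses; the output column names are hypothetical, and the same pattern would apply to radius_est_earth.

```r
library(dplyr)

# Placeholder masses in Earth units, not values from the dataset.
top_systems <- tibble(
  system_id      = c("A", "A", "A", "B", "B", "B"),
  mass_est_earth = c(1, 2, 3, 10, 20, 60)
)

mass_summary <- top_systems |>
  group_by(system_id) |>
  summarise(
    mean_mass  = mean(mass_est_earth, na.rm = TRUE),
    sd_mass    = sd(mass_est_earth, na.rm = TRUE),
    min_mass   = min(mass_est_earth, na.rm = TRUE),
    max_mass   = max(mass_est_earth, na.rm = TRUE),
    # Spread between the lightest and heaviest planet in the system.
    range_mass = max_mass - min_mass
  )
```

With only six or seven planets per system, the standard deviation and range are sensitive to a single large planet, so they are best read alongside the box plots rather than on their own.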
Exoplanet Systems Radius
This code compares the distribution of planet radius within each system by calculating the mean, standard deviation, minimum, maximum, and overall range of radius values.
The results show differences in radius variation between systems. TRAPPIST-1 has a very low mean radius and a very small standard deviation and range, indicating that its planets are consistently small and very similar in size. In contrast, systems such as HD 219134 and HD 191939 show much larger ranges and higher standard deviations, suggesting a wider variation in planet sizes within those systems.
These differences in planetary radius may be influenced by variations in planet type and mass within each system. For example, gas giants tend to have much larger radii compared to terrestrial planets, and higher-mass planets are generally associated with larger sizes. As a result, systems containing more gas giants are likely to show greater variation in radius, while systems dominated by smaller, rocky planets tend to be more uniform.
top_systems |>
  ggplot(aes(x = system_id, y = mass_est_earth)) +
  geom_boxplot() +
  labs(title = "Mass Variation Within Each Planetary System")

top_systems |>
  ggplot(aes(x = system_id, y = radius_est_earth)) +
  geom_boxplot() +
  labs(title = "Radius Variation Within Each Planetary System")

ggplot(top_systems, aes(x = mass_est_earth, y = radius_est_earth, color = system_id)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Relationship Between Mass and Radius by System", x = "Mass (Earth units)", y = "Radius (Earth units)")
The data show a positive correlation between mass and radius across the four systems examined; as a planet’s mass increases, its radius generally increases as well. However, the strength of this relationship varies between systems, as shown by differences in the slope of the trend lines. This suggests that while the mass-radius relationship is a general pattern, the specific composition and density of planets may be influenced by the conditions within their individual planetary systems.
This analysis supports the research question by showing that there is a consistent positive relationship between mass and radius across different systems. This suggests that, regardless of the system, more massive planets tend to have larger radii, indicating a general physical relationship between these properties.
However, the variation in the strength of this relationship between systems shows that planetary characteristics are not identical across systems. Instead, differences in composition, density, and formation conditions may cause planets within and between systems to behave differently.
Regression Models
model_orbitt1 <- lm(orbital_radius ~ radius_est_earth, data = top_systems)
summary(model_orbitt1)
Call:
lm(formula = orbital_radius ~ radius_est_earth, data = top_systems)
Residuals:
Min 1Q Median 3Q Max
-2.07765 -0.24328 0.03922 0.15236 1.65670
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.25450 0.18686 -1.362 0.186
radius_est_earth 0.21065 0.03353 6.282 2.07e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.6456 on 23 degrees of freedom
Multiple R-squared: 0.6318, Adjusted R-squared: 0.6158
F-statistic: 39.46 on 1 and 23 DF, p-value: 2.074e-06
A linear regression model was used to investigate whether larger planets tend to be further from their star by examining the relationship between planet radius and orbital radius.
The results show a statistically significant positive relationship between planet radius and orbital radius (p < 0.001), with an R² value of 0.63, meaning that planet radius accounts for about 63% of the variation in orbital radius among the 25 planets in the model. This suggests that larger planets in these four systems are generally found at greater distances from their star.
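As a quick consistency check, the headline statistics can be recovered from the printed coefficient table alone. The sketch below uses only numbers copied from the summary of model_orbitt1 and relies on standard identities for a regression with one predictor: t is the estimate divided by its standard error, F equals t squared, and R² equals F / (F + residual degrees of freedom).

```r
# Values copied from the printed summary of model_orbitt1.
estimate    <- 0.21065  # slope for radius_est_earth
std_error   <- 0.03353
residual_df <- 23

# t statistic for the slope (about 6.28, matching the table).
t_value <- estimate / std_error

# With a single predictor, the F statistic is the square of t (about 39.5).
f_stat <- t_value^2

# R-squared follows from F and the residual degrees of freedom (about 0.632).
r_squared <- f_stat / (f_stat + residual_df)

# Two-sided p-value for the slope from the t distribution.
p_value <- 2 * pt(abs(t_value), df = residual_df, lower.tail = FALSE)

# Number of planets in the model: residual df plus two estimated coefficients.
n_planets <- residual_df + 2
```

The same identities apply to the other two single-predictor models in this section, which makes them a convenient way to confirm that a reported R² and F statistic belong to the same fit.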
Relating this back to the research question, the findings suggest that there are consistent relationships between physical and orbital properties in these planetary systems, where planet size is associated with orbital distance. This indicates that while planets within systems may differ in size, their physical properties still follow observable patterns.
model_orbitt2 <- lm(orbital_period ~ mass_est_earth, data = top_systems)
summary(model_orbitt2)
Call:
lm(formula = orbital_period ~ mass_est_earth, data = top_systems)
Residuals:
Min 1Q Median 3Q Max
-1.2798 -0.4871 -0.4495 -0.3713 4.9142
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.448782 0.323756 1.386 0.178987
mass_est_earth 0.009875 0.002369 4.168 0.000371 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.529 on 23 degrees of freedom
Multiple R-squared: 0.4302, Adjusted R-squared: 0.4055
F-statistic: 17.37 on 1 and 23 DF, p-value: 0.0003708
A linear regression model was used to investigate whether planet mass is related to orbital period across the top systems.
The results show a statistically significant positive relationship between mass and orbital period (p < 0.001). The R² value of 0.43 indicates that planet mass accounts for roughly 43% of the variation in orbital period, a moderate association.
This relationship may reflect how physical properties are linked to orbital behavior within planetary systems. It connects back to the main research question by showing that while planets within systems vary in their physical characteristics, these properties can also be associated with differences in how they orbit their stars.
model_er <- lm(eccentricity ~ radius_est_earth, data = top_systems)
summary(model_er)
Call:
lm(formula = eccentricity ~ radius_est_earth, data = top_systems)
Residuals:
Min 1Q Median 3Q Max
-0.06682 -0.03989 -0.02315 0.01648 0.20479
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.048455 0.018805 2.577 0.0169 *
radius_est_earth 0.001377 0.003375 0.408 0.6871
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.06497 on 23 degrees of freedom
Multiple R-squared: 0.007183, Adjusted R-squared: -0.03598
F-statistic: 0.1664 on 1 and 23 DF, p-value: 0.6871
A linear regression model was used to investigate whether planet radius is related to orbital eccentricity across the top systems.
The results show no statistically significant relationship between planet radius and eccentricity (p = 0.687). The R² value of 0.007 indicates that planet radius explains less than 1% of the variation in eccentricity, so planet size has no meaningful association with how elliptical a planet’s orbit is in this dataset.
Relating this back to the research question, the findings suggest that while planets within systems may vary in physical size, orbital shape is likely influenced by other factors such as gravitational interactions or system formation conditions rather than planet radius.
Conclusion
In conclusion, the data provide evidence that planetary systems show relationships between the physical and orbital properties of exoplanets. Several variables, including mass, radius, orbital radius, and orbital period, display statistically significant relationships, while others, such as radius and eccentricity, show little to no association.
The results suggest that certain planetary properties are strongly related, particularly planet size and orbital distance, while other relationships are weak or absent. This indicates that exoplanet properties are not entirely independent within systems and may instead be shaped by formation and development within the protoplanetary disk and by each planet's position within the system.
These patterns may be linked to differences in planetary formation environments, particularly the protoplanetary disk. As this disk provides the material from which planets form, variations in its composition and structure could help explain the differences in planetary types and physical properties observed across systems.
Overall, while exoplanets vary significantly between systems, the findings highlight consistent relationships between certain physical and orbital properties, suggesting that system-level formation processes play an important role in shaping planetary characteristics.