library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.6
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.1 ✔ tibble 3.3.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.2
## ✔ purrr 1.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(ggplot2)
library(pwrss)
##
## Attaching package: 'pwrss'
##
## The following object is masked from 'package:stats':
##
## power.t.test
nasa_data <- read_delim("C:/Users/imaya/Downloads/cleaned_5250.csv",delim = ",")
## Rows: 5250 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): name, planet_type, mass_wrt, radius_wrt, detection_method
## dbl (8): distance, stellar_magnitude, discovery_year, mass_multiplier, radiu...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(nasa_data)
## # A tibble: 6 × 13
## name distance stellar_magnitude planet_type discovery_year mass_multiplier
## <chr> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 11 Coma… 304 4.72 Gas Giant 2007 19.4
## 2 11 Ursa… 409 5.01 Gas Giant 2009 14.7
## 3 14 Andr… 246 5.23 Gas Giant 2008 4.8
## 4 14 Herc… 58 6.62 Gas Giant 2002 8.14
## 5 16 Cygn… 69 6.22 Gas Giant 1996 1.78
## 6 17 Scor… 408 5.23 Gas Giant 2020 4.32
## # ℹ 7 more variables: mass_wrt <chr>, radius_multiplier <dbl>,
## # radius_wrt <chr>, orbital_radius <dbl>, orbital_period <dbl>,
## # eccentricity <dbl>, detection_method <chr>
Research question: Do Neptune-like planets have a larger orbital radius than Super Earth planets?
Variables
Outcome variable (continuous): Orbital radius
Group 1: Neptune-like planets
Group 2: Super Earth planets
Null Hypothesis: There is no difference in the average orbital radius between Neptune-like planets and Super Earth planets.
Alternative Hypothesis: Neptune-like planets have a larger average orbital radius than Super Earth planets.
Significance level (alpha): 0.05.
Target power: 0.95, the value passed to pwrss.t.2means() in the sample size calculation.
nasa_data |>
filter(planet_type %in% c("Neptune-like", "Super Earth")) |>
filter(orbital_radius > 0) |>
ggplot() +
geom_boxplot(
mapping = aes(
x = factor(planet_type, levels = c("Super Earth", "Neptune-like")),
y = orbital_radius
),
notch = TRUE,
fill = "skyblue",
outlier.alpha = 0.2)+
scale_y_log10() +
labs(
title = "Orbital Radius of Neptune-like vs Super Earth Planets",
x = "Planet Type",
y = "Orbital Radius (AU, Log Scale)"
) +
theme_minimal()
I created the visualization beforehand to explore the data and
assess whether this was a meaningful hypothesis to investigate. The
boxplot shows a clear difference in orbital radius between Neptune-like and
Super Earth planets. This visual exploration guided the hypothesis and
provided preliminary evidence that the null hypothesis of no difference
might be rejected.
or_fd <- nasa_data |>
filter(planet_type %in% c("Neptune-like", "Super Earth")) |>
filter(orbital_radius > 0) |>
drop_na(orbital_radius)
or_fd$planet_type <- factor(or_fd$planet_type, levels = c("Neptune-like", "Super Earth"))
t_test_results <- t.test(
orbital_radius ~ planet_type,
data = or_fd,
alternative = "greater",
var.equal = FALSE
)
nasa_sd <- sd(or_fd$orbital_radius, na.rm = TRUE)
s_n <- pwrss.t.2means(mu1 = 0.5,
sd1 = nasa_sd,
kappa = 1,
power = 0.95,
alpha = 0.05,
alternative = "greater")
## +--------------------------------------------------+
## | SAMPLE SIZE CALCULATION |
## +--------------------------------------------------+
##
## Welch's T-Test (Independent Samples)
##
## ---------------------------------------------------
## Hypotheses
## ---------------------------------------------------
## H0 (Null Claim) : d - null.d <= 0
## H1 (Alt. Claim) : d - null.d > 0
##
## ---------------------------------------------------
## Results
## ---------------------------------------------------
## Sample Size = 13 and 13 <<
## Type 1 Error (alpha) = 0.050
## Type 2 Error (beta) = 0.037
## Statistical Power = 0.963
plot(s_n)
t_test_results
##
## Welch Two Sample t-test
##
## data: orbital_radius by planet_type
## t = 9.2206, df = 3171, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group Neptune-like and group Super Earth is greater than 0
## 95 percent confidence interval:
## 0.09443775 Inf
## sample estimates:
## mean in group Neptune-like mean in group Super Earth
## 0.2249017 0.1099521
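The power analysis determined that a minimum of 13 observations per group is required to achieve at least 95% statistical power at α = 0.05 for the assumed mean difference (mu1 = 0.5). With 13 per group, the achieved power is 0.963, slightly exceeding the target. The analyzed dataset is far larger than this minimum: the Welch test reports about 3171 degrees of freedom, so the sample size requirement is comfortably met.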
At the α = 0.05 significance level, we reject the null hypothesis. The p-value (< 2.2e-16) is far smaller than 0.05, indicating strong statistical evidence that Neptune-like planets have a larger average orbital radius than Super Earth planets.
The figure displays two probability density curves representing the sampling distributions of the test statistic under the null and alternative hypotheses. The small red shaded region under the null distribution corresponds to the Type I error rate (α = 0.05), the probability of incorrectly rejecting a true null hypothesis. The large gray shaded area under the alternative distribution represents statistical power (0.963). The minimal overlap between the curves indicates a low Type II error rate (β ≈ 0.04), so the study has high power (≈ 0.96) to detect the assumed effect size.
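As a rough cross-check of the quantities shown in the figure, the power of a one-sided two-sample t-test can be computed directly from the noncentral t distribution. The sketch below is not the pwrss implementation. It assumes equal group sizes and equal standard deviations, and the standardized effect size of 1.5 is a hypothetical value chosen for illustration, because the value of nasa_sd is not printed above. The helper name power_one_sided is also an assumption.

```r
# Power of a one-sided, two-sample t-test with n observations per group.
# Assumes equal group sizes and equal standard deviations.
power_one_sided <- function(d, n, alpha = 0.05) {
  df <- 2 * n - 2                 # degrees of freedom
  ncp <- d * sqrt(n / 2)          # noncentrality parameter under H1
  t_crit <- qt(1 - alpha, df)     # rejection threshold under H0 (Type I error = alpha)
  pt(t_crit, df, ncp = ncp, lower.tail = FALSE)  # P(reject H0 | H1) = power
}

d_hypothetical <- 1.5             # hypothetical standardized effect size
n_per_group <- c(5, 8, 13, 20)
power_by_n <- sapply(n_per_group, function(n) power_one_sided(d_hypothetical, n))
beta_by_n <- 1 - power_by_n       # Type II error rate

print(data.frame(n_per_group, power = power_by_n, beta = beta_by_n))
```

With an effect size of zero the function returns alpha, which corresponds to the red region in the figure. As n grows, beta shrinks and power rises.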
Research question: Do Neptune-like planets have more elliptical orbits than Super Earth planets?
Null Hypothesis: There is no difference in mean orbital eccentricity between Neptune-like and Super Earth planets.
Alternative Hypothesis: Neptune-like planets have higher mean orbital eccentricity than Super Earth planets.
ecc_fd <- nasa_data |>
filter(planet_type %in% c("Neptune-like", "Super Earth")) |>
drop_na(eccentricity)
ecc_summary <- ecc_fd |>
group_by(planet_type) |>
summarise(
mean_orbit = mean(eccentricity, na.rm = TRUE),
sd_orbit = sd(eccentricity, na.rm = TRUE),
n = n()
)
print(ecc_summary)
## # A tibble: 2 × 4
## planet_type mean_orbit sd_orbit n
## <chr> <dbl> <dbl> <int>
## 1 Neptune-like 0.0330 0.0926 1825
## 2 Super Earth 0.0172 0.0620 1595
ecc_fd$planet_type <- factor(ecc_fd$planet_type, levels = c("Neptune-like", "Super Earth"))
t_test_results <- t.test(
eccentricity ~ planet_type,
data = ecc_fd,
alternative = "greater",
var.equal = FALSE
)
t_test_results
##
## Welch Two Sample t-test
##
## data: eccentricity by planet_type
## t = 5.9425, df = 3209.5, p-value = 1.554e-09
## alternative hypothesis: true difference in means between group Neptune-like and group Super Earth is greater than 0
## 95 percent confidence interval:
## 0.01146117 Inf
## sample estimates:
## mean in group Neptune-like mean in group Super Earth
## 0.03300422 0.01715473
A Welch two-sample t-test compared orbital eccentricity between Neptune-like and Super Earth planets. The p-value (1.554e-09) is much smaller than 0.05, providing strong evidence against the null hypothesis. Neptune-like planets have a higher mean eccentricity (0.0330) than Super Earth planets (0.0172).
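The reported statistic can be approximately reproduced by hand from the group summaries, which makes the Welch formula explicit. The sketch below takes the means from the t-test output and the rounded standard deviations and counts from ecc_summary, so it should match the printed t, df, and p-value only up to rounding. The variable names are illustrative.

```r
# Group summaries copied from the output above (Neptune-like first).
m <- c(0.03300422, 0.01715473)   # group means
s <- c(0.0926, 0.0620)           # group standard deviations (rounded)
n <- c(1825, 1595)               # group sizes

se2 <- s^2 / n                                   # squared standard error per group
t_stat <- (m[1] - m[2]) / sqrt(sum(se2))         # Welch t statistic
df_welch <- sum(se2)^2 / sum(se2^2 / (n - 1))    # Welch-Satterthwaite degrees of freedom
p_value <- pt(t_stat, df_welch, lower.tail = FALSE)  # one-sided test ("greater")

print(c(t = t_stat, df = df_welch, p = p_value))
```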
nasa_data |>
filter(planet_type %in% c("Neptune-like", "Super Earth")) |>
drop_na(eccentricity) |>
ggplot() +
geom_boxplot(
mapping = aes(
x = factor(planet_type, levels = c("Super Earth", "Neptune-like")),
y = eccentricity
),
notch = TRUE,
fill = "skyblue",
outlier.alpha = 0.2
) +
labs(
title = "Orbital Eccentricity of Neptune-like vs Super Earth Planets",
x = "Planet Type",
y = "Eccentricity"
) +
theme_minimal()
It is difficult to distinguish the groups visually because most planets in this dataset have very low eccentricities, so both boxes sit near the bottom of the scale. However, the summary statistics confirm a clear difference: Neptune-like planets have a higher mean eccentricity (0.0330) than Super Earth planets (0.0172).
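One way to quantify how small the gap is relative to the spread within each group is a standardized effect size such as Cohen's d. This was not part of the original analysis. The sketch assumes the rounded values printed in ecc_summary and uses the pooled standard deviation, and the variable names are illustrative.

```r
# Rounded group summaries from ecc_summary (Neptune-like first).
m <- c(0.0330, 0.0172)   # mean eccentricity
s <- c(0.0926, 0.0620)   # standard deviation of eccentricity
n <- c(1825, 1595)       # number of planets

# Pooled standard deviation across the two groups.
pooled_sd <- sqrt(sum((n - 1) * s^2) / (sum(n) - 2))

# Cohen's d: mean difference in units of the pooled standard deviation.
cohens_d <- (m[1] - m[2]) / pooled_sd

print(round(c(pooled_sd = pooled_sd, cohens_d = cohens_d), 3))
```

The result is roughly 0.2, which is conventionally described as a small effect. That is consistent with a difference that is statistically detectable in a large sample but hard to see in the boxplot.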