Kepler’s 3rd Law

Jared Cross

11/28/2018

Kepler’s Mathematical Models (circa 1619)

alt text

alt text

Johannes Kepler

Using the Tycho Brahe’s astonomical observations, Kepler deduced three mathematical laws of planetary motion. Three quarters of a century later, Isaac Newton would discover the physics underlying these relationships. Kepler’s third law related the orbital period of planets to their distance from the sun (or more accurately, the semi-major axis of their elliptical orbit).

In this lab, you will try ot deduce this mathematical relatinship based on astronimal data similar to what Kepler had on hand in the early 17th centry. This is your training data. Your model will be tested on the astonomical orbit that were discovered in later years (the test set). Your model will show its scientific worth by its ability to make accurate predicitons on this test set.

Planetary Data

train <- read.csv('/home/rstudioshared/shared_files/data/planetary_objects_train.csv')
test <- read.csv('/home/rstudioshared/shared_files/data/planetary_objects_test.csv')
train
##    Planet    Period Distance
## 1 Mercury  0.240846 0.387098
## 2   Venus  0.615000 0.723327
## 3   Earth  1.000000 1.000000
## 4    Mars  1.880800 1.523679
## 5 Jupiter 11.861800 5.204267
## 6  Saturn 29.457100 9.582017

A Plot of the Training Set

The training set has the orbital period, in Earth years, and the orbital distance, in Astonomical Units (AU) for 6 planets.

library(ggplot2); library(dplyr)

train %>% ggplot(aes(Distance, Period))+geom_point()+geom_text(aes(label=Planet))+
  xlab("Distance (AU)")+ylab("Period (years)")

The Test Set

These planetary objects had not yet been observed when Kepler constructed his model but the right model could predict their period.

Object Period Distance
Apophis 0 0.92241
Vesta 0 2.36200
Ceres 0 2.76540
Hygiea 0 3.13900
Uranus 0 19.22900
Neptune 0 30.10366
Pluto 0 39.26400
Eris 0 68.01000
Sedna 0 518.57000

Build a Model

You goal is to build the best model to predict orbital period from orbital distance:

simple_model <- lm(Period~Distance, data=train)
coef(simple_model) #use summary(simple_model) to see more detail
## (Intercept)    Distance 
##   -2.212003    3.166468
train$preds <- predict(simple_model, train)

How does the model look for the training set?

train
##    Planet    Period Distance      preds
## 1 Mercury  0.240846 0.387098 -0.9862699
## 2   Venus  0.615000 0.723327  0.0783884
## 3   Earth  1.000000 1.000000  0.9544645
## 4    Mars  1.880800 1.523679  2.6126772
## 5 Jupiter 11.861800 5.204267 14.2671404
## 6  Saturn 29.457100 9.582017 28.1291453

Not bad… except for Mercury and Venus.

How does the model look graphically

train %>% ggplot(aes(Distance, Period))+geom_point()+geom_text(aes(label=Planet))+
  xlab("Distance (AU)")+ylab("Period (years)")+
  geom_abline(aes(intercept=coef(simple_model)[1], slope=coef(simple_model)[2]), col="blue")

When you have the model you want

Use it to make predictions on the test set

test$Period <- predict(simple_model, test)
test 
##    Object       Period  Distance
## 1 Apophis    0.7087783   0.92241
## 2   Vesta    5.2671936   2.36200
## 3   Ceres    6.5445467   2.76540
## 4  Hygiea    7.7275391   3.13900
## 5  Uranus   58.6760053  19.22900
## 6 Neptune   93.1102657  30.10366
## 7   Pluto  122.1161869  39.26400
## 8    Eris  213.1394692  68.01000
## 9   Sedna 1639.8231836 518.57000

One Last (but potentially quite important) Note!

If you build a model that predicts a function of orbital Period:

funny_model <- lm(I(Period^2)~Distance, data=train)

Then your predictions will be for that function of orbital period and you need to use the inverse of that function to make your predictions… IOW:

test$pred <- sqrt(predict(funny_model, test))

Then send me your predictions

Write the predictions to a .csv file

write.csv(test, 'my_stellar_predictions.csv', row.names=FALSE)

And download this .csv file and email it to me. I will act as Kaggle in this case.

Scoring Function

I will judge your predictions using average absolute percent error:

avg.percent.error <- function(pred,act) {  
  
  100*mean(abs(pred-act)/act)  
  
  }