Jared Cross
11/28/2018
alt text
Using the Tycho Brahe’s astonomical observations, Kepler deduced three mathematical laws of planetary motion. Three quarters of a century later, Isaac Newton would discover the physics underlying these relationships. Kepler’s third law related the orbital period of planets to their distance from the sun (or more accurately, the semi-major axis of their elliptical orbit).
In this lab, you will try ot deduce this mathematical relatinship based on astronimal data similar to what Kepler had on hand in the early 17th centry. This is your training data. Your model will be tested on the astonomical orbit that were discovered in later years (the test set). Your model will show its scientific worth by its ability to make accurate predicitons on this test set.
train <- read.csv('/home/rstudioshared/shared_files/data/planetary_objects_train.csv')
test <- read.csv('/home/rstudioshared/shared_files/data/planetary_objects_test.csv')
train
## Planet Period Distance
## 1 Mercury 0.240846 0.387098
## 2 Venus 0.615000 0.723327
## 3 Earth 1.000000 1.000000
## 4 Mars 1.880800 1.523679
## 5 Jupiter 11.861800 5.204267
## 6 Saturn 29.457100 9.582017
The training set has the orbital period, in Earth years, and the orbital distance, in Astonomical Units (AU) for 6 planets.
library(ggplot2); library(dplyr)
train %>% ggplot(aes(Distance, Period))+geom_point()+geom_text(aes(label=Planet))+
xlab("Distance (AU)")+ylab("Period (years)")
These planetary objects had not yet been observed when Kepler constructed his model but the right model could predict their period.
Object | Period | Distance |
---|---|---|
Apophis | 0 | 0.92241 |
Vesta | 0 | 2.36200 |
Ceres | 0 | 2.76540 |
Hygiea | 0 | 3.13900 |
Uranus | 0 | 19.22900 |
Neptune | 0 | 30.10366 |
Pluto | 0 | 39.26400 |
Eris | 0 | 68.01000 |
Sedna | 0 | 518.57000 |
You goal is to build the best model to predict orbital period from orbital distance:
simple_model <- lm(Period~Distance, data=train)
coef(simple_model) #use summary(simple_model) to see more detail
## (Intercept) Distance
## -2.212003 3.166468
train$preds <- predict(simple_model, train)
train
## Planet Period Distance preds
## 1 Mercury 0.240846 0.387098 -0.9862699
## 2 Venus 0.615000 0.723327 0.0783884
## 3 Earth 1.000000 1.000000 0.9544645
## 4 Mars 1.880800 1.523679 2.6126772
## 5 Jupiter 11.861800 5.204267 14.2671404
## 6 Saturn 29.457100 9.582017 28.1291453
Not bad… except for Mercury and Venus.
train %>% ggplot(aes(Distance, Period))+geom_point()+geom_text(aes(label=Planet))+
xlab("Distance (AU)")+ylab("Period (years)")+
geom_abline(aes(intercept=coef(simple_model)[1], slope=coef(simple_model)[2]), col="blue")
Use it to make predictions on the test set
test$Period <- predict(simple_model, test)
test
## Object Period Distance
## 1 Apophis 0.7087783 0.92241
## 2 Vesta 5.2671936 2.36200
## 3 Ceres 6.5445467 2.76540
## 4 Hygiea 7.7275391 3.13900
## 5 Uranus 58.6760053 19.22900
## 6 Neptune 93.1102657 30.10366
## 7 Pluto 122.1161869 39.26400
## 8 Eris 213.1394692 68.01000
## 9 Sedna 1639.8231836 518.57000
If you build a model that predicts a function of orbital Period:
funny_model <- lm(I(Period^2)~Distance, data=train)
Then your predictions will be for that function of orbital period and you need to use the inverse of that function to make your predictions… IOW:
test$pred <- sqrt(predict(funny_model, test))
Write the predictions to a .csv file
write.csv(test, 'my_stellar_predictions.csv', row.names=FALSE)
And download this .csv file and email it to me. I will act as Kaggle in this case.
I will judge your predictions using average absolute percent error:
avg.percent.error <- function(pred,act) {
100*mean(abs(pred-act)/act)
}