Kepler's Conundrum

Planetary Data

First, read in the data and take a look at the training set:

train <- read.csv('shared/planetary_objects_train.csv')
test <- read.csv('shared/planetary_objects_test.csv')
train
   Planet  Period Distance
1 Mercury  0.2408   0.3871
2   Venus  0.6150   0.7233
3   Earth  1.0000   1.0000
4    Mars  1.8808   1.5237
5 Jupiter 11.8618   5.2043
6  Saturn 29.4571   9.5820

The Training Set

The training set has the orbital period, in Earth years, and the orbital distance, in Astronomical Units (AU), for 6 planets.

Build a Model

Your goal is to build the best model you can for predicting orbital period from orbital distance:

simple_model <- lm(Period~Distance, data=train)
coef(simple_model) #use summary(simple_model) to see more detail
(Intercept)    Distance 
     -2.212       3.166 
train$preds <- predict(simple_model, train)

How does the model look for the training set?

train
   Planet  Period Distance    preds
1 Mercury  0.2408   0.3871 -0.98627
2   Venus  0.6150   0.7233  0.07839
3   Earth  1.0000   1.0000  0.95446
4    Mars  1.8808   1.5237  2.61268
5 Jupiter 11.8618   5.2043 14.26714
6  Saturn 29.4571   9.5820 28.12915

Not bad… except for Mercury and Venus: Mercury's predicted period is negative, and Venus's is too small by nearly a factor of eight.
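
One way to put a number on "not bad" is to compute the residuals and a root-mean-squared error (RMSE) on the training set. The sketch below copies the training table inline so that it runs on its own. RMSE is an assumed metric here, used only for illustration; it is not necessarily how the predictions will be scored.

```r
# Training data copied from the table above so the sketch is self-contained
train <- data.frame(
  Planet   = c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn"),
  Period   = c(0.2408, 0.6150, 1.0000, 1.8808, 11.8618, 29.4571),
  Distance = c(0.3871, 0.7233, 1.0000, 1.5237, 5.2043, 9.5820)
)

simple_model <- lm(Period ~ Distance, data = train)
train$preds <- predict(simple_model, train)

# Residual = observed period minus predicted period
train$resid <- train$Period - train$preds

# RMSE on the training set (an assumed metric, for illustration only)
rmse <- sqrt(mean(train$resid^2))
rmse

# Flag physically impossible predictions (a period cannot be negative)
train[train$preds < 0, c("Planet", "Period", "preds")]
```

A training-set error like this only says how well the model fits the six planets it has already seen, so treat it as a rough guide when comparing candidate models.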

When you have the model you want

Use it to make predictions on the test set

test$Period <- predict(simple_model, test)
test 
   Object    Period Distance
1 Apophis    0.7088   0.9224
2   Vesta    5.2672   2.3620
3   Ceres    6.5445   2.7654
4  Hygiea    7.7275   3.1390
5  Uranus   58.6760  19.2290
6 Neptune   93.1103  30.1037
7   Pluto  122.1162  39.2640
8    Eris  213.1395  68.0100
9   Sedna 1639.8232 518.5700

Then send me those predictions

Write the predictions to a .csv file

write.csv(test, 'my_stellar_predictions.csv', row.names=FALSE)
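
Before sending the file, it can be worth reading it back to confirm that it has the expected columns and no missing predictions. This sketch assumes the submission should contain the `Object`, `Period`, and `Distance` columns, as in the printed table. It uses only the first three test objects (copied from that table) and writes to a temporary file rather than to `my_stellar_predictions.csv`.

```r
# A few rows copied from the printed test table (the Period values are the
# simple_model predictions shown there)
test <- data.frame(
  Object   = c("Apophis", "Vesta", "Ceres"),
  Period   = c(0.7088, 5.2672, 6.5445),
  Distance = c(0.9224, 2.3620, 2.7654)
)

# Write to a temporary file so this check does not overwrite a real submission
out_file <- tempfile(fileext = ".csv")
write.csv(test, out_file, row.names = FALSE)

# Read the file back and check its shape and contents
check <- read.csv(out_file)
stopifnot(
  identical(names(check), c("Object", "Period", "Distance")),
  nrow(check) == nrow(test),
  !anyNA(check$Period)
)
check
```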

Then download the file and email it to me. I'll play the role of Kaggle and score the submissions to see who built the best model.

One Last (but potentially quite important) Note!

If you build a model that predicts a function of orbital Period:

funny_model <- lm(I(Period^2)~Distance, data=train)

Then your predictions will be for that function of the orbital period, and you need to apply the inverse of that function to get predictions of the period itself. In other words:

test$Period <- sqrt(predict(funny_model, test))
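
To see the back-transformation at work, the sketch below fits `funny_model` to the training table (copied inline so the code runs on its own) and inverts the square with `sqrt()`. One caveat about this particular model: a linear fit to `Period^2` can predict negative values at small distances, and `sqrt()` of a negative number gives `NaN`. Clamping at zero with `pmax()` is one possible workaround, assumed here for illustration and not part of the original instructions; choosing a model that never predicts negative values would be another.

```r
# Training data copied from the table near the top of this document
train <- data.frame(
  Period   = c(0.2408, 0.6150, 1.0000, 1.8808, 11.8618, 29.4571),
  Distance = c(0.3871, 0.7233, 1.0000, 1.5237, 5.2043, 9.5820)
)

# Model a function of Period (here, its square) instead of Period itself
funny_model <- lm(I(Period^2) ~ Distance, data = train)

# Raw predictions are on the Period^2 scale
raw <- predict(funny_model, train)

# Undo the square with sqrt(); clamp negatives to 0 first (an assumption made
# for this sketch) so that sqrt() does not return NaN
train$preds <- sqrt(pmax(raw, 0))

# Compare raw (squared-scale) and back-transformed predictions with the truth
cbind(train, raw = raw)
```

The same pattern applies to any other transformation: whatever function you wrap around `Period` in the model formula, apply its inverse to the output of `predict()`.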