First, read in the data and take a look at the training data:
train <- read.csv('shared/planetary_objects_train.csv')
test <- read.csv('shared/planetary_objects_test.csv')
train
Planet Period Distance
1 Mercury 0.2408 0.3871
2 Venus 0.6150 0.7233
3 Earth 1.0000 1.0000
4 Mars 1.8808 1.5237
5 Jupiter 11.8618 5.2043
6 Saturn 29.4571 9.5820
The training set has the orbital period, in Earth years, and the orbital distance, in Astronomical Units (AU), for 6 planets.
Your goal is to build the best model to predict orbital period from orbital distance:
simple_model <- lm(Period~Distance, data=train)
coef(simple_model) #use summary(simple_model) to see more detail
(Intercept) Distance
-2.212 3.166
train$preds <- predict(simple_model, train)
train
Planet Period Distance preds
1 Mercury 0.2408 0.3871 -0.98627
2 Venus 0.6150 0.7233 0.07839
3 Earth 1.0000 1.0000 0.95446
4 Mars 1.8808 1.5237 2.61268
5 Jupiter 11.8618 5.2043 14.26714
6 Saturn 29.4571 9.5820 28.12915
Not bad, except for Mercury and Venus: the model predicts a negative period for Mercury and badly underestimates Venus.
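To put a number on "not bad", one option is to look at the residuals and the root-mean-square error (RMSE) on the training set. The sketch below rebuilds the training table inline so it runs on its own. RMSE is an assumption here, since the scoring metric for the exercise is not stated, and the names `resid`, `rmse`, and `negative_preds` are illustrative.

```r
# Training data copied from the table above so the example is self-contained
train <- data.frame(
  Planet   = c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn"),
  Period   = c(0.2408, 0.6150, 1.0000, 1.8808, 11.8618, 29.4571),
  Distance = c(0.3871, 0.7233, 1.0000, 1.5237, 5.2043, 9.5820),
  stringsAsFactors = FALSE
)

simple_model <- lm(Period ~ Distance, data = train)
train$preds <- predict(simple_model, train)

# Residual = actual - predicted, one per planet
train$resid <- train$Period - train$preds

# RMSE on the training set (assumed metric; the actual scoring rule is not given)
rmse <- sqrt(mean(train$resid^2))

# Flag physically impossible predictions (an orbital period cannot be negative)
negative_preds <- train$Planet[train$preds < 0]

print(train)
print(rmse)
print(negative_preds)
```

Training error only shows how well the line fits the six planets it was built from; it says little about objects much farther out than Saturn, which is where most of the test set lies.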
Use the model to make predictions on the test set:
test$Period <- predict(simple_model, test)
test
Object Period Distance
1 Apophis 0.7088 0.9224
2 Vesta 5.2672 2.3620
3 Ceres 6.5445 2.7654
4 Hygiea 7.7275 3.1390
5 Uranus 58.6760 19.2290
6 Neptune 93.1103 30.1037
7 Pluto 122.1162 39.2640
8 Eris 213.1395 68.0100
9 Sedna 1639.8232 518.5700
Write the predictions to a .csv file:
write.csv(test, 'my_stellar_predictions.csv', row.names=FALSE)
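Before sending the file, it may be worth reading it back to confirm that it has the expected columns and no missing predictions. This is a minimal sketch: the small `test` table is a stand-in built from the first three rows of the output above, and the file goes to a temporary path (`out_file`, an illustrative name) so a real submission is not overwritten.

```r
# Stand-in for the prediction table; values copied from the output above
test <- data.frame(
  Object   = c("Apophis", "Vesta", "Ceres"),
  Period   = c(0.7088, 5.2672, 6.5445),
  Distance = c(0.9224, 2.3620, 2.7654),
  stringsAsFactors = FALSE
)

# Write to a temporary file so the example does not touch a real submission
out_file <- tempfile(fileext = ".csv")
write.csv(test, out_file, row.names = FALSE)

# Read the file back and check its shape and that no prediction is missing
check <- read.csv(out_file, stringsAsFactors = FALSE)
stopifnot(identical(names(check), names(test)))
stopifnot(nrow(check) == nrow(test))
stopifnot(!anyNA(check$Period))
```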
Then download the file and email it to me. I'll serve as Kaggle and see who built the best model.
If you build a model that predicts a function of orbital period, for example its square:
funny_model <- lm(I(Period^2)~Distance, data=train)
then your predictions will be for that function of orbital period, and you need to apply the inverse of that function to recover predictions of the period itself. In other words:
test$pred <- sqrt(predict(funny_model, test))
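One thing to watch with an inverse such as `sqrt()` is that a linear fit can return negative values on the transformed scale, and `sqrt()` of a negative number is `NaN`. The sketch below rebuilds the data inline from the tables above and checks for that case before inverting. The names `pred_sq` and `needs_attention`, and the clamp at zero, are assumptions for illustration and not part of the original exercise.

```r
# Training data copied from the training table above
train <- data.frame(
  Planet   = c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn"),
  Period   = c(0.2408, 0.6150, 1.0000, 1.8808, 11.8618, 29.4571),
  Distance = c(0.3871, 0.7233, 1.0000, 1.5237, 5.2043, 9.5820),
  stringsAsFactors = FALSE
)

# Only the test distances are needed; values copied from the test output above
test <- data.frame(
  Object   = c("Apophis", "Vesta", "Ceres", "Hygiea", "Uranus",
               "Neptune", "Pluto", "Eris", "Sedna"),
  Distance = c(0.9224, 2.3620, 2.7654, 3.1390, 19.2290,
               30.1037, 39.2640, 68.0100, 518.5700),
  stringsAsFactors = FALSE
)

funny_model <- lm(I(Period^2) ~ Distance, data = train)

# These predictions are on the Period^2 scale, not the Period scale
pred_sq <- predict(funny_model, test)

# Objects whose predicted Period^2 is negative would become NaN under sqrt()
needs_attention <- test$Object[pred_sq < 0]
print(needs_attention)

# One possible guard (an assumption): clamp at zero before inverting
test$Period <- sqrt(pmax(pred_sq, 0))
print(test)
```

Clamping only avoids the `NaN`; a predicted period of zero is still not physically meaningful, so a flagged object is a hint that a different transformation may suit the data better.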