Performance Information | |
---|---|
Execution Mode | Single-Machine |
Number of Threads | 2 |
Specialisation | Data Analysis and Interpretation |
Course | Machine Learning for Data Analysis |
Education Institution | Wesleyan University |
Publisher | Coursera |
Assignment | Running a Random Forest |
Beyond the long-standing fascination with Mars, owing to its proximity and its similarities to Earth, the growing collection of facts about the planet lets scientists piece together a big jigsaw puzzle.
NASA's recent announcement of evidence of flowing liquid water on the surface of Mars adds to what is already known and to the curiosity, or even the need, to find out much more.
When a classification tree algorithm is used to look for an association between a crater's physical and geographical characteristics and its morphology, it is possible to identify a good number of craters for some particular, more common formations. The quality of the identification varies with the kind of formation, and some characteristics are shared by several formations.
With all the talk about Mars in both science and fiction (Ridley Scott's movie The Martian, based on Andy Weir's book of the same name, was released in early October 2015), the data set chosen for this assignment is Mars Craters.
The Mars Craters Study presents a global database of 378,540 Mars craters with a diameter of 1 km or larger, created between 4.2 and 3.8 billion years ago during a period of heavy bombardment (i.e. impacts of asteroids, proto-planets, and comets).
The data set was made available by Wesleyan University/Coursera as part of the Data Analysis and Interpretation Specialisation, from the Ph.D. Thesis Planetary Surface Properties, Cratering Physics, and the Volcanic History of Mars from a New Global Martian Crater Database (2011) by Robbins, S.J., University of Colorado at Boulder.
The data set provides a catalogue of craters on Mars. The initial thoughts are about checking for patterns that could identify specific major events that might have happened and that would have significant impact on Mars’ geology, climate and life as a planetary body.
As the initial data set has only nine variables, all of them could be relevant to formulating a hypothesis and reaching a conclusion, so all the variables are kept for this assignment.
MorphoE1_RD is a categorical variable that has the value “Yes” if the primary morphology is classified as “Rd” (Radial), and “No” otherwise.
Quadrangle is a variable derived from the LATITUDE_CIRCLE_IMAGE and LONGITUDE_CIRCLE_IMAGE variables (see the Wikipedia definition below).
Each quadrangle holds roughly one to five percent of the recorded craters. MC-16 (Memnonia) has the most observations (20455 = 5.32%) and MC-10 (Lunae Palus) has the fewest (3478 = 0.90%).
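The per-quadrangle counts quoted above are the kind of table PROC FREQ produces. A minimal sketch, assuming a hypothetical data set named ALLCRATERS in which Quadrangle has been derived for every record (the WORK data set used for the forest keeps only craters with a recorded ejecta morphology, so its counts would be smaller):

```sas
/* Sketch only: ALLCRATERS is a hypothetical data set name, assumed to hold */
/* every crater record with the derived Quadrangle variable.                */
PROC FREQ DATA = ALLCRATERS ORDER = FREQ;   /* most frequent quadrangle first */
    TABLES Quadrangle / NOCUM;              /* counts and percents, no cumulative columns */
RUN;
```

ORDER = FREQ sorts the rows by descending count, so the quadrangles with the most and the fewest craters appear at the top and bottom of the table.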
List of quadrangles on Mars (Wikipedia):
The surface of Mars has been divided into 30 quadrangles by the United States Geological Survey, so named because their borders lie along lines of latitude and longitude and so maps appear rectangular. Martian quadrangles are named after local features and are numbered with the prefix “MC” for “Mars Chart”. West longitude is used.
The following imagemap of the planet Mars is divided into 30 linked quadrangles. North is at the top; 0°N 180°W is at the far left on the equator. The map images were taken by the Mars Global Surveyor.
From Wikipedia, Source: http://photojournal.jpl.nasa.gov/catalog/PIA03467
Is the morphology of a crater strongly associated with its physical and geographical characteristics?
The variables Quadrangle, DIAM_CIRCLE_IMAGE, DEPTH_RIMFLOOR_TOPOG and NUMBER_LAYERS were used to classify the main characteristic of MORPHOLOGY_EJECTA_1 as “Rd” (Radial) or not (coded as “Yes” or “No” in the variable MorphoE1_RD).
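The number of valid observations was 44625. Using a seed of 6587, the Baseline Fit Statistics table reports a misclassification rate of 39.3% (conversely, a success rate of 60.7%). With all 100 trees, the Fit Statistics table reports a misclassification rate of 2.62% on the training data and 4.95% out of bag.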
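The three baseline statistics reported for this run (misclassification rate 0.393, average square error 0.239, log loss 0.670) are consistent with predicting the same class proportion for every crater. A minimal check of that arithmetic, assuming p = 0.393 is the share of the less frequent class and that the baseline is such a constant prediction:

```sas
/* Sketch: recompute the baseline statistics from one class proportion. */
/* Assumption: the baseline predicts the proportion p for every record. */
DATA _NULL_;
    p       = 0.393;                                  /* baseline misclassification rate  */
    success = 1 - p;                                  /* share classified correctly       */
    ase     = p * (1 - p);                            /* average square error of constant */
    logloss = -(p * LOG(p) + (1 - p) * LOG(1 - p));   /* LOG is the natural logarithm     */
    PUT success= 6.3 ase= 6.3 logloss= 6.3;           /* write the values to the SAS log  */
RUN;
```

Rounded to three decimals, the values are 0.607, 0.239 and 0.670, which agree with the Baseline Fit Statistics table.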
From the Fit Statistics table, the out-of-bag misclassification rate flattens at about 55 trees: it is 5.02% at 55 trees and 4.95% at 100 trees.
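One way to see the flattening is to compare the out-of-bag misclassification rate at a few forest sizes with the value at 100 trees. The sketch below copies six rows from the Fit Statistics table by hand; the data set name OOB and the variable names are illustrative, not part of the procedure output.

```sas
/* Sketch: gap between the OOB misclassification rate and its final value. */
/* The rows are copied from the Fit Statistics table of this run.          */
DATA oob;
    INPUT trees oob_misc;
    gap = oob_misc - 0.0495;   /* 0.0495 is the OOB rate at 100 trees */
    DATALINES;
1 0.0631
10 0.0557
25 0.0522
55 0.0502
75 0.0497
100 0.0495
;
RUN;

PROC PRINT DATA = oob NOOBS;
    FORMAT gap 6.4;
RUN;
```

The gap shrinks from 0.0136 with a single tree to 0.0007 at 55 trees, so the additional 45 trees change the out-of-bag rate by less than a tenth of a percentage point.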
The variable that contributes most to the classification is NUMBER_LAYERS, with a Gini value of 0.3803. The remaining variables contribute far less, with Gini values of 0.0341 or lower.
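The ranking can also be checked against the out-of-bag Gini column, which is less optimistic than the training value. A minimal sketch with the four rows of the Loss Reduction Variable Importance table typed in by hand (the data set name IMPORTANCE and the variable names are illustrative):

```sas
/* Sketch: rank the inputs by out-of-bag Gini reduction.                 */
/* Values are copied from the Loss Reduction Variable Importance table.  */
DATA importance;
    INPUT variable :$32. gini oob_gini;
    DATALINES;
NUMBER_LAYERS 0.380336 0.37668
Quadrangle 0.011123 0.00321
DIAM_CIRCLE_IMAGE 0.034090 -0.01061
DEPTH_RIMFLOOR_TOPOG 0.020689 -0.01432
;
RUN;

PROC SORT DATA = importance;
    BY DESCENDING oob_gini;
RUN;

PROC PRINT DATA = importance NOOBS;
RUN;
```

Sorted this way, NUMBER_LAYERS stays far ahead, Quadrangle moves to second place, and DIAM_CIRCLE_IMAGE and DEPTH_RIMFLOOR_TOPOG have negative out-of-bag values, which suggests that their splits did not reduce impurity on the held-out observations.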
The HPFOREST Procedure
Performance Information | |
---|---|
Execution Mode | Single-Machine |
Number of Threads | 2 |
Data Access Information | |||
---|---|---|---|
Data | Engine | Role | Path |
WORK.WORK | V9 | Input | On Client |
Model Information | ||
---|---|---|
Parameter | Value | |
Variables to Try | 2 | (Default) |
Maximum Trees | 100 | (Default) |
Inbag Fraction | 0.6 | (Default) |
Prune Fraction | 0 | (Default) |
Prune Threshold | 0.1 | (Default) |
Leaf Fraction | 0.00001 | (Default) |
Leaf Size Setting | 1 | (Default) |
Leaf Size Used | 1 | |
Category Bins | 30 | (Default) |
Interval Bins | 100 | |
Minimum Category Size | 5 | (Default) |
Node Size | 100000 | (Default) |
Maximum Depth | 20 | (Default) |
Alpha | 1 | (Default) |
Exhaustive | 5000 | (Default) |
Rows of Sequence to Skip | 5 | (Default) |
Split Criterion | . | Gini |
Preselection Method | . | Loh |
Missing Value Handling | . | Valid value |
Number of Observations | |
---|---|
Type | N |
Number of Observations Read | 44625 |
Number of Observations Used | 44625 |
Baseline Fit Statistics | |
---|---|
Statistic | Value |
Average Square Error | 0.239 |
Misclassification Rate | 0.393 |
Log Loss | 0.670 |
Fit Statistics | |||||||
---|---|---|---|---|---|---|---|
Number of Trees | Number of Leaves | Average Square Error (Train) | Average Square Error (OOB) | Misclassification Rate (Train) | Misclassification Rate (OOB) | Log Loss (Train) | Log Loss (OOB) |
1 | 1416 | 0.0332 | 0.0602 | 0.0380 | 0.0631 | 0.4765 | 1.114 |
2 | 2753 | 0.0252 | 0.0571 | 0.0334 | 0.0613 | 0.1500 | 0.955 |
3 | 4207 | 0.0221 | 0.0551 | 0.0299 | 0.0603 | 0.0908 | 0.843 |
4 | 5500 | 0.0209 | 0.0536 | 0.0282 | 0.0605 | 0.0770 | 0.740 |
5 | 6999 | 0.0199 | 0.0515 | 0.0272 | 0.0588 | 0.0694 | 0.654 |
6 | 8251 | 0.0195 | 0.0497 | 0.0268 | 0.0579 | 0.0683 | 0.567 |
7 | 9642 | 0.0192 | 0.0486 | 0.0267 | 0.0574 | 0.0672 | 0.500 |
8 | 11087 | 0.0188 | 0.0476 | 0.0271 | 0.0563 | 0.0664 | 0.448 |
9 | 12626 | 0.0185 | 0.0468 | 0.0262 | 0.0557 | 0.0656 | 0.413 |
10 | 13979 | 0.0183 | 0.0463 | 0.0258 | 0.0557 | 0.0653 | 0.382 |
11 | 15387 | 0.0182 | 0.0457 | 0.0261 | 0.0554 | 0.0650 | 0.354 |
12 | 16826 | 0.0180 | 0.0453 | 0.0258 | 0.0550 | 0.0647 | 0.329 |
13 | 18297 | 0.0179 | 0.0447 | 0.0258 | 0.0544 | 0.0646 | 0.303 |
14 | 19459 | 0.0181 | 0.0443 | 0.0258 | 0.0539 | 0.0650 | 0.277 |
15 | 20844 | 0.0180 | 0.0441 | 0.0262 | 0.0536 | 0.0649 | 0.266 |
16 | 22289 | 0.0179 | 0.0438 | 0.0257 | 0.0533 | 0.0647 | 0.252 |
17 | 23689 | 0.0179 | 0.0436 | 0.0260 | 0.0533 | 0.0647 | 0.244 |
18 | 24937 | 0.0179 | 0.0433 | 0.0260 | 0.0531 | 0.0647 | 0.234 |
19 | 26271 | 0.0178 | 0.0430 | 0.0261 | 0.0529 | 0.0646 | 0.223 |
20 | 27532 | 0.0178 | 0.0429 | 0.0262 | 0.0528 | 0.0647 | 0.214 |
21 | 28887 | 0.0178 | 0.0427 | 0.0263 | 0.0526 | 0.0647 | 0.207 |
22 | 30227 | 0.0178 | 0.0426 | 0.0262 | 0.0525 | 0.0646 | 0.201 |
23 | 31685 | 0.0177 | 0.0424 | 0.0260 | 0.0521 | 0.0645 | 0.196 |
24 | 33151 | 0.0176 | 0.0423 | 0.0263 | 0.0523 | 0.0643 | 0.192 |
25 | 34606 | 0.0175 | 0.0422 | 0.0262 | 0.0522 | 0.0642 | 0.189 |
26 | 35996 | 0.0175 | 0.0421 | 0.0261 | 0.0520 | 0.0641 | 0.186 |
27 | 37452 | 0.0174 | 0.0420 | 0.0261 | 0.0515 | 0.0640 | 0.185 |
28 | 38797 | 0.0174 | 0.0418 | 0.0264 | 0.0515 | 0.0639 | 0.182 |
29 | 40193 | 0.0174 | 0.0417 | 0.0261 | 0.0513 | 0.0639 | 0.178 |
30 | 41461 | 0.0174 | 0.0416 | 0.0260 | 0.0513 | 0.0640 | 0.175 |
31 | 42961 | 0.0173 | 0.0415 | 0.0259 | 0.0511 | 0.0638 | 0.174 |
32 | 44415 | 0.0173 | 0.0415 | 0.0261 | 0.0509 | 0.0638 | 0.174 |
33 | 45833 | 0.0173 | 0.0414 | 0.0259 | 0.0510 | 0.0637 | 0.172 |
34 | 47055 | 0.0173 | 0.0413 | 0.0259 | 0.0508 | 0.0638 | 0.169 |
35 | 48521 | 0.0173 | 0.0413 | 0.0261 | 0.0510 | 0.0637 | 0.168 |
36 | 49946 | 0.0173 | 0.0413 | 0.0262 | 0.0512 | 0.0636 | 0.167 |
37 | 51352 | 0.0172 | 0.0412 | 0.0259 | 0.0511 | 0.0636 | 0.165 |
38 | 52598 | 0.0173 | 0.0412 | 0.0264 | 0.0509 | 0.0637 | 0.164 |
39 | 54005 | 0.0173 | 0.0412 | 0.0261 | 0.0509 | 0.0636 | 0.163 |
40 | 55372 | 0.0172 | 0.0411 | 0.0262 | 0.0509 | 0.0636 | 0.162 |
41 | 56815 | 0.0172 | 0.0411 | 0.0261 | 0.0507 | 0.0636 | 0.162 |
42 | 58209 | 0.0172 | 0.0410 | 0.0262 | 0.0508 | 0.0635 | 0.161 |
43 | 59521 | 0.0172 | 0.0410 | 0.0261 | 0.0507 | 0.0635 | 0.159 |
44 | 61023 | 0.0171 | 0.0410 | 0.0261 | 0.0507 | 0.0634 | 0.158 |
45 | 62456 | 0.0171 | 0.0409 | 0.0259 | 0.0507 | 0.0634 | 0.155 |
46 | 63746 | 0.0172 | 0.0409 | 0.0259 | 0.0507 | 0.0634 | 0.154 |
47 | 65133 | 0.0171 | 0.0408 | 0.0259 | 0.0505 | 0.0634 | 0.153 |
48 | 66517 | 0.0171 | 0.0408 | 0.0258 | 0.0506 | 0.0634 | 0.153 |
49 | 68010 | 0.0171 | 0.0408 | 0.0259 | 0.0506 | 0.0634 | 0.153 |
50 | 69318 | 0.0171 | 0.0408 | 0.0259 | 0.0505 | 0.0634 | 0.153 |
51 | 70666 | 0.0171 | 0.0407 | 0.0258 | 0.0505 | 0.0634 | 0.152 |
52 | 71933 | 0.0171 | 0.0407 | 0.0261 | 0.0503 | 0.0634 | 0.151 |
53 | 73318 | 0.0171 | 0.0407 | 0.0258 | 0.0503 | 0.0634 | 0.150 |
54 | 74511 | 0.0172 | 0.0407 | 0.0259 | 0.0501 | 0.0635 | 0.150 |
55 | 75875 | 0.0172 | 0.0407 | 0.0261 | 0.0502 | 0.0635 | 0.150 |
56 | 77296 | 0.0172 | 0.0407 | 0.0262 | 0.0498 | 0.0635 | 0.150 |
57 | 78805 | 0.0171 | 0.0407 | 0.0262 | 0.0500 | 0.0635 | 0.149 |
58 | 80183 | 0.0171 | 0.0406 | 0.0262 | 0.0502 | 0.0634 | 0.149 |
59 | 81623 | 0.0171 | 0.0406 | 0.0262 | 0.0502 | 0.0634 | 0.149 |
60 | 83060 | 0.0171 | 0.0406 | 0.0262 | 0.0501 | 0.0634 | 0.149 |
61 | 84451 | 0.0171 | 0.0406 | 0.0263 | 0.0501 | 0.0634 | 0.149 |
62 | 85821 | 0.0171 | 0.0406 | 0.0263 | 0.0500 | 0.0634 | 0.148 |
63 | 87227 | 0.0171 | 0.0406 | 0.0263 | 0.0499 | 0.0634 | 0.148 |
64 | 88676 | 0.0171 | 0.0405 | 0.0261 | 0.0500 | 0.0634 | 0.148 |
65 | 90043 | 0.0171 | 0.0405 | 0.0261 | 0.0499 | 0.0634 | 0.147 |
66 | 91297 | 0.0171 | 0.0405 | 0.0262 | 0.0500 | 0.0634 | 0.147 |
67 | 92713 | 0.0171 | 0.0405 | 0.0260 | 0.0498 | 0.0633 | 0.147 |
68 | 94120 | 0.0171 | 0.0404 | 0.0261 | 0.0497 | 0.0633 | 0.147 |
69 | 95453 | 0.0171 | 0.0404 | 0.0260 | 0.0497 | 0.0634 | 0.147 |
70 | 96707 | 0.0171 | 0.0405 | 0.0261 | 0.0496 | 0.0634 | 0.147 |
71 | 98029 | 0.0171 | 0.0404 | 0.0259 | 0.0496 | 0.0634 | 0.147 |
72 | 99532 | 0.0171 | 0.0404 | 0.0259 | 0.0496 | 0.0634 | 0.147 |
73 | 100836 | 0.0171 | 0.0404 | 0.0261 | 0.0498 | 0.0634 | 0.146 |
74 | 102162 | 0.0171 | 0.0404 | 0.0261 | 0.0496 | 0.0634 | 0.146 |
75 | 103619 | 0.0171 | 0.0404 | 0.0261 | 0.0497 | 0.0634 | 0.146 |
76 | 104964 | 0.0171 | 0.0404 | 0.0262 | 0.0496 | 0.0634 | 0.146 |
77 | 106365 | 0.0171 | 0.0404 | 0.0263 | 0.0498 | 0.0634 | 0.146 |
78 | 107697 | 0.0171 | 0.0404 | 0.0263 | 0.0499 | 0.0634 | 0.146 |
79 | 109079 | 0.0171 | 0.0404 | 0.0264 | 0.0499 | 0.0634 | 0.145 |
80 | 110558 | 0.0170 | 0.0404 | 0.0263 | 0.0497 | 0.0634 | 0.145 |
81 | 112049 | 0.0170 | 0.0404 | 0.0262 | 0.0497 | 0.0633 | 0.145 |
82 | 113297 | 0.0170 | 0.0403 | 0.0261 | 0.0496 | 0.0633 | 0.145 |
83 | 114748 | 0.0170 | 0.0403 | 0.0262 | 0.0495 | 0.0633 | 0.145 |
84 | 116097 | 0.0170 | 0.0403 | 0.0262 | 0.0495 | 0.0633 | 0.145 |
85 | 117523 | 0.0170 | 0.0403 | 0.0261 | 0.0497 | 0.0633 | 0.145 |
86 | 118913 | 0.0170 | 0.0403 | 0.0262 | 0.0497 | 0.0633 | 0.145 |
87 | 120325 | 0.0170 | 0.0403 | 0.0261 | 0.0495 | 0.0633 | 0.145 |
88 | 121774 | 0.0170 | 0.0403 | 0.0261 | 0.0496 | 0.0633 | 0.145 |
89 | 123193 | 0.0170 | 0.0403 | 0.0261 | 0.0495 | 0.0632 | 0.145 |
90 | 124542 | 0.0170 | 0.0403 | 0.0259 | 0.0497 | 0.0633 | 0.145 |
91 | 125852 | 0.0170 | 0.0403 | 0.0261 | 0.0498 | 0.0633 | 0.145 |
92 | 127229 | 0.0170 | 0.0402 | 0.0260 | 0.0497 | 0.0633 | 0.145 |
93 | 128529 | 0.0170 | 0.0402 | 0.0259 | 0.0496 | 0.0633 | 0.145 |
94 | 129897 | 0.0170 | 0.0402 | 0.0260 | 0.0496 | 0.0633 | 0.145 |
95 | 131210 | 0.0170 | 0.0402 | 0.0260 | 0.0497 | 0.0633 | 0.145 |
96 | 132595 | 0.0170 | 0.0402 | 0.0261 | 0.0495 | 0.0633 | 0.144 |
97 | 134016 | 0.0170 | 0.0402 | 0.0260 | 0.0495 | 0.0633 | 0.144 |
98 | 135293 | 0.0170 | 0.0402 | 0.0261 | 0.0494 | 0.0633 | 0.144 |
99 | 136605 | 0.0170 | 0.0402 | 0.0262 | 0.0495 | 0.0633 | 0.144 |
100 | 137914 | 0.0170 | 0.0402 | 0.0262 | 0.0495 | 0.0633 | 0.144 |
Loss Reduction Variable Importance | |||||
---|---|---|---|---|---|
Variable | Number of Rules | Gini | OOB Gini | Margin | OOB Margin |
NUMBER_LAYERS | 10734 | 0.380336 | 0.37668 | 0.760671 | 0.757030 |
Quadrangle | 18501 | 0.011123 | 0.00321 | 0.022245 | 0.015800 |
DIAM_CIRCLE_IMAGE | 55843 | 0.034090 | -0.01061 | 0.068181 | 0.023322 |
DEPTH_RIMFLOOR_TOPOG | 52736 | 0.020689 | -0.01432 | 0.041378 | 0.006347 |
/* Use Course's Library */
LIBNAME mydata "/courses/d1406ae5ba27fe300" ACCESS = readonly;

DATA WORK;
    SET mydata.marscrater_pds;

    /* Keep only craters with a recorded ejecta morphology */
    WHERE MORPHOLOGY_EJECTA_1 NE " ";

    /* Collapse the Morphology of Ejecta 1 to its main feature, to reduce the output */
    IF (INDEX(MORPHOLOGY_EJECTA_1, "/") = 0)
        THEN MorphoE1 = MORPHOLOGY_EJECTA_1;
        ELSE MorphoE1 = SUBSTR(MORPHOLOGY_EJECTA_1, 1, INDEX(MORPHOLOGY_EJECTA_1, "/") - 1);
    MorphoE1 = UPCASE(TRIM(MorphoE1));

    /* Is the main feature of Morphology 1 equal to "RD"? */
    IF MorphoE1 = "RD"
        THEN MorphoE1_RD = "Yes";
        ELSE MorphoE1_RD = "No";

    /* Declare the length first; otherwise the first assignment fixes it at 31 characters and truncates the MC-30 name */
    LENGTH Quadrangle $ 40;

    /* Convert coordinates to Quadrangles: https://en.wikipedia.org/wiki/List_of_quadrangles_on_Mars */
    LA = LATITUDE_CIRCLE_IMAGE;
    LO = LONGITUDE_CIRCLE_IMAGE + 180;
    IF LA >= 65 AND LA <= 90 AND LO >= 0 AND LO <= 360 THEN Quadrangle = "MC-01: Mare Boreum (North Pole)";
    IF LA >= 30 AND LA < 65 AND LO >= 120 AND LO < 180 THEN Quadrangle = "MC-02: Diacria";
    IF LA >= 30 AND LA < 65 AND LO >= 60 AND LO < 120 THEN Quadrangle = "MC-03: Arcadia";
    IF LA >= 30 AND LA < 65 AND LO >= 0 AND LO < 60 THEN Quadrangle = "MC-04: Mare Acidalium";
    IF LA >= 30 AND LA < 65 AND LO >= 300 AND LO <= 360 THEN Quadrangle = "MC-05: Ismenius Lacus";
    IF LA >= 30 AND LA < 65 AND LO >= 240 AND LO < 300 THEN Quadrangle = "MC-06: Casius";
    IF LA >= 30 AND LA < 65 AND LO >= 180 AND LO < 240 THEN Quadrangle = "MC-07: Cebrenia";
    IF LA >= 0 AND LA < 30 AND LO >= 135 AND LO < 180 THEN Quadrangle = "MC-08: Amazonis";
    IF LA >= 0 AND LA < 30 AND LO >= 90 AND LO < 135 THEN Quadrangle = "MC-09: Tharsis";
    IF LA >= 0 AND LA < 30 AND LO >= 45 AND LO < 90 THEN Quadrangle = "MC-10: Lunae Palus";
    IF LA >= 0 AND LA < 30 AND LO >= 0 AND LO < 45 THEN Quadrangle = "MC-11: Oxia Palus";
    IF LA >= 0 AND LA < 30 AND LO >= 315 AND LO <= 360 THEN Quadrangle = "MC-12: Arabia";
    IF LA >= 0 AND LA < 30 AND LO >= 270 AND LO < 315 THEN Quadrangle = "MC-13: Syrtis Major";
    IF LA >= 0 AND LA < 30 AND LO >= 225 AND LO < 270 THEN Quadrangle = "MC-14: Amenthes";
    IF LA >= 0 AND LA < 30 AND LO >= 180 AND LO < 225 THEN Quadrangle = "MC-15: Elysium";
    IF LA >= -30 AND LA < 0 AND LO >= 135 AND LO < 180 THEN Quadrangle = "MC-16: Memnonia";
    IF LA >= -30 AND LA < 0 AND LO >= 90 AND LO < 135 THEN Quadrangle = "MC-17: Phoenicis Lacus";
    IF LA >= -30 AND LA < 0 AND LO >= 45 AND LO < 90 THEN Quadrangle = "MC-18: Coprates";
    IF LA >= -30 AND LA < 0 AND LO >= 0 AND LO < 45 THEN Quadrangle = "MC-19: Margaritifer Sinus";
    IF LA >= -30 AND LA < 0 AND LO >= 315 AND LO <= 360 THEN Quadrangle = "MC-20: Sinus Sabaeus";
    IF LA >= -30 AND LA < 0 AND LO >= 270 AND LO < 315 THEN Quadrangle = "MC-21: Iapygia";
    IF LA >= -30 AND LA < 0 AND LO >= 225 AND LO < 270 THEN Quadrangle = "MC-22: Mare Tyrrhenum";
    IF LA >= -30 AND LA < 0 AND LO >= 180 AND LO < 225 THEN Quadrangle = "MC-23: Aeolis";
    IF LA >= -65 AND LA < -30 AND LO >= 120 AND LO < 180 THEN Quadrangle = "MC-24: Phaethontis";
    IF LA >= -65 AND LA < -30 AND LO >= 60 AND LO < 120 THEN Quadrangle = "MC-25: Thaumasia";
    IF LA >= -65 AND LA < -30 AND LO >= 0 AND LO < 60 THEN Quadrangle = "MC-26: Argyre";
    IF LA >= -65 AND LA < -30 AND LO >= 300 AND LO <= 360 THEN Quadrangle = "MC-27: Noachis";
    IF LA >= -65 AND LA < -30 AND LO >= 240 AND LO < 300 THEN Quadrangle = "MC-28: Hellas";
    IF LA >= -65 AND LA < -30 AND LO >= 180 AND LO < 240 THEN Quadrangle = "MC-29: Eridania";
    IF LA >= -90 AND LA < -65 AND LO >= 0 AND LO <= 360 THEN Quadrangle = "MC-30: Mare Australe (South Pole)";

    LABEL Quadrangle = "Quadrangle"
          DIAM_CIRCLE_IMAGE = "Diameter"
          DEPTH_RIMFLOOR_TOPOG = "Depth"
          MorphoE1_RD = "Morphology 1-RD"
          NUMBER_LAYERS = "Layers";

RUN;

ODS GRAPHICS ON;

PROC HPFOREST DATA = WORK SEED = 6587;
    /* Quadrangle has 30 levels, so it is declared nominal rather than binary */
    INPUT Quadrangle / LEVEL = NOMINAL;
    INPUT DIAM_CIRCLE_IMAGE
          DEPTH_RIMFLOOR_TOPOG
          NUMBER_LAYERS / LEVEL = INTERVAL;
    TARGET MorphoE1_RD / LEVEL = BINARY;
    TITLE "Mars' Craters - Random Forest - Morphology = RD - Full Dataset";
RUN;

/* Clear the title so it does not carry over to later output */
TITLE;
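Before reading the forest's error rates, it helps to know how the two target classes are balanced, since that sets the baseline any model has to beat. A minimal sketch, assuming the WORK data set built by the DATA step above is still available in the session:

```sas
/* Sketch: class balance of the binary target used by the forest. */
PROC FREQ DATA = WORK;
    TABLES MorphoE1_RD;
RUN;
```

The share of the less frequent class in this table can then be compared with the 39.3% baseline misclassification rate.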