Summary statistics
Hemisphere | Big Depth | Has Layers | N Obs |
---|---|---|---|
North | 0 | 0 | 121109 |
North | 0 | 1 | 1161 |
North | 1 | 0 | 20117 |
North | 1 | 1 | 8507 |
South | 0 | 0 | 184766 |
South | 0 | 1 | 503 |
South | 1 | 0 | 38620 |
South | 1 | 1 | 9560 |
Specialisation | Data Analysis and Interpretation |
Course | Data Analysis Tools |
Education Institution | Wesleyan University |
Publisher | Coursera |
Assignment | Testing a Potential Moderator |
Besides the historical fascination with Mars due to its proximity and some similarities to Earth, the collection of facts about the planet allows scientists to piece together a big jigsaw puzzle.
A recent NASA announcement of evidence of flowing liquid water on the surface of Mars adds to what is already known and to the curiosity, or even the need, to find out much more.
With all the talk about Mars, in both science and fiction (Ridley Scott’s movie The Martian, based on the book of the same name by Andy Weir, opens in early October 2015), the data set chosen for this assignment is the Mars Craters data set.
The Mars Craters Study presents a global database of over 300,000 Mars craters 1 km or larger in diameter that were created between 4.2 and 3.8 billion years ago during a period of heavy bombardment (i.e. impacts of asteroids, proto-planets, and comets).
The data set was made available by Wesleyan University/Coursera as part of the Data Management and Visualisation course of the Data Analysis and Interpretation Specialisation. It comes from the Ph.D. thesis Planetary Surface Properties, Cratering Physics, and the Volcanic History of Mars from a New Global Martian Crater Database (2011) by Robbins, S.J., University of Colorado at Boulder.
The data set provides a catalogue of craters on Mars. The initial idea is to check for patterns that could identify specific major events that might have had a significant impact on Mars’ geology, climate and life as a planetary body.
As the initial data set has only nine variables, they could all be relevant to formulating hypotheses and reaching a conclusion, so all the variables will be kept for this assignment.
Derived variables:
Hemisphere: South if LATITUDE_CIRCLE_IMAGE is lower than zero, and North otherwise
Big Depth: 1 if DEPTH_RIMFLOOR_TOPOG is greater than zero, and 0 otherwise
Has Layers: 1 if NUMBER_LAYERS is greater than zero, and 0 otherwise
For the Northern Hemisphere there is a statistically significant relationship between Depth and Layers, indicated by the very large Chi-Squared statistic, \(\chi^2 = 32015.4059\), and the very low probability, \({p{-}value} < .0001\).
Likewise, the Southern Hemisphere has a statistically significant relationship between Depth and Layers, indicated by the very large Chi-Squared statistic, \(\chi^2 = 35505.3319\), and the very low probability, \({p{-}value} < .0001\).
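The Chi-Squared values quoted for the two hemispheres can be cross-checked from the eight cell counts of the summary table alone. The following is a minimal sketch in Python rather than SAS, assuming the standard shortcut formula for a 2x2 table without continuity correction; the names `COUNTS` and `chi_square_2x2` are illustrative and are not part of the original program.

```python
from math import sqrt

# Cell counts copied from the summary table:
# (Big_Depth, Has_Layers) -> N Obs, per hemisphere.
COUNTS = {
    "North": {(0, 0): 121109, (0, 1): 1161, (1, 0): 20117, (1, 1): 8507},
    "South": {(0, 0): 184766, (0, 1): 503, (1, 0): 38620, (1, 1): 9560},
}


def chi_square_2x2(cells):
    """Return (n, chi-square, |phi|) for a 2x2 table of counts.

    Uses the Pearson chi-square shortcut for 2x2 tables,
    n * (ad - bc)^2 / (product of the four marginal totals),
    without continuity correction.
    """
    a, b = cells[(0, 0)], cells[(0, 1)]
    c, d = cells[(1, 0)], cells[(1, 1)]
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    phi = sqrt(chi2 / n)  # absolute value of phi; equals Cramer's V for 2x2
    return n, chi2, phi


for hemisphere, cells in COUNTS.items():
    n, chi2, phi = chi_square_2x2(cells)
    print(f"{hemisphere}: n={n}, chi2={chi2:.4f}, phi={phi:.4f}")
```

The printed values should agree, up to rounding, with the Chi-Square, Phi Coefficient and Sample Size rows reported by PROC FREQ for each hemisphere.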
As the data for both hemispheres show an association in the same direction and of similar size (Cramer's V of 0.4606 in the North and 0.3900 in the South), which can also be seen in the graphic shown in the Output section below, the conclusion is that Hemisphere does not moderate the relationship between Depth and Layers.
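Because both Chi-Squared values are driven largely by the very large sample sizes, the size of the association is easier to compare through Cramer's V. As a sketch, assuming the usual relation for a 2x2 table (where V equals the absolute value of the Phi Coefficient), the values in the FREQ output follow from the reported Chi-Squared statistics and sample sizes:

\[
V = \sqrt{\frac{\chi^2}{n}}, \qquad
V_{North} = \sqrt{\frac{32015.4059}{150894}} \approx 0.4606, \qquad
V_{South} = \sqrt{\frac{35505.3319}{233449}} \approx 0.3900
\]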
/* Using SAS Educational Virtual Machine running locally */
/* TERMSTR = CRLF expects Windows-style line endings; use TERMSTR = LF for a CSV file saved on Unix/MacOS */
FILENAME CSV "/folders/myfolders/marscrater_pds.csv" TERMSTR = CRLF;
PROC IMPORT
DATAFILE = CSV
OUT = WORK
DBMS = CSV
REPLACE;
RUN;
/* Unassign the file reference. */
FILENAME CSV;
DATA WORK;
SET WORK;
/* Moderator (?) */
IF LATITUDE_CIRCLE_IMAGE < 0
THEN Hemisphere = "South";
ELSE Hemisphere = "North";
/* Explanatory - Categorical */
IF DEPTH_RIMFLOOR_TOPOG > 0
THEN Big_Depth = 1;
ELSE Big_Depth = 0;
/* Response - Categorical */
IF NUMBER_LAYERS > 0
THEN Has_Layers = 1;
ELSE Has_Layers = 0;
LABEL
Hemisphere = "Hemisphere"
Big_Depth = "Big Depth"
Has_Layers = "Has Layers";
RUN;
/* Sort the data by Hemisphere, as required by the BY statement in PROC FREQ */
PROC SORT ;
BY Hemisphere;
RUN;
PROC SUMMARY PRINT;
CLASS Hemisphere Big_Depth Has_Layers;
RUN;
PROC FREQ ;
TABLES Has_Layers * Big_Depth / CHISQ;
BY Hemisphere;
RUN;
PROC SGPANEL ;
TITLE "Mars' Craters - Has Layers by Big Depth within Hemisphere";
PANELBY Hemisphere;
VBAR Big_Depth
/ RESPONSE = Has_Layers
STAT = FREQ
FILLATTRS = (COLOR = GREY);
RUN;
The SUMMARY Procedure
Hemisphere | Big Depth | Has Layers | N Obs |
---|---|---|---|
North | 0 | 0 | 121109 |
North | 0 | 1 | 1161 |
North | 1 | 0 | 20117 |
North | 1 | 1 | 8507 |
South | 0 | 0 | 184766 |
South | 0 | 1 | 503 |
South | 1 | 0 | 38620 |
South | 1 | 1 | 9560 |
The FREQ Procedure
Hemisphere = North
Statistics for Table of Has_Layers by Big_Depth
Statistic | DF | Value | Prob |
---|---|---|---|
Chi-Square | 1 | 32015.4059 | <.0001 |
Likelihood Ratio Chi-Square | 1 | 23875.3396 | <.0001 |
Continuity Adj. Chi-Square | 1 | 32010.6083 | <.0001 |
Mantel-Haenszel Chi-Square | 1 | 32015.1937 | <.0001 |
Phi Coefficient | | 0.4606 | |
Contingency Coefficient | | 0.4184 | |
Cramer's V | | 0.4606 | |
Fisher's Exact Test | |
---|---|
Cell (1,1) Frequency (F) | 121109 |
Left-sided Pr <= F | 1.0000 |
Right-sided Pr >= F | <.0001 |
Table Probability (P) | <.0001 |
Two-sided Pr <= P | <.0001 |
Sample Size = 150894
The FREQ Procedure
Hemisphere = South
Statistics for Table of Has_Layers by Big_Depth
Statistic | DF | Value | Prob |
---|---|---|---|
Chi-Square | 1 | 35505.3319 | <.0001 |
Likelihood Ratio Chi-Square | 1 | 28007.6076 | <.0001 |
Continuity Adj. Chi-Square | 1 | 35500.5874 | <.0001 |
Mantel-Haenszel Chi-Square | 1 | 35505.1798 | <.0001 |
Phi Coefficient | | 0.3900 | |
Contingency Coefficient | | 0.3633 | |
Cramer's V | | 0.3900 | |
Fisher's Exact Test | |
---|---|
Cell (1,1) Frequency (F) | 184766 |
Left-sided Pr <= F | 1.0000 |
Right-sided Pr >= F | <.0001 |
Table Probability (P) | <.0001 |
Two-sided Pr <= P | <.0001 |
Sample Size = 233449