Summary

Specialisation: Data Analysis and Interpretation
Course: Data Analysis Tools
Education Institution: Wesleyan University
Publisher: Coursera
Assignment: Testing a Potential Moderator

Introduction

Besides the long-standing fascination with Mars, due to its proximity and some similarities with Earth, each newly collected fact helps scientists put together a big jigsaw puzzle.

A recent announcement by NASA about evidence of flowing liquid water on the surface of Mars adds to what is known and to the curiosity, or even the need, to find out much more.

The Data Set

With all the talk about Mars, in both science and fiction (Ridley Scott’s movie The Martian, based on Andy Weir’s book of the same name, opens in early October 2015), the data set chosen for this assignment is the Mars Craters data set.

The Mars Craters Study presents a global database that includes over 300,000 Mars craters 1 km or larger that were created between 4.2 and 3.8 billion years ago during a period of heavy bombardment (i.e. impacts of asteroids, proto-planets, and comets).

The data set was made available by Wesleyan University/Coursera as part of the Data Management and Visualisation course of the Data Analysis and Interpretation Specialisation. It comes from the Ph.D. thesis Planetary Surface Properties, Cratering Physics, and the Volcanic History of Mars from a New Global Martian Crater Database (2011) by Robbins, S.J., University of Colorado at Boulder.

Topic of Interest

The data set provides a catalogue of craters on Mars. The initial idea is to look for patterns that could identify specific major events that might have happened and that would have had a significant impact on Mars’ geology, climate and life as a planetary body.

Codebook

As the initial data set has only nine variables, they could all be relevant to formulating hypotheses and reaching a conclusion, so all the variables will be kept for this assignment.

Variables

  • CRATER_ID: crater ID for internal use, based upon the region of the planet (\({1 \over 16}\)), the “pass” under which the crater was identified, and the order in which it was identified
  • LATITUDE_CIRCLE_IMAGE: latitude from the derived centre of a non-linear least-squares circle fit to the vertices selected to manually identify the crater rim (units are decimal degrees North)
  • LONGITUDE_CIRCLE_IMAGE: longitude from the derived centre of a non-linear least-squares circle fit to the vertices selected to manually identify the crater rim (units are decimal degrees East)
  • DIAM_CIRCLE_IMAGE: diameter from a non-linear least squares circle fit to the vertices selected to manually identify the crater rim (units are km)
  • DEPTH_RIMFLOOR_TOPOG: average elevation of each of the manually determined N points along (or inside) the crater rim (units are km)
    • Depth Rim: Points are selected as relative topographic highs, under the assumption that they are the least eroded and so the most original points along the rim
    • Depth Floor: Points were chosen as the lowest elevation that did not include visible embedded craters
  • MORPHOLOGY_EJECTA_1: ejecta morphology classified.
    • If there are multiple values, separated by a “/”, then the order is the inner-most ejecta through the outer-most, or the top-most through the bottom-most
  • MORPHOLOGY_EJECTA_2: the morphology of the layer(s) itself/themselves. This classification system is unique to this work.
  • MORPHOLOGY_EJECTA_3: overall texture and/or shape of some of the layer(s)/ejecta that are generally unique and deserve separate morphological classification.
  • NUMBER_LAYERS: the maximum number of cohesive layers in any azimuthal direction that could be reliably identified

Testing a Potential Moderator

Classifying Variables

  • Hemisphere is classified as South if LATITUDE_CIRCLE_IMAGE is lower than zero, and North otherwise
  • Big_Depth is classified as 1 if DEPTH_RIMFLOOR_TOPOG is greater than zero, and 0 otherwise
  • Has_Layers is classified as 1 if NUMBER_LAYERS is greater than zero, and 0 otherwise

For the Northern Hemisphere there is a statistically significant relationship between Depth and Layers, indicated by the very large chi-square statistic, \(\chi^2 = 32015.4059\), and the very low probability, \({p{-}value} < .0001\).

Likewise, the Southern Hemisphere has a statistically significant relationship between Depth and Layers, indicated by the very large chi-square statistic, \(\chi^2 = 35505.3319\), and the very low probability, \({p{-}value} < .0001\).
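
As a cross-check, the Pearson statistic of a 2×2 table can be recomputed by hand from the cell counts reported in the SAS Output section. The labels \(a, b, c, d\) (cells read row by row) and \(n\) (sample size) are introduced here for illustration only and do not appear in the SAS output; this shortcut formula is a sketch of the arithmetic, not a description of how PROC FREQ computes the value internally.

\[
\chi^2 = \frac{n\,(ad - bc)^2}{(a+b)\,(c+d)\,(a+c)\,(b+d)}
\]

\[
\chi^2_{North} = \frac{150894 \times (121109 \times 8507 - 20117 \times 1161)^2}{141226 \times 9668 \times 122270 \times 28624} \approx 32015.4
\]

\[
\chi^2_{South} = \frac{233449 \times (184766 \times 9560 - 38620 \times 503)^2}{223386 \times 10063 \times 185269 \times 48180} \approx 35505.3
\]

Both results agree with the Chi-Square rows reported by SAS for each hemisphere.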

As both hemispheres show an association in the same direction and of similar strength, which can also be seen in the graphic in the Output section below, the conclusion is that Hemisphere does not moderate the relationship between Depth and Layers.
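
Because both samples are very large, the chi-square values on their own say little about the strength of the association. One way to compare strength across hemispheres, sketched here under the assumption that the Phi Coefficient is an acceptable effect-size measure for these 2×2 tables, is to rescale each statistic by its sample size:

\[
\phi = \sqrt{\frac{\chi^2}{n}}
\]

\[
\phi_{North} = \sqrt{\frac{32015.4059}{150894}} \approx 0.4606
\qquad
\phi_{South} = \sqrt{\frac{35505.3319}{233449}} \approx 0.3900
\]

These match the Phi Coefficient rows in the SAS Output section. Both values are positive and of comparable magnitude, which is the sense in which the two hemispheres show the same direction and a similar strength.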

Using SAS

SAS Code

/* Using SAS Educational Virtual Machine running locally */
/* TERMSTR = CRLF expects Windows-style line endings; */
/* use TERMSTR = LF for a CSV file with Unix/MacOS line endings */
FILENAME CSV "/folders/myfolders/marscrater_pds.csv" TERMSTR = CRLF;

PROC IMPORT
    DATAFILE = CSV
    OUT      = WORK
    DBMS     = CSV
    REPLACE;
RUN;

/* Unassign the file reference.  */
FILENAME CSV;

DATA WORK;
    SET WORK;

    /* Moderator (?) */
    IF LATITUDE_CIRCLE_IMAGE < 0
        THEN Hemisphere = "South";
        ELSE Hemisphere = "North";

    /* Explanatory - Categorical */
    IF DEPTH_RIMFLOOR_TOPOG > 0
        THEN Big_Depth = 1;
        ELSE Big_Depth = 0;

    /* Response - Categorical */
    IF NUMBER_LAYERS > 0
        THEN Has_Layers = 1;
        ELSE Has_Layers = 0;

    LABEL
        Hemisphere     = "Hemisphere"
        Big_Depth      = "Big Depth"
        Has_Layers     = "Has Layers";
RUN;

/* Sort by Hemisphere, as required by the BY statement in PROC FREQ below */
PROC SORT ;
    BY Hemisphere;
RUN;

PROC SUMMARY PRINT;
    CLASS Hemisphere Big_Depth Has_Layers;
RUN;

PROC FREQ ;
    TABLES Has_Layers * Big_Depth / CHISQ;
    BY Hemisphere;
RUN;

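PROC SGPANEL ;
    TITLE   "Mars' Craters - Has Layers by Big Depth within Hemisphere";
    PANELBY Hemisphere;
    /* STAT = MEAN of the 0/1 Has_Layers flag plots the proportion of craters with layers; */
    /* STAT = FREQ would only count the craters in each Big_Depth group */
    VBAR    Big_Depth
    /       RESPONSE  = Has_Layers
            STAT      = MEAN
            FILLATTRS = (COLOR = GREY);
RUN;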

SAS Output

Results: W04-Testing a Potential Moderator-Local.sas

The SUMMARY Procedure

Summary statistics

Hemisphere   Big Depth   Has Layers    N Obs
North        0           0            121109
                         1              1161
             1           0             20117
                         1              8507
South        0           0            184766
                         1               503
             1           0             38620
                         1              9560

The FREQ Procedure

Hemisphere=North

Table of Has_Layers by Big_Depth
Cell contents: Frequency / Percent / Row Pct / Col Pct

Has_Layers (Has Layers)   Big_Depth (Big Depth) = 0         Big_Depth (Big Depth) = 1        Total
0                         121109 / 80.26 / 85.76 / 99.05    20117 / 13.33 / 14.24 / 70.28    141226 / 93.59
1                         1161 / 0.77 / 12.01 / 0.95        8507 / 5.64 / 87.99 / 29.72      9668 / 6.41
Total                     122270 / 81.03                    28624 / 18.97                    150894 / 100.00

Statistics for Table of Has_Layers by Big_Depth

Chi-Square Tests

Statistic                     DF        Value    Prob
Chi-Square                     1   32015.4059  <.0001
Likelihood Ratio Chi-Square    1   23875.3396  <.0001
Continuity Adj. Chi-Square     1   32010.6083  <.0001
Mantel-Haenszel Chi-Square     1   32015.1937  <.0001
Phi Coefficient                        0.4606
Contingency Coefficient                0.4184
Cramer's V                             0.4606

Fisher's Exact Test

Cell (1,1) Frequency (F)    121109
Left-sided Pr <= F          1.0000
Right-sided Pr >= F         <.0001
Table Probability (P)       <.0001
Two-sided Pr <= P           <.0001

Sample Size = 150894

The FREQ Procedure

Hemisphere=South

Table of Has_Layers by Big_Depth
Cell contents: Frequency / Percent / Row Pct / Col Pct

Has_Layers (Has Layers)   Big_Depth (Big Depth) = 0         Big_Depth (Big Depth) = 1        Total
0                         184766 / 79.15 / 82.71 / 99.73    38620 / 16.54 / 17.29 / 80.16    223386 / 95.69
1                         503 / 0.22 / 5.00 / 0.27          9560 / 4.10 / 95.00 / 19.84      10063 / 4.31
Total                     185269 / 79.36                    48180 / 20.64                    233449 / 100.00

Statistics for Table of Has_Layers by Big_Depth

Chi-Square Tests

Statistic                     DF        Value    Prob
Chi-Square                     1   35505.3319  <.0001
Likelihood Ratio Chi-Square    1   28007.6076  <.0001
Continuity Adj. Chi-Square     1   35500.5874  <.0001
Mantel-Haenszel Chi-Square     1   35505.1798  <.0001
Phi Coefficient                        0.3900
Contingency Coefficient                0.3633
Cramer's V                             0.3900

Fisher's Exact Test

Cell (1,1) Frequency (F)    184766
Left-sided Pr <= F          1.0000
Right-sided Pr >= F         <.0001
Table Probability (P)       <.0001
Two-sided Pr <= P           <.0001

Sample Size = 233449

The SGPANEL Procedure

[Figure: Mars' Craters - Has Layers by Big Depth within Hemisphere, one panel per Hemisphere]