Summary
Specialisation | Data Analysis and Interpretation
Course | Data Analysis Tools
Education Institution | Wesleyan University
Publisher | Coursera
Assignment | Run an analysis of variance
Introduction
Mars has long fascinated people because of its proximity and its similarities to Earth. Beyond that fascination, every fact collected about the planet helps scientists put together a big jigsaw puzzle.
NASA's recent announcement of evidence of flowing liquid water on the surface of Mars adds to what is known and to the curiosity, or even the need, to find out much more.
The Data Set
With all the talk about Mars in both science and fiction (Ridley Scott's movie The Martian, based on Andy Weir's book of the same name, opens in early October 2015), the data set chosen for this assignment is Mars Craters.
The Mars Craters Study presents a global database of over 300,000 Mars craters 1 km or larger in diameter that were created between 4.2 and 3.8 billion years ago during a period of heavy bombardment (i.e. impacts of asteroids, proto-planets, and comets).
The data set was made available by Wesleyan University/Coursera as part of the Data Management and Visualisation course of the Data Analysis and Interpretation Specialisation. It comes from the Ph.D. thesis Planetary Surface Properties, Cratering Physics, and the Volcanic History of Mars from a New Global Martian Crater Database (2011) by Robbins, S.J., University of Colorado at Boulder.
Topic of Interest
The data set provides a catalogue of craters on Mars. The initial idea is to check for patterns that could identify specific major events that might have happened and had a significant impact on Mars’ geology, climate and life as a planetary body.
Codebook
As the initial data set has only nine variables, they could all be relevant to formulating hypotheses and reaching a conclusion, so all the variables will be kept for this assignment.
Variables
- CRATER_ID: crater ID for internal use, based upon the region of the planet (\({1 \over 16}\)), the “pass” under which the crater was identified, and the order in which it was identified
- LATITUDE_CIRCLE_IMAGE: latitude from the derived centre of a non-linear least-squares circle fit to the vertices selected to manually identify the crater rim (units are decimal degrees North)
- LONGITUDE_CIRCLE_IMAGE: longitude from the derived centre of a non-linear least-squares circle fit to the vertices selected to manually identify the crater rim (units are decimal degrees East)
- DIAM_CIRCLE_IMAGE: diameter from a non-linear least squares circle fit to the vertices selected to manually identify the crater rim (units are km)
- DEPTH_RIMFLOOR_TOPOG: average elevation of each of the manually determined N points along (or inside) the crater rim (units are km)
  - Depth Rim: Points are selected as relative topographic highs under the assumption they are the least eroded so most original points along the rim
  - Depth Floor: Points were chosen as the lowest elevation that did not include visible embedded craters
- MORPHOLOGY_EJECTA_1: ejecta morphology classified.
  - If there are multiple values, separated by a “/”, then the order is the inner-most ejecta through the outer-most, or the top-most through the bottom-most
- MORPHOLOGY_EJECTA_2: the morphology of the layer(s) itself/themselves. This classification system is unique to this work.
- MORPHOLOGY_EJECTA_3: overall texture and/or shape of some of the layer(s)/ejecta that are generally unique and deserve separate morphological classification.
- NUMBER_LAYERS: the maximum number of cohesive layers in any azimuthal direction that could be reliably identified
Hypothesis
Are the diameter and depth of bigger craters associated with the Hemisphere where they are located?
Bigger Craters
For this analysis, a large diameter means a crater more than 50 km wide, and a big depth means a crater more than 1 km deep.
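The grouping rules above can be sketched as a small Python function. This is only an illustration of the logic: the names `classify_crater`, `CraterGroups`, `DIAMETER_KM_MIN` and `DEPTH_KM_MIN` are hypothetical, and the example crater values are made up rather than taken from the data set. A crater exactly on the equator is counted as North, which matches the `IF LATITUDE_CIRCLE_IMAGE < 0` test used in the SAS code.

```python
from typing import NamedTuple, Optional

DIAMETER_KM_MIN = 50.0  # "large diameter" threshold stated in the text
DEPTH_KM_MIN = 1.0      # "big depth" threshold stated in the text


class CraterGroups(NamedTuple):
    hemisphere: str
    large_diameter: Optional[float]  # None when the crater is not "large"
    big_depth: Optional[float]       # None when the crater is not "deep"


def classify_crater(latitude: float, diameter_km: float, depth_km: float) -> CraterGroups:
    """Apply the hemisphere split and the two size thresholds to one crater."""
    hemisphere = "South" if latitude < 0 else "North"
    large_diameter = diameter_km if diameter_km > DIAMETER_KM_MIN else None
    big_depth = depth_km if depth_km > DEPTH_KM_MIN else None
    return CraterGroups(hemisphere, large_diameter, big_depth)


# Made-up example: a wide but shallow crater in the southern hemisphere.
print(classify_crater(latitude=-12.5, diameter_km=82.0, depth_km=0.4))
```

Values that fail a threshold are set to `None`, playing the role of the SAS missing value, so such craters drop out of the corresponding analysis instead of being treated as zero.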
Using SAS
SAS Code
/* Using SAS Educational Virtual Machine running locally */
/* For CSV Files uploaded from Unix/MacOS */
FILENAME CSV "/folders/myfolders/marscrater_pds.csv" TERMSTR=LF;
PROC IMPORT DATAFILE=CSV
    OUT=WORK
    DBMS=CSV
    REPLACE;
RUN;
DATA WORK;                  /* Configure the data */
    SET WORK;               /* Data set */
    N_LAYERS = VAR10;       /* Fixing variable identification by SAS */
    /* Split the craters by hemisphere (the equator counts as North) */
    IF LATITUDE_CIRCLE_IMAGE < 0
        THEN Hemisphere = "South";
        ELSE Hemisphere = "North";
    /* Keep the diameter only for craters wider than 50 km */
    IF DIAM_CIRCLE_IMAGE > 50
        THEN LARGE_DIAMETER = DIAM_CIRCLE_IMAGE;
        ELSE LARGE_DIAMETER = .;
    /* Keep the depth only for craters deeper than 1 km */
    IF DEPTH_RIMFLOOR_TOPOG > 1
        THEN BIG_DEPTH = DEPTH_RIMFLOOR_TOPOG;
        ELSE BIG_DEPTH = .;
    LABEL Hemisphere = "Hemisphere"
          N_LAYERS = "Layers"
          LARGE_DIAMETER = "Diameter"
          BIG_DEPTH = "Depth";
RUN;
/* Order the data by crater ID */
PROC SORT;
    BY CRATER_ID;
RUN;
/* Show some basic statistics for Diameter and Depth: */
/* first overall, then by Hemisphere */
PROC MEANS;
    VARIABLES LARGE_DIAMETER BIG_DEPTH;
RUN;
PROC MEANS;
    CLASS Hemisphere;
    VARIABLES LARGE_DIAMETER BIG_DEPTH;
RUN;
/* One-way ANOVA of Diameter on Hemisphere. */
/* Hemisphere has only two levels, so no post hoc test is needed; */
/* the MEANS statement prints the group means. */
PROC ANOVA;
    CLASS Hemisphere;
    MODEL LARGE_DIAMETER = Hemisphere;
    MEANS Hemisphere;
RUN;
QUIT;
/* One-way ANOVA of Depth on Hemisphere */
PROC ANOVA;
    CLASS Hemisphere;
    MODEL BIG_DEPTH = Hemisphere;
    MEANS Hemisphere;
RUN;
QUIT;
/* Unassign the file reference. */
FILENAME CSV;
SAS Output
The MEANS Procedure
Summary statistics
Hemisphere | N Obs
North | 150894
South | 233449
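The ANOVA Procedure
Analysis of Variance
Dependent Variable: LARGE_DIAMETER
Overall ANOVA
Source | DF | Sum of Squares | Mean Square | F Value | Pr > F
Model | 1 | 123.281 | 123.281 | 0.04 | 0.8399
Error | 2068 | 6245267.081 | 3019.955 | |
Corrected Total | 2069 | 6245390.362 | | |
Fit Statistics
R-Square | Coeff Var | Root MSE | LARGE_DIAMETER Mean
0.000020 | 68.85000 | 54.95412 | 79.81717
Anova Model ANOVA
Source | DF | Anova SS | Mean Square | F Value | Pr > F
Hemisphere | 1 | 123.2814444 | 123.2814444 | 0.04 | 0.8399
Box Plot: LARGE_DIAMETER by Hemisphere
Means
Level of Hemisphere | N | LARGE_DIAMETER Mean | LARGE_DIAMETER Std Dev
North | 572 | 79.4222378 | 43.0653131
South | 1498 | 79.9679706 | 58.8595601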
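As a reading aid, the figures in the LARGE_DIAMETER output are tied together by the standard one-way ANOVA identities. The worked arithmetic below uses only values printed in that output; the symbols \(MS\) and \(SS\) (mean square and sum of squares) are conventional shorthand, not SAS labels.

\[
F = \frac{MS_{\text{Model}}}{MS_{\text{Error}}} = \frac{123.281}{3019.955} \approx 0.04,
\qquad
R^2 = \frac{SS_{\text{Model}}}{SS_{\text{Total}}} = \frac{123.281}{6245390.362} \approx 0.000020
\]

\[
\text{Root MSE} = \sqrt{MS_{\text{Error}}} = \sqrt{3019.955} \approx 54.954,
\qquad
\text{Coeff Var} = 100 \times \frac{54.95412}{79.81717} \approx 68.85
\]

An \(F\) value this close to zero, with a p-value of 0.8399, means the difference between the two hemisphere means is tiny compared with the spread of diameters inside each hemisphere.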
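The ANOVA Procedure
Analysis of Variance
Dependent Variable: BIG_DEPTH
Overall ANOVA
Source | DF | Sum of Squares | Mean Square | F Value | Pr > F
Model | 1 | 0.1252827 | 0.1252827 | 0.93 | 0.3360
Error | 4806 | 650.4345825 | 0.1353380 | |
Corrected Total | 4807 | 650.5598652 | | |
Fit Statistics
R-Square | Coeff Var | Root MSE | BIG_DEPTH Mean
0.000193 | 26.54980 | 0.367883 | 1.385634
Anova Model ANOVA
Source | DF | Anova SS | Mean Square | F Value | Pr > F
Hemisphere | 1 | 0.12528274 | 0.12528274 | 0.93 | 0.3360
Box Plot: BIG_DEPTH by Hemisphere
Means
Level of Hemisphere | N | BIG_DEPTH Mean | BIG_DEPTH Std Dev
North | 1480 | 1.37797973 | 0.35730327
South | 3328 | 1.38903846 | 0.37248995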
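Both ANOVA results can be cross-checked outside SAS from the group counts, means and standard deviations in the two Means tables. The sketch below is not part of the assignment code, and the names `one_way_anova` and `approx_p_value` are hypothetical. The p-value uses a normal approximation, which is an assumption that holds here only because there are two groups (so \(F = t^2\)) and thousands of error degrees of freedom; SAS itself uses the exact F distribution.

```python
import math


def one_way_anova(groups):
    """One-way ANOVA from summary statistics.

    groups: list of (n, mean, std_dev) tuples, one per class level.
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    # Between-group (model) and within-group (error) sums of squares.
    ss_model = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    ss_error = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_model = len(groups) - 1
    df_error = total_n - len(groups)
    f_value = (ss_model / df_model) / (ss_error / df_error)
    return {"ss_model": ss_model, "ss_error": ss_error,
            "df_model": df_model, "df_error": df_error, "f_value": f_value}


def approx_p_value(f_value):
    """Normal approximation to Pr > F, valid only for 1 model df and large error df."""
    return math.erfc(math.sqrt(f_value / 2.0))


# (n, mean, std_dev) per group, copied from the SAS Means tables.
# Group order is assumed to be North, South; the result does not depend on it.
diameter = [(572, 79.4222378, 43.0653131), (1498, 79.9679706, 58.8595601)]
depth = [(1480, 1.37797973, 0.35730327), (3328, 1.38903846, 0.37248995)]

for name, groups in (("LARGE_DIAMETER", diameter), ("BIG_DEPTH", depth)):
    result = one_way_anova(groups)
    p_value = approx_p_value(result["f_value"])
    print(f"{name}: F = {result['f_value']:.2f}, approximate p = {p_value:.4f}")
```

Up to rounding, the recomputed sums of squares and F values should agree with the SAS tables (F of 0.04 for diameter and 0.93 for depth). Both p-values are far above 0.05, so neither analysis shows a significant association between hemisphere and the diameter or depth of the bigger craters.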