Specialisation | Data Analysis and Interpretation |
Course | Data Management and Visualisation |
Education Institution | Wesleyan University |
Publisher | Coursera |
Assignment | Running Your First Program |
The Mars Craters data set was made available by Wesleyan University/Coursera as part of the Data Management and Visualisation course of the Data Analysis and Interpretation Specialisation. It comes from the Ph.D. thesis "Planetary Surface Properties, Cratering Physics, and the Volcanic History of Mars from a New Global Martian Crater Database" (2011) by Robbins, S.J., University of Colorado at Boulder.
The data set has a total of 384343 observations and 10 variables.
The variables are: CRATER_ID, CRATER_NAME, LATITUDE_CIRCLE_IMAGE, LONGITUDE_CIRCLE_IMAGE, DIAM_CIRCLE_IMAGE, DEPTH_RIMFLOOR_TOPOG, MORPHOLOGY_EJECTA_1, MORPHOLOGY_EJECTA_2, MORPHOLOGY_EJECTA_3 and NUMBER_LAYERS.
Hemisphere is a variable derived from LATITUDE_CIRCLE_IMAGE. For the sake of brevity, it collapses the continuous latitude into three categories: North, South and Equator.
Seven craters fall on the Equator, i.e. they have a latitude of exactly zero. Just above 60% of the observations (60.74%) are located in the Southern Hemisphere, and no observation is missing a latitude.
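The same categorisation can also be written without calling a Python function once per row. The following is a minimal sketch using `numpy.select`. It assumes a small hypothetical set of latitudes in place of the real LATITUDE_CIRCLE_IMAGE column:

```python
import numpy as np
import pandas as pd

# Hypothetical latitudes standing in for LATITUDE_CIRCLE_IMAGE
latitudes = pd.Series([84.37, -12.5, 0.0, -45.1, 0.0, 33.2])

# Conditions are checked in order; rows matching neither condition
# (latitude exactly zero, or NaN) fall back to the default "Equator"
hemisphere = pd.Series(
    np.select(
        [latitudes > 0, latitudes < 0],
        ["North", "South"],
        default="Equator",
    ),
    index=latitudes.index,
)

print(hemisphere.value_counts())
print(hemisphere.value_counts(normalize=True).round(4))
```

Like the `Hemisphere` function in the Python program, this sketch labels a missing latitude as "Equator". That choice is harmless here because no latitude is missing.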
The variable MORPHOLOGY_EJECTA_1 has 339718 out of 384343 values missing, or 88.4%. When the full morphology classification is considered, the recorded values are spread over a large number of categories. If only the first classification is kept (the main feature, i.e. the text before the first "/"), the number of categories drops to 31.
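Keeping only the text before the first "/" can also be done with the pandas string accessor instead of a per-row function. The following is a minimal sketch under that approach. The morphology strings below are hypothetical sample values, and `None` stands in for a missing entry:

```python
import pandas as pd

# Hypothetical MORPHOLOGY_EJECTA_1 values; None marks a missing entry
morpho = pd.Series(["Rd", "SLERS/Rd", None, "DLEPC/Pd/Sp", None, "SLEPS"])

# Keep only the text before the first "/" (the main feature);
# missing entries stay missing because the .str accessor skips them
main_feature = morpho.str.split("/").str[0]

# Share of missing values and number of distinct main features (NaN excluded)
missing_share = main_feature.isnull().mean()
print(main_feature.tolist())
print(f"Missing: {missing_share:.1%}")
print("Distinct main features:", main_feature.nunique())
```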
Among the recorded values, grouped by their first classification, 27068 (60.66%) belong to the Rd category. The only two other categories above 10% are SLERS (5110, 11.45%) and SLEPS (4998, 11.20%).
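These percentages are taken over the recorded (non-missing) values, 384343 − 339718 = 44625, not over all craters. The quick check below uses only counts quoted in this section:

```python
# Counts taken from the frequency tables in this section
total_obs = 384343
missing_morpho = 339718
recorded = total_obs - missing_morpho  # non-missing MORPHOLOGY_EJECTA_1 values

counts = {"Rd": 27068, "SLERS": 5110, "SLEPS": 4998}

print(f"Recorded: {recorded}")
print(f"Missing share: {missing_morpho / total_obs:.2%}")
for category, count in counts.items():
    # Share among recorded values versus share among all craters
    print(f"{category}: {count / recorded:.2%} of recorded, {count / total_obs:.2%} of all")
```

Measured against all 384343 craters, Rd would account for only about 7%. The denominator therefore matters when the results are reported.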
The NUMBER_LAYERS variable has six categories (0, 1, 2, 3, 4 and 5), and none of its observations are missing. The vast majority of craters are identified as having “0” layers: 364612 craters, or 94.87% of the records.
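SAS PROC FREQ reports cumulative frequencies and percentages alongside the counts, and it excludes missing values from the percentages. The following is a minimal pandas sketch of a table in that spirit. `freq_table` is a hypothetical helper, not a pandas function, and the NUMBER_LAYERS values are a small hypothetical sample:

```python
import pandas as pd


def freq_table(values: pd.Series) -> pd.DataFrame:
    """Hypothetical PROC FREQ-style table; missing values are excluded."""
    # Count each category (NaN dropped by default) and order rows by category value
    freq = values.value_counts().sort_index()
    total = freq.sum()
    table = pd.DataFrame({"Frequency": freq})
    table["Percent"] = (100 * table["Frequency"] / total).round(2)
    table["Cumulative Frequency"] = table["Frequency"].cumsum()
    table["Cumulative Percent"] = (100 * table["Cumulative Frequency"] / total).round(2)
    return table


# Hypothetical NUMBER_LAYERS values standing in for the real column
layers = pd.Series([0, 0, 1, 0, 2, 0, 1, 0, 0, 3])
print(freq_table(layers))
```

Sorting by index rather than by count keeps the rows in layer order (0, 1, 2, ...), which matches how the SAS output lists NUMBER_LAYERS.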
/* Use the course's library */
LIBNAME mydata "/courses/d1406ae5ba27fe300" ACCESS = readonly;
/* Configure the data */
DATA NEW;
/* Data set */
SET mydata.marscrater_pds;
/* Set the length explicitly; otherwise it is taken from the first value assigned ("North", 5 characters) and "Equator" would be truncated */
LENGTH Hemisphere $ 7;
LABEL Hemisphere = "Hemisphere"
MorphoE1 = "Ejecta Morphology 1 (Grouped by Main Feature)"
NUMBER_LAYERS = "Maximum Number of Cohesive Layers";
/* Categorise the latitude into hemispheres */
IF (LATITUDE_CIRCLE_IMAGE > 0) THEN Hemisphere = "North";
ELSE IF (LATITUDE_CIRCLE_IMAGE = 0) THEN Hemisphere = "Equator";
ELSE Hemisphere = "South";
/* Collapse the morphology of Ejecta 1 to its main feature (text before the first "/"), to reduce the output */
IF (INDEX(MORPHOLOGY_EJECTA_1, "/") = 0)
THEN MorphoE1 = MORPHOLOGY_EJECTA_1;
ELSE MorphoE1 = SUBSTR(MORPHOLOGY_EJECTA_1, 1, INDEX(MORPHOLOGY_EJECTA_1, "/") - 1);
RUN;
/* Sort by crater ID */
PROC SORT DATA = NEW;
BY CRATER_ID;
RUN;
/* Calculate frequencies and proportions */
PROC FREQ DATA = NEW;
TABLE Hemisphere MorphoE1 NUMBER_LAYERS;
RUN;
Hemisphere | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
---|---|---|---|---|
Equator | 7 | 0.00 | 7 | 0.00 |
North | 150887 | 39.26 | 150894 | 39.26 |
South | 233449 | 60.74 | 384343 | 100.00 |
MorphoE1 | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
---|---|---|---|---|
DLEPC | 495 | 1.11 | 495 | 1.11 |
DLEPCPd | 10 | 0.02 | 505 | 1.13 |
DLEPS | 631 | 1.41 | 1136 | 2.55 |
DLEPSPd | 2 | 0.00 | 1138 | 2.55 |
DLEPd | 1 | 0.00 | 1139 | 2.55 |
DLERC | 386 | 0.86 | 1525 | 3.42 |
DLERCPd | 7 | 0.02 | 1532 | 3.43 |
DLERS | 1242 | 2.78 | 2774 | 6.22 |
DLERSRd | 2 | 0.00 | 2776 | 6.22 |
DLSPC | 1 | 0.00 | 2777 | 6.22 |
MLEPC | 22 | 0.05 | 2799 | 6.27 |
MLEPS | 43 | 0.10 | 2842 | 6.37 |
MLERC | 24 | 0.05 | 2866 | 6.42 |
MLERS | 491 | 1.10 | 3357 | 7.52 |
MLERSRd | 1 | 0.00 | 3358 | 7.52 |
Pd | 2 | 0.00 | 3360 | 7.53 |
RD | 1 | 0.00 | 3361 | 7.53 |
Rd | 27068 | 60.66 | 30429 | 68.19 |
SLEPC | 2601 | 5.83 | 33030 | 74.02 |
SLEPCPd | 75 | 0.17 | 33105 | 74.18 |
SLEPCRd | 2 | 0.00 | 33107 | 74.19 |
SLEPS | 4998 | 11.20 | 38105 | 85.39 |
SLEPSPd | 52 | 0.12 | 38157 | 85.51 |
SLEPSRd | 3 | 0.01 | 38160 | 85.51 |
SLEPd | 44 | 0.10 | 38204 | 85.61 |
SLERC | 1280 | 2.87 | 39484 | 88.48 |
SLERCPd | 10 | 0.02 | 39494 | 88.50 |
SLERS | 5110 | 11.45 | 44604 | 99.95 |
SLERSPd | 16 | 0.04 | 44620 | 99.99 |
SLERSRd | 4 | 0.01 | 44624 | 100.00 |
SLErS | 1 | 0.00 | 44625 | 100.00 |
Frequency Missing = 339718
NUMBER_LAYERS | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
---|---|---|---|---|
0 | 364612 | 94.87 | 364612 | 94.87 |
1 | 15467 | 4.02 | 380079 | 98.89 |
2 | 3435 | 0.89 | 383514 | 99.78 |
3 | 739 | 0.19 | 384253 | 99.98 |
4 | 85 | 0.02 | 384338 | 100.00 |
5 | 5 | 0.00 | 384343 | 100.00 |
"""
Created on Tue Sep 29 18:12:40 2015
@author: angeloklin
"""
# Import libraries
import pandas as pd
import numpy as np
# Load data; cells containing a single blank (" ") are read as missing values
data = pd.read_csv("marscrater_pds.csv", na_values = [" "], low_memory = False)
# Function to return the hemisphere of a latitude
def Hemisphere(Latitude):
    if Latitude > 0:
        return "North"
    elif Latitude < 0:
        return "South"
    else:
        return "Equator"
# Function to get the morphology's main feature (the text before the first "/")
def MainMorpho(Morpho):
    if pd.isnull(Morpho):
        return Morpho
    foundAt = Morpho.find("/")
    if foundAt >= 0:
        return Morpho[0:foundAt]
    else:
        return Morpho
print("Mars Craters' data set summary:")
print("- Number of observations(rows): ", len(data))
print("- Number of variables(columns): ", len(data.columns))
print("")
print("Hemispheres:")
Hemispheres = data["LATITUDE_CIRCLE_IMAGE"].map(Hemisphere)
freq = Hemispheres.value_counts(sort = True)
prop = Hemispheres.value_counts(sort = True, normalize = True)
print("- Missing Values: ", Hemispheres.isnull().sum())
print("- Frequency Table: ")
print("| | Frequency | Proportion |")
# Iterate by label so each count is paired with its own proportion
for category, count in freq.items():
    print("|", format(category, "<10s"), "| ", format(count, ">6d"), "| ", format(prop[category], ">.4f"), "|")
print("")
print("Ejecta Morphology 1 (Group by Main Feature):")
MorphoE1 = data["MORPHOLOGY_EJECTA_1"].map(MainMorpho)
# Proportions are computed over the non-missing values only
MorphoE1a = MorphoE1[MorphoE1.notnull()]
freq = MorphoE1a.value_counts(sort = True)
prop = MorphoE1a.value_counts(sort = True, normalize = True)
print("- Missing Values: ", MorphoE1.isnull().sum())
print("- Frequency Table: ")
print("| | Frequency | Proportion |")
for category, count in freq.items():
    print("|", format(category, "<10s"), "| ", format(count, ">6d"), "| ", format(prop[category], ">.4f"), "|")
print("")
print("Maximum Number of Cohesive Layers:")
# Sort by the number of layers so rows come out in order 0, 1, 2, ...
freq = data["NUMBER_LAYERS"].value_counts().sort_index()
prop = data["NUMBER_LAYERS"].value_counts(normalize = True).sort_index()
print("- Missing Values: ", data["NUMBER_LAYERS"].isnull().sum())
print("- Frequency Table: ")
print("| | Frequency | Proportion |")
# Look up counts and proportions by label, not by position
for layers, count in freq.items():
    print("|", format(layers, "<6d"), "| ", format(count, ">6d"), "| ", format(prop[layers], ">.4f"), "|")
print("")
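Mars Craters’ data set summary:
- Number of observations(rows): 384343
- Number of variables(columns): 10
Hemispheres:
- Missing Values: 0
- Frequency Table:
Hemisphere | Frequency | Proportion |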
---|---|---|
South | 233449 | 0.6074 |
North | 150887 | 0.3926 |
Equator | 7 | 0.0000 |
Ejecta Morphology 1 (Group by Main Feature):
- Missing Values: 339718
- Frequency Table:
MorphoE1 | Frequency | Proportion |
---|---|---|
Rd | 27068 | 0.6066 |
SLERS | 5110 | 0.1145 |
SLEPS | 4998 | 0.1120 |
SLEPC | 2601 | 0.0583 |
SLERC | 1280 | 0.0287 |
DLERS | 1242 | 0.0278 |
DLEPS | 631 | 0.0141 |
DLEPC | 495 | 0.0111 |
MLERS | 491 | 0.0110 |
DLERC | 386 | 0.0086 |
SLEPCPd | 75 | 0.0017 |
SLEPSPd | 52 | 0.0012 |
SLEPd | 44 | 0.0010 |
MLEPS | 43 | 0.0010 |
MLERC | 24 | 0.0005 |
MLEPC | 22 | 0.0005 |
SLERSPd | 16 | 0.0004 |
SLERCPd | 10 | 0.0002 |
DLEPCPd | 10 | 0.0002 |
DLERCPd | 7 | 0.0002 |
SLERSRd | 4 | 0.0001 |
SLEPSRd | 3 | 0.0001 |
SLEPCRd | 2 | 0.0000 |
DLEPSPd | 2 | 0.0000 |
DLERSRd | 2 | 0.0000 |
Pd | 2 | 0.0000 |
RD | 1 | 0.0000 |
MLERSRd | 1 | 0.0000 |
DLEPd | 1 | 0.0000 |
SLErS | 1 | 0.0000 |
DLSPC | 1 | 0.0000 |
Maximum Number of Cohesive Layers:
- Missing Values: 0
- Frequency Table:
NUMBER_LAYERS | Frequency | Proportion |
---|---|---|
0 | 364612 | 0.9487 |
1 | 15467 | 0.0402 |
2 | 3435 | 0.0089 |
3 | 739 | 0.0019 |
4 | 85 | 0.0002 |
5 | 5 | 0.0000 |