Summary

Specialisation: Data Analysis and Interpretation
Course: Data Management and Visualisation
Education Institution: Wesleyan University
Publisher: Coursera
Assignment: Running Your Second Program

Exploratory Data Analysis

Source

The Mars Craters data set was made available by Wesleyan University/Coursera as part of the Data Management and Visualisation course of the Data Analysis and Interpretation Specialisation. It originates from the Ph.D. thesis “Planetary Surface Properties, Cratering Physics, and the Volcanic History of Mars from a New Global Martian Crater Database” (2011) by S.J. Robbins, University of Colorado at Boulder.

Size

The data set has a total of 384343 observations and 10 variables.

The variables are: CRATER_ID, CRATER_NAME, LATITUDE_CIRCLE_IMAGE, LONGITUDE_CIRCLE_IMAGE, DIAM_CIRCLE_IMAGE, DEPTH_RIMFLOOR_TOPOG, MORPHOLOGY_EJECTA_1, MORPHOLOGY_EJECTA_2, MORPHOLOGY_EJECTA_3 and NUMBER_LAYERS.

Univariate Analysis

Hemisphere

Hemisphere is a variable derived from the LATITUDE_CIRCLE_IMAGE variable to transform the continuous coordinates into categories, for the sake of brevity.

Seven craters lie exactly on the Equator, that is, their latitude equals zero. Just over 60% of the observations are located in the Southern Hemisphere. No observation has a missing value.

Quadrangle

Quadrangle is a variable derived from both the LATITUDE_CIRCLE_IMAGE and LONGITUDE_CIRCLE_IMAGE variables (see the Wikipedia definition below).

Each Quadrangle holds roughly one to five percent of the recorded craters. MC-16: Memnonia has the most observations (20455 = 5.32%) and MC-10: Lunae Palus the fewest (3478 = 0.90%).
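
The latitude and longitude conditions used in the code of this post amount to a lookup on a grid: six latitude bands, each split into equal longitude sectors. The sketch below illustrates that idea, assuming the same band edges (-65, -30, 0, 30, 65) and a longitude already shifted into the 0 to 360 range. The names BAND_EDGES, SECTOR_WIDTH and grid_cell are illustrative and not part of the original program.

```python
from bisect import bisect_right

# Latitude band edges in degrees (assumed from the IF conditions in this post).
BAND_EDGES = [-65, -30, 0, 30, 65]
# Width in degrees of one longitude sector in each band, from south to north.
SECTOR_WIDTH = [360, 60, 45, 45, 60, 360]


def grid_cell(latitude, shifted_longitude):
    """Return (band, sector) indices for one crater.

    band runs from 0 (south polar) to 5 (north polar); sector counts the
    longitude slices of that band, starting at shifted longitude 0.
    """
    band = bisect_right(BAND_EDGES, latitude)
    width = SECTOR_WIDTH[band]
    last_sector = 360 // width - 1
    # A shifted longitude of exactly 360 falls into the last sector.
    sector = min(int(shifted_longitude // width), last_sector)
    return band, sector


# Number of cells in the grid: 1 + 6 + 8 + 8 + 6 + 1
cell_count = sum(360 // width for width in SECTOR_WIDTH)

print(cell_count)
print(grid_cell(10.0, 100.0))
print(grid_cell(-70.0, 10.0))
```

Each (band, sector) pair can then be mapped to a quadrangle name through a dictionary, which would replace the long chain of conditions with a single lookup.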

List of quadrangles on Mars (Wikipedia):

The surface of Mars has been divided into 30 quadrangles by the United States Geological Survey, so named because their borders lie along lines of latitude and longitude and so maps appear rectangular. Martian quadrangles are named after local features and are numbered with the prefix “MC” for “Mars Chart”. West longitude is used.

Ejecta Morphology 1 (Group by Main Feature)

The variable MORPHOLOGY_EJECTA_1 is missing in 339718 of the 384343 observations, or 88.4%. Taking the full morphology qualification, the recorded values fall into a large number of categories. Taking only the first classification, the number of categories drops to 29.
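
As a minimal sketch of what "taking only the first classification" means, the function below keeps the text before the first "/" and normalises it. The function name main_feature and the sample strings are made up for illustration and are not taken from the data set. The last lines recompute the share of missing values from the two counts quoted above.

```python
def main_feature(morphology):
    """Return the part before the first "/", trimmed and upper-cased."""
    if morphology is None:
        return None
    return morphology.split("/", 1)[0].strip().upper()


# Hypothetical raw values, for illustration only.
examples = ["Rd/SLEPS", " SLERS ", "SLEPC/SLEPC/Rd", None]
print([main_feature(value) for value in examples])

# Share of missing values, from the counts quoted in the text.
missing, total = 339718, 384343
missing_percent = round(100 * missing / total, 1)
print(missing_percent)
```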

Among the recorded values, considering only the first classification, 27069, or 60.7%, belong to the RD category. Only two other categories exceed 10%: SLERS (5111 = 11.45%) and SLEPS (4998 = 11.20%).
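
The percentages in this section use the recorded (non-missing) values as the denominator, not the full data set. A short sketch of that arithmetic, using the counts of the four largest categories from the output tables of this post; the variable names are illustrative:

```python
# Counts of the four largest first-classification categories,
# as listed in the output tables of this post.
counts = {"RD": 27069, "SLERS": 5111, "SLEPS": 4998, "SLEPC": 2601}

total = 384343
missing = 339718
recorded = total - missing  # observations that actually have a value

# Percentages relative to the recorded values only.
percent = {name: 100 * n / recorded for name, n in counts.items()}
above_10 = [name for name, p in percent.items() if p > 10]

for name, p in percent.items():
    print(name, format(p, ".2f"))
print(above_10)
```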

Ejecta Morphology 2 (Group by Main Feature)

The variable MORPHOLOGY_EJECTA_2 is missing in 364867 of the 384343 observations, or 94.9%. Taking the full morphology qualification, the recorded values fall into a large number of categories. Taking only the first classification, the number of categories drops to 10.

Among the recorded values, considering only the first classification, 6540, or 33.6%, belong to the HUSL category. Only two other categories exceed 10%: HUBL (4459 = 22.9%) and SMSL (2797 = 14.4%).

Ejecta Morphology 3 (Group by Main Feature)

The variable MORPHOLOGY_EJECTA_3 is missing in 383050 of the 384343 observations, or 99.7%. Taking the full morphology qualification, the recorded values fall into a large number of categories. Taking only the first classification, the number of categories drops to 24.

Among the recorded values, considering only the first classification, 351, or 27.1%, belong to the PIN-CUSHION category. The only other category above 10% is SMALL-CROWN (267 = 20.6%).

Maximum Number of Cohesive Layers

The NUMBER_LAYERS variable has six categories (0, 1, 2, 3, 4 and 5) and none of its observations are missing. The vast majority of craters are recorded with “0” layers: 364612, or 94.87% of the records.
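
The frequency, percent and cumulative columns reported for this variable can be rebuilt from the six counts alone. The following sketch does so with the standard library, using the counts from the output tables of this post; it is an illustration of the arithmetic, not the code that produced the results.

```python
from itertools import accumulate

# Number of craters per layer count, as listed in the output of this post.
counts = {0: 364612, 1: 15467, 2: 3435, 3: 739, 4: 85, 5: 5}

total = sum(counts.values())
cumulative = list(accumulate(counts.values()))  # running total of the counts

# One row per category: value, frequency, percent, cumulative frequency, cumulative percent.
rows = [
    (layers, n, 100 * n / total, cum, 100 * cum / total)
    for (layers, n), cum in zip(counts.items(), cumulative)
]

for layers, n, pct, cum, cum_pct in rows:
    print(layers, n, format(pct, ".2f"), cum, format(cum_pct, ".2f"))
```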

Using SAS

Code

/* Use Course's Library */
LIBNAME mydata "/courses/d1406ae5ba27fe300" ACCESS = readonly;

/* Configure the Data */
DATA NEW;
  /* Data set */
  SET   mydata.marscrater_pds;
  LABEL Hemisphere    = "Hemisphere"
        Quadrangle    = "Quadrangle"
        MorphoE1U     = "Ejecta Morphology 1 (Group by Main Feature)"
        MorphoE2U     = "Ejecta Morphology 2 (Group by Main Feature)"
        MorphoE3U     = "Ejecta Morphology 3 (Group by Main Feature)"
        NUMBER_LAYERS = "Maximum Number of Cohesive Layers";

  /* Categorise the Latitude in Hemispheres */
  IF (LATITUDE_CIRCLE_IMAGE < 0)
    THEN Hemisphere = "South  ";
    ELSE IF (LATITUDE_CIRCLE_IMAGE > 0)
      THEN Hemisphere = "North  ";
      ELSE Hemisphere = "Equator";

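  /* convert coordinates to Quadrangles: https://en.wikipedia.org/wiki/List_of_quadrangles_on_Mars */
  /* Set the length explicitly: by default the first assignment fixes it at 31 characters, which truncates the MC-30 name */
  LENGTH Quadrangle $ 33;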
  LA = LATITUDE_CIRCLE_IMAGE;
  LO = LONGITUDE_CIRCLE_IMAGE + 180;
  IF LA >=  65 AND LA <=  90 AND LO >=   0 AND LO <= 360 THEN Quadrangle = "MC-01: Mare Boreum (North Pole)";
  IF LA >=  30 AND LA <   65 AND LO >= 120 AND LO <  180 THEN Quadrangle = "MC-02: Diacria";
  IF LA >=  30 AND LA <   65 AND LO >=  60 AND LO <  120 THEN Quadrangle = "MC-03: Arcadia";
  IF LA >=  30 AND LA <   65 AND LO >=   0 AND LO <   60 THEN Quadrangle = "MC-04: Mare Acidalium";
  IF LA >=  30 AND LA <   65 AND LO >= 300 AND LO <= 360 THEN Quadrangle = "MC-05: Ismenius Lacus";
  IF LA >=  30 AND LA <   65 AND LO >= 240 AND LO <  300 THEN Quadrangle = "MC-06: Casius";
  IF LA >=  30 AND LA <   65 AND LO >= 180 AND LO <  240 THEN Quadrangle = "MC-07: Cebrenia";
  IF LA >=   0 AND LA <   30 AND LO >= 135 AND LO <  180 THEN Quadrangle = "MC-08: Amazonis";
  IF LA >=   0 AND LA <   30 AND LO >=  90 AND LO <  135 THEN Quadrangle = "MC-09: Tharsis";
  IF LA >=   0 AND LA <   30 AND LO >=  45 AND LO <   90 THEN Quadrangle = "MC-10: Lunae Palus";
  IF LA >=   0 AND LA <   30 AND LO >=   0 AND LO <   45 THEN Quadrangle = "MC-11: Oxia Palus";
  IF LA >=   0 AND LA <   30 AND LO >= 315 AND LO <= 360 THEN Quadrangle = "MC-12: Arabia";
  IF LA >=   0 AND LA <   30 AND LO >= 270 AND LO <  315 THEN Quadrangle = "MC-13: Syrtis Major";
  IF LA >=   0 AND LA <   30 AND LO >= 225 AND LO <  270 THEN Quadrangle = "MC-14: Amenthes";
  IF LA >=   0 AND LA <   30 AND LO >= 180 AND LO <  225 THEN Quadrangle = "MC-15: Elysium";
  IF LA >= -30 AND LA <    0 AND LO >= 135 AND LO <  180 THEN Quadrangle = "MC-16: Memnonia";
  IF LA >= -30 AND LA <    0 AND LO >=  90 AND LO <  135 THEN Quadrangle = "MC-17: Phoenicis Lacus";
  IF LA >= -30 AND LA <    0 AND LO >=  45 AND LO <   90 THEN Quadrangle = "MC-18: Coprates";
  IF LA >= -30 AND LA <    0 AND LO >=   0 AND LO <   45 THEN Quadrangle = "MC-19: Margaritifer Sinus";
  IF LA >= -30 AND LA <    0 AND LO >= 315 AND LO <= 360 THEN Quadrangle = "MC-20: Sinus Sabaeus";
  IF LA >= -30 AND LA <    0 AND LO >= 270 AND LO <  315 THEN Quadrangle = "MC-21: Iapygia";
  IF LA >= -30 AND LA <    0 AND LO >= 225 AND LO <  270 THEN Quadrangle = "MC-22: Mare Tyrrhenum";
  IF LA >= -30 AND LA <    0 AND LO >= 180 AND LO <  225 THEN Quadrangle = "MC-23: Aeolis";
  IF LA >= -65 AND LA <  -30 AND LO >= 120 AND LO <  180 THEN Quadrangle = "MC-24: Phaethontis";
  IF LA >= -65 AND LA <  -30 AND LO >=  60 AND LO <  120 THEN Quadrangle = "MC-25: Thaumasia";
  IF LA >= -65 AND LA <  -30 AND LO >=   0 AND LO <   60 THEN Quadrangle = "MC-26: Argyre";
  IF LA >= -65 AND LA <  -30 AND LO >= 300 AND LO <= 360 THEN Quadrangle = "MC-27: Noachis";
  IF LA >= -65 AND LA <  -30 AND LO >= 240 AND LO <  300 THEN Quadrangle = "MC-28: Hellas";
  IF LA >= -65 AND LA <  -30 AND LO >= 180 AND LO <  240 THEN Quadrangle = "MC-29: Eridania";
  IF LA >= -90 AND LA <  -65 AND LO >=   0 AND LO <= 360 THEN Quadrangle = "MC-30: Mare Australe (South Pole)";

  /* Collapse the Morphology of Ejecta 1 to its Main Feature, to reduce the output */
  IF (INDEX(MORPHOLOGY_EJECTA_1, "/") = 0)
    THEN MorphoE1 = MORPHOLOGY_EJECTA_1;
    ELSE MorphoE1 = SUBSTR(MORPHOLOGY_EJECTA_1, 1, INDEX(MORPHOLOGY_EJECTA_1, "/") - 1);
  MorphoE1U = UPCASE(TRIM(MorphoE1));

  /* Collapse the Morphology of Ejecta 2 to its Main Feature, to reduce the output */
  IF (INDEX(MORPHOLOGY_EJECTA_2, "/") = 0)
    THEN MorphoE2 = MORPHOLOGY_EJECTA_2;
    ELSE MorphoE2 = SUBSTR(MORPHOLOGY_EJECTA_2, 1, INDEX(MORPHOLOGY_EJECTA_2, "/") - 1);
  MorphoE2U = UPCASE(TRIM(MorphoE2));

  /* Collapse the Morphology of Ejecta 3 to its Main Feature, to reduce the output */
  IF (INDEX(MORPHOLOGY_EJECTA_3, "/") = 0)
    THEN MorphoE3 = MORPHOLOGY_EJECTA_3;
    ELSE MorphoE3 = SUBSTR(MORPHOLOGY_EJECTA_3, 1, INDEX(MORPHOLOGY_EJECTA_3, "/") - 1);
  MorphoE3U = UPCASE(TRIM(MorphoE3));

PROC SORT;
  BY CRATER_ID;
/* Calculate Frequencies and Proportions */
PROC FREQ;
  TABLE Hemisphere Quadrangle MorphoE1U MorphoE2U MorphoE3U NUMBER_LAYERS;
RUN;

Output

Hemisphere

| Hemisphere | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
| Equator | 7 | 0.00 | 7 | 0.00 |
| North | 150887 | 39.26 | 150894 | 39.26 |
| South | 233449 | 60.74 | 384343 | 100.00 |

Quadrangle

| Quadrangle | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
| MC-01: Mare Boreum (North Pole) | 6405 | 1.67 | 6405 | 1.67 |
| MC-02: Diacria | 9592 | 2.50 | 15997 | 4.16 |
| MC-03: Arcadia | 7835 | 2.04 | 23832 | 6.20 |
| MC-04: Mare Acidalium | 4436 | 1.15 | 28268 | 7.35 |
| MC-05: Ismenius Lacus | 6762 | 1.76 | 35030 | 9.11 |
| MC-06: Casius | 9068 | 2.36 | 44098 | 11.47 |
| MC-07: Cebrenia | 15028 | 3.91 | 59126 | 15.38 |
| MC-08: Amazonis | 16738 | 4.35 | 75864 | 19.74 |
| MC-09: Tharsis | 12921 | 3.36 | 88785 | 23.10 |
| MC-10: Lunae Palus | 3478 | 0.90 | 92263 | 24.01 |
| MC-11: Oxia Palus | 4666 | 1.21 | 96929 | 25.22 |
| MC-12: Arabia | 7640 | 1.99 | 104569 | 27.21 |
| MC-13: Syrtis Major | 13522 | 3.52 | 118091 | 30.73 |
| MC-14: Amenthes | 13937 | 3.63 | 132028 | 34.35 |
| MC-15: Elysium | 18866 | 4.91 | 150894 | 39.26 |
| MC-16: Memnonia | 20455 | 5.32 | 171349 | 44.58 |
| MC-17: Phoenicis Lacus | 14561 | 3.79 | 185910 | 48.37 |
| MC-18: Coprates | 5777 | 1.50 | 191687 | 49.87 |
| MC-19: Margaritifer Sinus | 17022 | 4.43 | 208709 | 54.30 |
| MC-20: Sinus Sabaeus | 17664 | 4.60 | 226373 | 58.90 |
| MC-21: Iapygia | 19422 | 5.05 | 245795 | 63.95 |
| MC-22: Mare Tyrrhenum | 19977 | 5.20 | 265772 | 69.15 |
| MC-23: Aeolis | 18703 | 4.87 | 284475 | 74.02 |
| MC-24: Phaethontis | 14786 | 3.85 | 299261 | 77.86 |
| MC-25: Thaumasia | 16011 | 4.17 | 315272 | 82.03 |
| MC-26: Argyre | 15775 | 4.10 | 331047 | 86.13 |
| MC-27: Noachis | 16141 | 4.20 | 347188 | 90.33 |
| MC-28: Hellas | 9535 | 2.48 | 356723 | 92.81 |
| MC-29: Eridania | 13930 | 3.62 | 370653 | 96.44 |
| MC-30: Mare Australe (South Pol | 13690 | 3.56 | 384343 | 100.00 |

Ejecta Morphology 1 (Group by Main Feature)

| MorphoE1U | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
| DLEPC | 495 | 1.11 | 495 | 1.11 |
| DLEPCPD | 10 | 0.02 | 505 | 1.13 |
| DLEPD | 1 | 0.00 | 506 | 1.13 |
| DLEPS | 631 | 1.41 | 1137 | 2.55 |
| DLEPSPD | 2 | 0.00 | 1139 | 2.55 |
| DLERC | 386 | 0.86 | 1525 | 3.42 |
| DLERCPD | 7 | 0.02 | 1532 | 3.43 |
| DLERS | 1242 | 2.78 | 2774 | 6.22 |
| DLERSRD | 2 | 0.00 | 2776 | 6.22 |
| DLSPC | 1 | 0.00 | 2777 | 6.22 |
| MLEPC | 22 | 0.05 | 2799 | 6.27 |
| MLEPS | 43 | 0.10 | 2842 | 6.37 |
| MLERC | 24 | 0.05 | 2866 | 6.42 |
| MLERS | 491 | 1.10 | 3357 | 7.52 |
| MLERSRD | 1 | 0.00 | 3358 | 7.52 |
| PD | 2 | 0.00 | 3360 | 7.53 |
| RD | 27069 | 60.66 | 30429 | 68.19 |
| SLEPC | 2601 | 5.83 | 33030 | 74.02 |
| SLEPCPD | 75 | 0.17 | 33105 | 74.18 |
| SLEPCRD | 2 | 0.00 | 33107 | 74.19 |
| SLEPD | 44 | 0.10 | 33151 | 74.29 |
| SLEPS | 4998 | 11.20 | 38149 | 85.49 |
| SLEPSPD | 52 | 0.12 | 38201 | 85.60 |
| SLEPSRD | 3 | 0.01 | 38204 | 85.61 |
| SLERC | 1280 | 2.87 | 39484 | 88.48 |
| SLERCPD | 10 | 0.02 | 39494 | 88.50 |
| SLERS | 5111 | 11.45 | 44605 | 99.96 |
| SLERSPD | 16 | 0.04 | 44621 | 99.99 |
| SLERSRD | 4 | 0.01 | 44625 | 100.00 |

Frequency Missing = 339718

Ejecta Morphology 2 (Group by Main Feature)

| MorphoE2U | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
| HU | 1466 | 7.53 | 1466 | 7.53 |
| HUAM | 1373 | 7.05 | 2839 | 14.58 |
| HUBL | 4459 | 22.89 | 7298 | 37.47 |
| HUSL | 6540 | 33.58 | 13838 | 71.05 |
| HUSP | 77 | 0.40 | 13915 | 71.45 |
| SM | 805 | 4.13 | 14720 | 75.58 |
| SMAM | 844 | 4.33 | 15564 | 79.91 |
| SMBL | 1097 | 5.63 | 16661 | 85.55 |
| SMSL | 2797 | 14.36 | 19458 | 99.91 |
| SMSP | 18 | 0.09 | 19476 | 100.00 |

Frequency Missing = 364867

Ejecta Morphology 3 (Group by Main Feature)

| MorphoE3U | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
| BUMBLEBEE | 18 | 1.39 | 18 | 1.39 |
| BUTTERFLY | 76 | 5.88 | 94 | 7.27 |
| INNER IS BUTTERFLY | 2 | 0.15 | 96 | 7.42 |
| INNER IS PIN-CUSHION | 87 | 6.73 | 183 | 14.15 |
| INNER IS PSEUDO-BUTTERFLY | 1 | 0.08 | 184 | 14.23 |
| INNER IS PSEUDO-PIN-CUSHION | 1 | 0.08 | 185 | 14.31 |
| INNER IS PSEUDO-SMALL-CROWN | 4 | 0.31 | 189 | 14.62 |
| INNER IS SMALL-CROWN | 66 | 5.10 | 255 | 19.72 |
| INNER-MOST IS SMALL-CROWN | 1 | 0.08 | 256 | 19.80 |
| MIDDLE IS RECTANGULAR | 1 | 0.08 | 257 | 19.88 |
| OUTER IS BUTTERFLY | 4 | 0.31 | 261 | 20.19 |
| OUTER IS PSEUDO-BUTTERFLY | 4 | 0.31 | 265 | 20.49 |
| OUTER IS RECTANGULAR | 1 | 0.08 | 266 | 20.57 |
| OUTER IS SPLASH | 56 | 4.33 | 322 | 24.90 |
| PIN-CUSHION | 351 | 27.15 | 673 | 52.05 |
| PSEDUO-BUTTERFLY | 1 | 0.08 | 674 | 52.13 |
| PSEUDO-BUTTERFLY | 118 | 9.13 | 792 | 61.25 |
| PSEUDO-PIN-CUSHION | 1 | 0.08 | 793 | 61.33 |
| PSEUDO-RECTANGULAR | 26 | 2.01 | 819 | 63.34 |
| PSEUDO-SMALL-CROWN | 56 | 4.33 | 875 | 67.67 |
| RECTANGULAR | 38 | 2.94 | 913 | 70.61 |
| SANDBAR | 58 | 4.49 | 971 | 75.10 |
| SMALL-CROWN | 267 | 20.65 | 1238 | 95.75 |
| SPLASH | 55 | 4.25 | 1293 | 100.00 |

Frequency Missing = 383050

Maximum Number of Cohesive Layers

| NUMBER_LAYERS | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
| 0 | 364612 | 94.87 | 364612 | 94.87 |
| 1 | 15467 | 4.02 | 380079 | 98.89 |
| 2 | 3435 | 0.89 | 383514 | 99.78 |
| 3 | 739 | 0.19 | 384253 | 99.98 |
| 4 | 85 | 0.02 | 384338 | 100.00 |
| 5 | 5 | 0.00 | 384343 | 100.00 |

Using Python

Code

"""
Created on Tue Oct 01 01:27:35 2015

@author: angeloklin
"""
# Import libraries
import pandas as pd

# load data
data = pd.read_csv("marscrater_pds.csv", na_values = [" "], low_memory = False)

# function to return hemisphere
def Hemisphere(Latitude):
  if Latitude > 0:
    return "North"
  elif Latitude < 0:
    return "South"
  else:
    return "Equator"

# function to return quadrangle
def Quadrangle(Coordinates):
  la = Coordinates["LATITUDE_CIRCLE_IMAGE"]
  lo = Coordinates["LONGITUDE_CIRCLE_IMAGE"] + 180
  if la >=  65 and la <=  90 and lo >=   0 and lo <= 360: return "MC-01: Mare Boreum (North Pole)"
  if la >=  30 and la <   65 and lo >= 120 and lo <  180: return "MC-02: Diacria"
  if la >=  30 and la <   65 and lo >=  60 and lo <  120: return "MC-03: Arcadia"
  if la >=  30 and la <   65 and lo >=   0 and lo <   60: return "MC-04: Mare Acidalium"
  if la >=  30 and la <   65 and lo >= 300 and lo <= 360: return "MC-05: Ismenius Lacus"
  if la >=  30 and la <   65 and lo >= 240 and lo <  300: return "MC-06: Casius"
  if la >=  30 and la <   65 and lo >= 180 and lo <  240: return "MC-07: Cebrenia"
  if la >=   0 and la <   30 and lo >= 135 and lo <  180: return "MC-08: Amazonis"
  if la >=   0 and la <   30 and lo >=  90 and lo <  135: return "MC-09: Tharsis"
  if la >=   0 and la <   30 and lo >=  45 and lo <   90: return "MC-10: Lunae Palus"
  if la >=   0 and la <   30 and lo >=   0 and lo <   45: return "MC-11: Oxia Palus"
  if la >=   0 and la <   30 and lo >= 315 and lo <= 360: return "MC-12: Arabia"
  if la >=   0 and la <   30 and lo >= 270 and lo <  315: return "MC-13: Syrtis Major"
  if la >=   0 and la <   30 and lo >= 225 and lo <  270: return "MC-14: Amenthes"
  if la >=   0 and la <   30 and lo >= 180 and lo <  225: return "MC-15: Elysium"
  if la >= -30 and la <    0 and lo >= 135 and lo <  180: return "MC-16: Memnonia"
  if la >= -30 and la <    0 and lo >=  90 and lo <  135: return "MC-17: Phoenicis Lacus"
  if la >= -30 and la <    0 and lo >=  45 and lo <   90: return "MC-18: Coprates"
  if la >= -30 and la <    0 and lo >=   0 and lo <   45: return "MC-19: Margaritifer Sinus"
  if la >= -30 and la <    0 and lo >= 315 and lo <= 360: return "MC-20: Sinus Sabaeus"
  if la >= -30 and la <    0 and lo >= 270 and lo <  315: return "MC-21: Iapygia"
  if la >= -30 and la <    0 and lo >= 225 and lo <  270: return "MC-22: Mare Tyrrhenum"
  if la >= -30 and la <    0 and lo >= 180 and lo <  225: return "MC-23: Aeolis"
  if la >= -65 and la <  -30 and lo >= 120 and lo <  180: return "MC-24: Phaethontis"
  if la >= -65 and la <  -30 and lo >=  60 and lo <  120: return "MC-25: Thaumasia"
  if la >= -65 and la <  -30 and lo >=   0 and lo <   60: return "MC-26: Argyre"
  if la >= -65 and la <  -30 and lo >= 300 and lo <= 360: return "MC-27: Noachis"
  if la >= -65 and la <  -30 and lo >= 240 and lo <  300: return "MC-28: Hellas"
  if la >= -65 and la <  -30 and lo >= 180 and lo <  240: return "MC-29: Eridania"
  if la >= -90 and la <  -65 and lo >=   0 and lo <= 360: return "MC-30: Mare Australe (South Pole)"

# function to get the morphology's main feature
def MainMorpho(Morpho):
  if pd.isnull(Morpho):
    return Morpho
  foundAt = Morpho.find("/")
  if foundAt >= 0:
    m = Morpho[0:foundAt]
  else:
    m = Morpho
  return m.strip().upper()

print("Mars Craters' data set summary:")
print("- Number of observations(rows): ", len(data))
print("- Number of variables(columns): ", len(data.columns))
print("")

print("Hemispheres:")
Hemispheres = data["LATITUDE_CIRCLE_IMAGE"].map(lambda lat: Hemisphere(lat))
freq = Hemispheres.value_counts(sort = True)
prop = Hemispheres.value_counts(sort = True, normalize = True)
print("- Missing Values: ", Hemispheres.isnull().sum())
print("- Frequency Table: ")
print("|            | Frequency | Proportion |")
for i in range(len(freq)):
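  # .iloc selects by position; plain freq[i] on a label index is not reliable in recent pandas
  print("|", format(freq.index[i], "<10s"), "|   ", format(freq.iloc[i], ">6d"), "|    ", format(prop.iloc[i], ">.4f"), "|")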
print("")

print("Quadrangles:")
Quadrangles = data.loc[:, "LATITUDE_CIRCLE_IMAGE":"LONGITUDE_CIRCLE_IMAGE"].apply(Quadrangle, axis = 1)
freq = Quadrangles.value_counts(sort = True)
prop = Quadrangles.value_counts(sort = True, normalize = True)
print("- Missing Values: ", Quadrangles.isnull().sum())
print("- Frequency Table: ")
print("|                                     | Frequency | Proportion |")
for i in range(len(freq)):
  print("|", format(freq.index[i], "<35s"), "|   ", format(freq.iloc[i], ">6d"), "|    ", format(prop.iloc[i], ">.4f"), "|")
print("")

print("Ejecta Morphology 1 (Group by Main Feature):")
MorphoE1 = data["MORPHOLOGY_EJECTA_1"].map(lambda morpho: MainMorpho(morpho))
MorphoE1a = MorphoE1[MorphoE1.notnull()]
freq = MorphoE1a.value_counts(sort = True)
prop = MorphoE1a.value_counts(sort = True, normalize = True, dropna = False)
print("- Missing Values: ", MorphoE1.isnull().sum())
print("- Frequency Table: ")
print("|            | Frequency | Proportion |")
for i in range(len(freq)):
  print("|", format(freq.index[i], "<10s"), "|   ", format(freq.iloc[i], ">6d"), "|    ", format(prop.iloc[i], ">.4f"), "|")
print("")

print("Ejecta Morphology 2 (Group by Main Feature):")
MorphoE2 = data["MORPHOLOGY_EJECTA_2"].map(lambda morpho: MainMorpho(morpho))
MorphoE2a = MorphoE2[MorphoE2.notnull()]
freq = MorphoE2a.value_counts(sort = True)
prop = MorphoE2a.value_counts(sort = True, normalize = True, dropna = False)
print("- Missing Values: ", MorphoE2.isnull().sum())
print("- Frequency Table: ")
print("|            | Frequency | Proportion |")
for i in range(len(freq)):
  print("|", format(freq.index[i], "<10s"), "|   ", format(freq.iloc[i], ">6d"), "|    ", format(prop.iloc[i], ">.4f"), "|")
print("")

print("Ejecta Morphology 3 (Group by Main Feature):")
MorphoE3 = data["MORPHOLOGY_EJECTA_3"].map(lambda morpho: MainMorpho(morpho))
MorphoE3a = MorphoE3[MorphoE3.notnull()]
freq = MorphoE3a.value_counts(sort = True)
prop = MorphoE3a.value_counts(sort = True, normalize = True, dropna = False)
print("- Missing Values: ", MorphoE3.isnull().sum())
print("- Frequency Table: ")
print("|                              | Frequency | Proportion |")
for i in range(len(freq)):
  print("|", format(freq.index[i], "<28s"), "|   ", format(freq.iloc[i], ">6d"), "|    ", format(prop.iloc[i], ">.4f"), "|")
print("")


print("Maximum Number of Cohesive Layers:")
freq = data["NUMBER_LAYERS"].value_counts(sort = False)
prop = data["NUMBER_LAYERS"].value_counts(sort = False, normalize = True)
print("- Missing Values: ", data["NUMBER_LAYERS"].isnull().sum())
print("- Frequency Table: ")
print("|        | Frequency | Proportion |")
for i in range(len(freq)):
  print("|", format(freq.index[i], "<6d"), "|   ", format(freq.iloc[i], ">6d"), "|    ", format(prop.iloc[i], ">.4f"), "|")
print("")
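
The program above repeats the same counting and printing steps for every variable. One possible way to factor them out is sketched below. The helper frequency_table and its label_width parameter are hypothetical names introduced for this example, and the sample values are made up, so treat it as a sketch and not as the code that produced the output in this post.

```python
import pandas as pd


def frequency_table(values, label_width=10):
    """Return the lines of a frequency table for a Series, ignoring missing values."""
    counts = values.value_counts()  # sorted by descending count, missing values dropped
    total = counts.sum()
    lines = ["| {:<{w}s} | Frequency | Proportion |".format("", w=label_width)]
    for label, count in counts.items():
        lines.append(
            "| {:<{w}s} | {:>9d} | {:>10.4f} |".format(
                str(label), int(count), count / total, w=label_width
            )
        )
    return lines


# Made-up sample with one missing value, for illustration only.
sample = pd.Series(["RD", "RD", "SLEPS", None, "RD", "SLERS", "SLEPS"])
table_lines = frequency_table(sample)
for line in table_lines:
    print(line)
```

Because the helper iterates over label and count pairs directly, it needs no positional indexing, and the same call would serve each of the six variables.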

Output

Mars Craters’ data set summary:

  • Number of observations(rows): 384343
  • Number of variables(columns): 10

Hemisphere

Missing Values: 0

Frequency Table:

| Hemisphere | Frequency | Proportion |
| South | 233449 | 0.6074 |
| North | 150887 | 0.3926 |
| Equator | 7 | 0.0000 |

Quadrangle

Missing Values: 0

Frequency Table:

| Quadrangle | Frequency | Proportion |
| MC-16: Memnonia | 20455 | 0.0532 |
| MC-22: Mare Tyrrhenum | 19977 | 0.0520 |
| MC-21: Iapygia | 19422 | 0.0505 |
| MC-15: Elysium | 18866 | 0.0491 |
| MC-23: Aeolis | 18703 | 0.0487 |
| MC-20: Sinus Sabaeus | 17664 | 0.0460 |
| MC-19: Margaritifer Sinus | 17022 | 0.0443 |
| MC-08: Amazonis | 16738 | 0.0435 |
| MC-27: Noachis | 16141 | 0.0420 |
| MC-25: Thaumasia | 16011 | 0.0417 |
| MC-26: Argyre | 15775 | 0.0410 |
| MC-07: Cebrenia | 15028 | 0.0391 |
| MC-24: Phaethontis | 14786 | 0.0385 |
| MC-17: Phoenicis Lacus | 14561 | 0.0379 |
| MC-14: Amenthes | 13937 | 0.0363 |
| MC-29: Eridania | 13930 | 0.0362 |
| MC-30: Mare Australe (South Pole) | 13690 | 0.0356 |
| MC-13: Syrtis Major | 13522 | 0.0352 |
| MC-09: Tharsis | 12921 | 0.0336 |
| MC-02: Diacria | 9592 | 0.0250 |
| MC-28: Hellas | 9535 | 0.0248 |
| MC-06: Casius | 9068 | 0.0236 |
| MC-03: Arcadia | 7835 | 0.0204 |
| MC-12: Arabia | 7640 | 0.0199 |
| MC-05: Ismenius Lacus | 6762 | 0.0176 |
| MC-01: Mare Boreum (North Pole) | 6405 | 0.0167 |
| MC-18: Coprates | 5777 | 0.0150 |
| MC-11: Oxia Palus | 4666 | 0.0121 |
| MC-04: Mare Acidalium | 4436 | 0.0115 |
| MC-10: Lunae Palus | 3478 | 0.0090 |

Ejecta Morphology 1 (Group by Main Feature)

Missing Values: 339718

Frequency Table:

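| Ejecta Morphology 1 | Frequency | Proportion |
| RD | 27069 | 0.6066 |
| SLERS | 5111 | 0.1145 |
| SLEPS | 4998 | 0.1120 |
| SLEPC | 2601 | 0.0583 |
| SLERC | 1280 | 0.0287 |
| DLERS | 1242 | 0.0278 |
| DLEPS | 631 | 0.0141 |
| DLEPC | 495 | 0.0111 |
| MLERS | 491 | 0.0110 |
| DLERC | 386 | 0.0086 |
| SLEPCPD | 75 | 0.0017 |
| SLEPSPD | 52 | 0.0012 |
| SLEPD | 44 | 0.0010 |
| MLEPS | 43 | 0.0010 |
| MLERC | 24 | 0.0005 |
| MLEPC | 22 | 0.0005 |
| SLERSPD | 16 | 0.0004 |
| SLERCPD | 10 | 0.0002 |
| DLEPCPD | 10 | 0.0002 |
| DLERCPD | 7 | 0.0002 |
| SLERSRD | 4 | 0.0001 |
| SLEPSRD | 3 | 0.0001 |
| SLEPCRD | 2 | 0.0000 |
| DLEPSPD | 2 | 0.0000 |
| DLERSRD | 2 | 0.0000 |
| PD | 2 | 0.0000 |
| DLEPD | 1 | 0.0000 |
| DLSPC | 1 | 0.0000 |
| MLERSRD | 1 | 0.0000 |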

Ejecta Morphology 2 (Group by Main Feature)

Missing Values: 364867

Frequency Table:

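| Ejecta Morphology 2 | Frequency | Proportion |
| HUSL | 6540 | 0.3358 |
| HUBL | 4459 | 0.2289 |
| SMSL | 2797 | 0.1436 |
| HU | 1466 | 0.0753 |
| HUAM | 1373 | 0.0705 |
| SMBL | 1097 | 0.0563 |
| SMAM | 844 | 0.0433 |
| SM | 805 | 0.0413 |
| HUSP | 77 | 0.0040 |
| SMSP | 18 | 0.0009 |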

Ejecta Morphology 3 (Group by Main Feature)

Missing Values: 383050

Frequency Table:

| Ejecta Morphology 3 | Frequency | Proportion |
| PIN-CUSHION | 351 | 0.2715 |
| SMALL-CROWN | 267 | 0.2065 |
| PSEUDO-BUTTERFLY | 118 | 0.0913 |
| INNER IS PIN-CUSHION | 87 | 0.0673 |
| BUTTERFLY | 76 | 0.0588 |
| INNER IS SMALL-CROWN | 66 | 0.0510 |
| SANDBAR | 58 | 0.0449 |
| PSEUDO-SMALL-CROWN | 56 | 0.0433 |
| OUTER IS SPLASH | 56 | 0.0433 |
| SPLASH | 55 | 0.0425 |
| RECTANGULAR | 38 | 0.0294 |
| PSEUDO-RECTANGULAR | 26 | 0.0201 |
| BUMBLEBEE | 18 | 0.0139 |
| OUTER IS BUTTERFLY | 4 | 0.0031 |
| OUTER IS PSEUDO-BUTTERFLY | 4 | 0.0031 |
| INNER IS PSEUDO-SMALL-CROWN | 4 | 0.0031 |
| INNER IS BUTTERFLY | 2 | 0.0015 |
| INNER-MOST IS SMALL-CROWN | 1 | 0.0008 |
| OUTER IS RECTANGULAR | 1 | 0.0008 |
| INNER IS PSEUDO-BUTTERFLY | 1 | 0.0008 |
| MIDDLE IS RECTANGULAR | 1 | 0.0008 |
| PSEUDO-PIN-CUSHION | 1 | 0.0008 |
| INNER IS PSEUDO-PIN-CUSHION | 1 | 0.0008 |
| PSEDUO-BUTTERFLY | 1 | 0.0008 |

Maximum Number of Cohesive Layers

Missing Values: 0

Frequency Table:

| Number of Layers | Frequency | Proportion |
| 0 | 364612 | 0.9487 |
| 1 | 15467 | 0.0402 |
| 2 | 3435 | 0.0089 |
| 3 | 739 | 0.0019 |
| 4 | 85 | 0.0002 |
| 5 | 5 | 0.0000 |