This R Markdown document records a data exploration of the Mars Craters dataset.
The data describe craters on Mars, with recorded latitude, longitude, circular diameter, rim-to-floor depth, several morphological descriptors, and number of layers. This study focuses on the circular diameter, the rim-to-floor depth, and the number of layers.
The following attributes are extracted from the Mars Craters dataset for the selected observations:
DIAM_CIRCLE_IMAGE : Diameter from a non-linear least-squares circle fit to the vertices selected to manually identify the crater rim. Units are km.
DEPTH_RIMFLOOR_TOPOG : Defined as DEPTH_RIM_TOPOG - DEPTH_FLOOR_TOPOG, where:
- DEPTH_RIM_TOPOG : Average elevation of each of the manually determined N points along the crater rim. Points are selected as relative topographic highs under the assumption they are the least eroded so most original points along the rim. Units are km.
- DEPTH_FLOOR_TOPOG : Average elevation of each of the manually determined N points inside the crater floor. Points were chosen as the lowest elevation that did not include visible embedded craters. Units are km.
NUMBER_LAYERS : The maximum number of cohesive layers in any azimuthal direction that could be reliably identified.
Hence the selected observations are craters with a rim-to-floor depth (DEPTH_RIMFLOOR_TOPOG) greater than 1 km, a circular diameter (DIAM_CIRCLE_IMAGE) greater than 10 km, and more than 0 layers.
Loading the RCurl package to fetch the data from the web (stored on GitHub), along with the plotting and table packages.
knitr::opts_chunk$set(message = FALSE, echo = TRUE)
library(RCurl)
library(ggplot2)
library(gcookbook)
library(DT)
Fetching the data file from GitHub and reading it as CSV
data.giturl <- "https://raw.githubusercontent.com/DataDriven-MSDA/RProgramming/master/marscrater_pds.csv"
mars.gitdata <- getURL(data.giturl)
mars.gitdata.csv <- read.csv2(text = mars.gitdata, header = TRUE, sep = ",", stringsAsFactors = FALSE)
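One thing worth checking after this step is the type of each column. `read.csv2()` defaults to `dec = ","`, so when it reads a comma-separated file whose numbers use `.` as the decimal mark, decimal columns may come back as character, and comparisons such as `> 10` then compare strings rather than numbers. The sketch below illustrates this on two made-up rows (the values are assumptions for illustration, not taken from the crater file) and compares the result with `read.csv()`.

```r
# Toy CSV text with "." decimals; the rows are illustrative only.
csv_text <- "DIAM_CIRCLE_IMAGE,NUMBER_LAYERS\n9.5,1\n12.25,2\n"

# Same call shape as in the document: read.csv2() with sep overridden.
with_csv2 <- read.csv2(text = csv_text, header = TRUE, sep = ",",
                       stringsAsFactors = FALSE)

# read.csv() assumes sep = "," and dec = "." by default.
with_csv <- read.csv(text = csv_text, header = TRUE,
                     stringsAsFactors = FALSE)

# Inspect how the decimal column was parsed in each case.
class(with_csv2$DIAM_CIRCLE_IMAGE)
class(with_csv$DIAM_CIRCLE_IMAGE)

# With a character column, 10 is coerced to "10" and compared as text.
with_csv2$DIAM_CIRCLE_IMAGE > 10
with_csv$DIAM_CIRCLE_IMAGE > 10
```

Running `str(mars.gitdata.csv)` on the real data would show whether this applies here; if it does, converting with `as.numeric()` before subsetting, or reading with `read.csv()`, would be one way to keep the filter numeric.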
Subsetting the data based on Circular Diameter, Depth and Number of Layers
# subset() has no na.rm argument; rows where the condition is NA are dropped anyway
marscrater.datastudy <- subset(mars.gitdata.csv,
    NUMBER_LAYERS > 0 & DEPTH_RIMFLOOR_TOPOG > 1 & DIAM_CIRCLE_IMAGE > 10,
    select = c(DIAM_CIRCLE_IMAGE, DEPTH_RIMFLOOR_TOPOG, NUMBER_LAYERS))
# Verifying the number of attributes
cat("Number of attributes for study : ", length(marscrater.datastudy))
## Number of attributes for study : 3
# Verifying the number of observations selected
cat("Number of observations for study : ", nrow(marscrater.datastudy))
## Number of observations for study : 1794
datatable(marscrater.datastudy, options = list(searching = FALSE, pageLength = 5,
lengthMenu = c(5, 10, 15, 20)))
Plotting histograms of the variables. Number of Layers is treated as a continuous variable here, so the histogram shows the count of craters for each number of layers.
marscrater.datastudy$NUMBER_LAYERS <- as.numeric(marscrater.datastudy$NUMBER_LAYERS)
hist(marscrater.datastudy$NUMBER_LAYERS, main = "Craters On Mars Histogram", xlab = "Layers In Craters",
border = "blue", col = "green")
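Because the number of layers only takes whole-number values, the histogram above is effectively a count per value. A minimal sketch of that equivalence, using a made-up vector of layer counts (an assumption for illustration, not the crater data): `table()` tabulates the values directly, and `hist()` with breaks placed at half-integers gives one bin per value.

```r
# Toy layer counts; illustrative values only.
layers <- c(1, 1, 2, 3, 1, 2)

# Direct tabulation of each distinct value.
layer_counts <- table(layers)
layer_counts

# Histogram with one bin per integer: breaks at 0.5, 1.5, 2.5, 3.5.
h <- hist(layers, breaks = seq(0.5, 3.5, by = 1), plot = FALSE)
h$counts
```

Placing the breaks at half-integers avoids any ambiguity about which bin an integer value on a bin edge falls into.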
A density plot can also be produced as follows.
ggplot(data = marscrater.datastudy) + geom_density(aes(x = NUMBER_LAYERS), fill = "yellow")
## Warning in plyr::split_indices(scale_id, n): '.Random.seed' is not an
## integer vector but of type 'NULL', so ignored
Plotting a histogram with ggplot function
ggplot(data = marscrater.datastudy) + geom_histogram(aes(x = NUMBER_LAYERS))
ggplot(marscrater.datastudy, aes(x = NUMBER_LAYERS)) + geom_histogram(binwidth = 1)
marscrater.datastudy$DEPTH_RIMFLOOR_TOPOG <- as.numeric(marscrater.datastudy$DEPTH_RIMFLOOR_TOPOG)
marscrater.datastudy$DIAM_CIRCLE_IMAGE <- as.numeric(marscrater.datastudy$DIAM_CIRCLE_IMAGE)
marscrater.datastudy$NUMBER_LAYERS <- as.character(marscrater.datastudy$NUMBER_LAYERS)
Plotting a simple scatter plot with grayscale / monochrome
Plotting simple scatter plot with base plot function
plot(DIAM_CIRCLE_IMAGE ~ DEPTH_RIMFLOOR_TOPOG, data = marscrater.datastudy)
Plotting simple scatter plot with ggplot function
ggplot(marscrater.datastudy, aes(x = DEPTH_RIMFLOOR_TOPOG, y = DIAM_CIRCLE_IMAGE)) +
geom_point() + scale_x_continuous(name = "Depth Of Crater", breaks = c(0, 0.5,
1, 1.5, 2, 2.5)) + scale_y_continuous(name = "Diameter Of Crater", breaks = c(30,
60, 90, 120, 150))
This distinguishes the points by shape and colour according to the categorical variable Number Of Layers
ggplot(marscrater.datastudy, aes(x = (DEPTH_RIMFLOOR_TOPOG), y = (DIAM_CIRCLE_IMAGE),
shape = NUMBER_LAYERS, colour = NUMBER_LAYERS)) + geom_point() + scale_x_continuous(name = "Depth Of Crater",
breaks = seq(0, 2.5, 0.5), labels = seq(0, 2.5, 0.5)) + scale_y_continuous(name = "Diameter Of Crater",
breaks = seq(0, 150, 30), labels = seq(0, 150, 30))
This can also be achieved by storing the basic structure and parameters of the scatter plot in a base variable and adding enhancements to it. The plotted points are the same; only the axis breaks and the title differ.
craterplot <- ggplot(marscrater.datastudy, aes(x = (DEPTH_RIMFLOOR_TOPOG), y = (DIAM_CIRCLE_IMAGE),
shape = NUMBER_LAYERS, colour = NUMBER_LAYERS))
craterplot + geom_point() + scale_x_continuous(name = "Depth Of Crater", breaks = seq(0,
4, 0.5), labels = seq(0, 4, 0.5)) + scale_y_continuous(name = "Diameter Of Crater",
breaks = seq(0, 100, 30), labels = seq(0, 100, 30)) + ggtitle("Mars Crater")
This creates faceted scatter plots for different levels of Number Of Layers and makes a separate pane for each.
craterbyfacetplot <- ggplot(marscrater.datastudy, aes(x = DEPTH_RIMFLOOR_TOPOG, y = DIAM_CIRCLE_IMAGE))
craterbyfacetplot + geom_point(aes(color = NUMBER_LAYERS)) + facet_wrap(~NUMBER_LAYERS,
labeller = "label_value") + labs(x = "Depth Of Crater", y = "Diameter Of Crater") +
ggtitle("Mars Crater")
ggplot(marscrater.datastudy, aes(x = NUMBER_LAYERS, y = DIAM_CIRCLE_IMAGE)) + geom_boxplot()
The fill colours of the box plots are recycled cyclically when there are more boxes than colours
boxplot(DEPTH_RIMFLOOR_TOPOG ~ NUMBER_LAYERS, data = marscrater.datastudy, notch = FALSE,
col = (c("gold", "blue", "green")), main = "Mars Craters", xlab = "Number Of Layers",
ylab = "Depth")
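The cyclic reuse of fill colours can be mimicked with `rep_len()`. This is only a sketch of the recycling rule, not the code `boxplot()` runs internally, and the number of groups below is an assumed value for illustration.

```r
# The three fill colours passed to boxplot() in the document.
box_cols <- c("gold", "blue", "green")

# Assumed number of boxes, for illustration only.
n_groups <- 5

# Recycle the colours to one entry per box.
recycled <- rep_len(box_cols, n_groups)
recycled
```

With more groups than colours, the fourth box reuses the first colour, the fifth reuses the second, and so on.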
Creating notched boxplots
boxplot(DEPTH_RIMFLOOR_TOPOG ~ NUMBER_LAYERS, data = marscrater.datastudy, notch = TRUE,
col = (c("gold", "blue", "green")), main = "Mars Craters", xlab = "Number Of Layers",
ylab = "Depth")
## Warning in bxp(structure(list(stats = structure(c(1.01, 1.07, 1.15, 1.34, :
## some notches went outside hinges ('box'): maybe set notch=FALSE
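The warning above is raised when a notch extends beyond the box hinges, which tends to happen for groups with few observations. According to the `boxplot.stats()` documentation, the notch spans the median plus or minus 1.58 times the hinge spread divided by the square root of n. The sketch below reproduces that interval on a small made-up sample (an assumption for illustration, not the crater data) and checks whether it falls outside the hinges.

```r
# Toy sample with few observations; illustrative values only.
x <- c(2, 3, 4, 6, 9)

s <- boxplot.stats(x)
hinges <- s$stats[c(2, 4)]   # lower and upper hinge
notch  <- s$conf             # notch limits computed by boxplot.stats()

# The same interval computed by hand: median +/- 1.58 * hinge spread / sqrt(n).
manual <- s$stats[3] + c(-1, 1) * 1.58 * diff(hinges) / sqrt(s$n)
all.equal(notch, manual)

# TRUE when the notch reaches past either hinge, the case that triggers the warning.
notch_outside <- notch[1] < hinges[1] || notch[2] > hinges[2]
notch_outside
```

When this happens for a group, the notched box folds back on itself; setting `notch = FALSE`, as the warning suggests, is one way to avoid the distorted shape.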
The following code writes the plots to files of the desired format. Here a box plot is written to a PDF file and a histogram to a JPEG file.
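sink("MarsCraterPlots", append = TRUE, split = TRUE)
cat("Plots for Mars Crater Data ")
# Box plot to PDF; print() makes the ggplot render when the code is sourced or run inside a function
pdf("F:\\MarsCraterPlots_PDF.pdf")
print(ggplot(marscrater.datastudy, aes(x = NUMBER_LAYERS, y = DIAM_CIRCLE_IMAGE)) + geom_boxplot())
dev.off()  # close the PDF device before opening the next one
# Histogram to JPEG
jpeg("F:\\MarsCraterPlots_JPEG.jpeg")
marscrater.datastudy$DIAM_CIRCLE_IMAGE <- as.numeric(marscrater.datastudy$DIAM_CIRCLE_IMAGE)
hist(x = marscrater.datastudy$DIAM_CIRCLE_IMAGE, xlab = "Histogram of Mars Crater Diameter")
dev.off()  # close the JPEG device
sink()     # stop diverting console output to the file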
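Each graphics device opened with `pdf()` or `jpeg()` needs its own `dev.off()`, otherwise the file may be left incomplete. One way to make that hard to forget is a small wrapper that closes the device on exit. The helper name `save_plot_pdf` and the temporary output path are hypothetical, chosen for this sketch, and the plotted values are made up.

```r
# Hypothetical helper: run a plotting function with a PDF device open.
save_plot_pdf <- function(path, draw) {
  pdf(path)
  on.exit(dev.off(), add = TRUE)  # close the device even if draw() fails
  draw()
  invisible(path)
}

# Temporary location used for the sketch instead of a fixed drive path.
out <- file.path(tempdir(), "demo_hist.pdf")

# Toy diameters in km; illustrative values only.
save_plot_pdf(out, function() {
  hist(c(12, 15, 18, 22, 30, 41), xlab = "Diameter (km)", main = "Toy data")
})

file.exists(out)
```

For a ggplot object, the drawing function would need to call `print()` on the plot, since ggplot objects only render when printed.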