Graphical Data Analysis (GDA)

Understanding the philosophy of ggplot2: Bar plot, Pie chart, Histogram, Boxplot, Scatter plot, Regression plots

ggplot2 is a free, open-source, and easy-to-use data visualization package that is widely used in R. Written by Hadley Wickham, it is a powerful implementation of the Grammar of Graphics, in which a plot is built up from layers.

Building blocks (layers) of the grammar of graphics

Data: the data set itself.
Aesthetics: the attributes the data is mapped onto, such as x-axis, y-axis, color, fill, size, labels, alpha, shape, line width, line type.
Geometries: how the data is displayed, using point, line, histogram, bar, boxplot.
Facets: display subsets of the data in columns and rows.
Statistics: binning, smoothing, descriptive, intermediate.
Coordinates: the space onto which the data is drawn, using Cartesian, fixed, polar, limits.
Themes: non-data ink.
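To make the layer list concrete, here is a minimal sketch that touches each of the seven building blocks in a single plot. It assumes the built-in iris data set from the datasets package. The axis limits, the theme, and the name layers_plot are illustrative choices, not part of the original walkthrough.

```r
library(ggplot2)

# Each line is annotated with the grammar-of-graphics layer it represents
layers_plot <- ggplot(data = datasets::iris,                     # Data
                      aes(x = Sepal.Length, y = Sepal.Width,     # Aesthetics
                          colour = Species)) +
  geom_point() +                                                 # Geometries
  facet_wrap(~ Species) +                                        # Facets
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) +      # Statistics
  coord_cartesian(xlim = c(4, 8), ylim = c(2, 4.5)) +            # Coordinates
  theme_minimal() +                                              # Themes
  labs(title = "Seven layers in one plot")

layers_plot
```

Layers are combined with `+`, so any of them can be dropped or swapped without touching the others.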

The iris dataset contains four features (length and width of sepals and petals) for 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor), 150 samples in total. These measurements were used to create a linear discriminant model to classify the species.

summary(iris)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
## 

Explore the iris data frame with str()

str(iris)
## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
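Since Species is the only factor in the data frame, a per-species summary is a natural next exploration step before plotting. The following is a small sketch using base R's aggregate(); the name species_means is hypothetical.

```r
# Mean of every numeric measurement within each species.
# datasets::iris refers to the unmodified copy shipped with base R.
species_means <- aggregate(. ~ Species, data = datasets::iris, FUN = mean)
species_means
```

The result has one row per species, which hints at which variables separate the groups and are therefore worth mapping to colour or facets later.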

Data Layer: the data layer names the source of the information to be visualized, here the iris dataset that ships with base R (in the datasets package).

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.2.3
#Data Layer
ggplot(data = iris) + labs(title ="iris Data Plot")

# Aesthetic Layer

ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width, col = Petal.Length)) +
  labs(title = "iris Data Plot")

The geometric layer controls the essential elements: how the data is displayed, using points, lines, histograms, bars, or boxplots.

# Geometric layer
ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width, col = Petal.Length)) +
  geom_point() +
  labs(title = "Sepal.Length vs Sepal.Width", x = "Sepal.Length", y = "Sepal.Width")

# Adding size: each point is scaled by Petal.Length

ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width, size = Petal.Length)) +
  geom_point() +
  labs(title = "Sepal.Length vs Sepal.Width", x = "Sepal.Length", y = "Sepal.Width")

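# Adding shape and color
# factor(Petal.Length) yields one color per distinct value, so the legend is long;
# Species is already a factor and can be mapped to shape directly
ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width, col = factor(Petal.Length), shape = Species)) +
  geom_point() +
  labs(title = "Sepal.Length vs Sepal.Width", x = "Sepal.Length", y = "Sepal.Width")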
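A regression plot adds a fitted line on top of a scatter plot. The sketch below assumes a simple linear fit of Petal.Width on Petal.Length. That variable pair, the labels, and the names regression_plot and petal_fit are illustrative choices rather than something the walkthrough prescribes.

```r
library(ggplot2)

# Scatter plot with a least-squares line and its confidence band
regression_plot <- ggplot(data = datasets::iris,
                          aes(x = Petal.Length, y = Petal.Width)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(title = "Petal.Width vs Petal.Length with linear fit",
       x = "Petal.Length", y = "Petal.Width")

regression_plot

# The same model fitted directly, to inspect intercept and slope
petal_fit <- lm(Petal.Width ~ Petal.Length, data = datasets::iris)
coef(petal_fit)
```

Passing `method = "lm"` makes geom_smooth() draw a straight line. Leaving it out lets ggplot2 pick a smoother based on the number of observations.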

# A scatter plot with a categorical x-axis
# factor() inside aes() treats Sepal.Length as categorical without overwriting the column in iris

ggplot(iris, aes(x = factor(Sepal.Length), y = Sepal.Width)) +
  geom_point()

# Histogram plot
# Sepal.Width only spans 2.0 to 4.4, so a narrow bin width is needed to see its shape
ggplot(data = iris, aes(x = Sepal.Width)) +
  geom_histogram(binwidth = 0.2, color = "black", fill = "lightblue") +
  labs(title = "Histogram of Sepal.Width", x = "Sepal.Width")
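The facet layer from the building-block list can split a histogram like this into one panel per group. Here is a minimal sketch, assuming Species as the faceting variable and a bin width of 0.2; both are illustrative choices, as is the name faceted_hist.

```r
library(ggplot2)

# One histogram panel per species, stacked vertically so the x-axes line up
faceted_hist <- ggplot(data = datasets::iris, aes(x = Sepal.Width)) +
  geom_histogram(binwidth = 0.2, colour = "black", fill = "lightblue") +
  facet_wrap(~ Species, ncol = 1) +
  labs(title = "Histogram of Sepal.Width by species", x = "Sepal.Width")

faceted_hist
```

Stacking the panels in a single column keeps a shared x-axis, which makes shifts between species easier to compare than side-by-side panels.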

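Each plot type has its own geom: geom_bar() for bar plots, geom_boxplot() for boxplots, geom_point() for scatter plots. The bar plot below counts the observations at each distinct Sepal.Length value.

ggplot(data = iris, aes(x = factor(Sepal.Length), fill = factor(Sepal.Length))) +
  geom_bar(stat = "count")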

Boxplot creation. A boxplot shows the distribution of a quantitative response variable across the levels of a categorical explanatory variable. The example below displays the distribution of Sepal.Width for each distinct Sepal.Length value. (For comparison, a bar plot of the carb column in the mtcars dataset shows that the majority of the cars (62%) have 2 or 4 carbs; the other common carb frequency is 1 (22%), and 3, 6 or 8 carbs are very uncommon.)

# x carries Sepal.Length (as categories) and y carries Sepal.Width, so the axis labels must match
bx <- ggplot(data = iris, aes(x = factor(Sepal.Length), y = Sepal.Width)) +
  geom_boxplot(fill = "red") +
  ggtitle("iris dataset") +
  xlab("Sepal Length") +
  ylab("Sepal Width")
bx
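Because a boxplot expects a categorical explanatory variable, Species is arguably the more natural grouping in this data set. The following is a hedged sketch of that variant; the fill mapping, the labels, and the name species_box are assumptions for illustration.

```r
library(ggplot2)

# One box per species: Species is already a factor, so no conversion is needed
species_box <- ggplot(data = datasets::iris,
                      aes(x = Species, y = Sepal.Width, fill = Species)) +
  geom_boxplot() +
  labs(title = "Sepal.Width by species", x = "Species", y = "Sepal.Width")

species_box
```

With three boxes of 50 observations each, medians, quartiles and outliers are easier to read than when the data is split across many Sepal.Length values.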

# Pie chart of how often each Petal.Width value occurs (base R graphics)
width_counts <- table(iris$Petal.Width)
share <- round(width_counts / sum(width_counts) * 100)
# Label each slice as "<value> <share>%"
data.labels <- paste0(names(width_counts), " ", share, "%")
pie(width_counts, labels = data.labels, clockwise = TRUE,
    col = heat.colors(length(data.labels)), main = "Frequency of Petal Width")
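ggplot2 has no dedicated pie geom, but the coordinates layer can bend a stacked bar into a pie. The sketch below assumes the species counts as the quantity to display (fewer slices than the Petal.Width values above). The names species_share and pie_plot are hypothetical.

```r
library(ggplot2)

# Count observations per species; as.data.frame() yields columns Species and Freq
species_share <- as.data.frame(table(Species = datasets::iris$Species))
species_share$label <- paste0(
  round(100 * species_share$Freq / sum(species_share$Freq)), "%"
)

# A single stacked bar drawn in polar coordinates becomes a pie chart
pie_plot <- ggplot(species_share, aes(x = "", y = Freq, fill = Species)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  geom_text(aes(label = label), position = position_stack(vjust = 0.5)) +
  theme_void() +
  labs(title = "Share of observations per species")

pie_plot
```

Here `theta = "y"` maps the bar height to the angle, and theme_void() removes the axes, which carry no meaning in a pie.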
