### Welcome to Lecture IV (Chapter II)
I would like you to start this handout by defining the following parameters: # hosts: a host is a device connected to a computer network.
# create a new data frame of hosts & high vuln counts
assets.df <- data.frame(
name=c("danube","gander","ganges","mekong","orinoco"),
os=c("W2K8","RHEL5","W2K8","RHEL5","RHEL5"),
highvulns=c(1,0,2,0,0))
#Let us now take a look at the data frame assets.df
View(assets.df)
#Describe our data frame in a few words
# take a look at the data frame structure & contents
str(assets.df)
## 'data.frame': 5 obs. of 3 variables:
## $ name : chr "danube" "gander" "ganges" "mekong" ...
## $ os : chr "W2K8" "RHEL5" "W2K8" "RHEL5" ...
## $ highvulns: num 1 0 2 0 0
head(assets.df)
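The description that str() prints can also be pulled out piece by piece with base R helpers. A minimal sketch, which rebuilds assets.df so it can run on its own:

```r
# Recreate the handout's data frame so this block runs on its own
assets.df <- data.frame(
  name = c("danube", "gander", "ganges", "mekong", "orinoco"),
  os = c("W2K8", "RHEL5", "W2K8", "RHEL5", "RHEL5"),
  highvulns = c(1, 0, 2, 0, 0))

nrow(assets.df)           # number of hosts (rows)
ncol(assets.df)           # number of variables (columns)
names(assets.df)          # column names
sapply(assets.df, class)  # class of each column, as str() reports it
summary(assets.df$highvulns)
```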
# show a "slice": just the operating systems column
assets.df["os"]
#Explain the output as well as the syntax of the R-code used for this activity (above).
# We sliced the data frame with single square brackets to retrieve the "os" column (not a row) from assets.df.
# The result is still a data frame with 5 rows (one per host) and 1 column, printed with row numbers and the column header.
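Single brackets, double brackets, `$` and matrix-style indexing all reach the same column but return different kinds of object. A short sketch comparing them on the same five hosts:

```r
assets.df <- data.frame(
  name = c("danube", "gander", "ganges", "mekong", "orinoco"),
  os = c("W2K8", "RHEL5", "W2K8", "RHEL5", "RHEL5"),
  highvulns = c(1, 0, 2, 0, 0))

os_df  <- assets.df["os"]     # single brackets: a one-column data frame
os_vec <- assets.df[["os"]]   # double brackets: the column itself, as a vector
os_dol <- assets.df$os        # $ is shorthand for [[ ]] with a literal name
os_mat <- assets.df[, "os"]   # matrix-style: one column is dropped to a vector by default

class(os_df)               # "data.frame"
dim(os_df)                 # 5 rows, 1 column
class(os_vec)              # "character" in R >= 4.0.0
identical(os_vec, os_dol)  # same vector either way
```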
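# before R 4.0.0, data.frame() converted strings to "factors" by default (stringsAsFactors = TRUE), so
# we use as.character() to expand any factors back out to plain strings (since R 4.0.0, os is already character)
head(as.character(assets.df$os))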
## [1] "W2K8" "RHEL5" "W2K8" "RHEL5" "RHEL5"
#Explain the output
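# The $ operator extracts the os column as a plain character vector rather than a data frame, so R prints the same five values as assets.df["os"] on one line (the [1] marks the index of the first element) instead of as a column. head() returns at most the first six elements, so all five are shown.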
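To see what the factor comment refers to, this sketch deliberately builds the os column as a factor with stringsAsFactors = TRUE, the pre-4.0.0 default. Since R 4.0.0, data.frame() uses stringsAsFactors = FALSE, which is why str() reports os as chr.

```r
# Force the old behaviour: strings become a factor
old_style <- data.frame(os = c("W2K8", "RHEL5", "W2K8", "RHEL5", "RHEL5"),
                        stringsAsFactors = TRUE)

class(old_style$os)         # "factor"
levels(old_style$os)        # the distinct values, sorted: "RHEL5" "W2K8"
as.integer(old_style$os)    # the integer codes stored underneath: 2 1 2 1 1
as.character(old_style$os)  # expanded back out to the original strings
```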
# add a new column
assets.df$ip <- c("192.168.1.5","10.2.7.5","192.168.1.7",
"10.2.7.6", "10.2.7.7")
assets.df
Describe what we have just accomplished in the chunk above. In addition, explain the syntax used to complete this task. # Assigning to assets.df$ip adds a new column named ip to the assets.df data frame; c() combines the five IP address strings into a vector, which fills the column one value per row in the existing row order.
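The new vector has to line up with the existing rows. A minimal sketch showing that a vector of the right length is accepted, while one whose length does not divide the number of rows is rejected with an error (ip2 is a hypothetical column name used only for this demonstration):

```r
assets.df <- data.frame(
  name = c("danube", "gander", "ganges", "mekong", "orinoco"),
  os = c("W2K8", "RHEL5", "W2K8", "RHEL5", "RHEL5"),
  highvulns = c(1, 0, 2, 0, 0))

# Five values for five rows: accepted
assets.df$ip <- c("192.168.1.5", "10.2.7.5", "192.168.1.7",
                  "10.2.7.6", "10.2.7.7")
ncol(assets.df)  # name, os, highvulns, ip

# Two values for five rows: R refuses the assignment
result <- tryCatch({
  assets.df$ip2 <- c("192.168.1.5", "10.2.7.5")
  "accepted"
}, error = function(e) "error")
result
```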
# extract only nodes with more than one high vulnerability
head(assets.df[assets.df$highvulns>1,])
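Explain the syntax as well as the output of the task executed above # The expression assets.df$highvulns > 1 returns a logical vector that is TRUE only for the third row. Placed before the comma inside [ , ], it selects the rows where the condition is TRUE, and leaving the position after the comma empty keeps all columns. The query therefore returns the hosts with more than one high vulnerability, which here is only ganges (2 high vulnerabilities).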
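Breaking the filter into its steps makes the logical index visible. This sketch prints the TRUE/FALSE vector first and then shows base R's subset() as an equivalent way to write the same query:

```r
assets.df <- data.frame(
  name = c("danube", "gander", "ganges", "mekong", "orinoco"),
  os = c("W2K8", "RHEL5", "W2K8", "RHEL5", "RHEL5"),
  highvulns = c(1, 0, 2, 0, 0))

keep <- assets.df$highvulns > 1   # one TRUE/FALSE per row
keep

assets.df[keep, ]                  # rows where keep is TRUE, all columns
subset(assets.df, highvulns > 1)   # the same query written with subset()
assets.df$name[keep]               # just the matching host names
```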
# create a 'zones' column based on prefix IP value
assets.df$zones <- ifelse(grepl("^192",assets.df$ip),"Zone1","Zone2")
assets.df$zones
## [1] "Zone1" "Zone2" "Zone1" "Zone2" "Zone2"
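Explain the syntax as well as the output of the chunk ran above. # grepl("^192", assets.df$ip) returns TRUE for every IP address that starts with 192 (the ^ anchors the pattern to the start of the string), and ifelse() turns each TRUE into "Zone1" and each FALSE into "Zone2". Assigning the result to assets.df$zones creates the new zones column. The output shows danube and ganges (192.168.1.x) in Zone1 and the three 10.2.7.x hosts in Zone2.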
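The two steps can be run separately to see the intermediate logical vector, and a quick comparison shows why the ^ anchor matters: without it, an address that merely contains 192 somewhere (the 10.2.7.192 below is a made-up example) would also match.

```r
ip <- c("192.168.1.5", "10.2.7.5", "192.168.1.7", "10.2.7.6", "10.2.7.7")

in_192 <- grepl("^192", ip)                 # TRUE where the string starts with "192"
zones  <- ifelse(in_192, "Zone1", "Zone2")  # vectorised if/else, element by element
in_192
zones

# Hypothetical address, used only to illustrate the anchor
grepl("^192", "10.2.7.192")  # FALSE: 192 is not at the start
grepl("192", "10.2.7.192")   # TRUE: unanchored pattern matches anywhere
```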
head(assets.df)
Take a final look at the data frame and explain your findings on this activity. # The assets.df data frame now has the columns name, os, highvulns, ip and zones; any host whose IP starts with 192 is in Zone1 and every other host is in Zone2.
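With the zones column in place, the data frame can be summarised per zone. A minimal sketch that totals the high vulnerabilities in each zone, first with tapply() and then with the formula interface of aggregate():

```r
assets.df <- data.frame(
  name = c("danube", "gander", "ganges", "mekong", "orinoco"),
  os = c("W2K8", "RHEL5", "W2K8", "RHEL5", "RHEL5"),
  highvulns = c(1, 0, 2, 0, 0),
  ip = c("192.168.1.5", "10.2.7.5", "192.168.1.7", "10.2.7.6", "10.2.7.7"))
assets.df$zones <- ifelse(grepl("^192", assets.df$ip), "Zone1", "Zone2")

# Total high vulnerabilities per zone
tapply(assets.df$highvulns, assets.df$zones, sum)

# The same totals returned as a data frame
agg <- aggregate(highvulns ~ zones, data = assets.df, FUN = sum)
agg
```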
library(ggplot2)
assets.df$highvulns <- as.factor(assets.df$highvulns)
p <- ggplot(data = assets.df, aes(x = zones, fill = zones)) +
  geom_bar() +
  labs(title = "IP counts by Zone")
p
# This bar chart shows the number of IP addresses (hosts) in each zone: Zone1 has 2 and Zone2 has 3.
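geom_bar() uses stat = "count" by default, so each bar's height is the number of rows for that x value. A quick base R check of the chart, assuming the same five hosts, is a one-way table():

```r
assets.df <- data.frame(
  name = c("danube", "gander", "ganges", "mekong", "orinoco"),
  ip = c("192.168.1.5", "10.2.7.5", "192.168.1.7", "10.2.7.6", "10.2.7.7"))
assets.df$zones <- ifelse(grepl("^192", assets.df$ip), "Zone1", "Zone2")

zone_counts <- table(assets.df$zones)  # hosts per zone, i.e. the bar heights
zone_counts
```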
assets.df$highvulns <- as.factor(assets.df$highvulns)
p <- ggplot(data = assets.df, aes(x = os, fill = zones)) +
  geom_bar() +
  labs(title = "IP counts by OS and Zone")  # Nice graph
p
# This bar chart shows the IP counts for each operating system, colored by zone: RHEL5 has 3 hosts, all in Zone2, and W2K8 has 2 hosts, both in Zone1. All of this can be verified by displaying the assets.df data frame.
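Instead of checking the stacked bars against the raw data frame by eye, a two-way table gives the same breakdown as numbers. A minimal sketch, assuming the same five hosts:

```r
assets.df <- data.frame(
  os = c("W2K8", "RHEL5", "W2K8", "RHEL5", "RHEL5"),
  ip = c("192.168.1.5", "10.2.7.5", "192.168.1.7", "10.2.7.6", "10.2.7.7"))
assets.df$zones <- ifelse(grepl("^192", assets.df$ip), "Zone1", "Zone2")

# Rows: operating system; columns: zone; cells: number of hosts
xt <- table(OS = assets.df$os, Zone = assets.df$zones)
xt
rowSums(xt)  # total hosts per OS, i.e. the full bar heights in the chart
```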
assets.df$highvulns <- as.factor(assets.df$highvulns)
p <- ggplot(data = assets.df, aes(x = highvulns, fill = highvulns)) +
  geom_bar() + facet_grid(~zones) +
  labs(title = "High vulnerability counts by Zone")
p
# This bar chart counts hosts by their number of high vulnerabilities, with one panel per zone. In Zone1 there is one host with 1 high vulnerability (danube) and one host with 2 (ganges); in Zone2 all three hosts have 0. The bars are not labeled with host names, so I had to look at the data frame to tell which host each bar represented. With this little data that is easy to work out, but with large amounts of data it would not be.
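One possible way to address the missing host names is to plot one bar per host instead of counting hosts. This is a sketch of an alternative chart, not the handout's own: geom_col() uses the highvulns value as the bar height, the host names go on the x axis, and geom_text() prints each value above its bar (so hosts with 0 are still visible). It keeps highvulns numeric rather than converting it to a factor.

```r
library(ggplot2)

assets.df <- data.frame(
  name = c("danube", "gander", "ganges", "mekong", "orinoco"),
  highvulns = c(1, 0, 2, 0, 0),
  ip = c("192.168.1.5", "10.2.7.5", "192.168.1.7", "10.2.7.6", "10.2.7.7"))
assets.df$zones <- ifelse(grepl("^192", assets.df$ip), "Zone1", "Zone2")

# One bar per host; each panel only shows the hosts in that zone
p2 <- ggplot(assets.df, aes(x = name, y = highvulns, fill = zones)) +
  geom_col() +
  geom_text(aes(label = highvulns), vjust = -0.3) +
  facet_grid(~zones, scales = "free_x", space = "free_x") +
  labs(title = "High vulnerability counts by host and zone",
       x = "Host", y = "High vulnerabilities")
p2
```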