### Welcome to Lecture IV (Chapter II)

I would like you to start this handout by defining the following parameters:

hosts - A host is a device connected to a computer network.

vulnerability - In information technology, a vulnerability is a flaw in code or design that creates a potential point of security compromise for an endpoint or network. Vulnerabilities create possible attack vectors through which an intruder could run code or access a target system's memory.

data frame - A data frame is a list of vectors of equal length. A matrix contains only one type of data, while a data frame accepts different types (numeric, character, factor, etc.).

list - Lists are objects that can contain different types of elements: numbers, strings, vectors, and even another list. A list can also contain a matrix or a function as an element.
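As a small illustration of the list definition, the sketch below builds a list with mixed element types. The name `host.info` and its contents are made up for this example and are not part of the handout's data.

```r
# A made-up list mixing several element types (hypothetical example data)
host.info <- list(
  name  = "danube",                      # a character string
  ports = c(22, 80, 443),                # a numeric vector
  tags  = list(env = "prod", tier = 1),  # a list nested inside the list
  check = function(x) x > 0              # a function stored as an element
)

str(host.info, max.level = 1)  # one line per top-level element
host.info$ports[2]             # second value of the ports vector
host.info$tags$env             # reach into the nested list
host.info$check(5)             # call the stored function
```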

array - An array is a data structure that can hold multi-dimensional data. In R, an array can hold data in two or more dimensions. For example, an array with dimensions 2, 2, and 5 holds five square matrices, each with two rows and two columns. Arrays can only store values of a single data type. The data can have more than one dimension: rows, columns, and further dimensions of some length.
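To make the array definition concrete, here is a minimal sketch that builds a 2 x 2 x 5 array from the numbers 1 to 20. The variable names `a` and `mixed` are chosen only for this illustration.

```r
# Build a 2 x 2 x 5 array: five "slices", each a 2 x 2 matrix
a <- array(1:20, dim = c(2, 2, 5))

dim(a)      # the three dimension lengths
a[, , 1]    # the first 2 x 2 slice (holds the values 1 to 4)
a[2, 1, 3]  # row 2, column 1 of the third slice

# Arrays hold a single type: mixing a number and a string coerces both to character
mixed <- array(c(1, "a"), dim = c(1, 2, 1))
typeof(mixed)
```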

matrix - A matrix is a collection of elements of the same data type (numeric, character, or logical) arranged into a fixed number of rows and columns. Since you are only working with rows and columns, a matrix is called two-dimensional.
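The difference between a matrix and a data frame can be seen by storing the same two columns both ways. This is only a sketch; `m` and `d` are names chosen for the illustration, and `stringsAsFactors = FALSE` is set explicitly so the result does not depend on the R version.

```r
# Same two columns stored as a matrix and as a data frame
m <- cbind(name = c("danube", "gander"), highvulns = c(1, 0))
d <- data.frame(name = c("danube", "gander"), highvulns = c(1, 0),
                stringsAsFactors = FALSE)

typeof(m)         # the matrix coerced the numbers to character strings
m[1, "highvulns"] # so this value is the string "1", not the number 1
sapply(d, class)  # the data frame keeps one type per column
dim(m)            # a matrix has exactly two dimensions: rows and columns
```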

categorical data - Categorical data takes its values from a limited set of categories. For example, an OS field holds one of several operating systems; a palette of sequential colors is another example.
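In R, categorical data is usually stored as a factor. The short sketch below converts the operating system values used later in this handout into a factor; `os` and `os.f` are names chosen for the example.

```r
# Operating system labels for five hosts
os <- c("W2K8", "RHEL5", "W2K8", "RHEL5", "RHEL5")

os.f <- factor(os)  # convert the strings to a categorical variable
levels(os.f)        # the distinct categories, sorted alphabetically
table(os.f)         # how many hosts fall into each category
as.integer(os.f)    # internally each value is stored as a level number
```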

operating systems - An operating system (OS) is system software that manages computer hardware and software resources and provides common services for computer programs.

nodes - A node is any active, physical, electronic device attached to a network. These devices are capable of sending, receiving, or forwarding information, and sometimes a combination of the three. Examples of nodes include bridges, switches, hubs, and modems, as well as computers, printers, and servers.

zones - Zoning implies some grouping of computing resources. This grouping could be by location, function, purpose, access type, subnet, etc. Because this is my how-to, I’m going to zone according to functional area and subnet.

IP - An Internet Protocol address (IP address) is a numerical label assigned to each device connected to a computer network that uses the Internet Protocol for communication. An IP address serves two main functions: host or network interface identification and location addressing.

# create a new data frame of hosts & high vuln counts
assets.df <- data.frame(
  name=c("danube","gander","ganges","mekong","orinoco"),
  os=c("W2K8","RHEL5","W2K8","RHEL5","RHEL5"),
  highvulns=c(1,0,2,0,0))
#Let us now take a look at the data frame assets.df
View(assets.df)

1. Describe our data frame in a few words

A data frame is a list of vectors of equal length. A matrix contains only one type of data, while a data frame accepts different types (numeric, character, factor, etc.).

In the example above we created a data frame named assets.df (df stands for data frame). It combines three columns, name, os, and highvulns, each holding five values, one per host.

# take a look at the data frame structure & contents
str(assets.df)
## 'data.frame':    5 obs. of  3 variables:
##  $ name     : chr  "danube" "gander" "ganges" "mekong" ...
##  $ os       : chr  "W2K8" "RHEL5" "W2K8" "RHEL5" ...
##  $ highvulns: num  1 0 2 0 0
head(assets.df)
# show a "slice" just the operating systems
assets.df["os"]
# Explain the output as well as the syntax of the R code used for this activity (above).
# Single brackets with a column name, assets.df["os"], slice the data frame down to the column "os".
# The result is still a data frame, with 5 rows and 1 column.

# before R 4.0, data.frame() created "factors" for character data by default,
# and as.character() was needed to expand the factors out; the chr columns in
# the str() output above show that os is already a character vector here
head(assets.df$os)
## [1] "W2K8"  "RHEL5" "W2K8"  "RHEL5" "RHEL5"
# Explain the output
# assets.df$os retrieves the same values as assets.df["os"], but as a plain vector rather than a
# one-column data frame, so the values print across one line instead of down a column.
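The difference between the two ways of pulling out the os column can be checked directly. The sketch below rebuilds the first three columns of assets.df so it runs on its own; the names `as.df`, `as.vec`, and `as.vec2` are made up for the comparison.

```r
# Rebuild the example data frame so this chunk is self-contained
assets.df <- data.frame(
  name = c("danube", "gander", "ganges", "mekong", "orinoco"),
  os = c("W2K8", "RHEL5", "W2K8", "RHEL5", "RHEL5"),
  highvulns = c(1, 0, 2, 0, 0),
  stringsAsFactors = FALSE)

as.df   <- assets.df["os"]    # single brackets: result is still a data frame
as.vec  <- assets.df$os       # dollar sign: the column itself, as a vector
as.vec2 <- assets.df[["os"]]  # double brackets: same result as $

class(as.df)                # "data.frame"
dim(as.df)                  # 5 rows, 1 column
class(as.vec)               # "character"
identical(as.vec, as.vec2)  # TRUE
```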
# add a new column
assets.df$ip <- c("192.168.1.5","10.2.7.5","192.168.1.7",
                     "10.2.7.6", "10.2.7.7")
assets.df

Describe what we have just accomplished in the chunk above. In addition, explain the syntax used to complete this task. # Assigning to assets.df$ip adds a new column named ip to the assets.df data frame. c() combines the five IP addresses into one vector, so each row gets one address under the ip label.

# extract only nodes with more than one high vulnerability
head(assets.df[assets.df$highvulns>1,])

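Explain the syntax as well as the output of the task executed above # The query extracts every row of the assets.df data frame whose highvulns value is greater than 1. The condition goes before the comma to select rows, and the empty space after the comma keeps all columns. Only ganges, with 2 high vulnerabilities, is returned.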
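The row filter can be broken into steps to see what R is doing. This sketch rebuilds the data frame so it runs on its own; `keep` is a name chosen for the illustration, and `subset()` is shown only as one equivalent base R alternative.

```r
# Rebuild the example data frame so this chunk is self-contained
assets.df <- data.frame(
  name = c("danube", "gander", "ganges", "mekong", "orinoco"),
  os = c("W2K8", "RHEL5", "W2K8", "RHEL5", "RHEL5"),
  highvulns = c(1, 0, 2, 0, 0),
  stringsAsFactors = FALSE)

# Step 1: the comparison yields one TRUE/FALSE per row
keep <- assets.df$highvulns > 1
keep

# Step 2: rows where keep is TRUE are returned; the blank after the comma means "all columns"
assets.df[keep, ]

# The same filter written with subset()
subset(assets.df, highvulns > 1)

# Naming a column after the comma returns just that column for the matching rows
assets.df[keep, "name"]
```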

# create a 'zones' column based on prefix IP value
assets.df$zones <- ifelse(grepl("^192",assets.df$ip),"Zone1","Zone2")
assets.df$zones
## [1] "Zone1" "Zone2" "Zone1" "Zone2" "Zone2"

Explain the syntax as well as the output of the chunk ran above. # The chunk creates a new "zones" column, assets.df$zones, based on the prefix of each IP value. grepl("^192", assets.df$ip) tests whether each address starts with 192, and ifelse() assigns "Zone1" where it does and "Zone2" otherwise.
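The zone assignment combines two vectorized functions, which can be run one at a time on a plain vector. In this sketch `ip`, `in.192`, and `zones` are stand-alone variables rather than columns, and `startsWith()` is shown as an alternative to the regular expression, not as what the handout uses.

```r
# The five IP addresses from the handout, as a plain character vector
ip <- c("192.168.1.5", "10.2.7.5", "192.168.1.7", "10.2.7.6", "10.2.7.7")

# Step 1: grepl() returns TRUE where the pattern matches; "^" anchors it to the start
in.192 <- grepl("^192", ip)
in.192

# Step 2: ifelse() picks "Zone1" for TRUE and "Zone2" for FALSE, element by element
zones <- ifelse(in.192, "Zone1", "Zone2")
zones

# For these addresses, a fixed-prefix test with startsWith() gives the same answer
identical(in.192, startsWith(ip, "192."))
```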

head(assets.df)

Take a final look at the data frame and explain your findings on this activity. # The assets.df data frame now has the name, os, highvulns, ip, and zones columns. Any IP address starting with 192 is in Zone1, and anything else is in Zone2.

library(ggplot2)
assets.df$highvulns<-as.factor(assets.df$highvulns)
p<-ggplot(data=assets.df,aes())+geom_bar(aes(zones,fill=zones))+labs(title="IP counts by Zone")
p

# This bar chart shows the number of IP addresses in each zone: Zone1 has 2 IP addresses and Zone2 has 3.

assets.df$highvulns<-as.factor(assets.df$highvulns)
p<-ggplot(data=assets.df,aes())+geom_bar(aes(os,fill=zones))+labs(title="Host counts by OS and Zone")#Nice graph
p

# This bar chart shows the number of hosts running each operating system, with the bars colored by zone. RHEL5 has a count of 3 (all in Zone2) and W2K8 has a count of 2 (both in Zone1). All of this can be verified by displaying the assets.df data frame.
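The counts read off the two bar charts can also be checked without plotting. The sketch below rebuilds the data frame so it runs on its own and uses base R `table()`; the names `zone.counts` and `os.by.zone` are made up for this check.

```r
# Rebuild the example data frame, including the ip and zones columns
assets.df <- data.frame(
  name = c("danube", "gander", "ganges", "mekong", "orinoco"),
  os = c("W2K8", "RHEL5", "W2K8", "RHEL5", "RHEL5"),
  highvulns = c(1, 0, 2, 0, 0),
  ip = c("192.168.1.5", "10.2.7.5", "192.168.1.7", "10.2.7.6", "10.2.7.7"),
  stringsAsFactors = FALSE)
assets.df$zones <- ifelse(grepl("^192", assets.df$ip), "Zone1", "Zone2")

# Hosts per zone: the heights of the bars in the first chart
zone.counts <- table(assets.df$zones)
zone.counts

# Hosts per operating system, split by zone: the second chart as a table
os.by.zone <- table(os = assets.df$os, zone = assets.df$zones)
os.by.zone
```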

assets.df$highvulns<-as.factor(assets.df$highvulns)
p<-ggplot(data=assets.df,aes())+geom_bar(aes(highvulns,fill=highvulns))+facet_grid(~zones)+labs(title="High vulnerability counts by Zone")
p

# From the assets.df data frame, this bar chart shows the high vulnerability counts in each zone. In Zone1, danube has 1 high vulnerability and ganges has 2, while every host in Zone2 has 0. The chart has no labels for the host names, so I had to look at the data frame to work out what the bars were telling me. With this little data that is easy to figure out, but with a large amount of data it would not be.
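One way to see which host sits behind each bar is to list the hosts by zone alongside a per-zone total. This is only a sketch using base R; it rebuilds the data frame and keeps highvulns numeric (the plotting chunks above convert it to a factor, which cannot be summed). The names `by.host` and `totals` are made up for the example.

```r
# Rebuild the example data frame with highvulns left as a number
assets.df <- data.frame(
  name = c("danube", "gander", "ganges", "mekong", "orinoco"),
  highvulns = c(1, 0, 2, 0, 0),
  ip = c("192.168.1.5", "10.2.7.5", "192.168.1.7", "10.2.7.6", "10.2.7.7"),
  stringsAsFactors = FALSE)
assets.df$zones <- ifelse(grepl("^192", assets.df$ip), "Zone1", "Zone2")

# Hosts listed by zone, highest vulnerability count first within each zone
by.host <- assets.df[order(assets.df$zones, -assets.df$highvulns),
                     c("zones", "name", "highvulns")]
by.host

# Total high vulnerabilities per zone
totals <- aggregate(highvulns ~ zones, data = assets.df, FUN = sum)
totals
```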