Week 2 Discussion

Part 1)

Classes

A class in R is a way to classify types of data. There are 6 basic data types a variable can be: numeric, integer, character, logical, complex, or raw. These types describe what a variable is. For example, if a variable is logical, it represents Boolean values (TRUE or FALSE). Knowing the type/class of our variables is important because it determines which operations and analyses make sense for the data.
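As a quick check of the basic types, here is a small sketch that builds one value of each type and asks R for its class. The example values and the names `examples` and `classes` are my own, chosen only for illustration.

```r
# One illustrative value for each of the six basic types
examples <- list(
  numeric   = 3.14,
  integer   = 2L,          # the L suffix makes an integer
  character = "hello",
  logical   = TRUE,
  complex   = 1 + 2i,
  raw       = as.raw(255)
)

# class() reports the class of each value
classes <- sapply(examples, class)
print(classes)
```

Each name in the list should match the class that R reports for its value.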

Data Structures

Data structures are used to organize variables. There are 6 basic data structures in R: vector, list, matrix, data frame, array, and factor. A data structure can be homogeneous, meaning it holds only one type of data (vectors, matrices, and arrays). A data structure can also be heterogeneous, meaning it can hold multiple types of data (lists and data frames).
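To see the difference between homogeneous and heterogeneous structures, here is a minimal sketch. The variable names and values are made up for illustration.

```r
# Homogeneous: an atomic vector holds a single type,
# so mixing types coerces every element to a common type
mixed_vector <- c(1, "a", TRUE)
vector_class <- class(mixed_vector)   # everything becomes character

# Heterogeneous: a list keeps each element's own type
mixed_list <- list(1, "a", TRUE)
element_classes <- sapply(mixed_list, class)

# A data frame can hold a different type in each column
df <- data.frame(
  id = 1:2,
  name = c("x", "y"),
  passed = c(TRUE, FALSE),
  stringsAsFactors = FALSE   # keep name as character in older R versions too
)
column_classes <- sapply(df, class)

print(vector_class)
print(element_classes)
print(column_classes)
```

The vector is forced to one type, while the list and the data frame keep a separate type per element or column.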

Part 2)

Standard Deviation

sd

## function (x, na.rm = FALSE) 
## sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x), 
##     na.rm = na.rm))
## <bytecode: 0x00000148a6422820>
## <environment: namespace:stats>

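It looks like this function does not remove missing values by default (na.rm = FALSE). The function seems to read as: take the square root of the variance of x if x is a vector or factor. However, if x is NOT a vector or factor, first convert x to double-precision numbers with as.double(x) (this changes the storage type, it does not multiply x by 2), then take the square root of the variance of that.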
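To check this reading of the source, here is a small sketch on made-up data. The vectors `x`, `x_missing`, and the matrix `m` are my own examples.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# sd() should match the square root of the variance
same_as_sqrt_var <- isTRUE(all.equal(sd(x), sqrt(var(x))))

# With the default na.rm = FALSE, a missing value gives NA
x_missing <- c(x, NA)
sd_default <- sd(x_missing)
sd_removed <- sd(x_missing, na.rm = TRUE)   # NA dropped first

# as.double() converts the storage type; it does not multiply by 2
converted <- as.double(5L)

# A matrix is not a vector, so sd() flattens it with as.double()
m <- matrix(1:4, nrow = 2)
matrix_matches_vector <- isTRUE(all.equal(sd(m), sd(1:4)))

print(same_as_sqrt_var)
print(sd_default)
print(sd_removed)
```

With na.rm = TRUE the result should equal sd(x), since the only difference between the two vectors is the NA.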

Interquartile Range

IQR

## function (x, na.rm = FALSE, type = 7) 
## diff(quantile(as.numeric(x), c(0.25, 0.75), na.rm = na.rm, names = FALSE, 
##     type = type))
## <bytecode: 0x00000148a44a3040>
## <environment: namespace:stats>

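This seems to say: given x, keep missing values by default (na.rm = FALSE) and use type = 7, which selects the algorithm that quantile uses to compute quantiles (7 is the default in R). It then takes the difference of the results from another function, quantile. With what I know about IQR, the quantile function is producing the 0.25 and 0.75 quantiles of x (the first and third quartiles). So this produces Q3 - Q1, the width of the middle 50% of the data, which is the IQR. Note that this is not the same as the 0.50 quantile, which is the median.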
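As a sanity check, here is a sketch that rebuilds the IQR by hand from quantile and compares it with IQR on a made-up vector `x`.

```r
x <- c(1, 3, 5, 7, 9, 11, 13)

# First and third quartiles, using the default type = 7
q <- quantile(x, c(0.25, 0.75), names = FALSE)

# diff() gives Q3 - Q1, which is what IQR() returns
iqr_by_hand <- diff(q)
iqr_builtin <- IQR(x)

# The median (0.50 quantile) is a different quantity
med <- median(x)

print(q)
print(iqr_by_hand)
print(iqr_builtin)
print(med)
```

For this vector the quartiles are 4 and 10, so the IQR is 6 while the median is 7.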

My Function

I decided to write a function that converts pounds to kilograms.

pounds_to_kilograms <- function(pounds){
  pounds * 0.453592
}
pounds_to_kilograms(100)

## [1] 45.3592

Part 3)

Bayes’ Theorem:

Bayes’ Theorem is a result about conditional probability; it’s a formula we can use to update probabilities when new evidence becomes available. For example, say there are two boxes: box one contains 3 socks and 1 shoe, and box two contains 3 shoes and 1 sock. Suppose we had to choose one of these boxes blindfolded and were then told to guess which box we chose. With that information, the chance we guess correctly is 50/50, since the probability is 1/2 for each box. However, say we pulled a shoe from our box and were then told to guess which box we chose from. This new information, or evidence, makes us lean towards box 2, since we know there were more shoes in that box. Bayes’ Theorem gives us a formula that allows us to calculate the change in probabilities given new evidence. Bayes’ Theorem is written as follows: P(A | B) = (P(B | A) * P(A)) / P(B)
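To put numbers on the box example, here is a minimal sketch that applies the formula with A = "we chose a given box" and B = "we pulled a shoe". The variable names are my own, and I am assuming each item in a box is equally likely to be pulled.

```r
# Prior: each box is equally likely before we see any evidence
prior <- c(box1 = 0.5, box2 = 0.5)

# Likelihood of pulling a shoe from each box
# (box 1 has 1 shoe out of 4 items, box 2 has 3 shoes out of 4)
p_shoe_given_box <- c(box1 = 1 / 4, box2 = 3 / 4)

# Total probability of pulling a shoe, P(B)
p_shoe <- sum(p_shoe_given_box * prior)

# Bayes' Theorem: P(box | shoe) = P(shoe | box) * P(box) / P(shoe)
posterior <- p_shoe_given_box * prior / p_shoe
print(posterior)
```

After pulling a shoe, the probability that we chose box 2 rises from 0.5 to 0.75, and the probability of box 1 drops to 0.25.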

Part 4)

Guided Practice 3.43:

We will let A be academic events, S be sporting events, N be no events, and F be a full parking garage. P(A) = 0.35, P(S) = 0.20, P(N) = 0.45, P(F|A) = 0.25, P(F|S) = 0.70, P(F|N) = 0.05

Using Bayes’ Theorem:

P(S|F) = (P(F|S) * P(S)) / ((P(F|S) * P(S)) + (P(F|A) * P(A)) + (P(F|N) * P(N)))

P(S|F) = (0.70 * 0.20) / ((0.70 * 0.20) + (0.25 * 0.35) + (0.05 * 0.45))

P(S|F) = 0.14 / 0.25 = 0.56

The probability that there is a sporting event, given that the parking garage is full, is 56%.
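The same calculation can be checked in R. This is only a sketch using the probabilities given in the problem; the vector names are my own.

```r
# Prior probabilities of each kind of event
prior <- c(academic = 0.35, sporting = 0.20, none = 0.45)

# Probability the garage is full given each kind of event
p_full_given_event <- c(academic = 0.25, sporting = 0.70, none = 0.05)

# Joint probabilities P(F and event) = P(F | event) * P(event)
joint <- p_full_given_event * prior

# Total probability the garage is full, P(F)
p_full <- sum(joint)

# Posterior P(event | F) for every event at once
posterior <- joint / p_full
print(p_full)
print(posterior)
```

P(F) comes out to 0.25, and the sporting entry of the posterior is 0.56, matching the hand calculation.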

Tree Diagram

# Using Rgraphviz library
library(Rgraphviz)

## Loading required package: graph

## Loading required package: BiocGenerics

## Loading required package: generics

## 
## Attaching package: 'generics'

## The following objects are masked from 'package:base':
## 
##     as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
##     setequal, union

## 
## Attaching package: 'BiocGenerics'

## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs

## The following objects are masked from 'package:base':
## 
##     anyDuplicated, aperm, append, as.data.frame, basename, cbind,
##     colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
##     get, grep, grepl, is.unsorted, lapply, Map, mapply, match, mget,
##     order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
##     rbind, Reduce, rownames, sapply, saveRDS, table, tapply, unique,
##     unsplit, which.max, which.min

## Loading required package: grid

# Adding labels and probabilities to the nodes
Node1 <- "Event"
Node2 <- "Academic, 0.35"
Node3 <- "Sporting, 0.20"
Node4 <- "None, 0.45"
Node5 <- "Full, 0.25"
Node6 <- "Spaces Available, 0.75"
Node7 <- "Full, 0.7"
Node8 <- "Spaces Available, 0.3"
Node9 <- "Full, 0.05"
Node10 <- "Spaces Available, 0.95"
nodeNames <- c(Node1,Node2,Node3,Node4,Node5,Node6,Node7,Node8,Node9,Node10)

# Creating the graph
graph <- new("graphNEL", nodes=nodeNames, edgemode="directed")

# Adding the edges
graph <- addEdge(nodeNames[1], nodeNames[2], graph)
graph <- addEdge(nodeNames[1], nodeNames[3], graph)
graph <- addEdge(nodeNames[1], nodeNames[4], graph)
graph <- addEdge(nodeNames[2], nodeNames[5], graph)
graph <- addEdge(nodeNames[2], nodeNames[6], graph)
graph <- addEdge(nodeNames[3], nodeNames[7], graph)
graph <- addEdge(nodeNames[3], nodeNames[8], graph)
graph <- addEdge(nodeNames[4], nodeNames[9], graph)
graph <- addEdge(nodeNames[4], nodeNames[10], graph)

# Plotting the created graph
plot(graph)

I need some more practice on the formatting of these diagrams.