Part I R Nuts and Bolts/Programming Readings

Example dataset

str(iris)

## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

class(iris)

## [1] "data.frame"

typeof(iris)

## [1] "list"

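Using str(), class(), and typeof() on iris, we can see that it is a data frame with 150 observations of 5 variables. Its class is "data.frame", but its underlying type is "list", because a data frame is stored as a list of equal-length column vectors. Four of the columns are numeric measurements; the fifth, Species, is a factor with three levels (setosa, versicolor, and virginica) that can be used to group, sort, and filter the numeric data.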
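A minimal sketch of the same points in base R: it confirms that a data frame is a list underneath, shows each column's class, and uses the Species factor to filter and group the numeric columns. The object name setosa is only an illustrative choice.

```r
# A data frame is a list of equal-length column vectors, which is why
# class() reports "data.frame" while typeof() reports "list"
stopifnot(is.list(iris), length(iris) == 5, nrow(iris) == 150)

# Class of each column: four numeric measurements plus one factor
sapply(iris, class)

# The factor's levels and how many rows fall into each level
levels(iris$Species)
table(iris$Species)

# Using the factor to filter rows and to group a numeric column
setosa <- subset(iris, Species == "setosa")
tapply(iris$Sepal.Length, iris$Species, mean)
```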

Part II Reading and Writing Functions in R

  sd(iris$Sepal.Length)

## [1] 0.8280661

  mad(iris$Sepal.Length)

## [1] 1.03782

Opening the documentation for sd() and mad(), we learn that sd() calculates the standard deviation of the values in a numeric vector (or an R object that can be coerced to one). It uses the denominator n - 1, which gives the sample standard deviation, appropriate when the data are a sample rather than the whole population. mad() computes the median absolute deviation. It takes a numeric vector plus several optional arguments (center, constant, na.rm, low, and high) and returns constant * cMedian(abs(x - center)), where center defaults to median(x), constant defaults to 1.4826, and cMedian is the ordinary median unless low or high requests the lower or upper median (which only matters for even sample sizes).
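To check these descriptions, we can compute both statistics by hand and compare them with sd() and mad(). This sketch assumes mad()'s default arguments (center = median(x), constant = 1.4826, low = FALSE, high = FALSE).

```r
x <- iris$Sepal.Length
n <- length(x)

# Sample standard deviation: divide the summed squared deviations by (n - 1), not n
sd_manual <- sqrt(sum((x - mean(x))^2) / (n - 1))

# MAD with mad()'s defaults: center at the median, ordinary median of the
# absolute deviations, then rescale by the constant 1.4826
raw_mad <- median(abs(x - median(x)))
mad_manual <- 1.4826 * raw_mad

c(sd_manual = sd_manual, sd = sd(x))
c(mad_manual = mad_manual, mad = mad(x))
```

Since mad(iris$Sepal.Length) is 1.03782, the unscaled median absolute deviation is 1.03782 / 1.4826 = 0.7. The constant rescales it so that, for normally distributed data, it estimates the same quantity as the standard deviation.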

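  # Area of a trapezoid: l is the height (distance between the parallel sides),
  # b1 and b2 are the lengths of the two parallel sides
  trapezoid_area <- function(l, b1, b2){
    return(0.5*(b1 + b2)*l)
  }
  trapezoid_area(2, 1, 4)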

## [1] 5

Part III Bayes Theorem

Bayes' theorem is a method to calculate the probability of an event given that some other event has already occurred. It is given by the formula \(P(A|B) = \frac{P(B|A)*P(A)}{P(B)}\), where P(X) is the probability of event X occurring and P(X|Y) is the probability of X given that Y has already occurred (the formula requires \(P(B) > 0\)). The theorem is useful when we know P(A), P(B), and P(B|A) but want the reverse conditional probability P(A|B).
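The formula translates directly into a small R function. This is a sketch; the name bayes_posterior() and its argument names are hypothetical, not from any package.

```r
# Hypothetical helper: P(A|B) = P(B|A) * P(A) / P(B)
bayes_posterior <- function(p_b_given_a, p_a, p_b) {
  # All inputs are probabilities, and P(A|B) is undefined when P(B) = 0
  stopifnot(p_b > 0, p_b <= 1, p_a >= 0, p_a <= 1,
            p_b_given_a >= 0, p_b_given_a <= 1)
  p_b_given_a * p_a / p_b
}

# The dice example: P(B|A) = 1/6, P(A) = 1/6, P(B) = 2/36
bayes_posterior(1/6, 1/6, 2/36)
```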

As an example, suppose we roll two fair six-sided dice, one blue and one red. Let A be the event that the blue die shows a 2, and B the event that the two dice sum to 3. Only the outcomes (blue, red) = (1, 2) and (2, 1) give a sum of 3, so \(P(B) = \frac{2}{36}\). The probability of rolling a 2 on the blue die is \(P(A) = \frac{1}{6}\), and given that the blue die shows a 2, the sum is 3 only if the red die shows a 1, so \(P(B|A) = \frac{1}{6}\). The probability of rolling a 2 on the blue die given that the sum of the two dice is 3 is therefore \(P(A|B) = \frac{\frac{1}{6}*\frac{1}{6}}{\frac{2}{36}} = \frac{1/36}{2/36} = \frac{1}{2}\).
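Because the two dice have only 36 equally likely outcomes, we can also check this example by brute force: enumerate every (blue, red) pair with expand.grid(), compute each probability as a proportion of outcomes, and compare Bayes' theorem with counting P(A|B) directly. A sketch in base R:

```r
# All 36 equally likely outcomes of a fair blue die and a fair red die
rolls <- expand.grid(blue = 1:6, red = 1:6)
sum_is_3  <- rolls$blue + rolls$red == 3   # event B
blue_is_2 <- rolls$blue == 2               # event A

p_b <- mean(sum_is_3)                      # 2/36
p_a <- mean(blue_is_2)                     # 1/6
p_b_given_a <- mean(sum_is_3[blue_is_2])   # 1/6: only red = 1 works
p_a_given_b <- mean(blue_is_2[sum_is_3])   # counted directly among the sum-3 outcomes

# Bayes' theorem should agree with the direct count
all.equal(p_b_given_a * p_a / p_b, p_a_given_b)
```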

Part IV

Jose visits campus every Thursday evening. However, some days the parking garage is full, often due to college events. There are academic events on 35% of evenings, sporting events on 20% of evenings, and no events on 45% of evenings. When there is an academic event, the garage fills up about 25% of the time, and it fills up 70% of evenings with sporting events. On evenings when there are no events, it only fills up about 5% of the time. If Jose comes to campus and finds the garage full, what is the probability that there is a sporting event? Use a tree diagram to solve this.

Event A: academic event in the evening, \(P(A) = 0.35\)
Event S: sporting event in the evening, \(P(S) = 0.20\)
Event N: no event in the evening, \(P(N) = 0.45\)
Event G: garage fills up, with \(P(G|A) = 0.25\), \(P(G|S) = 0.70\), and \(P(G|N) = 0.05\)

Because A, S, and N are mutually exclusive and together cover every evening, the probability of G is the sum over the three branches of the tree of each event's probability times the probability that the garage fills during that event (the law of total probability):

\(P(G) = P(G|A)*P(A) + P(G|S)*P(S) + P(G|N)*P(N)\)
\(P(G) = 0.25*0.35 + 0.70*0.20 + 0.05*0.45 = 0.0875 + 0.14 + 0.0225 = 0.25\)

Putting this into Bayes' theorem, we get:

\(P(S|G) = \frac{P(G|S)*P(S)}{P(G)} = \frac{0.14}{0.25} = 0.56\)

So there is a 56% chance that there is a sporting event given that the garage is full.
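The tree diagram can also be computed as a table with one row per root-to-leaf path: multiplying along a path gives its joint probability, summing the "Full" leaves gives P(G), and dividing each "Full" leaf by P(G) gives the posterior. A base R sketch; the column names are illustrative choices, and the "Not Full" probabilities are the complements 1 - P(G|event).

```r
# Each row is one root-to-leaf path of the tree:
# first branch = type of evening, second branch = state of the garage
paths <- data.frame(
  event   = rep(c("Academic", "Sport", "No Event"), each = 2),
  garage  = rep(c("Full", "Not Full"), times = 3),
  p_event = rep(c(0.35, 0.20, 0.45), each = 2),               # P(A), P(S), P(N)
  p_garage_given_event = c(0.25, 0.75, 0.70, 0.30, 0.05, 0.95) # P(G|.) and complements
)

# Multiply along each path to get the joint probability of that leaf
paths$joint <- paths$p_event * paths$p_garage_given_event
stopifnot(isTRUE(all.equal(sum(paths$joint), 1)))  # all leaves together sum to 1

# Total probability: P(G) is the sum of the "Full" leaves
full <- paths[paths$garage == "Full", ]
p_g <- sum(full$joint)

# Bayes' theorem: each event's share of P(G)
full$posterior <- full$joint / p_g
print(full[, c("event", "joint", "posterior")])
```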

library(BiocManager)
BiocManager::install("Rgraphviz")

## Bioconductor version 3.22 (BiocManager 1.30.27), R 4.5.1 (2025-06-13)

## Warning: package(s) not installed when version(s) same as or greater than current; use
##   `force = TRUE` to re-install: 'Rgraphviz'

## Old packages: 'BH', 'boot', 'broom', 'clock', 'colorspace', 'cpp11',
##   'data.table', 'digest', 'distributional', 'e1071', 'extraDistr', 'fable',
##   'forecast', 'future'

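To draw the tree diagram, we use the bnlearn package. bnlearn is built for Bayesian networks, so here model2network() is only a convenient way to lay out the branches: each outcome is its own node, with its probability in the node name, and graphviz.plot() (which needs the Rgraphviz package installed above) renders the result.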

  library(bnlearn)
  # Each [child|parent] pair is one branch of the tree; probabilities are part of the node names
  tree <- model2network("[Initial][Academic (35%)|Initial][Sport (20%)|Initial][No Event (45%)|Initial][Garage Full (25%)|Academic (35%)][Garage Not Full (75%)|Academic (35%)][Garage Full (70%)|Sport (20%)][Garage Not Full (30%)|Sport (20%)][Garage Full (5%)|No Event (45%)][Garage Not Full (95%)|No Event (45%)]")
  # The Graphviz "dot" layout draws the network top-down, like a tree diagram
  graphviz.plot(tree, layout = "dot")

## Loading required namespace: Rgraphviz

ADAN7301_Disc2

2026-01-23