Ia. Write one paragraph each (in your own words) describing what classes are and what data structures are (with examples).

Data Structures - The organizational formats used to store data in memory. They are characterized by their number of dimensions and by whether they are homogeneous or heterogeneous.
* Homogeneous: All the elements must be of the same data type
* Heterogeneous: Each element can be of a different data type

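Examples of Data Structures:
+ Vectors: One-dimensional and homogeneous
+ Matrices: Two-dimensional and homogeneous
+ Lists: Like a vector (one-dimensional), but heterogeneous
+ Data Frames: Table-like (two-dimensional) and heterogeneous; each column is homogeneous, but different columns can hold different types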
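A minimal sketch of the four structures above in R. The object names and values are made up purely for illustration. The last line shows what "homogeneous" means in practice: mixing types in a vector does not fail, it silently coerces every element to a single type.

```r
# Vector: one-dimensional, every element the same type
v <- c(1.5, 2.0, 3.25)

# Matrix: two-dimensional, every element the same type
m <- matrix(1:6, nrow = 2, ncol = 3)

# List: elements may differ in type (and length)
l <- list(id = 1L, name = "Dan", scores = c(90, 85))

# Data frame: table-like; each column is homogeneous,
# but different columns may hold different types
df <- data.frame(id = 1:3, name = c("A", "B", "C"), score = c(90, 85, 77))

# Mixing types in a vector coerces everything to character
mixed <- c(1, "two", TRUE)

class(v)      # "numeric"
class(m)      # "matrix" "array" (R >= 4.0)
class(l)      # "list"
class(df)     # "data.frame"
class(mixed)  # "character"
```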

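Classes - I admit I am a little unclear on this. In my brief exposure to the C++ programming language, a class was a kind of custom data type that bundles not only the data itself but also the operations that can be performed on that data; those operations are called methods. In R, an object's class is reported by class(), and it determines which method a generic function such as print() or summary() uses for that object.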
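A rough R analogue of the C++ idea is the S3 class system: a class is a label stored in an object's `class` attribute, and a generic function dispatches to a method named `generic.class`. The sketch below uses a hypothetical `passenger` class and a hypothetical `is_adult()` generic (neither comes from any package) to show data bundled with the things that can be done with it.

```r
# Constructor for a hypothetical "passenger" class:
# store the data and tag it with a class attribute
new_passenger <- function(name, age) {
  obj <- list(name = name, age = age)
  class(obj) <- "passenger"
  obj
}

# Method: print() dispatches here for objects of class "passenger"
print.passenger <- function(x, ...) {
  cat(sprintf("Passenger: %s (age %s)\n", x$name, x$age))
  invisible(x)
}

# Custom generic plus a method, i.e. "what can be done" with the data
is_adult <- function(x, ...) UseMethod("is_adult")
is_adult.passenger <- function(x, ...) x$age >= 18

p <- new_passenger("Dan", 30)
class(p)     # "passenger"
print(p)     # Passenger: Dan (age 30)
is_adult(p)  # TRUE
```

Unlike C++, S3 methods are not defined inside the class; they are ordinary functions whose names tie them to the class.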

Ib. Pick a dataset (from base R, AER package or even the titanic dataset), and apply the two commands on your data. What do you find, and does it make sense? Please explain.

I used the titanic dataset. Using the class(titanic_train) command, I see it is a data.frame, which is what I expected. When I run str(titanic_train), I see an expanded view of the data.frame, including its size (891 observations of 12 variables). I also see each variable and its data type. These types include integer (int), character (chr), and numeric (num).

# Load the titanic dataset and determine its class and structure
library(titanic)
class(titanic_train)
## [1] "data.frame"
str(titanic_train)
## 'data.frame':    891 obs. of  12 variables:
##  $ PassengerId: int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Survived   : int  0 1 1 1 0 0 0 0 1 1 ...
##  $ Pclass     : int  3 1 3 1 3 3 1 3 3 2 ...
##  $ Name       : chr  "Braund, Mr. Owen Harris" "Cumings, Mrs. John Bradley (Florence Briggs Thayer)" "Heikkinen, Miss. Laina" "Futrelle, Mrs. Jacques Heath (Lily May Peel)" ...
##  $ Sex        : chr  "male" "female" "female" "female" ...
##  $ Age        : num  22 38 26 35 35 NA 54 2 27 14 ...
##  $ SibSp      : int  1 1 0 1 0 0 0 3 0 1 ...
##  $ Parch      : int  0 0 0 0 0 0 0 1 2 0 ...
##  $ Ticket     : chr  "A/5 21171" "PC 17599" "STON/O2. 3101282" "113803" ...
##  $ Fare       : num  7.25 71.28 7.92 53.1 8.05 ...
##  $ Cabin      : chr  "" "C85" "" "C123" ...
##  $ Embarked   : chr  "S" "C" "S" "S" ...
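To list the type of every column at once, rather than reading it off the str() output, `sapply()` can apply `class()` to each column. This sketch uses base R's built-in `iris` data so it runs without the titanic package; substituting `titanic_train` (after `library(titanic)`) should give the integer, character, and numeric mix shown above.

```r
# Apply class() to every column of a data frame
col_classes <- sapply(iris, class)
col_classes

# Count how many columns fall into each class
table(col_classes)
```

A check like this also helps answer "does it make sense?": in titanic_train, categorical variables such as Survived and Pclass are stored as integers, so they may need converting with factor() before being treated as categories.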

IIa. Explore functions

A simple example I looked into was the exclusive-or function from base R (xor).
Its body is a single, beautiful line of logic:

(x | y) & !(x & y)

Here we can break it down into sections:
(x | y) = x OR y (i.e., TRUE if at least one is TRUE)
(x & y) = x AND y (TRUE only if both are TRUE)
!(x & y) = NOT(x AND y) (TRUE if they are not both TRUE…I am getting dizzy)
Combining the two pieces with AND (&), the function returns TRUE when at least one input is TRUE but not both, i.e., when exactly one input is TRUE.
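One way to check this reasoning is to build the full truth table. The sketch below re-implements the one-line body as a hypothetical `my_xor()` and compares each intermediate piece, and the final result, against base R's `xor()` for all four combinations of inputs.

```r
# Same one-line body as base R's xor
my_xor <- function(x, y) (x | y) & !(x & y)

# All four combinations of two logical inputs
grid <- expand.grid(x = c(TRUE, FALSE), y = c(TRUE, FALSE))

grid$x_or_y   <- grid$x | grid$y       # at least one TRUE
grid$not_both <- !(grid$x & grid$y)    # not both TRUE
grid$my_xor   <- my_xor(grid$x, grid$y)
grid$base_xor <- xor(grid$x, grid$y)
grid
```

The `my_xor` and `base_xor` columns agree, and both are TRUE only in the two rows where x and y differ.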

IIb. Write your own function

For this custom function, I wanted to keep it simple, and I quickly realized how complex this can become. So I landed on a classic: print “Hello, world! My name is (name)”.

Here, we have all the basic pieces of a function:
Function name = the name of the function (PrintName)
Argument = what gets passed into the function (name)
Body = the action the function takes on the argument (pasting the greeting together with the name)

# The wickedly classic and ubiquitous Hello World; 
# every programmer's first program?

PrintName <- function(name) {
  # Paste the greeting and the name into one string and return it
  return(paste("Hello, world! My name is", name))
}

HelloWorld <- PrintName("Dan")
print(HelloWorld)
## [1] "Hello, world! My name is Dan"
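Two small properties of R functions are easy to see with this example. Because `paste()` is vectorized, passing several names returns one greeting per name. A default argument value (shown here in a hypothetical variant, `PrintNameDefault`) is used when the caller supplies nothing. This is a sketch for illustration, not part of the original assignment.

```r
PrintName <- function(name) {
  return(paste("Hello, world! My name is", name))
}

# A character vector in gives a character vector out
PrintName(c("Dan", "Ann"))

# Hypothetical variant: "R user" is used when no name is supplied
PrintNameDefault <- function(name = "R user") {
  paste("Hello, world! My name is", name)
}
PrintNameDefault()
```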

IIIa. Bayes’ Theorem

Bayes’ Theorem is built on conditional probability and is used to update the probability of an event in light of new evidence. We start from a prior probability, observe some evidence, and use the conditional probability of that evidence to compute a revised estimate, the posterior probability. In short, it lets us recalculate our results as we gather new information. Side note from the PDF: I personally love the term “Posterior Probability!”

Also, write out the formula. Pick up how to type equations in R Markdown using LaTeX syntax here

\[ P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)} \]
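When \(P(B)\) is not known directly, it is usually computed with the law of total probability over mutually exclusive, exhaustive events \(A_1, \dots, A_n\). Substituting it into the formula gives the general form used in the parking-lot example in IVa:

\[ P(B) = \sum_{i=1}^{n} P(B \mid A_i)\, P(A_i), \qquad P(A_j \mid B) = \frac{P(B \mid A_j)\, P(A_j)}{\sum_{i=1}^{n} P(B \mid A_i)\, P(A_i)} \]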

IVa. Guided Practice 3.43

From the information given in the guided practice:

Event that the lot is full = \(B\)
Sporting event = \(A_1\), with \(P(A_1) = 0.20\)
Conditional probability of \(B\) given \(A_1\): \(P(B \mid A_1) = 0.70\)
Academic event = \(A_2\), with \(P(A_2) = 0.35\)
Conditional probability of \(B\) given \(A_2\): \(P(B \mid A_2) = 0.25\)
No event = \(A_3\), with \(P(A_3) = 0.45\)
Conditional probability of \(B\) given \(A_3\): \(P(B \mid A_3) = 0.05\)

Using Bayes’ Theorem, with \(P(B)\) expanded over the three mutually exclusive events, and plugging in the values…

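\[ P(A_1 \mid B) = \frac{P(B \mid A_1)\, P(A_1)}{P(B \mid A_1)\, P(A_1) + P(B \mid A_2)\, P(A_2) + P(B \mid A_3)\, P(A_3)} = \frac{0.70 \times 0.20}{0.70 \times 0.20 + 0.25 \times 0.35 + 0.05 \times 0.45} = \frac{0.14}{0.25} = 0.56 \]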
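The same arithmetic can be checked in R. This sketch stores the priors and likelihoods from the guided practice as named vectors (the names are my own labels) and applies Bayes’ Theorem to all three events at once, which also shows the posteriors summing to 1.

```r
# Priors: sporting (A1), academic (A2), no event (A3)
prior <- c(sporting = 0.20, academic = 0.35, none = 0.45)
# Likelihood of a full lot (B) under each event
likelihood <- c(sporting = 0.70, academic = 0.25, none = 0.05)

# Joint probabilities P(B and A_i) = P(B | A_i) * P(A_i)
joint <- likelihood * prior
# Law of total probability: P(B)
p_full <- sum(joint)
# Bayes' Theorem for every event
posterior <- joint / p_full

p_full     # 0.25
posterior  # sporting 0.56, academic 0.35, none 0.09
```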

IVb. Tree Diagram (to the best of my ability!)

##
## I used the data.tree package, following the instructions in its vignette:
## https://cran.r-project.org/web/packages/data.tree/vignettes/data.tree.html
##

library(data.tree)

Begin <- Node$new("Begin")
academic <- Begin$AddChild("Academic\nP=0.35")
AFull <-  academic$AddChild("Full\nP=0.25")
ANFull <- academic$AddChild("Not Full\nP=0.75")

sporting <- Begin$AddChild("Sporting\nP=0.20")
SFull <- sporting$AddChild("Full\nP=0.70")
SNFull <- sporting$AddChild("Not Full\nP=0.30")

neither <- Begin$AddChild("Neither\nP=0.45")
NFull <- neither$AddChild("Full\nP=0.05")
NNFull <- neither$AddChild("Not Full\nP=0.95")

print(Begin)
##                   levelName
## 1  Begin                   
## 2   ¦--Academic\nP=0.35    
## 3   ¦   ¦--Full\nP=0.25    
## 4   ¦   °--Not Full\nP=0.75
## 5   ¦--Sporting\nP=0.20    
## 6   ¦   ¦--Full\nP=0.70    
## 7   ¦   °--Not Full\nP=0.30
## 8   °--Neither\nP=0.45     
## 9       ¦--Full\nP=0.05    
## 10      °--Not Full\nP=0.95
plot(Begin)
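The tree shows each branch probability but not the probability of reaching each leaf. A base R sketch (independent of data.tree) that multiplies along every root-to-leaf path, using the same values and order as the tree above:

```r
# One row per leaf: event branch, then full / not full
event  <- rep(c("Academic", "Sporting", "Neither"), each = 2)
status <- rep(c("Full", "Not Full"), times = 3)
p_event <- rep(c(0.35, 0.20, 0.45), each = 2)
p_status_given_event <- c(0.25, 0.75, 0.70, 0.30, 0.05, 0.95)

leaves <- data.frame(event, status, p_event, p_status_given_event)
# Joint probability of each path = product of its branch probabilities
leaves$joint <- leaves$p_event * leaves$p_status_given_event
leaves

# The leaves cover every possible outcome, so the joints sum to 1
sum(leaves$joint)
# Summing the "Full" leaves recovers P(B) = 0.25
sum(leaves$joint[leaves$status == "Full"])
```

Dividing the Sporting/Full joint probability (0.14) by that total (0.25) reproduces the 0.56 from IVa.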