---
title: "Chapter 15 Probability"
output: html_notebook
---

<h1> Chapter 15 Probability </h1>
This is nose bleeding time for those not into Statistics.

<h2> 15.1 What is Probability</h2>
<h3> 15.1.1 Events and Probability</h3>
Event: a set of outcomes of an experiment (a subset of the sample space) to which a probability is assigned.</br>
Probability: a number between 0 and 1 that describes the "magnitude of chance" associated with making a particular observation or statement.</br>
</br>

Two types:</br>
Frequentist or classical: probability is the assumed relative frequency with which an event occurs over many identical, objective trials.</br>
Bayesian: defines a "probability" in exactly the same way that most non-statisticians do, namely as an indication of the plausibility of a proposition or a situation, based on personal observations. </p>

The Bayesian approach is subjective and uses prior beliefs to define a prior probability distribution on the possible values of the unknown parameters. ... This is in line with the theory of probability as developed by Kolmogorov and von Mises. A frequentist does parametric inference using just the likelihood function. </p>

See also:
Likelihood: http://students.brown.edu/seeing-theory/basic-probability/index.html#first </br>
Expectation: http://students.brown.edu/seeing-theory/basic-probability/index.html#second </br>
and Estimation: http://students.brown.edu/seeing-theory/basic-probability/index.html#third </br>

When two events are said to be <b>independent</b> of each other, the occurrence of one in no way affects the probability of the other occurring. For example, say you rolled a die and flipped a coin: the number showing on the die tells you nothing about whether the coin lands heads or tails. </p>
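
A minimal sketch of the die-and-coin example, assuming a fair six-sided die and a fair coin. The object names `toss.space`, `p.six`, `p.heads` and `p.both` are illustrative and not part of the original notes. It lists all twelve equally likely outcomes and checks that the probability of both events happening together equals the product of the two individual probabilities, which is what independence implies.

```{r}
# all 12 equally likely (die, coin) pairs; a fair die and a fair coin are assumptions
toss.space <- expand.grid(die = 1:6, coin = c("H", "T"), stringsAsFactors = FALSE)
p.six <- mean(toss.space$die == 6)        # Pr(die shows 6)
p.heads <- mean(toss.space$coin == "H")   # Pr(coin shows heads)
p.both <- mean(toss.space$die == 6 & toss.space$coin == "H")  # Pr(6 and heads)
# the two numbers agree when the events are independent
c(p.both = p.both, product = p.six * p.heads)
```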

<h3>15.1.2 Conditional Probability</h3>
Conditional probability is the probability of one event occurring after taking into account the occurrence of another event. It is the usual kind of probability that we reason with: if I take this action, what are the odds that ZZ? "If" is the key word here. </p>

The quantity Pr(A|B) represents the probability that A occurs, given that B has already occurred.</P>
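
As a hedged illustration, the sketch below computes a conditional probability for one roll of a fair die, using the standard definition Pr(A|B) = Pr(A and B) / Pr(B). The events (A: the roll is a six, B: the roll is even) and all object names are assumptions chosen for this example.

```{r}
rolls <- 1:6                   # sample space of one fair die (an assumption)
event.A <- rolls == 6          # A: the roll is a six
event.B <- rolls %% 2 == 0     # B: the roll is even
p.A <- mean(event.A)                                      # Pr(A), ignoring B
p.A.given.B <- mean(event.A & event.B) / mean(event.B)    # Pr(A|B)
c(unconditional = p.A, conditional = p.A.given.B)
```

Knowing that the roll is even narrows the possibilities to three faces, so the chance of a six rises compared with the unconditional probability.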



<h3>15.1.3 Intersection</h3>
The intersection of two events is written as $A \cap B$, and $\Pr(A \cap B)$ is the probability that both A and B occur simultaneously. </p>

$\Pr(A \cap B) = \Pr(A \mid B) \times \Pr(B)$, or equivalently $\Pr(B \mid A) \times \Pr(A)$ </p>

If $\Pr(A \cap B) = 0$, then A and B are mutually exclusive.

See: https://upload.wikimedia.org/wikipedia/commons/thumb/9/99/Venn0001.svg/330px-Venn0001.svg.png </p>
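
The multiplication rule above can be checked numerically. This is only a sketch on a fair die, with hypothetical events A (roll greater than 3), B (roll is even) and C (roll is odd); none of these names come from the original notes.

```{r}
rolls <- 1:6
in.A <- rolls > 3          # A = {4, 5, 6}
in.B <- rolls %% 2 == 0    # B = {2, 4, 6}
in.C <- rolls %% 2 == 1    # C = {1, 3, 5}
p.AB <- mean(in.A & in.B)          # Pr(A and B) counted directly
p.A.given.B <- mean(in.A[in.B])    # Pr(A|B): share of B's outcomes that are in A
p.B.given.A <- mean(in.B[in.A])    # Pr(B|A): share of A's outcomes that are in B
# all three routes give the same intersection probability
c(direct = p.AB, via.B = p.A.given.B * mean(in.B), via.A = p.B.given.A * mean(in.A))
# even and odd cannot both occur, so B and C are mutually exclusive
p.BC <- mean(in.B & in.C)
p.BC
```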


<h3>15.1.4 Union </h3>
The union of two sets A and B is the set of elements which are in A, in B, or in both A and B. In symbols,
$$A \cup B = \{x : x \in A \text{ or } x \in B\}.$$

</p>
For two events, $\Pr(A \cup B)$ is the probability that A or B (or both) occurs.
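
A small sketch of the union of two events, again assuming one roll of a fair die with illustrative sets `set.A` and `set.B`. It compares the probability counted directly from the union with the standard addition rule Pr(A) + Pr(B) - Pr(A and B); the helper `pr()` is a hypothetical convenience function written for this example.

```{r}
sample.space <- 1:6
set.A <- c(4, 5, 6)
set.B <- c(2, 4, 6)
# probability of an event when all outcomes are equally likely
pr <- function(event) length(event) / length(sample.space)
p.union <- pr(union(set.A, set.B))                                   # counted directly
p.incl.excl <- pr(set.A) + pr(set.B) - pr(intersect(set.A, set.B))   # addition rule
c(direct = p.union, addition.rule = p.incl.excl)
```

Subtracting the intersection avoids counting the outcomes shared by A and B twice.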

<h3>15.1.5 Complement</h3>
In set theory, the complement of a set A refers to the elements not in A. The relative complement of A with respect to a set B, also termed the difference of sets A and B, written $B \setminus A$, is the set of elements in B but not in A. </p>

See: https://en.wikipedia.org/wiki/File:Venn0010.svg </p>
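
Base R's `setdiff()` gives the set difference directly. The sketch below assumes the same illustrative die events as a running example (the set names are not from the original) and also shows that the probability of "not A" is one minus the probability of A.

```{r}
sample.space <- 1:6
set.A <- c(4, 5, 6)
set.B <- c(2, 4, 6)
B.minus.A <- setdiff(set.B, set.A)       # elements in B but not in A
not.A <- setdiff(sample.space, set.A)    # complement of A within the sample space
p.not.A <- length(not.A) / length(sample.space)
B.minus.A
not.A
p.not.A
```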


See Also: </br>
Set Theory: http://students.brown.edu/seeing-theory/compound-probability/index.html#first </br>
Combinatorics: http://students.brown.edu/seeing-theory/compound-probability/index.html#second </br>
Conditional Probability: http://students.brown.edu/seeing-theory/compound-probability/index.html#third </br>
</br>

<h2> 15.2 Random Variables and Probability Distributions </h2>
Random variables are variables whose specific outcomes are assumed to arise by chance or according to some random or stochastic mechanism.</br>

Probability distributions are functions that define the probabilities associated with those outcomes.</br>


A quick example is the number of heads in any number of coin flips: the outcome will always be an integer value, and you'll never have half a head or a quarter of a tail. Such a random variable is referred to as discrete. Discrete random variables give rise to discrete probability distributions.</br>



The following sections cover elementary ways in which random variables are summarized and how their corresponding probability distributions are dealt with statistically.

<h3>15.2.1 Realizations</h3>
Realizations are actual observations of random variables.

<h3>15.2.2 Discrete Random Variables</h3>
The word discrete means separate and individual. Thus discrete random variables are those that take on only separate, countable values; in the examples here these are integers, with no fractions or decimals in between.</br>

Probability mass functions are the probability distributions tied to discrete random variables.
</br>

The cumulative probability distribution of a discrete random variable X gives the probability of observing a value less than or equal to x, that is, the probability that X will take a value less than or equal to x. </p>

The cumulative distribution function of a real-valued random variable X is the function given by
$$F_{X}(x) = \operatorname{P}(X \leq x),$$
where the right-hand side represents the probability that the random variable X takes on a value less than or equal to x. The probability that X lies in the semi-closed interval (a, b], where a < b, is therefore
$$\operatorname{P}(a < X \leq b) = F_{X}(b) - F_{X}(a).$$
</p>
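
The interval rule can be tried on a simple discrete case. This sketch assumes a fair six-sided die (object names are illustrative) and computes Pr(2 < X <= 5) in two ways: as a difference of cumulative probabilities and by adding up the probability mass function directly.

```{r}
die.values <- 1:6
die.pmf <- rep(1 / 6, 6)       # Pr(X = x) for a fair die (an assumption)
die.cdf <- cumsum(die.pmf)     # F(x) evaluated at x = 1, ..., 6
p.interval <- die.cdf[5] - die.cdf[2]                          # F(5) - F(2)
p.direct <- sum(die.pmf[die.values > 2 & die.values <= 5])     # Pr(X = 3, 4 or 5)
c(from.cdf = p.interval, from.pmf = p.direct)
```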

The cumulative distribution function (cdf) is the probability that the variable takes a value less than or equal to x. That is
$$F(x) = \Pr[X \leq x] = \alpha$$ </p>

For a continuous distribution, this can be expressed mathematically as</br>

$$F(x) = \int_{-\infty}^{x} f(t)\,dt$$ </br>

For a discrete distribution, the cdf can be expressed as</br>

$$F(x) = \sum_{i=0}^{x} f(i)$$</br>

The following is the plot of the normal cumulative distribution function. </br>

See: http://www.itl.nist.gov/div898/handbook/eda/section3/gif/norcdf.gif 

The horizontal axis is the allowable domain for the given probability function. Since the vertical axis is a probability, it must fall between zero and one. The cdf increases from zero to one as we go from left to right on the horizontal axis.</p>
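
For the standard normal distribution, base R provides the cdf as `pnorm()` and the density as `dnorm()`. The sketch below, with evaluation points chosen arbitrarily for illustration, checks that integrating the density from minus infinity up to x reproduces the cdf, and that the values increase from left to right.

```{r}
x.points <- c(-2, 0, 1.5)      # arbitrary evaluation points
cdf.builtin <- pnorm(x.points) # F(x) from the built-in normal cdf
# F(x) as the area under the density curve to the left of x
cdf.integral <- sapply(x.points, function(x) {
  integrate(dnorm, lower = -Inf, upper = x)$value
})
rbind(cdf.builtin, cdf.integral)
```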

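```{r}
# possible outcomes of X and their probabilities (the probability mass function)
X.outcomes <- c(-4, 0, 1, 8)
X.prob <- c(.32, .48, .15, .05)
barplot(X.prob, ylim = c(0, .5), names.arg = X.outcomes, space = .05, xlab = "x", ylab = "Pr(X=x)")
```

```{r}
# cumulative probabilities Pr(X <= x)
X.cumul <- cumsum(X.prob)
X.cumul
```

```{r}
# the vertical axis now shows cumulative probabilities, not Pr(X=x)
barplot(X.cumul, names.arg = X.outcomes, space = .05, xlab = "x", ylab = "Pr(X<=x)")
```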

<b>Mean and Variance of a discrete random variable</b></br>
Mean and variance are the two most useful properties used to describe or summarize a random variable. </p>

The mean is the expected value, or the average outcome that you can expect over many realizations. In symbols, $\mu = E(X) = \sum x \, P(X = x)$. </p>

The variance is a measure of spread for the distribution of a random variable: it determines the degree to which the values of the random variable differ from the expected value.</p>


The variance of a random variable X is often written as Var(X), $\sigma^2$ or $\sigma^2_X$.</p>

For a discrete random variable, the variance is calculated by taking the squared difference between each value of the random variable and the expected value (mean), multiplying it by the probability of that value, and summing over all of the values of the random variable.</p>

In symbols, $\text{Var}(X) = \sum (x - \mu)^2 \, P(X = x)$  </p>

An equivalent formula is $\text{Var}(X) = E(X^2) - [E(X)]^2$ </p>

The square root of the variance is equal to the standard deviation.</p>

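```{r}
# mean (expected value) of X: outcomes weighted by their probabilities
mu.X <- sum(X.outcomes * X.prob)
mu.X
```

```{r}
# variance of X: probability-weighted squared deviations from the mean
var.X <- sum((X.outcomes - mu.X)^2 * X.prob)
var.X
```
```{r}
# standard deviation: square root of the variance
sd.X <- sqrt(var.X)
sd.X
```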

The expected outcome of -0.73 suggests that on average you'll lose 0.73 per turn, with a standard deviation of about 2.82. These quantities describe the behavior of the random mechanism over the long run.
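
The two variance formulas given earlier should agree. As a quick check, this sketch copies the same outcomes and probabilities into new, illustrative object names (so that it runs on its own) and computes the variance both from the definition and from the shortcut E(X^2) - [E(X)]^2.

```{r}
# same distribution as above, under hypothetical names so the chunk is self-contained
check.outcomes <- c(-4, 0, 1, 8)
check.prob <- c(.32, .48, .15, .05)
check.mean <- sum(check.outcomes * check.prob)                      # E(X)
var.definition <- sum((check.outcomes - check.mean)^2 * check.prob) # sum of (x - mu)^2 P(X = x)
var.shortcut <- sum(check.outcomes^2 * check.prob) - check.mean^2   # E(X^2) - [E(X)]^2
c(definition = var.definition, shortcut = var.shortcut)
```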

<h3>15.2.3 Continuous Random Variables</h3>
Continuous is the opposite of discrete. Continuous random variables are those that can take on any value within an interval, including fractions and decimals. Continuous random variables give rise to continuous probability distributions.</br>

Since the possible values are not separate points, it is not possible to assign a probability to a specific number. Instead, probabilities are assigned to intervals of values and are computed as areas underneath the probability density function.</p>

<b>Cumulative Probability distributions of continuous Random Variables</b></br>
The cumulative distribution function for continuous random variables is just a straightforward extension of that of the discrete case. All we need to do is replace the summation with an integral. </p>

Definition. The cumulative distribution function ("c.d.f.") of a continuous random variable X is defined as:

$$F(x) = \int_{-\infty}^{x} f(t)\,dt$$ </br>

for $-\infty < x < \infty$.</br>

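```{r}
# grid of w values; the density is nonzero only between 40 and 90
w <- seq(35, 95, by = 5)
lower.w <- w >= 40 & w <= 65
upper.w <- w > 65 & w <= 90
# cumulative distribution function F(w), defined piecewise
Fw <- rep(0, length(w))
Fw[lower.w] <- (w[lower.w]^2 - 80 * w[lower.w] + 1600) / 1250
Fw[upper.w] <- (180 * w[upper.w] - w[upper.w]^2 - 6850) / 1250
Fw[w > 90] <- 1
plot(w, Fw, type = "l", ylab = "F(w)")
abline(h = c(0, 1), col = "gray", lty = 2)
```
```{r}
plot(w, Fw, type = "l", ylab = "F(w)")
abline(h = c(0, 1), col = "gray", lty = 2)
# height of the density at w = 55.2
fw.specific <- (55.2 - 40) / 625
# area of the triangle under the density to the left of 55.2, i.e. Pr(W <= 55.2)
fw.specific.area <- .5 * 15.2 * fw.specific
# mark w = 55.2 and its cumulative probability on the plot
abline(v = 55.2, lty = 3)
abline(h = fw.specific.area, lty = 3)
```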
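
The triangle-area calculation above can be cross-checked by numerical integration. This is a sketch only: `f.tri` is a hypothetical helper that encodes the same piecewise density implied by the formulas in the chunks above, and base R's `integrate()` is used to compute the area under it.

```{r}
# piecewise density matching the formulas used above (helper name is illustrative)
f.tri <- function(w) {
  ifelse(w >= 40 & w <= 65, (w - 40) / 625,
         ifelse(w > 65 & w <= 90, (90 - w) / 625, 0))
}
# Pr(W <= 55.2) as the area under the density from 40 to 55.2
area.numeric <- integrate(f.tri, lower = 40, upper = 55.2)$value
# the same probability from the closed-form cdf on the lower branch
area.formula <- (55.2^2 - 80 * 55.2 + 1600) / 1250
# total area under a density should be 1
area.total <- integrate(f.tri, lower = 40, upper = 90)$value
c(numeric = area.numeric, formula = area.formula, total = area.total)
```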


<b>Mean and Variance of a Continuous Random Variable</b></br>

The mean of a discrete random variable X is a weighted average of the possible values that the random variable can take. Unlike the sample mean of a group of observations, which gives each observation equal weight, the mean of a random variable weights each outcome $x_i$ according to its probability, $p_i$. The common symbol for the mean (also known as the expected value of X) is $\mu$, formally defined by:</br>

$$\mu = \sum_i x_i p_i$$ </br>

The mean of a random variable provides the long-run average of the variable, or the expected average outcome over many observations.</br>

Example</br>

Suppose an individual plays a gambling game where it is possible to lose \$1.00, break even, win \$3.00, or win \$5.00 each time she plays. The probability distribution for each outcome is provided by the following table:

| Outcome     | -\$1.00 | \$0.00 | \$3.00 | \$5.00 |
|-------------|---------|--------|--------|--------|
| Probability | 0.30    | 0.40   | 0.20   | 0.10   |

The mean outcome for this game is calculated as follows: </br>
$\mu = (-1 \times 0.3) + (0 \times 0.4) + (3 \times 0.2) + (5 \times 0.1) = -0.3 + 0 + 0.6 + 0.5 = 0.8$. </br>
In the long run, then, the player can expect to win about 80 cents per play of this game: the odds are in her favor.</br>
For a continuous random variable, the mean is defined by the density curve of the distribution: $\mu = \int x f(x)\,dx$. For a symmetric density curve, such as the normal density, the mean lies at the center of the curve.
The law of large numbers states that the observed sample mean from an increasingly large number of observations of a random variable approaches the distribution mean $\mu$. That is, as the number of observations increases, the mean of these observations will become closer and closer to the true mean of the random variable. This does not imply, however, that short term averages will reflect the mean.</br>

In the above gambling example, suppose a woman plays the game five times, with the outcomes $0.00, -$1.00, $0.00, $0.00, -$1.00. She might assume, since the true mean of the random variable is $0.80, that she will win the next few games in order to "make up" for the fact that she has been losing. Unfortunately for her, this logic has no basis in probability theory. The law of large numbers does not apply for a short string of events, and her chances of winning the next game are no better than if she had won the previous game.</br>
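
The law of large numbers can be illustrated by simulation. The sketch below assumes the game's outcomes are -1, 0, 3 and 5 dollars with probabilities 0.30, 0.40, 0.20 and 0.10; the seed, the number of plays and the object names are arbitrary choices for illustration. It draws many independent plays with `sample()` and tracks the running average.

```{r}
set.seed(15)                          # arbitrary seed, only for reproducibility
game.outcomes <- c(-1, 0, 3, 5)       # dollars won or lost per play
game.prob <- c(.30, .40, .20, .10)
plays <- sample(game.outcomes, size = 100000, replace = TRUE, prob = game.prob)
running.mean <- cumsum(plays) / seq_along(plays)   # average after each play
# averages after 5, 100, 10,000 and 100,000 plays
running.mean[c(5, 100, 10000, 100000)]
```

The average after a handful of plays can sit far from the true mean, while the average after many plays settles close to it. Each play is drawn independently, so earlier losses do nothing to raise the chance of a later win.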


The variance of a discrete random variable X measures the spread, or variability, of the distribution, and is defined by</br>

$$\sigma^2 = \sum_i (x_i - \mu)^2 p_i$$ </br>

The standard deviation $\sigma$ is the square root of the variance.</br>

Example</br>

In the original gambling game above, the probability distribution was defined to be:</br>

| Outcome     | -\$1.00 | \$0.00 | \$3.00 | \$5.00 |
|-------------|---------|--------|--------|--------|
| Probability | 0.30    | 0.40   | 0.20   | 0.10   |

The variance for this distribution, with mean $\mu = 0.8$, may be calculated as follows:</br>
$(-1 - 0.8)^2 \times 0.3 + (0 - 0.8)^2 \times 0.4 + (3 - 0.8)^2 \times 0.2 + (5 - 0.8)^2 \times 0.1$ </br>
$= (-1.8)^2 \times 0.3 + (-0.8)^2 \times 0.4 + (2.2)^2 \times 0.2 + (4.2)^2 \times 0.1$ </br>
$= 3.24 \times 0.3 + 0.64 \times 0.4 + 4.84 \times 0.2 + 17.64 \times 0.1$ </br>
$= 0.972 + 0.256 + 0.968 + 1.764 = 3.960$, with standard deviation $\sqrt{3.960} \approx 1.990$. </br>
Since there is not a very large range of possible values, the variance is small.</br>
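
The hand calculation above can be reproduced in R with the same weighted sums used earlier for X. The object names below are illustrative, and the outcomes are taken to be -1, 0, 3 and 5 dollars as in the table.

```{r}
bet.outcomes <- c(-1, 0, 3, 5)
bet.prob <- c(.30, .40, .20, .10)
bet.mean <- sum(bet.outcomes * bet.prob)                 # expected winnings per play
bet.var <- sum((bet.outcomes - bet.mean)^2 * bet.prob)   # variance
bet.sd <- sqrt(bet.var)                                  # standard deviation
round(c(mean = bet.mean, variance = bet.var, sd = bet.sd), 3)
```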

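```{r}
# probability density function f(w), defined piecewise on the same grid w
fw <- rep(0, length(w))
fw[lower.w] <- (w[lower.w] - 40) / 625
fw[upper.w] <- (90 - w[upper.w]) / 625
# integrand w * f(w) for the expected value
plot(w, w * fw, type = "l", ylab = "wf(w)")
```
Integrand for the expected value.

```{r}
# integrand (w - 65)^2 * f(w) for the variance, where 65 is the mean
plot(w, (w - 65)^2 * fw, type = "l", ylab = "(w-65)^2 f(w)")
```
Integrand for the variance of the probability density function in the temperature example.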
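
The areas under the two integrand curves are the mean and the variance. As a hedged sketch, the code below integrates them numerically with base R's `integrate()`; `f.temp` is a hypothetical helper that restates the piecewise density from the chunk above as a function so the example runs on its own.

```{r}
# piecewise density restated as a function (helper name is illustrative)
f.temp <- function(w) {
  ifelse(w >= 40 & w <= 65, (w - 40) / 625,
         ifelse(w > 65 & w <= 90, (90 - w) / 625, 0))
}
# mean: area under w * f(w)
mean.w <- integrate(function(w) w * f.temp(w), lower = 40, upper = 90)$value
# variance: area under (w - mean)^2 * f(w)
var.w <- integrate(function(w) (w - mean.w)^2 * f.temp(w), lower = 40, upper = 90)$value
c(mean = mean.w, variance = var.w, sd = sqrt(var.w))
```

Because the density is symmetric about 65, the mean should come out at 65, which is the value used in the variance integrand above. The variance should be close to 625/6, about 104.17.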


<h3>15.2.4 Shape, Skew and Modality</h3>

What do we look for in "Shape"? We must be able to describe the visual impressions. Shape of distribution: Populations with the same mean and standard deviation can still have distributions with very different shapes.</p>

<b>Symmetry</b></br>
bell shaped or normal</br>
uniform</br>
</br>
<b>Skewness</b></br>
skewed to the right (skewed positively)</br>
skewed to the left (skewed negatively)</br>
</br>
<b>Modality</b> (number of prominent peaks)</br>
unimodal</br>
bimodal</br>

see picture here:
https://onlinecourses.science.psu.edu/stat504/sites/onlinecourses.science.psu.edu.stat504/files/lesson01/skew_graphs.gif 
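
To illustrate that two populations can share a mean and standard deviation yet differ in shape, the sketch below simulates a bell-shaped sample and a right-skewed sample, both with mean 1 and standard deviation 1. The seed, sample size, choice of distributions, and the helper `skew()` (the third standardized moment, one common numerical measure of skewness) are assumptions made for this example.

```{r}
set.seed(1504)                              # arbitrary seed, only for reproducibility
n.draws <- 10000
bell <- rnorm(n.draws, mean = 1, sd = 1)    # symmetric, bell shaped
right.skewed <- rexp(n.draws, rate = 1)     # mean 1, sd 1, long right tail
# third standardized moment: near 0 if symmetric, positive if skewed right
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3
round(rbind(
  bell = c(mean = mean(bell), sd = sd(bell), skew = skew(bell)),
  right.skewed = c(mean = mean(right.skewed), sd = sd(right.skewed), skew = skew(right.skewed))
), 2)
```

The means and standard deviations are nearly identical, while the skewness values separate the symmetric sample from the positively skewed one. Negating the skewed sample would give a left-skewed distribution with negative skewness.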



