Probability distribution

for loop

Let’s print the sentence "Today is [day of the week]", where [day of the week] is Monday, Tuesday, and so on.

print(paste("Today is", "Monday"))

## [1] "Today is Monday"

print(paste("Today is", "Tuesday"))

## [1] "Today is Tuesday"

print(paste("Today is", "Wednesday"))

## [1] "Today is Wednesday"

print(paste("Today is", "Thursday"))

## [1] "Today is Thursday"

print(paste("Today is", "Friday"))

## [1] "Today is Friday"

print(paste("Today is", "Saturday"))

## [1] "Today is Saturday"

print(paste("Today is", "Sunday"))

## [1] "Today is Sunday"

However, this violates the DRY principle (Don’t Repeat Yourself), which applies in every programming language. Instead, we can use a for loop:

for (day in c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")){
  print(paste("Today is", day))
}

## [1] "Today is Monday"
## [1] "Today is Tuesday"
## [1] "Today is Wednesday"
## [1] "Today is Thursday"
## [1] "Today is Friday"
## [1] "Today is Saturday"
## [1] "Today is Sunday"

The loop variable can have any name. By convention it is often called i:

for (i in c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")){
  print(paste("Today is", i))
}

## [1] "Today is Monday"
## [1] "Today is Tuesday"
## [1] "Today is Wednesday"
## [1] "Today is Thursday"
## [1] "Today is Friday"
## [1] "Today is Saturday"
## [1] "Today is Sunday"
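In both loops above, the loop variable holds the value itself (a day name), not its position. If you also need the position, a common idiom is to loop over seq_along(), which returns the indices 1, 2, ..., up to the length of the vector. A minimal sketch; the vector name days is just an illustrative choice:

```r
# Store the days once so the loop and the indexing use the same vector
days <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")

# seq_along(days) gives 1, 2, ..., 7, so here i is a true index
for (i in seq_along(days)) {
  print(paste("Day", i, "is", days[i]))
}
```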

Individual work

Now write a for loop yourself. Using a for() loop, add 1 to each number of the sequence c(7, 4, 3, 8, 9, 25), store the result in an object y, and then print the first 4 elements of y.

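x <- c(7, 4, 3, 8, 9, 25)
y <- numeric(length(x))    # preallocate a vector for the results
for (i in seq_along(x)) {  # i takes the positions 1, 2, ..., 6
  y[i] <- x[i] + 1         # add 1 to the i-th element
}
y[1:4]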

## [1] 8 5 4 9
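The loop version is mainly practice: arithmetic in R is vectorized, so x + 1 already adds 1 to every element at once. A quick sketch comparing the two approaches (the names y_loop and y_vec are illustrative):

```r
x <- c(7, 4, 3, 8, 9, 25)

# Loop version: fill a preallocated vector element by element
y_loop <- numeric(length(x))
for (i in seq_along(x)) {
  y_loop[i] <- x[i] + 1
}

# Vectorized version: one expression, no loop
y_vec <- x + 1

identical(y_loop, y_vec)  # TRUE
y_vec[1:4]                # 8 5 4 9
```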

Using a for loop, print the integers from 0 to 100 in increments of 5.

for(i in seq(0, 100, 5)){
  print(i)
}

## [1] 0
## [1] 5
## [1] 10
## [1] 15
## [1] 20
## [1] 25
## [1] 30
## [1] 35
## [1] 40
## [1] 45
## [1] 50
## [1] 55
## [1] 60
## [1] 65
## [1] 70
## [1] 75
## [1] 80
## [1] 85
## [1] 90
## [1] 95
## [1] 100

Calculate the mean and variance using probabilities

Remember the example from class? Calculate the expected value, also known as the mean:

y <- c(0, 1, 2, 3, 4)
p <- c(0.1, 0.35, 0.07, 0.36, 0.12)
  
sum(y*p)

## [1] 2.05

# alternatively 
weighted.mean(y, p)

## [1] 2.05
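Both calls compute the definition of the expected value of a discrete random variable, written out below with the numbers from this example. weighted.mean() divides by sum(p), which is 1 here because the probabilities sum to 1, so it gives the same result:

```latex
E[Y] = \sum_{y} y\,p(y) = 0(0.10) + 1(0.35) + 2(0.07) + 3(0.36) + 4(0.12) = 0.35 + 0.14 + 1.08 + 0.48 = 2.05
```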

Calculate the variance: the probability-weighted average of the squared deviations from the mean.

ev <- sum(y*p)
v <- sum((y - ev)^2 * p)
v

## [1] 1.5875
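As a cross-check (a sketch, not part of the original exercise), the variance can also be computed with the shortcut formula Var(Y) = E[Y^2] - (E[Y])^2, and the standard deviation is its square root. The name v_short is illustrative:

```r
y <- c(0, 1, 2, 3, 4)
p <- c(0.1, 0.35, 0.07, 0.36, 0.12)

ev <- sum(y * p)                # expected value E[Y]
v  <- sum((y - ev)^2 * p)       # definition: probability-weighted squared deviations
v_short <- sum(y^2 * p) - ev^2  # shortcut: E[Y^2] - (E[Y])^2

all.equal(v, v_short)           # TRUE, up to floating-point rounding
sqrt(v)                         # standard deviation
```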

Probabilities

Assume that the test scores of a college entrance exam fit a normal distribution. Furthermore, the mean test score is 72, and the standard deviation is 15.2. What is the percentage of students scoring 84 or more in the exam?

Look up the documentation with ?pnorm.

We apply the function pnorm of the normal distribution with mean 72 and standard deviation 15.2. Since we are looking for the percentage of students scoring higher than 84, we are interested in the upper tail of the normal distribution.

pnorm(84, mean=72, sd=15.2, lower.tail=FALSE)  # notice the use of lower.tail=FALSE.

## [1] 0.2149176

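The upper tail is the right-hand side of the distribution; the lower tail is the left-hand side.

From ?pnorm: lower.tail is logical; if TRUE (default), probabilities are P[X ≤ x], otherwise P[X > x]. Without lower.tail = FALSE, pnorm returns the lower-tail probability: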

pnorm(84, mean=72, sd=15.2)  # lower.tail = TRUE by default, so this is P[X <= 84]

## [1] 0.7850824
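The two calls are complements: the lower- and upper-tail probabilities at the same point always add up to 1. Multiplying the upper-tail probability by 100 gives the percentage the question asks for. A short sketch (the names upper and lower are illustrative):

```r
upper <- pnorm(84, mean = 72, sd = 15.2, lower.tail = FALSE)  # P[X > 84]
lower <- pnorm(84, mean = 72, sd = 15.2)                      # P[X <= 84]

upper + lower          # 1
round(100 * upper, 1)  # percentage of students scoring 84 or more: 21.5
```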

Independent work

Suppose IQ scores are normally distributed with mean 100 and standard deviation 15. What is the probability that a person has an IQ score higher than 107?

Let’s go through the four normal-distribution functions: dnorm, pnorm, rnorm, and qnorm.

Keep in mind that the standard normal distribution has a mean of 0 and a standard deviation of 1. If we don’t specify mean and sd, these functions use those values by default.
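Two consequences of these defaults, sketched with the exam example from above: calling a function without mean and sd is the same as passing mean = 0 and sd = 1 explicitly, and any normal probability can be computed on the standard normal scale by first converting to a z-score, z = (x - mean) / sd:

```r
# Defaults: mean = 0, sd = 1
pnorm(1) == pnorm(1, mean = 0, sd = 1)  # TRUE

# Standardize the exam score of 84 (mean 72, sd 15.2)
z <- (84 - 72) / 15.2

# The same upper-tail probability, computed two ways
pnorm(84, mean = 72, sd = 15.2, lower.tail = FALSE)
pnorm(z, lower.tail = FALSE)
```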

dnorm()

returns the height of the probability density function at each point

dnorm(0)

## [1] 0.3989423

dnorm(0, mean = 3, sd = 1.3)

## [1] 0.02140727
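To see what "height" means here, the sketch below evaluates the normal density formula f(x) = exp(-(x - mean)^2 / (2 sd^2)) / (sd * sqrt(2 * pi)) by hand and compares it with dnorm(). The helper name my_dnorm is hypothetical, written only for this comparison:

```r
# Hypothetical helper: the normal density written out from its formula
my_dnorm <- function(x, mean = 0, sd = 1) {
  exp(-(x - mean)^2 / (2 * sd^2)) / (sd * sqrt(2 * pi))
}

all.equal(my_dnorm(0), dnorm(0))                                           # TRUE
all.equal(my_dnorm(0, mean = 3, sd = 1.3), dnorm(0, mean = 3, sd = 1.3))   # TRUE
```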

pnorm()

for looking up probabilities. It is also known as the cumulative distribution function (CDF): it computes the probability that a normally distributed random variable is less than or equal to the given number.

pnorm(0)

## [1] 0.5

pnorm(1)

## [1] 0.8413447

pnorm(0, mean = 3, sd = 1.3)

## [1] 0.01050813
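The link between dnorm and pnorm is that pnorm(x) is the area under the dnorm curve to the left of x. A numerical check with base R's integrate(); because integrate() is a numerical approximation, expect agreement to several decimal places rather than exact equality:

```r
# Area under the standard normal density from -Inf up to 1
area <- integrate(dnorm, lower = -Inf, upper = 1)$value

area      # approximately 0.8413
pnorm(1)  # 0.8413447
```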

To get the probability that the random variable is larger than the given number, set the lower.tail argument to FALSE:

pnorm(1)

## [1] 0.8413447

pnorm(1, lower.tail=FALSE)

## [1] 0.1586553
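A related pattern follows from the CDF: the probability of landing between two values a < b is pnorm(b) - pnorm(a). For example, the share of a standard normal within one standard deviation of the mean, plus a check that the upper tail is the complement of the lower tail:

```r
# P[-1 < X <= 1] for X ~ N(0, 1)
pnorm(1) - pnorm(-1)  # about 0.683

# Upper tail = 1 - lower tail
all.equal(pnorm(1, lower.tail = FALSE), 1 - pnorm(1))  # TRUE
```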

rnorm()

for generating random samples from a normal distribution

rnorm(5)

## [1] -1.8583180 -0.9333285  0.3557916  2.4206877 -1.5596239

y <- rnorm(200, mean = 2, sd = 4)
hist(y)
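Because rnorm() draws random numbers, your output will differ from the values shown above and will change every time you run it. Setting a seed with set.seed() makes the draws reproducible; the seed value 42 below is an arbitrary choice. With a large sample, the sample mean and standard deviation should be close to the mean and sd you asked for:

```r
set.seed(42)     # arbitrary seed, for reproducibility
a <- rnorm(5)
set.seed(42)     # same seed again
b <- rnorm(5)
identical(a, b)  # TRUE: same seed, same draws

set.seed(42)
big <- rnorm(100000, mean = 2, sd = 4)
mean(big)        # close to 2
sd(big)          # close to 4
```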

Independent work

Recreate this histogram with ggplot2. Add a title and make the bars purple.

qnorm()

for the quantile function. It is the inverse of pnorm: we give it a probability, and it returns the number whose cumulative probability equals that probability.

qnorm(0.5)

## [1] 0

qnorm(0.8413)

## [1] 0.9998151

qnorm(0.25,mean = 2, sd = 2)

## [1] 0.6510205
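Since qnorm() undoes pnorm(), applying one after the other returns the starting value, up to floating-point rounding. qnorm() is also how familiar critical values are obtained, for example the value that cuts off the top 2.5% of a standard normal:

```r
x <- 1.5

# Round trips: pnorm() then qnorm() gives back x
all.equal(qnorm(pnorm(x)), x)                                      # TRUE
all.equal(qnorm(pnorm(x, mean = 2, sd = 2), mean = 2, sd = 2), x)  # TRUE

qnorm(0.975)  # about 1.96
```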

Go back to the slides for more exercises! :)

The end

The end!

Data Analysis in Social Science - lab3

Raluca Popp

6 February 2018
