September 8, 2016

Importing data

data <- read.csv(
  "../data/census2000.csv") 
head(data)
##            state puma educ lweekinc exper expersq
## 1 South Carolina  100   13 6.471038    37    1369
## 2   Pennsylvania 2502   13 6.087648    14     196
## 3        Georgia 1800   12 7.034049    21     441
## 4         Nevada  100   13 6.694181    12     144
## 5        Arizona  206   16 7.338538    18     324
## 6     California 1601   12 6.422247    15     225

Quadratic formulas

model1 <- lm(lweekinc ~ exper + expersq + educ, data = data)
results <- as.data.frame(summary(model1)$coefficients)
# colnames(results)[1] <- "Estimated Slope"
# For the next slide I will rename the first column to be more 
# descriptive. Normally it is labelled "Estimate"
print(results)
##                  Estimate   Std. Error   t value      Pr(>|t|)
## (Intercept)  4.5160614126 3.859468e-02 117.01252  0.000000e+00
## exper        0.0437227743 1.749953e-03  24.98511 2.329137e-136
## expersq     -0.0007428117 3.477016e-05 -21.36348 1.690037e-100
## educ         0.1190963804 2.306519e-03  51.63468  0.000000e+00

Quadratic formulas

##             Estimated Slope Std. Error t value Pr(>|t|)
## (Intercept)           4.516      0.039 117.013        0
## exper                 0.044      0.002  24.985        0
## expersq              -0.001      0.000 -21.363        0
## educ                  0.119      0.002  51.635        0

In this case, holding education constant, an additional year of experience increases expected wages by about 4.4% at first, but the negative coefficient on the squared term (about -0.0007) shrinks that effect as experience accumulates. After about 30 years (the marginal effect \(0.0437 - 2 \times 0.00074 \times exper\) crosses zero near \(exper \approx 29\)), the quadratic term outweighs the linear term and additional experience has a negative overall effect on expected wages…
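We can check that turning point directly from the fitted model (a quick sketch using the model1 object above; the marginal effect is zero where \(\beta_1 + 2 \beta_2 \times exper = 0\)):

b <- coef(model1)
# experience level where the marginal effect of exper crosses zero
print(unname(-b["exper"] / (2 * b["expersq"]))) # about 29.4 years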

What our data looks like

We'll have to figure out what that strip around 8.8 is before using this dataset to make any important decisions.
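(The figure isn't reproduced here. A minimal sketch of one way to draw it, assuming the slide plotted lweekinc against exper with ggplot2:)

library(ggplot2)
# a horizontal band of points near lweekinc = 8.8 would show up here
ggplot(data, aes(x = exper, y = lweekinc)) + geom_point(alpha = 0.1)

For reference, a log weekly income of 8.8 corresponds to exp(8.8), or about $6,634 per week.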

Logarithms

\[ log_e (x) = y \iff e^y = x \]

x <- 64
y <- log(x) # base e. Run "?log" for other options.
print(y)
## [1] 4.158883
print(exp(y)) # e^y
## [1] 64

Exponential function

\[log(y) = \beta_0 + \beta_1 x \iff y = e^{\beta_0 + \beta_1 x}\]

The "Anti-log" function is when we raise our base (usually \(e\)) to the power of something. We can use the R function exp().

cat("exp(1) is ",exp(1)," and exp(0) is ",exp(0),sep="")
## exp(1) is 2.718282 and exp(0) is 1
print(exp(log(pi)))
## [1] 3.141593
print(log(exp(pi)))
## [1] 3.141593

Logarithms: diminishing marginal returns
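(The slide presumably showed a figure here. A minimal sketch of the idea: the log curve rises steeply at first and then flattens, so each additional unit of \(x\) adds less and less to \(log(x)\).)

library(ggplot2)
# the log curve: steep for small x, flatter as x grows
ggplot(data.frame(x = c(0.5, 50)), aes(x)) + stat_function(fun = log)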

Logarithms: mathematical properties

\[ log(x) + log(y) = log(x y)\]

\[ log(x) - log(y) = log\left(\frac{x}{y}\right)\]

\[ log(x^c) = c \times log(x)\]

… as long as all the \(x\)'s and \(y\)'s are positive.

Logarithms: mathematical properties

\[ log(x_1) - log(x_0) \approx \frac{(x_1-x_0)}{x_0} \]
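A quick numerical check of this approximation:

x0 <- 100
x1 <- 105 # a 5% increase
print(log(x1) - log(x0)) # the log difference approximates the change...
## [1] 0.04879016
print((x1 - x0) / x0) # ...relative to the starting value
## [1] 0.05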

When we use logarithms in practice we interpret changes in percentages. For example, instead of:

  • "Adding 1 more unit of force to the gas peddle increased the car's speed by 5mph."

We might say:

  • "Adding 1 more unit of force to the gas peddle increased the car's speed by 3%." ("semi elasticity")
  • "Adding 1% more force to the gas peddle increased the car's speed by 0.23%." ("elasticity")

Logarithms: elasticity

Because logarithms approximate percentage changes, we can use them to estimate elasticities.

\[log(q) = 4.7 - 1.25 log(p) \]

In this case the elasticity of demand is -1.25… a 1% increase in price (\(p\)) is associated with a 1.25% fall in quantity demanded (\(q\)).
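To see how close the approximation is, we can solve the fitted equation for \(q\) and compare quantities directly (a sketch using the made-up coefficients above):

demand <- function(p) exp(4.7 - 1.25 * log(p)) # q = e^(4.7) * p^(-1.25)
q0 <- demand(2.00) # quantity demanded at a price of 2.00
q1 <- demand(2.02) # quantity after a 1% price increase
print((q1 - q0) / q0) # close to the 1.25% fall the elasticity predicts
## [1] -0.01236088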

How might a business use this?

How might a business use this?

If you're the head economist for a local grocery chain, you have access to an enormous database. What information is in that database?

How might a business use this?

And what other data can you collect other ways (e.g. from the Census Bureau or a private firm)?

And what can you do with this data?

Some terminology

Experiment: Some procedure that can be repeated as many times as necessary, with well-defined outcomes.

  • Flipping a coin 10 times
  • Evaluating whether drug XYZ works better than a placebo
  • Giving residents of a poor village cash or medical supplies

Random Variable: Some aspect of our model that can take on different values (i.e. it can vary), where we don't know the outcome ahead of time (i.e. it's random).

  • Coin face can be either heads (or H, or 1, etc.) or tails (T, 0)
  • Blood pressure after taking drug
  • Mortality, reported school attendance, employment status, etc.

Experiment

Let's flip a coin 10 times and count how many heads we get.

# we'll treat "1" as heads and "0" as tails
print(rbinom(1,1,0.5)) # flip the coin once, one time
## [1] 0
print(rbinom(10,1,0.5)) # flip the coin once, ten times
##  [1] 0 1 1 1 0 1 0 1 1 0
print(rbinom(1,10,0.5)) # flip the coin ten times, once
## [1] 5

Let's see that again

Re-running our experiment 50,000 times, we get this mix of heads:

samples <- rbinom(50000,10,0.5)
st <- table(samples)
print(st)
## samples
##     0     1     2     3     4     5     6     7     8     9    10 
##    64   496  2209  5847 10210 12329 10321  5819  2212   454    39

Let's see that again

Re-running our experiment 50,000 times, we get this mix of heads:

library(ggplot2) # for ggplot() and geom_histogram()
samples <- rbinom(50000,10,0.5)
ggplot(data.frame(x=samples),aes(x)) + geom_histogram(binwidth=0.5)

Now mathier

Let \(X_i\) be the number of heads from 10 flips of a coin for trial \(i\).

Then \(x_i \in \{0,1,2,3,...,10\}\).

Because it can only take on whole numbers, it's a discrete random variable.

Bernoulli/Binary variables

Each flip can also be thought of as a discrete random variable. If a variable has two possible outcomes (heads/tails, TRUE/FALSE, 1/0, yes/no) then it's a Bernoulli or Binary random variable.

The Bernoullis were a Dutch-Swiss family of mathematical badasses. They're responsible for a lot of foundational work in probability and other areas of mathematics. We're dealing with the Bernoulli distribution that governs coin tosses.

Bernoulli/Binary variables

Variable \(X\) can take the value 1 or 0. If

\[P(X=1) = p\]

then \[P(X=0) = 1 - p\]
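We can check these probabilities by simulation (a sketch; rbinom() with size = 1 draws Bernoulli variables):

p <- 0.3
x <- rbinom(100000, 1, p) # 100,000 Bernoulli draws with P(X=1) = 0.3
print(mean(x == 1)) # should be close to p
print(mean(x == 0)) # should be close to 1 - p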

More generally…

If Variable \(X\) can take any value in the set \(\{x_1,x_2,...,x_k\}\) then we can have a vector of probabilities \(P\) where

\[p_i = P(X=x_i)\] for \(i \in \{1,2,3,...,k\}\).

Each \(p_i\) has to have a value between 0 and 1, and \(\sum_i p_i = 1\). In other words, the various \(x_i\) outcomes are mutually exclusive (otherwise we could get \(\sum_i p_i > 1\)) and must cover all possible outcomes (otherwise we could get \(\sum_i p_i < 1\)).
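In R, sample() can draw from any discrete distribution like this (a sketch with made-up outcomes and probabilities):

x <- c(1, 2, 5) # the possible outcomes x_i
p <- c(0.2, 0.5, 0.3) # their probabilities p_i; note sum(p) is 1
draws <- sample(x, 100000, replace = TRUE, prob = p)
print(table(draws) / length(draws)) # observed proportions approximate p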

Probability Density Function (pdf)

The pdf describes the relative probabilities of the different possible outcomes, so that

\[f(x_i) = p_i\]

where \(p_i=0\) for any impossible \(x_i\) (e.g. flipping 12 heads in 10 flips).
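For the coin-flipping example, R can compute the pdf exactly with dbinom():

print(round(dbinom(0:10, size = 10, prob = 0.5), 4)) # P(X=0), ..., P(X=10)
##  [1] 0.0010 0.0098 0.0439 0.1172 0.2051 0.2461 0.2051 0.1172 0.0439 0.0098
## [11] 0.0010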

Cumulative Distribution Function (cdf)

The cdf describes how likely it is for one of a growing list of outcomes to happen. For our coin flipping example:

samples <- rbinom(1000000,10,0.5)
pdf.hat <- table(samples)/length(samples) # observed proportions of outcomes
cdf.hat <- cumsum(pdf.hat) # cumulative sum of estimated pdf
print(rbind(pdf.hat,cdf.hat))
##                0        1        2        3        4        5        6
## pdf.hat 0.000958 0.009731 0.043520 0.117156 0.204745 0.246534 0.205916
## cdf.hat 0.000958 0.010689 0.054209 0.171365 0.376110 0.622644 0.828560
##                7        8        9       10
## pdf.hat 0.116694 0.043948 0.009797 0.001001
## cdf.hat 0.945254 0.989202 0.998999 1.000000

How do we get \(P(3<X<8)\)?

cdf's

\[F(x) \equiv P(X \leq x)\]

To find \(P(3<X<8)\) we can subtract \(F(3)\) from \(F(7)\).

Because we're using \(<\) instead of \(\leq\), we use \(F(7)\) instead of \(F(8)\).
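Checking against R's exact binomial cdf, pbinom():

print(pbinom(7, 10, 0.5) - pbinom(3, 10, 0.5)) # P(3 < X < 8) = F(7) - F(3)
## [1] 0.7734375

This is close to the simulated value above: 0.945254 - 0.171365 = 0.773889.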

Continuous functions

Mathematically speaking, the probability of \(X=x\) for any specific \(x\) is zero because there are infinitely many \(x\)'s in a continuous distribution.

In practice everything is discrete and we have limited decimal places. Many people might weigh 150.05 lbs but very few will weigh exactly 150.05000000000000000000000000000000000000000000134 lbs. Even fewer have access to a scale that precise.

Continuous functions

For continuous functions, the pdf can't give you the probability of a specific outcome, only the probability of a range of possible outcomes. It's easier to work with the cdf.

\[F(x) \equiv P(X \leq x)\]

Useful properties of probability

Where \(F(x) \equiv P(X \leq x)\):

\[P(X > x) = 1 - F(x)\]

and,

\[P(a < X \leq b) = F(b) - F(a)\] when \(a<b\)
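Both properties are easy to verify with the binomial cdf from the coin example (a quick sketch):

print(1 - pbinom(5, 10, 0.5)) # P(X > 5) = 1 - F(5)
## [1] 0.3769531
print(pbinom(6, 10, 0.5) - pbinom(1, 10, 0.5)) # P(1 < X <= 6) = F(6) - F(1)
## [1] 0.8173828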

Useful property of continuous distributions

Since \(P(X=x)=0\),

\[P(X \leq x) = P(X < x)\]

Joint distributions and independence

A joint probability density function describes the probabilities of outcomes involving multiple variables:

\[f_{X,Y}(x,y) = P(X=x,Y=y)\]

Random variables \(X\) and \(Y\) are independent iff ("if and only if"):

\[f_{X,Y}(x,y) = f_X(x) f_Y(y)\]
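A sketch checking this by simulating two independent coin flips:

n <- 100000
x <- rbinom(n, 1, 0.5) # first flip
y <- rbinom(n, 1, 0.5) # second, independent flip
print(mean(x == 1 & y == 1)) # empirical f_XY(1,1), about 0.25
print(mean(x == 1) * mean(y == 1)) # empirical f_X(1) * f_Y(1), also about 0.25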