Exponents and Logarithms

Some notes on exponents and logarithms for Statistics Dot Com

This is really about logarithms, but to make any sense of them we will start with exponents and go way back.

First came sticks and stones.

Cavemen easily saw how to group stones: if Org brought 3 stones and Ugg brought 4 and they made a pile, they had 7 stones (3 + 4).

Then mathematicians came along and made it harder by abstracting away from stones and using the dreaded $ x $. So then the caveman problem becomes:

\[ (x + x + x ) + (x + x + x + x) = (x + x + x + x + x + x + x) = 7x \]

The first equals sign is putting them in a pile; the last introduces a shortcut for addition: multiplication.

Well, I don't know if cavemen liked multiplication, but mathematicians certainly did. Pretty soon they tried to multiply the same thing by itself, perhaps starting with 7 times 7, but eventually moving on to

\[ x \text{ times } x = x \cdot x \]

So here we have some notation for multiplication. Fine, but then they wanted to multiply the same thing together, say, 4 times. It became:

\[ x \cdot x \cdot x \cdot x \]

And now the mathematicians were happy, but not the scribes – as this was cumbersome to write. So they created a shorthand notation – the power or exponent:

\[ x^4 = x \cdot x \cdot x \cdot x \]

And that was good (and $ x^n $ works the same way for any integer $ n = 1, 2, 3, ... $). Now though, they kept multiplying and noticed a neat thing:

\[ x^3 \cdot x^4 = (x \cdot x \cdot x) \cdot (x \cdot x \cdot x \cdot x) = (x \cdot x \cdot x \cdot x \cdot x \cdot x \cdot x) = x^7 = x^{(3+4)} \]

So multiplying terms leads to adding of powers –
at least when the base is the same and the powers are positive integers.
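A quick numerical check of this rule in R, as a sketch. The base 1.7 is an arbitrary value chosen only for illustration:

```r
# Arbitrary base, chosen only for illustration
x <- 1.7
lhs <- x^3 * x^4   # multiply the two powers
rhs <- x^(3 + 4)   # add the exponents instead
all.equal(lhs, rhs)
```

Using all.equal rather than == allows for tiny floating point differences.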

Well, that is really useful, as it saves the scribes lots of writing and makes it easy to multiply, as all we really need to do is add. So people bent over backwards to make this work:

What about division? Well, clearly
\[ (x \cdot x \cdot x) / x = (x \cdot x) \]

So $ x^3/x = x^2 $. If we want adding exponents to work, then dividing by $ x $ must be like multiplying by $ x^{-1} $, since $ 3 + (-1) = 2 $. And negative exponents were born. So our main rule holds for positive n and negative n.

What about $ n=0 $? Well we would want

\[ x^2 x^0 = x^{2+0} = x^2 \]
So dividing by $ x^2 $, we'd want $ x^0 = 1 $, which works as long as $ x $ is not 0. (Which we should have assumed.)
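Here is a small R sketch of the negative and zero exponent cases. The base 3 is an arbitrary nonzero value picked for illustration:

```r
x <- 3                               # any nonzero base would do
div_by_x   <- x^3 / x                # divide by x
times_xinv <- x^3 * x^(-1)           # multiply by x^(-1) instead
all.equal(div_by_x, times_xinv)
x^0                                  # the adding rule forces this to be 1
all.equal(x^2 * x^0, x^2)
```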

What about square roots? Well we'd like square roots to be some power, say $ x^a $. Then we'd need

\[ x^a \cdot x^a = x \]

Or $ 2a = 1 $, so $ a = 1/2 $. Thus it is born that $ \sqrt{x}=x^{1/2} $. (There is some subtlety, in that more than one number will square to give $ x $ when $ x > 0 $. Here we take the positive one.)

Generalizing to cube roots, fourth roots, and so on, we can make sense of $ x^{1/n} $ for positive integer values of $ n $. From there we can combine:

\[ (x^{1/n})^m = x^{m/n} \]
and we have a definition for $ x^r $ for any rational value of $ r $.
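To see the rational exponent at work, here is a sketch in R with illustrative values (x = 8, n = 3, m = 2), computing the power in two ways:

```r
x <- 8; n <- 3; m <- 2        # illustrative values only
root     <- x^(1/n)           # the cube root of 8
via_root <- root^m            # take the root first, then the power
direct   <- x^(m/n)           # use the rational exponent directly
all.equal(via_root, direct)
all.equal(direct, 4)
```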

Okay. Mathematicians can use calculus (limits) to get a definition for
$ x^r $ for any positive $ x $ and any real value of $ r $ so that our special rule is kept:

\[ x^a x^b = x^{a+b} \]

One other important rule (used above!) is what happens when we multiply the same power repeatedly:

\[ x^a x^a x^a = x^{3a} \]

And in general
\[ (x^a)^b = x^a \cdot x^a \cdot ... \cdot x^a = x^{ab}. \]

This turns powers of powers into products of exponents. This rule also holds for any real values of $ a $ and $ b $.
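Both rules can be spot-checked in R for exponents that are not integers or even rational. The values below are arbitrary choices for illustration:

```r
x <- 2.5; a <- sqrt(2); b <- pi     # arbitrary positive base and real exponents
sum_rule  <- all.equal(x^a * x^b, x^(a + b))   # multiplying adds exponents
prod_rule <- all.equal((x^a)^b, x^(a * b))     # a power of a power multiplies them
c(sum_rule, prod_rule)
```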

Logarithms

Okay, let's switch notation and use $ a $ for the base and $ x $ for the exponent and assume $ a > 0 $.

Then the function $ f(x) = a^x $ has these properties:

It is defined for all $ x $ (negative, 0, or positive).

It is increasing if $ a > 1 $, decreasing if $ 0 < a < 1 $, and flat if $ a = 1 $.

It is continuous.

We know $ f(0) = a^0 = 1 $ and $ f(1) = a^1 = a $.

Okay, once we have a function we can compute. Say $ a=2 $ and we want to know $ a^3 $. Easy, it's 8. What about $ a^\pi $? Here we rely on the computer:

a <- 2
a^pi

## [1] 8.825

Nothing interesting about this number, just that it exists.

Now, let's reverse the problem. If $ a=2 $, what is $ x $ that makes $ a^x = 10 $?

Well, it is more than $ \pi $ (since $ 2^\pi $ is about 8.8) and less than 4 (since $ 2^4 = 16 $), but we want more: an actual answer. Clearly there is one. Look at the graph, where the curve crosses the line $ y=10 $:

curve(2^x, 0, 4)
abline(h = 10, col = "gray")

[Plot: the curve $ y = 2^x $ for $ x $ from 0 to 4, with a gray horizontal line at $ y = 10 $]

How to find it? Well we are solving

\[ f(x) = 10 \]

for $ x $. This requires an inverse function.

Mathematicians know one exists – $ f(x) $ is continuous and always increasing – so they gave it a name: the logarithm with base 2 and a notation $ \log_2(x) $.

That is, log base 2 answers this question: what power $ x $ of 2 makes $ 2^x $ equal to some given number (10 in our example)?

Well, we can't get it algebraically, so here it is:

log(10, 2)

## [1] 3.322

A bit bigger than $ \pi $ as we guessed.
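How might a computer find such a value without a log function? One simple possibility is bisection, which only uses the fact that $ 2^x $ is continuous and increasing. This is a sketch of the idea, not a claim about how R's log is actually implemented; the starting bracket and the 40 iterations are choices made here:

```r
# Bisection sketch: 2^x is increasing, so compare the midpoint with the target
f <- function(x) 2^x
lo <- 3; hi <- 4                 # bracket: 2^3 = 8 < 10 < 16 = 2^4
for (i in 1:40) {                # each pass halves the bracket
  mid <- (lo + hi) / 2
  if (f(mid) < 10) lo <- mid else hi <- mid
}
c(mid, log(10, 2))               # the two values should agree closely
```

R's uniroot() function automates this kind of root search.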

Now some logarithms can be guessed:

We have $ 2^3 = 8 $ so $ \log_2(8) = 3 $. (That 2 in the subscript is the base.) We have $ 2^6 = 64 $ so $ \log_2(64) = 6 $.

In general if $ 2^x = y $ then $ \log_2(y) = x $.

Now 2 was just for our example; in general the base could be any $ a > 0 $ other than 1. Then we have these properties:

$ \log_a(1) = 0 $ as $ a^0 = 1 $

$ \log_a(a) = 1 $ as $ a^1 = a $.

$ \log_a(a^x) = x $.

The function is only defined for positive values of $ x $ and blows up as $ x $ goes to 0.

If $ a > 1 $, then $ \log_a(x) < 0 $ when $ x < 1 $ and non-negative otherwise.

Here is a graph for the latter:

curve(log(x, 2), 0.1, 2)

[Plot: the curve $ y = \log_2(x) $ for $ x $ from 0.1 to 2]

Notice how it blows up (going to negative infinity) as $ x $ goes to 0 from the right. Also, it goes to infinity as $ x $ gets larger, but slowly.
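The listed properties can be checked numerically. This sketch uses base 2 and a few arbitrary test values:

```r
a <- 2                               # illustrative base, a > 1
checks <- c(
  log(1, a) == 0,                    # log_a(1) = 0
  isTRUE(all.equal(log(a, a), 1)),   # log_a(a) = 1
  isTRUE(all.equal(log(a^5, a), 5)), # log_a(a^x) = x, here with x = 5
  log(0.5, a) < 0                    # negative when x < 1
)
checks
log(c(1e-3, 1e-6, 1e-9), a)          # more and more negative as x nears 0
log(c(1e3, 1e6, 1e9), a)             # grows, but slowly
```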

Properties

There are a few main properties. The first I just typed:

\[ \log_a(a^x) = x \]

But there was a reason I did all that work on exponents at the beginning, as it leads to this:

\[ \log_a(mn) = \log_a(m) + \log_a(n) \]

Why? Well, let's write $ m = a^x $ and $ n = a^y $. Then

\[ \log_a(mn) = \log_a(a^x a^y) = \log_a(a^{x+y}) = x + y = \log_a(a^x) + \log_a(a^y) = \log_a(m) + \log_a(n) \]
So it is true (and it also shows that one can represent any positive number as a power of $ a $).

Logarithms turn products into sums

In words, “the log of a product is the sum of the logs.” This was why slide rules worked – the scales involved logarithms which physically added so the numbers were multiplied. We saw this property put to good use with the negative log likelihood functions. For likelihoods found by multiplying densities (using independence of the random sample), the log turns that product into sums making it much easier to take a derivative.
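There is a numerical payoff too. As a sketch, take a simulated sample (standard normal, with an arbitrary seed and sample size, all assumptions made for illustration): the product of many densities underflows to 0 in floating point, while the sum of the logs stays perfectly usable.

```r
set.seed(1)                             # arbitrary seed, for reproducibility
x <- rnorm(2000)                        # simulated sample; the model is an assumption
lik    <- prod(dnorm(x))                # product of 2000 small numbers underflows
loglik <- sum(dnorm(x, log = TRUE))     # sum of log densities stays finite
c(lik, loglik)

# On a small subsample both routes agree, as the product rule says they should
small <- x[1:10]
all.equal(log(prod(dnorm(small))), sum(dnorm(small, log = TRUE)))
```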

One other useful property is that one can relate powers of a to powers of b. Why?
Well, just write b in terms of a:

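If $ b = a^x $ then what is $ x $? Well, $ \log_a(b) $ of course. So we have $ b = a^{\log_a(b)} $ and so

\[ b^x = (a^{\log_a(b)})^x = a^{x \log_a(b)} \]
Taking $ \log_a $ of both sides gives
\[ \log_a(b^x) = x \log_a(b) \]
Now write $ y = b^x $, so that $ x = \log_b(y) $. Then
\[ \log_a(y) = \log_b(y) \log_a(b), \quad \text{or} \quad \log_b(y) = \frac{\log_a(y)}{\log_a(b)} \]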

That means the choice made for the base for the log is really not that important, as one can convert to any other positive base (other than 1) by dividing by a constant. It can be 2, 10 or any other number that makes sense, and you can figure the others out. The two most common are 10 and $ e $.
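In R this change of base is easy to confirm: a log in one base is a log in any other base divided by a constant. A sketch with y = 10 and base 2, matching the earlier example:

```r
y <- 10
direct   <- log(y, 2)              # log base 2, computed directly
via_ln   <- log(y) / log(2)        # from natural logs
via_lg10 <- log10(y) / log10(2)    # from base-10 logs
c(direct, via_ln, via_lg10)
```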

The value of 10 for the base of the logarithm is a useful choice: the log base 10 basically counts the digits of a number (for a power of 10, it is the number of zeros). Notice:

n <- 1:5
x <- 10^n
rbind(x, log(x, 10))

##   [,1] [,2] [,3]  [,4]  [,5]
## x   10  100 1000 10000 1e+05
##      1    2    3     4 5e+00

We often think this way already when trying to understand costs: 10 times as much is one step up on a “logarithmic scale”.
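For whole numbers that are not powers of 10, rounding the base-10 log down and adding 1 gives the digit count. A sketch with arbitrary test values:

```r
x <- c(7, 42, 999, 5000, 123456)   # arbitrary positive whole numbers
digits <- floor(log10(x)) + 1      # round the log down, then add 1
rbind(x, digits)
```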

Here is a less common choice, but one made historically. Astronomers use an apparent brightness scale (http://en.wikipedia.org/wiki/Apparent_magnitude) which was set so that a step of 5 is 100 times dimmer. What is the base? We need $ a^5 = 100 $, so $ a $ is the 5th root of 100 or

100^(1/5)

With this base, a star a 50th as bright (50 times as dim) as another has an apparent magnitude differing by

a <- 100^(1/5)
log(50, a)

## [1] 4.247

Natural Log

There is a special value of $ a $, called $ e $, which rounds to:

exp(1)

## [1] 2.718

Where does it come from? Well, in calculus we have a limit that produces it:

\[ \lim_{n \to \infty} (1 + 1/n)^n \]

Notice this is a balance between two facts: 1) 1 raised to any power is 1, and 2) a number bigger than 1 will get arbitrarily large if we raise it to a large enough power. (However, as that number gets closer to 1, that power must get larger.)

So the (1 + 1/n) gets closer to 1 (while staying just bigger) as n gets larger, and it turns out that these come into balance. The answer isn't 1 or infinity as n gets big, but rather a compromise:

f <- function(n) (1 + 1/n)^n
n <- c(10, 100, 1000, 10000)
rbind(n, f(n))

##     [,1]    [,2]     [,3]      [,4]
## n 10.000 100.000 1000.000 10000.000
##    2.594   2.705    2.717     2.718

Well, you can kind of see the numbers get bigger with n and are getting close to the value of e. (By n=10,000 they agree to 3 decimal places.)
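To get a feel for how fast the values approach e, one can look at the gap directly. In this sketch the gap times n settles near e/2, so the gap shrinks roughly like 1/n; the grid of n values is an arbitrary choice:

```r
f <- function(n) (1 + 1/n)^n
n <- 10^(1:6)                  # arbitrary grid of n values
gap <- exp(1) - f(n)           # distance still to go; always positive
cbind(n, gap, scaled = n * gap)
exp(1) / 2                     # the value the scaled column approaches
```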

The limit above comes up in many places in mathematics so its value got the special name $ e $ after its popularizer Euler.

Logarithms with base $ e $ are called natural logarithms. Often math books write $ \ln $ for $ \log_e $, and in R log(x) already means the natural log, so there is no need to write log(x, exp(1)).

The two functions $ f(x) = e^x $ and $ g(x) = \ln(x) $ have special properties in calculus:

\[ f'(x) = [e^x]' = e^x \]

Or the slope of the tangent line is given by the function itself. Up to a constant multiple, this is the only function satisfying the so-called differential equation $ f'(x) = f(x) $; it is the one with $ f(0) = 1 $.
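The slope claim can be checked with a finite difference. This is a sketch only; the step size h and the test points are arbitrary choices:

```r
h <- 1e-6                                      # small step, chosen arbitrarily
x <- c(-1, 0, 1, 2)                            # arbitrary test points
slope <- (exp(x + h) - exp(x - h)) / (2 * h)   # central difference estimate
cbind(x, slope, exp_x = exp(x))                # the last two columns should match
```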

Whereas the log satisfies an integral equation from week 3:

\[ \int_1^t \frac{1}{x} dx = \ln(t) \]
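R's integrate() function can confirm this numerically. In this sketch the upper limit 5 is an arbitrary choice:

```r
t_end <- 5                                     # arbitrary upper limit, t > 0
area <- integrate(function(x) 1 / x, lower = 1, upper = t_end)$value
c(area, log(t_end))                            # numerical area vs. natural log
```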