Ch2.2.2 Floating Point Numbers, Part 1

Floating Point Numbers

  • Floating point numbers provide a way around the limitations of binary integers.
  • Floating point numbers can store noninteger values, such as 2.71828182845905, 3.14159265358979, and 0.25.
  • How many significant digits are shown below? (Ans = 5)

Floating Point Numbers = R Default

  • Floating point numbers can store much larger numbers than R's 32-bit integers.
as.integer(2^31)
[1] NA
2^31
[1] 2147483648
(2^31)*(2^31)
[1] 4.611686e+18
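
  • As a quick check of the default type (a minimal sketch using only base R; the variable names are illustrative), numeric literals are stored as doubles unless written with an `L` suffix:

```r
x_default <- 1     # plain numeric literal
x_integer <- 1L    # the L suffix requests an integer

typeof(x_default)  # "double"
typeof(x_integer)  # "integer"

# Largest integer value R can store; 2^31 is one past it,
# which is why as.integer(2^31) returns NA.
.Machine$integer.max  # 2147483647
```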

Floating Point Numbers: Double Precision

  • There are several standards for floating point.
  • We focus on double precision, or just double.
  • It uses twice the storage (64 bits) of the single precision format (32 bits).

Floating Point Numbers: Double Precision

  • Compare with single precision.
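
  • The field widths of a double can be read from base R's `.Machine` list. This is a sketch that assumes the usual IEEE 754 double, where the two values are normally 53 and 11; `sig_bits`, `exp_bits`, and `total_bits` are illustrative names.

```r
# Significand bits, counting the implicit leading 1 (normally 53).
sig_bits <- .Machine$double.digits
# Bits reserved for the exponent (normally 11).
exp_bits <- .Machine$double.exponent

# 1 sign bit + exponent bits + stored fraction bits.
total_bits <- 1 + exp_bits + (sig_bits - 1)
total_bits
```

  • For comparison, single precision uses 1 sign bit, 8 exponent bits, and 23 stored fraction bits, for 32 bits in total.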

Floating Point Numbers: Single Precision

  • The C float data type is sometimes called single precision.
  • R can convert a number to a single precision value.
  • Underlying data type within R is still double precision.
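
  • A small illustration of the last two points, assuming the conversion meant here is base R's `as.single()`; the variable name is illustrative:

```r
x_single <- as.single(1/3)

typeof(x_single)           # "double": the storage type is unchanged
attr(x_single, "Csingle")  # TRUE: an attribute marks the value as single
```

  • The attribute matters mainly when values are passed to compiled code that expects a C `float`.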

Example 1: Single Precision

  • Consider the example shown in the figure.

Example 1: Single Precision

  • The 8-bit exponent field takes values from 0 to 255; this range is split at its midpoint so that both positive and negative exponents can be represented.
(x <- 2^8 - 1)
[1] 255
x/2
[1] 127.5

1.75*2^(-120)
[1] 1.316554e-36

Scientific Notation in Base 10

  • Scientific notation always starts with a nonzero digit \( d \), where \( 0 < d \leq 9 \), followed by a decimal part \( f \).

  • For example, we can think of the number \( x = 8.72 \times 10^{-3} \) as

\[ \begin{align*} x & = (8 + f) \times 10^{-3} \\ f & = 0.72 \end{align*} \]

  • The number \( f \) is the fractional part, or mantissa.
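
  • The same decomposition can be sketched in R. The variable names are illustrative, and `f` is recovered only up to floating point rounding:

```r
x <- 8.72e-3

n <- floor(log10(abs(x)))  # power of ten: -3
m <- abs(x) / 10^n         # leading digit plus fraction, about 8.72
d <- floor(m)              # leading digit: 8
f <- m - d                 # fractional part, about 0.72

all.equal((d + f) * 10^n, x)  # rebuilds x from its parts
```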

Floating Point Form, Base 10

  • In general,

\[ \begin{align*} x & = (-1)^s10^n(d + f) \\ 0 & < d \leq9 \\ 0 & \leq f < 1 \end{align*} \]

  • In base 10, the leading digit (here \( d = 8 \)) must be stored.
  • In binary this is not necessary, since the leading nonzero digit is always \( d = 1 \).

Floating Point Form, Base 2 (Single)

  • For binary,

\[ x = (-1)^{s}2^{n}(1 + f) \]

  • Floating point form, with the exponent stored as a characteristic \( c \) offset by 127:

\[ x = (-1)^s2^{c-127}(1 + f) \]

(x <- 2^8 - 1)
[1] 255
x/2
[1] 127.5

Floating Point Form, Base 2 (Double)

  • For binary,

\[ x = (-1)^{s} 2^{n} (1 + f) \]

  • Floating point form, with the exponent stored as a characteristic \( c \) offset by 1023:

\[ x = (-1)^s 2^{c-1023} (1 + f) \]

(x <- 2^11 - 1)
[1] 2047
x/2
[1] 1023.5
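
  • Both offsets follow one pattern: for a \( k \)-bit exponent field the offset is \( 2^{k-1} - 1 \), which is the half-range value above rounded down. A short check, where `bias` is a hypothetical helper name:

```r
# Offset (bias) for a k-bit exponent field.
bias <- function(k) 2^(k - 1) - 1

bias(8)    # 127, single precision
bias(11)   # 1023, double precision

# Same as halving the largest field value and rounding down.
floor((2^8 - 1) / 2) == bias(8)
floor((2^11 - 1) / 2) == bias(11)
```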

Example 2: Single Precision

  • Consider the single precision bit pattern, grouped as (sign)(exponent)(fraction)

\[ x = (0)(10100001)(01100...0) \]

  • Write in floating point form

\[ (-1)^s 2^{c-127} (1+f) \]

  • Need to find \( s \), \( f \), \( c \) and \( p = c - 127 \).

Example 2: Single Precision

  • Given

\[ x = (0)(10100001)(01100...0) \]

  • Identify sign indicator \( s \):
(s <- 0)
[1] 0

Example 2: Single Precision

  • Given

\[ x = (0)(10100001)(01100...0) \]

  • Find characteristic \( c \) and power \( p = c - 127 \):
(c <- 1*2^7+0*2^6+1*2^5+0*2^4+0*2^3+0*2^2+0*2^1+1*2^0)
[1] 161
(p <- c - 127)
[1] 34

Example 2: Single Precision

  • Given

\[ x = (0)(10100001)(01100...0) \]

  • Determine mantissa \( f \):
(f <- 0*(1/2)+1*(1/2)^2+1*(1/2)^3+0*(1/2)^4)
[1] 0.375

Example 2: Single Precision

  • Thus the floating point form is

\[ \begin{align*} x &= (-1)^0 2^{34} (1+0.375) \Leftarrow \mathrm{Answer} \\ & = 1.375 \times 2^{34} \end{align*} \]

  • Determine base-10 value for \( x \):
(x <- (-1)^s*2^p*(1+f))
[1] 23622320128
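
  • The steps of this example can be collected into one function. This is a minimal sketch, not part of the original slides: `decode_single` is a hypothetical helper, it takes the three bit groups as character strings, and it handles only normalized numbers (exponent field neither all zeros nor all ones).

```r
decode_single <- function(sign_bit, exponent_bits, fraction_bits) {
  s <- as.integer(sign_bit)                      # sign indicator
  char <- strtoi(exponent_bits, base = 2)        # characteristic c
  p <- char - 127                                # power p = c - 127
  bits <- as.integer(strsplit(fraction_bits, "")[[1]])
  f <- sum(bits / 2^seq_along(bits))             # mantissa f in [0, 1)
  list(s = s, c = char, p = p, f = f,
       x = (-1)^s * 2^p * (1 + f))
}

# Example 2; trailing zeros of the fraction do not change f.
ex2 <- decode_single("0", "10100001", "0110")
ex2$c  # 161
ex2$p  # 34
ex2$f  # 0.375
ex2$x  # 23622320128
```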

Example 3: Single Precision

  • Consider the single precision bit pattern, grouped as (sign)(exponent)(fraction)

\[ x = (1)(01010010)(100100...0) \]

  • Write in floating point form

\[ (-1)^s 2^{c-127} (1+f) \]

  • Need to find \( s \), \( f \), \( c \) and \( p = c - 127 \).

Example 3: Single Precision

  • Given

\[ x = (1)(01010010)(100100...0) \]

  • Identify sign indicator \( s \):
(s <- 1 )
[1] 1

Example 3: Single Precision

  • Given \( x = (1)(01010010)(100100...0) \)

  • Find characteristic \( c \) and power \( p = c - 127 \):

(c <- 0*2^7+1*2^6+0*2^5+1*2^4+0*2^3+0*2^2+1*2^1+0*2^0)
[1] 82
(p <- c - 127)
[1] -45

Example 3: Single Precision

  • Given \( x = (1)(01010010)(100100...0) \)

  • Determine mantissa \( f \):

(f <- 1*(1/2)+0*(1/2)^2+0*(1/2)^3+1*(1/2)^4)
[1] 0.5625

Example 3: Single Precision

  • Thus the floating point form is

\[ \begin{align*} x &= (-1)^1 2^{-45} (1+0.5625) \Leftarrow \mathrm{Answer} \\ & = - 1.5625 \times 2^{-45} \end{align*} \]

  • Determine base-10 value for \( x \):
(x <- (-1)^s*2^p*(1+f))
[1] -4.440892e-14
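
  • As a cross-check, R can interpret the 32 bits directly as a C `float`. This is a sketch using base R's `packBits()` and `readBin()`; the variable names are illustrative, and the byte order is fixed to little-endian to match how the bits are packed here.

```r
# Bit pattern from Example 3, most significant bit first.
bits <- c(1,                        # sign
          0, 1, 0, 1, 0, 0, 1, 0,   # exponent (characteristic)
          1, 0, 0, 1, rep(0, 19))   # fraction, 23 bits
stopifnot(length(bits) == 32)

# packBits() expects the least significant bit first, so reverse the bits.
bytes <- packBits(as.integer(rev(bits)), type = "raw")

# Read the four bytes back as one single precision number.
x_single <- readBin(bytes, what = "numeric", size = 4, endian = "little")

# Compare with the value computed by hand.
all.equal(x_single, -1.5625 * 2^(-45))
```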

Example 4: Book Example

  • Our book provides this binary representation of 1000:

\[ 1000 = 0b1111101000 * 2^9 \]

  • Here, \( s=0 \), \( p = 9 \), and \( f = 1111101000 \).
  • Let's see if this is correct.

\[ x = (-1)^{s}2^{p}(1 + f) \]

Example 4: Book Example

  • With \( f = 1111101000 \), we have
(f <- 1*(1/2) + 1*(1/2)^2 + 1*(1/2)^3 + 1*(1/2)^4 + 1*(1/2)^5 + 0*(1/2)^6 + 1*(1/2)^7)
[1] 0.9765625
(1+f)*2^9
[1] 1012
  • This gives 1012, not 1000, so the book's representation is incorrect when its digits are read as \( f \).

Example 4: Book Example Correction

  • Our book should have

\[ 1000 = 0b111101000 * 2^9 \]

  • instead of

\[ 1000 = 0b1111101000 * 2^9 \]

  • Let's check this next.

\[ x = (-1)^{s}2^{p}(1 + f) \]

Example 4: Book Example Correction

  • With \( f = 111101000 \), we have
(f <- 1*(1/2) + 1*(1/2)^2 + 1*(1/2)^3 + 1*(1/2)^4 + 0*(1/2)^5 + 1*(1/2)^6 + 0*(1/2)^7)
[1] 0.953125
(1+f)*2^9
[1] 1000
  • Thus our correction is valid.
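
  • The corrected digits can also be derived from 1000 itself rather than checked after the fact. A minimal sketch, where `frac_bits` is a hypothetical helper that extracts binary digits by repeated doubling:

```r
x <- 1000
p <- floor(log2(x))   # power of two: 9
f <- x / 2^p - 1      # mantissa: 0.953125

# First n binary digits of a fraction f in [0, 1).
frac_bits <- function(f, n) {
  bits <- integer(n)
  for (i in seq_len(n)) {
    f <- 2 * f
    bits[i] <- as.integer(floor(f))  # next digit is the integer part
    f <- f - bits[i]                 # keep the remainder
  }
  bits
}

f_digits <- paste(frac_bits(f, 9), collapse = "")
f_digits  # "111101000"
```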

Example 5: Book Example

  • Our book provides this binary representation of 0.75:

\[ 0.75 = 0b0001 * 2^{-1} \]

  • Here, \( s=0 \), \( p = -1 \), and \( f = 0001 \).
  • Let's see if this is correct.

\[ x = (-1)^{s}2^{p}(1 + f) \]

Example 5: Book Example

  • With \( f = 0001 \), we have
(f <- 0*(1/2) + 0*(1/2)^2 + 0*(1/2)^3 + 1*(1/2)^4 )
[1] 0.0625
(1+f)*2^(-1)
[1] 0.53125
  • This gives 0.53125, not 0.75, so the book's representation is incorrect.

Example 5: Book Example Correction

  • Our book should have

\[ 0.75 = 0b1000 * 2^{-1} \]

  • instead of

\[ 0.75 = 0b0001 * 2^{-1} \]

  • Let's check this next.

\[ x = (-1)^{s}2^{p}(1 + f) \]

Example 5: Book Example Correction

  • With \( f = 1000 \), we have
(f <- 1*(1/2) + 0*(1/2)^2 + 0*(1/2)^3 + 0*(1/2)^4)
[1] 0.5
(1+f)*2^(-1)
[1] 0.75
  • Thus our correction is valid.

Example 5: Book Example

  • One more comment for this example from the book.

\[ 0.75 = 0b1000 * 2^{-1} \]

  • The author explains that “there is only one digit included because the initial digit is implicit. Therefore this number is \( 0b0.11 \) in binary.”

  • For the implicit initial digit, the author is probably referring to the 1 in the \( 1+f \).

  • For the \( 0b0.11 \), the author is probably referring to

\[ 1*(1/2) + 1*(1/2)^2 = 0.75 \]
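
  • Both readings can be checked in R. This is a short sketch with illustrative variable names: the first line evaluates the digits after the binary point in \( 0b0.11 \), and the second evaluates the normalized form with the implicit leading 1.

```r
# Digits after the binary point in 0b0.11.
digits <- c(1, 1)
plain_value <- sum(digits / 2^seq_along(digits))
plain_value        # 0.75

# Normalized form: implicit 1, one stored fraction bit, power -1.
normalized_value <- (1 + 1 * (1/2)) * 2^(-1)
normalized_value   # 0.75
```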