Feedback: Area and Probability for Probability Density Functions, and Code to Do the Same Using R

MAT 325 | Probability and Statistics @ ORU

Author

E. Valderrama-Araya, Ph.D.


A student came to my office hours with questions about the area under the curve and its relationship to probability, so I decided to create a tutorial that everyone can benefit from.

Bonus: I decided to include a second part on how to use symbolic math to calculate integrals in R.


Here is what we worked through on the board yesterday.¹

Here is the detailed tutorial:

Tutorial: Calculating Probabilities by Finding the Area Under the Curve

Introduction

In probability theory, when dealing with continuous random variables, the probability that the variable falls in a certain range equals the area under the curve of the probability density function (PDF) over that range. The PDF describes how densely probability is concentrated around each value, and integrating it over an interval gives the total probability of outcomes falling within that interval.

For continuous random variables:

\[ P(a \leq X \leq b) = \int_a^b f(x) \, dx \]

Where:

  • \(P(a \leq X \leq b)\) is the probability that the random variable \(X\) takes a value between \(a\) and \(b\),
  • \(f(x)\) is the probability density function, and
  • \(\int_a^b f(x) \, dx\) is the area under the curve of \(f(x)\) between \(a\) and \(b\).
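One consequence worth making explicit is a standard fact about continuous random variables, not something specific to this example: a single point carries zero probability, so strict and non-strict inequalities give the same answer. This is why the parts below can use \(\leq\) and \(>\) without worrying about the endpoints.

\[ P(X = a) = \int_a^a f(x) \, dx = 0 \quad \Longrightarrow \quad P(a \leq X \leq b) = P(a < X < b) \]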

Now, let’s solve two parts of an example problem using this principle.

Problem Overview

We are given a linear probability density function:

\[ f(x) = \begin{cases} 2 - 2x, & \text{for } 0 \leq x \leq 1 \\ 0, & \text{elsewhere} \end{cases} \]
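Before computing any probabilities, it can help to confirm that this function really is a PDF: it should be non-negative everywhere and its total area should equal 1. The following is a minimal sketch in base R; the name `pdf_linear` and the grid used for the non-negativity check are illustrative choices, not part of the original problem.

```r
# PDF from the problem statement: 2 - 2x on [0, 1], 0 elsewhere.
# `pdf_linear` is an illustrative name chosen for this sketch.
pdf_linear <- function(x) ifelse(x >= 0 & x <= 1, 2 - 2 * x, 0)

# Condition 1: the density is never negative (checked on a grid of points).
grid <- seq(-1, 2, by = 0.01)
all_nonnegative <- all(pdf_linear(grid) >= 0)

# Condition 2: the total area is 1. The density is zero outside [0, 1],
# so integrating over the support is enough.
total_area <- stats::integrate(pdf_linear, lower = 0, upper = 1)$value

print(all_nonnegative)
print(total_area)
```

Calling `stats::integrate()` with its namespace avoids any clash with the `integrate()` function that Ryacas provides later in this tutorial.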

We will calculate the following probabilities:

  1. \(P(x \leq \frac{1}{13})\)
  2. \(P(x > 0.77)\)

Part A: Calculating \(P(x \leq \frac{1}{13})\)

This is the probability that the random variable \(x\) is less than or equal to \(\frac{1}{13}\), so we need to integrate the PDF from 0 to \(\frac{1}{13}\).

The setup is:

\[ P(x \leq \frac{1}{13}) = \int_0^{\frac{1}{13}} (2 - 2x) \, dx \]

We can break this integral into two parts:

\[ P(x \leq \frac{1}{13}) = \int_0^{\frac{1}{13}} 2 \, dx - \int_0^{\frac{1}{13}} 2x \, dx \]

Let’s calculate each term:

\[ \int_0^{\frac{1}{13}} 2 \, dx = 2x \Big|_0^{\frac{1}{13}} = 2 \times \frac{1}{13} - 2 \times 0 = \frac{2}{13} \]

For the second term:

\[ \int_0^{\frac{1}{13}} 2x \, dx = x^2 \Big|_0^{\frac{1}{13}} = \left( \frac{1}{13} \right)^2 - 0^2 = \frac{1}{169} \]

Now subtract the two results:

\[ P(x \leq \frac{1}{13}) = \frac{2}{13} - \frac{1}{169} \]

To simplify, find a common denominator:

\[ \frac{2}{13} = \frac{26}{169} \]

So:

\[ P(x \leq \frac{1}{13}) = \frac{26}{169} - \frac{1}{169} = \frac{25}{169} \]

Thus, the probability is:

\[ P(x \leq \frac{1}{13}) = \frac{25}{169} \]
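As a quick sanity check on the hand calculation, the same area can be approximated numerically. This is a sketch using base R's `stats::integrate()`; the variable names are illustrative.

```r
# Density on its support [0, 1].
pdf_linear <- function(x) 2 - 2 * x

# Numerical area under the curve from 0 to 1/13.
numeric_pa <- stats::integrate(pdf_linear, lower = 0, upper = 1 / 13)$value

# Hand-calculated result from Part A.
exact_pa <- 25 / 169

print(c(numeric = numeric_pa, exact = exact_pa))
print(isTRUE(all.equal(numeric_pa, exact_pa)))
```

If the two values agree up to floating-point tolerance, the setup of the integral and the arithmetic are consistent.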

Part B: Calculating \(P(x > 0.77)\)

For this part, we need to find the probability that \(x\) is greater than 0.77, which means we will integrate the PDF from \(0.77\) to 1:

\[ P(x > 0.77) = \int_{0.77}^{1} (2 - 2x) \, dx \]

As before, break this into two integrals:

\[ P(x > 0.77) = \int_{0.77}^{1} 2 \, dx - \int_{0.77}^{1} 2x \, dx \]

The first integral is straightforward:

\[ \int_{0.77}^{1} 2 \, dx = 2x \Big|_{0.77}^{1} = 2 \times 1 - 2 \times 0.77 = 2 - 1.54 = 0.46 \]

For the second term:

\[ \int_{0.77}^{1} 2x \, dx = x^2 \Big|_{0.77}^{1} = 1^2 - 0.77^2 = 1 - 0.5929 = 0.4071 \]

Now subtract the two results:

\[ P(x > 0.77) = 0.46 - 0.4071 = 0.0529 \]

Thus, the probability is:

\[ P(x > 0.77) = 0.0529 \]
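An alternative route to the same number uses the cumulative distribution function. Integrating the PDF from 0 to \(x\) gives \(F(x) = 2x - x^2\) on \([0, 1]\), so \(P(x > 0.77) = 1 - F(0.77)\). The sketch below checks this in base R; `cdf_linear` is an illustrative name.

```r
# CDF on [0, 1]: F(x) = integral of (2 - 2t) dt from 0 to x = 2x - x^2.
cdf_linear <- function(x) 2 * x - x^2

# Upper-tail probability via the complement rule.
pb_complement <- 1 - cdf_linear(0.77)

print(pb_complement)
```

The complement form is convenient when several tail probabilities are needed, since only \(F\) has to be evaluated.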

Conclusion

To calculate probabilities for continuous random variables, we integrate the probability density function over the desired range, which gives us the area under the curve. In this example:

  • \(P(x \leq \frac{1}{13}) = \frac{25}{169}\)
  • \(P(x > 0.77) = 0.0529\)
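Because the PDF is a straight line, both areas can also be checked with elementary geometry, without any integration. The region for the first probability is a trapezoid with parallel sides \(f(0) = 2\) and \(f(\frac{1}{13}) = \frac{24}{13}\) and width \(\frac{1}{13}\); the region for the second is a triangle with base \(1 - 0.77 = 0.23\) and height \(f(0.77) = 0.46\):

\[ \frac{1}{2}\left(2 + \frac{24}{13}\right) \cdot \frac{1}{13} = \frac{25}{169}, \qquad \frac{1}{2} \cdot 0.23 \cdot 0.46 = 0.0529 \]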

Tutorial: Calculating Probabilities by Finding the Area Under the Curve Using Symbolic Mathematics in R

When dealing with continuous random variables, probabilities correspond to the area under the curve of the probability density function (PDF) over a given range. Symbolic mathematics can simplify this process by allowing you to perform exact calculations in R.

In this tutorial, we will use the Ryacas package to compute these integrals symbolically.

Step-by-Step Guide

1. Install the necessary library

First, install and load the Ryacas package, which allows you to perform symbolic computations in R, such as symbolic integration.

install.packages("Ryacas")
library(Ryacas)

2. Define the probability density function (PDF) symbolically

For this example, we have the PDF \(f(x) = 2 - 2x\), which is valid for \(0 \leq x \leq 1\). We’ll define this PDF symbolically using the ysym() function.

f <- ysym("2 - 2*x")  # Define the PDF symbolically

3. Calculate \(P(x \leq \frac{1}{13})\)

To find the probability that \(x\) is less than or equal to \(\frac{1}{13}\), we integrate the PDF from \(0\) to \(\frac{1}{13}\) symbolically.

Pa <- integrate(f, "x", 0, 1/13)  # Calculate the integral for P(x <= 1/13)
print(paste("P(x <= 1/13) =", Pa))

Here:

  • integrate() performs the symbolic integration of the function f with respect to \(x\) over the interval from \(0\) to \(\frac{1}{13}\).
  • The result is stored in Pa and then printed.

4. Calculate \(P(x > 0.77)\)

To compute the probability that \(x\) is greater than \(0.77\), we integrate the PDF from \(0.77\) to \(1\):

Pb <- integrate(f, "x", 0.77, 1)  # Calculate the integral for P(x > 0.77)
print(paste("P(x > 0.77) =", Pb))

This integrates the PDF from \(0.77\) to \(1\) and stores the result in Pb.

5. Output and Interpretation

The results from the symbolic integration give the exact probabilities. The paste() function is used to display the result with a clear message.
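To see what "exact" means here, the fractions can be reproduced with plain integer arithmetic. The sketch below is base R only and does not show how Ryacas works internally; `gcd`, `cdf_fraction`, and `as_text` are hypothetical helpers written for this illustration. It uses \(F(p/q) = 2(p/q) - (p/q)^2 = (2pq - p^2)/q^2\) and writes 0.77 as \(77/100\).

```r
# Greatest common divisor (Euclid's algorithm), used to reduce fractions.
gcd <- function(a, b) if (b == 0) abs(a) else gcd(b, a %% b)

# F(p/q) = (2pq - p^2) / q^2, returned as a reduced numerator/denominator pair.
cdf_fraction <- function(p, q) {
  num <- 2 * p * q - p^2
  den <- q^2
  g <- gcd(num, den)
  c(num = num / g, den = den / g)
}

# Format a fraction as text, e.g. "25/169".
as_text <- function(fr) sprintf("%.0f/%.0f", fr[["num"]], fr[["den"]])

# P(x <= 1/13) = F(1/13)
pa <- cdf_fraction(1, 13)

# P(x > 77/100) = 1 - F(77/100), reduced to lowest terms
f77 <- cdf_fraction(77, 100)
pb_num <- f77[["den"]] - f77[["num"]]
g <- gcd(pb_num, f77[["den"]])
pb <- c(num = pb_num / g, den = f77[["den"]] / g)

print(as_text(pa))
print(as_text(pb))
```

The second fraction has the same value as the decimal result for \(P(x > 0.77)\).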

Full R Code Example

# install.packages("Ryacas")  # run once if the package is not yet installed
library(Ryacas)

# Define the PDF symbolically
f <- ysym("2 - 2*x")

# Calculate P(x <= 1/13)
Pa <- integrate(f, "x", 0, 1/13)
print(paste("P(x <= 1/13) =", Pa))
#> [1] "P(x <= 1/13) = 25/169"

# Calculate P(x > 0.77)
Pb <- integrate(f, "x", 0.77, 1)
print(paste("P(x > 0.77) =", Pb))
#> [1] "P(x > 0.77) = 0.0529"

Why Use Symbolic Mathematics in R?

  • Exact Solutions: You get precise, analytical solutions, which can be preferable to numerical approximations.
  • Clearer Insight: Solving integrals symbolically can help you understand the underlying mathematics better.
  • Efficient for Some Problems: Once an antiderivative is known in closed form, it can be evaluated at any limits without repeating the integration, which helps when many probabilities are needed from the same distribution.
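For contrast, here is what a purely numerical approach returns. This sketch uses base R's `stats::integrate()`, which reports an estimate together with an estimated bound on the absolute error rather than a closed-form value; the variable names are illustrative.

```r
# Numerical integration of the PDF over [0.77, 1].
res <- stats::integrate(function(x) 2 - 2 * x, lower = 0.77, upper = 1)

# The result is an estimate plus an estimated bound on its absolute error.
print(res$value)
print(res$abs.error)

# Difference between the numerical estimate and the hand-calculated 0.0529.
diff_from_exact <- abs(res$value - 0.0529)
print(diff_from_exact)
```

For a simple polynomial like this one the numerical estimate is very close to the exact answer, so the choice between the two approaches is mostly about whether a closed-form expression is wanted.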

This tutorial demonstrated how to set up and solve integrals to calculate probabilities by finding the area under the curve, showing how this area corresponds to the likelihood of outcomes within specified ranges. Additionally, we explored how to perform the same calculations symbolically using the Ryacas library in R, providing a powerful tool for solving integrals with exact precision and gaining deeper insight into the underlying mathematics.

Footnotes

  1. The recording is missing the audio.