Chapter 4: PROBABILITY DISTRIBUTIONS

1 Probability Distributions

This is an document with the examples and exercises

1.1 Discrete Distributions

1.1.1 Bernoulli Distributions

There are only two possible outcomes, called a success or failure.
The probability of occurrence of a success (or a failure) is constant.
The trials are independent, that is, the outcome of a trial does not depend on the outcome of another trial.

\[ p_X(x) = \Pr[X = x; p] = p^{x}(1-p)^{1-x}, \quad x \in \{0,1\} \]

$p$ denotes the probability of success, with $0 \leq p \leq 1$. \

The Mean an Variance are:

\[ E[X] = p \\ Var[X] = (1-p)p \]

1.1.1.1 Example 1:

Let $p = 0.3$.

For $x = 0$:
\[ p^0 (1-p)^{1} = (1 - 0.3) = 0.7 \]
For $x = 1$:
\[ p^1 (1-p)^{0} = (0.3)(1) = 0.3 \]

# Define the probability of success
p <- 0.3  

# Possible outcomes
x <- c(0, 1)

# Compute probabilities using dbinom
prob <- dbinom(x, size = 1, prob = p)

# Show results
print(data.frame(x, prob))

##   x prob
## 1 0  0.7
## 2 1  0.3

# Plot with vertical lines
plot(x, prob,
     type = "h",        # "h" = draw vertical lines
     lwd = 2,           # line width
     col = "black",   # line color
     main = paste("Bernoulli Distribution (p =", p, ")"),
     xlab = "x",
     ylab = "P(X = x)",
     xaxt = "n")        # suppress default x-axis

# Add custom x-axis labels
axis(1, at = x, labels = x)

# Add points at the top of each line
points(x, prob, pch = 19, col = "black")

It is used dbinom() since the Bernoulli distribution is a unique case of the Binomial distribution.

dbinom(x, size, prob)

x: possible results (0 = failure, 1 = success)
size (n): number of trials (for Bernoulli, size = 1)
prob: probability of success in each trial

1.1.2 Binomial Distribution

This determines whether an outcome of a trial is a success or failure.
The outcomes are independent of one another with a constant probability $p$.

The PDF is given by:

\[ p_X(x) = \Pr[X = x; n, p] = \begin{cases} \binom{n}{x} p^{x}(1-p)^{n-x}, & x = 0,1,\dots,n,\; 0 \leq p \leq 1 \\ 0, & \text{otherwise} \end{cases} \]

x: Number of success.
n: Number of trials.
p: Probability of success.

The CDF is given by:

\[ F_X(x) = \Pr[X \leq x] = \sum_{k=0}^{x} \binom{n}{k} p^{k} (1-p)^{n-k} \]

A series of Bernoulli trials is made, each of which has only one of two possible outcomes: a success or a failure.
The trials are conducted under the same conditions and the probability $p$ of a success is constant.
The number of trials $n$ is fixed.
The outcomes of the trials are independent.
The random variable $X$ is the total number of successes in $n$ trials, and the order in which the events occur is immaterial.

The Mean and Variance are:

\[ E[X] = np \\ Var[X] = np(1-p) \]

1.1.2.1 Example 1 - Water pollution

Take $n$ water samples to determine whether a particular pollutant is detectable or not and observe that there are $x$ success in all the trials.

Let: $n = 4, x = 1, p = 0.8$

\[ p_X(1) = \Pr[X = 1; 4, 0.8] = \binom{4}{1} 0.8^{1}(1-0.8)^{4-1} \]

# Define the number o success
x <- c(0,1,2,3,4)

# Define the number of trials
n <- 4

# Define the probability of success
p <- 0.8

# PDF
# Compute probabilities using dbinom
prob <- dbinom(x, size = n, prob = p)
prob

## [1] 0.0016 0.0256 0.1536 0.4096 0.4096

# CDF
# Compute probabilities using pbinom
probc <- pbinom(x, size = n, prob = p)
probc

## [1] 0.0016 0.0272 0.1808 0.5904 1.0000

# Set layout: 1 row, 2 columns
par(mfrow = c(1, 2))

# --- Left: PDF ---
plot(x, prob,
     type = "h",        # vertical lines
     lwd = 2,           # line width
     col = "black",
     main = paste("Binomial PDF (p =", p, ")"),
     xlab = "x",
     ylab = "P(X = x)")
points(x, prob, pch = 19, col = "black")

# --- Right: CDF ---
plot(x, probc,
     type = "s",        # step function for CDF
     lwd = 2,
     col = "black",
     main = paste("Binomial CDF (p =", p, ")"),
     xlab = "x",
     ylab = "P(X ≤ x)")
points(x, probc, pch = 19, col = "black")

1.1.2.2 Example 2 - Protective sea embankment

To counteract the effects of erosion and damage caused by sea waves, an embankment wall is built alongside a railway line. From recorded data, the annual maximum wave height exceeds that of the embankment, on average, once in eight years. What is the probability that the embankment will be over-topped at least once during the next 10 years?

Using the PDF

We need to sum the probabilities of 1,2,..,10 to obtain the probability of at least 1 flood.

The probability is given by $P(X \geq 1) = 1 - P(X = 0)$

Then $p = 1/8$ , $n = 10$ , $x = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 }$

# Define the number o success
x <- c(1:10)

# Define the number of trials
n <- 10

# Define the probability of success
p <- 0.125

# Using dbinom 
prob = dbinom(x,size = n, prob=p)
prob

##  [1] 3.758223e-01 2.416000e-01 9.203810e-02 2.300953e-02 3.944490e-03
##  [6] 4.695822e-04 3.833324e-05 2.053566e-06 6.519258e-08 9.313226e-10

# Sum of probabilities to obtain
sumprob <-sum(prob)
sumprob

## [1] 0.7369244

Using the CDF

We only need to find the probability of 0 flood in the cdf, and then to 1 rest this probability.

Then $p = 1/8$ , $n = 10$ , $x = {1}$

# Define the number o success
x <- 0

# Define the number of trials
n <- 10

# Define the probability of success
p <- 0.125

# Using pbinom 
prob = 1 - pbinom(x,size = n, prob=p)
prob

## [1] 0.7369244

1.1.2.3 Example 3 Dam design

Determine the return period that should be used in a design for a small dam so that the design flood is exceeded with a probability of not more than .05 during a 50-year economic time horizon.

We need to find the return period.

Let $T = 1/p$

# Define the probability
p = 0.05

#Calculate the return period
T = 1/p
T

## [1] 20

1.1.2.4 Example 4 Bridge design

A bridge is to be constructed over a river. The design criterion is that a flood should rise above the high-level marks on the piers not more than once in 25 years with a probability not exceeding .1. What return period should be used in the flood design?

We need the bigger return period.

Let $T1 = 25$ and $T2 = 1/.1$

# Define the first return period
T1 = 25

# Define the second return period
T2 = 1/.1

# Compare
T1 > T2

## [1] TRUE

1.1.2.5 Example 5 Revision of dam design

In a situation similar to that of Problem 4.3, supposing the engineer adopts a 100-year design period, determine (a) the probability that the design flood level is not exceeded during a 100-year period (b) the probability that the design flood is exceeded just after the tenth year but not during the first 10 years.

In (a) we use a binomial with $x = 0$, $n = 100$ and $p = 1/100$

# Define number of years
n = 100

# Define the number of floods
x = 0

# Define the probability
p = 1/100

# PDF
# Determine the probability with 1 - dbinom
prob = 1- dbinom(x, size=n,prob=p)
prob

## [1] 0.6339677

In (b) we use the geometric, because it is the probability of the first succes.

Let $x=11$ and $p=1/100$

# Define the number of year at succes
x = 11

# Define the proability
p = 1/100

# PDF
# Determine the probability with dgeom
prob = dgeom(x, prob = p)
prob

## [1] 0.008953383

1.1.2.6 Example 6 Frequent flooding

Calculate the probability of having two 10-year flows in a 5-year period assuming that the events are independent and identically distributed.

We use the binomial, with $n = 5$ , $x = 2$ and $p = 1/10$

# Define the number of succes
x = c(0:5)

# Define the number of years
n = 5

# Define the probability
p = 1/10

# PDf
# Determine the probability with dbimon
prob = dbinom(x,size=n,prob=p)
prob[3]

## [1] 0.0729

# CDF
# Determine the probability with dbimon

probc = pbinom(x,size=n,prob=p)

# Set layout: 1 row, 2 columns
par(mfrow = c(1, 2))

# --- Left: PDF ---
plot(x, prob,
     type = "h",        # vertical lines
     lwd = 2,           # line width
     col = "black",
     main = paste("Binomial PDF (p =", p, ")"),
     xlab = "x",
     ylab = "P(X = x)")
points(x, prob, pch = 19, col = "black")

# --- Right: CDF ---
plot(x, probc,
     type = "s",        # step function for CDF
     lwd = 2,
     col = "black",
     main = paste("Binomial CDF (p =", p, ")"),
     xlab = "x",
     ylab = "P(X ≤ x)")
points(x, probc, pch = 19, col = "black")

1.1.3 Geometric Distribution

It is used to determine the probability of getting the first success on trial #x

The PDF is:

\[ p_X(x) = Pr[X = x; p] = (1-p)^{x-1} \, p \]

The CDF is:

\[ F_x(x) = 1-(1-p)^{x} \] The Mean and Variance are:

\[ E[X] = \frac{1}{p} \\ Var[X] = \frac{1-p}{p^2} \] #### Example 1 - Traffic lights

Coming home from work, you always seem to hit every light. You calculate the odds of making it through a light to be 0.2. How many lights can you expect to hit before making it through one? With what std. dev.? What’s the prob. of the 3rd light being the first one that is green?

Let \[ p = 0.2\]

# Define the first success
x = 3

# Define the probability of success
p = 0.2

# Calculate the expected value
mean = 1/p
mean

## [1] 5

# Calculate the dev 
devstd = sqrt((1-p)/p^2)
devstd

## [1] 4.472136

# Calculate the probability with dgeom
prob = dgeom(x-1,prob=p)
prob

## [1] 0.128

We use $x-1$ because dgeom computes the probability of observing $x-1$ ailures before the first success. In other words, if the first success occurs on the $x-th$ trial, then the number of failures is $x-1$

1.1.3.1 Example 2 Delivery of treatment plants for a water supply system.

A company has bid to supply standardized treatment plants for a rural water supply system in a region, having quoted a low price for the job. However the supervising engineer has estimated from previous experience that 10 percent of plants delivered are defective in some way. If five items are required, determine the minimum number of plants to be ordered to be 95 percent sure that a sufficient number of nondefective plants are delivered.

It is assumed that the delivery of a plant is an independent trial and any fault that may occur in one plant is not related to possible faults in other plants.

The orobability of success is $p = 1-.1 = .9$

# Define the probability of success
p = 1-.1

# Define the number of plants
x = c(5:9)

# Estimate the probability with dgeom
prob = dgeom(x, prob = p)
prob

## [1] 9e-06 9e-07 9e-08 9e-09 9e-10

1.1.4 Posisson Distribution

Suppose that we are counting the number of occurrences of an event in a given unit of time, distance, area o volume.

The PDF is: \[ P(X = x) = \frac{\nu^x e^{-\nu}}{x!} \] And note that $\nu = np$

1.1.4.1 Example 1 - Radioactive decay

One nanogram of Plutonium-239 will have an average of 2.3 radioactive decays per second, and the number of decays will follow a Poisson distribution.

What is the probability that in a 2 second period there are exactly 3 radioactive decays?

Let $\nu = 2.3*2$ and $X = 3$

# Define the number of occurrences
x = 3

# Define the expected number of success
nu = 2.3*2

# Calculate trhe probability with dpois
prob = dpois(x, lambda = nu)
prob

## [1] 0.1630676

What is the probability there are no more than 3 radioactive decays?

We need to sum the probability of $ X ={0,1,2,3}$

# Define the number of occurrences
x = c(0:12)

# Define the expected number of success
nu = 2.3*2

# Calculate trhe probability with dpois
prob = sum(dpois(x, lambda = nu))
prob

## [1] 0.999021

Plot the CDF and the CDF

# Define the number of occurrences
x = c(0:12)

# Define the expected number of success
nu = 2.3*2

# PDF
# Calculate trhe probability with dpois 
prob = dpois(x, lambda = nu)

# CDF
# Calculate the cumulative probability whit ppois
probc = ppois(x, lambda = nu)

 # Set layout: 1 row, 2 columns
par(mfrow = c(1, 2))

# --- Left: PDF ---
plot(x, prob,
     type = "h",        # vertical lines
     lwd = 2,           # line width
     col = "black",
     main = paste("POISSON PDF (nu =", nu, ")"),
     xlab = "x",
     ylab = "P(X = x)")
points(x, prob, pch = 19, col = "black")

# --- Right: CDF ---
plot(x, probc,
     type = "s",        # step function for CDF
     lwd = 2,
     col = "black",
     main = paste("POISSON PDF (nu =", nu, ")"),
     xlab = "x",
     ylab = "P(X ≤ x)")
points(x, probc, pch = 19, col = "black")

1.1.4.2 Example 2 Atmospheric pollution

Dust particles in the atmosphere in a certain location can cause health problems. The count of dust particles for 100 air samples was made and register in the following table.

# Table in R
dust_table <- data.frame(
  particles = c(0, 1, 2, 3, 4, 6),
  sample1   = c(13, 24, 30, 18, 7, 8),
  sample2   = c(12, 26, 27, 19, 10, 6)
)
dust_table

##   particles sample1 sample2
## 1         0      13      12
## 2         1      24      26
## 3         2      30      27
## 4         3      18      19
## 5         4       7      10
## 6         6       8       6

# Calculate the estimated mean
mean <- sum(dust_table$particles * dust_table$sample1)/100
mean

## [1] 2.14

# mean is took as nu
# Define the number of occurrences
x <- c(1:6)

# Define nu
nu  = mean

# PDF
# Calculate the probability with dpois 
prob = dpois(x, lambda = nu)

# CDF
# Calculate the cumulative probability whit ppois
probc = ppois(x, lambda = nu)

 # Set layout: 1 row, 2 columns
par(mfrow = c(1, 2))

# --- Left: PDF ---
plot(x, prob,
     type = "h",        # vertical lines
     lwd = 2,           # line width
     col = "black",
     main = paste("POISSON PDF (nu =", nu, ")"),
     xlab = "x",
     ylab = "P(X = x)")
points(x, prob, pch = 19, col = "black")

# --- Right: CDF ---
plot(x, probc,
     type = "s",        # step function for CDF
     lwd = 2,
     col = "black",
     main = paste("POISSON PDF (nu =", nu, ")"),
     xlab = "x",
     ylab = "P(X ≤ x)")
points(x, probc, pch = 19, col = "black")

1.1.4.3 Example 3 Reliability of machinery at a water treatment plant

All of the pumps at a water treatment plant have been made to the same specifications by a single manufacturer. From tests made over four-week periods, it has been ascertained that there are on average two breakdowns during each period. A new plant manager assumes that the problem is not serious if there are no more than four breakdowns over a period of four weeks. What is the probability of such an occurrence?

It is assumed that the failures occur randomly in time, the occurrences are independent, and the rate of failure is constant. Thus the Poisson model is applicable with parameter nu = 2

To determine the probability of maximun four breakdowns, it is necesary to calculate the probability of $x={0,1,2,3,4}$ and sum this

# Define nu
nu = 2

# Define the number of breakdowns
x = c(0:4)

# Using PDF
# Calculate the probability with dpois 
prob = dpois(x, lambda = nu)
prob1 = sum(prob)
prob1

## [1] 0.947347

# Using CDF
# Calculate the cumulative probability whit ppois
probc = ppois(x, lambda = nu)
prob2 = probc[5]
prob2

## [1] 0.947347

 # Set layout: 1 row, 2 columns
par(mfrow = c(1, 2))

# --- Left: PDF ---
plot(x, prob,
     type = "h",        # vertical lines
     lwd = 2,           # line width
     col = "black",
     main = paste("POISSON PDF (nu =", nu, ")"),
     xlab = "x",
     ylab = "P(X = x)")
points(x, prob, pch = 19, col = "black")

# --- Right: CDF ---
plot(x, probc,
     type = "s",        # step function for CDF
     lwd = 2,
     col = "black",
     main = paste("POISSON PDF (nu =", nu, ")"),
     xlab = "x",
     ylab = "P(X ≤ x)")
points(x, probc, pch = 19, col = "black")

1.1.4.4 Example 4 Closure of causeway caused by high flows.

High flows result in the closure of a causeway. From past records, the road has been closed for this reason on 10 days during a 20-year period. An adjoining village, therefore, is concerned about the closure of the causeway because it provides the only access. The villagers assume that the probability of a closure of the road for more than one day during a year is less than 0.10. Is this correct?

We determine the probability of 0 and 1 high flows, adn to 1 we rest this, beacuase the objetive is the probability for more than one day. Let $nu=10/20$ and $x={0,1}

# Define nu
nu = 10/20

# Define the number of High Flows
x = c(0:1)

# Using PDF
# Calculate the probability with dpois 
prob = dpois(x, lambda = nu)
prob1 = 1- sum(prob)
prob1

## [1] 0.09020401

# Using CDF
# Calculate the cumulative probability whit ppois
probc = ppois(x, lambda = nu)
prob2 = 1- probc[2]
prob2

## [1] 0.09020401

 # Set layout: 1 row, 2 columns
par(mfrow = c(1, 2))

# --- Left: PDF ---
plot(x, prob,
     type = "h",        # vertical lines
     lwd = 2,           # line width
     col = "black",
     main = paste("POISSON PDF (nu =", nu, ")"),
     xlab = "x",
     ylab = "P(X = x)")
points(x, prob, pch = 19, col = "black")

# --- Right: CDF ---
plot(x, probc,
     type = "s",        # step function for CDF
     lwd = 2,
     col = "black",
     main = paste("POISSON PDF (nu =", nu, ")"),
     xlab = "x",
     ylab = "P(X ≤ x)")
points(x, probc, pch = 19, col = "black")

1.1.4.5 Example 5 Strength of timber.

From past data, an engineer has estimated a probability of $p =0.1$ that timber delivered at a construction site from a particular source is below specification. If 150 joists of timber are necessary for a particular construction job, determine the minimum number that should be ordered so that the chance of not having the required number of suitable joists is less than 10 percent.

Selecting suitable timber for the construction work constitutes Bernoulli trials. Let $150 + x$ joists be ordered to meet the required stipulation, where $x$ is the number of defective joists. By using the Poisson approximation with $n = 150 + x$ and $ p = 0.1$ the parameter $nu = (150 + x)p = 150p = 1.5$

We must determine the number x with less than 10% of below specification.

# Define the probability
p = .01

# Define the number of events
n = 150

# Define nu
nu = n*p

# Define the number of High Flows
x = c(0:8)

# Using CDF
# Calculate the probability with dpois 
prob = ppois(x, lambda = nu)
prob

## [1] 0.2231302 0.5578254 0.8088468 0.9343575 0.9814241 0.9955440 0.9990740
## [8] 0.9998304 0.9999723

# We choose 3 as the number x

# Using PDF
# Calculate the probability with dpois 
prob = dpois(x, lambda = nu)

# Using CDF
# Calculate the cumulative probability whit ppois
probc = ppois(x, lambda = nu)

 # Set layout: 1 row, 2 columns
par(mfrow = c(1, 2))

# --- Left: PDF ---
plot(x, prob,
     type = "h",        # vertical lines
     lwd = 2,           # line width
     col = "black",
     main = paste("POISSON PDF (nu =", nu, ")"),
     xlab = "x",
     ylab = "P(X = x)")
points(x, prob, pch = 19, col = "black")

# --- Right: CDF ---
plot(x, probc,
     type = "s",        # step function for CDF
     lwd = 2,
     col = "black",
     main = paste("POISSON PDF (nu =", nu, ")"),
     xlab = "x",
     ylab = "P(X ≤ x)")
points(x, probc, pch = 19, col = "black")

1.1.4.6 Example 6 Floorboards

Floorboards supplied by a contractor have some imperfections. A builder decides that two imperfections per $40m^2$ is acceptable. Is there at least a 95 percent chance of meeting such requirements if, from previous experience with the same material, an average of one imperfection per $65m^2$ has been found?

We asume that $ nu = 40/60$, and that $x={0,1,2}$ for this data has a mean of 1.

# Define nu
nu = 40/65

# Define the number of erros
x = c(0,1,2)

# Using PDF 
# Estimate the probability with dpois
prob = sum(dpois(x,lambda = nu))
prob

## [1] 0.9753377

# Using the CDF
# Estimate the probability with ppois
probc = ppois(2,lambda=nu)
probc

## [1] 0.9753377

1.1.4.7 Example 7 Vehicle Count

The following count is made of the number of vehicles that pass an observation point every 10 minutes for one hour. What counts are expected theoretically if the distribution is Poisson?

# Table in R
vehicle <- data.frame(
  count = c(0, 1, 2, 3, 4,5, 6),
  frequency = c(220, 94, 23, 11, 4, 2, 1)
)
vehicle

##   count frequency
## 1     0       220
## 2     1        94
## 3     2        23
## 4     3        11
## 5     4         4
## 6     5         2
## 7     6         1

# Calculate the estimated mean
mean <- sum(vehicle$count * vehicle$frequency)/sum(vehicle$frequency)
mean

## [1] 0.5774648

# PDF
# Calculate the frequency
frequency2 = dpois(vehicle$count, lambda = mean)*sum(vehicle$frequency)
frequency2

## [1] 199.26846796 115.07052375  33.22458784   6.39534320   0.92327138
## [6]   0.10663134   0.01026264

# Add theoretical frequency to the data frame
vehicle$frequency2 <- frequency2

# Plot observed frequencies
plot(vehicle$count, vehicle$frequency,
     type = "b", pch = 19, col = "blue",
     xlab = "Count",
     ylab = "Frequency",
     main = "Observed vs Poisson Expected Frequencies",
     ylim = c(0, max(c(vehicle$frequency, vehicle$frequency2))))

# Add theoretical frequencies
lines(vehicle$count, vehicle$frequency2,
      type = "b", pch = 17, col = "red")

# Add legend
legend("topright",
       legend = c("Observed", "Poisson (Expected)"),
       col = c("blue", "red"),
       pch = c(19, 17),
       lty = 1)

1.1.4.8 Example 8 Machine Failure

The probability that a certain make of piling machine breaks down is .00002 per 100 m of piles made. What is the probability of having one breakdown after 1000 m and before 1010 m of piles?

Lambda is $λ = 0.00002/100$, $x = 1$

# Define Lambda
lambda = 0.00002/100

# Define number of ocurrences
x = 1

#PDF
# Estimate the probability between 1000m and 1010m
prob = dpois(x,lambda = lambda*1010) - dpois(x,lambda = lambda*1000) 
prob

## [1] 1.999196e-06

1.1.4.9 Example 9First-time failure.

Taking the probability given in Problem 4.8 as the probability of failure during a week’s work and an average weekly production of 1000 m of piles, determine the probability of failure for the first time after three months. How does the first-time probability of failure vary with time?

The probability is $p = 0.00002/100*1000$, and $x = 13$

# Define the number of months
x = c(0:13)

# Define the probability
p = 0.00002

#PDF
# Estimate the probability
prob = dgeom(x,prob = prob)
prob[13]

## [1] 1.999148e-06

 # Set layout: 1 row, 2 columns
par(mfrow = c(1, 1))

# --- Left: PDF ---
plot(x, prob,
     type = "h",        # vertical lines
     lwd = 2,           # line width
     col = "black",
     main = paste("GEOMETRY PDF (nu =", p, ")"),
     xlab = "x",
     ylab = "P(X = x)")
points(x, prob, pch = 19, col = "black")

1.1.5 Truncated poisson process

When the case $X = 0$ is not an acceptable happening or is not of interest

\[ P(X = x \mid X \geq 1) = \frac{\nu^x}{x! \, (e^\nu - 1)} \]

And the Mean and Variance are:

\[ E[X] = \frac{\nu e^{\nu}}{e^{\nu} - 1} \]

\[ \mathrm{Var}[X] = \frac{\nu e^{2\nu} - \nu(\nu + 1)e^{\nu}}{(e^{\nu} - 1)^{2}} \]

1.1.5.1 Example 1

hurricanes <- data.frame(
  size = c(1:6),
  frequency = c(17, 15, 10,4,2,2)
)
hurricanes

##   size frequency
## 1    1        17
## 2    2        15
## 3    3        10
## 4    4         4
## 5    5         2
## 6    6         2

# determine the mean
mean = sum(hurricanes$size * hurricanes$frequency)/sum(hurricanes$frequency)
mean

## [1] 2.3

# Calculate the nu with the  Expected value
# Define the function for E[X] - 2.3
f <- function(nu) {
  (nu * exp(nu)) / (exp(nu) - 1) - 2.3
}

# Solve numerically with uniroot
nu_solution <- uniroot(f, interval = c(0.1, 10))$root
nu_solution

## [1] 1.983556

# Calculate the Theorical frequency
# PMF of a truncated Poisson (truncated at 0)
dtruncpois <- function(x, lambda) {
  ifelse(x >= 1,
         dpois(x, lambda) / (1 - dpois(0, lambda)),
         0)
}
prob = dtruncpois(hurricanes$size, lambda = nu_solution)* sum(hurricanes$frequency)
prob

## [1] 15.821504 15.691420 10.374938  5.144818  2.041007  0.674742

# Add theoretical frequency to the data frame
hurricanes$prob <- prob

# Plot observed frequencies 
plot(hurricanes$size, hurricanes$frequency,
     type = "b", pch = 19, col = "blue",
     xlab = "Count",
     ylab = "Frequency",
     main = "Observed vs Poisson Truncated Expected Frequencies",
     ylim = c(0, max(c(hurricanes$frequency, hurricanes$prob))))

# Add theoretical frequencies
lines(hurricanes$size, hurricanes$prob,
      type = "b", pch = 17, col = "red")

# Add legend
legend("topright",
       legend = c("Observed", "Poisson (Expected)"),
       col = c("blue", "red"),
       pch = c(19, 17),
       lty = 1)