Lecture # 2 In-class activity: R commands and Output

Question 2:

Suppose that the time (in days) to development of a tumor for rats exposed to a carcinogen is well-approximated by a Weibull distribution with a=2 and b=30.

a. Graph the density, CDF, and survival curve for this random variable.

The curve command can be used to draw these graphs. dweibull and pweibull represent the density and CDF, respectively, of a Weibull.

In the command below, the 0 and 80 tell R to draw the graph from 0 to 80 on the x-axis.

Also, the ylab command puts a label on the y-axis.

curve(dweibull(x,shape=2,scale=30),0,80,ylab="Density")

plot of chunk unnamed-chunk-1

curve(pweibull(x,shape=2,scale=30),0,80,ylab="Cumulative Proportion")

plot of chunk unnamed-chunk-1

curve(1-pweibull(x,shape=2,scale=30),0,80,ylab="Survival Proportion")

plot of chunk unnamed-chunk-1

b. What is the chance that the rat will be tumor free for more than 40 days?

Remember pweibull gives us the cumulative, so here, we need to take “one minus” to get the answer.

1-pweibull(40,shape=2,scale=30)
## [1] 0.169

c. Find the mean time to tumor.

The mean time to tumor is the area under the survival curve, which is an integral.

There are several ways to do this. Here is one: first define the function of interest, and then integrate over the desired range. R returns a precision along with the value of the integral:

f = function(x) 1-pweibull(x,shape=2,scale=30)
integrate(f,0,Inf)
## 26.59 with absolute error < 0.00061

Another way to approximate the mean is to randomly generate many values, and simply take their mean. rweibull generates Weibull values, with the 1,000,000 specifying how many values you want to generate.

Times = rweibull(1000000,shape=2,scale=30)
mean(Times)
## [1] 26.6

Note: The mean time you obtain will be slightly different as the above is a random process.

d. Estimate by eye the median time to tumor, and then find it more precisely using R.

Looking at the CDF or the survival function, the median seems to be roughly 25 days.

More formally, we can use an R function called qweibull to find any percentile. Here, we want the 50th percentile, so we input 0.5 as the first argument:

qweibull(0.5,shape=2,scale=30)
## [1] 24.98

Question 3:

Suppose a failure time random variable (X) follows an Exponential (\(\lambda\)=2) distribution.

a. Find: P( X < 1 ) and P( X < 6 | X > 5 ).

For P(X < 1):

pexp(1,2)
## [1] 0.8647

For P( X < 6 | X > 5 ):

( pexp(6,2) - pexp(5,2) ) / ( 1-pexp(5,2) )
## [1] 0.8647

These are identical! And this has nothing to do with our choice of \(\lambda\), it will hold for any value chosen in the Exponential distribution.

b. Repeat for X ~ Weibull(a = 1.2,b = 0.5), X ~ Weibull(a = 0.8,b = 0.5), and X ~ Weibull(a = 1,b = 0.5):

For X ~ Weibull(a = 1.2,b = 0.5):

# P(X < 1) =
pweibull(1,1.2,0.5)
## [1] 0.8995
# P( X < 6 | X > 5 ) =
( pweibull(6,1.2,0.5) - pweibull(5,1.2,0.5) ) / ( 1-pweibull(5,1.2,0.5) )
## [1] 0.9793

For X ~ Weibull(a = 0.8,b = 0.5):

# P(X < 1) =
pweibull(1,0.8,0.5)
## [1] 0.8247
# P( X < 6 | X > 5 ) =
( pweibull(6,0.8,0.5) - pweibull(5,0.8,0.5) ) / ( 1-pweibull(5,0.8,0.5) )
## [1] 0.6287

For X ~ Weibull(a = 1,b = 0.5):

# P(X < 1) =
pweibull(1,1,0.5)
## [1] 0.8647
# P( X < 6 | X > 5 ) =
( pweibull(6,1,0.5)-pweibull(5,1,0.5) ) / ( 1-pweibull(5,1,0.5) )
## [1] 0.8647

c. Describe in your own words the Exponential property that you have just discovered.

This result is telling us that, for the Exponential distribution, the chance of failure in the first time unit is identical to the chance of failure in any one time unit, given that we know the object has survived to that time unit.

This property, sometimes referred to as the “memoryless property” suggests that the risk of death/failure does not change over time.

This property is unique to the Exponential distribution!

d. Contrast the Exponential property with the Weibull results.

Notice that, for the Weibull distribution, the shape parameter controls what happens: for a=1, the Weibull reduces to the Exponential, as we know, so the “memoryless property” does indeed hold.

Otherwise, for a > 1, the risk of failure increases as time goes on. That is, one unit is not the same regardless of where in time it is.

For a < 1, the risk of failure is actually decreasing as time goes by.