library(PWEXP)
library(RColorBrewer)
set.seed(1818)

Usage of Package

See https://zjph602xtc.github.io/PWEXP/.

Distribution

Piecewise Exponential Distribution

The hazard function, cumulative hazard function, PDF, CDF, quantile function of an exponential distribution r.v. t are: The hazard function, cumulative hazard function, PDF, survival function, quantile function of a piecewise exponential distribution r.v. t with breakpoints d_i are:

The PWEXP package provides dpwexp(), ppwexp(), qpwexp(), rpwexp() functions to the piecewise exponential distribution:

# Left Figure ------------------------------------------------------------
# use rpwexp function to generate piecewise exp samples with rate 2, 1, 3
r_sample <- rpwexp(50000, rate=c(2, 1, 3), breakpoint=c(0.3, 0.8))
hist(r_sample, freq=F, breaks=200, main="Density of Piecewsie Exp Dist", xlab='t', xlim=c(0, 1.2))

# piecewise exp density with rate 2, 1, 3 
t <- seq(0, 1.5, 0.01)
f2 <- dpwexp(t, rate=c(2, 1, 3), breakpoint=c(0.3, 0.8))
points(t, f2, col='red', pch=16)

# exp distribution can be a special case of piecewise exp distribution
f1 <- dpwexp(t, rate=2) 
lines(t, f1, lwd=2)
legend('topright', c('exp dist with rate 2','piecewise exp dist with rate 2, 1, 3',
                     'histogram of piecewise exp dist with rate 2, 1, 3'), 
       col=c('black','red'), fill=c(NA, NA, 'grey'), border=c('white', 'white', 'black'), 
       lty=c(1, NA, NA), pch=c(NA, 16, NA), lwd=2)

# Right Figure ------------------------------------------------------------
# CDF of piecewise exp with rate 2, 1, 3
F2 <- ppwexp(t, rate=c(2, 1, 3), breakpoint=c(0.3, 0.8), lower.tail=T)
plot(t, F2, type='l', col='red', lwd=2, main="CDF and Quantile Function of Piecewsie Exp Dist", 
     xlim=c(0, 1.5), ylim=c(0, 1.5))

# CDF of exp dist is compatible with our package
F1 <- ppwexp(t, rate=2, lower.tail=T)
lines(t, F1, lwd=2)

# plot quantile functions of both distributions
lines(F1, qpwexp(F1, rate=2, lower.tail=T), lty=2, lwd=2)
lines(F2, qpwexp(F2, rate=c(2, 1, 3), breakpoint=c(0.3,0.8), lower.tail=T), col='red', 
      lty=2, lwd=2)

abline(0, 1, col='grey')
legend('topleft', c('CDF of piecewise exp with rate 2, 1, 3', 'quantile function of 
                    piecewise exp with rate 2, 1, 3', 
                    'CDF of exp with rate 2', 'quantile function of exp with rate 2'), 
       col=c('red', 'red', 'black', 'black'), lty=c(1, 2, 1, 2), lwd=2)

Conditional Piecewise Exponential Distribution

The conditional survival function, CDF, PFD and quantile function of an exponential distribution t given t > T is

The conditional survival function and CDF of a piecewise exponential distribution t given t > T is The conditional quantile function of a piecewise exponential distribution t given t > T is The PWEXP package provides ppwexp_conditional(), qpwexp_conditional(), rpwexp_conditional() functions for conditional piecewise exponential distribution:

# Left Figure ------------------------------------------------------------
# CDF and qunatile function of conditional piecewise exp with rate 2, 1, 3 given t > 0.1
t <- seq(0.1, 1.2, 0.01)
F2_con <- ppwexp_conditional(t, qT=0.1, rate=c(2, 1, 3), breakpoint=c(0.3, 0.8))
plot(t, F2_con, type='l', col='red', lwd=2, main="CDF and Quantile Function of 
     Conditional \nPiecewsie Exp Dist", xlim=c(0, 1.2), ylim=c(0, 1.2))
lines(F2_con, qpwexp_conditional(F2_con, qT=0.1, rate=c(2, 1, 3), breakpoint=c(0.3,0.8)), 
      lty=2, lwd=2, col='red')

# compare with CDF and quantile function of unconditional piecewise exp with rate 2, 1, 3
t <- seq(0, 1.2, 0.01)
F2 <- ppwexp(t, rate=c(2, 1, 3), breakpoint=c(0.3,0.8))
lines(t, F2, lwd=2)
lines(F2, qpwexp(F2, rate=c(2, 1, 3), breakpoint=c(0.3,0.8)), lty=2, lwd=2)
abline(v=0.1, col='grey')
abline(h=0.1, col='grey')
legend('topleft', c('CDF of piecewise exp dist given t > 0.1', 'quantile function of 
                    piecewise exp dist given t > 0.1', 'CDF of piecewise exp dist', 
                    'quantile function of piecewise exp dist'), col=c('red', 'red', 'black', 'black'),
       lty=c(1, 2, 1, 2), lwd=2)

# Right Figure ------------------------------------------------------------
# use rpwexp_conditional function to generate piecewise exp samples with rate 2, 1, 3 given t > 0.1
r_sample_con <- rpwexp_conditional(3000, qT=0.1, rate=c(2, 1, 3), breakpoint=c(0.3,0.8))
plot(ecdf(r_sample_con), col='red', lwd=2,  main="Empirical CDF of Conditional 
     Piecewsie Exp Dist", xlim=c(0, 1.2), ylim=c(0, 1))

# compare with its CDF
lines(seq(0.1, 1.2, 0.01), F2_con, lwd=2)
legend('topleft', c('empirial CDF of piecewise exp dist given t > 0.1', 'true CDF of 
                    piecewise exp dist given t > 0.1'), col=c('red', 'black'), lty=c(1,2), lwd=2)

Parameter Estimation

Hazard Rate Estimation with Known Breakpoints

The log likelihood function can be constructed as following: where D_r or C_r are the index set of event time or censoring time that fall into rth interval of the piecewise exponential distribution.

When breakpoints d₁ to d_r are known, we can take a derivative of log (L) wrt λ: where n_{D_r} is the number of events that fall into the rth interval; n_r⁺ is the number of events and censoring that fall into the rth to the end intervals.

Let all derivatives equal to 0, we obtain the MLE estimator of hazard rates:

Estimation of Breakpoints

MLE by Brute-force Search

The log likelihood function is differentiable wrt d_r as long as d_r is not equal to any event time or censoring time. We take the derivative of log (L) wrt d_r: Since λ_r + 1 ≠ λ_r, then the derivative cannot be zero when d_r is between two consecutive sample time points. This fact implies that only when d_r is equal to any of the sample values, the log likelihood function achieves the maximum value.

Therefore, we can use a brute-force search to calculate the log likelihood function values for all potential breakpoint combinations. For each candidate, hazard rates can be estimated by the formula in Section 2.1 and thus the log likelihood can be obtained. The breakpoints and hazard rates with the largest log likelihood value are the MLE estimator.

When the number of samples or the number of breakpoints are relatively large, the combination of breakpoints will be very large and the brute-force search may not be feasible. We will draw a random sub-sample first and then do exhaustive search based on the sub-sample. The argument in function controls the maximum combination candidates to try.

OLS on Survival Function

The second method to estimate breakpoints is based on the survival curve. Once we take a log transformation of the piecewise survival function, we will obtain a piecewise linear function:

Let Y_i be the log value of KM estimate at t_i. Following Muggeo (2003), we use (t_i, Y_i) to fit the piecewise log survival function log (S(t)) and obtain the OLS breakpoints d̂_r. Once breakpoints are determined, hazard rates can be estimated by the formula in Section 2.1.

Hybrid Method

The breakpoints estimated in Section 2.2.2 are not exactly but very close to the MLE estimator. In order to obtain the MLE estimator, we combine brute-force search with OLS method.

Specifically, instead of drawing a random sub-sample in Section 2.2.1, we draw a sub-sample from the values within 95% percent confidence intervals of the estimated breakpoints from Section 2.2.2. Then we do exhaustive search based on the sub-sample. The argument in function controls the maximum combination candidates to try. The hybrid method improves the efficiency and accuracy of the estimation procedure.

Summary

The procedure of parameter estimation is summarized in the diagram below:

Reference

[1] Muggeo, V. M. (2003). Estimating regression models with unknown break-points. Statistics in medicine, 22(19), 3055-3071.

Theory Notes