

IMSL_SURVIVAL_GLM

The IMSL_SURVIVAL_GLM function analyzes censored survival data using a generalized linear model and estimates survival probabilities and hazard rates for the various parametric models.

The routine described in this topic has primary application in the areas of reliability and life testing, but it may find application in any situation in which time is a variable of interest. Kalbfleisch and Prentice (1980), Elandt-Johnson and Johnson (1980), Lee (1980), Gross and Clark (1975), Lawless (1982), and Chiang (1968) discuss the models and methods used here.

IMSL_SURVIVAL_GLM fits any of several generalized linear models, and computes estimates of survival probabilities based on the same models.

The IMSL_SURVIVAL_GLM function computes the maximum likelihood estimates of parameters and associated statistics in generalized linear models commonly found in survival (reliability) analysis. Although the terminology used is taken from the survival area, the methods discussed have application in many areas of data analysis, including reliability analysis and event history analysis. These methods can be used anywhere a random variable from one of the discussed distributions is parameterized via one of the models available in IMSL_SURVIVAL_GLM. Thus, while it is not advisable to do so, standard multiple linear regression can be performed by the IMSL_SURVIVAL_GLM routine. Estimates can be computed for any of the standard models listed below.

Exact, left-censored, right-censored, or interval-censored observations are allowed (note that left censoring is the same as interval censoring with the left endpoint equal to the left endpoint of the support of the distribution).

Let η = xᵀβ be the linear parameterization, where x is a design vector obtained by IMSL_SURVIVAL_GLM (via IMSL_REGRESSORS) from a row of x, and β is a vector of parameters associated with the linear model. Let T denote the random response variable and S(t) denote the probability that T > t. All models considered also allow a fixed parameter wᵢ for observation i (input in column Ifix of x); use of this parameter is discussed below. There also may be nuisance parameters θ > 0 or σ > 0 to be estimated (along with β) in the various models. Let Φ denote the cumulative normal distribution. The survival models available in IMSL_SURVIVAL_GLM are listed below.

Model   Name                       S(t)
0       Exponential                exp[–t exp(wᵢ + η)]
1       Linear hazard              exp[–(t + θt²/2) exp(wᵢ + η)]
2       Log-normal                 1 – Φ[(ln(t) – η – wᵢ)/σ]
3       Normal                     1 – Φ[(t – η – wᵢ)/σ]
4       Log-logistic               {1 + exp[(ln(t) – η – wᵢ)/σ]}⁻¹
5       Logistic                   {1 + exp[(t – η – wᵢ)/σ]}⁻¹
6       Log least extreme value    exp{–exp[(ln(t) – η – wᵢ)/σ]}
7       Least extreme value        exp{–exp[(t – η – wᵢ)/σ]}
8       Log extreme value          1 – exp{–exp[–(ln(t) – η – wᵢ)/σ]}
9       Extreme value              1 – exp{–exp[–(t – η – wᵢ)/σ]}
10      Weibull                    exp{–[t exp(wᵢ + η)]^θ}
Note that the log-least-extreme-value model is a re-parameterization of the Weibull model. Moreover, models 0, 1, 2, 4, 6, 8, and 10 require that T > 0, while all of the remaining models allow any value for T, –∞ < T < ∞.
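To make the table concrete, the following Python sketch evaluates S(t) and the corresponding hazard rate λ(t) = –d ln S(t)/dt for the Weibull model (model 10) under the parameterization shown above. It is an illustration only, not the IMSL interface; the function names and parameter values are hypothetical.

  import numpy as np

  def weibull_survival(t, eta, w=0.0, theta=1.0):
      # S(t) = exp(-[t * exp(w + eta)]**theta) for the Weibull model (model 10)
      return np.exp(-(t * np.exp(w + eta)) ** theta)

  def weibull_hazard(t, eta, w=0.0, theta=1.0):
      # lambda(t) = -d ln S(t)/dt = theta * c * (t * c)**(theta - 1), with c = exp(w + eta)
      c = np.exp(w + eta)
      return theta * c * (t * c) ** (theta - 1.0)

  # Hypothetical values: eta plays the role of x^T beta from a fitted linear model.
  t = np.linspace(0.1, 5.0, 5)
  print(weibull_survival(t, eta=-0.5, theta=1.3))
  print(weibull_hazard(t, eta=-0.5, theta=1.3))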

Each row vector in the data matrix can represent a single observation; or, through the use of vector frequencies, each row can represent several observations. Also note that classification variables and their products are easily incorporated into the models via the usual regression-type specifications.

The constant parameter wᵢ is input in x and may be used for a number of purposes.

For example, if the parameter in an exponential model is known to depend upon the size of the area tested, the volume of a radioactive mass, the population density, and so on, then a multiplicative factor of the exponential parameter λ = exp(xᵀβ) may be known beforehand. This factor can be input in wᵢ (wᵢ is the natural log of the factor).

An alternate use of wᵢ is as follows: it may be that λ = exp(x₁β₁ + x₂β₂), where β₂ is known. Letting wᵢ = x₂β₂, estimates for β₁ can be obtained via IMSL_SURVIVAL_GLM with the known fixed values for β₂. Standard methods can then be used to test hypotheses about β₁ via the computed log-likelihoods.
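The following Python sketch (again illustrative, not the IMSL interface; all names and values are hypothetical) shows how the fixed offset wᵢ = x₂β₂ would be constructed when β₂ is known, leaving only β₁ to be estimated.

  import numpy as np

  # Hypothetical covariates: the coefficient of x1 is unknown, while the
  # coefficient of x2 (beta2) is known from prior information.
  rng = np.random.default_rng(0)
  x1 = rng.normal(size=100)
  x2 = rng.normal(size=100)
  beta2_known = 0.7

  # Fixed per-observation offset w_i = x2_i * beta2 (this is the quantity that
  # would be supplied in the Ifix column of the data matrix).
  w = x2 * beta2_known

  # The exponential parameter then factors as
  #   lambda_i = exp(w_i + x1_i * beta1) = exp(w_i) * exp(x1_i * beta1),
  # so only beta1 remains free; log-likelihoods computed with and without
  # constraints on beta1 can be compared in the usual way.
  print(w[:5])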

Computational Details


The computations proceed as follows:

  1. The input parameters are checked for consistency and validity.
    Estimates of the means of the “independent” or design variables are computed as frequency-weighted averages, x̄ = Σᵢ fᵢxᵢ / Σᵢ fᵢ, where fᵢ is the frequency of observation i.
  2. If initial estimates are not provided (see keyword Init_Est), they are calculated as follows:
    Models 2-10:
    A. Kaplan-Meier estimates of the survival probability, Ŝ(t), at the upper limit of each failure interval are obtained. (Because upper limits are used, interval- and left-censored data are assumed to be exact failures at the upper endpoint of the failure interval.) The Kaplan-Meier estimate is computed under the assumption that all failure distributions are identical (i.e., all β’s but the intercept, if present, are assumed to be zero). A sketch of this computation appears at the end of Computational Details.

    B. If there is an intercept in the model, a simple linear regression is performed predicting:


    where t′ is computed at the upper endpoint of each failure interval, t′ = t in models 3, 5, 7, and 9, t′ = ln(t) in models 2, 4, 6, 8, and 10, and wᵢ is the fixed constant, if present.
    If there is no intercept in the model, then α is fixed at zero, and the model:


    is fit instead. In this model, the coefficients β are used in place of the location estimate α above. Here:


    is estimated from the simple linear regression with α = 0.

    C. If the intercept is in the model, then in log-location-scale models (models 1-8):


    and the initial estimate of the intercept is assumed to be:


    In the Weibull model:


    and the intercept is assumed to be:

    Initial estimates of all parameters β, other than the intercept, are assumed to be zero.
    If there is no intercept in the model, the scale parameter is estimated as above, and the estimates:


    from Step 2 are used as initial estimates for the β’s.
    Models 0 and 1:
    For the exponential models (model = 0 or 1), the “average total time on test” statistic is used to obtain an estimate for the intercept. Specifically, let Tₜ denote the total number of failures divided by the total time on test. The initial estimate for the intercept is then ln(Tₜ). Initial estimates for the remaining parameters β are assumed to be zero, and if model = 1, the initial estimate for the linear hazard parameter θ is assumed to be a small positive number. When the intercept is not in the model, the initial estimate for the parameter θ is assumed to be a small positive number, and initial estimates of the parameters β are computed via multiple linear regression as in Part A.
  3. A quasi-Newton algorithm is used in the initial iterations, based on a Hessian estimate whose (j, k) element is Σᵢ l′ᵢαⱼ l′ᵢαₖ, where l′ᵢαⱼ is the partial derivative of the i-th term in the log-likelihood with respect to the parameter αⱼ, and αⱼ denotes one of the parameters to be estimated.
    When the relative change in the log-likelihood from one iteration to the next is 0.1 or less, exact second partial derivatives are used for the Hessian, so that Newton-Raphson iteration is used.
    If the initial step size results in an increase in the log-likelihood, the full step is used. If the log-likelihood decreases for the initial step size, the step size is halved and a check for an increase in the log-likelihood is performed. Step-halving is performed (as a simple line search) until an increase in the log-likelihood is detected or until the step size becomes very small (the initial step size is 1.0). A sketch of this guarded iteration appears at the end of Computational Details.
  4. Convergence is assumed when the maximum relative change in any coefficient update from one iteration to the next is less than Eps or when the relative change in the log-likelihood from one iteration to the next is less than Eps/100.
    Convergence is also assumed after Itmax iterations or when step halving leads to a very small step size with no increase in the log-likelihood.
  5. If requested (see keyword Lp_Max), the methods of Clarkson and Jennrich (1988) are used to check for the existence of infinite estimates in ηᵢ = xᵢᵀβ.
    As an example of a situation in which infinite estimates can occur, suppose that observation j is right-censored with tⱼ > 15 in a normal distribution model in which the mean is μⱼ = xⱼᵀβ = ηⱼ, where xⱼ is the observation design vector. If the design vector xⱼ for parameter βₘ is such that xⱼₘ = 1 and xᵢₘ = 0 for all i ≠ j, then the optimal estimate of βₘ occurs at βₘ = ∞, leading to an infinite estimate of both βₘ and ηⱼ. In IMSL_SURVIVAL_GLM, such estimates can be “computed.”
    In all models fit by IMSL_SURVIVAL_GLM, infinite estimates can only occur when the optimal estimated probability associated with the left- or right-censored observation is 1. If infinity checking is on, left- or right-censored observations that have an estimated probability greater than 0.995 at some point during the iterations are excluded from the log-likelihood, and the iterations proceed with a log-likelihood based on the remaining observations. This allows convergence of the algorithm when the maximum relative change in the estimated coefficients is small and also allows for a more precise determination of observations with infinite ηᵢ.

    At convergence, linear programming is used to ensure that the eliminated observations have infinite ηᵢ. If some (or all) of the removed observations should not have been removed (because their estimated ηᵢ’s must be finite), then the iterations are restarted with a log-likelihood based upon the finite-ηᵢ observations. See Clarkson and Jennrich (1988) for more details.
    By default, or when keyword Lp_Max is not used, no observations are eliminated during the iterations. In this case, when infinite estimates occur, some (or all) of the coefficient estimates β̂ will become large, and it is likely that the Hessian will become (numerically) singular prior to convergence.
  6. The case statistics are computed as follows: Let lᵢ(θᵢ) denote the log-likelihood of the i-th observation evaluated at θᵢ, let l′ᵢ denote the vector of derivatives of lᵢ with respect to all parameters, let l′η,ᵢ denote the derivative of lᵢ with respect to η = xᵀβ, let H denote the Hessian, and let E denote expectation. Then the columns of Case_Analysis are:
    A. Predicted values are computed as E(T | x) according to standard formulas.
    If model is 4 or 8, and if σ ≥ 1, then the expected values cannot be computed because they are infinite.

    B. Following Cook and Weisberg (1982), the influence (or leverage) of the i-th observation is assumed to be (l′ᵢ)ᵀ H⁻¹ l′ᵢ. This quantity is a one-step approximation of the change in the estimates when the i-th observation is deleted (ignoring the nuisance parameters).

    C. The “residual” is computed as l′η,ᵢ.

    D. The cumulative hazard is computed at the observation covariate values and, for interval observations, the upper endpoint of the failure interval.
    The cumulative hazard also can be used as a “residual” estimate. If the model is correct, the cumulative hazards should follow a standard exponential distribution. See Cox and Oakes (1984).
    When the EST_* keywords are used, the IMSL_SURVIVAL_GLM function also computes estimates of survival probabilities and hazard rates for the parametric survival/reliability models.
    Let η = xᵀβ be the linear parameterization, where x is the design vector corresponding to a row of x (IMSL_SURVIVAL_GLM generates the design vector using IMSL_REGRESSORS), and β is a vector of parameters associated with the linear model. Let T denote the random response variable and S(t) denote the probability that T > t. All models considered also allow a fixed parameter w (input in column Ifix of x); use of this parameter is discussed above. There also may be nuisance parameters θ > 0 or σ > 0. Let λ(t) denote the hazard rate at time t. Then λ(t) and S(t) are related by
    S(t) = exp(−∫ λ(s) ds), with the integral taken over s ≤ t, or equivalently λ(t) = −d ln S(t)/dt.
    Models 0, 1, 2, 4, 6, 8, and 10 require that T > 0 (in which case λ(s) = 0 is assumed for s < 0), while the remaining models allow arbitrary values for T, −∞ < T < ∞. When the EST_* keywords are used, the computations in IMSL_SURVIVAL_GLM proceed as follows:
    1. The input arguments are checked for consistency and validity.
    2. For each row of x, the explanatory variables are generated from the classification variables and the covariates using IMSL_REGRESSORS with keyword Dummy_Method.
    3. For each point requested in the time grid, the survival probabilities and hazard rates are computed.
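The Kaplan-Meier computation of step 2A can be pictured with the short Python sketch below. It is a generic product-limit estimate, not the IMSL code: interval- and left-censored observations are assumed to have already been converted to exact failures at the upper endpoints of their failure intervals, and the times/is_failure inputs are hypothetical.

  import numpy as np

  def kaplan_meier(times, is_failure):
      # Product-limit estimate S(t) = prod over failure times t_i <= t of
      # (n_i - d_i) / n_i, where n_i is the number at risk and d_i the
      # number of failures at t_i.
      order = np.argsort(times)
      times = np.asarray(times, dtype=float)[order]
      is_failure = np.asarray(is_failure, dtype=bool)[order]
      n_at_risk = len(times)
      surv, s = {}, 1.0
      for t in np.unique(times):
          at_t = times == t
          d = int(np.sum(is_failure & at_t))    # failures at time t
          if d > 0:
              s *= (n_at_risk - d) / n_at_risk
              surv[float(t)] = s
          n_at_risk -= int(np.sum(at_t))        # everyone observed at t leaves the risk set
      return surv

  # Hypothetical sample: exact failures (True) and right-censored times (False).
  print(kaplan_meier([2, 3, 3, 5, 7, 8], [True, True, False, True, False, True]))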
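Steps 3 and 4 describe a guarded ascent: an outer-product (quasi-Newton) Hessian estimate, step-halving until the log-likelihood increases, and a relative-change convergence test. The Python sketch below shows that control flow in isolation; it is not the IMSL implementation (it omits the switch to exact second derivatives and the infinity checking), and loglik and per_obs_grad are hypothetical callables supplied by the user.

  import numpy as np

  def guarded_ascent(loglik, per_obs_grad, beta, eps=1e-4, itmax=30):
      # Maximize loglik(beta) using the outer-product Hessian estimate
      # H ~ sum_i g_i g_i^T and simple step-halving (initial step size 1.0).
      for _ in range(itmax):
          g_i = per_obs_grad(beta)              # n x p matrix of per-observation scores
          score = g_i.sum(axis=0)
          hessian = g_i.T @ g_i                 # quasi-Newton (outer-product) estimate
          step = np.linalg.solve(hessian, score)
          old = loglik(beta)
          scale = 1.0
          while scale > 1e-10 and loglik(beta + scale * step) <= old:
              scale /= 2.0                      # halve until the log-likelihood increases
          new_beta = beta + scale * step
          rel_change = np.max(np.abs(new_beta - beta) / np.maximum(np.abs(beta), 1.0))
          beta = new_beta
          if rel_change < eps or scale <= 1e-10:
              break                             # converged, or step size became very small
      return beta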

Programming Notes


Indicator (dummy) variables are created for the classification variables by IMSL_REGRESSORS (Chapter 3, Regression) using the keyword Dummy_Method.
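For readers unfamiliar with indicator coding, the hypothetical Python sketch below shows the general idea; the actual columns produced depend on the Dummy_Method value and are generated by IMSL_REGRESSORS, not by this code.

  import numpy as np

  def indicator_columns(levels, drop_last=True):
      # Expand a classification variable into 0/1 indicator (dummy) columns.
      # Dropping one column mimics a "leave out one class" style of coding.
      classes = sorted(set(levels))
      cols = np.column_stack([[1.0 if v == c else 0.0 for v in levels] for c in classes])
      return cols[:, :-1] if drop_last else cols

  print(indicator_columns(["a", "b", "a", "c"]))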


