IMSL_ALLBEST

The IMSL_ALLBEST procedure selects the best multiple linear regression models.

The IMSL_ALLBEST procedure finds the best subset regressions for a regression problem with n_candidate = N_ELEMENTS(x(0, *)) independent variables. Typically, the intercept is forced into all models and is not a candidate variable. In this case, a sum-of-squares and crossproducts matrix for the independent and dependent variables, corrected for the mean, is computed internally. There may be cases when it is convenient for you to calculate the matrix yourself; see the description of the Cov_Input keyword.
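The mean-corrected matrix described above can be illustrated with a short Python sketch (Python, not IDL). The function name corrected_sscp is hypothetical, and the sketch only shows what quantity is formed; it is not the IMSL implementation. The dependent variable is placed in the last row and column, which is an assumption of this sketch.

```python
def corrected_sscp(x, y):
    """Mean-corrected sum-of-squares and crossproducts matrix of [x, y].

    x: one list per observation, holding that observation's n_candidate
       independent-variable values (row i mirrors x(i, *) in IDL).
    y: one dependent-variable value per observation.
    The dependent variable occupies the last row and column.
    """
    rows = [list(xi) + [yi] for xi, yi in zip(x, y)]
    n = len(rows)
    k = len(rows[0])
    # Subtract each column's mean (the "corrected for the mean" step).
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    centered = [[r[j] - means[j] for j in range(k)] for r in rows]
    # Entry (a, b) is the sum over observations of centered[a] * centered[b].
    return [[sum(c[a] * c[b] for c in centered) for b in range(k)]
            for a in range(k)]


# Four observations of two candidate variables; here y = x1 + x2 exactly.
x = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
y = [3.0, 3.0, 7.0, 7.0]
print(corrected_sscp(x, y))
# [[5.0, 3.0, 8.0], [3.0, 5.0, 8.0], [8.0, 8.0, 16.0]]
```

The bottom-right entry (16.0 here) is the total sum of squares SST of the dependent variable.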

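"Best" is defined, as selected by option, by one of the following three criteria:

  • $R^2$ (in percent):

    $R^2 = 100\left(1 - \frac{SSE_p}{SST}\right)$

  • $R^2_a$ (adjusted $R^2$ in percent):

    $R^2_a = 100\left[1 - \left(\frac{n - 1}{n - p}\right)\frac{SSE_p}{SST}\right]$

    Note that maximizing $R^2_a$ is equivalent to minimizing the residual mean square:

    $\frac{SSE_p}{n - p}$

  • Mallows' $C_p$ statistic:

    $C_p = \frac{SSE_p}{s^2_{n\_candidate}} + 2p - n$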

Here, n is the sum of the frequencies (or N_ELEMENTS(x(*, 0)) if Frequencies is not specified), and SST is the total sum of squares. $SSE_p$ is the error sum of squares in a model containing p regression parameters, including $\beta_0$ (that is, $p - 1$ of the n_candidate candidate variables). $s^2_{n\_candidate}$ is the error mean square from the model containing all n_candidate candidate variables. Hocking (1972) and Draper and Smith (1981, pp. 296–302) discuss these criteria.
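The three criteria translate directly into arithmetic. The Python sketch below evaluates them from $SSE_p$, SST, n, p, and the full-model error mean square; the function names and all numbers are hypothetical and chosen only so the results are exact.

```python
def r_squared(sse_p, sst):
    """R^2 in percent: 100 * (1 - SSE_p / SST)."""
    return 100.0 * (1.0 - sse_p / sst)


def adjusted_r_squared(sse_p, sst, n, p):
    """Adjusted R^2 in percent: 100 * [1 - ((n - 1) / (n - p)) * SSE_p / SST]."""
    return 100.0 * (1.0 - (n - 1) / (n - p) * sse_p / sst)


def residual_mean_square(sse_p, n, p):
    """SSE_p / (n - p); maximizing adjusted R^2 minimizes this quantity."""
    return sse_p / (n - p)


def mallows_cp(sse_p, s2_full, n, p):
    """Mallows' Cp: SSE_p / s2_full + 2p - n, where s2_full is the error
    mean square of the model containing all candidate variables."""
    return sse_p / s2_full + 2 * p - n


# Hypothetical data: n = 11 observations, SST = 200, four candidate variables.
n, sst = 11, 200.0
s2_full = 24.0 / (n - 5)  # full model: intercept + 4 variables, SSE = 24
print(r_squared(50.0, sst))                 # 75.0
print(adjusted_r_squared(50.0, sst, n, 3))  # 68.75
print(mallows_cp(50.0, s2_full, n, 3))      # 7.5
print(mallows_cp(24.0, s2_full, n, 5))      # 5.0 (Cp equals p for the full model)
```

The last line shows a general property: for the model containing all candidates, $SSE_p / s^2_{n\_candidate} = n - p$, so $C_p = p$.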

The IMSL_ALLBEST procedure is based on the algorithm of Furnival and Wilson (1974). This algorithm finds Max_N_Good candidate regressions for each possible subset size, and these regressions are used to identify a set of best regressions. In large problems, many regressions are never computed: they are rejected on the basis of results for other subsets, which makes this an efficient technique for considering all possible regressions.
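For contrast, here is a Python sketch of the exhaustive search that the Furnival and Wilson algorithm avoids: it fits every subset of candidates from a mean-corrected sum-of-squares and crossproducts matrix and keeps the max_n_good smallest error sums of squares per subset size. This is plain enumeration, not the leaps-and-bounds algorithm used by IMSL_ALLBEST. The helper names are hypothetical, and max_n_good only loosely mirrors the Max_N_Good keyword.

```python
from itertools import combinations


def solve(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting.
    a is assumed nonsingular (no linearly dependent candidates)."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def best_subsets(sscp, max_n_good):
    """Exhaustive enumeration (2**k - 1 fits), NOT Furnival-Wilson.

    sscp: mean-corrected SSCP matrix, dependent variable in the last
          row/column (intercept forced into every model).
    Returns {subset size: [(SSE, subset), ...]} with the max_n_good
    smallest SSE values (equivalently, largest R^2) for each size.
    """
    k = len(sscp) - 1   # number of candidate variables
    sst = sscp[k][k]    # total sum of squares
    result = {}
    for size in range(1, k + 1):
        fits = []
        for subset in combinations(range(k), size):
            s_xx = [[sscp[i][j] for j in subset] for i in subset]
            s_xy = [sscp[i][k] for i in subset]
            coef = solve(s_xx, s_xy)
            # SSE_p = SST minus the regression sum of squares b' s_xy.
            sse = sst - sum(c * v for c, v in zip(coef, s_xy))
            fits.append((sse, subset))
        fits.sort()
        result[size] = fits[:max_n_good]
    return result


sscp = [[5.0, 3.0, 8.0], [3.0, 5.0, 8.0], [8.0, 8.0, 16.0]]
for size, fits in best_subsets(sscp, max_n_good=1).items():
    print(size, fits)
```

Because the number of fits doubles with every added candidate, this approach quickly becomes impractical, which is the cost the bounding strategy described above is designed to avoid.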

There are cases when you may wish to input the variance-covariance matrix rather than allow the IMSL_ALLBEST procedure to calculate it; use the Cov_Input keyword to do so. Three situations in which you may want to do this are as follows:

  1. The intercept is not in the model. A raw (uncorrected) sum-of-squares and crossproducts matrix for the independent and dependent variables is required. Keyword Cov_Nobs must be set to 1 greater than the number of observations. To compute the raw sum-of-squares and crossproducts matrix, form $A^T A$, where $A = [x, y]$ is the matrix of independent variables augmented with the dependent variable as its last column.
  2. An intercept is to be a candidate variable. A raw (uncorrected) sum-of-squares and crossproducts matrix for the constant regressor (= 1.0), independent variables, and dependent variables is required for Cov_Input. In this case, Cov_Input contains one additional row and column corresponding to the constant regressor. This row/column contains the sum of squares and crossproducts of the constant regressor with the independent and dependent variables. The remaining elements in Cov_Input are the same as in the previous case. Keyword Cov_Nobs must be set to 1 greater than the number of observations.
  3. There are m variables to be forced into the models. A sum-of-squares and crossproducts matrix adjusted for the m variables is required (calculated by regressing the candidate variables on the variables to be forced into the model). Keyword Cov_Nobs must be set to m less than the number of observations.
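Cases 1 and 2 above both amount to forming $A^T A$ for a suitably augmented matrix A. The Python sketch below builds those raw matrices from lists of columns and returns the matching Cov_Nobs value. The function names are hypothetical, placing the constant column first in case 2 is an assumption, case 3 is not covered, and passing the result to IMSL_ALLBEST is not shown.

```python
def raw_sscp(columns):
    """Raw (uncorrected) SSCP matrix A^T A, where columns[j] holds
    column j of A across all observations."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns]
            for ci in columns]


def cov_input_no_intercept(x_cols, y):
    """Case 1: no intercept. A = [x, y]; Cov_Nobs = n + 1."""
    n = len(y)
    return raw_sscp(x_cols + [y]), n + 1


def cov_input_intercept_candidate(x_cols, y):
    """Case 2: intercept is a candidate. A = [1, x, y] (constant column
    placed first, an assumption of this sketch); Cov_Nobs = n + 1."""
    n = len(y)
    ones = [1.0] * n
    return raw_sscp([ones] + x_cols + [y]), n + 1


x_cols = [[1.0, 2.0, 3.0]]  # one independent variable, three observations
y = [2.0, 4.0, 6.0]
print(cov_input_no_intercept(x_cols, y))
# ([[14.0, 28.0], [28.0, 56.0]], 4)
print(cov_input_intercept_candidate(x_cols, y))
# ([[3.0, 6.0, 12.0], [6.0, 14.0, 28.0], [12.0, 28.0, 56.0]], 4)
```

In case 2, the extra first row and column hold n followed by the column sums of x and y, which are the crossproducts of the constant regressor with the other variables.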

Programming Notes

The IMSL_ALLBEST procedure saves considerable CPU time compared with explicitly computing all possible regressions. However, it has some limitations that can produce unexpected results if you are not aware of them.

  1. For $n\_candidate + 1 > -\log_2(\varepsilon)$, where $\varepsilon$ is machine precision, some results may be incorrect. This limitation arises because the possible models indicated (the model numbers $1, 2, \ldots, 2^{n\_candidate}$) are stored as floating-point values; for sufficiently large n_candidate, the model numbers cannot be stored exactly. On many computers, this means IMSL_ALLBEST can produce incorrect results for n_candidate > 24 in single precision and for n_candidate > 49 in double precision.
  2. The IMSL_ALLBEST procedure eliminates some subsets of candidate variables by obtaining lower bounds on the error sum of squares from fitting larger models. First, the full model containing all n_candidate candidate variables is fit sequentially using a forward stepwise procedure in which one variable enters the model at a time, and criterion values and model numbers for all the candidate variables that can enter at each step are stored. If linearly dependent variables are removed from the full model, an error is issued. In that case, some submodels that contain variables removed from the full model because of linear dependency can be overlooked if they were not already identified during the initial forward stepwise procedure. If this error is issued and you want the removed variables to be considered in smaller models, rerun IMSL_ALLBEST with a set of linearly independent variables.
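Both notes can be illustrated in Python. The first two helpers show the rounding effect behind note 1: single precision holds every integer up to $2^{24}$ exactly but not $2^{24} + 1$, and double precision fails the same way past $2^{53}$. This only demonstrates the underlying effect and does not reproduce the thresholds quoted above. The third helper, independent_columns, is a hypothetical screening aid for note 2 that picks a linearly independent subset of candidates with a Gram-Schmidt pass; it is not what IMSL_ALLBEST does internally, and the tolerance tol is an assumption.

```python
import struct


def exact_in_float32(k):
    """True if integer k survives a round trip through IEEE 754 single precision."""
    return struct.unpack("f", struct.pack("f", float(k)))[0] == k


def exact_in_float64(k):
    """True if integer k is exactly representable as a Python float (double)."""
    return float(k) == k


def independent_columns(columns, tol=1e-10):
    """Hypothetical helper: indices of a linearly independent subset of
    columns, scanned left to right.

    Each column is centered (the intercept is forced into every model),
    then orthogonalized against the columns already kept (Gram-Schmidt).
    A column whose residual norm is at most tol * max(centered norm, 1)
    is treated as linearly dependent and dropped.
    """
    basis, kept = [], []
    for j, col in enumerate(columns):
        mean = sum(col) / len(col)
        v = [a - mean for a in col]
        scale = max(sum(a * a for a in v) ** 0.5, 1.0)
        for q in basis:
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > tol * scale:
            basis.append([a / norm for a in v])
            kept.append(j)
    return kept


print(exact_in_float32(2**24), exact_in_float32(2**24 + 1))  # True False
print(exact_in_float64(2**53), exact_in_float64(2**53 + 1))  # True False
# Column 1 is constant (absorbed by the intercept); column 3 = column 0 + column 2.
print(independent_columns([[1.0, 2.0, 3.0, 4.0], [5.0, 5.0, 5.0, 5.0],
                           [2.0, 1.0, 4.0, 3.0], [3.0, 3.0, 7.0, 7.0]]))  # [0, 2]
```

The kept indices identify one linearly independent set of candidate variables to pass on a rerun; a different column order can keep a different, equally valid set.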

© 2018 Harris Geospatial Solutions, Inc.