Logistic regression using Firth's bias reduction: a solution to the problem of separation in logistic regression

The phenomenon of separation is observed when fitting a logistic regression model if the likelihood converges to a finite value while at least one parameter estimate diverges to plus or minus infinity. Separation occurs primarily in small or sparse samples with highly predictive covariates. The simplest case of separation arises in the analysis of a 2x2 table with one zero cell count. Statistical software packages that fit logistic regression by the maximum likelihood method cannot deal with this problem appropriately. Exact solutions exist, but they require special software and are not applicable when continuous covariates have to be analysed. Heinze and Schemper (2002) suggested a bias reduction method originally proposed by Firth (1993) as an ideal solution to the separation problem. It has been shown that, unlike the standard maximum likelihood method, this method always leads to finite parameter estimates. An extensive simulation study can be found in a Technical Report (Heinze, 1999). A recently published study compares the method with exact logistic regression by analysing several small real-life data sets in which separation, or a situation close to separation, is present (Heinze, 2006). The application of Firth's bias reduction to logistic regression has also been proposed by Bull et al. (2002, 2007) and by Heinze and Schemper (2006).
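To make the idea concrete, here is a minimal Python/NumPy sketch of Firth's method. It is our own illustration, not the code of the SAS macro or the R package, and the function names and convergence settings are assumptions. It maximizes the penalized log-likelihood l*(β) = l(β) + 1/2 log|I(β)| by Newton-type steps on Firth's modified score X'(y - p + h(1/2 - p)), where h are the diagonal elements of the hat matrix, with step-halving. The example is a 2x2 table with one zero cell (8 events and no non-events at x = 1), for which the ordinary maximum likelihood slope diverges. For a 2x2 table, Firth's estimate equals the log odds ratio after adding 1/2 to every cell, here log((8.5/0.5)/(5.5/5.5)) = log 17.

```python
import numpy as np


def _penalized(X, y, beta):
    """Penalized log-likelihood, Firth's modified score and Fisher information at beta."""
    eta = X @ beta
    p = 1.0 / (1.0 + np.exp(-eta))
    w = p * (1.0 - p)
    info = X.T @ (w[:, None] * X)                        # Fisher information X'WX
    # diagonal of the hat matrix W^(1/2) X (X'WX)^(-1) X' W^(1/2)
    h = w * np.einsum("ij,jk,ik->i", X, np.linalg.inv(info), X)
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))     # ordinary log-likelihood
    pl = loglik + 0.5 * np.linalg.slogdet(info)[1]        # + 1/2 log|I(beta)| (Jeffreys prior)
    score = X.T @ (y - p + h * (0.5 - p))                 # Firth's modified score
    return pl, score, info


def firth_logit(X, y, max_iter=200, tol=1e-9):
    """Firth-penalized logistic regression; X must contain an intercept column."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    pl, score, info = _penalized(X, y, beta)
    for _ in range(max_iter):
        if np.max(np.abs(score)) < tol:
            break
        step = np.linalg.solve(info, score)   # Newton-type step using X'WX
        for _halving in range(30):            # step-halving: do not lower the penalized
            candidate = beta + step           # likelihood (up to a tiny rounding slack)
            new = _penalized(X, y, candidate)
            if new[0] >= pl - 1e-12:
                break
            step = step / 2.0
        beta = candidate
        pl, score, info = new
    return beta, pl


if __name__ == "__main__":
    # 2x2 table with a zero cell: x=0 -> 5 events / 5 non-events, x=1 -> 8 events / 0 non-events
    X = np.array([[1.0, 0.0]] * 10 + [[1.0, 1.0]] * 8)
    y = np.array([1] * 5 + [0] * 5 + [1] * 8)
    beta, _ = firth_logit(X, y)
    print(beta)  # finite estimates; the slope should be close to log(17)
```

Using X'WX instead of the exact Hessian of the penalized likelihood keeps each step cheap; step-halving guards against overshooting, because the modified score is the gradient of l*(β).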

We developed a SAS macro to make this method available from within SAS, a widely used statistical software package (Heinze and Ploner, 2003, 2004). The macro can also compute confidence intervals based on the profile penalized log likelihood (PPL) and plot the PPL function, as suggested by Heinze and Schemper (2002). The SAS macro was revised in March 2005 using the SAS System for Windows 9.1 and improved again in March and September 2006.
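The PPL confidence interval for a coefficient β_j contains all values b for which 2[l*(β̂) - l*_prof(b)] stays below the 95% quantile of the chi-square distribution with 1 df (3.841), where l*_prof(b) is the penalized log-likelihood maximized over the other coefficients while β_j is held at b. The self-contained sketch below is our own Python illustration, not the macro's implementation; `firth_fit`, `profile_ci` and their parameters are hypothetical names. It locates the two crossing points by widening a bracket and then bisecting. As an assumption of this sketch, the penalty is always computed from the full model's Fisher information.

```python
import numpy as np

CHI2_1_95 = 3.841458820694124  # 95% quantile of the chi-square distribution with 1 df


def _penalized(X, y, beta):
    """Penalized log-likelihood, Firth's modified score and Fisher information."""
    eta = X @ beta
    p = 1.0 / (1.0 + np.exp(-eta))
    w = p * (1.0 - p)
    info = X.T @ (w[:, None] * X)
    h = w * np.einsum("ij,jk,ik->i", X, np.linalg.inv(info), X)
    pl = np.sum(y * eta - np.logaddexp(0.0, eta)) + 0.5 * np.linalg.slogdet(info)[1]
    return pl, X.T @ (y - p + h * (0.5 - p)), info


def firth_fit(X, y, fixed=None, max_iter=200, tol=1e-9):
    """Maximize the penalized log-likelihood, optionally holding beta[j] = value.

    With fixed=(j, value) the result is the profile penalized log-likelihood at
    beta[j] = value; the penalty still uses the full model's Fisher information.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    free = np.ones(X.shape[1], dtype=bool)
    if fixed is not None:
        beta[fixed[0]] = fixed[1]
        free[fixed[0]] = False
    pl, score, info = _penalized(X, y, beta)
    for _ in range(max_iter):
        if not free.any() or np.max(np.abs(score[free])) < tol:
            break
        step = np.zeros_like(beta)
        step[free] = np.linalg.solve(info[np.ix_(free, free)], score[free])
        for _halving in range(30):  # step-halving keeps the ascent monotone
            candidate = beta + step
            new = _penalized(X, y, candidate)
            if new[0] >= pl - 1e-12:
                break
            step = step / 2.0
        beta = candidate
        pl, score, info = new
    return beta, pl


def profile_ci(X, y, j, crit=CHI2_1_95, bracket_step=1.0, tol=1e-6, max_expand=50):
    """Estimate and profile penalized likelihood confidence limits for beta[j]."""
    beta_hat, pl_max = firth_fit(X, y)

    def excess(b):
        # negative inside the confidence interval, positive outside
        _, pl_b = firth_fit(X, y, fixed=(j, b))
        return 2.0 * (pl_max - pl_b) - crit

    limits = []
    for direction in (-1.0, 1.0):
        inner, outer = beta_hat[j], beta_hat[j] + direction * bracket_step
        for _ in range(max_expand):            # widen until the cut-off is crossed
            if excess(outer) >= 0:
                break
            inner, outer = outer, outer + direction * bracket_step
        else:
            limits.append(direction * np.inf)  # no crossing found: treat as unbounded
            continue
        while abs(outer - inner) > tol:        # bisection between inner and outer
            mid = 0.5 * (inner + outer)
            if excess(mid) < 0:
                inner = mid
            else:
                outer = mid
        limits.append(0.5 * (inner + outer))
    return beta_hat[j], limits[0], limits[1]


if __name__ == "__main__":
    X = np.array([[1.0, 0.0]] * 10 + [[1.0, 1.0]] * 8)  # 2x2 table with one zero cell
    y = np.array([1] * 5 + [0] * 5 + [1] * 8)
    print(profile_ci(X, y, j=1))  # estimate with finite lower and upper limits
```

Unlike a Wald interval, this interval need not be symmetric around the estimate, which matters precisely in the separated or nearly separated data sets discussed above.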

An R package, logistf, is available from CRAN or can be downloaded from this page (version 1.20 for R >= 3.0.0, May 2013). Version 1.20 is a major update in several respects:

(1) Many S3 methods have been defined for objects of class logistf, including add1, drop1 and anova methods

(2) New forward and backward functions allow for automated variable selection using penalized likelihood ratio tests

(3) The core routines have been ported to C, and many speed improvements have been made

(4) Handling of multiply imputed data sets: the 'combination of likelihood profiles' (CLIP) method (Heinze, Ploner and Beyea, 2013) has been implemented. It works with data sets imputed by the R package mice but can also handle data sets imputed by other means.
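Penalized likelihood ratio tests of this kind can drive a simple backward elimination. The sketch below is our own Python illustration and does not reproduce the logistf functions; `plr_pvalue`, `backward` and the `alpha` threshold are hypothetical names. For H0: β_j = 0 it compares the maximized penalized log-likelihood with the penalized log-likelihood maximized under β_j = 0, and refers twice the difference to a chi-square distribution with 1 df. As an assumption of the sketch, the null fit keeps the penalty of the current model; after a covariate is dropped, the reduced model is refitted.

```python
import math

import numpy as np


def _penalized(X, y, beta):
    """Penalized log-likelihood, Firth's modified score and Fisher information."""
    eta = X @ beta
    p = 1.0 / (1.0 + np.exp(-eta))
    w = p * (1.0 - p)
    info = X.T @ (w[:, None] * X)
    h = w * np.einsum("ij,jk,ik->i", X, np.linalg.inv(info), X)
    pl = np.sum(y * eta - np.logaddexp(0.0, eta)) + 0.5 * np.linalg.slogdet(info)[1]
    return pl, X.T @ (y - p + h * (0.5 - p)), info


def firth_fit(X, y, fixed=None, max_iter=200, tol=1e-9):
    """Maximize the penalized log-likelihood, optionally holding beta[j] = value."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    free = np.ones(X.shape[1], dtype=bool)
    if fixed is not None:
        beta[fixed[0]] = fixed[1]
        free[fixed[0]] = False
    pl, score, info = _penalized(X, y, beta)
    for _ in range(max_iter):
        if not free.any() or np.max(np.abs(score[free])) < tol:
            break
        step = np.zeros_like(beta)
        step[free] = np.linalg.solve(info[np.ix_(free, free)], score[free])
        for _halving in range(30):  # step-halving keeps the ascent monotone
            candidate = beta + step
            new = _penalized(X, y, candidate)
            if new[0] >= pl - 1e-12:
                break
            step = step / 2.0
        beta = candidate
        pl, score, info = new
    return beta, pl


def plr_pvalue(X, y, j):
    """Penalized likelihood ratio test of H0: beta[j] = 0 (chi-square, 1 df)."""
    _, pl_full = firth_fit(X, y)
    _, pl_null = firth_fit(X, y, fixed=(j, 0.0))
    stat = max(2.0 * (pl_full - pl_null), 0.0)  # guard against tiny negative rounding
    return math.erfc(math.sqrt(stat / 2.0))     # P(chi2_1 > stat)


def backward(X, y, names, alpha=0.05):
    """Drop the least significant covariate until all remaining p-values <= alpha.

    Column 0 is assumed to be the intercept and is never dropped.
    """
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        Xk = X[:, keep]
        pvals = {col: plr_pvalue(Xk, y, pos) for pos, col in enumerate(keep) if pos > 0}
        worst = max(pvals, key=pvals.get)
        if pvals[worst] <= alpha:
            break
        keep.remove(worst)  # the reduced model is refitted in the next round
    return [names[col] for col in keep]


if __name__ == "__main__":
    # x2 has the same event rate at both of its levels; x1 is strongly predictive
    rows = []
    for x1, x2, events, non_events in [(0, 0, 2, 8), (0, 1, 2, 8), (1, 0, 8, 2), (1, 1, 8, 2)]:
        rows += [(1.0, x1, x2, 1.0)] * events + [(1.0, x1, x2, 0.0)] * non_events
    data = np.array(rows)
    X, y = data[:, :3], data[:, 3]
    print(backward(X, y, ["(Intercept)", "x1", "x2"]))  # should keep x1 and drop x2
```

Each test here has one degree of freedom; factors with several dummy columns would need a joint test with more degrees of freedom, which this sketch does not cover.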

Another SAS macro, CFL, applies Firth's correction to conditional logistic regression, as outlined in Heinze and Puhr (2010). It can be used for sparse-data analyses of clustered data with binary outcomes, such as matched case-control studies or studies that include a nuisance random effect.

Please note that users of SAS version 9.2 can apply Firth's correction by specifying the FIRTH option in the MODEL statement of PROC LOGISTIC. Profile penalized likelihood confidence intervals can be computed by specifying CLODDS=PL together with the FIRTH option. However, PROC LOGISTIC does not provide the corresponding p-values from penalized likelihood ratio tests (as our fl macro does). The FIRTH option cannot be combined with the STRATA statement; for conditional logistic regression you must therefore use the CFL macro described above.

References:

**Heinze, G., Ploner, M., Beyea, J.** (2013): "Confidence intervals after multiple imputation: combining profile likelihood information from logistic regressions", *Statistics in Medicine* 32, 5062-5076 (doi:10.1002/sim.5899).

**Heinze, G., Puhr, R.** (2010): "Bias-reduced and separation-proof conditional logistic regression with small or sparse data sets", *Statistics in Medicine* 29, 770-777 (doi:10.1002/sim.3794).

**Bull, S., Lewinger, J. P., Lee, S.** (2007): "Confidence intervals for multinomial logistic regression in sparse data", *Statistics in Medicine* 26, 903-918.

**Heinze, G.** (2006): "A comparative investigation of methods for logistic regression with separated or nearly separated data", *Statistics in Medicine* 25, 4216-4226.

**Heinze, G., Schemper, M.** (2006): "Letter Re: A permutation test for inference in logistic regression with small- and moderate-sized data sets", *Statistics in Medicine* 25, 719.

**Heinze, G., Ploner, M.** (2004): "A SAS macro, S-PLUS library and R package to perform logistic regression without convergence problems", Technical Report 2/2004, Section for Clinical Biometrics, CeMSIIS, Medical University of Vienna.

**Heinze, G., Ploner, M.** (2003): "Fixing the nonconvergence bug in logistic regression with SPLUS and SAS", *Computer Methods and Programs in Biomedicine* 71, 181-187.

**Heinze, G., Schemper, M.** (2002): "A solution to the problem of separation in logistic regression", *Statistics in Medicine* 21, 2409-2419.

**Bull, S., Mak, C., Greenwood, C. M. T.** (2002): "A modified score function estimator for multinomial logistic regression in small samples", *Computational Statistics and Data Analysis* 39, 57-74.

**Heinze, G.** (1999): "The application of Firth's procedure to Cox and logistic regression", Technical Report 10/1999, Section for Clinical Biometrics, CeMSIIS, Medical University of Vienna.

**Firth, D.** (1993): "Bias reduction of maximum likelihood estimates", *Biometrika* 80, 27-38.

Our programs are free of charge. Before downloading, we ask that you supply your name and e-mail address so that we can notify you when a new version is published.