Predicting Bankruptcy in Private Firms: Towards a Stepwise Regression Procedure

The aim of this paper is to investigate the relative importance of various bankruptcy predictors commonly used in the existing literature, as well as ad hoc variables, by applying a variable selection technique. Based on a sample of 4,796 Belgian private firms, a forward stepwise logistic regression procedure is employed. Our results confirm that lower levels of liquidity, solvency and profitability increase the probability of bankruptcy, while younger and smaller firms are more likely to go bankrupt. Furthermore, the proportion of accruals in total assets is positively related to the probability of failure and improves the accuracy of our model.

analysis on a single ratio (cash-flow/total debt) while Altman (1968) developed a multidimensional approach that combined five accounting ratios to calculate the well-known "Z-score" in order to predict corporate bankruptcy. Over the years, numerous modelling techniques have been designed to provide a more complete understanding of the bankruptcy phenomenon and to improve the accuracy of the models. In that sense, Du Jardin (2009) reports that more than 50 methods have been used to build bankruptcy prediction models (discriminant analysis, logistic regression, probit regression, rules induction, spline regression, neural networks, gambler's ruin model, hazard model, and so on).
One of the most frequently employed statistical techniques in bankruptcy forecasts is the logit model (first used in Ohlson, 1980). This model offers several advantages over alternative procedures. For instance, logit models do not require the potential bankruptcy predictors to be normally distributed, and they allow several bankruptcy indicators to be combined into a multivariate probability score that indicates the likelihood of corporate bankruptcy (Balcaen & Ooghe, 2006; Karlson, 2015). Given the reliability of logit models in assessing corporate bankruptcy (Acosta-González & Fernández-Rodríguez, 2014; Gupta, Gregoriou & Healy, 2015), this statistical technique is used in this study.
Beyond the regression technique, another point must be stressed in building an accurate bankruptcy prediction model: variable selection. Usually, a two-step procedure is chosen to select the most accurate variables. Reviewing 190 papers on bankruptcy prediction models, Du Jardin (2009) showed that, during the first step, 40% of studies only used variables that had been found to be good predictors in prior research. In other words, the variables are included in the model because they were significant in past empirical works. However, this procedure is quite reductionist, as bankruptcy predictors can be reliable in one context but not in another owing to their contingent nature (Balcaen & Ooghe, 2006; Du Jardin, 2015). To mitigate this concern, the first step of this research adopts a mixed method that includes variables whose predictive power has been demonstrated in prior studies as well as ad hoc variables, with the aim of improving the accuracy of the model.
In the second step, an automatic procedure is commonly employed to build the final set of bankruptcy predictors. Dash and Liu (1997) suggested distinguishing between complete methods and heuristic methods. While the former find an optimal solution provided the evaluation criterion is monotonic, the latter relax the monotonicity assumption on the selection criterion. As a result, heuristic procedures allow researchers to explore all possible combinations for the selection criteria and ultimately restrict attention to a smaller number of potential bankruptcy predictors (Acosta-González & Fernández-Rodríguez, 2014). Some of the most popular methods are the forward or backward stepwise procedures, which sequentially include or exclude variables based on various criteria such as t-ratio statistics or the probability of F (Miller, 2002; Lussier & Corman, 2015). Despite their relevance, few studies engage in such procedures. According to Du Jardin (2009), only 26% of prior works on bankruptcy prediction adopt stepwise selection procedures, which hampers the robustness of their results. As a consequence, this research uses forward stepwise logistic regressions to build the most accurate model of bankruptcy prediction.

Data Collection
In this study, we used a sample of 4,796 Belgian private firms, of which 826 were declared bankrupt between 2010 and 2014. The Bureau Van Dijk (hereafter, BVD) database was used to identify the firms that went bankrupt during this period. This database was also used to gather information about the financial and accounting ratios as well as firm characteristics.

Regression Procedure
Since the dependent variable is a dichotomous qualitative variable, a binary logit regression model was used, as is the case in many studies regarding the occurrence of bankruptcy filing (Ohlson, 1980; Premachandra, Bhabra & Sueyoshi, 2009). A logit model describes the relationship between a dichotomous dependent variable that takes the value 1 (bankrupt business) or 0 (healthy business) and k explanatory variables $x_1, x_2, \ldots, x_k$, which can be quantitative or qualitative. Because the dependent variable is binary, it follows a Bernoulli distribution such that $P_i = P(y_i = 1)$ is the probability of bankruptcy and $1 - P_i$ is the probability of non-failure. The estimated model requires the latent endogenous variable $y_i^*$ to be a linear combination of the exogenous variables:

$$y_i^* = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + \varepsilon_i$$

and, in compact notation,

$$y_i^* = x_i'\beta + \varepsilon_i$$

where $y_i = 1$ if $y_i^* > 0$ and $y_i = 0$ otherwise. The probability of non-default (a posteriori) of business i is given by:

$$P(y_i = 0) = P(y_i^* \le 0) = P(\varepsilon_i \le -x_i'\beta) = F(-x_i'\beta)$$

Similarly, the probability of failure (a posteriori) of business i is represented by:

$$P(y_i = 1) = 1 - F(-x_i'\beta)$$

The logit model assumes that the errors follow a logistic distribution, whose distribution function is:

$$F(z) = \frac{e^{z}}{1 + e^{z}} = \frac{1}{1 + e^{-z}}$$

Therefore, it is possible to calculate the probability of non-default of business i as follows:

$$P(y_i = 0) = \frac{1}{1 + e^{x_i'\beta}}$$

Similarly, the probability of default of business i is:

$$P(y_i = 1) = \frac{e^{x_i'\beta}}{1 + e^{x_i'\beta}}$$

The β coefficients were estimated by maximum likelihood, and the model was analysed with Stata software.
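As an illustration of the logit probabilities and of the likelihood that maximum likelihood estimation maximises, the following minimal Python sketch assumes the standard logistic link. The coefficient values, the two predictors and the function names are hypothetical and chosen only for illustration; they are not the estimates reported in this study, which were obtained with Stata.

```python
import math

def prob_default(x, beta):
    """P(y_i = 1): logistic function of the index beta_0 + sum_j beta_j * x_ij."""
    z = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    # Numerically stable evaluation of e^z / (1 + e^z)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def prob_non_default(x, beta):
    """P(y_i = 0) = 1 - P(y_i = 1)."""
    return 1.0 - prob_default(x, beta)

def log_likelihood(X, y, beta):
    """Bernoulli log-likelihood: sum of y*log(P) + (1 - y)*log(1 - P)."""
    total = 0.0
    for x, yi in zip(X, y):
        p = prob_default(x, beta)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

# Hypothetical coefficients: constant, current ratio, EBIT/TA (assumed values)
beta = [-1.0, -0.8, -2.0]
firm = [1.2, 0.05]  # hypothetical firm: current ratio 1.2, EBIT/TA 0.05

p = prob_default(firm, beta)
print("P(bankrupt) =", round(p, 4))
print("P(healthy)  =", round(prob_non_default(firm, beta), 4))
```

With negative coefficients on liquidity and profitability, as assumed here, raising either ratio lowers the predicted probability of bankruptcy.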

Variable Selection
Dependent variable. Bankruptcy is a dichotomous variable coded with the value of 1 if the company is bankrupt, 0 otherwise.

Independent variables.
Numerous studies have included a large number of accounting-based and market-based variables to improve the accuracy of bankruptcy forecast models (e.g. Altman, 1968; Beaver, McNichols & Rhie, 2005; Bharath & Shumway, 2008; Charitou, Dionysiou, Lambertides & Trigeorgis, 2013). However, most of them used sub-optimal selection criteria. According to Du Jardin (2009), only 35% of empirical research in the field employs an effective variable selection method. In this study, stepwise logistic regressions were used to identify and select the best combination of variables predicting bankruptcy (Shin & Lee, 2002; Shin, Lee & Kim, 2005).
Forward stepwise selection consists of finding the single best predictor variable and then adding further variables that meet specified criteria (Tsai, 2009). More specifically, this method sequentially includes variables based on F-statistic considerations until none improves the model (Miller, 2002). The final model contains only effective predictors with a significant coefficient. In building the model, 20 potential predictors were drawn from prior studies such as Altman (1968), Beaver (1966), Beaver et al. (2005), Ding, Song and Zen (2008), Ohlson (1980), Reznakova and Karas (2014), Tsai (2009), Tseng and Hu (2010) and Wang and Lee (2008). In addition, 11 ad hoc variables were included to take into account the contingent nature of bankruptcy predictors (Du Jardin, 2009, 2015). To select the most accurate and relevant predictors during the stepwise selection procedure, we used several criteria. Following Tsai (2009), we employed a .05 probability of F as the cutoff for entry into the model. To avoid collinearity problems among the variables selected by this procedure, we set a .10 probability of F as the limit for removal (Tsai, 2009). Thus, a variable was included in the model if its probability of F was less than .05 and removed from the model if its probability of F was more than .10. The probability of F reflects the contribution of a variable to the model and thus whether that contribution reaches significance.
Applying this forward stepwise technique to the accounting-based variables reported in Table 1, the following independent variables were included in our model. Firm age corresponds to the number of years the firm has been in business. Firm size is the natural logarithm of total assets. Accruals is measured as accruals divided by total assets. Current is the current ratio (current assets / current liabilities). EBIT/TA is a profitability ratio measured as earnings before interest and taxes divided by total assets. Solvency corresponds to a commonly employed measure of the solvency ratio: (Net income + depreciation) / (Short-term liabilities + Long-term liabilities). VA/TW captures labor productivity and is measured as the value added per worker.
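The entry and removal rules of a forward stepwise procedure can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the Stata routine used in this study: the probability-of-F criterion is replaced here by the p-value of a one-degree-of-freedom likelihood-ratio test, the data are synthetic, and the variable names (including `noise_a` and `noise_b`) and all helper functions are hypothetical.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def max_loglik(X, y, iters=25):
    """Fit a logit by Newton-Raphson; return the maximised log-likelihood."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        p = [sigmoid(sum(b * v for b, v in zip(beta, row))) for row in X]
        grad = [sum((yi - pi) * row[j] for row, yi, pi in zip(X, y, p))
                for j in range(k)]
        hess = [[sum(pi * (1 - pi) * row[a] * row[c] for row, pi in zip(X, p))
                 for c in range(k)] for a in range(k)]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-8:
            break
    z = [sum(b * v for b, v in zip(beta, row)) for row in X]
    # log-likelihood = sum of y*z - log(1 + e^z), written in a stable form
    return sum(yi * zi - max(zi, 0.0) - math.log1p(math.exp(-abs(zi)))
               for yi, zi in zip(y, z))

def lr_pvalue(ll_restricted, ll_full):
    """p-value of a 1-df likelihood-ratio test (chi-square tail via erfc)."""
    stat = max(0.0, 2.0 * (ll_full - ll_restricted))
    return math.erfc(math.sqrt(stat / 2.0))

def forward_stepwise(rows, y, p_enter=0.05, p_remove=0.10):
    """Return column indices selected with entry at .05 and removal at .10."""
    n_vars = len(rows[0])

    def ll(cols):
        # Every candidate model includes a constant term
        return max_loglik([[1.0] + [r[c] for c in cols] for r in rows], y)

    selected = []
    for _ in range(2 * n_vars):  # hard cap guards against cycling
        base = ll(selected)
        candidates = [(lr_pvalue(base, ll(selected + [c])), c)
                      for c in range(n_vars) if c not in selected]
        if not candidates:
            break
        p_best, best = min(candidates)
        if p_best >= p_enter:
            break  # no remaining variable meets the entry criterion
        selected.append(best)
        # Removal check: drop the weakest variable if it exceeds the limit
        full = ll(selected)
        p_worst, worst = max(
            (lr_pvalue(ll([s for s in selected if s != c]), full), c)
            for c in selected)
        if p_worst > p_remove:
            selected.remove(worst)
    return selected

# Synthetic data: only the first two variables drive the outcome (assumption)
random.seed(0)
names = ["current", "ebit_ta", "noise_a", "noise_b"]
rows, y = [], []
for _ in range(300):
    x = [random.gauss(0, 1) for _ in names]
    rows.append(x)
    y.append(1 if random.random() < sigmoid(-1.0 - 1.5 * x[0] - 1.5 * x[1]) else 0)

chosen = [names[c] for c in forward_stepwise(rows, y)]
print(chosen)
```

Setting the removal limit above the entry cutoff, as in the text, makes it unlikely that a variable that has just entered is immediately dropped again.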

Descriptive Statistics
Descriptive statistics and Pearson's correlation matrix are reported in Table 2. Bankrupt firms appear to be younger (p < .01) and smaller (p < .01). Additionally, the solvency ratio (p < .01) and the current ratio (p < .01) are negatively correlated with bankruptcy. Profitability also seems to be negatively linked to bankruptcy, since both EBIT/TA (p < .01) and VA/TW (p < .01) are negatively correlated with it. Accruals, in contrast, are positively correlated with bankruptcy (p < .01). Interestingly, larger firms display higher levels of solvency ratio (p < .01), current ratio (p < .01), EBIT/TA (p < .01) and VA/TW (p < .01). Older firms are only characterized by higher levels of solvency ratio (p < .01) and current ratio (p < .01). Furthermore, both firm age (p < .01) and firm size (p < .01) are negatively correlated with accruals.

Regression Analysis
The results of our regression are presented in Table 3. Both firm age (p < .01) and firm size (p < .01) are negatively related to bankruptcy. Regarding the relationship between profitability and bankruptcy, we observe that both EBIT/TA (p < .01) and VA/TW (p < .01) are negatively related to the probability of bankruptcy. A negative relation is reported between the degree of liquidity, assessed by the current ratio, and bankruptcy (p < .01), and a negative link also exists between the solvency ratio and the probability of bankruptcy (p < .01). Furthermore, a positive relationship is observed between accruals and bankruptcy (p < .01).
Note to Table 3: Log likelihood = -377.966. * p ≤ .10; ** p ≤ .05; *** p ≤ .01.
As a robustness check, we conducted an out-of-sample analysis. The sample was divided into two parts. The first part included 70% of the observed group (3,357 firms) and acted as the training group. The rest of the sample (1,439 firms) represented the control group, which was employed to test the model obtained on the training group. The out-of-sample analysis revealed high levels of accurate prediction, since the estimated parameters did not differ significantly from our initial regression. Furthermore, we ran additional robustness tests by changing the division percentage of both groups. Setting the observed group and the control group both at 50%, or even at 25% of the initial sample for the observed group and 75% for the control group, did not change our percentage of accurate prediction significantly.
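The sample division used in this robustness check can be sketched in Python as follows. The random shuffling, the seed, the 0.5 classification cutoff and the function names are assumptions made for illustration; the text does not specify how firms were assigned to each group or which cutoff was applied.

```python
import random

def split_sample(n_firms, train_share, seed=0):
    """Randomly split firm indices into a training group and a control group."""
    idx = list(range(n_firms))
    random.Random(seed).shuffle(idx)  # seeded shuffle for reproducibility
    n_train = int(train_share * n_firms)
    return idx[:n_train], idx[n_train:]

def accuracy(probabilities, outcomes, cutoff=0.5):
    """Share of firms whose predicted class (probability >= cutoff) is correct."""
    hits = sum((p >= cutoff) == bool(o) for p, o in zip(probabilities, outcomes))
    return hits / len(outcomes)

# Group sizes for the three divisions considered in the text
for share in (0.70, 0.50, 0.25):
    train, control = split_sample(4796, share)
    print(f"{share:.0%} training: {len(train)} firms, control: {len(control)} firms")
```

The model would be estimated on the training indices only, and `accuracy` would then be applied to the predicted probabilities of the control group.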

Discussion and Conclusion
Bankruptcy prediction is a critical issue that has been widely explored in the finance and accounting literature. Although numerous improvements have been noted in the construction of bankruptcy prediction models over the last decades (Balcaen & Ooghe, 2006; Du Jardin, 2015), there is still a long way to go regarding the accuracy of those models. Using a comprehensive Belgian bankruptcy database constructed from Belfirst, we adopt a forward stepwise logit regression procedure in order to extract the most accurate model from an initial set of 31 variables containing the most commonly employed bankruptcy predictors as well as ad hoc variables whose inclusion was likely to improve the fit of the model.
Our results suggest that bankruptcy can be predicted by a subset of 7 variables: firm age, firm size, EBIT/TA, VA/TW, Solvency, Current and Accruals. In addition to the variables related to solvency, liquidity and profitability, whose predictive power has already been shown (e.g. Altman, 1968; Beaver, 1966; Beaver et al., 2005; Ohlson, 1980; Tsai, 2009), it is interesting to note that the proportion of accruals in total assets improves the predictive power of the model. This observation is in line with prior research suggesting that higher levels of accruals are an indication of errors in the accruals-estimation process (Richardson, 2003), which is likely to increase the probability of bankruptcy (Al-Attar, Hussain & Zuo, 2008).
The contribution of this article to the academic literature is twofold. First, by employing a forward stepwise logit regression procedure to build our bankruptcy prediction model, we answer a recent call for more robustness in the variable selection process (Acosta-González & Fernández-Rodríguez, 2014). Moreover, we also included ad hoc variables to take into account the contingent nature of bankruptcy prediction, a procedure that has been overlooked in empirical research so far (Du Jardin, 2009, 2015). Taken together, these elements contribute to the literature as they are oriented towards more accuracy in the construction of bankruptcy prediction models. Second, our findings add an unusual bankruptcy predictor, namely the proportion of accruals in total assets. In so doing, we emphasize the importance of taking ad hoc variables into account in order to extend corporate bankruptcy theories (Tian et al., 2015).
This study also suffers from several limitations that must be acknowledged. First, we used a forward stepwise procedure. Even though such a procedure has been found reliable (Acosta-González & Fernández-Rodríguez, 2007), over-identification concerns can arise in the model, meaning that falsely significant variables could be included in the final model (Lovell, 1983). Therefore, future research should adopt a method that alleviates this problem, such as computational search procedures (Acosta-González & Fernández-Rodríguez, 2014) or data mining and machine learning techniques (Tsai, 2009). Furthermore, this study addressed bankruptcy prediction using a sample of Belgian private firms. Since bankruptcy prediction models must be contextualized to offer relevant and accurate forecasts (McSweeney, 2001), our study could be replicated in other institutional, legal or organizational contexts. Additionally, we used static bankruptcy predictors to build our model. As such predictors only capture an imbalance at a given moment, an alternative would be to use variation variables that consider year-on-year changes in the predictors in order to address bankruptcy in a dynamic way (Du Jardin, 2009).
In conclusion, we hope that, by developing a further understanding of bankruptcy prediction in private firms, this study will stimulate future research on this complex and important issue of finance and accounting studies.