Self-Selection Models in Corporate Finance
Ch. 2: Self-Selection Models in Corporate Finance
Abstract
Corporate finance decisions are not made at random, but are usually deliberate decisions by firms or their managers to self-select into their preferred choices. This chapter
reviews econometric models of self-selection. The review is organized into two parts.
The first part reviews econometric models of self-selection, focusing on the key assumptions of different models and the types of applications they may be best suited
for. Part two reviews empirical applications of selection models in the areas of corporate investment, financing, and financial intermediation. We find that self-selection is a
rapidly growing area in corporate finance, partly reflecting its recognition as a pervasive
feature of corporate finance decisions, but more importantly, the increasing recognition
of selection models as unique tools for understanding, modeling, and testing the role of
private information in corporate finance.
Keywords
selection, private information, switching regression, treatment effect, matching,
propensity score, Bayesian selection methods, panel data, event study, underwriting,
investment banking, diversification
K. Li and N.R. Prabhala
Introduction
Corporate finance concerns the financing and investment choices made by firms and
a broad swathe of decisions within these choices. For instance, firms pick their
target capital structure and, to achieve the target, must make several choices, including
the timing of security issues, the structural features of the securities issued, the investment
bank chosen to underwrite them, and so on. These choices are not usually random, but are
deliberate decisions by firms or their managers to self-select into their preferred choices.
This chapter reviews econometric models of self-selection. We review the approaches
used to model self-selection in corporate finance and the substantive findings obtained
by implementing selection methods.
Self-selection has a rather mixed history in corporate finance. The fact that there is
self-selection is probably not news; indeed, many papers at least implicitly acknowledge
its existence. However, the literature differs on whether to account for self-selection using formal econometric methods, and why one should do so. One view of self-selection
is that it is an errant nuisance, a “correction” that must be made to prevent other parameter estimates from being biased. Selection is itself of little economic interest under
this view. In other applications, self-selection is itself of central economic interest, because models of self-selection represent one way of incorporating and controlling for
unobservable private information that influences corporate finance decisions. Both perspectives find expression in the literature, although an increasing emphasis in recent
work reflects the positive view in which selection models are used to construct interesting tests for private information.
Our review is organized into two parts. Part I focuses on econometric models of
self-selection. We approach selection models from the viewpoint of a corporate finance researcher who is implementing selection models in an empirical application. We
formalize the notion of self-selection and survey several approaches to modeling it, including reduced form models, structural approaches, matching methods, fixed
effect estimators, and Bayesian methods. As the discussion clarifies, the notion of selection is not monolithic. No single model universally models or accounts for all forms
of selection, so there is no one “fix” for selection. Instead, there are a variety of approaches, each of which makes its own economic and statistical assumptions. We focus
on the substantive economic assumptions underlying the different approaches to illustrate what each can and cannot do and the type of applications a given approach may be
best suited for. We do not say much about estimation, asymptotic inference, or computational issues, but refer the reader to excellent texts and articles on these matters.
Part II of our review examines corporate finance applications of self-selection models. We cover a range of topics such as mergers and acquisitions, stock splits, equity
offerings, underwriting, analyst behavior, share repurchases, and venture capital. Our
objective is to illustrate the wide range of corporate finance settings in which selection arises and the different econometric approaches employed in modeling it. Here,
we focus on applications published in the last decade or so, and on articles in which
self-selection is a major component of the overall results.1
I. MODELING SELF-SELECTION
This portion of our review discusses econometric models of self-selection. Our intention is not to summarize the entire range of available models and their estimation.
Rather, we narrow our focus to models that have been applied in the corporate finance
literature, and within these models, we focus on the substantive assumptions made by
each specification. From the viewpoint of the empirical researcher, this is the first order
issue in deciding what approach suits a given application in corporate finance. We do
not touch upon asymptotic theory, estimation, and computation. These important issues
are well covered in excellent textbooks.2
We proceed as follows. Section 1 describes the statistical issue raised by self-selection, the wedge between the population distribution and the distribution within
a selected sample. Sections 2–6 develop the econometric models that can address selection. Section 2 discusses a baseline model for self-selection, the “Heckman” selection
model analyzed in Heckman (1979), a popular modeling choice in corporate finance.3
We discuss identification issues related to the model, which are important but not frequently discussed or justified explicitly in corporate finance applications. Because the
Heckman setting is so familiar in corporate finance, we use it to develop a key point
of this survey, the analogy between econometric models of self-selection and private
information models in corporate finance. Section 3 considers switching regressions and
structural self-selection models. While these models generalize the Heckman selection
model in some ways, they also bring additional baggage in terms of economic and statistical assumptions that we discuss.
We then turn to other approaches towards modeling selection. Section 4 discusses
matching models, which are methods du jour in the most recent applications. The
popularity of matching models can be attributed to their relative simplicity, easy interpretation of coefficients, and minimal structure with regard to specification. However,
these gains come at a price. Matching models make the strong economic assumption
that unobservable private information is irrelevant. This assumption may not be realistic
in many corporate finance applications. In contrast, selection models explicitly model
and incorporate private information. A second point we develop is that while matching
methods are often motivated by the fact that they yield easily interpretable treatment
effects, selection methods also estimate treatment effects with equal ease. Our review
of methodology closes by briefly touching upon fixed effect models in Section 5 and
Bayesian approaches to selection in Section 6.
1 Our attempt is to capture the overall flavor of self-selection models as they stand in corporate finance as of
the writing. We apologize to any authors whose work we have overlooked: no slight is intended.
2 The venerable reference, Maddala (1983), continues to be remarkably useful, though its notation is often
(annoyingly, for the empirical researcher) different from that used in other articles and software packages.
Newer material is covered in Wooldridge (2002) and Greene (2003).
3 Labeling any one model as “the” Heckman model surely does disservice to the many other contributions
of James Heckman. We choose this label following common usage in the literature.
1. Self-selection: The statistical issue
To set up the self-selection issue, assume that we wish to estimate parameters β of the
regression
$$Y_i = X_i\beta + \epsilon_i \tag{1}$$
for a population of firms. In equation (1), Yi is the dependent variable, which is typically
an outcome such as profitability or return. The variables explaining outcomes are Xi,
and the error term is ϵi. If ϵi satisfies the usual classical regression conditions, standard
OLS/GLS procedures consistently estimate β.
Now consider a sub-sample of firms that self-select choice E. For this sub-sample,
equation (1) can be written as
$$Y_i \mid E = X_i\beta + \epsilon_i \mid E. \tag{2}$$
The difference between equations (2) and (1) is at the heart of the self-selection problem. Equation (1) is a specification written for the population, but equation (2) is written
for a subset of firms, those that self-select choice E. If self-selecting firms are not a random subset of the population, the conditional error ϵi|E need not have mean zero or be uncorrelated with Xi, so the usual OLS/GLS estimators applied to equation (2)
are no longer consistent estimators of β.
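The inconsistency can be seen in a small simulation. The sketch below is illustrative only and its settings are hypothetical, not taken from the chapter: a population regression Y = 1 + 2X + ϵ, a selection rule in which firm i chooses E when Xi + ηi > 0, and a correlation of 0.8 between ϵ and η. OLS on the full population should recover a slope near 2, while OLS on the self-selected firms should not.

```python
import math
import random


def ols_line(xs, ys):
    """Return (intercept, slope) from a simple OLS regression of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


random.seed(0)
RHO = 0.8  # hypothetical correlation between outcome error eps and private information eta
pop_x, pop_y, sel_x, sel_y = [], [], [], []
for _ in range(20000):
    x = random.gauss(0.0, 1.0)
    eta = random.gauss(0.0, 1.0)  # private information behind the choice
    eps = RHO * eta + math.sqrt(1.0 - RHO ** 2) * random.gauss(0.0, 1.0)
    y = 1.0 + 2.0 * x + eps  # population regression, equation (1) with beta = (1, 2)
    pop_x.append(x)
    pop_y.append(y)
    if x + eta > 0:  # firm self-selects E; only these firms enter equation (2)
        sel_x.append(x)
        sel_y.append(y)

full_fit = ols_line(pop_x, pop_y)
selected_fit = ols_line(sel_x, sel_y)
print("population (intercept, slope):", full_fit)
print("self-selected (intercept, slope):", selected_fit)
```

In the selected sample the slope falls noticeably below 2 and the intercept rises above 1: firms with low Xi enter the sample only when their private information ηi, and hence ϵi, is high.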
Accounting for self-selection consists of two steps. Step 1 specifies a model for self-selection, using economic theory to model why some firms select E while others do
not. While this specification step is not often discussed extensively in applications, it
is critical because the assumptions involved ultimately dictate what econometric model
should be used in the empirical application. Step 2 ties the random variable(s) driving
self-selection to the outcome variable Y.
2. The baseline Heckman selection model
2.1. The econometric model
Early corporate finance applications of self-selection are based on the model analyzed
in Heckman (1979). We spend some time developing this model because most other
specifications used in the finance literature can be viewed as extensions of the Heckman
model in various directions.
In the conventional perspective of self-selection, the key issue is that we have a regression such as equation (1) that is well specified for a population but it must be estimated
using sub-samples of firms that self-select into choice E. To estimate population parameters from self-selected subsamples, we first specify a self-selection mechanism. This
usually takes the form of a probit model in which firm i chooses E if the net benefit
from doing so, a scalar Wi , is positive. Writing the selection variable Wi as a function
of explanatory variables Zi , which are assumed for now to be exogenous,4 we have the
system
$$C = E \equiv W_i = Z_i\gamma + \eta_i > 0, \tag{3}$$
$$C = NE \equiv W_i = Z_i\gamma + \eta_i \le 0, \tag{4}$$
$$Y_i = X_i\beta + \epsilon_i, \tag{5}$$
where Zi denotes publicly known information influencing a firm’s choice, γ is a vector of probit coefficients, and ηi is orthogonal to public variables Zi . In the standard
model, Yi is observed only when a firm picks one of E or NE (but not both), so
equation (5) would require the appropriate conditioning. Assuming that ηi and ϵi are
bivariate normal, the likelihood function and the maximum likelihood estimators for
equations (3)–(5) follow, although a simpler two-step procedure (Heckman, 1979, and
Greene, 1981) is commonly used for estimation. Virtually all applied work is based on
the bivariate normal structure discussed above.
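A minimal, dependency-free sketch of the two-step procedure follows. Everything in it is an assumption made for illustration rather than the chapter's own code: the function names are hypothetical, the data are simulated with one outcome regressor, one excluded instrument, γ = (0.2, 0.5, 1.0), β = (1, 2) and cov(ϵ, η) = 0.6, the probit is fitted by Newton-Raphson, and the second-step standard-error correction is omitted.

```python
import math
import random


def norm_pdf(v):
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def norm_cdf(v):
    return 0.5 * math.erfc(-v / math.sqrt(2.0))


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def probit(rows, d, iters=25):
    """Probit MLE by Newton-Raphson; d[i] = 1 if firm i chose E (equation (3))."""
    k = len(rows[0])
    g = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # minus the Hessian of the log-likelihood
        for r, di in zip(rows, d):
            xb = sum(a * b for a, b in zip(r, g))
            q = 2.0 * di - 1.0
            lam = q * norm_pdf(q * xb) / norm_cdf(q * xb)  # generalized residual
            w = lam * (lam + xb)
            for i in range(k):
                grad[i] += lam * r[i]
                for j in range(k):
                    info[i][j] += w * r[i] * r[j]
        step = solve(info, grad)
        g = [gi + si for gi, si in zip(g, step)]
    return g


def heckman_two_step(x, z, d, y):
    """Step 1: probit of d on z. Step 2: OLS of y on [x, lambda_E(z gamma)] for E firms."""
    gamma = probit(z, d)
    rows, ys = [], []
    for xrow, zrow, di, yi in zip(x, z, d, y):
        if di == 1:
            zg = sum(a * b for a, b in zip(zrow, gamma))
            rows.append(xrow + [norm_pdf(zg) / norm_cdf(zg)])  # inverse Mills ratio
            ys.append(yi)
    coef = ols(rows, ys)
    return gamma, coef[:-1], coef[-1]  # gamma_hat, beta_hat, pi_hat


# Hypothetical simulated data (not from the chapter).
random.seed(1)
RHO = 0.6
x, z, d, y = [], [], [], []
for _ in range(4000):
    xi, zi = random.gauss(0, 1), random.gauss(0, 1)  # zi is an excluded instrument
    eta = random.gauss(0, 1)
    eps = RHO * eta + math.sqrt(1 - RHO ** 2) * random.gauss(0, 1)
    x.append([1.0, xi])
    z.append([1.0, xi, zi])
    d.append(1 if 0.2 + 0.5 * xi + 1.0 * zi + eta > 0 else 0)
    y.append(1.0 + 2.0 * xi + eps)  # only used for firms with d == 1

gamma_hat, beta_hat, pi_hat = heckman_two_step(x, z, d, y)
print(gamma_hat, beta_hat, pi_hat)
```

With unit-variance η, the coefficient on the inverse Mills ratio estimates π = cov(ϵ, η), here 0.6. In practice one would use a tested probit routine and report corrected second-step standard errors.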
2.2. Self-selection and private information
In the above setup, self-selection is a nuisance problem. We model it because not doing so leads to inconsistent estimates of parameters β in regression (1). Self-selection
is, by itself, of little interest. However, this situation is frequently reversed in corporate finance, because tests for self-selection can be viewed as tests of private information theories. We develop this point in the context of the Heckman (1979) model
outlined above, but we emphasize that this private information interpretation is more
general.
We proceed as follows. Following a well-established tradition in econometrics, Section 2.2.1 presents selection as an omitted variable problem. Section 2.2.2 interprets
the omitted variable as a proxy for unobserved private information. Thus, including
the omitted self-selection variable controls for and tests for the significance of private
information in explaining ex-post outcomes of corporate finance choices.
2.2.1. Selection: An omitted variable problem
Suppose that firm i self-selects choice E. For firm i, we can condition equation (5) on this choice and write
$$Y_i \mid E = X_i\beta + (\epsilon_i \mid Z_i\gamma + \eta_i > 0) \tag{6}$$
$$Y_i \mid E = X_i\beta + \pi(\eta_i \mid Z_i\gamma + \eta_i > 0) + \nu_i. \tag{7}$$
Equation (7) follows from the standard result that ϵi|ηi = πηi + νi, where π is the
coefficient in the regression of ϵi on ηi, and νi is an orthogonal zero-mean error term.5
Given the orthogonality and zero-mean properties of νi, we can take expectations of
equation (7) and obtain the regression model
$$E(Y_i \mid E) = X_i\beta + \pi E(\eta_i \mid Z_i\gamma + \eta_i > 0) \tag{8}$$
and a similar model for firms that choose NE,
$$E(Y_i \mid NE) = X_i\beta + \pi E(\eta_i \mid Z_i\gamma + \eta_i \le 0). \tag{9}$$
Equations (8) and (9) can be compactly rewritten as
$$E(Y_i \mid C) = X_i\beta + \pi\lambda_C(Z_i\gamma), \tag{10}$$
where C ∈ {E, NE} and λC(.) is the conditional expectation of ηi given C. In particular,
if η and ϵ are bivariate normal, as is standard in the bulk of the applied work,
$$\lambda_E(Z_i\gamma) = \frac{\phi(Z_i\gamma)}{\Phi(Z_i\gamma)}, \qquad \lambda_{NE}(Z_i\gamma) = -\frac{\phi(Z_i\gamma)}{1 - \Phi(Z_i\gamma)},$$
where φ(.) and Φ(.) are the standard normal density and distribution functions (Greene, 2003, p. 759).
4 Thus, we preclude for now the possibility that Z includes the outcome variable Y. This restriction can be
relaxed at a cost, as we show in later sections.
A comparison of equations (1) and (10) clarifies why self-selection is an omitted
variable problem. In the population regression in equation (1), regressing outcome Y
on X consistently estimates β. However, in self-selected samples, consistent estimation requires that we include an additional variable, the inverse Mills ratio λC (.). Thus,
the process of correction for self-selection can be viewed as including an omitted variable.
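The two inverse Mills ratios can be checked numerically. The sketch below (function names are hypothetical) codes λE and λNE for standard normal η and compares them with simulated conditional means of η at an assumed index value Ziγ = 0.5. It also checks that the two ratios, weighted by the choice probabilities, average to zero, as they must because E(ηi) = 0.

```python
import math
import random


def norm_pdf(v):
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def norm_cdf(v):
    return 0.5 * math.erfc(-v / math.sqrt(2.0))


def lambda_e(zg):
    """E(eta | Z gamma + eta > 0) for standard normal eta."""
    return norm_pdf(zg) / norm_cdf(zg)


def lambda_ne(zg):
    """E(eta | Z gamma + eta <= 0) for standard normal eta."""
    return -norm_pdf(zg) / (1.0 - norm_cdf(zg))


zg = 0.5  # hypothetical value of the probit index Z_i gamma
random.seed(42)
draws = [random.gauss(0.0, 1.0) for _ in range(200000)]
chose_e = [e for e in draws if zg + e > 0]
chose_ne = [e for e in draws if zg + e <= 0]
mc_e = sum(chose_e) / len(chose_e)
mc_ne = sum(chose_ne) / len(chose_ne)

# The probability-weighted average of the two ratios equals E(eta) = 0.
weighted = norm_cdf(zg) * lambda_e(zg) + (1.0 - norm_cdf(zg)) * lambda_ne(zg)
print(lambda_e(zg), mc_e)
print(lambda_ne(zg), mc_ne)
print(weighted)
```

Because λE > 0 > λNE, choosing E always signals positive private information and choosing NE negative private information, which is what the interpretation in the next subsection relies on.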
2.2.2. The omitted variable as private information
In the probit model (3) and (4), ηi is the part of Wi not explained by public variables Zi .
Thus, ηi can be viewed as the private information driving the corporate financing decision being modeled. The ex-ante expectation of ηi should be zero, and it is so, given
that it has been defined as an error term in the probit model.
Ex post, after firm i selects C ∈ {E, NE}, the expectation of ηi can be updated. The
revised expectation, E(ηi |C), is thus an updated estimate of the firm’s private information. If we wished to test whether the private information in a firm’s choice affected
post-choice outcomes, we would regress outcome Y on E(ηi |C). But E(ηi |C) = λC (.)
is the inverse Mills ratio term that we add anyway to adjust for self-selection. Thus,
correcting for self-selection is equivalent to testing for private information. The omitted
variable used to correct for self-selection, λC(.), is an estimate of the private information
underlying a firm’s choice, and testing its significance is a test of whether private information possessed by a firm explains ex-post outcomes. In fact, the two-step procedure
most commonly used to estimate selection models follows this logic.6
5 Note that $\pi = \rho_{\eta\epsilon}\,\sigma_\epsilon$, where $\rho_{\eta\epsilon}$ is the correlation between ϵ and η, and $\sigma_\epsilon^2$ is the variance of ϵ (η has unit variance under the probit normalization).
Our main purpose in presenting the above discussion of the Heckman model is
to highlight the dual nature of self-selection “corrections”. One can think of them as
a way of accounting for a statistical problem. There is nothing wrong with this view.
Alternatively, one can interpret self-selection models as a way of testing private information hypotheses, which is perhaps an economically more useful perspective on
selection models in corporate finance. Selection models are clearly useful if private information is one’s primary focus, but even if it is not, the models are useful as a means of
controlling for potential private information effects.
2.3. Specification issues
Implementing selection models in practice poses two key specification issues: the need
for exclusion restrictions and the assumption that error terms are bivariate normal. While
seemingly innocuous, these issues, particularly the exclusion question, are often important in empirical applications, and deserve some comment.
2.3.1. Exclusion restrictions
In estimating equations (3)–(5), researchers must specify two sets of variables: those determining selection (Z) and those determining the outcomes (X). A knotty issue that comes up
frequently in practice is whether the two sets of variables can be identical. For instance, consider the self-selection event E in equations (3)
and (4) to be the decision to diversify by acquiring a target, and suppose that the outcome variable in
equation (5) is post-diversification productivity. Variables such as firm size or the relatedness of the acquirer and the target could explain the acquisition decision. The same
variables could also plausibly explain the ex-post productivity gains from the acquisition. Thus, these variables could be part of both Z and X in equations (3)–(5). Similar
arguments can be made for several other explanatory variables: they drive both firms’ decisions to self-select into diversification and the productivity gains after diversification. Do
we need exclusion restrictions so that there is at least one variable driving selection, an
instrument in Z, that is not part of X?
Strictly speaking, exclusion restrictions are not necessary in the Heckman selection
model because the model is identified by non-linearity. The selection-adjusted outcome
regression (10) regresses Y on X and λC (Z ′ γ ). If λC (.) were a linear function of Z,
we would clearly need some variables in Z that are not part of X, or the regressors
would be collinear.7 However, under the assumption of bivariate normal errors, λC(.)
is a non-linear function. As Heckman and Navarro-Lozano (2004) note, collinearity
between the outcome regression function (here, and usually, the linear function Xi β) and
the selection “control” function λC(.) is not a generic feature, so some degree of non-linearity will probably allow the specification to be estimated even when there are no
exclusion restrictions.
6 Step 1 estimates the probit model (3) and (4) to yield estimates of γ, say γ̂, and hence the private information function λC(Zi γ̂). In step 2, we substitute the estimated private information in lieu of its true value in
equation (10) and estimate it by OLS. Second-step standard errors must be corrected for the fact that γ is estimated rather than known, along the lines of Heckman (1979), Greene (1981), and Murphy and Topel (1985).
In practice, the identification issue is less clear-cut. The problem is that while λC (.)
is a non-linear function, it is roughly linear in parts of its domain. Hence, it is entirely
possible that λC (Z ′ γ ) has very little variation relative to the remaining variables in
equation (10), i.e., X. This issue can clearly arise when the selection variables Z and
outcome variables X are identical. However, it is important to realize that merely having
extra instruments in Z may not solve the problem. The quality of the instruments also
matters. Near-multicollinearity could still arise when the extra instruments in Z are
weak and have limited explanatory power.
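To see how close to linear λC(.) can be, the sketch below computes the correlation between the index z and λE(z) over a narrow and a wide range. The helper names and grid ranges are hypothetical choices for illustration. When the fitted index Ziγ̂ falls in a region like the narrow one, λE adds little variation beyond a linear function of Z, which is the near-collinearity problem described above.

```python
import math


def norm_pdf(v):
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def norm_cdf(v):
    return 0.5 * math.erfc(-v / math.sqrt(2.0))


def lambda_e(zg):
    """Inverse Mills ratio for choice E under standard normal errors."""
    return norm_pdf(zg) / norm_cdf(zg)


def corr(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def grid(lo, hi, n=301):
    """Evenly spaced points from lo to hi inclusive."""
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]


narrow = grid(-3.0, 0.0)
wide = grid(-3.0, 3.0)
r_narrow = corr(narrow, [lambda_e(v) for v in narrow])
r_wide = corr(wide, [lambda_e(v) for v in wide])
print(f"corr(z, lambda_E(z)) on [-3, 0]: {r_narrow:.4f}")
print(f"corr(z, lambda_E(z)) on [-3, 3]: {r_wide:.4f}")
```

A correlation near −1 is a warning sign. In an application, the analogous diagnostic would be computed on the actual sample, for example the R² from regressing λC(Zi γ̂) on Xi.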
What should one do if there appears to be a multicollinearity issue? It is tempting
to recommend that the researcher impose additional exclusion restrictions so that self-selection instruments Z contain unique variables not spanned by outcome variables X.
Matters are, of course, a little more delicate. Either the exclusions make sense, in which
case they should have been imposed in the first place, or the restrictions are
not reasonable, in which case it hardly makes sense to force them on a model merely
to make it estimable. In any event, as a practical matter, it seems reasonable to always
run diagnostics for multicollinearity while estimating selection models, whether or not one imposes exclusion restrictions.
The data often offer one degree of freedom that can be used to work around particularly thorny cases of collinearity. Recall that the identification issue arises mainly
because of the 1/0 nature of the selection variable Wi , which implies that we do not
observe the error term ηi and we must take its expectation, which is the inverse Mills
ratio term. However, if we could observe the magnitude of the selection variable Wi , we
would introduce an independent source of variation in the selection correction term and
in effect observe the private information ηi itself and use it in the regression in lieu of
the inverse Mills ratio. Exclusion restrictions are no longer needed. This is often more
than just a theoretical possibility. For instance, in analyzing a sample of firms that have
received a bank loan, we do observe the bank loan amount conditional on a loan being
made. Likewise, in analyzing equity offerings, we observe the fact that a firm made an
equity offering and also the size of the offer. In hedging, we do observe (an estimate
of) the extent of hedging given that a firm has hedged. This introduces an independent
source of variation into the private information variable, freeing one from the reliance
on non-linearity for identification.
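The sketch below illustrates this point under a strong simplifying assumption that is not in the text: the magnitude Wi = Ziγ + ηi is observed for every firm, and Z = X, so there is no exclusion restriction. Estimating γ by OLS of W on Z gives η̂i = Wi − Ziγ̂, which enters the outcome regression for E firms in place of the inverse Mills ratio. All parameter values and helper names are hypothetical.

```python
import math
import random


def solve(a, b):
    """Gauss-Jordan elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


random.seed(3)
RHO = 0.7
data = []
for _ in range(5000):
    x = random.gauss(0.0, 1.0)
    eta = random.gauss(0.0, 1.0)
    eps = RHO * eta + math.sqrt(1.0 - RHO ** 2) * random.gauss(0.0, 1.0)
    w = 0.3 + 0.8 * x + eta  # magnitude of the selection variable, assumed observed
    data.append((x, w, 1.0 + 2.0 * x + eps))

# Step 1: recover the private information from the observed magnitude of W.
g0, g1 = ols([[1.0, x] for x, _, _ in data], [w for _, w, _ in data])

# Step 2: outcome regression for E firms (w > 0), using eta_hat instead of the inverse Mills ratio.
rows, ys = [], []
for x, w, y in data:
    if w > 0:
        rows.append([1.0, x, w - (g0 + g1 * x)])
        ys.append(y)
b0, b1, pi_hat = ols(rows, ys)
print(b0, b1, pi_hat)
```

Because η̂i varies independently of Xi in the population, the outcome regression is identified even though the selection and outcome equations share the same regressors.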
7 In this case, having a variable in X that is not part of Z does not help matters. If λC(.) is indeed linear, it
is spanned by X whenever Z is spanned by X. Thus, we require extra variables that explain the decision to
self-select but are unrelated to the outcomes following self-selection.
2.3.2. Bivariate normality
A second specification issue is that the baseline Heckman model assumes that errors
are bivariate normal. In principle, deviations from normality could introduce biases in
selection models, and these could sometimes be serious (for an early illustration, see
Goldberger, 1983). If non-normality is an issue, one alternative is to assume some specific non-normal distribution (Lee, 1983, and Maddala, 1983, Chapter 9.3). The problem
is that theory rarely specifies a particular alternative distribution that is more appropriate. Thus, whether one uses a non-normal distribution, and which one, is often
driven by empirical features of the data. One approach that
works around the need to specify parametric structures is to use semi-parametric methods (e.g., Newey, Powell and Walker, 1990). Here, exclusion restrictions are necessary
for identification.
Finance applications of non-normal selection models remain scarce, so it is hard at
this point in time to say whether non-normality is a first order issue deserving particular
attention in finance. In one application to calls of convertible bonds (Scruggs, 2006),
the data were found to be non-normal, but non-normality made little difference to the
major conclusions.
3. Extensions
We review two extensions of the baseline Heckman self-selection model, switching regressions and structural selection models. The first allows some generality in specifying
regression coefficients across alternatives, while the second allows bidirectional simultaneity between self-selection and post-selection outcomes.8 Each of these extensions
generalizes the Heckman model by allowing some flexibility in specification. However,
it should be emphasized that the additional flexibility that is gained does not come for
free. The price is that the alternative approaches place additional demands on the data or
require more stringent economic assumptions. The plausibility and feasibility of these
extra requirements should be carefully considered before selecting any alternative to the
Heckman model for a given empirical application.
8 For instance, in modeling corporate diversification as a decision involving self-selection, structural models
would allow self-selection to determine post-diversification productivity changes, as in the standard setup, but
also allow anticipated productivity changes to impact the self-selection decision.
3.1. Switching regressions
As in Section 2, a probit model based on exogenous variables drives firms’ self-selection
decisions. The difference is that the outcome is now specified separately for firms selecting E and NE, so the single outcome regression (5) in system (3)–(5) is now replaced
by two regressions. The complete model is as follows:
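$$C = E \equiv Z_i\gamma + \eta_i > 0, \tag{11}$$
$$C = NE \equiv Z_i\gamma + \eta_i \le 0, \tag{12}$$
$$Y_{E,i} = X_{E,i}\beta_E + \epsilon_{E,i}, \tag{13}$$
$$Y_{NE,i} = X_{NE,i}\beta_{NE} + \epsilon_{NE,i}, \tag{14}$$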
where C ∈ {E, NE}. Along with separate outcome regression parameter vectors βE and
βNE , there are also two covariance coefficients for the impact of private information
on outcomes, the covariance between private information η and ϵE and that between
η and ϵNE . Two-step estimation is again straightforward, and is usually implemented
assuming that the errors {ηi , ϵE,i , ϵNE,i } are trivariate normal.9
Given the apparent flexibility in specifying two outcome regressions (13) and (14)
compared to the one outcome regression in the standard selection model, it is natural to
ask why we do not always use the switching regression specification. There are three
issues involved. First, theory should say whether there is a single population regression
whose LHS and RHS variables are observed conditional on selection, as in the Heckman
model, or whether we have two regimes in the population and the selection mechanism
dictates which of the two we observe. In some applications, the switching regression is
inappropriate: for instance, it is not consistent with the equilibrium modeled in Acharya
(1988). A second issue is that the switching regression model requires us to observe
outcomes of firms’ choices in both regimes. This may not always be feasible because we
only observe outcomes of firms self-selecting E but have little data on firms that choose
not to self-select. For instance, if we were analyzing stock market responses to merger
announcements as in Eckbo, Maksimovic and Williams (1990), implementing switching
models literally requires us to obtain a sample of would-be acquirers that had never
announced to the market and the market reaction on the dates that the markets realize
that there is no merger forthcoming. These data may not always be available (Prabhala,
1997).10 A final consideration is statistical power: imposing restrictions such as equality
of coefficients {β, π} for E and NE firms (when valid) leads to greater statistical power.
A key advantage of the switching regression framework is that we obtain more useful
estimates of (unobserved) counterfactual outcomes. Specifically, if firm i chooses E,
we observe outcome YE,i. However, we can ask what the outcome might have been had
firm i chosen NE, the unobserved counterfactual, and what the gain is from firm i’s
having made choice E rather than NE. The switching regression framework provides
an estimate. The net benefit from choosing E is the outcome of choosing E less the
counterfactual had it chosen NE, i.e., YE,i − YNE,i = YE,i − Xi βNE − πNE λNE(Zi γ).
The expected gain for firm i is Xi(βE − βNE) + (πE λE(.) − πNE λNE(.)).11 We return
to the counterfactuals issue when we deal with treatment effects and propensity scores.
We make this point at this stage only to emphasize that selection models do estimate
treatment effects. This fact is often not apparent when reading empirical applications,
especially those employing matching methods.
9 Write equations (13) and (14) in regression form as
$$E(Y_{C,i} \mid C) = X_{C,i}\beta_C + \pi_C\lambda_C(Z_i\gamma), \tag{15}$$
where C ∈ {E, NE}. The two-step estimator follows: the probit model (11) and (12) gives estimates of γ and
hence the inverse Mills ratio λC(.), which is fed into regression (15) to give parameters {βE, βNE, πE, πNE}.
As before, standard errors in the second-step regression require adjustment because λC(Zi γ̂) is a generated
regressor (Maddala, 1983, pp. 226–227).
10 Li and McNally (2004) and Scruggs (2006) describe how we can use Bayesian methods to update priors
on counterfactuals. More details on their approach are given in Section 6.
3.2. Simultaneity in self-selection models
The models considered thus far presume that the variables Z explaining the self-selection decision (equations (3) and (4) or equations (11) and (12)) are exogenous.
In particular, the bite of this assumption is to preclude the possibility that the decision to self-select choice C directly depends on the anticipated outcome from
choosing C. This assumption is sometimes too strong in corporate finance applications.
For instance, suppose we are interested in studying the diversification decision and that
the outcome variable to be studied is firm productivity. The preceding models would
assume that post-merger productivity does not influence the decision to diversify. If
firms’ decisions to diversify depend on their anticipated productivity changes, as theory
might suggest (Maksimovic and Phillips, 2002), the assumption that Z is exogenous is
incorrect.
The dependence of the decision to self-select on outcomes and the dependence of
outcomes on the self-selection decision is essentially a problem of simultaneity. Structural selection models can account for simultaneity. We review two modeling choices.
The Roy (1951) model places few demands on the data but it places tighter restrictions
on the mechanism by which self-selection occurs. More elaborate models are less stringent on the self-selection mechanism, but they demand more of the data, specifically
instruments, exactly as in conventional simultaneous equations models.
3.2.1. The Roy model
The Roy model hard-wires the dependence of self-selection on post-selection outcomes.
Firms self-select E or NE depending on which of the two alternatives yields the higher
outcome. Thus, {E, YE,i} is observed for firm i if YE,i > YNE,i. If, on the other hand,
YNE,i ≥ YE,i, we observe {NE, YNE,i}. The full model is
$$C = E \equiv Y_{E,i} > Y_{NE,i}, \tag{16}$$
$$C = NE \equiv Y_{E,i} \le Y_{NE,i}, \tag{17}$$
$$Y_{E,i} = X_i\beta_E + \epsilon_{E,i}, \tag{18}$$
$$Y_{NE,i} = X_i\beta_{NE} + \epsilon_{NE,i}, \tag{19}$$
where the ϵ’s are (as usual) assumed to be bivariate normal. The Roy model is no more
demanding of the data than standard selection models. Two-step estimation is again
fairly straightforward (Maddala, 1983, Chapter 9.1).
11 This expression stands in contrast to the basic Heckman setup. There, in equation (9), βE = βNE and
πE = πNE, so the expected difference is π(λE(.) − λNE(.)). There, the sign of the expected difference is
fixed: it must equal the sign of π because (λE(.) − λNE(.)) > 0. Additionally, the expected difference in
the setup of Section 2 does not vary with β or variables X that are not part of Z: here, it does. In short, the
counterfactual choices that could be made but were not are less constrained in the switching regression setup.
The Roy selection mechanism is rather tightly specified on two dimensions. One,
the model exogenously imposes the restriction that firms selecting E would experience
worse outcomes had they chosen NE and vice versa. This is often plausible. However,
it is unclear whether this should be a hypothesis that one wants to test or a restriction
that one imposes on the data. Two, the outcome differential is the only driver of the
self-selection decision in the Roy setup. Additional flexibility can be introduced by
loosening the model of self-selection. This extra flexibility is allowed in models to be
described next, but it comes at the price of requiring additional exclusion restrictions
for model identification.
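The sketch below simulates the Roy mechanism in equations (16)–(19) with hypothetical parameter values and, for simplicity, independent standard normal errors (the text allows them to be correlated). Each firm reveals only the outcome of the alternative it picks, and by construction every firm observed in E would have done worse in NE. Because selection operates on the outcomes themselves, the observed mean of YE among E firms exceeds the population mean of YE; with these parameters, the same holds for YNE among NE firms.

```python
import random

random.seed(7)
N = 20000
pop_ye, pop_yne, obs_e, obs_ne = [], [], [], []
for _ in range(N):
    x = random.gauss(0.0, 1.0)
    y_e = 1.0 + 1.0 * x + random.gauss(0.0, 1.0)  # equation (18), hypothetical beta_E = (1, 1)
    y_ne = 1.5 + 0.2 * x + random.gauss(0.0, 1.0)  # equation (19), hypothetical beta_NE = (1.5, 0.2)
    pop_ye.append(y_e)
    pop_yne.append(y_ne)
    if y_e > y_ne:  # equation (16): the firm picks E and we observe only y_e
        obs_e.append(y_e)
    else:  # equation (17): the firm picks NE and we observe only y_ne
        obs_ne.append(y_ne)


def mean(v):
    return sum(v) / len(v)


print("share choosing E:", len(obs_e) / N)
print("mean Y_E: population", mean(pop_ye), "observed E firms", mean(obs_e))
print("mean Y_NE: population", mean(pop_yne), "observed NE firms", mean(obs_ne))
```

The population means are computable here only because the simulation generates both potential outcomes; real data contain just one outcome per firm.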
3.2.2. Structural self-selection models
In the standard Heckman and switching regression models, the explanatory variables in
the selection equation are exogenous. At the other end of the spectrum is the Roy model
of Section 3.2.1, in which self-selection is driven solely by the endogenous variable. The
interim case is one where selection is driven by both exogenous and outcome variables.
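This specification is

$$C = E \equiv Z_i\gamma + \delta(Y_{E,i} - Y_{NE,i}) + \eta_i > 0, \tag{20}$$
$$C = NE \equiv Z_i\gamma + \delta(Y_{E,i} - Y_{NE,i}) + \eta_i \le 0, \tag{21}$$
$$Y_{E,i} = X_i\beta_E + \epsilon_{E,i}, \tag{22}$$
$$Y_{NE,i} = X_i\beta_{NE} + \epsilon_{NE,i}. \tag{23}$$
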
The structural model generalizes the switching regression model of Section 3.1 by incorporating the extra explanatory variable YE,i − YNE,i, the net outcome gain from
choosing E over NE, in the selection decision, and it generalizes the Roy model by permitting exogenous variables Zi to enter the selection equation. Estimation of the system
(20)–(23) follows the route one typically treads in simultaneous equations systems
estimation: reduced-form estimation followed by a step in which we replace the dependent variables appearing on the right-hand side by their fitted projections. A trivariate normal
assumption is standard (Maddala, 1983, pp. 223–239). While structural self-selection
models have been around for a while in the labor economics literature, particularly
those studying unionism and the returns to education (see Maddala, 1983, Chapter 8),
applications in finance are of very recent origin.
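Substituting the outcome equations (22) and (23) into the selection rule (20) shows why estimation starts from a reduced form. The sketch below assumes (η, ϵE, ϵNE) are trivariate normal, as stated above; σ* is notation introduced here for the standard deviation of the composite error.

$$C = E \iff Z_i\gamma + \delta X_i(\beta_E - \beta_{NE}) + \big[\eta_i + \delta(\epsilon_{E,i} - \epsilon_{NE,i})\big] > 0,$$
$$\mathrm{pr}(C = E \mid Z_i, X_i) = \Phi\!\left(\frac{Z_i\gamma + \delta X_i(\beta_E - \beta_{NE})}{\sigma^*}\right), \qquad \sigma^{*2} = \mathrm{Var}\big[\eta_i + \delta(\epsilon_{E,i} - \epsilon_{NE,i})\big].$$

Because the bracketed composite error is a linear combination of jointly normal errors, the reduced-form selection equation is a probit in Zi and Xi. Its fitted index supplies the selection-correction terms for estimating βE and βNE by switching regression; the structural probit then replaces YE,i − YNE,i with the fitted gain Xi(β̂E − β̂NE). That last step can separate δ from γ only if the fitted gain is not collinear with Zi, one reason the model needs exclusion restrictions.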
The structural self-selection model clearly generalizes every type of selection model
considered before. The question is why one should not always use it. Equivalently, what
additional restrictions or demands does it place on the data? Because it is a type of
switching regression model, it comes with all the baggage and informational requirements of the switching regression. As in simultaneous equations systems, instruments
must be specified to identify the model. In the diversification example at the beginning of this section, the identification requirement demands that we have at least one
instrument that determines whether a firm diversifies but does not determine the ex post productivity of the diversifying firm. The quality of one’s estimates depends on
the strength of the instrument, and all the caveats and discussion of Section 2.3.1 apply
here.
4. Matching models and self-selection
This section reviews matching models, primarily those based on propensity scores.
Matching models are becoming increasingly common in applied work. They represent
an attractive means of inference because they are simple to implement and yield readily interpretable estimates of “treatment effects.” However, matching models are based
on a fundamentally different set of assumptions relative to selection models. Matching
models assume that unobserved private information is irrelevant to outcomes. In contrast, unobserved private information is the essence of self-selection models. We discuss
these differences between selection and matching models as well as specific techniques
used to implement matching models.
To clarify the issues, consider the switching regression selection model of Section 3.1,
but relabel the choices to be consistent with the matching literature. Accordingly, firms
are treated and belong to group E or untreated and belong to group NE. This assignment
occurs according to the probit model
$$\mathrm{pr}(E \mid Z) = \mathrm{pr}(Z\gamma + \eta > 0), \tag{24}$$

where Z denotes explanatory variables, γ is a vector of parameters and we drop firm
subscript i for notational convenience. The probability of being untreated is 1 − pr(E|Z).
We write post-selection outcomes as YE for treated firms and YNE for untreated firms,
and, for convenience, write

$$Y_E = X_E\beta_E + \epsilon_E, \tag{25}$$
$$Y_{NE} = X_{NE}\beta_{NE} + \epsilon_{NE}, \tag{26}$$

where (again suppressing subscript i) ϵC denotes error terms, XC denotes explanatory
variables, βC denotes parameter vectors, and C ∈ {E, NE}. We emphasize that the basic
setup is identical to that of a switching regression of Section 3.1.
4.1. Treatment effects
Matching models focus on estimating treatment effects. A treatment effect, loosely
speaking, is the value added or the difference in outcome when a firm undergoes treatment E relative to not undergoing treatment, i.e., choosing NE. Selection models such as
the switching regression specification (equations (11)–(14)) estimate treatment effects.
Their approach is indirect. In selection models, we estimate a vector of parameters and
covariances in the selection equations and use these parameters to estimate treatment
effects. In contrast, matching models go directly to treatment effect estimation, setting
aside the step of estimating parameters of regression structures specified in selection
models.
The key question in the matching literature is whether treatment effects are significant. In the system of equations (24)–(26), this question can be posed statistically in a
number of ways.
• At the level of an individual firm i, the effectiveness of a treatment can be judged by
asking whether E(YE,i − YNE,i ) = 0.
• For the group of treated firms, the effectiveness of the treatment is assessed by
testing whether the treatment effect on the treated (TT) equals zero, i.e.,
whether E[(YE − YNE)|C = E] = 0.
• For the population as a whole, whether treated or not, we test the significance of the
average treatment effect (ATE) by examining whether E(YE − YNE) = 0.
The main issue in calculating any of the treatment effects discussed above, whether by
selection or matching models, is the fact that unchosen counterfactuals are not observed.
If a firm i chooses E, we observe the outcome of its choice, YE,i. However, because firm
i chose E, we do not explicitly observe the outcome YNE,i that would occur had the
firm instead made the counterfactual choice NE. Thus, the difference YE,i − YNE,i is
never directly observed for any particular firm i, so its expectation—whether at the
firm level, or across treated firms, or across treated and untreated firms—cannot be
calculated directly. Treatment effects can, however, be obtained via selection models
or by matching models, using different identifying assumptions. We discuss selection
methods first and then turn to matching methods.
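A small simulation makes the definitions above, and the missing-counterfactual problem, concrete. In the hypothetical design below (all parameter values are assumptions), treatment follows a probit like (24), both potential outcomes load on the same private information η, and treatment raises every firm's outcome by exactly 1. Because the simulation keeps YE and YNE for every firm, ATE and TT can be computed directly; the naive comparison uses only what real data would show.

```python
import math
import random


def simulate_potential_outcomes(n, gamma=0.5, rho=0.6, effect=1.0, seed=1):
    """Hypothetical design in the spirit of (24)-(26).

    z ~ N(0,1), eta ~ N(0,1); a firm is treated iff z*gamma + eta > 0.
    Y_E = effect + eps_E and Y_NE = eps_NE, where both errors load on the
    private information eta with weight rho, so corr(eps_C, eta) = rho.
    """
    rng = random.Random(seed)
    s = math.sqrt(1.0 - rho ** 2)
    data = []
    for _ in range(n):
        z, eta = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        treated = z * gamma + eta > 0
        y_e = effect + rho * eta + s * rng.gauss(0.0, 1.0)
        y_ne = rho * eta + s * rng.gauss(0.0, 1.0)
        data.append((treated, y_e, y_ne))
    return data


def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)


data = simulate_potential_outcomes(50_000)
ate = mean(ye - yne for _, ye, yne in data)      # E(Y_E - Y_NE)
tt = mean(ye - yne for t, ye, yne in data if t)  # E[(Y_E - Y_NE) | C = E]
# What an econometrician actually sees: Y_E for treated firms, Y_NE for untreated ones.
naive = mean(ye for t, ye, _ in data if t) - mean(yne for t, _, yne in data if not t)
```

Here ATE and TT should both be close to 1, while the naive difference in observed means should be roughly 1.86: treated firms tend to have high η, and η also raises outcomes. Only a simulation can compute ate and tt this way; in real data the counterfactual column is missing for every firm.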
4.2. Treatment effects from selection models
Self-selection models obtain treatment effects by first estimating parameters of the system of equations (24)–(26). Given the parameter estimates, it is straightforward to
estimate treatment effects described in Section 4.1, as illustrated, e.g., in Section 3.1
for the switching regression model. The key identifying assumption in selection models is the specification of the variables entering selection and outcome equations, i.e.,
variables X and Z in equations (24)–(26).
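As an illustration of how parameter estimates map into treatment effects, the sketch below assumes the probit switching regression of Section 3.1 with var(η) normalized to 1 and writes πE and πNE for Cov(ϵE, η) and Cov(ϵNE, η), echoing footnote 11. These conventions and the function names are assumptions, not the chapter's own code. Under them, the firm-level ATE is Xi(βE − βNE) and TTi = Xi(βE − βNE) + (πE − πNE)φ(Ziγ)/Φ(Ziγ).

```python
import math


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def switching_effects(x_gain, z_gamma, pi_e, pi_ne):
    """Firm-level ATE and TT implied by a probit switching regression.

    x_gain  = X_i (beta_E - beta_NE), the difference in fitted outcomes;
    z_gamma = Z_i gamma, the probit index (eta has unit variance);
    pi_e, pi_ne = Cov(eps_E, eta), Cov(eps_NE, eta) (assumed notation).
    """
    lam_e = norm_pdf(z_gamma) / norm_cdf(z_gamma)  # E[eta | eta > -Z_i gamma]
    return {
        "ATE": x_gain,                          # errors have mean zero unconditionally
        "TT": x_gain + (pi_e - pi_ne) * lam_e,  # conditions on having chosen E
    }


effects = switching_effects(x_gain=0.1, z_gamma=0.0, pi_e=0.5, pi_ne=0.2)
```

With these illustrative inputs, TT exceeds the ATE by 0.3 × 0.798 ≈ 0.24 because the private-information term enters only when we condition on the firm's choice; if πE = πNE the two effects coincide.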
Two points deserve emphasis. The first is that the entire range of selection models discussed in Section 2 through Section 3.2 can be used to estimate treatment effects. This
point deserves special mention because in received corporate finance applications, the
tendency has been to report estimates of matching models and, as a robustness check, an
accompanying estimate of a selection model. With virtually no exception, the selection
model chosen for the robustness exercise is the Heckman model of Section 2. However,
there is no a priori reason to impose this restriction—any other model, including the
switching regression models or the structural models, can be used, and perhaps ought to
at least get a hearing. The second point worth mentioning is that unlike matching models, selection models always explicitly test for and incorporate the effect of unobservable
private information, through the inverse Mills ratio term, or more generally, through
control functions that model private information (Heckman and Navarro-Lozano, 2004).
4.3. Treatment effects from matching models
In contrast to selection models, matching models begin by assuming that private information is irrelevant to outcomes.12 Roughly speaking, this is equivalent to imposing
zero correlation between private information η and outcome YE in equations (24)–(26).
Is irrelevance of private information a reasonable assumption? It clearly depends on
the specific application. The assumption is quite plausible if the decision to obtain treatment E is made through an exogenous randomization process. It becomes less plausible
when the decision to choose E is an endogenous choice of the decision-maker, which
probably describes many corporate finance applications, except perhaps for exogenous
shocks such as regulatory changes.13 If private information can be ignored, matching
methods offer two routes to estimate treatment effects: dimension-by-dimension matching and propensity score matching.
4.3.1. Dimension-by-dimension matching
If private information can be ignored, the differences between firms undergoing treatment E
and untreated NE firms depend only on observable attributes X. Thus, the treatment effect for any firm i equals the difference between its outcome and the outcome for a firm
j(i) that matches it on all observable dimensions. Formally, the treatment effect equals
Yi,E − Yj(i),NE, where j(i) is such that Xi,k = Xj(i),k for all K relevant dimensions,
i.e., for k = 1, 2, . . . , K. Other measures such as TT and ATE defined in Section 4.1
follow immediately.14
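A minimal sketch of dimension-by-dimension matching with exact matches, assuming firms are stored as dictionaries. The field names (treated, outcome, sic2, size_decile), the function name, and the toy numbers are hypothetical. When several untreated firms share a treated firm's attributes, their mean outcome serves as the counterfactual; treated firms without an exact match are dropped.

```python
from statistics import mean


def exact_match_tt(firms, keys):
    """TT via exact matching on every attribute in `keys`.

    Each treated firm is compared with the mean outcome of untreated firms
    sharing all of its attributes; unmatched treated firms are dropped.
    Returns (estimate, number of matched treated firms).
    """
    controls = {}
    for f in firms:
        if not f["treated"]:
            cell = tuple(f[k] for k in keys)
            controls.setdefault(cell, []).append(f["outcome"])
    diffs = []
    for f in firms:
        if f["treated"]:
            matched = controls.get(tuple(f[k] for k in keys))
            if matched:
                diffs.append(f["outcome"] - mean(matched))
    return (mean(diffs) if diffs else float("nan")), len(diffs)


# Toy, made-up firms matched on 2-digit SIC and size decile (hypothetical fields).
firms = [
    {"treated": True, "sic2": 28, "size_decile": 5, "outcome": 0.12},
    {"treated": True, "sic2": 35, "size_decile": 9, "outcome": 0.05},
    {"treated": True, "sic2": 73, "size_decile": 2, "outcome": 0.30},  # no exact match
    {"treated": False, "sic2": 28, "size_decile": 5, "outcome": 0.08},
    {"treated": False, "sic2": 28, "size_decile": 5, "outcome": 0.04},
    {"treated": False, "sic2": 35, "size_decile": 9, "outcome": 0.07},
]
tt, n_matched = exact_match_tt(firms, keys=("sic2", "size_decile"))
```

The first two treated firms find exact matches (differences 0.06 and −0.02) and the third is dropped, so the estimate is 0.02 from two firms. Dropping unmatched firms changes the population to which the estimate applies.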
Dimension-by-dimension matching methods have a long history of usage in empirical
corporate finance, as explained in Chapter 1 (Kothari and Warner, 2007) in this book.
12 See, e.g., Wooldridge (2002) for formal expressions of this condition.
13 Of course, even here, if unobservable information guides company responses to such shocks, irrelevance
of unobservables is still not a good assumption.
14 One could legitimately ask why we need to match dimension by dimension when we have a regression
structure such as (25) and (26). The reason is that dimension-by-dimension matching is consistent when
the data come from these regressions but is also consistent under other data-generating
mechanisms. If one is willing to specify equations (25) and (26), the treatment effect is immediately obtained as the difference between the fitted values in the two equations.
Virtually all studies routinely match on size, industry, the book-to-market ratio, and so
on. The “treatment effect” is the matched-pair difference in outcomes. There is nothing
inherently wrong with these methods. They involve the same economic assumptions
as other matching methods based on propensity scores used in recent applications. In
fact, dimension-by-dimension matching imposes less structure and probably represents
a reasonable first line of attack in typical corporate finance applications.
Matching on all dimensions and estimating the matched-pair differences in outcomes
poses two difficulties. One is that characteristics are not always exactly matched in corporate finance applications. For instance, we often match firm size or book-to-market
ratios with 30% calipers. When matches are inexact, substantial biases can accumulate
across the different characteristics being matched. A second issue that proponents
of matching methods frequently mention is dimensionality. When the number of dimensions to be matched goes up and the matching calipers become fine (e.g., size and
prior performance matched within 5% rather than 30%, and 4-digit rather than 2-digit
SIC matches), finding matches becomes difficult or even infeasible. When dimension-by-dimension matching is not feasible, a convenient alternative is to use methods based on
propensity scores. We turn to these next.
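The caliper problem can be seen in a few lines. The sketch below matches each treated firm to the untreated firm in the same industry that is closest in size, provided the size gap relative to the treated firm is within the caliper; this relative-size convention, the tuple layout, and the toy numbers are assumptions for illustration.

```python
def caliper_match_tt(treated, controls, caliper=0.30):
    """TT from one-to-one matching within industry and a relative size caliper.

    treated, controls: lists of (industry, size, outcome) tuples (hypothetical layout).
    A control qualifies if it shares the industry and |size_c / size_t - 1| <= caliper;
    the qualifying control closest in size is used, with replacement.
    Returns (estimate, number of matched treated firms).
    """
    diffs = []
    for ind, size, y in treated:
        candidates = [
            (abs(c_size / size - 1.0), c_y)
            for c_ind, c_size, c_y in controls
            if c_ind == ind and abs(c_size / size - 1.0) <= caliper
        ]
        if candidates:
            _, y_match = min(candidates)  # smallest relative size gap
            diffs.append(y - y_match)
    estimate = sum(diffs) / len(diffs) if diffs else float("nan")
    return estimate, len(diffs)


# Toy, made-up firms: (2-digit SIC, size, outcome).
treated = [("28", 100.0, 0.10), ("35", 50.0, 0.20)]
controls = [("28", 125.0, 0.04), ("28", 90.0, 0.06), ("35", 80.0, 0.01)]
wide = caliper_match_tt(treated, controls, caliper=0.30)
tight = caliper_match_tt(treated, controls, caliper=0.05)
```

With the 30% caliper only the SIC 28 firm is matched (to the control 10% smaller), giving 0.04; the SIC 35 firm is dropped because its only industry peer is 60% larger. Tightening the caliper to 5% leaves no matches at all, a miniature version of the dimensionality problem described above.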
4.3.2. Propensity score (PS) matching
Propensity score (PS) matching methods handle the problems caused by dimension-by-dimension matching by reducing them to a problem of matching on a single dimension: the
probability of undergoing treatment E. The probability of treatment is called the propensity score. Given a probability model such as equation (24), the treatment effect equals
the outcome for the treated firm minus the outcome for an untreated firm with equal
treatment probability. The simplicity of the estimator and its straightforward interpretation make the propensity score estimator attractive.
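The simplest version of this estimator pairs each treated firm with the untreated firm whose estimated propensity score is closest. The sketch below assumes the scores are already estimated; one-to-one matching with replacement, the function name, and the toy numbers are illustrative choices rather than a prescribed implementation.

```python
def ps_nn_match_tt(scores, treated, outcomes):
    """TT by one-to-one nearest-neighbour matching on the propensity score.

    Matching is with replacement (a control can serve several treated firms);
    ties go to the first control in input order.
    """
    controls = [(s, y) for s, t, y in zip(scores, treated, outcomes) if not t]
    if not controls:
        raise ValueError("no untreated firms to match against")
    diffs = []
    for s, t, y in zip(scores, treated, outcomes):
        if t:
            _, y_match = min(controls, key=lambda c: abs(c[0] - s))
            diffs.append(y - y_match)
    return sum(diffs) / len(diffs)


# Toy, made-up scores and outcomes for five firms.
scores = [0.80, 0.30, 0.75, 0.35, 0.50]
treated = [True, True, False, False, False]
outcomes = [1.0, 0.4, 0.7, 0.5, 0.9]
tt = ps_nn_match_tt(scores, treated, outcomes)
```

The treated firm with score 0.80 is matched to the control at 0.75 (difference 0.3) and the one at 0.30 to the control at 0.35 (difference −0.1), so the estimate is 0.1.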
It is useful to review the key assumptions underlying the propensity score method.
Following Rosenbaum and Rubin (1983), suppose that the probability model in equation (24) satisfies
• PS1: 0 < pr(E|Z) < 1.
• PS2: Given Z, outcomes YE and YNE do not depend on whether the firm is in group E
or NE.
Assumption (PS1) requires that at each level of the explanatory variable Z, some
firms should pick E and others pick NE. This constraint is frequently imposed in empirical applications by requiring that treated and untreated firms have common support.
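One simple way to impose common support in a sample, assumed here as one convention among several, is to keep only firms whose estimated score lies between the larger of the two group minima and the smaller of the two group maxima. The function name and toy scores are hypothetical.

```python
def common_support(scores, treated):
    """Indices of firms whose propensity score lies in the region of common support.

    Keeps firms whose score is between the larger of the two group minima
    and the smaller of the two group maxima (one simple convention).
    """
    s_e = [s for s, t in zip(scores, treated) if t]
    s_ne = [s for s, t in zip(scores, treated) if not t]
    lo = max(min(s_e), min(s_ne))
    hi = min(max(s_e), max(s_ne))
    return [i for i, s in enumerate(scores) if lo <= s <= hi]


# Toy scores: firm 0 (treated) and firm 3 (untreated) have no comparable counterparts.
scores = [0.95, 0.60, 0.40, 0.05, 0.50, 0.70]
treated = [True, True, True, False, False, False]
kept = common_support(scores, treated)
```

Here the support runs from 0.40 to 0.70, so firms 1, 2, 4, and 5 are retained.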
Assumption (PS2) is the strong ignorability or conditional independence condition.
It requires that unobserved private information should not explain outcome differentials
between firms choosing E and those choosing NE. This is a crucial assumption. As
Heckman and Navarro-Lozano (2004) show, even fairly mild departures can trigger
substantial biases in treatment effect estimates.
Given assumptions (PS1) and (PS2), Rosenbaum and Rubin (1983) show that the
treatment effect is the difference between outcomes of treated and untreated firms having identical treatment probabilities (or propensity scores). Averaging across different
treatment probabilities gives the average treatment effect across the population.15
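In symbols, and writing p(Z) = pr(E|Z) (notation introduced here), the result described above can be restated compactly under PS1 and PS2:

$$E[Y_E - Y_{NE} \mid p(Z) = p] = E[Y_E \mid C = E,\, p(Z) = p] - E[Y_{NE} \mid C = NE,\, p(Z) = p],$$
$$\mathrm{ATE} = E_{p}\big\{E[Y_E - Y_{NE} \mid p(Z) = p]\big\}, \qquad \mathrm{TT} = E_{p \mid C = E}\big\{E[Y_E - Y_{NE} \mid p(Z) = p]\big\}.$$

The ATE averages over the distribution of propensity scores across all firms and TT over their distribution among treated firms; PS1 ensures that both conditional expectations on the right-hand side of the first line are defined at every p.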
4.3.3. Implementation of PS methods
In light of Rosenbaum and Rubin (1983), the treatment effect is the difference between
outcomes of treated and untreated firms with identical propensity scores. One issue
in implementing matching is that we need to know propensity scores, i.e., the treatment probability pr(E|Z). This quantity is not known ex ante, so it must be estimated
from the data, using, for instance, probit, logit, or other less parametrically specified
approaches. The corresponding treatment effects are also estimated with error and the
literature develops standard error estimates (e.g., Heckman, Ichimura and Todd, 1998;
Dehejia and Wahba, 1999; Wooldridge, 2002, Chapter 18).
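As a sketch of the estimation step, the code below fits a logit (one of the options mentioned above) by Newton-Raphson in plain Python with a single covariate. The function names, the simulated data, and the true coefficients a = −0.5 and b = 1.0 are hypothetical; applied work would normally rely on a statistics package.

```python
import math
import random


def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))


def fit_logit(z, d, iters=25):
    """Maximum-likelihood logit pr(E|z) = logistic(a + b*z) by Newton-Raphson.

    z: covariate values; d: 0/1 treatment indicators. Returns (a, b).
    Assumes one covariate and no perfect separation (otherwise the MLE diverges).
    """
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for zi, di in zip(z, d):
            p = logistic(a + b * zi)
            w = p * (1.0 - p)
            g0 += di - p         # score, intercept
            g1 += (di - p) * zi  # score, slope
            h00 += w             # information matrix entries
            h01 += w * zi
            h11 += w * zi * zi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det  # Newton step: information^-1 times score
        b += (h00 * g1 - h01 * g0) / det
    return a, b


# Hypothetical data drawn from a known logit with a = -0.5, b = 1.0.
rng = random.Random(7)
z = [rng.gauss(0.0, 1.0) for _ in range(5000)]
d = [1 if rng.random() < logistic(-0.5 + zi) else 0 for zi in z]
a_hat, b_hat = fit_logit(z, d)
scores = [logistic(a_hat + b_hat * zi) for zi in z]  # estimated propensity scores
```

At the maximum-likelihood estimate of a model with an intercept, the estimated scores sum to the number of treated firms, which gives a quick convergence check; a_hat and b_hat should land near the values used to simulate the data.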
A second implementation issue immediately follows. What variables do we include
in estimating the probability of treatment? While self-selection models differentiate between variables determining outcomes and variables determining probability of being
treated (X and Z, respectively, in equations (24)–(26)), matching models make no such
distinction. Roughly speaking, either a variable determines the treatment probability, in
which case it should be used in estimating treatment probability, or it does not, in which
case it should be randomly distributed across treated and untreated firms and is averaged out in computing treatment effects. Thus, for matching models, the prescription is
to use all relevant variables in estimating propensity scores.16
A third issue is estimation error. In principle, matching demands that treated firms
be compared to untreated firms with the same treatment probability. However, treatment probabilities must be estimated, so exact matching based on the true treatment
probability is usually infeasible. A popular approach, following Dehejia and Wahba
(1999), divides the data into several probability bins. The treatment effect is estimated
as the average difference between the outcomes of E and NE firms within each bin.
Heckman, Ichimura and Todd (1998) suggest taking a weighted average of the outcomes of untreated
firms, with weights that decline as the distance between the treated and untreated firms
increases. For statistical reasons, Abadie and Imbens (2004) suggest that the
counterfactual outcomes should be estimated not as the actual outcomes for a matched
untreated firm, but as the fitted value in a regression of outcomes on explanatory variables.17
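Two of these estimators can be sketched briefly: stratification into equal-width score bins, in the spirit of Dehejia and Wahba (1999), and a Gaussian-kernel-weighted average of untreated outcomes, a simple stand-in for the weighting idea attributed to Heckman, Ichimura and Todd (1998). The bin count, kernel, bandwidth, and toy data are illustrative assumptions rather than the authors' choices, and the Abadie and Imbens (2004) regression adjustment is not shown.

```python
import math


def stratified_tt(scores, treated, outcomes, n_bins=5):
    """Within-bin mean(E) - mean(NE), averaged with weights = treated firms per bin.

    Equal-width bins on [0, 1] are a simplifying assumption; bins lacking
    either treated or untreated firms are skipped.
    """
    bins = {}
    for s, t, y in zip(scores, treated, outcomes):
        k = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the top bin
        bins.setdefault(k, ([], []))[0 if t else 1].append(y)
    total, n_used = 0.0, 0
    for ys_e, ys_ne in bins.values():
        if ys_e and ys_ne:
            diff = sum(ys_e) / len(ys_e) - sum(ys_ne) / len(ys_ne)
            total += diff * len(ys_e)
            n_used += len(ys_e)
    return total / n_used


def kernel_tt(scores, treated, outcomes, bandwidth=0.05):
    """TT using a Gaussian-kernel-weighted average of untreated outcomes as counterfactual."""
    controls = [(s, y) for s, t, y in zip(scores, treated, outcomes) if not t]
    diffs = []
    for s, t, y in zip(scores, treated, outcomes):
        if not t:
            continue
        weights = [math.exp(-0.5 * ((sc - s) / bandwidth) ** 2) for sc, _ in controls]
        w_sum = sum(weights)
        if w_sum > 0.0:  # skip a treated firm if every weight underflows to zero
            y_hat = sum(w * yc for w, (_, yc) in zip(weights, controls)) / w_sum
            diffs.append(y - y_hat)
    return sum(diffs) / len(diffs)


# Toy data: outcomes rise with the score, and treatment adds exactly 1.
scores = [i / 100 for i in range(1, 100)]
treated = [i % 2 == 0 for i in range(1, 100)]
outcomes = [s + (1.0 if t else 0.0) for s, t in zip(scores, treated)]
tt_strata = stratified_tt(scores, treated, outcomes)
tt_kernel = kernel_tt(scores, treated, outcomes)
```

In this toy design the stratified estimate comes out slightly below 1 (about 0.99) because, within most bins, treated firms have marginally lower scores than untreated ones, the kind of residual imbalance that finer bins reduce. The kernel estimate is essentially 1 because the design is symmetric around a score of 0.5, so boundary biases at the two ends offset.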
15 This discussion points to another distinction between PS and selection methods. The finest level to which
PS methods can go is the propensity score or the probability of treatment. Because many firms can have the
same propensity score, PS methods do not estimate treatment effects at the level of the individual firm, while
selection methods can do so.
16 This statement is not, of course, a recommendation to engage in data snooping. For instance, in fitting
models to estimate propensity scores, using quality of fit as a model selection criterion leads to difficulties, as
pointed out by Heckman and Navarro-Lozano (2004).
17 The statistical properties of different estimators have been extensively discussed in the econometrics literature, most recently in a review issue devoted to the topic (Symposium on the Econometrics of Matching,
Review of Economics and Statistics 86 (1), 2004).