This illustrates an important feature of propensity score methods: creating subsamples from the non-experimental comparison group is neither necessary nor desirable, because subsamples defined on a single pre-intervention characteristic may discard comparison units that are nonetheless good overall comparisons for treated units. The propensity score sorts out which comparison units are most relevant by considering all of the pre-intervention characteristics jointly, not one characteristic at a time.
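The contrast between a single-characteristic screen and the propensity score can be sketched as follows. Everything here is an assumption for illustration: the two standardized covariates `x1` and `x2`, the treatment-assignment rule, the `x1 > 0` screen, and the helper `fit_propensity`. The score is estimated with a logit fitted by Newton-Raphson, which is one common choice rather than a reproduction of the specification used in this paper.

```python
import numpy as np


def fit_propensity(X, d, n_iter=25):
    """Hypothetical helper: fit a logit of treatment status d on covariates X
    by Newton-Raphson and return the fitted scores P(d = 1 | X)."""
    Z = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])  # add intercept
    d = np.asarray(d, dtype=float)
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))
        grad = Z.T @ (d - p)                          # gradient of the log-likelihood
        info = Z.T @ (Z * (p * (1.0 - p))[:, None])   # information matrix (negative Hessian)
        beta += np.linalg.solve(info, grad)           # Newton step
    return 1.0 / (1.0 + np.exp(-Z @ beta))


# Hypothetical data: two standardized pre-intervention covariates.
rng = np.random.default_rng(1)
n = 1000
x1, x2 = rng.normal(size=n), rng.normal(size=n)
d = (rng.random(n) < 1.0 / (1.0 + np.exp(-(-1.0 + x1 - x2)))).astype(float)

scores = fit_propensity(np.column_stack([x1, x2]), d)

# A single-characteristic screen keeps only comparison units with x1 > 0 ...
dropped = (d == 0) & (x1 <= 0)
# ... but some dropped units resemble treated units once x2 is also taken into account.
treated_median = np.median(scores[d == 1])
print("comparison units dropped by the x1 screen:", int(dropped.sum()))
print("of which score above the treated median:", int((dropped & (scores > treated_median)).sum()))
```

The second count identifies comparison units that the one-variable screen throws away even though, judged on both covariates through the score, they look like treated units.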
Column (3) in Table 3 gives an important insight into how the estimators in columns (4) to (8) succeed in estimating the treatment effect accurately. In column (3) we regress the outcome (earnings in 1978) on a quadratic function of the estimated propensity score and a treatment indicator. The estimates are comparable to those in column (2), where we regress the outcome on all pre-intervention characteristics. This again demonstrates the ability of the propensity score to summarize all pre-intervention variables. The estimators in columns (4) to (8) differ from column (3) in two respects. First, their functional form is more flexible than a low-order polynomial in the estimated propensity score. Second, rather than requiring a constant additive treatment effect, they allow the treatment effect to vary within each stratum (for stratification) or for each individual (for matching).
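A regression in the spirit of column (3) can be sketched on synthetic data. The covariates, the assignment rule, the outcome equation, and the constant effect of 1,000 are all hypothetical, and for brevity the true score `p` stands in for an estimated one. The coefficient on the treatment indicator is the estimate; note that the specification itself imposes a single additive effect for everyone.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical covariates and propensity score (true score used instead of an estimate).
x = rng.normal(size=(n, 3))
p = 1.0 / (1.0 + np.exp(-(x @ np.array([0.8, -0.5, 0.3]) - 0.5)))
d = (rng.random(n) < p).astype(float)

# Assumed outcome: depends on covariates only through p, plus a constant effect of 1000.
y = 3000.0 * p + 1000.0 * d + rng.normal(scale=100.0, size=n)

# Column (3)-style regression: outcome on intercept, treatment indicator, p, and p^2.
Z = np.column_stack([np.ones(n), d, p, p**2])
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
tau_hat = coef[1]  # coefficient on the treatment indicator
print(round(tau_hat, 1))
```

Because the outcome here is built to depend on the covariates only through `p` with a homogeneous effect, the quadratic is adequate; the stratification and matching estimators drop both of those restrictions.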
Finally, note that although the estimates in Table 3 are closer to the experimental benchmark than those in Table 2, their standard errors are higher, with the exception of the adjusted matching estimator: in Table 3, column (5), the standard errors are 1,152 and 1,581 for the CPS and PSID, compared with 550 and 886 in Table 2, column (5). This is because the propensity score estimators use fewer observations. When stratifying on the propensity score, we discard irrelevant controls, so some strata may contain as few as seven treated observations. The standard errors for the adjusted matching estimator (751 and 809), however, are similar to those in Table 2.
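Why thin strata raise the standard error can be seen in a minimal stratification estimator. The sketch below is an assumption, not the paper's exact variance formula: it takes the treated-minus-comparison difference in means within each propensity score stratum, weights strata by their share of treated units, and sums the conventional within-stratum variances while treating the score and stratum boundaries as fixed. The function name `stratified_att` and the toy data are hypothetical.

```python
import numpy as np


def stratified_att(y, d, p, edges):
    """Hypothetical helper: stratify on the propensity score p using bin edges
    (which should span all scores), difference mean outcomes within each stratum,
    and weight by the number of treated units. Returns (effect, standard error)."""
    y, d, p = (np.asarray(a, dtype=float) for a in (y, d, p))
    n_treated = d.sum()
    effect, var = 0.0, 0.0
    n_strata = len(edges) - 1
    for k in range(n_strata):
        lo, hi = edges[k], edges[k + 1]
        upper = (p <= hi) if k == n_strata - 1 else (p < hi)  # last bin is closed
        in_k = (p >= lo) & upper
        t, c = y[in_k & (d == 1)], y[in_k & (d == 0)]
        if len(t) == 0:
            continue  # no treated units: the stratum carries zero weight
        if len(t) < 2 or len(c) < 2:
            raise ValueError(f"stratum {k} needs at least two treated and two comparison units")
        w = len(t) / n_treated
        effect += w * (t.mean() - c.mean())
        # Variance of a difference in means: shrinks with n_t and n_c, so thin strata inflate it.
        var += w**2 * (t.var(ddof=1) / len(t) + c.var(ddof=1) / len(c))
    return effect, np.sqrt(var)


y = [10, 12, 5, 7, 20, 22, 15, 17]
d = [1, 1, 0, 0, 1, 1, 0, 0]
p = [0.1, 0.2, 0.15, 0.25, 0.7, 0.8, 0.6, 0.9]
effect, se = stratified_att(y, d, p, edges=[0.0, 0.5, 1.0])
print(effect, se)  # prints: 5.0 1.0
```

Each stratum's variance term scales roughly with one over its treated and comparison counts, so discarding irrelevant controls and ending up with strata of only a handful of treated units buys comparability at the price of precision.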
By summarizing all of the covariates in a single number, the propensity score method allows us to focus on the comparability of the comparison group to the treatment group. Hence, it allows us to address the issues of functional form and treatment effect heterogeneity much more easily.