Omitted variable bias


LINEAR REGRESSION WITH MULTIPLE REGRESSORS

Omitted Variable Bias

If the regressor is correlated with a variable that has been omitted from the analysis and that determines, in part, the dependent variable, then the OLS estimator will have omitted variable bias.

Conditions (both must hold):
- The omitted variable is correlated with the included regressor.
- The omitted variable is a determinant of the dependent variable.

V. Ozolina, Econometrics

Examples (Import of Pharmaceutical Products Depending on Wages)

- Price level: both conditions hold.
- Weather conditions, for example rainfall: the first condition does not hold, the second condition holds.
- Minimum wages: the first condition holds, the second condition does not hold.

The Mozart Effect

- A study published in Nature in 1993 (Rauscher, Shaw and Ky) suggested that listening to Mozart for 10-15 minutes could temporarily raise your IQ by 8 or 9 points.
- What is the evidence for the "Mozart effect"? A review of dozens of studies found that students who take optional music or arts courses in high school do in fact have higher English and math test scores than those who don't.
- When factors such as the student's innate ability or the overall quality of the school are omitted, studying music appears to have an effect on test scores when in fact it has none. This is omitted variable bias.
- A randomized controlled experiment can be used to test the "Mozart effect".
- Taken together, the many controlled experiments on the Mozart effect fail to show that listening to Mozart improves IQ or general test performance.
- However, it seems that listening to classical music does help temporarily in one narrow area: folding paper and visualising shapes.

Multiple Regression Model

This model permits estimating the effect on the dependent variable (Yi) of changing one variable (X1i) while holding the other regressors (X2i, X3i etc.) constant.
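The two conditions for omitted variable bias, and the way including the omitted regressor removes the bias, can be illustrated with a small simulation. This is a minimal sketch assuming NumPy; the true coefficients (2.0 and 3.0), the 0.8 link between x1 and x2, the sample size and the helper name `ols` are illustrative assumptions and are not taken from the lecture data.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients for y = X b + u (X must contain a constant column)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(0)
n = 5000
x2 = rng.normal(size=n)                # the variable that will be omitted
x1 = 0.8 * x2 + rng.normal(size=n)     # condition 1: x1 is correlated with x2
u = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + u      # condition 2: x2 is a determinant of y

const = np.ones(n)
b_short = ols(np.column_stack([const, x1]), y)      # x2 omitted
b_long = ols(np.column_stack([const, x1, x2]), y)   # x2 included

print("slope on x1, x2 omitted: ", b_short[1])
print("slope on x1, x2 included:", b_long[1])
```

With these assumed values the slope from the short regression should land near 2 + 3 × 0.8 / 1.64 ≈ 3.46 rather than the true 2.0, while the regression that includes x2 recovers a value close to 2.0.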
- Initial equation: Y = β0 + β1X1 + β2X2
- Effect on Y of a change in X1: Y + ΔY = β0 + β1(X1 + ΔX1) + β2X2
- Subtracting the first equation from the second yields: ΔY = β1ΔX1
- That is: $\beta_1 = \Delta Y / \Delta X_1$, holding X2 constant.
- The coefficient β1 is the effect on Y (the expected change in Y) of a unit change in X1, holding all other X values fixed.

Multiple Regression Model

Yi = β0 + β1X1i + β2X2i + ... + βkXki + ui

where
- Y: dependent variable
- X: regressors or factors (some of them control variables)
- u: deviation or error term
- β0: constant
- β1, ..., βk: slope coefficients of the X-es
- k: number of regressors
- i: observation index

Estimation of Model Coefficients

- The ordinary least squares (OLS) method is used: the sum of squared deviations is minimized.
- In Excel you can use the same functions and tools as in the case of single regression, by selecting the values of several X-es instead of one.

Ordinary Least Squares (OLS)

IMP_FARMAC_SA = 64.758*W_NET_SA + 3313.8 + u1
- An increase of wages by 1 unit results in an increase of imports of pharmaceutical products by 64.758 units.

IMP_FARMAC_SA = 42.99*W_NET_SA + 53.4*PI_VES_SA - 523.9 + u2
- An increase of wages by 1 unit results in an increase of imports of pharmaceutical products by 42.99 units, holding the values of all other factors constant.

The Least Squares Assumptions in Multiple Regression

- The conditional distribution of ui given X1i, X2i, ..., Xki has a mean of zero.
- (X1i, X2i, ..., Xki, Yi), i = 1, ..., n are independently and identically distributed.
- Large outliers are unlikely.
- There is no perfect multicollinearity.

Imperfect Multicollinearity

- A solution of the system of normal equations exists; however, an increase in the correlation between factors results in the loss of:
  - accuracy of the estimated values,
  - statistical significance of the factors.
- The allowable level of multicollinearity can be determined by multi-step analysis and hypothesis testing (t-statistics).
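The loss of accuracy under imperfect multicollinearity can be made concrete with a short simulation. This is a sketch under stated assumptions: NumPy is available, the data are synthetic, the function name `slope_se` and all coefficient values are hypothetical, and the standard error uses the homoskedasticity-only formula.

```python
import numpy as np

def slope_se(rho, n=200, seed=0):
    """Standard error of the X1 slope when corr(X1, X2) is roughly rho."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    # Build x2 so that its population correlation with x1 equals rho.
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
    y = 1.0 + 2.0 * x1 + 2.0 * x2 + rng.normal(size=n)

    X = np.column_stack([np.ones(n), x1, x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])   # squared standard error of regression
    cov = sigma2 * np.linalg.inv(X.T @ X)       # homoskedasticity-only covariance matrix
    return float(np.sqrt(cov[1, 1]))

for rho in (0.0, 0.9, 0.99):
    print(f"rho = {rho:4.2f}  SE of X1 slope = {slope_se(rho):.3f}")
```

The coefficients can still be estimated for every value of rho, but the standard error of the slope grows as the correlation between the two factors increases, which lowers the t-statistic and with it the apparent significance of the factor.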
- A preliminary evaluation can be made using the inequalities $r_{oi} > r_{ij}$ and $r_{oj} > r_{ij}$, where o is the index of the dependent variable and i, j are the indices of the factors.

Correlation Coefficients for All Pairs of Several Indicators

Correlation matrix:
- In EViews: Group → Correlation → Common Sample
- In Excel: Data → Data Analysis → Correlation

Correlation Matrix in Excel

- Select all the data.
- Set the orientation of the data.
- Tick the labels option if labels (names) of indicators are selected.
- Set the location of the results.

Example: Correlation Matrix in EViews

              IMP_FARMAC_SA W_NET_SA      PI_SA         PI_VES_SA     PROC_KRED_SA  NOKR_SA       TEMP_SA       W_MIN
IMP_FARMAC_SA 1             0.796991      0.807615      0.762608      0.20116       0.133749      0.08047       0.793392
W_NET_SA      0.79699       1             0.940375      0.834961      0.438014      0.057657      0.165431      0.890052
PI_SA         0.80762       0.940375      1             0.944571      0.263623      0.06616       0.120771      0.982155
PI_VES_SA     0.76261       0.834961      0.944571      1             0.275302      0.06653       0.049224      0.95306
PROC_KRED_SA  0.20116       0.438014      0.263623      0.275302      1             -0.07148      -0.05842      0.203297
NOKR_SA       0.13375       0.057657      0.06616       0.06653       -0.07148      1             0.011074      0.066816
TEMP_SA       0.08047       0.165431      0.120771      0.049224      -0.05842      0.011074      1             0.110778
W_MIN         0.79339       0.890052      0.982155      0.95306       0.203297      0.066816      0.110778      1

OLS Assumptions

- A1: E(ui) = 0. The expected (average) value of the error term is 0.
- A2: Var(ui) = σ² < ∞. The variance of the error is constant and finite (homoscedasticity).
- A3: Cov(ui, uj) = 0 for i ≠ j. The errors are not correlated with each other (no autocorrelation).
- A4: Cov(ui, Xi) = 0. The error term and X are not related.
- A5: ui is normally distributed.

Estimation / Specification of Equations

- Excel: Data → Data Analysis → Regression
- EViews: Object → New Object → Equation

Example in Excel

Data → Data Analysis → Regression
- The data should be in columns.
- Several X indicators should be next to each other.
- It is advisable to select labels.
- The confidence level can be changed.

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.81631
R Square            0.666361
Adjusted R Square   0.658692
Standard Error      2999.754
Observations        90

ANOVA
             df   SS         MS         F          Significance F
Regression   2    1.56E+09   7.82E+08   86.88062   1.83E-21
Residual     87   7.83E+08   8998527
Total        89   2.35E+09

             Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept    -523.902       2015.986         -0.25987   0.795575   -4530.89    3483.089
W_NET_SA     42.9936        9.143424         4.702133   9.6E-06    24.82006    61.16715
PI_VES_SA    53.39734       18.73079         2.850779   0.005445   16.16787    90.62682

RESIDUAL OUTPUT
Observation   Predicted IMP_FARMAC_SA   Residuals
1             13610.3                   -2281.42
2             13858.26                  -3701.55

Testing if OLS Assumptions Hold: A1

- A1: E(ui) = 0. The average value of the error term is zero. This is important because we do not know the exact errors.
- A1 will always hold if we use a constant term in the equation.

Testing if OLS Assumptions Hold: A2

- A2: Var(ui) = σ² < ∞ (homoscedasticity).
- The opposite situation is heteroscedasticity.
- How to test:
  - Graphical analysis: correlation diagram (scatter plot) of the error terms and forecasts.
  - White's test. H0: the assumption holds, Ha: the assumption does not hold.
    p-value > 0.05 → the assumption holds
    p-value < 0.05 → the assumption does not hold

Testing if OLS Assumptions Hold: Example (A2)

- Excel: the predicted values and residuals are taken from the RESIDUAL OUTPUT of the regression.
- Graphical analysis in Excel: is the spread of the error term increasing or decreasing?
- White's test in EViews: p-value = 0.092 (Prob. F and Prob. Chi-Square in the output below); 0.092 > 0.05 → H0 is not rejected → the error term is homoscedastic.

White Heteroskedasticity Test:
F-statistic       2.070058    Prob. F(4,85)          0.091834
Obs*R-squared     7.989054    Prob. Chi-Square(4)    0.091980

Test Equation:
Dependent Variable: RESID^2
Method: Least Squares
Date: 10/19/15 Time: 10:55
Sample: 2005M01 2012M06
Included observations: 90

Variable       Coefficient   Std. Error   t-Statistic   Prob.
C              1.52E+08      2.67E+08     0.567902      0.5716
W_NET_SA       -135271.7     473635.1     -0.285603     0.7759
W_NET_SA^2     299.9393      750.6340     0.399581      0.6905
PI_VES_SA      -1559383.     3301836.     -0.472277     0.6379
PI_VES_SA^2    4481.282      8231.547     0.544403      0.5876

R-squared            0.088767    Mean dependent var      8698576.
Adjusted R-squared   0.045886    S.D. dependent var      18174857
S.E. of regression   17752977    Akaike info criterion   36.27596
Sum squared resid    2.68E+16    Schwarz criterion       36.41484
Log likelihood       -1627.418   F-statistic             2.070058
Durbin-Watson stat   2.312690    Prob(F-statistic)       0.091834

Testing if OLS Assumptions Hold: A3

- A3: Cov(ui, uj) = 0. There is no autocorrelation (serial correlation) in the error term.
- How to test:
  - Graphical analysis.
  - Durbin-Watson statistic.
  - Breusch-Godfrey or serial correlation LM test. H0: the assumption holds, Ha: the assumption does not hold.
    p-value > 0.05 → the assumption holds
    p-value < 0.05 → the assumption does not hold

Testing if OLS Assumptions Hold: Example (A3)

Excel: add a column with the lagged residuals, Res(-1), to the RESIDUAL OUTPUT.

Observation   Predicted IMP_FARMAC_SA   Residuals   Res(-1)
1             13610.3                   -2281.42
2             13858.26                  -3701.55    -2281.42
3             14106.29                  -1160.25    -3701.55
4             14552.45                  -1820.19    -1160.25
5             14735.11                  -2499.34    -1820.19

Graphical tests in Excel:
- Do the errors group in a circle?
- Is there a trend?

Durbin-Watson test

Formula of the Durbin-Watson statistic:

$$d = \frac{\sum_{i=2}^{n} (u_i - u_{i-1})^2}{\sum_{i=1}^{n} u_i^2}$$

- 0 < d < 4; the closer to 2, the better.
- If d < 2, the hypotheses are H0: Cov(ui, uj) = 0 and H1: Cov(ui, uj) > 0 (positive autocorrelation):
  - If d < dL: H0 is rejected, autocorrelation exists.
  - If d > dU: H0 is not rejected, autocorrelation does not exist.
  - If dL < d < dU: the test is inconclusive, we can neither accept nor reject H0.
- If d > 2, then (4 - d) is used instead of d.

Testing if OLS Assumptions Hold: Example (Durbin-Watson test in Excel, with the lag of residuals)

RESIDUAL OUTPUT
[1] Observation   [2] Predicted IMP_FARMAC_SA   [3] Residuals   [4] Residual (t-1)   [5] ([3]-[4])^2   [6] [3]^2
1                 13610.3                       -2281.42                                               5204862.21
2                 13858.26                      -3701.55        -2281.42             2016767.1         13701442.5
...
90                25704.63                      -388.486        -6166.36             33383823.4        150921.557
Sum                                                                                  1218287337        777666992

DW = sum of ([3]-[4])^2 : sum of [3]^2 = 1.56659258

- The sum 777666992 in column [6] leaves out observation 1. With the full residual sum of squares 782871854 (the Residual SS of the ANOVA table), DW = 1218287337 : 782871854 = 1.556, which matches the Durbin-Watson stat reported by EViews.
- DW statistic ≈ 1.57.
- From the table of critical values: 90 observations, 3 coefficients, significance level 0.05 → dL = 1.612, dU = 1.703.
- 1.57 < 1.612 → d < dL → autocorrelation exists.

Testing if OLS Assumptions Hold: Example (Durbin-Watson test in EViews)

- The Durbin-Watson stat is part of the EViews regression output (here 1.556177).
- From the table of critical values: 90 observations, 3 coefficients, significance level 0.05 → dL = 1.612, dU = 1.703.
- 1.56 < 1.612 → d < dL → autocorrelation exists.

Testing if OLS Assumptions Hold: Example (Serial Correlation LM Test in EViews)

- p-value = 0.0076; 0.008 < 0.05 → H0 is rejected → autocorrelation exists.

What to do if the assumption Cov(ui, uj) = 0 (no autocorrelation) does not hold:
- Find the missing factors.
- Check the functional form.
- Specify a dynamic model by introducing lagged (past) values of y. Lags have to be added until the problem is solved.

Testing if OLS Assumptions Hold: Example (Adding lags in EViews)

- LM test after adding lags: p-value = 0.15; 0.15 > 0.05 → the autocorrelation problem is solved.

Evaluation of the Quality of the Equation

- Measures of fit
- Hypothesis testing for coefficients

Measures of Fit

- The standard error of regression (SER) estimates the standard deviation of the error term ui:

$$SER = \sqrt{\frac{1}{n-k-1}\sum_{i=1}^{n} \hat{u}_i^2}$$

  where n is the number of observations and k is the number of regressors (slope coefficients) in the equation.
- R², the coefficient of determination, is the fraction of the sample variance of Y explained by (or predicted by) the regressors X:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} \hat{u}_i^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}$$

R² characterises the forecasting ability or "explanation ability" of the equation, but:
- An increase in the value of R² does not necessarily mean that an added variable is statistically significant.
- A high value of R² does not mean that the regressors are a true cause of the dependent variable.
- A high value of R² does not mean there is no omitted variable bias.
- A high value of R² does not necessarily mean you have the most appropriate set of regressors, nor does a low value necessarily mean you have an inappropriate set of regressors.

Adjusted R² does not necessarily increase when a new regressor is added:

$$\bar{R}^2 = 1 - \frac{n-1}{n-k-1} \cdot \frac{\sum_{i=1}^{n} \hat{u}_i^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}$$

- The factor (n-1)/(n-k-1) is always larger than 1, so $\bar{R}^2$ is always smaller than R².
- Adding a regressor has two opposite effects on $\bar{R}^2$: the sum of squared residuals falls, which raises $\bar{R}^2$, but the factor (n-1)/(n-k-1) grows, which lowers it.
- $\bar{R}^2$ can be negative.

Measures of Fit: Example

In Excel: Data Analysis → Regression

Regression Statistics
Multiple R          0.81631
R Square            0.666361
Adjusted R Square   0.658692
Standard Error      2999.754
Observations        90

In EViews: R-squared, Adjusted R-squared and S.E. of regression are reported in the equation output.

Hypothesis Tests for a Single Coefficient

1. Compute the standard error of each coefficient, $SE(\hat{\beta}_j)$.
2. Compute the t-statistic: $t = \dfrac{\hat{\beta}_j - \beta_{j,0}}{SE(\hat{\beta}_j)}$
3. Compute or obtain the p-value.
4. The hypothesis can be rejected at the 5% significance level if:
   - the p-value is less than 0.05, or
   - |calculated value of the t-statistic| > critical value of the t-statistic.

Confidence Intervals for a Single Coefficient

- A 95% two-sided confidence interval for the coefficient βj is an interval that contains the true value of βj with a 95% probability.
- The confidence interval is the set of values of βj which cannot be rejected by a two-sided hypothesis test.
- $\hat{\beta}_j \pm t_{crit} \cdot SE(\hat{\beta}_j)$

Hypothesis Tests for a Single Coefficient: Example

In Excel: Data Analysis → Regression

             Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept    -523.902       2015.986         -0.25987   0.795575   -4530.89    3483.089
W_NET_SA     42.9936        9.143424         4.702133   9.6E-06    24.82006    61.16715
PI_VES_SA    53.39734       18.73079         2.850779   0.005445   16.16787    90.62682

Example in EViews

Dependent Variable: IMP_FARMAC_SA
Method: Least Squares
Date: 10/03/14 Time: 23:05
Sample (adjusted): 2006M01 2013M06
Included observations: 90 after adjustments

Variable     Coefficient   Std. Error   t-Statistic   Prob.
W_NET_SA     42.99360      9.143424     4.702133      0.0000
PI_VES_SA    53.39734      18.73079     2.850779      0.0054
C            -523.9017     2015.986     -0.259874     0.7956

R-squared            0.666361    Mean dependent var      22193.36
Adjusted R-squared   0.658692    S.D. dependent var      5134.666
S.E. of regression   2999.754    Akaike info criterion   18.88321
Sum squared resid    7.83E+08    Schwarz criterion       18.96654
Log likelihood       -846.7446   F-statistic             86.88062
Durbin-Watson stat   1.556177    Prob(F-statistic)       0.000000

Tests of Joint Hypothesis

- Null hypothesis H0: β1 = 0 and β2 = 0, versus the alternative hypothesis H1: β1 ≠ 0 and/or β2 ≠ 0.
- The same in general form: null hypothesis H0: βj = βj,0, βm = βm,0, ..., for a total of q restrictions, versus the alternative hypothesis H1: one or more of the q restrictions under H0 does not hold.
- Here βj, βm, ... refer to different regression coefficients and βj,0, βm,0, ... refer to the values of these coefficients under the null hypothesis.

Several t-statistics?

- With two separate t-tests, the null hypothesis is not rejected only if both |t1| ≤ tcrit and |t2| ≤ tcrit hold.
- If the t-statistics are independent, the probabilities are multiplied: Pr(|t1| ≤ tcrit and |t2| ≤ tcrit) = 0.95² = 0.9025 = 90.25%.
- So the probability of rejecting the null hypothesis when it is true is 1 - 0.95² = 9.75% instead of the intended 5%.

F-statistics

- If there are 2 factors in the equation:

$$F = \frac{1}{2} \cdot \frac{t_1^2 + t_2^2 - 2\hat{\rho}_{t_1,t_2} t_1 t_2}{1 - \hat{\rho}_{t_1,t_2}^2}$$

  where $\hat{\rho}_{t_1,t_2}$ is an estimator of the correlation between the two t-statistics.
- Degrees of freedom for obtaining the critical values of the F-statistic (in Excel: function FINV): k in the numerator and (n-k-1) in the denominator.

F-statistics: Example

In Excel: Data Analysis → Regression

ANOVA
             df   SS           MS          F            Significance F
Regression   2    1563595260   781797630   86.8806222   0.00000000
Residual     87   782871854    8998527.1
Total        89   2346467114

- Critical value: =FINV(0.05;2;87) = 3.1 (F table with n1 = 2, n2 = 87).
- 86.88 > 3.1, so the null hypothesis that both slope coefficients are zero is rejected.

Example in EViews: the same regression output reports F-statistic = 86.88062 and Prob(F-statistic) = 0.000000.

Multi-Step Analysis

1. At the beginning, choose a comparatively large number of factors (6-14).
2. Estimate the regression equation and calculate the t-statistics of the coefficients.
3. Find the smallest absolute value of the t-statistic and compare it with the critical value.
4. If |tmin| < tcrit, exclude only that one factor from the equation.
5. Estimate the equation once again and compute the t-statistics for all coefficients.
6. If |tmin| > tcrit, the optimal number of factors is found.

Report and presentation

- Chosen data, data type, sample size (number of observations), the problem, its topicality, motivation of the choice etc.
- Graphical analysis of the data (dynamics, correlation diagram).
- Values of the covariance and correlation coefficients and their interpretation (single regression).
- Results of the analysis of the OLS assumptions.
- Values of the coefficient of determination (R²) and the standard error of regression, and their interpretation.
- Hypothesis tests of the coefficients βi and their 95% confidence intervals.
- Hypothesis test of the equation (not in single regression).
- Analysis of alternative equations (if made).
- Conclusions about the quality of the equation, its possible application etc.
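The multi-step analysis described in these notes (drop the factor with the smallest |t|, re-estimate, repeat) can be sketched in Python as follows. This is an illustration only, assuming NumPy and synthetic data; the function names `t_stats` and `multi_step`, the factor names, and the fixed critical value 1.96 (a large-sample approximation instead of a value from the t-table) are assumptions, and the t-statistics use homoskedasticity-only standard errors.

```python
import numpy as np

def t_stats(X, y):
    """OLS t-statistics (homoskedasticity-only) for each column of X."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta / se

def multi_step(X, y, names, t_crit=1.96):
    """Exclude one factor at a time until the smallest |t| exceeds t_crit."""
    keep = list(range(X.shape[1]))
    while keep:
        design = np.column_stack([np.ones(len(y)), X[:, keep]])
        t = np.abs(t_stats(design, y))[1:]   # skip the constant term
        weakest = int(np.argmin(t))
        if t[weakest] > t_crit:
            break                            # all remaining factors are significant
        del keep[weakest]                    # exclude only that one factor
    return [names[i] for i in keep]

# Synthetic data: only x1 and x2 actually determine y.
rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 5))
y = 1.0 + 2.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=n)
names = ["x1", "x2", "x3", "x4", "x5"]

selected = multi_step(X, y, names)
print(selected)
```

Only one factor is removed per round, as in the procedure above, because the t-statistics of the remaining factors change once the equation is re-estimated.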
