Omitted variable bias
LINEAR REGRESSION WITH MULTIPLE REGRESSORS
Omitted Variable Bias
If the regressor is correlated with a variable that has been omitted from the analysis and that determines, in part, the dependent variable, then the OLS estimator will have omitted variable bias.
Conditions:
The omitted variable is correlated with the included regressor.
The omitted variable is a determinant of the dependent variable.
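The two conditions can be seen in a small simulation. This is a minimal sketch on made-up data: the true values β1 = 2 and β2 = 3, and the 0.5 link between X2 and X1, are assumptions chosen for illustration, and the helper name ols_slope is hypothetical. The slope from the regression that omits X2 settles near β1 + β2·δ, where δ is the slope from regressing X2 on X1, instead of near β1.

```python
import random


def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


random.seed(1)
n = 20_000
beta1, beta2 = 2.0, 3.0  # assumed true effects of X1 and X2 on Y

x1 = [random.gauss(0, 1) for _ in range(n)]
# Condition 1: the omitted variable X2 is correlated with the included regressor X1
x2 = [0.5 * a + random.gauss(0, 1) for a in x1]
# Condition 2: X2 is a determinant of Y
y = [1.0 + beta1 * a + beta2 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

short = ols_slope(x1, y)   # regression of Y on X1 only (X2 omitted)
delta = ols_slope(x1, x2)  # how strongly X2 moves with X1

print(f"estimated slope with X2 omitted: {short:.3f}")
print(f"true beta1:                      {beta1:.3f}")
print(f"beta1 + beta2 * delta:           {beta1 + beta2 * delta:.3f}")
```

Setting the 0.5 link to 0 (the first condition fails) or beta2 to 0 (the second condition fails) makes the estimated slope settle near beta1 again.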
V. Ozolina, Econometrics
Examples (Import of Pharmaceutical Products Depending on Wages)
Price level: both conditions hold.
Weather conditions, for example, rainfall: the first condition does not hold, the second condition holds.
Minimal wages: the first condition holds, the second condition does not hold.
The Mozart Effect
A study published in Nature in 1993 (Rauscher, Shaw and Ky) suggested that listening to Mozart for 10-15 minutes could temporarily raise your IQ by 8 or 9 points.
What is the evidence for the “Mozart effect”? A review of dozens of studies
found that students who take optional music or arts courses in high school
do in fact have higher English and math test scores than those who don’t.
If factors such as the student’s innate ability or the overall quality of the school are omitted, studying music appears to have an effect on test scores when in fact it has none; this is omitted variable bias.
A randomized controlled experiment can be used to test for the “Mozart effect”.
Taken together, the many controlled experiments on the Mozart effect fail to
show that listening to Mozart improves IQ or general test performance.
However, it seems that listening to classical music does help temporarily in
one narrow area: folding paper and visualising shapes.
Multiple Regression Model
This model permits estimating the effect on dependent
variable (Yi) of changing one variable (X1i) while
holding the other regressors (X2i, X3i etc.) constant.
Initial equation: Y = β0 + β1X1 + β2X2
Effect on Y of a change in X1:
Y + ΔY = β0 + β1(X1 + ΔX1) + β2X2
Subtracting the initial equation from the second one yields:
ΔY = β1ΔX1
That is:
β1 = ΔY/ΔX1, holding X2 constant
The coefficient β1 is the effect on Y (the expected change in Y) of a unit change in X1, holding all other X values fixed.
Multiple Regression Model
Yi = β0 + β1X1i + β2X2i + ... + βkXki + ui
where
Y – dependent variable
X – regressors or factors (some of them –
control variables)
u – deviation or error term
β0 – constant (intercept)
β1, ..., βk – slope coefficients of the regressors X1, ..., Xk
k – number of regressors
i – observation index
Estimation of Model Coefficients
Using the ordinary least squares (OLS) method: the sum of squared deviations (residuals) is minimized.
In Excel, you can use the same functions and tools as for single regression, selecting the values of several X columns instead of one.
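The least squares estimates are the solution of the normal equations (X'X)b = X'y. A minimal sketch in plain Python, for illustration only (in this course Excel's Regression tool or EViews does the work); the function name ols and the toy data are hypothetical. Perfect multicollinearity shows up as a singular X'X, which the sketch reports as an error.

```python
def ols(X, y):
    """OLS coefficients [b0, b1, ..., bk] from the normal equations (X'X) b = X'y.

    X is a list of observations, each a list of k regressor values; a column
    of ones is prepended so that b0 is the constant.
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]
    p = len(rows[0])
    # Normal equations: X'X (p x p) with X'y appended as an extra column
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: perfect multicollinearity")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2
X = [[0, 1], [1, 0], [2, 1], [3, 5], [4, 2]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in X]
print(ols(X, y))  # close to [1.0, 2.0, 3.0]
```

The fixed 1e-12 singularity threshold is a simplification; for data on very different scales, a relative tolerance would be more robust.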
Ordinary Least Squares OLS
IMP_FARMAC_SA = 64.758*W_NET_SA + 3313.8 + u1
An increase in wages by 1 unit results in an increase in imports of pharmaceutical products by 64.758 units.
IMP_FARMAC_SA = 42.99*W_NET_SA + 53.4*PI_VES_SA – 523.9 + u2
An increase in wages by 1 unit results in an increase in imports of pharmaceutical products by 42.99 units, holding the values of all other factors constant.
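The phrase "holding the values of all other factors constant" can be checked directly on the fitted two-regressor equation. A minimal sketch using the rounded coefficients above; the function name and the input values (wages of 300 and 301, PI_VES_SA of 100 or 120) are hypothetical.

```python
def fitted_imports(w_net_sa, pi_ves_sa):
    """Fitted IMP_FARMAC_SA from the two-regressor equation (error term u2 omitted)."""
    return 42.99 * w_net_sa + 53.4 * pi_ves_sa - 523.9


# Raise wages by 1 unit while PI_VES_SA stays fixed at two different levels
for pi in (100.0, 120.0):
    change = fitted_imports(301.0, pi) - fitted_imports(300.0, pi)
    print(f"PI_VES_SA = {pi}: change in fitted imports = {change:.2f}")
```

In a linear model the change equals the wage coefficient whatever level PI_VES_SA is held at.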
The Least Squares Assumptions in
Multiple Regression
The conditional distribution of ui given X1i, X2i, ..., Xki has a mean of zero
(X1i, X2i, ..., Xki, Yi), i = 1, ..., n, are independently and identically distributed
Large outliers are unlikely
No perfect multicollinearity
Imperfect Multicollinearity
A solution of the system of normal equations exists; however, as the correlation between factors increases, we lose:
Accuracy of the estimated coefficients
Statistical significance of the factors
The allowable level of multicollinearity can be determined by multi-step analysis and hypothesis testing (t-statistics).
A preliminary evaluation can be made using the inequalities:
r_oi > r_ij,
r_oj > r_ij,
where o is the index of the dependent variable and i, j are indices of factors; that is, each factor should be more strongly correlated with the dependent variable than with the other factor.
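The preliminary check can be applied to any pair of factors. A minimal sketch, with the correlations copied from the EViews correlation matrix example in this section; the function name is hypothetical.

```python
# Pairwise correlations from the EViews correlation matrix example
r = {
    ("IMP_FARMAC_SA", "W_NET_SA"): 0.796991,
    ("IMP_FARMAC_SA", "PI_VES_SA"): 0.762608,
    ("IMP_FARMAC_SA", "NOKR_SA"): 0.133749,
    ("W_NET_SA", "PI_VES_SA"): 0.834961,
    ("W_NET_SA", "NOKR_SA"): 0.057657,
}


def passes_preliminary_check(y, fi, fj):
    """True if r_oi > r_ij and r_oj > r_ij for factors fi, fj and dependent variable y."""
    r_oi, r_oj, r_ij = r[(y, fi)], r[(y, fj)], r[(fi, fj)]
    return r_oi > r_ij and r_oj > r_ij


for pair in [("W_NET_SA", "PI_VES_SA"), ("W_NET_SA", "NOKR_SA")]:
    print(pair, passes_preliminary_check("IMP_FARMAC_SA", *pair))
```

For W_NET_SA and PI_VES_SA the check fails (0.834961 exceeds both of their correlations with IMP_FARMAC_SA), so by this rule of thumb the pair deserves a closer look with t-statistics; in the estimated equation both coefficients nevertheless turn out significant.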
Correlation Coefficients for All
Pairs of Several Indicators
Correlation Matrix
In EViews: Group → Correlation → Common Sample
In Excel: Data → Data Analysis → Correlation
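Outside Excel and EViews, the same matrix is just the Pearson correlation of every pair of indicators. A minimal sketch on hypothetical toy data; all lists must cover the same observations, which is roughly what EViews' Common Sample option ensures by using only observations available for every series.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """columns: dict {indicator name: list of values}, all over the same observations."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical toy data
data = {"Y": [1, 2, 3, 4], "X1": [2, 4, 6, 8], "X2": [4, 3, 2, 1]}
m = correlation_matrix(data)
for a in data:
    print(a, [round(m[a][b], 3) for b in data])
```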
Correlation Matrix in Excel
Select all the data
Orientation of the data
Tick the labels option if the labels (names) of the indicators are selected
Location of the results
Example: Correlation Matrix in EViews ...
              IMP_FARMAC_SA W_NET_SA      PI_SA         PI_VES_SA     PROC_KRED_SA  NOKR_SA       TEMP_SA       W_MIN
IMP_FARMAC_SA 1             0.796991      0.807615      0.762608      0.20116       0.133749      0.08047       0.793392
W_NET_SA      0.79699       1             0.940375      0.834961      0.438014      0.057657      0.165431      0.890052
PI_SA         0.80762       0.940375      1             0.944571      0.263623      0.06616       0.120771      0.982155
PI_VES_SA     0.76261       0.834961      0.944571      1             0.275302      0.06653       0.049224      0.95306
PROC_KRED_SA  0.20116       0.438014      0.263623      0.275302      1             -0.07148      -0.05842      0.203297
NOKR_SA       0.13375       0.057657      0.06616       0.06653       -0.07148      1             0.011074      0.066816
TEMP_SA       0.08047       0.165431      0.120771      0.049224      -0.05842      0.011074      1             0.110778
W_MIN         0.79339       0.890052      0.982155      0.95306       0.203297      0.066816      0.110778      1
OLS Assumptions
A1: E(ui) = 0 – the expected (average) value of the error term is 0
A2: Var(ui) = σ² – the variance of the error term is constant and finite (homoscedasticity)
A3: Cov(ui,uj) = 0 – the errors are uncorrelated with each other (no autocorrelation)
A4: Cov(ui,Xi) = 0 – the error term and X are uncorrelated
A5: ui is normally distributed
Estimation / Specification of Equations
Excel: Data → Data Analysis
EViews: Object → New Object → Equation
Example in Excel
Data → Data Analysis → Regression
Data in columns
Several X indicators should be next to each other
It is advisable to select labels
The confidence level can be changed
Estimation / Specification of Equations
Excel: Data → Data Analysis → Regression
SUMMARY OUTPUT
Regression Statistics
Multiple R          0.81631
R Square            0.666361
Adjusted R Square   0.658692
Standard Error      2999.754
Observations        90
ANOVA
             df   SS         MS         F          Significance F
Regression   2    1.56E+09   7.82E+08   86.88062   1.83E-21
Residual     87   7.83E+08   8998527
Total        89   2.35E+09
             Coefficients  Standard Error  t Stat     P-value    Lower 95%   Upper 95%
Intercept    -523.902      2015.986        -0.25987   0.795575   -4530.89    3483.089
W_NET_SA     42.9936       9.143424        4.702133   9.6E-06    24.82006    61.16715
PI_VES_SA    53.39734      18.73079        2.850779   0.005445   16.16787    90.62682
RESIDUAL OUTPUT
Observation  Predicted IMP_FARMAC_SA  Residuals
1            13610.3                  -2281.42
2            13858.26                 -3701.55
Testing if OLS Assumptions Hold
A1: E(ui) = 0
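The average value of the error term is zero. This matters because the true errors are unknown: we only observe the residuals.
A1 always holds if the equation includes a constant term: any nonzero mean of the error is absorbed into the constant, and the OLS residuals then average exactly zero.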
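The role of the constant can be checked numerically. A minimal sketch on hypothetical data: with a constant, the OLS residuals average zero; forcing the line through the origin leaves a nonzero mean. The function name is hypothetical.

```python
def residuals(x, y, constant=True):
    """Residuals of a simple OLS regression of y on x, with or without a constant."""
    n = len(x)
    if constant:
        mx, my = sum(x) / n, sum(y) / n
        b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
        b0 = my - b1 * mx
    else:
        # Regression through the origin: b1 = sum(xy) / sum(x^2)
        b1 = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
        b0 = 0.0
    return [b - (b0 + b1 * a) for a, b in zip(x, y)]


# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [12, 14, 15, 18, 19]
for c in (True, False):
    u = residuals(x, y, constant=c)
    print(f"constant={c}: mean residual = {sum(u) / len(u):.4f}")
```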
Testing if OLS Assumptions Hold
A2: Var(ui) = σ² < ∞
Homoscedasticity
The opposite situation is heteroscedasticity.
Testing if OLS Assumptions Hold
Var(ui) = σ² < ∞ – homoscedasticity
How to test:
Graphical analysis – correlation diagram of the error terms and the forecasts
White’s test
H0: the assumption holds, Ha: it does not hold
p-value > 0.05 → the assumption holds
p-value < 0.05 → the assumption does not hold
Testing if OLS Assumptions Hold:
Example
Excel
Testing if OLS Assumptions Hold:
Example
Graphical analysis in Excel
Is the spread of the error term
increasing or
decreasing?
Testing if OLS Assumptions Hold:
Example
White’s test in EViews
Testing if OLS Assumptions Hold:
Example
p-value ≈ 0.092 (Prob. Chi-Square(4) = 0.091980)
0.092 > 0.05
H0 is not rejected
The error term is homoscedastic
White Heteroskedasticity Test:
F-statistic      2.070058   Prob. F(4,85)         0.091834
Obs*R-squared    7.989054   Prob. Chi-Square(4)   0.091980
Test Equation:
Dependent Variable: RESID^2
Method: Least Squares
Date: 10/19/15 Time: 10:55
Sample: 2005M01 2012M06
Included observations: 90
Variable      Coefficient  Std. Error  t-Statistic  Prob.
C             1.52E+08     2.67E+08    0.567902     0.5716
W_NET_SA      -135271.7    473635.1    -0.285603    0.7759
W_NET_SA^2    299.9393     750.6340    0.399581     0.6905
PI_VES_SA     -1559383.    3301836.    -0.472277    0.6379
PI_VES_SA^2   4481.282     8231.547    0.544403     0.5876
R-squared            0.088767    Mean dependent var      8698576.
Adjusted R-squared   0.045886    S.D. dependent var      18174857
S.E. of regression   17752977    Akaike info criterion   36.27596
Sum squared resid    2.68E+16    Schwarz criterion       36.41484
Log likelihood       -1627.418   F-statistic             2.070058
Durbin-Watson stat   2.312690    Prob(F-statistic)       0.091834
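The Obs*R-squared line can be reproduced from the test equation: White's statistic is n·R² of the auxiliary regression of RESID^2, compared with a chi-square distribution whose degrees of freedom equal the number of regressors in that equation besides C (4 here). A minimal sketch; the closed-form chi-square tail below is valid only for an even number of degrees of freedom, an assumption that holds here but not in general (a statistics library would be needed otherwise). The function name is hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of degrees of freedom.

    Uses the closed form exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)**j / j!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    h = x / 2
    return math.exp(-h) * sum(h ** j / math.factorial(j) for j in range(df // 2))


n = 90                # included observations
r2_aux = 0.088767     # R-squared of the test equation (dependent variable RESID^2)
obs_r2 = n * r2_aux   # White's statistic, reported as Obs*R-squared
p_value = chi2_sf_even_df(obs_r2, df=4)  # 4 regressors besides C in the test equation

print(f"Obs*R-squared = {obs_r2:.3f}, p-value = {p_value:.4f}")
print("H0 not rejected: homoscedastic" if p_value > 0.05 else "H0 rejected: heteroscedastic")
```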
Testing if OLS Assumptions Hold
A3: Cov(ui,uj) = 0
There is no autocorrelation (serial correlation) in the error term.
Testing if OLS Assumptions Hold
Cov(ui,uj) = 0 – no autocorrelation
How to test:
Graphical analysis
Durbin-Watson statistic
Breusch-Godfrey serial correlation LM test
H0: the assumption holds, Ha: it does not hold
p-value > 0.05 → the assumption holds
p-value < 0.05 → the assumption does not hold
Testing if OLS Assumptions Hold:
Example
Excel
             Coefficients  Standard Error  t Stat     P-value    Lower 95%   Upper 95%
Intercept    -523.902      2015.986        -0.25987   0.795575   -4530.89    3483.089
W_NET_SA     42.9936       9.143424        4.702133   9.6E-06    24.82006    61.16715
PI_VES_SA    53.39734      18.73079        2.850779   0.005445   16.16787    90.62682
RESIDUAL OUTPUT
Observation  Predicted IMP_FARMAC_SA  Residuals  Res (-1)
1            13610.3                  -2281.42
2            13858.26                 -3701.55   -2281.42
3            14106.29                 -1160.25   -3701.55
4            14552.45                 -1820.19   -1160.25
5            14735.11                 -2499.34   -1820.19
Testing if OLS Assumptions Hold:
Example
Graphical tests in Excel
Do the errors group in a circle?
Is there a trend?
Testing if OLS Assumptions Hold
Durbin-Watson test
Formula of the Durbin-Watson statistic:
$$d = \frac{\sum_{i=2}^{n} (u_i - u_{i-1})^2}{\sum_{i=1}^{n} u_i^2}$$
0 ≤ d ≤ 4; the closer d is to 2, the better (the weaker the first-order autocorrelation).
If d < 2:
H0 : Cov(ui,uj) = 0
H1 : Cov(ui,uj) > 0
If d < dL: H0 is rejected – autocorrelation exists
If d > dU: H0 is accepted – autocorrelation does not exist
If dL < d < dU: we can neither accept nor reject H0
If d > 2, we use (4 – d) instead of d (testing for negative autocorrelation, H1 : Cov(ui,uj) < 0)
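The statistic and the decision rule above translate directly into code. A minimal sketch; the function names are hypothetical, and the example reuses the column sums and the table values dL = 1.612, dU = 1.703 from the Excel Durbin-Watson example in this section.

```python
def durbin_watson(resid):
    """d = sum_{i=2..n} (u_i - u_{i-1})^2 / sum_{i=1..n} u_i^2."""
    num = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, len(resid)))
    den = sum(u * u for u in resid)
    return num / den


def dw_decision(d, d_lower, d_upper):
    """Decision rule above; for d > 2 the same bounds are applied to 4 - d."""
    stat = d if d <= 2 else 4 - d
    if stat < d_lower:
        return "H0 rejected: autocorrelation exists"
    if stat > d_upper:
        return "H0 accepted: no autocorrelation"
    return "inconclusive"


# Column sums from the Excel example: sum of ([3]-[4])^2 and sum of [3]^2
d = 1218287337 / 777666992
print(round(d, 4), dw_decision(d, d_lower=1.612, d_upper=1.703))
```

With a full residual series, durbin_watson(residuals) replaces the two column sums.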
Table of Critical Values
Testing if OLS Assumptions Hold:
Example
Durbin-Watson test in Excel (+lag of residuals)
RESIDUAL OUTPUT
Columns: [1] Observation, [2] Predicted IMP_FARMAC_SA, [3] Residuals, [4] Residual (t-1), [5] ([3]-[4])^2, [6] [3]^2
[1]   [2]        [3]        [4]        [5]           [6]
1     13610.3    -2281.42                            5204862.21
2     13858.26   -3701.55   -2281.42   2016767.1     13701442.5
...
90    25704.63   -388.486   -6166.36   33383823.4    150921.557
Sum                                    1218287337    777666992
DW = Σ([3] - [4])² / Σ[3]² = 1218287337 / 777666992 = 1.56659258
Testing if OLS Assumptions Hold:
Example
Durbin-Watson test in Excel (+lag of residuals)
DW statistic = 1.567 (≈ 1.57)
From the table: 90 observations, 3 coefficients, significance level 0.05: dL = 1.612, dU = 1.703
1.57 < 1.612 → d < dL → autocorrelation exists
Testing if OLS Assumptions Hold:
Example
Durbin-Watson test in EViews
From the table: 90 observations, 3 coefficients, significance level 0.05: dL = 1.612, dU = 1.703
1.56 < 1.612 → d < dL → autocorrelation exists
Testing if OLS Assumptions Hold:
Example
Serial Correlation LM Test in EViews
Testing if OLS Assumptions Hold:
Example
Serial Correlation LM Test in EViews
p-value = 0.0076
0.008 < 0.05 → H0 is rejected → autocorrelation exists
Testing if OLS Assumptions Hold
Cov(ui,uj) = 0 – no autocorrelation
What to do if the assumption does not hold:
Find the missing factors
Check the functional form
Specify a dynamic model by introducing lagged (past) values of y
Add lags until the problem is solved (the LM test no longer rejects H0)
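Adding a lag of y means adding a regressor column whose value in period t is y in period t-1; the first observations are lost because their lags do not exist. A minimal sketch of building that data set by hand on a hypothetical toy series (the function name add_lags is hypothetical; in EViews a lag is written directly in the equation, e.g. IMP_FARMAC_SA(-1)). After adding lags, the equation is re-estimated and the LM test run again.

```python
def add_lags(y, X, p):
    """Append lags y[t-1], ..., y[t-p] of the dependent variable to each row of X.

    The first p observations are dropped because their lagged values are
    unavailable, so the estimation sample becomes p observations shorter.
    """
    rows, target = [], []
    for t in range(p, len(y)):
        rows.append(list(X[t]) + [y[t - lag] for lag in range(1, p + 1)])
        target.append(y[t])
    return rows, target


# Hypothetical toy series: one regressor and the dependent variable
y = [10, 11, 12, 13]
X = [[1], [2], [3], [4]]
rows, target = add_lags(y, X, p=1)
print(rows, target)  # [[2, 10], [3, 11], [4, 12]] [11, 12, 13]
```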
Testing if OLS Assumptions Hold:
Example
Adding lags in EViews
Testing if OLS Assumptions Hold:
Example
Adding lags in EViews
LM test: p-value = 0.15
0.15 > 0.05 → the autocorrelation problem is solved
Evaluation of the Quality of the
Equation
Measures of Fit
Hypothesis testing for coefficients
Measures of Fit
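The standard error of regression (SER) estimates the standard deviation of the error term ui:
$$SER = \sqrt{\frac{1}{n-k}\sum_{i=1}^{n}\hat{u}_i^2} = \sqrt{\frac{SSR}{n-k}}$$
where k – the number of coefficients in the equation (including the constant)
R2 – determination coefficient is the fraction of the sample variance of Y explained by (or predicted by) the regressors X:
$$R^2 = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS}$$
where ESS is the explained sum of squares, SSR the sum of squared residuals and TSS the total sum of squares.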
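Both measures follow from the sums of squares in the Excel ANOVA table of the pharmaceutical imports example (Residual SS and Total SS). A minimal sketch that reproduces the reported Standard Error 2999.754 and R Square 0.666361, with k = 3 coefficients as in the SER formula above.

```python
import math

n = 90                  # observations
k = 3                   # coefficients, including the constant
ssr = 782_871_854       # Residual SS from the Excel ANOVA table
tss = 2_346_467_114     # Total SS from the Excel ANOVA table

ser = math.sqrt(ssr / (n - k))   # standard error of regression
r2 = 1 - ssr / tss               # coefficient of determination

print(f"SER = {ser:.3f}, R2 = {r2:.6f}")
```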
Measures of Fit
R2 characterises the forecasting ability or
«explanation ability» of the equation, but:
An increase in the value of R2 does not necessarily
mean that an added variable is statistically significant.
A high value of R2 does not mean that the regressors
are a true cause of the dependent variable.
A high value of R2 does not mean there is no omitted
variable bias.
A high value of R2 does not necessarily mean you
have the most appropriate set of regressors, nor does
a low value necessarily mean you have an
inappropriate set of regressors.
Measures of Fit
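Adjusted R2 (R̄²) does not necessarily increase when a new regressor is added:
$$\bar{R}^2 = 1 - \frac{n-1}{n-k-1}\,\frac{SSR}{TSS}$$
where k is the number of regressors (excluding the constant).
The factor (n-1)/(n-k-1) is always larger than 1, so R̄² is always smaller than R².
Adding a regressor has 2 opposite effects on R̄²: SSR does not increase, which raises R̄², while the factor (n-1)/(n-k-1) grows, which lowers it.
R̄² can be negative.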
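A minimal sketch of the adjustment. Note that here k counts regressors, whereas in the SER formula k counts all coefficients, so n-k-1 here plays the role of n-k there. The first call uses R Square = 0.666361 from the example (n = 90, k = 2) and lands on the reported 0.658692 up to rounding; the second uses hypothetical numbers to show that R̄² can be negative.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R2; k is the number of regressors, excluding the constant."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)


print(round(adjusted_r2(0.666361, n=90, k=2), 6))   # pharmaceutical imports example
print(round(adjusted_r2(0.01, n=20, k=5), 4))        # hypothetical weak model: negative
```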
Measures of Fit: Example
In Excel: Data Analysis → Regression
Regression Statistics
Multiple R          0.81631
R Square            0.666361
Adjusted R Square   0.658692
Standard Error      2999.754
Observations        90
Measures of Fit: Example
In EViews
Hypothesis Tests for a Single
Coefficient
Compute the standard error SE(β̂j) of each coefficient
Compute the t-statistic:
$$t = \frac{\hat{\beta}_j - \beta_{j,0}}{SE(\hat{\beta}_j)}$$
Compute/obtain the p-value
The hypothesis can be rejected at the 5% significance level if:
the p-value is less than 0.05, or
|calculated value of t-stat| > critical value of t-stat
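The test can be replayed on the coefficients and standard errors of the pharmaceutical imports equation. A minimal sketch; the critical value ≈ 1.9876 for n - k - 1 = 87 degrees of freedom is an assumption taken from t tables (in Excel, =TINV(0.05;87)), and it is consistent with the 95% limits Excel reports.

```python
def t_stat(beta_hat, se, beta_0=0.0):
    """t = (beta_hat - beta_0) / SE(beta_hat)."""
    return (beta_hat - beta_0) / se


T_CRIT = 1.9876  # approx. two-sided 5% critical value of t with 87 df (assumed from t tables)

# Coefficients and standard errors from the Excel/EViews output in this section
estimates = {
    "Intercept": (-523.902, 2015.986),
    "W_NET_SA": (42.9936, 9.143424),
    "PI_VES_SA": (53.39734, 18.73079),
}
for name, (b, se) in estimates.items():
    t = t_stat(b, se)
    verdict = "reject H0: beta = 0" if abs(t) > T_CRIT else "cannot reject H0: beta = 0"
    print(f"{name:10s} t = {t:7.3f}  {verdict}")
```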
Confidence Intervals for a Single
Coefficient
A 95% two-sided confidence interval for the coefficient βj is an interval that contains the true value of βj with a 95% probability, i.e., it contains the true value in 95% of all possible randomly drawn samples.
The confidence interval is the set of values of βj that cannot be rejected by a two-sided hypothesis test at the 5% significance level.
$$\hat{\beta}_j \pm t_{crit}\, SE(\hat{\beta}_j)$$
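Applied to the W_NET_SA coefficient, the formula reproduces Excel's Lower 95% and Upper 95% columns. A minimal sketch; the function name is hypothetical and the critical value ≈ 1.9876 (87 degrees of freedom) is an assumption from t tables.

```python
def confidence_interval(beta_hat, se, t_crit):
    """Two-sided interval beta_hat -/+ t_crit * SE(beta_hat)."""
    return beta_hat - t_crit * se, beta_hat + t_crit * se


T_CRIT = 1.9876  # approx. t critical value, 95%, 87 df (assumed; Excel: =TINV(0.05;87))

low, high = confidence_interval(42.9936, 9.143424, T_CRIT)  # W_NET_SA
print(f"95% CI for the W_NET_SA coefficient: [{low:.3f}; {high:.3f}]")
# 0 lies outside this interval, so H0: beta = 0 is rejected at the 5% level
print("0 inside interval:", low <= 0 <= high)
```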
Hypothesis Tests for a Single
Coefficient: Example
In Excel: Data Analysis → Regression
             Coefficients  Standard Error  t Stat     P-value    Lower 95%   Upper 95%
Intercept    -523.902      2015.986        -0.25987   0.795575   -4530.89    3483.089
W_NET_SA     42.9936       9.143424        4.702133   9.6E-06    24.82006    61.16715
PI_VES_SA    53.39734      18.73079        2.850779   0.005445   16.16787    90.62682
Example in EViews
Dependent Variable: IMP_FARMAC_SA
Method: Least Squares
Date: 10/03/14 Time: 23:05
Sample (adjusted): 2006M01 2013M06
Included observations: 90 after adjustments
Variable     Coefficient   Std. Error   t-Statistic   Prob.
W_NET_SA     42.99360      9.143424     4.702133      0.0000
PI_VES_SA    53.39734      18.73079     2.850779      0.0054
C            -523.9017     2015.986     -0.259874     0.7956
R-squared            0.666361    Mean dependent var      22193.36
Adjusted R-squared   0.658692    S.D. dependent var      5134.666
S.E. of regression   2999.754    Akaike info criterion   18.88321
Sum squared resid    7.83E+08    Schwarz criterion       18.96654
Log likelihood       -846.7446   F-statistic             86.88062
Durbin-Watson stat   1.556177    Prob(F-statistic)       0.000000
Tests of Joint Hypothesis
Null hypothesis: H0: β1 = 0 and β2 = 0
Versus the alternative hypothesis H1: β1 ≠ 0 and/or β2 ≠ 0
The same in general form:
Null hypothesis: H0: βj = βj,0, βm = βm,0, ..., for a total of q restrictions
Versus the alternative hypothesis H1: one or more of the q restrictions under H0 do not hold,
where βj, βm, ... refer to different regression coefficients and βj,0, βm,0, ... refer to the values of these coefficients under the null hypothesis
Several t-statistics?
The null hypothesis is not rejected only if both |t1| ≤ tcrit and |t2| ≤ tcrit hold.
If the t-statistics are independent, we multiply the probabilities:
Pr(|t1| ≤ tcrit and |t2| ≤ tcrit) = 0.95² = 0.9025 = 90.25%
So the probability of rejecting the null hypothesis when it is true is 1 – 0.95² = 9.75%, not 5%.
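The 9.75% figure generalizes: with q independent t-tests, each at the 5% level, the chance of wrongly rejecting a true joint null is 1 − 0.95^q. A minimal sketch, assuming independence of the t-statistics as above.

```python
alpha = 0.05  # significance level of each individual t-test

for q in range(1, 6):
    # Probability that all q independent t-tests accept a true H0
    p_all_accept = (1 - alpha) ** q
    print(f"q = {q}: Pr(wrongly rejecting H0) = {1 - p_all_accept:.4f}")
```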
F-statistics
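If there are 2 factors in the equation:
$$F = \frac{1}{2}\left(\frac{t_1^2 + t_2^2 - 2\hat{\rho}_{t_1,t_2}\, t_1 t_2}{1 - \hat{\rho}_{t_1,t_2}^2}\right)$$
where $\hat{\rho}_{t_1,t_2}$ is an estimator of the correlation between the two t-statistics
Degrees of freedom to obtain critical F-stat values (in Excel – function FINV): k in the numerator and (n-k-1) in the denominator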
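The two-factor formula can be written as a small function. A minimal sketch with hypothetical inputs: the estimated correlation between the two t-statistics is not reported in the Excel or EViews output in this section, so the sketch does not try to reproduce F = 86.88 from it. When the t-statistics are uncorrelated, F is simply the average of the squared t-statistics.

```python
def f_from_two_t(t1, t2, rho):
    """F-statistic for H0: beta1 = 0 and beta2 = 0 from the two t-statistics.

    rho is the estimated correlation between the two t-statistics.
    """
    return 0.5 * (t1 ** 2 + t2 ** 2 - 2 * rho * t1 * t2) / (1 - rho ** 2)


# Hypothetical inputs
print(f_from_two_t(2.0, 1.0, rho=0.0))   # 2.5
print(f_from_two_t(2.0, 2.0, rho=0.5))
```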
F-statistics: Example
In Excel: Data Analysis → Regression
ANOVA
             df   SS           MS          F            Significance F
Regression   2    1563595260   781797630   86.8806222   0.00000000
Residual     87   782871854    8998527.1
Total        89   2346467114
=FINV(0.05;2;87) = 3.1
F-table (n1 = 2 numerator, n2 = 87 denominator degrees of freedom): 3.1
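The F-statistic in the ANOVA table is the ratio of the two mean squares, and it can equivalently be computed from R². A minimal sketch using the sums of squares above and the critical value 3.1 from FINV.

```python
n, k = 90, 2                 # observations, regressors (excluding the constant)
ss_reg = 1_563_595_260       # Regression SS
ss_res = 782_871_854         # Residual SS

ms_reg = ss_reg / k                  # 781797630
ms_res = ss_res / (n - k - 1)        # about 8998527.1
F = ms_reg / ms_res
F_CRIT = 3.1                 # =FINV(0.05;2;87)

print(f"F = {F:.4f}; reject H0 (all slopes = 0): {F > F_CRIT}")

# Equivalent form using R2 = Regression SS / Total SS
r2 = ss_reg / (ss_reg + ss_res)
F_from_r2 = (r2 / k) / ((1 - r2) / (n - k - 1))
print(f"F from R2 = {F_from_r2:.4f}")
```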
Multi-Step Analysis
At the beginning, we choose a comparatively large number of factors (6-14).
Then we estimate the regression equation and calculate the values of the t-statistics for the coefficients.
Find the smallest |t-stat| and compare it with the critical value of the t-stat.
If |tmin| < tcrit, only that one factor is excluded from the equation.
Estimate the equation once again, compute the values of the t-stat for all coefficients, and repeat.
If |tmin| > tcrit, the optimal number of factors has been found.
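The multi-step procedure is a backward elimination loop. A minimal sketch: the callback fit is hypothetical and stands in for re-estimating the equation in Excel or EViews and reading off the t-statistics. The toy fake_fit returns fixed, made-up values, whereas real re-estimation changes every t-statistic after each removal, which is exactly why only one factor is dropped per step.

```python
def multi_step(factors, fit, t_crit):
    """Multi-step (backward) elimination of factors.

    fit(factors) must re-estimate the equation with the given factors and return
    {factor: t_statistic}; it is a hypothetical callback standing in for Excel/EViews.
    """
    factors = list(factors)
    while factors:
        t = fit(factors)
        weakest = min(factors, key=lambda f: abs(t[f]))  # factor with the smallest |t|
        if abs(t[weakest]) >= t_crit:
            break                  # |t_min| >= t_crit: optimal set of factors found
        factors.remove(weakest)    # exclude only that one factor, then re-estimate
    return factors


# Toy stand-in for re-estimation: fixed, made-up t-statistics
FAKE_T = {"A": 5.0, "B": 0.5, "C": 1.5, "D": 3.0}


def fake_fit(factors):
    return {f: FAKE_T[f] for f in factors}


print(multi_step(["A", "B", "C", "D"], fake_fit, t_crit=1.99))  # ['A', 'D']
```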
Report and presentation
Chosen data, data type, sample size (number of observations), problem, its topicality, motivation of the choice, etc.
Graphical analysis of the data (dynamics, correlation diagram)
Values of the covariance and correlation coefficients and their interpretation (single regression)
Results of the analysis of OLS assumptions
Values of the coefficient of determination (R2) and the standard error of regression, and their interpretation
Hypothesis tests of the coefficients bi and their 95% confidence intervals
Hypothesis test of the whole equation (not in single regression)
Analysis of alternative equations (if made)
Conclusions about the quality of the equation, its possible applications, etc.