Data Analysis
ECONOMETRICS
Data Analysis
Logarithms and First Differences (∆ or d)
[Figure: panels "log" and "d(log)"]
V. Ozolina, Econometrics
Logarithms and First Differences (∆ or d) – Latvia’s real GDP
A.Auziņa-Emsiņa, Econometrics
Logarithms and First Differences (∆ or d)
[Figure: UK series, 1975–2015: level (UK), logarithm (LUK, panel "log") and first difference of the logarithm (DLUK, panel "d(log)")]
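For small changes, d(log) is close to the growth rate: log(Y_t) − log(Y_{t−1}) ≈ Y_t/Y_{t−1} − 1. A minimal Python sketch comparing the two on a made-up series (the values are hypothetical, not the UK or Latvian data); `log_diff` and `pct_change` are illustrative helper names:

```python
import math

def log_diff(series):
    """d(log): log(Y_t) - log(Y_{t-1}) for a list of positive values."""
    logs = [math.log(v) for v in series]
    return [b - a for a, b in zip(logs, logs[1:])]

def pct_change(series):
    """Exact growth rate Y_t / Y_{t-1} - 1, for comparison."""
    return [b / a - 1 for a, b in zip(series, series[1:])]

# Hypothetical series: +5%, +5%, then -10%
y = [100.0, 105.0, 110.25, 99.225]
for dl, g in zip(log_diff(y), pct_change(y)):
    print(f"d(log) = {dl:+.4f}   growth = {g:+.4f}")
```

The gap between the two grows with the size of the change (compare the last pair), which is why d(log) is read as an approximate growth rate.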
Seasonal Adjustment
Aim – to remove or reduce seasonal or cyclical fluctuations, so that the remaining (non-seasonal) movements can be analysed and forecast
Moving average methods:
  Multiplicative (cannot be used if any values are 0 or negative); easier to interpret, as seasonal effects are expressed in %
  Additive
Seasonal dummies
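A sketch of the classical moving-average decomposition, under stated assumptions: quarterly data (period 4) starting in the first quarter, at least two full years of observations, a 2×4 centred moving average as the trend, and seasonal factors averaged by quarter. `centred_ma` and `seasonal_adjust` are hypothetical names; production seasonal adjustment (e.g. X-13ARIMA-SEATS) is considerably more elaborate.

```python
def centred_ma(y, period=4):
    """2 x period centred moving average for an even period; None where undefined."""
    half = period // 2
    trend = [None] * len(y)
    for t in range(half, len(y) - half):
        window = y[t - half:t + half + 1]  # period + 1 values centred on t
        # the two end values get half weight, so the weights sum to `period`
        trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period
    return trend


def seasonal_adjust(y, period=4, multiplicative=True):
    """Classical decomposition; returns (seasonally adjusted series, seasonal factors)."""
    if multiplicative and min(y) <= 0:
        raise ValueError("multiplicative model needs strictly positive values")
    trend = centred_ma(y, period)
    detrended = {s: [] for s in range(period)}
    for t, (v, m) in enumerate(zip(y, trend)):
        if m is not None:
            # ratio to trend (multiplicative) or deviation from trend (additive)
            detrended[t % period].append(v / m if multiplicative else v - m)
    factors = [sum(detrended[s]) / len(detrended[s]) for s in range(period)]
    mean_f = sum(factors) / period
    if multiplicative:
        factors = [f / mean_f for f in factors]  # factors average to 1
        adjusted = [v / factors[t % period] for t, v in enumerate(y)]
    else:
        factors = [f - mean_f for f in factors]  # factors average to 0
        adjusted = [v - factors[t % period] for t, v in enumerate(y)]
    return adjusted, factors


# Hypothetical quarterly data: level 100 with a multiplicative seasonal pattern
quarters = [90.0, 110.0, 120.0, 80.0] * 3
adjusted, factors = seasonal_adjust(quarters)
print([round(f, 3) for f in factors])
```

The explicit check on non-positive values mirrors the restriction on the multiplicative method; the additive variant works with any sign.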
Filtering
Widely used in central banks, e.g. to forecast the values of exogenous indicators
Helps to reveal the «signal», i.e. the fluctuations that are worth forecasting
May «erase» not only random fluctuations but also part of the «signal»
The most common is the Hodrick-Prescott (HP) filter
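The HP trend τ minimises Σ(y_t − τ_t)² + λΣ(Δ²τ_t)², which leads to the linear system (I + λD′D)τ = y, with D the second-difference matrix. A minimal dense NumPy sketch, assuming λ = 1600 (the value commonly used for quarterly data) and a hypothetical simulated series; for long series a sparse solver or a library routine such as statsmodels' `hpfilter` would be preferable.

```python
import numpy as np

def hp_filter(y, lamb=1600.0):
    """Hodrick-Prescott filter; returns (trend, cycle).

    The trend tau minimises sum((y - tau)**2) + lamb * sum((second differences of tau)**2);
    the first-order condition is the linear system (I + lamb * D'D) tau = y.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    D = np.zeros((n - 2, n))  # second-difference matrix, rows [1, -2, 1]
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    trend = np.linalg.solve(np.eye(n) + lamb * D.T @ D, y)
    return trend, y - trend


# Hypothetical quarterly series: linear trend plus a cyclical wave and noise
rng = np.random.default_rng(0)
t = np.arange(40, dtype=float)
series = 10 + 0.2 * t + np.sin(t / 2) + rng.normal(scale=0.3, size=t.size)
trend, cycle = hp_filter(series)
```

A larger λ gives a smoother trend and pushes more of the movement into the cycle, which is exactly how the filter can «erase» part of the signal.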
Let’s begin with the basics.
Descriptive statistics are brief descriptive coefficients that summarize a given data set.
Descriptive statistics simply describe the data; they do not allow us to draw final conclusions about the underlying process or activity.
Descriptive Statistics
Measures of Location
Mean – arithmetic average: the sum of the values divided by the number of observations (influenced by extreme values)
Median – the middle value (or the average of the 2 middle values) of the series when the observations are ordered from smallest to largest (less sensitive to extreme values)
Max and Min values
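A small illustration, with hypothetical numbers, of why the median is less sensitive to extreme values than the mean, using Python's standard `statistics` module:

```python
import statistics

data = [3, 4, 4, 5, 6]          # hypothetical observations
with_outlier = data + [60]      # the same data plus one extreme value

print(statistics.mean(data), statistics.median(data))                  # mean 4.4, median 4
print(statistics.mean(with_outlier), statistics.median(with_outlier))  # mean about 13.67, median 4.5
print(min(with_outlier), max(with_outlier))                            # min 3, max 60
```

One extreme value roughly triples the mean but moves the median only from 4 to 4.5.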
Descriptive Statistics
Measures of scale or spread
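Variance – the average squared deviation of the observations from their mean:
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(Y_i - \bar{Y})^2$
Standard Deviation (std. dev.; also called sigma) – a measure of dispersion or spread in the series, i.e. a measure of stability:
$s = \sqrt{s^2}$
The simplest forecast is a confidence interval (95% probability):
$\bar{Y} \pm 1.96\,s$
The so-called «68–95–99.7 rule» in statistics (normal distribution):
1-sigma rule: $\bar{Y} \pm 1\sigma$ covers ~68% of the observations
2-sigma rule: $\bar{Y} \pm 2\sigma$ covers ~95%
3-sigma rule: $\bar{Y} \pm 3\sigma$ covers ~99.7%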
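A sketch of the interval forecast and the 68–95–99.7 rule on simulated normal data (mean 10 and standard deviation 2 are arbitrary assumptions). `statistics.stdev` and `statistics.variance` use the sample divisor n − 1; `interval_95` and `share_within` are illustrative helpers:

```python
import random
import statistics

def interval_95(y):
    """Mean, sample standard deviation (divisor n - 1) and mean +/- 1.96 s."""
    mean = statistics.mean(y)
    s = statistics.stdev(y)
    return mean, s, (mean - 1.96 * s, mean + 1.96 * s)

def share_within(y, mean, s, k):
    """Fraction of observations inside [mean - k*s, mean + k*s]."""
    return sum(abs(v - mean) <= k * s for v in y) / len(y)

random.seed(1)
sample = [random.gauss(10.0, 2.0) for _ in range(10_000)]  # simulated normal data
mean, s, (lo, hi) = interval_95(sample)
print(f"variance = {statistics.variance(sample):.3f}, 95% interval = [{lo:.2f}, {hi:.2f}]")
for k in (1, 2, 3):
    print(f"within ±{k} sigma: {share_within(sample, mean, s, k):.3f}")
```

The shares printed should be close to 0.68, 0.95 and 0.997; with real (non-normal) data they can differ noticeably.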
V. Ozolina&A.Auziņa-Emsiņa, Econometrics
Descriptive Statistics
Skewness – a measure of asymmetry of the distribution of the series around its mean:
$S = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{Y_i - \bar{Y}}{\hat{\sigma}}\right)^3$, where $\hat{\sigma} = s\sqrt{(n-1)/n}$
Descriptive Statistics
Skewness – a measure of asymmetry of the
distribution of the series around its mean
Symmetric distribution (such as the normal distribution): S = 0
Positive values indicate a long right tail
Negative values indicate a long left tail
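A direct translation of the moment-based skewness S = (1/n)Σ((Y_i − Ȳ)/σ̂)³ into Python, assuming σ̂ is computed with divisor n (the convention EViews uses); the three inputs are made-up series illustrating zero, positive and negative skewness:

```python
def skewness(y):
    """S = (1/n) * sum(((y_i - mean) / sigma_hat) ** 3), sigma_hat with divisor n."""
    n = len(y)
    mean = sum(y) / n
    sigma_hat = (sum((v - mean) ** 2 for v in y) / n) ** 0.5
    return sum(((v - mean) / sigma_hat) ** 3 for v in y) / n

print(skewness([1, 2, 3, 4, 5]))        # symmetric: approximately 0
print(skewness([1, 1, 1, 2, 10]))       # long right tail: positive
print(skewness([-10, -2, -1, -1, -1]))  # long left tail: negative
```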
Descriptive Statistics
Skewness
Descriptive Statistics
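Kurtosis – measures the peakedness or flatness of the distribution and the weight of its tails, i.e. how frequently large fluctuations can be observed:
$K = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{Y_i - \bar{Y}}{\hat{\sigma}}\right)^4$, where $\hat{\sigma} = s\sqrt{(n-1)/n}$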
Descriptive Statistics
Kurtosis – measures the peakedness or flatness of the distribution
If K = 3*: normal distribution
If K > 3*: peaked distribution (leptokurtic), heavy tails
If K < 3*: flat distribution (platykurtic), skinny or light tails
*If 3 is subtracted in the formula (excess kurtosis), then K = 0 for a normal distribution (this is the convention in MS Excel, etc.)
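A similar sketch for kurtosis, K = (1/n)Σ((Y_i − Ȳ)/σ̂)⁴, with an option to subtract 3 (excess kurtosis, as reported by Excel). The simulated samples are assumptions chosen because their theoretical kurtosis is known: 3 for the normal, 1.8 for the uniform (flat, light tails) and 6 for the Laplace distribution (peaked, heavy tails), generated here as an exponential variable with a random sign.

```python
import random

def kurtosis(y, excess=False):
    """K = (1/n) * sum(((y_i - mean) / sigma_hat) ** 4); excess=True subtracts 3."""
    n = len(y)
    mean = sum(y) / n
    sigma_hat = (sum((v - mean) ** 2 for v in y) / n) ** 0.5
    k = sum(((v - mean) / sigma_hat) ** 4 for v in y) / n
    return k - 3 if excess else k

random.seed(0)
size = 20_000
normal = [random.gauss(0.0, 1.0) for _ in range(size)]
uniform = [random.uniform(-1.0, 1.0) for _ in range(size)]                         # flat, light tails
laplace = [random.expovariate(1.0) * random.choice((-1, 1)) for _ in range(size)]  # peaked, heavy tails

for name, data in (("normal", normal), ("uniform", uniform), ("Laplace", laplace)):
    print(f"{name}: K = {kurtosis(data):.2f}, excess = {kurtosis(data, excess=True):.2f}")
```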
Descriptive Statistics
Kurtosis
Testing
The main ingredients of testing:
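H0: null hypothesis – the statement being tested; it is kept unless the data provide enough evidence against it
H1: alternative hypothesis – the general statement that holds if H0 is false
p-value = the probability, computed assuming H0 is true, of obtaining a test statistic at least as extreme as the observed one (it is not the probability that H0 is true)
p-value > 0.05 => accept (do not reject) H0
p-value < 0.05 => reject H0
If the p-value is not given, critical values are used
Decision to accept or reject H0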
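A minimal sketch of the p-value decision rule, assuming a test statistic that is standard normal under H0 (a z-test). The two-sided p-value is P(|Z| ≥ |z|) = erfc(|z|/√2); |z| = 1.96 gives p ≈ 0.05, which is why 1.96 is the 5% critical value. `two_sided_p_value` and `decide` are illustrative names.

```python
import math

def two_sided_p_value(z):
    """P(|Z| >= |z|) when the test statistic Z is standard normal under H0."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def decide(p_value, alpha=0.05):
    """Reject H0 when the p-value is below the significance level alpha."""
    return "reject H0" if p_value < alpha else "accept (do not reject) H0"

for z in (1.0, 1.5, 2.5):
    p = two_sided_p_value(z)
    print(f"z = {z}: p-value = {p:.4f} -> {decide(p)}")
```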
Descriptive Statistics
Jarque-Bera statistic – tests whether the series is normally distributed («Jarque-Bera statistic = test for normality»)
The test statistic measures how far the skewness and kurtosis of the series deviate from those of the normal distribution (S = 0, K = 3)
H0: the data have a normal distribution
If the reported probability (p-value) is small (usually < 0.05), H0 is rejected: the data do not have a normal distribution
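The Jarque-Bera statistic is JB = n/6 · (S² + (K − 3)²/4); under H0 it is asymptotically χ² with 2 degrees of freedom, whose upper-tail probability is exp(−JB/2). The sketch below checks this against the EViews output reported in this section (UK, DLUK and DLN_Y_); `jarque_bera` is an illustrative name.

```python
import math

def jarque_bera(n, skew, kurt):
    """JB = n/6 * (S**2 + (K - 3)**2 / 4) and its chi-square(2) p-value."""
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)  # upper tail of chi-square with 2 df
    return jb, p_value

# n, skewness and kurtosis as reported by EViews in this section
for name, n, s, k in (("UK", 40, 0.108323, 1.578219),
                      ("DLUK", 39, -0.215108, 3.610835),
                      ("DLN_Y_", 18, -1.591250, 5.692264)):
    jb, p = jarque_bera(n, s, k)
    print(f"{name}: JB = {jb:.4f}, p = {p:.6f}")
```

Of these three series only DLN_Y_ (the d(log) of Latvia's real GDP) has p < 0.05, so normality is rejected for it but not for UK or DLUK.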
Descriptive Statistics
Excel: Data → Data Analysis → Descriptive Statistics
Excel
| Statistic | Y | ln(Y) | d(ln(Y)) |
|---|---|---|---|
| Mean | 3176.923 | 7.91422 | 0.147352 |
| Standard Error | 530.8427 | 0.154158 | 0.016925 |
| Median | 2462 | 7.808729 | 0.115289 |
| Mode | #N/A | #N/A | #N/A |
| Standard Deviation | 1913.981 | 0.555823 | 0.061023 |
| Sample Variance | 3663322 | 0.308939 | 0.003724 |
| Kurtosis | 0.293998 | -0.85639 | 0.423699 |
| Skewness | 1.17146 | 0.491652 | 1.019058 |
| Range | 5871 | 1.724897 | 0.205669 |
| Minimum | 1273 | 7.149132 | 0.079296 |
| Maximum | 7144 | 8.874028 | 0.284965 |
| Sum | 41300 | 102.8849 | 1.91558 |
| Count | 13 | 13 | 13 |
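The Excel Data Analysis output can be reproduced with a short function. This is a sketch assuming Excel's documented SKEW and KURT definitions (sample-size-adjusted skewness and excess kurtosis) and the n − 1 divisor for the standard deviation; `excel_descriptive` is a hypothetical name, it needs at least 4 observations in a list, and ties in the mode are not necessarily resolved the way Excel resolves them. Standard Error is Standard Deviation/√Count, e.g. 1913.981/√13 ≈ 530.84 for Y above.

```python
import math
import statistics

def excel_descriptive(y):
    """Statistics reported by Excel's Descriptive Statistics tool (y: list, n >= 4)."""
    n = len(y)
    mean = statistics.mean(y)
    s = statistics.stdev(y)  # divisor n - 1
    z = [(v - mean) / s for v in y]
    skew = n / ((n - 1) * (n - 2)) * sum(t ** 3 for t in z)            # like Excel SKEW
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(t ** 4 for t in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))                    # like Excel KURT
    counts = {v: y.count(v) for v in y}
    top = max(counts.values())
    mode = next(v for v in y if counts[v] == top) if top > 1 else None  # Excel shows #N/A
    return {
        "Mean": mean, "Standard Error": s / math.sqrt(n), "Median": statistics.median(y),
        "Mode": mode, "Standard Deviation": s, "Sample Variance": s ** 2,
        "Kurtosis": kurt, "Skewness": skew, "Range": max(y) - min(y),
        "Minimum": min(y), "Maximum": max(y), "Sum": sum(y), "Count": n,
    }

for key, value in excel_descriptive([1, 2, 3, 4, 5]).items():
    print(f"{key:20} {value}")
```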
Excel – Latvia’s real GDP example
| Statistic | Y | log(Y) | dln(Y) |
|---|---|---|---|
| Mean | 19.01958 | 2.929184 | 0.036387 |
| Standard Error | 0.768002 | 0.04372 | 0.01498 |
| Median | 19.85241 | 2.988325 | 0.045958 |
| Mode | #N/A | #N/A | #N/A |
| Standard Deviation | 3.347643 | 0.190569 | 0.063554 |
| Sample Variance | 11.20672 | 0.036317 | 0.004039 |
| Kurtosis | -0.51697 | -0.07225 | 4.048338 |
| Skewness | -0.64666 | -0.92775 | -1.73972 |
| Range | 11.46765 | 0.654961 | 0.267845 |
| Minimum | 12.39656 | 2.517419 | -0.1555 |
| Maximum | 23.8642 | 3.17238 | 0.112341 |
| Sum | 361.3721 | 55.65449 | 0.654961 |
| Count | 19 | 19 | 18 |
Descriptive Statistics
EViews: Series → View → Descriptive Statistics & Tests
EViews
[Figure: histograms of UK, LUK and DLUK]
| Statistic | UK | LUK | DLUK |
|---|---|---|---|
| Sample | 1975 2015 | 1975 2015 | 1975 2015 |
| Observations | 40 | 40 | 39 |
| Mean | 1158919. | 13.75663 | 0.062383 |
| Median | 952576.0 | 13.76689 | 0.065379 |
| Maximum | 2222912. | 14.61433 | 0.236191 |
| Minimum | 195123.1 | 12.18139 | -0.136675 |
| Std. Dev. | 649969.1 | 0.713249 | 0.079692 |
| Skewness | 0.108323 | -0.685299 | -0.215108 |
| Kurtosis | 1.578219 | 2.458080 | 3.610835 |
| Jarque-Bera | 3.447327 | 3.620362 | 0.907085 |
| Probability | 0.178411 | 0.163625 | 0.635374 |
EViews – Latvia’s real GDP example
[Figure: histograms of Y, LOG_Y_ and DLN_Y_]
| Statistic | Y | LOG_Y_ | DLN_Y_ |
|---|---|---|---|
| Sample | 2000 2020 | 2000 2020 | 2000 2020 |
| Observations | 19 | 19 | 18 |
| Mean | 19.019584 | 2.929184 | 0.036387 |
| Median | 19.85241 | 2.988325 | 0.045958 |
| Maximum | 23.864203 | 3.172380 | 0.112341 |
| Minimum | 12.39656 | 2.517419 | -0.155505 |
| Std. Dev. | 3.347643 | 0.190569 | 0.063554 |
| Skewness | -0.5944441 | -0.852836 | -1.591250 |
| Kurtosis | 2.309401 | 2.645409 | 5.692264 |
| Jarque-Bera | 1.496551 | 2.402749 | 13.03244 |
| Probability | 0.473182 | 0.300780 | 0.001479 |
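The Excel and EViews outputs for Latvia's real GDP report the same mean and standard deviation but different skewness and kurtosis. EViews reports the moment-based S and K (normal: S = 0, K = 3), while Excel's SKEW and KURT apply small-sample adjustments and KURT also subtracts 3. A sketch of the standard conversion, assuming those definitions; `eviews_to_excel` is a hypothetical name, and the inputs are the EViews figures above:

```python
import math

def eviews_to_excel(n, skew, kurt):
    """Convert EViews moment-based skewness/kurtosis to Excel's SKEW and KURT."""
    g2 = kurt - 3.0                                   # excess kurtosis
    excel_skew = skew * math.sqrt(n * (n - 1)) / (n - 2)
    excel_kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)
    return excel_skew, excel_kurt

# EViews values for Latvia's real GDP (n, skewness, kurtosis)
for name, n, s, k in (("Y", 19, -0.5944441, 2.309401),
                      ("log(Y)", 19, -0.852836, 2.645409),
                      ("dln(Y)", 18, -1.591250, 5.692264)):
    sk, ku = eviews_to_excel(n, s, k)
    print(f"{name}: Excel skewness = {sk:.5f}, Excel kurtosis = {ku:.5f}")
```

The converted values match the Excel table (e.g. -0.64666 and -0.51697 for Y), so the two programs describe the same data; only the estimator conventions differ.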
Descriptive Statistics
EViews: Group → View → Descriptive Statistics & Tests
EViews
EViews – Latvia’s real GDP example
Names of the Variables
Outcome/effect: Y
  Resulting variable
  Dependent variable
  Endogenous variable
  Explained variable
  Predictand
  Regressand
  Target variable
Causes/causal variables: X1, X2, ... Xn
  Factors
  Independent variables
  Exogenous variables
  Explanatory variables
  Predictors
  Regressors
  Control variables
Tasks of Econometrics in the Research of Causal Relationships
Correlation analysis
Estimation of the quantitative effect of a factor on the resulting indicator
Covariance
Positive covariance – Xi tends to be greater than its mean when Yi is greater than its mean, and vice versa.
Negative covariance – Xi tends to be greater than its mean when Yi is smaller than its mean, and vice versa.
Zero covariance – e.g. when X and Y are independent (zero covariance alone does not prove independence)
Covariance: Example
Values of 3 variables are given in the table. Your task is to calculate the covariance for the pairs RA and RB, and RA and RC.
| Observation | RA | Δ (RA − mean) | RB | Δ (RB − mean) | RC |
|---|---|---|---|---|---|
| 1 | 11 | 11-10 = 1 | 8 | 8-12 = -4 | 10 |
| 2 | 10 | 10-10 = 0 | 10 | 10-12 = -2 | 9 |
| 3 | 9 | 9-10 = -1 | 16 | 16-12 = 4 | 8 |
| 4 | 12 | 12-10 = 2 | 10 | 10-12 = -2 | 11 |
| 5 | 8 | 8-10 = -2 | 16 | 16-12 = 4 | 7 |
| Mean | 10 | | 12 | | 9 |
Covariance: Example
cov(RA,RB) = (1/5)*((11-10)*(8-12) + (10-10)*(10-12) + (9-10)*(16-12) + (12-10)*(10-12) + (8-10)*(16-12)) =
= (1/5)*(1*(-4) + 0*(-2) + (-1)*4 + 2*(-2) + (-2)*4) =
= (1/5)*(-4 + 0 - 4 - 4 - 8) =
= -20/5 = -4
cov(RA,RC) = 2
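The worked example can be checked in a few lines of Python. Like the example, `cov` (an illustrative helper) divides by n; with the sample divisor n − 1 the results would be -5 and 2.5 instead.

```python
def cov(x, y):
    """Covariance with divisor n, as in the worked example."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n

RA = [11, 10, 9, 12, 8]
RB = [8, 10, 16, 10, 16]
RC = [10, 9, 8, 11, 7]
print(cov(RA, RB), cov(RA, RC))  # -4.0 2.0
```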
Correlation
-1 ≤ corr(X,Y) (also written rX,Y) ≤ 1
rX,Y < 0 – negative correlation
rX,Y > 0 – positive correlation
rX,Y = 0 – variables are uncorrelated (no linear correlation)
rX,Y = ±1 – perfect linear correlation
Only linear relations are analysed
Correlation ≠ causality:
  Spurious correlation
  Reverse (opposite) causality
Correlation: Example
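| Observation | RA | (RA − mean)² | RB | (RB − mean)² | RC |
|---|---|---|---|---|---|
| 1 | 11 | (11-10)² = 1 | 8 | (8-12)² = 16 | 10 |
| 2 | 10 | (10-10)² = 0 | 10 | (10-12)² = 4 | 9 |
| 3 | 9 | (9-10)² = 1 | 16 | (16-12)² = 16 | 8 |
| 4 | 12 | (12-10)² = 4 | 10 | (10-12)² = 4 | 11 |
| 5 | 8 | (8-10)² = 4 | 16 | (16-12)² = 16 | 7 |
| Mean | 10 | 2 | 12 | 11.2 | 9 |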
Correlation: Example ...
corr(RA,RB) = -0.845
corr(RA,RC) = 1
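The same data give the correlations. `corr` (an illustrative helper) divides the covariance by the product of the standard deviations, using divisor n throughout, so the divisor cancels and the result is the usual Pearson coefficient:

```python
import math

def corr(x, y):
    """Pearson correlation: covariance / (std(x) * std(y)), all with divisor n."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sxx = sum((a - mx) ** 2 for a in x) / n   # 2 for RA
    syy = sum((b - my) ** 2 for b in y) / n   # 11.2 for RB
    return sxy / math.sqrt(sxx * syy)

RA = [11, 10, 9, 12, 8]
RB = [8, 10, 16, 10, 16]
RC = [10, 9, 8, 11, 7]
print(round(corr(RA, RB), 3), corr(RA, RC))  # -0.845 1.0
```

RC is RA shifted down by 1, so their deviations from the mean are identical and the correlation is exactly 1.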
Correlation Diagram or Scatter Plot
[Figure: scatter plot of y against x with a fitted regression line; b0 marks the intercept]
Correlation -> Graphical Analysis
Check the data!
Types of Regression
Depending on the number of factors:
  Single Regression
  Multiple Regression
Depending on form:
  Linear
  Non-linear
Depending on character:
  Positive (direct) regression
  Negative (opposite) regression
Objectives of Regression Analysis
To determine the form of regression:
  Linear
  Non-linear
To determine the regression function:
  Estimate particular values of the coefficients
To estimate unknown values of the dependent variable:
  Calculate the value of Y given particular values of X
ECONOMETRICS
Single Regression
Single Linear Regression
Model: Yi = β0 + β1Xi + ui,
Where
the subscript i runs over observations, i = 1, 2, ... n;
Yi – dependent variable, regressand, left-hand
variable;
Xi – independent variable, regressor, right-hand
variable;
β0 + β1Xi – population regression line or population
regression function;
β0 – intercept of the population regression line;
β1 – slope of the population regression line;
ui (sometimes also εi) – error term.
Single Linear Regression
[Figure: number of crimes per 10 000 residents against GDP per capita (Ls); population regression line β0 + β1Xi with errors u1 and u10 for the observations (X1,Y1) and (X10,Y10)]
Estimating the Coefficients of the Linear Regression Model
Ordinary Least Squares (OLS)
Coefficients are estimated from a particular sample, not from the whole population, which is unknown:
$\sum_{i=1}^{n}\hat{u}_i^2 = \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2 \to \min$
Ordinary Least Squares OLS
Using the linear function $\hat{Y}_i = b_0 + b_1X_i$, we obtain
$\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n}\left[Y_i - (b_0 + b_1X_i)\right]^2 \to \min$
$\sum_{i=1}^{n}\hat{u}_i^2 = F(b_0, b_1)$
$\frac{\partial F}{\partial b_0} = 0, \qquad \frac{\partial F}{\partial b_1} = 0$
Ordinary Least Squares OLS
Differentiation results in a system of normal equations:
$n\hat{\beta}_0 + \hat{\beta}_1\sum_{i=1}^{n}X_i = \sum_{i=1}^{n}Y_i$
$\hat{\beta}_0\sum_{i=1}^{n}X_i + \hat{\beta}_1\sum_{i=1}^{n}X_i^2 = \sum_{i=1}^{n}Y_iX_i$
Solution of the normal equations yields the OLS estimators of β0 and β1:
$\hat{\beta}_1 = \frac{n\sum_{i=1}^{n}X_iY_i - \sum_{i=1}^{n}X_i\sum_{i=1}^{n}Y_i}{n\sum_{i=1}^{n}X_i^2 - \left(\sum_{i=1}^{n}X_i\right)^2}$
$\hat{\beta}_0 = \frac{\sum_{i=1}^{n}Y_i}{n} - \hat{\beta}_1\frac{\sum_{i=1}^{n}X_i}{n}$
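The closed-form estimators can be coded directly from the normal equations. A sketch with hypothetical data; `ols` is an illustrative name. The last line checks the normal equations themselves in residual form: the residuals sum to zero and are orthogonal to X.

```python
def ols(x, y):
    """OLS estimates (b0, b1) from the closed-form solution of the normal equations."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    b1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    b0 = sy / n - b1 * sx / n
    return b0, b1

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = ols(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(f"Y_hat = {b0:.3f} + {b1:.3f} X")
# Normal equations in residual form: sum(u_hat) = 0 and sum(X * u_hat) = 0 (up to rounding)
print(sum(residuals), sum(xi * ui for xi, ui in zip(x, residuals)))
```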
Ordinary Least Squares OLS
Regression line: $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1X_i$
Estimated Yi (predicted value for Xi): $\hat{Y}_i$
β0 estimator: $\hat{\beta}_0$
β1 estimator: $\hat{\beta}_1$
Error (residual) for the ith observation: $\hat{u}_i = Y_i - \hat{Y}_i$
Ordinary Least Squares OLS
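Where does the error come from?
  Not the correct model
  Not the correct parameters
$\hat{u}_i = u_i + (\beta_0 - \hat{\beta}_0) + (\beta_1 - \hat{\beta}_1)X_i$
[Figure: observation (xi, yi), fitted value ŷi on the regression line and residual ûi]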
Ordinary Least Squares OLS
$\hat{Y}_i = 157.16 + 0.0172X_i$
If GDP per capita increases by 1 unit (1 Ls), the number of crimes per 10 000 residents increases by 0.0172 on average
Is this effect constant?
Scale of Correlation Diagram ...
The weaker the relationship, the more horizontal the line should be
Least Squares Assumptions
The conditional distribution of ui given Xi has a
mean of zero.
[Figure: distributions of Y when X = 2, 5 and 8, centred at E(Y|X=2), E(Y|X=5) and E(Y|X=8) on the population regression line β0 + β1Xi]
Least Squares Assumptions
(Xi,Yi), i = 1, ..., n are independently and
identically distributed
Large outliers are unlikely
OLS Assumptions
A1: E(ui) = 0 – the expected (average) value of the error term is 0
A2: Var(ui) = σ² – the variance of the error is constant and finite (homoscedasticity)
A3: Cov(ui,uj) = 0 for i ≠ j – errors are uncorrelated across observations (no autocorrelation)
A4: Cov(ui,Xi) = 0 – the error term and X are not related
A5: ui is normally distributed
Properties of OLS
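If A1 and A4 hold, OLS is unbiased, i.e. $E(\hat{\beta} - \beta) = 0$, so $E(\hat{\beta}) = \beta$
If A1, A2, A3 and A4 hold, OLS is BLUE (Best Linear Unbiased Estimator):
  Linear: $\hat{\beta}_1 - \beta_1 = \sum_{i=1}^{n} w_i u_i$, with $w_i = (X_i - \bar{X}) / \sum_{j=1}^{n}(X_j - \bar{X})^2$
  Best: $Var(\hat{\beta})$ is the smallest obtainable value among linear unbiased estimators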
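Unbiasedness can be illustrated by simulation: draw many samples from a population that satisfies A1–A5 by construction, estimate β1 each time, and average the estimates. A minimal sketch; β0 = 1, β1 = 0.5, σ = 2 and the fixed regressor X = 1, ..., 20 are arbitrary assumptions, and `slope` is an illustrative helper.

```python
import random

def slope(x, y):
    """OLS slope: sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

random.seed(42)
beta0, beta1, sigma = 1.0, 0.5, 2.0      # assumed population values
x = [float(i) for i in range(1, 21)]     # fixed regressor, X = 1, ..., 20
estimates = []
for _ in range(5_000):
    u = [random.gauss(0.0, sigma) for _ in x]   # i.i.d. N(0, sigma^2) errors
    y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]
    estimates.append(slope(x, y))

mean_est = sum(estimates) / len(estimates)
print(f"true beta1 = {beta1}, average estimate = {mean_est:.4f}")
```

Individual estimates scatter around 0.5, but their average is very close to it, which is what E(β̂1) = β1 means.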
Properties of OLS
If A4 and a part of A2 hold, OLS is consistent (usable): for any ε > 0, $\lim_{n \to \infty} P(|\hat{\beta} - \beta| > \varepsilon) = 0$
If A1, A2, A3 and A4 hold, the standard formulas for the variance of $\hat{\beta}$ apply
Properties of OLS
According to assumptions A1, A2 and A5, errors are normally distributed: ui ~ N(0, σ²)
As OLS estimators are linear functions of the error term, they are also normally distributed: $\hat{\beta} \sim N(\beta, \sigma^2_{\hat{\beta}})$
It is possible to carry out hypothesis tests
It is possible to use confidence intervals
What to do if the errors are not normally distributed?
Sample Size
Estimated coefficients have an approximately jointly normal sampling distribution if the sample size is large (central limit theorem), even when the errors are not normally distributed.
Rules of thumb: N > 30; N > 100 observations
The larger the variance of Xi, the smaller the variance of the estimated coefficients
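Under A1–A4 with homoscedastic errors, the variance of the slope estimator is given by the textbook formula Var(β̂1) = σ²/Σ(X_i − X̄)², which makes the last point explicit: more spread in X (or more observations) means a larger denominator. A sketch with two hypothetical regressors that have the same mean but different spread, and an assumed σ² = 4; `slope_variance` is an illustrative name.

```python
def slope_variance(x, sigma2):
    """Var(beta1_hat) = sigma^2 / sum((x_i - x_bar)^2), homoscedastic errors assumed."""
    n = len(x)
    mx = sum(x) / n
    return sigma2 / sum((a - mx) ** 2 for a in x)

narrow = [9.0, 9.5, 10.0, 10.5, 11.0]   # hypothetical X with little spread
wide = [2.0, 6.0, 10.0, 14.0, 18.0]     # same mean, much larger spread
print(slope_variance(narrow, 4.0), slope_variance(wide, 4.0))  # 1.6 0.025
```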