Data Analysis

ECONOMETRICS
Data Analysis
V. Ozolina and A. Auziņa-Emsiņa, Econometrics

Logarithms and First Differences (∆ or d)

Two transformations are used: log and d(log).

[Figure: the series UK in levels, LUK = log(UK), and DLUK = d(log(UK)), 1975-2015. A second example uses Latvia's real GDP.]

Seasonal Adjustment

Aim: to remove or reduce seasonal or cyclical fluctuations, so that only the unpredictable fluctuations are analysed and forecast.
- Moving average methods:
  - Multiplicative (cannot be used if the values are 0 or negative); easier to interpret, in %
  - Additive
- Seasonal dummies

Filtering

- Widely used in central banks to forecast the values of exogenous indicators
- Helps to reveal the «signal»: the fluctuations that are worth forecasting
- Can «erase» not only the random fluctuations, but also part of the «signal»
- The most common is the Hodrick-Prescott filter

Let's begin with the basics.

Descriptive statistics are brief descriptive coefficients that summarize a given data set. They are simply a way to describe our data; they do not allow final conclusions to be drawn about the process or activity.

Descriptive Statistics: Measures of Location

- Mean: the arithmetic average, sum divided by the number of observations (influenced by extreme values)
- Median: the middle value (or the average of the 2 middle values) of the series when the observations are ordered from smallest to largest (less sensitive to extreme values)
- Max and Min values

Descriptive Statistics: Measures of Scale or Spread

Variance: the average value of the typical (squared) fluctuations around the mean

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2$$

Standard deviation (std. dev.; also called sigma): a measure of dispersion or spread in the series, a measure of stability

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}$$

The simplest forecast is a confidence interval (95% probability): $\bar{y} \pm 1.96\,\sigma$

The so-called «68-95-99.7 rule» in statistics (normal distribution):
- 1-sigma rule: $\bar{y} \pm 1\sigma$ covers ~68%
- 2-sigma rule: $\bar{y} \pm 2\sigma$ covers ~95%
- 3-sigma rule: $\bar{y} \pm 3\sigma$ covers ~99.7%

Descriptive Statistics: Skewness

Skewness: a measure of asymmetry of the distribution of the series around its mean

$$S = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{y_i - \bar{y}}{\sigma}\right)^3$$

- Symmetric distribution (such as the normal distribution): S = 0
- Positive values indicate a long right tail
- Negative values indicate a long left tail

[Figure: illustration of skewness]

Descriptive Statistics: Kurtosis

Kurtosis: measures the peakedness or flatness of the distribution, that is, how frequently large fluctuations can be observed

$$K = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{y_i - \bar{y}}{\sigma}\right)^4$$

- If K = 3*: normal distribution
- If K > 3*: peaked distribution (leptokurtic), heavy tails
- If K < 3*: flat distribution (platykurtic), skinny or light tails

*If 3 is subtracted in the formula, then K = 0 for a normal distribution (this is the case in MS Excel etc.)

[Figure: illustration of kurtosis]

Testing

The main ingredients of testing:
- H0: null hypothesis, a statement that can be true
- H1: alternative hypothesis, the more general statement
- p-value: the probability, assuming H0 is true, of obtaining a test statistic at least as extreme as the one observed
  - p-value > 0.05 => accept (do not reject) H0
  - p-value < 0.05 => reject H0
- If the p-value is not given, critical values are used
- Decision to accept or reject H0
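The measures on the slides above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the 1/N (population) convention is assumed for the variance, skewness and kurtosis, the function names `log_diff`, `describe` and `decide` are hypothetical, and `sample` is made-up data rather than a series from the lecture.

```python
import math
from statistics import mean, median


def log_diff(series):
    """First differences of natural logs: log(y_t) - log(y_{t-1})."""
    logs = [math.log(y) for y in series]  # requires strictly positive values
    return [b - a for a, b in zip(logs, logs[1:])]


def describe(series):
    """Location, spread and shape measures (assumed 1/N convention)."""
    n = len(series)
    m = mean(series)
    variance = sum((y - m) ** 2 for y in series) / n
    sd = math.sqrt(variance)
    skewness = sum(((y - m) / sd) ** 3 for y in series) / n
    # Not excess kurtosis: a normal distribution has kurtosis 3 here.
    kurtosis = sum(((y - m) / sd) ** 4 for y in series) / n
    return {
        "mean": m,
        "median": median(series),
        "min": min(series),
        "max": max(series),
        "variance": variance,
        "std_dev": sd,
        "skewness": skewness,
        "kurtosis": kurtosis,
        # Simplest forecast: mean +/- 1.96 standard deviations.
        "interval_95": (m - 1.96 * sd, m + 1.96 * sd),
    }


def decide(p_value, alpha=0.05):
    """Decision rule from the Testing slide."""
    return "reject H0" if p_value < alpha else "do not reject H0"


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
stats = describe(sample)
for name, value in stats.items():
    print(name, value)
print(decide(0.03))  # hypothetical p-value
```

Statistical packages may use N - 1 instead of N in some of these formulas, so their output can differ slightly from this sketch.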
Descriptive Statistics: Jarque-Bera

Jarque-Bera statistic: used for testing whether the series is normally distributed («Jarque-Bera statistic = test for normality»).
- The test statistic measures the difference of the skewness and kurtosis of the series from those of the normal distribution
- H0: the data have a normal distribution
- If the reported probability is small (usually < 0.05), the data do not have a normal distribution

Descriptive Statistics in Excel

Excel: Data → Data Analysis

Statistic             Y           ln(Y)       d(ln(Y))
Mean                  3176.923    7.91422     0.147352
Standard Error        530.8427    0.154158    0.016925
Median                2462        7.808729    0.115289
Mode                  #N/A        #N/A        #N/A
Standard Deviation    1913.981    0.555823    0.061023
Sample Variance       3663322     0.308939    0.003724
Kurtosis              0.293998    -0.85639    0.423699
Skewness              1.17146     0.491652    1.019058
Range                 5871        1.724897    0.205669
Minimum               1273        7.149132    0.079296
Maximum               7144        8.874028    0.284965
Sum                   41300       102.8849    1.91558
Count                 13          13          13

Excel: Latvia's real GDP example

Statistic             Y           log(Y)      dln(Y)
Mean                  19.01958    2.929184    0.036387
Standard Error        0.768002    0.04372     0.01498
Median                19.85241    2.988325    0.045958
Mode                  #N/A        #N/A        #N/A
Standard Deviation    3.347643    0.190569    0.063554
Sample Variance       11.20672    0.036317    0.004039
Kurtosis              -0.51697    -0.07225    4.048338
Skewness              -0.64666    -0.92775    -1.73972
Range                 11.46765    0.654961    0.267845
Minimum               12.39656    2.517419    -0.1555
Maximum               23.8642     3.17238     0.112341
Sum                   361.3721    55.65449    0.654961
Count                 19          19          18

Descriptive Statistics in EViews

EViews: Series → View → Descriptive Statistics & Tests

[Figure: histograms of the series UK, LUK and DLUK, sample 1975 2015]

Statistic       UK          LUK         DLUK
Observations    40          40          39
Mean            1158919.    13.75663    0.062383
Median          952576.0    13.76689    0.065379
Maximum         2222912.    14.61433    0.236191
Minimum         195123.1    12.18139    -0.136675
Std. Dev.       649969.1    0.713249    0.079692
Skewness        0.108323    -0.685299   -0.215108
Kurtosis        1.578219    2.458080    3.610835
Jarque-Bera     3.447327    3.620362    0.907085
Probability     0.178411    0.163625    0.635374

EViews: Latvia's real GDP example

[Figure: histograms of the series Y, LOG_Y_ and DLN_Y_, sample 2000 2020]

Statistic       Y           LOG_Y_      DLN_Y_
Observations    19          19          18
Mean            19.01958    2.929184    0.036387
Median          19.85241    2.988325    0.045958
Maximum         23.86420    3.172380    0.112341
Minimum         12.39656    2.517419    -0.155505
Std. Dev.       3.347643    0.190569    0.063554
Skewness        -0.594444   -0.852836   -1.591250
Kurtosis        2.309401    2.645409    5.692264
Jarque-Bera     1.496551    2.402749    13.03244
Probability     0.473182    0.300780    0.001479

EViews: Group → View → Descriptive Statistics & Tests

[Screenshots: EViews group statistics, including the Latvia's real GDP example]

Denominations of the Variables

Outcomes / Effect: Y       Causes / Causal variables: X1, X2, ..., Xn
Resulting variable         Factors
Dependent variable         Independent variables
Endogenous variable        Exogenous variables
Explained variable         Explanatory variables
Predictand                 Predictors
Regressand                 Regressors
Target variable            Control variables

Tasks of Econometrics in Research of Causalities

- Correlation analysis
- Estimation of the quantitative effect of a factor on the resulting indicator

Covariance

- Positive covariance: Xi is greater than its mean when Yi is greater than its mean, and vice versa
- Negative covariance: Xi is greater than its mean when Yi is smaller than its mean, and vice versa
- Zero covariance: arises when X and Y are independent

Covariance: Example

The values of 3 variables are given in the table. Your task is to calculate the covariance for the pairs RA and RB as well as RA and RC. The last row gives the means.

RA    Δ             RB    Δ             RC
11    11-10 = 1     8     8-12 = -4     10
10    10-10 = 0     10    10-12 = -2    9
9     9-10 = -1     16    16-12 = 4     8
12    12-10 = 2     10    10-12 = -2    11
8     8-10 = -2     16    16-12 = 4     7
10                  12                  9
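The task set in the covariance table can be checked with a short sketch. It assumes the population convention (division by the number of observations, here 5), and the helper name `covariance` is illustrative rather than part of the lecture.

```python
def covariance(x, y):
    """Population covariance: mean product of deviations from the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n


# Values from the example table.
RA = [11, 10, 9, 12, 8]
RB = [8, 10, 16, 10, 16]
RC = [10, 9, 8, 11, 7]

cov_ab = covariance(RA, RB)
cov_ac = covariance(RA, RC)
print(cov_ab)  # -4.0
print(cov_ac)  # 2.0
```

The negative sign for RA and RB reflects that RA tends to be above its mean when RB is below its mean; RA and RC move together.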
Covariance: Example

$$\begin{aligned}
\operatorname{cov}(RA,RB) &= \tfrac{1}{5}\big[(11-10)(8-12) + (10-10)(10-12) + (9-10)(16-12) + (12-10)(10-12) + (8-10)(16-12)\big] \\
&= \tfrac{1}{5}\big[1\cdot(-4) + 0\cdot(-2) + (-1)\cdot 4 + 2\cdot(-2) + (-2)\cdot 4\big] \\
&= \tfrac{1}{5}(-4 + 0 - 4 - 4 - 8) = -20/5 = -4
\end{aligned}$$

$$\operatorname{cov}(RA,RC) = 2$$

Correlation

$-1 \le \operatorname{corr}(X,Y) = r_{X,Y} \le 1$
- $r_{X,Y} < 0$: negative correlation
- $r_{X,Y} > 0$: positive correlation
- $r_{X,Y} = 0$: the variables are uncorrelated (no linear correlation)
- $r_{X,Y} = \pm 1$: perfect correlation

Only linear relations are analysed.

Correlation ≠ causality:
- Spurious correlation
- Reverse causality

Correlation: Example

The last row gives the means of the series and the variances (the averages of the squared deviations).

RA    Δ²              RB    Δ²               RC
11    (11-10)² = 1    8     (8-12)² = 16     10
10    (10-10)² = 0    10    (10-12)² = 4     9
9     (9-10)² = 1     16    (16-12)² = 16    8
12    (12-10)² = 4    10    (10-12)² = 4     11
8     (8-10)² = 4     16    (16-12)² = 16    7
10    2               12    11.2             9

corr(RA,RB) = -0.845
corr(RA,RC) = 1

Correlation Diagram or Scatter Plot

[Figure: scatter plot of y against x with a fitted line and intercept b0]

Correlation -> Graphical Analysis

[Figure: graphical analysis of correlation]

Check the data!

Types of Regression

- Depending on the number of factors: single regression, multiple regression
- Depending on form: linear, non-linear
- Depending on character: positive (direct) regression, negative (opposite) regression

Objectives of Regression Analysis

- To determine the form of regression: linear or non-linear
- To determine the regression function: estimate particular values of the coefficients
- To estimate unknown values of the dependent variable: calculate the value of Y given particular values of X

ECONOMETRICS
Single Regression

Single Linear Regression

Model:

$$Y_i = \beta_0 + \beta_1 X_i + u_i$$

where
- the subscript i runs over observations, i = 1, 2, ..., n;
- $Y_i$: dependent variable, regressand, left-hand variable;
- $X_i$: independent variable, regressor, right-hand variable;
- $\beta_0 + \beta_1 X_i$: population regression line or population regression function;
- $\beta_0$: intercept of the population regression line;
- $\beta_1$: slope of the population regression line;
- $u_i$ (sometimes also $\varepsilon_i$): error term.

[Figure: number of crimes per 10 000 residents against GDP per capita (Ls), showing the line $\beta_0 + \beta_1 X_i$, the points $(X_1, Y_1)$ and $(X_{10}, Y_{10})$, and the errors $u_1$ and $u_{10}$]

Estimating the Coefficients of the Linear Regression Model

Ordinary Least Squares (OLS)

Coefficients are estimated for a particular sample, not for the whole population, which is unknown.

$$\sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2 \to \min$$

Using the linear function $\hat{Y}_i = b_0 + b_1 X_i$, we obtain

$$\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2 = \sum_{i=1}^{n}\left[Y_i - (b_0 + b_1 X_i)\right]^2 \to \min$$

$$\sum_{i=1}^{n} \hat{u}_i^2 = F(b_0, b_1), \qquad \frac{\partial F}{\partial b_0} = 0, \qquad \frac{\partial F}{\partial b_1} = 0$$

Differentiation results in a system of normal equations:

$$n\,\hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} X_i = \sum_{i=1}^{n} Y_i$$

$$\hat{\beta}_0 \sum_{i=1}^{n} X_i + \hat{\beta}_1 \sum_{i=1}^{n} X_i^2 = \sum_{i=1}^{n} X_i Y_i$$

The solution of the normal equations yields the OLS estimators of $\beta_0$ and $\beta_1$:

$$\hat{\beta}_1 = \frac{n\sum_{i=1}^{n} X_i Y_i - \sum_{i=1}^{n} X_i \sum_{i=1}^{n} Y_i}{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}, \qquad \hat{\beta}_0 = \frac{\sum_{i=1}^{n} Y_i}{n} - \hat{\beta}_1 \frac{\sum_{i=1}^{n} X_i}{n}$$

- Regression line: $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$
- Estimated $Y_i$, the predicted value for $X_i$: $\hat{Y}_i$
- $\beta_0$ estimator: $\hat{\beta}_0$
- $\beta_1$ estimator: $\hat{\beta}_1$
- Error for the i-th observation: $\hat{u}_i = Y_i - \hat{Y}_i$

Where does the error come from?
- Not the correct model
- Not the correct parameters

$$\hat{u}_i = u_i + (\beta_0 - \hat{\beta}_0) + (\beta_1 - \hat{\beta}_1) X_i$$

[Figure: an observation $(x_i, y_i)$, its fitted value $\hat{y}_i$ on the regression line, and the residual $\hat{u}_i$ between them]
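The closed-form solution of the normal equations can be tried out directly. The sketch below is a minimal illustration: the function name `ols_fit` and the five data points are hypothetical and are not taken from the lecture's crime and GDP example.

```python
def ols_fit(x, y):
    """Solve the two normal equations of single linear regression."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    intercept = sum_y / n - slope * sum_x / n
    return intercept, slope


# Hypothetical sample, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = ols_fit(x, y)
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

print(round(b0, 4), round(b1, 4))
# The normal equations imply that the residuals sum to zero
# and are uncorrelated with x in the sample.
print(round(sum(residuals), 10))
print(round(sum(r * xi for r, xi in zip(residuals, x)), 10))
```

The two printed sums are the normal equations rewritten in terms of residuals, which gives a quick sanity check on any fitted line.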
Ordinary Least Squares OLS

$$\hat{Y}_i = 157.16 + 0.0172\,X_i$$

When GDP per capita increases by 1 unit, the number of crimes increases by 0.0172 units. Constantly?

Scale of Correlation Diagram

[Figure: correlation diagrams drawn at different scales]

The weaker the relationship, the more horizontal the line should be.

Least Squares Assumptions

- The conditional distribution of $u_i$ given $X_i$ has a mean of zero.

[Figure: distributions of Y when X = 2, X = 5 and X = 8, centred on the line $\beta_0 + \beta_1 X_i$ at E(Y|X=2), E(Y|X=5) and E(Y|X=8)]

- $(X_i, Y_i)$, i = 1, ..., n, are independently and identically distributed
- Large outliers are unlikely

OLS Assumptions

- A1: $E(u_i) = 0$. The expected (average) value of the error term is 0.
- A2: $\operatorname{Var}(u_i) = \sigma^2$. The variance of the error is constant and finite (homoscedasticity).
- A3: $\operatorname{Cov}(u_i, u_j) = 0$. The errors are statistically independent (no autocorrelation).
- A4: $\operatorname{Cov}(u_i, X_i) = 0$. Variations of the error term and of X are not related.
- A5: $u_i$ is normally distributed.

Properties of OLS

- If A1 and A4 hold, OLS is unbiased, i.e., $E(\hat{\beta} - \beta) = 0$
- If A1, A2, A3 and A4 hold, OLS is BLUE (Best Linear Unbiased Estimator): $\hat{\beta} - \beta = \sum_i w_i u_i$, a linear function of the errors, and $\operatorname{Var}(\hat{\beta} - \beta)$ is the smallest obtainable value
- If A4 and a part of A2 hold, then OLS is consistent (usable): $\lim_{n \to \infty} P\left(|\hat{\beta} - \beta| > \delta\right) = 0$ for any $\delta > 0$
- If A1, A2, A3 and A4 hold, we have the formulas for the variance of $\hat{\beta} - \beta$

According to the assumptions, the errors are normally distributed: $u_i \sim N(0, \sigma^2)$.

As the OLS estimators are linearly related to the error term, they are also normally distributed: $\hat{\beta} \sim N\left(\beta, \operatorname{Var}(\hat{\beta} - \beta)\right)$.
- It is possible to carry out hypothesis testing
- It is possible to use confidence intervals

What to do if the errors are not normally distributed?

Sample Size

- Estimated coefficients have a jointly normal sampling distribution if the sample size is very large: N > 30; N > 100 observations
- The larger the variance of $X_i$, the smaller the variance of the coefficient estimates
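The last point can be made explicit with the standard textbook expression for the variance of the OLS slope. This is a worked formula under stated assumptions, not one reproduced from the slides: it assumes A2 and A3 hold and treats the X values as fixed, and the symbol $s_X^2$ is introduced here only for illustration.

```latex
% Variance of the OLS slope under homoscedastic, uncorrelated errors
\operatorname{Var}(\hat{\beta}_1)
  = \frac{\sigma^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}
  = \frac{\sigma^2}{n \, s_X^2},
\qquad
s_X^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2
```

Both a larger sample size n and a larger spread of X (larger $s_X^2$) increase the denominator, so the slope is estimated more precisely.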
