Data Analysis

Lecture notes for the course «Data Analysis»
ECONOMETRICS
Data Analysis
V. Ozolina, Econometrics
A.Auziņa-Emsiņa, Econometrics

Logarithms and First Differences (∆ or d)
• log: the natural logarithm of the series, $\log Y_t$
• d(log): the first difference of the logarithm, $\Delta \log Y_t = \log Y_t - \log Y_{t-1}$; for small changes it is approximately the growth rate of Y
[Figure: Latvia's real GDP]
[Figure: UK (levels) and LUK = log(UK), 1975–2015]
[Figure: DLUK = d(log(UK)), 1975–2015]

Seasonal Adjustment
• Aim: to remove or reduce the seasonal or cyclical fluctuations, so that only the unpredictable fluctuations are analysed and forecast
• Moving average methods:
  – Multiplicative: cannot be used if the values are 0 or negative; results are easier to interpret (in %)
  – Additive
• Seasonal dummies

Filtering
• Widely used in central banks to forecast the values of exogenous indicators
• Helps to reveal a «signal», i.e. the fluctuations that are worth forecasting
• Can «erase» not only the random fluctuations but also a part of the «signal»
• The most common is the Hodrick-Prescott filter

Descriptive Statistics
Let's begin with the basics. Descriptive statistics are brief descriptive coefficients that summarize a given data set. They are simply a way to describe our data; they do not allow us to draw final conclusions about the process or activity.

Measures of Location
• Mean: the arithmetic average, i.e. the sum of the observations divided by their number (influenced by extreme values)
$$\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$$
• Median: the middle value (or the average of the 2 middle values) of the series when the observations are ordered from the smallest to the largest (less sensitive to extreme values)
• Max and Min values

Measures of Scale or Spread
• Variance: the average squared deviation from the mean, i.e. the typical size of the fluctuations
$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2$$
• Standard deviation (std. dev., also called sigma): a measure of dispersion or spread in the series, i.e. a measure of stability
$$\sigma = \sqrt{\sigma^2}$$
  (Excel's «Sample Variance» and «Standard Deviation» and EViews' «Std. Dev.» divide by n - 1 instead of n.)
• The simplest forecast is a confidence interval (95% probability): $\bar{Y} \pm 1.96\,\sigma$
• The so-called «68–95–99.7 rule» in statistics (normal distribution):
  – 1-sigma rule: $\bar{Y} \pm 1\sigma$ covers ~68%
  – 2-sigma rule: $\bar{Y} \pm 2\sigma$ covers ~95%
  – 3-sigma rule: $\bar{Y} \pm 3\sigma$ covers ~99.7%

Skewness
• A measure of the asymmetry of the distribution of the series around its mean
$$S = \frac{\frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^3}{\left(\frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2\right)^{3/2}}$$
• Symmetric distribution (such as the normal distribution): S = 0
• Positive values indicate a long right tail
• Negative values indicate a long left tail
[Figure: skewness]

Kurtosis
• Measures the peakedness or flatness of the distribution, i.e. how frequently we can observe large fluctuations
$$K = \frac{\frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^4}{\left(\frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2\right)^{2}}$$
• If K = 3*: normal distribution
• If K > 3*: peaked distribution (leptokurtic), heavy tails
• If K < 3*: flat distribution (platykurtic), skinny or light tails
*If 3 is subtracted in the formula (excess kurtosis), then K = 0 for a normal distribution (this is the case in MS Excel etc.).
[Figure: kurtosis]

Testing
The main ingredients of testing:
• H0: the null hypothesis, a specific statement that may be true
• H1: the alternative hypothesis, the more general statement that holds if H0 is rejected
• p-value: the probability of obtaining a test statistic at least as extreme as the observed one, assuming H0 is true
  – p-value > 0.05 ⇒ accept (do not reject) H0
  – p-value < 0.05 ⇒ reject H0
• If the p-value is not given, critical values are used
• Decision to accept or reject H0
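A minimal Python sketch of the descriptive statistics above, assuming the population (1/n) formulas for the variance, skewness and kurtosis given on the slides. The function names `describe`, `jarque_bera` and `decide` are our own. It also includes the Jarque-Bera statistic discussed next, in its standard form JB = n/6 · (S² + (K - 3)²/4); its p-value comes from a χ² distribution with 2 degrees of freedom, whose upper tail is exactly exp(-JB/2). Spreadsheet and EViews figures may differ slightly from `describe`, because they use n - 1 in the standard deviation and Excel also applies small-sample corrections to skewness and kurtosis.

```python
import math
from statistics import median


def describe(y):
    """Descriptive statistics with the 1/n (population) formulas from the slides."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    var = sum(d ** 2 for d in dev) / n              # variance sigma^2
    if var == 0:
        raise ValueError("zero variance: skewness and kurtosis are undefined")
    sd = math.sqrt(var)                              # standard deviation sigma
    skew = (sum(d ** 3 for d in dev) / n) / var ** 1.5
    kurt = (sum(d ** 4 for d in dev) / n) / var ** 2  # K = 3 for a normal distribution
    return {
        "n": n,
        "mean": mean,
        "median": median(y),
        "min": min(y),
        "max": max(y),
        "variance": var,
        "std": sd,
        "skewness": skew,
        "kurtosis": kurt,
        "excess_kurtosis": kurt - 3,                 # 0 for a normal distribution
        "interval_95": (mean - 1.96 * sd, mean + 1.96 * sd),
    }


def jarque_bera(n, skew, kurt):
    """Jarque-Bera statistic and its chi-square(2) p-value."""
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    p_value = math.exp(-jb / 2)   # upper tail of chi-square with 2 degrees of freedom
    return jb, p_value


def decide(p_value, alpha=0.05):
    """Decision rule from the testing slide."""
    return "reject H0" if p_value < alpha else "do not reject H0"


if __name__ == "__main__":
    # RA values from the covariance example in these notes
    print(describe([11, 10, 9, 12, 8]))
    # Skewness and kurtosis of DLN_Y_ from the EViews output (n = 18)
    jb, p = jarque_bera(18, -1.591250, 5.692264)
    print(round(jb, 5), round(p, 6), decide(p))
```

Fed with the skewness and kurtosis of DLN_Y_ from the EViews output of the Latvia example, `jarque_bera` reproduces the reported values (13.03244 and 0.001479) up to rounding, so normality of the GDP growth rates is rejected at the 5% level.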
Descriptive Statistics: Jarque-Bera
• Jarque-Bera statistic: tests whether the series is normally distributed («Jarque-Bera statistic = test for normality»)
• The test statistic measures the difference of the skewness and kurtosis of the series from those of the normal distribution
• H0: the data have a normal distribution
• If the reported probability is small (usually < 0.05), the data do not have a normal distribution

Descriptive Statistics in Excel: Data → Data Analysis → Descriptive Statistics

Excel
| Statistic | Y | ln(Y) | d(ln(Y)) |
|---|---|---|---|
| Mean | 3176.923 | 7.91422 | 0.147352 |
| Standard Error | 530.8427 | 0.154158 | 0.016925 |
| Median | 2462 | 7.808729 | 0.115289 |
| Mode | #N/A | #N/A | #N/A |
| Standard Deviation | 1913.981 | 0.555823 | 0.061023 |
| Sample Variance | 3663322 | 0.308939 | 0.003724 |
| Kurtosis | 0.293998 | -0.85639 | 0.423699 |
| Skewness | 1.17146 | 0.491652 | 1.019058 |
| Range | 5871 | 1.724897 | 0.205669 |
| Minimum | 1273 | 7.149132 | 0.079296 |
| Maximum | 7144 | 8.874028 | 0.284965 |
| Sum | 41300 | 102.8849 | 1.91558 |
| Count | 13 | 13 | 13 |

Excel – Latvia's real GDP example
| Statistic | Y | log(Y) | dln(Y) |
|---|---|---|---|
| Mean | 19.01958 | 2.929184 | 0.036387 |
| Standard Error | 0.768002 | 0.04372 | 0.01498 |
| Median | 19.85241 | 2.988325 | 0.045958 |
| Mode | #N/A | #N/A | #N/A |
| Standard Deviation | 3.347643 | 0.190569 | 0.063554 |
| Sample Variance | 11.20672 | 0.036317 | 0.004039 |
| Kurtosis | -0.51697 | -0.07225 | 4.048338 |
| Skewness | -0.64666 | -0.92775 | -1.73972 |
| Range | 11.46765 | 0.654961 | 0.267845 |
| Minimum | 12.39656 | 2.517419 | -0.1555 |
| Maximum | 23.8642 | 3.17238 | 0.112341 |
| Sum | 361.3721 | 55.65449 | 0.654961 |
| Count | 19 | 19 | 18 |

Descriptive Statistics in EViews: Series → View → Descriptive Statistics & Tests

EViews (sample 1975–2015)
| Statistic | UK | LUK | DLUK |
|---|---|---|---|
| Observations | 40 | 40 | 39 |
| Mean | 1158919. | 13.75663 | 0.062383 |
| Median | 952576.0 | 13.76689 | 0.065379 |
| Maximum | 2222912. | 14.61433 | 0.236191 |
| Minimum | 195123.1 | 12.18139 | -0.136675 |
| Std. Dev. | 649969.1 | 0.713249 | 0.079692 |
| Skewness | 0.108323 | -0.685299 | -0.215108 |
| Kurtosis | 1.578219 | 2.458080 | 3.610835 |
| Jarque-Bera | 3.447327 | 3.620362 | 0.907085 |
| Probability | 0.178411 | 0.163625 | 0.635374 |
[Figure: histograms of UK, LUK and DLUK]

EViews – Latvia's real GDP example (sample 2000–2020)
| Statistic | Y | LOG_Y_ | DLN_Y_ |
|---|---|---|---|
| Observations | 19 | 19 | 18 |
| Mean | 19.01958 | 2.929184 | 0.036387 |
| Median | 19.85241 | 2.988325 | 0.045958 |
| Maximum | 23.86420 | 3.172380 | 0.112341 |
| Minimum | 12.39656 | 2.517419 | -0.155505 |
| Std. Dev. | 3.347643 | 0.190569 | 0.063554 |
| Skewness | -0.594444 | -0.852836 | -1.591250 |
| Kurtosis | 2.309401 | 2.645409 | 5.692264 |
| Jarque-Bera | 1.496551 | 2.402749 | 13.03244 |
| Probability | 0.473182 | 0.300780 | 0.001479 |
[Figure: histograms of Y, LOG_Y_ and DLN_Y_]

Descriptive Statistics in EViews for several series: Group → View → Descriptive Statistics & Tests
[Figure: EViews group output]
[Figure: EViews group output, Latvia's real GDP example]

Denominations of the Variables
• Outcomes / effect, Y: resulting variable, dependent variable, endogenous variable, explained variable, predictand, regressand, target variable
• Causes / causal variables, X1, X2, ..., Xn: factors, independent variables, exogenous variables, explanatory variables, predictors, regressors, control variables

Tasks of Econometrics in Research of Causalities
• Correlation analysis
• Estimation of the quantitative effect of a factor on the resulting indicator

Covariance
$$\operatorname{cov}(X,Y) = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)$$
• Positive covariance: X_i tends to be greater than its mean when Y_i is greater than its mean, and vice versa
• Negative covariance: X_i tends to be greater than its mean when Y_i is smaller than its mean, and vice versa
• Zero covariance: for example, when X and Y are independent (independence implies zero covariance, but not the other way round)

Covariance: Example
Values of 3 variables are given in the table. Your task is to calculate the covariance for the pairs RA and RB, as well as RA and RC.
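A minimal Python sketch of the covariance used in this example, cov(X, Y) = (1/n) Σ (X_i - X̄)(Y_i - Ȳ), keeping the 1/n convention of the worked solution (a sample covariance would divide by n - 1 instead). The RA, RB and RC values are the ones in the example table; the helper names `mean` and `covariance` are our own.

```python
def mean(values):
    """Arithmetic average."""
    return sum(values) / len(values)


def covariance(x, y):
    """Population covariance: average product of deviations from the means (1/n)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / len(x)


# Data from the covariance example
RA = [11, 10, 9, 12, 8]
RB = [8, 10, 16, 10, 16]
RC = [10, 9, 8, 11, 7]

if __name__ == "__main__":
    print(covariance(RA, RB))  # negative: RB tends to be low when RA is high
    print(covariance(RA, RC))  # positive: RA and RC move together
```

Both calls reproduce the hand results of the example, -4 and 2. Note that `covariance(RA, RA)` is simply the variance of RA.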
| i | RA | ΔRA = RA - 10 | RB | ΔRB = RB - 12 | RC |
|---|---|---|---|---|---|
| 1 | 11 | 1 | 8 | -4 | 10 |
| 2 | 10 | 0 | 10 | -2 | 9 |
| 3 | 9 | -1 | 16 | 4 | 8 |
| 4 | 12 | 2 | 10 | -2 | 11 |
| 5 | 8 | -2 | 16 | 4 | 7 |
| Mean | 10 | | 12 | | 9 |

Covariance: Example (solution)
cov(RA,RB) = (1/5)*((11-10)*(8-12) + (10-10)*(10-12) + (9-10)*(16-12) + (12-10)*(10-12) + (8-10)*(16-12))
= (1/5)*(1*(-4) + 0*(-2) + (-1)*4 + 2*(-2) + (-2)*4)
= (1/5)*(-4 + 0 - 4 - 4 - 8)
= -20/5 = -4
cov(RA,RC) = 2

Correlation
$$r_{X,Y} = \operatorname{corr}(X,Y) = \frac{\operatorname{cov}(X,Y)}{\sigma_X\,\sigma_Y}$$
• -1 ≤ corr(X,Y) = r_{X,Y} ≤ 1
• r_{X,Y} < 0: negative correlation
• r_{X,Y} > 0: positive correlation
• r_{X,Y} = 0: the variables are uncorrelated (no linear correlation)
• r_{X,Y} = ±1: perfect correlation
• Only linear relations are analysed
• Correlation ≠ causality:
  – spurious correlation
  – opposite (reverse) causality

Correlation: Example
| i | RA | (RA - 10)² | RB | (RB - 12)² | RC |
|---|---|---|---|---|---|
| 1 | 11 | 1 | 8 | 16 | 10 |
| 2 | 10 | 0 | 10 | 4 | 9 |
| 3 | 9 | 1 | 16 | 16 | 8 |
| 4 | 12 | 4 | 10 | 4 | 11 |
| 5 | 8 | 4 | 16 | 16 | 7 |
| Mean (variance) | 10 | 2 | 12 | 11.2 | 9 |

corr(RA,RB) = -4 / √(2 · 11.2) ≈ -0.845
corr(RA,RC) = 2 / √(2 · 2) = 1

Correlation Diagram or Scatter Plot
[Figure: scatter plot of y against x, with the intercept b0 marked]

Correlation → Graphical Analysis
[Figure: graphical analysis of correlation]

Check the data!
[Figure]

Types of Regression
• Depending on the number of factors: single regression; multiple regression
• Depending on form: linear; non-linear
• Depending on character: positive (direct) regression; negative (opposite) regression

Objectives of Regression Analysis
• To determine the form of regression: linear or non-linear
• To determine the regression function: estimate the particular values of the coefficients
• To estimate unknown values of the dependent variable: calculate the value of Y given particular values of X
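The correlation coefficients of the correlation example, r = cov(X, Y)/(σ_X σ_Y), can be checked with a short Python sketch (helper names are our own). The last call uses hypothetical, illustrative values with an exact but non-linear relation Y = X²; it still gives r = 0, which illustrates that only linear relations are captured.

```python
import math


def mean(values):
    """Arithmetic average."""
    return sum(values) / len(values)


def covariance(x, y):
    """Population covariance (1/n), as in the worked example."""
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / len(x)


def correlation(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


RA = [11, 10, 9, 12, 8]
RB = [8, 10, 16, 10, 16]
RC = [10, 9, 8, 11, 7]

if __name__ == "__main__":
    print(round(correlation(RA, RB), 3))  # strong negative linear relation
    print(correlation(RA, RC))            # perfect positive linear relation
    # Hypothetical data: Y = X**2 is a perfect but non-linear relation
    x = [-2, -1, 0, 1, 2]
    print(correlation(x, [v ** 2 for v in x]))  # no linear correlation
```

Because the same factor 1/n appears in the numerator and in both variances, r is identical whether 1/n or 1/(n - 1) is used.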
ECONOMETRICS
Single Regression

Single Linear Regression Model:
$$Y_i = \beta_0 + \beta_1 X_i + u_i$$
where
• the subscript i runs over observations, i = 1, 2, ..., n;
• Y_i: dependent variable, regressand, left-hand variable;
• X_i: independent variable, regressor, right-hand variable;
• β0 + β1X_i: population regression line or population regression function;
• β0: intercept of the population regression line;
• β1: slope of the population regression line;
• u_i (sometimes also ε_i): error term.

Single Linear Regression
[Figure: number of crimes per 10 000 residents against GDP per capita (Ls), with the line β0 + β1X_i and the errors u1 and u10 at the points (X1, Y1) and (X10, Y10)]

Estimating the Coefficients of the Linear Regression Model
Ordinary Least Squares (OLS)
• Coefficients are estimated for a particular sample, not for the whole population, which is unknown
$$\sum_{i=1}^{n}\hat{u}_i^2 = \sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2 \to \min$$

Ordinary Least Squares (OLS)
Using the linear function $\hat{Y}_i = b_0 + b_1 X_i$, we obtain
$$\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2 = \sum_{i=1}^{n}\left[Y_i - \left(b_0 + b_1 X_i\right)\right]^2 = F(b_0, b_1) \to \min$$
$$\frac{\partial F}{\partial b_0} = 0, \qquad \frac{\partial F}{\partial b_1} = 0$$

Ordinary Least Squares (OLS)
Differentiation results in a system of normal equations:
$$n\hat{\beta}_0 + \hat{\beta}_1\sum_{i=1}^{n} X_i = \sum_{i=1}^{n} Y_i$$
$$\hat{\beta}_0\sum_{i=1}^{n} X_i + \hat{\beta}_1\sum_{i=1}^{n} X_i^2 = \sum_{i=1}^{n} X_i Y_i$$
Solving the normal equations yields the OLS estimators of β0 and β1:
$$\hat{\beta}_1 = \frac{n\sum_{i=1}^{n} X_i Y_i - \sum_{i=1}^{n} X_i\sum_{i=1}^{n} Y_i}{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}$$
$$\hat{\beta}_0 = \frac{\sum_{i=1}^{n} Y_i}{n} - \hat{\beta}_1\frac{\sum_{i=1}^{n} X_i}{n} = \bar{Y} - \hat{\beta}_1\bar{X}$$

Ordinary Least Squares (OLS)
• Regression line: $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$
• Estimated Y_i (predicted value for X_i): $\hat{Y}_i$
• Estimator of β0: $\hat{\beta}_0$
• Estimator of β1: $\hat{\beta}_1$
• Error (residual) for the ith observation: $\hat{u}_i = Y_i - \hat{Y}_i$

Ordinary Least Squares (OLS)
Where does the error come from?
• Not the correct model
• Not the correct parameters
$$\hat{u}_i = u_i + \left(\beta_0 - \hat{\beta}_0\right) + \left(\beta_1 - \hat{\beta}_1\right)X_i$$
[Figure: observed value y_i, fitted value ŷ_i and residual û_i at x_i]

Ordinary Least Squares (OLS)
$$\hat{Y}_i = 157.16 + 0.0172\,X_i$$
If GDP per capita increases by 1 unit (1 Ls), the predicted number of crimes per 10 000 residents increases by 0.0172. Constantly?

Scale of Correlation Diagram
• The weaker the relationship, the closer to horizontal the regression line should be

Least Squares Assumptions
• The conditional distribution of u_i given X_i has a mean of zero
[Figure: distributions of Y when X = 2, 5 and 8, centred on the population regression line β0 + β1X_i at E(Y|X=2), E(Y|X=5) and E(Y|X=8)]
• (X_i, Y_i), i = 1, ..., n, are independently and identically distributed
• Large outliers are unlikely

OLS Assumptions
• A1: E(u_i) = 0: the expected (average) value of the error term is 0
• A2: Var(u_i) = σ²: the variance of the error is constant and finite (homoscedasticity)
• A3: Cov(u_i, u_j) = 0 for i ≠ j: errors are statistically independent (no autocorrelation)
• A4: Cov(u_i, X_i) = 0: the error term and X are not related
• A5: u_i is normally distributed

Properties of OLS
• If A1 and A4 hold, OLS is unbiased, i.e. $E(\hat{\beta}) - \beta = 0$
• If A1, A2, A3 and A4 hold, OLS is BLUE (Best Linear Unbiased Estimator):
  – linear: $\hat{\beta} - \beta = \sum_i w_i u_i$, with weights w_i that depend only on X
  – best: $\operatorname{Var}(\hat{\beta})$ is the smallest obtainable value among linear unbiased estimators

Properties of OLS
• If A4 and a part of A2 hold, OLS is consistent (usable): $\lim_{n\to\infty} P\left(\left|\hat{\beta} - \beta\right| > \varepsilon\right) = 0$ for any ε > 0
• If A1, A2, A3 and A4 hold, we have formulas for the variance of $\hat{\beta} - \beta$

Properties of OLS
• According to the assumptions, the errors are normally distributed: $u_i \sim N(0, \sigma^2)$
• As OLS estimators are linearly related to the error term, they are also normally distributed: $\hat{\beta} \sim N\left(\beta, \operatorname{Var}(\hat{\beta})\right)$
  – it is possible to carry out hypothesis testing
  – it is possible to use confidence intervals
• What to do if the errors are not normally distributed?
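A minimal Python sketch of the OLS estimators obtained above from the normal equations, for a single regressor. The function names are our own, and the data in the usage lines are hypothetical, chosen only to exercise the formulas; the crime and GDP data behind the slide example are not reproduced in these notes.

```python
def ols_fit(x, y):
    """Single-regressor OLS via the closed-form solution of the normal equations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length and at least 2 observations")
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    denom = n * sxx - sx ** 2
    if denom == 0:
        raise ValueError("X must vary across observations")
    b1 = (n * sxy - sx * sy) / denom   # slope estimate
    b0 = sy / n - b1 * sx / n           # intercept: mean(Y) - b1 * mean(X)
    return b0, b1


def predict(b0, b1, x):
    """Fitted values Y-hat = b0 + b1 * X."""
    return [b0 + b1 * xi for xi in x]


def residuals(x, y, b0, b1):
    """Residuals u-hat_i = Y_i - Y-hat_i."""
    return [yi - yhat for yi, yhat in zip(y, predict(b0, b1, x))]


if __name__ == "__main__":
    # Hypothetical illustrative data (not from the lecture)
    x = [1, 2, 3, 4, 5]
    y = [2, 4, 5, 4, 5]
    b0, b1 = ols_fit(x, y)
    print(b0, b1)
    # With an intercept in the model, OLS residuals sum to (numerically) zero
    print(sum(residuals(x, y, b0, b1)))
```

With the slide's estimates, `predict(157.16, 0.0172, [5000])` gives the fitted number of crimes per 10 000 residents at a GDP per capita of 5000 Ls (about 243).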
Sample Size
• Estimated coefficients have a jointly normal sampling distribution if the sample size is very large (rules of thumb: N > 30; N > 100 observations)
• The larger the variance of X_i, the smaller the variance of the estimated coefficients
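The link between the spread of X and the precision of the slope can be checked with a small Monte Carlo sketch. It assumes the classical setting (A1 to A4 with normal errors), under which Var(β̂1) = σ² / Σ(X_i - X̄)². The true coefficients, σ, the X values, the number of replications and the function names are arbitrary illustrative choices, not values from the lecture.

```python
import random
import statistics


def slope(x, y):
    """OLS slope: sum of cross-deviations divided by the sum of squared X deviations."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx


def theoretical_slope_variance(x, sigma):
    """Var(b1) = sigma^2 / sum (X_i - mean X)^2 under the classical assumptions."""
    mx = statistics.fmean(x)
    return sigma ** 2 / sum((xi - mx) ** 2 for xi in x)


def simulated_slope_variance(x, beta0=1.0, beta1=0.5, sigma=1.0, reps=2000, seed=42):
    """Simulate Y = beta0 + beta1*X + u, u ~ N(0, sigma^2), many times; return Var of b1.

    All parameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        estimates.append(slope(x, y))
    return statistics.pvariance(estimates)


if __name__ == "__main__":
    x_narrow = [float(i) for i in range(1, 11)]  # X = 1, ..., 10
    x_wide = [2 * xi for xi in x_narrow]         # same n, four times the variance of X
    for name, x in [("narrow", x_narrow), ("wide", x_wide)]:
        print(name, theoretical_slope_variance(x, 1.0), simulated_slope_variance(x))
```

Doubling the spread of X cuts the variance of the slope estimate to a quarter. Because both designs reuse the same seed, and therefore the same error draws, the wide design halves every estimation error, so the simulated variances also differ by a factor of four (up to rounding).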