Statistics: Introduction

⌛ Year: 2020
🏢️ National Research University Higher School of Economics


Statistics: Introduction

Ekaterina A. Aleksandrova
Associate Professor, Department of Economics
Centre for Health Economics, Management, and Policy
National Research University Higher School of Economics in Saint Petersburg
ea.aleksandrova@hse.ru
January, 2020

1 What Is Statistics?
2 Types of Data
3 Variables and Scales of Measurement
4 Applying Statistics in Business

1 What Is Statistics?

We generally divide the study of statistics into two branches: descriptive statistics and inferential statistics.

Descriptive statistics refers to the summary of important aspects of a data set. This includes collecting data, organizing the data, and then presenting the data in the form of charts and tables. In addition, we often calculate numerical measures that summarize, for instance, the data's typical value and the data's variability.

The unemployment rate, the president's approval rating, the Dow Jones Industrial Average, batting averages, the crime rate, and the divorce rate are but a few of the many 'statistics' that can be found in a reputable newspaper on a frequent, if not daily, basis.
Despite the familiarity of descriptive statistics, these methods represent only a minor portion of the body of statistical applications. The phenomenal growth in statistics is mainly in the field called inferential statistics. Generally, inferential statistics refers to drawing conclusions about a large set of data, called a population, based on a smaller set of sample data. A population is defined as all members of a specified group (not necessarily people), whereas a sample is a subset of that particular population. The individual values contained in a population or a sample are often referred to as observations.

In most statistical applications, we must rely on sample data in order to make inferences about various characteristics of the population. For example, a 2016 Gallup survey found that only 50% of Millennials plan to be with their current job for more than a year. Researchers use this sample result, called a sample statistic, in an attempt to estimate the corresponding unknown population parameter. In this case, the parameter of interest is the percentage of all Millennials who plan to be with their current job for more than a year. It is generally not feasible to obtain population data and calculate the relevant parameter directly, due to prohibitive costs.

Population vs Sample

A population consists of all items of interest in a statistical problem. A sample is a subset of the population. We analyze sample data and calculate a sample statistic to make inferences about the unknown population parameter.
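The relationship between a population parameter and a sample statistic can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the population is simulated, its true share is set to 0.5, and the sample size of 1,000 is arbitrary. It is not a reconstruction of the Gallup survey.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 1 = plans to stay more than a year, 0 = does not.
# The true share (the population parameter) is set to 0.5 purely for illustration.
population = [1] * 50_000 + [0] * 50_000
parameter = sum(population) / len(population)

# Draw a simple random sample without replacement and compute the sample statistic.
sample = random.sample(population, 1_000)
statistic = sum(sample) / len(sample)

print(f"population parameter: {parameter:.3f}")
print(f"sample statistic:     {statistic:.3f}")
```

In practice the parameter is unknown; the sketch computes it only to show that the sample statistic lands close to it without examining all 100,000 members.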
The Need for Sampling

A major portion of inferential statistics is concerned with the problem of estimating population parameters or testing hypotheses about such parameters. If we had access to data that encompass the entire population, we would know the values of the parameters. Generally, however, we are unable to use population data for two main reasons:

1. Obtaining information on the entire population is expensive. Consider how the monthly unemployment rate in the United States is calculated by the Bureau of Labor Statistics (BLS). Is it reasonable to assume that the BLS counts every unemployed person each month? The answer is a resounding NO! In order to do this, every home in the country would have to be contacted. Given that there are approximately 160 million individuals in the labor force, not only would this process cost too much, it would take an inordinate amount of time. Instead, the BLS conducts a monthly sample survey of about 60,000 households to measure the extent of unemployment in the United States.

2. It is impossible to examine every member of the population. Suppose we are interested in the average length of life of a Duracell AAA battery. If we tested the duration of each Duracell AAA battery, then in the end all batteries would be dead and the answer to the original question would be useless.
Populations and Samples

Population: the 31 flavors of ice cream at a 31-flavor ice cream store. Sample: the five flavors that you have tested in order to determine whether this store sells good ice cream.
Population: all voters in the US. Sample: the 3,000 people who are interviewed as part of an opinion poll.
Population: all people in Russia. Sample: the people from 5,000 households interviewed in RLMS.

2 Types of Data

Cross-Sectional Data

Sample data are generally collected in one of two ways. Cross-sectional data refer to data collected by recording a characteristic of many subjects at the same point in time, or without regard to differences in time. Subjects might include individuals, households, firms, industries, regions, and countries.

Time Series

Time series data refer to data collected over several time periods focusing on certain groups of people, specific events, or objects. Time series can include hourly, daily, weekly, monthly, quarterly, or annual observations. Examples of time series data include the hourly body temperature of a patient in a hospital's intensive care unit, the daily price of General Electric stock in the first quarter of 2019, the weekly exchange rate between the U.S. dollar and the euro over the past six months, the monthly sales of cars at a dealership in 2016, and the annual growth rate of India in the last decade.
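The two ways of collecting sample data differ in how the records are organized. The following Python sketch contrasts them; the firm names, dates, and numbers are made up for illustration and do not come from the lecture.

```python
# Cross-sectional data: one characteristic, many subjects, one point in time.
# All names and numbers below are made up for illustration.
cross_section = {
    "Firm A": 120.0,
    "Firm B": 95.5,
    "Firm C": 143.2,
}  # e.g. sales of three firms in the same year

# Time series data: one characteristic, one subject, several time periods.
time_series = [
    ("2016-01", 31),
    ("2016-02", 28),
    ("2016-03", 35),
]  # e.g. monthly car sales at a single dealership

# In a cross-section the order of the records carries no information...
mean_sales = sum(cross_section.values()) / len(cross_section)

# ...whereas in a time series the order matters, e.g. for period-to-period changes.
changes = [later[1] - earlier[1] for earlier, later in zip(time_series, time_series[1:])]

print(round(mean_sales, 2))
print(changes)
```

Shuffling the cross-section leaves its mean unchanged, while shuffling the time series would make the period-to-period changes meaningless.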
Cross-Sectional Data and Time Series Data

Cross-sectional data contain values of a characteristic of many subjects at the same point or approximately the same point in time. Time series data contain values of a characteristic of a subject over time.

3 Variables and Scales of Measurement

When we conduct a statistical investigation, we invariably focus on people, objects, or events with particular characteristics. When a characteristic of interest differs in kind or degree among various observations, then the characteristic can be termed a variable. We further categorize a variable as either qualitative or quantitative.

For a qualitative variable, we use labels or names to identify the distinguishing characteristic of each observation. For instance, the 2010 Census asked each respondent to indicate gender on the form. Each respondent chose either male or female. Gender is a qualitative variable. Other examples of qualitative variables include race, profession, type of business, the manufacturer of a car, and so on.

A variable that assumes meaningful numerical values is called a quantitative variable. Quantitative variables, in turn, are either discrete or continuous. A discrete variable assumes a countable number of values. Consider the number of children in a family or the number of points scored in a basketball game. We may observe values such as 3 children in a family or 90 points being scored in a basketball game, but we will not observe 1.3 children or 92.5 scored points.
The values that a discrete variable assumes need not be whole numbers. For example, the price of a stock for a particular firm is a discrete variable. The stock price may take on a value of $20.37 or $20.38, but it cannot take on a value between these two points. Finally, a discrete variable may assume an infinite number of values, but these values are countable; that is, they can be presented as a sequence x1, x2, x3, and so on. The number of cars that cross the Golden Gate Bridge on a Saturday is a discrete variable. Theoretically, this variable assumes the values 0, 1, 2, ...

A continuous variable is characterized by uncountable values within an interval. Weight, height, time, and investment return are all examples of continuous variables. For example, an unlimited number of values occur between the weights of 100 and 101 pounds, such as 100.3, 100.625, 100.8342, and so on. In practice, however, continuous variables may be measured in discrete values. We may report a newborn's weight (a continuous variable) in discrete terms as 6 pounds 10 ounces and another newborn's weight in similar discrete terms as 6 pounds 11 ounces.

Qualitative Variables vs. Quantitative Variables

A variable is a general characteristic being observed on a set of people, objects, or events, where each observation varies in kind or degree. Labels or names are used to categorize the distinguishing characteristics of a qualitative variable; eventually, these attributes may be coded into numbers for purposes of data processing. A quantitative variable assumes meaningful numerical values, and can be further categorized as either discrete or continuous. A discrete variable assumes a countable number of values, whereas a continuous variable is characterized by uncountable values within an interval.

The Nominal Scale

The nominal scale represents the least sophisticated level of measurement. If we are presented with nominal data, all we can do is categorize or group the data. The values in the data set differ merely by name or label. Consider the following example.

Each company listed in the table is a member of the Dow Jones Industrial Average (DJIA). The DJIA is a stock market index that shows how 30 large, publicly owned companies based in the United States have traded during a standard trading session in the stock market. The table also shows where stocks of these companies are traded: on either the National Association of Securities Dealers Automated Quotations (Nasdaq) or the New York Stock Exchange (NYSE). These data are classified as nominal scale since we are simply able to group or categorize them. Specifically, only four stocks are traded on Nasdaq, whereas the remaining 26 are traded on the NYSE.
Often we substitute numbers for the particular qualitative characteristic or trait that we are grouping. One reason why we do this is for ease of exposition; always referring to the National Association of Securities Dealers Automated Quotations, or even Nasdaq, becomes awkward and unwieldy. In addition, as we will see later in the text, statistical analysis is greatly facilitated by using numbers instead of names. For example, we might use the number 0 to show that a company's stock is traded on Nasdaq and the number 1 to show that a company's stock is traded on the NYSE, or in tabular form:

Exchange    Number
Nasdaq      0
NYSE        1

The Ordinal Scale

Compared to the nominal scale, the ordinal scale reflects a stronger level of measurement. With ordinal data we are able to both categorize and rank the data with respect to some characteristic or trait. The weakness with ordinal data is that we cannot interpret the difference between the ranked values because the actual numbers used are arbitrary. For example, suppose you are asked to classify the service at a particular hotel as excellent, good, fair, or poor. A standard way to record the ratings is:

Category    Excellent    Good    Fair    Poor
Rating      4            3       2       1

Here the value attached to excellent (4) is higher than the value attached to good (3), indicating that the response of excellent is preferred to good.
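The nominal coding of the exchanges described above can be sketched in Python. The list below is a stand-in: it reproduces only the counts stated in the text (4 Nasdaq, 26 NYSE), not the actual company-by-company table, which is not part of these notes.

```python
from collections import Counter

# Hypothetical listing: the text states that 4 of the 30 DJIA stocks trade
# on Nasdaq and 26 on the NYSE; the individual assignments are not reproduced.
exchanges = ["Nasdaq"] * 4 + ["NYSE"] * 26

# Nominal coding: the numbers are labels only, not quantities.
code = {"Nasdaq": 0, "NYSE": 1}
coded = [code[e] for e in exchanges]

# The meaningful summaries for nominal data are counts and shares.
counts = Counter(exchanges)
shares = {name: n / len(exchanges) for name, n in counts.items()}

print(counts)
print(shares)
```

Swapping the codes (Nasdaq = 1, NYSE = 0) would change `coded` but leave the counts and shares untouched, which is what makes the scale nominal.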
However, another representation of the ratings might be:

Category    Excellent    Good    Fair    Poor
Rating      100          80      70      40

Excellent still receives a higher value than good, but now the difference between the two categories is 20 (100 - 80), as compared to a difference of 1 (4 - 3) when we use the first classification. In other words, differences between categories are meaningless with ordinal data. (We should also note that we could reverse the ordering so that, for instance, excellent equals 40 and poor equals 100; this renumbering would not change the nature of the data.)

Nominal and ordinal scales are used for qualitative variables. Values corresponding to a qualitative variable are typically expressed in words but are coded into numbers for purposes of data processing. When summarizing the results of a qualitative variable, we typically count the number or calculate the percentage of persons or objects that fall into each possible category. With a qualitative variable, we are unable to perform meaningful arithmetic operations such as adding and subtracting.

The Interval Scale

With data that are measured on an interval scale, not only can we categorize and rank the data, we are also assured that the differences between scale values are meaningful. Thus, the arithmetic operations of addition and subtraction are meaningful. The Fahrenheit scale for temperatures is an example of an interval scale. Not only is 60 degrees Fahrenheit hotter than 50 degrees Fahrenheit, the same difference of 10 degrees also exists between 90 and 80 degrees Fahrenheit.
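The recoding argument for the ordinal scale can be checked directly. The Python sketch below uses the two hotel-rating codings given in the text; the helper name `ranking` is introduced here for illustration only.

```python
# Two codings of the same ordinal ratings, both taken from the text above.
coding_a = {"Excellent": 4, "Good": 3, "Fair": 2, "Poor": 1}
coding_b = {"Excellent": 100, "Good": 80, "Fair": 70, "Poor": 40}

def ranking(coding):
    """Return the categories sorted from highest to lowest code."""
    return sorted(coding, key=coding.get, reverse=True)

# The ranking (the ordinal information) is identical under both codings...
same_order = ranking(coding_a) == ranking(coding_b)

# ...but the differences between adjacent categories are not.
diff_a = coding_a["Excellent"] - coding_a["Good"]
diff_b = coding_b["Excellent"] - coding_b["Good"]

print(same_order, diff_a, diff_b)
```

Any statistic that depends only on the order (such as the median category) is unaffected by the recoding, whereas anything built on differences or sums changes with it.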
The main drawback of data on an interval scale is that the value of zero is arbitrarily chosen; the zero point of an interval scale does not reflect a complete absence of what is being measured. No specific meaning is attached to zero degrees Fahrenheit other than to say it is 10 degrees colder than 10 degrees Fahrenheit. With an arbitrary zero point, meaningful ratios cannot be constructed. For instance, it is senseless to say that 80 degrees is twice as hot as 40 degrees; in other words, the ratio 80/40 has no meaning.

The Ratio Scale

The ratio scale represents the strongest level of measurement. Ratio data have all the characteristics of interval data as well as a true zero point, which allows us to interpret the ratios of values. A ratio scale is used to measure many types of data in business analysis. Variables such as sales, profits, and inventory levels are expressed as ratio data. A meaningful zero allows us to state, for example, that profits for firm A are double those of firm B. Measurements such as weight, time, and distance are also measured on a ratio scale since zero is meaningful. Unlike qualitative data, arithmetic operations are valid on interval- and ratio-scaled values.

Definitions

Continuous: Data that can take on any value in an interval. Synonyms: interval, float, numeric.
Discrete: Data that can take on only integer values, such as counts. Synonyms: integer, count.
Categorical: Data that can take on only a specific set of values representing a set of possible categories.
Synonyms: enums, enumerated, factors, nominal, polychotomous.
Binary: A special case of categorical data with just two categories of values (0/1, true/false). Synonyms: dichotomous, logical, indicator, boolean.
Ordinal: Categorical data that has an explicit ordering. Synonyms: ordered factor.

4 Applying Statistics in Business

A firm preparing to introduce a new product needs to estimate the preferences of the consumers in the relevant market. It can often do this by conducting a marketing survey based on interviews with some randomly selected households. The results of the survey can then be used to estimate the preferences of the entire population.

Statistical techniques are needed to disentangle the separate effects of several different factors. For example, the demand for ice cream in a community can be expected to depend on the price of ice cream, the level of average income, the number of children in the community, and the average temperature. If you have observations of all the different factors involved, you can use correlation or regression analysis to determine which factors have the most important effects.

An auditor has the job of checking the books of a company to make sure that they accurately reflect the financial condition of the company. The auditor will need to check through piles of original documents such as sales slips, purchase orders, and requisitions.
It would require massive amounts of work to check every single original document; instead, the auditor can check a randomly selected sample of documents and make inferences about the entire population of documents based on that sample.

Before a new drug is marketed, it is necessary to perform extensive experiments to make sure the drug is safe and effective. Some people need to be given the drug to test it, but you won't know if the drug makes a difference unless you have another group so you can compare. The best way to test a drug is to take two groups that are as much alike as possible, give the drug to one of the groups but not to the other, and then see whether the results for the two groups are different. The group that is given the drug is called the experimental group (or treatment group), and the other group is called the control group. Statistical analysis is necessary to determine whether any observed differences really were caused by the drug or could have been caused by other factors.

If you are receiving a large shipment of goods from a supplier, you will want to make sure that the goods meet the quality standards agreed upon. It would be very expensive to perform a quality control check on every single item, but once again statistical techniques come to the rescue by allowing you to make inferences about the quality of the entire lot by checking a randomly selected sample of items chosen from the lot.

Financial investments involve risk. Statistical analysis allows you to estimate the risk and expected return from an investment portfolio, and how changing the composition of a portfolio can affect the risk and return.
Insurance actuaries need to study statistics to estimate the probabilities of various events and to analyze the risk for the insurance company.

Advertising rates for television and radio stations are based on ratings, which are determined by statistical samples.

When analyzing natural phenomena, such as weather or wildlife populations, you usually have to work with a sample because it is impractical to observe the entire population.

Statistics: Research

Ekaterina A. Aleksandrova
Associate Professor, Department of Economics
Centre for Health Economics, Management, and Policy
National Research University Higher School of Economics in Saint Petersburg
ea.aleksandrova@hse.ru
May, 2020

1 Empirical Research
2 Data
3 Scales
4 Questionnaire
5 Conclusion

1 Empirical Research

Useful quotes

‘If something exists it can be measured’ (E. L. Thorndike)
‘The purpose of computing is insight, not numbers’ (R. Hamming)
‘Nothing has such power to broaden the mind as the ability to investigate systematically and truly all that comes under thy observation in life’ (Marcus Aurelius)
‘In many spheres of human endeavor, from science to business to education to economic policy, good decisions depend on good measurement’ (Ben Bernanke)

What does empirical research mean?

We test various models, theories, and hypotheses.
We use real data.
Our results are useful for policymakers, firms, the general public, etc.
Our methods are based on the synthesis of theory, measurements, and statistical instruments.

Empirical Research

Make sure that the question you pose is actually answered in the body of the work. Typically, your research question should answer the question “WHY?” Always start from the MECHANISMS.

Example I

RQ: Are better educated people healthier?
We start from the question ‘Why?’
Mechanism 1: more education => more money => more investment in health => better health
Mechanism 2: more education => more stress => more smoking / drinking => bad health

Example II

RQ: Is it true that a high concentration of firms in one sector in one city leads to innovations in this sector?
We start from the question ‘Why?’
Mechanism 1: more firms => higher competition => more investment in R&D => more innovations
Mechanism 2: more firms => higher competition => lower prices => no money for investments => no innovations

How to find a mechanism?

Search the related theory.
Read some articles (type keywords into scholar.google.com and you will be surprised how many texts have already been published).
Just google.

Database

Download your database.
Read the information about the database carefully!!!
Read not only the codebook but also the questionnaire!!!
Be sure that you know who collected this database and for what reasons.
How was this database collected? (expert opinion, self-reported measures, ...)
What is the unit of observation in your database? (country, firm, people, ...)
2 Data

Measurement Problems

In most cases, we work with abstract (ideal) items which are not strictly measurable. Examples: human capital, health. What is to be done? We use proxies (variables that approximate the initial item in an appropriate way), indexes, and sets of indicators or variables.

Proxy Variables of Initial Characteristics

We work with initial characteristics: human capital, firm performance, poverty, innovations, health, entrepreneurial activity. How to measure them?

There are plenty of definitions, and each depends on the item:

Human capital (Labour Economics). Individual-level data: years of education. Country-level data: the share of citizens with higher (tertiary) education.
Firm performance (Corporate Finance or Performance Management). Sales, Revenue, ROS, ... It might also be the growth of such indicators, Sales_t / Sales_{t-1}, if we are interested in the change in firm performance.
Poverty (Demography, Sociology, or Economics). Definitions vary across countries. If you work with country-level data, use the definition of the World Bank (WB).
Innovations (Economics of Innovation). Firm-level data: R&D investment, number of patents, etc.
Health (Health Economics, Public Health, Demography, Epidemiology). Individual level: self-assessed health, EQ-5D, SF-36, number of chronic diseases, doctor visits, etc. Country level: mortality, morbidity, life expectancy, etc.
Entrepreneurial activity (Entrepreneurship). Individual level: self-employment status, etc. Country level: the share/number of SMEs, new firms, self-employed people, etc.
Discussion of Proxy Variables

You are recommended to choose (at least) two proxies if possible. Try to explain the difference between them: they do not have the same interpretation; they use different scales; one may be better or worse for your initial variable; each has advantages and disadvantages.

Why two or more proxies? It gives you: a robustness check; comparable results; more details about the analysed mechanisms; sensitivity analyses of your results; perhaps, more information for discussion; a flexible conclusion.

Six Stages of Data Processing

1. Data collection
2. Data preparation
3. Data input
4. Processing
5. Data output or interpretation
6. Data storage

See more details here: talend.com/resources/what-is-data-processing/

Data ‘Cleaning’

Encode the following answers as missing: “Refuse to answer”, “I do not know”, etc. These answers are usually encoded as 999, 9999999, -9, etc.
Drop missings! BUT check the reduction in your sample (if the reduction is too big, you might draw a biased conclusion = selection problem).
Work in Stata and write all the commands in a do-file; it helps you to come back and redo/correct your data processing.
Attach the final do-file to your final report (in the Appendix).
Make comments in your do-file; it helps others to read and understand your logic.

Data Limitations

You can limit the data if necessary (according to the goal of your research or to the theoretical background).
Examples: Return to human capital.
We restrict the sample by excluding respondents who are not in the labour force.
Special cases: excluding the outliers (e.g., Sberbank).
Risks: sample selection can lead to a nonrepresentative subsample; aggregation errors (“manufacture-centrism”); selection bias.

Data Description

Database title
Source of data
Type of data
Time period used
Unit (item) of observation
Number of observations
Advantages and disadvantages of the database used
Data limitations: omitted variables (e.g., no information about income), response rate (number of nonmissings), underrepresented subgroups, attrition problem
Describe all the manipulations performed (aggregation, merging databases, deflation, creating new variables)
Descriptive statistics, including descriptive statistics for the subsamples (male vs female, employed vs unemployed, etc.)
Never use abbreviations for variables in tables!!!
Round to an appropriate number of decimal places (0.0000)

Measurement Process

Data reconnaissance includes:
Descriptive statistics (min, max, mean, sd, median, etc.)
Densities (kdensity, histogram, box plot, etc.)
Scatterplots for two variables
Cross-tables
Pair correlations
Tests
+ Everything which allows you to draw a conclusion!
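The cleaning and reconnaissance steps described above can be sketched in a few lines. The lecture recommends Stata and a do-file; the following is only a Python illustration of the same logic, using the standard library. The income values are made up, and the set of missing-value codes is an assumption that must be checked against the codebook of the actual database.

```python
import statistics

# Hypothetical survey answers (monthly income); 999, 9999999 and -9 stand for
# "Refuse to answer" / "I do not know". Codes differ between databases,
# so always check the codebook and the questionnaire.
MISSING_CODES = {999, 9999999, -9}
raw = [250, 310, 999, 280, -9, 400, 295, 9999999, 330, 305]

# Step 1: recode the special answers as missing (None).
recoded = [None if x in MISSING_CODES else x for x in raw]

# Step 2: drop missing values and report how much the sample shrank.
clean = [x for x in recoded if x is not None]
dropped_share = 1 - len(clean) / len(raw)

# Step 3: basic descriptive statistics on the cleaned variable.
summary = {
    "n": len(clean),
    "min": min(clean),
    "max": max(clean),
    "mean": statistics.mean(clean),
    "median": statistics.median(clean),
    "sd": statistics.stdev(clean),
}

print(f"dropped {dropped_share:.0%} of observations")
print(summary)
```

Skipping step 1 would leave 9999999 in the data as if it were a real income and distort the mean and standard deviation, which is why recoding comes before any descriptive statistics.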
3 Scales

Scales of measurement

Scales of measurement in research and statistics are the different ways in which variables are defined and grouped into different categories. Sometimes called the level of measurement, a scale describes the nature of the values assigned to the variables in a data set. Measurement is the process of recording observations collected as part of a research project. Scaling is the assignment of numbers or semantics to objects. Taken together, the two words refer to the relationship between the assigned objects and the recorded observations.

What is a Measurement Scale?

A measurement scale is used to qualify or quantify data variables in statistics. It determines the kind of techniques to be used for statistical analysis. There are different kinds of measurement scales, and the type of data being collected determines the kind of measurement scale to be used. There are four measurement scales: nominal, ordinal, interval, and ratio.

NOTE!

Measurement scales are used to measure qualitative and quantitative data. Nominal and ordinal scales are used to measure qualitative data, while interval and ratio scales are used to measure quantitative data.

Characteristics of a Measurement Scale: IDENTITY

Identity refers to the assignment of numbers to the values of each variable in a data set. Consider, for instance, a questionnaire that asks for a respondent's gender with the options Male and Female. The values 1 and 2 can be assigned to Male and Female respectively (in sociology). In statistics we use 0 and 1 (Why?)
Arithmetic operations cannot be performed on these values because they are just for identification purposes. This is a characteristic of a nominal scale.

Characteristics of a Measurement Scale: MAGNITUDE

Magnitude means that the numbers on a scale (the identity) have an inherent order from lowest to highest. They are usually represented on the scale in ascending or descending order. The positions in a race, for example, are arranged from 1st, 2nd, 3rd down to the last.

Characteristics of a Measurement Scale: EQUAL INTERVALS

Equal intervals means that the scale has a standardized order, i.e., the difference between each level on the scale is the same. This is not the case for the ordinal scale, where positions are not separated by equal intervals. In a race, the runner in 1st position may finish in 20 seconds, the 2nd in 20.8 seconds, and the 3rd in 1 minute. A variable that has identity, magnitude, and equal intervals is measured on an INTERVAL SCALE.

Characteristics of a Measurement Scale: ABSOLUTE ZERO

Absolute zero is a feature that is unique to a ratio scale. It means that zero exists on the scale and is defined by the absence of the variable being measured (e.g. no qualification, no money, does not identify as any gender, etc.).

Levels of Data Measurement

By knowing the different levels of data measurement, researchers are able to choose the best method for statistical analysis!!!
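The four characteristics accumulate from one scale to the next, which can be written down as a small lookup. The Python sketch below is one reading of the slides: the slides state the interval and ratio cases explicitly, while the assignment "ordinal = identity + magnitude" is inferred from the race example. The function name `classify` is introduced here for illustration only.

```python
# Characteristics of each measurement scale, as read from the slides above.
SCALES = {
    "nominal":  {"identity"},
    "ordinal":  {"identity", "magnitude"},
    "interval": {"identity", "magnitude", "equal intervals"},
    "ratio":    {"identity", "magnitude", "equal intervals", "absolute zero"},
}

def classify(characteristics):
    """Return the scale whose set of characteristics matches exactly, else None."""
    for name, required in SCALES.items():
        if required == set(characteristics):
            return name
    return None

print(classify(["identity", "magnitude"]))                     # position in a race
print(classify(["identity", "magnitude", "equal intervals"]))  # temperature in Celsius
```

Each scale's set contains the previous one, so a method that is valid on a weaker scale remains valid on every stronger scale.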
Nominal Scale
The nominal scale is a scale of measurement that is used for identification purposes. It is the crudest and weakest level of data measurement among the four. Sometimes known as the categorical scale, it assigns numbers to attributes for easy identification. These numbers are, however, not quantitative in nature and only act as labels. The only statistical analysis that can be performed on a nominal scale is the percentage or frequency count. It can be analyzed graphically using a bar chart or pie chart. We might also be interested in dynamics, that is, how the shares change over time.

Nominal Scale: Example
Which political party are you affiliated with? Independent, Republican, Democrat.
Labeling Independent as “1”, Republican as “2” and Democrat as “3” does not in any way mean that any of the attributes is better than the others. The numbers are just used as identifiers for easy data analysis.

Nominal Scale: Example
Gender: Male, Female, Other.
Hair Color: Brown, Black, Blonde, Red, Other.
Type of living accommodation: House, Apartment, Trailer, Other.
Genotype: Bb, bb, BB, bB.
Religious preference: Buddhist, Mormon, Muslim, Jewish, Christian, Other.

Nominal Scale: Statistics
The most appropriate analyses are pie charts, share comparison, and distribution analysis (specifically over time).
NOT ALLOWED: mean values, sd, summation, etc.
Absolute Zero exists.

Ordinal Scale
The ordinal scale involves the ranking or ordering of the attributes depending on the variable being scaled. The items in this scale are classified according to the degree of occurrence of the variable in question. The ordinal scale can be used in market research, advertising, and customer satisfaction surveys. It uses qualifiers like very, highly, more, less, etc. to depict a degree. We can compute statistics like the median and mode on an ordinal scale, but not the mean. However, there are statistical alternatives to the mean that can be used with the ordinal scale.

Ordinal Scale: Example
A software company may need to ask its users: How would you rate our app? Excellent, Very Good, Good, Bad, Poor.
The attributes in this example are listed in descending order.

Ordinal Scale: Example
High school class ranking: 1st, 9th, 87th...
Socioeconomic status: poor, middle class, rich.
The Likert Scale: strongly disagree, disagree, neutral, agree, strongly agree.
Level of Agreement: yes, maybe, no.

Ordinal Scale: Statistics
The most appropriate analyses are histograms, shares, and distributions.
NOT ALLOWED: mean values, sd, summation, etc.
Absolute Zero does not exist.

Interval Scale
The interval scale of data measurement is a scale in which the levels are ordered and numerically equal distances on the scale represent equal differences. It is an extension of the ordinal scale, with the main difference being the existence of equal intervals. With an interval scale, you know not only that a given attribute A is bigger than another attribute B, but also by how much A is larger than B. Also, unlike the ordinal and nominal scales, the interval scale supports arithmetic operations.

Interval Scale: Example
The interval scale is used in various sectors, such as education, medicine, and engineering. Some of these uses include calculating a student’s CGPA and measuring a patient’s temperature. A common example is measuring temperature on the Celsius scale.
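The Celsius example shows what equal intervals without an absolute zero means in practice: differences are meaningful, ratios are not. The sketch below uses two made-up readings and converts them to Kelvin, a ratio scale that is not mentioned in the slides and is brought in here only for contrast.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert a Celsius reading to Kelvin, where zero is a true absolute zero."""
    return celsius + 273.15


# Hypothetical readings, chosen only for illustration.
morning_c, noon_c = 10.0, 20.0

# Differences are meaningful on an interval scale.
difference = noon_c - morning_c

# The ratio of Celsius values is NOT meaningful: 0 degrees C is an arbitrary point.
naive_ratio = noon_c / morning_c

# On the Kelvin scale the ratio does have a physical interpretation.
kelvin_ratio = celsius_to_kelvin(noon_c) / celsius_to_kelvin(morning_c)

print(f"difference: {difference:.1f} degrees")
print(f"Celsius 'ratio': {naive_ratio:.2f} (misleading)")
print(f"Kelvin ratio: {kelvin_ratio:.4f}")
```

Noon is 10 degrees warmer than morning, but it is not "twice as hot": on the Kelvin scale the two readings differ by only about 3.5 percent.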
The interval scale can be used to calculate the mean, median, mode, range, and standard deviation.

Ratio Scale
The ratio scale is the highest level of data measurement. It is an extension of the interval scale and therefore satisfies all four characteristics of a measurement scale: identity, magnitude, equal intervals, and absolute zero. This level of data measurement allows the researcher to compare both the differences and the relative magnitudes of numbers. The ratio scale is compatible with all statistical analysis methods, such as measures of central tendency (mean, median, mode, etc.) and measures of dispersion (range, standard deviation, etc.).

Ratio Scale: Example
Some examples of ratio scales include length, weight, time, etc. With respect to market research, common ratio scale examples are price, number of customers, number of competitors, etc. The ratio scale is extensively used in marketing, advertising, and business sales.
For example, a survey that collects the weights of the respondents:
Which of the following categories do you fall in?
More than 100 kg
81-100 kg
61-80 kg
40-60 kg
Less than 40 kg

Ratio Scale: Example
Age
Weight
Height
Sales figures
Income earned in a week
Years of education
Number of children

Ratio Scale: Cardinal Numbers
A cardinal number, sometimes called a “counting number,” is used for counting, as when you count 1, 2, 3, ... You use these numbers to answer the question “how many?” Often, sets of cardinal numbers are used to compute statistics. When this happens, the cardinal numbers disappear. For example, according to the 2010 U.S. Census, the average number of people per household in the U.S. is 2.58. This number was arrived at by taking the cardinal number of people in each household and then finding the mean. Once you have taken that set of cardinals and found its mean (2.58), the statistic is no longer cardinal.

Qualities of a Good Questionnaire
The questionnaire should be of an appropriate length.
The language used should be easy and simple.
The terms used should be explained properly.
The questions should be arranged in a proper order.
The questions should follow a logical sequence.
Complex questions should be broken into filter questions.
The questions should be worded precisely and correctly.
The questionnaire should be constructed for a specific period of time.

Qualities of a Good Questionnaire
The questions should revolve around the investigator’s theme.
The answers should be short and simple.
The answers should be accurate.
The answers should be direct.
The answers should be relevant to the problem.
The answers should be understandable to every respondent.

Qualities of a Good Questionnaire
It should seek only data that cannot be obtained from other sources.
It should be as short as possible but still comprehensive.
It should be attractive.
It should be presented in a good psychological order, proceeding from general to more specific responses.
Double negatives in questions should be avoided.
Putting two questions into one should also be avoided.
Every question should seek to obtain only one specific piece of information.
It should avoid annoying or embarrassing questions.
It should be designed to collect information which can be used subsequently as data for analysis.

Measurement
Questionnaire surveys are measurement instruments. While scientific measurement instruments measure physical properties like weight, questionnaire surveys often measure respondents’ self-reported attitudes, opinions or behaviours. As constructs are intangible and complex human behaviours or characteristics, they are not well measured by any single question. They are better measured by asking a series of related questions covering different aspects of the construct of interest. The responses to these individual but related questions can then be combined to form a score or scale measure along a continuum.

How to measure quality?
As with scientific measurement instruments, two important qualities of surveys are consistency and accuracy. These are assessed by considering the survey’s reliability and validity.

Validity
Validity is the extent to which an instrument, such as a survey, measures what it is supposed to measure: validity is an assessment of its accuracy. How do we assess validity?
Face validity and content validity are two forms of validity that are usually assessed qualitatively. A survey has face validity if, in the view of the respondents, the questions measure what they are intended to measure. A survey has content validity if, in the view of experts (for example, health professionals for patient surveys), the survey contains questions which cover all aspects of the construct being measured. Face and content validity are subjective opinions of non-experts and experts.

Validity
Face validity is often seen as the weakest form of validity, and it is usually desirable to establish that your survey has other forms of validity in addition to face and content validity. Criterion validity is the extent to which the measures derived from the survey relate to other external criteria. These external criteria can be either concurrent or predictive. Concurrent validity criteria are measured at the same time as the survey, either with questions embedded within the survey or with measures obtained from other sources. Examples are how well the measures derived from the survey correlate with another established, validated survey which measures the same construct, or how well a survey measuring affluence correlates with salary or household income.

Validity
Often the purpose of a survey is to make an assessment about a situation in the future, say the suitability of a candidate for a job or the likelihood of a student progressing to a higher level of education. Predictive validity criteria are gathered at some point in time after the survey; for example, workplace performance measures or end-of-year exam scores are correlated with or regressed on the measures derived from the survey. If the external criterion is categorical (for example, how well a survey measuring political opinion distinguishes between Conservative and Labour voters), then, while this is still criterion validity, how well a survey distinguishes between different groups of respondents is referred to as known-group validity. This could be assessed by comparing the average scores of the different groups of respondents using t-tests or analysis of variance (ANOVA, the topic of our next lecture).

Validity
Construct validity is the extent to which the survey measures the theoretical construct it is intended to measure, and as such it encompasses many, if not all, validity concepts rather than being viewed as a separate definition. Confirmatory factor analysis (CFA) is a technique used to assess construct validity. With CFA we state how we believe the questionnaire items are correlated by specifying a theoretical model. Our theoretical model may be based on an earlier exploratory factor analysis (EFA), on previous research, or on our own a priori theory. We calculate the statistical likelihood that the data from the questionnaire items fit this model, thus confirming our theory.

Reliability
Reliability is the extent to which an instrument would give the same results if the measurement were to be taken again under the same conditions: its consistency. How do we assess reliability?
One estimate of reliability is test-retest reliability. This involves administering the survey to a group of respondents and repeating the survey with the same group at a later point in time. We then compare the responses at the two timepoints.

Reliability
For categorical variables we can cross-tabulate and determine the percentage of agreement between the test and retest results, or calculate Cohen’s kappa. For continuous variables, or where individual questions are combined to construct a score on a scale, we can compare the values at the two timepoints with a correlation. One immediately obvious drawback of test-retest reliability is memory effects: the test and the retest are not happening under the same conditions. If people respond to the survey questions the second time in the same way they remember responding the first time, this will give an artificially good impression of reliability. Increasing the time between test and retest (to reduce the memory effects) introduces the prospect of genuine changes over time.

Reliability
If the survey is to be used to make judgements or observations of another subject, for example clinicians assessing patients with pain or mental health issues, or teachers rating different aspects of children’s writing, we can compare different raters’ responses for the same subject: this is inter-rater reliability. Here we would use the same statistics as for test-retest reliability. As with test-retest reliability, the two measurements are again not taken under the same conditions: the raters are different, and one may be systematically “harsher” than the other.

Reliability
Parallel-form reliability involves developing two equivalent, parallel forms of the survey, say form A and form B, both measuring the same underlying construct but with different questions in each. Respondents are asked to complete both surveys, some taking form A followed by form B, others taking form B first and then form A. As the questions differ in each survey, the questions within each are combined to form separate scales. Based on the assumption that the parallel forms are indeed interchangeable, the correlation of the scale scores across the two forms is an estimate of their reliability. The disadvantage of this approach is that it is expensive: potentially double the cost of developing one survey.

Reliability
An alternative is split-half reliability. Here we divide the survey arbitrarily into two halves (odd and even question numbers, for example) and calculate the correlation of the scores on the scales from the two halves. Reliability is also a function of the number of questions in the scale, and we have effectively halved the number of questions. So we adjust the calculated correlation to estimate the reliability of a scale that is twice the length, using the Spearman-Brown formula. Split-half reliability is an estimate of reliability known as internal consistency; it measures the extent to which the questions in the survey all measure the same underlying construct.

Reliability
Cronbach’s alpha is another measure of internal consistency reliability. For surveys or assessments with an even number of questions, Cronbach’s alpha is the equivalent of the average reliability across all possible combinations of split-halves. Most analysis software will also routinely calculate, for each question or questionnaire item in the scale, the value of Cronbach’s alpha if that questionnaire item were deleted. These values can be examined to judge whether the reliability of the scale can be improved by removing any of the questionnaire items.

Final Practical Task
I believe the Confucian adage:
You tell me, I forget.
You show me, I remember.
You involve me, I understand.

Research Questions
Are big companies more productive than small companies?
Are state-owned companies more productive than private companies?
Do developed and developing countries differ in the level of entrepreneurial activity?
Are there any gender differences in the level of entrepreneurial activity in Russia?

Research Questions
Are employed people healthier than unemployed or self-employed ones?
Is it true that export-oriented countries have a better environment for starting a business than import-oriented countries do?
Is it true that countries with better financial systems are better at international trade?
Is it true that countries with better labour markets are better at entrepreneurial activity?
Is it true that the better educated the population of the country is, the more innovative the country is?

Research Questions
Is it true that the better the country’s financial institutions are, the more innovative this country is?
Is it true that in transition economies elderly people have less trust in financial institutions than young people?
Is it true that better educated people believe that science makes our life better?
Is it true that in transition economies there are generational differences in the belief that someone can become rich only at the expense of others?
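Many of the research questions above compare a numeric outcome between two groups (big versus small companies, state-owned versus private ones), which is the setting of the t-test mentioned in the validity slides. The following is a minimal sketch using only the Python standard library. It computes Welch's version of the t statistic, which is a choice made here and not something the slides prescribe; the function name `welch_t` and the productivity figures are invented for illustration.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # statistics.variance is the sample variance (divides by n - 1).
    se2_a = statistics.variance(sample_a) / len(sample_a)
    se2_b = statistics.variance(sample_b) / len(sample_b)
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(sample_a) - 1) + se2_b ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Made-up productivity values (e.g. output per worker), NOT real data.
big_companies = [10.0, 12.0, 14.0]
small_companies = [7.0, 8.0, 9.0]

t_stat, dof = welch_t(big_companies, small_companies)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

The standard library has no t distribution, so turning the statistic into a p-value would require statistical tables or a package such as SciPy. The test also presupposes an outcome measured on at least an interval scale.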
Questionnaire
Client: The management of Saint Petersburg School of Economics and Management HSE-University (SEM)
The goal of your survey is to give the client a suggestion (or suggestions) as to how to improve the process of managing the university during this period of lockdown (transition from mainly working face-to-face to working remotely); your main considerations should be the interests of the two biggest groups of stakeholders: the teachers and the students.
Problem: The SEM management would like to choose the best way to adapt the current managerial process in a way that would be most agreeable to both of these main stakeholder groups (the teachers and the students).
Client’s Note: The goal is not to satisfy the students only! The teachers are a scarce resource; as such, they should have their interests taken into account.

Questionnaire
Client: Teachers of Saint Petersburg School of Economics and Management HSE-University (SEM)
The goal of your survey is to give the client a suggestion (or suggestions) as to how to improve the process of teaching and communicating with the students during the current period of lockdown (transition from mainly working face-to-face to working remotely).
Problem: The teachers would like to be advised on the ways in which they can make their courses more convenient to take, the classes more productive and enjoyable, and the communication more efficient. Also consider the technical aspect of distance learning (platforms, homework, connectivity, and so on).
Note: Advising on a better way to teach is not the same as “satisfying students above all else”!
Questionnaire
Client: International Centre for Health Economics, Management, and Policy HSE-University (CHEMP)
The goal of your project is to conduct a survey about mental health problems in the current period of lockdown.
Problem: We are going through a period of lockdown, which is a stressful time for everybody. Being stuck at home, the economy being in depression, working remotely, and worrying about COVID-19 are just a few of the problems that are troubling the population in these trying times. These problems may lead to some mental health disorders. It would be interesting to understand the scale of this problem.
Target group: Students of Saint Petersburg School of Economics and Management

Questionnaire
Client: International Centre for Health Economics, Management, and Policy HSE-University (CHEMP)
The goal of your project is to conduct a survey about mental health problems in the current period of lockdown.
Problem: We are going through a period of lockdown, which is a stressful time for everybody. Being stuck at home, the economy being in depression, working remotely, and worrying about COVID-19 are just a few of the problems that are troubling the population in these trying times. These problems may lead to some mental health disorders. It would be interesting to understand the scale of this problem.
Target group: Teachers of Saint Petersburg School of Economics and Management

Questionnaire
Client: International Centre for Health Economics, Management, and Policy HSE-University (CHEMP)
The goal of your project is to conduct a survey about people’s perceptions of the COVID-19 pandemic.
Target group: general population
Note: You may narrow the target group down (to just one age group, gender, or any other social stratum).

Questionnaire
Problem: As billions of people all over the world are currently in lockdown, each of them must have their own view of the COVID-19-related situation in the world. You need to assess how well people are informed about the different aspects of the coronavirus:
whether they know what causes it, how it spreads, and how to avoid contracting it;
whether they believe it to be man-made or naturally occurring;
whether they are of the opinion that the virus is real or whether it is a Judeo-Masonic-Reptiloid conspiracy;
whether they are taking any proactive steps to make sure that they do not become infected with COVID-19;
whether they believe the WHO statistics on the virus;
whether they follow the guidelines aimed at reducing the spread of the virus;
whether they brave the virus-ridden streets (risk-taking).

Main elements of the survey
(1) Reformulate the goal of your survey in the form of a list of specific aims that you should reach as you do your project;
(2) Determine the target group (or target groups) that should be surveyed; identify who your target group, population, and sample are.
(3) Develop a questionnaire (perhaps you should start with an in-depth interview of two or three representative members of the population you seek to analyse);
(4) Choose the platform for your online questionnaire (the easiest way is Google Forms);
(5) Run a pilot study (be sure that everybody understands your questions);
(6) Analyse the results of the pilot study to be sure that there is variation in the answers and that your questionnaire really helps to reach the aims formulated in (1);
(7) Alter the questionnaire if you need to;

Main elements of the survey
(8) Organise the survey;
(9) Process the collected data, encoding it and compiling it into one database;
(10) Make calculations (descriptive statistics, tests, etc.) to help you (a) check the quality of your data/questionnaire/survey; (b) confirm whether you have attained the aims formulated in (1);
(11) Develop a report on your survey (graphs, descriptive statistics, tests, etc.);
(12) Make a suggestion (or suggestions) to the client;
(13) Discuss the main limitations of your survey.

Main elements of the survey
(14) If you use a standard questionnaire developed by others, you might need to translate it into Russian. If you do so, please translate the resulting Russian-language version back into English to see whether the meaning of the questions has remained the same; the two translations must be done by different people to check whether the questions have the same meaning in both languages.
(15) If the client is the International Centre for Health Economics, Management, and Policy (CHEMP), you should remember that the main goal of CHEMP is research that makes it possible for the centre to make suggestions to policy-makers.
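Steps (9) and (10a) above, encoding the answers and checking the quality of the questionnaire, can be sketched with Cronbach's alpha from the reliability slides. The code below uses the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The 1-to-5 coding of the Likert answers, the function names, and the four responses are all assumptions made for illustration, and summing Likert codes treats them as roughly interval-level, which is a common convention rather than a given.

```python
import statistics

# Assumed coding of Likert answers; any consistent ordered coding would do.
LIKERT_CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}


def encode(responses):
    """Turn rows of Likert answers (one row per respondent) into numeric codes."""
    return [[LIKERT_CODES[answer] for answer in row] for row in responses]


def cronbach_alpha(scores):
    """Cronbach's alpha for rows of item scores (respondents x items)."""
    k = len(scores[0])
    item_variances = [
        statistics.variance([row[i] for row in scores]) for i in range(k)
    ]
    total_variance = statistics.variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Made-up pilot answers to three related questions, NOT real survey data.
pilot = [
    ["disagree", "neutral", "neutral"],
    ["agree", "agree", "strongly agree"],
    ["neutral", "neutral", "agree"],
    ["strongly agree", "agree", "strongly agree"],
]

scores = encode(pilot)
print(f"Cronbach's alpha = {cronbach_alpha(scores):.3f}")
```

A real pilot study would need far more than four respondents for the estimate to be meaningful; the example only shows the mechanics.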
Main elements of the survey
(16) If you use open-ended questions in your survey, be aware that doing so may make it difficult for you to aggregate the answers into a limited number of opinions; this may prevent you from getting the information that you set out to discover.
(17) Before you make a questionnaire, think about how you are going to encode the information that you will obtain (ordered variables/answers, non-ordered answers, binary answers, Likert scale, ...).

EVALUATION FORMULA FOR THE SURVEY
1. (0.1) The quality of the report.
2. (0.1) The quality of the presentation.
3. (0.2) Using a sufficient amount of descriptive statistics, graphics, tests, and other methods relevant to attaining the aims of the survey.
4. (0.1) How well the questionnaire fits the goal of the survey.
5. (0.1) Reliability, validity, sensitivity of the questionnaire.
6. (0.1) How well the target group, the population, and the sample are chosen.
7. (0.1) A discussion of the survey’s limitations.
8. (0.2) How well argued the suggestions are; how well the suggestions are based on and stem from the results of the survey.

Main elements of the research
(1) Get to know the database that you have been recommended (read about who collected the data, how the data were collected, and why the data were collected).
(2) Download the database and study the variables that you have at your disposal (carefully read not only the variable labels but also the descriptions of these labels on the website or in the reports).
(3) Discuss the mechanisms behind the relationship between the characteristics mentioned in the question. Reformulate the research question in the form of a list of research hypotheses that you should test as you do your project.
Main elements of the research
(4) You are recommended to describe the “ideal”, “perfect” variables for your research. Choose variables that could replace your non-existent “ideal” benchmark variable.
(5) Choose proxy variables that are best suited for reflecting the characteristics that you are studying (for example, how to measure productivity or innovativeness; how to determine whether the country is developed or developing).

Main elements of the research
(6) Reformulate your research question and hypothesis testing in terms of your proxy variables.
(7) Make calculations (descriptive statistics, tests, etc.) to test your research hypotheses and to answer your research question.
(8) Develop a report on your research (graphs, descriptive statistics, tests, etc.).
(9) Draw a conclusion about what the bottom-line answer to your research question is.
(10) Discuss the main limitations of your research.

Main elements of the research
NOTE. Be aware of the type of data that you are using. The choice of criteria that you will use to answer your research question depends on the type of data.
NOTE. Sometimes, there can be several proxy variables for the characteristic that you need. It may be a good idea to use not just one of them but several or all of them. You should include a discussion of how each of the proxy variables is superior or inferior to the others.

EVALUATION FORMULA FOR THE RESEARCH
1. (0.1) The quality of the report.
2. (0.1) The quality of the presentation.
3. (0.1) How good your discussion of the mechanisms behind the relationship between the characteristics mentioned in the question is.
4. (0.1) How well you discuss your choice of proxy variables.
5. (0.1) How well you discuss the database and its applicability to the goals of your research.
6. (0.2) Using a sufficient amount of descriptive statistics, graphics, tests, and other methods relevant to attaining the aims of the research.
7. (0.1) How well you discuss the limitations of your research.
8. (0.2) How well your conclusions and the final discussion of your research are based on your statistical results.
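Read as a formula, the evaluation scheme above is a weighted sum of the eight criteria. The sketch below shows the arithmetic for the research project. The short criterion names, the function name, and the assumption that each criterion is scored on a 0 to 10 scale are illustrative only, because the slides give the weights but not the scoring scale.

```python
# Weights copied from the evaluation formula for the research;
# the short criterion names are our own labels.
RESEARCH_WEIGHTS = {
    "report": 0.1,
    "presentation": 0.1,
    "mechanisms": 0.1,
    "proxy_variables": 0.1,
    "database": 0.1,
    "statistics": 0.2,
    "limitations": 0.1,
    "conclusions": 0.2,
}


def weighted_grade(scores, weights):
    """Weighted sum of criterion scores; the weights must add up to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * scores[name] for name in weights)


# Hypothetical scores on an assumed 0-10 scale.
example_scores = {
    "report": 8,
    "presentation": 9,
    "mechanisms": 7,
    "proxy_variables": 8,
    "database": 10,
    "statistics": 6,
    "limitations": 9,
    "conclusions": 7,
}

print(f"final grade: {weighted_grade(example_scores, RESEARCH_WEIGHTS):.1f}")
```

The two criteria weighted 0.2 (the statistical methods and the conclusions based on them) together account for 40 percent of the grade.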