Panel data methods in Stata
Session 2
By Ziyodullo Parpiev, PhD

Overview
• Panel data
• How to get to know the data
• Change over time
  • Tabulating
  • Calculating transition probabilities

Using panel data in Stata
• Data on n cases, over t time periods, giving a total of n × t observations
• One record per observation, i.e. long format
• Stata tools for analyzing panel data begin with the prefix xt
• You first need to tell Stata that you have panel data, using xtset

Complete and incomplete person-wave data

  +------------------------------------------------------------------+
  |      pid   wave      sex   age     mastat     jbstat      fihhmn |
  |------------------------------------------------------------------|
  | 10019057      1   female    59   never ma    retired         780 |
  | 10019057      2   female    60   never ma    retired      759.14 |
  | 10019057      3   female    61   never ma    retired       923.5 |
  | 10019057      4   female    62   never ma    retired        62.5 |
  | 10019057      5   female    63   never ma    retired         663 |
  | 10019057      6   female    64   never ma    retired   missing o |
  | 10019057      7   female    65   never ma    retired    1254.963 |
  | 10019057      8   female    66   never ma    retired    1270.432 |
  | 10019057      9   female    67   never ma    retired    1364.555 |
  | 10019057     10   female    67   never ma    retired     1479.74 |
  | 10019057     11   female    68   never ma    retired     1328.25 |
  | 10019057     12   female    69   never ma    retired     1371.49 |
  | 10019057     13   female    71   never ma    retired   missing o |
  | 10019057     14   female    71   never ma    retired    1372.333 |
  | 10019057     15   female    73   never ma    retired    1475.812 |
  |------------------------------------------------------------------|
  | 10028005      1     male    30   never ma   employed    1501.155 |
  | 10028005      2     male    31   never ma   employed    1636.259 |
  | 10028005      3     male    32   never ma   employed    1943.283 |
  | 10028005      6     male    35   never ma   employed     2001.54 |
  | 10028005      7     male    36   never ma   employed     1634.33 |
  | 10028005      9     male    38   never ma   employed    1587.945 |
  +------------------------------------------------------------------+

Telling Stata you have panel data

. xtset pid wave
       panel variable:  pid (unbalanced)
        time variable:  wave, 1 to 15, but with gaps
                delta:  1 unit

• pid: unique cross-wave identifier
• wave: time variable
• (unbalanced): cases not observed for every time period
• delta: period between observations, in units of the time variable

Describing the patterns in panel data

. xtdes, patterns(20)

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+-----------------
     1294     28.12   28.12 |  111111111111111
      248      5.39   33.51 |  1..............
      157      3.41   36.93 |  11.............
      115      2.50   39.43 |  ..............1
      105      2.28   41.71 |  111............
      104      2.26   43.97 |  1111...........
       73      1.59   45.56 |  11111..........
       69      1.50   47.05 |  ............111
       66      1.43   48.49 |  ..........11111
       62      1.35   49.84 |  .............11
       60      1.30   51.14 |  .1.............
       60      1.30   52.45 |  11111111111....
       58      1.26   53.71 |  11111111.......
       58      1.26   54.97 |  111111111......
       57      1.24   56.21 |  11111111111111.
       55      1.20   57.40 |  .....1.........
       54      1.17   58.57 |  ........1111111
       54      1.17   59.75 |  .11111111111111
       54      1.17   60.92 |  1111111111.....
       53      1.15   62.07 |  .........111111
     1745     37.93  100.00 | (other patterns)
 ---------------------------+-----------------
     4601    100.00         |  XXXXXXXXXXXXXXX

Examining change over two waves

      1991 |
Employment |     1992 Employment status
    status |         1          2          3 |     Total
-----------+---------------------------------+----------
         1 |       961         35         76 |     1,072
         2 |        36         38         24 |        98
         3 |        40         23        524 |       587
-----------+---------------------------------+----------
     Total |     1,037         96        624 |     1,757

      2001 |
Employment |     2002 Employment status
    status |         1          2          3 |     Total
-----------+---------------------------------+----------
         1 |       991         15         46 |     1,052
         2 |        20         12          9 |        41
         3 |        56         20        495 |       571
-----------+---------------------------------+----------
     Total |     1,067         47        550 |     1,664

Calculating transition probabilities
The transition probability is the probability of moving from state i in one period to state j in the next:

$$p_{ij} = \Pr\{X_t = j \mid X_{t-1} = i\}$$

So, to calculate by hand, divide each cell count by its row total:

$$p_{ij} = \frac{N_{ij}}{\sum_{j=1}^{n} N_{ij}}$$

Transition probability matrix

      1991 |
Employment |     1992 Employment status
    status |         1          2          3 |     Total
-----------+---------------------------------+----------
         1 |      0.90       0.03       0.07 |      1.00
         2 |      0.37       0.39       0.24 |      1.00
         3 |      0.07       0.04       0.89 |      1.00
-----------+---------------------------------+----------

      2001 |
Employment |     2002 Employment status
    status |         1          2          3 |     Total
-----------+---------------------------------+----------
         1 |      0.94       0.01       0.04 |      1.00
         2 |      0.49       0.29       0.22 |      1.00
         3 |      0.10       0.04       0.87 |      1.00
-----------+---------------------------------+----------

Transition probability matrices in Stata
If you leave out the "if" statement, xttrans reports mean transition probabilities for all waves t to t+1.
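. xttrans jbstat if wave<3, freq

   current |
  economic |      current economic activity
  activity |         1          2          3 |     Total
-----------+---------------------------------+----------
         1 |       961         35         76 |     1,072
           |     89.65       3.26       7.09 |    100.00
-----------+---------------------------------+----------
         2 |        36         38         24 |        98
           |     36.73      38.78      24.49 |    100.00
-----------+---------------------------------+----------
         3 |        40         23        524 |       587
           |      6.81       3.92      89.27 |    100.00
-----------+---------------------------------+----------
     Total |     1,037         96        624 |     1,757
           |     59.02       5.46      35.52 |    100.00

Change in a categorical variable over time
[Figure: a decision tree of transition probabilities between employed (empl), unemployed (unemp) and out of the labour force (olf) over successive waves]

Change in a continuous variable over time
• Size transition matrix
• Quantile transition matrix
• Mean transition matrix
• Median transition matrix

Size transition matrix
• Absolute mobility
• Boundaries set exogenously, i.e. predetermined
  • e.g. movement in and out of poverty
  • e.g. poverty defined a priori as an income below £5,000
• Boundaries do not depend on the distribution under investigation
  • e.g. comparing mobility in the 1990s and 2000s incorporates both movements in the positions of individuals and economic growth

Quantile transition matrix
• Mobility as a relative concept
• Same number of individuals in each class
• Only records movements involving re-ranking
• Cannot take account of economic growth, for example when comparing matrices
• Cannot draw a complete picture if comparing mobility in different cohorts/countries/welfare regimes

Mean/median transition matrices
• Both absolute and relative approaches incorporated into matrices
• Class boundaries defined as percentages of mean or median income of the origin and destination distributions
• Example: 25%, 50%, 75% of median income
• Note that this is not the same as quartiles

Example: income 1991-1992

wave = 1
          household income: month before interview
-------------------------------------------------------------
      Percentiles      Smallest
 1%       181.86              0
 5%       349.82              0
10%       458.98              0       Obs                2795
25%     826.6895              0       Sum of Wgt.        2795

50%     1511.067                      Mean           1773.253
                        Largest       Std. Dev.      1299.089
75%     2365.493       9230.818
90%     3329.769       9230.818       Variance        1687633
95%     4062.217       9230.818       Skewness       1.836874
99%     6748.689       9230.818       Kurtosis       8.622895

wave = 2
          household income: month before interview
-------------------------------------------------------------
      Percentiles      Smallest
 1%     207.9433              0
 5%     338.7431              0
10%       460.68              0       Obs                2639
25%       861.67              5       Sum of Wgt.        2639

50%         1508                      Mean           1795.179
                        Largest       Std. Dev.      1229.827
75%     2449.813       8405.636
90%     3414.511       8405.636       Variance        1512476
95%     4103.649       10491.08       Skewness       1.352148
99%     5824.449       10491.08       Kurtosis       6.370836

Category boundaries for each method

Matrix     Year   Boundary 1 (n)    Boundary 2 (n)       Boundary 3 (n)       Boundary 4 (n)
Size       1991   0 - 800 (580)     800 - 1500 (650)     1500 - 2200 (504)    2200 - 9231 (715)
           1992   0 - 800 (580)     800 - 1500 (645)     1500 - 2200 (473)    2200 - 10491 (751)
Quartile   1991   0 - 827 (609)     827 - 1511 (615)     1511 - 2365 (611)    2365 - 9231 (614)
           1992   0 - 862 (610)     862 - 1508 (612)     1508 - 2450 (612)    2450 - 10491 (615)
Mean       1991   0 - 887 (654)     887 - 1773 (814)     1773 - 2660 (506)    2660 - 9231 (475)
           1992   0 - 898 (652)     898 - 1795 (766)     1795 - 2693 (501)    2693 - 10491 (530)
Median     1991   0 - 750 (539)     750 - 1500 (685)     1500 - 2250 (540)    2250 - 9231 (685)
           1992   0 - 746 (536)     746 - 1491 (686)     1491 - 2237 (505)    2237 - 10491 (722)

Warning! Measurement error
• Causes an over-estimation of mobility
• If mothers' and babies' weights are reported to the nearest half pound, this can affect which band an observation falls in
• A respondent may describe their marital status as separated in year 1 and single in year 2

Overview
• Types of questions, types of variables: time-invariant, time-varying and trend
• Between- and within-individual variation
• Concept of individual heterogeneity
• From OLS to models that allow causal interpretations: fixed effects and random effects models
• The basics of these models' implementation in Stata

Types of variable
• Those which vary between individuals but hardly ever over time
  • Sex
  • Ethnicity
  • Parents' social class when you were 14
  • The type of primary school you attended (once you've become an adult)
• Those which vary over time, but not between individuals
  • The retail price index
  • National unemployment rates
  • Age, in a cohort study
• Those which vary both over time and between individuals
  • Income
  • Health
  • Psychological wellbeing
  • Number of children you have
  • Marital status
• Trend variables: vary between individuals and over time, but in highly predictable ways
  • Age
  • Year

Between- and within-individual variation
If you have a sample with repeated observations on the same individuals, there are two sources of variance within the sample:
• The fact that individuals are systematically different from one another (between-individual variation)
• The fact that individuals' behaviour varies between observations over time (within-individual variation)

With k individuals (i = 1, ..., k) each observed at m time points (j = 1, ..., m), the data are

$$\begin{pmatrix} x_{11} & x_{12} & \dots & x_{1m} \\ x_{21} & x_{22} & \dots & x_{2m} \\ \vdots & \vdots & & \vdots \\ x_{k1} & x_{k2} & \dots & x_{km} \end{pmatrix}$$

Total variation is the sum, over all individuals and years, of the squared difference between each observation of x and the whole-sample mean:

$$T = \sum_{i=1}^{k} \sum_{j=1}^{m} (x_{ij} - \bar{x})^2$$

Within variation is the sum of the squared deviations of each individual's observations from his or her own mean:

$$W = \sum_{i=1}^{k} \sum_{j=1}^{m} (x_{ij} - \bar{x}_i)^2$$

Between variation is the sum of squared differences between individual means and the whole-sample mean:

$$B = \sum_{i=1}^{k} \sum_{j=1}^{m} (\bar{x}_i - \bar{x})^2$$

Remember: from the variation you get to the variance, and from the variance to the standard deviation:

$$SD = \sqrt{T / (N - 1)}$$

xtsum in Stata
Similar to the ordinary "sum" command.

. xtset pid wave
       panel variable:  pid (unbalanced)
        time variable:  wave, 1 to 15, but with gaps
                delta:  1 unit

. xtsum female partner age ue_sick LIKERT wave if nwaves == 15

Variable         |      Mean   Std. Dev.        Min        Max |    Observations
-----------------+---------------------------------------------+----------------
female   overall |  .5397574   .4984321          0          1 |     N =   16324
         between |             .4989059          0          1 |     n =    1237
         within  |                    0   .5397574   .5397574 | T-bar = 13.1964
partner  overall |  .6892954   .4627963          0          1 |     N =   16292
         between |             .4217842          0          1 |     n =    1234
         within  |              .243531   -.244038   1.622629 | T-bar = 13.2026
age      overall |  40.03349   19.74332          0         98 |     N =   19410
         between |             19.27238        6.4   90.93333 |     n =    1294
         within  |              4.31763   31.30015   54.30015 |     T =      15
ue_sick  overall |  .0672924   .2505353          0          1 |     N =   16302
         between |             .1738938          0          1 |     n =    1237
         within  |             .1852756   -.866041   1.000626 | T-bar = 13.1787
LIKERT   overall |  11.26167   5.344825          0         36 |     N =   15661
         between |             3.609665          0   29.69231 |     n =    1225
         within  |             4.030974  -6.738331   35.12834 | T-bar = 12.7845
wave     overall |         8   4.320605          1         15 |     N =   19410
         between |                    0          8          8 |     n =    1294
         within  |             4.320605          1         15 |     T =      15

• The condition nwaves == 15 selects a balanced sample
• female: all variation is "between"
• partner: most variation is "between", because it's fairly rare to switch between having and not having a partner
• wave: all variation is "within", because this is a balanced sample

More on xtsum
Reading the same output:
• N: observations with a non-missing value of the variable
• n: number of individuals
• T-bar: average number of time points
• between row: min and max refer to the individual means $\bar{x}_i$
• within row: min and max refer to individual deviations from own averages, with the global average added back in

The xttab command
For simplicity, the jbstat categories missing, maternity leave, gov training and other are omitted.

. xttab jbstat if nwaves == 15 & jbstat >= 1 & jbstat != 5 & jbstat <= 8

                  Overall             Between            Within
   jbstat |    Freq.  Percent      Freq.  Percent        Percent
----------+-----------------------------------------------------
 self-emp |     1388     8.66        228    18.45          42.72
 employed |     8982    56.03        974    78.80          68.27
 unemploy |      539     3.36        274    22.17          17.51
  retired |     2687    16.76        314    25.40          58.49
 family c |     1159     7.23        292    23.62          28.97
 ft studt |      718     4.48        271    21.93          42.93
 lt sick, |      558     3.48        105     8.50          39.08
----------+-----------------------------------------------------
    Total |    16031   100.00       2458   198.87          50.28
                               (n = 1236)

• Overall: pooled sample, broken down by person-years
• Between: number of people who spent any time in this state
• Within: of those who spent any time in this state, the proportion of their time (on average) they spent in it

Which statistical model for panel data?
Your research question will guide which models are most suitable, but the nature of your data is also important.
Is your research question cross-sectional or longitudinal, or both?
• Cross-sectional: exploit variation between individuals
• Longitudinal: exploit variation "within" individuals over time, which permits a causal interpretation of effects; "between" variation can also be considered if needed

What is the effect on income of having more children?
• What is the difference in income between individuals who have a different number of children?
• What is the difference in income before and after the birth of a child?
• What is the difference in income between men and women, and before and after the birth of a child?
• How does income change in the time leading up to the birth of a child? (Survival analysis, later in this course!)

Longitudinal analysis is concerned with modelling individual heterogeneity
• A very simple concept: people are different!
• In social science, when we talk about heterogeneity, we are really talking about unobservable (or unobserved) heterogeneity:
  • Observed heterogeneity: differences in education levels, or parental background, or anything else that we can measure and control for in regressions
  • Unobserved heterogeneity: anything which is fundamentally unmeasurable, or which is rather poorly measured, or which does not happen to be measured in the particular data set we are using
• With panel data we can do something about unobserved heterogeneity, because we can differentiate between person-level unobserved x that are identical over time and those that vary over time!

OLS with panel data
[Figure: two scatter plots of income (y) against x1, titled "OLS: cross-section" and "OLS: pooled", for the data below]

pid  wave     y   x1
  1     1  2340    0
  1     2  2405    5
  1     3  2730   10
  1     4  3250   15
  1     5  3705   20
  1     6  4030   25
  2     1  1885    5
  2     2  2145   10
  2     3  2275   15
  2     4  2470   20
  2     5  2762   25
  2     6  3120   30
  3     1   780   10
  3     2  1170   15
  3     3  1365   20
  3     4  2405   25
  3     5  2405   30
  3     6  2470   35

• The effect captured by a cross-section may be quite misleading (omitted variable bias)!
• By adding more data points from the same units at different points in time we can get better estimates.
• But assumptions of OLS may be violated!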
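To make the comparison concrete, the toy data on this slide can be typed straight into Stata. This is only a sketch: the variable names pid, wave, y and x1 follow the slide, and the two regress calls are the plain OLS fits the slide contrasts (wave 1 only versus all waves pooled).

```stata
* Toy data from the slide: 3 persons (pid), 6 waves each
clear
input pid wave y x1
1 1 2340  0
1 2 2405  5
1 3 2730 10
1 4 3250 15
1 5 3705 20
1 6 4030 25
2 1 1885  5
2 2 2145 10
2 3 2275 15
2 4 2470 20
2 5 2762 25
2 6 3120 30
3 1  780 10
3 2 1170 15
3 3 1365 20
3 4 2405 25
3 5 2405 30
3 6 2470 35
end

* Cross-sectional OLS: first wave only (3 observations)
regress y x1 if wave == 1

* Pooled OLS: all 18 person-wave observations, ignoring who they belong to
regress y x1
```

The first regression should reproduce the fitted line y = 2448 - 156*x1 and the second y = 1925 + 29*x1 (coefficients rounded), the two lines reported for this example. The sign flip between them is the point of the slide: the cross-section picks up differences between people rather than the effect of x1.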
[Figure: income against number of years since leaving school for pid = 1, 2 and 3, in two panels. Fitted lines: OLS at t = 1: y = 2448 - 156*x1; pooled OLS: y = 1925 + 29*x1]

An illustration of how unobserved heterogeneity matters
Considering that this is panel data, two problems become apparent:
• Error terms for persons 1, 2 and 3 differ systematically
• The association between x and y appears to be biased

[Figure: panels "OLS: pooled" and "OLS: unobs het", income against number of years since leaving school for pid = 1, 2 and 3, with the error components w1, w3 and u1 marked]

Panel data allows you to:
• Break down the error term ($w_i$) into two components: the unobservable characteristics of the person ($u_i$), and genuine "error" ($e_i$)
• Then model $u_i$ and $e_i$

Expanding the OLS model to consider unobserved heterogeneity
Analytically, think of splitting the error term into its two components $u_i$ and $\varepsilon_i$:

$$y_i = x_{i1}\beta_1 + x_{i2}\beta_2 + x_{i3}\beta_3 + \dots + x_{iK}\beta_K + u_i + \varepsilon_i$$

... and consider that you have repeated observations over time:

$$y_{it} = \alpha + x_{it}\beta + u_i + \varepsilon_{it}$$

• $u_i$: individual-specific, fixed over time
• $\varepsilon_{it}$: varies over time, usual assumptions apply (mean zero, homoscedastic, uncorrelated with x or u or itself)

... and then reduce the complexity of the information available in some way, or add further assumptions. Your options:
• Focus on "between" variation: lose info on "within" variation
• Focus on "within" variation: lose info on "between" variation
• Model both types of variation, making further assumptions

Within and between estimators

$$y_{it} = \alpha + x_{it}\beta + u_i + \varepsilon_{it}$$

Not interested in within variation? Use the means of all observations for each person i:

$$\bar{y}_i = \alpha + \bar{x}_i\beta + u_i + \bar{\varepsilon}_i$$

This is the "between" estimator.

Not interested in "between" variation? Why not "remove" it in that case!

$$(y_{it} - \bar{y}_i) = (x_{it} - \bar{x}_i)\beta + (\varepsilon_{it} - \bar{\varepsilon}_i)$$

And this is the "within" estimator: "fixed effects".

Interested in both? Well, let's treat $\bar{x}_i$ as an imperfect measure of the person fixed effect, and use between variation where within variation is poorly captured:

$$(y_{it} - \theta\bar{y}_i) = (1 - \theta)\alpha + (x_{it} - \theta\bar{x}_i)\beta + \{(1 - \theta)u_i + (\varepsilon_{it} - \theta\bar{\varepsilon}_i)\}$$

θ governs how much weight between-group variation receives (θ = 0 gives pooled OLS, θ = 1 gives the within estimator) and is derived from the variances of $u_i$ and $\varepsilon_{it}$.

Between estimator

$$y_{it} = \alpha + x_{it}\beta + u_i + \varepsilon_{it}$$
$$\bar{y}_i = \alpha + \bar{x}_i\beta + u_i + \bar{\varepsilon}_i$$

• Interpret as: how much does y change between different people?
• Not much used
  • Except to calculate the θ parameter for random effects, but Stata does this, not you!
  • It's inefficient compared to random effects
  • It doesn't use as much information as is available in the data (only uses means)
• Assumption required: that $u_i$ is uncorrelated with $x_i$
  • Easy to see why: if they were correlated, how could one decide how much of the variation in y to attribute to the x's (via the betas) as opposed to the correlation?
• Can't estimate effects of variables whose mean does not vary across individuals
  • Age in a cohort study
  • Macro-level variables

Focusing on "within" variation: the fixed effects family

"Fixed effects" estimator
    xtreg y x, fe
Basic idea: for each individual, calculate the mean of x and the mean of y. Then run OLS on a transformed dataset where each $y_{it}$ is replaced by $(y_{it} - \bar{y}_i)$ and each $x_{it}$ is replaced by $(x_{it} - \bar{x}_i)$.

Identical to: Least Squares Dummy Variables regression
    areg y x, absorb(pid)
Include a dummy indicator for each individual; all time-invariant individual-level differences, including the unobserved individual effect $u_i$, will then be captured in the person-specific intercept.

Members of the same family, which you may come across in the literature:

First Differences
    regress D.(y x)
For each individual, and each time period's y and x, calculate the difference between the value in this period and that in the last period.
Then run OLS on a transformed dataset where each $y_{it}$ is replaced by $(y_{it} - y_{i,t-1})$ and each $x_{it}$ is replaced by $(x_{it} - x_{i,t-1})$.

"Hybrid models"
    regress y x mean_x z
Run standard OLS, but add the person mean $\bar{x}_i$ of each time-varying variable as an additional regressor.

Fixed effects estimator

$$y_{it} = \alpha + x_{it}\beta + u_i + \varepsilon_{it}$$
$$(y_{it} - \bar{y}_i) = (x_{it} - \bar{x}_i)\beta + (\varepsilon_{it} - \bar{\varepsilon}_i)$$

pid  wave     y   x1   ybar_i  xbar_i   y - ybar_i   x1 - xbar_i
  1     1  2340    0   3076.7    12.5       -736.7         -12.5
  1     2  2405    5   3076.7    12.5       -671.7          -7.5
  1     3  2730   10   3076.7    12.5       -346.7          -2.5
  1     4  3250   15   3076.7    12.5        173.3           2.5
  1     5  3705   20   3076.7    12.5        628.3           7.5
  1     6  4030   25   3076.7    12.5        953.3          12.5
  2     1  1885    5   2442.8    17.5       -557.8         -12.5
  2     2  2145   10   2442.8    17.5       -297.8          -7.5
  2     3  2275   15   2442.8    17.5       -167.8          -2.5
  2     4  2470   20   2442.8    17.5         27.2           2.5
  2     5  2762   25   2442.8    17.5        319.2           7.5
  2     6  3120   30   2442.8    17.5        677.2          12.5
  3     1   780   10   1765.8    22.5       -985.8         -12.5
  3     2  1170   15   1765.8    22.5       -595.8          -7.5
  3     3  1365   20   1765.8    22.5       -400.8          -2.5
  3     4  2405   25   1765.8    22.5        639.2           2.5
  3     5  2405   30   1765.8    22.5        639.2           7.5
  3     6  2470   35   1765.8    22.5        704.2          12.5

[Figure: "Fixed Effects". Demeaned income against demeaned number of years since leaving school for pid = 1, 2 and 3. Fitted line: y = 65*x1]

• Ignores between-group variation, so it's an inefficient estimator
• However, few assumptions are required for FE to be consistent: $u_i$ is allowed to correlate with $x_i$
• Disadvantage: can't estimate the effects of any time-invariant variables
• Need to consider the change in interpretation of effects

Want to look at the effect of a non-time-varying x? Use $x_{it}$ and $\bar{x}_i$ in OLS

$$y_{it} = \alpha + x_{it}\beta + u_i + \varepsilon_{it}$$
$$y_{it} = \alpha + \beta_1 x_{it} + \beta_2 \bar{x}_i + \beta_3 z_i + u_i^{residual} + \varepsilon_{it}$$

Hint: create $\bar{x}_i$ yourself.

pid  wave     y   x   z   x_bar
  1     1  2340   1   1    1.5
  1     2  2405   2   1    1.5
  1     3  2730   2   1    1.5
  1     4  3250   2   1    1.5
  1     5  3705   1   1    1.5
  1     6  4030   1   1    1.5
  2     1  1885   0   2    0.66
  2     2  2145   1   2    0.66
  2     3  2275   1   2    0.66
  2     4  2470   1   2    0.66
  2     5  2762   1   2    0.66
  2     6  3120   0   2    0.66
  3     1   780   1   2    0.33
  3     2  1170   1   2    0.33
  3     3  1365   0   2    0.33
  3     4  2405   0   2    0.33
  3     5  2405   0   2    0.33
  3     6  2470   0   2    0.33

• $z_i$: non-time-varying individual characteristics, for which you do not need to include group means
• The effect of any unobserved characteristic that would otherwise be carried by the effect of $x_{it}$ is shifted to the effect of $\bar{x}_i$: $\beta_1$ approximates the coefficient in the FE model, and $\beta_3$ gives you, approximately, the OLS estimate for the non-time-varying variables $z_i$
• Typically there is no interest in the effect of $\bar{x}_i$, so no need to worry about its interpretation. Note that $\beta_1 + \beta_2$ is approximately equal to the effect in the pooled OLS
• Disadvantage: can only control for unobserved heterogeneity associated with the observed time-varying variables $x_{it}$; the rest stays in $u_i^{residual}$

Random effects estimator

$$y_{it} = \alpha + x_{it}\beta + u_i + \varepsilon_{it}$$

The "Random Effects Model" (here RE) is estimated by Generalised Least Squares:

$$(y_{it} - \theta\bar{y}_i) = (1 - \theta)\alpha + (x_{it} - \theta\bar{x}_i)\beta + \{(1 - \theta)u_i + (\varepsilon_{it} - \theta\bar{\varepsilon}_i)\}$$

• Uses both within- and between-group variation, so makes best use of the data and is efficient
• Starts off with the idea that using $\bar{x}_i$ is not the best we can do to capture within variation: the more imprecise the estimate of the person-level variation (as measured by the person's $\bar{x}_i$), the more we should draw on the information from other units ($\bar{x}$)
• Assumption required: that $u_i$ is uncorrelated with $x_i$
  • Rather heroic assumption: think of examples
  • Will see a test for this later
• Note that the within and between effects are constrained to be identical (much more like OLS in this respect, so no causal interpretation!)
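The slides say θ is derived from the variances of the two error components but do not show how. As a sketch, assuming the standard GLS weight used for unbalanced panels (an assumption, not something stated on the slides), with $T_i$ the number of waves observed for person i and $\sigma_u$, $\sigma_e$ the sigma_u and sigma_e values that xtreg reports:

```latex
% Assumed standard form of the random-effects weight for person i
\[
\theta_i = 1 - \sqrt{\frac{\sigma_e^2}{T_i\,\sigma_u^2 + \sigma_e^2}}
\]

% Worked with sigma_u = 3.0249 and sigma_e = 4.0526 (so sigma_u^2 = 9.150, sigma_e^2 = 16.423),
% the values in the random-effects output for the LIKERT example in these slides
\begin{align*}
T_i = 1:  \quad \theta_i &= 1 - \sqrt{\frac{16.423}{9.150 + 16.423}} \approx 0.199 \\
T_i = 14: \quad \theta_i &= 1 - \sqrt{\frac{16.423}{14 \times 9.150 + 16.423}} \approx 0.663
\end{align*}
```

These two values line up with the minimum (0.1986) and maximum (0.6629) of theta reported by the theta option in the LIKERT example, where people are observed between 1 and 14 times. The formula also shows the two limits: θ goes to 0 when $\sigma_u^2$ is 0 (pooled OLS) and approaches 1 as $T_i\sigma_u^2$ grows relative to $\sigma_e^2$ (fixed effects).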
For example, when you include a location indicator in a random effects model, you are saying that the effect on y of moving to a new town is the same as the effect on y of living in different towns. When you include a female dummy, you are saying that the effect of being female on y is the same as the effect on y of changing gender.

Estimating fixed effects in Stata

. xtreg LIKERT female ue_sick partner age age2 badh, fe

Fixed-effects (within) regression               Number of obs      =     24204
Group variable: pid                             Number of groups   =      3317

R-sq:  within  = 0.0501                         Obs per group: min =         1
       between = 0.1906                                        avg =       7.3
       overall = 0.1285                                        max =        14

                                                F(5,20882)         =    220.44
corr(u_i, Xb)  = 0.1561                         Prob > F           =    0.0000

------------------------------------------------------------------------------
      LIKERT |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  (dropped)
     ue_sick |   1.951485   .1394164    14.00   0.000     1.678218    2.224752
     partner |   -.298668    .118635    -2.52   0.012    -.5312018   -.0661342
         age |   .1141748   .0214403     5.33   0.000     .0721501    .1561994
        age2 |  -.0011833   .0002209    -5.36   0.000    -.0016163   -.0007503
   badhealth |   1.230831   .0428556    28.72   0.000      1.14683    1.314831
       _cons |   6.252975   .4932977    12.68   0.000     5.286073    7.219877
-------------+----------------------------------------------------------------
     sigma_u |  3.9934565
     sigma_e |  4.0525618
         rho |  .49265449   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(3316, 20882) =     4.56         Prob > F = 0.0000

• R-sq: an "R-square-like" statistic
• age and age2: the fitted age profile peaks at age 48
• "u" and "e" are the two parts of the error term
• Talk about xtmixed

Between regression
Not much used, but useful to compare coefficients with fixed effects.

. xtreg LIKERT female ue_sick partner age age2 badh, be

Between regression (regression on group means)  Number of obs      =     24204
Group variable: pid                             Number of groups   =      3317

R-sq:  within  = 0.0480                         Obs per group: min =         1
       between = 0.2322                                        avg =       7.3
       overall = 0.1482                                        max =        14

                                                F(6,3310)          =    166.80
sd(u_i + avg(e_i.))=  3.833357                  Prob > F           =    0.0000

------------------------------------------------------------------------------
      LIKERT |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   1.476659   .1350226    10.94   0.000     1.211923    1.741395
     ue_sick |   2.038192    .312191     6.53   0.000     1.426085    2.650299
     partner |  -.0101941   .1777423    -0.06   0.954      -.35869    .3383019
         age |   .0827335   .0219026     3.78   0.000     .0397895    .1256775
        age2 |  -.0009489   .0002263    -4.19   0.000    -.0013927   -.0005052
   badhealth |   2.275832   .0926521    24.56   0.000     2.094171    2.457493
       _cons |   3.953941   .4430909     8.92   0.000     3.085181    4.822701
------------------------------------------------------------------------------

• The coefficient on "partner" was negative and significant in the FE model; here it is close to zero and not significant
• In FE, the "partner" coefficient really measures the events of gaining or losing a partner

Random effects regression

. xtreg LIKERT female ue_sick partner age age2 badh, re theta

Random-effects GLS regression                   Number of obs      =     24204
Group variable: pid                             Number of groups   =      3317

R-sq:  within  = 0.0500                         Obs per group: min =         1
       between = 0.2239                                        avg =       7.3
       overall = 0.1471                                        max =        14

Random effects u_i ~ Gaussian                   Wald chi2(6)       =   2013.32
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

------------------- theta --------------------
  min      5%     median      95%      max
0.1986   0.1986   0.5482   0.6629   0.6629

------------------------------------------------------------------------------
      LIKERT |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   1.493431   .1259931    11.85   0.000     1.246489    1.740373
     ue_sick |   2.045302   .1271039    16.09   0.000     1.796183    2.294422
     partner |  -.1947691   .0973734    -2.00   0.045    -.3856175   -.0039207
         age |   .1058038    .014544     7.27   0.000     .0772981    .1343094
        age2 |  -.0011062   .0001498    -7.39   0.000    -.0013998   -.0008126
   badhealth |   1.433115   .0385506    37.17   0.000     1.357558    1.508673
       _cons |   5.181864   .3137662    16.52   0.000     4.566894    5.796835
-------------+----------------------------------------------------------------
     sigma_u |  3.0248563
     sigma_e |  4.0525618
         rho |   .3577895   (fraction of variance due to u_i)
------------------------------------------------------------------------------

• The option "theta" gives a summary of the weights
• theta tells you how good an approximation $\bar{x}_i$ is of the person-level effect, or how much of the within variation we used to determine the effect size
• theta = 0 gives the OLS estimator, theta = 1 the FE estimator

And what about OLS?
• OLS simply treats within- and between-group variation as the same
• Pools data across waves

. reg LIKERT female ue_sick partner age age2 badh

      Source |       SS       df       MS              Number of obs =   24204
-------------+------------------------------           F(  6, 24197) =  706.54
       Model |  103583.505     6  17263.9175           Prob > F      =  0.0000
    Residual |  591239.694 24197  24.4344214           R-squared     =  0.1491
-------------+------------------------------           Adj R-squared =  0.1489
       Total |  694823.199 24203  28.7081436           Root MSE      =  4.9431

------------------------------------------------------------------------------
      LIKERT |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   1.409466   .0640651    22.00   0.000     1.283895    1.535038
     ue_sick |   2.031815   .1240757    16.38   0.000     1.788619    2.275011
     partner |  -.0751296   .0769271    -0.98   0.329    -.2259116    .0756524
         age |   .0983746   .0103316     9.52   0.000      .078124    .1186252
        age2 |  -.0010613   .0001049   -10.12   0.000     -.001267   -.0008557
   badhealth |   1.841796   .0357165    51.57   0.000     1.771789    1.911802
       _cons |   4.450393   .2212733    20.11   0.000     4.016684    4.884102
------------------------------------------------------------------------------

Test whether pooling data is valid

$$y_{it} = \alpha + x_{it}\beta + u_i + \varepsilon_{it}$$

• If the $u_i$ do not vary between individuals, they can be treated as part of α and OLS is fine
• Breusch-Pagan Lagrange multiplier test
  • H0: variance of $u_i$ = 0
  • H1: variance of $u_i$ not equal to zero
• If H0 is not rejected, you can pool the data and use OLS
• Post-estimation test after random effects

. quietly xtreg LIKERT female ue_sick partner age age2 badh, re
. xttest0

Breusch and Pagan Lagrangian multiplier test for random effects

        LIKERT[pid,t] = Xb + u[pid] + e[pid,t]

        Estimated results:
                         |       Var     sd = sqrt(Var)
                ---------+-----------------------------
                  LIKERT |   28.70814       5.357998
                       e |   16.42326       4.052562
                       u |   9.149756       3.024856

        Test:   Var(u) = 0
                              chi2(1) = 10816.48
                          Prob > chi2 =     0.0000

Comparing models
• Compare coefficients between models
  • Reasonably similar: the differences are in the "partner" and "badhealth" coefficients
• R-squareds are similar
  • Within and between estimators maximise the within and between R-squared respectively

                  FE          RE          BE         OLS
female                     1.49 ***    1.48 ***    1.41 ***
ue_sick        1.95 ***    2.05 ***    2.04 ***    2.03 ***
partner       -0.30 **    -0.19 **    -0.01       -0.08
age            0.11 ***    0.11 ***    0.08 ***    0.10 ***
age2           0.00 ***    0.00 ***    0.00 ***    0.00 ***
badhealth      1.23 ***    1.43 ***    2.28 ***    1.84 ***
_cons          6.25 ***    5.18 ***    3.95 ***    4.45 ***
R-2 within     0.050       0.050       0.048
R-2 between    0.191       0.224       0.232
R-2 overall    0.129       0.147       0.148       0.149
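A side-by-side table like the one above does not have to be assembled by hand. The following is a sketch of one way to do it with estimates store and estimates table; it assumes the data are already declared with xtset pid wave, and the stored names FE, RE, BE and OLS are arbitrary labels chosen for this illustration.

```stata
* Fit each model quietly and keep its results under a short name
quietly xtreg LIKERT female ue_sick partner age age2 badhealth, fe
estimates store FE

quietly xtreg LIKERT female ue_sick partner age age2 badhealth, re
estimates store RE

quietly xtreg LIKERT female ue_sick partner age age2 badhealth, be
estimates store BE

quietly regress LIKERT female ue_sick partner age age2 badhealth
estimates store OLS

* One column per model: coefficients to 2 decimals, stars at 10%, 5% and 1%,
* followed by the sample size and whichever R-squared each command saves
estimates table FE RE BE OLS, b(%9.2f) star(.1 .05 .01) ///
    stats(N r2_w r2_b r2_o r2)
```

The within, between and overall R-squared (r2_w, r2_b, r2_o) are saved by xtreg but not by regress, and the ordinary r2 is the one regress saves, so some cells in the statistics rows are expected to be empty.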
