Samantha Q7: I am currently trying to develop my research design. It will include a computer simulation where the dependent variable is “quality of the decision” and the independent variables are several forms of group characteristics, the mediating variables are symptoms of groupthink. By means of the simulation, I want to see how group size, cohesiveness etc. influence the likelihood of symptoms of groupthink, which will influence the quality of the decision. Different from most experimental designs, in this research a pre-test is not useful, because this is decided by me. Because it is a simulation, I will enter different values for cohesiveness and measure the outcome. For the analysis, what exactly should be measured? How do I do a “between situation” analysis in either SPSS or Stata?


Answer: Simulation is of course not a form of empirical research, but rather a mode of theory formulation. If you do not confront it with empirically defined data (these could be ‘stylized facts’), there is no way in which reality will correct you. But otherwise, it can be very useful. Indeed, there seems to be little reason to fool around with pre-tests, matching or other power-enhancing features, since you can make N as large as you want (unless one of your questions is how much power you gain from these features of the research design). I know little about simulation; in the ones I have done (see the first assignment from the course), I have used a simple system of linear equations. First, you set the X-variables, then you calculate the M’s with inclusion of a random error term, and finally the Y’s, either only from the M’s (perfect mediation) or from the M’s and the X’s (imperfect mediation). Your analysis will be a usual OLS regression / ANOVA and will simply be a restatement of how you generated the data.
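The three steps above can be sketched as follows. This is a minimal illustration in Python (standard library only) rather than SPSS or Stata; the names `slope` and `simulate` and the coefficient values 0.5 and 0.8 are hypothetical choices, with X standing for a group characteristic such as cohesiveness, M for a groupthink symptom and Y for decision quality.

```python
import random

def slope(x, y):
    """OLS slope of y on a single predictor x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate(n=20000, a=0.5, b=0.8, direct=0.0, seed=1):
    """Generate X -> M -> Y; direct != 0 gives imperfect mediation."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]          # step 1: set the X values
    m = [a * xi + rng.gauss(0, 1) for xi in x]       # step 2: M from X plus random error
    y = [b * mi + direct * xi + rng.gauss(0, 1)      # step 3: Y from M (and optionally X)
         for mi, xi in zip(m, x)]
    return x, m, y

x, m, y = simulate()
a_hat = slope(x, m)        # estimate of a (assumed value 0.5)
total_hat = slope(x, y)    # total effect of X on Y, here a * b + direct
print(round(a_hat, 2), round(total_hat, 2))
```

With a large N the estimates simply return the coefficients that were entered, which is the sense in which the analysis restates how the data were made.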


Samantha Q6: When wanting to obtain information about a causal relationship in terms of a different variable (for example gender), Stata requires a large number of observations. As in the final assignment, for example, I was unable to obtain all information for just one wave due to this limitation. How can this be solved if the ‘by female’ command is not an option?


Answer: I am not sure what you intend to say (gender is almost completely observed in the OSA data, no missings). I will have to look at your assignment first.


Daphne Q7: Is it feasible to conduct multilevel data analysis on the state level? It was stated that at least 18 groups should be available for the analysis, but if we consider for example welfare states and the attitudes of the individual people, it is possible that there are not enough welfare states to reach the required number of groups. Would it then be detrimental for the results of the analysis if other states were included, even if they cannot be considered welfare states? Moreover, there are different types of welfare states; would it be possible to include these types as a variable at a lower level? Also, would it be possible to use for example NGOs or companies as the lowest level of analysis/dependent variable?


Answer: Yes. Multi-level techniques should in particular be used when the number of contexts is rather small. The limit of N=18 only means that your results become very unreliable below this number, but it does not mean that below this number there is no multi-level problem; as a matter of fact it becomes more pressing instead of less. Suppose you were working with 5 contexts (countries, organizations); this would allow you to estimate the effects of 4 different scalings. (Single-level) OLS regression might suggest that all of these 4 effects are ‘statistically significant’; a multi-level model would make clear that you do not have this amount of information in your data, and everything would become very insignificant.


In cross-national research, I am a fan of a meta-analytical approach, in which you first derive micro-level effects for each context (with their SE) and then move up to the aggregate analysis for the cross-national questions. This design is a little bit cumbersome when you have a complicated micro-model (including interactions), but works well for most questions. It is conceptually much clearer than all the usual multi-level designs, statistically easier to operate, and allows you to use XT-programs in case you have repeated observations on your countries. If your N is 18 or 18*waves, this is the N to use in your analysis.
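The two-step logic of this meta-analytical approach can be sketched on simulated data. The sketch below is in Python (standard library only) and everything in it is an assumption made for illustration: 18 contexts with 500 individuals each, a hypothetical context-level variable `z`, and a micro-level effect that depends on `z`. Weighting the second step by 1/SE² is one common choice, not necessarily the one used in the course.

```python
import random

def ols(x, y, w=None):
    """(Weighted) OLS of y on one predictor; returns (slope, standard error)."""
    n = len(x)
    w = w or [1.0] * n
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    resid = [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]
    s2 = sum(wi * r * r for wi, r in zip(w, resid)) / (n - 2)
    return b, (s2 / sxx) ** 0.5

rng = random.Random(7)
K, n = 18, 500                                   # contexts, individuals per context
z = [rng.uniform(-1, 1) for _ in range(K)]       # hypothetical context-level variable
micro = []
for k in range(K):
    beta_k = 0.2 + 0.3 * z[k] + rng.gauss(0, 0.05)   # true micro effect in context k
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [beta_k * xi + rng.gauss(0, 1) for xi in x]
    micro.append(ols(x, y))                      # step 1: slope and SE per context

slopes = [b for b, _ in micro]
weights = [1 / se ** 2 for _, se in micro]       # more precise estimates count more
gamma, gamma_se = ols(z, slopes, weights)        # step 2: aggregate analysis with N = 18
print(round(gamma, 2), round(gamma_se, 2))
```

The second regression runs on 18 rows, one per context, so its standard error reflects the number of contexts and not the 9,000 individuals.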


Daphne Q6: We applied the XT regression analysis to panel data that was collected over time among people, but would it also be a valid method for comparing states over time? In a way, states also have their individual traits and we cannot control for all of these when conducting an analysis. Moreover, some variables change over time for states as well, let’s take the effect of democratization on income disparity as an example. If we measure this at the state level and then compare the data for all states over time waves of 5 or 10 years, would we be able to assert whether or not a causal relationship exists? Or would there not be enough cases to control for all individual traits (there are +/- 193 states, and we could probably not collect the data for all of these)?


Answer: Yes, this is very much what XT is useful for. See also my answer to Samantha Q6. There is more to XT than what we have discussed in the course; in particular, you can use weights (based on the micro-level SE) and can adjust for serial correlation in the T-dimension.


Whether you are indeed discovering causal relations depends not so much on using XT or not, but on what you are willing to assume about causal order.


Bringing in more countries to control for possible confounders can be risky. Suppose you are interested in the effect of certain legal structures on outcomes among the 18 countries of interest, but you want to control for GNP. In order to control for GNP it would seem useful to include other countries, for which you would not know the legal structure (missings), but do have the outcome variables (without outcome it will not work). This design would allow you to estimate the GNP effect much better, but runs the risk that this additional information is primarily driven by some irrelevant (low GNP) countries. This is sometimes called ‘lack of support’, in which case it is argued that you should choose your controls using matching (so similar countries).


Daphne Q5: When we use SEM to establish the correlations of the latent factors, which are indicated by the correlations of the indicators of these, is it possible that we accidentally identify a spurious relationship? If so, is it possible to include certain checks to ensure that we have actually found a causal relationship, besides the use of panel data?


Answer: Relationships are not ‘spurious’, only causal effects can be (i.e. the relationship exists, but is produced by confounding or reversed causation). This is the same whether you use SEM (with latent variables) or not. Everything depends on what you are willing and able to assume about causal order of measured variables, whether you can use fixed-effects controls, or whether you have well-argued instrumental variables in your design.



Nicolette Q7: Structural equation modelling and causal inference. During the lectures we learned how to use structural equation modelling to make claims about causality. As far as I know, we have used cross-sectional data to determine whether there existed a causal relationship between variables. How is this possible? Because in order to claim causality you need at least longitudinal data or an experimental design. If you don’t have this, you can only analyse correlations and make suggestions about causality. Thus, how is it possible to make causal claims if you have (1) cross-sectional data and use (2) structural equation modelling?


Answer: I disagree. Cross-sectional correlation must arise because of some causal process, which can be threefold: X causes Y (causation), Y causes X (reversed causation), or some Z causes Y and X (spurious causation). These are all the possibilities, although a combination can also hold. So cross-sectional data can be used to establish causality of X on Y. The basic procedure is to rule out reversed causation (usually by research design or logical argument) and spurious causation (usually by statistical controls). The logical / theoretical part of the argument is about causal order, which is most often established by arguing that X precedes Y, and/or that Z precedes both X and Y. So there is some sort of longitudinal argument here, although not necessarily longitudinal measurement. Sometimes other arguments may be valid to establish causal order, e.g. that structural characteristics (occupations) are likely to be causes of subjective characteristics (work motivation), not the other way around. However, in measurement this can all be cross-sectional. SEMs by themselves do not add anything to establish causality; in the end they are just a bunch of connected linear equations. However, SEM models do help you think about the world in causal terms, which is what science is about.
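Ruling out spurious causation by statistical control can be illustrated with a small simulation. The sketch below is in Python (standard library only); the data-generating values are hypothetical, and X is deliberately given no effect on Y, so that any association between them is produced by Z alone.

```python
import random

def partial_slopes(x, z, y):
    """OLS of y on x and z; returns (b_x, b_z) via the normal equations."""
    n = len(y)
    mx, mz, my = sum(x) / n, sum(z) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((a - mz) ** 2 for a in z)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    szy = sum((a - mz) * (b - my) for a, b in zip(z, y))
    det = sxx * szz - sxz ** 2
    return (szz * sxy - sxz * szy) / det, (sxx * szy - sxz * sxy) / det

rng = random.Random(4)
n = 20000
z = [rng.gauss(0, 1) for _ in range(n)]          # Z precedes both X and Y
x = [0.7 * zi + rng.gauss(0, 1) for zi in z]     # Z causes X
y = [0.7 * zi + rng.gauss(0, 1) for zi in z]     # Z causes Y; X plays no role

mx, my = sum(x) / n, sum(y) / n
bivariate = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))    # Y on X without control
b_x, b_z = partial_slopes(x, z, y)               # Y on X controlling for Z
print(round(bivariate, 2), round(b_x, 2), round(b_z, 2))
```

The uncontrolled slope is clearly positive, while the slope of X with Z controlled is close to zero. The control only works because Z is measured and assumed to come first in the causal order.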



Nicolette Q6x (Additional question about my own research): How can you use interviews for answering a quantitative research question? My research is about inter-organizational collaboration, and I want to explore the moderating (or mediating) effect of trust and commitment on the relationship between collaborative structure and successful collaboration. Successful inter-organisational collaboration involves specific factors such as perceived goal achievement, explored in previous research. The reason for using interviews is that I have access to 16 respondents only, and this is not enough to claim causality. Is it scientifically possible to find out about this mediating or moderating relationship when using interviews?


Answer: Causal processes are not limited to large N research: they happen in reality. The problem is that you observe so little of them, not that they do not happen. Any evidence is better than no evidence, so first try to model your data with the regular statistical models and study graphs. As I argued in class, inferential statistics (significance testing) is important in small N studies, not in large N studies. And things can really be significant at N=16; I think the minimum is N=5. If you also do your study statistically, you will learn about type II errors and can tell your readers about them in an intelligent way. Much cross-national comparative research is like that. Then also consider some alternatives. First, I think that small N research with interviews is in fact often about reconstructing the theories of your respondents. I tend to think of this as a weak version of empirical research, but it is in itself respectable. Second, you can put yourself in the position of an idiosyncratic researcher, in much the same way as a historian or police detective would do research. In this case you are not after testing or developing lawlike causal statements, but after explaining the situation you find in terms of a causal model you assume to be true. Third and finally, it might be interesting to use your respondents as informants about multiple instances: this is done in vignette research.
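On the remark above that results can be significant at very small N: the answer does not say where N=5 comes from. One test for which this threshold holds is a one-sided sign test, used here purely as an assumed illustration (Python, standard library only; the function name is hypothetical). If all N cases point in the predicted direction, the p-value under a 50/50 null is 0.5 to the power N.

```python
def one_sided_sign_test_p(n):
    """P-value when all n cases point in the predicted direction (null: 50/50)."""
    return 0.5 ** n

for n in range(3, 8):
    print(n, one_sided_sign_test_p(n))

# Smallest N at which a perfect result is significant at the 5% level.
smallest = min(n for n in range(1, 20) if one_sided_sign_test_p(n) < 0.05)
print(smallest)
```

Other tests give other thresholds, so this shows only that single-digit N can reach conventional significance when the pattern is perfectly consistent.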


By the way, I think that the word <interview> should be <open interview> or <interview with open-ended questions>. Furthermore, I never understand why <moderation> and <mediation> are used in one sentence, as behavioral scientists often do. These are not alternatives.


Nicolette Q6: Fixed effects methods. The last page of chapter 2 on fixed effect methods for linear regression states that the fixed effect method effectively controls for all individual-level covariates, both measured and unmeasured. Additionally, the author states that an assumption of a fixed effect method is that the individual-level covariates must have the same effect at all occasions. How could you measure the similarity of the specific effects? Is this based on previous empirical research, on theory, or on something else? Furthermore, how do you know whether you have included all these effects in a model?


Answer: <Assumption> would normally imply that we cannot test it empirically. I see that this would be true for the statement that fixed effects control for all individual-level covariates, in particular the unmeasured ones. However, I think that the assumption can be tested for measured covariates, as an interaction between a person characteristic and panel wave (time) would be identified. However, this is true only for measured characteristics. And there is an easy way to see whether you have included them all: it is the saturated model. However, your worries would be about the unmeasured ones.


Daphne Q4: What are the advantages of using SEM for fixed-effects models? Is it because it allows for testing the underlying latent factors by using multiple indicators? Or could it be that OLS does not offer options for controlling for systematic error like SEM does?


Answer: I think you should have asked: what are the advantages of using SEM for panel data? Fixed-effects models are an alternative to SEM, or more in particular to using a lagged dependent variable in your models, or to first differences (the difference between the dependent variable and its lagged version), both of which can be calculated in OLS. At this point I see several advantages for SEM: (A) use of latent variables and correction for measurement error, (B) estimation of measurement error via the simplex model, (C) in SEM you can model longitudinal developments in multiple (latent) variables, and estimate cross-lagged causal effects or even reciprocal instantaneous effects. But admittedly, specifying a moderately complicated SEM model is in fact very complicated, let alone obtaining estimates.


(I reserve my final opinion until I have fully understood Allison’s last chapter.)


Carlos Q6a: I have two questions for this week. First, allowing the intercept to vary between time periods, is it similar to including the lagged dependent effect?


Answer: I would not think so, but I am not sure that I understand the question.


Carlos Q6b: Second, I want to conduct a panel experiment. As a result of randomization, there is no need to control for between-subjects variability. This would suggest that it is not necessary to use fixed effects, which would be an advantage in the sense that more efficiency in the model can be pursued over concern for bias. Let’s assume also that because of randomization, time-invariant characteristics, such as gender, do not need to be controlled for. Therefore, which kind of methods can be used to estimate a model in this example? Assuming we have to worry less about getting biased estimates, which methods can provide the most efficiency?


Answer: In a ‘panel experiment’, you would have both randomized groups and a pretest-posttest design. Fixed effects are not needed here to control confounders, but they are useful to increase efficiency (decrease standard errors). This decrease in standard errors can be very substantial, and much larger than the gain from controlling for observed characteristics.
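The efficiency gain can be made visible with a simulated randomized pretest-posttest design. The sketch below is in Python (standard library only) and rests on assumed values: a treatment effect of 0.5 and a stable person effect with a standard deviation of 2. With two waves, the fixed-effects estimate amounts to comparing within-person change scores, which is what the code does.

```python
import random

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((a - m) ** 2 for a in v) / (len(v) - 1)

def diff_and_se(treated, control):
    """Difference in means with its standard error."""
    d = mean(treated) - mean(control)
    se = (var(treated) / len(treated) + var(control) / len(control)) ** 0.5
    return d, se

rng = random.Random(3)
n, effect = 2000, 0.5
post, change = {0: [], 1: []}, {0: [], 1: []}
for _ in range(n):
    t = rng.randint(0, 1)                  # randomized assignment
    u = rng.gauss(0, 2)                    # stable person effect (unobserved)
    pre = u + rng.gauss(0, 1)              # pretest
    y = u + effect * t + rng.gauss(0, 1)   # posttest
    post[t].append(y)
    change[t].append(y - pre)              # within-person change removes u

d_post, se_post = diff_and_se(post[1], post[0])           # posttest only
d_change, se_change = diff_and_se(change[1], change[0])   # change scores
print(round(se_post, 3), round(se_change, 3))
```

Both estimators are unbiased because of the randomization; the change-score version has the smaller standard error because the stable person effect has been differenced out.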



Maartje Q5: Last lecture we discussed panel regressions in SEM. I am especially interested not in the cross-lagged panel regression, but the regression in which you want to know if occupation influences work motivation and you do not assume that previous occupation influences work motivation. I am very interested in this analysis method, since I might want to use it for my own research. My research is about social support on an online forum influencing well-being of people. I have two (or three) waves in which I distribute the survey which asks for well-being. Social support (measured by amount of reactions and content of reactions) can be extracted anytime from the forum since the information will be there forever. However, Rene Bekkers commented that the waves do not really make sense, since the change in social support would influence the well-being immediately and will not influence well-being a couple of months later. So, I wondered, can I use the SEM panel regression method for this problem? Can I claim causality by doing this analysis? And, additionally is then the only reason that we use panel data because of the fact that more information is added to the model? Since we do not measure the relationship between the independent variable at t1 and the dependent variable at t2, but we measure both at t2 (for example), it is the same as cross-sectional data, isn’t it? How can we still be able to claim causality in this kind of analysis?


Answer: I am not sure I understand your questions. Let me try. If causality operates instantaneously, this would not imply that you would not notice it anymore in the next wave… (unfinished…)



Samantha 5: This question is about the MTMM model which we have discussed last week. Although I understood the idea behind the model, I was still interested in the interpretation behind it. I found this website: ; and now have the following question about the interpretation. The website notes that: “Coefficients in the validity diagonals should be significantly different from zero and high enough to warrant further investigation. This is essentially evidence of convergent validity - Convergent validity is the degree to which concepts that should be related theoretically are interrelated in reality. All of the correlations in our example meet this criterion.” However, if the correlation would not be one between different measurements of a concept, I would understand that there is a measurement error. But how can the correlation of the same measurement not be one? Basically all I understand from the matrix are the so-called validity diagonals, could you maybe explain some of the other interpretation methods?



Nicolette Q5: Systematic and measurement error. During the lecture we talked about systematic and random measurement errors. The effect of random measurement errors is that they will decrease all correlations, which is a bad thing. The effect of a systematic measurement error is hard to identify. The meaning of a systematic measurement error is clear, and considering this error it is self-evident that a problem like this needs to be solved. However, considering the fact that the effect of a measurement error is hard to identify and often results in rather small changes in the estimation, why should we worry so much about them?


Tamira Q5: During the last lecture (Friday, September 28) you stated that:

- to trace random error we should repeat the question and
- to trace systematic error we should repeat the error.

Can you also trace random or systematic error by using multiple indicators, or can you only use it to compare which of the indicators used is a better one?


Answer: Within one questionnaire, this is all about using multiple indicators (alternate forms to ask the same thing). If there is only random error, the correlation between two indicators will give you a measure of (un)reliability. We can trace the amount of random error either by asking three indicators, or by embedding the measurement model in a larger SEM model. However, all of this assumes absence of systematic error (i.e. absence of other latent factors influencing multiple indicators). We can trace the systematic error by asking the alternate forms in a repeated format, so that the error arises again. For occupations, I ask a crude and a detailed question and repeat that for another occupation (e.g. spouse or father). For education, I ask qualification and duration, and repeat that for another education (e.g. partner).
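The link between the correlation of two indicators and reliability can be checked with simulated data. The sketch below is in Python (standard library only) and assumes the simplest case: two equally good indicators of one true score, purely random error, and error variance equal to true-score variance, so that the reliability of each indicator is 0.5.

```python
import random

def corr(a, b):
    """Pearson correlation between two lists of equal length."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / (saa * sbb) ** 0.5

rng = random.Random(11)
n = 20000
true = [rng.gauss(0, 1) for _ in range(n)]      # latent true score
x1 = [t + rng.gauss(0, 1) for t in true]        # indicator 1: true score plus random error
x2 = [t + rng.gauss(0, 1) for t in true]        # indicator 2: independent random error

r12 = corr(x1, x2)        # observable: correlation between the two indicators
r_true = corr(x1, true)   # not observable in practice: indicator with true score
print(round(r12, 2), round(r_true ** 2, 2))
```

Under these assumptions the observed correlation between the indicators matches the squared correlation of one indicator with the true score. Once a shared systematic error is added to both indicators, this equality no longer holds, which is why the repeated format is needed.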


Carlos Q5: I am interested in discussing more about the likelihood ratio test for estimating goodness of fit. As I understand, the aim with this test is to achieve no statistically significant difference between the estimated model and a model that perfectly fits the observed correlation matrix. However, as I observed in the last assignment, if the model is estimated with a higher number of observations, the p value tends to decrease towards being significant. Therefore, I don’t understand how chi square is a good statistic for measuring goodness of fit. You talked a bit about the RMSEA method and Stata offers other methods as well.

Can you comment on the advantages of using RMSEA?

What is the purpose of estimating chi square and why does Stata use it as a default goodness of fit statistic?

Efe Q4: If we constrain a model by stabilising to a standardised coefficient, is it equal to standardising the rest of the model as well (since we read coefficients in terms of the constrained variable)?

Efe Q3: Is there a way to combine a more complex SEM model that also has a sampling correction for hierarchical models; such as one SEM model at the macro level which explains certain latent variables of another SEM model at the individual (or even time) level.

Efe Q2: If an observed variable is a good indicator for two different latent variables, can we use them in the same model?

I tried this: fisei in both father's and mother's occupation (in addition to misei and mcrude) while latent FOCC and MOCC are predicting ROCC. The structural coefficients decreased while the s.e. increased, and model fit (both chi and 2log) got weaker; all were contrary to my expectation.

sem (FOCC -> fisei fcrude) (MOCC -> fisei misei mcrude) (ROCC -> rcrude isei) (FOCC -> ROCC) (MOCC -> ROCC)

reference model is:

sem (FOCC -> fisei fcrude) (MOCC -> misei mcrude) (ROCC -> rcrude isei) (FOCC -> ROCC) (MOCC -> ROCC)

Efe Q1: When we work on the simulated data, or the ISSP-NL that you have provided, we exactly know which observed variables are reporting about which common latent variable. In the simulated case, we knew because we created them, and in the ISSP case, we knew because of the data documentation and we build the SEM model on the previous theory, such as Blau & Duncan (1967).  How about somewhat ambiguous attitude or value questions, especially in an exploratory study with few or almost none previous literature on its subject? Should we first focus on factor analysis and reliability analysis between the observed variables that we expect them to be reporting on one latent variable or can we claim (as the math behind them must be very similar) that an SEM model with good predictors (grouping high coefficient ones who result in lower standard errors) are in fact how the world is.

>> I have one essential question which lies beneath this since the day you explained factor analysis: Why don't we just upload a new data set to a super computer to build every combination of SEM models and run all the iterations and report back the ones that fit best? Of course, we may introduce certain restrictions such as “father's education is chronologically before respondents' occupation”. But what if the most important predictor of one's Left-Right orientation is the amount of tobacco their mother consumed during pregnancy?


Maartje Q4. Panel data & missing values. I was wondering how you should deal with missing values in panel data. Participants fill out the questionnaire in the first wave, not in the second and they fill it out again in the third. How can you account for the missing data on the participant in wave 2? Should you remove the total case, or can you impute the missing values based on the other two questionnaires? It seems to me a very big waste if you have to throw this person out of the analyses, especially because it also seems to me that it is likely in panel surveys that this happens more often.


Samantha Q4: I would like to base my question on a problem with real life multilevel data which I have recently been introduced to. In this case, data was acquired over five moments in time in the setting of a developing neighborhood. The idea is to measure the formation of a social network. The problem with the data is mostly the newcomers and the quitters. In the first measurement point, over 100 people replied and filled out the questionnaire. In the fifth measurement point, only approximately 30 people from the initial 100 people filled out the questionnaire. However, the problem lies in the fact that because it is a developing neighborhood, some people only moved in after the first, second or third measurement point. In the fifth measurement point, the response rate was higher than just the 30 people from the first questionnaire but the researchers gave the newcomers all the previous questionnaires at the same time. Thus, a person that entered after the third measurement point would receive the first, second, third and fourth questionnaire at the same time. Furthermore, the researchers measured the same concepts in all questionnaires, but they also introduced new concepts throughout the study.

My question now is, exactly what data is usable for multi level analysis? Is it just the 30 people who filled out all five questionnaires at the appropriate time? What about the concepts that are introduced in later questionnaires?


Carlos Q4: In Friday’s lecture, Professor Nagel said multilevel analysis cannot be conducted with sample sizes below 18 units of observations. Let’s say we have a small sample of countries in our study and we want to conduct multilevel analysis. We first identify outlier cases that may be potentially influential. Let’s assume we decide not to exclude these cases in order to maintain sufficient sample size for conducting multilevel analysis. How can influential cases be expected to affect the solution estimated with multilevel analysis? How can we counter the potential influence of these cases?


Nicolette Q4: Multi-level modelling is a solution for observations that are nested in a certain structure. Thus observations that are dependent of each other can be modelled using multi-level analysis. I do understand this advantage of the analysis. However, when I apply this to my type of research – network research - I am not sure whether this type of analysis could be the solution for the problem of dependent observations in network research. During class we used a large data set with nearly 4000 observations. First, is multi-level data also appropriate for smaller data sets (e.g. network data in organisations with 40 cases)? Second, what type of data collection strategy is appropriate when using multi-level analysis? Third, with fixed effect model you can analyse personality effects; this seems possible with multilevel modelling as well because you model effects at different levels (which in fact is also the case with personality effects in fixed effect models). So, to what extent (and how) are multi-level models related to fixed effect models?


Maartje Q3: I have a question regarding causality in my research. I want to test the causal relation between network position and well-being. Is it possible to do a panel regression with three waves of data on well-being and network positions using fixed effects methods to rule out confounding variables (since we keep the individual differences constant, the only explanation for the change in well-being is the network position)? And in order to establish the causality of the relation I can run the data in RSiena, which is an explanation for selection; so if this model is correct this means that the network forms according to well-being; people connect to people with a high well-being and not the other way around: your connections influence your well-being. Is this a proper way of claiming the causal relation: running a RSiena model to prove that the selection argument is not valid in this case and thus the conclusion is that it is about influence? My question is thus (put simply): how can I claim causality in a statistical way in my research? How should I collect my data and how should I analyse it?


Answer: I am not sure I understand what you are saying. To establish a causal relationship, you generally need three things: (A) causal order (to rule out reversed causation), (B) control or randomization (to rule out confounding), (C) association. The panel design would rule out reversed causation, so if R-Siena rules out selectivity (does it?), you would only need to test whether earlier network position is associated with well-being. It may be that R-Siena does the whole job.


The way I myself (not a network specialist) would approach this is by a 3-wave panel design on individuals, in which network position (e.g. your connectedness, popularity, brokerage or what have you) is a dynamic property of your individuals, as is well-being. You would then test whether earlier network position is associated with later well-being, controlling for earlier well-being. Fixed-effects would rule out confounding by individual constants, but controlling earlier well-being does the same thing. If this is your design, talk to Irma.


If your problem is more complicated, and dyads should be the units of analysis, talk to Ineke.


Nicolette Q3: In assignment 2 we conducted an analysis with constrained estimation. As I understood it, constrained estimation means that if you do not want a certain relationship to exist in your model (e.g. X→Y) you can put a constraint in your model that sets it equal to zero. My question is, for what reasons would you put a constraint in your model? Because I would say that, when a certain relationship exists, you cannot just omit it from your model by putting in a constraint. Putting in a constraint means in fact that you manipulate your results, which could imply that in the end you will get the model you were searching for. The model could in that case not fit with reality and you thus make a type I error. How do you prevent making this type I error when you have found very good reasons (?!) to put in a constraint?


Answer: Constrained estimation can be done for several different reasons: (A) It reduces the number of estimated parameters, which makes models easier to interpret and more powerful (less prone to type II errors). (B) You may have substantive reasons to constrain effects, e.g. because you believe or want to test whether two effects are the same (I work a lot with father-mother data, where this is a natural hypothesis), or because you are repeating a battery over time or between countries and want to test whether the measurement model is the same. (C) Occasionally, constraints make a model identified (you reduce the number of estimated parameters).


Constraints are not always about omitting an effect. It can also be that two effects are the same (their difference is 0) or that certain parameters follow a linear / smooth trend (see the manual about growth models).
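An equality constraint of the father-mother kind can be imposed in plain OLS by regressing the outcome on the sum of the two predictors, and tested by comparing the residual sums of squares. The sketch below is in Python (standard library only), not the Stata `sem` syntax used in the assignments. The data are simulated with hypothetical values in which both effects are truly 0.3.

```python
import random

def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

rng = random.Random(5)
n = 20000
f = [rng.gauss(0, 1) for _ in range(n)]            # father's score (hypothetical scale)
m = [0.5 * fi + rng.gauss(0, 1) for fi in f]       # mother's score, correlated with father's
y = [0.3 * fi + 0.3 * mi + rng.gauss(0, 1) for fi, mi in zip(f, m)]
f, m, y = center(f), center(m), center(y)          # centering absorbs the intercept

# Unconstrained model: separate effects b_f and b_m (two-predictor normal equations).
sff, smm, sfm = dot(f, f), dot(m, m), dot(f, m)
sfy, smy = dot(f, y), dot(m, y)
det = sff * smm - sfm ** 2
b_f = (smm * sfy - sfm * smy) / det
b_m = (sff * smy - sfm * sfy) / det
rss_free = sum((yi - b_f * fi - b_m * mi) ** 2 for fi, mi, yi in zip(f, m, y))

# Constrained model: b_f = b_m, imposed by regressing y on the sum f + m.
s = [fi + mi for fi, mi in zip(f, m)]
b_eq = dot(s, y) / dot(s, s)
rss_eq = sum((yi - b_eq * si) ** 2 for si, yi in zip(s, y))

# F-test of the constraint: 1 restriction, n - 3 residual degrees of freedom.
F = (rss_eq - rss_free) / (rss_free / (n - 3))
print(round(b_f, 2), round(b_m, 2), round(b_eq, 2), round(F, 2))
```

The constrained model estimates one parameter instead of two. A small F means the data do not contradict the constraint, and the constraint is then tested, not simply assumed.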


Samantha Q3: In last year’s course we worked with nonlinearity in SPSS. However, we never added more variables to the nonlinear regression besides the independent and dependent variable. Is it possible to measure nonlinearity in different forms of models, such as spurious relationships, or with moderating and mediating variables? How can it be measured if a moderating variable has influence at a certain point in time or certain point of the process?


Answer: Are you referring to regression with polynomial (quadratic etc.) terms? Adding covariates to ‘control’ is hardly a complication here, except that you have to think a bit harder to derive the expected values for the non-linear variables of interest, since they are no longer the predicted values of the regression equation. (Generate them by taking the relevant part of the equation.) Things can become a bit trickier when you have multiplicative terms (interaction, moderation), as you now have to choose whether the interaction is on the non-linear components as well. Sometimes it is quite useful to restrict the interaction to the first-order term and show that the non-linear trends have the same shapes, but go in different directions. Think e.g. about income-age profiles of men and women.
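Taking "the relevant part of the equation" can be sketched for the income-age example. The sketch below is in Python (standard library only); all coefficient values are invented for illustration and do not come from any data. The interaction with gender is restricted to the first-order age term, so both groups share the same quadratic term.

```python
# Hypothetical coefficients for log income; none of these values come from real data.
b_age, b_age2 = 0.08, -0.0009            # linear and quadratic age terms (shared curvature)
b_female, b_female_age = 0.10, -0.01     # gender main effect and gender x age interaction

def profile(age, female):
    """Age-related part of the prediction, leaving out the intercept and other controls."""
    return (b_age * age + b_age2 * age ** 2
            + female * (b_female + b_female_age * age))

ages = (25, 35, 45, 55)
for age in ages:
    print(age, round(profile(age, 0), 3), round(profile(age, 1), 3))

# The gap between the two profiles is linear in age: same curvature, different tilt.
gaps = [profile(a, 1) - profile(a, 0) for a in ages]
```

Plotting `profile` for both groups would show two curves with the same bend that drift apart with age. Adding an interaction on the quadratic term as well would let the curvature itself differ between groups.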


Daphne Q3: When one is conducting statistical analyses with several variables, it is a prerequisite that they can be compared to each other. I have been wondering though if standardization of variables is sufficient to make them comparable, or if there are other conditions that should be met as well? I mean, when you try to compare variables that have been measured at the individual level with macro-data? Or would that never be possible? Also, are there other methods besides standardization that can increase the comparability of variables?


Answer: There are several ways of standardization. Usually we talk about Z-standardization, but you can also use percentiles or other rank scores (such as quintiles) or dichotomies (P and D standardization). You can also standardize on the range of variables (P and D accomplish this). With ratio data, there is usually no need to standardize, since you can use percentage change (the economists call this ‘elasticity’). By standardization you can compare effects to one another, but that does not always make sense (to everybody). Saying that the effect of gender on income is stronger than that of age, since a standard deviation of gender brings more income differences than a standard deviation of age (psychologists often say the same thing by putting the word “effect size” in between), is somewhat of an odd idea, and many would argue that it is never proper to make such comparisons, nor is there much theoretical need for it.


I tend to think that standardization makes more sense if the underlying metric is more arbitrary, and also that you could better do the standardization yourself, on the variables you want to compare, than interpret a fully standardized equation. E.g. in an interaction (moderator) model standardization often makes sense, but only before the multiplicative terms are constructed. However, in SEM’s you cannot standardize latent endogenous variables beforehand; there it is a manipulation of the estimated parameters.
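A minimal sketch of doing the standardization yourself, in Python for illustration only (the same steps can be done in Stata or SPSS). The variable names and the tiny data set are made up, and the population standard deviation is used as an assumption. The point is the order of operations: standardize first, construct the multiplicative term afterwards.

```python
from statistics import mean, pstdev

def z_standardize(values):
    """Z-standardization: subtract the mean, divide by the standard deviation."""
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]

def percentile_ranks(values):
    """P-standardization: share of cases with a value less than or equal to v."""
    n = len(values)
    return [sum(1 for w in values if w <= v) / n for v in values]

# Hypothetical data, for illustration only.
age = [25, 35, 45, 55, 65]
education = [10, 12, 12, 16, 18]

z_age = z_standardize(age)
z_education = z_standardize(education)

# Build the interaction term from the already standardized variables.
interaction = [a * e for a, e in zip(z_age, z_education)]
```

Note that the product term itself is not re-standardized; its coefficient is then read in the metric of the two standardized components.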


On comparing macro-unit and micro-unit effects, I would like to see an example. To bring one up: can we compare the effect of the GNP of the country you live in with the effect of your education on some outcome variable (say, an attitude)? I think that can make sense, and different ways of doing this should lead to similar answers. If not, you should look closer at what you are standardizing and what units you are comparing.


Carlos Q3: I have observed that many studies do not report in their method section how they determined the sample size. Particularly I am interested in studies that use experimental designs. I have read experimental studies that use the same treatment variables and factorial designs. However, while one of them may use a sample size of 50 participants, the other may have 150 participants. We have talked about how having a large sample size is important to increase power and avoid type 2 errors, especially when we are looking for weak effects. A large sample size is important to reduce standard errors. In some of the studies I have read researchers do not report about these things, how they selected their sample size and how this affects the statistical analysis of data. My questions are the following: Why do you think some studies disregard these matters, or at least they do not talk about it in their articles? And second, which are the appropriate techniques to estimate sample size?


Answer: I do not generally share your experience that experimentalists do not report their sample size; in my experience they generally do (and they should). However, as long as they report appropriately estimated standard errors, it does not really matter. In meta-analysis, these SE’s can be used to compare or average results of different experiments. (However, some meta-analyses use N as the weight.) A more subtle concern is whether researchers actually know what their effective sample size is. In practice, samples are hardly ever simple random. If they are clustered, the effective sample size can be much smaller than the nominal N. Systematic and stratified samples can imply an effective N that is larger than the nominal N. Missing values also obscure the calculation of the effective sample size: if you substitute missing values you are making up data, which may lead to easier estimation of the correct parameters, but should not increase your N. However, I think that these things are generally a bigger problem in observational than in experimental research. In experiments it is easier to collect complete data and also to obtain perfect random or matched random assignment of treatments.
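To give a feeling for how much clustering can shrink the effective sample size, here is a minimal sketch using the standard design-effect approximation, deff = 1 + (m - 1) * rho. It assumes equally sized clusters of size m and a known intraclass correlation rho; the function name and the numbers are illustrative, not taken from any particular study.

```python
def effective_sample_size(n_total, cluster_size, icc):
    """Approximate effective N for a clustered sample with equal cluster sizes.

    icc is the intraclass correlation (rho): the share of variance
    that lies between clusters.
    """
    design_effect = 1 + (cluster_size - 1) * icc
    return n_total / design_effect

# 1000 respondents in clusters of 20 with rho = 0.10 carry roughly the
# information of 345 independent observations.
print(round(effective_sample_size(1000, 20, 0.10)))
```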


Power calculations can be made only if you are willing to state or assume an effect size under H1 (the alternative hypothesis). A sociology PhD student called my attention to G*Power, which will do the calculations. However, in SEM you can increase power by additional measurements (e.g. pretests, multiple dependent variables), and I do not think that G*Power can cope with these ideas. Simulations with realistic but fictional data can be helpful here.
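A minimal sketch of such a calculation, for the simplest case only: a two-sided comparison of two group means, using the normal approximation. The effect size is a standardized mean difference (Cohen's d) that you have to assume; the function name and defaults are illustrative. Dedicated software uses the t distribution and will give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate required N per group for a two-group mean comparison.

    effect_size: assumed standardized mean difference under H1.
    Normal approximation: n = 2 * ((z_alpha + z_power) / d) ** 2.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value, two-sided test
    z_power = z.inv_cdf(power)           # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

print(n_per_group(0.5))   # medium effect
print(n_per_group(0.2))   # small effect needs a much larger N
```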


Samantha Q2: As I understand it, normally a theoretical hypothesis is aligned with the alternative hypothesis rather than the null hypothesis. In SEM, it is actually aligned with the null hypothesis. This already brings up questions, but to make it more complicated, there are exceptions to this rule in SEM, namely between-group comparisons. How does this all influence the dissertation (??) itself, and the hypotheses that are stated? What about the conclusions; do we now conclude on results based on the null hypothesis?


Answer: It is actually less complicated than you would think. The test on “goodness of fit” in SEM answers a question that does not arise in OLS with a single equation, because that is a ‘saturated’ model, in which the model always perfectly reproduces the covariances. Questions about the significance of parameters are the same in SEM and OLS single equations. Questions about fit do arise in discrete data analysis (does the model reproduce the observed counts?) or in OLS single equations, when testing a series of polynomial or dummy / spline specifications to approximate a non-linear pattern.


[My original question was the same as Johannes’ question on nonlinear models within SEM. I found some readings on this; the most informative one is included in the attachment.]


Comment: I think the article is not about the non-linear models Johannes meant, but about non-additive (moderation) and polynomial models. Non-linear models (proper) are models that are non-linear in the parameters (the parameters are e.g. in exponents), in particular when such models cannot be ‘linearized’, e.g. by taking logs. Such models arise in econometrics, meteorology, etc., but I have rarely seen them in the social sciences. Meanwhile, the article you found takes up some advanced issues in SEM modeling, namely how to do interactions and polynomials in SEM. I still have to study it fully, but my first impression is that they use constraints that are easily specified in LISREL, for which I have not (yet) found a parallel in Stata12 SEM.



Vera Q2: The Stata SEM textbook states that “The absence of a curved path—say, between e.x1 and e.x4 —means the variables are constrained to be uncorrelated.” Later on they speak of 'covariance' of variables. But is correlation and covariance the same thing, in this context? 


Answer: Covariance is unstandardized correlation, correlation is standardized covariance. If standardization is not an issue, the terms are used interchangeably, and this is correct: two variables that covary also correlate, and vice versa.
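A small sketch of the relationship, with made-up data: the correlation is the covariance divided by the product of the two standard deviations. Changing the unit of measurement (euros to cents) changes the covariance but leaves the correlation untouched, and whenever one of the two is zero the other is zero as well.

```python
from math import sqrt
from statistics import mean

def covariance(x, y):
    """Sample covariance (denominator n - 1)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def correlation(x, y):
    """Correlation = covariance standardized by both standard deviations."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))

# Hypothetical data, for illustration only.
income_euro = [1200.0, 1500.0, 1800.0, 2400.0, 3100.0]
education = [9.0, 12.0, 12.0, 15.0, 18.0]
income_cent = [v * 100 for v in income_euro]

print(covariance(income_euro, education), covariance(income_cent, education))
print(correlation(income_euro, education), correlation(income_cent, education))
```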


Daphne Q2: Imagine one has a sample of maybe 80-100 cases, but very few of these cases have no missing values for all of the included variables. If it is necessary to conduct an analysis that uses listwise exclusion, almost all cases would be excluded so the actual sample would become very small. But I suppose even for pairwise exclusion it would be useful to find a way of dealing with missing values. Which SPSS/Stata option would be most suitable? Or does that depend on the situation? Would it be useful to look at the standard error in this case to figure out which method for dealing with missing values is most reliable?


Answer: Using listwise deletion you would be left with no evidence at all, which is clearly less than what you have. If you need to work with techniques that cannot work with the summary correlation matrix (such as logistic regression), you have no other option than to substitute your missing data. The proper way of doing this is multiple imputation, i.e. generate several rounds of nearest neighbour imputations and use the differences between rounds to inform your estimate of uncertainty. Alternatively, you could use models that can work from a correlation matrix only (regression and factor analysis) and work from a correlation matrix with pairwise deletion of missing values. This choice would have the advantage of using only the information you have, not less, but also not more, and you do not have to make up anything. However, pairwise deletion as practiced by SPSS clearly does not work the way you expect (I will show this in the future) with respect to the estimated SE. Stata SEM (method=mlmv) offers an alternative that works appropriately (as does the FIML option in LISREL), and it can also be applied to models without latent variables.
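To see how different the two deletion rules can be, here is a toy sketch with a made-up data matrix in which `None` marks a missing value. It only counts the cases each rule would use; it is not an implementation of what SPSS or Stata do internally.

```python
from itertools import combinations

def listwise_n(rows):
    """Number of cases with no missing value on any variable."""
    return sum(1 for row in rows if all(v is not None for v in row))

def pairwise_n(rows):
    """Number of cases available for each pair of variables."""
    n_vars = len(rows[0])
    return {
        (i, j): sum(1 for row in rows if row[i] is not None and row[j] is not None)
        for i, j in combinations(range(n_vars), 2)
    }

# Hypothetical data: six cases, three variables.
data = [
    (1.0, 2.0, None),
    (2.0, None, 3.0),
    (None, 1.0, 2.0),
    (3.0, 4.0, None),
    (4.0, None, 1.0),
    (2.5, 3.0, 2.0),
]

print(listwise_n(data))   # only one complete case
print(pairwise_n(data))   # each correlation rests on a different subset
```

Because every correlation is then based on a different number of cases, there is no single N to plug into the standard errors, which is one way to see why the SE's under pairwise deletion are problematic.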


Both multiple imputation and FIML assume that the data are MCAR or MAR (the missingness is at random). They tend to work better if you have fewer missings (which may be the case in the data you describe: if the missingness occurs at random, listwise deletion tends to lose a large number of cases).


Johannes q2: SEM makes use of the fact that for systems of linear equations, either zero, one or an infinite number of solutions exist. Because we want to avoid having infinite solutions, we choose to use either identified or overidentified linear equations (for the latter, ML yields an optimal solution, if there is one).

But can SEM also be used to estimate non-linear systems of equations? If I recall highschool-math correctly, non-linear equations can have different numbers of solutions, so how do we handle it if SEM gives us two or even more solutions to a model? Or is it unusual to use non-linear equations in a SEM model?


Answer: I do not really know what the answer could be. Although I have not encountered systems of non-linear equations in social science, I am fairly certain that they are used in the natural sciences, both in biology and physics/chemistry. The issue here may be more whether such systems can also be estimated when one assumes latent variables and measurement error, which seems rather uniquely an idea of the social sciences. I have not found any further reference in the Stata SEM manual, but the LISREL 8.8 manual talks about how to use non-linear constraints. I have never tried this, not even for multiplicative interaction terms.


Maartje Q2a. My research is about the relation between an online social network and the well-being of people. However, if I measure the network characteristics and the well-being at the same time I cannot claim causality since it could be possible that the independent and the dependent variable co-evolve. So, instead I should do a pre-test on well-being, then collect the social network data and then do a post-test on well-being. However, I cannot control for confounding variables here. My question is thus, how should I collect my data in order to claim causality without the risk of confounding variables? An answer to this question could be (I think) to use RSiena; a longitudinal simulation program. Another question following from the use of RSiena is: A longitudinal design with at least two time points is necessary to be able to use RSiena. However, is it likely that the social network and the well-being of the participants will change in the short time period of three months?  


Answer: Any causal analysis needs an argument about causal order: what is the cause, and what is the outcome? Causal analysis does nothing to justify causal order claims; it calculates causally interpretable coefficients, given that you specify the causal order first. The usual justification of causal order is by invoking time order, although sometimes other believable arguments are given. So if you want to draw causal conclusions, you had better have a longitudinal design, either by panel observation or by retrospection. Causal order is related to, but not the same thing as, the analysis of confounding variables. The relationship is this: you must also know the causal location of the confounders, and they must be causing both X and Y. Controlling for an intervening variable (between X and Y), or even a variable that comes after Y, is not a good idea. The best ways to control for confounders are experimental design and instrumental variables (IV) analysis. If you are with the poor souls (welcome to the club) who cannot do experiments or IV, you will have to theorize your confounders and measure them. But you will always have to live your life knowing that there could be other confounders explaining your wonderful “effect”, and live unhappily ever after.


Maartje Q2b. I will probably use a panel regression analysis for my thesis research about social networks characteristics and their influence on well-being. However, I am interested in the differences between people (do people with different positions in the network – i.e. being more central – differ in their level of well-being?) as well as in the differences within people (at t1 person A has 2 friends and at t2 person A has 5 friends does this matter for the level of well-being of this person?). I am not sure anymore if I should use a fixed effects or random effect panel regression. For fixed effects, only changes within the individual count (differences within people). However, I also think that there are differences (in structural network positions) between persons that have an influence on their well-being. So what kind of analysis should I use, fixed effect or random effects?


Answer: Fixed effects will control any individually constant confounder (think: gender, cohort, education), but not confounders that change over time. However, it is somewhat hard to think of a time-dependent confounder that is indeed causally prior to X and Y and not in fact a mediating variable M. Historical changes come to my mind, but these are in fact the same for all subjects, and should not confound your results on individual differences.
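A minimal simulation sketch of why fixed effects remove an individually constant confounder. All numbers are made up: each person has a constant unobserved trait u that raises both X and Y, and the true effect of X on Y is set to 0.5. The fixed-effects estimate is obtained here by the within transformation (subtracting each person's own mean), which is one common way to compute it.

```python
import random
from statistics import mean

def ols_slope(x, y):
    """Slope of a bivariate OLS regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate_panel(n_persons=500, n_waves=4, beta=0.5, seed=1):
    """Rows of (person, x, y); u is a constant person-level confounder."""
    rng = random.Random(seed)
    rows = []
    for person in range(n_persons):
        u = rng.gauss(0, 1)
        for _ in range(n_waves):
            x = u + rng.gauss(0, 1)
            y = beta * x + 2 * u + rng.gauss(0, 1)
            rows.append((person, x, y))
    return rows

def within_transform(rows):
    """Subtract each person's own mean from x and y."""
    by_person = {}
    for person, x, y in rows:
        by_person.setdefault(person, []).append((x, y))
    dx, dy = [], []
    for obs in by_person.values():
        mx = mean([x for x, _ in obs])
        my = mean([y for _, y in obs])
        for x, y in obs:
            dx.append(x - mx)
            dy.append(y - my)
    return dx, dy

rows = simulate_panel()
pooled = ols_slope([x for _, x, _ in rows], [y for _, _, y in rows])
dx, dy = within_transform(rows)
within = ols_slope(dx, dy)
print(pooled, within)   # pooled is biased upward, within is close to 0.5
```

The price is visible in the transformation itself: anything that does not vary within persons, including stable between-person differences in network position, is subtracted out along with the confounder.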


I do not understand Random Effects models well enough to give you good advice. It seems to me that you cannot have it both ways.


Nicolette Q2: During the last lecture about SEM models we talked about the use of this type of statistical analysis. The main argument to use SEM models seems to be that it calculates the true score, which differs from the observed score. I do understand this advantage and I also understand that it is important to know the true correlation of latent variables. However, what I do not understand is how to interpret the true correlation between latent variables. How do you relate them to your model? Because in fact, the estimated model is wrong. So how do you explain that the true correlation for example is .7, while the model estimates .345? To me it seems that estimated model does not make sense anymore?


Answer: Indeed an interesting and important question! It is inaccurate to say that SEM’s calculate true scores; rather they estimate relationships (correlations and regressions) between such true scores, even without (!) the possibility to know someone’s true score. This may make your concerns even greater… To become as happy with SEM results as I (sometimes) am, it is important to develop a platonic attitude towards observation: what we see in our data matrix is not reality, it is only caused by reality, and it comes with a load of measurement error. Only by theorizing about such error and modeling the error process can we find out what reality really looks like. The observed correlation of 0.345 is indeed the correlation you observe in your data matrix, but then your data matrix is only your data matrix, not reality itself. With a better observation process, you would see another correlation, while reality would stay the same.
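The gap between a true correlation of about .7 and an observed one of about .345 can be illustrated with the classical attenuation formula, r_observed = r_true * sqrt(rel_x * rel_y). This is a sketch under the assumptions of classical test theory (random, uncorrelated measurement error); the reliabilities of 0.49 are invented to roughly match the numbers in the question, and a SEM estimates the error process from multiple indicators rather than plugging in known reliabilities.

```python
from math import sqrt

def attenuated_correlation(true_r, rel_x, rel_y):
    """Observed correlation implied by a true correlation and two reliabilities."""
    return true_r * sqrt(rel_x * rel_y)

def disattenuated_correlation(observed_r, rel_x, rel_y):
    """Reverse step: correct an observed correlation for unreliability."""
    return observed_r / sqrt(rel_x * rel_y)

observed = attenuated_correlation(0.7, 0.49, 0.49)
print(observed)                                         # about 0.343
print(disattenuated_correlation(observed, 0.49, 0.49))  # back to about 0.7
```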


Carlos Q2: Yesterday in class we analyzed a measurement model using SEM in Stata. We observed that having items with high measurement error does not affect the structural coefficient when these are removed from the model. However, removing them increases the standard error of this coefficient, especially when we have few items in our measurement scale. I interpreted this observation as implying that having more unreliable measurements is better than having fewer measurements. In the previous course, we discussed how using Cronbach´s Alpha was not an adequate procedure because, among other reasons, this technique does not take into account all available data in the model, as it uses a listwise deletion method for handling missing values. I am interested that many researchers use Alpha Cronbach´s not only to estimate the reliability of a scale, but also to determine the reliability of individual items within a scale. Within this estimate, removing unreliable items may actually increase the reliability of a scale, as expressed in the increase of the Cronbach´s Alpha. I have five questions regarding this matter:

1.      How can a measurement model estimated by SEM provide us with estimates of measurement reliability as we saw yesterday?

2.      Since Cronbach´s Alpha uses listwise deletion and SEM uses FIML, how does this difference generate different results in estimations of reliability? 

3.      Does Cronbach´s Alpha take into consideration the number of items in a scale and, as SEM does, consider that having unreliable items is more positive for our inferential statistics than having fewer items?

4.      If Cronbach´s Alpha uses listwise deletion for handling missing data and we can estimate regression analysis with pairwise deletion, how useful is the information provided by Cronbach´s Alpha then to estimate the possible effects of the reliability of our items on the standard errors in a regression?

5.      We know that increasing sample size helps to reduce standard errors in a regression. However if Cronbach´s Alpha estimates reliability with fewer information than available, then there is a mismatch between the sample size used in reliability analysis and the one used in regression analysis. Then, how can we relate the estimates of reliability using Cronbach´s Alpha with the standard errors obtained in a regression?


Answers:  These are many good questions. I hope I have all the answers:

1.      Cronbach’s alpha states two conditions to improve reliable measurement: (1) more indicators, (2) better (more strongly correlated) items. You can improve measurement reliability by taking into account many bad items (which is what many psychological scales do).

2.      SEM (=factor analysis) does not give us a direct estimate of reliability of the index variable that we would form from the indicators, and there is no reason to form such an index to begin with. SEM’s do give us an indication of the reliability (systematic or stable variance) in each of the indicators; however, there is no suggestion whether or when to leave one out.

3.      The way unreliability works out in SEM solutions is indeed in the SE’s of the structural effects: when we have a lower mean correlation, or fewer indicators (the two ingredients of alpha), the SE’s become larger. In principle it seems possible to me to have a criterion about when it is better to leave an indicator out, but I am not aware of its existence.

4.      Indeed, missing values are a headache in reliability analysis: you can only do listwise analysis, despite the fact that alpha uses the mean correlation between items, a quantity that can clearly be calculated from a pairwise correlation matrix. The underlying reason is that with missing data there is in fact variation in reliability for each individual, depending upon how many items have been validly scored. Imputing the missing data might even aggravate the problem.

5.      The FIML procedure in SEM supposedly avoids the problem by taking into account as much information as there is – pairwise deletion as you really wanted it to be – and presents you the bill for missing information in terms of uncertainty about coefficients – their SE. This is important, but actually not a solution for the problem you really wanted to solve: what is the reliability of the index that you would form from partially observed indicators?
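The two ingredients of alpha mentioned above (number of items and mean inter-item correlation) can be made concrete with the standardized form of the coefficient, alpha = k * r / (1 + (k - 1) * r). This is a sketch with invented numbers; it assumes standardized items, and the alpha reported by software for raw items is based on covariances and will differ somewhat.

```python
def standardized_alpha(n_items, mean_r):
    """Standardized Cronbach's alpha from the number of items and the
    mean inter-item correlation."""
    return n_items * mean_r / (1 + (n_items - 1) * mean_r)

# Four fairly good items versus ten weaker ones.
print(standardized_alpha(4, 0.30))    # about 0.63
print(standardized_alpha(10, 0.20))   # about 0.71
```

In this example the longer scale of weaker items comes out as the more reliable one, which is the sense in which many bad items can compensate for low correlations.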


Vera (1) I read in some textbooks that the Central Limit Theorem states that the distributions of large samples are normal. As such, we do not have to worry about assumptions about normal distributions for large samples. However, aren't variables such as income typically skewed, and not normally distributed, even in large samples? 


Answer: You are mixing things up: the distributions in large samples are not at all necessarily normal, there is in fact no relationship between sample size and sample distribution. The Central Limit Theorem states that, given certain conditions, the mean of a sufficiently large number of independent random variables, each with finite mean and variance, will be approximately normally distributed. So it is about a distribution of means, as produced by a large number of independent, random variables. One way to think about it is that if something is produced by a large number of causal factors, none of which dominates the process, that something will be normally distributed. Natural phenomena that come to mind are height or weight. In your research, if you add up (average) a large number of attitude items, the resulting scale will be approximately normally distributed: a symmetric distribution, with lots of cases around the mean and ever fewer cases further away from the mean.


Your mix-up has to do with sampling theory and in fact refers to the sampling distribution: the distribution of a certain sample statistic (such as a mean, but also a regression coefficient or a standard deviation) over a large number of random samples. This distribution tends to be normal, which is a tremendously important thing in statistics. The CLT here says that the normal shape of the sampling distribution arises when you have many samples, or as they say, it is asymptotically true. But it is also true that the normal shape arises more accurately when the size of all the individual samples is relatively large (typical numbers here range between 30 and 100). The approximation is also better when the sample distributions themselves are approximately normal.
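The distinction between the distribution of a variable and the sampling distribution of its mean can be checked with a small simulation. This is a sketch under invented settings: the raw variable is drawn from an exponential distribution (strongly right-skewed), and skewness is used as a rough indicator of non-normality. The raw draws stay skewed however many you take, while the means of repeated samples of size 50 are nearly symmetric.

```python
import random
from statistics import mean

def skewness(values):
    """Moment-based skewness; 0 for a symmetric distribution."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def raw_draws(n, seed=1):
    """n draws from a skewed (exponential) distribution."""
    rng = random.Random(seed)
    return [rng.expovariate(1.0) for _ in range(n)]

def sample_means(n_samples, sample_size, seed=42):
    """Means of repeated random samples from the same skewed distribution."""
    rng = random.Random(seed)
    return [
        mean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]

print(skewness(raw_draws(20000)))        # stays clearly skewed
print(skewness(sample_means(2000, 50)))  # much closer to 0
```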


Finally, income distributions are in fact approximately normal, or more appropriately log-normal: if you take logs in a (skewed) income distribution, the result can be approximately normal. The same is true for other right-skewed distributions (distributions with a long right tail), such as waiting times. This is so because the underlying growth process is typically multiplicative in nature: taking the logarithm makes it additive and thus conforms to the CLT.
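The multiplicative argument can also be simulated. In this sketch every person starts with the same income and experiences 40 periods of random proportional growth; the starting value, the number of periods and the growth range are all invented. The resulting incomes are right-skewed, while their logarithms (a sum of many small random terms) are close to symmetric.

```python
import math
import random
from statistics import mean

def skewness(values):
    """Moment-based skewness; 0 for a symmetric distribution."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def simulate_incomes(n_people=5000, n_periods=40, seed=7):
    """Incomes generated by repeated random proportional growth."""
    rng = random.Random(seed)
    incomes = []
    for _ in range(n_people):
        income = 100.0
        for _ in range(n_periods):
            income *= 1.0 + rng.uniform(-0.10, 0.20)
        incomes.append(income)
    return incomes

incomes = simulate_incomes()
log_incomes = [math.log(v) for v in incomes]
print(skewness(incomes))      # long right tail
print(skewness(log_incomes))  # close to 0
```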


Maartje (0): what is the difference between type 1 and type 2 errors?


Type I / II errors are defined relative to the Null Hypothesis H0.

·         Type I error is when the H0 holds in the population and you reject it

·         Type II error is when the H0 does not hold in the population and you do not reject (=accept) it.


Both are probabilities that we would like to know. This is simple for Type I errors, since this probability is chosen by the researcher: it is called the significance level and most commonly chosen at 5%. Whether you run into a type I error is just a matter of bad luck: it happens 5% of the time, if you want to put your risk at 5%. We know how often it happens, but not when it happens, so there is really nothing that you can do here.


The probability of a type II error is usually expressed by its complement, the probability of rejecting the H0 when it is indeed false, which is called statistical power (in Dutch: onderscheidend vermogen). Typically, you care very much about type II errors, because the researcher’s sympathy is mostly not with the H0, but with the alternative H1. You would hate yourself if you could have discovered a great effect, but erroneously concluded that there is no effect!


Power is hard to calculate: in order to do so, you have to have an expectation of the size of the effect in H1, or at least assume one. This type of knowledge is typically absent. However, there are some important rules about when power becomes larger or smaller, and they contain very important lessons in research methodology. Here are some:

·         Power increases with sample size, typically with sqrt(N). This implies that increasing the N is much more effective in small sample sizes than in large sample sizes.

·         Power increases with explained variance, even if this is produced by variables that are substantively uninteresting. This is why pre/post experimental designs are so much more powerful than randomized group designs. It is also a reason why panel data and time series data with correlated error terms can be so powerful. We will also see examples with constrained estimation.

·         Power decreases when you measure your variables with much (random) error, when you use lots of predictors that do not contribute to explained variance, and generally when you ill-design your research.

·         (Power increases when you choose your significance level α (the probability of a type I error) higher. If you choose α very low (.01 or lower), you make it more likely that you accept the H0, even when it should have been rejected. This is again mostly a concern in low N situations, where it actually can make sense to increase the α-level to 0.20, in order to find more balance between type I and type II errors. But of course low N research is likely to run into erroneous conclusions either way.)
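Two of these rules (sample size and significance level) can be illustrated numerically. The sketch below uses the normal approximation for a two-sided comparison of two group means; the standardized effect size of 0.5 is an assumption, as any effect size under H1 has to be, and the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def power_two_groups(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test on a two-group mean difference.

    effect_size: assumed standardized mean difference under H1.
    """
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n_per_group / 2)
    # Probability of landing beyond either critical value when H1 is true.
    return z.cdf(shift - critical) + z.cdf(-shift - critical)

# Adding 20 cases per group helps a lot at N = 20, hardly at N = 200.
print(power_two_groups(0.5, 20), power_two_groups(0.5, 40))
print(power_two_groups(0.5, 200), power_two_groups(0.5, 220))

# With low N, a more lenient alpha buys a noticeable amount of power.
print(power_two_groups(0.5, 20, alpha=0.05), power_two_groups(0.5, 20, alpha=0.20))
```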


So, type II errors are the errors you care about most and they are the errors that you can do something about!


Maartje Q1: A structural model is a model in which the parameters are not merely a description but believed to be of a causal nature (Stata manual, p. 285). I have a question regarding the word ‘believed’ in the definition of a structural model; does this mean that the causal nature of the model is based on assumptions and theory of the researcher? We discussed in class that SEM does not simply put several X variables in one model, but assumes a causal order of those X variables. However, I do not understand how this causal order is determined. As we also discussed, in your research about gender, education and occupation the order is generally clear, however it is still not ‘proven’ by statistics, it is still an assumption. Is the causality of the model determined by SEM, or by assumptions of the researcher? And what is the difference with multiple regression models regarding the causal ordering of the variables?


Answer: Indeed, any causal analysis requires the assumption of causal order (== no reverse causation) between X and Y, as well as on the causal position of potential confounders Z. Given these assumptions, causality can be concluded from the association between X and Y, if the Z are statistically controlled. The assumed causal order must come from the researcher (and the research design), it is not in the data. This is the same in regression, SEM or whatever.


Carlos Q1: Traditionally quantitative social research in Costa Rica has centered on cross-sectional survey designs. In terms of theory building, typically the state of knowledge involving certain topics is based on importing knowledge from other regions in the world, mainly Europe and USA. However, there are certain research topics that correspond to the local reality of Costa Rican society. Therefore, the current state of knowledge is highly limited in this sense. Research findings are not replicated and no infrastructure and research culture exist to conduct longitudinal research or even experimental research. It is difficult to find national organizations or institutions that also periodically gather datasets on social issues.

Therefore I argue that the current state of social research in Costa Rica, from the reality check of what can be effectively researched in the context hinders the possibilities of pursuing causal explanations of phenomenon. Quantitative methods at least in the field of Social Psychology that I know of mostly perform multivariate regressions, Chi Squares and Manova.


Although I think that experimental research could be more encouraged there is still the question of how to approach causality in observational research within this context. My question is the following: Given the reality of the research context and how this influences the types of research designs available, which methods for addressing causality can offer opportunities to overcome this gap in spite of design limitations? I would consider panel and longitudinal methods difficult to implement, but how can SEM contribute to address causality in cross-sectional survey designs with little local theory building and background research on a subject?


Answer: I am not so sure about the connection of your argument with the Costa Rican situation. There may be such things as Costa Rican research problems, budgets and expertise, but social theories apply in Costa Rica just as well (or badly) as elsewhere. So if we need causal approaches elsewhere, Costa Ricans need them too.


Experimental research is not necessarily harder or more costly than observational research, and in fact it would be less expensive to do it in Costa Rica than in Western Europe. The same is true for longitudinal (panel) designs: in Costa Rica it may be less expensive and attrition may be lower. However, it may be equally hard to think of good and feasible experimental and longitudinal designs as elsewhere.


SEM cannot remedy low budgets or lack of technical skills.


Samantha Q1: In the Stata manual on SEM models in section Intro 5, an introduction is given about the use of SEM for the comparison of groups. In the case of quasi-experimental research (as an alternative to an experimental design), a comparison of groups would be vital. The manual states that: “When we specify group(groupvar), the measurement parts of the model—parts 1 and 3—are constrained by default to be the same across the groups, whereas the middle part—part 2—will have separate parameters for each group. More specifically, parts 1 and 3 are constrained to be equal across groups except that the variances of the errors will be estimated separately for each group”. Does the use of parts 1 and 3 then imply that a control group is not necessary because the control variables within the groups are used in the analysis? What other benefits would SEM offer when doing quasi-experimental research (with exception of the option to put constraints on variables)?


Answer: No, the groups in SEM are generally not the experimental and control groups in (quasi-) experiments. Using the group option in SEM allows you to have different measurement models in different parts of the data, e.g. men and women, or different countries. At best, you would want to check this in experiments and truly hope the measurement models are not different. But usually, the experimental treatment in SEM would just be an X-variable.


SEM does not remove the need of a control group.


(However, your question made me think once more – maybe there is something useful in using the group option.)


Daphne Q1: What would be the best approach to analyzing macro-data that has been measured at various points in time for a sample of countries?


Answer: This is not so different from analyzing individuals in a panel observation design, and you can use the same models (the xt commands in Stata) or a cross-lagged panel design in SEM. Variations in XT designs typically concern the intervals at which you observe and how many intervals you have. Many cross-national macro studies have many intervals and few countries and are often called pooled time-series designs, whereas situations with many cross-sections and fewer time points are usually called panel designs. The differences between the two are at most subtle.


Nicolette Q1: An assumption of regression analysis is that observations are independent of each other. However, in my case (dynamic networks), the observations are not independent since the relationship at time point 1 is dependent on the relationship at time point 2. How can causality be established statistically (or not) in this situation? What is the best way to analyze these data? Or do I need another design or method of data collection?


Answer: Dependency of observations is a very natural phenomenon in any kind of longitudinal observation: the units you see in the next wave are not new units, they are the same ones. You should not think of this as a disadvantage; in fact, the strength of panel observation comes from exploiting this dependency, e.g. through fixed-effects models or by controlling for pre-test variables. In a sense, in these situations you can do better than you could with independent information: it makes the design much more powerful (and also more valid).
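As an illustration of how a fixed-effects model exploits this dependency, here is a minimal sketch in Python with simulated data (not your network data, and not Stata). Everything in it is an assumption made for the example: the function names, the number of units and waves, the true effect of 0.5, and the choice to let X correlate with a stable unit trait. It compares a pooled regression slope, which ignores that the same units recur, with the within-unit (fixed-effects) slope.

```python
import numpy as np

def simulate_panel(n_units=500, n_waves=4, beta=0.5, seed=0):
    """Simulate a balanced panel (hypothetical setup for illustration)."""
    rng = np.random.default_rng(seed)
    # Stable, unobserved trait of each unit: the source of dependency over time
    unit_effect = rng.normal(size=n_units)
    # X is correlated with the unit trait, so the trait confounds X and Y
    x = 0.8 * unit_effect[:, None] + rng.normal(size=(n_units, n_waves))
    noise = rng.normal(scale=0.5, size=(n_units, n_waves))
    y = beta * x + unit_effect[:, None] + noise
    return x, y

def pooled_slope(x, y):
    """Bivariate OLS slope treating all unit-wave rows as independent."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc * yc).sum() / (xc ** 2).sum())

def within_slope(x, y):
    """Fixed-effects slope: subtract each unit's own mean, then run OLS."""
    xd = x - x.mean(axis=1, keepdims=True)
    yd = y - y.mean(axis=1, keepdims=True)
    return float((xd * yd).sum() / (xd ** 2).sum())

x, y = simulate_panel()
b_pooled = pooled_slope(x, y)
b_within = within_slope(x, y)
print("pooled slope:", round(b_pooled, 3))
print("within slope:", round(b_within, 3))
```

In this setup the pooled slope picks up the stable unit trait and overstates the effect, whereas the within slope uses only change within the same units and should land close to the assumed value of 0.5. Repeated observation of the same units is what makes that correction possible.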


The problem with your longitudinal network observations is that dependency happens twice: the units are not only dependent over time, but also cross-sectionally, because they are related to one another according to the network structure. This situation is somewhat similar to a hierarchical multi-level design, in which students are observed in school classes within schools, etc. However, the dependency in networks is not hierarchically structured; it follows the ties of the network. I am not sure it will help, but you could read my article with Ineke Nagel & Matthijs Kalmijn about a similar situation, in which we model friendships between students based on the cultural characteristics in a two-wave design.


Johannes Q1: My question is: why should including covariates in experimental designs increase statistical power? Why does including Z decrease our uncertainty about βk, even though X and Z are completely uncorrelated (do not provide any information about each other)? Also, if I think about the consequences of this: suppose we have a medical trial in which a drug is randomly assigned to a sample and we want to test how the drug performs compared to taking no drug (or a placebo; say Y is whether a patient dies or not). Because assignment is random, taking the drug is by design uncorrelated with everything. Now, assuming the above is true, wouldn't that mean that we could increase our certainty about the effect of the drug by including predictors of Y such as age, gender, some genetic stuff and so on? Because they would not be correlated with the assignment to the drug but would increase the overall model fit? If this is true, is this actually done in medical research?