SELECTED
QUANTITATIVE METHODS
QUESTIONS & ANSWERS
Samantha Q7: I am currently trying to develop my research
design. It will include a computer simulation in which the dependent variable is
“quality of the decision”, the independent variables are several forms of
group characteristics, and the mediating variables
are symptoms of groupthink. By means of the simulation,
I want to see how group size, cohesiveness,
etc. influence the likelihood of symptoms of groupthink, which in turn influence
the quality of the decision. Unlike most experimental designs,
this research does not need a pretest, because the starting values are decided by me.
Because it is a simulation, I will enter different values for cohesiveness
and measure the outcome. For the analysis, what exactly should be measured?
How do I do a “between situation” analysis in either SPSS or Stata? Answer: Simulation is of course not a form of – empirical –
research, but rather a mode of theory formulation. If you do not confront it
with empirically defined data (these could be ‘stylized facts’), there is no
way in which reality will correct you… But otherwise, it can be very useful.
Indeed, there seems to be little reason to fool around with pretests or matching
or other power-enhancing features, since you can make N as large as you want
(unless one of your questions is how much power you gain by these features
of the research design). I know little about simulation; in the ones I have done
(see the first assignment from the course), I have used a simple system of
linear equations. First, you set the X-variables, then you calculate the M’s with
inclusion of a random error term, and finally the Y’s, either only from the M’s
(perfect mediation) or from the M’s and the X’s (imperfect mediation). Your
analysis will be a usual OLS regression / ANOVA and will simply be a restatement
of how you made the data.
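A minimal Python sketch of the linear-equation simulation just described, using only the standard library. The effect sizes (a = 0.5, b = 0.4, a direct effect of 0.3), the standard-normal error terms and the helper names `simulate` and `ols2` are illustrative assumptions, not the course assignment itself. Regressing Y on M and X simply recovers the values that were put in, which is the sense in which the analysis restates how the data were made.

```python
import random

def simulate(n, a, b, c_direct, seed=1):
    """Generate data from X -> M -> Y linear equations with standard-normal errors.

    c_direct = 0 gives perfect mediation; c_direct != 0 gives imperfect mediation.
    """
    rng = random.Random(seed)
    xs, ms, ys = [], [], []
    for _ in range(n):
        x = rng.gauss(0, 1)                          # step 1: set the X-variable
        m = a * x + rng.gauss(0, 1)                  # step 2: M from X plus random error
        y = b * m + c_direct * x + rng.gauss(0, 1)   # step 3: Y from M (and possibly X)
        xs.append(x)
        ms.append(m)
        ys.append(y)
    return xs, ms, ys

def ols2(y, x1, x2):
    """OLS slopes of y on x1 and x2 (intercept included), from centred sums of squares."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Imperfect mediation with illustrative values a = 0.5, b = 0.4, direct effect 0.3.
xs, ms, ys = simulate(20_000, a=0.5, b=0.4, c_direct=0.3)
b_m, b_x = ols2(ys, ms, xs)   # close to 0.4 and 0.3: the regression restates the design
```

Setting `c_direct=0.0` gives the perfect-mediation case, in which the X coefficient comes out close to zero.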
Samantha Q6: When wanting to obtain information about a causal
relationship broken down by a different variable (for example gender), Stata
requires a large number of observations. In the final assignment, for
example, I was unable to obtain all information for just one wave due to this
limitation. How can this be solved if the ‘by female’ command is not an
option? Answer: I am not sure what you intend to say
(gender is almost completely observed in the OSA data, no missings). I will
have to look at your assignment first.
Daphne Q7: Is it feasible to conduct multilevel data analysis
on the state level? It was stated that at least 18 groups should be available
for the analysis, but if we consider, for example, welfare states and the
attitudes of the individual people, it is possible that there are not enough
welfare states to reach the required number of groups. Would it then be
detrimental for the results of the analysis if other states were included,
even if they cannot be considered welfare states? Moreover, there are
different types of welfare states; would it be possible to include these
types as a variable at a lower level? Also, would it be possible to use, for
example, NGOs or companies as the lowest level of analysis/dependent variable? Answer: Yes. Multilevel techniques should in particular
be used when the number of contexts is rather small. The limit of N=18 only
means that your results become very unreliable below this number, but it does
not mean that below this number there is no multilevel problem; as a matter
of fact, it becomes more pressing instead of less. Suppose you were
working with 5 contexts (countries, organizations); this would allow you to
estimate the effects of at most 4 different context-level variables. (Single-level) OLS regression
might suggest that all of these 4 effects are ‘statistically significant’; a
multilevel model would make clear that you do not have this amount of
information in your data, and everything would become very insignificant. In
cross-national research, I am a fan of a meta-analytical approach, in which
you first derive micro-level effects for each context (with their SE) and
then move up to the aggregate analysis for the cross-national questions. This
design is a little bit cumbersome when you have a complicated micro-model (including
interactions), but works well for most questions. It is conceptually much
clearer than all the usual multilevel designs, statistically easier to
operate, and allows you to use XT programs in case you have repeated
observations on your countries. If your N is 18 or 18*waves, this is the N to use
in your analysis.
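A sketch of the two-step (meta-analytical) approach, under stated assumptions: 18 hypothetical contexts, one micro-level predictor x, and a hypothetical context-level variable z that shifts the micro-level slope. Step 1 estimates a slope and its SE per context; step 2 regresses those slopes on z with inverse-variance weights (1/SE squared). This is a simple fixed-effect meta-regression that ignores any extra between-context variance, and the function names are illustrative.

```python
import math
import random

def slope_and_se(x, y):
    """Step 1: micro-level OLS slope of y on x within one context, plus its standard error."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sxx
    resid = [c - my - b * (a - mx) for a, c in zip(x, y)]
    s2 = sum(r * r for r in resid) / (n - 2)      # residual variance
    return b, math.sqrt(s2 / sxx)

def wls_slope(z, b, se):
    """Step 2: regress context slopes b on context variable z, weighting by 1/se**2."""
    w = [1.0 / s ** 2 for s in se]
    sw = sum(w)
    mz = sum(wi * zi for wi, zi in zip(w, z)) / sw
    mb = sum(wi * bi for wi, bi in zip(w, b)) / sw
    szz = sum(wi * (zi - mz) ** 2 for wi, zi in zip(w, z))
    szb = sum(wi * (zi - mz) * (bi - mb) for wi, zi, bi in zip(w, z, b))
    return szb / szz

# Hypothetical data: 18 contexts, 2000 respondents each; the true micro slope is
# 0.2 + 0.05 * z, so the cross-national effect of z on the slope is 0.05.
rng = random.Random(7)
z = list(range(18))
slopes, ses = [], []
for zc in z:
    x = [rng.gauss(0, 1) for _ in range(2000)]
    y = [(0.2 + 0.05 * zc) * xi + rng.gauss(0, 1) for xi in x]
    b, se = slope_and_se(x, y)
    slopes.append(b)
    ses.append(se)
gamma = wls_slope(z, slopes, ses)   # aggregate analysis with N = 18
```

The second step works with N = 18, which is the N the answer recommends using.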
Daphne Q6: We applied the XT regression analysis to panel data
that was collected over time among people, but would it also be a valid
method for comparing states over time? In a way, states also have their
individual traits, and we cannot control for all of these when conducting an
analysis. Moreover, some variables change over time for states as well; let’s
take the effect of democratization on income disparity as an example. If we
measure this at the state level and then compare the data for all states over
time, in waves of 5 or 10 years, would we be able to assert whether or not a
causal relationship exists? Or would there not be enough cases to control for
all individual traits (there are +/- 193 states, and we could probably not
collect the data for all of these)? Answer: Yes, this is very much what XT is useful for. See also
my answer to Samantha Q6. There is more to XT than what we have discussed
in the course; in particular, you can use weights (inversely related to the
micro-level SE) and can adjust for serial correlation in the T-dimension. Whether
you are indeed discovering causal relations does not depend so much on
using XT or not, but on what you are willing to assume about causal order. Bringing
in more countries to control for possible confounders can be risky. Suppose you
were interested in the effect of certain legal structures on outcomes
among the 18 countries of interest, but you wanted to control for GNP. In
order to control for GNP, it would seem useful to include other
countries, for which you would know the legal structure (missings), but have
the outcome variables (without the outcome it will not work). This design would allow
you to estimate the GNP effect much better, but runs the risk that this additional
information is primarily driven by some irrelevant (low-GNP) countries. This is
sometimes called ‘lack of support’, in which case it is argued that you should
choose your controls using matching (i.e., similar countries).
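One simple way to choose controls by matching is greedy 1:1 nearest-neighbour matching on GNP, sketched below. The country names and GNP values are hypothetical, and matching on a single variable is a simplification; in practice one would often match on several covariates or on a propensity score.

```python
def match_controls(targets, pool):
    """Greedy 1:1 nearest-neighbour matching on GNP, without replacement.

    targets: country -> GNP for the countries of interest (hypothetical values)
    pool:    country -> GNP for candidate control countries (hypothetical values)
    Returns a dict mapping each target country to its matched control country.
    """
    if len(pool) < len(targets):
        raise ValueError("need at least as many candidate controls as targets")
    available = dict(pool)
    matches = {}
    # Process targets in order of GNP so that the result is deterministic.
    for name, gnp in sorted(targets.items(), key=lambda kv: kv[1]):
        best = min(available, key=lambda c: abs(available[c] - gnp))
        matches[name] = best
        del available[best]  # without replacement: each control is used once
    return matches

targets = {"A": 30.0, "B": 45.0}
pool = {"P1": 5.0, "P2": 29.0, "P3": 47.0, "P4": 2.0}
print(match_controls(targets, pool))  # {'A': 'P2', 'B': 'P3'}
```

Candidate controls far below the targets' GNP range (P1 and P4 here) are never selected, which keeps the comparison within the region of common support.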
Daphne Q5: When we use SEM to establish the correlations of the
latent factors, which are indicated by the correlations of their indicators,
is it possible that we accidentally identify a spurious relationship? If so,
is it possible to include certain checks to ensure that we have actually
found a causal relationship, besides the use of panel data? Answer: Relationships are not ‘spurious’, only causal effects
can be (i.e., the relationship exists, but is produced by confounding or
reversed causation). This is the same whether you use SEM (with latent
variables) or not. Everything depends on what you are willing and able to
assume about the causal order of measured variables, whether you can use
fixed-effects controls, or whether you have well-argued instrumental
variables in your design.
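To illustrate what a well-argued instrument buys you, the simulation below assumes an unmeasured confounder u that drives both X and Y, and an instrument z that affects Y only through X. All coefficients are illustrative assumptions. The plain OLS slope absorbs the confounding, while the simple IV (Wald) estimator cov(z, y) / cov(z, x) recovers the causal effect of 0.5.

```python
import random

def cov(a, b):
    """Sample covariance of two equally long lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)

rng = random.Random(3)
n = 40_000
z = [rng.gauss(0, 1) for _ in range(n)]   # instrument: affects y only through x
u = [rng.gauss(0, 1) for _ in range(n)]   # unmeasured confounder of x and y
x = [0.8 * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [0.5 * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

ols_slope = cov(x, y) / cov(x, x)  # about 0.88: confounded by u
iv_slope = cov(z, y) / cov(z, x)   # about 0.5: the causal effect of x on y
```

The estimator is only as good as the argument that z has no other route to y, which is exactly the "well-argued" part.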
Nicolette Q7: Structural
equation modelling and causal inference. During the lectures we learned how
to use structural equation modelling to make claims about causality. As far
as I know, we have used cross-sectional data to determine whether there
existed a causal relationship between variables. How is this possible?
Because in order to claim causality you need at least longitudinal data or an
experimental design. If you don’t have this, you can only analyse
correlations and make suggestions about causality. Thus, how is it possible
to make causal claims if you have (1) cross-sectional data and use (2)
structural equation modelling? Answer:
I disagree. Cross-sectional correlation must arise because of some causal
process, which can be threefold: X causes Y (causation), Y causes X (reversed
causation), or some Z causes Y and X (spurious causation) – these are all the
possibilities – although a combination can also hold. So cross-sectional data
can be used to establish causality of X on Y. The basic procedure is to rule
out reversed causation (usually by research design or logical argument) and
spurious causation (usually by statistical controls). The logical /
theoretical part of the argument is about causal order, which is most often
established by arguing that X precedes Y, and/or that Z precedes both X and
Y. So there is some sort of longitudinal argument here, although not
necessarily longitudinal measurement. Sometimes other arguments may be valid
to establish causal order, e.g. that structural characteristics
(occupations) are likely to be causes of subjective characteristics (work
motivation), not the other way around. However, in measurement this can all
be cross-sectional. SEMs by themselves do not add anything to establish
causality – in the end they are just a bunch of connected linear equations.
However, SEM models do help you think about the world in causal terms – which
is what science is about.
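A small simulation of the "statistical controls" step, assuming a prior common cause Z and no effect of X on Y at all. The coefficients are illustrative. The bivariate slope shows a spurious association; partialling Z out of both X and Y (the Frisch-Waugh-Lovell route, equivalent to adding Z to a multiple regression) brings the X slope back to about zero.

```python
import random

def slope(x, y):
    """Bivariate OLS slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y)) /
            sum((a - mx) ** 2 for a in x))

def residualize(v, z):
    """Part of v not linearly explained by z (residuals of an OLS regression of v on z)."""
    b = slope(z, v)
    mz, mv = sum(z) / len(z), sum(v) / len(v)
    return [vi - mv - b * (zi - mz) for vi, zi in zip(v, z)]

rng = random.Random(11)
n = 20_000
z = [rng.gauss(0, 1) for _ in range(n)]        # prior common cause Z
x = [0.7 * zi + rng.gauss(0, 1) for zi in z]   # Z -> X
y = [0.7 * zi + rng.gauss(0, 1) for zi in z]   # Z -> Y; X has no effect on Y

raw = slope(x, y)                                         # about 0.33: spurious
controlled = slope(residualize(x, z), residualize(y, z))  # about 0: X effect net of Z
```

The causal-order argument (Z precedes both X and Y) is what justifies treating Z as a control rather than as a mediator.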
Nicolette Q6x (Additional question about my own
research): How can you use interviews to answer a quantitative
research question? My research is about inter-organizational collaboration,
and I want to explore the moderating (or mediating) effect of trust and
commitment on the relationship between collaborative structure and successful
collaboration. Successful inter-organizational collaboration involves
specific factors such as perceived goal achievement, explored in previous
research. The reason for using
interviews is that I have access to only 16 respondents, and this is not
enough to claim causality. Is it scientifically possible to find out about this
mediating or moderating relationship when using interviews? Answer: The nature of causal
processes is not limited to large-N research; it happens in reality. The
problem is that you observe so little of it, not that it does not happen. But
any evidence is better than no evidence, so first try to model your data with
the regular statistical models and study graphs. As I argued in class,
inferential statistics (significance testing) is important in small-N
studies, not in large-N studies. And things can really be significant at
N=16; I think the minimum is N=5. If you also do your study statistically,
you will learn about type II errors and can tell your readers about them in an
intelligent way. Much cross-national comparative research is like that. Then
also consider some alternatives. First, I think that small-N research with
interviews is in fact often about reconstructing the theories of your
respondents. I tend to think of this as a weak version of empirical research,
but it is in itself respectable. Second, you can put yourself in the position
of an idiographic researcher, in much the same way as a historian or
police detective would do research. In this case you are not after testing or
developing law-like causal statements, but explaining the situation you find
in terms of a causal model you assume to be true. Third and finally, it might
be interesting to use your respondents as informants about multiple
instances: this is done in vignette research. By
the way, I think that the word <interview> should be <open interview> or <interview with open-ended
questions>. Furthermore, I never understand why <moderation> and
<mediation> are used in one sentence – as behavioral scientists often
do. These are not alternatives.
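As one concrete check of significance with very small N, the sketch below uses a permutation test for a correlation among 16 cases. This technique is a swapped-in illustration rather than the method named above, and the scores are hypothetical; the point is only that a clear association can reach significance at N=16 without large-sample approximations.

```python
import random

def corr(x, y):
    """Pearson correlation of two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def permutation_p(x, y, reps=5000, seed=0):
    """Two-sided permutation p-value for corr(x, y): how often does shuffling y
    produce a correlation at least as strong as the observed one?"""
    rng = random.Random(seed)
    observed = abs(corr(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(reps):
        rng.shuffle(y_perm)
        if abs(corr(x, y_perm)) >= observed:
            hits += 1
    return (hits + 1) / (reps + 1)   # add-one correction keeps p above zero

# Hypothetical scores for 16 respondents (not real data).
x = list(range(16))
y = [i + (3 if i % 2 else -3) for i in range(16)]
p = permutation_p(x, y)
```

With a weak association the same test will usually return a large p-value, which is where the discussion of type II errors comes in.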
Nicolette Q6: Fixed-effects methods. The last page of chapter 2 on fixed-effects methods for
linear regression states that the fixed-effects method effectively controls
for all individual-level covariates, both measured and unmeasured.
Additionally, the author states that an assumption of a fixed-effects method
is that the individual-level covariates must have the same effect at all
occasions. How could you measure the similarity of these effects? Is
this based on previous empirical research, on theory, or on something else?
Furthermore, how do you know whether you have included all these effects in a
model? Answer: <Assumption> would
normally imply that we cannot test it empirically. I see that this would be
true for the statement that fixed-effects controls all individual-level
covariates, in particular the unmeasured ones. However, I think that the
assumption can be tested for measured covariates, as an interaction between
a person characteristic and panel wave (time) would be identified. However, this
is true only for measured characteristics. And there is an easy way to
see whether you have included them all: it is the saturated model. However,
your worries would be about the unmeasured ones.
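A sketch of what the fixed-effects (within) estimator does, assuming a simulated three-wave panel in which an unmeasured, time-constant person characteristic (alpha) affects both x and y, with the same effect at every wave. Pooled OLS is biased by alpha; demeaning x and y within persons removes alpha and recovers the true slope of 0.5. Sizes and coefficients are illustrative assumptions.

```python
import random

def within_slope(panel):
    """Fixed-effects (within) slope: demean x and y per person, then pool the deviations.

    panel: dict mapping person id -> list of (x, y) observations over waves.
    """
    sxy = sxx = 0.0
    for obs in panel.values():
        mx = sum(x for x, _ in obs) / len(obs)
        my = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - mx) * (y - my)
            sxx += (x - mx) ** 2
    return sxy / sxx

def pooled_slope(panel):
    """Ordinary pooled OLS slope, ignoring that observations belong to persons."""
    pts = [p for obs in panel.values() for p in obs]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    return sxy / sxx

rng = random.Random(5)
panel = {}
for person in range(3000):
    alpha = rng.gauss(0, 1)  # unmeasured, time-constant characteristic
    waves = []
    for _ in range(3):
        x = alpha + rng.gauss(0, 1)
        y = 0.5 * x + 2.0 * alpha + rng.gauss(0, 1)  # alpha has the same effect each wave
        waves.append((x, y))
    panel[person] = waves

fe = within_slope(panel)      # close to the true 0.5
pooled = pooled_slope(panel)  # around 1.5, biased by alpha
```

If the effect of alpha differed between waves, demeaning would no longer remove it, which is the assumption the question asks about.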

Daphne Q4: What
are the advantages of using SEM for fixed-effects models? Is it because it
allows for testing the underlying latent factors by using multiple
indicators? Or could it be that OLS does not offer options for controlling
for systematic error like SEM does? Answer: I think you should have
asked: what are the advantages of using SEM for panel
data? Fixed-effects models are an alternative to SEM, and so are using a lagged
dependent variable in your models or first
differences (the difference between the dependent variable and its lagged
version), both of which can be calculated in OLS. At this point I see several
advantages for SEM: (A) use of latent variables and correction for
measurement error, (B) estimation of measurement error via the simplex model,
(C) in SEM you can model longitudinal developments in multiple (latent)
variables, and estimate cross-lagged causal effects or even reciprocal
instantaneous effects. But admittedly, specifying a moderately complicated
SEM model is in fact very complicated – let alone obtaining estimates. (I
reserve my final opinion until I have fully understood Allison’s last
chapter.)
Carlos Q6a: I have two questions for this week. First,
is allowing the intercept to vary between time periods similar to
including the lagged dependent variable? Answer: I would not think so, but I am not sure that I
understand the question.
Carlos Q6b: Second, I want to conduct a panel experiment. As a result of
randomization, there is no need to control for between-subjects variability.
This would suggest that it is not necessary to use fixed effects, which would
be an advantage in the sense that efficiency in the model can be pursued
over concern for bias. Let’s also assume that, because of randomization, time-invariant
characteristics, such as gender, do not need to be controlled for.
Therefore, which kinds of methods can be used to estimate a model in this
example? Assuming we have to worry less about getting biased estimates, which
methods can provide the most efficiency? Answer: In a ‘panel experiment’, you would have
both randomized groups and a pretest-posttest. Fixed-effects is not needed here
to control confounders, but it is useful to increase efficiency (decrease
standard errors). The effect of this (decreased standard errors) can be very
substantial and much better than when controlling for observed characteristics.
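A simulation sketch of the efficiency gain in a randomized pretest-posttest design, under illustrative assumptions: a large stable person component (SD 3), a treatment effect of 1.0 and random assignment by coin flip. Both comparisons are unbiased, but comparing change scores (with two waves, the fixed-effects estimate of the treatment effect) gives a much smaller standard error than comparing posttests only.

```python
import random
import statistics

def diff_se(a, b):
    """Difference in means (a minus b) and its standard error."""
    d = statistics.mean(a) - statistics.mean(b)
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return d, se

rng = random.Random(21)
post_t, post_c, change_t, change_c = [], [], [], []
for _ in range(2000):
    person = rng.gauss(0, 3)           # stable individual component (large variance)
    treated = rng.random() < 0.5       # random assignment
    pre = person + rng.gauss(0, 1)
    post = person + (1.0 if treated else 0.0) + rng.gauss(0, 1)
    (post_t if treated else post_c).append(post)
    (change_t if treated else change_c).append(post - pre)

d_post, se_post = diff_se(post_t, post_c)          # posttest-only comparison
d_change, se_change = diff_se(change_t, change_c)  # change scores (two-wave fixed effects)
```

The larger the stable person component relative to the wave-specific noise, the larger the gain from differencing it out.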
Maartje Q5: Last lecture we discussed panel
regressions in SEM. I am especially interested not in the cross-lagged panel
regression, but in the regression in which you want to know if occupation
influences work motivation and you do not assume that previous occupation
influences work motivation. I am very interested in this analysis method,
since I might want to use it for my own research. My research is about social
support on an online forum influencing the well-being of people. I have two (or
three) waves in which I distribute the survey which asks about well-being.
Social support (measured by the number of reactions and the content of reactions) can
be extracted anytime from the forum, since the information will be there
forever. However, Rene Bekkers commented that the waves do not really make
sense, since a change in social support would influence well-being
immediately and would not influence well-being a couple of months later. So, I
wondered, can I use the SEM panel regression method for this problem? Can I
claim causality by doing this analysis? And, additionally, is the only
reason that we use panel data then the fact that more information is
added to the model? Since we do not measure the relationship between the
independent variable at t1 and the dependent variable at t2, but we measure
both at t2 (for example), it is the same as cross-sectional data, isn’t it?
How can we still be able to claim causality in this kind of analysis? Answer: I am not sure I understand your questions. Let me try. If
causality operates instantaneously, this would not imply that you would not
notice it anymore in the next wave… (unfinished…)
Samantha Q5: This
question is about the MTMM model which we discussed last week. Although
I understood the idea behind the model, I was still interested in its
interpretation. I found this website: http://www.socialresearchmethods.net/kb/mtmmmat.php
and now have the following question about the interpretation. The website
notes that: “Coefficients in the validity diagonals should be significantly
different from zero and high enough to warrant further investigation. This is
essentially evidence of convergent validity. Convergent validity is the
degree to which concepts that should be related theoretically are
interrelated in reality. All of the correlations in our example meet this
criterion.” However, if the correlation between different
measurements of a concept were not one, I would understand that there is measurement
error. But how can the correlation of the same measurement not be one?
Basically, all I understand from the matrix are the so-called validity
diagonals; could you maybe explain some of the other ways of interpreting it?
Nicolette Q5: Systematic and random measurement error. During the lecture we talked about systematic and random measurement
errors. The effect of random measurement errors is that they decrease all
correlations – a bad thing. The effect of a systematic measurement error is
hard to identify. The meaning of a systematic measurement error is clear and,
considering this error, it is self-evident that a problem like this needs to
be solved. However, considering the fact that the effect of a systematic measurement
error is hard to identify and often results in rather small changes in the
estimation, why should we worry so much about it?
Tamira Q5: During the last lecture (Friday, September 28) you stated that
(1) to trace random error we should repeat the question, and
(2) to trace systematic error we should repeat the error. Can you also trace random or
systematic error by using multiple indicators, or can you only use them to compare
which of the indicators used is the better one? Answer: Within one questionnaire, this is all
about using multiple indicators (alternate forms to ask the same thing). If
there is only random error, the correlation between two indicators will give
you a measure of unreliability. We can trace the amount of random error
either by asking three indicators, or by embedding the measurement model in a
larger SEM model. However, all of this assumes the absence of systematic error (==
absence of other latent factors influencing multiple indicators). We can trace
the systematic error by asking the alternate forms in a repeated format, so
that the error arises again. For occupations, I ask a crude and a detailed
question and repeat that for another occupation (e.g. spouse or father). For
education, I ask qualification and duration, and repeat that for another
education (e.g. partner).
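With three indicators of one latent variable and only independent random error, the reliability of each indicator follows from the three correlations; for indicator 1 it is r12 * r13 / r23, its squared standardized loading. A minimal sketch; the correlations in the example are hypothetical, generated from loadings of 0.9, 0.8 and 0.7. As noted above, this breaks down once systematic error links two of the indicators.

```python
def reliabilities(r12, r13, r23):
    """Reliability (squared standardized loading) of each of three indicators of one
    latent variable, assuming independent random errors and no systematic error."""
    return (r12 * r13 / r23,   # indicator 1
            r12 * r23 / r13,   # indicator 2
            r13 * r23 / r12)   # indicator 3

# Hypothetical correlations implied by loadings 0.9, 0.8 and 0.7:
# r12 = 0.9*0.8, r13 = 0.9*0.7, r23 = 0.8*0.7
rel = reliabilities(0.72, 0.63, 0.56)   # approximately (0.81, 0.64, 0.49)
```

With only two indicators the three loadings cannot be separated; if the two are assumed parallel (equal loadings), their correlation itself is the reliability.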
Carlos Q5: I am
interested in discussing more about the likelihood ratio test for estimating
goodness of fit. As I understand it, the aim of this test is to achieve no
statistically significant difference between the estimated model and a model
that perfectly fits the observed correlation matrix. However, as I observed
in the last assignment, if the model is estimated with a higher number of observations,
the p-value tends to decrease towards being significant. Therefore, I don’t
understand how chi-square is a good statistic for measuring goodness of fit.
You talked a bit about the RMSEA method, and Stata offers other methods as
well. Can
you comment on the advantages of using RMSEA? What
is the purpose of estimating chi-square, and why does Stata use it as a
default goodness-of-fit statistic?
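For reference, a sketch of the usual point-estimate formula for RMSEA, sqrt(max(chi2 - df, 0) / (df * (N - 1))); some programs use N rather than N - 1. The example assumes the same population misfit at two sample sizes, using the approximation that the model chi-square is about (N - 1) times the misfit plus df. The misfit value 0.02 and df = 10 are illustrative.

```python
import math

def rmsea(chi2, df, n):
    """Point estimate of RMSEA from the model chi-square, its degrees of freedom and N."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

# Same (illustrative) population misfit F0 = 0.02 with df = 10 at two sample sizes,
# using the approximation chi2 ~= (N - 1) * F0 + df.
for n in (200, 5000):
    chi2 = (n - 1) * 0.02 + 10
    print(n, round(chi2, 2), round(rmsea(chi2, 10, n), 4))
# prints 200 13.98 0.0447, then 5000 109.98 0.0447
```

The chi-square grows with N for the same misfit, while RMSEA stays at the same value.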
Efe Q4: If we constrain a model
by stabilising to a standardised coefficient, is it equal to standardising
the rest of the model as well (since we read coefficients in terms of the
constrained variable)? 
Efe Q3: Is there a way to
build a more complex SEM model that also has a sampling correction for hierarchical
data, such as one SEM model at the macro level which explains certain
latent variables of another SEM model at the individual (or even time) level?
Efe Q2: If an observed variable
is a good indicator for two different latent variables, can we use it for both in
the same model? I tried this: fisei in both father's and mother's occupation (in addition to misei and mcrude) while latent FOCC and MOCC predict ROCC. The structural coefficients decreased while the standard errors increased, and the model fit (both chi-square and -2 log likelihood) got worse; all of this was contrary to my expectation. My model is:
sem (FOCC -> fisei fcrude) (MOCC -> fisei misei mcrude) (ROCC -> rcrude isei) (FOCC -> ROCC) (MOCC -> ROCC)
The reference model is:
sem (FOCC -> fisei fcrude) (MOCC -> misei mcrude) (ROCC -> rcrude isei) (FOCC -> ROCC) (MOCC -> ROCC)
Efe Q1: When we work on the
simulated data, or the ISSP-NL data that you have provided, we know exactly which
observed variables are reporting about which common latent variable. In the
simulated case, we knew because we created them, and in the ISSP case, we
knew because of the data documentation and because we built the SEM model on
previous theory, such as Blau & Duncan (1967). How about somewhat
ambiguous attitude or value questions, especially in an exploratory study with
little or almost no previous literature on its subject? Should we first focus
on factor analysis and reliability analysis of the observed variables
that we expect to be reporting on one latent variable, or can we claim
(as the math behind them must be very similar) that an SEM model with good
predictors (grouping high-coefficient ones that result in lower standard
errors) is in fact how the world is? >> I have one
essential question underlying this, which has been on my mind since the day you explained
factor analysis: Why don't we just upload a new data set to a supercomputer
to build every combination of SEM models, run all the iterations and
report back the ones that fit best? Of course, we may introduce certain
restrictions such as “father's education is chronologically before respondents'
occupation”. But what if the most important predictor of one's Left-Right
orientation is the amount of tobacco their mother consumed during pregnancy?
Maartje Q4: Panel data & missing values. I was wondering how you should deal with
missing values in panel data. Participants fill out the questionnaire in the
first wave, not in the second, and fill it out again in the third. How
can you account for the missing data on the participant in wave 2? Should you
remove the whole case, or can you impute the missing values based on the
other two questionnaires? It seems to me a very big waste if you have to
throw this person out of the analyses, especially because it also seems to me
that this is likely to happen quite often in panel surveys.
Samantha Q4: I would like to base my question on a problem with
real-life multilevel data which I have recently been introduced to. In this
case, data were acquired at five moments in time in the setting of a developing
neighborhood. The idea is to measure the formation of a social network. The
problem with the data is mostly the newcomers and the quitters. At the first
measurement point, over 100 people replied and filled out the questionnaire.
At the fifth measurement point, only approximately 30 people from the initial
100 people filled out the questionnaire. However, the problem lies in the
fact that, because it is a developing neighborhood, some people only moved in
after the first, second or third measurement point. At the fifth measurement
point, more people responded than just the 30 from the first
questionnaire, but the researchers gave the newcomers all the previous
questionnaires at the same time. Thus, a person who entered after the third
measurement point would receive the first, second, third and fourth
questionnaire at the same time. Furthermore, the researchers measured the
same concepts in all questionnaires, but they also introduced new concepts
throughout the study. My question now is: exactly what data is usable for multilevel analysis? Is it just the 30 people who filled out all five questionnaires at the appropriate time? What about the concepts that are introduced in later questionnaires?
Carlos Q4: In Friday’s
lecture, Professor Nagel said multilevel analysis cannot be conducted with sample sizes below 18 units of observation. Let’s
say we have a small sample of countries in our study and we want to conduct
multilevel analysis. We first identify outlier cases that may be potentially
influential. Let’s assume we decide not to exclude these cases in order to
maintain a sufficient sample size for conducting multilevel analysis. How can influential cases be expected to affect the
solution estimated with multilevel analysis? How can we counter the potential
influence of these cases?
Nicolette Q4: Multilevel
modelling is a solution for observations that are nested in a certain
structure. Thus, observations that are dependent on each other can be modelled
using multilevel analysis. I do understand this advantage of the analysis.
However, when I apply this to my type of research – network research – I am
not sure whether this type of analysis could be the solution to the problem
of dependent observations in network research. During class we used a large
data set with nearly 4000 observations. First, is multilevel analysis also
appropriate for smaller data sets (e.g. network data in organisations with 40
cases)? Second, what type of data collection strategy is appropriate when
using multilevel analysis? Third, with a fixed-effects model you can analyse
personality effects; this seems possible with multilevel modelling as well,
because you model effects at different levels (which in fact is also the case
with personality effects in fixed-effects models). So, to what extent (and
how) are multilevel models related to fixed-effects models?
Maartje Q3: I
have a question regarding causality in my research. I want to test the causal
relation between network position and well-being. Is it possible to do a
panel regression with three waves of data on well-being and network positions
using fixed-effects methods to rule out confounding variables (since we keep
the individual differences constant, the only explanation for the change in
well-being is the network position)? And in order to establish the causality
of the relation, I can run the data in RSiena, which models
selection; so if this model is correct, this means that the network forms
according to well-being (people connect to people with high well-being),
rather than the other way around (your connections influence your well-being). Is this
a proper way of claiming the causal relation: running an RSiena model to prove
that the selection argument is not valid in this case, and thus concluding
that it is about influence? My question is thus (put simply): how can I
claim causality in a statistical way in my research? How should I collect my
data and how should I analyse it? Answer:
I am not sure I understand what you are
saying. To establish a causal relationship, you generally need three things:
(A) causal order (to rule out reversed causation), (B) control or
randomization (to rule out confounding), and (C) association. The panel design
would rule out reversed causation, so if RSiena rules out selectivity (does
it?), you would only need to test whether earlier network position is
associated with well-being. It may be that RSiena does the whole job. The
way I myself (not a network specialist) would approach this is by a 3-wave
panel design on individuals, in which network position (e.g. your
connectedness, popularity, brokerage or what have you) is a dynamic property
of your individuals, as is well-being. You would then test whether earlier
network position is associated with later well-being, controlling for earlier
well-being. Fixed-effects would rule out confounding by individual constants,
but controlling for earlier well-being does the same thing. If this is your
design, talk to Irma. If
your problem is more complicated, and dyads should be the units of analysis,
talk to Ineke.
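A sketch of the suggested panel logic reduced to two measurement points: regress later well-being on earlier network position while controlling for earlier well-being. The simulation assumes, for illustration, that network position partly follows earlier well-being (selection) and that network position has a true effect of 0.3 on later well-being; variable names and coefficients are hypothetical. The bivariate slope mixes selection and influence, while the lagged model separates them.

```python
import random

def slope(x, y):
    """Bivariate OLS slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y)) /
            sum((a - mx) ** 2 for a in x))

def ols2(y, x1, x2):
    """OLS slopes of y on x1 and x2 (intercept included), from centred sums of squares."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

rng = random.Random(8)
n = 20_000
wb1 = [rng.gauss(0, 1) for _ in range(n)]                    # earlier well-being
net1 = [0.5 * w + rng.gauss(0, 1) for w in wb1]              # earlier network position (selection)
wb2 = [0.6 * w + 0.3 * p + rng.gauss(0, 1) for w, p in zip(wb1, net1)]  # later well-being

naive = slope(net1, wb2)                    # about 0.54: selection and influence mixed
b_network, b_lagged = ols2(wb2, net1, wb1)  # about 0.3 and 0.6
```

This only handles selection on earlier well-being that is measured; confounders outside the model still need the arguments discussed above.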
Nicolette Q3: In
assignment 2 we conducted an analysis with constrained estimation. As I
understood it, constrained estimation means that if you do not want a
certain relationship to exist in your model (e.g. X→Y), you can put a
constraint in your model that sets it to zero. My question is, for what reasons
would you put a constraint in your model? Because I would say that, when a
certain relationship exists, you cannot just omit it from your model by putting
in a constraint. Putting in a constraint in fact means that you manipulate your
results, which could imply that in the end you will get the model you were
searching for. The model could in that case not fit with reality and you thus
make a type I error. How do you prevent making this type I error when you
have found very good reasons (?!) to put in a constraint? Answer: Constrained estimation
can be done for several different reasons: (A) It reduces the number of estimated
parameters, which makes models easier to interpret and more powerful (less prone to
type II errors). (B) You may have substantive reasons to constrain effects,
e.g. because you believe or want to test whether two effects are the same (I
work a lot with father-mother data; here this is a natural hypothesis), or
because you are repeating a battery over time or between countries and want
to test whether the measurement model is the same. (C) Occasionally,
constraints make a model identified (you reduce the number of estimated
parameters). Constraints
are not always about omitting an effect. It can also be that two effects are
the same (their difference is 0) or that certain parameters follow a linear /
smooth trend (see the manual about growth models).
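A sketch of an equality constraint of the kind mentioned under (B), assuming hypothetical father and mother variables with equal true effects of 0.3. Imposing b_father = b_mother is the same as regressing the outcome on the sum of the two predictors, which estimates one parameter instead of two; comparing the fit of the constrained and unconstrained models is then a test of equality.

```python
import random

def slope(x, y):
    """Bivariate OLS slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y)) /
            sum((a - mx) ** 2 for a in x))

def ols2(y, x1, x2):
    """OLS slopes of y on x1 and x2 (intercept included), from centred sums of squares."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

rng = random.Random(4)
n = 20_000
father = [rng.gauss(0, 1) for _ in range(n)]
mother = [0.5 * f + rng.gauss(0, 1) for f in father]   # parents' scores are correlated
child = [0.3 * f + 0.3 * m + rng.gauss(0, 1) for f, m in zip(father, mother)]

b_father, b_mother = ols2(child, father, mother)                  # unconstrained: 2 parameters
b_common = slope([f + m for f, m in zip(father, mother)], child)  # constrained: b_father == b_mother
```

Because the two effects really are equal here, the single constrained estimate is close to 0.3 and has a smaller standard error than either unconstrained estimate.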
Samantha Q3:
In last year’s course we worked with nonlinearity in SPSS. However, we never
added more variables to the nonlinear regression besides the independent and
dependent variable. Is it possible to measure nonlinearity in different forms
of models, such as spurious relationships, or with moderating and mediating
variables? How can it be measured if a moderating variable has influence at a
certain point in time or certain point of the process? Answer:
Are you referring to regression with polynomial (quadratic etc.) terms? Adding
covariates to ‘control’ is hardly a complication here, except that you have
to think a bit harder to derive the expected values for the – nonlinear –
variables of interest, since they are no longer the predicted values of the
regression equation. (Generate them by taking the relevant part of the
equation.) Things can become a bit trickier when you have multiplicative
terms (interaction – moderation), as you now have to choose whether the
interaction is on the nonlinear components as well. Sometimes it is quite
useful to restrict the interaction to the first-order term and show that the
nonlinear trends have the same shapes, but go in different directions. Think,
e.g., about income-age profiles of men and women.
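A sketch of "taking the relevant part of the equation", assuming a simulated (hypothetical) outcome with a quadratic age profile, a female dummy, and an interaction restricted to the first-order age term, so that men's and women's profiles share the same curvature. The Gauss-Jordan solver and all names are illustrative; `expected_profile` builds the predicted age profile for one group from the relevant coefficients only.

```python
import random

def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y, solved by
    Gauss-Jordan elimination with partial pivoting. Each row of `rows` must
    start with 1.0 for the intercept."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    a = [xtx[i] + [xty[i]] for i in range(k)]    # augmented matrix [X'X | X'y]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

rng = random.Random(13)
rows, y = [], []
for _ in range(5000):
    age = rng.uniform(20, 65)
    female = 1.0 if rng.random() < 0.5 else 0.0
    ac = age - 40                      # centre age to keep X'X well conditioned
    rows.append([1.0, ac, ac * ac, female, female * ac])  # interaction on linear term only
    y.append(1.0 + 0.08 * ac - 0.002 * ac * ac + 0.3 * female - 0.02 * female * ac
             + rng.gauss(0, 0.5))
b0, b_age, b_age2, b_fem, b_fem_age = ols(rows, y)

def expected_profile(ages, female):
    """Expected outcome by age for one group: the relevant part of the fitted equation."""
    return [b0 + b_fem * female + (b_age + b_fem_age * female) * (t - 40)
            + b_age2 * (t - 40) ** 2 for t in ages]
```

For example, `expected_profile(range(20, 66, 5), 1)` gives the fitted profile for women; any further control variables would be held at a chosen value in the same way as `female` is here.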
Daphne
Q3: When
one is conducting statistical analyses with several variables, it is a
prerequisite that they can be compared to each other. I have been wondering
though if standardization of variables is sufficient to make them comparable,
or if there are other conditions that should be met as well? I mean, when you
try to compare variables that have been measured at the individual level with
macro-data? Or would that never be possible? Also, are there other methods
besides standardization that can increase the comparability of variables? Answer:
There are
several ways of standardization. Usually we talk about Z-standardization, but
you can also use percentiles or other rank scores (such as quintiles) or dichotomies
(P and D standardization). You can also standardize on the range of variables
(P and D accomplish this). With ratio data, there is usually no need to
standardize, since you can use percentage change (the economists call this
‘elasticity’). By standardization you can compare effects to one another, but
that does not always make sense (to everybody). Saying that the effect of
gender on income is stronger than that of age, since a standard deviation of
gender brings more income differences than a standard deviation of age
(psychologists often say the same thing by putting the word “effect size” in
between), is somewhat of an odd idea, and many would argue that it is never
proper to make such comparisons – nor is there much theoretical need
for it. I
tend to think that standardization makes more sense if the underlying metric
is more arbitrary, and also that you would do better to standardize the
variables you want to compare yourself than to interpret a fully standardized
equation. E.g. in interaction (moderator) models standardization often makes
sense, but only before the multiplicative terms are constructed.
However, in SEMs you cannot standardize latent endogenous
variables beforehand; there, standardization is a manipulation of the estimated parameters. On
comparing macro-unit and micro-unit effects, I would like to see an
example. To bring one up: can we compare the effect of the GNP of the country
you live in with the effect of your education on some outcome variable (say,
an attitude)? I think that can make sense, and different ways of doing this
should lead to similar answers. If not, you should look more closely at what you are
standardizing and what units you are comparing.
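Two points from this answer can be checked numerically in a small sketch on simulated, hypothetical data. First, a fully standardized slope is just the raw slope rescaled by sd(x)/sd(y). Second, standardizing before building the product term is what keeps the product term from largely duplicating its components; the near-zero correlation between a z-scored predictor and the product of z-scores is one practical reason for the "before, not after" advice.

```python
import numpy as np


def zscore(v: np.ndarray) -> np.ndarray:
    """Z-standardization (population SD, ddof=0)."""
    return (v - v.mean()) / v.std()


rng = np.random.default_rng(3)
n = 800
x = rng.normal(50, 10, size=n)     # hypothetical predictor in arbitrary units
z = rng.normal(5, 2, size=n)       # hypothetical moderator
y = 2 + 0.3 * x + 1.5 * z + 0.05 * x * z + rng.normal(0, 5, size=n)

# Simple regression: the standardized slope equals b_raw * sd(x) / sd(y)
b_raw = np.linalg.lstsq(np.column_stack([np.ones(n), x]), y, rcond=None)[0][1]
b_std = np.linalg.lstsq(np.column_stack([np.ones(n), zscore(x)]), zscore(y), rcond=None)[0][1]

# Moderation: standardize first, then build the product term
xs, zs = zscore(x), zscore(z)
corr_raw = np.corrcoef(x, x * z)[0, 1]      # product term overlaps strongly with x
corr_std = np.corrcoef(xs, xs * zs)[0, 1]   # near zero after standardizing first
b_int = np.linalg.lstsq(np.column_stack([np.ones(n), xs, zs, xs * zs]), y, rcond=None)[0]
print(round(b_std, 3), round(corr_raw, 2), round(corr_std, 2))
```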
Carlos Q3: I have observed that many studies do not report in their method section
how they determined the sample size. Particularly I am interested in studies
that use experimental designs. I have read experimental studies that use the
same treatment variables and factorial designs. However, while one of them
may use a sample size of 50 participants, the other may have 150
participants. We have talked about how having a large sample size is
important to increase power and avoid type 2 errors, especially when we are
looking for weak effects. A large sample size is important to reduce standard
errors. In some of the studies I have read, researchers do not report on
these things: how they selected their sample size and how this affects the
statistical analysis of data. My questions are the following: Why do you
think some studies disregard these matters, or at least do not talk
about them in their articles? And second, which are the appropriate techniques
to estimate sample size? Answer:
I do not
generally share your experience that experimentalists do not report their
sample size – in my experience they generally do (and they should). However,
as long as they report appropriately estimated standard errors, it does not
really matter. In meta-analysis, these SEs can be used to compare or average
results of different experiments. (However, some meta-analyses use N as a
weight.) A more subtle concern is whether researchers actually know what
their effective sample size is. In practice, samples are hardly ever simple
random. If they are clustered, the effective sample size can be much smaller
than the nominal N. Systematic and stratified samples can imply an effective N
that is larger than the nominal N. Missing values also obscure the
calculation of the effective sample size: if you substitute missing values,
you are making up data, which may make it easier to estimate the correct
parameters, but should not increase your N. However, I think that these
things are generally a bigger problem in observational than in experimental
research. In experiments it is easier to collect complete data and also to
obtain perfectly random or matched random assignment of treatments. Power
calculations can be made only if you are willing to state or assume an effect
size under H1 (the alternative hypothesis). A sociology PhD student called my
attention to G*Power,
which will do the calculations. However, in SEM you can increase power by
additional measurements (e.g. pretests, multiple dependent variables) – I do
not think that G*Power can cope with these ideas. Simulations with realistic
but fictional data can be helpful here.
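The point that clustering shrinks the effective sample size can be quantified with Kish's design effect for equal-sized clusters, deff = 1 + (m − 1)·ICC, where m is the cluster size and ICC the intraclass correlation; the effective N is the nominal N divided by deff. A minimal sketch follows (the survey numbers are hypothetical); it does not cover stratification or missing data.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1.0) * icc


def effective_n(n: int, cluster_size: float, icc: float) -> float:
    """Nominal sample size deflated by the design effect."""
    return n / design_effect(cluster_size, icc)


# Hypothetical survey: 1000 respondents in 50 neighbourhoods of 20, ICC = 0.05
print(effective_n(1000, 20, 0.05))   # ≈ 512.8, about half the nominal N
```

Even a modest intraclass correlation nearly halves the information here, which is why a reported nominal N can be misleading.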
Samantha Q2:
As
I understand it, normally a theoretical hypothesis is aligned with the
alternative hypothesis rather than the null hypothesis. In SEM, it is
actually aligned with the null hypothesis. This already brings up questions,
but to make it more complicated, there are exceptions to this rule in SEM,
namely between-group comparisons. How does this all influence the
dissertation (??) itself, and the hypotheses that are
stated? What about the conclusions; do we now conclude on results based on
the null hypothesis? Answer:
It
is actually less complicated than you would think. The test of “goodness of
fit” in SEM answers a question that does not arise in OLS with a single
equation, because that is a ‘saturated’ model, in which the model always
perfectly reproduces the covariances. Questions about the significance of
parameters are the same in SEM and OLS single equations. Questions about fit
do arise in discrete data analysis (does the model reproduce the observed
counts?) or in OLS single equations, when testing a series of polynomial or
dummy / spline specifications to approach a nonlinear pattern. [My original question was
the same as Johannes’ question on nonlinear models within SEM. I found some
readings on this; the most informative one is included in the attachment.] Comment: I think the article
is not about the nonlinear models Johannes meant, but about non-additive (moderation)
and polynomial models. Nonlinear models (proper) are models that are
nonlinear in the parameters (the parameters are e.g. in exponents), in
particular when such models cannot be ‘linearized’, e.g. by taking logs. Such
models arise in econometrics, meteorology, etc., but I have rarely seen them
in the social sciences. Meanwhile, the article you found takes up some
advanced issues in SEM modeling, namely how to do interactions and
polynomials in SEM. I still have to study it fully, but my first impression
is that they use constraints that are easily specified in LISREL, for which I
have not (yet) found a parallel in Stata 12 SEM.
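The "does the model reproduce the observed counts?" kind of fit test can be sketched with the likelihood-ratio statistic G² for the independence model in a two-way table. Here, as in SEM fit tests, the theoretical model plays the role of H0: a small G² means the model reproduces the data, and a large one (above 3.84 for 1 df at the 5% level) rejects it. A saturated model would give G² = 0 by construction. The counts below are hypothetical.

```python
import math


def g2_independence(table):
    """Likelihood-ratio fit statistic G^2 and its df for the independence model."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n  # count implied by the model
            if observed > 0:                         # 0 * log(0) contributes nothing
                g2 += 2 * observed * math.log(observed / expected)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return g2, df


# Hypothetical counts: a table the independence model reproduces exactly ...
print(g2_independence([[10, 20], [20, 40]]))   # → (0.0, 1)
# ... and one it does not: G^2 ≈ 4.03 > 3.84, so the model is rejected
print(g2_independence([[30, 20], [20, 30]]))
```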
Vera
Q2: The
Stata SEM textbook states that “The absence of a curved path—say, between
e.x1 and e.x4 —means the variables are constrained to be uncorrelated.” Later
on they speak of 'covariance' of variables. But are correlation and covariance
the same thing, in this context? Answer:
Covariance
is unstandardized correlation; correlation is standardized covariance. If
standardization is not an issue, the terms are used interchangeably, and this
is correct: two variables that covary also correlate, and vice versa.
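A quick numerical illustration on hypothetical simulated data: the correlation is the covariance divided by the product of the two standard deviations, so the two are zero (or positive, or negative) together, but only the covariance depends on the units of measurement.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(0, 10, size=200)            # hypothetical variables
y = 0.3 * x + rng.normal(0, 5, size=200)

cov_xy = np.cov(x, y)[0, 1]                # sample covariance (ddof=1)
corr_xy = np.corrcoef(x, y)[0, 1]
corr_from_cov = cov_xy / (x.std(ddof=1) * y.std(ddof=1))   # standardizing the covariance

# Rescaling x (e.g., from euros to cents) changes the covariance but not the correlation
cov_scaled = np.cov(100 * x, y)[0, 1]
corr_scaled = np.corrcoef(100 * x, y)[0, 1]
print(round(cov_xy, 2), round(corr_xy, 3), round(cov_scaled, 2), round(corr_scaled, 3))
```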
Daphne
Q2: Imagine
one has a sample of maybe 80-100 cases, but very few of these cases have no
missing values for all of the included variables. If it is necessary to
conduct an analysis that uses listwise exclusion, almost all cases would be
excluded so the actual sample would become very small. But I suppose even for
pairwise exclusion it would be useful to find a way of dealing with missing
values. Which SPSS/Stata option would be most suitable? Or does that depend
on the situation? Would it be useful to look at the standard error in this
case to figure out which method for dealing with missing values is most
reliable? Answer: Using listwise deletion
you would be left with no evidence at all, which is clearly less than what
you have. If you need to work with techniques that cannot work with the summary
correlation matrix (such as logistic regression), you have no other option
than to substitute your missing data. The proper way of doing this is
multiple imputation, i.e. generate several rounds of nearest neighbour
imputations and use the differences between rounds to inform your estimate of
uncertainty. Alternatively, you could use models that can work from a
correlation matrix only (regression and factor analysis) and work from a
correlation matrix with pairwise deletion of missing values. This choice
would have the advantage of using only the information you have, not less,
but also not more, and you do not have to make up anything. However, pairwise
deletion as practiced by SPSS clearly does not work the way you expect
(I will show this in the future) with respect to the estimated SE. Stata SEM
(method(mlmv)) offers an alternative that works appropriately (as does the LISREL
FIML option), and it can also be applied to models without latent variables. Both
multiple imputation and FIML assume that the data are MCAR or MAR (the
missingness is at random). They tend to work better if you have fewer
missing values per variable (which may well be the case in the data you describe: if the missingness
occurs at random across many variables, listwise deletion tends to lose a large number of cases).
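A small simulation shows how quickly listwise deletion erodes a sample of this size when missingness is spread over several variables, and what a pairwise correlation matrix uses instead. The data and the 30% MCAR rate are assumptions for illustration. Note that a pairwise matrix mixes different subsets of cases per cell (it need not even be positive definite), which is one reason its standard errors are hard to get right.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 90, 6
data = rng.normal(size=(n, k))                 # hypothetical 90 cases, 6 variables
data[rng.random((n, k)) < 0.3] = np.nan        # assumed: 30% of values missing completely at random

observed = ~np.isnan(data)
complete_cases = int(observed.all(axis=1).sum())              # what listwise deletion keeps
pair_n = observed.T.astype(int) @ observed.astype(int)        # cases available per pair


def pairwise_corr(values: np.ndarray) -> np.ndarray:
    """Correlation matrix using, for each pair, all cases observed on both variables."""
    m = values.shape[1]
    r = np.eye(m)
    for i in range(m):
        for j in range(i + 1, m):
            ok = ~np.isnan(values[:, i]) & ~np.isnan(values[:, j])
            r[i, j] = r[j, i] = np.corrcoef(values[ok, i], values[ok, j])[0, 1]
    return r


print("listwise N:", complete_cases, " smallest pairwise N:", pair_n.min())
R = pairwise_corr(data)
```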
Johannes q2:
SEM makes
use of the fact that for systems of linear equations, either zero, one or an
infinite number of solutions exist. Because we want to avoid having infinite
solutions, we choose to use either identified or overidentified linear equations
(for the latter, ML yields an optimal solution, if there is one). But
can SEM also be used to estimate nonlinear systems of equations? If I recall
high-school math correctly, nonlinear equations can have different numbers of
solutions, so how do we handle it if SEM gives us two or even more solutions
to a model? Or is it unusual to use nonlinear equations in a SEM model? Answer: I
do not really know what the answer could be. Although I have not encountered systems
of nonlinear equations in social science, I am fairly certain that they are
used in the natural sciences, both in biology and physics/chemistry. The
issue here may be more whether such systems can also be estimated when one
assumes latent variables and measurement error, which seems to be an idea
rather unique to the social sciences. I have not found any further reference in the
Stata SEM manual, but the LISREL 8.8 manual talks about how to use nonlinear
constraints. I have never tried this, not even for multiplicative interaction
terms.
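The concern about multiple solutions can be illustrated with a deliberately simple, hypothetical model that is nonlinear in its parameter: y = b²·x. The data pin down b², but b and −b fit equally well, so the least-squares criterion has two equally good minima. This is a sketch of the identification problem only, not of how any SEM program handles it; an added constraint (for instance b > 0) would be one way to make the solution unique.

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=300)
y = 0.49 * x + rng.normal(0, 0.5, size=300)   # hypothetical data, true b**2 = 0.49


def sse(b: float) -> float:
    """Residual sum of squares of the hypothetical model y = b**2 * x."""
    return float(np.sum((y - b**2 * x) ** 2))


grid = np.linspace(-1.5, 1.5, 3001)
values = np.array([sse(b) for b in grid])
best = grid[np.argmin(values)]   # one of two equally good minima, near -0.7 or +0.7

print(sse(0.7) == sse(-0.7), round(abs(best), 2))
```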
Maartje Q2a.
My research is about the relation between an online social network and the
wellbeing of people. However, if I measure the network characteristics and
the wellbeing at the same time I cannot claim causality since it could be
possible that the independent and the dependent variable co-evolve. So,
instead I should do a pretest on wellbeing, then collect the social network
data and then do a posttest on wellbeing. However,
I cannot control for confounding variables here. My question is thus, how
should I collect my data in order to claim causality without the risk of
confounding variables? An answer to this question could be (I think) to use
RSiena; a longitudinal simulation program. Another question following from
the use of RSiena is: A longitudinal design with at least two time points is
necessary to be able to use RSiena. However, is it likely that the social
network and the wellbeing of the participants will change in the short time
period of three months? Answer: Any causal
analysis needs an argument about causal order: what is the cause, and what is
the outcome? Causal analysis does nothing to justify causal order claims; it
calculates causally interpretable coefficients, given that you specify the
causal order first. The usual justification of causal order is by invoking
time order, although sometimes other believable arguments are given. So if
you want to draw causal conclusions, you had better have a longitudinal design,
either by panel observation or by retrospection. Causal order is related to, but
not the same thing as, the analysis of confounding variables. The relationship is
this: you must also know the causal location of the confounders, and they
must be causing both X and Y. Controlling for an intervening variable (between X
and Y), or even for a variable that comes after Y, is not a good idea. The best
ways to control for confounders are
experimental design or instrumental variables analysis. If you are among
the poor souls (welcome to the club) who cannot do experiments or IV, you
will have to theorize your confounders and measure them. But you will always
have to live your life knowing that there could be other confounders
explaining the wonderful “effect”, and
live unhappily ever after.
Maartje Q2b. I will probably use a panel regression analysis for my thesis
research about social network characteristics and their influence on
wellbeing. However, I am interested in the differences between people (do
people with different positions in the network, i.e. being more central,
differ in their level of wellbeing?) as well as in the differences within
people (at t1 person A has 2 friends and at t2 person A has 5 friends; does
this matter for the level of wellbeing of this person?). I am not sure
anymore whether I should use a fixed effects or random effects panel regression.
For fixed effects, only changes within the individual count (differences
within people). However, I also think that there are differences (in
structural network positions) between persons that have an influence on their
wellbeing. So what kind of analysis should I use, fixed effects or random
effects? Answer: Fixed effects will control any individually constant
confounder (think: gender, cohort, education), but not confounders that
change over time. However, it is somewhat hard to think of a time-dependent
confounder that is indeed causally prior to X and Y and not in fact a
mediating variable M. Historical changes come to mind, but these are in
fact the same for all subjects, and should not confound your results on
individual differences. I do not understand Random Effects
models well enough to give you good advice. It seems to me that you cannot
have it both ways.
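A minimal simulation of the fixed effects argument, with hypothetical numbers: a time-constant trait raises both the number of friends and wellbeing. The pooled slope, which mixes between-person and within-person differences, is confounded by the trait. Demeaning each person's observations (the within transformation behind fixed effects) removes anything constant within persons and recovers the assumed effect of 0.3. The price is that the between-person differences Maartje is also interested in are discarded.

```python
import numpy as np

rng = np.random.default_rng(7)
persons, waves = 300, 3
pid = np.repeat(np.arange(persons), waves)        # person id for each person-wave row
trait = rng.normal(size=persons)                  # hypothetical time-constant confounder
friends = 2 * trait[pid] + rng.normal(size=persons * waves)                       # X
wellbeing = 0.3 * friends + 1.5 * trait[pid] + rng.normal(size=persons * waves)   # Y, true effect 0.3


def slope(x: np.ndarray, y: np.ndarray) -> float:
    """Bivariate OLS slope."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / (xc @ xc))


def demean(v: np.ndarray) -> np.ndarray:
    """Subtract each person's own mean (the within transformation)."""
    person_means = np.bincount(pid, weights=v) / np.bincount(pid)
    return v - person_means[pid]


pooled = slope(friends, wellbeing)                   # mixes between- and within-person variation
within = slope(demean(friends), demean(wellbeing))   # fixed effects (within) estimate
print(round(pooled, 2), round(within, 2))
```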
Nicolette Q2: During the last lecture about SEM models we talked
about the use of this type of statistical analysis. The main argument to use
SEM models seems to be that it calculates the true score, which differs from
the observed score. I do understand this advantage and I also understand that
it is important to know the true correlation of latent variables. However,
what I do not understand is how to interpret the true correlation between
latent variables. How do you relate them to your model? Because in fact, the
estimated model is wrong. So how do you explain that the true correlation for
example is .7, while the model estimates .345? To me it seems that the estimated
model does not make sense anymore? Answer: Indeed an interesting and important question! It is
inaccurate to say that SEMs calculate true scores; rather, they estimate
relationships (correlations and regressions) between such true scores, even
without (!) the possibility to know someone’s true score. This may make your
concerns even greater… To become as happy with SEM results as I (sometimes)
am, it is important to develop a platonic attitude towards observation: what
we see in our data matrix is not reality, it is only caused by reality, and
it comes with a load of measurement error. Only by theorizing about such
error and modeling the error process can we find out what reality really
looks like. The observed correlation of 0.345 is indeed the correlation you
observe in your data matrix, but then your data matrix is only your data
matrix, not reality itself. With a better observation process, you would see
another correlation, while reality would stay the same.
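The same logic can be seen in classical test theory's correction for attenuation, r_true = r_obs / sqrt(rel_x · rel_y). This is a simpler cousin of what a SEM measurement model does, not the SEM estimator itself. In the sketch below, reliabilities of 0.5 for both measures are an assumption chosen for illustration. A simulation with a true-score correlation of 0.7 and reliabilities of 0.5 produces an observed correlation near 0.35, close to the 0.345 in the question, and the correction brings it back to about 0.7.

```python
import math

import numpy as np


def disattenuate(r_observed: float, rel_x: float, rel_y: float) -> float:
    """Spearman's correction for attenuation: r_true = r_obs / sqrt(rel_x * rel_y)."""
    return r_observed / math.sqrt(rel_x * rel_y)


print(disattenuate(0.345, 0.5, 0.5))   # ≈ 0.69 with assumed reliabilities of 0.5

# Simulation: true scores correlate 0.7; each observed score has reliability 0.5
rng = np.random.default_rng(8)
n = 100_000
true_x = rng.normal(size=n)
true_y = 0.7 * true_x + math.sqrt(1 - 0.7**2) * rng.normal(size=n)   # Var(true_y) = 1
obs_x = true_x + rng.normal(size=n)     # error variance = true variance, so reliability 0.5
obs_y = true_y + rng.normal(size=n)
r_obs = np.corrcoef(obs_x, obs_y)[0, 1]           # close to 0.7 * 0.5 = 0.35
r_corrected = disattenuate(r_obs, 0.5, 0.5)       # close to 0.7 again
```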
Carlos Q2: Yesterday in class we analyzed a measurement
model using SEM in Stata. We observed that having items with high measurement
error does not affect the structural coefficient when these are removed from
the model. However, removing them increases the standard error of this
coefficient, especially when we have few items in our measurement
scale. I interpreted this observation as implying
that having more unreliable measurements is
better than having fewer measurements. In the previous course, we discussed
how using Cronbach’s Alpha was not an adequate procedure because, among other
reasons, this technique does not take into account all available data in the
model, as it uses a listwise deletion method for handling missing
values. I am interested in the fact that many researchers use Cronbach’s Alpha not
only to estimate the reliability of a scale, but also to determine the
reliability of individual items within a scale. Within this estimate,
removing unreliable items may actually increase the reliability of a scale,
as expressed in the increase of Cronbach’s Alpha. I have five
questions regarding this matter:
1. How can a measurement
model estimated by SEM provide us with estimates of measurement reliability
as we saw yesterday?
2. Since Cronbach’s Alpha
uses listwise deletion and SEM uses FIML, how does this difference generate
different results in estimations of reliability?
3. Does Cronbach’s Alpha take
into consideration the number of items in a scale and, as SEM does, consider
that having unreliable items is more positive for our inferential statistics
than having fewer items?
4. If Cronbach’s Alpha uses
listwise deletion for handling missing data and we can estimate regression
analysis with pairwise deletion, how useful is the information provided by
Cronbach’s Alpha then to estimate the possible effects of the reliability of
our items on the standard errors in a regression?
5. We know that increasing
sample size helps to reduce standard errors in a regression. However, if
Cronbach’s Alpha estimates reliability with less information than is available,
then there is a mismatch between the sample size used in reliability analysis
and the one used in regression analysis. Then, how can we relate the
estimates of reliability using Cronbach’s Alpha with the standard errors
obtained in a regression? Answers: These are many good questions. I hope I have
all the answers:
1. Cronbach’s alpha states
two conditions to improve reliable measurement: (1) more indicators, (2)
better (more strongly correlated) items. You can improve measurement
reliability by taking into account many bad items (which is what many
psychological scales do).
2. SEM (= factor analysis)
does not give us a direct estimate of the reliability of the index variable that
we would form from the indicators, and there is no reason to form such an
index to begin with. SEMs do give us an indication of the reliability
(systematic or stable variance) in each of the indicators; however, there is
no suggestion of whether or when to leave one out.
3. The way unreliability
works out in SEM solutions is indeed in the SEs of the structural effects:
when we have a lower mean correlation, or fewer indicators (the two ingredients
of alpha), the SEs become larger. In principle it seems possible to me to
have a criterion for when it is better to leave an indicator out, but I am
not aware of its existence.
4. Indeed, missing values are
a headache in reliability analysis – you can only do listwise analysis,
despite the fact that alpha uses the mean correlation between items, a
quantity that can clearly be calculated from a pairwise correlation matrix.
The underlying reason is that with missing data there is in fact variation in
reliability for each individual, depending upon how many items have been
validly scored. Imputing the missing data might even aggravate the problem.
5. The FIML procedure in SEM
supposedly avoids the problem by taking into account as much information as
there is – pairwise deletion as you really wanted it to be – and presents you
the bill for missing information in terms of uncertainty about coefficients,
i.e. their SEs. This is important, but actually not a solution for the problem you
really wanted to solve: what is the reliability of the index that you would
form from partially observed indicators?
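The two ingredients of alpha named in answer 1 appear directly in the standardized formula α = k·r̄ / (1 + (k − 1)·r̄), where k is the number of items and r̄ the mean inter-item correlation. The sketch below computes both the raw (variance-based) alpha on listwise-complete cases and the standardized version, on hypothetical one-factor data. With equal item variances the two agree closely. The usage line shows that twelve weak items (r̄ = 0.2) beat four weak items, and also slightly beat four better items (r̄ = 0.4).

```python
import numpy as np


def cronbach_alpha(items: np.ndarray) -> float:
    """Raw alpha for an (n_cases, k_items) array; rows with any NaN are dropped (listwise)."""
    complete = items[~np.isnan(items).any(axis=1)]
    k = complete.shape[1]
    item_var = complete.var(axis=0, ddof=1).sum()
    total_var = complete.sum(axis=1).var(ddof=1)
    return float(k / (k - 1) * (1 - item_var / total_var))


def standardized_alpha(k: int, mean_r: float) -> float:
    """Alpha from its two ingredients: number of items k and mean inter-item correlation."""
    return k * mean_r / (1 + (k - 1) * mean_r)


# Hypothetical one-factor data: 6 items, each loading 0.6 on a latent trait
rng = np.random.default_rng(10)
n, k, loading = 2000, 6, 0.6
trait = rng.normal(size=n)
items = loading * trait[:, None] + np.sqrt(1 - loading**2) * rng.normal(size=(n, k))

r = np.corrcoef(items, rowvar=False)
mean_r = r[np.triu_indices(k, 1)].mean()
alpha_raw = cronbach_alpha(items)
alpha_std = standardized_alpha(k, mean_r)

# More weak items versus fewer better items (mean correlations assumed)
print(standardized_alpha(4, 0.2), standardized_alpha(12, 0.2), standardized_alpha(4, 0.4))
```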
Vera
(1) I
read in some textbooks that the Central Limit Theorem states that the
distributions of large samples are normal. As such, we do not have to worry
about assumptions about normal distributions for large samples. However,
aren't variables such as income typically skewed, and not normally
distributed, even in large samples? Answer:
You are mixing things up – the distributions in large samples are not at all necessarily
normal, there is in fact no relationship between sample size and sample
distribution. The Central
Limit Theorem states that, given certain conditions, the mean of a
sufficiently large number of independent random variables, each with finite
mean and variance, will be approximately normally distributed. So it is about
a distribution of means, as produced by a large number of independent,
random variables. One way to think about it is that if something is
produced by a large number of causal factors, none of which dominates the
process, that something will be normally distributed. Natural phenomena that
come to mind are height or weight. In your research, if you add up (average)
a large number of attitude items, the resulting scale will be approximately
normally distributed: a symmetric distribution, with lots of cases around the
mean and ever fewer cases further away from the mean. Your mix-up
has to do with sampling theory and in fact refers to the sampling distribution:
the distribution of a certain sample statistic (such as a mean, but
also a regression coefficient or a standard deviation) over a large number of
random samples tends to be normal, which is a tremendously important thing in
statistics. The CLT here says that the normal shape of the sampling
distribution arises asymptotically, that is, as the size of the individual
samples becomes large; drawing many samples only makes that shape visible.
In practice, the normal shape already arises quite accurately when the
individual samples are moderately large
(typical numbers here range between 30 and 100). The approximation is also
better when the sample distributions themselves are approximately normal. Finally,
income distributions are in a sense approximately normal after all, or more appropriately
log-normal: if you take logs of a (skewed) income distribution, the result
can be approximately normal. The same is true for other right-skewed
distributions (distributions with a long right tail), such as waiting times.
This is so because the underlying growth process is typically multiplicative
in nature: taking the logarithm makes it additive and conform to the CLT.
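A small simulation of both points, using arbitrary assumed distributions. First, means of a right-skewed (exponential) variable lose their skew as the size of each sample grows; theory gives a skewness of 2/√n for the mean of n exponentials. Second, a quantity produced by multiplying many positive growth factors is right-skewed, while its logarithm, being a sum, is close to symmetric.

```python
import numpy as np


def skewness(v: np.ndarray) -> float:
    """Moment-based sample skewness."""
    d = v - v.mean()
    return float((d**3).mean() / (d**2).mean() ** 1.5)


rng = np.random.default_rng(9)
reps = 5000
# Sampling distribution of the mean of a right-skewed (exponential) variable
means_n2 = rng.exponential(size=(reps, 2)).mean(axis=1)     # samples of size 2
means_n50 = rng.exponential(size=(reps, 50)).mean(axis=1)   # samples of size 50

# Multiplicative growth: product of 40 hypothetical positive growth factors
growth = rng.uniform(0.9, 1.2, size=(reps, 40)).prod(axis=1)
log_growth = np.log(growth)                                 # a sum of 40 terms

print(round(skewness(means_n2), 2), round(skewness(means_n50), 2),
      round(skewness(growth), 2), round(skewness(log_growth), 2))
```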
Maartje
(0): what
is the difference between type 1 and type 2 errors? Type I / II errors are defined relative to the Null
Hypothesis H0.
· Type I error is when the H0 holds in the population and you reject it.
· Type II error is when the H0 does not hold in the population and you do not reject
(=accept) it.
Both are probabilities that we would like to know. This is
simple for Type I errors, since this probability is chosen by the researcher:
it is called the significance level and is most commonly chosen at 5%. Whether
you run into a type I error is just a matter of bad luck – it happens 5% of
the time, if you want to put your risk at 5%. We know how often it happens,
but not when it happens – there is really nothing that you can do here. The probability of a type II error is usually expressed by
its complement, the probability of rejecting the H0 when it is indeed false,
which is called statistical power (in Dutch: onderscheidend vermogen). Typically, you
care very much about type II errors, because the researcher’s sympathy is
mostly not with the H0, but with the alternative H1. You would hate yourself
if you could have discovered a great effect, but erroneously concluded that
there is no effect! Power is hard to calculate – in order to do so, you have
to have an expectation of the size of the effect in H1, or at least assume
one. This type of knowledge is typically absent. However, there are some
important rules for when power becomes larger or smaller – and they contain very
important lessons in research methodology. Here are some:
· Power increases with sample size, because the standard error shrinks with sqrt(N). This implies that
increasing the N is much more effective in small samples than in large
samples.
· Power increases with explained variance, even if this is produced by variables that
are substantively uninteresting. This is why pre/post experimental designs
are so much more powerful than randomized group designs. It is also a reason
why panel data and time series data with correlated error terms can be so
powerful. We will also see examples with constrained estimation.
· Power decreases when you measure your variables with much (random) error, when you
use lots of predictors that do not contribute to explained variance, and
generally when you ill-design your research.
· (Power increases when you choose your significance level α (probability of type I error) higher.
If you choose a very low α (.01 or lower), you make
it more likely that you accept the H0, even when it should have been
rejected. This is again mostly a concern in low-N situations, where it
actually can make sense to increase the α level to 0.20, in order to find
more balance between type I and type II errors – but of course low-N research
is likely to run into erroneous conclusions either way.)
So, type II errors are the errors
you care about most and they are the errors that you can do something about!
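Two of these rules can be checked with a normal-approximation power formula for comparing two group means, power ≈ Φ(d·√(n/2) − z₁₋α/₂), where d is the standardized effect size you are willing to assume under H1. This is a rough sketch; exact t-based calculations, such as those in dedicated power software, differ slightly for small N. The effect size 0.3 used below is an assumption.

```python
from statistics import NormalDist


def power_two_groups(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided test of a difference between two group means.

    Normal approximation: power ≈ Phi(d * sqrt(n / 2) - z_(1 - alpha/2)), where d is
    the assumed standardized effect size under H1. The tiny probability of rejecting
    in the wrong direction is ignored.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    return std_normal.cdf(d * (n_per_group / 2) ** 0.5 - z_crit)


# Assumed effect size d = 0.3: doubling a small N buys more power than adding 20 to a large N
for n in (20, 40, 200, 220):
    print(n, round(power_two_groups(0.3, n), 3))

# A more lenient alpha raises power in a small study
print(round(power_two_groups(0.3, 20, alpha=0.20), 3))
```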
Maartje
Q1: A
structural model is a model in which the parameters are not merely a
description but believed to be of a causal nature (Stata manual, p. 285). I
have a question regarding the word ‘believed’ in the definition of a
structural model; does this mean that the causal nature of the model is based
on assumptions and theory of the researcher? We discussed in class that SEM
does not simply put several X variables in one model, but assumes a causal
order of those X variables. However, I do not understand how this causal
order is determined. As we also discussed, in your research about gender,
education and occupation the order is generally clear, however it is still
not ‘proven’ by statistics, it is still an assumption. Is the causality of
the model determined by SEM, or by assumptions of the researcher? And what is
the difference with multiple regression models regarding the causal ordering
of the variables? Answer: Indeed, any causal
analysis requires the assumption of causal order (== no reverse causation)
between X and Y, as well as assumptions about the causal position of potential confounders
Z. Given these assumptions, causality can be concluded from the association
between X and Y, if the Z are statistically controlled. The assumed causal
order must come from the researcher (and the research design); it is not in
the data. This is the same in regression, SEM or whatever. 
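Why the causal order "is not in the data" can be shown with two hypothetical standardized models: in model A, X causes Y with path r; in model B, Y causes X with path r. Both imply exactly the same covariance matrix, so no fit statistic, in regression or SEM, can tell them apart; choosing between them is the researcher's assumption.

```python
import numpy as np


def implied_cov_x_causes_y(r: float) -> np.ndarray:
    """Model A: X ~ N(0, 1), Y = r*X + e with Var(e) = 1 - r**2."""
    var_x = 1.0
    var_y = r**2 * var_x + (1 - r**2)
    cov_xy = r * var_x
    return np.array([[var_x, cov_xy], [cov_xy, var_y]])


def implied_cov_y_causes_x(r: float) -> np.ndarray:
    """Model B: Y ~ N(0, 1), X = r*Y + u with Var(u) = 1 - r**2."""
    var_y = 1.0
    var_x = r**2 * var_y + (1 - r**2)
    cov_xy = r * var_y
    return np.array([[var_x, cov_xy], [cov_xy, var_y]])


# Opposite causal orders, identical implications for the data
print(implied_cov_x_causes_y(0.4))
print(implied_cov_y_causes_x(0.4))
```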
Carlos
Q1:
Traditionally, quantitative social research in Costa Rica has centered on cross-sectional survey designs. In terms of theory
building, typically the state of knowledge involving certain topics is based
on importing knowledge from other regions in the world, mainly Europe and
USA. However, there are certain research topics that correspond to the local
reality of Costa Rican society. Therefore, the current state of knowledge is
highly limited in this sense. Research findings are not replicated and no
infrastructure and research culture exist to conduct longitudinal research or
even experimental research. It is difficult to find national organizations or
institutions that also periodically gather datasets on social issues. Therefore
I argue that the current state of social research in Costa Rica, given the
reality check of what can effectively be researched in this context, hinders
the possibilities of pursuing causal explanations of phenomena. Quantitative
methods at least in the field of Social Psychology that I know of mostly
perform multivariate regressions, Chi Squares and Manova. Although
I think that experimental research could be more encouraged there is still
the question of how to approach causality in observational research within
this context. My question is the following: Given the reality of the research
context and how this influences the types of research designs available,
which methods for addressing causality can offer opportunities to overcome
this gap in spite of design limitations? I would consider panel and
longitudinal methods difficult to implement, but how can SEM contribute to
address causality in cross-sectional survey
designs with little local theory building and background research on a
subject? Answer:
I am not
so sure about the connection of your argument with the Costa Rican situation.
There may be such things as Costa Rican research problems, budgets and
expertise, but social theories apply in Costa Rica just as well (or badly) as
elsewhere. So if we need causal approaches elsewhere, Costa Ricans need them
too. Experimental
research is not necessarily harder or more costly than observational research,
and in fact it would be less expensive to do it in Costa Rica than in Western
Europe. The same is true for longitudinal (panel) designs: in Costa Rica it
may be less expensive and attrition may be lower. However, it may be equally
hard to think of good and feasible experimental and longitudinal designs as
elsewhere. SEM
cannot remedy low budgets or lack of technical skills. 
Samantha Q1: In the Stata manual on SEM models in section Intro 5, an introduction is given about the use of SEM for the comparison of groups. In the case of quasi-experimental research (as an alternative to an experimental design), a comparison of groups would be vital. The manual states that: “When we specify group(groupvar), the measurement parts of the model—parts 1 and 3—are constrained by default to be the same across the groups, whereas the middle part—part 2—will have separate parameters for each group. More specifically, parts 1 and 3 are constrained to be equal across groups except that the variances of the errors will be estimated separately for each group”. Does the use of parts 1 and 3 then imply that a control group is not necessary because the control variables within the groups are used in the analysis? What other benefits would SEM offer when doing quasi-experimental research (with the exception of the option to put constraints on variables)? Answer: No, the groups in SEM are generally not the experimental and control groups in (quasi-)experiments. Using the group option in SEM allows you to have different measurement models in different parts of the data, e.g. men and women, or different countries. At best, you would want to check this in experiments and truly hope the measurement models are not different. But usually, the experimental treatment in SEM would just be an X variable. SEM does not remove the need for a control group. (However, your question made me think once more – maybe there is something useful in using the group option.)
Daphne Q1: What would be the best approach to analyzing macro-data that have been measured at various points in time for a sample of countries? Answer: This is not so different from analyzing individuals in a panel observation design, and you can use the same models (the xt commands in Stata) or a cross-lagged panel design in SEM. Variations in XT designs typically have to do with the intervals at which you make your observations and how many intervals you have. Many macro cross-national country studies have many intervals and few countries, and they are often called pooled time-series designs, whereas situations with many cross-sections and fewer time points are usually called panel designs. The differences between the two are subtle at most.
Nicolette Q1: An assumption of regression analysis is that observations are independent of each other. However, in my case (dynamic networks), the observations are not independent, since the relationship at time point 2 depends on the relationship at time point 1. How is it possible to statistically determine (or not) the causality in this situation? What is the best way to analyze these data, and how? Or do I need another design or method of data collection? Answer: Dependency of observations is a very natural phenomenon in any kind of longitudinal observation: the units you see in the next wave are not new units; they are the same. You should not think of this as a disadvantage; in fact, the strength of panel observation comes from exploiting this dependency, e.g. in fixed effects models or by controlling for pretest variables. In a sense, in these situations you can do better than if you had independent information: it makes the design much more powerful (and also more valid). The problem with your longitudinal network observations is that dependency occurs twice: the units are not only dependent over time, but also cross-sectionally, because they are related to one another according to the network structure. This situation is somewhat similar to a hierarchical multilevel design, in which students are observed in school classes within schools, etc. However, the dependency in networks is not hierarchically structured; it follows the network ties. I am not sure that it will help, but read my article with Ineke Nagel & Matthijs Kalmijn about a similar situation. There we model friendships between students based on their cultural characteristics in a two-wave design.
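The "controlling for pretest variables" strategy can be sketched on hypothetical two-wave data: regress the wave-2 outcome on a wave-1 predictor while also including the wave-1 outcome. This is a minimal Python sketch under assumed coefficients (0.6 for the pretest, 0.4 for x); it only exploits the dependency over time and does not address the cross-sectional network dependency, which requires dedicated network models.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients for design matrix X and outcome y."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(7)
n = 5000

# Hypothetical two-wave data on the same units
y1 = rng.normal(size=n)                    # outcome at wave 1 (the pretest)
x1 = 0.8 * y1 + rng.normal(size=n)         # wave-1 predictor, correlated with the pretest
y2 = 0.6 * y1 + 0.4 * x1 + rng.normal(scale=0.5, size=n)  # wave-2 outcome; true effect of x1 is 0.4

const = np.ones(n)
b_naive = ols(np.column_stack([const, x1]), y2)[1]        # ignores the pretest
b_pretest = ols(np.column_stack([const, x1, y1]), y2)[1]  # controls for the pretest
print(f"without pretest: {b_naive:.3f}, with pretest: {b_pretest:.3f}")
```

Leaving out y1 attributes part of the stability of the outcome to x1, which inflates its coefficient.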
Johannes Q1: My question is: why should including covariates in experimental designs increase statistical power? Why does including Z decrease our uncertainty about βk, even though X and Z are completely uncorrelated (do not provide any information about each other)? Also, consider the consequences of this: suppose we have a medical trial in which a drug is randomly assigned to a sample and we want to test how the drug performs compared to taking no drug (or a placebo; say Y is whether a patient dies or not). Because assignment is random, taking the drug is by design uncorrelated with everything. Now, assuming the above is true, wouldn't that mean that we could increase our certainty about the effect of the drug by including predictors of Y such as age, gender, genetic factors and so on, because they would not be correlated with assignment to the drug but would increase the overall model fit? If this is true, is this actually done in medical research?
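The mechanism behind this question can be checked with a small simulation. This is a minimal Python sketch assuming a continuous outcome, a linear model estimated by OLS, and a hypothetical prognostic covariate z that is independent of the randomized treatment; a binary outcome such as death would usually call for a different model, which this sketch does not cover. Adding z does not change what the treatment coefficient estimates, but it absorbs part of the residual variance of Y, and because z is uncorrelated with treatment, the standard error of the treatment coefficient shrinks with the residual standard deviation.

```python
import numpy as np

def ols_with_se(X, y):
    """OLS coefficients and conventional (homoskedastic) standard errors."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(3)
n = 1000
treat = rng.integers(0, 2, size=n)  # random assignment: independent of z by design
z = rng.normal(size=n)              # hypothetical prognostic covariate (e.g. standardized age)
y = 0.3 * treat + 1.5 * z + rng.normal(size=n)  # assumed treatment effect of 0.3

const = np.ones(n)
b_unadj, se_unadj = ols_with_se(np.column_stack([const, treat]), y)
b_adj, se_adj = ols_with_se(np.column_stack([const, treat, z]), y)
print(f"unadjusted: {b_unadj[1]:.3f} (SE {se_unadj[1]:.3f})")
print(f"adjusted:   {b_adj[1]:.3f} (SE {se_adj[1]:.3f})")
```

Running it should show a clearly smaller standard error in the adjusted model, while both point estimates scatter around the assumed effect of 0.3.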