Overview of harmonized variables


Harmonization in the ISMF is achieved with respect to the following variables (when available in the source files):

· Demographics: age, sex, marital status, type of place of residence (urban vs. rural)

· Education: father, mother, respondent, and spouse

· Employment status: respondent and spouse

· Occupation: father, mother, respondent (current and first), and spouse

· Income: personal and household

Together with administrative variables, a harmonized ISMF dataset includes 44 variables, each of which has identical labels in all datasets. We describe the harmonization procedure separately for each of the six groups of variables: the administrative variables and the five groups listed above.

Administrative variables

STUDY is a 10-character string used to refer to studies, datafiles, and conversion tools.

COUNTRY is a 3-character string that denotes the nation (or sub-national region) to which the dataset pertains.

YEAR is a four-digit representation of the year of data collection.

ISMFNR is a (sequential) number used to keep track of files.

LABNR is a number that refers to a set of occupation codes and labels shared by multiple studies.

ARCHIVE is an 8-character string with the catalogue number of the source data in the data archive from which the data were obtained.

LASTFIX is an 8-digit representation [yyyymmdd] of the date of last processing.

RESPNR is the case identification number found in the source data.
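
The field formats described above lend themselves to a simple consistency check. The following is a minimal sketch, not part of the ISMF tooling: the function name `check_admin` is hypothetical, the example record is invented, and treating the stated string lengths as maximum (rather than exact) widths is an assumption.

```python
from datetime import datetime

# Field widths taken from the descriptions above. Treating them as maximum
# lengths (rather than exact lengths) is an assumption of this sketch.
MAX_WIDTH = {"STUDY": 10, "COUNTRY": 3, "ARCHIVE": 8}


def check_admin(record):
    """Return a list of problems found in one record's administrative fields."""
    problems = []
    for name, width in MAX_WIDTH.items():
        value = record.get(name)
        if not isinstance(value, str) or not 0 < len(value) <= width:
            problems.append(f"{name}: expected a string of 1 to {width} characters")

    year = str(record.get("YEAR", ""))
    if not (len(year) == 4 and year.isdigit()):
        problems.append("YEAR: expected four digits")

    lastfix = str(record.get("LASTFIX", ""))
    try:
        if len(lastfix) != 8 or not lastfix.isdigit():
            raise ValueError("not eight digits")
        datetime.strptime(lastfix, "%Y%m%d")  # rejects impossible dates
    except ValueError:
        problems.append("LASTFIX: expected a valid yyyymmdd date")
    return problems


# Invented record, for illustration only.
example = {
    "STUDY": "EXAMPLE01",
    "COUNTRY": "XYZ",
    "YEAR": 1990,
    "ARCHIVE": "A0000001",
    "LASTFIX": "20010215",
}
print(check_admin(example))
```

A check of this kind only verifies form; it cannot tell whether a COUNTRY code or ARCHIVE number actually exists.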


Demographics

AGE is coded in years. Multiple-year categories are assigned the midpoint of the interval.

FEMALE is coded as a 0/1 indicator (1=female; 0=male).

MARRIED is a 3-category variable: 1=never married; 2=widowed, divorced, or separated; 3=currently married.

URBAN stores whatever information the source file includes on urbanization. This information is labeled but not harmonized.
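
As an illustration of the AGE, FEMALE, and MARRIED codings, the sketch below recodes one hypothetical source record. The source field names (`age_low`, `age_high`, `sex`, `marital`) and the source category labels are assumptions made for the example; each real source file has its own coding scheme. The midpoint is taken between inclusive bracket bounds, which is also an assumption about how "midpoint" is meant.

```python
def age_from_interval(low, high):
    """Midpoint of an age bracket given as inclusive lower/upper bounds in years."""
    return (low + high) / 2


# Hypothetical source codes. Every real source file needs its own map.
SEX_TO_FEMALE = {"M": 0, "F": 1}
MARITAL_TO_MARRIED = {
    "single": 1,       # never married
    "widowed": 2,      # widowed, divorced, or separated
    "divorced": 2,
    "separated": 2,
    "married": 3,      # currently married
}


def harmonize_demographics(row):
    """Recode one source record into the harmonized AGE, FEMALE, MARRIED codes."""
    return {
        "AGE": age_from_interval(row["age_low"], row["age_high"]),
        "FEMALE": SEX_TO_FEMALE[row["sex"]],
        "MARRIED": MARITAL_TO_MARRIED[row["marital"]],
    }


print(harmonize_demographics(
    {"age_low": 25, "age_high": 34, "sex": "F", "marital": "divorced"}
))
```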


Education

Educational variables are processed for father, mother, respondent, and spouse, when available. The information is both stored and harmonized, using two sets of variables:

FEDUCTP, MEDUCTP, EDUCTP, and SEDUCTP store the original information with full labels. The codes are generated using ISMFNR.

FEDUCYR, MEDUCYR, EDUCYR, and SEDUCYR scale the educational information by level of education. (The variable names suggest that duration of education is the dominant concept, but this is not the case.) Education categories in the source data are recoded into an ordered set of categories on the basis of judgements drawn from the documentation of the original source, the advice of experts, and the relationship of the categories to criterion variables such as occupational status and income. The resulting hierarchy is then mapped onto a “virtual years of schooling” metric, which is closely related (in some files identical) to the number of years it takes competent students to reach a given level. In many data files this yields 6 for complete primary, 12 for complete higher secondary (university entrance level), and 16 for complete university (BA), but these “anchor points” are adjusted to local educational systems. Other levels are expressed in this metric by interpolation. The educational recode maps for each study can be found here.
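
The anchor-and-interpolate idea can be sketched as follows. This is only an illustration under stated assumptions: the text above does not say how the interpolation is done, so evenly spaced linear interpolation by rank between neighbouring anchors is assumed here. The category labels, the function name `virtual_years`, and the anchor of 0 for "no schooling" are likewise invented for the example; only the anchors 6, 12, and 16 come from the description above.

```python
def virtual_years(categories, anchors):
    """Assign a score to every category in an ordered list.

    categories: labels ordered from lowest to highest level of education.
    anchors:    dict mapping some labels to fixed scores. The first and last
                category must be anchored so that every other category lies
                between two anchors.
    """
    positions = [i for i, label in enumerate(categories) if label in anchors]
    if not positions or positions[0] != 0 or positions[-1] != len(categories) - 1:
        raise ValueError("the first and last category must both be anchored")

    scores = {categories[i]: float(anchors[categories[i]]) for i in positions}
    # Fill in the categories between each pair of neighbouring anchors,
    # spacing them evenly by rank (an assumption of this sketch).
    for left, right in zip(positions, positions[1:]):
        low = anchors[categories[left]]
        high = anchors[categories[right]]
        for i in range(left + 1, right):
            fraction = (i - left) / (right - left)
            scores[categories[i]] = low + fraction * (high - low)
    return scores


# Hypothetical ordered categories for one source file.
levels = [
    "no schooling",
    "incomplete primary",
    "complete primary",
    "lower secondary",
    "complete higher secondary",
    "some university",
    "complete university",
]
anchor_points = {
    "no schooling": 0,               # assumed, not stated in the text
    "complete primary": 6,
    "complete higher secondary": 12,
    "complete university": 16,
}
for label, score in virtual_years(levels, anchor_points).items():
    print(f"{label}: {score}")
```

In practice the recode maps reflect case-by-case judgement, so a mechanical rule like this one is at best a starting point.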

Employment status

Employment status is recorded for the respondent and, if available, for the spouse, and is harmonized into three pairs of variables:

WORK, SWORK: 1=not employed in paid labor; 2=employed in paid labor.

HOURS, SHOURS: the number of contractual / actual hours worked per week.  When available, contractual hours are preferred over actual hours.

EMPST, SEMPST: employment status, differentiating among non-employment activities. The codes are:

 (1) Work [full-time]

 (2) Work part-time

 (3) Unemployed

 (4) Schoolleaver

 (5) Student

 (6) Military

 (7) Homeworker

 (8) Retired

 (9) Disabled

 (10) Other

 (-1) NA.

Historical note: EMPST and SEMPST are later additions to the ISMF and have not yet been implemented in most of the files.
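
The preference rule for HOURS and SHOURS described above can be sketched in a few lines. The function name and its arguments are hypothetical, and representing a missing value as `None` is an assumption of the example.

```python
def harmonized_hours(contractual=None, actual=None):
    """Weekly hours, preferring contractual over actual hours.

    Returns None when the source provides neither measure.
    """
    if contractual is not None:
        return contractual
    return actual


print(harmonized_hours(contractual=38, actual=45))  # contractual hours win
print(harmonized_hours(actual=45))                  # falls back to actual hours
```

The explicit `is not None` test matters: a respondent with zero contractual hours would otherwise be treated as having no contractual information.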


Occupation

Occupation is included, as available, for father and mother (when the respondent was growing up), for the respondent (first and current/last), and for the spouse (current/last). All files include respondent’s current/last occupation and father’s occupation, since the presence of these variables is a criterion for inclusion in the ISMF. Occupational information is stored as five sets of harmonized variables:

FSEMPL, MSEMP, SEMPL1, SEMPL, SSEMPL: 1=not self-employed, 2=self-employed.

FSUPVIS, MSUPVIS, SUPVIS1, SUPVIS, SSUPVIS: the number of subordinates.  When a range is given in the source, the midpoint of the range is coded.

FOCC, MOCC, OCC1, OCC, SOCC: the original occupation codes, using LABNR as initial digits. Labels for these occupations are contained in separate modules.  See here.

FISCO, MISCO, ISCO1, ISCO, SISCO: recodes to the categories of the International Labour Office's 1968 International Standard Classification of Occupations (ISCO-68). See here.

FISKO, MISKO, ISKO1, ISKO, SISKO: recodes to the categories of the International Labour Office's 1988 International Standard Classification of Occupations (ISCO-88). See here.

The conversions of the original occupation codes into ISCO and ISKO codes were carried out independently.  The conversion maps are available as stand-alone SPSS modules. 

Historical note: The inclusion of measures of mother’s occupation is a recent addition to the ISMF and has not yet been implemented in all of the files.
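
One way to read "using LABNR as initial digits" for the original occupation codes is as a numeric prefix placed in front of a fixed-width source code. The sketch below illustrates that reading only. The width of four digits, the function name `prefixed_occ`, and the example numbers are all assumptions, and the actual ISMF construction may differ.

```python
def prefixed_occ(labnr, source_code, width=4):
    """Combine LABNR and a source occupation code into a single integer.

    width is the assumed maximum number of digits in the source code.
    It is a parameter of this sketch, not a documented ISMF constant.
    """
    if not 0 <= source_code < 10 ** width:
        raise ValueError("source code does not fit in the assumed width")
    return labnr * 10 ** width + source_code


# Invented values: label set 12, source occupation code 345.
print(prefixed_occ(12, 345))
```

A prefix of this kind keeps codes from different label sets distinct, so that identical source codes from different classifications do not collide.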


Income

PINC, HINC: personal and household income, respectively. Incomes are expressed in the original currency. Income intervals are represented by their category midpoints, with plausible values assigned to open-ended top and bottom categories. For international comparisons, this metric should be transformed by taking logarithms within each file.
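
The income coding can be illustrated with a short sketch. The bracket boundaries, the values chosen for the open-ended categories, and the function names are invented for the example. The log transformation shown is one plain reading of the recommendation for international comparisons; whether any further within-file standardization is intended is not stated here.

```python
import math


def income_midpoints(brackets, bottom_value, top_value):
    """Map bracket codes to representative incomes in the original currency.

    brackets:     dict code -> (low, high); low is None for an open bottom
                  bracket and high is None for an open top bracket.
    bottom_value: value chosen by the analyst for the open bottom bracket.
    top_value:    value chosen by the analyst for the open top bracket.
    """
    values = {}
    for code, (low, high) in brackets.items():
        if low is None:
            values[code] = bottom_value
        elif high is None:
            values[code] = top_value
        else:
            values[code] = (low + high) / 2
    return values


def log_income(value):
    """Natural log of a positive income; None for missing or non-positive values."""
    if value is None or value <= 0:
        return None
    return math.log(value)


# Invented brackets for one source file.
source_brackets = {1: (None, 1000), 2: (1000, 2000), 3: (2000, None)}
midpoints = income_midpoints(source_brackets, bottom_value=500, top_value=3000)
print(midpoints)
print({code: log_income(v) for code, v in midpoints.items()})
```

Because the logarithm turns currency differences into additive constants, relative income positions within a file are preserved regardless of the currency unit.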

