Overview of harmonized variables

Last revised: 2019/11/25


Harmonization in the ISMF is achieved for the following variables (when available in the source files):

·         Administrative variables

·         Demographics: age, sex, marital status, type of place of residence (urban vs. rural).

·         Education: father, mother, respondent, and spouse

·         Employment status: respondent and spouse

·         Occupation: father, mother, respondent (current and first), and spouse

·         Income: personal and household.

Together with administrative variables, a harmonized ISMF dataset includes 61 variables, each of which has identical labels in all datasets. We describe the harmonization procedure separately for each of the six groups of variables listed above.

Administrative variables

STUDY is a 10-character string used to refer to studies, datafiles, and conversion tools.

DATAREF is a 25-character string used to refer to the originating data source according to the ISMF Data Catalogue.

Historical note: DATAREF replaces ARCHIVE.

COUNTRY is a 3-character string that denotes the nation (or sub-national region) to which the dataset pertains.

CNTRY is a 2-character string that denotes the nation (or sub-national region) to which the dataset pertains, according to the ISO standard. This variable has not yet been implemented.

YEAR is a four-digit representation of the year of data collection.

ISMFNR is a (sequential) number used to keep track of files.

LABNR is a number that refers to a set of occupation codes and labels shared by multiple studies. It is often identical to ISMFNR, but may also refer to other files.

LASTFIX is an 8-digit representation [yyyymmdd] of the date of last processing.

RESPNR is the case identification number found in the source data.
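The field widths above lend themselves to a simple integrity check when assembling a file. The sketch below is a minimal illustration, assuming that the stated widths for STUDY, DATAREF, and COUNTRY are maximum lengths and that records are held as Python dictionaries keyed by variable name; the functions `parse_lastfix` and `check_admin` are hypothetical and not part of any ISMF tool.

```python
from datetime import date, datetime


def parse_lastfix(lastfix: str) -> date:
    """Parse LASTFIX (an 8-digit yyyymmdd string) into a date."""
    if len(lastfix) != 8 or not lastfix.isdigit():
        raise ValueError(f"LASTFIX must be 8 digits, got {lastfix!r}")
    # strptime also rejects impossible dates such as month 13
    return datetime.strptime(lastfix, "%Y%m%d").date()


def check_admin(record: dict) -> list:
    """Return a list of format problems in a record's administrative fields."""
    problems = []
    # Assumption: the stated widths are maximum lengths, not fixed lengths
    for name, width in (("STUDY", 10), ("DATAREF", 25), ("COUNTRY", 3)):
        if len(str(record[name])) > width:
            problems.append(f"{name} longer than {width} characters")
    year = str(record["YEAR"])
    if len(year) != 4 or not year.isdigit():
        problems.append("YEAR is not a four-digit year")
    try:
        parse_lastfix(str(record["LASTFIX"]))
    except ValueError:
        problems.append("LASTFIX is not a valid yyyymmdd date")
    return problems
```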

Technical variables

WWW is the weight. It combines the post-stratification weight provided in the source data file, a duplication weight for cases that are repeated (in panel files), and an efficiency weight, which may be estimated as the inverse of the design effect.
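How the three components are combined is not spelled out above. As a minimal sketch, assuming they are multiplied, that the duplication weight of a case repeated n times in a panel file is 1/n, and that the efficiency weight is the inverse of an externally supplied design effect (`combined_weight` is a hypothetical helper):

```python
def combined_weight(poststrat: float, n_copies: int, design_effect: float) -> float:
    """Hypothetical WWW: post-stratification x duplication x efficiency weight.

    Assumes multiplicative combination, a duplication weight of 1/n_copies for a
    case repeated n_copies times, and an efficiency weight of 1/design_effect.
    """
    if n_copies < 1:
        raise ValueError("n_copies must be at least 1")
    if design_effect <= 0:
        raise ValueError("design_effect must be positive")
    duplication_weight = 1.0 / n_copies
    efficiency_weight = 1.0 / design_effect
    return poststrat * duplication_weight * efficiency_weight
```

Under these assumptions, a design effect of 1.5 lowers a case's weight by a third, so files from less efficient designs count for less in pooled analyses.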


Demographics

AGE is coded in years. Multiple-year categories are assigned the midpoint of the interval.

FEMALE is coded as a 0/1 indicator (1=female; 0=male).

MARRIED is a 3-category variable: 1=never married; 2=widowed, divorced, or separated; 3=currently married.

LOCATION stores whatever information the source file includes on urbanization. This information is labeled but not harmonized.
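The AGE, FEMALE, and MARRIED rules amount to a midpoint calculation and two lookup tables. The sketch below is illustrative only: it assumes age ranges are labelled `low-high` with inclusive whole-year bounds (so 25-29 is coded 27), and the source category labels in the lookup tables are hypothetical, since each study needs its own recode map. Unmapped categories become missing (`None`).

```python
import re
from typing import Optional

# Hypothetical source labels; each study's recode map differs.
FEMALE_MAP = {"female": 1, "male": 0}
MARRIED_MAP = {
    "never married": 1,
    "widowed": 2,
    "divorced": 2,
    "separated": 2,
    "married": 3,
}


def harmonize_age(label: str) -> float:
    """AGE in years; a range such as '25-29' is coded at its midpoint.

    Assumes ranges are written 'low-high' with inclusive bounds; any other
    label is read as a single age in years.
    """
    match = re.fullmatch(r"\s*(\d+)\s*-\s*(\d+)\s*", label)
    if match:
        low, high = int(match.group(1)), int(match.group(2))
        if high < low:
            raise ValueError(f"bad age range: {label!r}")
        return (low + high) / 2
    return float(label)


def harmonize_female(label: str) -> Optional[int]:
    """FEMALE: 1=female, 0=male; None if the label is not in the map."""
    return FEMALE_MAP.get(label.strip().lower())


def harmonize_married(label: str) -> Optional[int]:
    """MARRIED: 1=never married; 2=widowed/divorced/separated; 3=currently married."""
    return MARRIED_MAP.get(label.strip().lower())
```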


Education

Educational variables are processed for father, mother, respondent and spouse, when available. The information is both stored and harmonized, using two sets of variables:

FEDUCTP, MEDUCTP, EDUCTP, and SEDUCTP store the original information with full labels. The codes are generated using ISMFNR.

FEDUCYR, MEDUCYR, EDUCYR, and SEDUCYR scale the educational information by level of education. [The variable names suggest that duration of education is the dominant concept, but this is not the case.] Education categories in the source data are recoded into an ordered set of categories on the basis of judgements drawn from the documentation of the original source, the advice of experts, and the relationship of the categories to criterion variables such as occupational status and income. The resulting hierarchy is then mapped onto a “virtual years of schooling” metric, which is closely related (in some files identical) to the number of years it takes competent students to reach a given level. In many data files, this yields the following anchor points: 6 for completed primary, 12 for completed higher secondary (university entrance level), and 16 for a completed university degree (BA); these anchor points are adjusted to local educational systems. Other levels are placed on this metric by interpolation. The educational recode maps for each study can be found here.

EDDUR records the actual duration of the respondent’s and spouse’s education. It is expressed in the same metric as EDUCYR, but records the true length of the educational career, where this is independently available.
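The anchor-and-interpolate step for FEDUCYR, MEDUCYR, EDUCYR, and SEDUCYR can be sketched as follows. The level names and anchor values in the usage lines are hypothetical, and spacing the unanchored levels evenly by rank between the surrounding anchors is an assumption; the actual placement of intermediate levels may reflect the judgements described above rather than equal steps.

```python
from typing import Dict, List


def virtual_years(levels: List[str], anchors: Dict[str, float]) -> Dict[str, float]:
    """Place ordered education levels on a 'virtual years of schooling' metric.

    `levels` runs from lowest to highest; `anchors` fixes the score of some
    levels. Levels between two anchors are spaced evenly by rank (linear
    interpolation). Levels below the first or above the last anchor are
    left out rather than extrapolated.
    """
    positions = [i for i, level in enumerate(levels) if level in anchors]
    if len(positions) < 2:
        raise ValueError("need at least two anchored levels")
    scores = {levels[i]: float(anchors[levels[i]]) for i in positions}
    for lo, hi in zip(positions, positions[1:]):
        lo_val, hi_val = anchors[levels[lo]], anchors[levels[hi]]
        for i in range(lo + 1, hi):
            fraction = (i - lo) / (hi - lo)
            scores[levels[i]] = lo_val + fraction * (hi_val - lo_val)
    return scores


# Hypothetical ordered categories and anchor points
levels = ["primary", "lower secondary", "higher secondary", "some university", "BA"]
anchors = {"primary": 6, "higher secondary": 12, "BA": 16}
print(virtual_years(levels, anchors))
```

With these inputs, lower secondary is placed at 9 and some university at 14, halfway between their neighbouring anchors.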

Employment status

Employment status is recorded for the respondent and, if available, for the spouse, and is harmonized into three pairs of variables:

WORK, SWORK: 1=not employed in paid labour; 2=employed in paid labour.

HOURS, SHOURS: the number of contractual / actual hours worked per week.  When both are available, contractual hours are preferred over actual hours.

EMPST, SEMPST: employment status, distinguishing among types of non-employment activity. The codes are:

 (1) Work [full-time]

 (2) Work part-time

 (3) Unemployed

 (4) School leaver

 (5) Student

 (6) Military

 (7) Homeworker

 (8) Retired

 (9) Disabled

 (10) Other

 (-1) NA.
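The HOURS preference rule and the EMPST codes can be expressed compactly in code, for example when checking files. In this minimal sketch, `None` stands for a missing value, and the names `EmpSt`, `empst_name`, `harmonize_hours`, and `consistent_with_work` are hypothetical. The consistency check assumes that codes 1 and 2 (work) should coincide with WORK=2; the documentation does not state this rule explicitly.

```python
from enum import IntEnum
from typing import Optional


class EmpSt(IntEnum):
    """EMPST / SEMPST codes as listed above."""
    NA = -1
    WORK_FULL_TIME = 1
    WORK_PART_TIME = 2
    UNEMPLOYED = 3
    SCHOOL_LEAVER = 4
    STUDENT = 5
    MILITARY = 6
    HOMEWORKER = 7
    RETIRED = 8
    DISABLED = 9
    OTHER = 10


def empst_name(code: int) -> str:
    """Return the member name for a code, or 'INVALID' for codes outside the list."""
    try:
        return EmpSt(code).name
    except ValueError:
        return "INVALID"


def harmonize_hours(contractual: Optional[float], actual: Optional[float]) -> Optional[float]:
    """HOURS: weekly hours, preferring contractual hours when both are reported."""
    return contractual if contractual is not None else actual


def consistent_with_work(empst: int, work: int) -> bool:
    """Assumed rule: full- or part-time work (codes 1, 2) implies WORK=2."""
    if empst in (EmpSt.WORK_FULL_TIME, EmpSt.WORK_PART_TIME):
        return work == 2
    return True  # other codes are not checked here
```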

Historical note: EMPST and SEMPST are later additions to the ISMF and have not yet been implemented in all files.


Occupation

Occupation is included, as available, for father and mother (when the respondent was growing up), for the respondent (first and current/last), and for the spouse (current/last). All files include the respondent’s current/last occupation and the father’s occupation, since the presence of these variables is a criterion for inclusion in the ISMF. Occupational information is stored in six sets of variables:

FSEMPL, MSEMPL, SEMPL1, SEMPL, SSEMPL: 1=not self-employed; 2=self-employed.

FSUPVIS, MSUPVIS, SUPVIS1, SUPVIS, SSUPVIS: the number of subordinates.  When a range is given in the source, the midpoint of the range is coded.

FOCC, MOCC, OCC1, OCC, SOCC: the original occupation codes, using LABNR as initial digits. Labels for these occupations are contained in separate modules.  See here.

FISCO, MISCO, ISCO1, ISCO, SISCO: recodes to the categories of the International Labour Office’s International Standard Classification of Occupations 1968 (ISCO-68). See here.

FISKO, MISKO, ISKO1, ISKO, SISKO: recodes to the categories of the International Labour Office’s International Standard Classification of Occupations 1988 (ISCO-88). See here.

FISQO, MISQO, ISQO1, ISQO, SISQO: recodes to the categories of the International Labour Office’s International Standard Classification of Occupations 2008 (ISCO-08). These are taken from the originating files where available, or are to be obtained by conversion from the ISKO/ISCO codes; these conversions have not yet been implemented.
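One possible reading of “using LABNR as initial digits” for FOCC, MOCC, OCC1, OCC, and SOCC is that the source occupation code is written after the LABNR digits, zero-padded to a fixed width. The sketch below follows that reading; the four-digit width and the helper names are hypothetical, so the actual OCC values in a file should be checked before relying on it.

```python
from typing import Tuple


def prefixed_occ(labnr: int, source_code: int, width: int = 4) -> int:
    """Build an OCC-style code with LABNR as its leading digits.

    `width` (hypothetical) is the number of digits reserved for the source
    code, so LABNR=37 and source code 512 give 370512 when width=4.
    """
    if not 0 <= source_code < 10 ** width:
        raise ValueError("source code does not fit in the reserved width")
    return labnr * 10 ** width + source_code


def split_occ(occ: int, width: int = 4) -> Tuple[int, int]:
    """Recover (LABNR, source code) from a prefixed OCC value."""
    return divmod(occ, 10 ** width)
```

Splitting on LABNR in this way would let one label module serve every study that shares the same set of occupation codes.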

The conversions of the original occupation codes into ISCO and ISKO codes were carried out independently of each other. The conversion maps are available as stand-alone SPSS modules.

Historical note: The inclusion of measures of mother’s occupation is a recent addition to the ISMF and has not yet been implemented in all of the files.


Income

PINC, HINC: personal and household income, respectively. The metric is that of the original currency. Income reported in intervals is coded at the category midpoints, with plausible values assigned to the open-ended top and bottom categories. For international comparisons, this metric should be transformed into logarithms within each file.
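A minimal sketch of the log transformation, assuming that “within-file” means taking logarithms and then centring them on the file mean of the logs. Because ln(c·x) = ln c + ln x, centring removes any constant currency factor c, which is what makes files in different currencies comparable. The centring step and the function name `log_income_within_file` are assumptions; non-positive incomes, which have no logarithm, are returned as NaN.

```python
import math
from statistics import mean
from typing import List


def log_income_within_file(incomes: List[float]) -> List[float]:
    """Natural log of each income, centred on the file mean of the valid logs.

    Non-positive incomes yield NaN and are excluded from the mean.
    """
    logs = [math.log(x) if x > 0 else math.nan for x in incomes]
    valid = [v for v in logs if not math.isnan(v)]
    if not valid:
        raise ValueError("no positive incomes in file")
    centre = mean(valid)
    return [v - centre for v in logs]  # NaN stays NaN
```

Rescaling every income in a file by the same exchange rate leaves the centred values unchanged, which is the property that matters for cross-national comparison.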
