The interdisciplinary potential of Data Matching
When doing interdisciplinary research, one often combines ideas from different disciplines. In qualitative research this is usually not a serious problem. It becomes more complicated if one wants to give the newly constructed interdisciplinary theoretical framework a quantitative foundation. The data needed to test the new theory are unlikely to have been gathered in a single existing survey. The ideal choice for a researcher would then be to run a new survey that measures the variables his or her research requires. However, surveys are extremely expensive and are rarely within reach of a beginning researcher. Interdisciplinary science, especially research that focuses on societal problems, therefore often limits itself to generating original ideas without quantifying them.
This problem is inherent to the idea of interdisciplinary science as a dynamic process in which concepts, methods and perspectives from different disciplines are combined. Such a process means that interdisciplinary approaches are either short-lived or eventually become disciplines themselves. Large datasets are therefore unlikely to become available for an interdisciplinary approach until it has more or less settled into a discipline of its own.
One way around the need for research-specific data is to use multiple datasets. By linking two or more datasets, one can construct a dataset that suits the interdisciplinary research question. The researcher can then relate concepts to each other even though they were not asked in the same survey. Of course, the need to use proxies as measuring instruments remains, as it always does with datasets that are not tailor-made for the research.
Linking datasets is not new to scientific research. Cohort analysis, for instance, has been widely used. This approach follows generations through time by linking surveys held at different moments in history. These surveys need not, and usually do not, interview the same respondents each time. It is therefore assumed that each sample is representative of the population, and that the composition of the population has not changed radically, for instance through plagues or immigration.
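To make the linking step in cohort analysis concrete, here is a minimal sketch in Python. It assumes hypothetical repeated cross-sections stored as (survey year, age, score) records; all numbers and the function name cohort_means are invented for illustration. Birth year is derived as survey year minus age, so a generation can be followed across waves even though the respondents differ.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical repeated cross-sections: each wave interviews different people.
# Records are (survey_year, age, score); all values are invented for illustration.
surveys = [
    (1990, 25, 3.0), (1990, 35, 4.0), (1990, 45, 5.0),
    (2000, 35, 3.5), (2000, 45, 4.5), (2000, 55, 5.5),
]

def cohort_means(records, width=10):
    """Average the score per (birth cohort, survey year) cell.

    Birth year is survey_year - age, bucketed into cohorts `width` years wide,
    which links the same generation across waves with different respondents.
    """
    cells = defaultdict(list)
    for year, age, score in records:
        cohort = (year - age) // width * width  # e.g. born 1965 -> cohort 1960
        cells[(cohort, year)].append(score)
    return {cell: mean(scores) for cell, scores in sorted(cells.items())}

print(cohort_means(surveys))
```

With these invented data, the 1960 cohort scores 3.0 in the 1990 wave and 3.5 in the 2000 wave. Reading that change as development within a generation relies on exactly the assumptions stated above: each wave is representative, and the population's composition has not shifted.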
Besides cohort analysis, data are also linked to deal with the problem of missing data. Datasets often have a few missing entries. If their number is large enough, however, they can seriously obstruct the analysis, and it may be necessary to fill in these gaps before the analysis can be performed.
This dental work of the social scientist is often done through data imputation, for which many techniques exist. Statistics Netherlands (Centraal Bureau voor de Statistiek, CBS) uses the following: deductive imputation, regression imputation (including mean imputation, ratio imputation and historical imputation) and hot deck donor imputation (De Waal, 2000).
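Three of these techniques can be illustrated with a minimal Python sketch. It assumes a hypothetical survey in which an auxiliary variable (employees) is always observed and a target variable (turnover) is sometimes missing. The data and function names are invented for illustration and do not reproduce how Statistics Netherlands implements these methods. Deductive and historical imputation are left out because they depend on edit rules and earlier survey waves respectively.

```python
import random
from statistics import mean

# Hypothetical survey records: 'employees' is always observed,
# 'turnover' is missing (None) for one unit. Values are invented.
records = [
    {"employees": 10, "turnover": 100.0},
    {"employees": 20, "turnover": 210.0},
    {"employees": 30, "turnover": 290.0},
    {"employees": 15, "turnover": None},
]

def observed(rows, var):
    """Rows in which `var` is not missing."""
    return [r for r in rows if r[var] is not None]

def mean_impute(rows, var):
    """Fill missing values with the mean of the observed values."""
    m = mean(r[var] for r in observed(rows, var))
    return [dict(r, **{var: m}) if r[var] is None else r for r in rows]

def ratio_impute(rows, var, aux):
    """Fill missing values with aux * (sum of var / sum of aux) over observed rows."""
    obs = observed(rows, var)
    ratio = sum(r[var] for r in obs) / sum(r[aux] for r in obs)
    return [dict(r, **{var: r[aux] * ratio}) if r[var] is None else r for r in rows]

def hot_deck_impute(rows, var, seed=0):
    """Fill each missing value with the value of a randomly drawn observed donor."""
    rng = random.Random(seed)
    donors = [r[var] for r in observed(rows, var)]
    return [dict(r, **{var: rng.choice(donors)}) if r[var] is None else r for r in rows]
```

With these data, mean imputation fills the gap with 200.0, ratio imputation with 15 × (600 / 60) = 150.0, and hot deck imputation with one of the three observed values. The functions return new records and leave the originals untouched. A trade-off worth noting: hot deck imputation keeps imputed values within the range actually observed, whereas mean imputation shrinks the variance of the imputed variable.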
All these techniques have their pros and cons, but they make it possible to link different surveys together: a variable that a survey did not measure can be treated as a variable that is missing for all of its respondents. In the author's forthcoming Master's thesis, for instance, regression imputation is used to link two datasets. This makes it possible to test a causal model based on six different theories. The theories then need not be directly opposed to one another; instead they can be related in order to arrive at the best explanation (Giezen, forthcoming). In this research, the six theories are used to explain antipathy towards membership of the EU. One theory proposes education and income as explanatory variables, while another suggests that people's xenophobic feelings have a strong influence. These explanations could very well be related to each other. When the concepts are measured in different surveys, one can only compare the separate explanations; when they are combined in one (constructed) survey, they can be related both to each other and to the dependent variable.
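The general idea of linking two surveys through regression imputation can be sketched in Python as follows. The sketch assumes two hypothetical surveys that share one variable, education. Survey A also measures EU antipathy, and survey B also measures xenophobia. A regression of EU antipathy on education is fitted in A and then used to predict EU antipathy for each respondent in B. The data, variable names and single-predictor model are illustrative assumptions, not the model used in the thesis.

```python
from statistics import mean

# Hypothetical data, invented for illustration only.
# Survey A: (education_years, eu_antipathy); survey B: (education_years, xenophobia).
survey_a = [(10, 6.0), (12, 5.0), (14, 4.0), (16, 3.0)]
survey_b = [(11, 7.5), (15, 2.0)]

def fit_ols(pairs):
    """Ordinary least squares intercept a and slope b for y = a + b * x."""
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in pairs) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def link_by_regression(donor, recipient):
    """Impute the donor-only variable into the recipient survey.

    The model is fitted on the donor survey using the shared variable x,
    then applied to each recipient record (x, z) to give (x, z, y_hat).
    """
    a, b = fit_ols(donor)
    return [(x, z, a + b * x) for x, z in recipient]

print(link_by_regression(survey_a, survey_b))
```

Here the fitted line is 11.0 − 0.5 × education, so the two respondents in B receive imputed values of 5.5 and 3.5. One caveat applies: the imputed values carry only the information in the shared variables. Any association between imputed EU antipathy and xenophobia in the linked file therefore runs entirely through education, which the statistical matching literature calls the conditional independence assumption.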
Linking different surveys thus makes it easier for researchers with few resources to do quantitative interdisciplinary research. Interdisciplinary research can then stand with both feet, qualitative and quantitative, on scientific ground and rebut the criticism that it is necessarily only qualitative.
Eijk, C. v.d. (2002) Design issues in electoral research: taking care of (core) business. Electoral Studies, 21, p. 189-206.
Giezen, M. (forthcoming) Master's thesis, Political Science.
Olinsky, A. et al. (2002) The comparative efficacy of imputation methods for missing data in structural equation modelling. European Journal of Operational Research, 151, p. 53-79.
Paul, C. et al. (2003) What should we do about missing data? California Center for Population Studies, online working paper series.
Waal, T. de (2000) A brief overview of imputation methods applied at Statistics Netherlands. Netherlands Official Statistics, 15(3), p. 23-27.
Mendel Giezen completed the Bèta Gamma propaedeutic year. He now studies political science and is also enrolled in the research master's programme Metropolitan Studies.