Research project
The aim of the project is to digitize and make available individual-level population data from the Umeå region, 1900-1950, for research within the framework of the new multigenerational database POPLINK, an infrastructure of internationally high standard.
POPLINK opens up opportunities for innovative research in the life sciences and in social, economic and social-science research, where the multigenerational perspective is essential for understanding social change and changing life patterns and living conditions over a long time span. Digital individual-level data from the period when the Swedish welfare state emerged will for the first time become available for register-based research on a large scale. The size of the database, its generational depth and its richness in detail and variables make it well suited to population-based studies. Digitization and linkage of individual-level data from the parish records are carried out by the Demographic Data Base.
Funding years: 2011, 2012, 2013
Principal: Sören Edvinsson
Funder: Riksbankens Jubileumsfond
y2011: 1420, y2012: 1376, y2013: 1418
A new infrastructure, making the most of Swedish registry data
The long-term objective of the proposed project is to build a new multigenerational research infrastructure, POPLINK, for longitudinal studies within the social sciences, humanities and life sciences, covering the coastal region in Västerbotten until 1950. When linked with today’s administrative research registers, it will produce a unique resource of international relevance, which for the first time makes it possible to take advantage of the full scientific potential of the rich Swedish registry data for cutting-edge research. This application explicitly concerns funding for including data from Umeå in POPLINK.
The detail and quality of Swedish registry resources are widely acknowledged. Longitudinal registers covering the entire population are available from the late 17th century until the present day, which is unique in an international perspective. The long-standing Swedish tradition of data collection is further expressed in a wide range of excellent research registers on living conditions, health and social welfare on the individual level. Over the years, these high-class resources have contributed to considerable achievements within a large number of disciplines and have provided Swedish scientists with a competitive edge, in the social sciences and humanities as well as in the life sciences. Nevertheless, the full scientific potential of Swedish registry data has not yet been thoroughly exploited. The main obstacle is the lack of digitized population data on the individual level for the period 1900-1950, as digitized historical population registers usually end around 1900 and modern digital registers were not kept until the 1950s and 1960s. Another complication is that data for the period before 1950 lack the civil registration number, the key variable for straightforward linkage between registers. These difficulties not only hamper research on a period of vital interest in Swedish history; they also impede the prospects of combining longitudinal and multigenerational population data with modern research registers, which would bring about new and groundbreaking scientific possibilities.
In order to overcome these obstacles and to meet the present needs within several fields of research, the new research infrastructure, POPLINK, is being developed by the Demographic Data Base (DDB) in close collaboration with Statistics Sweden (SCB). This comprehensive database will significantly increase access to high-quality Swedish population data, as it combines a digitization of population registers for the period 1900-1950 with a high-quality linkage to modern administrative research registers, such as the Multi-Generation Register (MGR) at SCB. Methods for secure linkage to any register based on civil registration numbers have been developed and tested in a joint project with SCB (Engberg 2007). In this respect, this infrastructural project differs from other digitization initiatives, where the data, lacking this fundamental advantage, can only be used within specific research projects.
Advantages of focusing on Västerbotten
POPLINK will be built to cover the coastal area in Västerbotten, i.e. Skellefteå and Umeå with adjacent parishes. The selected region has a considerable size in terms of area as well as population, and represents the main population cluster in the province. In this respect too, the proposed project surpasses previous digitization efforts of 20th century data, which have primarily involved small regions with a limited population. The coastal region in Västerbotten accounts for more than two thirds of the population in the province, and the same pattern applies to the 20th century. The general development presents the same distinctive traits as in many other Swedish regions: mainly rural until the early 20th century and thereafter characterized by industrialisation and a growing public sector. One major scientific advantage of focusing on this region is the extremely favourable preconditions for linkage to other registry resources, such as the population-based Västerbotten Intervention Programme (VIP), the MONICA and BETULA registers and the UMEÅ MEDICAL BIOBANK. (In all these registers, as in almost every other administrative Swedish register, the civil registration number is used as a key variable for linkage with other data.) With this target population, POPLINK will enhance the prospects of acquiring multigenerational information for a majority of the individuals mapped in these valuable resources.
Moreover, digitized parish records for Skellefteå c. 1680-1900 are available in DDB’s historical population database POPUM, which, linked to the new infrastructure, extends its multigenerational scope up to 15 generations. The inclusion in POPLINK of population data from Skellefteå, the northern part of the coastal region, has started and is covered by a grant from Umeå University, co-financed by the Demographic Data Base. Yet, to reach the full scientific potential and value of this unique resource and to optimize the synergies with the Västerbotten-based registers, it is necessary to include data from Umeå as well. With Skellefteå and Umeå included, POPLINK will have a dimension well suited to large-scale studies, providing detailed life course data on c. 300,000 individuals for the period 1900-1950, c. 2,000,000 records. The rich data reach far beyond basic demographic variables, including information about kinship, occupation, health and mobility, as well as other socio-economic and demographic conditions on the individual level, over the life course. This scope and detail are unequalled in an international perspective, where information like this, if it exists, is usually restricted to one or two generations.
With the increased availability of rich, longitudinal population data and the favourable linkage possibilities to any Swedish official register, POPLINK can constitute a platform for research within a broad range of fields and issues. Being able to link multigenerational information with registers such as the VIP, MONICA and BETULA registers and with the UMEÅ MEDICAL BIOBANK will add further dimensions to unique registry data and bring forth novel perspectives on relevant issues. These registers constitute core facilities for the interdisciplinary Linnaeus programme Ageing and Living Conditions (ALC), hosted by the Centre for Population Studies (CPS) at Umeå University. Collaboration between social scientists, statisticians, humanists and epidemiologists within the programme has opened up several new avenues of research and broadened the issues being addressed in these resources.
Yet, these widely used registers include variables that are still waiting to be analyzed in depth. The VIP covers data from more than 113,000 health examinations of more than 87,000 participants, of whom 70 percent are resident in the coastal region. The available information spans medical records, socio-economic variables, health and lifestyle factors as well as self-reported information on well-being and quality of life. Along with the MONICA study, addressing trends and risk factors for cardiovascular diseases, VIP has elicited a large number of publications on the interaction between health, lifestyle and risk factors. However, these rich registers also have a distinct, and largely unexploited, potential to serve as the basis for longitudinal and cross-sectional studies of social conditions and processes, further strengthened by the prospects of adding a multigenerational perspective to issues such as social mobility, class, education and family economics. The same applies to the BETULA register on cognition, health, and lifestyle on the individual level, collected to study the development of the memory function in adulthood and late life, e.g. early signs and potential risk factors of dementia. This register, today covering 4,200 participants in Umeå, also includes useful socio-economic variables, which until now have been poorly utilized by social scientists. Sweden’s largest biobank, UMEÅ MEDICAL BIOBANK, with more than 185,000 samples from 101,132 unique individuals, will also benefit significantly from the availability of multigenerational data for the coastal region, given that the large majority of the samples in its main cohort are collected from persons resident in Umeå or Skellefteå.
With the construction of POPLINK, the period 1900-1950 will for the first time be opened up for large-scale longitudinal research as a consequence of the significantly increased availability of high-quality microdata. The new infrastructure will be an invaluable resource for studies of this period of profound socio-economic change, when Sweden was remodelled from a mainly agrarian community into a modern welfare state, with dramatic improvements in health, social welfare and standard of living (Sundin & Willner 2007). Mortality rates fell strikingly (Hofsten 1986), household size and family structures changed, fertility rates declined and women’s position in society was markedly altered (Stanfors 2003; Myrdal 1934; Edin & Hutchinson 1935). Since digitized data on the individual level for this period are almost nonexistent today, these processes of fundamental importance for social development and economic progress have until now only been possible to observe and study in aggregate statistics, on a provincial or a national level (Historisk Statistik för Sverige 1969). With POPLINK, it will be possible to conduct in-depth studies of these large-scale transformations in a longitudinal perspective, and thus achieve a better understanding of the finer mechanisms behind social and demographic change affecting households, labour markets, welfare systems and public health. Longitudinal data on the individual level will also be very useful for studying and simulating long-term population dynamics on different levels in society, such as mobility, regional development and migration. The multigenerational features of POPLINK will further be of key importance for increasing our understanding of the mechanisms behind the transmission of demographic and socio-economic patterns between generations, such as fertility, family size, age at marriage and infant mortality (van Poppel et al. 2008; Parrado 2008), which all show traits of social inheritance.
Access to high-quality data on the individual level and large datasets with excellent information on family and kinship, also for previous generations, will enable new and innovative studies that increase our understanding of these processes. How behavioural factors are passed on from one generation to the next is of high social relevance, for instance in today’s policy making (d’Addio 2007).
The construction of POPLINK will further be of particular advantage to researchers within the life sciences, where information about individual relatedness in a population over several generations is of crucial importance. Access to multigenerational life course data, health survey information including lifestyle factors on the individual level, modern techniques and registry resources will allow researchers to develop new perspectives and methods to address issues related to heritability, intergenerational patterns in disease incidence, and the interaction between different risk factors over long time periods (Franks et al. 2007; Ling & Groop 2009). With this new infrastructure, observations of large prospective cohorts can be supplemented by retrospective studies, since the historical registers also enable the inclusion of lifestyle factors in earlier generations in the analysis. This prospect of developing new standards within the field of genetic epidemiology has already elicited great interest among scientists in Sweden and in the US. Collaborations have been established and explicit project plans, using the POPLINK data, are currently under preparation. Swedish population data from the 18th and 19th centuries have already been used successfully to generate internationally competitive research within this area, such as the groundbreaking study suggesting that early life conditions can affect the longevity of future generations (Kaati, Bygren et al. 2007). Population data from the DDB have also proved useful as a test-bed to study the influence of endogamy and consanguinity on genetic disorders (Bittles and Egerbladh 2005). With POPLINK and its extraordinary ability to integrate high-quality population data with modern research registers, competitive studies like these can be achieved on a much larger scale.
Methods and qualifications
Parish records in extenso from Umeå (longitudinal parish registers, birth, marriage, death and migration registers) will be digitized for the period 1900-c. 1950, including the essential linkage and refinement processes necessary for converting individual records into a comprehensive database. Individual records in different sources are brought together into individual life biographies, and family relations (parents-children, spouses) are defined according to carefully applied rules (Mandemakers et al. 2004). The different stages of linkage are executed with a combination of automatic and manual methods of record linkage. Since the unique civil registration numbers were not introduced until 1947, the major part of the digitized records for the period in question has to be linked together on other matching variables, such as name, date of birth, gender, and parish of birth, which is far more complicated than conventional record linkage, and thus considerably more time consuming. Access to civil registration numbers is in most cases restricted to the youngest generations, those present 1947-1950, but considering the multigenerational features of the resource, this has proved sufficient for a successful linkage to modern registry resources. Detailed analysis and validation of established links between digitized data for 1900-1950 and the MGR show that almost 100 percent of the individuals in the DDB database belonging to the target population of the MGR could be linked to the modern population register. Data entry and linkage are executed with tested and reliable methods and systems by trained data entry assistants, following consistent principles and international standards. Several control measures have been developed and implemented to assure consistency and high quality.
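As an illustration of linkage on matching variables other than the civil registration number, the following is a minimal sketch and not the DDB's actual procedure. The record fields follow the variables named above (name, date of birth, gender, parish of birth); the weights, the thresholds and the names `Record`, `match_score` and `classify` are assumptions made for this example. Pairs with a high score are linked automatically, while uncertain pairs are passed on to manual review.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Record:
    """One entry from a parish source (hypothetical field layout)."""
    name: str
    birth_date: str    # ISO date string, e.g. "1903-05-17"
    gender: str        # "M" or "F"
    birth_parish: str


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names, tolerant of spelling variants."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_score(a: Record, b: Record) -> float:
    """Weighted agreement over the matching variables (weights are illustrative)."""
    if a.gender != b.gender:
        return 0.0  # a gender mismatch rules the pair out in this sketch
    score = 0.4 * name_similarity(a.name, b.name)
    score += 0.4 * float(a.birth_date == b.birth_date)
    score += 0.2 * float(a.birth_parish.lower() == b.birth_parish.lower())
    return score


def classify(a: Record, b: Record, auto: float = 0.9, review: float = 0.5) -> str:
    """Link automatically, send to manual review, or reject (assumed thresholds)."""
    score = match_score(a, b)
    if score >= auto:
        return "link"
    if score >= review:
        return "manual_review"
    return "no_link"
```

A pair that agrees on date and parish of birth but differs slightly in the spelling of the name would be linked automatically under these thresholds, whereas a pair that agrees on name and parish but not on date of birth would be left for a human to decide.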
Established in 1973, the DDB has solid experience in building and maintaining large longitudinal research databases over the long term and is today an internationally renowned and experienced actor within this particular field.
Because POPLINK involves information about living persons, and parish records younger than 70 years are protected by confidentiality law, special measures have to be taken to protect the individual’s right to privacy and to safeguard sensitive data. The entire data production process, including all stages of linkage described above, is executed in closed computer networks and all personnel involved in the project are committed to professional secrecy. These measures, described in detail in DDB’s confidentiality policy, vouch for high security in data processing, database maintenance and database access. In order to avoid storage of civil registration numbers for living persons at the DDB, which would compromise Swedish laws on confidentiality and protection of privacy, all subsequent linkage to modern registers will be executed at the SCB. This will be done by way of two separate key tables, created during the digitization process. Each individual’s unique identity in POPLINK (a serial number) is matched against the civil registration number by way of another anonymous serial number. The table including the civil registration number will be permanently maintained at SCB, which is confirmed in a long-term bilateral agreement between DDB and SCB.
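The two-key-table arrangement can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual DDB/SCB implementation: the function names, the in-memory dictionaries and the use of random tokens as the anonymous serial number are all hypothetical. The point is that the table kept with the database never contains a civil registration number, and that the join to a modern register happens only where both tables are available.

```python
import secrets


def build_key_tables(people):
    """Split identifiers into two key tables.

    people: iterable of (poplink_id, civil_reg_no or None).
    Returns (ddb_table, scb_table):
      ddb_table maps POPLINK serial number -> anonymous link key
      scb_table maps anonymous link key -> civil registration number
    Only scb_table holds civil registration numbers.
    """
    ddb_table, scb_table = {}, {}
    for poplink_id, civil_no in people:
        if civil_no is None:
            continue  # no civil registration number recorded for this person
        link_key = secrets.token_hex(8)  # assumed form of the anonymous serial
        while link_key in scb_table:     # guard against an unlikely collision
            link_key = secrets.token_hex(8)
        ddb_table[poplink_id] = link_key
        scb_table[link_key] = civil_no
    return ddb_table, scb_table


def link_at_scb(ddb_table, scb_table, modern_register):
    """Join a modern register (civil_reg_no -> attributes) to POPLINK ids.

    Meant to run where scb_table is held; the result is keyed by POPLINK
    serial number and carries no civil registration number.
    """
    key_by_civil = {civil: key for key, civil in scb_table.items()}
    id_by_key = {key: pid for pid, key in ddb_table.items()}
    linked = {}
    for civil_no, attributes in modern_register.items():
        key = key_by_civil.get(civil_no)
        if key in id_by_key:
            linked[id_by_key[key]] = attributes
    return linked
```

In this sketch the linked output could be handed back keyed only by the POPLINK serial number, so that a dataset can be delivered without exposing the civil registration number.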
Long-term maintenance and accessibility
POPLINK will be permanently hosted by the DDB, which will have full responsibility for the completed database, including continuous care and long-term maintenance, an arrangement made in full agreement with the Swedish National Data Service, SND. With profound experience in database building and management, a capable organisation and highly competent and quality-conscious personnel, DDB is well suited to handle a general infrastructure of this character. This implies optimal preconditions for the building and continuous management of the new infrastructure in terms of quality, security and cost effectiveness. POPLINK will be administered within the quality-assured DDB framework of routines for database production, administration and maintenance, characterised by continuous quality monitoring and documentation. Database maintenance and long-term support will be managed at the DDB by skilled system developers and technicians. Data from POPLINK will be available to researchers in all disciplines through DDB’s well-established, strictly documented procedures for data retrievals, in this case with certain restrictions. Since the data contain information about living persons, all requests for datasets must be accompanied by an authorization from a regional ethics review board. Datasets will be anonymized and the conditions for use of the material, including aspects of confidentiality, will be stated in a legal contract between the researcher and the DDB. When a linkage to the MGR, or other registry resources, is required, the process will always be executed at SCB. To strengthen registry research, encourage long-term preservation of data, and increase the accessibility of data, Umeå University has established a coordinating framework, DORUM, in which DDB has a key role and administrative responsibilities.
In addition to DDB’s solid experience within this field, the presence of this framework, with explicit strategies for long-term preservation and increased accessibility of data, vouches for the safe long-term availability of research data in POPLINK. The applicants, Docent Sören Edvinsson (main applicant) and Elisabeth Engberg, PhD (co-applicant), have solid experience in database building and possess in-depth knowledge about processing historical data. Both are also qualified researchers within the fields of social history and population studies, with full-time positions at the DDB.