Research project: Probabilistic expert system for prediction of unemployment scenarios of individuals using Bayesian graphical models
Unemployment is a considerable socio-economic problem: at the macro level it has adverse impacts on a country’s economy, and at the micro level it can ruin an individual’s human capital, health, etc. Therefore, for people who are at risk of unemployment or who are already unemployed, predictions of their future employment scenarios are of interest. The predictions must be individualized since people have different personal characteristics, live in different local economies, and so on. In this project we propose to develop models for predicting the probabilities of future unemployment behaviours. We propose to use a Bayesian graphical modelling framework, which has proven to be a powerful tool for combining different sources of information in an elegant way.
Two people who become unemployed at the same time will typically not have the same opportunities to find new jobs quickly. In particular, they will have different probabilities of becoming long-term unemployed depending on where they live, their education, their professional experience, etc. A long list of individual demographic and socio-economic characteristics, as well as local labour market conditions, is expected to affect the probability that an unemployed person will find a new job. That different people have different propensities to become long-term unemployed is not a new insight, and case workers at unemployment agencies are expected to make subjective judgments on such propensities when allocating resources to help unemployed people in their job search. There is now increasing interest in statistical systems (called profiling systems) that can help case workers identify people who are more likely to become long-term unemployed by combining information collected at the unemployment agency with digitized individual characteristics available in administrative registries. The Nordic countries have extensive register information on their citizens, including labour market participation history, which makes such systems practical to implement.
Behncke et al. (2007b) point out that profiling and targeting (assessment of active labour market program effects) are hotly debated topics in countries like the UK, Germany, Denmark, Finland and Sweden. A profiling system has been implemented in Denmark, as described in Rosholm et al. (2006). The system predicts whether an unemployed person will remain unemployed for another six months after his/her entrance to the employment office. These predictions are used by case workers in their daily work. The authors emphasize the need for statistical systems, both for profiling and for targeting, to enhance case workers’ practices. The Swedish National Labour Market Board (SNLMB) has recently implemented a profiling system in a pilot region, Gävleborg, based on the models developed in Bennmarker et al. (2007). A very preliminary evaluation of this system could not show that individuals with a high probability of being long-term unemployed came back into the labour market more rapidly if they were profiled by the system. However, when the case workers were asked if they wanted to continue having a statistical profiling tool, about 80% answered positively. Thus, the SNLMB is now planning to implement the profiling system throughout the country by 2011. Profiling systems are also valuable to the SNLMB when it outsources its services to other actors, because such systems can help it decide on resource allocation.
Existing profiling systems focus on predicting unemployment duration (or the probability of being unemployed at a given time, e.g., six months, after exit from employment). Statistical models for predicting unemployment duration are traditionally based on the survival analysis approach, as used in many fields, such as medicine, epidemiology, insurance, etc. A fair amount of research has been conducted to find the determinants of unemployment duration using this traditional approach, mostly for single-spell duration data. Hazard functions in polynomial form were used with some determinants of unemployment duration in the context of competing risks in Addison and Portugal (2003), where the objective was to find defective risks for the two destination states of employed and inactive. The effects of various individual characteristics and local demand conditions on unemployment duration were analyzed in Kupets (2006) using general hazard models. Wichert and Wilke (2008) proposed a new and simple non-parametric estimation method for conventional survival models applied in unemployment duration analysis. They showed, through simulations and real-world applications, that their estimates are computationally efficient. The survival models approach is also used in the Danish and Swedish profiling systems mentioned above; see Rosholm et al. (2006) and Bennmarker et al. (2007). Hamerle (1989), however, discussed a generalization of the approach to multiple-spell duration data. Furthermore, the author showed the implications of different time scales for predictions: they lead to different underlying Markov processes that are assumed to generate the multiple spells.
Other methods have also been proposed. For instance, Guell and Hu (2006) estimate the probability of exit from unemployment with an econometric method using cross-sectional data from the Spanish Labour Force Survey, which was carried out quarterly on a sample of some 60,000 households in the 1980s and 1990s. Unlike in the proportional hazard model, the effect of covariates on the unemployment continuation probability is not proportional in their approach. Therefore, they could investigate the different possible changes in duration distributions for different reference workers. Shimer (2008) examined duration dependence, the cyclicality of duration dependence and aggregate economic conditions in the job finding probability of individuals, both empirically and theoretically. The examination was based on a simple and novel model that takes into account the lifetime utilities of unemployed people in different scenarios. In Neocleous and Portnoy (2008), censored regression quantile methods extended to a partial linear setting were applied to unemployment duration prediction. Many alternative modelling frameworks have thus been suggested for this problem. Here we use a framework that has attracted much attention within the statistical community for uncertainty modelling, reasoning and prediction in many application domains, namely the probabilistic Markov graphical modelling methodology.
The aim of this project is to develop a profiling system based on probabilistic graphical models. In contrast to the literature reviewed above, we will not focus on unemployment duration, because entering employment does not guarantee that a new unemployment spell will not occur soon after. We instead define different states by looking at the number of days employed during a certain period (say 1 year) after exit from employment. For instance, one could consider four categories: less than 5% of time in employment during 1 year, between 5% and 30% of time in employment, between 30% and 70% of time in employment, and more than 70% of time in employment during 1 year. We then plan to develop Bayesian graphical models (see, for example, Cowell et al. (1999)) to predict in which of these categories a newly unemployed individual will end up one year after exit from employment, given the information available at the time of exit. A related approach is to apply a dynamic version of the graphical modelling framework (see below for description). Here one considers the employment state of the individual over time: for instance, with weekly time steps, whether the individual is unemployed, temporarily employed, employed or similar in each future week can be modelled using past information on employment status and other covariates.
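As an illustration only, the four outcome categories mentioned above could be encoded as follows. This is a minimal sketch, not the project's implementation: the function names, the category labels and the 365-day period are hypothetical, and the treatment of values falling exactly on a threshold (in particular at 30%) is an assumption, since the text does not specify it.

```python
# Hypothetical labels for the four categories described in the text.
LABELS = ("<5%", "5-30%", "30-70%", ">70%")


def employment_fraction(days_employed: int, period_days: int = 365) -> float:
    """Share of the follow-up period spent in employment."""
    if period_days <= 0:
        raise ValueError("period_days must be positive")
    if not 0 <= days_employed <= period_days:
        raise ValueError("days_employed must lie between 0 and period_days")
    return days_employed / period_days


def employment_category(fraction: float) -> str:
    """Map a share of time in employment to one of the four categories.

    Thresholds (5%, 30%, 70%) come from the text. Boundary handling is an
    assumption: exactly 5% falls in "5-30%", exactly 30% and exactly 70%
    fall in "30-70%".
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    if fraction < 0.05:
        return LABELS[0]
    if fraction < 0.30:
        return LABELS[1]
    if fraction <= 0.70:
        return LABELS[2]
    return LABELS[3]


if __name__ == "__main__":
    # Made-up day counts, used only to exercise the functions.
    for days in (10, 100, 200, 300):
        print(days, employment_category(employment_fraction(days)))
```

The resulting category would serve as the target variable that a graphical model predicts from the information available at the time of exit from employment.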
Bayesian graphical models have proven useful as expert systems, for instance, to provide medical diagnostics. They have the advantage of being able to provide collateral insights by describing the causal and association structure of the phenomenon predicted. Such a profiling system could, therefore, be useful not only to case workers in their daily work but also to policy makers and socio-political scientists in suggesting and formulating remedies for the social and economic problems of society.
The overall objectives of the project are the following:
1. Develop a profiling system based on probabilistic Markov Bayesian graphical models, and in particular, develop methods for the selection of causal and association features for building Bayesian graphical models using longitudinal micro and macro data.
2. Compare profiling by Bayesian graphical modelling and systems based on traditional survival methods for the prediction of unemployment duration.
3. Develop user-friendly software and make it available on the internet.