Research project
This project falls under the broad area of statistical research known as causal inference.
This is a challenging and active area of research that is of strategic importance for Sweden, where opportunities to conduct complex and comprehensive observational studies are plentiful because socioeconomic, demographic, geographical and health data can be linked at the individual level. The main purpose of this project is to develop new and improved statistical methods for analyzing time-to-event data in studies where treatment assignment is not randomized. Within the causal inference literature, the time-to-event setting has received little attention compared to other settings. Novel theoretical results and practical recommendations for empirical scientists working with this type of data should therefore be welcomed by researchers both in causal inference and in the empirical sciences.
Swedish Research Council, 2019-2022: 5 000 000 kr
Randomized controlled trials (RCTs) are generally considered the gold standard for estimating causal effects, e.g., of medical treatments on survival time or time to disease recurrence. In many scientific areas it is infeasible or unethical to perform randomized controlled trials, and this is one reason that researchers increasingly conduct observational studies. Although treatment assignment is not randomized, observational studies can be designed to approximate RCTs, and measures similar to those reported from RCTs can be estimated from such studies, e.g., population-level treatment effects such as marginal hazard ratios. This project addresses some of the issues researchers face when designing and conducting observational studies.
In the last decade, propensity score matching (PSM) and inverse probability of treatment weighting using the propensity score (IPTW-PS) have become enormously popular and widely used within the medical and other empirical sciences. However, these methods are not without faults. In short: both PSM and IPTW-PS require specifying a model for the propensity score; PSM approximates a completely randomized experiment, which is less efficient than approximating a fully blocked randomized experiment; and IPTW-PS can lead to large standard errors.
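To illustrate why IPTW-PS can lead to large standard errors, the following minimal sketch computes inverse probability of treatment weights for a binary treatment. It assumes the propensity scores have already been estimated by some model; the function name `iptw_weights` and the example values are hypothetical and chosen only for illustration, not taken from the project.

```python
def iptw_weights(treated, propensity, stabilized=False):
    """Inverse probability of treatment weights for a binary treatment.

    treated    -- list of 0/1 treatment indicators
    propensity -- list of estimated P(treated = 1 | covariates), assumed given
    stabilized -- if True, multiply by the marginal probability of the
                  treatment actually received
    """
    if len(treated) != len(propensity):
        raise ValueError("treated and propensity must have the same length")
    p_treated = sum(treated) / len(treated)
    weights = []
    for z, e in zip(treated, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        if z == 1:
            w = 1.0 / e                      # treated: 1 / e(x)
            if stabilized:
                w *= p_treated
        else:
            w = 1.0 / (1.0 - e)              # untreated: 1 / (1 - e(x))
            if stabilized:
                w *= 1.0 - p_treated
        weights.append(w)
    return weights


# Made-up example: the second individual is treated despite a low
# estimated propensity score and therefore receives a very large weight.
treated = [1, 1, 0, 0]
propensity = [0.8, 0.05, 0.5, 0.2]
weights = iptw_weights(treated, propensity)
```

A treated individual with a propensity score close to 0 (or an untreated individual with a score close to 1) receives a weight far larger than the others, so a handful of observations can dominate the weighted estimate and inflate its variance.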
A different issue that can lead to uncertain estimates and conclusions is missing data. A common way to handle missing data is multiple imputation, after which matched or weighted samples can be created for each imputed dataset. For the parameter of interest, point estimates from each imputed dataset can be pooled into a single estimate, but it is not always straightforward to correctly estimate the standard error (needed for confidence interval construction). In cases where bootstrap standard errors are to be computed, it is somewhat unclear how to draw bootstrap samples when both matching/weighting and multiple imputation have been performed.
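For reference, the usual way to pool results across imputed datasets is Rubin's rules, sketched below. This is only the standard pooling formula, not a method proposed in the project; whether it yields valid standard errors after matching or weighting is part of the difficulty described above. The function name `pool_rubin` and the numbers are hypothetical.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and variances with Rubin's rules.

    estimates -- point estimates (e.g. log hazard ratios), one per imputed dataset
    variances -- the corresponding squared standard errors
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    pooled = statistics.fmean(estimates)       # average point estimate
    within = statistics.fmean(variances)       # average within-imputation variance
    between = statistics.variance(estimates)   # between-imputation variance (divisor m - 1)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, math.sqrt(total)


# Made-up log hazard ratios and variances from three imputed datasets.
pooled_estimate, pooled_se = pool_rubin(
    [-0.30, -0.25, -0.35],
    [0.010, 0.012, 0.011],
)
```

The pooled standard error combines the average sampling variance within each imputed dataset with the extra variability between imputations, which is why it is larger than the standard error from any single completed dataset would suggest.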
Last but not least, regardless of whether matching, weighting or some other method is used, the researcher has to select a covariate set such that it is plausible that, given the covariate set, the effect of treatment on outcome is no longer confounded by any other variables. To aid in this selection, reliable data-driven covariate selection procedures suited for time-to-event outcomes need to be developed.
Marginal hazard ratios are of interest to policy makers and are the type of parameter estimated from randomized controlled trials. The proposed methods are less model-dependent than existing methods. Adjusting for confounding variables and drawing bootstrap samples correctly are important for obtaining unbiased results and valid inference.