Title: Towards Valid (Reproducible) Causal Inference
Abstract: With very large datasets, statistical inference cannot treat modelling assumptions as given. Sampling variation may be large when small samples are analysed, but with very large samples this is no longer the case, and it becomes important to acknowledge uncertainty due to modelling assumptions, since the latter may dominate sampling uncertainty. We advocate the use of confidence intervals that take into account both sampling variation and uncertainty due to modelling assumptions. We distinguish two types of modelling assumptions: those that can be investigated empirically with the data at hand and those that cannot. If model assumptions are tested, this should be taken into account in the final inference. Model assumptions that are not testable may be encompassed in a collection of a priori reasonable scenarios, which implies recognising that the parameter of interest may not be point identified and that only a subset of the parameter space, e.g. an identification interval, can be retrieved asymptotically. We review existing results and propose a general strategy for obtaining confidence intervals with the desired coverage in the presence of testable and untestable model assumptions. We illustrate with several examples in which assumptions on missing data mechanisms are made, in the context of classical parametric regression as well as flexible semi-parametric estimation of causal parameters.
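To make the idea of an interval that reflects both sampling variation and an untestable assumption concrete, the following is a minimal sketch and not necessarily the strategy proposed in the paper. It estimates a population mean when some outcomes are missing, under a hypothetical sensitivity model in which the mean of the missing outcomes equals the mean of the observed outcomes plus an offset `delta`, known only to lie in `[delta_lo, delta_hi]`. The function name `uncertainty_interval`, the sensitivity model, the grid over `delta`, and the simulated data are all assumptions made for illustration. The reported interval is the union of the usual normal-approximation confidence intervals over the grid, a simple and conservative construction.

```python
import math
import random
from statistics import NormalDist, mean, variance


def uncertainty_interval(y_obs, n_total, delta_lo, delta_hi, alpha=0.05, grid=201):
    """Estimated identification interval and uncertainty interval for E[Y].

    Hypothetical sensitivity model (an assumption of this sketch):
        E[Y | missing] = E[Y | observed] + delta,  delta in [delta_lo, delta_hi],
    so that E[Y] = E[Y | observed] + P(missing) * delta.
    """
    n_obs = len(y_obs)
    p_miss = 1.0 - n_obs / n_total          # estimated proportion of missing outcomes
    ybar = mean(y_obs)                      # mean of the observed outcomes
    s2 = variance(y_obs)                    # sample variance of the observed outcomes
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)

    points, lowers, uppers = [], [], []
    for k in range(grid):
        delta = delta_lo + (delta_hi - delta_lo) * k / (grid - 1)
        est = ybar + p_miss * delta
        # Delta-method standard error: the observed mean and the missingness
        # proportion are treated as asymptotically uncorrelated.
        se = math.sqrt(s2 / n_obs + delta ** 2 * p_miss * (1.0 - p_miss) / n_total)
        points.append(est)
        lowers.append(est - z * se)
        uppers.append(est + z * se)

    identification = (min(points), max(points))   # shrinks only if assumptions tighten
    uncertainty = (min(lowers), max(uppers))      # union of the delta-specific CIs
    return identification, uncertainty


# Simulated data, for illustration only: a large sample with about 30% missing outcomes.
random.seed(0)
n_total = 100_000
y_all = [random.gauss(1.0, 2.0) for _ in range(n_total)]
y_obs = [y for y in y_all if random.random() < 0.7]

ident, ui = uncertainty_interval(y_obs, n_total, delta_lo=-0.5, delta_hi=0.5)
naive, naive_ci = uncertainty_interval(y_obs, n_total, delta_lo=0.0, delta_hi=0.0)

print("identification interval:", ident)
print("uncertainty interval:   ", ui)
print("CI assuming delta = 0:  ", naive_ci)
```

With a sample of this size, the confidence interval that assumes `delta = 0` is very narrow, while the width of the uncertainty interval is driven almost entirely by the range allowed for `delta`. Collecting more data would shrink the former but not the latter.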