Dynamic treatment regimes (DTRs) arise in situations where each patient receives a series of treatments at different time points (stages). This situation often arises in treatment of chronic diseases such as sepsis or HIV. The goal is to find the "best" (optimal) treatment regime.
Semi-supervised DTRs with estimation based approach:
In reality, precise information on the treatment outcome can be scarce, but a large amount of outcome related information can still be cheaply available. This gives rise to the so-called semi-supervised setting. We construct semi-supervised DTRs using linear Q-learning, and rigorously establish that good quality additional information on the outcome translates to increased stability of the estimators in large samples. For more information, see our paper "Semi-Supervised Off Policy Reinforcement Learning".