Debiased Machine Learning and Network Cohesion for Doubly-Robust Differential Reward Models in Contextual Bandits
| Main Authors: | |
|---|---|
| Format: | Article in Journal/Newspaper |
| Language: | unknown |
| Published: | arXiv, 2023 |
| Subjects: | |
| Online Access: | https://dx.doi.org/10.48550/arxiv.2312.06403, https://arxiv.org/abs/2312.06403 |
Summary: A common approach to learning mobile health (mHealth) intervention policies is linear Thompson sampling. Two desirable mHealth policy features are (1) pooling information across individuals and time and (2) incorporating a time-varying baseline reward. Previous approaches pooled information across individuals but not time, failing to capture trends in treatment effects over time. In addition, these approaches did not explicitly model the baseline reward, which limited the ability to precisely estimate the parameters in the differential reward model. In this paper, we propose a novel Thompson sampling algorithm, termed "DML-TS-NNR," that leverages (1) nearest-neighbors to efficiently pool information on the differential reward function across users and time and (2) the Double Machine Learning (DML) framework to explicitly model baseline rewards and stay agnostic to the supervised learning algorithms used. By explicitly modeling baseline rewards, we obtain smaller confidence sets for the differential reward ...
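As a rough illustration of the ideas the summary describes, below is a minimal sketch of linear Thompson sampling on a differential (advantage) reward, with the time-varying baseline reward handled by a separate flexible plug-in regressor, loosely in the spirit of DML. The simulated environment, the model choices, and every name in the code (`theta_true`, `baseline_pred`, the update schedule) are illustrative assumptions, not the paper's implementation; in particular, the nearest-neighbor (network-cohesion) pooling across users and time is omitted here.

```python
# Sketch: linear Thompson sampling on a differential reward, with a
# plug-in baseline-reward model subtracted out before the linear update.
# Illustrative only; not the DML-TS-NNR implementation from the paper.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
d = 3                                     # context dimension
theta_true = np.array([1.0, -0.5, 0.25])  # true differential-reward weights

# Bayesian linear-regression state for the differential reward
V = np.eye(d)        # posterior precision (ridge prior)
b = np.zeros(d)      # precision-weighted mean
sigma2 = 1.0         # assumed reward-noise variance

baseline = GradientBoostingRegressor()    # flexible baseline-reward model
fitted = False
X_hist, y_hist = [], []

def baseline_pred(x):
    """Plug-in estimate of the baseline reward g(x); 0 until first fit."""
    if not fitted:
        return 0.0
    return float(baseline.predict(x.reshape(1, -1))[0])

for t in range(500):
    x = rng.normal(size=d)                # observed context

    # Thompson sampling: draw weights from the posterior, act greedily
    mean = np.linalg.solve(V, b)
    cov = sigma2 * np.linalg.inv(V)
    theta_tilde = rng.multivariate_normal(mean, cov)
    a = 1 if x @ theta_tilde > 0 else 0   # treat iff sampled advantage > 0

    # Environment: nonlinear baseline g(x) + a * differential reward + noise
    g_x = np.sin(x[0]) + 0.5 * x[1] ** 2
    y = g_x + a * (x @ theta_true) + rng.normal(scale=np.sqrt(sigma2))

    # Refit the baseline model periodically on de-treated targets
    X_hist.append(x)
    y_hist.append(y - a * (x @ mean))     # rough estimate of g(x)
    if len(y_hist) >= 25 and t % 25 == 0:
        baseline.fit(np.array(X_hist), np.array(y_hist))
        fitted = True

    # Update the differential-reward posterior on the de-baselined reward
    if a == 1:
        r_diff = y - baseline_pred(x)     # pseudo-outcome for the advantage
        V += np.outer(x, x)
        b += r_diff * x

print("posterior mean:", np.round(np.linalg.solve(V, b), 3),
      "truth:", theta_true)
```

The point of the sketch is the subtraction step: removing an explicit baseline-reward estimate before the linear update is what tightens inference on the differential-reward parameters, per the summary. A DML-style analysis would typically also cross-fit the nuisance (baseline) model rather than refit it in-sample as done above.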