Military stochastic scheduling treated as a "multi-armed bandit" problem
Main Author: | |
---|---|
Other Authors: | |
Format: | Report |
Language: | unknown |
Published: | Monterey, California: Naval Postgraduate School, 2001 |
Subjects: | |
Online Access: | https://hdl.handle.net/10945/15377 |
Summary: | A Blue airborne force attacks a region defended by a single Red surface-to-air missile system (SAM). Red is uncertain about the Blues he faces, but is able to learn about them during the engagement. Red's objective is to develop a policy for shooting at the Blues to maximize the value of Blues shot down before he himself is destroyed. We show that index policies are optimal for Red in a range of scenarios and yield effective heuristics more generally. The quality of such index heuristics is confirmed in a computational study. Contract number: N00244-10-10031. |