Presentation Title

Improving on Iterative Approaches for Solving Fully-Specified POMDPs

Faculty Mentor

Andrew Forney

Start Date

23-11-2019 8:00 AM

End Date

23-11-2019 8:45 AM

Location

269

Session

Poster 1

Type of Presentation

Poster

Subject Area

Engineering / Computer Science

Abstract

Markov Decision Processes (MDPs) are discrete mathematical formulations used to model generalized sequential decision making. An agent acting within an MDP must optimize expected future reward by deciding which action to take given the current state. Partially Observable MDPs (POMDPs) are MDPs in which the state is not known to the agent; instead, the agent must act upon observations provided by the environment at each time step. This work presents a novel solution that is more sample-efficient than traditional methods for fully-specified POMDPs, viz., those in which the transition probabilities between states as well as the observation probabilities are given to the agent. Traditional approaches for solving POMDPs require iterative learning processes that converge slowly when not exploiting opportunities for linear parallelism. Consequently, we present a closed-form solution derived algebraically from traditional iterative update rules. Solving this closed-form expression yields an accelerated learning rate that enables a jump start not employed by traditional iterative methods. Simulation results support the efficacy of this method on traditional POMDPs. Additionally, applied and theoretical implications of this method are discussed.
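To make the contrast between iterative and closed-form solving concrete, the following is a minimal Python sketch, not the authors' algorithm: the toy model, sizes, and names are illustrative assumptions. It shows the standard Bayes belief update available when a POMDP's transition and observation probabilities are fully specified, and how an iterative fixed-point update for policy evaluation can be replaced by a one-shot linear solve, the same algebraic move from iterative update rule to closed form that the abstract describes.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, Obs = 4, 2, 3   # toy sizes: states, actions, observations
gamma = 0.95          # discount factor

# Fully-specified model (random, for illustration only):
# T[a, s, s'] = P(s' | s, a),  Z[a, s', o] = P(o | s', a).
T = rng.random((A, S, S)); T /= T.sum(axis=2, keepdims=True)
Z = rng.random((A, S, Obs)); Z /= Z.sum(axis=2, keepdims=True)

def belief_update(b, a, o):
    """Bayes filter over states: b'(s') ∝ Z[a, s', o] * sum_s T[a, s, s'] b(s)."""
    b_new = Z[a][:, o] * (T[a].T @ b)
    return b_new / b_new.sum()

b = belief_update(np.full(S, 1 / S), a=0, o=1)  # one step from a uniform belief

# Contrast iterative vs. closed-form policy evaluation on the underlying MDP.
P = T[0]           # transition matrix under a fixed policy (always action 0)
r = rng.random(S)  # expected rewards under that policy

# Iterative: repeat the Bellman update v <- r + gamma * P v to a fixed point.
v_iter = np.zeros(S)
for _ in range(100_000):
    v_next = r + gamma * P @ v_iter
    if np.max(np.abs(v_next - v_iter)) < 1e-12:
        break
    v_iter = v_next

# Closed form: the fixed point satisfies (I - gamma * P) v = r, so solve once.
v_closed = np.linalg.solve(np.eye(S) - gamma * P, r)
assert np.allclose(v_iter, v_closed)
```

The linear solve reaches the fixed point of the update rule in a single step, which is the sense in which a closed-form expression can provide a jump start relative to iterating the update until convergence.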
