tfplan.planners.stochastic package


tfplan.planners.stochastic.hindsight module

class tfplan.planners.stochastic.hindsight.HindsightPlanner(rddl, config)

Bases: tfplan.planners.stochastic.StochasticPlanner

HindsightPlanner class implements an online gradient-based planner that chooses the next action based on the upper bound of the Value function of the current state.

  • rddl (str) – A RDDL domain/instance filepath or rddlgym id.
  • config (Dict[str, Any]) – The planner config dict.

tfplan.planners.stochastic.simulation module

class tfplan.planners.stochastic.simulation.Trajectory(states, actions, interms, rewards)

Bases: tuple


_asdict()

Return a new OrderedDict which maps field names to their values.

_fields = ('states', 'actions', 'interms', 'rewards')
classmethod _make(iterable, new=<built-in method __new__ of type object>, len=<built-in function len>)

Make a new Trajectory object from a sequence or iterable


_replace(**kwds)

Return a new Trajectory object replacing specified fields with new values
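
Because Trajectory is a plain named tuple, its helper methods behave like those of any collections.namedtuple. The following is a minimal sketch that uses a stand-in tuple with the same field names and placeholder strings instead of tensors; it does not import tfplan.

```python
from collections import namedtuple

# Stand-in with the same field names as tfplan's Trajectory (illustration only).
Trajectory = namedtuple("Trajectory", ["states", "actions", "interms", "rewards"])

# Placeholder values; in tfplan these fields would hold tensors.
traj = Trajectory(states="S", actions="A", interms="I", rewards="R")

# _make builds an instance from any iterable with one item per field.
same = Trajectory._make(["S", "A", "I", "R"])

# _asdict maps field names to values; _replace returns a modified copy
# and leaves the original tuple untouched.
as_dict = traj._asdict()
updated = traj._replace(rewards="R2")
```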

class tfplan.planners.stochastic.simulation.SimulationCell(compiler, policy, config=None)

Bases: tensorflow.python.ops.rnn_cell_impl.RNNCell

SimulationCell class implements an RNN cell that simulates the next state and reward for the MDP transition given by the RDDL model.

  • compiler (rddl2tf.compilers.ReparameterizationCompiler) – The RDDL2TF compiler.
  • policy (tfplan.train.OpenLoopPolicy) – The state-independent policy (e.g., a plan).
  • config (Dict[str, Any]) – A config dict.

Returns the MDP state size.


Returns the MDP action size.


Returns the MDP intermediate state size.


Returns the simulation cell output size.

class tfplan.planners.stochastic.simulation.Simulator(compiler, policy, config)

Bases: object

Simulator class implements an RNN-based trajectory simulator for the RDDL model.

  • compiler (rddl2tf.compilers.DefaultCompiler) – The RDDL2TF compiler.
  • policy (tfplan.train.OpenLoopPolicy) – The state-independent policy (e.g., a plan).
  • config (Dict[str, Any]) – A config dict.

Returns the compiler’s graph.


Returns the policy’s batch size.


Returns the policy’s batch size.


Builds the reparameterized recurrent cell.

trajectory(initial_state, sequence_length=None)

Returns the state-action-reward trajectory induced by the given initial_state and policy.

  • initial_state (Sequence[tf.Tensor]) – The trajectory’s initial state.
  • sequence_length (tf.Tensor(shape=(batch_size,))) – An integer vector defining the trajectories' number of timesteps.

Returns the following values:

  • trajectory (Trajectory) – The collection of states-actions-interms-rewards trajectory.
  • final_state (Sequence[tf.Tensor]) – The trajectory’s final state.
  • total_reward (tf.Tensor(shape=(batch_size,))) – The trajectory’s total reward.
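
To make the three return values concrete, here is a minimal pure-Python sketch of an open-loop rollout. It is not tfplan's implementation: the rollout and toy_step functions, the scalar state, and the toy dynamics are hypothetical stand-ins for the compiled RDDL transition, and the sketch ignores batching and intermediate fluents.

```python
from typing import Callable, List, Optional, Tuple

def rollout(
    initial_state: float,
    plan: List[float],
    step: Callable[[float, float], Tuple[float, float]],
    sequence_length: Optional[int] = None,
) -> Tuple[List[Tuple[float, float, float]], float, float]:
    """Applies a fixed action sequence and accumulates the reward."""
    # Optionally truncate the rollout, mirroring the role of sequence_length.
    horizon = len(plan) if sequence_length is None else min(sequence_length, len(plan))
    state, total_reward, trajectory = initial_state, 0.0, []
    for action in plan[:horizon]:
        next_state, reward = step(state, action)
        trajectory.append((next_state, action, reward))
        total_reward += reward
        state = next_state
    return trajectory, state, total_reward

# Hypothetical toy dynamics: move along a line; the reward is the
# negative distance to the origin after the move.
def toy_step(state: float, action: float) -> Tuple[float, float]:
    next_state = state + action
    return next_state, -abs(next_state)

trajectory, final_state, total_reward = rollout(2.0, [-1.0, -1.0, 0.0], toy_step)
```

The plan is state-independent: the same action sequence is applied whatever states are visited, which is what distinguishes an open-loop policy from a closed-loop one.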

Evaluates the given trajectory.

classmethod timesteps(batch_size, horizon)

Returns the batch-sized increasing-horizon timesteps tensor.
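
As an illustration of what a batch-sized, increasing timesteps tensor could look like, the sketch below builds nested lists of shape (batch_size, horizon, 1) whose entries count up from 0. The trailing singleton dimension, the starting value, and the name timesteps_sketch are assumptions made for this example; consult the source for the actual layout and dtype.

```python
def timesteps_sketch(batch_size, horizon):
    """Nested-list stand-in for a (batch_size, horizon, 1) timesteps tensor."""
    one_trajectory = [[float(t)] for t in range(horizon)]
    # Copy the row for every batch entry so the entries do not alias each other.
    return [[list(step) for step in one_trajectory] for _ in range(batch_size)]

batch = timesteps_sketch(batch_size=2, horizon=3)
```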

tfplan.planners.stochastic.straightline module

class tfplan.planners.stochastic.straightline.StraightLinePlanner(rddl, config)

Bases: tfplan.planners.stochastic.StochasticPlanner

StraightLinePlanner class implements the online gradient-based planner that chooses the next action based on the lower bound of the Value function of the start state.
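
The lower bound used here and the upper bound used by HindsightPlanner differ for a simple reason: committing to one plan before the noise is revealed (a maximum of expectations) can never do better than choosing the best plan separately for each noise outcome (an expectation of maxima). The toy sketch below uses invented returns and is unrelated to tfplan's actual objective functions.

```python
# Hypothetical returns[plan][scenario]: two candidate plans and two equally
# likely noise scenarios. The numbers are invented for illustration.
returns = {
    "plan_a": [10.0, 0.0],
    "plan_b": [2.0, 6.0],
}
num_scenarios = 2

# Straight-line style bound: commit to one plan for all scenarios (max of means).
lower = max(sum(r) / num_scenarios for r in returns.values())

# Hindsight style bound: pick the best plan per scenario (mean of maxes).
upper = sum(
    max(r[s] for r in returns.values()) for s in range(num_scenarios)
) / num_scenarios

assert lower <= upper
```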

  • rddl (str) – A RDDL domain/instance filepath or rddlgym id.
  • config (Dict[str, Any]) – The planner config dict.

tfplan.planners.stochastic.utils module

Collection of reparameterization utility functions.

tfplan.planners.stochastic.utils.get_noise_samples(reparameterization_map, batch_size, horizon)
tfplan.planners.stochastic.utils.decode_inputs_as_noise_samples(inputs, encoding)
tfplan.planners.stochastic.utils.evaluate_noise_samples_as_inputs(sess, samples)
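
The functions above carry no descriptions, so the following only illustrates the general reparameterization idea the module refers to: draw noise from a fixed base distribution up front, then compute each random variable as a deterministic function of that noise. The helper names and the (batch_size, horizon) layout are hypothetical and do not mirror the signatures listed above.

```python
import random

def sample_noise(batch_size, horizon, seed=0):
    """Pre-samples standard normal noise, one value per (batch, timestep)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(horizon)] for _ in range(batch_size)]

def reparameterized_normal(mean, stddev, noise):
    """Deterministic transform x = mean + stddev * xi, with xi drawn from N(0, 1)."""
    return [[mean + stddev * xi for xi in row] for row in noise]

noise = sample_noise(batch_size=2, horizon=3)
samples = reparameterized_normal(mean=1.0, stddev=0.5, noise=noise)
```

Once the noise is fixed, the samples depend on mean and stddev only through ordinary arithmetic, which is what lets gradients flow through them in a gradient-based planner.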

Module contents

class tfplan.planners.stochastic.StochasticPlanner(rddl, compiler_cls, config)

Bases: tfplan.planners.planner.Planner

StochasticPlanner abstract class implements basic methods for online stochastic gradient-based planners.

  • rddl (str) – A RDDL domain/instance filepath or rddlgym id.
  • compiler_cls (rddl2tf.Compiler) – The RDDL-to-TensorFlow compiler class.
  • config (Dict[str, Any]) – The planner config dict.

Builds the planner.

_get_action(actions, feed_dict)
run(timestep, feed_dict)
