tfplan.planners.deterministic package


tfplan.planners.deterministic.simulation module

class tfplan.planners.deterministic.simulation.OutputTuple(state, action, interm, reward)

Bases: tuple


_asdict()

Return a new OrderedDict which maps field names to their values.

_fields = ('state', 'action', 'interm', 'reward')
classmethod _make(iterable, new=<built-in method __new__ of type object>, len=<built-in function len>)

Make a new OutputTuple object from a sequence or iterable


_replace(**kwds)

Return a new OutputTuple object replacing specified fields with new values

action

Alias for field number 1

interm

Alias for field number 2

reward

Alias for field number 3

state

Alias for field number 0
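
The named-tuple helpers listed above behave like those of any collections.namedtuple. The following is a minimal sketch that uses a stand-in type with the same four fields; it does not import tfplan, and the string values are placeholders for what would be tensors in the real class.

```python
from collections import namedtuple

# Stand-in with the same fields as OutputTuple; the real class lives in
# tfplan.planners.deterministic.simulation and is assumed to hold tensors.
OutputTuple = namedtuple("OutputTuple", "state action interm reward")

step = OutputTuple._make(["s0", "a0", "i0", "r0"])  # build from any iterable

# Fields are reachable by name or by position.
first_field = step[0]        # same object as step.state
as_mapping = step._asdict()  # field name -> value, in declaration order

# _replace returns a new tuple; the original is left untouched.
updated = step._replace(reward="r1")
```

Field numbers follow the declaration order in _fields, so state is field 0 and reward is field 3.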

class tfplan.planners.deterministic.simulation.Trajectory(states, actions, interms, rewards)

Bases: tuple


_asdict()

Return a new OrderedDict which maps field names to their values.

_fields = ('states', 'actions', 'interms', 'rewards')
classmethod _make(iterable, new=<built-in method __new__ of type object>, len=<built-in function len>)

Make a new Trajectory object from a sequence or iterable


_replace(**kwds)

Return a new Trajectory object replacing specified fields with new values

actions

Alias for field number 1

interms

Alias for field number 2

rewards

Alias for field number 3

states

Alias for field number 0
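
The plural field names suggest that a Trajectory gathers, per field, what each simulation step emits as an OutputTuple. That relationship is an assumption based on the names alone, not something this page states. Under that assumption, and with scalar placeholders instead of tensors, the regrouping could be sketched as:

```python
from collections import namedtuple

# Stand-ins that mirror the documented field names only.
OutputTuple = namedtuple("OutputTuple", "state action interm reward")
Trajectory = namedtuple("Trajectory", "states actions interms rewards")

# Three hypothetical per-step outputs.
steps = [
    OutputTuple(state=0.0, action=1.0, interm=0.5, reward=-1.0),
    OutputTuple(state=1.0, action=1.0, interm=1.5, reward=-0.5),
    OutputTuple(state=2.0, action=0.0, interm=2.0, reward=0.0),
]

# Transpose: one sequence per field instead of one tuple per step.
trajectory = Trajectory._make(list(column) for column in zip(*steps))
```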

class tfplan.planners.deterministic.simulation.SimulationCell(compiler, policy)

Bases: tensorflow.python.ops.rnn_cell_impl.RNNCell

SimulationCell class implements an RNN cell that simulates the next state and reward for the MDP transition given by the RDDL model.

Parameters:
  • compiler (rddl2tf.compilers.DefaultCompiler) – The RDDL2TF compiler.
  • policy (tfplan.train.OpenLoopPolicy) – The state-independent policy (e.g., a plan).

Returns the MDP state size.


Returns the MDP action size.


Returns the MDP intermediate state size.


Returns the simulation cell output size.

class tfplan.planners.deterministic.simulation.Simulator(compiler, policy)

Bases: object

Simulator class implements an RNN-based trajectory simulator for the RDDL model.

Parameters:
  • compiler (rddl2tf.compilers.DefaultCompiler) – The RDDL2TF compiler.
  • policy (tfplan.train.OpenLoopPolicy) – The state-independent policy (e.g., a plan).

Returns the compiler’s graph.


Returns the policy’s batch size.


Returns the policy’s batch size.


Builds the recurrent cell ops by embedding the policy in the transition sampling cell.


Returns the state-action-reward trajectory induced by the given initial_state and policy.

Parameters:
  • initial_state (Sequence[tf.Tensor]) – The trajectory’s initial state.

Returns:
  • trajectory (Trajectory) – The collection of states, actions, interms, and rewards along the trajectory.
  • final_state (Sequence[tf.Tensor]) – The trajectory’s final state.
  • total_reward (tf.Tensor(shape=(batch_size,))) – The trajectory’s total reward.
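
To illustrate what "the trajectory induced by the initial state and policy" means, here is a minimal pure-Python sketch of an open-loop rollout. It is not the Simulator implementation: the rollout helper, the 1-D transition, and the reward function are all hypothetical, and the real class unrolls an RNN cell over batched tensors.

```python
def rollout(initial_state, plan, transition, reward_fn):
    """Unroll a deterministic model under an open-loop plan."""
    states, actions, rewards = [], [], []
    state = initial_state
    for action in plan:  # the plan is fixed in advance and ignores the state
        next_state = transition(state, action)
        states.append(next_state)
        actions.append(action)
        rewards.append(reward_fn(next_state, action))
        state = next_state
    # Mirrors the documented return values: trajectory, final state, total reward.
    return (states, actions, rewards), state, sum(rewards)


# Hypothetical 1-D model: the action is added to the state and the
# reward penalises the distance from a goal at 3.0.
trajectory, final_state, total_reward = rollout(
    initial_state=0.0,
    plan=[1.0, 1.0, 1.0],
    transition=lambda s, a: s + a,
    reward_fn=lambda s, a: -abs(s - 3.0),
)
```

Because the plan does not depend on the state, the whole trajectory is determined by the initial state and the list of actions.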

Evaluates the given trajectory.

classmethod timesteps(batch_size, horizon)

Returns the batch-sized decreasing-horizon timesteps tensor.
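
One reading of "batch-sized decreasing-horizon timesteps" is a grid with one row per batch element, each row counting down the remaining steps. The sketch below assumes the countdown runs from horizon - 1 to 0 and uses nested lists; the helper name is hypothetical, and the exact shape and dtype of the real tensor are not documented here.

```python
def decreasing_timesteps(batch_size, horizon):
    """Return batch_size identical rows counting down from horizon - 1 to 0."""
    countdown = [float(t) for t in range(horizon - 1, -1, -1)]
    # Copy the row per batch element so rows can be modified independently.
    return [list(countdown) for _ in range(batch_size)]


grid = decreasing_timesteps(batch_size=2, horizon=4)
```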

tfplan.planners.deterministic.tensorplan module

class tfplan.planners.deterministic.tensorplan.Tensorplan(rddl, config)

Bases: tfplan.planners.planner.Planner

Tensorplan class implements the Planner interface for the offline gradient-based planner (i.e., tensorplan).

Parameters:
  • rddl (pyrddl.rddl.RDDL) – An RDDL model.
  • config (Dict[str, Any]) – The planner config dict.

Builds planner ops.


Runs the planner for the given number of epochs.

Returns:
  • plan (Sequence[np.ndarray]) – The best solution plan.
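
The idea of running a gradient-based planner for a number of epochs and returning the best plan can be sketched on a toy problem. This is not Tensorplan's code: the objective, the function names, and the hyperparameters are hypothetical, and central finite differences stand in for the automatic differentiation a TensorFlow planner would use.

```python
def total_reward(plan, start=0.0, goal=3.0):
    """Toy objective: each action moves a 1-D state; penalise squared distance to goal."""
    state, total = start, 0.0
    for action in plan:
        state += action
        total -= (state - goal) ** 2
    return total


def run(epochs, horizon=3, learning_rate=0.05, eps=1e-5):
    """Gradient ascent on the plan, keeping the best plan seen so far."""
    plan = [0.0] * horizon
    best_plan, best_reward = list(plan), total_reward(plan)
    for _ in range(epochs):
        # Central finite differences stand in for automatic differentiation.
        grad = []
        for i in range(horizon):
            up = plan[:i] + [plan[i] + eps] + plan[i + 1:]
            down = plan[:i] + [plan[i] - eps] + plan[i + 1:]
            grad.append((total_reward(up) - total_reward(down)) / (2 * eps))
        plan = [a + learning_rate * g for a, g in zip(plan, grad)]
        reward = total_reward(plan)
        if reward > best_reward:
            best_plan, best_reward = list(plan), reward
    return best_plan, best_reward


best_plan, best_reward = run(epochs=200)
```

Tracking the best plan separately from the current one means the returned plan is never worse than any iterate visited during optimisation, even if a later step overshoots.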
tfplan.planners.deterministic.utils module

Collection of RNN-based simulation utility functions.


Module contents