tfplan.planners.deterministic package

Submodules

tfplan.planners.deterministic.simulation module

class tfplan.planners.deterministic.simulation.OutputTuple(state, action, interm, reward)

Bases: tuple

_asdict()

Return a new OrderedDict which maps field names to their values.

_fields = ('state', 'action', 'interm', 'reward')
classmethod _make(iterable, new=tuple.__new__, len=len)

Make a new OutputTuple object from a sequence or iterable

_replace(**kwds)

Return a new OutputTuple object replacing specified fields with new values

_source = "from builtins import property as _property, tuple as _tuple\nfrom operator import itemgetter as _itemgetter\nfrom collections import OrderedDict\n\nclass OutputTuple(tuple):\n 'OutputTuple(state, action, interm, reward)'\n\n __slots__ = ()\n\n _fields = ('state', 'action', 'interm', 'reward')\n\n def __new__(_cls, state, action, interm, reward):\n 'Create new instance of OutputTuple(state, action, interm, reward)'\n return _tuple.__new__(_cls, (state, action, interm, reward))\n\n @classmethod\n def _make(cls, iterable, new=tuple.__new__, len=len):\n 'Make a new OutputTuple object from a sequence or iterable'\n result = new(cls, iterable)\n if len(result) != 4:\n raise TypeError('Expected 4 arguments, got %d' % len(result))\n return result\n\n def _replace(_self, **kwds):\n 'Return a new OutputTuple object replacing specified fields with new values'\n result = _self._make(map(kwds.pop, ('state', 'action', 'interm', 'reward'), _self))\n if kwds:\n raise ValueError('Got unexpected field names: %r' % list(kwds))\n return result\n\n def __repr__(self):\n 'Return a nicely formatted representation string'\n return self.__class__.__name__ + '(state=%r, action=%r, interm=%r, reward=%r)' % self\n\n def _asdict(self):\n 'Return a new OrderedDict which maps field names to their values.'\n return OrderedDict(zip(self._fields, self))\n\n def __getnewargs__(self):\n 'Return self as a plain tuple. Used by copy and pickle.'\n return tuple(self)\n\n state = _property(_itemgetter(0), doc='Alias for field number 0')\n\n action = _property(_itemgetter(1), doc='Alias for field number 1')\n\n interm = _property(_itemgetter(2), doc='Alias for field number 2')\n\n reward = _property(_itemgetter(3), doc='Alias for field number 3')\n\n"
action

Alias for field number 1

interm

Alias for field number 2

reward

Alias for field number 3

state

Alias for field number 0

class tfplan.planners.deterministic.simulation.Trajectory(states, actions, interms, rewards)

Bases: tuple

_asdict()

Return a new OrderedDict which maps field names to their values.

_fields = ('states', 'actions', 'interms', 'rewards')
classmethod _make(iterable, new=tuple.__new__, len=len)

Make a new Trajectory object from a sequence or iterable

_replace(**kwds)

Return a new Trajectory object replacing specified fields with new values

_source = "from builtins import property as _property, tuple as _tuple\nfrom operator import itemgetter as _itemgetter\nfrom collections import OrderedDict\n\nclass Trajectory(tuple):\n 'Trajectory(states, actions, interms, rewards)'\n\n __slots__ = ()\n\n _fields = ('states', 'actions', 'interms', 'rewards')\n\n def __new__(_cls, states, actions, interms, rewards):\n 'Create new instance of Trajectory(states, actions, interms, rewards)'\n return _tuple.__new__(_cls, (states, actions, interms, rewards))\n\n @classmethod\n def _make(cls, iterable, new=tuple.__new__, len=len):\n 'Make a new Trajectory object from a sequence or iterable'\n result = new(cls, iterable)\n if len(result) != 4:\n raise TypeError('Expected 4 arguments, got %d' % len(result))\n return result\n\n def _replace(_self, **kwds):\n 'Return a new Trajectory object replacing specified fields with new values'\n result = _self._make(map(kwds.pop, ('states', 'actions', 'interms', 'rewards'), _self))\n if kwds:\n raise ValueError('Got unexpected field names: %r' % list(kwds))\n return result\n\n def __repr__(self):\n 'Return a nicely formatted representation string'\n return self.__class__.__name__ + '(states=%r, actions=%r, interms=%r, rewards=%r)' % self\n\n def _asdict(self):\n 'Return a new OrderedDict which maps field names to their values.'\n return OrderedDict(zip(self._fields, self))\n\n def __getnewargs__(self):\n 'Return self as a plain tuple. Used by copy and pickle.'\n return tuple(self)\n\n states = _property(_itemgetter(0), doc='Alias for field number 0')\n\n actions = _property(_itemgetter(1), doc='Alias for field number 1')\n\n interms = _property(_itemgetter(2), doc='Alias for field number 2')\n\n rewards = _property(_itemgetter(3), doc='Alias for field number 3')\n\n"
actions

Alias for field number 1

interms

Alias for field number 2

rewards

Alias for field number 3

states

Alias for field number 0
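
Both OutputTuple and Trajectory are plain namedtuples, so their fields can be read by name or by position. A minimal sketch, with plain Python values standing in for the tf.Tensor objects used in practice:

    from tfplan.planners.deterministic.simulation import OutputTuple, Trajectory

    # One simulation step; in practice each field holds tf.Tensor objects.
    step = OutputTuple(state=(0.0,), action=(1.0,), interm=(), reward=-1.0)
    print(step.reward)      # access by field name -> -1.0
    print(step[0])          # access by position   -> (0.0,)
    print(step._asdict())   # OrderedDict mapping field names to values

    # A trajectory collects the per-step values along the time axis.
    traj = Trajectory(states=[(0.0,), (1.0,)],
                      actions=[(1.0,), (0.5,)],
                      interms=[(), ()],
                      rewards=[-1.0, -0.5])
    print(traj._fields)     # ('states', 'actions', 'interms', 'rewards')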

class tfplan.planners.deterministic.simulation.SimulationCell(compiler, policy)

Bases: tensorflow.python.ops.rnn_cell_impl.RNNCell

SimulationCell class implements an RNN cell that simulates the next state and reward for the MDP transition given by the RDDL model.

Parameters:
  • compiler (rddl2tf.compilers.DefaultCompiler) – The RDDL2TF compiler.
  • policy (tfplan.train.OpenLoopPolicy) – The state-independent policy (e.g., a plan).
state_size

Returns the MDP state size.

action_size

Returns the MDP action size.

interm_size

Returns the MDP intermediate state size.

output_size

Returns the simulation cell output size.
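
Because SimulationCell is an RNNCell, it can be unrolled over the planning horizon (the Simulator below does exactly that). A minimal sketch of constructing the cell and inspecting its sizes, assuming a compiler and policy have already been built elsewhere:

    from tfplan.planners.deterministic.simulation import SimulationCell

    # Assumes `compiler` (rddl2tf.compilers.DefaultCompiler) and
    # `policy` (tfplan.train.OpenLoopPolicy) were constructed elsewhere.
    cell = SimulationCell(compiler, policy)

    print(cell.state_size)   # MDP state size
    print(cell.action_size)  # MDP action size
    print(cell.interm_size)  # MDP intermediate state size
    print(cell.output_size)  # size of the (state, action, interm, reward) output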

class tfplan.planners.deterministic.simulation.Simulator(compiler, policy)

Bases: object

Simulator class implements an RNN-based trajectory simulator for the RDDL model.

Parameters:
  • compiler (rddl2tf.compilers.DefaultCompiler) – The RDDL2TF compiler.
  • policy (tfplan.train.OpenLoopPolicy) – The state-independent policy (e.g., a plan).
graph

Returns the compiler’s graph.

batch_size

Returns the policy’s batch size.

horizon

Returns the policy’s horizon.

build()

Builds the recurrent cell ops by embedding the policy in the transition sampling cell.

trajectory(initial_state)

Returns the state-action-reward trajectory induced by the given initial_state and policy.

Parameters: initial_state (Sequence[tf.Tensor]) – The trajectory’s initial state.
Returns:
  • trajectory (Trajectory) – The collection of states-actions-interms-rewards along the trajectory.
  • final_state (Sequence[tf.Tensor]) – The trajectory’s final state.
  • total_reward (tf.Tensor(shape=(batch_size,))) – The trajectory’s total reward.
run(trajectory)

Evaluates the given trajectory.

classmethod timesteps(batch_size, horizon)

Returns the batch-sized decreasing-horizon timesteps tensor.
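
A hedged sketch of the intended call pattern, assuming `compiler`, `policy`, and an `initial_state` (Sequence[tf.Tensor]) are available; the exact return handling of trajectory() and run() may differ:

    from tfplan.planners.deterministic.simulation import Simulator

    simulator = Simulator(compiler, policy)
    simulator.build()  # embeds the policy in the transition sampling cell

    # Unroll the policy from the initial state over the planning horizon.
    trajectory, final_state, total_reward = simulator.trajectory(initial_state)

    # Evaluate the trajectory tensors.
    results = simulator.run(trajectory)

    # Batch of decreasing-horizon timestep tensors used during the unrolling.
    timesteps = Simulator.timesteps(simulator.batch_size, simulator.horizon)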

tfplan.planners.deterministic.tensorplan module

class tfplan.planners.deterministic.tensorplan.Tensorplan(rddl, config)

Bases: tfplan.planners.planner.Planner

Tensorplan class implements the Planner interface for the offline gradient-based planner (i.e., tensorplan).

Parameters:
  • rddl (pyrddl.rddl.RDDL) – The RDDL model.
  • config (Dict[str, Any]) – The planner config dict.
logdir
build()

Builds planner ops.

_build_init_ops()
_build_policy_ops()
_build_initial_state_ops()
_build_trajectory_ops()
_build_loss_ops()
_build_optimization_ops()
_build_solution_ops()
_build_summary_ops()
run()

Run the planner for the given number of epochs.

Returns: The best solution plan.
Return type: plan (Sequence[np.ndarray])
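
A hedged end-to-end sketch of the planner interface described above; the config keys shown are illustrative assumptions and may not match the keys the planner actually reads:

    from tfplan.planners.deterministic.tensorplan import Tensorplan

    # Assumes `rddl` is a parsed pyrddl.rddl.RDDL model obtained elsewhere.
    # The config keys below are illustrative guesses, not the documented schema.
    config = {
        "epochs": 500,
        "batch_size": 128,
        "horizon": 40,
        "learning_rate": 0.01,
        "logdir": "/tmp/tensorplan",
    }

    planner = Tensorplan(rddl, config)
    planner.build()       # builds init, policy, trajectory, loss, and optimization ops
    plan = planner.run()  # gradient-based optimization; returns the best solution plan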

tfplan.planners.deterministic.utils module

Collection of RNN-based simulation utility functions.

tfplan.planners.deterministic.utils.cell_size(sizes)
tfplan.planners.deterministic.utils.to_tensor(fluents)

Module contents