tfplan.planners.stochastic package

Submodules

tfplan.planners.stochastic.hindsight module
class tfplan.planners.stochastic.hindsight.HindsightPlanner(rddl, config)
Bases: tfplan.planners.stochastic.StochasticPlanner

The HindsightPlanner class implements an online gradient-based planner that chooses the next action based on an upper bound on the value function of the current state.

Parameters:
- rddl (str) – A RDDL domain/instance filepath or rddlgym id.
- config (Dict[str, Any]) – The planner config dict.
Methods:
- _build_policy_ops()
- _build_base_policy_ops()
- _build_scenario_policy_ops()
- _build_trajectory_ops()
- _build_scenario_start_states_ops()
- _build_scenario_trajectory_ops()
- _build_loss_ops()
- _build_summary_ops()
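A minimal usage sketch is shown below; the config keys are assumptions for illustration, not the documented schema, and the rddlgym id is an example:

    # Usage sketch for HindsightPlanner; the config keys below are
    # assumptions for illustration, not the documented schema.
    from tfplan.planners.stochastic.hindsight import HindsightPlanner

    config = {
        "batch_size": 128,       # assumed key: number of sampled scenarios
        "horizon": 40,           # assumed key: lookahead horizon
        "epochs": 300,           # assumed key: gradient steps per decision
        "learning_rate": 0.01,   # assumed key: optimizer step size
    }

    planner = HindsightPlanner("Navigation-v2", config)  # rddlgym id or RDDL filepath
    planner.build()  # builds the policy, trajectory, loss, and summary ops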
tfplan.planners.stochastic.simulation module
class tfplan.planners.stochastic.simulation.Trajectory(states, actions, interms, rewards)
Bases: tuple

Trajectory is a namedtuple collecting the states, actions, interms, and rewards tensors of a simulated trajectory.
Methods and attributes:
- _fields = ('states', 'actions', 'interms', 'rewards')
- _asdict() – Return a new OrderedDict which maps field names to their values.
- classmethod _make(iterable) – Make a new Trajectory object from a sequence or iterable.
- _replace(**kwds) – Return a new Trajectory object replacing specified fields with new values.
Fields:
- states – Alias for field number 0
- actions – Alias for field number 1
- interms – Alias for field number 2
- rewards – Alias for field number 3
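Since Trajectory is a plain namedtuple, the standard namedtuple helpers apply; a brief, runnable illustration (plain lists stand in for the tf.Tensor collections used in practice):

    from tfplan.planners.stochastic.simulation import Trajectory

    # Plain lists stand in for the tf.Tensor collections used in practice.
    traj = Trajectory(states=[0.0], actions=[1.0], interms=[], rewards=[2.0])
    print(traj.rewards)                   # field access by name (field number 3)
    print(traj._asdict())                 # OrderedDict of field name -> value
    traj2 = traj._replace(rewards=[3.0])  # copy with one field swapped
    traj3 = Trajectory._make(([0.0], [1.0], [], [2.0]))  # from any 4-element iterable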
class tfplan.planners.stochastic.simulation.SimulationCell(compiler, policy, config=None)
Bases: tensorflow.python.ops.rnn_cell_impl.RNNCell

The SimulationCell class implements an RNN cell that simulates the next state and reward of the MDP transition given by the RDDL model.

Parameters:
- compiler (rddl2tf.compilers.ReparameterizationCompiler) – The RDDL2TF compiler.
- policy (tfplan.train.OpenLoopPolicy) – The state-independent policy (e.g., a plan).
- config (Dict[str, Any]) – A config dict.
Properties:
- state_size – Returns the MDP state size.
- action_size – Returns the MDP action size.
- interm_size – Returns the MDP intermediate state size.
- output_size – Returns the simulation cell output size.
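For intuition, the contract that SimulationCell fulfills is the standard TF 1.x RNNCell interface, with the recurrent state playing the role of the MDP state and the per-step output carrying the simulation results. A toy analogue of that contract (not tfplan's implementation):

    import tensorflow as tf  # TF 1.x API, matching the RNNCell base class above

    class ToyMDPCell(tf.nn.rnn_cell.RNNCell):
        """Toy cell: state' = state + action, reward = -||state'||^2 (illustrative only)."""

        def __init__(self, state_dim):
            super(ToyMDPCell, self).__init__()
            self._state_dim = state_dim

        @property
        def state_size(self):
            return self._state_dim  # analogous to SimulationCell.state_size

        @property
        def output_size(self):
            return 1  # per-step reward, one scalar per batch entry

        def call(self, inputs, state):
            next_state = state + inputs  # inputs play the role of actions
            reward = -tf.reduce_sum(tf.square(next_state), axis=-1, keepdims=True)
            return reward, next_state

    # Unrolling the cell over a horizon of actions yields per-step rewards:
    cell = ToyMDPCell(state_dim=2)
    actions = tf.placeholder(tf.float32, [None, 10, 2])  # (batch, horizon, action_dim)
    initial = tf.placeholder(tf.float32, [None, 2])
    rewards, final_state = tf.nn.dynamic_rnn(cell, actions, initial_state=initial)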
class tfplan.planners.stochastic.simulation.Simulator(compiler, policy, config)
Bases: object

The Simulator class implements an RNN-based trajectory simulator for the RDDL model.

Parameters:
- compiler (rddl2tf.compilers.DefaultCompiler) – The RDDL2TF compiler.
- policy (tfplan.train.OpenLoopPolicy) – The state-independent policy (e.g., a plan).
- config (Dict[str, Any]) – A config dict.
Properties:
- graph – Returns the compiler’s graph.
- batch_size – Returns the policy’s batch size.
- horizon – Returns the policy’s horizon.

Methods:
- build() – Builds the reparameterized recurrent cell.
- trajectory(initial_state, sequence_length=None) – Returns the state-action-reward trajectory induced by the given initial state and policy.
  Parameters:
  - initial_state (Sequence[tf.Tensor]) – The trajectory’s initial state.
  - sequence_length (tf.Tensor(shape=(batch_size,))) – An integer vector defining the trajectories’ number of timesteps.
  Returns:
  - trajectory (Trajectory) – The collection of states-actions-interms-rewards tensors.
  - final_state (Sequence[tf.Tensor]) – The trajectory’s final state.
  - total_reward (tf.Tensor(shape=(batch_size,))) – The trajectory’s total reward.
- run(trajectory) – Evaluates the given trajectory.
- classmethod timesteps(batch_size, horizon) – Returns the batch-sized increasing-horizon timesteps tensor.
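The exact shape of the timesteps tensor is not documented here; one plausible reading of the contract (the shape and ordering are assumptions) can be reconstructed in NumPy as:

    import numpy as np

    def timesteps(batch_size, horizon):
        """Assumed contract: one increasing timestep column per batch entry,
        shape (batch_size, horizon, 1)."""
        t = np.arange(horizon, dtype=np.float32).reshape(1, horizon, 1)
        return np.tile(t, (batch_size, 1, 1))

    print(timesteps(2, 4)[0, :, 0])  # -> [0. 1. 2. 3.]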
tfplan.planners.stochastic.straightline module
class tfplan.planners.stochastic.straightline.StraightLinePlanner(rddl, config)
Bases: tfplan.planners.stochastic.StochasticPlanner

The StraightLinePlanner class implements an online gradient-based planner that chooses the next action based on a lower bound on the value function of the start state.

Parameters:
- rddl (str) – A RDDL domain/instance filepath or rddlgym id.
- config (Dict[str, Any]) – The planner config dict.
Methods:
- _build_policy_ops()
- _build_trajectory_ops()
- _build_loss_ops()
- _build_summary_ops()
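For intuition, the straight-line idea, optimizing a single open-loop plan against sampled stochastic trajectories, can be sketched independently of tfplan on a toy 1-D system (this illustrates the technique, not tfplan's implementation):

    import tensorflow as tf  # TF 1.x

    # Straight-line planning sketch: one open-loop plan a[0..H-1] optimized
    # by gradient descent against B sampled noise scenarios.
    H, B = 10, 64
    plan = tf.Variable(tf.zeros([H]))              # the single action sequence
    noise = tf.random_normal([B, H], stddev=0.1)   # resampled on every run

    x = tf.zeros([B])
    cost = 0.0
    for t in range(H):
        x = x + plan[t] + noise[:, t]               # stochastic dynamics x' = x + a + eps
        cost += tf.reduce_mean(tf.square(x - 1.0))  # track the target state 1.0

    train_op = tf.train.GradientDescentOptimizer(0.05).minimize(cost)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for _ in range(200):
            sess.run(train_op)                     # each step sees fresh scenarios
        print(sess.run(plan))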
tfplan.planners.stochastic.utils module
Collection of reparameterization utility functions.
Functions:
- tfplan.planners.stochastic.utils.cell_size(sizes)
- tfplan.planners.stochastic.utils.to_tensor(fluents)
- tfplan.planners.stochastic.utils.get_noise_samples(reparameterization_map, batch_size, horizon)
- tfplan.planners.stochastic.utils.encode_noise_samples_as_inputs(samples)
- tfplan.planners.stochastic.utils.decode_inputs_as_noise_samples(inputs, encoding)
- tfplan.planners.stochastic.utils.evaluate_noise_samples_as_inputs(sess, samples)
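These utilities support the reparameterization trick that gives the package its name: random variables are rewritten as deterministic transforms of exogenous noise so that gradients can flow through sampling. A generic illustration of the trick (not these functions' actual behavior):

    import tensorflow as tf  # TF 1.x

    # Reparameterization trick: sample X ~ Normal(mu, sigma) as mu + sigma * eps,
    # where eps ~ Normal(0, 1) is exogenous noise outside the learned parameters.
    mu = tf.Variable(0.5)
    sigma = tf.Variable(1.0)
    eps = tf.random_normal([128, 40])  # (batch_size, horizon) noise samples
    x = mu + sigma * eps               # differentiable w.r.t. mu and sigma
    loss = tf.reduce_mean(tf.square(x))
    grad_mu, grad_sigma = tf.gradients(loss, [mu, sigma])  # gradients flow through sampling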
Module contents
class tfplan.planners.stochastic.StochasticPlanner(rddl, compiler_cls, config)
Bases: tfplan.planners.planner.Planner

The StochasticPlanner abstract class implements the basic methods shared by online stochastic gradient-based planners.

Parameters:
- rddl (str) – A RDDL domain/instance filepath or rddlgym id.
- compiler_cls (rddl2tf.Compiler) – The RDDL-to-TensorFlow compiler class.
- config (Dict[str, Any]) – The planner config dict.
Methods and properties:
- build() – Builds the planner.
- _build_policy_ops()
- _build_trajectory_ops()
- _build_loss_ops()
- _build_summary_ops()
- _build_init_ops()
- _build_initial_state_ops()
- _build_sequence_length_ops()
- _build_optimization_ops()
- _get_batch_initial_state(state)
- _get_action(actions, feed_dict)
- horizon (property)
- epochs(timestep)
- run(timestep, feed_dict)
- save_stats()
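Concrete subclasses are meant to be used in an online loop: at each environment step the planner re-optimizes its plan and returns the next action. A sketch of such a loop using the interface above (the rddlgym environment calls and the feed_dict contents are assumptions):

    import rddlgym
    from tfplan.planners.stochastic.straightline import StraightLinePlanner

    # Online planning loop sketch; the rddlgym environment API used below
    # and the feed_dict contents are assumptions.
    config = {"horizon": 40}  # assumed minimal config
    planner = StraightLinePlanner("Navigation-v2", config)
    planner.build()

    env = rddlgym.make("Navigation-v2", mode=rddlgym.GYM)  # assumed rddlgym call
    state, timestep = env.reset(), 0
    done = False
    while not done:
        action = planner.run(timestep, feed_dict={})  # assumed feed_dict contents
        state, reward, done, info = env.step(action)
        timestep += 1

    planner.save_stats()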