tfplan.train package¶
Submodules¶
tfplan.train.optimizer module¶
-
class
tfplan.train.optimizer.
ActionOptimizer
(config)¶ Bases:
object
ActionOptimizer wraps a TensorFlow optimizer.
Parameters: config (Dict[str, Any]) – The optimizer config dict.
-
build
()¶ Builds the underlying optimizer.
-
compute_gradients
(loss)¶
-
apply_gradients
(grads_and_vars)¶
-
minimize
(loss)¶ Returns the train op corresponding to the loss minimization.
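The three methods above mirror the usual optimizer interface, in which minimizing a loss amounts to computing gradients and then applying them. The sketch below illustrates that relationship without any framework. It is not tfplan's implementation: `ToyOptimizer`, the `learning_rate` config key, and the finite-difference gradients are all assumptions made for illustration, and the step runs eagerly instead of returning a train op.

```python
from typing import Callable, Dict, List, Tuple

Variables = Dict[str, float]
Loss = Callable[[Variables], float]


class ToyOptimizer:
    """Hypothetical, framework-free stand-in; not tfplan's ActionOptimizer."""

    def __init__(self, config: Dict[str, float]) -> None:
        # "learning_rate" is an assumed config key, chosen for illustration.
        self.learning_rate = config.get("learning_rate", 0.1)

    def compute_gradients(self, loss: Loss, variables: Variables,
                          eps: float = 1e-6) -> List[Tuple[float, str]]:
        # Central finite differences stand in for automatic differentiation.
        grads_and_vars = []
        for name, value in variables.items():
            up = {**variables, name: value + eps}
            down = {**variables, name: value - eps}
            grads_and_vars.append(((loss(up) - loss(down)) / (2 * eps), name))
        return grads_and_vars

    def apply_gradients(self, grads_and_vars: List[Tuple[float, str]],
                        variables: Variables) -> None:
        # Plain gradient descent step on each (gradient, variable) pair.
        for grad, name in grads_and_vars:
            variables[name] -= self.learning_rate * grad

    def minimize(self, loss: Loss, variables: Variables) -> None:
        # One step: compute the gradients, then apply them.
        self.apply_gradients(self.compute_gradients(loss, variables), variables)


# Minimize (x - 3)^2 starting from x = 0.
optimizer = ToyOptimizer({"learning_rate": 0.1})
variables = {"x": 0.0}
for _ in range(100):
    optimizer.minimize(lambda v: (v["x"] - 3.0) ** 2, variables)
```

After the loop, `variables["x"]` is close to 3.0, the minimizer of the toy loss.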
-
tfplan.train.policy module¶
-
class
tfplan.train.policy.
OpenLoopPolicy
(compiler, horizon, parallel_plans=True)¶ Bases:
object
OpenLoopPolicy returns an action independently of the current state.
Note
It uses the current state only for constraining the bounds of each action fluent.
Parameters: - compiler (
rddl2tf.compiler.Compiler
) – An RDDL2TensorFlow compiler.
- batch_size (int) – The simulation batch size.
- horizon (int) – The number of timesteps.
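To make the open-loop idea concrete, here is a minimal, framework-free sketch. The planned action is looked up by timestep alone, and the state is consulted only to obtain the bounds. Everything in it is hypothetical: `ToyOpenLoopPolicy`, `release_bounds`, and the use of simple clipping are illustrative assumptions, not tfplan's API or its way of enforcing bounds.

```python
from typing import Callable, Optional, Sequence, Tuple

Bounds = Tuple[Optional[float], Optional[float]]


class ToyOpenLoopPolicy:
    """Hypothetical illustration; not tfplan's OpenLoopPolicy."""

    def __init__(self, plan: Sequence[float]) -> None:
        # One scalar action per timestep; len(plan) plays the role of the horizon.
        self.plan = list(plan)

    def action(self, state: float, timestep: int,
               bounds_fn: Callable[[float], Bounds]) -> float:
        # The planned action depends only on the timestep...
        value = self.plan[timestep]
        # ...and the state enters only through the bounds.
        lower, upper = bounds_fn(state)
        if lower is not None:
            value = max(value, lower)
        if upper is not None:
            value = min(value, upper)
        return value


def release_bounds(level: float) -> Bounds:
    # Made-up precondition: cannot release more water than the current level.
    return (0.0, level)


policy = ToyOpenLoopPolicy([1.0, 5.0, 2.0])
unconstrained = policy.action(state=10.0, timestep=1, bounds_fn=release_bounds)
constrained = policy.action(state=3.0, timestep=1, bounds_fn=release_bounds)
```

With a level of 10.0 the planned value 5.0 passes through unchanged. With a level of 3.0 the same planned value is capped at 3.0.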
-
graph
¶ Returns the compiler’s graph.
-
batch_size
¶ Returns the compiler’s batch size.
-
build
(scope, initializers=None)¶ Builds the policy.
-
_build_policy_variables
(initializers=None)¶ Builds the policy variables for each action fluent.
-
_build_warm_start_op
()¶
-
_get_policy_variable
(fluent, fluent_shape, initializer=None)¶ Returns the corresponding policy variable for fluent with fluent_shape.
Parameters: - fluent (str) – The fluent name.
- fluent_shape (Sequence[int]) – The fluent shape.
Returns: The policy variable for the action fluent.
Return type: tf.Tensor
-
static
_get_action_tensor
(policy_variable, bounds)¶ Returns the action tensor for policy_variable, with its domain constrained by the action fluent's precondition bounds.
Parameters: - policy_variable (tf.Tensor) – The policy variable.
- bounds (Tuple[Optional[rddl2tf.core.fluent.TensorFluent], Optional[rddl2tf.core.fluent.TensorFluent]]) – The (lower, upper) bounds.
Returns: The action fluent tensor.
Return type: tf.Tensor
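Because either bound may be absent, a bounded action has four cases to handle. One common way to map an unconstrained variable into such a domain while keeping the mapping differentiable is a sigmoid for two-sided bounds and an exponential for one-sided bounds. The sketch below shows that technique on plain floats. It is an assumption for illustration only: tfplan may use a different mapping, and `constrain` and `sigmoid` are hypothetical helpers.

```python
import math
from typing import Optional, Tuple

Bounds = Tuple[Optional[float], Optional[float]]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def constrain(value: float, bounds: Bounds) -> float:
    """Map an unconstrained value into the domain given by (lower, upper)."""
    lower, upper = bounds
    if lower is not None and upper is not None:
        # Two-sided: squash into [lower, upper].
        return lower + (upper - lower) * sigmoid(value)
    if lower is not None:
        # Lower bound only: stay above lower.
        return lower + math.exp(value)
    if upper is not None:
        # Upper bound only: stay below upper.
        return upper - math.exp(value)
    # No bounds: pass the value through unchanged.
    return value


both = constrain(0.0, (2.0, 6.0))
lower_only = constrain(0.0, (1.0, None))
upper_only = constrain(0.0, (None, 1.0))
unbounded = constrain(7.5, (None, None))
```

An input of 0.0 lands at the midpoint of a two-sided interval, and at a distance of 1.0 from a one-sided bound.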
- compiler (