tfplan.train package

Submodules

class tfplan.train.optimizer.ActionOptimizer(config)

Bases: object

ActionOptimizer wraps a TensorFlow optimizer.

Parameters:	config (Dict[str, Any]) – The optimizer config dict.

minimize(loss): Returns the train op that minimizes the given loss.

class tfplan.train.policy.OpenLoopPolicy(compiler, horizon, parallel_plans=True)

Bases: object

OpenLoopPolicy returns an action independently of the current state.

Note

It uses the current state only for constraining the bounds of each action fluent.

Parameters:	compiler (`rddl2tf.compiler.Compiler`) – An RDDL2TensorFlow compiler.
	batch_size (int) – The simulation batch size.
	horizon (int) – The number of timesteps.
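The open-loop idea described above can be illustrated with a minimal, framework-free sketch. It is not the tfplan implementation: `ToyOpenLoopPolicy`, its `plan` argument, and the fluent names `move` and `x` are hypothetical names introduced only for this example. The sketch shows that the returned action depends on the timestep alone, while the state is accepted but ignored.

```python
from typing import Dict, List, Sequence


class ToyOpenLoopPolicy:
    """Hypothetical stand-in for an open-loop policy: one stored action per timestep."""

    def __init__(self, plan: Sequence[Dict[str, float]]) -> None:
        # plan[t] maps each action fluent name to its value at timestep t.
        self.plan: List[Dict[str, float]] = list(plan)

    def __call__(self, state: Dict[str, float], timestep: int) -> Dict[str, float]:
        # The state is deliberately unused: the action is looked up by timestep only.
        return self.plan[timestep]


# A two-step plan (horizon = 2) for a single hypothetical action fluent "move".
policy = ToyOpenLoopPolicy([{"move": 0.5}, {"move": -1.0}])

a0 = policy({"x": 0.0}, 0)         # action at t=0 from one state
a0_other = policy({"x": 99.0}, 0)  # same timestep, different state
a1 = policy({"x": 0.0}, 1)         # action at t=1
```

In this toy version the state plays no role at all. According to the note above, the actual class still uses the current state to constrain the bounds of each action fluent.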

_build_policy_variables(initializers=None): Builds the policy variables for each action fluent.

_get_policy_variable(fluent, fluent_shape, initializer=None)

Returns the corresponding policy variable for fluent with fluent_shape.

Parameters:	fluent (str) – The fluent name.
	fluent_shape (Sequence[int]) – The fluent shape.
Returns:	The policy variable for the action fluent.
Return type:	tf.Tensor

static _get_action_tensor(policy_variable, bounds)

Returns the action tensor for policy_variable, with its domain constrained by the action fluent precondition bounds.

Parameters:	policy_variable (tf.Tensor) – The policy variable.
	bounds (Tuple[Optional[rddl2tf.core.fluent.TensorFluent], Optional[rddl2tf.core.fluent.TensorFluent]]) – The (lower, upper) bounds.
Returns:	The action fluent tensor.
Return type:	tf.Tensor
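One common way to map an unconstrained variable into a domain given by optional (lower, upper) bounds is sketched below with plain Python floats. This is an assumption made for illustration, not a statement of how `_get_action_tensor` is implemented. The helper names `stable_sigmoid` and `constrain_action` are hypothetical, as is the choice of a sigmoid squash for two-sided bounds and an exponential offset for one-sided bounds.

```python
import math
from typing import Optional


def stable_sigmoid(x: float) -> float:
    """Logistic function written to avoid overflow for large |x|."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def constrain_action(variable: float,
                     lower: Optional[float],
                     upper: Optional[float]) -> float:
    """Map an unconstrained variable into the domain given by optional bounds.

    Assumed scheme (hypothetical, for illustration only):
      both bounds -> lower + (upper - lower) * sigmoid(variable)
      lower only  -> lower + exp(variable)
      upper only  -> upper - exp(variable)
      no bounds   -> variable unchanged
    """
    if lower is not None and upper is not None:
        return lower + (upper - lower) * stable_sigmoid(variable)
    if lower is not None:
        return lower + math.exp(variable)
    if upper is not None:
        return upper - math.exp(variable)
    return variable


both = constrain_action(0.0, -1.0, 3.0)        # midpoint of [-1, 3]
lower_only = constrain_action(0.0, 2.0, None)  # 2 + exp(0)
upper_only = constrain_action(0.0, None, 2.0)  # 2 - exp(0)
unbounded = constrain_action(0.7, None, None)  # passed through unchanged
```

Because every branch is differentiable in `variable`, a gradient-based optimizer can update the unconstrained variable freely while the resulting action stays inside its bounds.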