tfplan.train package

Submodules

tfplan.train.optimizer module

class tfplan.train.optimizer.ActionOptimizer(config)

Bases: object

ActionOptimizer wraps a TensorFlow optimizer.

Parameters: config (Dict[str, Any]) – The optimizer config dict.
build()

Builds the underlying optimizer.

compute_gradients(loss)
apply_gradients(grads_and_vars)
minimize(loss)

Returns the train op that minimizes the given loss.
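The three methods above share their names with TensorFlow 1.x optimizers, where minimize() combines compute_gradients() and apply_gradients(). The following is a minimal pure-Python sketch of that relationship, not the tfplan implementation: ToyOptimizer, the "learning_rate" config key, the extra variables argument, and the finite-difference gradients are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Loss = Callable[[List[float]], float]


class ToyOptimizer:
    """Hypothetical stand-in for an optimizer wrapper (plain floats, no tensors)."""

    def __init__(self, config: Dict[str, float]) -> None:
        # "learning_rate" is an assumed config key, not taken from tfplan.
        self.learning_rate = config["learning_rate"]

    def compute_gradients(
        self, loss: Loss, variables: List[float], eps: float = 1e-6
    ) -> List[Tuple[float, float]]:
        """Returns (gradient, value) pairs using central finite differences."""
        grads_and_vars = []
        for i, value in enumerate(variables):
            up = variables[:i] + [value + eps] + variables[i + 1:]
            down = variables[:i] + [value - eps] + variables[i + 1:]
            grad = (loss(up) - loss(down)) / (2 * eps)
            grads_and_vars.append((grad, value))
        return grads_and_vars

    def apply_gradients(self, grads_and_vars: List[Tuple[float, float]]) -> List[float]:
        """Takes one gradient-descent step and returns the updated values."""
        return [value - self.learning_rate * grad for grad, value in grads_and_vars]

    def minimize(self, loss: Loss, variables: List[float]) -> List[float]:
        """One update step: compute the gradients, then apply them."""
        return self.apply_gradients(self.compute_gradients(loss, variables))


def quadratic(x: List[float]) -> float:
    # Toy loss with its minimum at (3, -1).
    return (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2


optimizer = ToyOptimizer({"learning_rate": 0.1})
point = [0.0, 0.0]
for _ in range(100):
    point = optimizer.minimize(quadratic, point)
```

In a graph-based setting, minimize() would return an op to be run repeatedly in a session rather than the updated values themselves; the loop above plays the role of those repeated runs.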

tfplan.train.policy module

class tfplan.train.policy.OpenLoopPolicy(compiler, horizon, parallel_plans=True)

Bases: object

OpenLoopPolicy returns an action independently of the current state.

Note

It uses the current state only for constraining the bounds of each action fluent.

Parameters:
  • compiler (rddl2tf.compiler.Compiler) – An RDDL2TensorFlow compiler.
  • batch_size (int) – The simulation batch size.
  • horizon (int) – The number of timesteps.
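To illustrate what "independently of the current state" means, here is a minimal pure-Python sketch of an open-loop plan: one stored action per timestep, looked up by timestep alone. ToyOpenLoopPlan and the "flow" and "level" names are hypothetical and not part of tfplan. The sketch also omits the bounds handling mentioned in the note above.

```python
from typing import Any, Dict, List


class ToyOpenLoopPlan:
    """Hypothetical illustration: a fixed sequence of actions, one per timestep."""

    def __init__(self, horizon: int, action_names: List[str]) -> None:
        self.horizon = horizon
        # One dict of action values per timestep, all initialized to zero.
        self.plan: List[Dict[str, float]] = [
            {name: 0.0 for name in action_names} for _ in range(horizon)
        ]

    def __call__(self, state: Dict[str, Any], timestep: int) -> Dict[str, float]:
        # The state argument is deliberately ignored: the action depends
        # only on the timestep.
        return dict(self.plan[timestep])


plan = ToyOpenLoopPlan(horizon=3, action_names=["flow"])
plan.plan[1]["flow"] = 0.5

# Two different states at the same timestep yield the same action.
action_a = plan({"level": 10.0}, timestep=1)
action_b = plan({"level": 99.0}, timestep=1)
```

An optimizer would then adjust the stored per-timestep values directly, which is why such a plan needs one set of variables per action fluent and per timestep.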
graph

Returns the compiler’s graph.

batch_size

Returns the compiler’s batch size.

build(scope, initializers=None)

Builds the policy.

_build_policy_variables(initializers=None)

Builds the policy variables for each action fluent.

_build_warm_start_op()
_get_policy_variable(fluent, fluent_shape, initializer=None)

Returns the corresponding policy variable for fluent with shape fluent_shape.

Parameters:
  • fluent (str) – The fluent name.
  • fluent_shape (Sequence[int]) – The fluent shape.
Returns:

The policy variable for the action fluent.

Return type:

tf.Tensor

static _get_action_tensor(policy_variable, bounds)

Returns the action tensor for policy_variable, with its domain constrained by the action fluent's precondition bounds.

Parameters:
  • policy_variable (tf.Tensor) – The policy variable.
  • bounds (Tuple[Optional[rddl2tf.core.fluent.TensorFluent], Optional[rddl2tf.core.fluent.TensorFluent]]) – The (lower, upper) bounds.
Returns:

The action fluent tensor.

Return type:

tf.Tensor
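The documentation does not say how the bounds are enforced. One common way to map an unconstrained variable into an optionally bounded domain is sketched below in pure Python on scalars. The constrain function and the choice of sigmoid and softplus squashing are assumptions for illustration, not a description of what _get_action_tensor actually does.

```python
import math
from typing import Optional


def _sigmoid(v: float) -> float:
    # Numerically stable logistic function.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def _softplus(v: float) -> float:
    # Numerically stable log(1 + exp(v)); always non-negative.
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))


def constrain(value: float, lower: Optional[float], upper: Optional[float]) -> float:
    """Maps an unconstrained value into the (lower, upper) domain.

    Either bound may be None, mirroring the Optional entries of the
    (lower, upper) tuple.
    """
    if lower is not None and upper is not None:
        # Squash into [lower, upper].
        return lower + (upper - lower) * _sigmoid(value)
    if lower is not None:
        # Only bounded from below.
        return lower + _softplus(value)
    if upper is not None:
        # Only bounded from above.
        return upper - _softplus(value)
    # No bounds: pass the value through unchanged.
    return value


midpoint = constrain(0.0, lower=2.0, upper=4.0)
unbounded = constrain(5.0, lower=None, upper=None)
```

Smooth squashing keeps the mapping differentiable, which matters when the variable is trained by gradient descent; hard clipping would also respect the bounds but gives zero gradient outside them.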

Module contents