# Generalized Planning as Heuristic Search

Heuristic search is one of the most successful approaches to classical planning. Unfortunately, it is not straightforward to adopt state-of-the-art search algorithms and heuristics from classical planning to Generalized Planning (GP). The planning as heuristic search approach traditionally addresses the computation of sequential plans with a grounded state-space search. Generalized planners reason however about the synthesis of algorithm-like solutions that, in addition to action sequences, contain branching and looping constructs. Furthermore, GP aims to synthesize solutions that generalize to a (possibly infinite) set of planning instances (where the domain of variables may be large), making the grounding of classical planners unfeasible.

## Classical Planning

In this work, we define a classical planning instance as a tuple $P = \langle X, A, I, G\rangle$, where $X$ is a set of state variables and each $x\in X$ has a domain $D_x$ , $A$ is a set of actions parameterized by a subset of state variables, $I \in S$ is an initial state where all variables have an assigned value, and $G$ is a goal condition on the state variables that induces the subset of goal states. A solution to $P$ is a plan $\pi = \langle a_1, \ldots, a_m \rangle$ of $m$ actions that applied in the initial state $s_0=I$, induces a trajectory of states and actions $s_0, a_1, \ldots, a_m, s_m $ such that the goal condition holds in the last state, i.e., $G\subseteq s_m$.

## Generalized Planning and Planning Programs

Generalized Planning aims at computing algorithm-like solutions, named generalized plans, which in addition to action sequences contain branching and looping constructs, that apply to a possibly infinite class of planning instances [6]. This work builds on top of the inductive formalism of GP, where a GP problem with shared state variables is defined as a finite set of $T$ classical planning instances, i.e., ${\cal P} = { P_1, \ldots, P_T }$ where each classical planning instance, i.e. $P_1 = \langle X,A,I_1,G_1 \rangle$, $\ldots$, $P_T = \langle X,A,I_T,G_T \rangle$, shares the set of state variables $X$ and actions $A$, but may differ in the initial state $I_i$ or goal condition $G_i$.

We represent GP solutions as *planning programs* [5],
which are defined as a sequence of instructions, i.e.
$\Pi = \langle w_0,\ldots,w_{n-1} \rangle$, where each instruction
is associated to a *programming line* $0\leq i < n$ and is either:

- A
*planning action*$w_i \in A$ - A
*goto instruction*$w_i = go(i’,!y)$, such that $0\leq i’ < i$ or $i+1 < i’ < n$ and $y$ is a proposition. - A
*termination instruction*$w_i = end$

The execution of a planning program requires a *program state* $(s,i)$,
to indicate the current classical planning state $s$ and program line $i$.
If the instruction $w_i$ is a classical planning action, the effects of the
action are applied in the state and the program line advances to the next instruction,
i.e. $(s\oplus w_i, i+1)$. If the instruction is a goto, the new program state
is $(s,i+1)$ if $y$ holds, otherwise it is $(s,i’)$. If it is a terminal instruction,
the execution terminates.

## A Tractable Representation of Planning Programs

Planning programs bounded in the number of lines $n$, where the last line is always a termination instruction $w_{n-1}=end$, can be compactly represented with three bit-vectors:

- The
*action vector*of length $(n-1)\times|A|$, indicating whether action $a$ appears on line $0\leq i < n-1$. - The
*transition vector*of length $(n-1)\times (n-2)$, indicating whether $go(i’,*)$ appears on line $0\leq i < n-1$. - The
*proposition vector*of length $(n-1)\times \sum_{x\in X} |D_x|$, indicating whether $go(*,!\langle x=v \rangle)$ appears on line $0\leq i < n-1$.

A planning program is encoded as the concatenation of these three bit-vectors. The length of the resulting vector is:

\[(n-1) \left( |A|+ (n-2) + \sum_{x\in X}|D_x| \right).\]Note that the proposition vector is *intractable* when $D_x$ is very large
or even infinite, which is the case for integer variables. In this regard,
we propose *GP with pointers*, where state variables are extended with a set
$Z$ of pointers, each $z\in Z$ with domain $[0,|X|)$, the zero flag $ZF$ and
the carry flag $CF$, to capture the outcome of the last executed instruction.
The set of propositions or feature language is then reduced to the four value
assignment combinations of the $ZF$ and the $CF$.

Thus, in the new representation, the action vector length
is $(n-1)\times|A_Z’|$, where $A_Z’$ is the set $A$ parameterized
over pointers, and the proposition vector length is $4$, which is
the size of the feature language.
This more succinct representation of planning programs results in a vector
which now is *tractable* in the space of pointers:

The following is an example of a generalized plan for solving any problem for reversing a list of numbers:

```
0. set(j,tail)
1. swap(*i,*j)
2. dec(j)
3. inc(i)
4. cmp(j,i)
5. goto(1, not (not ZF and not CF))
6. end
```

The set of pointers are $i$, $j$, $tail$, the zero flag is $ZF$ and the
carry flag is $CF$. Initially, the pointer $j$ is set to the last state variable,
then the content of $i$ and $j$ pointers is swapped, $j$ decreseases by one and
$i$ increases by one, and process is repeated from the *swap instruction* while $j>i$, which
is captured in the $ZF$ and $CF$ after comparing $j$ and $i$.

## Search Algorithm

We implement a Best First Search (BFS) in the space of planning programs with $n$ program lines, and $|Z|$ pointers. The BFS is implemented as a frontier search algorithm, meaning that, to reduce memory requirements, we only store the open list of generated nodes, but not the closed list of expanded nodes. The BFS starts with an empty planning program. To generate a tractable set of successor nodes, child nodes in the search tree are restricted to planning programs that result from programming the maximum line reached after executing the current program on the classical planning instances given as input. This procedure for successor generation guarantees that duplicate successors are not generated.

## Heuristics and Evaluation Functions

Our BFS sequentially expands the best node in the open list. The open list is sorted by evaluation/heuristic functions that exploit two different sources of information:

- The
*program structure*. These are evaluation functions computed in linear time in the size of program $\Pi$.- $f_1(\Pi)$, the number of goto instructions in $\Pi$.
- $f_2(\Pi)$, number of undefined program lines in $\Pi$.
- $f_3(\Pi)$, the number of repeated instructions in $\Pi$.

- The
*program performance*. These functions assess the performance of $\Pi$ executing it on each of the classical planning instances $P_t\in\mathcal{P}$, $1\leq t \leq T$; the execution of a planning program on a classical planning instance is a deterministic procedure that requires no variable instantiation. If the execution of $\Pi$ on an instance $P_t\in \mathcal{P}$ fails, this means that the search node corresponding to the planning program $\Pi$ is a dead-end, and hence it is not added to the open list:- $h_4(\Pi,\mathcal{P})=n-PC^{MAX}$, where $PC^{MAX}$ is the maximum program line that is eventually reached after executing $\Pi$ on all the instances in $\mathcal{P}$.
- $h_5(\Pi,\mathcal{P})=\sum_{P_t\in\mathcal{P}}\sum_{v\in G_t} (s_t[v]-G_t[v])^2$. This function accumulates, for each instance $P_t\in\mathcal{P}$, the
*euclidean distance*of state $s_t$ to the goal state variables $G_t$. The state $s_t$ is obtained applying the sequence of actions $exec(\Pi,P_t)$ to the initial state $I_t$ of that problem $P_t\in\mathcal{P}$. Computing $h_5(\Pi,P_t)$ requires that goals are specified as a partial state. Note that for Boolean variables the squared difference becomes a simple goal counter. - $f_6(\Pi,\mathcal{P}) = \sum_{P_t\in{\mathcal{P}}} |exec(\Pi,P_t)|$, where $exec(\Pi,P_t)$ is the sequence of actions induced from executing the planning program $\Pi$ on the planning instance $P_t$.

Experimentally we obtained the best performance by guiding the BFS with the cost-to-go heuristic function $h_5$ and breaking ties with the structural evaluation function $f_1$.

## Wrap up

This work [1,2] is the first native heuristic search approach for GP that leverages solutions represented as planning programs. We believe that the GP as heuristic search approach builds a strong connection between the two closely related areas of automated planning and program synthesis. With this regard, GP problems and solutions can be represented with a high-level programming language, like C++ [4]. The C++ representation has shown effective to validate GP solutions in large classical planning instances, with several thousands of objects, where off-the-shelf classical planners get stuck in the pre-processing.

A wide landscape of effective techniques, coming from heuristic search and classical planning, promise to improve the base performance of our GP as heuristic search approach. For instance, better estimates may be obtained by exploiting sub-goal information that is not present in the problem representation [3]. What is more, our GP as heuristic search approach provides a path to transfer classical planning techniques, e.g. like the landmark counting heuristics, originally conceived for state-space, to the solution-space search of GP. A promising future research direction is to incorporate into the GP as heuristic search approach all these techniques that have proved successful for classical planning.

## References

[1] Javier Segovia-Aguas, Sergio Jiménez, Anders Jonsson. 2021. Generalized Planning as Heuristic Search. *International Conference on Automated Planning and Scheduling (ICAPS)*: 569-577.

[2] Javier Segovia-Aguas, Sergio Jiménez Celorrio, Anders Jonsson. 2022. Computing Programs for Generalized Planning as Heuristic Search (Extended Abstract). *Thirty-First International Joint Conference on Artificial Intelligence (IJCAI) - Sister Conferences Best Papers Track*: 5334-5338.

[3] Javier Segovia-Aguas, Sergio Jiménez Celorrio, Laura Sebastiá, Anders Jonsson. 2022. Scaling-Up Generalized Planning as Heuristic Search with Landmarks, *Fifteenth International Symposium on Combinatorial Search (SoCS)*: 171-179.

[4] Javier Segovia-Aguas, Yolanda E-Martín, Sergio Jiménez. 2022. Representation and Synthesis of C++ Programs for Generalized Planning. *Sixth Workshop on Generalization in Planning at IJCAI-ECAI (GenPlan)*.

[5] Javier Segovia-Aguas, Sergio Jiménez Celorrio, Anders Jonsson. Computing programs for generalized planning using a classical planner, *Artificial Intelligence Journal (AIJ)* 272: 52-85.

[6] Sergio Jiménez Celorrio, Javier Segovia-Aguas, Anders Jonsson. 2019. A Review of Generalized Planning, *The Knowledge Engineering Review (KER)* 34: e5.