The PheonixRT algorithm iteratively improves the plan based on a comparison between the actual and goal DVH curves for all terms in the prescription. The comparison serves as the objective function for the numerical optimization algorithm. The comparison metric is the Kullback-Liebler Divergence (also known as relative entropy), which compares two probability distributions and determines how similar they are.
In this case, the probability distributions are the DVH curves in differential form. For a clinical user, the cumulative form of the DVH may be the most familiar for evaluating the dose distribution and for specifying the goal DVH curve to be matched. However, there are advantages to using the DVH in the differential form, at least for the internal objective function calculation.
EditMatching Dose PDF Using Information Theory
The differential DVH, when normalized, is essentially the dose probability density function (PDF). The optimization objective function, when formulated as a dose PDF match, can be built from a number of tools in the information theory toolbox. The Pheonix algorithm uses the Kullback-Liebler (K-L) divergence to compare the actual PDF to the goal PDF

Figure 2
The K-L divergence is zero if both PDFs match exactly, and becomes larger as they become less similar. Using the K-L divergence within the context of optimization confers the advantage of “focusing” the optimization on the parts of the PDF for which changes can be the most profitable. In Figure 2, the PDF region with the highest K-L divergence term is the region where the actual PDF has significant density that differs substantially from the goal PDF. This is the part of the actual PDF where the optimization should concentrate in order to best meet the goal.
The gradient of the K-L divergence can be readily derived. This allows for the use of gradient-based optimization techniques, which can provide significant performance benefits for high-dimensional optimization problems. The gradient is a function, via the chain rule, of the gradient of the actual PDF. Therefore an efficient means is needed to calculate the gradient of the PDF w.r.t. the optimization parameters.
A number of methods are available to estimate the actual PDF. The simplest is to compute the histogram and then normalize the resulting curve. This is computationally efficient, but provides little help with calculating the gradient. A more sophisticated method for estimating a PDF is Parzen windowing (Duda and Hart 1973). Given a set of samples, a weighting kernel (typically a gaussian) is used to represent the probability distribution of each individual sample. The estimated PDF is formed by summing the weighting kernels for all of the sample values. The Parzen window provides an efficient means of calculating the gradient of the PDF, by applying the chain rule a second time with the derivative of the weighting kernel.
EditAdaptive Parzen Window Optimization
When using a Parzen window to estimate a PDF, the accuracy of the resulting estimate depends on how well the weighting kernel captures the sources of uncertainty for the individual samples. For example, if the PDF of an image is being estimated, sources of uncertainty could include noise in the image formation process, or uncertainty introduced by processes that have operated on the image.
In the case of IMRT inverse planning, the “image” is the spatial dose distribution, and the main process that operates on it is the optimization itself. The key concept is that numerical optimization can be viewed as a type of estimation process. The value of the optimization parameters at each iteration are progressively better estimates of the global optimal parameter values. These estimates have associated uncertainty. The uncertainty is large at the beginning of the optimization, and decreases to a minimum as the estimates approach the global optimum.

Figure 3
Figure 3 is a simple illustration of the concept for a hypothetical four-iteration optimization. The objective function “landscape” is shown as light gray isocurves. For each iteration the current parameter value is indicated by the corresponding numeral. Each of the successive parameter values, as an estimate of the global optimum, has an associated uncertainty (shown as a broken error radius around the parameter value). As the optimization proceeds, the error radius shrinks.
To model this change in uncertainty using Parzen windowing, the weighting kernel width is chosen to be large at the beginning of the optimization, and is reduced during the course of the optimization to a minimum at the time of convergence.
A potential problem arises from this strategy. Modifying the objective function during the course of optimization can adversely affect the directional conjugacy of those directions that have already been optimized. To address this issue, the Pheonix algorithm combines the self-correcting properties of some formulations of conjugate gradient optimization (i.e. the Polak-Ribiere formula) with a special update that occurs between each iteration.
In addition, a multi-resolution optimization strategy is employed, with the weighting kernel width updated along with spatial resolution at each level of the optimization. The optimization schedule used for preliminary results is provided in Table 1.
Table 1. Example multi-resolution optimization schedule| Scale | Resolution (mm) | σ (Gy) |
| G3 | 16 | 20 |
| G2 | 8 | 10 |
| G1 | 4 | 5 |
| G0 | 2 | 2.5 |
EditSummary
Coupling progressive increases in resolution with adaptation of the weighting kernel width is an effective optimization strategy because it matches up the optimization’s “focus” on the dose axis (as depicted in Figure 2) with the spatial focus of the current level’s resolution. At the beginning of the optimization, the wide weighting kernel spreads the estimated actual PDF, expanding the focus over a broader range of dose values. Concurrently, the low spatial resolution focuses the optimization on satisfying the large-scale geometry of the targets and OARs. Together, these two factors expedite the early stages of the optimization, when dose values tend to be adjusted by significant amounts over large regions.
Later in the optimization, the higher spatial resolution focuses the optimization on smaller-scale spatial features, while the narrower weighting kernel facilitates finer adjustments to the actual PDF. This adaptation of the focus of the optimization, both spatially and on the dose axis, should help reduce the influence of local minima on the optimization, resulting in a high-performance, steerable algorithm.
This approach to probabilistically-motivated optimization problems, adaptive Parzen window optimization, can be readily applied to other optimization problems where the objective function is expressed in terms of PDFs. These include mutual information image registration, probabilistic approaches to segmentation, and inverse planning using probabilistic biological indices.