Algorithm AS 296: Optimal Median Smoothing

The paper studies running medians for a window of odd size $K=2k+1$, replacing

$$ X_i \mapsto \operatorname{median}(X_{i-k},\ldots,X_{i+k}). $$

Because the median is robust, this smoothing reveals structure in noisy series better than running averages. The authors present an efficient algorithm based on a two-heap “partitioning heaps” data structure and argue optimality up to constants.
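As a minimal, unoptimized illustration of the definition above (not the paper's algorithm), a running median over interior points can be computed directly; the function name and spike example here are illustrative:

```python
import statistics

def running_median(x, k):
    """Running median with window size K = 2k + 1, interior points only."""
    return [statistics.median(x[i - k : i + k + 1])
            for i in range(k, len(x) - k)]

# A spike (9) is removed, while the level shift from 1 to 5 is preserved --
# a running mean would smear both.
series = [1, 1, 1, 9, 1, 5, 5, 5]
print(running_median(series, 1))  # [1, 1, 1, 5, 5, 5]
```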

They first contrast two baselines.
(1) Recompute each window’s median via a fast selection algorithm: at most $3K$ comparisons per window, totaling $3K(N-K)$.
(2) Maintain the window in sorted order: deletion and insertion by binary search take $O(\log K)$ comparisons but pointer maintenance dominates, yielding about $(2\log K + cK/2)(N-K)$ steps for some constant $c$.
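Baseline (2) can be sketched with binary search on a sorted Python list; this is only an illustration of the cost structure, not the paper's implementation. The binary searches cost $O(\log K)$ comparisons, but `pop` and `insort` shift $O(K)$ elements on average, which is the $cK/2$ term:

```python
import bisect

def running_median_sorted(x, k):
    """Baseline (2): keep the window sorted, update by binary search."""
    K = 2 * k + 1
    window = sorted(x[:K])
    out = [window[k]]                     # median is the middle element
    for i in range(K, len(x)):
        # Delete the outgoing element, then insert the incoming one;
        # both shift O(K) list entries despite the O(log K) searches.
        window.pop(bisect.bisect_left(window, x[i - K]))
        bisect.insort(window, x[i])
        out.append(window[k])
    return out

print(running_median_sorted([1, 1, 1, 9, 1, 5, 5, 5], 1))  # [1, 1, 1, 5, 5, 5]
```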

The proposed method stores the current window in an array

$$ H_{-k},\ldots,H_{-1},H_0,H_1,\ldots,H_k $$

with $H_0=X_i$. The negative indices form a max-heap of elements $\leq X_i$; the positive indices form a min-heap of elements $\geq X_i$. No order is imposed within levels; overall depth is at most $2(1+\log k)$. Two permutation arrays map between window positions and heap indices.

To advance the window, the outgoing $X_{i-k}$ is removed by propagating a “hole” upward through parent indices $\lfloor j/2\rfloor$ to the heap apex; the incoming $X_{i+k+1}$ is placed on the appropriate side (decided by a comparison with $X_i$) and trickled down to restore the heap property. Each update needs at most $2\log k$ comparisons and $4\log k$ pointer updates. Overall complexity is bounded by
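A working two-heap update can be sketched with Python's `heapq`. One caveat: `heapq` cannot delete an interior element, so this sketch removes the outgoing element *lazily* (marking it and discarding it when it surfaces at a heap top) rather than in place via the paper's permutation arrays and hole propagation; the partition around the median and the $O(\log K)$ amortized update are the same idea. All names are illustrative:

```python
import heapq
from collections import Counter

def running_median_heaps(x, k):
    """Sliding-window medians for window size K = 2k + 1 via two heaps:
    `lo` (a max-heap, stored negated) holds the lower k+1 live elements,
    `hi` (a min-heap) the upper k, so the median is always lo's top."""
    K = 2 * k + 1
    lo, hi = [], []
    delayed = Counter()          # values marked for deferred deletion
    lo_live = hi_live = 0        # counts of live (non-deleted) elements

    def prune(heap, sign):
        # Discard deleted elements sitting at the top of `heap`.
        while heap and delayed[sign * heap[0]] > 0:
            delayed[sign * heap[0]] -= 1
            heapq.heappop(heap)

    def rebalance():
        nonlocal lo_live, hi_live
        if lo_live > hi_live + 1:                  # lo too big: max -> hi
            heapq.heappush(hi, -heapq.heappop(lo))
            lo_live -= 1; hi_live += 1
            prune(lo, -1)
        elif lo_live < hi_live:                    # hi too big: min -> lo
            heapq.heappush(lo, -heapq.heappop(hi))
            hi_live -= 1; lo_live += 1
            prune(hi, 1)

    def insert(v):
        nonlocal lo_live, hi_live
        if not lo or v <= -lo[0]:
            heapq.heappush(lo, -v); lo_live += 1
        else:
            heapq.heappush(hi, v); hi_live += 1
        rebalance()

    def erase(v):
        nonlocal lo_live, hi_live
        delayed[v] += 1
        if v <= -lo[0]:
            lo_live -= 1
            if v == -lo[0]:
                prune(lo, -1)
        else:
            hi_live -= 1
            if hi and v == hi[0]:
                prune(hi, 1)
        rebalance()

    out = []
    for i, v in enumerate(x):
        insert(v)
        if i >= K:
            erase(x[i - K])
        if i >= K - 1:
            out.append(-lo[0])   # median of the current window
    return out

print(running_median_heaps([1, 1, 1, 9, 1, 5, 5, 5], 1))  # [1, 1, 1, 5, 5, 5]
```

Lazy deletion leaves stale entries in the heaps, so this sketch trades the paper's tight $2K$-element memory bound for implementation simplicity; the in-place hole propagation avoids that at the cost of the two permutation arrays.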

$$ 4K + (3+4d)\log\!\left(\tfrac{K}{2}\right)(N-K), $$

with $d$ the relative cost of a division in pointer arithmetic. Empirically the average cost per median is proportional to $6\log k$. Round-off is irrelevant because the algorithm performs comparisons and integer pointer operations only.

For optimality, they note a trivial lower bound of $2(N-K)$ total steps via segmentation and the result that finding a single median requires $2K$ comparisons, and cite a lower bound showing any correct algorithm needs $\Omega((\log K)(N-K))$ steps; thus their method is optimal up to constant factors. They remark it is hard to imagine fewer than $2\log(K/2)$ steps per update in a realizable algorithm.

Experiments smooth $N=16000$ i.i.d. $\mathrm{Unif}(0,1)$ sequences (Atari RNG), with
$K \in \{7,15,\ldots,4095\}$, repeated 10 times per $K$. Compared with a simplified insertion method whose per-update scan is $O(K)$, the new algorithm is about ten times faster on these tests; performance degrades on initially sorted sequences, because each incoming element must then traverse the full heap depth. For random data, runtime grows with $\log K$, whereas the insertion method grows linearly in $K$.

The paper also documents the program structure and calling interface of the procedure runmed, listing the required arrays and types.

Takeaway. Running medians can be maintained in $O(\log K)$ work per shift using partitioning heaps; this achieves the asymptotically optimal update cost (up to constants) while preserving the robustness benefits of median smoothing.

Reference

W. Härdle and W. Steiger, “Algorithm AS 296: Optimal Median Smoothing,” Journal of the Royal Statistical Society, Series C (Applied Statistics) 44, no. 2 (1995): 258–264. DOI: 10.2307/2986349.

@article{hardle1995,
  title = {Algorithm AS 296: Optimal Median Smoothing},
  volume = {44},
  ISSN = {0035-9254},
  url = {https://www.jstor.org/stable/2986349},
  DOI = {10.2307/2986349},
  number = {2},
  journal = {Journal of the Royal Statistical Society. Series C (Applied Statistics)},
  author = {H{\"a}rdle, W. and Steiger, W.},
  year = {1995},
  pages = {258--264}
}