Algorithm AS 296: Optimal Median Smoothing
The paper studies running medians for a window of odd size $K=2k+1$, replacing
$$ X_i \mapsto \operatorname{median}(X_{i-k},\ldots,X_{i+k}). $$ Because the median is robust, this smoothing reveals structure in noisy series better than running averages. The authors present an efficient algorithm based on a two-heap “partitioning heaps” data structure and argue optimality up to constant factors.
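For concreteness, here is a minimal reference implementation of the smoother itself (my own illustration, not the paper’s algorithm), recomputing each window’s median from scratch:

```python
# Naive running median: recompute median(x[i-k .. i+k]) for every valid
# centre i. Sorting each window costs O(K log K); the baselines and the
# paper's algorithm improve on this.
import statistics

def running_median_naive(x, k):
    """Window size K = 2k + 1; returns the N - K + 1 window medians."""
    return [statistics.median(x[i - k:i + k + 1])
            for i in range(k, len(x) - k)]
```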
They first contrast two baselines.
(1) Recompute each window’s median from scratch with a fast selection algorithm: at most $3K$ comparisons per window, for about $3K(N-K)$ comparisons in total.
(2) Maintain the window in sorted order: the deletion and insertion points are found by binary search in $O(\log K)$ comparisons, but the pointer maintenance needed to keep the window ordered dominates, yielding about $(2\log K + cK/2)(N-K)$ steps for some constant $c$ (sketched below).
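A sketch of baseline (2), assuming a plain Python list as the sorted window (so the $cK/2$ term appears as element shifts):

```python
# Baseline (2): maintain the window in sorted order. bisect locates the
# deletion and insertion points in O(log K) comparisons, but list.pop and
# insort shift O(K) elements per update.
import bisect

def running_median_sorted(x, k):
    K = 2 * k + 1
    window = sorted(x[:K])
    out = [window[k]]
    for i in range(K, len(x)):
        window.pop(bisect.bisect_left(window, x[i - K]))  # drop outgoing element
        bisect.insort(window, x[i])                       # insert incoming element
        out.append(window[k])                             # middle slot is the median
    return out
```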
The proposed method stores the current window in an array
$$ H_{-k},\ldots,H_{-1},H_0,H_1,\ldots,H_k $$ with $H_0=X_i$. The negative indices form a max-heap of elements $\leq X_i$; the positive indices form a min-heap of elements $\geq X_i$. No order is imposed within levels; the overall depth is at most $2(1+\log k)$. Two permutation arrays map between window positions and heap indices.
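The signed indexing keeps the parent/child arithmetic uniform on both sides of the centre. A small illustration (the helper names are mine, not the paper’s):

```python
# Index arithmetic for the signed layout H[-k .. k]: within either half,
# the usual binary-heap rules apply to |j|, with the sign carried along.
def heap_parent(j):
    # Assumes j != 0; slots with |j| == 1 have parent 0, the centre H[0].
    sign = 1 if j > 0 else -1
    return sign * (abs(j) // 2)

def heap_children(j, k):
    # Children of slot j (j != 0) in the same half, clipped to heap size k.
    sign = 1 if j > 0 else -1
    return [sign * c for c in (2 * abs(j), 2 * abs(j) + 1) if c <= k]
```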
To advance the window, the outgoing $X_{i-k}$ is removed by propagating a “hole” upward through parent indices $\lfloor j/2\rfloor$ to the heap apex; the incoming $X_{i+k+1}$ is placed on the appropriate side (decided by a comparison with $X_i$) and trickled down to restore the heap property. Each update needs at most $2\log k$ comparisons and $4\log k$ pointer updates. Overall complexity is bounded by
$$ 4K + (3+4d)\log\!\left(\tfrac{K}{2}\right)(N-K), $$ with $d$ the relative cost of a division in pointer arithmetic. Empirically, the average cost per median is proportional to $6\log k$. Round-off is irrelevant because the algorithm performs only comparisons and integer pointer operations.
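The paper’s permutation-array bookkeeping is intricate. As a hedged stand-in, the sketch below keeps the same lower-half/upper-half split using two heapq heaps with lazy deletion; it is not the paper’s partitioning-heaps structure (deletions are deferred rather than propagated as a hole), but it also does $O(\log K)$ amortized work per shift:

```python
# Two-heap running median with lazy deletion (a stand-in for the paper's
# partitioning heaps). lo is a max-heap (stored negated) holding the lower
# k + 1 live window elements; hi is a min-heap holding the upper k; the
# median is always the live top of lo.
import heapq
from collections import defaultdict

def running_median_lazy(x, k):
    K = 2 * k + 1
    lo, hi = [], []                  # lo: negated max-heap, hi: min-heap
    delayed = defaultdict(int)       # value -> number of pending deletions
    lo_size = hi_size = 0            # counts of live elements only

    def prune(heap, sign):
        # Pop stale entries so the heap top is always a live element.
        while heap and delayed[sign * heap[0]] > 0:
            delayed[sign * heap[0]] -= 1
            heapq.heappop(heap)

    def rebalance():
        nonlocal lo_size, hi_size
        if lo_size > hi_size + 1:    # lo too big: move its max into hi
            heapq.heappush(hi, -heapq.heappop(lo))
            lo_size, hi_size = lo_size - 1, hi_size + 1
            prune(lo, -1)
        elif lo_size < hi_size:      # hi too big: move its min into lo
            heapq.heappush(lo, -heapq.heappop(hi))
            lo_size, hi_size = lo_size + 1, hi_size - 1
            prune(hi, 1)

    def insert(v):
        nonlocal lo_size, hi_size
        if not lo or v <= -lo[0]:
            heapq.heappush(lo, -v); lo_size += 1
        else:
            heapq.heappush(hi, v); hi_size += 1
        rebalance()

    def erase(v):
        nonlocal lo_size, hi_size
        delayed[v] += 1              # defer the actual pop
        if v <= -lo[0]:
            lo_size -= 1
            if v == -lo[0]:
                prune(lo, -1)
        else:
            hi_size -= 1
            if v == hi[0]:
                prune(hi, 1)
        rebalance()

    out = []
    for i, v in enumerate(x):
        insert(v)
        if i >= K:
            erase(x[i - K])          # outgoing element leaves the window
        if i >= K - 1:
            out.append(-lo[0])       # median = max of the lower half
    return out
```

Each shift performs a constant number of heap pushes and pops of depth $O(\log K)$; the price relative to the paper’s in-place scheme is the extra memory held by stale entries.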
For optimality, they combine two lower bounds. Segmentation plus the result that finding a single median requires at least $2K$ comparisons gives a trivial bound of $2(N-K)$ total steps, and a cited lower bound shows that any correct algorithm needs $\Omega((\log K)(N-K))$ steps; hence their method is optimal up to constant factors. They remark that it is hard to imagine fewer than $2\log(K/2)$ steps per update in a realizable algorithm.
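Spelling out the segmentation arithmetic: among the $N-K$ shifts one can pick about $(N-K)/K$ pairwise disjoint windows, and each of those forces a from-scratch median costing at least $2K$ comparisons, so
$$ \frac{N-K}{K}\cdot 2K = 2(N-K) $$
steps are unavoidable.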
Experiments smooth $N=16000$ i.i.d. $\mathrm{Unif}(0,1)$ sequences (generated with an Atari RNG), with $K \in \{7,15,\ldots,4095\}$ and 10 repetitions per $K$. Compared with a simplified insertion method whose update scans $O(K)$ elements, the new algorithm is about ten times faster on these tests; it degrades on initially sorted sequences, where each incoming element must traverse the full heap depth. For random data, runtime grows with $\log K$, whereas the insertion method grows linearly in $K$.
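A rough way to reproduce the qualitative comparison (my own harness, not the paper’s Atari setup), reusing the sketches above:

```python
# Time the O(K)-shift sorted-window baseline against the O(log K)
# two-heap version on uniform noise, in the spirit of the paper's tests.
import random
import time

def timed(fn, x, k):
    t0 = time.perf_counter()
    fn(x, k)
    return time.perf_counter() - t0

random.seed(0)
x = [random.random() for _ in range(16000)]
for K in (7, 63, 511, 4095):
    k = K // 2
    print(f"K={K:5d}  sorted: {timed(running_median_sorted, x, k):.3f}s"
          f"  two-heap: {timed(running_median_lazy, x, k):.3f}s")
```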
The paper also gives the program structure and the interface of a procedure runmed, listing the required arrays and types.
Takeaway. Running medians can be maintained in $O(\log K)$ work per shift using partitioning heaps; this achieves the asymptotically optimal update cost (up to constants) while preserving the robustness benefits of median smoothing.
Reference
Härdle, W., and W. Steiger. “Algorithm AS 296: Optimal Median Smoothing.” Journal of the Royal Statistical Society, Series C (Applied Statistics) 44, no. 2 (1995): 258–264. DOI: 10.2307/2986349.
@article{hardle1995,
  title   = {Algorithm AS 296: Optimal Median Smoothing},
  author  = {H{\"a}rdle, W. and Steiger, W.},
  journal = {Journal of the Royal Statistical Society. Series C (Applied Statistics)},
  volume  = {44},
  number  = {2},
  pages   = {258--264},
  year    = {1995},
  ISSN    = {0035-9254},
  DOI     = {10.2307/2986349},
  url     = {https://www.jstor.org/stable/2986349}
}