Running Median Algorithm and Implementation for Integer Streaming Applications

The paper proposes a novel running median algorithm for integer streams that computes the median of a sliding window of size $m$ in $O(\log\log m)$ time per update. The key idea is that the new median can be deduced from the previous median together with the element removed and the element inserted, using a simple decision rule (Theorem 1). This reduces the problem to predecessor/successor queries, which can be answered efficiently with suitable integer data structures.

The authors begin with the standard formulation: a window $W$ of $m=2k+1$ integers, possibly with repetitions, and the running median as the sequence

$$ y[j] = \operatorname{median}(x[j],\ldots,x[j+m-1]). $$

Theorem 1 shows that after one deletion and one insertion, the new median $M'$ is determined by:

  • if (old $\geq M$ and new $< M$): $M' = \max(\text{pred}(M),\; \text{new})$
  • if (old $\leq M$ and new $> M$): $M' = \min(\text{succ}(M),\; \text{new})$
  • otherwise: $M' = M$.

Thus the update takes constant time in principle, provided predecessor and successor can be found in $O(1)$. In practice, maintaining a sorted window after each update leads to $O(\log\log m)$ complexity using structures such as van Emde Boas trees or array-based variants. The algorithm generalizes to any rank statistic, not only the median.

The procedure Running_Median(x,k) first initializes the window of $2k+1$ elements and computes its median $M$. Then, for each new element, the outgoing and incoming values are identified, the window updated, and the new median $M'$ computed via Theorem 1. This yields an overall time complexity $O(n\log\log m)$ for a sequence of length $n$.

For implementation, the paper describes bit-vector methods to represent windows. Each integer value corresponds to a bin of unary digits, allowing increments/decrements to handle multiplicities. A van Emde Boas tree or an array-based tree maintains cluster summaries, achieving the required predecessor/successor operations in $O(\log\log m)$.

Experiments compare the new method with skip lists (which achieve $O(\log m)$ per update). Tests were run on multiple platforms (Raspberry Pi, Nvidia embedded kit, AWS, x86-64 laptop) with real and synthetic data streams (radio signals, accelerometer data, speech samples). Results show consistent speedups, often exceeding a factor of six on embedded processors, and up to an order of magnitude in some cases. The array-based variant outperforms the full van Emde Boas implementation.

Takeaway. The median of a sliding window of integers can be updated in $O(\log\log m)$ time using a decision rule based on the previous median and efficient predecessor/successor queries. This leads to substantial performance gains for streaming applications, especially on constrained embedded processors.

Reference

Cadenas, Oswaldo, and Graham M. Megson. 2019. “Running Median Algorithm and Implementation for Integer Streaming Applications.” IEEE Embedded Systems Letters 11, no. 2. DOI: 10.1109/les.2018.2868409. http://dx.doi.org/10.1109/LES.2018.2868409.

@article{cadenas2019,
  title = {Running Median Algorithm and Implementation for Integer Streaming Applications},
  volume = {11},
  ISSN = {1943-0671},
  url = {http://dx.doi.org/10.1109/LES.2018.2868409},
  DOI = {10.1109/les.2018.2868409},
  number = {2},
  journal = {IEEE Embedded Systems Letters},
  publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
  author = {Cadenas, Oswaldo and Megson, Graham M.},
  year = {2019},
  month = {jun},
  pages = {58–61}
}