Fast Computation of the Hodges–Lehmann Estimator with the Monahan Algorithm
The Hodges–Lehmann location estimator (Hodges & Lehmann, 1963) is a robust central tendency statistic. Given a sample $\mathbf{x} = (x_1, x_2, \ldots, x_n)$, it is defined as the median of all pairwise averages:

$$\operatorname{HL}(\mathbf{x}) = \operatorname{median}\left\{ \frac{x_i + x_j}{2} \;:\; 1 \le i \le j \le n \right\}.$$
The estimator resists outliers and achieves high asymptotic efficiency (about 95% of the sample mean's efficiency under Gaussian data), but its computational cost long limited practical use.
A direct approach requires generating all $n(n+1)/2$ pairwise sums and sorting them. This needs quadratic storage and $O(n^2 \log n)$ time. For $n=10{,}000$, the algorithm must handle approximately 50 million elements — impractical for most applications.
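For comparison, a minimal naive baseline in C# (not from Monahan's paper, shown only to make the quadratic cost concrete) looks like this:

// Requires `using System.Collections.Generic;`.
// Naive baseline: materialize all n(n+1)/2 pairwise averages and sort.
// O(n^2) memory and O(n^2 log n) time.
static double HodgesLehmannNaive(IReadOnlyList<double> values)
{
    int n = values.Count;
    var averages = new List<double>();
    for (int i = 0; i < n; i++)
        for (int j = i; j < n; j++)
            averages.Add((values[i] + values[j]) / 2);
    averages.Sort();
    int m = averages.Count;
    return m % 2 == 1
        ? averages[m / 2]                              // odd: the middle element
        : (averages[m / 2 - 1] + averages[m / 2]) / 2; // even: mean of the two middle
}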
In 1984, John Monahan published an algorithm that reduced the expected complexity to $O(n \log n)$ (Monahan, 1984). This made the Hodges–Lehmann estimator computable for large samples.
The Problem Structure
Monahan’s approach uses the observation that the set of pairwise sums can be arranged as an implicit upper triangular matrix:
- Sort the input values: $x_1 \leq x_2 \leq \cdots \leq x_n$.
- Define a matrix $M$ where $M_{i,j} = x_i + x_j$ for $i \leq j$.
- Each row is sorted (non-decreasing).
- Each column is sorted (non-decreasing).
- Extended to all pairs, the matrix would be symmetric ($M_{i,j} = M_{j,i}$), so the upper triangle suffices.
The entire collection of pairwise sums is thus highly structured, even though the algorithm never explicitly forms the matrix.
The Hodges–Lehmann estimator equals the median element of this matrix, scaled by 1/2. For an odd number of sums, the estimator equals the middle element. For an even number, the estimator equals the average of the two middle elements.
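As a small worked example, take $\mathbf{x} = (1, 2, 4)$. The pairwise sums with $i \le j$ are $2, 3, 5, 4, 6, 8$; sorted: $2, 3, 4, 5, 6, 8$. Since $N = 6$ is even, the two middle sums are $4$ and $5$, so the estimator is $\frac{1}{2} \cdot \frac{4 + 5}{2} = 2.25$.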
Selection Instead of Sorting
Rather than sorting all pairwise sums, Monahan’s algorithm uses a selection strategy (like quickselect) to find the median:
1. Maintain bounds per row. For each row $i$, track a left index $L_i$ and a right index $R_i$. Initially $L_i = i$ and $R_i = n$, so every entry of row $i$ is "active."
2. Choose a pivot sum. At each step, select a candidate sum $a_m$ as a pivot. Monahan first experimented with weighted medians of row medians; later versions (HLQIST/HLQEST) use randomized selection for speed.
3. Partition the active set. Count how many sums are less than $a_m$. This takes linear time: the algorithm walks from the upper-right corner of the matrix downward, moving left until sums drop below the pivot. Because rows are sorted, partitioning across all rows costs only $O(n)$ (see the sketch after this list).
4. Compare to target rank. The median of the pairwise sums corresponds to a target rank $k = \lfloor (N+1)/2 \rfloor$, where $N = n(n+1)/2$:
   - If fewer than $k$ sums lie below $a_m$, the median must be above $a_m$.
   - If more than $k$, the median must be below.
   - If exactly $k$, $a_m$ is a candidate solution.
5. Narrow the search space. Based on the comparison, update the row bounds $(L_i, R_i)$ to shrink the active set. The algorithm discards large portions of the implicit matrix without ever building them.

This process continues until the active set is small enough to solve directly.
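The counting walk in step 3 can be sketched as follows; this is a minimal illustration assuming a sorted input array, and `CountSumsBelow` is an illustrative name rather than Monahan's:

// Counts pairwise sums (i <= j) strictly below `pivot` in O(n) by
// walking the staircase boundary of the implicit matrix.
static long CountSumsBelow(double[] sorted, double pivot)
{
    int n = sorted.Length;
    long count = 0;
    int col = n - 1; // start at the upper-right corner
    for (int row = 0; row < n; row++)
    {
        // The boundary only moves left: each row's sums dominate the
        // previous row's at the same column, since the input is sorted.
        while (col >= row && sorted[row] + sorted[col] >= pivot)
            col--;
        if (col >= row)
            count += col - row + 1; // columns row..col all hold sums < pivot
    }
    return count; // row and col each move at most n steps in total
}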
Handling Ties and Stalling
Real data often contains ties: many sums can equal the pivot. Monahan identified two risks:
- No progress. If the count of sums below the pivot is the same as the previous step, the search can stall.
- Flat regions. The Hodges–Lehmann estimator is the midrange of all possible roots, so multiple consecutive values may be correct.
To handle this, Monahan introduced midrange tie-breaking:
- If stalling occurs, the algorithm computes the smallest and largest active sums.
- The algorithm replaces the pivot with their average.
- If min = max, the algorithm is complete: all remaining sums are identical.
This allows the algorithm to converge with discrete or tied data.
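A minimal sketch of this tie-break, assuming per-row bounds stored as 1-based column indices as in the reference implementation below (the helper name is illustrative):

// Returns the midrange of all sums still inside the active bounds.
static double MidrangeOfActiveSums(double[] sorted, long[] left, long[] right)
{
    double min = double.MaxValue, max = double.MinValue;
    for (int i = 0; i < sorted.Length; i++)
    {
        if (left[i] > right[i]) continue; // row already exhausted
        min = Math.Min(min, sorted[i] + sorted[(int)left[i] - 1]);  // smallest active sum in row i
        max = Math.Max(max, sorted[i] + sorted[(int)right[i] - 1]); // largest active sum in row i
    }
    return (min + max) / 2; // if min == max, every remaining sum is identical
}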
Randomized Pivot Strategy
The weighted-median pivot strategy required extra computation at each step. Following Hoare's FIND (quickselect), Monahan instead used one of two randomized approaches:
- Pick a pivot at random from the active set, or
- Pick a random active row, then its median element.
This avoids weighted computations and provides progress with high probability.
The expected number of iterations is $O(\log n)$. Combined with the $O(n)$ partitioning cost, the overall runtime is $O(n \log n)$.
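The second strategy can be sketched as follows, again assuming the per-row bound arrays (an illustrative helper, not the published FORTRAN):

// Picks a row with probability proportional to its number of active
// entries, then uses that row's median active element as the next pivot.
static double RandomRowMedianPivot(double[] sorted, long[] left, long[] right, Random rng)
{
    long active = 0;
    for (int i = 0; i < sorted.Length; i++)
        active += Math.Max(0, right[i] - left[i] + 1);
    long target = (long)(rng.NextDouble() * active), seen = 0;
    for (int i = 0; i < sorted.Length; i++)
    {
        long rowSize = Math.Max(0, right[i] - left[i] + 1);
        if (target < seen + rowSize)
        {
            long midCol = (left[i] + right[i]) / 2; // median column of the chosen row
            return sorted[i] + sorted[(int)midCol - 1];
        }
        seen += rowSize;
    }
    throw new InvalidOperationException("empty active set");
}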
Even and Odd Cases
When $n(n+1)/2$ is odd, the median is a single element. When even, the estimator equals the average of the two middle elements. Monahan’s algorithm therefore always computes two consecutive order statistics in the implicit matrix. This handles the definition of the Hodges–Lehmann estimator and makes the algorithm extendable to confidence intervals (which require quantiles other than the median).
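For example, $n = 4$ gives $N = 10$ sums, so the targets are ranks $5$ and $6$ and the estimator is half the average of the 5th and 6th smallest sums; $n = 5$ gives $N = 15$, with the single target rank $8$.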
Complexity
- Initial sorting: $O(n \log n)$.
- Each partitioning step: $O(n)$.
- Expected number of steps: $O(\log n)$.
- Total complexity: $O(n \log n)$.
This matches the complexity of sorting a single array while avoiding the quadratic blow-up of creating all pairwise sums.
Monahan reported that his final version (HLQEST) outperformed both direct sorting and earlier iterative Wilcoxon-based methods (Monahan, 1984).
Practical Notes
- Exactness: The method is exact, not approximate. Randomization affects only runtime, never correctness.
- Ties: Midrange pivoting provides stability.
- Extensions: The same approach can compute other quantiles of pairwise sums, useful for confidence intervals and hypothesis tests.
Conclusion
Monahan’s algorithm made the Hodges–Lehmann estimator computationally practical. The algorithm uses structure in the data — the monotone implicit matrix of pairwise sums — to reduce complexity from quadratic to log-linear.
The result is a robust estimator that is computationally practical on large datasets.
Reference Implementation
// Requires `using System;`, `using System.Collections.Generic;`, and
// `using static System.Math;` (for Min/Max), plus this class-level RNG:
private readonly Random random = new Random();

// Assumes `values` is sorted in ascending order (see "The Problem Structure").
private double HodgesLehmannMonahan(IReadOnlyList<double> values)
{
int n = values.Count;
if (n == 1) return values[0];
if (n == 2) return (values[0] + values[1]) / 2;
// Calculate target median rank(s) among all pairwise sums
long totalPairs = (long)n * (n + 1) / 2;
long medianRankLow = (totalPairs + 1) / 2; // For odd totalPairs, this is the median
long medianRankHigh = (totalPairs + 2) / 2; // For even totalPairs, average of ranks medianRankLow and medianRankHigh
// Initialize search bounds for each row in the implicit matrix
long[] leftBounds = new long[n];
long[] rightBounds = new long[n];
long[] partitionCounts = new long[n];
for (int i = 0; i < n; i++)
{
leftBounds[i] = i + 1; // Row i can pair with columns [i+1..n] (1-based indexing)
rightBounds[i] = n; // Initially, all columns are available
}
// Start with a good pivot: sum of middle elements (handles both odd and even n)
double pivot = values[(n - 1) / 2] + values[n / 2];
long activeSetSize = totalPairs;
long previousCount = 0;
while (true)
{
// === PARTITION STEP ===
// Count pairwise sums less than current pivot
long countBelowPivot = 0;
long currentColumn = n;
for (int row = 1; row <= n; row++)
{
partitionCounts[row - 1] = 0;
// Move left from current column until we find sums < pivot
// This exploits the sorted nature of the matrix
while (currentColumn >= row && values[row - 1] + values[(int)currentColumn - 1] >= pivot)
currentColumn--;
// Count elements in this row that are < pivot
if (currentColumn >= row)
{
long elementsBelow = currentColumn - row + 1;
partitionCounts[row - 1] = elementsBelow;
countBelowPivot += elementsBelow;
}
}
// === CONVERGENCE CHECK ===
// If no progress, we have ties - break them using midrange strategy
if (countBelowPivot == previousCount)
{
double minActiveSum = double.MaxValue;
double maxActiveSum = double.MinValue;
// Find the range of sums still in the active search space
for (int i = 0; i < n; i++)
{
if (leftBounds[i] > rightBounds[i]) continue; // Skip empty rows
double rowValue = values[i];
double smallestInRow = values[(int)leftBounds[i] - 1] + rowValue;
double largestInRow = values[(int)rightBounds[i] - 1] + rowValue;
minActiveSum = Min(minActiveSum, smallestInRow);
maxActiveSum = Max(maxActiveSum, largestInRow);
}
pivot = (minActiveSum + maxActiveSum) / 2;
if (pivot <= minActiveSum || pivot > maxActiveSum) pivot = maxActiveSum;
// If all remaining values are identical, we're done
if (minActiveSum == maxActiveSum || activeSetSize <= 2)
return pivot / 2;
continue;
}
// === TARGET CHECK ===
// Check if we've found the median rank(s)
bool atTargetRank = countBelowPivot == medianRankLow || countBelowPivot == medianRankHigh - 1;
if (atTargetRank)
{
// Find the boundary values: largest < pivot and smallest >= pivot
double largestBelowPivot = double.MinValue;
double smallestAtOrAbovePivot = double.MaxValue;
for (int i = 1; i <= n; i++)
{
long countInRow = partitionCounts[i - 1];
double rowValue = values[i - 1];
long totalInRow = n - i + 1;
// Find largest sum in this row that's < pivot
if (countInRow > 0)
{
long lastBelowIndex = i + countInRow - 1;
double lastBelowValue = rowValue + values[(int)lastBelowIndex - 1];
largestBelowPivot = Max(largestBelowPivot, lastBelowValue);
}
// Find smallest sum in this row that's >= pivot
if (countInRow < totalInRow)
{
long firstAtOrAboveIndex = i + countInRow;
double firstAtOrAboveValue = rowValue + values[(int)firstAtOrAboveIndex - 1];
smallestAtOrAbovePivot = Min(smallestAtOrAbovePivot, firstAtOrAboveValue);
}
}
// Calculate final result based on whether we have odd or even number of pairs
if (medianRankLow < medianRankHigh)
{
// Even total: average the two middle values
return (smallestAtOrAbovePivot + largestBelowPivot) / 4;
}
else
{
// Odd total: return the single middle value
bool needLargest = countBelowPivot == medianRankLow;
return (needLargest ? largestBelowPivot : smallestAtOrAbovePivot) / 2;
}
}
// === UPDATE BOUNDS ===
// Narrow the search space based on partition result
if (countBelowPivot < medianRankLow)
{
// Too few values below pivot - eliminate smaller values, search higher
for (int i = 0; i < n; i++)
leftBounds[i] = i + partitionCounts[i] + 1;
}
else
{
// Too many values below pivot - eliminate larger values, search lower
for (int i = 0; i < n; i++)
rightBounds[i] = i + partitionCounts[i];
}
// === PREPARE NEXT ITERATION ===
previousCount = countBelowPivot;
// Recalculate how many elements remain in the active search space
activeSetSize = 0;
for (int i = 0; i < n; i++)
{
long rowSize = rightBounds[i] - leftBounds[i] + 1;
activeSetSize += Max(0, rowSize);
}
// Choose next pivot based on remaining active set size
if (activeSetSize > 2)
{
// Use randomized row median strategy for efficiency
// Handle large activeSetSize by using double precision for random selection
double randomFraction = random.NextDouble();
long targetIndex = (long)(randomFraction * activeSetSize);
int selectedRow = 0;
// Find which row contains the target index
long cumulativeSize = 0;
for (int i = 0; i < n; i++)
{
long rowSize = Max(0, rightBounds[i] - leftBounds[i] + 1);
if (targetIndex < cumulativeSize + rowSize)
{
selectedRow = i;
break;
}
cumulativeSize += rowSize;
}
// Use median element of the selected row as pivot
long medianColumnInRow = (leftBounds[selectedRow] + rightBounds[selectedRow]) / 2;
pivot = values[selectedRow] + values[(int)medianColumnInRow - 1];
}
else
{
// Few elements remain - use midrange strategy
double minRemainingSum = double.MaxValue;
double maxRemainingSum = double.MinValue;
for (int i = 0; i < n; i++)
{
if (leftBounds[i] > rightBounds[i]) continue; // Skip empty rows
double rowValue = values[i];
double minInRow = values[(int)leftBounds[i] - 1] + rowValue;
double maxInRow = values[(int)rightBounds[i] - 1] + rowValue;
minRemainingSum = Min(minRemainingSum, minInRow);
maxRemainingSum = Max(maxRemainingSum, maxInRow);
}
pivot = (minRemainingSum + maxRemainingSum) / 2;
if (pivot <= minRemainingSum || pivot > maxRemainingSum)
pivot = maxRemainingSum;
if (minRemainingSum == maxRemainingSum)
return pivot / 2;
}
}
}
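A minimal usage sketch, assuming the method and its RNG field are hosted in a small wrapper class (the class and public method names are illustrative, not from Monahan's paper):

// Requires `using System.Collections.Generic;` and `using System.Linq;`.
public sealed class HodgesLehmannEstimator
{
    // The `random` field and HodgesLehmannMonahan method from the
    // reference implementation above go here.

    public double Estimate(IEnumerable<double> data)
    {
        var sorted = data.OrderBy(x => x).ToList(); // the algorithm requires sorted input
        return HodgesLehmannMonahan(sorted);
    }
}

// Example: new HodgesLehmannEstimator().Estimate(new[] { 1.2, 2.2, 2.9, 3.4, 100.0 })
// stays near the bulk of the data despite the outlier.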
References
Hodges, J. L., & Lehmann, E. L. (1963). Estimates of location based on rank tests. The Annals of Mathematical Statistics.
Monahan, J. F. (1984). Algorithm 616: Fast computation of the Hodges–Lehmann location estimator. ACM Transactions on Mathematical Software.