Fast Computation of the Hodges–Lehmann Estimator with the Monahan Algorithm
The Hodges–Lehmann location estimator (Hodges & Lehmann, 1963) is a robust central tendency statistic. Given a sample $\mathbf{x} = (x_1, x_2, \ldots, x_n)$, it is defined as the median of all pairwise averages:

$$\operatorname{HL}(\mathbf{x}) = \operatorname{median}\left\{ \frac{x_i + x_j}{2} \;:\; 1 \le i \le j \le n \right\}.$$
The estimator resists outliers and achieves high asymptotic efficiency (about 95% of the sample mean's efficiency under Gaussian data), but its computational cost long limited practical use.
A direct approach requires generating all $n(n+1)/2$ pairwise sums and sorting them. This needs quadratic storage and $O(n^2 \log n)$ time. For $n=10{,}000$, the algorithm must handle approximately 50 million elements — impractical for most applications.
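For comparison, a minimal naive baseline in C# (not from Monahan's paper, shown only to make the quadratic cost concrete) looks like this:

// Requires `using System.Collections.Generic;`.
// Naive baseline: materialize all n(n+1)/2 pairwise averages and sort.
// O(n^2) memory and O(n^2 log n) time.
static double HodgesLehmannNaive(IReadOnlyList<double> values)
{
    int n = values.Count;
    var averages = new List<double>();
    for (int i = 0; i < n; i++)
        for (int j = i; j < n; j++)
            averages.Add((values[i] + values[j]) / 2);
    averages.Sort();
    int m = averages.Count;
    return m % 2 == 1
        ? averages[m / 2]                              // odd: the middle element
        : (averages[m / 2 - 1] + averages[m / 2]) / 2; // even: mean of the two middle
}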
In 1984, John Monahan published an algorithm that reduced the expected complexity to $O(n \log n)$ (Monahan, 1984). This made the Hodges–Lehmann estimator computable for large samples.
The Problem Structure
Monahan’s approach uses the observation that the set of pairwise sums can be arranged as an implicit upper triangular matrix:
- Sort the input values: $x_1 \leq x_2 \leq \cdots \leq x_n$.
- Define a matrix $M$ where $M_{i,j} = x_i + x_j$ for $i \leq j$.
- Each row is sorted (non-decreasing).
- Each column is sorted (non-decreasing).
- Extended to all pairs, the matrix would be symmetric ($M_{i,j} = M_{j,i}$), so the upper triangle suffices.
The entire collection of pairwise sums is thus highly structured, even though the algorithm never explicitly forms the matrix.
The Hodges–Lehmann estimator equals the median element of this matrix, scaled by 1/2. For an odd number of sums, the estimator equals the middle element. For an even number, the estimator equals the average of the two middle elements.
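As a small worked example, take $\mathbf{x} = (1, 2, 4)$. The pairwise sums with $i \le j$ are $2, 3, 5, 4, 6, 8$; sorted: $2, 3, 4, 5, 6, 8$. Since $N = 6$ is even, the two middle sums are $4$ and $5$, so the estimator is $\frac{1}{2} \cdot \frac{4 + 5}{2} = 2.25$.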
Selection Instead of Sorting
Rather than sorting all pairwise sums, Monahan’s algorithm uses a selection strategy (like quickselect) to find the median:
1. Maintain bounds per row. For each row $i$, track a left index $L_i$ and a right index $R_i$. Initially $L_i = i$ and $R_i = n$, so every entry of row $i$ is "active."
2. Choose a pivot sum. At each step, select a candidate sum $a_m$ as a pivot. Monahan first experimented with weighted medians of row medians; later versions (HLQIST/HLQEST) use randomized selection for speed.
3. Partition the active set. Count how many sums are less than $a_m$. This takes linear time: the algorithm walks from the upper-right corner of the matrix downward, moving left until sums drop below the pivot. Because rows are sorted, partitioning across all rows costs only $O(n)$ (see the sketch after this list).
4. Compare to target rank. The median of the pairwise sums corresponds to a target rank $k = \lfloor (N+1)/2 \rfloor$, where $N = n(n+1)/2$:
   - If fewer than $k$ sums lie below $a_m$, the median must be above $a_m$.
   - If more than $k$, the median must be below.
   - If exactly $k$, $a_m$ is a candidate solution.
5. Narrow the search space. Based on the comparison, update the row bounds $(L_i, R_i)$ to shrink the active set. The algorithm discards large portions of the implicit matrix without ever building them.

This process continues until the active set is small enough to solve directly.
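The counting walk in step 3 can be sketched as follows; this is a minimal illustration assuming a sorted input array, and `CountSumsBelow` is an illustrative name rather than Monahan's:

// Counts pairwise sums (i <= j) strictly below `pivot` in O(n) by
// walking the staircase boundary of the implicit matrix.
static long CountSumsBelow(double[] sorted, double pivot)
{
    int n = sorted.Length;
    long count = 0;
    int col = n - 1; // start at the upper-right corner
    for (int row = 0; row < n; row++)
    {
        // The boundary only moves left: each row's sums dominate the
        // previous row's at the same column, since the input is sorted.
        while (col >= row && sorted[row] + sorted[col] >= pivot)
            col--;
        if (col >= row)
            count += col - row + 1; // columns row..col all hold sums < pivot
    }
    return count; // row and col each move at most n steps in total
}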
Handling Ties and Stalling
Real data often contains ties: many sums can equal the pivot. Monahan identified two risks:
- No progress. If the count of sums below the pivot is the same as the previous step, the search can stall.
- Flat regions. The Hodges–Lehmann estimator is the midrange of all possible roots, so multiple consecutive values may be correct.
To handle this, Monahan introduced midrange tie-breaking:
- If stalling occurs, the algorithm computes the smallest and largest active sums.
- The algorithm replaces the pivot with their average.
- If min = max, the algorithm is complete: all remaining sums are identical.
This allows the algorithm to converge with discrete or tied data.
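A minimal sketch of this tie-break, assuming per-row bounds stored as 1-based column indices as in the reference implementation below (the helper name is illustrative):

// Returns the midrange of all sums still inside the active bounds.
static double MidrangeOfActiveSums(double[] sorted, long[] left, long[] right)
{
    double min = double.MaxValue, max = double.MinValue;
    for (int i = 0; i < sorted.Length; i++)
    {
        if (left[i] > right[i]) continue; // row already exhausted
        min = Math.Min(min, sorted[i] + sorted[(int)left[i] - 1]);  // smallest active sum in row i
        max = Math.Max(max, sorted[i] + sorted[(int)right[i] - 1]); // largest active sum in row i
    }
    return (min + max) / 2; // if min == max, every remaining sum is identical
}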
Randomized Pivot Strategy
The weighted-median pivot strategy required extra computation at each step. Following Hoare's FIND (quickselect), Monahan instead used one of two randomized approaches:
- Pick a pivot at random from the active set, or
- Pick a random active row, then its median element.
This avoids weighted computations and provides progress with high probability.
The expected number of iterations is $O(\log n)$. Combined with the $O(n)$ partitioning cost, the overall runtime is $O(n \log n)$.
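The second strategy can be sketched as follows, again assuming the per-row bound arrays (an illustrative helper, not the published FORTRAN):

// Picks a row with probability proportional to its number of active
// entries, then uses that row's median active element as the next pivot.
static double RandomRowMedianPivot(double[] sorted, long[] left, long[] right, Random rng)
{
    long active = 0;
    for (int i = 0; i < sorted.Length; i++)
        active += Math.Max(0, right[i] - left[i] + 1);
    long target = (long)(rng.NextDouble() * active), seen = 0;
    for (int i = 0; i < sorted.Length; i++)
    {
        long rowSize = Math.Max(0, right[i] - left[i] + 1);
        if (target < seen + rowSize)
        {
            long midCol = (left[i] + right[i]) / 2; // median column of the chosen row
            return sorted[i] + sorted[(int)midCol - 1];
        }
        seen += rowSize;
    }
    throw new InvalidOperationException("empty active set");
}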
Even and Odd Cases
When $n(n+1)/2$ is odd, the median is a single element. When even, the estimator equals the average of the two middle elements. Monahan’s algorithm therefore always computes two consecutive order statistics in the implicit matrix. This handles the definition of the Hodges–Lehmann estimator and makes the algorithm extendable to confidence intervals (which require quantiles other than the median).
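For example, $n = 4$ gives $N = 10$ sums, so the targets are ranks $5$ and $6$ and the estimator is half the average of the 5th and 6th smallest sums; $n = 5$ gives $N = 15$, with the single target rank $8$.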
Complexity
- Initial sorting: $O(n \log n)$.
- Each partitioning step: $O(n)$.
- Expected number of steps: $O(\log n)$.
- Total complexity: $O(n \log n)$.
This matches the complexity of sorting a single array while avoiding the quadratic blow-up of creating all pairwise sums.
Monahan reported that his final version (HLQEST) outperformed both direct sorting and earlier iterative Wilcoxon-based methods (Monahan, 1984).
Practical Notes
- Exactness: The method is exact, not approximate. Randomization affects only runtime, never correctness.
- Ties: Midrange pivoting provides stability.
- Extensions: The same approach can compute other quantiles of pairwise sums, useful for confidence intervals and hypothesis tests.
Conclusion
Monahan’s algorithm made the Hodges–Lehmann estimator computationally practical. The algorithm uses structure in the data — the monotone implicit matrix of pairwise sums — to reduce complexity from quadratic to log-linear.
The result is a robust estimator that is computationally practical on large datasets.
Reference Implementation
// Requires `using System;`, `using System.Collections.Generic;`, and
// `using static System.Math;` (for Min/Max), plus this class-level RNG:
private readonly Random random = new Random();

// Assumes `values` is sorted in ascending order (see "The Problem Structure").
private double HodgesLehmannMonahan(IReadOnlyList<double> values)
{
int n = values.Count;
if (n == 1) return values[0];
if (n == 2) return (values[0] + values[1]) / 2;
// Calculate target median rank(s) among all pairwise sums
long totalPairs = (long)n * (n + 1) / 2;
long medianRankLow = (totalPairs + 1) / 2; // For odd totalPairs, this is the median
long medianRankHigh = (totalPairs + 2) / 2; // For even totalPairs, average of ranks medianRankLow and medianRankHigh
// Initialize search bounds for each row in the implicit matrix
long[] leftBounds = new long[n];
long[] rightBounds = new long[n];
long[] partitionCounts = new long[n];
for (int i = 0; i < n; i++)
{
leftBounds[i] = i + 1; // Row i can pair with columns [i+1..n] (1-based indexing)
rightBounds[i] = n; // Initially, all columns are available
}
// Start with a good pivot: sum of middle elements (handles both odd and even n)
double pivot = values[(n - 1) / 2] + values[n / 2];
long activeSetSize = totalPairs;
long previousCount = 0;
while (true)
{
// === PARTITION STEP ===
// Count pairwise sums less than current pivot
long countBelowPivot = 0;
long currentColumn = n;
for (int row = 1; row <= n; row++)
{
partitionCounts[row - 1] = 0;
// Move left from current column until we find sums < pivot
// This exploits the sorted nature of the matrix
while (currentColumn >= row && values[row - 1] + values[(int)currentColumn - 1] >= pivot)
currentColumn--;
// Count elements in this row that are < pivot
if (currentColumn >= row)
{
long elementsBelow = currentColumn - row + 1;
partitionCounts[row - 1] = elementsBelow;
countBelowPivot += elementsBelow;
}
}
// === CONVERGENCE CHECK ===
// If no progress, we have ties - break them using midrange strategy
if (countBelowPivot == previousCount)
{
double minActiveSum = double.MaxValue;
double maxActiveSum = double.MinValue;
// Find the range of sums still in the active search space
for (int i = 0; i < n; i++)
{
if (leftBounds[i] > rightBounds[i]) continue; // Skip empty rows
double rowValue = values[i];
double smallestInRow = values[(int)leftBounds[i] - 1] + rowValue;
double largestInRow = values[(int)rightBounds[i] - 1] + rowValue;
minActiveSum = Min(minActiveSum, smallestInRow);
maxActiveSum = Max(maxActiveSum, largestInRow);
}
pivot = (minActiveSum + maxActiveSum) / 2;
if (pivot <= minActiveSum || pivot > maxActiveSum) pivot = maxActiveSum;
// If all remaining values are identical, we're done
if (minActiveSum == maxActiveSum || activeSetSize <= 2)
return pivot / 2;
continue;
}
// === TARGET CHECK ===
// Check if we've found the median rank(s)
bool atTargetRank = countBelowPivot == medianRankLow || countBelowPivot == medianRankHigh - 1;
if (atTargetRank)
{
// Find the boundary values: largest < pivot and smallest >= pivot
double largestBelowPivot = double.MinValue;
double smallestAtOrAbovePivot = double.MaxValue;
for (int i = 1; i <= n; i++)
{
long countInRow = partitionCounts[i - 1];
double rowValue = values[i - 1];
long totalInRow = n - i + 1;
// Find largest sum in this row that's < pivot
if (countInRow > 0)
{
long lastBelowIndex = i + countInRow - 1;
double lastBelowValue = rowValue + values[(int)lastBelowIndex - 1];
largestBelowPivot = Max(largestBelowPivot, lastBelowValue);
}
// Find smallest sum in this row that's >= pivot
if (countInRow < totalInRow)
{
long firstAtOrAboveIndex = i + countInRow;
double firstAtOrAboveValue = rowValue + values[(int)firstAtOrAboveIndex - 1];
smallestAtOrAbovePivot = Min(smallestAtOrAbovePivot, firstAtOrAboveValue);
}
}
// Calculate final result based on whether we have odd or even number of pairs
if (medianRankLow < medianRankHigh)
{
// Even total: average the two middle values
return (smallestAtOrAbovePivot + largestBelowPivot) / 4;
}
else
{
// Odd total: return the single middle value
bool needLargest = countBelowPivot == medianRankLow;
return (needLargest ? largestBelowPivot : smallestAtOrAbovePivot) / 2;
}
}
// === UPDATE BOUNDS ===
// Narrow the search space based on partition result
if (countBelowPivot < medianRankLow)
{
// Too few values below pivot - eliminate smaller values, search higher
for (int i = 0; i < n; i++)
leftBounds[i] = i + partitionCounts[i] + 1;
}
else
{
// Too many values below pivot - eliminate larger values, search lower
for (int i = 0; i < n; i++)
rightBounds[i] = i + partitionCounts[i];
}
// === PREPARE NEXT ITERATION ===
previousCount = countBelowPivot;
// Recalculate how many elements remain in the active search space
activeSetSize = 0;
for (int i = 0; i < n; i++)
{
long rowSize = rightBounds[i] - leftBounds[i] + 1;
activeSetSize += Max(0, rowSize);
}
// Choose next pivot based on remaining active set size
if (activeSetSize > 2)
{
// Use randomized row median strategy for efficiency
// Handle large activeSetSize by using double precision for random selection
double randomFraction = random.NextDouble();
long targetIndex = (long)(randomFraction * activeSetSize);
int selectedRow = 0;
// Find which row contains the target index
long cumulativeSize = 0;
for (int i = 0; i < n; i++)
{
long rowSize = Max(0, rightBounds[i] - leftBounds[i] + 1);
if (targetIndex < cumulativeSize + rowSize)
{
selectedRow = i;
break;
}
cumulativeSize += rowSize;
}
// Use median element of the selected row as pivot
long medianColumnInRow = (leftBounds[selectedRow] + rightBounds[selectedRow]) / 2;
pivot = values[selectedRow] + values[(int)medianColumnInRow - 1];
}
else
{
// Few elements remain - use midrange strategy
double minRemainingSum = double.MaxValue;
double maxRemainingSum = double.MinValue;
for (int i = 0; i < n; i++)
{
if (leftBounds[i] > rightBounds[i]) continue; // Skip empty rows
double rowValue = values[i];
double minInRow = values[(int)leftBounds[i] - 1] + rowValue;
double maxInRow = values[(int)rightBounds[i] - 1] + rowValue;
minRemainingSum = Min(minRemainingSum, minInRow);
maxRemainingSum = Max(maxRemainingSum, maxInRow);
}
pivot = (minRemainingSum + maxRemainingSum) / 2;
if (pivot <= minRemainingSum || pivot > maxRemainingSum)
pivot = maxRemainingSum;
if (minRemainingSum == maxRemainingSum)
return pivot / 2;
}
}
}
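A minimal usage sketch, assuming the method and its RNG field are hosted in a small wrapper class (the class and public method names are illustrative, not from Monahan's paper):

// Requires `using System.Collections.Generic;` and `using System.Linq;`.
public sealed class HodgesLehmannEstimator
{
    // The `random` field and HodgesLehmannMonahan method from the
    // reference implementation above go here.

    public double Estimate(IEnumerable<double> data)
    {
        var sorted = data.OrderBy(x => x).ToList(); // the algorithm requires sorted input
        return HodgesLehmannMonahan(sorted);
    }
}

// Example: new HodgesLehmannEstimator().Estimate(new[] { 1.2, 2.2, 2.9, 3.4, 100.0 })
// stays near the bulk of the data despite the outlier.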
References
Hodges, J. L., & Lehmann, E. L. (1963). Estimates of location based on rank tests. The Annals of Mathematical Statistics.
Monahan, J. F. (1984). Algorithm 616: Fast computation of the Hodges–Lehmann location estimator. ACM Transactions on Mathematical Software.