Hodges-Lehmann-Sen shift and shift confidence interval estimators



In the previous two posts (1, 2), I discussed the Hodges-Lehmann median estimator. The suggested idea of getting median estimations based on a cartesian product could be adopted to estimate the shift between two samples. In this post, we discuss how to build Hodges-Lehmann-Sen shift estimator and how to get confidence intervals for the obtained estimations. Also, we perform a simulation study that checks the actual coverage percentage of these intervals.

Hodges-Lehmann-Sen shift estimator

Let’s consider two samples \(x = \{ x_1, x_2, \ldots, x_n, \}\) and \(y = \{ y_1, y_2, \ldots, y_m \}\). The Hodges-Lehmann-Sen shift is defined as follows:

\[\hat{\Delta} = \underset{1 \leq i \leq n;\; 1 \leq j \leq m}{\operatorname{median}}\Big(y_j - x_i\Big). \]

It was suggested in [Hodges1963] and [Sen1963].

Confidence Interval

Once we get a shift estimation \(\hat{\Delta}\), we may want to also get a confidence interval for this estimation. [Hodges1963] doesn’t contain any mention of confidence intervals. [Sen1963] includes a discussion about confidence interval, but it’s quite vague. Recently, I have found a nice straightforward approach for getting CIs (see [Deshpande2018], 7.11.2). Let’s calculate all the pairwise differences \(y_j-x_i\), sort them and denote the result as \(d = \{ d_1, d_2, \ldots, d_{nm} \}\). Next, we define the required confidence interval with confidence level \(\alpha\) as \([d_l, d_u]\) where

\[l = \Big[ \frac{nm}{2} - z_{\alpha/2}\sqrt{\frac{nm(n+m+1)}{12}} - 0.5 \Big], \]

\[u = \Big[ \frac{nm}{2} + z_{\alpha/2}\sqrt{\frac{nm(n+m+1)}{12}} - 0.5 \Big]. \]

Simulation Study

Now it’s time to check the above equations. Let’s enumerate different distributions, sample sizes, confidence levels, and check the actual coverage percentage in each situation.

Here is the source code of this simulation:

library(evd)
library(EnvStats)
library(stringr)

d.unif <- list(title = "Uniform(a=0, b=1)", r = runif, q = qunif)
d.tri_0_2_1 <- list(title = "Triangular(a=0, b=2, c=1)", r = function(n) rtri(n, 0, 2, 1), q = function(p) qtri(p, 0, 2, 1))
d.tri_0_2_02 <- list(title = "Triangular(a=0, b=2, c=0.2)", r = function(n) rtri(n, 0, 2, 0.2), q = function(p) qtri(p, 0, 2, 0.2))
d.beta2_4 <- list(title = "Beta(a=2, b=4)", r = function(n) rbeta(n, 2, 4), q = function(p) qbeta(p, 2, 4))
d.beta2_10 <- list(title = "Beta(a=2, b=10)", r = function(n) rbeta(n, 2, 10), q = function(p) qbeta(p, 2, 10))

d.norm <- list(title = "Normal(m=0, sd=1)", r = rnorm, q = qnorm)
d.weibull1_2 <- list(title = "Weibull(scale=1, shape=2)", r = function(n) rweibull(n, 2), q = function(p) qweibull(p, 2))
d.student3 <- list(title = "Student(df=3)", r = function(n) rt(n, 3), q = function(p) qt(p, 3))
d.gumbel <- list(title = "Gumbel(loc=0, scale=1)", r = rgumbel, q = qgumbel)
d.exp <- list(title = "Exp(rate=1)", r = rexp, q = qexp)

d.cauchy <- list(title = "Cauchy(x0=0, gamma=1)", r = rcauchy, q = qcauchy)
d.pareto1_05 <- list(title = "Pareto(loc=1, shape=0.5)", r = function(n) rpareto(n, 1, 0.5), q = function(p) qpareto(p, 1, 0.5))
d.pareto1_2 <- list(title = "Pareto(loc=1, shape=2)", r = function(n) rpareto(n, 1, 2), q = function(p) qpareto(p, 1, 2))
d.lnorm0_1 <- list(title = "LogNormal(mlog=0, sdlog=1)", r = function(n) rlnorm(n, 0, 1), q = function(p) qlnorm(p, 0, 1))
d.lnorm0_2 <- list(title = "LogNormal(mlog=0, sdlog=2)", r = function(n) rlnorm(n, 0, 2), q = function(p) qlnorm(p, 0, 2))

d.lnorm0_3 <- list(title = "LogNormal(mlog=0, sdlog=3)", r = function(n) rlnorm(n, 0, 3), q = function(p) qlnorm(p, 0, 3))
d.weibull1_03 <- list(title = "Weibull(shape=0.3)", r = function(n) rweibull(n, 0.3), q = function(p) qweibull(p, 0.3))
d.weibull1_05 <- list(title = "Weibull(shape=0.5)", r = function(n) rweibull(n, 0.5), q = function(p) qweibull(p, 0.5))
d.frechet1 <- list(title = "Frechet(shape=1)", r = function(n) rfrechet(n, shape = 1), q = function(p) qfrechet(p, shape = 1))
d.frechet3 <- list(title = "Frechet(shape=3)", r = function(n) rfrechet(n, shape = 3), q = function(p) qfrechet(p, shape = 3))

ds <- list(
  d.unif, d.tri_0_2_1, d.tri_0_2_02, d.beta2_4, d.beta2_10,
  d.norm, d.weibull1_2, d.student3, d.gumbel, d.exp,
  d.cauchy, d.pareto1_05, d.pareto1_2, d.lnorm0_1, d.lnorm0_2,
  d.lnorm0_3, d.weibull1_03, d.weibull1_05, d.frechet1, d.frechet3
)

hls <- function(x, y, alpha = 0.95) {
  df <- expand.grid(x = x, y = y)
  d <- sort(df$y - df$x)
  shift <- mean(d)
  n <- length(x)
  m <- length(y)
  A <- n * m / 2 - 0.5
  z <- qnorm((1 + alpha) / 2)
  B <- z * sqrt(n * m * (n + m + 1) / 12)
  l <- max(round(A - B), 1)
  r <- min(round(A + B), length(d))
  list(shift = shift, l = d[l], r = d[r])
}
coverage <- function(rdist, n, true.shift, alpha = 0.95, iterations = 10000) {
  cover.cnt <- 0
  for (iter in 1:iterations) {
    est <- hls(rdist(10), rdist(10) + true.shift, alpha)
    if (est$l < true.shift && true.shift < est$r)
      cover.cnt <- cover.cnt + 1
  }
  cover.cnt / iterations
}

for (n in c(5, 10, 50))
  for (alpha in c(0.90, 0.95, 0.99)) {
    cat(paste0("*** n = ", n, ", alpha = ", alpha * 100, "% ***\n"))
    for (d in ds) {
      c <- coverage(d$r, n, 5, alpha) * 100
      cat(paste0(str_pad(d$title, 30, "right"), ": ", str_pad(format(c, nsmall = 2), 6, "left"), "%\n"))
    }
    cat("\n")
  }

And here are the results:

*** n = 5, alpha = 90% ***
Uniform(a=0, b=1)             :  89.65%
Triangular(a=0, b=2, c=1)     :  89.20%
Triangular(a=0, b=2, c=0.2)   :  89.00%
Beta(a=2, b=4)                :  89.72%
Beta(a=2, b=10)               :  90.32%
Normal(m=0, sd=1)             :  88.78%
Weibull(scale=1, shape=2)     :  89.24%
Student(df=3)                 :  89.56%
Gumbel(loc=0, scale=1)        :  89.32%
Exp(rate=1)                   :  89.52%
Cauchy(x0=0, gamma=1)         :  89.16%
Pareto(loc=1, shape=0.5)      :  89.19%
Pareto(loc=1, shape=2)        :  89.29%
LogNormal(mlog=0, sdlog=1)    :  89.53%
LogNormal(mlog=0, sdlog=2)    :  89.18%
LogNormal(mlog=0, sdlog=3)    :  89.06%
Weibull(shape=0.3)            :  89.56%
Weibull(shape=0.5)            :  88.83%
Frechet(shape=1)              :  89.42%
Frechet(shape=3)              :  89.63%

*** n = 5, alpha = 95% ***
Uniform(a=0, b=1)             :  94.72%
Triangular(a=0, b=2, c=1)     :  94.93%
Triangular(a=0, b=2, c=0.2)   :  94.58%
Beta(a=2, b=4)                :  94.56%
Beta(a=2, b=10)               :  94.80%
Normal(m=0, sd=1)             :  94.36%
Weibull(scale=1, shape=2)     :  95.07%
Student(df=3)                 :  94.73%
Gumbel(loc=0, scale=1)        :  95.03%
Exp(rate=1)                   :  94.71%
Cauchy(x0=0, gamma=1)         :  94.38%
Pareto(loc=1, shape=0.5)      :  94.64%
Pareto(loc=1, shape=2)        :  94.58%
LogNormal(mlog=0, sdlog=1)    :  94.56%
LogNormal(mlog=0, sdlog=2)    :  94.90%
LogNormal(mlog=0, sdlog=3)    :  94.90%
Weibull(shape=0.3)            :  94.56%
Weibull(shape=0.5)            :  94.43%
Frechet(shape=1)              :  94.88%
Frechet(shape=3)              :  94.52%

*** n = 5, alpha = 99% ***
Uniform(a=0, b=1)             :  99.21%
Triangular(a=0, b=2, c=1)     :  99.28%
Triangular(a=0, b=2, c=0.2)   :  99.22%
Beta(a=2, b=4)                :  99.38%
Beta(a=2, b=10)               :  99.17%
Normal(m=0, sd=1)             :  99.13%
Weibull(scale=1, shape=2)     :  99.23%
Student(df=3)                 :  99.35%
Gumbel(loc=0, scale=1)        :  99.23%
Exp(rate=1)                   :  99.29%
Cauchy(x0=0, gamma=1)         :  99.37%
Pareto(loc=1, shape=0.5)      :  99.30%
Pareto(loc=1, shape=2)        :  99.41%
LogNormal(mlog=0, sdlog=1)    :  99.21%
LogNormal(mlog=0, sdlog=2)    :  99.29%
LogNormal(mlog=0, sdlog=3)    :  99.34%
Weibull(shape=0.3)            :  99.21%
Weibull(shape=0.5)            :  99.29%
Frechet(shape=1)              :  99.31%
Frechet(shape=3)              :  99.30%

*** n = 10, alpha = 90% ***
Uniform(a=0, b=1)             :  89.33%
Triangular(a=0, b=2, c=1)     :  89.36%
Triangular(a=0, b=2, c=0.2)   :  89.77%
Beta(a=2, b=4)                :  88.51%
Beta(a=2, b=10)               :  89.07%
Normal(m=0, sd=1)             :  89.58%
Weibull(scale=1, shape=2)     :  89.43%
Student(df=3)                 :  89.19%
Gumbel(loc=0, scale=1)        :  89.45%
Exp(rate=1)                   :  88.99%
Cauchy(x0=0, gamma=1)         :  90.34%
Pareto(loc=1, shape=0.5)      :  89.42%
Pareto(loc=1, shape=2)        :  89.38%
LogNormal(mlog=0, sdlog=1)    :  89.37%
LogNormal(mlog=0, sdlog=2)    :  89.34%
LogNormal(mlog=0, sdlog=3)    :  89.42%
Weibull(shape=0.3)            :  89.59%
Weibull(shape=0.5)            :  89.25%
Frechet(shape=1)              :  89.60%
Frechet(shape=3)              :  89.56%

*** n = 10, alpha = 95% ***
Uniform(a=0, b=1)             :  94.85%
Triangular(a=0, b=2, c=1)     :  94.79%
Triangular(a=0, b=2, c=0.2)   :  94.51%
Beta(a=2, b=4)                :  95.02%
Beta(a=2, b=10)               :  94.71%
Normal(m=0, sd=1)             :  94.85%
Weibull(scale=1, shape=2)     :  95.04%
Student(df=3)                 :  94.71%
Gumbel(loc=0, scale=1)        :  94.63%
Exp(rate=1)                   :  94.72%
Cauchy(x0=0, gamma=1)         :  94.47%
Pareto(loc=1, shape=0.5)      :  94.28%
Pareto(loc=1, shape=2)        :  94.96%
LogNormal(mlog=0, sdlog=1)    :  94.75%
LogNormal(mlog=0, sdlog=2)    :  94.49%
LogNormal(mlog=0, sdlog=3)    :  94.41%
Weibull(shape=0.3)            :  94.71%
Weibull(shape=0.5)            :  94.83%
Frechet(shape=1)              :  94.66%
Frechet(shape=3)              :  94.69%

*** n = 10, alpha = 99% ***
Uniform(a=0, b=1)             :  99.25%
Triangular(a=0, b=2, c=1)     :  99.25%
Triangular(a=0, b=2, c=0.2)   :  99.24%
Beta(a=2, b=4)                :  99.20%
Beta(a=2, b=10)               :  99.21%
Normal(m=0, sd=1)             :  99.31%
Weibull(scale=1, shape=2)     :  99.21%
Student(df=3)                 :  99.36%
Gumbel(loc=0, scale=1)        :  99.49%
Exp(rate=1)                   :  99.19%
Cauchy(x0=0, gamma=1)         :  99.41%
Pareto(loc=1, shape=0.5)      :  99.19%
Pareto(loc=1, shape=2)        :  99.28%
LogNormal(mlog=0, sdlog=1)    :  99.30%
LogNormal(mlog=0, sdlog=2)    :  99.31%
LogNormal(mlog=0, sdlog=3)    :  99.39%
Weibull(shape=0.3)            :  99.20%
Weibull(shape=0.5)            :  99.36%
Frechet(shape=1)              :  99.28%
Frechet(shape=3)              :  99.34%

*** n = 50, alpha = 90% ***
Uniform(a=0, b=1)             :  89.62%
Triangular(a=0, b=2, c=1)     :  88.94%
Triangular(a=0, b=2, c=0.2)   :  89.03%
Beta(a=2, b=4)                :  89.16%
Beta(a=2, b=10)               :  89.71%
Normal(m=0, sd=1)             :  89.14%
Weibull(scale=1, shape=2)     :  89.53%
Student(df=3)                 :  89.10%
Gumbel(loc=0, scale=1)        :  88.51%
Exp(rate=1)                   :  89.23%
Cauchy(x0=0, gamma=1)         :  89.14%
Pareto(loc=1, shape=0.5)      :  89.55%
Pareto(loc=1, shape=2)        :  89.27%
LogNormal(mlog=0, sdlog=1)    :  89.54%
LogNormal(mlog=0, sdlog=2)    :  89.12%
LogNormal(mlog=0, sdlog=3)    :  89.69%
Weibull(shape=0.3)            :  89.15%
Weibull(shape=0.5)            :  89.03%
Frechet(shape=1)              :  89.55%
Frechet(shape=3)              :  88.43%

*** n = 50, alpha = 95% ***
Uniform(a=0, b=1)             :  94.86%
Triangular(a=0, b=2, c=1)     :  94.54%
Triangular(a=0, b=2, c=0.2)   :  94.32%
Beta(a=2, b=4)                :  94.63%
Beta(a=2, b=10)               :  94.58%
Normal(m=0, sd=1)             :  94.97%
Weibull(scale=1, shape=2)     :  94.22%
Student(df=3)                 :  94.85%
Gumbel(loc=0, scale=1)        :  94.51%
Exp(rate=1)                   :  94.06%
Cauchy(x0=0, gamma=1)         :  94.57%
Pareto(loc=1, shape=0.5)      :  94.67%
Pareto(loc=1, shape=2)        :  94.84%
LogNormal(mlog=0, sdlog=1)    :  94.46%
LogNormal(mlog=0, sdlog=2)    :  94.60%
LogNormal(mlog=0, sdlog=3)    :  95.00%
Weibull(shape=0.3)            :  94.68%
Weibull(shape=0.5)            :  94.49%
Frechet(shape=1)              :  94.75%
Frechet(shape=3)              :  94.69%

*** n = 50, alpha = 99% ***
Uniform(a=0, b=1)             :  99.24%
Triangular(a=0, b=2, c=1)     :  99.11%
Triangular(a=0, b=2, c=0.2)   :  99.34%
Beta(a=2, b=4)                :  99.30%
Beta(a=2, b=10)               :  99.26%
Normal(m=0, sd=1)             :  99.21%
Weibull(scale=1, shape=2)     :  99.28%
Student(df=3)                 :  99.31%
Gumbel(loc=0, scale=1)        :  99.27%
Exp(rate=1)                   :  99.28%
Cauchy(x0=0, gamma=1)         :  99.31%
Pareto(loc=1, shape=0.5)      :  99.41%
Pareto(loc=1, shape=2)        :  99.40%
LogNormal(mlog=0, sdlog=1)    :  99.20%
LogNormal(mlog=0, sdlog=2)    :  99.34%
LogNormal(mlog=0, sdlog=3)    :  99.24%
Weibull(shape=0.3)            :  99.30%
Weibull(shape=0.5)            :  99.37%
Frechet(shape=1)              :  99.33%
Frechet(shape=3)              :  99.36%

It seems that the approach is accurate enough, the coverage percentage is quite close to the requested confidence level.

References

  • [Hodges1963]
    Hodges, J. L., and E. L. Lehmann. “Estimates of Location Based on Rank Tests.” The Annals of Mathematical Statistics 34, no. 2 (June 1963): 598–611.
    https://doi.org/10.1214/aoms/1177704172.
  • [Sen1963]
    Sen, Pranab Kumar. “On the Estimation of Relative Potency in Dilution (-Direct) Assays by Distribution-Free Methods.” Biometrics 19, no. 4 (December 1963): 532.
    https://doi.org/10.2307/2527532.
  • [Deshpande2018]
    Deshpande, J. V., Uttara Naik-Nimbalkar, and Isha Dewan. Nonparametric Statistics: Theory and Methods. New Jersey: World Scientific, 2018.