# Unobvious limitations of R *signrank Wilcoxon Signed Rank functions

In R, we have functions to calculate the density, distribution function, and quantile function
of the Wilcoxon Signed Rank statistic distribution: `dsignrank`

, `psignrank`

, and `qsignrank`

.
All the functions use exact calculations of the target functions
(the R 4.3.1 implementation can be found here).
The exact approach works excellently for small sample sizes.
Unfortunately, for large sample sizes, it fails to provide the expected function values.
Out of the box, there are no alternative approximation solutions that could allow us to get reasonable results.
In this post, we investigate the limitations of these functions and
provide sample size thresholds after which we might get invalid results.

## psignrank

`psignrank`

returns values of the Wilcoxon Signed Rank statistic cumulative distribution function (CDF).
It has the following signature:

```
psignrank(q, n, lower.tail = TRUE, log.p = FALSE)
```

The Statistic value takes values between $0$ and $n(n+1)/2$. For example, for $n=4$, the Statistic values range from $0$ to $10$:

```
psignrank(-1:10, 4)
## [1] 0.0000 0.0625 0.1250 0.1875 0.3125 0.4375 0.5625 0.6875 0.8125 0.8750 0.9375 1.0000
```

Problems appear when $n \geq 1039$.
Let us consider $n=1039$ to illustrate this problem.
For this sample size, the Statistic values range from $0$ to $540280$.
Let us explore the `psignrank`

results for some of these values using the following snippet:

```
n <- 1039
q <- c(
0,
seq(230000, 262000, by = 2000),
262633,
262634,
270140,
270141,
277645,
277646,
seq(278000, 310000, by = 2000),
540280
)
cdf <- psignrank(q, n)
cbind(q, cdf)
```

Here are the results:

q | cdf | q | cdf |
---|---|---|---|

0 | 0.00000 | 270141 | -Inf |

230000 | 0.00002 | 277645 | -Inf |

232000 | 0.00004 | 277646 | 0.78101 |

234000 | 0.00009 | 278000 | 0.79166 |

236000 | 0.00021 | 280000 | 0.84587 |

238000 | 0.00044 | 282000 | 0.88983 |

240000 | 0.00091 | 284000 | 0.92399 |

242000 | 0.00180 | 286000 | 0.94942 |

244000 | 0.00343 | 288000 | 0.96757 |

246000 | 0.00628 | 290000 | 0.97997 |

248000 | 0.01104 | 292000 | 0.98810 |

250000 | 0.01867 | 294000 | 0.99319 |

252000 | 0.03039 | 296000 | 0.99626 |

254000 | 0.04764 | 298000 | 0.99802 |

256000 | 0.07197 | 300000 | 0.99900 |

258000 | 0.10483 | 302000 | 0.99951 |

260000 | 0.14738 | 304000 | 0.99977 |

262000 | 0.20017 | 306000 | 0.99990 |

262633 | 0.21899 | 308000 | 0.99996 |

262634 | Inf | 310000 | 0.99998 |

270140 | Inf | 540280 | 1.00000 |

If we continue exploring in-between values, we obtain four intervals in the function domain:

q | cdf |
---|---|

`[0; 262633]` | `[0.00000; 0.21899]` |

`[262634; 270140]` | $\infty$ |

`[270141; 277645]` | $-\infty$ |

`[277646; 540280]` | `[0.78101; 1.00000]` |

As we can see, in the middle part of the distribution ($q \in [262634; 277645]$),
`psignrank`

returns $\pm \infty$.
For larger values of $n$, we have the same pattern: `psignrank`

returns correct values only at the distribution tails.
The middle part of the distribution gives invalid values.
For $1039 \leq n \leq 1074$ these invalid values are $\pm \infty$.
Starting $n \geq 1075$, we observe `NaN`

.

## dsignrank

`dsignrank`

returns values of the Wilcoxon Signed Rank statistic probability density function (PDF).
It has the following signature:

```
dsignrank(x, n, log = FALSE)
```

It works fine for $n \leq 1038$.
However, for $n \geq 1039$, we encounter issues similar to `psignrank`

:

```
n <- 1039
x <- c(262633, 262634, 277646, 277647)
dsignrank(x, n)
## [1] 0.00003051712 Inf Inf 0.00003051712
```

For the middle part of the distribution, `dsignrank`

always returns `Inf`

.
The range of invalid values almost matches the corresponding range for `psignrank`

.

## qsignrank

`qsignrank`

returns values of the Wilcoxon Signed Rank statistic quantile function values.
It has the following signature:

```
qsignrank(p, n, lower.tail = TRUE, log.p = FALSE)
```

It performs normally for $n \leq 1074$, but it hangs for $n \geq 1075$. For example, the following function call never returns a result:

```
qsignrank(0.5, 1075)
```

The hanging behavior is observed for $p \in (0; 1); n \geq 1075$.

## Conclusion

The signrank functions in R are reliable for small sample sizes, but they have some unobvious limitations when dealing with larger sample sizes. If one needs values that cannot be obtained using standard functions, various approximation models like the normal one or the Edgeworth expansion can be considered.