Adaptation of continuous scale measures to discrete distributions
In statistics, it is often important to have a reliable measure of scale, since it is required for estimating many types of effect size and for statistical tests. If we work with continuous distributions, there are plenty of available scale measures with various levels of statistical efficiency and robustness. However, when a distribution becomes discrete (e.g., because of the limited resolution of the measurement tools), classic measures of scale can collapse to zero due to tied values in the collected samples. This can be a severe problem in the analysis, since scale measures are often used as denominators in various equations. To make the calculations more reliable, we need to handle such situations and ensure that the target scale measure never becomes zero. In this post, I discuss a simple approach to work around this problem and adapt any given measure of scale to the discrete case.
The problem
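First, let us consider an example that illustrates the problem. For two given samples, we want to estimate an effect size expressed as the difference between their measures of central tendency divided by a pooled measure of scale. The classic example of an effect size from this family is Cohen’s d:
$$
d = \frac{\overline{y} - \overline{x}}{s},
$$
where $\overline{x}$ and $\overline{y}$ are the sample means and $s$ is the pooled standard deviation:
$$
s = \sqrt{\frac{(n_x - 1) s_x^2 + (n_y - 1) s_y^2}{n_x + n_y - 2}},
$$
where $n_x$, $n_y$ are the sample sizes and $s_x$, $s_y$ are the sample standard deviations.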
When we calculate Cohen’s d for two samples from normal distributions, everything works smoothly. Here are two samples of size 10 drawn from normal distributions:
Case A
x ≈ {0.474, -0.555, -0.01, 1.067, -0.712, -0.321, -1.238, -1.085, 0.869, 2.316}
y ≈ {1.315, -0.271, 1.563, 0.54, 1.81, 1.905, -0.209, 1.001, 0.543, 2.149}
For this case, the pooled standard deviation is $s \approx 0.99$, which gives $d \approx 0.96$ (computed from the rounded values listed above).
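Cohen’s d and the pooled standard deviation can be sketched in Python as follows. This is a minimal illustration, assuming the $(\overline{y} - \overline{x}) / s$ sign convention and the usual $n - 1$ sample variances from the standard `statistics` module; the helper names `pooled_sd` and `cohens_d` are introduced here for illustration only. Applied to the rounded Case A values, it gives $s \approx 0.993$ and $d \approx 0.961$.

```python
import math
from statistics import mean, variance


def pooled_sd(x, y):
    """Pooled standard deviation of two samples (n - 1 denominators)."""
    nx, ny = len(x), len(y)
    return math.sqrt(((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2))


def cohens_d(x, y):
    """Cohen's d, assuming the (mean(y) - mean(x)) / s sign convention."""
    return (mean(y) - mean(x)) / pooled_sd(x, y)


# Case A samples (rounded values from the text)
x = [0.474, -0.555, -0.01, 1.067, -0.712, -0.321, -1.238, -1.085, 0.869, 2.316]
y = [1.315, -0.271, 1.563, 0.54, 1.81, 1.905, -0.209, 1.001, 0.543, 2.149]

print(f"s = {pooled_sd(x, y):.3f}")  # s = 0.993
print(f"d = {cohens_d(x, y):.3f}")   # d = 0.961
```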
Now let us consider another example:
Case B
x = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
y = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1}
In this case, the standard deviation is zero for both samples. Therefore, the pooled standard deviation is also zero, and Cohen’s d cannot be evaluated: the difference between the means equals 1, and dividing it by zero is undefined. For such a degenerate discrete case, a convenient measure of effect size is the absolute difference between the measures of central tendency, expressed in raw measurement units.
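The collapse in Case B is easy to reproduce. In the minimal sketch below, `pooled_sd` is a hypothetical helper defined here on top of the standard `statistics` module; the pooled standard deviation evaluates to exactly zero, so computing Cohen’s d raises `ZeroDivisionError` in Python.

```python
import math
from statistics import mean, variance


def pooled_sd(x, y):
    """Pooled standard deviation of two samples (n - 1 denominators)."""
    nx, ny = len(x), len(y)
    return math.sqrt(((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2))


# Case B: all values are tied within each sample
x = [0] * 10
y = [1] * 10

s = pooled_sd(x, y)
print(s)  # 0.0

try:
    d = (mean(y) - mean(x)) / s
except ZeroDivisionError:
    d = None  # Cohen's d is undefined because the denominator collapsed to zero
print(d)  # None
```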
When the properties of the distribution are known in advance, we can choose the proper effect size measure during the design stage of the research. But what if we don’t know these properties, and we want to apply a universal approach to any given data?
A possible solution
To tackle this issue, we can adapt the continuous scale measures to discrete data using a simple method that guarantees non-zero values.
The main idea is to add a small constant
where
The difference between
It seems that
For example, in Case A, switching from
In Case B, switching from
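The idea of adding a small constant based on the measurement resolution can be sketched as a wrapper around any scale estimator. This is an illustration only, and the exact constant used in this post is not reproduced here: the wrapper `adapt_scale`, the `resolution` parameter (the smallest step the measurement tool can distinguish), and the default factor `c = 0.5` are all hypothetical choices, not recommended values. The wrapper is applied both to the pooled standard deviation and to the median absolute deviation to show that it does not depend on the particular scale measure.

```python
import math
from statistics import mean, median, variance
from typing import Callable

ScaleFn = Callable[..., float]


def adapt_scale(scale: ScaleFn, resolution: float, c: float = 0.5) -> ScaleFn:
    """Wrap a scale estimator so that it can never return zero.

    HYPOTHETICAL sketch: the additive term c * resolution is an illustrative
    choice, not the exact constant used in the post.
    """
    if resolution <= 0 or c <= 0:
        raise ValueError("resolution and c must be positive")

    def adapted(*samples) -> float:
        # Original estimate plus a small resolution-based constant
        return scale(*samples) + c * resolution

    return adapted


def pooled_sd(x, y):
    """Pooled standard deviation (n - 1 denominators)."""
    nx, ny = len(x), len(y)
    return math.sqrt(((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2))


def mad(x):
    """Unscaled median absolute deviation; it also collapses to zero on ties."""
    m = median(x)
    return median([abs(v - m) for v in x])


# Case B: integer-valued data, so assume a resolution of 1
x = [0] * 10
y = [1] * 10

s_star = adapt_scale(pooled_sd, resolution=1.0)(x, y)
d_star = (mean(y) - mean(x)) / s_star
mad_star = adapt_scale(mad, resolution=1.0)(x)
print(s_star, d_star, mad_star)  # 0.5 2.0 0.5
```

Because the constant is added unconditionally, it also inflates the scale slightly for continuous-looking data such as Case A (with a resolution of 0.001 and `c = 0.5`, the shift would be only 0.0005), which is why the constant should stay small relative to the typical spread.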
Conclusion
Adapting continuous scale measures to discrete data is a simple yet effective approach to handling situations where classical measures may collapse to zero. By adding a small constant based on the resolution of the measurement scale, we can ensure that the scale measure never becomes zero, allowing us to compute effect size measures for a wide range of data types.
This method can be applied to any given measure of scale, not just the pooled standard deviation, making it a versatile tool for researchers and practitioners working with discrete data. However, it is essential to choose the constant value carefully, considering the context and the resolution of the measurement scale to avoid introducing significant distortions to the results.