The Effect Existence, Its Magnitude, and the Goals

by Andrey Akinshin · 2024-02-27

If you are curious if something impacts something else, the answer is probably “yes.” Does that indicator depend on those factors? Yes, it does. If we change this thing, would it affect …? Yes, it would. If a person takes this pill, could it cause a non-exactly-zero change in the body? Yes, the presence of the pill is already a change that can always be detected with the right amount of effort.

One may argue that in some cases (assuming the list of specific cases is presented), zero effect does exist. For a moment, let us pretend that it is true. Now, let us imagine a parallel universe, which is the same as ours but with the presence of the effect. Unfortunately, the effect is so small that our tools are not sophisticated enough to detect it. Imagine being put into one of these worlds without knowing which one. How do you determine the existence of the effect? Of course, you can improve the resolution of the measurement tools via new scientific discoveries, but with the current state of technology, the absence of the effect cannot be checked. Therefore, it is always safer to assume that the effect exists, keeping in mind that it can be negligible. Let us accept this assumption and continue as if it is the absolute truth.

Since the effect exists and we know that, it does not make sense to perform studies to check it. If we want to investigate something and collect more scientific knowledge about the world, we should be curious about the size of the effect! The magnitude of the effect should be the primary focus of our attention.

If we perform generic exploratory research without declaring further goals, the estimation itself should be the primary research target. In this context, we must not use any artificial, arbitrarily chosen thresholds. We do not interpret, and we do not judge. We just collect data to make a foundation for new theories, which should be checked in new studies.

If we want to come up with an interpretation, we should present the goal. It is crucial to understand what we want to achieve, since the answer should be the basis for the interpretation. We cannot say that an effect is big or small if we do not provide a comparison reference. Unfortunately, a clear goal definition is often so complicated that researchers experience an extreme and unconscious temptation to gently omit “obvious” explanations.

For example, suppose we examine the treatment effect of a pill against an illness. At what effect size would it be reasonable to start mass production of such pills and take money from people for that? What if the pill has been proven to always reduce the illness duration but only by 0.0001%? While I feel like some real pills in real drug stores have such a degree of usefulness, I would not recommend starting the production and distribution of such a medicine. And what if the pills reduce the illness duration by 1%? Or 5%? Or 50%? Or 99%? Where do we draw a threshold? And, more importantly, why there? Handling this question can be tremendously difficult, but only the right answer will give us a proper baseline that we can use in the research.

Or let us consider a popular topic of psychological studies: does the social context affect human behavior? For example, could people perform extremely cruel actions if they think it is socially acceptable in a given situation? First of all, yes, they could; there are plenty of historical precedents. Here, researchers start asking new questions like how many people are influenced by this effect and to what degree? What’s even worse, they start labeling their findings as “strong effect,” “almost statistically significant,” “enough statistical power,” or “reasonably large sample.” Reviewers are ready to open their sticker packs with “p-value should be banned,” “underpowered,” “randomized sample is not random enough,” “correlation does not imply causation,” “underfitted,” “overfitted,” and dozens of other perfectly valid critical comments. Sometimes, the subsequent discussions can be highly educational and/or entertaining. Unfortunately, in the heat of the argument, these researchers often miss the main point: why do we care? Again, in pure exploratory research, we just collect the measurements and put them into the shared dataset: the sample size does not matter, and statistical testing is not needed (there is no hypothesis to test!). In non-exploratory research, we want to use the new information somehow. Let us assume that one study claimed that 65% of people are capable of performing cruel actions under pressure. Next, the study was debunked, and the new study claims that the correct estimation is 85%. The error of 20 percentage points is “obviously” huge, but why is it important? How should it change our perception of the results? How should we adjust our decision-making strategy based on this extremely useful correction?

Here is another example. A mayor considers building a new park and is curious if it would increase the happiness of the citizens. Most likely, yes, it would. Many people like parks, so we can expect some folks to be happier. A better question is how much happier they would be. A much better question is, why do we care? Let us say that the mayor wants to be re-elected, and the city renovation is a part of the campaign. If it is the true goal, we should approach the situation differently. First of all, we should focus not on the citizens’ happiness but on the current mayor’s support level. Secondly, we should consider different renovation project options. We want to choose the best one, so we need some criteria to compare them. For example, we can compare them by the ratio of benefits to costs. While such a trivial metric is not recommended as the only one without acknowledging other parameters, it is good enough for an example. The cost can be evaluated separately, but for the expected support level rise, we may want to perform a social survey. The survey should be designed to handle the following specific question: which renovation project would lead to the highest support level increase divided by the cost? (This question also has a lot of pitfalls, but we want to keep the example simple.) Also, we should be prepared for a not-so-obvious outcome: it may turn out that none of the available projects will improve the support level tangibly. In this case, the mayor may want to reconsider spending the campaign budget on city renovation. But how do we pick a threshold for the minimum increment worth the investment? If we collect the data first and “then decide,” the result will be biased. The reason why we advocate the usage of the scientific method is to eliminate such biases. Such a “then decide” procedure often leads to situations when people find a way to support their favorite preselected option with the collected data. And there is always a way. Such thresholds must be chosen before the study and should have reasonable explanations.
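To make the procedure from the mayor example concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Project` type, the `pick_project` helper, the project names, the cost units, and all the numbers are made up and do not come from any real survey. The only point is the order of operations: the minimum worthwhile increase is fixed before the data are seen, and the ranking by support increase per unit of cost is applied only to the projects that clear it.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Project:
    name: str
    cost: float              # campaign money, arbitrary units (must be > 0)
    support_increase: float  # estimated support rise, in percentage points


def pick_project(projects: Sequence[Project],
                 min_increase: float) -> Optional[Project]:
    """Return the project with the highest support increase per unit of cost
    among those that reach the pre-declared minimum increase, or None if no
    project reaches it."""
    worthwhile = [p for p in projects if p.support_increase >= min_increase]
    if not worthwhile:
        return None
    return max(worthwhile, key=lambda p: p.support_increase / p.cost)


# Hypothetical threshold, written down (with its justification) BEFORE the survey.
MIN_INCREASE = 2.0

# Hypothetical survey estimates, invented for this example.
projects = [
    Project("park", cost=4.0, support_increase=3.0),
    Project("bike lanes", cost=1.0, support_increase=1.0),
    Project("fountain", cost=2.0, support_increase=0.5),
]

choice = pick_project(projects, MIN_INCREASE)
print(choice.name if choice else "reconsider spending on renovation")
```

With these made-up numbers, the bike lanes have the best ratio, but their expected increase is below the declared minimum, so the park is chosen. If the threshold were picked after looking at the estimates, it could be tuned to favor any of the three options, which is exactly the bias described above.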

We can also blame classic statistical textbooks for the lack of goal-oriented thinking in the scientific community. These textbooks are typically over-focused on specific tools that are applied to over-simplified examples, leaving the goal formation process out of scope. But why the hell do we need to know if a coin is fair, how many red balls are in the urn, or if men are taller than women? The requirement of clearly defined goals should always be the primary one.

Some people perceive such strict requirements as too formal and unnecessary. Indeed, it may feel that following the formal guidelines of modern scientific methodologies is just irritating bureaucracy. We have every right to feel that way since many of us had unpleasant experiences with formal guidelines in our lives. But this demand for clear goals is a good requirement. This requirement cares about us. This requirement does not want us to be in a situation in which we spend years of our lives and tons of money on useless research. This requirement prevents us from starting any endeavor without a clear and explicit answer to “Why do we want to do this?”

While most reasonable people agree that “why” is an important question, this knowledge does not always help. In the same way, knowing a diet program does not, on its own, help anyone lose weight. We all have cognitive biases that prevent us from following the right path. And we cannot fully control our capacity for self-control. We cannot always ask the right questions and make the right decisions. But we can try to develop a mental habit of asking “why” and try to change our mindsets to be more goal-oriented. A simple exercise is to always carefully write down the goals and check if they contain any magic numbers and/or unmotivated hidden evaluations.

With the right mindset, it should be fine to perform flawed research. Any study contains flaws anyway; it is not something to be ashamed of; it is something that should be acknowledged and properly used. These kinds of mistakes are valuable sources of experience and knowledge. They help us evolve and advance our understanding of the world.