How much should theories guide learning through experiments? (2022)

This is Jessica. I recently wrote about the role of theory in fields like psych. Here’s a related thought experiment:

A researcher is planning a behavioral experiment. They make various decisions that prescribe the nature of the data they collect: what interventions to test (including the style of the intervention and any informational content), what population to recruit subjects from, and what aspects of subjects’ behavior to study. The researcher uses their hunches to make these choices, which may be informed by explanations of prior evidence that have been proposed in their field. This approach could be called “strongly” theory-driven: they have tentative explanations of what drives behavior that strongly influence how they sample these spaces (note that these theories may or may not be based on prior evidence).

Now imagine a second world in which the researcher stops and asks themselves, as they make each of these decisions, what is a tractable representation of the larger space from which I am sampling, and how can I instead randomly sample from that? For example, if they are focused on some domain-specific form of judgment and behavior (e.g., political attitudes, economic behavior) they might consider what the space of intervention formats with possible effects on those behaviors are, and draw a random sample from this space rather than designing an experiment around some format they have a hunch about.
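As a toy illustration of what "randomly sample from the space" could mean in practice, here is a minimal sketch. The factor names and levels are hypothetical placeholders, not from any real study; the point is only that once the space is written down explicitly, drawing a design from it is mechanical.

```python
import itertools
import random

# Hypothetical design space for a behavioral experiment: each factor
# lists the levels the researcher considers possible, not just the
# ones a theory points at.
design_space = {
    "intervention_format": ["text", "video", "infographic", "interactive"],
    "framing": ["gain", "loss", "neutral"],
    "population": ["students", "online_panel", "community_sample"],
    "outcome_measure": ["self_report", "choice_task", "response_time"],
}

# Enumerate every combination of factor levels (the full space) ...
all_designs = [
    dict(zip(design_space, combo))
    for combo in itertools.product(*design_space.values())
]

# ... and draw a random sample of designs to actually run,
# rather than hand-picking the cells a hunch favors.
random.seed(0)
chosen = random.sample(all_designs, k=5)
for d in chosen:
    print(d)
```

In reality the hard part is, of course, writing down `design_space` at all, which is the issue the post returns to below.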


Is scientific knowledge gain better in the first world or the second?

Before trying to answer this question, here’s a slightly more concrete example scenario: Imagine a researcher interested in doing empirical research on graphical perception, where the theories take the form of explanations of why people perform some task better with certain visual encodings over others. In the first world, they might approach designing an experiment with implications of a pre-existing theory in mind, conjecturing, for example, that length encodings are better than area encodings because the estimated exponent of Stevens’ power law from prior experiments is closer to 1 for length compared to area. Or they might start with some new hunch they came up with, like density encodings are better than blur encodings, where there isn’t much prior data. Either way, they design an experiment to test these expectations, choosing some type of visual judgment (e.g., judging proportion, or choosing which of two stimuli is longer/bigger/blurrier etc. in a forced choice), as well as the structure and distribution of the data they visualize, how they render the encodings, what subjects they recruit, etc. Where there have been prior experiments, these decisions will probably be heavily influenced by choices made in those. How exactly they make these decisions will also be informed by their theory-related goal: do they want to confirm the theory they have in mind, disconfirm it, or test it against some alternative theory? They do their experiment and depending on the results, they might keep the theory as is, refine it, or produce a completely new theory. The results get shared with the research community.
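For readers unfamiliar with how a Stevens exponent is estimated: the power law says perceived magnitude grows as P = k·I^a in stimulus intensity I, so a falls out as the slope of a log-log regression. A minimal sketch with made-up, noise-free judgments (the data and the 0.7 exponent are purely illustrative, not from any real experiment):

```python
import math

# Hypothetical, noise-free judgments generated from P = I ** 0.7,
# standing in for averaged magnitude estimates from an experiment.
intensities = [1, 2, 4, 8, 16, 32]
judgments = [i ** 0.7 for i in intensities]

# On log-log axes, P = k * I^a becomes log P = log k + a * log I,
# so the exponent a is the slope of a least-squares fit.
xs = [math.log(i) for i in intensities]
ys = [math.log(p) for p in judgments]
mx = sum(xs) / len(xs)
my = sum(ys) / len(ys)
a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
    sum((x - mx) ** 2 for x in xs)

print(f"estimated exponent: {a:.3f}")
```

An exponent near 1 means judged magnitude tracks actual magnitude nearly linearly, which is the basis for the "length beats area" conjecture in the text.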

In the “theory-less” version of the scenario, the researcher puts aside any hunches or prior domain knowledge they have about visual encoding performance. They randomly choose some set of visual encodings to compare, some visual judgment task, some type of data structure/distribution compatible with those encodings, etc. After obtaining results they similarly use them to derive an explanation, and share their results with the community.


So which produces better scientific knowledge? This question is inspired by a recent preprint by Dubova, Moskvichev, and Zollman which uses agent-based modeling to ask whether theory-motivated experimentation is good for science. The learning problem they model is researchers using data collected from experiments to derive theories, i.e., lower dimensional explanations designed to most efficiently and representatively account for the ground truth space (in their framework these are autoencoders with one hidden layer, trained using gradient descent). As theory-informed data collection strategies, they consider confirmation, falsification, crucial experimentation (e.g., sampling new observations based on where theories disagree), and novelty (e.g., sampling a new observation that is very different from the agent’s previously collected observations), and compare these to random sampling. They evaluate the theories produced by each strategy in terms of perceived epistemic success (how well a theory accounts for just the data the agent collected) and “objective performance” (how well it accounts for representative samples from the full ground truth distribution).

They conclude from their simulations that “theoretically motivated experiment choice is potentially damaging for science, but in a way that will not be apparent to the scientists themselves.” The reason is overfitting:

The agents aiming to confirm, falsify theories, or resolve theoretical disagreements end up with an illusion of epistemic success: they develop promising accounts for the data they collected, while completely misrepresenting the ground truth that they intended to learn about. Agents experimenting in theory-motivated ways acquire less diverse or less representative samples from the ground truth that are also easier to account for.
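A stripped-down sketch of that mechanism, far simpler than the paper's autoencoder agents but showing the same qualitative effect: an agent whose "theory" predicts a particular value, and who keeps only observations that roughly confirm it, ends up with a less diverse and less representative sample than a random sampler drawing from the same ground truth. All numbers here are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(42)

# Ground truth: a noisy quantity the agents are trying to learn about.
def observe():
    return random.gauss(0.0, 1.0)

# Random sampler: keeps every observation it draws.
random_sample = [observe() for _ in range(500)]

# Confirmation sampler: holds a theory ("the value is about 0.5") and
# keeps only observations that land close to that prediction.
theory_prediction = 0.5
tolerance = 0.5
confirm_sample = []
while len(confirm_sample) < 500:
    x = observe()
    if abs(x - theory_prediction) < tolerance:
        confirm_sample.append(x)

# The confirmatory sample is easy to "account for" (low spread, centered
# near the theory) but misrepresents the ground truth distribution.
print("random  mean/sd:", statistics.mean(random_sample),
      statistics.stdev(random_sample))
print("confirm mean/sd:", statistics.mean(confirm_sample),
      statistics.stdev(confirm_sample))
```

The confirmation sampler's low within-sample spread is the "illusion of epistemic success": its theory fits the collected data well precisely because the data were filtered to fit.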


Of course, as in any attempt to model scientific knowledge production, there are many specific parameter choices they make in their analyses that should be assessed in terms of how well they capture real-world experimentation strategies, theory building and social learning, before we place too much faith in this claim. For the purposes of this post though, I’m more interested in the intuition behind their conclusions.

At a high level, the possibility that theory-motivated data collection reduces variation in the environment being studied seems plausible to me. It helps explain why I worry about degrees of freedom in experiment design, especially when one can pilot test different combinations of design parameters and one knows what they want to see in the results. It’s easy to lose sight of how representative your experimental situation is relative to the full space of situations in which some phenomenon or type of behavior occurs when you’re hell-bent on proving some hunch. And when subsequent researchers design experiments informed by the same theory and set-up you used, the knowledge that is developed may become even more specialized to a particular set of assumptions. Related to the graphical perception example above, there are regularly complaints among people who do visualization research about how overfit certain design principles (e.g., choose encodings based on perceptual accuracy) are to a certain class of narrowly-defined psychophysics experiments. On top of this, the new experiments we design on less explored topics (uncertainty visualization, visualization for machine learning, narrative-driven formats, etc.) can be similarly driven by hunches researchers have, and quickly converge on a small-ish set of tasks, data-generating conditions or benchmark datasets, and visualization formats.

So I find the idea of random sampling compelling. But things get interesting when I try to imagine applying it in the real world. For example, on a practical level, to randomly sample a space implies you have some theory, if not formally then at least implicitly, about the scope of the ground truth distribution. How much does the value of random sampling depend on how this is conceived of? Does this model need to be defined at the level of the research community? On some level this is the kind of implicit theory we tend to see in papers already, where researchers argue why their particular set of experimental conditions, data inputs, behaviors, etc. covers a complete enough scope to enable characterizing some phenomenon.


Or maybe one can pseudo-randomly sample without defining the space that’s being sampled, and this pseudo-random sampling is still an improvement over the theory-driven alternative. Still, it seems hard to conceptually separate the theory-driven experiment design from the more arbitrary version, without having some theory of how well people can intentionally randomize. For example, how can I be sure that whatever conditions I decide to test when I “randomly sample” aren’t actually driven by some subconscious presupposition I’m making about what matters? There’s also a question of what it means for one to choose what they work on in any real sense in a truly theory-free approach. For various reasons researchers often end up specializing in some sub-area of their field. Can this be like an implicit statement about where they think the big or important effects are likely to be?

I also wonder how random sampling might affect human learning in the real world, where how we learn from empirical research is shaped by conventions, incentives, ego, various cognitive limits, etc. I expect it could feel hard to experiment without any of the personal commitments to certain intuitions or theories that currently play a role. Could real humans find it harder to learn without theory? I know I have learned a lot by seeing certain hunches I had fail in light of data; if there was no place for expectations in conducting research, would I feel the same level of engagement with what I do? Would scientific attention or learning, on a personal level at least, be affected if we were supposed to leave our intuitions or personal interests at the door? There’s also the whole question about how randomizing experimenters would fare under the current incentive structures, which tend to reward perceived epistemic success.

I tend to think we can address some theory problems, including the ones I perceive in my field, by being more explicit in stating the conditions that our theories are intended to address and that our claims are based on. For example, there are many times when we could do a better job of formalizing the spaces that are being sampled from to design an experiment. This may push researchers to recognize the narrowness of their scope and sample a bit more representatively. To take an example Gigerenzer used to argue that psychologists should do more stimulus sampling: in studies of overconfidence, rather than asking people questions framed around what might be a small set of unusual examples (e.g., “Which city lies further south: Rome or New York? How confident are you?”, where Rome is further north yet warmer), the researcher would do better to consider the larger set of stimuli from which these extreme examples are drawn, like all possible pairs of large cities in the world. It’s not theory-free, but would be a less drastic change that would presumably have some of the same effect of reducing overfitting. It seems worth exploring what lies between theoretically-motivated gaming of experiments and trying to remove personal judgment and intuitions about the objects of study altogether.
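Gigerenzer’s suggestion amounts to defining the stimulus population explicitly and then sampling items from it. A toy sketch (the city list is a small illustrative stand-in for “all large cities in the world,” with rounded latitudes):

```python
import itertools
import random

# A small, illustrative stand-in for the population of large cities,
# with approximate latitudes in degrees north (negative = south).
cities = {
    "Rome": 41.9, "New York": 40.7, "London": 51.5, "Cairo": 30.0,
    "Tokyo": 35.7, "Sydney": -33.9, "Moscow": 55.8, "Lagos": 6.5,
}

# The stimulus population: every unordered pair of cities ...
all_pairs = list(itertools.combinations(cities, 2))

# ... from which question items are drawn at random, instead of
# hand-picking surprising pairs like (Rome, New York).
random.seed(1)
items = random.sample(all_pairs, k=5)
for a, b in items:
    southern = a if cities[a] < cities[b] else b
    print(f"Which lies further south: {a} or {b}?  (answer: {southern})")
```

With randomly drawn pairs, tricky items like Rome vs. New York appear at their natural base rate rather than dominating the item set, which is the overfitting-reduction Gigerenzer was after.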



