*tl;dr: I propose a half-baked and totally wrong theory of the philosophy of science, based on information theory, that explains why some fields are more data-oriented and others more theory-oriented.*

Which is better: data or theory? Here’s a better question: why do some people do theory, and some people analyze data? Or rather, why do some fields tend to have a large set of theorists, and some don’t?

If we look at any individual scientist, we could reasonably say that they are trying to understand the world as well as possible. We could describe this in information-theory terms: given some set of data, they are trying to *maximize the information* they have about their description of the world. One way to think about information is that it *reduces uncertainty*. In other words, given a set of data, we want to reduce our uncertainty about our description of the world as much as possible. When you have no information about something, you are totally uncertain about it. You know nothing! But the more information you have, the less uncertain you are. How do we do that?

Thanks to Shannon, we have an equation that tells us how much information two things share; in other words, how much knowing one thing tells us about the other:

I(problem; data) = H(problem) - H(problem | data)

This tells you (in bits!) how much more certain you will be about a *problem*, or our *description of the world*, once you get some set of *data*.
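To see the equation do something, here is a minimal sketch that computes I(problem; data) = H(problem) - H(problem | data) from a joint probability table. The helper names (`entropy`, `mutual_information`) and the three toy tables are made up for illustration; each table assumes two equally likely descriptions of the problem and two possible data outcomes.

```python
import math

def entropy(probs):
    """Shannon entropy, in bits, of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(problem; data) = H(problem) - H(problem | data), in bits.

    joint[i][j] is the probability that the problem has description i
    and we observe data outcome j (all entries sum to 1)."""
    p_problem = [sum(row) for row in joint]        # marginal over data outcomes
    p_data = [sum(col) for col in zip(*joint)]     # marginal over descriptions
    h_cond = 0.0
    for j, pd in enumerate(p_data):
        if pd > 0:
            # H(problem | data = j), weighted by how often we see data j
            h_cond += pd * entropy([row[j] / pd for row in joint])
    return entropy(p_problem) - h_cond

# Hypothetical toy tables (rows = descriptions, columns = data outcomes).
perfect = [[0.5, 0.0], [0.0, 0.5]]       # data always reveals the description
useless = [[0.25, 0.25], [0.25, 0.25]]   # data is independent of the description
noisy = [[0.4, 0.1], [0.1, 0.4]]         # data points the right way 80% of the time

for name, joint in [("perfect", perfect), ("useless", useless), ("noisy", noisy)]:
    print(name, round(mutual_information(joint), 3))
```

The three cases print 1.0, 0.0 and 0.278 bits: perfect data tells us everything H(problem) allows (1 bit here), useless data tells us nothing, and noisy data lands in between.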

H(problem) is the *entropy*; roughly, it measures (in bits, on a log scale) how many different possible ways we have of describing the problem. Is there only one way? Many possible ways? Similarly, H(problem | data) measures how many possible descriptions remain once we've seen some data. If we see data and there are still tons of possibilities, the data has not told us much; we won't have much information about the problem. But if the data is so precise that each set of data tells us exactly how to describe the problem, then we will have a lot of information!
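To make "how many possibilities" a bit more concrete: with N equally plausible descriptions the entropy is exactly log2(N) bits, and when some descriptions are already more plausible than others it drops below that. This is a minimal sketch with made-up distributions; `entropy_bits` is a hypothetical helper, and 2**H (sometimes called perplexity) is used here only as a rough "effective number" of descriptions still in play.

```python
import math

def entropy_bits(probs):
    """Shannon entropy H = -sum(p * log2(p)), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Four candidate descriptions, all equally plausible: log2(4) = 2 bits.
uniform = [0.25, 0.25, 0.25, 0.25]
# The same four candidates, but one is already much more plausible.
skewed = [0.7, 0.1, 0.1, 0.1]

for name, dist in [("uniform", uniform), ("skewed", skewed)]:
    h = entropy_bits(dist)
    # 2**H acts like a count of equally plausible descriptions with the same entropy.
    print(f"{name}: H = {h:.3f} bits, ~{2 ** h:.2f} effective descriptions")
```

The uniform case prints `H = 2.000 bits, ~4.00 effective descriptions`; the skewed case comes out lower on both, because one description is already doing most of the work.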

This tells us that there are two ways to maximize our information about a problem given a set of data. We can either increase our set of possible *descriptions* of the problem, raising H(problem), or we can decrease how many possible descriptions remain once we see the data, lowering H(problem | data).

In a high-quality, data-rich world we can mostly get away with the second one: the data isn't noisy and will tell us what it represents, so information can be maximized simply by collecting more data. But what happens when the data is really noisy? Then collecting more data gives us a smaller marginal improvement in information than working on the set of descriptions: modeling and theory.
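A toy way to see the noisy-data point: assume the "problem" is a single fair yes/no fact, so H(problem) = 1 bit, and each data point is an independent reading that is wrong with some fixed probability (a binary symmetric noise model; this setup is an assumption made here for illustration, not something the argument above specifies). The sketch computes I(problem; data) exactly for n readings; `h2` and `info_from_noisy_samples` are hypothetical helper names.

```python
import math

def h2(q):
    """Binary entropy in bits."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def info_from_noisy_samples(n, flip_prob):
    """I(problem; data) in bits when the problem is one fair yes/no fact and
    the data are n independent readings, each wrong with probability flip_prob."""
    cond_entropy = 0.0  # H(problem | data)
    for k in range(n + 1):  # k = number of readings that say "yes"
        like_yes = math.comb(n, k) * (1 - flip_prob) ** k * flip_prob ** (n - k)
        like_no = math.comb(n, k) * flip_prob ** k * (1 - flip_prob) ** (n - k)
        p_k = 0.5 * like_yes + 0.5 * like_no
        if p_k > 0:
            posterior_yes = 0.5 * like_yes / p_k  # Bayes' rule with a 50/50 prior
            cond_entropy += p_k * h2(posterior_yes)
    return 1.0 - cond_entropy  # H(problem) = 1 bit for a fair yes/no fact

for flip_prob in (0.05, 0.3):  # fairly clean data vs. very noisy data
    row = [info_from_noisy_samples(n, flip_prob) for n in (1, 2, 4, 8, 16)]
    print(f"flip={flip_prob}: " + ", ".join(f"{x:.3f}" for x in row))
```

In this toy model, noisy readings start far lower and climb slowly, and in every case the total can never exceed H(problem), the 1-bit ceiling. No amount of data raises that ceiling; only enlarging the set of descriptions does.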

This explains why some fields have more theory than others. One of the hopes of Big Data is that it will reduce the noise in the data, shifting fields toward the H(problem | data) part of the equation. The data in economics, on the other hand, honestly kind of sucks. It's dirty and noisy, and we can't even agree on what it's telling us. Hence, marginal improvements come from creating new theory!

Look at the history of physics; for a long time, physics was getting high-quality data and had a lot of experimentalists. Since, oh, the 50s or so it’s been getting progressively harder to get new, good data. Hence, theorists!

Biology, too, has such an explosion of data it’s hard to know what to do with it. If you put your mind to it, it’s surprisingly easy to get good data that tells you about *something*.

**Theory proposes possible alternatives for how the world could work [H(problem)]; data narrows them down [H(problem | data)]. The problem is that data itself is noisy.**