Data: Don't Get Your Hopes Too High
A column in the Financial Times reminds us that data (and the “bigger,” the better) are often seen as an “open sesame” to the door of knowledge. It is even imagined that small data are owned by the person who has chosen to share them on an open platform that belongs to somebody else (see Benedict Evans, “There Is No Such Thing as ‘Data’,” Financial Times, May 27, 2022):
This is mostly nonsense. There is no such thing as “data”, it isn’t worth anything, and it doesn’t belong to you anyway. … “Data” does not exist—there are merely many sets of data. … Most of the meaning in “your” data is not in you but in all of the interactions with other people.
But this is not my topic, although it is related. My topic is the false idea that one can derive theory from data by induction without first having a theory, formal or intuitive, explicit or implicit, to indicate which data are relevant. In economics, this idea has lately been associated with Harvard University economist Raj Chetty, who apparently aims to teach microeconomic principles by first looking at the data (see Don Boudreaux, “How Should Econ 101 Be Taught,” Econlib, January 6, 2020).
That this is not consistent with the scientific way of understanding the physical or social world has been well explained by Karl Popper, the famous philosopher of science, in a series of articles in Economica (“The Poverty of Historicism,” May 1944, August 1944, and May 1945):
I believe that theories are prior to observations as well as to experiments, in the sense that these are significant only in relation to theoretical problems. … Therefore, I do not believe in the “method of generalization”, that is to say, in the view that science begins with observations from which it derives its theories by some process of generalization or induction. (Part 2, pp. 134-135)
I believe that the prejudice that we proceed in this way is a kind of optical illusion, and that at no stage of scientific development do we begin without something in the nature of a theory, such as a hypothesis, or a prejudice, or a problem … which in some way guides our observations, and helps us select from the innumerable objects of observation those which may be of interest. (Part 3, p. 79)
Literature provides us with a fun example of another sort. In an 1841 letter to his sister, French novelist Gustave Flaubert wrote:
Since you are now studying geometry and trigonometry, I will give you a problem. A ship sails the ocean. It left Boston with a cargo of cotton. It grosses 200 tons. It is bound for Le Havre. The mainmast is broken, there is a cabin boy on the forecastle, there are 12 passengers aboard, the wind is blowing East-North-East, the clock points to a quarter past three in the afternoon. It is the month of May. How old is the captain?
Puisque tu fais de la géométrie et de la trigonométrie, je vais te donner un problème : Un navire est en mer, il est parti de Boston chargé de coton, il jauge 200 tonneaux. Il fait voile vers le Havre, le grand mât est cassé, il y a un mousse sur le gaillard d’avant, les passagers sont au nombre de douze, le vent souffle N.-E.-E., l’horloge marque 3 heures un quart d’après-midi, on est au mois de mai…. On demande l’âge du capitaine?
If you are looking for what determines a captain’s age or what is determined by it, most data in the universe are irrelevant. Of course, the exercise proposed by Flaubert could have been a mere cryptographic enigma, but solving it would still have shown nothing about induction as a way to derive scientific laws.