  PREFACE
 When we choose to say there are "a lot" of gorillas, or "17 gorillas," 
          or "around 20," we are making a choice based on our interests and 
          abilities. Those interests and abilities are systematically expressed 
          in the numbers we choose to send out into the world. The Internet provides 
          us with a large and diverse database of the stuff people have sent out 
          into the world, and it naturally includes large numbers of numbers. 
          Since 1997, we have collected at intervals a novel set of data on the 
          popularity of numbers: by performing a massive automated Internet search 
          on each of the integers from 0 to 1,000,000 and counting the number 
          of pages which contained each, we have obtained a picture of the Internet 
          community's numeric interests and inclinations. The interactive visualization 
          which accompanies this statement attempts to make some of the more striking 
          trends visible. Our data can be explored point by point, or viewed in 
          larger sets, and may lend some insight into the cognitive structure 
          of numeracy, culture, and memory.
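 The counting step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure actually used: it works on a local list of page texts rather than a live search engine, and the names `count_pages_per_integer` and `sample_pages` are hypothetical.

```python
import re
from collections import Counter

# Whole runs of digits only; "3.14" yields 3 and 14, and "1,000" yields 1 and 0.
DIGIT_RUN = re.compile(r"\b\d+\b")


def count_pages_per_integer(pages, upper=1_000_000):
    """Return a Counter mapping each integer to the number of pages containing it."""
    max_len = len(str(upper))
    counts = Counter()
    for text in pages:
        # A set ensures a page is counted once per integer, however often it repeats.
        found = {
            int(token)
            for token in DIGIT_RUN.findall(text)
            if len(token) <= max_len
        }
        counts.update(n for n in found if n <= upper)
    return counts


# Hypothetical stand-in for fetched pages.
sample_pages = [
    "There are 17 gorillas here, and 17 more over there.",
    "Around 20 gorillas, not 17.",
]
print(count_pages_per_integer(sample_pages))
```

 Counting pages rather than raw occurrences keeps a single page that repeats a number many times from dominating the tally.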
 Certain patterns are made readily visible in the data browser, such 
          as people's preference for multiples of 10, or reduplicative numbers 
          such as 1010, 1111, 1212, etc. While American zip codes do not present 
          a similarly coherent visual pattern, they can nevertheless be quite 
          prominent—and unlike viewing cities from space, where only the larger 
          cities are visible, here, both the larger and the more interesting places 
          are brighter. Further highlights in the data reveal more fleeting reflections 
          of our activities: bright spots such as those for 80486 and 68040 not 
          only reveal our interest in technology, but also tell us about the state of 
          that technology at the time. Other points indicate our interest, or 
          lack of interest, in history. And popular culture inevitably makes its 
          presence felt, but perhaps less than we might expect.
 Linguists have noted that the ideal construct of proper "Language" 
          diverges considerably from its actual use, suggesting that the linguistic 
          "ideal" may be a pale abstraction of a far more nuanced and textured 
          practice. Although we like to think of the use of numbers as objective 
          and removed from our personal lives, it appears they also display elements 
          of "practice." The denizens of the number line are not the mere automatons 
          or corporate tools we have made them out to be: each has a personality, 
          talents, communities, and sometimes a little je ne sais quoi. 
          They reflect us. This unusual reflection is the focus of this project. 
          NUMBERS ARE TOOLS
 In learning how to abstract, we learn that all information is potentially 
          expressible in numbers. The ability to abstract from perceived phenomena 
          (such as a group of cows, or the effects of gravity) to descriptions 
          of the physical world (such as 23 or 9.8 m/s^2) has allowed us to see commonalities 
          in phenomena that may first have appeared to be distinct.
 One consequence of abstraction is that we must ignore the individual 
          characteristics of the entities we abstract. As a result, the numbers 
          we use to codify these abstractions must also lack character. Twenty-three 
          cows may be better (or worse) than three cows, but "23" is not better 
          than "3." Both numbers are simply descriptors, which inherit their meaning 
          solely from taking part in fixed systems of fixed relations with other 
          numbers. Apart from the existence of the numeric system (and the numbers' 
          participation in it), individual numbers have no meaning.
 Thus, our number system is seen as an objective tool, one that does 
          not reflect human preference, emotion, or inconsistency. As such, it 
          is a tool used not to express ourselves, but only to describe 
          the world around us. We do not write poetry with numbers, nor do we 
          express our personal doubts or prejudices through them... except as 
          our humanity is projected onto the emotionless toil of mathematical 
          proof, ledger balances, or pedagogical exercises. But like every symbiotic 
          couple, the tool we would like to believe is separate from us (and thus 
          objective) actually provides an intricate reflection of our thoughts, 
          interests, and capabilities. One intriguing result of this symbiosis 
          is that the numeric system we use to describe patterns is actually 
          used in a patterned fashion to describe.
 We are imperfect users of our perfect tool. Buildings often skip the 
          13th floor, there is no year 0, and our only contact with very large 
          numbers comes from government debt, numbers which remain unreal to us 
          for their very size. We spend most of our time using numbers not for 
          calculating, or even measuring, but in acts of remembering, guessing, 
          and simplifying. The secret lives of numbers presented here deal less 
          with the apparently inviolate laws we have contrived for the staid little 
          number than with our own natures expressed through their quixotic use. 
          These secret lives tell us not only about the number system we have fashioned, 
          but also about our cognitive, professional, and creative liaisons with 
          the various inhabitants of the number line.
  THE DATA ITSELF
 The characteristics of the data we describe below include aspects of 
          the population as a whole, patterns of smaller groups, and individual 
          charm. All tell us something about our culture as expressed through 
          numbers.
 The Population
 The first and most striking characteristic of the data is the overall 
          distribution of the numbers' occurrences. Instead of the uniform distribution 
          one might expect if every number were equally useful, we see an exponential 
          drop-off in popularity beginning with the number 1. These earliest 
          and most popular individuals aren't a glamorous set; their popularity 
          stems instead from their accessibility. They are the first numbers 
          we learn, and are the easiest to understand and use. For these reasons, 
          with every increase in magnitude along the number line, the numbers 
          see a sharp drop in this kind of basic popularity.
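 One rough way to look at this drop is to average the page counts within each order of magnitude. The sketch below assumes the data is available as a mapping from integer to page count; the function name `mean_count_by_magnitude` and the figures in the usage example are hypothetical, not taken from our data.

```python
def mean_count_by_magnitude(counts, max_digits):
    """Average page count over all 1-digit, 2-digit, ... integers.

    `counts` maps integer -> page count; integers missing from it count as 0.
    """
    means = {}
    for digits in range(1, max_digits + 1):
        low = 0 if digits == 1 else 10 ** (digits - 1)
        high = 10 ** digits  # exclusive upper bound of this magnitude
        total = sum(counts.get(n, 0) for n in range(low, high))
        means[digits] = total / (high - low)
    return means


# Made-up counts, purely for illustration.
toy_counts = {1: 90, 5: 10, 10: 90, 99: 90, 100: 900}
print(mean_count_by_magnitude(toy_counts, max_digits=3))
# → {1: 10.0, 2: 2.0, 3: 1.0}
```

 Averaging over every integer in the range, including those never found, keeps sparsely populated magnitudes from looking more popular than they are.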
 Prominent Families
 There are, however, certain numbers further down the line that enjoy 
          great popularity in spite of their greater number of digits. These numbers 
          comprise some of the basic "royal families" of the number 
          line: the base-2, base-10, base-12, and base-60 families. While most 
          of these families have their niches in technology, time-based media 
          and the English measurement system, the most prominent of these families 
          is the one which clearly reflects our biology. The 10 family can be 
          seen everywhere: numbers at multiples of 10 (and powers of 10, like 
          100, 1000, etc.) enjoy a popularity far greater than their neighbors 
          throughout the data. Our biases for "rounding" suggest that most of 
          these numbers' high standing comes at the direct expense of their nearest 
          neighbors. The positively ignored 49949 is one such serf, apparently 
          yielding its worldly recognition to its more prominent neighbor 50000.
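 The relationship between a round number and its neighbors can be made concrete by comparing a number's page count with the average count of the integers around it. The following is a minimal sketch, again assuming a mapping from integer to page count; `neighbor_ratio`, the `radius` parameter, and the counts shown are hypothetical.

```python
def neighbor_ratio(counts, n, radius=2):
    """Ratio of n's page count to the mean count of its neighbors within `radius`."""
    if radius < 1:
        raise ValueError("radius must be at least 1")
    neighbors = [
        m for m in range(n - radius, n + radius + 1)
        if m != n and m >= 0
    ]
    mean_neighbor = sum(counts.get(m, 0) for m in neighbors) / len(neighbors)
    if mean_neighbor == 0:
        # No neighbor was ever seen; the ratio is unbounded.
        return float("inf")
    return counts.get(n, 0) / mean_neighbor


# Made-up counts, purely for illustration.
toy_counts = {49998: 10, 49999: 10, 50000: 100, 50001: 10, 50002: 10}
print(neighbor_ratio(toy_counts, 50000))  # → 10.0
```

 A ratio well above 1 marks a number that outshines its surroundings, while a ratio below 1 marks one overshadowed by a more prominent neighbor.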
  In addition to the population at large, and the most prominent families 
          within it, some numbers enjoy a certain degree of popularity owing to 
          their occupation. Here are just a few examples; many more can yet be 
          found in the data:
 The Journalists
  90-99, and 1990-2002 (no time like the present)
 The Stars
  90210 (the television show)
 The Dorks & Techies
  68040, 68030, 68000 (Macintosh)
  286, 386, 486, 8086, 80286, 80386, 80486 (its competitor)
  2, 4, 16, 32, 64, 128, 256, 512, 1024 (base-2, now RAM sizes)
  2400, 4800, 9600, 19200, 38400 (baud rates)
  8859 (from the ISO-8859 character set)
 The Responsible Citizens
  1040, 1041 (Uncle Sam loves you)
  10036, 26161, 13131, 77058… (American zip codes)
  800, 888, 877 (toll-free phone number prefixes)
  52062, 52064, 52066 (German postal codes)
 The Salesmen
  98, 99 (why don't things ever just cost $1.00 even?)
  900 (sex sells)
  A LOOK AT OURSELVES
 Moving up a level from the individual points and patterns covered in 
          the data, what can we deduce about ourselves by examining the kinds 
          of interests we display? The explosion of occurrences of numbers in 
          the range from 1990 to 2002 points not only to the growth of the Web during 
          this period, but also to a kind of temporal narcissism. We are most 
          interested in the year in which we live, and are less interested in 
          events in the past regardless of their import.
 Furthermore, historical years before the 1990s do not have magnitudes 
          that reflect an atemporal vantage point, but appear instead to be talked 
          about less, the further they fall from the present. (Perhaps this trend 
          would not be visible if we were to do these searches on Web sites devoted 
          only to history.) These phenomena point to the possibility of measuring 
          the longevity of our memory, or the degree to which we care about how 
          historical events may have shaped our present lives. In the slope of 
          our curve between 1600 and 2002, we see an image of our cultural rate 
          of forgetting.
  FUTURE DIRECTIONS
 The discussion above appears to suggest some degree of historical ignorance 
          on the Web. Can we observe any awareness of historical context—a context 
          which is actually present in our collective consciousness—on the Web? 
          In the periodic gathering of this data, a vernacular history suggests 
          itself. From the information we have gathered, we can construct a time 
          line of historic individuals searched for by year of birth, death, or 
          significant event, to create a history that consists solely of individuals 
          holding the public interest in any given year. This history could be 
          reconstructed every year with hopes of observing shifts in this expressed 
          historical context over time. A preliminary look at the names that accompany 
          number searches tells us that we may retrieve not only biographical 
          (Sartre, 1905-1980) and historical (Columbus, 1492) information, but 
          also glimpses into how people are feeling about the individuals singled 
          out (Bill Gates, 666).
  Further comparisons over time are possible, and are likely to yield 
          more interesting results than a look at any single slice of time can 
          provide. Numeric searches on a defined subset of Web pages, such as 
          those devoted to medicine, history, physics, or literature, may generate 
          further insights. Differentiating country of origin may also prove interesting. 
          On the most basic level, arranging the data using different parameters 
          may shed light on patterns not visible in the current arrangements.
 The numeric system has helped document the regularity and periodicity 
          inherent in our environments and ourselves for millennia. By allowing 
          us to examine our own patterns of use, this data will, we hope, 
          shed some light on our cultural biases and numeric capacities. We 
          also hope to underscore the influence technology has had in changing 
          the set of numbers we can and cannot imagine.