Georgia Tech Research Horizons

Sorry, Wrong Number

Century-old math rule ferrets out modern-day digital deception.

By T.J. Becker

Tempted to fudge some numbers on your tax return? Better not. Benford's Law might catch you.
photo by Gary Meek

To illustrate the probability theory behind Benford's Law, mathematics professor Dr. Ted Hill, center, and students toss the dice. Hill constructed a rigorous mathematical proof of Benford's Law, which maintains that certain digits will show up more often than others in certain sets of data. Benford's Law has numerous real-world applications, including income tax fraud detection.

Once considered a mathematical oddity, this century-old rule is causing a stir in fraud-busting circles, thanks to Dr. Ted Hill, a Georgia Tech mathematics professor whose 1996 proof paved the way for numerous real-world applications. From financial documents to clinical trials, Benford's Law is becoming a valuable tool to smoke out cheaters.

Benford's Law is what mathematicians call a "probability distribution": it maintains that certain digits will show up more often than others in certain sets of data. For example, "1" should appear as the first non-zero digit about 30 percent of the time, "2" as the leading digit about 18 percent of the time, and "9" only about 4.6 percent of the time.
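The percentages above all come from one formula: Probability(first significant digit = d) = log10(1 + 1/d). The short Python sketch below evaluates that formula for d = 1 through 9. The function name `benford_first_digit` is our own. The optional `base` argument is an illustrative addition that uses a base-b logarithm, which is the form the law takes when numbers are written in base b.

```python
import math


def benford_first_digit(base: int = 10) -> dict[int, float]:
    """Benford probabilities for each possible leading digit in `base`.

    Evaluates P(first digit = d) = log_base(1 + 1/d) for d = 1 .. base - 1.
    The terms telescope to log_base(base) = 1, so they form a distribution.
    """
    return {d: math.log(1 + 1 / d, base) for d in range(1, base)}


# Print the base-10 table as percentages.
for digit, p in benford_first_digit().items():
    print(f"{digit}: {p:.1%}")
```

Running it prints 30.1% for 1, 17.6% for 2 and 4.6% for 9. In base 2, the only possible leading digit is 1, so the formula assigns it probability 1.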

"It's not so surprising to have digits show up unequally, but it is striking to have a law that predicts their exact distribution," Hill says. The frequencies of later digits can also be predicted ("0" is the most likely second digit, showing up about 12 percent of the time), but the distribution becomes increasingly uniform deeper into the number.
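The second-digit figure can be checked in the same way. This sketch assumes the standard general form of the law: the probability that a number's first two significant digits are d1 and d2 is log10(1 + 1/(10*d1 + d2)). Under that assumption, the chance that the second digit equals k is this expression summed over every possible first digit d1 = 1, ..., 9. The Python sketch below evaluates the sum for k = 0 through 9. The function name is our own.

```python
import math


def benford_second_digit() -> dict[int, float]:
    """Probability that the second significant digit equals k, for k = 0..9.

    Sums the joint law P(D1 = d1, D2 = k) = log10(1 + 1/(10*d1 + k))
    over all possible first digits d1 = 1..9.
    """
    return {
        k: sum(math.log10(1 + 1 / (10 * d1 + k)) for d1 in range(1, 10))
        for k in range(10)
    }


probs = benford_second_digit()
for k, p in probs.items():
    print(f"{k}: {p:.1%}")
```

The output falls from about 12.0% for 0 to about 8.5% for 9. That spread is much narrower than the first-digit range, which illustrates the drift toward uniformity deeper into the number.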

The phenomenon was first noticed in 1881, when astronomer and mathematician Simon Newcomb observed that books of logarithm tables showed more wear and tear in their opening pages. This indicated that people were looking up more numbers that began with 1 than with 2, and more that began with 2 than with 3. Newcomb proposed a formula to express the phenomenon: Probability(first significant digit = d) = log10(1 + 1/d), for d = 1, 2, ..., 9.

Newcomb published an article about his discovery, but it attracted little attention. Then in 1938, General Electric physicist Frank Benford independently made the same observation. Yet Benford went further. Drawing upon 20,229 observations, he found that the significant-digit rule held in mixtures of many different sets of data: drainage areas of rivers, atomic weights of elements, American baseball statistics, even numbers pulled from newspaper pages.

Though Benford's paper sparked considerable interest, enough that the law came to bear his name, there was still no real proof. "It was treated like a gimmick," Hill says. "No one could explain why it happened or predict when it would happen, so people didn't take it seriously."

Hill, who specializes in mathematical probability, became interested in Benford's Law in the early 1990s, when preparing for a speech on surprises in probability. What began as a recreational experiment quickly turned into a full-fledged academic pursuit.

Many mathematicians had tackled Benford's Law over the years, but a solid probability proof remained elusive. In 1961, Rutgers University Professor Roger Pinkham observed that the law is scale-invariant: if stock market prices are converted from dollars to pesos, for example, the distribution pattern of significant digits remains the same.
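Scale invariance can be checked numerically. The Python sketch below is an illustration under stated assumptions, not Pinkham's argument. It draws 100,000 synthetic "prices" whose base-10 logarithms are uniform over six whole decades; this is a standard way to produce Benford-distributed data, and the values are not real market data. It then converts them with a hypothetical exchange rate of 19.7 pesos per dollar and compares leading-digit frequencies before and after. The helper names are our own.

```python
import math
import random


def leading_digit(x: float) -> int:
    """First significant (non-zero) digit of a positive number."""
    # The fractional part of log10(x) locates x within its decade:
    # 10 ** frac lies in [1, 10), and truncating it gives the leading digit.
    # min(..., 9) guards against rounding up to 10.0 at the top of a decade.
    return min(int(10 ** (math.log10(x) % 1)), 9)


def digit_frequencies(values: list[float]) -> list[float]:
    """Fraction of values whose leading digit is 1, 2, ..., 9."""
    counts = [0] * 10
    for v in values:
        counts[leading_digit(v)] += 1
    return [counts[d] / len(values) for d in range(1, 10)]


rng = random.Random(42)
# Synthetic prices: log10 is uniform over six whole decades, which makes the
# leading digits Benford-distributed. These are not real market data.
prices_in_dollars = [10 ** rng.uniform(0, 6) for _ in range(100_000)]
EXCHANGE_RATE = 19.7  # hypothetical pesos per dollar, for illustration only
prices_in_pesos = [p * EXCHANGE_RATE for p in prices_in_dollars]

benford = [math.log10(1 + 1 / d) for d in range(1, 10)]
usd = digit_frequencies(prices_in_dollars)
mxn = digit_frequencies(prices_in_pesos)
for d in range(1, 10):
    print(f"{d}: dollars {usd[d - 1]:.3f}  pesos {mxn[d - 1]:.3f}  "
          f"Benford {benford[d - 1]:.3f}")
```

All three columns should agree to within sampling noise, and changing the rate or the seed should not alter that picture. Here the pesos column stays Benford because the synthetic data started out Benford. The point of scale invariance is that Benford's distribution survives any such change of units.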

In 1994, Hill discovered that Benford's Law is also independent of base: the law holds true whether numbers are written in base 10, base 2 or base 7. Yet scale- and base-invariance still didn't explain why the rule manifests itself in real life.

Hill went back to the drawing board. As he pored over Benford's research again, it clicked: the mixture of data was the key. If distributions are selected at random (in an unbiased way) and random samples are then taken from each, the significant digits of the combined sample will converge to Benford's Law. For example, stock prices may seem to come from a single distribution, but their values actually depend on many quantities, such as CEO salaries, the cost of raw materials and labor, and even advertising campaigns, so in the long run they follow Benford's Law.
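The Python sketch below is a toy illustration of the mixture idea, not Hill's proof. It assumes one particular form of "random selection." A distribution family is chosen at random from uniform, exponential and lognormal. It is then rescaled by a factor whose logarithm is uniform over six decades. Twenty values are drawn from each of 5,000 such distributions and pooled. No single family here is Benford on its own, but the pooled first-digit frequencies should land close to Benford's. All function names are our own.

```python
import math
import random


def leading_digit(x: float) -> int:
    """First significant digit of a positive number."""
    return min(int(10 ** (math.log10(x) % 1)), 9)


def sample_random_distribution(rng: random.Random, k: int) -> list[float]:
    """Pick one distribution at random and draw k values from it.

    Illustrative assumption: the family is uniform, exponential or lognormal
    with equal chance, and its scale is log-uniform over six decades (one
    simple scale-unbiased way of choosing distributions).
    """
    scale = 10 ** rng.uniform(0, 6)
    family = rng.choice(["uniform", "exponential", "lognormal"])
    if family == "uniform":
        unit_draws = [1.0 - rng.random() for _ in range(k)]  # uniform on (0, 1]
    elif family == "exponential":
        unit_draws = [rng.expovariate(1.0) for _ in range(k)]
    else:
        unit_draws = [rng.lognormvariate(0.0, 1.0) for _ in range(k)]
    return [scale * v for v in unit_draws]


def mixed_sample(num_distributions: int, samples_each: int, seed: int = 0) -> list[float]:
    """Pool samples_each draws from each of num_distributions random distributions."""
    rng = random.Random(seed)
    pooled: list[float] = []
    for _ in range(num_distributions):
        pooled.extend(sample_random_distribution(rng, samples_each))
    return [v for v in pooled if v > 0]  # drop any exact zeros before taking logs


def first_digit_frequencies(values: list[float]) -> list[float]:
    """Fraction of values whose leading digit is 1, 2, ..., 9."""
    counts = [0] * 10
    for v in values:
        counts[leading_digit(v)] += 1
    return [counts[d] / len(values) for d in range(1, 10)]


benford = [math.log10(1 + 1 / d) for d in range(1, 10)]
observed = first_digit_frequencies(mixed_sample(5000, 20))
for d, (obs, expected) in enumerate(zip(observed, benford), start=1):
    print(f"{d}: pooled {obs:.3f}  Benford {expected:.3f}")
```

The log-uniform choice of scale does real work here, because it makes the selection unbiased with respect to units. If every distribution were instead given the same fixed scale, the pooled digits would generally not match Benford's Law.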

Hill constructed a rigorous mathematical proof in 1996 that finally gave Benford's Law the credibility it needed.

As a result, the nearly forgotten rule is winning new exposure. Benford's Law is being put to work in a number of areas, such as mathematical modeling and computer design.

Perhaps its most intriguing application is fraud detection, an idea introduced by Southern Methodist University accounting Professor Mark Nigrini. The U.S. Institute of Internal Auditors is conducting classes on how to apply Benford's Law. Hill has done preliminary consulting for both the IRS and the International Institute for Drug Development in Brussels, which is interested in using the law to reveal fabricated data in clinical trials.

In the past, certain "red flags" were used to detect fraud, but few statistical tests like Benford's Law existed to ferret out fakers. "When people fabricate data – either for fraudulent purposes or just to fill in the blanks – their conception of random numbers doesn't match reality," Hill explains.
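One simple way to turn this into a screening test is to compare a data set's leading-digit counts with Benford's expected counts, using a chi-square goodness-of-fit statistic. The Python sketch below is a simplified illustration, not a description of any particular audit procedure, and its function names are our own. With 9 possible digits there are 8 degrees of freedom. If the data truly follow Benford's Law, a statistic above about 15.51 would occur only about 5 percent of the time. The demonstration uses two inputs. The first 1,000 powers of 2 form a classic Benford-conforming sequence. The numbers 100 to 999 stand in for hypothetical fabricated amounts with evenly spread leading digits.

```python
import math
from collections import Counter

# 5% critical value of the chi-square distribution with 8 degrees of freedom.
CHI2_CRITICAL_8DF_5PCT = 15.507


def leading_digit(x) -> int:
    """First significant digit of a non-zero int or float."""
    if isinstance(x, int):
        return int(str(abs(x))[0])  # exact, even for very large integers
    return int(f"{abs(x):.14e}"[0])  # scientific notation puts it first


def benford_chi_square(values) -> float:
    """Chi-square statistic comparing leading-digit counts with Benford's Law."""
    digits = [leading_digit(v) for v in values if v != 0]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one non-zero value")
    counts = Counter(digits)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        stat += (counts[d] - expected) ** 2 / expected
    return stat


def looks_suspicious(values) -> bool:
    """Flag data whose digits depart from Benford at the 5% level."""
    return benford_chi_square(values) > CHI2_CRITICAL_8DF_5PCT


genuine = [2 ** n for n in range(1000)]  # powers of two
fabricated = list(range(100, 1000))      # evenly spread leading digits

print(f"powers of 2:   chi-square = {benford_chi_square(genuine):.2f}")
print(f"evenly spread: chi-square = {benford_chi_square(fabricated):.2f}")
```

The powers of 2 give a statistic far below 15.51, while the evenly spread numbers give one far above it. A large statistic is only a red flag that calls for a closer look. Many legitimate data sets, such as amounts confined to a narrow range, do not follow Benford's Law.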

Case in point: Hill asks his students either to (1) flip a coin 200 times and record the sequence of heads and tails, or (2) simply make up the results. The next day he stuns the class by separating the faked data from the true trials at a glance. It's no psychic feat, just a little probability theory at work: a sequence of 200 random coin tosses has a high probability of containing a run of at least six heads or six tails in a row. When people try to fake results, they rarely include such long runs.
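The probability behind this classroom trick can be computed exactly. The Python sketch below uses dynamic programming to find the chance that 200 fair tosses contain a run of at least six identical outcomes. It also measures the longest run in a recorded sequence. The function names are our own. The cutoff of six in `seems_made_up` is an illustrative assumption taken from the run length mentioned in the text, not a rule Hill has stated.

```python
def longest_run(tosses: str) -> int:
    """Length of the longest stretch of identical consecutive outcomes."""
    best = current = 0
    previous = None
    for t in tosses:
        current = current + 1 if t == previous else 1
        previous = t
        best = max(best, current)
    return best


def prob_run_at_least(run_length: int, n_tosses: int) -> float:
    """Exact chance that n fair tosses contain a run of >= run_length
    identical outcomes (heads or tails), for run_length >= 1.

    state[k] is the probability that no such run has appeared yet and the
    current run has length k, for 1 <= k < run_length.
    """
    if n_tosses < run_length:
        return 0.0
    if run_length == 1:
        return 1.0
    state = [0.0] * run_length
    state[1] = 1.0  # after the first toss the current run has length 1
    for _ in range(n_tosses - 1):
        nxt = [0.0] * run_length
        for k in range(1, run_length):
            nxt[1] += state[k] / 2  # outcome changes: a new run starts
            if k + 1 < run_length:
                nxt[k + 1] += state[k] / 2  # outcome repeats: run grows
            # if k + 1 == run_length, that half of the mass hit a long run
        state = nxt
    return 1.0 - sum(state)


def seems_made_up(tosses: str, threshold: int = 6) -> bool:
    """Hypothetical screening rule: flag a record with no run of `threshold`."""
    return longest_run(tosses) < threshold


print(f"P(run of 6+ in 200 tosses) = {prob_run_at_least(6, 200):.3f}")
print(longest_run("HTHHTTTHHH"))  # 3
```

The computed probability comes out well above 90 percent. A made-up record with no run of six is therefore a good bet to be fake, though not a certainty.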

Hill travels extensively, talking about Benford's Law and probability theory to groups that range from university professors to senior citizens to Boy Scouts. This outreach program is part of a $500,000 grant from the National Science Foundation (NSF). Through a separate NSF grant, Hill is also continuing research on a number of probability topics including Benford's Law. "One of the beauties of NSF is that it takes chances on individuals and ideas that are not always part of the mainstream," Hill says.

Although a solid scientific explanation now exists for Benford's Law, there are still some loose ends to tie up, such as:

   1) Determining how much information (how many different types of distributions, and how many samples from each) is needed for the law to kick in.

   2) Finding a good model for picking a probability measure at random, which would indicate how fast data converge to Benford's Law and what deviations one might expect.

   3) Understanding why certain dynamical systems follow Benford's Law. A number of physicists have contacted Hill after finding that their data often follow the rule; an oceanographer studying plankton found that two families conformed to the law, while another did not.
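As a small illustration of a dynamical system whose output is known to follow Benford's Law, the Python sketch below iterates the Fibonacci recurrence F(n+1) = F(n) + F(n-1). It then compares the leading digits of the first 1,000 terms with Benford's percentages. This checks only one well-known example. It says nothing about the plankton data or about why other systems fail to conform. The function names are our own.

```python
import math


def fibonacci(count: int) -> list[int]:
    """First `count` Fibonacci numbers, starting 1, 1, 2, 3, ..."""
    terms = []
    a, b = 1, 1
    for _ in range(count):
        terms.append(a)
        a, b = b, a + b  # one step of the recurrence
    return terms


def leading_digit_frequencies(numbers: list[int]) -> list[float]:
    """Fraction of positive integers whose leading digit is 1, 2, ..., 9."""
    counts = [0] * 10
    for n in numbers:
        counts[int(str(n)[0])] += 1
    return [counts[d] / len(numbers) for d in range(1, 10)]


observed = leading_digit_frequencies(fibonacci(1000))
benford = [math.log10(1 + 1 / d) for d in range(1, 10)]
for d, (obs, expected) in enumerate(zip(observed, benford), start=1):
    print(f"{d}: Fibonacci {obs:.3f}  Benford {expected:.3f}")
```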

Despite the flurry of attention resulting from his proof, Hill deems it "a relatively small idea – it's not at all my deepest theorem or one I'm most proud of." Still, for a theoretical mathematician, it's rare to have a real-life application, he points out. "It's thrilling to make a small contribution to society."

A West Point graduate and former U.S. Army Ranger captain, Hill holds a master's degree in operations research from Stanford and a Ph.D. in mathematics from the University of California at Berkeley. A dedicated adventurer, Hill says: "If I'm working on a math problem, it has to be as interesting as diving in the Yucatan."

And in probability theory, there's never a dull moment. "Math is extremely exciting right now," says Hill, noting that computer technology allows the exchange of ideas and research to accelerate exponentially. "Science is just bursting.... We are entering an unprecedented age of discovery."

For more information, contact Dr. Ted Hill, School of Mathematics, Georgia Tech, Atlanta, GA, 30332-0160. (Telephone: 404-894-4408) (Email: theodore.hill@math.gatech.edu)

T.J. Becker is a Chicago-based freelance writer.


Last updated: Sept. 10, 2000