Analyzing Frequency Distribution for Leading Digits of Numbers in Datasets: How Can Benford’s Law Help Detect Anomalies?

Patterns and sequences of numbers show up throughout nature as visible regularities. The Fibonacci sequence, for example, can be observed in a variety of natural phenomena. Benford's Law is another such mathematical law, although most people have probably never heard of it.


What is Benford's Law?

One might expect the leading digits in a diverse set of data to be distributed at random. Benford's Law states instead that in many naturally occurring sets of numbers, smaller leading digits are more common than larger ones. If leading digits were distributed equally, each of the nine digits (1 through 9) would appear about 11.1% of the time (1/9). Many real datasets do not behave this way: in large datasets, more numbers begin with 1 than with 2, more begin with 2 than with 3, and so on. According to the law, about 30% of numbers begin with 1, while fewer than 5% begin with 9.
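The percentages above come from the formula usually given for Benford's Law: the expected share of numbers whose leading digit is d equals log10(1 + 1/d). A minimal Python sketch (the function name `benford_expected` is our own, chosen for illustration) computes the nine expected shares and sets them beside the 11.1% a uniform distribution would give:

```python
import math


def benford_expected():
    """Expected share of numbers whose first digit is d, under Benford's Law.

    P(d) = log10(1 + 1/d) for d = 1..9. The nine shares sum to exactly 1
    because the terms telescope: log10(2/1) + log10(3/2) + ... + log10(10/9) = log10(10).
    """
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


if __name__ == "__main__":
    uniform = 1 / 9  # share each digit would get if leading digits were equally likely
    for digit, share in benford_expected().items():
        print(f"{digit}: Benford {share:5.1%}  vs uniform {uniform:5.1%}")
```

Running it shows digit 1 leading at about 30.1% and digit 9 trailing at about 4.6%, in line with the "about 30%" and "fewer than 5%" figures in the text.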

Named after physicist Frank Benford, the law is also known as the law of anomalous numbers or the first-digit law. Simon Newcomb first discovered the distribution in 1881, while Benford popularized it in 1938. Hence, experts sometimes refer to it as the Newcomb-Benford Law.

Provided the data span several orders of magnitude, Benford's Law has been shown to hold across a wide range of natural phenomena. More surprisingly, the distribution stays the same even when the units of measurement are changed, a property known as scale invariance.
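Scale invariance can be checked with a small simulation. The sketch below assumes the data are spread log-uniformly over six orders of magnitude (an idealized model of Benford-conforming data, not real measurements), converts every value from feet to metres, and measures how far the first-digit frequencies drift from the Benford prediction before and after the conversion. All function names are hypothetical:

```python
import math
import random
from collections import Counter
from decimal import Decimal


def leading_digit(x):
    """First significant digit (1-9) of a nonzero finite number, else None.

    Decimal(x) holds the exact decimal value of a float, so there is no
    rounding trouble at digit boundaries; sign is ignored.
    """
    d = Decimal(x)
    if not d.is_finite() or d == 0:
        return None
    return d.as_tuple().digits[0]


def first_digit_freqs(values):
    """Observed share of each leading digit 1-9, skipping zeros and non-finite values."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}


def max_deviation_from_benford(freqs):
    """Largest absolute gap between observed and Benford-expected shares."""
    return max(abs(freqs[d] - math.log10(1 + 1 / d)) for d in range(1, 10))


def unit_change_demo(n=100_000, seed=42):
    rng = random.Random(seed)
    # Hypothetical data: lengths in feet, log-uniform between 1 and 1,000,000.
    feet = [10 ** rng.uniform(0, 6) for _ in range(n)]
    metres = [x * 0.3048 for x in feet]  # the same lengths in a different unit
    return (max_deviation_from_benford(first_digit_freqs(feet)),
            max_deviation_from_benford(first_digit_freqs(metres)))


if __name__ == "__main__":
    before, after = unit_change_demo()
    print(f"max deviation in feet:   {before:.4f}")
    print(f"max deviation in metres: {after:.4f}")
```

Both deviations stay well under one percentage point in this setup. For this idealized model the agreement holds in expectation, not just by luck: multiplying by a constant only shifts log10(x), and a log-uniform spread over whole decades looks the same after such a shift.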

According to the Institute of Physics, Benford's Law is an effective tool for evaluating data quality and identifying anomalous data in different fields. It helps improve the quality of data used for strategic purposes today and supports scientific decision-making in various industries.

A study published by the American Geophysical Union found that the first-digit law holds in 15 sets of modern observations drawn from fields such as physics, chemistry, and astronomy. These include geographical observations and the numbers of diseases registered by the World Health Organization.

In another study, reported by Springer Nature, researchers used Benford's Law to investigate the homogeneity of natural hazard datasets. They found that the first-digit distribution of the entire record follows the law's prediction, although changes in how the data were collected, such as satellite detection, could strongly affect the dataset.


Importance of Benford's Law

Although the law seems intriguing because of its counterintuitive distribution, it also has practical applications. Analysts have observed that the first digits of stock prices, death rates, and billing amounts often follow the distribution described by Benford's Law.

Analysts use Benford's Law to help identify fraud and manipulation in financial data and other records. They compare the distribution of first digits in a dataset with the distribution Benford's Law predicts; a significant deviation is a red flag that calls for closer investigation, although on its own it does not prove fraud.
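One common way to make "does not follow the distribution" precise is Pearson's chi-square goodness-of-fit test, which compares observed first-digit counts with the counts Benford's Law predicts. The sketch below is an illustration, not a forensic tool: the function names and sample amounts are hypothetical, and the cut-off 15.507 is the standard 5% critical value of the chi-square distribution with 8 degrees of freedom (nine digits minus one). A statistic above it only marks the data for review:

```python
import math
from collections import Counter

CHI2_CRITICAL_8DF_5PCT = 15.507  # chi-square critical value, 8 degrees of freedom, alpha = 0.05


def first_digit(n):
    """Leading digit of a nonzero integer amount (for example, an amount in cents)."""
    return int(str(abs(n))[0])


def benford_chi_square(amounts):
    """Pearson chi-square statistic of observed first digits versus Benford's Law."""
    digits = [first_digit(a) for a in amounts if a != 0]
    n = len(digits)
    observed = Counter(digits)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        stat += (observed[d] - expected) ** 2 / expected
    return stat


def needs_review(amounts, critical=CHI2_CRITICAL_8DF_5PCT):
    """True if the first-digit distribution deviates significantly from Benford's Law.

    A True result is a prompt for investigation, not evidence of fraud by itself.
    """
    return benford_chi_square(amounts) > critical


if __name__ == "__main__":
    # Hypothetical invoice amounts in cents, all between 45,000 and 48,980 (leading digit 4).
    suspicious = [45_000 + 20 * i for i in range(200)]
    print(f"chi-square = {benford_chi_square(suspicious):.1f}, review: {needs_review(suspicious)}")
```

With very large samples, even small and harmless deviations become statistically significant, which is one reason practitioners also look at effect-size measures such as the mean absolute deviation between observed and expected shares.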

Note, however, that Benford's Law does not usually apply to assigned numbers such as phone numbers and ZIP codes. It is best observed in data that span multiple orders of magnitude, from very low to very high values.
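A quick sanity check before applying the law is to look at how many orders of magnitude the data span, along with the first-digit shares themselves. The sketch below (helper names are hypothetical) contrasts every five-digit code from 10000 to 99999, a stand-in for assigned identifiers such as ZIP codes, with the first 1,000 powers of 2, a textbook sequence that spans many orders of magnitude and follows Benford's Law:

```python
import math
from collections import Counter


def first_digit_shares(numbers):
    """Share of each leading digit 1-9 among positive integers."""
    counts = Counter(int(str(n)[0]) for n in numbers)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


def orders_of_magnitude(numbers):
    """How many powers of ten separate the smallest and largest positive values.

    math.log10 accepts arbitrarily large Python ints, so huge values are fine.
    """
    positives = [n for n in numbers if n > 0]
    return math.log10(max(positives)) - math.log10(min(positives))


if __name__ == "__main__":
    assigned_codes = range(10000, 100000)          # narrow range, like ZIP codes
    powers_of_two = [2 ** k for k in range(1000)]  # 2^0 up to 2^999
    for name, data in [("five-digit codes", assigned_codes), ("powers of 2", powers_of_two)]:
        shares = first_digit_shares(data)
        print(f"{name}: spans {orders_of_magnitude(data):.1f} orders of magnitude, "
              f"digit 1 = {shares[1]:.1%}, digit 9 = {shares[9]:.1%}")
```

The five-digit codes give every leading digit the same 11.1% share and span only about one order of magnitude, while the powers of 2 span roughly 300 orders of magnitude and put about 30% of values on a leading 1.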

Check out more news and information on Benford's Law in Science Times.
