Benford’s law is used in auditing to identify data sets that may have been manipulated or adjusted. In actual data sets when reviewing values the 1st digit of the values tends to follow a predetermined frequency. For example, roughly 30% of the values should start with a 1.
There is a formula that determines the percentage distribution of the digits 1 to 9.
In the image below there is a data set in columns A and B. This data was created using the RANDBETWEEN function and as such it does not conform to Benford’s Law. Note the randomised numbers did not include any numbers that started with 9.
The formulas in columns D to G analyse the data set in columns A and B. The chart below compares Benford’s expected percentage (blue line) with the actual percentage (orange line) from the data set.
The formulas in the cells are shown in the table below.
The formulas are also listed below.
=SEQUENCE(9)
=ROUND(LOG(1+(1/D2#),10),4)
=BYROW(D2#,LAMBDA(x,SUM((LEFT(B2:B1368)=""&x)*1)))
=ROUND(F2#/COUNT(B2:B1368),4)
The SEQUENCE function in cell D2 provides the 9 digits from 1 to 9. This function spills down from cell D2.
Cell A2 has the formula to calculate Benford’s percentage. A rarely used function called LOG is required, it stands for logarithm. We can see from this distribution that roughly 60% of the numbers are expected to start with 1, 2 or 3.
The formula in cell F2 counts how many numbers in column B start with the relevant number from column D.
The spill range in cell D2 needs to be split into individual cells. We can use the BYROW function to do that.
The BYROW row function splits up a spill range by rows and passes each row individual row to a variable inside a LAMBDA function. In this case it passes it to the x variable inside the LAMBDA function.
We use the LEFT function to test the first character against the value from column D and this will return either TRUE or FALSE. Note the LEFT function returns the first character from a number. The LEFT function is a text function and hence it returns a text result. That’s why we need to combine the quotation marks with the number. This converts it into a text number to compare against the LEFT result.
The TRUE and FALSE results are then multiplied by one which converts the TRUE into a one and the FALSE into a 0. The SUM function then adds up the result, which returns how many of the values start with the number involved. The LAMBDA function returns the SUM result.
The formula in cell G2 calculates the percentage for each digit.
As we have seen there are several steps required to analyse the data set.
In the next post I will share the single formula in cell I1 that analyses the data set. We will also create a custom function to simplify using Benford’s Law on a data set.
Please note: I reserve the right to delete comments that are offensive or off-topic.