More on Serial Numbers

I looked around and found some papers related to Ruggles and Brodie’s 1947 Economic Intelligence paper.

I found:

A Soviet Estimate of German Tank Production

This looked to be a counterpart to Ruggles and Brodie, from a different viewpoint. It’s not. It’s only 4 pages, and based on obscure Nazi German documents that contain a translation of a Soviet intelligence publication.

The Soviets over-estimated Nazi Germany’s tank production just as the UK did. There’s a mention of Germany doing serial number analysis as the Economic Warfare Division did, and a claim that the Soviets did not. The Soviets encoded tank serial numbers only at the end of the war.

The author lists himself as an “independent scholar”. I wonder what that means?

Some Practical Techniques in Serial Number Analysis

Leo A. Goodman runs through some derivations of ways to estimate total production based on serial number sampling. The paper is beautifully typeset, but Goodman dances around writing actual useful formulas. For example, his calculation for estimated total production is buried in footnote 2, section 3.2.1. The only reason I noticed this is that his example calculation refers back to “section 3.2.1”.

Goodman illustrates all his derivations by estimating the number of “pieces of equipment” that the University of Chicago Social Sciences Department purchased and numbered between 1928 and 1934.

It’s very much a product of its time. The typesetting is great; even the mathematical formulas are legibly set. There’s only 1 graphic, and I still can’t make it work out:

Figure 5.1 from Goodman (1954)

The units on the vertical axis are weird. They increment by 4, except the top one, which increments by 2 and isn’t scaled appropriately. I think there’s a typo: the top horizontal line is at 32, not 30.

Goodman gives all 31 serial numbers he discovered on that afternoon in 1954 when he was trying to come up with a good example, so it should be possible to re-create figure 5.1, and his guess at total “pieces of equipment”.

I re-created Figure 5.1 from Goodman (1954)

Serial numbers from the example

I can’t quite reproduce the maximum absolute difference between the two cumulative distributions that Goodman gives on page 110.

Goodman has this as the maximum absolute difference: (9.65 - 5)/29 = .16

The value of 5 is the Y-coordinate of the serial number 895: that’s the 6th serial number Goodman gives, when all serial numbers are ordered numerically. I can’t figure out where the 9.65 value comes from. It’s supposed to be the height of the diagonal line at that serial number, so that 9.65 - 5 is the distance from the cumulative distribution to the diagonal line.

Goodman defines the diagonal line:

The diagonal line in Figure 5.1 represents the uniform cumulative distribution between the smallest serial number 83 and the largest serial number 2787.

I can see two ways to create an equation for the diagonal line:

  • Line alternative A: (83,0) and (2787,30), y = 0.011054x - 0.917465
  • Line alternative B: (83,0) and (2787,29), y = 0.010685x - 0.886883

Line A is my naive reading. The value of 30 is the number of steps up from 0.

Line B is a confused reading. The value of 29 is the value of the cumulative distribution from the right side of figure 5.1. The diagonal line of figure 5.1 ends at close to (2800,29) by eyeball. Luckily, there’s a serial number of 2787 in Goodman’s list. I don’t understand the Y-coordinate of 29. If serial number 83 is 0, then serial number 2787 is 30. That’s zero-indexed; there are 31 serial numbers.

Alternative A gives me: y = 0.011054*895 - 0.917465 = 8.976

Alternative B gives me: y = 0.010685*895 - 0.886883 = 8.676
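Here is a small Python sketch of the two-point calculation behind both alternatives. The helper names are made up for illustration. The coefficients quoted above correspond to a right-hand endpoint of 2797; using 2787, as in Goodman’s definition of the diagonal, shifts them slightly, so the loop prints both. In every case the height at 895 stays well under 9.65.

```python
def line_through(x0, y0, x1, y1):
    """Return (slope, intercept) of the straight line through two points."""
    slope = (y1 - y0) / (x1 - x0)
    intercept = y0 - slope * x0
    return slope, intercept


def height_at(x, x0, y0, x1, y1):
    """Height of the line through (x0, y0) and (x1, y1) at position x."""
    slope, intercept = line_through(x0, y0, x1, y1)
    return slope * x + intercept


SMALLEST = 83  # smallest serial number, drawn at height 0

for largest in (2787, 2797):
    for label, top in (("A", 30), ("B", 29)):
        slope, intercept = line_through(SMALLEST, 0, largest, top)
        y = height_at(895, SMALLEST, 0, largest, top)
        print(f"largest={largest} line {label}: "
              f"y = {slope:.6f}x {intercept:+.6f}, y(895) = {y:.3f}")
```

Changing the endpoint moves the height at 895 by only a few hundredths, so it does not explain the gap to 9.65.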

While both lines give the maximum difference between the continuous distribution and the cumulative distribution at serial number 895, neither of the max differences is 9.65.

I used alternative A in my re-creation of figure 5.1
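The comparison itself can be sketched in a few lines of Python, as I read it: sort the serial numbers, give the i-th one (counting from 0) a step height of i, and measure the vertical gap to the diagonal at each serial number. The function name is hypothetical, and the serial numbers in the example call are invented placeholders, not Goodman’s list. It only checks the gap at each serial number and ignores the other side of each step, so it is a simplification rather than a full two-sided comparison.

```python
def max_gap_to_diagonal(serials, top=None):
    """Largest vertical gap between the step heights and the diagonal.

    The i-th smallest serial number (zero-indexed) gets step height i.
    The diagonal runs from (min, 0) to (max, top); top defaults to k - 1.
    Returns (gap, serial number where the gap occurs).
    """
    xs = sorted(serials)
    if len(xs) < 2 or xs[0] == xs[-1]:
        raise ValueError("need at least two distinct serial numbers")
    if top is None:
        top = len(xs) - 1
    slope = top / (xs[-1] - xs[0])
    best_gap, best_x = 0.0, xs[0]
    for i, x in enumerate(xs):
        diagonal = slope * (x - xs[0])  # height of the diagonal at x
        gap = abs(diagonal - i)
        if gap > best_gap:
            best_gap, best_x = gap, x
    return best_gap, best_x


# Made-up serial numbers, for illustration only (not Goodman's data).
sample = [10, 40, 45, 50, 90, 110]
gap, where = max_gap_to_diagonal(sample)
print(f"largest gap: {gap:.2f} steps, at serial number {where}")
```

Passing `top=30` or `top=29` switches between the two readings of the diagonal's right-hand end.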

Lessons from the German Tank Problem

This paper contains a derivation of the formula that James Grime elucidates in the Clever way to count tanks video. It also includes a similar formula that gives an estimate even if you don’t know where the serial number sequence starts.

I did not understand these derivations: combinatorics is not my strong suit.

I ended up writing a little simulation, generating k random numbers in the range 1-100, then trying the two formulas on it.

The formula that doesn’t assume a lowest serial number of 1 gives worse estimates than the formula that does. This makes sense, as it has less information to go on.
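A simulation along these lines might look like the Python sketch below. Several things here are assumptions: the sample is drawn without replacement, k is fixed at 10, and the unknown-start case uses a range-based estimate, (max - min)(k + 1)/(k - 1) - 1, so treat the choice of variant as mine. All function names are made up.

```python
import random


def estimate_known_start(sample):
    """Estimate N assuming serial numbers start at 1: m + m/k - 1."""
    k, m = len(sample), max(sample)
    return m + m / k - 1


def estimate_unknown_start(sample):
    """Range-based estimate that does not use the starting number.

    Assumed form: (max - min) * (k + 1) / (k - 1) - 1.
    """
    k = len(sample)
    spread = max(sample) - min(sample)
    return spread * (k + 1) / (k - 1) - 1


def rmse(estimator, n_true, k, trials, rng):
    """Root-mean-square error of an estimator over repeated samples."""
    total = 0.0
    for _ in range(trials):
        sample = rng.sample(range(1, n_true + 1), k)  # without replacement
        total += (estimator(sample) - n_true) ** 2
    return (total / trials) ** 0.5


rng = random.Random(1947)  # fixed seed so the run is repeatable
known_err = rmse(estimate_known_start, 100, 10, 5000, rng)
unknown_err = rmse(estimate_unknown_start, 100, 10, 5000, rng)
print(f"known start:   RMSE {known_err:.1f}")
print(f"unknown start: RMSE {unknown_err:.1f}")
```

Comparing root-mean-square error over many trials, rather than eyeballing a handful of runs, makes the "worse estimates" observation easier to check.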

Comparison

I looked at the “pieces of equipment” example that Leo Goodman uses to illustrate his paper.

  • count of serial numbers k = 31
  • minimum serial number 83
  • maximum serial number 2787

I tried 3 methods of calculating the actual number of “pieces of equipment”.

  • method 1, Clever way to count tanks video, N = max + (max - k)/k
  • method 2, Lessons from the German tank problem, unknown lower bound, N = (max - min)(1 + 2/(k + 1)) - 1
  • method 3, Goodman’s “practical techniques” paper, N = (max - min)(k + 1)/(k - 1) - 1

I ran 5 trials. Each trial selected 31 randomly-chosen “serial numbers”, between 1 and 2885. I calculated three estimates of the number of “pieces of equipment” from these 31 “serial numbers”.

method 1   method 2   method 3
2929.6     2951.7     2963.3
2862.5     2542.6     2552.6
2891.4     2816.8     2827.8
2922.4     2981.4     2993.1
2949.2     2875.2     2886.5

Goodman gives 2885 as the actual number of “pieces of equipment”.
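As a quick cross-check, Goodman’s own k, minimum and maximum can be plugged straight into the three formulas. The Python sketch below does that; the function names are made up. Method 1 assumes the serial numbers start at 1, which is not true of this data. Method 2 is coded as it appears in the list of methods. Method 3 is coded with a trailing - 1, the sign that reproduces the third column of the table of trials from the second.

```python
def method_1(k, largest):
    """Known start at 1: N = max + (max - k)/k."""
    return largest + (largest - k) / k


def method_2(k, smallest, largest):
    """Unknown start, as listed: (max - min)(1 + 2/(k + 1)) - 1."""
    return (largest - smallest) * (1 + 2 / (k + 1)) - 1


def method_3(k, smallest, largest):
    """Range form with a trailing - 1: (max - min)(k + 1)/(k - 1) - 1."""
    return (largest - smallest) * (k + 1) / (k - 1) - 1


# Summary figures for Goodman's example, as listed in this post.
K, SMALLEST, LARGEST = 31, 83, 2787

print(f"method 1: {method_1(K, LARGEST):.1f}")
print(f"method 2: {method_2(K, SMALLEST, LARGEST):.1f}")
print(f"method 3: {method_3(K, SMALLEST, LARGEST):.1f}")
```

All three land within about 15 of the 2885 that Goodman reports, with method 3 the closest on this one sample.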