Standard Deviation — A Myopic’s Explanation for the Blind
What is it? What good is it? How can I use it?
by Ken Howell
copyright © Precision Shooting Magazine 2004
“This bronze plate on the deck,” a Navy officer in an old joke told a group of civilians who were visiting his war ship, “is where our brave Captain fell.”
“Well, I’m not surprised!” a lady tourist said. “I tripped on the damn thing myself.”
I’m one of the many shooters who have tripped on the term standard deviation more than once in the technical literature about guns, without really knowing what it’s there for — what it is, what it means, what good it is (if any) to shooters, how to use it if it’s truly useful.
One trouble is that while a huge number of technically oriented shooters must be comfortably familiar with it, they haven’t been telling the rest of us — not where I’ve seen it, at least — what we need to know about it to be able first to decide whether it’s useful to us and then, if we find that it is indeed useful to us, to know how we can apply it usefully to our shooting.
We see it hither and yon, not quite everywhere, in shooting literature. Clearly, those who know it well consider it worth our while to know it and use it, so if the clear-sighted among us aren’t going to lead us to the light, then I guess that a near-sighted old guy with one eye gone will have to try leading the blind to a clear shot at it. It’s hard to find an explanation that’s both easily understandable and obviously accurate, but I think I’m getting the hang of it.
The average of a stack of numbers is useful only up to a point — where the numbers in the stack are all pretty much like each other. In certain extreme situations, the average can be useless, even misleading. One extreme example that I ran across presented the case of one contributor who had donated $1,000,000 to a cause, while 1,000 other contributors had donated $1 each, for a total of $1,001,000 and an average donation of $1,000. That mathematically precise average tells us nothing about the spread of the donations and gives us an utterly false impression of the typical donation. A fund-raiser, for example, would love it. He could press for you to plop at least an average donation ($1,000) into the pot, but he’d give you no clue that the typical donation was only a buck.
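The donation example is easy to check with a few lines of Python. This is only a sketch: the median (the middle value of the sorted list) is used here as a stand-in for what the text calls the “typical” donation, which is an assumption about what “typical” means.

```python
from statistics import mean, median

# One $1,000,000 donation plus 1,000 donations of $1 each (figures from the text).
donations = [1_000_000] + [1] * 1_000

average = mean(donations)    # total divided by the number of donors
typical = median(donations)  # middle value of the sorted donations

print(f"donors:  {len(donations)}")
print(f"average: ${average:,.2f}")
print(f"median:  ${typical:,.2f}")
```

The average comes out at $1,000 while the median is $1, which is the gap between “average” and “typical” that the example describes.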
Bullet velocities and target groups don’t vary across as wide a gulf as one-dollar and million-dollar donations, but the same principles apply equally to them, too. The average and the typical aren’t always the same or even nearly the same. Here’s where standard deviation can come to the rescue — with a stiff price for its help. It’s complicated to understand and tedious to use, so a lot of shooters who’d find it useful (if it were easier) just naturally pass it by and get along well enough without it.
Some published chronograph data list the standard deviation of each tested load presented, and the Oehler chronographs figure it out for us inside their little electronic whiz boxes. The numbers that we get from these “free” calculations are meaningful, and they can be useful.
How nearly like one another are the velocities of ten test rounds? How nearly alike are two or more sets of velocities? How much smaller is one ten-round target group than another group formed from ten rounds of a different load? Extreme spread doesn’t accurately describe the size of a group, and the average velocities of several ten-round test strings don’t accurately describe the relative consistency of several ten-round stacks of velocities. Standard deviation, cuss its complexity though we may, can tell us these useful facts about our loads and their performance.
The standard deviation of any related group of numbers gives us a mathematical measure of that group’s relative consistency. The smaller its standard deviation, the more nearly like each other are all the numbers in that group.
Understanding that much about standard deviation is the easy part. Understanding how it works is the hardest part. Understanding how to find it and use it is somewhere in between the easiest and the hardest. Here’s the way to find it — as I understand the procedure — for a group of ten chronographed velocities:
· Add the ten velocities and divide the sum by ten (the result is the average velocity).
· Subtract each below-average velocity from the average velocity, and subtract the average velocity from each above-average velocity (these results are the deviations from the average velocity. If one velocity is right smack-dab on the average velocity, its deviation is zero. This very rarely happens).
· Square the deviation for each velocity (multiply it by itself).
· Add the squares of the deviations and divide the total by ten (the number of rounds chronographed). The result is the variance of these deviations.
· Find the square root of the variance. One poke on the right button on a good electronic calculator does it for you. This result, the root mean square deviation, is what we more commonly and handily call the standard deviation.
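The five steps above can be sketched in Python roughly as follows. The function name and the ten velocities are hypothetical, made up only for illustration. Like the procedure in the text, this sketch divides by the number of rounds; Python’s `statistics.pstdev` uses the same convention, while `statistics.stdev` divides by one less than the number of values and so gives a slightly larger figure.

```python
import math

def standard_deviation(values):
    """Root mean square deviation, dividing by the number of values."""
    n = len(values)
    average = sum(values) / n                        # step 1: the average
    deviations = [abs(v - average) for v in values]  # step 2: the deviations
    squares = [d * d for d in deviations]            # step 3: square each one
    variance = sum(squares) / n                      # step 4: the variance
    return math.sqrt(variance)                       # step 5: the square root

# Hypothetical ten-round string, in ft/sec (not measured data).
velocities = [2498, 2503, 2507, 2495, 2501, 2499, 2504, 2496, 2502, 2495]
print(round(standard_deviation(velocities), 1))
```

Taking the absolute value in step 2 mirrors the “subtract the smaller from the larger” wording; it makes no difference to the result, because squaring removes the sign anyway.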
So what good is it?
For one thing, it shows you how consistent your loads are, from one round to the next in one load and from one load to the next. When I worked for the Army, one of the statisticians at the proving ground gave me Table 1 and Table 2 to make standard deviations more useful to me. I have to confess that I don’t fully understand these tables, but here’s what I do understand about how to use them. Table 1 shows that for any standard deviation in the first column, you can be 90% confident that the actual average of a much larger set of numbers is at least this close to the average of your smaller tested sample.
The real value of a shooting test is not how that group of rounds did perform when you fired them. Just as you can’t strike the same match twice, you can’t fire the same round twice. The real value of a shooting test, for velocities or accuracy, lies in what the performance of those test rounds tells you about how you can reasonably expect that all the rest of those same loads (even those that you haven’t loaded yet) will perform for you in as much as a lifetime of shooting the rest of them.
Table 1 also shows how the value of a shooting test (its reliability as an indicator of future performance) increases as the number of rounds that you fire in a test increases. The more rounds that you fire in a test, the more precisely does its standard deviation indicate how consistent or uniform the performance of that load will be. It’s easy to see from this table why the data from a five-round test are more reliable than the data from a three-round test, and why a ten-round test is better than a five-round test. A twenty-round test, though, doesn’t add enough useful precision to make shooting twenty rounds per test worth the trouble for most of us.
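The article doesn’t say how Table 1 was built. One plausible reading, and it is only an assumption, is that each entry is the half-width of a 90% confidence interval for the average, based on Student’s t distribution. The sketch below uses standard published t values; the names `T_90` and `margin` are hypothetical.

```python
import math

# Two-sided 90% critical values of Student's t for (rounds - 1) degrees of
# freedom. These are standard published values; applying them to Table 1 is
# an assumption, not something the article states.
T_90 = {3: 2.920, 5: 2.132, 10: 1.833, 20: 1.729, 50: 1.677, 100: 1.660}

def margin(sd, rounds):
    """Half-width of the assumed 90% interval for the true average."""
    return T_90[rounds] * sd / math.sqrt(rounds)

for rounds in T_90:
    print(f"{rounds:>3} rounds: +/- {margin(10, rounds):.1f}")
```

For a standard deviation of 10, the results land close to the table’s row for 10. They also show why the margin shrinks quickly between three and ten rounds and much more slowly after that: it falls off only with the square root of the number of rounds.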
The standard deviations from the shooting tests of two or more loads also give us a reasonably precise way to compare the consistencies of those loads relative to each other — more precisely than we can compute them by simply comparing their average velocities. Table 2 shows how the standard deviation of one load must compare with the standard deviation of a different load, to give you a 90% confidence that one load is more consistent than the other.
Let’s compare a few sets of chronographed velocities to see how the standard deviation of each ten-round test string gives us an index of how consistent each string’s velocities are. The first string below is purely imaginary — the perfectly consistent ideal string with all ten velocities exactly the same:
- Round 1: 2,500 feet per second (ft/sec)
- Round 2: 2,500 ft/sec
- Round 3: 2,500 ft/sec
- Round 4: 2,500 ft/sec
- Round 5: 2,500 ft/sec
- Round 6: 2,500 ft/sec
- Round 7: 2,500 ft/sec
- Round 8: 2,500 ft/sec
- Round 9: 2,500 ft/sec
- Round 10: 2,500 ft/sec
You don’t really have to add all these up to see that these ten rounds average 2,500 feet per second. Also, since each velocity varies exactly zero feet per second from the average velocity, the standard deviation for all ten rounds is likewise exactly zero feet per second. This is the unattainable perfect standard deviation. The closer the standard deviation of any real load is to zero, the more nearly perfect is its consistency.
It’s almost as obvious to the naked eye, unaided by the analytical math, that the “consistency” of the next string is horrible, even though its average velocity is exactly the same as the perfect string that I’ve listed above (2,500 feet per second).
- 2,000 ft/sec (500 ft/sec below the average)
- 2,100 ft/sec (400 ft/sec below the average)
- 2,200 ft/sec (300 ft/sec below the average)
- 2,300 ft/sec (200 ft/sec below the average)
- 2,400 ft/sec (100 ft/sec below the average)
- 2,600 ft/sec (100 ft/sec above the average)
- 2,700 ft/sec (200 ft/sec above the average)
- 2,800 ft/sec (300 ft/sec above the average)
- 2,900 ft/sec (400 ft/sec above the average)
- 3,000 ft/sec (500 ft/sec above the average)
Notice that in this string, no single round produced the average velocity, which is often true of chronographed strings. The wide range of this string’s velocities totals the same 25,000 feet per second, which when divided by ten yields the same average, 2,500 feet per second. But the standard deviation for this horrible example gives us a high number for its poor consistency. Its standard deviation, if I’ve figured it right, is 332 feet per second, a long way from zero! Here’s how to derive it:
· Square each of the above deviations (100 squared, or 100², for example, equals 10,000). Never mind the units. They’re not important at this stage of the calculation.
· Add the squares of all ten deviations (I get 1,100,000 for this example).
· Divide by the number of rounds (ten) to get the average squared deviation (110,000).
· Calculate the square root of the average squared deviation (331.66247, rounded off to 332 feet per second).
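The arithmetic for this string can be checked with a short Python sketch. The deviations are the ones listed for the string above; the variable names are just illustrative.

```python
import math

# Deviations from the 2,500 ft/sec average, taken from the string above.
deviations = [500, 400, 300, 200, 100, 100, 200, 300, 400, 500]

sum_of_squares = sum(d * d for d in deviations)    # add the squared deviations
mean_square = sum_of_squares / len(deviations)     # divide by the number of rounds
sd = math.sqrt(mean_square)                        # take the square root

print(sum_of_squares, mean_square, round(sd, 5))
```

The sketch gives 1,100,000 for the sum of squares, 110,000 for the average squared deviation, and about 331.66 for the square root, which rounds to 332 feet per second.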
The perfect string never happens, and when we have strings that are nearly as inconsistent as the horrible example above, we don’t really need the analytical math to show us that they’re horribly inconsistent. The relative consistencies of typical strings and groups are impossible to compare by eye-ball examination of the chronograph numbers. This is where the standard deviations of these strings are worth knowing. So let’s look at a couple of typical strings, both with the same average velocity of 2,500 feet per second:
- 2,550 ft/sec (50 ft/sec above the average)
- 2,540 ft/sec (40 ft/sec above the average)
- 2,530 ft/sec (30 ft/sec above the average)
- 2,520 ft/sec (20 ft/sec above the average)
- 2,510 ft/sec (10 ft/sec above the average)
- 2,490 ft/sec (10 ft/sec below the average)
- 2,480 ft/sec (20 ft/sec below the average)
- 2,470 ft/sec (30 ft/sec below the average)
- 2,460 ft/sec (40 ft/sec below the average)
- 2,450 ft/sec (50 ft/sec below the average)
The squares of these deviations total 11,000. The average squared deviation is thus 1,100, and its square root, the standard deviation of this string, is 33 feet per second (33.166 rounded off to 33 feet per second).
Now let’s look at another string:
- 2,502 ft/sec (2 ft/sec above the average)
- 2,504 ft/sec (4 ft/sec above the average)
- 2,506 ft/sec (6 ft/sec above the average)
- 2,508 ft/sec (8 ft/sec above the average)
- 2,510 ft/sec (10 ft/sec above the average)
- 2,492 ft/sec (8 ft/sec below the average)
- 2,498 ft/sec (2 ft/sec below the average)
- 2,496 ft/sec (4 ft/sec below the average)
- 2,490 ft/sec (10 ft/sec below the average)
- 2,494 ft/sec (6 ft/sec below the average)
The squares of these deviations total only 440. The average squared deviation is thus only 44, and its square root, the standard deviation of this sample string, is a very nice, tight 6.6 feet per second (6.63 rounded off to 6.6 feet per second), much better than the 33 feet per second of the first typical string, above.
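For a side-by-side check of the two typical strings, Python’s standard `statistics` module can do the work. This is a sketch: `pstdev` divides by the number of values, matching the procedure in the text, and the names `first` and `second` are only labels for the two strings listed above.

```python
from statistics import mean, pstdev

# The two typical ten-round strings listed above, in ft/sec.
first = [2550, 2540, 2530, 2520, 2510, 2490, 2480, 2470, 2460, 2450]
second = [2502, 2504, 2506, 2508, 2510, 2492, 2498, 2496, 2490, 2494]

for name, string in (("first", first), ("second", second)):
    print(f"{name}: average {mean(string)} ft/sec, "
          f"standard deviation {pstdev(string):.1f} ft/sec")
```

Both strings average 2,500 feet per second, yet the standard deviations come out near 33 and 6.6 feet per second, a difference that the averages alone would never show.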
The average velocity of each string hasn’t given us much useful information about these last two examples. It’s the same 2,500 feet per second for each string, as for the impossibly perfect first string and the unacceptably horrible string. The standard deviations usefully show how these strings really compare. They give us solid, precise, reliable numerical indications of the relative consistencies of all the loads.
To keep this presentation simple and to keep the examples easily obvious, I haven’t included the more likely comparison problem — test strings with different average velocities and comparable but different standard deviations. It is not unusual for one load to have a higher or lower average velocity than another load, while the difference between their standard deviations is much more dramatic. If I find myself facing test strings with very different standard deviations, I prefer the load that has the lower standard deviation, no matter how the velocities of the two loads compare. Test strings with lower standard deviations (better consistencies) usually shoot tighter groups.
The standard deviations of test-group targets, a special kind of standard deviation (radial standard deviations) are also better measures of the relative tightness of each group but just about unusable for comparing groups fired at only a hundred yards and sometimes even those fired at two hundred yards.
Why? Two reasons.
Good groups fired at these closer ranges are too small, with overlapping bullet holes, to let you measure the radius from the center of the group to the center of each bullet hole.
Groups that are big enough, with each bullet hole separate and clearly defined, to let you measure these radial distances usually aren’t worth bothering with.
Besides, groups that you test-fire at only a hundred yards don’t always indicate accurately how well you can expect those loads to group at long ranges.
The bullets fired from many loads aren’t yet stable at a hundred yards. Some are barely stable at two hundred yards. The garlic and ketchup that spoil the ice cream are that they’re still yawing at the shorter ranges, and they may or may not be accurate when they finally “go to sleep” and become as stable, at the longer ranges, as they’re ever going to be.
Just eye-balling your hundred-yard groups is usually enough of an examination to tell you about all that a hundred-yard group can tell you. Groups that you fire at three hundred yards and farther, however, are both (a) harder to judge by eye alone unless they’re vastly different sizes and (b) easier to compare by their radial standard deviations, because they’re usually big enough to let you measure the position of each bullet hole in the group in relation to the center of the group.
Measuring your test groups this way is still a galloping royal pain. You may decide not to bother with the procedure. Here’s how to use this method to measure and to analyze your long-range test groups, if you decide it’s worth the trouble to use it:
· Draw a vertical straight line to the left of the group, as I’ve done with example Group 1.
· Draw a horizontal straight line below the group, as I’ve done with Group 1.
· Measure and write-down the horizontal distance from the vertical line to the center of each bullet hole in the group.
· Add these horizontal distances. Divide the total by the number of bullet holes in the group. The result is the horizontal (x) distance from the vertical line to the center of the group.
· Measure and write-down the vertical distance from the horizontal line to the center of each bullet hole in the group.
· Add these vertical distances. Divide the total by the number of bullet holes in the group. The result is the vertical (y) distance from the horizontal line to the center of the group.
· Draw a vertical line x distance to the right of the left-hand vertical line.
· Draw a horizontal line y distance above the lower horizontal line.
Where these lines cross is the true center of the group, more accurately established than is possible with the extreme spread or any other quickie measure of the group.
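Finding the center of the group is just two averages, one for the horizontal measurements and one for the vertical. A minimal sketch follows; the hole positions are hypothetical, made-up measurements in inches, and five holes are used only to keep the example short.

```python
# Hypothetical measurements, in inches, from the two reference lines to the
# center of each bullet hole: (horizontal x, vertical y).
holes = [(2.1, 3.4), (3.0, 2.2), (2.6, 4.1), (3.8, 3.0), (1.9, 2.5)]

def group_center(points):
    """Average the x distances and the y distances separately."""
    n = len(points)
    x = sum(p[0] for p in points) / n
    y = sum(p[1] for p in points) / n
    return x, y

cx, cy = group_center(holes)
print(f"center of group: x = {cx:.2f} in, y = {cy:.2f} in")
```

The two numbers are the distances at which to draw the second vertical and horizontal lines; where those lines cross is the center of the group.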
Now we’re (whew! finally!) really getting this show on the road.
Next, we have to establish the radial distance from the center of the group to the center of each bullet hole.
· Measure and write-down the distance from the center of the group to the center of each bullet hole.
· Add these radial distances and divide the total by the number of bullet holes in the group. The result is the average radius.
· Subtract each below-average radius from the average radius, and subtract the average radius from each above-average radius. The results are the deviations from the average radius. If one radius is right smack-dab on the average, its deviation is zero. You’re not likely to see one of these.
· Square the deviation for each radius (multiply it by itself).
· Add the squares of the deviations and divide the total by the number of bullet holes in the group. The result is the variance of these deviations.
· Find the square root of this variance. Just as in the above examples, one poke on the right button on a good electronic calculator does it for you. The result, technically known as the root mean square deviation of these radial distances, is what we more conveniently call the radial standard deviation.
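If the hole positions are written down as x and y distances from the reference lines, the whole radial procedure can be sketched in a few lines of Python. The function name and the four hole positions are hypothetical, chosen so the arithmetic is easy to follow by hand. The sketch follows the steps as described here, taking the deviation of each radius from the average radius.

```python
import math

def radial_stats(points):
    """Return (average radius, radial standard deviation) for hole centers."""
    n = len(points)
    # Center of the group: the average x and the average y.
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Radial distance from the center of the group to each hole.
    radii = [math.hypot(x - cx, y - cy) for x, y in points]
    average_radius = sum(radii) / n
    # Variance of the radii about the average radius, then its square root.
    variance = sum((r - average_radius) ** 2 for r in radii) / n
    return average_radius, math.sqrt(variance)

# Hypothetical hole positions in inches (not a real target).
holes = [(0.0, 3.0), (4.0, 0.0), (0.0, -3.0), (-4.0, 0.0)]
average_radius, radial_sd = radial_stats(holes)
print(average_radius, radial_sd)
```

For these four made-up holes the radii are 3, 4, 3, and 4 inches, so the average radius is 3.5 inches and every radius sits half an inch from it, giving a radial standard deviation of 0.5 inch.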
Now let’s try all this with the groups on a couple of long-range ten-round targets. Notice first of all that both Group 1 and Group 2 have pretty nearly the same extreme spread, but one group is obviously tighter than the other. Extreme spread, even with two ten-round groups, is not a reliable or even always useful way to compare groups.
I’ve taken most of my construction lines out of Group 1, except for the vertical and horizontal reference lines that I started with, the circle that connects the most widely separated holes, and the cross-hair that marks the center of the group (CG).
Some authorities define the extreme spread (ES) as the diameter of the smallest circle that connects the centers of the farthest-apart bullet holes in the group. Others define the ES as the diameter of the smallest circle that encloses all the bullet holes in the group. Note that both definitions describe the extreme spreads of Groups 1 and 3 but not the ES of Group 2. This is one of the reasons that extreme spread, which is the easiest group dimension to measure once you’ve decided which definition of extreme spread you like better, is not always the best criterion dimension for comparing the relative consistencies of your test groups. Once you’ve established the true center of the group, using the procedure that I’ve just described, the average radius (AR) of the group is a better criterion dimension for comparing the sizes of your test groups (their sizes but not their relative consistencies).
Is their size the only thing that you want to know about your test groups? For some comparisons, it’s also important to know how widely the spread radii of all the bullet holes in the group deviate from the average radius of the spread. Or to say the same thing another way, how close to the average radius each actual radius is. For this, you have to know the radial standard deviation of the group.
I’ve taken the reference lines and all the intermediate construction lines out of Groups 2 and 3, leaving only the extreme-spread circle and the center-of-group cross-hair in each of these two groups.
If you study these three example groups and their pertinent figures carefully, you’ll come away with a pretty good gut-level grasp of how tricky it is to try to compare important test groups by sight alone, or even with a ruler and whatever flavor of extreme spread bongs your gong. You’ll also see that neither the average radius nor the radial standard deviation is a single universal cure-all for comparing one load’s or one rifle’s performance with another’s.
These aren’t the only useful applications of standard deviation.
Early on, I bought a modest supply of head-stamped, factory-fresh .220 Howell brass from the first batch that American Hunting Rifles (AHR) had manufactured for resale. Later, I bought a few hundred more, from the latest run. I’ve also bought a couple of hundred AHR cases for the .340 Howell, and have just learned that I can form .340 Howell cases from 9.3x66mm Sako cases. I’ll of course want to know whether one lot or brand of these cases is more nearly uniform than the other. I’ll want to have a good supply of loaded rounds of the .220 Howell on hand if I have an opportunity for some heavy long-range shooting at a populous prairie-dog town. I won’t need as large a supply of the .340 Howell for hunting big game, but I’ll want those loads to be as consistent as I can make ‘em, too.
I’ll want all those rounds, in both cartridges, to be as accurate as I can make ‘em. This means that I’ll want to load my most consistent bullets and most consistent powder charges in the most nearly uniform lots of these cases. How can I tell whether one lot of bullets or cases is more nearly uniform than the other? Do you suppose now that maybe — just maybe — the standard deviations of these cases’ weights would tell me what I want to know about how they compare? How should I go about making this comparison? How would you do it, now that you know how useful standard deviation is, and how to calculate it?
· Select a number of cases at random from each lot. Weigh ‘em. The tables show that a hundred test samples generate better data by far than, say, five or ten samples. I think that I’d favor going with a hundred cases from each lot, even though that great number would make the calculations awfully tedious.
· For each lot of cases, add all the weights of the individual cases in the test sample and divide the total by the number of cases in that sample, to get the average weight.
· Subtract each below-average weight from the average weight, and subtract the average weight from each above-average weight, to get the deviations from the average weight.
· Square each of these deviations (multiply it by itself).
· Add all these squared deviations and divide the total by the number of cases, to get the variance of that lot.
· Derive the square root of this variance, to get the standard deviation of the lot.
· Repeat all these steps with the other lot of cases.
· Compare the standard deviations of both lots and check the difference against Table 2, to see whether one lot is significantly more uniform than the other.
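The comparison against Table 2 can be sketched as follows. Two things here are assumptions. First, each column of Table 2 appears to be a fixed multiple of the standard deviation in the first column, so the multipliers below were read off the table’s row for 100 (436, 253, 178, and so on, divided by 100). Second, the case weights are hypothetical, and ten cases per lot are used only to keep the example short. The names `TABLE_2_RATIO` and `more_uniform` are illustrative.

```python
from statistics import pstdev

# Multipliers read off Table 2 (threshold divided by the standard deviation
# in the first column), keyed by the number of samples tested per lot.
TABLE_2_RATIO = {3: 4.36, 5: 2.53, 10: 1.78, 20: 1.47, 50: 1.27, 100: 1.20}

def more_uniform(lot_a, lot_b):
    """True if lot_a looks more uniform than lot_b at Table 2's 90% level."""
    n = len(lot_a)
    if len(lot_b) != n or n not in TABLE_2_RATIO:
        raise ValueError("use equal sample sizes that appear in Table 2")
    # Lot B's standard deviation must exceed the table's threshold for lot A.
    return pstdev(lot_b) > TABLE_2_RATIO[n] * pstdev(lot_a)

# Hypothetical case weights in grains, ten cases per lot (not measured data).
lot_a = [170.1, 170.3, 169.9, 170.0, 170.2, 170.1, 169.8, 170.2, 170.0, 170.4]
lot_b = [169.2, 171.0, 170.5, 168.9, 171.3, 170.0, 169.5, 170.8, 169.1, 170.7]

print(f"lot A: {pstdev(lot_a):.2f} gr, lot B: {pstdev(lot_b):.2f} gr")
print("lot A more uniform:", more_uniform(lot_a, lot_b))
```

With these made-up weights, lot B’s standard deviation is several times lot A’s, well past the ten-sample threshold, so the sketch reports lot A as the more uniform lot. Letting a script do the arithmetic also takes most of the tedium out of weighing a hundred cases per lot.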
So now you know — I hope — what standard deviation is. If you decide to pass it by, as too much trouble to bother with, at least you know what it is that you’re not using and what it could tell you about the velocities and long-range precision of your loads.
Table 1. For the observed standard deviation, you can be 90% confident that the actual average will fall at least this close to the average of the observed sample.
| Standard Deviation | 3 rounds tested | 5 rounds tested | 10 rounds tested | 20 rounds tested | 50 rounds tested | 100 rounds tested |
|---|---|---|---|---|---|---|
| 2 | 3.4 | 1.9 | 1.2 | 0.8 | 0.5 | 0.3 |
| 3 | 5.1 | 2.9 | 1.7 | 1.2 | 0.7 | 0.5 |
| 4 | 6.7 | 3.8 | 2.3 | 1.5 | 0.9 | 0.7 |
| 5 | 8.4 | 4.8 | 2.9 | 1.9 | 1.2 | 0.8 |
| 6 | 10 | 5.7 | 3.5 | 2.3 | 1.4 | 1.0 |
| 7 | 12 | 6.7 | 4.1 | 2.7 | 1.7 | 1.2 |
| 8 | 13 | 7.6 | 4.6 | 3.1 | 1.9 | 1.3 |
| 9 | 15 | 8.6 | 5.2 | 3.5 | 2.1 | 1.5 |
| 10 | 17 | 9.5 | 5.7 | 3.9 | 2.4 | 1.7 |
| 12 | 20 | 11 | 7.0 | 4.6 | 2.8 | 2.0 |
| 14 | 24 | 13 | 8.1 | 5.4 | 3.3 | 2.3 |
| 16 | 27 | 15 | 9.3 | 6.2 | 3.8 | 2.7 |
| 18 | 30 | 17 | 10 | 7.0 | 4.3 | 3.0 |
| 20 | 34 | 19 | 12 | 7.7 | 4.7 | 3.3 |
| 25 | 42 | 24 | 14 | 9.7 | 5.9 | 4.2 |
| 30 | 51 | 29 | 17 | 12 | 7.1 | 5.0 |
| 35 | 59 | 33 | 20 | 14 | 8.3 | 5.8 |
| 40 | 67 | 38 | 23 | 15 | 9.5 | 6.6 |
| 45 | 76 | 43 | 26 | 17 | 11 | 7.5 |
| 50 | 84 | 48 | 29 | 19 | 12 | 8.3 |
| 60 | 101 | 57 | 35 | 23 | 14 | 10 |
| 70 | 118 | 67 | 41 | 27 | 17 | 12 |
| 80 | 135 | 76 | 46 | 31 | 19 | 13 |
| 90 | 152 | 86 | 52 | 35 | 21 | 15 |
| 100 | 169 | 95 | 58 | 39 | 24 | 17 |
Table 2. The standard deviation of Load B must be greater than the value below for 90% confidence that Load A is more uniform than Load B.
| Standard Deviation | 3 rounds tested | 5 rounds tested | 10 rounds tested | 20 rounds tested | 50 rounds tested | 100 rounds tested |
|---|---|---|---|---|---|---|
| 2 | 8.7 | 5.1 | 3.6 | 2.9 | 2.5 | 2.4 |
| 3 | 13 | 7.6 | 5.3 | 4.4 | 3.8 | 3.6 |
| 4 | 17 | 10 | 7.1 | 5.9 | 5.1 | 4.8 |
| 5 | 22 | 13 | 8.9 | 7.4 | 6.4 | 6.0 |
| 6 | 26 | 15 | 11 | 8.8 | 7.6 | 7.2 |
| 7 | 31 | 18 | 12 | 10 | 8.9 | 8.4 |
| 8 | 35 | 20 | 14 | 12 | 10 | 9.6 |
| 9 | 39 | 23 | 16 | 13 | 11 | 11 |
| 10 | 44 | 25 | 18 | 15 | 13 | 12 |
| 12 | 52 | 30 | 21 | 18 | 15 | 14 |
| 14 | 61 | 35 | 25 | 21 | 18 | 17 |
| 16 | 70 | 40 | 28 | 24 | 20 | 19 |
| 18 | 78 | 46 | 32 | 26 | 23 | 22 |
| 20 | 87 | 51 | 36 | 29 | 25 | 24 |
| 25 | 109 | 63 | 44 | 37 | 32 | 30 |
| 30 | 131 | 76 | 53 | 44 | 38 | 36 |
| 35 | 153 | 89 | 62 | 51 | 44 | 42 |
| 40 | 174 | 101 | 71 | 59 | 51 | 48 |
| 45 | 196 | 114 | 80 | 66 | 57 | 54 |
| 50 | 218 | 126 | 89 | 74 | 64 | 60 |
| 60 | 262 | 152 | 107 | 88 | 76 | 72 |
| 70 | 305 | 177 | 125 | 103 | 89 | 84 |
| 80 | 349 | 202 | 142 | 118 | 102 | 96 |
| 90 | 392 | 228 | 160 | 132 | 114 | 108 |
| 100 | 436 | 253 | 178 | 147 | 127 | 120 |
Copyright © 2004 by Kenneth E Howell, ThD.
Use granted: First North American Periodical Rights only. All other rights reserved.