## Illustrating error analysis

It turns out that this is my 300th post, which is rather disturbing. I really should find something better to do (enough cheering in the back there). Yesterday’s post was about Ocean heat content uncertainties and it reminded me that there was something I wanted to look at and maybe write about. I had some time last night (yes, my life can be boring) and so managed to do what I’d been considering.

For the last decade or so, one way of determining the ocean heat content has been to use ARGO buoys/floats. There are 3000 of these around the world’s oceans, and they drop down to 2000m and (amongst other things) measure the temperature. The temperature measurements can then be converted into an energy and these can then be combined to determine how the energy in the ocean – in different layers – is changing with time. In the last decade, it’s increased by about 10²³ J, which means that the average temperature in the ocean has increased by about 0.05°C. The uncertainty in each temperature measurement is also about 0.05°C, which leads some to argue that ARGO buoys should not be able to measure such small changes [Correction: Mike McClory, in the comments, points out that the ARGO measurements are likely accurate to ±0.002°C, not 0.05°C, but that doesn’t really change what I’m trying to illustrate here].

Of course, this may be true when it comes to individual measurements, but when you combine lots of measurements you can measure such a small change. That’s what I thought I would try to illustrate here. Just to be clear, what I’m doing here is very simple and I’m not suggesting that it somehow represents ARGO data analysis. All I want to try and illustrate (hopefully) is that even if each measurement could not detect such a small change, summing over many measurements allows one to determine the effect of such a small change.

Let’s consider the following. Imagine we have some medium (the ocean, for example) and we have 3000 measuring devices evenly distributed throughout this medium. Let’s imagine these devices can measure the change in some quantity (energy for example) and that they can do so to an accuracy (1σ) of 1 (units don’t matter for this illustration). Let’s also consider that, initially, there is no change in this quantity (i.e., it is zero). I wrote a little computer code that would randomly generate 3000 numbers with a mean of 0 and a standard deviation (which is the 1σ error) of 1. This is shown in the figure below.

The figure above is my 3000 initial measurements. To determine the total change in this quantity (over the whole medium) I need to add all these measurements together. Normally one would then need to do some error analysis, but because I’m using a simple computer code I can simply repeat the entire experiment as many times as I like. So, I’ve done this 10000 times – I recalculate the 3000 measurements and sum them 10000 times. The distribution of the results is below. As expected, the mean is zero (i.e., the quantity is unchanged) but there is a range of values. For a normal distribution, the standard deviation is the distance from the mean within which about 68% of the values lie. By eye, it appears to be about 50. Given that my measurement errors are uncorrelated, I would expect the error in the sum to be the square root of the sum of the squares of the individual errors. These are 1, so the sum of the squares is 3000, and the square root of that is about 55. So, looks pretty good and quite nicely illustrates how basic error analysis works.
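The experiment is easy to reproduce. What follows is a minimal sketch and not the original code (which isn’t shown in the post): the function name `simulate_sums`, the fixed seed, and the reduced number of repeats (500 rather than 10000, so that plain Python finishes quickly) are all assumptions made for illustration.

```python
import math
import random
import statistics


def simulate_sums(n_devices=3000, n_repeats=500, mean=0.0, sigma=1.0, seed=42):
    """Repeat the experiment n_repeats times.

    Each repeat draws n_devices Gaussian "measurements" with the given
    mean and 1-sigma error, and records their sum.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.gauss(mean, sigma) for _ in range(n_devices))
        for _ in range(n_repeats)
    ]


sums = simulate_sums()

# Uncorrelated errors add in quadrature: sqrt(3000 * 1**2)
expected_sigma = math.sqrt(3000 * 1.0**2)

print(f"mean of the sums:    {statistics.fmean(sums):.1f}")
print(f"std dev of the sums: {statistics.stdev(sums):.1f}")
print(f"expected std dev:    {expected_sigma:.1f}")
```

With only 500 repeats the measured scatter will wobble by a few percent from run to run, but it should sit close to the quadrature estimate.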

At this stage I still haven’t illustrated how one can use many measurements to determine what is essentially a small change in some quantity. Let’s change the above slightly. Let’s consider a situation in which the quantity I’m measuring has increased (in each measurement cell) by an amount 0.1 (i.e., much smaller than the measurement error of 1). I can then produce 3000 measurements with a standard deviation of 1, but with a mean of 0.1. That’s shown in the figure below. It looks very similar to the first figure I showed and you’d be hard pressed to argue (by eye alone at least) that the mean is 0.1 and not 0.

Now, I redo the second part of my illustration by doing 10000 realisations of the sum of the measurements (i.e., I produce 3000 random numbers with a mean of 0.1 and a standard deviation of 1 and sum them 10000 times). The result is shown below. Now what I have – as expected – is a distribution with a mean of 300 (0.1 × 3000) and a standard deviation (1σ error) of about 50. So, even though an individual measurement would not have indicated that this quantity had increased, by summing many measurements I’ve been able to show, quite clearly, that – over the entire volume – this quantity has increased by 300 ± 50.
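The same check can be sketched for the shifted case. Again this is only an illustration under stated assumptions: the constant names, the seed, and the choice of 500 repeats instead of 10000 are not from the original code. It also counts how often the summed change comes out at or below zero, which is one simple way of asking whether the increase could be mistaken for no change.

```python
import random
import statistics

N_DEVICES = 3000
N_REPEATS = 500   # assumption: fewer than the post's 10000, to keep it quick
SIGMA = 1.0       # 1-sigma error on each individual measurement
SHIFT = 0.1       # true change per cell, ten times smaller than SIGMA

rng = random.Random(1)  # fixed seed so the run is repeatable
sums = [
    sum(rng.gauss(SHIFT, SIGMA) for _ in range(N_DEVICES))
    for _ in range(N_REPEATS)
]

mean_sum = statistics.fmean(sums)
sigma_sum = statistics.stdev(sums)

# How many standard deviations the total sits away from "no change"
significance = mean_sum / sigma_sum

# Fraction of repeats in which the total came out zero or negative
fraction_not_positive = sum(s <= 0 for s in sums) / N_REPEATS

print(f"mean of the sums:      {mean_sum:.1f}")
print(f"std dev of the sums:   {sigma_sum:.1f}")
print(f"significance (sigma):  {significance:.1f}")
print(f"fraction at or below 0: {fraction_not_positive:.3f}")
```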

So, unless I’ve made some kind of silly mistake, I think this illustrates how you can use many measurements to determine the change in something despite the change in each measurement potentially being smaller than the measurement error. To be clear, I’m not suggesting that this correctly represents ARGO data analysis or is even a particularly good illustration of ARGO data analysis. I’m simply trying to illustrate that arguing that you can’t measure an average increase in temperature of 0.05°C across the entire ocean because the uncertainty in each measurement is 0.05°C is wrong.

Rachel’s comment made me consider what you’d get if you took an average of the measurements, rather than the sum. So, for the case where the quantity has increased by 0.1 in each measurement volume, I repeated the calculation, but this time averaging the measurements, rather than summing, and repeating the “experiment” 10000 times. The result is below. It’s clear that the quantity has increased by 0.1 with a 1σ error of about 0.02. I think this is the same error propagation as before, but with each term divided by the number of measurements: because we’re averaging, each measurement is effectively divided by 3000, so each measurement error is now 1/3000, the sum of the 3000 squared errors is 1/3000 ≈ 0.000333, and the square root of that is 0.018. So, whether you sum or average, you can still detect a change in a quantity that, for each measurement, is smaller than the uncertainty in each measurement.
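Written out, and assuming N = 3000 independent measurements that each carry the same 1σ error σ = 1 (the symbols are just shorthand introduced here), the two cases are:

$$
\sigma_{\rm sum} = \sqrt{\sum_{i=1}^{N} \sigma^2} = \sigma\sqrt{N} = \sqrt{3000} \approx 54.8
$$

$$
\sigma_{\rm mean} = \sqrt{\sum_{i=1}^{N} \left(\frac{\sigma}{N}\right)^2} = \frac{\sigma}{\sqrt{N}} = \frac{1}{\sqrt{3000}} \approx 0.018
$$

Either way the change stands out from the noise by the same factor: 300/54.8 and 0.1/0.018 are both about 5.5.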


### 34 Responses to Illustrating error analysis

1. Rachel says:

I had to have this post explained to me so I thought it might be good to say a couple of things for the benefit of other people like me who do not have a mathematical background.

Your random numbers have been generated using a bell curve distribution (aka normal or Gaussian distribution) and that this is a universally accepted way of modelling this type of phenomenon. Many natural measurements produce a bell curve distribution like, for instance, penis size (a bit more interesting than ocean heat content).

I am not entirely sure why you have chosen to illustrate the sum of your 3000 measurements rather than the mean, but maybe it is to demonstrate the point which I think you’re making, which is that a very small change in the mean of a single measurement – 0.1 – can produce a very large change in the sum of many. And that is something which I found quite amazing.

2. Rachel,
Thanks, that does clarify some things. I did the sum because I think that is what the OHC is. The OHC is the total change in energy in each layer which is presumably computed by summing the change in energy in each measurement volume. I haven’t tried, but it could well be that if I did the average, I would get something similar. I’ll try that.

3. BBD says:

Actually very illuminating for another semi-innumerate here (ie me) and an excellent contrarian (!) myth-buster. Although with all due respect to the Atomic Squirrel, I’ll stick to OHC. I hope ATTP will do likewise.

4. BBD,
If you mean OHC rather than average temperature – I agree. Overall, it’s really all about energy, not temperature.

Rachel,
I’ve done the example with averaging, rather than summing, and that does nicely illustrate how you can clearly determine a small change in the average even if the change in each measurement is smaller than the measurement error. When you said “had to have this post explained to me” I hope you’re not implying that an actual mathematician looked at it. That could be embarrassing 🙂

5. Rachel says:

AndThen,

> I hope you’re not implying that an actual mathematician looked at it.

Yes – although not a statistician – and thanks to you I was forced to endure a short lesson in statistics. Your post got the tick of approval.

I think BBD’s preference for OHC is instead of my much more interesting example. If I had time, I’d look for a different example at the Journal of Scientific Exploration but I’ve got a train to catch.

6. BBD says:

What Rachel said re preference 😉 Although possibly ATTP is being deadpan?

7. BBD,
No, I’m just being dense and naive 🙂

Rachel,
I’m pleased it got the tick of approval. One of the reasons I prefer to do this by writing little computer codes is because I don’t really understand statistics either 🙂

8. Tom Curtis says:

My favourite illustration of how multiple observations can increase resolution in the data is from image stacking in amateur astronomy. Basically, the amateur astronomer takes multiple pictures of some faint object. They then use software to superimpose multiple images, the software ensuring good alignment. The results can be quite spectacular, and show detail there is no hint of in the original pictures. Here is an example, with a single image shown above, and the stacked image shown below. What is particularly interesting in this case is that without the random noise introduced by the atmosphere, this technique would be ineffective. With that noise, it can produce accurate images with resolution higher than the original camera images. More details of the process can be found here.

9. Tom,
Yes, that is very good. I think it’s also related to a method called Lucky imaging that even professionals are starting to use. I think the idea (which I don’t understand completely) is that you make your exposure time as short as possible so as to reduce the noise and throw away any images that have too much noise. You’re then left with a bunch of images with little noise that you can stack. I think the main problem is the efficiency (i.e., because you’re throwing some fraction of the images away, you’re observing for longer).

I must say, that I’m pleased that you’ve commented without, yet, finding some basic error in my post. Makes me feel confident that I haven’t completely messed up this illustration 🙂

10. Mike McClory says:

I’ve had many arguments with folks about surface temperature measurements and ‘how do you get a figure for changes that is less than the resolution of the devices?’ – by taking lots and lots of measurements!

I think your figure for the Argo float temperature resolution is a bit out though, they are rather well designed and get a far better accuracy, at least when they leave the factory. In the order +/- 0.002 deg C (http://www.argo.ucsd.edu/Data_FAQ.html#accurate). During the 4 year lifetime of the floats they are cross checked against each other or when they are in highly stable areas (http://www.argo.ucsd.edu/Argo_data_and.html).

11. Mike,
Thanks. I did mean to say that I hadn’t had a chance to check that number properly (I did find a similar value to what I used on another blog, but should probably have not trusted that). I stand corrected 🙂

12. I wonder if lucky imaging would change the results reported by the researchers Rachel cited.

13. AnOilMan says:

You’re just talking about over sampling… We do that all the time in signal processing. How do you get 16 bit accuracy with a 1 bit DAC? (Answer: Sample really fast.)
http://electronics.howstuffworks.com/question620.htm

Page 22 of CS5566_PP2.pdf shows the typical noise histogram for a device I’m actually using.

If you look closely the data does indeed look bad.

For the application in hand we run that ADC at 2.5ksps, average 32 samples (crappy low pass) and decimate by 16. The signal being searched for is typically 1 second long, and we really just needed to reduce overall processing time (CPU loading) and spurious noise caused by the ADC (to use a crappy signal detection algorithm common to the industry). Overall we are achieving 20bits of accuracy. Doing better than that gets into some pretty funky engineering.

14. Barry Woods says:

Could you do the calculation of 1 ARGO buoy per how many cubic metres (or cubic kilometres, olympic sized swimming pools, 😉 ) of water being measured?

which leads to the next question, how do we know we have enough ARGO buoys to get a meaningful measurement.

15. Barry,

> Could you do the calculation of 1 ARGO buoy per how many cubic metres (or cubic kilometres, olympic sized swimming pools, 😉 ) of water being measured?

You certainly could. Could probably include some random element that takes into account that the buoys are only sampling a tiny patch of the volume. Of course, they sample that volume a lot so it would be unlikely that they would always end up in a patch that happened to be anomalous relative to the rest of the volume.

> which leads to the next question, how do we know we have enough ARGO buoys to get a meaningful measurement.

Know is a strong word, but I would guess that people have done much more complicated versions of what I’ve done here to see what would happen if the temperature in each ARGO volume happened to be highly variable and what the chance would be that you would regularly get temperature measurements that weren’t representative of the typical for that volume.

My question for you: what makes you think that we don’t have enough? We now have a decade’s (or more) worth of measurements. They show increasing OHC. Are you really suggesting that that could be just because the ARGO buoys are – by chance – sampling patches today that just happen to be warmer than the patches they sampled 10 years ago?

I suspect people have actually checked how likely that would be, but maybe I’m wrong.

16. AnOilMan says:

Barry Woods: Contact your local Navy met-office. They’ve been successfully doing this for 30 years. Applying sparse data to exactly identify where submarines can hide is old old old hat. That is the purpose of XBT data.

If they could not do this, then they could not sink or hide subs.

17. BBD says:

It’s funny how desperate the… contrarians are to deny the validity of the OHC data, isn’t it?

Unfortunately, neither sampling density nor individual instrument measurement uncertainty can be used to do this.

18. > which leads to the next question, how do we know we have enough ARGO buoys to get a meaningful measurement.

Which leads to the next question: how do we set up our criteria for getting a “meaningful measurement”.

Which leads to the next question: when do we set up such criteria.

Which leads to the next question: why is a “meaningful measurement” oftentimes what Barry does not seem to have?

19. AnOilMan says:

This is FUD (Fear Uncertainty Doubt)… I believe he’s trying to emphasize the U. It’s a tried and true denial technique. You can spot it because he doesn’t try to actually solve his question. aaand after battling internet trolls for 4 years, it’s getting old hat.

20. > I believe he’s trying to emphasize the U.

Peddle, OilMan, peddle.

Also, only were we among friends would it make sense to call one another names. Therefore we should be thankful for Barry’s concerns. We have no reason to disbelieve they are genuine.

21. jsam says:

Had the buoys demonstrated a downward trend I somehow doubt our one way sceptics would spend so much time trying to say “we can’t know”.

On the other hand, I’m glad to see them queuing up to invest in more and better measurements. It makes a nice change after Bush, Abbott and Harper.

22. Steve Bloom says:

“Nah, nah, nah I can’t hear you” is nothing if not genuine, Willard.

23. AnOilMan says:

Steve Bloom: You are really bringing down the so called ‘skeptic’ side of the arguments. You are convincing me that you don’t know what you are talking about. An actual skeptic would learn about what they are talking about.

In any case, Steve Bloom, we are talking about engineering… not science. Using sparse GIS ocean column profiles to solve exactly what a location will look like, is old news and a solved problem. If it was not solved then your navy would be useless. Blind.

24. BBD says:

OilMan

Ah, I think Steve is being humorous. He’s neither a fake sceptic nor short of a clue.

25. BBD,
Yes, I was trying to work out if AoM was trying a double bluff 🙂

26. BBD says:

It’s been one of those threads, ATTP 🙂

27. AnOilMan says:

I can be confused. 🙂

28. I have read the same argument about temperature changes, you know of the surface air temperature that gets such a bad rep at this blog. 🙂 Some climate ostrich claimed that we could not know whether the temperature changes because the readings are done in Fahrenheit. Well like this blog post illustrates, the ostrich “forgot” the power of averaging (summing).

Your estimate for the accuracy should not just be the measurement accuracy. Thus your original estimate (0.05°C) is likely the better value. The error is likely mainly determined by the sampling error. The probe only measures a tiny spot in a large volume and inside this volume the temperature varies. Consequently the uncertainty for a single measurement is quite large; the spot measurement is a noisy estimate of the mean temperature of its full representative volume. I would thus not be surprised if the uncertainty is much larger than 0.05°C.

On the other hand, we do not only have 3000 argos (I guess that is where the number comes from) making a measurement at any one time, we can also average (or sum) in time. Without it, the sampling error may be too large to compute a meaningful global mean temperature / ocean heat content.

29. JCH says:

14.25! Can I put my IQ on the metric system?

30. Victor,

> On the other hand, we do not only have 3000 argos (I guess that is where the number comes from) making a measurement at any one time, we can also average (or sum) in time. Without it, the sampling error may be too large to compute a meaningful global mean temperature / ocean heat content.

Indeed, we can both sample in space and time and so, yes, my illustration is clearly very simplistic. I don’t know why you think the surface temperatures get such a bad rep here 🙂

31. JasonB says:

> Your random numbers have been generated using a bell curve distribution (aka normal or Gaussian distribution) and that this is a universally accepted way of modelling this type of phenomenon. Many natural measurements produce a bell curve distribution like, for instance, penis size (a bit more interesting than ocean heat content).

It’s not a coincidence that natural measurements tend to produce a normal distribution — the Central Limit Theorem (which is what we’re really talking about) tells us that the mean will be normally distributed even when the distribution of individual measurements is not normal (although you tend to need more individual measurements to produce a normal-shaped mean when the individual measurements themselves are not normally distributed). Since in real life macroscopic properties are often the result of many individual properties working together, the macroscopic properties tend to be normally distributed.

ATTP could test this by using a normal distribution of individual measurements between [-1,1] for the first case and [-0.9,1.1] for the second to prove that the sum (and mean) still have the characteristic bell curves. It should show that the shape of the resulting curves does not depend on whether he used normally distributed individual measurements (although in the real-life case, we would expect the individual ARGO measurements to be normally distributed as well for the same reason — they are measuring macroscopic properties, with many factors combining into the final measurement reported).

32. JasonB says:

> ATTP could test this by using a normal distribution of individual measurements between [-1,1] for the first case and [-0.9,1.1] for the second

Of course I meant uniform distribution for the individual measurements there, otherwise the experiment makes no sense. 🙂

33. JasonB,
Interesting idea. If I get a chance, I’ll give it a try.
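For readers who want to try the test JasonB suggests, here is a minimal sketch using uniformly distributed individual measurements on [-1, 1] and [-0.9, 1.1]. It is an illustration under stated assumptions, not anything posted in the thread: the helper names `sums_from_uniform` and `fraction_within_one_sigma`, the seeds, and the 500 repeats are hypothetical choices. As a crude check on the bell shape, it reports the fraction of sums lying within one standard deviation of their mean, which should be near 68% for a normal distribution.

```python
import math
import random
import statistics


def sums_from_uniform(low, high, n_devices=3000, n_repeats=500, seed=7):
    """Sum n_devices uniform draws on [low, high], repeated n_repeats times."""
    rng = random.Random(seed)
    return [
        sum(rng.uniform(low, high) for _ in range(n_devices))
        for _ in range(n_repeats)
    ]


def fraction_within_one_sigma(values):
    """Fraction of values within one sample standard deviation of the mean."""
    mu = statistics.fmean(values)
    sd = statistics.stdev(values)
    return sum(abs(v - mu) <= sd for v in values) / len(values)


unchanged = sums_from_uniform(-1.0, 1.0, seed=7)
shifted = sums_from_uniform(-0.9, 1.1, seed=8)

# A uniform distribution of width w has standard deviation w / sqrt(12),
# so the sum of 3000 such draws has standard deviation sqrt(3000) * w / sqrt(12).
expected_sigma = math.sqrt(3000) * 2.0 / math.sqrt(12)

for label, sums in (("[-1, 1]", unchanged), ("[-0.9, 1.1]", shifted)):
    print(
        f"{label:>12}: mean {statistics.fmean(sums):7.1f}, "
        f"std dev {statistics.stdev(sums):5.1f} "
        f"(expected {expected_sigma:.1f}), "
        f"within 1 sigma {fraction_within_one_sigma(sums):.2f}"
    )
```

Note that the scatter of the sum is smaller here than in the post (about 32 rather than about 55) simply because a uniform distribution on an interval of width 2 has a standard deviation of about 0.58, not 1.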
