You only need about 60 surface stations

Since surface temperature changes are correlated over distances of about 1000 km (it does depend somewhat on the latitude of the stations), it turns out that you only need about 60 stations to produce a reasonable surface temperature dataset. [Edit: As Andrew Dessler points out in this comment, this is true for temperature anomalies, but not for absolute temperatures.]

I realise that Nick Stokes has covered this a number of times before. However, it’s probably worth repeating. Also, the main reason I wrote this is because I came across a site that allows you to experiment with this yourself. I have to admit that someone else highlighted this on Twitter, and I can’t remember who it was. If I remember (or someone reminds me) I’ll give credit [Edit: Someone has reminded me. Credit to Marcus N. Hofer]. I’m also not sure of the source of the site [Edit: It’s from Kevin Cowtan and it’s highlighted in this Skeptical Science post.]

I quickly produced the figure above. I used the GHCN adjusted plus ocean data. I initially used all the stations, then 1/10 of the stations, then 1/25, and then 1/80 (only 65 stations). The time-series look very similar (as expected). The mean trend, however, does vary slightly, but the uncertainty (not shown – see Nick Stokes’ posts for a discussion of the uncertainty) also increases. The reason, I think, that the mean trend increases slightly as the number of stations goes down, is that land stations start to dominate more and more over ocean stations, and the land is warming faster than the global average.
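The basic idea can be illustrated with a quick sketch (synthetic data, not the GHCN records used for the figure above): if stations share a common global signal, the trend from a random 60-station subset lands very close to the trend from the full network. The numbers below are illustrative, and real stations share regional variability too (the ~1000 km correlation), which helps even more.

```python
import numpy as np

rng = np.random.default_rng(0)

n_stations, n_years = 5000, 118           # roughly 1900-2017
years = np.arange(n_years)
true_trend = 0.01                          # degrees C per year, ~0.1 C/decade

# Shared global signal plus independent station-level noise.
global_signal = true_trend * years + rng.normal(0, 0.1, n_years)
stations = global_signal + rng.normal(0, 0.5, (n_stations, n_years))

def mean_trend(series_subset):
    """Least-squares trend of the subset's mean series, in C/decade."""
    mean_series = series_subset.mean(axis=0)
    slope = np.polyfit(years, mean_series, 1)[0]
    return 10 * slope

full = mean_trend(stations)
sub60 = mean_trend(stations[rng.choice(n_stations, 60, replace=False)])
print(f"all {n_stations} stations: {full:.3f} C/decade")
print(f"random 60 stations:  {sub60:.3f} C/decade")
```

With 118 years of data, the independent station noise averages down so quickly that 60 stations recover essentially the same trend as 5000.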

To be clear, I’m not suggesting that there aren’t any potential issues with the global surface temperature datasets (see one of Victor’s posts for some discussion of this). I’m mainly just trying to highlight that the sampling is almost certainly not much of an issue; you don’t need lots and lots of stations to produce a reasonable approximation for how global surface temperatures have changed. I also thought others might like to try some other variations, so wanted to highlight the site that allows you to do so (see link below).

Nice comment from Kevin Cowtan suggesting that a somewhat more careful analysis would suggest that you need maybe 130 stations. Doesn’t really change the key point; you don’t need an enormous number of stations if what you want to estimate is how global surface temperatures are changing (temperature anomalies, rather than absolute temperatures). There’s a video explainer, which I’ve posted below.

Tool for producing global surface temperature datasets.
Spectral Approach to Optimal Estimation of the Global Average Temperature (Shen, North and Kim paper suggesting that you only need about 60 stations to produce a global surface temperature time series).
Global trends of measured surface air temperature (Hansen and Lebedeff paper demonstrating that surface temperatures are correlated on scales of about 1000 km).
Why raw temperatures show too little global warming (Victor Venema’s post).
Just 60 stations? (One of Nick Stokes’ original posts about only needing 60 stations).
Global 60 Stations and coverage uncertainty (Nick Stokes’ post about what happens if you cull down to 60 stations).
Are the CRU data “suspect”? An objective assessment (Realclimate post by Kevin Wood and Eric Steig demonstrating that in fact you probably only need about 30 stations).

This entry was posted in Climate change, Global warming, Science. Bookmark the permalink.

45 Responses to You only need about 60 surface stations

  1. Dave_Geologist says:

    OMG. Hansen & Lebedeff (which I see I downloaded in January 2017 but never got round to reading). Figure 3 is an upside-down experimental variogram. The first data analysis step before kriging. They could have done a Cowtan & Way 30 years ago!

  2. Dave_Geologist says:

    OK correlation instead of variance, but that’s just another way of describing the same thing. Close together, similar; far apart, less similar; beyond a certain range, constant background level of dissimilarity. And they even checked for directionality! The cruder geological mapping packages won’t let you incorporate directional variograms.

    HL87’s range does seem to be a few times larger than C&W’s. “For this purpose we computed the correlation coefficient between the annual mean temperature variations for pairs of stations selected at random from among the station pairs with at least 50 common years in their record”. Does that mean they looked at temperature series, not just same-year anomalies, which I could see introducing longer-range correlation? For example large chunks of the Pacific move together with ENSO phase, warming or cooling.

  3. Hyperactive Hydrologist says:

    Cool tool. However, the random subset always selects the same stations. For example the 1/85 stations always gives me a trend of 0.105 °C/decade. I checked the map and it seems to always select the same stations. Not sure if it is a bug or intended.

  4. Hyperactive Hydrologist says:

    I know in R you can use the “set.seed” command to make the random number generator always produce the same results. I wonder if they have done something like that.

  5. HH,
    Yes, it looks like you’re right. It would seem that they may have done as you suggest and set the seed in such a way that it generates the same random number sequence every time.

  6. Yes, you do not need that many stations to compute the large-scale long-term trend from station observations.

    You do need more stations in practice, so that you can compare them with each other and remove changes due to changes in the way temperature was measured. Otherwise those 60 stations would not be reliable. #homogenisation

    You need more stations for spatial information on how the climate system works. #ElNino

    You need more stations for more accurate monthly, seasonal, annual and even decadal averages and to thus avoid even more stupid “hiatus” discussions.

    The Global Climate Observing System (GCOS) is working on a plan for a global climate reference station network (similar to the US reference network) with stable measurements at pristine locations. For such a network you do not need that many stations if you use it as the stable backbone of the entire system and use the other stations for higher-resolution spatial and temporal information.

  7. Dave_Geologist says:

    Ah, I see why you say “about 1200 km” ATTP, and that it’s close to C&W’s latest (?) 860 km range. HL87 chose a 1200 km range at which the correlation falls to 0.5. With kriging you normally choose a range where the variance stabilises at a maximum (or in HL87, the correlation stabilises at a minimum, more like 3000 km). It doesn’t matter that the correlation is arbitrarily low or the variance arbitrarily large. Often it will have risen to the variance of the global dataset. The software automatically takes that into account. Those distant points just make a very small contribution to the infilled value, and the uncertainty range on their contribution is large. If there are no nearby points, the infilled point will be perturbed very slightly away from the global mean, with a large uncertainty range that spans the global mean. But it’s still a better estimate than the global mean.

  8. The spatial correlation distance will depend on the averaging period. In the case of temperature anomalies the two will typically increase together.

    I am actually not that sure whether the spatial correlation distance is that important or even helpful. If the distance is short you have more independent samples. Does anyone know literature on this? The main reason to need 60 stations in my view is to sample all the different climatic regions of the world. They mainly need to be well spread.

  9. Joshua says:

    Anders –

    The reason, I think, that the mean trend increases slightly as the number of stations goes down, is that land stations start to dominate more and more over ocean stations,

    Why would that be? Why wouldn’t a subset of fewer stations maintain the land/ocean stations ratio?

  10. HH,
    I hadn’t realised that this was produced by Kevin Cowtan. I’ve sent him a message asking if there is a problem with the random number generator.

    I had assumed that the correlation was relevant because it meant that you really only needed 1 station for a region with a size scale of about 1000 km. Given this, if you have about 60 well spaced stations, then – as you say – you can sample all the different regions and, because of the correlation length, each station will reasonably represent the changes in its region.

  11. Joshua,
    That’s a fair question. However, as you get to 1/85 there seems to be only about 1 ocean station so, possibly, this is simply too low to produce a reasonable estimate for sea surface warming. However, it does look as though there is a problem with the random selection (it seems to always be the same) so it may be that my explanation is not correct.

  12. Hyperactive Hydrologist says:


    I mentioned it to the wife and she was very surprised that such a small random subset would give results that so closely matched the global data set. I think the subset has to have a certain spatial distribution around the globe.

  13. HH,
    Yes, I think that is also the case (as Victor mentions).

  14. Hyperactive Hydrologist says:

    Also it’s going to depend on the length of the trend – 1900-2017 is a long trend period. Use 30 years and it is going to vary a lot more.

  15. Dave_Geologist says:

    On the range point, what I mean is that in-year temperature anomaly may decorrelate at 1000 km, because that is the length scale at which annual air or oceanic circulation patterns are well mixed laterally. ENSO transition years, when half the Pacific moves up and down in lock-step, would correlate more widely, but would be swamped by the other years and just appear as outliers. But if you were comparing time series, particularly back when natural variation was dominant, you’d get an ENSO signal several years long and at least half of the time series would probably comprise years going into and coming out of El Niño. Same for other “oscillations”. So there would be correlation out to the scale at which those events affect SST coherently.

  16. Eric Steig says:

    We wrote about this right after the CRU hack in 2009.

    You actually only need about 30 stations, as we showed (though the figure has disappeared — I’ll try to find it and get it back up!).

  17. Eric,
    Thanks, I think I now remember that you’d written about this. I’ll add a link to your post at the end of this one.

 18. You really only need two stations to track most of the global temperature variability.

  19. John Palkovic says:

    Hyperactive: try 2/85, 3/85, …, 85/85 for different subsets.

  20. John,
    Okay, yes that makes sense. I misread the instructions. I’ve also redone the figure with different random selections and it looks like the increasing mean trend with decreasing station number was mainly due to the particular selection of the first figure.

  21. Hyperactive Hydrologist says:

    OK, so they are predefined subsets then. Maybe the word random should be removed.

  22. HH,
    That depends on how they generated the subsets. Could be that they simply found it easier to randomly generate these samples in advance than to run a random number generator in real time.

  23. Dave_Geologist says:

    VV and ATTP, whether the range is important depends on what you’re going to do with the data. If you’re just going to say “correlation greater than 0.5 is Good Enough, and 1000 km gives me that, so I need at least one sample per 1000 km bin”, then no. If you’re kriging, then the range (the x-axis value where it goes flat) and sill (the y-axis value where it goes flat) are essential components of the estimate. The observation distance vs. the range dictates how much the contribution of each observation is weighted at the infilled point (not at all beyond the range), and the variogram value at that distance contributes to the uncertainty estimate at each infilled point (both usually represented by a curve out to the range such as linear, spherical, exponential etc., then constant beyond the range). There may also be a nugget, representing variance at arbitrarily small distances, i.e. two nearby points with different observation values. If you include a nugget the uncertainty never goes down to zero, even at an observation point. Conceptually that could be due to irreducible measurement uncertainty, but if you think it’s real variation but just not at a scale you care about, you can eliminate the nugget by averaging and binning nearby points. That’s what Cowtan & Way did IIRC.
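The nugget/sill/range structure described here can be sketched directly. This is the standard spherical variogram model; the parameter values are purely illustrative, not Cowtan & Way's.

```python
import numpy as np

def spherical_variogram(h, nugget=0.05, sill=1.0, range_km=1000.0):
    """Spherical variogram model: starts at the nugget as h -> 0,
    rises to the sill at h = range, and stays constant beyond it.
    h may be a scalar or array of separation distances in km."""
    h = np.asarray(h, dtype=float)
    r = h / range_km
    gamma = nugget + (sill - nugget) * (1.5 * r - 0.5 * r**3)
    return np.where(h >= range_km, sill, gamma)

# Beyond the range, pairs of points carry no mutual information:
vals = spherical_variogram([0.0, 500.0, 1000.0, 3000.0])
print(vals)
```

In kriging, observations beyond the range get zero weight at the infilled point, and a nonzero nugget keeps the uncertainty from collapsing to zero even at an observation itself.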

    I need to check what C & W did I guess, but I’ve always assumed they constructed variograms for annual average anomalies, because that’s what I’d have done 🙂 . I.e. treat each year separately, then stack multiple years on the same graph to construct your variogram. Probably with a bit of stratification to check that there wasn’t a change in behaviour, e.g. from the natural variability period to the forced period. There would be a trade-off there between self-consistency in the range throughout the time series (without which you’d get artefacts where some infilled points switched from year to year between the global mean and an extrapolated value), and capturing a genuine change in spatial correlation (for example if rapid Arctic warming and a step change in the Jet Stream has made the Arctic or cool-temperate regions more or less coherent compared to the unforced era).

    If your target product is year-by-year anomaly maps, I’d be very wary of using multi-year data because you risk introducing correlation lengths associated with regionally coherent multi-year events, and your map will potentially be over-optimistic part of the time. For example, if there’s a large, spatially coherent signal across much of the Pacific during the transition into and out of an El Niño, but not during steady El Niño, La Niña or neutral conditions, that may dominate the variogram. Especially in the era when natural variation dominated and it was the largest inter-annual signal. You’d be extrapolating temperatures much too far in non-transition years. OTOH if your target product is 30-year rolling averages, you probably want to analyse 30-year rolling averages, because they’re probably more regionally coherent than annual averages and a longer range is fully justified.

  24. Andrew E Dessler says:

    Worth explicitly stating that this is true for *anomalies*, but not for absolute temperature. For absolute temperature, you’d need a very very dense network (probably every few 10s of meters, in order to get differences in surface type). That’s why we don’t really know the absolute temperature of the planet, but can nonetheless measure the temperature change very accurately.
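The anomaly/absolute distinction is easy to see numerically. Below is a toy sketch (entirely synthetic, hypothetical stations) of two neighbouring sites at different elevations: their absolute temperatures differ by several degrees, but their anomalies track each other closely.

```python
import numpy as np

rng = np.random.default_rng(1)
years = np.arange(1951, 2021)
warming = 0.015 * (years - years[0])          # shared climate signal
weather = rng.normal(0, 0.4, years.size)      # shared regional variability

# Two hypothetical neighbouring stations: a valley site and a hilltop
# ~1000 m higher (roughly 6.5 C cooler via the lapse rate), plus small
# local noise at each site.
valley = 14.0 + warming + weather + rng.normal(0, 0.1, years.size)
hilltop = 7.5 + warming + weather + rng.normal(0, 0.1, years.size)

def anomaly(t, base=slice(0, 30)):
    """Anomaly relative to the first 30 years as a baseline."""
    return t - t[base].mean()

abs_gap = np.abs(valley - hilltop).mean()
anom_gap = np.abs(anomaly(valley) - anomaly(hilltop)).mean()
print(f"mean absolute-temperature gap: {abs_gap:.2f} C")
print(f"mean anomaly gap:              {anom_gap:.2f} C")
```

One station tells you almost nothing about the absolute temperature a kilometre up the hill, but a great deal about how temperatures there have changed.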

  25. Andrew,
    Yes, a good point, which I didn’t really make clear. Thanks.

  26. Steven Mosher says:

    “Worth explicitly stating that this is true for *anomalies*, but not for absolute temperature. For absolute temperature, you’d need a very very dense network (probably every few 10s of meters, in order to get differences in surface type). That’s why we don’t really know the absolute temperature of the planet, but can nonetheless measure the temperature change very accurately.”

    hmm, need to be a bit more explicit. even with a dense network you never really KNOW. you only have estimates.
    These estimates are predictions of the UNMEASURED locations, GIVEN the measured locations.
    whether you do this prediction in absolute temperature or anomaly doesn’t affect your ability to measure change, PROVIDED that when you do it in absolute you are taking account of the variables that account for the majority of the variance. We can always estimate with less data, but it comes with an associated bias/uncertainty.

    That is, if you have a sample that gives you good latitude coverage, and good altitude coverage,
    then you will have accounted for over 90% of the variance in absolute surface temperatures.

    T at any given location is a function of

    1. Latitude
    2. Elevation
    3. Season
    4. A bunch of other variables, like surface type, proximity to water etc

    I recently played around with adding things like land cover to the regressions and sure
    enough you can explain some more variance by accounting for land cover.
    A simple one is bare earth. In some cases bare earth can even be warmer than urban
    land cover. Not trying to go Pielke Sr here, but it would be really cool to instrument the earth
    more fully. Think IoT.

    back to the stats:
    Turns out that over 90% of T is defined by 1-3, such that you can ignore #4 for the
    GLOBAL absolute average. Your average will be close. But as you drop to 60 stations those stations would have to be optimally placed, sampling latitude and elevation; otherwise your absolute answer could be way off. So if you randomly ended up picking stations in low latitude bands and worked in absolute temps, then your estimate of absolute would be off. But if your sample was constructed to give you measures at enough latitude bands and altitude bands, then it would not suck as much.

    Victor’s project (a CRN network for the world) is really cool if you are a data nerd.

    It might be instructive to ask what the minimum number required for absolute T is.
    I don’t think it would be 10s of meters. Provided you have good stratification on altitude, latitude,
    geomorphology, and land cover, you shouldn’t need to sample at 10s of meters.

    just thinking.
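The "latitude + elevation explains most of the variance" point can be checked with a toy regression. This is synthetic data under the stated assumptions (equator-to-pole gradient plus a ~6.5 C/km lapse rate), not GHCN stations, so the exact R² is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
lat = rng.uniform(-90, 90, n)
elev_km = rng.gamma(1.0, 0.5, n)              # most stations near sea level

# Toy annual-mean temperature: warm at the equator, cooling with
# latitude and with altitude, plus "other variables" as noise.
temp = 27 * np.cos(np.radians(lat)) - 6.5 * elev_km + rng.normal(0, 2, n)

# OLS on latitude (via its cosine) and elevation:
X = np.column_stack([np.ones(n), np.cos(np.radians(lat)), elev_km])
coef, *_ = np.linalg.lstsq(X, temp, rcond=None)
resid = temp - X @ coef
r2 = 1 - resid.var() / temp.var()
print(f"variance explained by latitude + elevation: {r2:.2f}")
```

In this toy setup the two covariates explain well over 90% of the variance, which is the spirit of the argument: stratify your 60 stations on latitude and altitude and the absolute estimate stops being "way off".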

 27. (1) For extreme seed initialization, note that when threads are run concurrently, a special random number generator is needed to keep them from being correlated, per L’Ecuyer et al. (2002).

    (2) Apparently the decorrelation spatial scale is known in engineering, for it forms the basis of the Jacobsen et al. national renewable energy plan; in meteorology it is known as the synoptic scale. The work on that dates back to 1957. In particular, because generation of wind and solar decorrelate at weather system scales, this forms the basis of how Jacobsen and company provide effectively 24/7 generation. Naturally there might be blue moon exceptions, but these can be backed up by natural gas peakers: they’ll emit, but not very often. The problem there is how to pay owners of the peakers for keeping them around.

    (3) The decorrelation spatial scale for solar is more complicated and more favorable. There’s a background which varies with the synoptic scale, but there’s also a shorter scale which varies with cloud cover, particularly partially cloudy conditions. Because of the high albedo of clouds, such days can actually increase insolation as well as reduce it, because they offer PV arrays multiple angles from which to be illuminated. It’s kind of like multipath scattering of radio signals for cell phone reception: if that didn’t exist, say, if reception were line-of-sight only, cell coverage would be much worse than it is.

  28. One caution regarding my comment just above: The results for Jacobsen-style renewables depend upon weather patterns remaining as they historically have been. There is evidence by Dr Jennifer Francis (of Rutgers) and colleagues that, due to Arctic amplification and the like, the Rossby waves (example details) are holding in stationary patterns for longer. Dr Francis summarized the recent literature in 2017. The work is ongoing.

    See also Dr Francis c.v..

    Were these long term stationary situations to prove out, that would blow up the notion of a synoptic scale as a permanent feature. In a time-dependent way the scale would grow much larger, and then be busted up in a semi-periodic way. It would also bust up the Jacobsen, et al plans. If it did, this is Nature being particularly mean to our technology, nudging us increasingly in the direction of removing our environmental contamination (by CO2) in order to fix things. That would be bad.

  29. Richard S J Tol says:

    You need, of course, many more than 60 stations to show that spatial correlation.

    More seriously, you need to guard against inhomogeneity and local trends in climate.

    For rain and wind, you need a denser pattern of measurement, and you might as well add a thermometer.

  30. izen says:

    @-“You need, of course, many more than 60 stations to show that spatial correlation.”

    But it can be derived from basic physics without measurement.
    There is a thermodynamic limit to how in-homogeneous energy distribution can be given the heat capacity and rate of transport.

    Global climate is easy to measure with a handful of data.
    I wonder whether raising regional variation in this context is a red herring.

  31. Dave_Geologist says:

    because they offer PV arrays multiple angles to be illuminated. It’s kind of like multipath scattering of radio signals for cell phone reception

    Off-topic but nerdy 🙂 , that reminds me of one of the two popular misconceptions about skin colour variation from the tropics to the poles. Often brought up by people who don’t want to believe that it’s just about the worst possible indicator of “race” (as in lineage), because it depends mostly on where your ancestors lived for the last few thousand years, not who their many-times-great grandparents were.

    “Why are people living in the African jungle so dark then?” Because there’s a lot of UV at ground level due to multi-path scattering off leaves. IIRC it’s been measured, and you’re exposed to much more UV deep in the jungle than (say) in Norway. Indeed I wonder if plants have evolved leaf pigments that reflect UV that’s too short-wave for photosynthesis, to prevent cell damage*.

    Except on a Norwegian mountain-top, which relates to the second misconception. That it’s about sunlight’s angle of incidence on your body. Well, duh. People walk upright. The angle of incidence is less favourable in the Tropics because the Sun is directly overhead. It’s about the length of atmosphere it has to travel through, i.e. angle of incidence on the Earth, not on your body.

    * Some plants yes, others no. And it’s waxy cuticles on the surface, not pigments. Which makes sense. Stop it before it reaches the cell wall.

  32. Dave_Geologist says:

    Length and density of atmosphere, hence the mountain-top.

  33. I had a response from Kevin C. It’s seeded pseudo-random, so that you can generate non-overlapping sets. You can also change the seed by putting a colon followed by a number (i.e., 1/85:3).
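One way such seeded, non-overlapping subsets could work is to shuffle the station list with a fixed seed and slice it. This is only a guess at the mechanism behind the tool's `k/n:seed` syntax, with hypothetical station IDs, not Kevin Cowtan's actual implementation.

```python
import random

def station_subset(station_ids, k_of_n, seed=0):
    """Deterministically pick subset k of n (e.g. '3/85'): shuffle the
    station list with a seeded PRNG, then take the k-th slice. Same
    seed -> same shuffle, so the n subsets never overlap."""
    k, n = (int(x) for x in k_of_n.split("/"))
    rng = random.Random(seed)
    shuffled = list(station_ids)
    rng.shuffle(shuffled)
    size = len(shuffled) // n
    return shuffled[(k - 1) * size : k * size]

ids = list(range(1, 851))          # 850 hypothetical station IDs
a = station_subset(ids, "1/85")
b = station_subset(ids, "2/85")
print(len(a), len(b), set(a) & set(b))   # same size, no overlap
```

Seeding is what makes "1/85" give the same stations (and the same 0.105 °C/decade trend) every time, while "1/85:3" swaps in a different shuffle.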

  34. Steven Mosher says:

    “But it can be derived from basic physics without measurement.
    There is a thermodynamic limit to how in-homogeneous energy distribution can be given the heat capacity and rate of transport.”

    I believe that Shen came up with the 60 number without appealing to measurements.
    hmm, I read it a long while ago and can’t recall the approach.

  35. Magma says:

    I found the Hansen and Lebedeff (1987) paper on the correlation of annual temperature anomalies out to ~1200 km very interesting when I first read it a few years ago. It’s a little counterintuitive, which may explain why I’ve seen commenters in the Dunning-Kruger bubble insist that without one temperature station per km² no robust conclusion about global surface temperature change can be drawn.

  36. Olof R says:

    Great tool!
    I think the area weighting is done by binning data in 5×5 degree cells, and then making an area-weighted average of all gridcells with data. If only met stations are used (not SST) it becomes a Crutem-like land-only dataset.
    Nick Stokes’ 60 station dataset is different: it triangulates the stations to global coverage, and more resembles Gistemp dTs, which attempts to estimate the global 2 m SAT.
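That bin-then-weight scheme can be sketched in a few lines. This is a generic cos-latitude gridcell average of the kind described, not the tool's actual code.

```python
import numpy as np

def gridded_global_mean(lats, lons, anomalies, cell=5.0):
    """Bin station anomalies into cell x cell degree boxes, average
    within each box, then area-weight occupied boxes by the cosine
    of the box-centre latitude."""
    lat_idx = np.floor((np.asarray(lats) + 90) / cell).astype(int)
    lon_idx = np.floor((np.asarray(lons) + 180) / cell).astype(int)
    boxes = {}
    for i, j, a in zip(lat_idx, lon_idx, anomalies):
        boxes.setdefault((i, j), []).append(a)
    centres = np.array([-90 + (i + 0.5) * cell for (i, _) in boxes])
    means = np.array([np.mean(v) for v in boxes.values()])
    w = np.cos(np.radians(centres))
    return float(np.sum(w * means) / np.sum(w))

# A uniform +1.0 C anomaly everywhere should average to exactly 1.0,
# however the stations are clustered:
res = gridded_global_mean([0, 40, -60, 40.2], [10, 20, 30, 20.3], [1, 1, 1, 1])
print(res)
```

Averaging within boxes first means a cluster of nearby stations counts once, not many times; the cosine weight then stops small polar boxes from counting as much as large tropical ones.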

    I think 20 stations can be enough to catch the essentials of global warming. Here is my own attempt, a minimalistic Ratpac inspired calculation with only one station in each of the eighteen 30×120 degree regions:

    The 18 station dataset is of course noisier than the big Gistemp loti and dTs, but I would say that it has less bias in the 61 year trend. Gistemp loti has blend bias and is too low, dTs has coverage bias and is too high. Subsampled model average data suggest that the 18 point pattern has very little bias (only -0.003 C/decade compared to the complete global dataset), with the 18 station trend quite near (+0.005 C/decade difference).

    Too good to be true, or sheer luck with the choice of stations?

  37. @Olof R,

    Too good to be true, or sheer luck with the choice of stations?

    Either jitter them spatially and see, or pick a slightly different subset and see how the performance changes. If you resample you can get an idea of the sensitivity to density.

  38. Steven Mosher says:

    “Olof R”

    Very cool.

  39. Dave_Geologist says:

    Waaaah. Not 42. No fair!

  40. Magma says:

    65?! Seems much too high. Let’s see if I can get the tone of some of my old math and physics professors down right…

    1. For any continuously differentiable time-varying surface function on an ellipsoid there exists at least one point where the value of the function averaged over an arbitrary interval t > 0 is equal to the averaged value over the entire ellipsoid. The proof is elementary and left as an exercise for the student. For bonus points, discuss degenerate solutions.

    (just to be clear, the above is offered as an example of faintly plausible BS)

  41. Kevin Cowtan says:

    I’d like to argue for a slightly higher value of ~130 stations. This is based on our recent paper in Dynamics and Statistics of the Climate System, here:
    And explanatory video here:

    Why the larger value? The number of stations is determined by the length scale, which in turn determines the optimal weight to be given to an isolated station compared to a densely sampled region. Isolated stations receive a weight which is equal to a region of densely sampled stations whose radius is about 1.4 times the length scale. This result is a fundamental property of the autocorrelation in the data and does not depend on the arbitrary correlation cutoff of Hansen and Lebedeff.

    The length scale depends on both the noise in the observations and location on the planet. We ran a simulation to determine the optimum length scale for temperature reconstruction using modern coverage, which is therefore dominated by the Antarctic, being the largest missing region. The station noise was determined empirically from the agreement between neighbouring cells, with the noise for Antarctic stations (which have no neighbours) being determined from Arctic cells containing single stations. This gives a length scale of ~850km (which we arbitrarily rounded to 800km), which is rather lower than you would obtain from cells at lower latitudes or with more stations per cell.

    This in turn leads to a station ‘information radius’ of ~1100km, which would mean ~130 optimally located stations under the assumption that those stations behave like isolated Arctic stations – given the greater temperature variability at high latitudes, I think this is a safer estimate. Relaxing the ‘Arctic’ part of the assumption would lead to a value of ~90 stations.
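The arithmetic behind these numbers is easy to check: an "information radius" of 1.4 times the ~800 km length scale, divided into the Earth's surface area, gives roughly the station count quoted.

```python
import math

earth_area = 5.1e8                  # km^2, total surface area of Earth
length_scale = 800.0                # km, from the simulation described above
info_radius = 1.4 * length_scale    # ~1100 km "information radius"

stations = earth_area / (math.pi * info_radius**2)
print(round(info_radius), round(stations))   # roughly 1100 km and ~130 stations
```

The ~90-station figure follows the same way with the longer, non-Arctic length scale.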

    Nothing in the paper is particularly new – the mathematics was all derived almost 50 years ago. However the statistics are unfortunately not widely understood even among publishing climate scientists, hence the need for another paper.

  42. Kevin,
    Thanks. I did watch your video a while ago, but had forgotten when I wrote this post. If I get a free moment, I’ll try to add an update to the post.

  43. Pingback: John McLean, PhD? | …and Then There's Physics

  44. Pingback: Global oppvarming: Slik reknar ein ut temperaturauken
