Guest post: On Baselines and Buoys

One of the key criticisms of Karl et al. (2015) is that it used a dataset that adjusted buoy data up to ship data – the suggestion being that, in doing so, they produced more apparent warming than if the ships were adjusted down to the buoys.  In a guest post below, Zeke Hausfather shows how it makes no difference if you adjust the buoys up to the ships, or the ships down to the buoys.

Guest post: On Baselines and Buoys

Much of the confusion when comparing the different versions of NOAA’s ocean temperature dataset comes down to how the transition from ships to buoys in the dataset is handled. The root of the problem is that buoys and ships measure temperatures a bit differently. Ships take their temperature measurements in engine room intake valves, where water is pulled through the hull to cool the engine, while buoys take their temperature measurements from instruments sitting directly in the water. Unsurprisingly, ship engine rooms are warm; water measured in ship engine rooms tends to be around 0.1 degrees C warmer than water measured directly in the ocean. The figure below shows an illustrative example of what measurements from ships and buoys might look like over time:

[Figure: zeke_post_1]

Buoys only started being deployed in the early-to-mid 1990s. Back then about 95 percent of our ocean measurements came from ships. Today buoys are widespread and provide over 85 percent of our total ocean measurements, so it’s useful to be able to combine ships and buoys together into a single record. One option would be to ignore the temperature difference between ships and buoys and simply average them together into a single record. This is what the old NOAA dataset (version 3) effectively did, and we can see the (illustrative) results in the figure below:

[Figure: zeke_post_2]

Now, this approach of simply averaging together ships and buoys is problematic. Because there is an offset between the two, the resulting combined record shows much less warming than either the ships or the buoys would on their own. Recognizing that this introduced a bias into their results, NOAA updated their record in version 4 to adjust buoys up to the ship record, resulting in a combined record much more similar to a buoy-only or ship-only record:

[Figure: zeke_post_3]

Here we see that the combined record is nearly identical to both records, as the offset between ships and buoys has been removed. However, this new approach came under some criticism from folks who considered the buoy data more accurate than the ship data. Why, they asked, would NOAA adjust high-quality buoys up to match lower-quality ship data, rather than the other way around? While climate scientists pointed out that this didn’t really matter, that you would end up with the same results if you adjusted buoys up to ships or ships down to buoys, critics persisted in making a big deal out of this. In response, NOAA changed to adjusting ships down to match buoys in the upcoming version 5 of their dataset. When you adjust ships down to buoys in our illustrative example, you end up with something that looks like this:

[Figure: zeke_post_4]

The lines are identical, except that the y-axis is 0.1 C lower when ships are adjusted down to buoys. Because climate scientists work with temperature anomalies (i.e. change relative to some baseline period like 1961-1990), this has no effect on the resulting data. Indeed, the trend in the data (i.e. the amount of warming the world has experienced) is unchanged.
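
To make the arithmetic concrete, here is a minimal Python sketch (all numbers invented to mirror the illustrative figures above; this is not any group’s actual code). Whichever direction the constant offset correction is applied, the anomalies, and hence the trend, come out identical:

    import numpy as np

    # Illustrative only: ships read ~0.1 C warmer than buoys for the
    # same water; buoys are treated as unbiased here.
    years = np.arange(1990, 2020)
    true_sst = 15.0 + 0.015 * (years - 1990)   # underlying warming signal
    buoys = true_sst
    ships = true_sst + 0.1
    is_ship_era = years < 2005                  # ships dominate early on

    # Option A: adjust buoys up to ships; Option B: ships down to buoys.
    combined_up = np.where(is_ship_era, ships, buoys + 0.1)
    combined_down = np.where(is_ship_era, ships - 0.1, buoys)

    def anomalies(series, mask):
        # Subtract the mean over the baseline period from the whole series.
        return series - series[mask].mean()

    baseline = (years >= 1991) & (years <= 2000)

    # The two combined records differ by a constant 0.1 C, so their
    # anomalies match exactly and any fitted trend is the same.
    assert np.allclose(anomalies(combined_up, baseline),
                       anomalies(combined_down, baseline))

Since the two combined records differ only by a constant, any least-squares fit returns the same slope for both.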

What the folks at the Global Warming Policy Forum have been trying to do is to compare “Up to Ships” and “Down to Buoy” records without accounting for the fact that they are on separate baselines (i.e. they are not both showing anomalies with respect to a common climate period). The graph they show, using our illustrative example, looks something like this:

[Figure: zeke_post_5]

However, when we put both on the same climatological baseline, we see there is in fact no difference between the two lines:

[Figure: zeke_post_6]

Similarly, here is what the actual graph comparing ERSSTv4 (which adjusts buoys up to ships) and an early draft version of ERSSTv5 (which adjusts ships down to buoys) looks like. When we put them on the same baseline, however, we see that the new version 5 is nearly identical to the old version 4:

[Figure: zeke_post_7]

Here the old NOAA record is shown in blue, while the new NOAA record is shown in red. It’s clear that the difference between the two is quite small, and in no way changes our understanding of recent warming.

As Peter Thorne, one of the authors of the upcoming version 5 of NOAA’s ocean dataset told Carbon Brief:

 “It’s worth noting that the ERSSTv4 and ERSSTv5 series are virtually indistinguishable in recent years and that the comparison does not include the data from 2016. The recent changes that were made for ERSSTv4 are largely untouched in the new version in terms of global average temperature anomalies. Therefore, as currently submitted, ERSSTv5 would not change the bottom-line findings of Karl et al (2015)… The change in long-term global average time series in the proposed new version is barely perceptible when the series are lined up together with the same baseline period, and much smaller than the uncertainties we already know about in the existing dataset.”

He continues:

 If ever there was a storm in a teacup, this was it. There is no major revision proposed here and anyone who tells you otherwise fundamentally misunderstands the submitted paper draft (which at this juncture should be the sole provenance of the editor and reviewers per the journal’s policy).

We should let peer review complete its course. Then, and only then, we can discuss this new analysis in more depth.

In the Daily Mail last week David Rose quoted John Bates as saying that “They had good data from buoys. And they threw it out and “corrected” it by using the bad data from ships.” This statement is patently false. Not only did NOAA not “throw out” any buoy data, they actually gave buoys about 7 times more weight than less reliable ship data in their new record. As we discussed in our recent Science Advances paper, relying on the higher quality buoy data removed some bias in recent years due to the changing composition of the global shipping fleet.
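
The weighting mentioned above can be illustrated with a minimal sketch (the function name, inputs, and exact weighting scheme are mine for illustration; the ~7x ratio is from the post, and this is not NOAA’s actual code):

    def gridcell_average(ship_anoms, buoy_anoms, buoy_weight=7.0):
        # Weighted mean of collocated anomalies in one gridcell/month.
        # Buoys get ~7x the weight of ships, reflecting their lower
        # measurement noise; note that no data is thrown out.
        values = list(ship_anoms) + list(buoy_anoms)
        weights = [1.0] * len(ship_anoms) + [buoy_weight] * len(buoy_anoms)
        return sum(v * w for v, w in zip(values, weights)) / sum(weights)

    # Example: one ship value and two buoy values; the buoys dominate.
    print(gridcell_average([0.42], [0.30, 0.32]))   # ~0.32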

At the end of the day what matters is not that ships were adjusted down to buoys or buoys up to ships; what matters is that the offset between ships and buoys was effectively removed. This is now done by all groups producing sea surface temperature records, including NOAA, the U.K.’s Hadley Centre, and the Japan Meteorological Agency.

 Author: Zeke Hausfather is a climate/energy scientist who works with Berkeley Earth and is currently finishing a PhD at the University of California, Berkeley.


300 Responses to Guest post: On Baselines and Buoys

  1. verytallguy says:

    Thanks Zeke.

    With such a simple and clear explanation of the issue we can now be confident that accusations of fraud will be withdrawn and a rational debate on the right course to take will proceed.

  2. Marco says:

    Maybe make one small correction: it is the Global Warming Policy Forum. If the Foundation wrote anything like the Forum did, I’m sure Bob Ward has yet another case to file about it violating its charitable status.

    I think we should now also seriously consider that Whitehouse doesn’t care about the facts anymore, either. First that stuff with removing the El Nino from the end of the record only (and not from the start – i.e. 1997/1998) and now this.

    You’d think they have some academic advisors, a certain econometrician who claims to be so good at numbers comes to mind, who can tell Whitehouse to stop embarrassing himself and the GWPF. Oh wait, they are advisors to the *foundation*, so they don’t care about any politics the forum has going on.

  3. Marco,
    Thanks, have changed it from Foundation to Forum.

    I think we should now also seriously consider that Whitehouse doesn’t care about the facts anymore, either.

    I’ve been considering that for quite some time now.

  4. This adjustment is so simple and logical that anyone who wants to play fast and loose with it can only be up to one thing: confusing the public into believing that scientists are dishonest. Rose has had his errors explained to him numerous times over the last week and still he repeats his crap. Clearly he has his fingers in his ears and wants to continue deceiving the public.

    The only good news is that the Mail has been declared a purveyor of fake news. It’s clearly a perfect client for Rose.

  5. Joshua says:

    =={ With such a simple and clear explanation of the issue we can now be confident that accusations of fraud will be withdrawn and a rational debate on the right course to take will proceed. }==

    We have already seen full accountability for how the widespread interpretation of fraud came about. Judith has explained:

    –snip–
    I agree that the Mail editor often goes ‘over the top’ with headlines etc.,
    –snip–

    and fortunately Tonyb has narrowed it down even further:

    –snip–
    I have met David Rose and spent several hours in his company on a non climate change related issue. The dark picture drawn of him is way off. He is charming, funny and highly knowledgeable. I found him to be an accurate and conscientious reporter. He would have had no say on the lurid headline that accompanied the article.
    –snip–

    Discussion: JC’s ‘role’

  6. Christian says:

    Zeke,

    Thanks first, but I think we pay too much attention to David Rose and his crap (this isn’t aimed at you). OK, correct it in the first place, but doing more than that gives him more attention than he is worth. So I do think he is talking this crap for the attention, and we should ignore him as far as we can.

    Climate denial is on the losing track, and we should care more about climate science and about helping the public understand the consequences of current climate change.

  7. We can be confident that the vocal supporters of Integrity(TM) will be the ones that help spread this blog post the most energetically.

  8. Andy Skuce says:

    I very much appreciate the simple explanation provided by Zeke here. It’s kind of crazy that it should be necessary, though.

  9. Peter Smith says:

    Thank you Zeke, for clearing up the fact that there is no essential difference between adjusting ships to match buoys or vice versa and that this distinction is equivalent to changing baselines. The trend slopes are the same in both cases.

    However, I suggest that the baseline is not completely arbitrary but is defined by convention to be the average temperature for a pre-defined interval. So, which direction of adjustment is needed to maintain the current baseline convention, and what is that convention?

  10. Peter,

    However, I suggest that the baseline is not completely arbitrary but is defined by convention to be the average temperature for a pre-defined interval. So, which direction of adjustment is needed to maintain the current baseline convention, and what is that convention?

    The baseline is indeed the average temperature for a pre-defined interval. However, it is the average for the dataset being considered. In other words, you produce the baseline average after you’ve done the adjustment. The anomalies are then relative to that baseline. Therefore, it doesn’t matter if you shift the buoys up to the ships, or the ships down to the buoys. Once you’ve done the shift, you then calculate the average for the pre-defined interval and present your data as anomalies relative to that baseline.

  11. John Hartz says:

    The Breitbart headline about this OP will be:

    Hausfather confirms NOAA manipulates sea surface temperature data!

  12. Steven Mosher says:

    One reason I like always showing absolute temperature before taking an anomaly.

    It precludes GWPF stupidity.

  13. Steven Mosher says:

    And do it in kelvin to preclude WUWT nitpicking stupidity.

  14. Clive Best says:

    The only reason to use anomalies rather than absolute temperatures for global sea surface temperatures is that this is the only way they can be combined with land surface temperature data because these are sampled at different elevations above sea level.

    However anomalies don’t directly measure global warming because they amplify meridional warming towards the poles. This is because anomalies are much greater there since seasonal variations are much greater.

  15. Clive,

    The only reason to use anomalies rather than absolute temperatures for global sea surface temperatures is that this is the only way they can be combined with land surface temperature data because these are sampled at different elevations above sea level.

    The reason we use anomalies is to try and produce a largely homogeneous dataset. I think you would use anomalies even if you were considering only the SSTs.

    However anomalies don’t directly measure global warming because they amplify meridional warming towards the poles. This is because anomalies are much greater there since seasonal variations are much greater.

    As far as I’m aware, the seasonal effects are removed.

  16. Magma says:

    Please try to explain your hypothesis in a less gibberishy way, Clive, while bearing in mind a substantial number of readers here have advanced degrees in the physical sciences, a familiarity with the subject matter, and [Snip. -W]

  17. Andrew J Dodds says:

    You do wonder if there was ever a time when loudly making a mistake as obvious as this would see David Rose humiliated and/or sacked. Which, given his previous, he should be. Or has it always been like this?

  18. Andrew,
    I’ve wondered the same. Are we really in a period of real fake news, or has it always been like this and we just haven’t noticed because it hasn’t seemed to matter that much?

  19. Magma says:

    One of the previous two posts can be deleted… I used words or phrases that automatically put them into moderation.

    Seasonality is removed in annual average temperature anomalies and in month to month comparisons (e.g. Jan 2017 vs Jan 2016 or Jan 1981-2010).

    Unless Best is arguing that climate change is now beginning to affect high latitude seasonality (which may well be a real second-order effect), which would be an odd position for a skeptic to take.

  20. Magma,
    I removed one of them. I’m not sure what was being caught.

  21. verytallguy says:

    has it always been like this and we just haven’t noticed because it hasn’t seemed to matter that much?

    1924: The Mail

    The “Zinoviev letter” was a controversial document published by the British Daily Mail newspaper four days before the general election in 1924. It purported to be a directive from the Communist International in Moscow to the Communist Party of Great Britain. It said the resumption of diplomatic relations (by a Labour government) would hasten the radicalisation of the British working class. The letter took its name from the apparent signature of a senior Soviet official Grigory Zinoviev. The letter seemed authentic at the time but historians now believe it was a forgery

    1982: The Sun

    When Ian McKay, a sergeant in the British army who died in the battle, was posthumously awarded the Victoria Cross, his widow agreed to give the Daily Mirror and ITN exclusive interviews. In exchange, the privileged media would ‘protect’ her from the rest of the media. The Sun then published a completely fabricated interview of Marcia McKay based on testimonies from her mother-in-law, who told them about the VC award, and from previous public speeches Mrs McKay had made… Several months later, the Press Council denounced the Sun’s false interview as “A deplorable, insensitive deception on the public”.

    1989: Hillsborough – The Sun:

    On 19 April, four days after the disaster, Kelvin MacKenzie, editor of The Sun, ordered “The Truth” as the front page headline, followed by three sub-headlines: “Some fans picked pockets of victims”, “Some fans urinated on the brave cops” and “Some fans beat up PC giving kiss of life”. MacKenzie reportedly spent two hours deciding on which headline to run; his original instinct being for “You Scum” before eventually deciding on “The Truth”…
    …The Sun apologised for its treatment of the Hillsborough disaster “without reservation” in a full page opinion piece on 7 July 2004.

    Those are just my instant recollections.

    Teh google reveals far better researched accounts

    http://www.politico.com/magazine/story/2016/12/fake-news-history-long-violent-214535

  22. Hi Peter,

    Regarding baselines, the usual convention is setting the mean of the series over the 1961-1990 period to zero, though some groups like NOAA (1901-2000) and NASA (1951-1980) use slightly different ones. To put everything on the same baseline is as simple as subtracting the mean over the desired period from the series; since it’s just an offset, it has no impact on the resulting trend.

    The only time baselines matter is the baseline period used for individual stations when converting absolute temperatures to anomalies prior to constructing regional or global records. This matters for two reasons: first, the anomaly baseline determines what seasonal cycle is removed, and significant changes to the seasonal cycle since the baseline period can cause some (generally minor) issues; second, if a common anomaly method is being used, some stations that do not have sufficient coverage during the baseline period can end up being discarded. There are various techniques to avoid both of these problems, for example the least squares combination approach used by Berkeley Earth and Nick Stokes.
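
    To make that concrete, a minimal sketch of the re-baselining step (the function is mine for illustration; any published series can be shifted this way without touching its trend):

        import numpy as np

        def rebaseline(series, years, start=1961, end=1990):
            # Re-express a series as anomalies relative to the start-end mean.
            mask = (years >= start) & (years <= end)
            return series - series[mask].mean()

        # E.g. to compare a NOAA-style (1901-2000) record with a Hadley-style
        # (1961-1990) one, put both on 1961-1990 first; only the offset
        # changes, never the warming trend.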

  23. Clive,

    Actually, the main reason anomalies are used rather than absolute temperatures for SSTs is inconsistent temporal coverage. Ships, drifting buoys, and Argo floats all move, and using absolutes would result in spurious artifacts creeping in due to changing coverage over time, especially in the early (pre-1950) periods when SST coverage was much more limited. Using anomalies mostly avoids this due to the much longer spatial correlation length of anomalies vs. absolute temperatures.

    If the anomalies are larger near the poles it’s due to polar amplification of warming. The seasonal cycle is removed when anomalies are created.

  24. paulski0 says:

    I had to look up “meridional warming” (using quotes) and posts by Clive Best were the first three results. This is his recent post.

    Unless I’m missing something he’s simply talking about difference in warming rate and amount by latitude. In which case “meridional” is wrong, he’s talking about “zonal”.

    It seems his idea comes down to this:

    “The use of anomalies to present global temperatures is confusing because it assumes that all measurement locations are warming by the same amount globally. Instead warming is concentrated towards the Arctic, reflecting an increase in heat transport away from the tropics. An increase of 1C at the equator would be far more serious than a 1C rise in the Arctic winter from -50C to -49C.”

    So, no, use of anomalies doesn’t assume anything like that. Yes, warming is much larger in the Arctic than the global average, but it is a relatively small area so isn’t a dominant factor in the global average, even though disproportionately significant. Sure, a 1C long-term increase around the Tropics would be more significant than in the Arctic. But regional impact studies don’t use the global average (or they generally shouldn’t) – they use the trend for the relevant region.

  25. Pingback: Zeke Hausfather regarding Baselines and Buoys | Hypergeometric

  26. paulski0 says:

    In which case “meridional” is wrong, he’s talking about “zonal”.

    I guess you could say he’s talking about the meridional warming gradient, or profile, and he does mention “profile” in the post. I think I’m just confused by the use of “meridional warming” on its own, and thought it must be talking about something different.

  27. Steven Mosher says:

    “The only reason to use anomalies rather than absolute temperatures for global sea surface temperatures is that this is the only way they can be combined with land surface temperature data because these are sampled at different elevations above sea level.”

    wrong.

    http://berkeleyearth.org/land-and-ocean-data/

    For example, we use HADSST because we can get the absolute temp from them and other data we need to calculate uncertainties.

    We do land in absolute and ocean in absolute and combine them.
    If you like, you can take SST under the ice if you are concerned about kriging over a possible land/ice discontinuity.

    Data is all supplied in absolute. Subtract the constant of your choice (i.e. baseline).

    Guess what?

    Nothing changes.

  28. izen says:

    The original post by ATTP on the Karl15 paper

    No “pause”?


    Seems to contain all the same good arguments why it is an improvement in accuracy, not cooking for Paris.

    It is a tribute to the motivated credulity of their audience, and the efficacy of the Rose-Curry-GWPF gang that they have been able to re-animate the zombie claim that Karl15 is evidence that ‘scientists’ alter the data to whatever ‘They’ want.

    Especially as the Karl15 new temperature series ends around 2013. If you shift the baselines, round up the difference and, like the GWPF, mistake Fahrenheit for Centigrade, then you can claim that Karl caused 0.2° of fake warming in the record up to that point.
    In the 2+ years since K15 the observed temperature has risen by more than double that.

  29. Pingback: Breakdown of an anti-science hit piece in National Review – pressingwax

  30. Pingback: This is why conservative media outlets like the Daily Mail are ‘unreliable’ | Dana Nuccitelli – Enjeux énergies et environnement

  31. Olof R says:

    Actually, we don’t need anomalies to validate ERSSTv4, it works fine to compare it with absolute temperatures from ARGO, at least since 2007 when the Argo array was fully deployed.

    I can live with the fact that ERSST is 0.137 C warmer and not the expected 0.12 C, or that the trend of ERSSTv4 is 0.002 C/decade (0.5%) larger than that of ARGO. It’s within the error margin, so to say…

    Btw, Nice work Zeke!

  32. MarkB says:

    @Clive: “The only reason to use anomalies . . .”

    Some other reasons to prefer anomalies discussed at this link:
    https://moyhu.blogspot.com.au/2017/01/global-anomaly-spatial-sampling-error.html

  33. Nice Olof,

    Those Argo results are quite similar to those in our recent paper. In contrast, the trend of the old ERSSTv3b (that averaged buoys and ships without accounting for the offset) was only around 0.28 C per decade.

  34. Skibum says:

    Hey Zeke,
    I get pretty much everything you describe in your post except the part about the “up to ships” anomalies and the “down to buoys” anomalies not being on the same baseline. Are you saying that, for example in the case of the “down to buoys” anomalies, the subtraction of the corrective offset from the ship anomalies effectively moves what would otherwise be the average over the specified baseline period (i.e., 1961 to 1990?) down to what would be an average over a different and earlier time period? Or, is it something else?
    Thanks.
    Thanks.

  35. Hi Skibum,

    Say we have a ship-based data set such that the anomaly is calculated with respect to 1961-1990. That means that the mean temperature over that period is set equal to zero. Now, we move all that ship data down ~0.1 C to match recent measurements from buoys. Suddenly the mean over the 1961-1990 period is -0.1 C, rather than 0 C, and the anomalies are no longer shown with respect to a 1961-1990 baseline. We can fix this by simply subtracting out the 1961-1990 mean (in this case, -0.1 C) to ensure that the mean over that period is in fact zero. This is why the direction of the offset correction doesn’t matter; as long as the baseline is consistent the resulting anomalies will not change.

  36. Olof R says:

    Thanks Zeke,
    Yes, the trend of ERSSTv3b, with similar treatment as v4 and ARGO in my example above, is 0.294 C/dec, and absolute temperatures are between the other two.
    Regarding ARGO data, I just picked it straight from the Global Argo Marine atlas, which I think uses the RG2009 method, the Argo dataset that had the lowest trend in your recent paper.
    However, it seems like the Argo Marine atlas only uses data with full 2000 m profiles, excluding areas with less than 2000 m water depth. My feeling is that ocean surface in shallow waters should warm faster than surface over deep ocean, and Argo Marine atlas may hence underestimate the global SST warming.

  37. Steven Mosher says:

    very nice Ol

  38. Skibum says:

    Thanks for the response Zeke. Think I get it now. One follow up question though. Using our previous example, would it be true that if the 1961-1990 baseline period included some buoy data, then the new mean (after applying the -0.1 C corrective offset to the ship data) for the combined ship and buoy data over the baseline period would be a bit different than the corrective offset itself? Sorry to get in the weeds here, but love this stuff and it bothers me when I don’t fully understand it.

  39. Clive Best says:

    Zeke,

    “Actually, the main reason anomalies are used rather than absolute temperatures for SSTs is inconsistent temporal coverage. .. “If the anomalies are larger near the poles its due to polar amplification of warming. The seasonal cycle is removed when anomalies are created.”

    I agree with all those statements. However, the same argument also applies to land measurements which change with time, plus there is the added complication of changing elevation. Polar amplification of warming is a consequence of the radiative imbalance with latitude. The earth’s weather is a consequence of excess tropical heat moving towards the poles where net OLR radiation balances the global energy budget. The extreme case is in winter with zero incoming solar radiation at the pole. That is why you see most warming of anomalies in the Arctic occurs during winter months.

  40. Skibum, no, it would make the example more complicated to explain, but the result would be the same.

  41. Skibum says:

    Thanks Victor. Darn, thought I had it. Oh well… Back to the drawing board.

  42. anoilman says:

    This was kind of funny for me… I saw what the article was talking about and by graph one I thought to myself… “Don’t they just look for trend lines and avoid absolute measurements?” Later on… oh.. ok I’m right.

    Does anyone have a link to the trend line math they use for surface temperatures? I seem to recall that Skeptical Science had a nice description of how it’s done, and why it gets rid of error.

  43. John Hartz says:

    anoilman: Perhaps you are referring to this SkS article…

    Has Global Warming Stopped? by Alden Griffith, Skeptical Science, Aug 2, 2010

  44. anoilman says:

    No… it was about calculating the temperature trends. It showed all the math, namely how sites were broken up, how sampling variance was handled, and finally how all the trends were later combined.

  45. John Hartz says:

    anoilman: Do you recall when the article was posted?

    His oiliness, this one? One of the best explainers of why we use anomalies:
    https://skepticalscience.com/OfAveragesAndAnomalies_pt_1A.html

  47. Hi Skibum,

    Perhaps this explanation will help. On land, we calculate anomalies separately for each station, then average those anomalies within their gridcell to calculate temperatures for that gridcell.

    In the ocean, on the other hand, things move. A particular ship or a drifting buoy could end up in a very different part of the ocean a few years later, so calculating anomalies on a per-ship or per-buoy basis doesn’t make any sense. The ocean is also a lot more homogeneous than the land; a single land gridcell could have mountains and valleys, forests and fields, and lots of other characteristics that can affect the absolute temperatures. Ocean temperatures, on the other hand, are pretty well spatially correlated, and measurements in any one part of that gridcell will be pretty representative of the whole gridcell. So the way ocean anomalies are calculated is generally relative to the gridcell average temperature during the baseline period, rather than the average temperature for specific instruments within that grid cell. If the gridcell contained both ships and buoys during the baseline period, a constant amount would be subtracted from both ships and buoys for each month. Because the amount subtracted from ships and buoys would be the same, it wouldn’t in any way affect the offset between the two types of instruments, and a separate correction for that offset would still be needed.
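
    A tiny numerical illustration of that last point (values invented): subtracting the same gridcell climatology from collocated ship and buoy readings leaves the instrumental offset untouched, which is why the separate correction is still needed.

        climatology = 15.00        # gridcell baseline-period mean, invented
        buoy_reading = 15.30
        ship_reading = 15.40       # ships run ~0.1 C warm

        buoy_anom = buoy_reading - climatology   # 0.30
        ship_anom = ship_reading - climatology   # 0.40

        # The 0.1 C ship-buoy offset survives the common subtraction.
        assert round(ship_anom - buoy_anom, 2) == 0.10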

  48. Clive Best says:

    Hi Zeke,

    There is just one small problem with your otherwise perfect logic. The gridcell average temperature during any baseline period will depend on the particular mix of drifting buoy and ship measurements made during that 30-year period. If that mix changes significantly with time (and we know it does), the baseline itself is not constant. So if you choose a different 30-year normalisation period you will get a different (although small) answer.

  49. anoilman says:

    Victor… Thanks! That’s it.

  50. Clive,

    There is just one small problem with your otherwise perfect logic. The gridcell average temperature during any baseline period will depend on the particular mix of drifting buoy and ship measurements made during that 30-year period. If that mix changes significantly with time (and we know it does), the baseline itself is not constant.

    Technically the mix can only change if you add measurements that you had taken but not previously included (i.e., we can’t compute the 1971-2000 baseline until after 2000). Also, I don’t think it really matters, as long as when you recompute your baseline, you also recompute your anomalies. Where I think there could be an issue (which, I think, Zeke mentioned earlier) is if your baseline is determined from a mix of buoys and ships. If you then compute your anomalies and then adjust the buoys to the ships, then when you recompute your baseline, it won’t be zero (i.e., if your anomalies are relative to a 1971-2000 baseline, then the average of your anomalies over that period should – I think – be zero). You would – I think – have to do an additional shift so that your baseline value (when computed from the anomalies) was 0.

  51. izen says:

    This gets far too computationally complex for the majority of the numerically limited. Trying to refute the idea that corrections are corruptions is not aided by extra algebra.

    The Irreducibly Simplified (TM Monckton) mathematical model that people might have of the process:-
    There are ten (quintennial?) measurements,
    0, 1, 2, – 4, 5, 6, 7, – 7, 8, 9.
    It is known the middle four measurements are too high.
    If you raise the last three to match the middle it sure LOOKS like the overall trend has been maximised.

    I suspect that is the rationalisation for favouring fraud as an explanation.
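
    Adapting izen’s toy slightly (treating the first seven values as ship-type readings carrying a +1 bias and the last three as unbiased buoy-type readings; my numbers, for illustration), a quick check shows the two correction directions give exactly the same slope:

        import numpy as np

        x = np.arange(10)
        true = x.astype(float)                    # underlying trend of 1 per step
        raw = true.copy()
        raw[:7] += 1.0                            # ship-type values read high

        raised = raw.copy();  raised[7:] += 1.0   # buoys raised to ship level
        lowered = raw.copy(); lowered[:7] -= 1.0  # ships lowered to buoy level

        slope = lambda y: np.polyfit(x, y, 1)[0]
        # Corrected series differ only by a constant, so slopes agree (1.0),
        # while the raw, uncorrected mix understates the trend (~0.87).
        print(slope(raw), slope(raised), slope(lowered))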

  52. JCH says:

    … Otherwise known as hide the decline.
    “the offset between ships and buoys was effectively removed”
    Hide the cooling under the carpet.
    I am ashamed to see Zeke publishing excuses like this.
    – angech

  53. Marco says:

    Angech should be ashamed that at least four different people have been trying to explain to him something so simple as having to correct for systematic biases, and he still doesn’t get it.

  54. Ron Graf says:

    I notice on the last chart comparing global SST indexes that ERSST4 and 5 diverge from COBE-SST2 and HadSST3 just before 1940 and again just after 1945. Realizing this corresponds with the WWII shift in ship observation from use of canvas bucket sampling to mostly engine room intake (ERI) water temperatures, and then rebounding after WWII, this reveals NOAA’s ERI adjustment is in conflict with the other indexes. This is particularly surprising since I understand NOAA changed the ERI using Hadley’s nighttime ocean air temperature index (HadNMAT2). What is the debate between NOAA and Hadley and COBE on the ERI offset? Why were Matthews (2013) and (Part II) not cited in Huang15 (as Peter Thorne thought they had been)? If Matthews was not accepted why not do a paper with new research to refute the ship adjustments established by Matthews?

    My understanding is that the “pausebusting” was primarily due to the change in ERI offset, so it seems that we should be talking about that rather than which observation is lifted or lowered to make the offset. BTW, I think Bates’s complaint was that adjusting the known good instruments instead of the assumed biased ones is bad protocol (perhaps leading to future complications). Bates, it seems, felt it was an example of window dressing for political appearance, as was delaying Huang15 and rushing Karl15’s publication so that it could be the publicized paper (all with Paris in mind). I think Rose misunderstood Bates and thought he was accusing NOAA of data tampering, rather than just being too politically driven (which is also not good).

  55. JCH says:

    Bates has admitted the political accusation was a guess… based upon no evidence whatsoever.

  56. Marco says:

    “I think Bates’s complaint was that adjusting the known good instruments instead of the assumed biased ones is bad protocol ”

    I think we should not assume that, since it really is difficult to fit with Bates’ purported claim that “they threw out perfectly good buoy data”. Sure, Rose may have completely misquoted, but Bates *never ever corrected Rose*. I thus think Rose did accurately quote Bates, and now he can’t take it back, even though he knows he is wrong. I think Bates mindlessly parroted what he had heard from pseudoskeptics. Bates is free to show me wrong by forcing Rose to correct his quote.

    “as was delaying Huang15 and rushing Karl15’s publication”
    Please explain what you mean here. Huang et al was received in final form October 3, 2014. That’s more than 2½ months before Karl et al was *submitted*; how was it delayed? And what exactly was “rushed” about Karl et al? Please, give us clear and objective arguments, not the innuendo based on “feelings” from so many others.

    Oh, small slight last thingie to consider: Matthews is thanked in Karl et al for his contribution to the analysis, and Thorne does not say those two papers should have been cited, but rather thought that they were.

  57. JCH says:

    (all with Paris in mind)

    Prove it.

    Lamar Smith has had the emails, for more than a year, from the period when such a conversation was most likely to have taken place. Not a peep out of that miserable little fake except a demand for more emails and documents… meaning there was nothing in the ones he has had for more than a year that would support continuing an investigation in the normal constitutional sense (which does not really apply here). In a world where people have constitutional rights, Curry and Bates are kicked to the curb as they have nothing… no evidence.

  58. paulski0 says:

    Ron Graf,

    What is the debate between NOAA and Hadley and COBE on the ERI offset?

    The Hadley method uses results from physical modelling and direct experimentation on equipment to derive biases between measurement types. Metadata is then used to apply adjustments based on measurement type.

    The NOAA method doesn’t pay any attention to metadata or ship measurement type. All ship measurements are adjusted by reference to HadNMAT2 (don’t ask me how exactly, Smith and Reynolds 2002 is the original reference for the method). Don’t know about COBE.

    So there isn’t really a debate on ERI offset. They’re fundamentally different approaches and NOAA make no explicit accounting for different measurement types. The Matthews papers would be relevant for Hadley’s method, but they wouldn’t have any obvious implications for ERSSTv4.

    The relationship between sea surface temperature and air surface temperature is only accurate for averages over long time periods and large regions. Thus the adjustment of ship SST to NMAT is made over long periods (decades) and large regions; it therefore cannot follow the fast changes around WWII. Paul links to the article with the details. See also a recent BAMS paper on remaining problems in sea surface temperature reconstructions.
    http://journals.ametsoc.org/doi/abs/10.1175/BAMS-D-15-00251.1

    One wonders why someone who does not know these things is so confident science is wrong.

  60. I personally suspect that Hadley is probably better than NOAA in the pre/post WW2 period (while NOAA is better than Hadley over the last two decades). We are working on a paper using island and coastal land data to homogenize SSTs during the wonky period around WW2 that should provide more evidence one way or another.

    Paulski0, the benefit of using NMAT as a reference is that (at least in theory) it shouldn’t be subject to inhomogeneities at the same time as SST measurements. E.g. before and after a switch from buckets to ERIs, NMAT measurement methods should be unchanged, which allows for an estimate of the magnitude of the SST instrumental bias since NMATs and SSTs are pretty well correlated. While NMAT itself isn’t really that homogeneous over time, as long as its issues are mostly asynchronous from SST instrument changes it’s still a useful reference.

  61. Ron Graf says:

    Marco: says: “Please explain what you mean here,” regarding: “as was delaying Huang15 and rushing Karl15’s publication.”

    Remember that Huang15 is the real pausebuster. From my reading of Bates he is claiming Huang15 was delayed and then released with no publicity while Karl15 was rushed to be launched in the summer news cycle to influence the political discussion about the fall Paris meeting, thus providing optimal buzz while minimizing time for counter-research vetting. In the USA it can be equated with an “October surprise.”

    Paulski, I agree with you regarding the methods of ERSST4 versus Hadley and the others. My question was more why didn’t Hadley just use their own nighttime marine temp record instead of the tedious metadata-intensive adjustments? Was NOAA that much smarter than Hadley and COBE? Or are there good reasons Hadley is not following?

    Victor V: The relationship between sea surface temperature and air surface temperature is only accurate for averages over long time periods and large regions.

    Actually the NMAT is only a proxy for Tmin near-surface (<2m deep) SST. If the diurnal temperature range shifts over time, which it does in both models and observations, then we get false warming over the 20th century. Also, NMAT has its own problems. According to Huang the deck heights for observation increased over time, so they had to add a warm trend to compensate.

    Marco: Oh, small slight last thingie to consider: Matthews is thanked in Karl et al for his contribution to the analysis, and Thorne does not say those two papers should have been cited, but rather thought that they were.

    The acknowledgment in K15 to Jessica L Matthews from NCDC is not the J B Matthews from Univ of Victoria, BC, Canada. I looked them up. But you are correct that Thorne wrote he “believed” they had been cited, not that they “should” have been cited.

    Looking at other papers by JB Matthews and JBR Matthews (certainly a relation), they are not skeptics. They wrote papers raising alarm about and quantifying Arctic amplification, etc… The paper on ship adjustments was motivated to lay groundwork for a steeper SST bias adjustment to enhance the warming trends. They took issue with the buckets being adjusted up to 0.5C cool (Folland and Parker 95). They tried to show observations could be done fast enough to get the bias down to 0.2C. In Part II they find the ERI so difficult to analyze that they recommend it be thrown out of the record. But with the in situ testing they did, they found the ERI to be as cool as buoys, because even with the slight warming from water residence time in the hull intakes, the 7-10 meter intake depth gave them a cool bias that countered the 1 meter depth typical of buckets and buoys, especially in the daytime. JB Matthews gives acknowledgements to John Kennedy, Andrew Weaver, David Parker and Phil Jones for comments.

  62. Hi Ron,

    In our recent paper we created ship-only and buoy-only records up through the present. I can confirm a buoy-ship difference of around 0.12 C between 1997 and 2009. After 2009 it steadily shrinks, as ships show considerably less warming than buoys (likely due to changes in the characteristics of the shipping fleet and a ~30% reduction in the number of observations from ships). We show this in Figure S13 and discuss it at some length in the text.
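
    For anyone wanting to reproduce that kind of number, a minimal sketch of estimating the offset from collocated data (the function is my illustration, not the paper’s actual procedure):

        import numpy as np

        def collocated_offset(ship_anoms, buoy_anoms):
            # With the arrays aligned so each pair measures the same gridcell
            # and month, the mean ship-minus-buoy difference estimates the
            # instrumental offset (~0.12 C for 1997-2009 per the comment above).
            diffs = np.asarray(ship_anoms, float) - np.asarray(buoy_anoms, float)
            return np.nanmean(diffs)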

  63. Ron Graf says:

    Zeke, thank you. I will look at it. Can you point to your ftp archive for your data and code?

    Can you point to or supply Huang15’s data and code? Peter pointed to K15’s data but not code.

  64. Ron Graf says:

    Zeke, at Brandon’s blog you said:

    Hi Ron,
    They do have side-by-side data from ships and buoys. That’s where the ~0.1 C offset comes from. ERSSTv5 will be updating this approach to use a dynamic offset calculated each month rather than a static offset for the full period. Both ERSSTv4 and v5 also assign much more weight (on average ~7x) to collocated buoy measurements over ship measurements when a gridcell has both, under the (likely correct) assumption that buoy data is much more homogeneous than ship data.

    I am supposing the ~.1C is the 0.12C you and Karl15 refer to in your papers. I never see the ERI and buckets separated out, though Karl speaks of them non-quantitatively. Is there any SST data that has them differentially coded so as to allow separate analysis?

  65. Frank says:

    Zeke: I appreciate your taking the time to post useful information here and earlier at Judy’s. At the risk of sounding ungrateful, I do wish you (and others) had posted a graph of the difference between ERSST3 and ERSST4 (and possibly between ERSST4 and the new ERSST5). Even better, a difference plot that highlights the main period when the transition from mostly ships to mostly buoys occurred.

    Perhaps then I will be able to unambiguously see that a 0.1 degC correction occurs during the period buoy data became dominant and that this change accounts for the finding in Karl 2015: a change in SST warming trend from 0.014 to 0.075 °C dec−1 for 1998-2012 (0.085 °C correction over 14 years).

    Of course, Karl15 (and the IPCC) missed the most dramatic hiatus. Even with ERSST4, the warming rate from 1/2002 to 1/2012 is 0.00 °C dec−1 (and slightly negative if one cherry-picks the optimum starting and finishing month). Of course, with an almost unlimited number of potential periods and autocorrelated data, most of these trends are fairly meaningless without proper confidence intervals. And even then, 10% of periods “should” have a trend outside the 90% confidence interval (which should be about -0.10 to +0.10 °C dec−1 for 1/2002 to 1/2012).

  66. Marco says:

    Ron, thanks for the correction on Matthews. I thought about that right after I switched off my computer. Wasn’t worth the bother to start it up again.

    I do need to take issue again with your “Remember that Huang15 is the real pausebuster. From my reading of Bates he is claiming Juang15 was delayed and then released with no publicity while Karl15 was rushed to be launched in the summer news cycle to influence the political discussion about the fall Paris meeting, thus providing optimal buzz while minimal counter research vetting time. In the USA it can be equated with an “October surprise”.”

    Let’s go through this in smaller steps. Huang15 is not the (faux)pause-buster. Ultimately, in *Karl15*, the updates to ERSST are the most important to remove the supposed ‘pause’. But that does not come from Huang15, because there were no people defining a ‘pause’ based on SST alone.
    I don’t think Bates claimed Huang15 was delayed, but rather that the release of ERSSTv4 into the public domain was delayed until Karl15 was approved. I’m not surprised if they really did delay that, as it is rather normal to delay a data release that is used in a new paper, where the analysis is so ‘simple’ that anyone could have done the same and thus ‘steal’ the paper you are preparing.
    Then we get to the Karl15 paper, where Bates himself already admitted he had no proof whatsoever it was rushed to influence political discussions. Anyone familiar with peer review also knows it would be hard to control what happens at a journal, so that already makes it problematic. In this particular case, we know Karl15 was in review for longer than the average at Science. If it had not been, it would have been released in April. That would actually have been a little bit better, but only marginally so, if anyone had wanted to influence Paris, because in February the primary draft agreement for the Paris meeting was ready. In the period of March to June, countries were expected to come with information on how much they would reduce their emissions. Karl15? Way too late to do anything at all for the Paris meeting.

    Somebody has created a narrative in their mind, too bad the facts mess it all up.

  67. Willard says:

    Speaking of narratives, Doc sticks to his elsewhere:

    Fact check, none of the instruments measure anomalies in the sense you have just described.
    They measure real temperatures. Anomalies are something added afterwards and can be anything you like because as you said you can choose whichever baseline you like.

    Discussion: JC’s ‘role’

  68. Tony Banton says:

    Willard:
    Thanks – it was me he was replying to.
    I have responded again (twice).
    I have no hopes.

  69. Phil says:

    Marco,

    it is rather normal to delay a data release that is used in a new paper, where the analysis is so ‘simple’ that anyone could have done the same and thus ‘steal’ the paper you are preparing.

    I don’t really think that this stands up to scrutiny, since Karl15 was submitted in Dec 14 and the mooted publication date for ERSST.v4 data was Jan 15. Any ‘rival’ paper would need to be researched, written, submitted and peer-reviewed in less time than it took Karl15 to go through the latter step alone in order to beat them.
    Peter Thorne provides an explanation:
    Peter Thorne provides an explanation:

    Whenever NOAA puts out new data sets, that will be updated each month for tracking global temperatures they always get numerous questions that they have to be able to answer. So, NOAA waited on releasing this new ERSSTv4 data set until the Karl et al. paper was out because the Karl et al. paper presented the implications of the new corrections

    In other words, NOAA judged that all the questions the public/media would ask about the ERSST.v4 data release could only be answered by “That question will be answered by a new paper (Karl15) coming out soon”, in which case there was no real point in publicly releasing the SST data until Karl15 was released.

  70. “Of course, Karl15 (and the IPCC) missed the most dramatic hiatus. Even with ERSST4, the warming rate from 1/2002 to 1/2012 is 0.00 °C dec−1 (and slightly negative if one cherry-picks the optimum starting and finishing month).”

    Good that you know cherry picking exists; it does not apply only to selecting specific months, but also to years. It is easy to find short periods (10 years is really short) with a clearly lower trend than the actual long-term trend.
    http://variable-variability.blogspot.com/2017/01/cherry-picking-short-term-trends.html

  71. Marco says:

    Phil, Peter Thorne’s comment makes sense, and yet not quite, because I don’t think many media outlets would actually ask the questions he mentioned. Maybe I oversold the paper-idea, but a few blogposts with this type of analysis, and it really would not be *that* difficult to do (take GISTEMP and use ersstv4 instead of 3b, for example), would have killed the Science paper, too. Plenty of other journals would likely still take it, but for Science it would not have been novel and ‘sexy’ enough anymore.

  72. Marco says:

    Maybe I should try once more here for Angech, but with an example he perhaps can better understand.

    We have many options to measure someone’s body temperature, but let’s say we have a large population of people where we started to measure only rectally in period A, and where we measure almost exclusively orally in period C. In period B we slowly went from rectal to oral. Now suppose rectal systematically measures higher than oral by 0.5 degrees (that is, they are on a different baseline!), and that you are tasked, retrospectively, to determine whether the body temperature of this population has increased, decreased, or stayed the same.

    Failing to correct for this change in measuring approach, we most likely see a drop in body temperature going from period A to period C. Is it Angech’s view that OPTION 1: the body temperature of these people has indeed dropped, or OPTION 2: does he consider it necessary to correct for the systematic error?
    Maybe then he can tell us which of the two (oral or rectal) should be adjusted, but more importantly, whether it matters which one is adjusted, if one is only interested in the question whether the body temperature of this large group of people has decreased, stayed the same, or increased.

    Over to you, Angech!

  73. Willard says:

    You just do not get it, Marco.

    David Rose got his baselines wrong, but any baseline is arbitrary, it’s adjustments all the way down, there’s no measurement unless it’s absolutely absolute.

    Science is a sham. Teh Donald will make absolutely absolute measurements great again. No need to get agitated over all this.

    The only good news is that Chief is back. Don Don is too, but I had hoped he’d replace teh Sean. Sad.

  74. Ron Graf says:

    Angech: Fact check, none of the instruments measure anomalies in the sense you have just described. They measure real temperatures. Anomalies are something added afterwards and can be anything you like because as you said you can choose whichever baseline you like.

    Although not easy to compose, Angech’s argument is well understood by skeptics. So I will try to explain it for non-skeptics. Once one is dealing with anomalies one is already removed from the “raw” data. There has already been processing that contains assumptions. As unintuitive as it may seem, one can calculate anomalies several different ways and come up with several different results, all being valid. This is why it is not raw data. Clive Best just did a post (Temperature Data Averaged Three Ways) demonstrating a similar issue with three equally valid methods of determining GMST. He showed HadCRUT chose the warmest of the three choices.

    In SST the measurements are by different instruments and, like Marco’s body temperature readings, have biases that need to be corrected. Confounding things more, the density of those measurements changes over time and location. Confounding more, the technology and measurement protocols evolve over a 130-year period, going back to the age of sail and use of wooden buckets.

    What makes skeptics so concerned is the lack of concern by non-skeptics in correcting these biases. Huang15 sweeps the whole issue under the rug and effectively replaces ship measurements in the far past with night marine air temps, which have their own biases and a possibly non-linear relationship to SST. And for the near past they simply weight them out of significance. Marcia McNutt, the president of the NAS, calls Zeke’s confirmation of K15 (really Huang15) “the platinum test”. Although using the infrared satellite data is a good independent database, so is the MSU satellite database (but that was not used). My platinum test would be to do in situ testing of all the historical instrumentation, do reconstruction of the metadata, and see if the separate series behave according to prediction over time. Then I would do the same for the NMAT. Then I would compare the Tmin (night time) SST to NMAT and expect a confirmatory correlation.

    If this is not logical please let me know. If it is logical then let me know a good reason why it is too expensive versus a multi-trillion-dollar policy implication.

    Zeke, your help in supplying my request for data and code should not be taken as a slight. The platinum test is when skeptics confirm your work.

    BTW, Marco, Marcia McNutt was the chief editor of Science before President Obama appointed her to head the NAS. McNutt has been quoted saying the “climate debate is over” and such. I’m sure there would be no way for the White House to influence her or the publishing of K15, (but only if you are non-skeptical).

  75. JCH says:

    Butt, a thumb on the bulb means more income.

  76. JCH says:

    Zeke, your help in supplying my request for data and code should not be taken as a slight. The platinum test is when skeptics confirm your work.

    Zeke is a skeptic; he confirmed K-15, in part, using suggestions made by Professor E Curry.

  77. verytallguy says:

    request for data and code

    Chant repeatedly, glassy eyed.

  78. Marco says:

    “What makes skeptics so concerned is the lack of concern by non-skeptics in correcting these biases.”

    Thank you for your concerns, Ron (ha! I beat Willard)
    Personally, I am not concerned, because different groups work on the same topic. The concerns of “skeptics” are in my opinion more the “concerns” of pseudoskeptics: the constant notion that something nefarious *must* be going on, and those people just cannot know what they are doing.

    “BTW, Marco, Marcia McNutt was the chief editor of Science before President Obama appointed her to head the NAS. McNutt has been quoted saying the “climate debate is over” and such. I’m sure there would be no way for the White House to influence her or the publishing of K15, (but only if you are non-skeptical).”

    Ron, you might want to consider that you are apparently so “skeptical” that you start making up your own stories that have no basis in fact. Like McNutt being appointed to head the NAS by Obama.

  79. Clive Best says:

    This subject was already discussed in 2011; see Kennedy, J.J., Rayner, N.A., Smith, R.O., Saunby, M. and Parker, D.E. (2011c), “Reassessing biases and other uncertainties in sea-surface temperature observations since 1850, part 2: biases and homogenisation”, J. Geophys. Res., 116.

    The paper includes a percentage plot of measurement technique with time and includes a discussion of the effect of drifting buoys on the normalisation period.

    The uncertainty range in the 2000s is wider than in the climatology period, despite the fact that there is a greater number of more reliable drifting buoy observations in the modern period. This is due, in part, to the step of setting the average bias adjustment to zero over the 1961-1990 period. The anomaly associated with a drifting buoy observation is therefore equal to the accurately measured buoy SST minus a more uncertain climatological value of the SST at that point.

  80. paulski0 says:

    Ron Graf,

    Although using the infrared satellite data is a good independent database

    And the Argo datasets. And, yes, the buoy-only dataset. “Skeptics” are trying to minimize Hausfather et al. 2017 by saying the buoy-only dataset is not independent due to the same buoy data making up a large percentage of observations in ERSSTv4. While that is true, it misses the point that the whole argument brought against ERSSTv4 and Karl et al. 2015 was that the larger recent trend was caused by incorrect adjustments. The buoy-only dataset is independent in this context because it contains none of the adjustments “skeptics” said were wrong.

    …so is the MSU satellite database (but that was not used).

    Definitely not. Doesn’t measure SSTs.

    My platinum test would be to do in situ testing of all the historical instrumentation, do reconstruction of the meta-data, and see if the behavior of the separate series behave according to prediction over time.

    This seems like changing the subject slightly. McNutt’s “platinum test” clearly refers to recent trends. Presumably you’re referring to a test of corrections over the full record. To some extent what you’re describing is the work put in to HadSST3 corrections. I’m not sure I fully grasp the full extent of your proposal. Are you suggesting we actually fully recreate the thousands of voyages which contributed to the historical SST record over the past 150 years?

  81. > This seems like changing the subject slightly.

    After the platinum test should come the Osmium test, the Iridium test, the Palladium test, the Rhodium test, the Ruthenium test, the Tellurium test, and the Rhenium test.

    And that’s just the metal tests.

    Auditing tests never end.

  82. Frank says:

    Frank wrote: “Of course, Karl15 (and the IPCC) missed the most dramatic hiatus. Even with ERSST4, the warming rate from 1/2002 to 1/2012 is 0.00 °C dec−1 (and slightly negative if one cherry-picks the optimum starting and finishing month).”

    Victor replied: “Good that you know Cherry Picking exists, it does not apply only to selecting specific months, but also years. It is easy to find short periods (10 years is really short) with a clearly lower trend than the actual long-term trend.”
    http://variable-variability.blogspot.com/2017/01/cherry-picking-short-term-trends.html

    Frank replies: The 14-year period in Karl15 is also too short. If one believes that a linear AR1 model is appropriate for characterizing warming, the difference between ERSST3 and ERSST4 isn’t statistically significant. So one has to wonder why Karl15 was worthy of being published in Science. The only thing I pay attention to is the 40+ year linear AR1 trend since the mid-1970s, or since the beginning of the satellite era. That trend gets a little smaller if you go back into the 1960s and 1950s due to another hiatus, but CO2 was only increasing at a rate of 1 ppm/yr in the 1960s (vs 2 ppm/yr recently). The mid-1970s is a reasonable compromise, and the confidence interval for that trend is only about +/-0.02 K/decade. For a 20-year period, it is about +/-0.06 K/decade. So maybe one can say that the trend for a particular 20-year period is significantly different from the 40-year trend – except that it isn’t. Currently the trend for the 19 years since the 1997/8 El Nino is the same as for the 19 years before it, and for the whole period.

    Rather than assuming a linear AR1 model, it makes sense to use the output from AOGCMs as a physical (rather than statistical) model for warming. Before Karl15, there was strong evidence that the AOGCMs disagreed with observations, since a 15-year period with no net warming was seen very infrequently in model output. So one could conclude with reasonable confidence that model ECS was too high OR that models didn’t produce enough unforced variability. The change from ERSST3 to ERSST4 reduced the length of the period with no net warming to only 10 years, which is still a “very unlikely” result in IPCC terminology.
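
    As an illustration of the kind of trend test being discussed here, below is a minimal sketch in Python of a trend confidence interval with an AR1 correction via the effective-sample-size approach (as in, e.g., Santer et al. 2008); the series, trend and noise parameters are all made up for illustration, not taken from any actual dataset:

        import numpy as np

        def ar1_trend_ci(y, dt=1.0 / 12):
            # OLS trend with an AR(1)-corrected ~95% confidence interval,
            # using the effective sample size n_eff = n * (1 - r1) / (1 + r1).
            n = len(y)
            t = np.arange(n) * dt                         # time in years
            slope, intercept = np.polyfit(t, y, 1)
            resid = y - (slope * t + intercept)
            r1 = np.corrcoef(resid[:-1], resid[1:])[0, 1] # lag-1 autocorrelation
            n_eff = n * (1 - r1) / (1 + r1)
            se = np.sqrt(np.sum(resid ** 2) / (n_eff - 2) / np.sum((t - t.mean()) ** 2))
            return slope, 1.96 * se

        # Synthetic monthly anomalies: 0.17 C/decade trend plus AR(1) noise.
        rng = np.random.default_rng(0)
        n = 480                                           # 40 years of months
        noise = np.zeros(n)
        for i in range(1, n):
            noise[i] = 0.6 * noise[i - 1] + rng.normal(0, 0.1)
        y = 0.017 * np.arange(n) / 12 + noise
        trend, ci = ar1_trend_ci(y)
        print(f"trend = {trend * 10:.3f} +/- {ci * 10:.3f} C/decade")

    The autocorrelation correction widens the interval relative to the naive OLS error, which is why short periods say so little.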

  83. > Although not easy to compose, Angech’s argument is well understood by [contrarians].

    A Carbon test on that overall understanding by Ron of the contrarian community would be nice.

    My own understanding is that Doc’s argument is invalid – it’s a tu quoque (“what Rose does applies to everyone”) based on a misunderstanding of measurement theory (“absolute temperature”) that commits an invalid inference (“conventions are arbitrary”).

    A high number of sleights of hand in so few words indicates an above average ClimateBall efficiency.

  84. Ron and Frank,

    Sorry for the delay in responding; been fairly swamped with other work lately.
    Our code can be found here: http://www-users.york.ac.uk/~kdc3/papers/ihsst2016/methods.html
    Separate data for ships and buoys can be obtained via ICOADS: http://icoads.noaa.gov/products.html
    You can see ERSSTv3 and v4 compared to buoy-only and satellite-only records here (v5 isn’t available yet): http://www-users.york.ac.uk/~kdc3/papers/ihsst2016/background.html

    Unfortunately I don’t know if Huang’s data is online. Regarding MSU data over oceans, it’s not really a useful comparison for a short period like 1997-2015 due to the large difference in tropospheric amplification of El Nino events compared to surface records. Satellite radiometer data, on the other hand, measures effectively the same thing as ships/buoys and matches in-situ buoy records quite well.

  85. Also, differences between v3 and v4 are significant over the post-1997 period even if the trends of each alone may not be. The reason is that most of the short-term variability is shared between the two and is removed in the difference series, which does show strongly significant trends, as we discuss in our paper.
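
    To see why the difference series is so much more informative, here is a minimal sketch with synthetic data (the trends, the shared “ENSO-like” variability and the independent noise below are all invented for illustration): two records that share most of their short-term variability have individually fuzzy trends, but their difference has a tightly constrained one.

        import numpy as np

        def trend_and_se(t, y):
            # OLS trend and its naive standard error (no autocorrelation correction).
            slope, intercept = np.polyfit(t, y, 1)
            resid = y - (slope * t + intercept)
            se = np.sqrt(np.sum(resid ** 2) / (len(y) - 2) / np.sum((t - t.mean()) ** 2))
            return slope, se

        rng = np.random.default_rng(1)
        n = 216                                  # 18 years of months, ~1997-2015
        t = np.arange(n) / 12

        # Shared interannual variability plus small independent noise per record.
        shared = np.convolve(rng.normal(0, 0.15, n), np.ones(12) / 12, mode="same")
        v3 = 0.007 * t + shared + rng.normal(0, 0.02, n)   # lower-trend record
        v4 = 0.012 * t + shared + rng.normal(0, 0.02, n)   # higher-trend record

        for name, y in [("v3", v3), ("v4", v4), ("v4 - v3", v4 - v3)]:
            slope, se = trend_and_se(t, y)
            print(f"{name}: {slope * 10:.3f} +/- {2 * se * 10:.3f} C/decade")

    The shared variability cancels exactly in the difference, so its trend uncertainty collapses even though each record’s own trend is poorly constrained.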

  86. Frank, I agree that the idea that global warming has stopped or paused is ludicrous. It is bad statistics and completely ignores the large influence of cherry picking, as the article I linked to before demonstrated. That ERSSTv4 makes this claim even more ludicrous is thus mildly interesting; had Science asked me, I would have advised a lesser journal. I would have advised the same for (nearly) any “hiatus” paper in Science and Nature. They are mostly interesting articles on natural variability that should go to good climatological journals.

    Let’s not confuse the question whether there was a decrease in the rate of warming with the question whether models and observations fit together. The CMIP ensembles are not made for decadal prediction, but for long-term warming. For short term fluctuations, the uncertainty is about twice as large as the model spread.
    http://variable-variability.blogspot.com/2015/09/model-spread-is-not-uncertainty-nwp.html

    Thus I do not see any problem with these comparisons. All the more so because if models and observations were never near the bounds of their confidence intervals, that would mainly show that the confidence interval is too wide.

    When models and observations do not fit, the models can have a problem, the observations can have a problem or the comparison can have a problem. Or all three. Do not jump to conclusions and immediately blame tze modelz.
    http://variable-variability.blogspot.com/2015/09/models-and-observations.html

  87. angech says:

    “Tony Banton wrote this, relevant to the discussion here, but at JC:
    As I’ve tried to explain (along with a few others) to angech. Given that the two sets of instruments are measuring the same thing, then the trend is the same, yes?”
    Two wrong comments in one sentence, pushing hard to make up a fiction.

    First, the two sets of instruments are both measuring temperature [not anomalies]. They are however not measuring the same thing.
    One is measuring sea water temperature collected by ships, from different levels, heated and cooled by various other inputs on the way.
    The other is measuring temperatures in sea water at a set level, with hopefully the same sort of thermometer, without ship and human interference.
    -Second, “then the trend is the same, yes?” No.
    One is said to measure, on average, 0.12 C lower than the other.
    That does not tell you the trend of the two types at all.
    That would be an average of each trend over the same time period for the same number of ships and buoys. It says nothing about the actual trend of each type over that time period.
    One could be double the trend of the other and one could still say the average trend is 0.12 lower.
    The trends are said to be similar. You need to specify the time intervals for comparison and it is obvious that two such disparate systems should rarely be in synchronicity as to trends.
    There is no common period where one can truly compare trends. Buoys go from 0 to 7/8 of measuring system type used over 20 years. There are no true trends to compare.
    That is not to say the mathematicians like Zeke cannot do a serial breakdown of the ship and Buoy temperatures over the time period of common use but varying number.
    I do not see it?
    Could he put it up?
    Would be grateful to see the true trends and their correspondence.

  88. Pingback: One more Daily Mail update - Skeptical Science

  89. Willard says:

    > Would be grateful to see the true trends

    It might be nice for you to tell what would be a trend you’d consider true, Doc.

    Would also be grateful to know what you mean when you say that the two trends (which I suppose are untrue to you) are not the same, or indeed why you find it obvious that the two systems can’t have similar trends.

    This is what happens when you use words like “true,” “same,” and “similar.”

    From a primer on anomalies:

    When calculating an average of absolute temperatures, things like station location or elevation will have an effect on the data (ex. higher elevations tend to be cooler than lower elevations and urban areas tend to be warmer than rural areas). However, when looking at anomalies, those factors are less critical. For example, a summer month over an area may be cooler than average, both at a mountain top and in a nearby valley, but the absolute temperatures will be quite different at the two locations.

    https://www.ncdc.noaa.gov/monitoring-references/dyk/anomalies-vs-temperature

    After all these years, Doc.

    Sad.

  90. Marco says:

    “There is no common period where one can truly compare trends. Buoys go from 0 to 7/8 of measuring system type used over 20 years. There are no true trends to compare.”

    Yes there are. The figures are also right there, in the blog post itself.

    Seriously.

  91. angech,

    One is measuring sea water temperature collected by ships,from different levels and heated and cooled by various other inputs on the way.
    The other is measuring temperatures in sea water at a set level with hopefully the same sort of thermometer without ship and human interference.

    Yes, but that is essentially the point. You want to try to use as much of the data as possible. However, you can’t simply combine the data, because they’re not measuring exactly the same thing. Therefore, you have to apply a correction that compensates for this difference.

  92. angech says:

    Anomalies vs. Temperature
    “tell what would be a trend you’d consider true.”
    Simple,
    There would be a trend for the buoys. It would start off awkward with only one buoy, and as more are added one would have to merge [sigh] the data sets, and this would give a buoy-only trend.
    One would already have a full ship-only set of data, which again would have to merge the ships as they begin to decrease in number.
    Zeke has both of these.
    Then to compare trends in general you could have the ship only trend and the buoy only trend.
    Obviously, due to the much longer ship data length, the CO2 warming increase and natural variability, these two trends will not match.
    Next you could take the period where ship and buoy data are both available and truly compare their anomaly trends on the same baseline.
    This also allows you to compare the difference in real temperature between buoys and ships.
    This is the average temp over the time period of each data set; Zeke quotes 0.12 but does not give the period this must be quoted for.
    It really should be for 30 years, but it might be extrapolated out over 20 years of data, I guess.
    When I say “must be quoted for” I mean you cannot pick a point in time at the start of the change and say buoy temps immediately dropped 0.12 C below the ships. It had to be worked out over time.
    Here is the nub of the problem,
    What I would hope to see is quite variable data with an overall match in trend*.
    I would expect reasonably marked differences in the data from the two different ways of measuring and the various improvement/changes in ship measuring.
    If we found an exact match I would hope everyone knows that is basically impossible. If we found a highly correlated match we should be extremely suspicious mathematically. If we found quite variable data with an overall match in trend, this would be very reassuring that the science is being done correctly.
    The trends may be quite different because they are over a fixed time period but the difference in temps is simply the average difference over this time.
    Again, like the pause [where one can always find a pause] but in reverse: one can always find a matching trend in overlapping trends if they overlap twice while passing and you use those two points.

  93. angech says:

    Further,
    “why you find it obvious that the two systems can’t have similar trends.”
    Not what I said.
    I did not say they could not have similar trends*, in fact I would expect similar trends. I said you cannot extrapolate, as Tony did, saying “Given that the two sets of instruments are measuring the same thing, then the trend is the same, yes?”
    They are not measuring the same thing, they are measuring different things, hence the trends can not be the same.
    “why you find it obvious that the two systems can’t have exactly the same trend.”
    Statistics, Taleb and common sense.

  94. Pingback: Tony Banton | asoliduniverse

  95. Clive Best says:

    OK here is the problem.

    HadSST3 uses a normalisation period 1961-1990 during the last few years of which there were <10% of drifting buoys in use. There is a net bias of about -0.1C in measurements of buoys compared to Ship measurements (ERI). So the net bias in the HADSST3 climatology is 30% of drifting buoys. This introduces a negative bias on the climatology (baseline) of >-0.03C. By now correcting for this bias after 2000 caused by the increasing fraction of buoys to ships reaching 80% in 2016, they overestimate the warming trend since 2000.

    HadSST3 on the other hand does correct for the buoy/ERI bias after 1990 (AFAIK) and thereby avoids the extra negative offset, because its normalisation period is free from the problem.

  96. Clive Best says:

    Sorry I’ll try that again – it got snipped.

    OK here is the problem.

    HadSST3 uses a normalisation period 1961-1990, during the last few years of which there were <10% of drifting buoys in use. There is a net bias of about -0.1C in measurements of buoys compared to ship measurements (ERI), so their climatology is biased by -0.03C. By now correcting for this bias after 2000, caused by the continuing increase in the fraction of buoys to ships, reaching ~80% by 2016, they overestimate the warming trend since 2000 by the extra -0.03C offset in the baseline.

    HadSST3 on the other hand does correct for the buoy/ERI bias after 1990 (AFAIK) and thereby avoids the extra negative offset, because its normalisation period is free from the problem.

  97. Clive,
    I don’t think what you’re suggesting is correct. As I think Zeke has already said, you can fix that by simply shifting the baseline to adjust for that offset.

    As I understand it, the baseline is an average over some time period for each location. You can then determine the anomalies for the ships and buoys at each location by taking the difference between the measurements and the baseline. If there is an offset between ships and buoys, you can then adjust one of the sets of anomalies to correct for this offset. If your baseline was determined using both ships and buoys, then that would slightly modify your baseline (which is equivalent to it being relative to a different climatological period). This, however, can also be resolved by simply adjusting your anomalies so that the average over your baseline period is still 0.
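
    A minimal numerical sketch of that last point (all numbers invented: a single location, monthly data, and a 0.12 C ship bias assumed known): whether you adjust the buoys up or the ships down, the re-baselined anomalies come out identical.

        import numpy as np

        rng = np.random.default_rng(2)
        months = 240
        true_sst = 0.01 * np.arange(months) / 12 + rng.normal(0, 0.1, months)
        ships = true_sst + 0.12              # ships read ~0.12 C warm
        buoys = true_sst.copy()              # buoys taken as unbiased

        def anomalies(series, base=slice(0, 120)):
            # Anomalies relative to the mean over the baseline period,
            # so the average over that period is 0 by construction.
            return series - series[base].mean()

        buoys_up = anomalies((ships + (buoys + 0.12)) / 2)    # adjust buoys up
        ships_down = anomalies(((ships - 0.12) + buoys) / 2)  # adjust ships down
        print(np.allclose(buoys_up, ships_down))              # True

    The two combined records differ only by a constant, and re-zeroing over the baseline period removes a constant, so the anomalies match exactly.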

  98. Marco says:

    “If we found an exact match I would hope everyone knows that is basically impossible.”

    What exactly would you consider “an exact match”? You might have noted (probably not, though; I doubt you read the paper – call me skeptical) that Hausfather et al shows “an exact match” (as exact as things usually come in these kinds of studies) between satellites and buoys and ERSSTv4 in terms of variability and the trend.

  99. Nick Stokes says:

    Clive,
    I didn’t understand where the 30% came from, or the .03 change to the 1961-90 base. But there is no issue about the anomaly base. Once you have made the 0.12 correction to buoys, they become virtual ships. Or if ships are corrected, they become virtual buoys. You have corrected the bias as best possible. When you then compute the base for a cell, over whatever period and with whatever mix, you are using a consistent type.

  100. Clive Best says:

    Nick,
    For some reason my text gets cut every time. ERSST4 uses 1970-2000, which contains 30% buoys. So their normalisation is biased by -0.03C. Did they correct this? No they didn’t – but they did correct the trend caused by the increase to 80% by 2016. HADSST3 does not have this problem.

  101. Did they correct this? No they didn’t

    How do you know this? As far as I’m aware (and as Nick suggests) they did correct for this. I don’t know if they did this before determining the baseline values, or did an extra correction after determining the anomalies and correcting for the ship – buoy bias, but it would seem obvious that they would not forget to make sure that their measurements are relative to the correct climatology.

    but they did correct the trend caused by the increase to 80% by 2016. HADSST3 does not have this problem.

    What do you mean by “they did correct the trend”? All they did was shift one set of measurements so as to correct for a bias; a bias that has been known about for about 10 years. This, of course, changes the trend, but since the bias is known, the correction would seem to be the correct thing to do.

  102. Nick Stokes says:

    Clive,
    ” Did they correct this? No they didn’t”
    I’m sure they did. It’s hard not to. SST anomalies are calculated by cell, and calculation of normals is just part of the process.

    But if they didn’t, it still wouldn’t matter, because they are using a fixed base time interval. So if there is a base error in a cell, it’s the same error for all time. It will cause spatial effects but won’t change trends.

  103. I downloaded the ocean and land/ocean data from NOAA. As far as I can tell the period from mid-1970 to mid-2000 has an average of 0. So, not quite 1971-2001, but they do seem to be both relative to the same climatology.

  104. Willard says:

    > Simple.

    Then why aren’t you *showing* any trend you’d consider *true* instead of repeating yet again something you still can’t formulate in an argument?

    Throwing words seldom undermines formal relationships.

    ***

    > Not what I said. I did not say they could not have similar trends*, in fact I would expect similar trends.

    Here’s what you said, Doc:

    The trends are said to be similar. You need to specify the time intervals for comparison and it is obvious that two such disparate systems should rarely be in synchronicity as to trends.

    Not that this matters much to the *questions* I asked. What is a similar trend to you, Doc?

    Show me.

  105. Willard says:

    > They are not measuring the same thing, they are measuring different things, hence the trends can not be the same.

    If you really believe two different time series cannot have the same trend, then the word “trend” may not mean what you think it means.

    Your only way out is a very strict notion of sameness.

    Hence my previous question.

  106. Clive Best says:

    “The ERSSTv3 data are baselined on the period 1997-2015, requiring data from at least 10 years for each grid cell and month of the year. The remaining cells are set to missing.”
    – Why on earth would you choose yet another normalisation period? ERSST is documented as having a baseline of 1971-2000. Why change it? How can you then compare results on different baselines? One-step offsets? If so, please list the offsets you used.

    “The buoy record is infilled by kriging using the method of Cowtan and Way (2014).”
    – So it is simply smoothed out into cells where there are no buoys – My God!!

    How did you normalise the buoys-only signal? Did you form a buoys-only baseline or did you simply use ERSST4?

    What initially sounds like a nice clean independent study is looking more like a Kevin Cowtan Kluge to me.

  107. What initially sounds like a nice clean independent study is looking more like a Kevin Cowtan Kluge to me.

    Really? Jeez, you’re a piece of work.

  108. Clive,
    Oh, you’re also talking about Zeke’s paper? What else were they meant to do? They don’t have buoy data before 1997. Therefore they can only baseline over a period starting in 1997, at the earliest. Please tell me you’re not one of those who goes around criticising things for not doing something that was impossible to actually do?

  109. Magma says:

    Cowtan provides an excellent example of valuable quantitative contributions to climate science made by a highly competent researcher coming from an area of specialization (protein structure) well outside the norm.

    I’d ask what Best’s issue with Cowtan’s work is, if I cared. (I don’t.)

  110. verytallguy says:

    Please tell me you’re not one of those who goes around criticising things for not doing something that was impossible to actually do?

    The nirvana fallacy is a name given to the informal fallacy of comparing actual things with unrealistic, idealized alternatives

    …and is perhaps the single most common fallacy of the “sceptic” movement.

    https://en.wikipedia.org/wiki/Nirvana_fallacy

  111. vtg,
    I’m going to have to use that one more. It does seem very apt.

  112. Magma says:

    @vtg

    Perhaps that’s why skeptics like to view themselves as a mixture of Galileo, Feynman and Einstein.

    (Except taller and better-looking.)

  113. Willard says:

    > I’m going to have to use that one more.

    You just did – look back at the XKCD cartoon.

  114. Willard,
    I interpret them slightly differently, but maybe you have a point.

  115. Clive Best says:

    My basic argument is that the ship/buoys trend bias is a function of the ratio of Buoy measurements to Ship Measurements changing with time. If there was a fixed ratio of buoys/ships the trend would be unaffected. It is NOT the case that ERSST3 simply made a (Buoys+Ships)/2 average as implied by Zeke’s graphs above. The trend bias is caused because there are more buoy measurements at the end of the time period, pulling that average down slightly, relative to the start time-frame in 1997. Buoys only or ships only would both give the same trend – just offset.

    The normalisation introduces an extra uncertainty. Cowtan used ERSST3 for normalising 1997-2015. He then matched the smoothed (or kriged) buoy data relative to that normal, rather than calculating a buoys-only normal baseline. Perhaps there aren’t enough data. However, the two sets have different trends, thereby introducing an error in the normalisation baseline.

    Having once downloaded Kevin’s Python scripts and checked his results for the TAS/SST blending, I agree the guy is smart! I couldn’t find any errors! I just wonder how many times he gets the ‘wrong’ answer which then gets shelved. 😉

  116. > My basic argument is that the ship/buoys trend bias is a function of the ratio of Buoy measurements to Ship Measurements changing with time.

    Have you ever tried to test that hypothesis, Clive?

  117. Clive Best says:

    I don’t need to, Willard. It is obvious because there is a fixed offset (like a baseline) between A and B of 0.12, and if A and B don’t change with time, neither does any derived trend.

  118. Joshua says:

    VTG –

    =={ …and is perhaps the single most common fallacy of the “skeptic” movement }==

    I don’t know if there is any “official” designation as a fallacy, but in my observation the most common form of fallacious thinking in “skeptics” as a group (and perhaps non-“skeptics” as well) is apophenia.

  119. Clive,

    My basic argument is that the ship/buoys trend bias is a function of the ratio of Buoy measurements to Ship Measurements changing with time.

    Well, yes, I think this is true, and I think it is essentially the point. If you have a change in the ratio of buoys to ships, then you end up with a trend bias if there is an offset between the buoy and ship measurements. This is the reason they have to make the adjustment.

    If there was a fixed ratio of buoys/ships the trend would be unaffected.

    Yes, but there isn’t.

    It is NOT the case that ERSST3 simply made a (Buoys+Ships)/2 average as implied by Zeke’s graphs above.

    No, I don’t think this is what is implied in the post. I think that was just intended to be illustrative.

    The trend bias is caused because there are more buoy measurements at the end of the time period, pulling that average down slightly, relative to the start time-frame in 1997. Buoys only or ships only would both give the same trend – just offset.

    Yes, I think this is exactly the point. It’s why you need to make the adjustment.
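
    That point is easy to demonstrate numerically. A minimal sketch (all numbers invented: a 0.1 C/decade “true” warming, a fixed 0.12 C ship bias, and a buoy share growing linearly from 5% to 85% over 20 years):

        import numpy as np

        months = 240
        t = np.arange(months) / 12
        true_anom = 0.01 * t                          # 0.1 C/decade "true" warming
        frac_buoy = np.linspace(0.05, 0.85, months)   # buoy share of observations grows

        ships = true_anom + 0.12                      # ships biased warm by a fixed offset
        buoys = true_anom
        naive = frac_buoy * buoys + (1 - frac_buoy) * ships
        corrected = frac_buoy * (buoys + 0.12) + (1 - frac_buoy) * ships

        print(np.polyfit(t, naive, 1)[0] * 10)        # ~0.05 C/decade: biased low
        print(np.polyfit(t, corrected, 1)[0] * 10)    # 0.10 C/decade: recovered

    A fixed mix, or a changing mix with the offset corrected either way, leaves the 0.1 C/decade trend intact; only the changing uncorrected mix biases it.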

    I don’t think the argument you’re making is your argument; it is the argument (unless I’m missing something).

    I think your comment about Cowtan is wrong, but I have to get the dinner out of the oven.

  120. Clive,

    Cowtan used ERSST3 for normalising 1997-2015. He then matched the smoothed (or kriged) buoy data relative to that normal, rather than calculating a buoys-only normal baseline. Perhaps there aren’t enough data. However, the two sets have different trends, thereby introducing an error in the normalisation baseline.

    I don’t think this is correct, or really matters. The baseline value is a constant, which is then used to determine the anomalies. If you have a bunch of datasets that are meant to be measuring the same thing, then aligning them in some way is a perfectly reasonable way to compare them. The key point in Hausfather et al. is that ERSSTv4 matches the buoy/Argo/satellite measurements better than ERSSTv3b. Hence the adjustments made between the buoys and ships seem reasonable.

    Having once downloaded Kevin’s Python scripts and checked his results for the TAS/SST blending, I agree the guy is smart! I couldn’t find any errors! I just wonder how many times he gets the ‘wrong’ answer which then gets shelved. 😉

    Indeed, but quite why you felt the need to say this is beyond me.

  121. > I don’t need to Willard.

    Then I fail to see how it can be a scientific argument, Clive.

  122. > I just wonder how many times he gets the ‘wrong’ answer which then gets shelved.

    I’m sure you wonder the same about the auditing sciences’ deliverables 😀

    Somehow auditors will need to distinguish double blinds and double binds 😉

  123. Frank says:

    Thanks for the reply, Victor. I don’t want to start a big debate with you, but there could be some value in an alternative point of view.

    Victor wrote: “When models and observations do not fit, the models can have a problem, the observations can have a problem or the comparison can have a problem. Or all three. Do not jump to conclusions and immediately blame the models.”
    http://variable-variability.blogspot.com/2015/09/models-and-observations.html

    Frank replies: We are supposed to be scientists. Comparing hypotheses with observations is how we determine what is true about the world. Each model (with a given set of parameters) is a different hypothesis. As Box said, every model is wrong, but we need to determine which model is useful (before we find out at the end of the 21st century). With those principles established, I’m willing to admit the possibility that error or uncertainty may cause us to fail to get the right answer when we are doing our job as scientists, conducting experiments. The change from ERSST3 to ERSST4 may be looked at as an experiment to see if there was a better way to integrate historical SST data, and ERSST3 could be considered a flawed experiment.

    Remarks like this make me feel that climate scientists think model output is real and that observations (which do have many limitations and problems) are a much less useful version of that reality. That is an unproven hypothesis at best, and a massive source of confirmation bias at worst. The use of homogenization algorithms may illustrate this difference. You may view the corrected output as “real observations”, but I look at each undocumented breakpoint correction as an untested hypothesis about what happened at a particular time and place – and part of the uncertainty in the overall trend. If a breakpoint were caused by the elimination of a gradually growing bias – say a station is moved from an urbanizing site to a nearby park – correcting the resulting breakpoint biases the trend. Unfortunately, it is far easier to detect isolated breakpoints than gradually growing biases.

    Victor: “Let’s not confuse the question whether there was a decrease in the rate of warming with the question whether models and observations fit together. The CMIP ensembles are not made for decadal prediction, but for long-term warming. For short term fluctuations, the uncertainty is about twice as large as the model spread.”
    http://variable-variability.blogspot.com/2015/09/model-spread-is-not-uncertainty-nwp.html

    Frank replies: There is no good reason, other than simplicity and tradition, to analyze warming trends using a linear AR1 model. AFAIK, nothing tells us that warming should be perfectly linear with time or that the noise is best fit by an AR1 model. There is fundamental physics underlying climate models, so their output provides an alternative to a linear AR1 model. The 20th century fits a linear AR1 model very poorly, but fits climate model output much better.

  124. Clive Best says:

    The normalisation period does matter. If your baseline is calculated using different measurements with a lower trend, then your results will have a higher trend during that same normalisation period. Nor does moving them up or down change that.

  125. Clive,

    The normalisation period does matter. If your baseline is calculated using different measurements with a lower trend, then your results will have a higher trend during that same normalisation period. Nor does moving them up or down change that.

    Yes, but so what? The baseline value is simply a number. All it does is move the data up or down. I’m also not sure that your claim is actually correct. How do you know that Hausfather et al. uses ERSSTv3 data to produce the 1997-2015 baseline? If you look at the caption to Figure 1, it says

    ERSSTv4 is shown as a broad band for visualization purposes; this band does not represent an uncertainty range. The series are aligned on the 1997–2001 period for comparison purposes.

  126. Actually, the paper says

    The spatially complete ERSSTv3b record was therefore aligned to 0 during the 1997–2015 period, and then the other data sets are aligned to the normalized ERSSTv3b map series. This method is a conservative choice in attempting to detect a bias in the ERSSTv3b record, as it may bias the compared series slightly toward it.

  127. Clive Best says:

    Check out http://www-users.york.ac.uk/~kdc3/papers/ihsst2016/series.html Kevin did all the calculations as far as I can see.

    No, it is not just a number. It is a 30-year average of the 12 monthly seasonal variations at each 1-degree grid point.

  128. No, it is not just a number. It is a 30-year average of the 12 monthly seasonal variations at each 1-degree grid point.

    Yes, but each baseline value is just a number from which one can compute an anomaly.

  129. Frank: “Remarks like this make me feel that climate scientists think model output is real and that observations (which do have many limitations and problems) are a much less useful version of that reality. That is an unproven hypothesis at best, and a massive source of confirmation bias at worst. The use of homogenization algorithms may illustrate this difference. You may view the corrected output as ‘real observations’.”

    How does me saying that the model could be wrong make you think that climate scientists think model output is real? How does me saying that the estimated warming from observations could be wrong make you think that climate scientists think this observational estimate is real?

    Maybe you prefer not to admit it, but it sounds as if you agree with me that both models and observations can be wrong. (And we should really not forget that the comparison method can also have problems, such as comparing SST and marine air temperature, taking into account where we have measurements, etc.)

    When one sees a discrepancy, one studies it, and that provides the evidence of what was wrong. The discrepancy only shows that attention is needed; it is not automatically the curve showing the smallest climatic changes that is right. (And also without a (statistically significant) discrepancy, scientists keep trying to improve our understanding of the climate system.)

  130. > That is an unproven hypothesis at best […]

    I too prefer my hypotheses proven.

  131. Frank says:

    Victor: We may have differing perspectives. My first priority is to test a hypothesis or model. Careful examination of the experiment or data analysis used to test the hypothesis is step 2. You appear to be starting with the assumption that the model/hypothesis and the data are equally likely to be wrong, and that any resolution of a discrepancy is a satisfying answer because the model or hypothesis survives. I’ll apologize right now if I’ve misstated your position in any way. I find this difference in perspective hard to describe and would appreciate being corrected.

    When the putative “hot-spot” in the upper tropical troposphere first became an issue, dozens of researchers leapt to find biases in the radiosonde data. I don’t know of any efforts to re-parameterize or refine AOGCMs to see if the laws of physics lying behind these models always produce a hot-spot. (We know that the convective storms in this location are difficult to model.) Eventually radiosonde data sets were homogenized (or cherry-picked – it is hard to tell from the outside) and the discrepancy was minimized after about a decade of effort. One paper claimed the whole problem can be eliminated by taking wind speed in the upper troposphere into account. By then, however, the RSS and UAH records were long enough that they demonstrated the absence of a hot spot too. Now they are being closely scrutinized also. Hopefully I’m not naive about this subject. Tracking changes in the upper tropical troposphere over decades is challenging. There is a moist adiabatic lapse rate. There is more warming in the upper troposphere than the surface during strong El Ninos (including 15/16?) and more cooling during strong La Ninas (but these phenomena are poor models for GW). Someday the discrepancy may be resolved in favor of the models, but there should be more emphasis on testing the models and less on testing the data.

    This paper tests models – while properly dealing with the uncertainty in observations. Do models do a good job of reproducing feedbacks in response to seasonal warming and cooling? Is your first thought that the observations could be wrong and that models could be right about climate sensitivity despite these problems? Or did you notice that the models disagree with each other and that some of them are seriously wrong?

    Click to access 7568.full.pdf

  132. izen says:

    @-Frank
    “By then, however, the RSS and UAH records were long enough that they demonstrated the absence of a hot spot too. … There is more warming in the upper troposphere than the surface during strong El Ninos (including 15/16???) and more cooling during strong La Ninas”

    As predicted by the models. That there is a hot-spot, or tropospheric amplification, is a result of accurate physical modelling and observations. Because the magnitude of variation is much greater in the troposphere, the short satellite record is unable to demonstrate the absence of AGW-driven warming in the hot-spot, as it is buried in the ‘noise’.

    @-”Is your first thought that the observations could be wrong and that models could be right about climate sensitivity despite these problems? Or did you notice that the models disagree with each other and that some of them are seriously wrong?”

    My first thought is that the observations are obviously wrong. The various satellite measurements do not match each other. In the case of cloud radiative forcing, in figs 4 and 5, the error bars hardly overlap!?
    Second thought is that models are always wrong – the map is not the territory – but they can still be helpful. Seems odd to pick a paper to highlight model discrepancies over observational disagreements that says this –

    B. The average gain factor of the solar clear-sky feedback of the CMIP models is 0.30 and is similar to the 0.32 obtained from both ERBE and CERES SRBAVG but is larger than the 0.18 obtained from CERES EBAF.

  133. Frank: “You appear to be starting with the assumption that the model/hypothesis and the data are equally likely to be wrong, and that any resolution of a discrepancy is a satisfying answer because the model or hypothesis survives.

    I would almost rhetorically ask where I stated that. I wrote that there are three possible reasons, and that it is wrong to assume that there is only one reason, as you initially stated and then later kinda retracted by admitting that observations could also have errors.

    How likely each cause is depends on the case, and is a purely subjective assessment and thus not a likelihood in the statistical sense. Only once we understand the problem better can we say what the reason was.

    I work on the quality of observational data and thus know there are problems with the observational data. Conversely, because there are problems with observational data, I work on the quality of observational data. From my perspective there could be nothing more stupid than assuming that any discrepancies can only be caused by climate models.

    When not comparing the data with models, the mitigation sceptical movement has no problem with emphasising errors in observational estimates. This guest post, the Daily Mail article that started this mess, and the political harassment by Lamar Smith (TX21) in response clearly show that the mitigation sceptical movement loves attacking observational estimates, with great and unreasonable ferocity. But when it comes to model comparisons that should suddenly be off limits? One should put on blinders and only be allowed to study problems with the climate models? Sounds unreasonable to me.

    Let’s not open up the tropical-hotspot can of worms before we agree on something this basic.

  134. Steven Mosher says:

    “Once one is dealing with anomalies one is already removed from the “raw” data. There has already been processing that contains assumptions.”

    somebody needs to explain to this boy that F and C are not actually real temperatures..
    but scales that contain a lot of assumptions about freezing and boiling water and standard conditions..

    So ya, Like I said, in anticipation of stupid skeptical arguments just use Kelvin

    answer doesnt change. world is warming, man is a primary cause.

    WRT baselines.. jeez. I wasted a bunch of time long ago because I thought baselines matter.

    For surface temperatures I calculated the result for every possible 30-year baseline, 40-year, 60-year, complete series… blah blah blah..

    If I found anything publishable, well, I’d have my Nobel by now.

    noise changes, duh, but the answer is still the same. world is warming, man is the primary cause.

    Funny how that works.

  135. angech says:

    Marco says: February 17, 2017 at 10:56 am
    “If we found an exact match I would hope everyone knows that is basically impossible.”
    What exactly would you consider “an exact match”? You might have noted (probably not, though; I doubt you read the paper – call me skeptical) that Hausfather et al shows “an exact match” (as exact as things usually come in these kinds of studies) between satellites and buoys and ERSSTv4 in terms of variability and the trend.”
    Well there you have it.
    “Hausfather et al shows “an exact match” between satellites and buoys and ERSSTv4 in terms of variability and the trend.”
    I would have hoped that this did not happen.
    Exact matches with chaotic inputs at different times and with different thermometers, as in the two sets above, just do not happen.
    I would be prepared to say never.
    It is worse than getting a bridge hand with 13 spades:
    one chance in 635,013,559,600 deals.
    Note that this particular hand does occur far more frequently than the odds suggest.
    “If we found an exact match I would hope everyone knows that is basically impossible.”

  136. angech says:

    Willard says: February 17, 2017 at 1:13 pm
    1 “> Not what I said. I did not say they could not have similar trends*, in fact I would expect similar trends”
    .Here’s what you said, Doc
    2. The trends are said to be similar. You need to specify the time intervals for comparison and it is obvious that two such disparate systems should rarely be in synchronicity* as to trends.”

    When in a hole stop digging? Still. Digs frantically.

    1 refers to expected similar trends referring to the common baseline comparison.
    You may have read my previous comment where I said ” What I would hope to see is quite variable data with an overall match in trend*.

    2 refers to trends from disparate [measuring] systems and different time intervals.

    “Your only way out is a very strict notion of sameness.”
    Thank you, I’ll take it.
    Synchronicity is probably the wrong word for comparing a trend as it means the same variation change with time whereas a trend once defined is invariant.
    If the synchronicity is a straight line that would fit the very strict notion of sameness.
    Very gentlemanly of you, Willis.

  137. izen says:

    @-angech
    “If the synchronicity is a straight line that would fit the very strict notion of sameness.”

    So they are not the same trend, but just have the same numerical value. (within statistical uncertainty)

  138. Steven Mosher says:

    “When models and observations do not fit, the models can have a problem, the observations can have a problem or the comparison can have a problem. Or all three. Do not jump to conclusions and immediately blame the models.”

    Thanks Victor I missed that post of yours. It amazes me that skeptics miss this point, even though there are a few notable examples of it in the history of science.

    It’s actually more complicated in my mind.

    First you have a theory, then you extract a hypothesis from the theory to test.
    Then you test the hypothesis by collecting observations from an instrument.
    The instrument itself is built in accordance with “known” theory.

    When the observations disconfirm the hypothesis you have several choices, none of which
    are determined by the outcome of the test. Typically, these choices are made pragmatically.

    A) you check that you properly derived a hypothesis from the theory..
    B) If the experiment is cheap, maybe you repeat it more times.
    C) You also check your observations and instruments.
    D) You look at your theory: is it missing something? What would have to change? How far does
    that cascade into deeper theory?

    I know I’ve raised this example a bunch of times, but since Feynman is involved it is particularly delicious. It includes a climateball move by a cartoonist and a beautiful archetypal “it’s too complicated” skeptical meme.

    read the whole thing

    http://www.pbs.org/wgbh/nova/physics/solar-neutrinos.html

  139. Marco says:

    “I would have hoped that this did not happen.
    Exact matches with chaotic inputs at different times and with different thermometers as the two sets above just do not match exactly.”

    Argument from incredulity. Great. Thanks.

  140. Hyperactive Hydrologist says:

    Aren’t satellite observations based on models?

    Frank,
    A lot of the issues with models come down to resolution. As you mention above, GCMs do not replicate convection – instead it is parameterised – and topography is poorly represented, especially for large mountain ranges such as the Andes and Himalayas.

    In your opinion, does this mean that models are useless?

  141. Steve Mosher, indeed it is more complicated. Reality (nearly?) always is.

    A clear distinction between theory/understanding, models and observations is not possible. Observations always have a theory/model component, because they are based on our current understanding of how the instrument measures the measurand, and what is called “observations” nearly always includes some sort of data processing, if only an analog-to-digital converter with a band-pass filter.

    This is especially true for the satellite temperature estimates, which include a lot of modelling and are heavily adjusted.

    But because the satellite data is noisier, the trend is less visible. My guess is that that is what the mitigation sceptical movement likes.

  142. John Hartz says:

    Is the following paper relevant to this discussion?

    Consistent near-surface ocean warming since 1900 in two largely independent observing networks, Viktor Gouretski, John Kennedy, Tim Boyer, Armin Köhl, Geophysical Research Letters, 5 October 2012.

    It is cited/linked to in:

    Rising sea temperatures are shaping tropical storms in southern Africa by Jennifer Fitchett, The Conversation Africa, Feb 17, 2017

  143. Willard says:

    > Very gentlemanly of you […]

    Thanks, Doc. It is also self-interested: a complete blockade would disrupt the exchange, radicalize your stance, and muddy the waters. Clarifying the concept of synchronicity is enough for me.

    With one breath, with one flow, we will know synchronicity.

    There are many ways to characterize it. It can be seen in signal analysis. It can be seen in media theory. It can be seen as a sleep trance or a shared romance too.

    I surmise your point is simply to say that we’d need all the data points between the two sets in a way to establish a pairwise time-space correlation. That would explain why you’d insist on requesting the measurement data. That would cohere with your dismissal of anomaly data as mere artifice, paraphrasing of course.

    That criticism has been countered many times already. Anomalies are used to filter out what can be seen as noise, and the data is processed so that trends can be detected. These trends are far from satisfying any kind of synchronicity we’d all prefer – a connecting principle, linked to the invisible, almost imperceptible, something inexpressible, science insusceptible, logic so inflexible, causally connectible, yet nothing as invincible.

    Such is life, I guess.

  144. Frank says:

    Victor wrote: “I work on the quality of observational data and thus know there are problems with the observational data. Conversely, because there are problems with observational data, I work on the quality of observational data.”

    Very well said. Everything would be fine if confirmation bias – believing AOGCMs are correct – didn’t influence the choices scientists made when scrutinizing observations.

    Victor continued: “From my perspective there could be nothing more stupid that assuming that any discrepancies can only be caused by climate models.”

    Is this evidence of confirmation bias? Given the existence of problems with observational data, shouldn’t the resolution of those problems be equally likely to increase the size of the discrepancy rather than close it? Or to reduce warming rather than increase it? When a discrepancy exists between observations and hypothesis/model/theory, any assumptions about how the discrepancy will be resolved mean that you are using a hypothesis to test data, not data to test a hypothesis.

    I mentioned radiosondes and the hot-spot because this APPEARS to be one area where confirmation bias influenced the “homogenization” of records. Maybe problems will be found with RSS and UAH analysis of satellite data and I will be forced to conclude that homogenization of radiosonde data wasn’t biased.

  145. Willard says:

    > Is this evidence of confirmation bias? Given the existence of problems with observational data, shouldn’t the resolution of those problems be equally likely to increase the size of the discrepancy rather than close it? Or to reduce warming rather than increase it?

    Arguing by Just Asking Questions is boring, Frank.

    Please try to raise concerns on what you could bet a solid stack of chips.

  146. Steven Mosher says:

    “When a discrepancy exists between observations and hypothesis/model/theory, any assumptions about how the discrepancy will be resolved mean that you are using a hypothesis to test data, not data to test a hypothesis.”

    Let me tell you what I live for as someone who actually gave away thousands of hours for nothing looking at observational data..

    This is what I live for.. I live for the day when stuff I do with observations will force a modeler to change his code. And for that to happen I have to make damn sure any improvements I make to the understanding of observations withstand the scrutiny of guys in the modelling world.
    Passing the “acid test” of skeptics looking at my stuff.. lemon juice is acidic right? that’s about the level of their scrutiny

    IF I was invested in maintaining the models’ view of things I would go work on models.

    Lets put it another way. I dont work on models. I will never work on models. Been there done that. I have zero interest in supporting their predictions, and it would be a feather in my cap, if they ever had to change a model on account of my data. Aint gunna happen.

    After we finish new versions, comparing to models is the last thing we do. And nobody goes back and says… oh, how can we get a better fit. At one time we did a bunch of charts showing problems with some models.. It was fun.. but mostly an afterthought.

    There have been a few notable improvements in model–observation comparisons since AR5

    really basic stuff that, well, nobody thought was that critical.

    A) getting the masking right
    B) using SST instead of SAT over the ocean.

    Both of those did more to tighten up agreement than any data averaging choices.

    I’ll repeat what I said. When models and data disagree, what you do next is largely pragmatically determined, not politically driven. Start with the easiest, cheapest double checks you can do.
    Start with what you have some expertise in. I dont fiddle with models when my observations dont match, because I dont have a supercomputer. So I double check the observations… and triple check, and well.. none of this has to do with confirmation bias toward preserving a model view of the world

    Then again, when a model, like a GCM, disagrees with data.. I always think.. Hey, that model makes certain assumptions about the laws of gravity and conservation of energy, and it even assumes certain things about logic and math… Maybe the real problem lies deep within one of these fundamental pieces of understanding… I’ll go look at that first and try to upend our current understanding of gravity. We generally dont do this. Why? Because of a “confirmation bias” toward the fundamentals. It’s way easier and more sensible to get more data, check the data twice, check the shaky parts of the model.. but strictly logically… it could be anything: model, data, comparison method.. any or all

    Another good history example is nuclear beta decay, which appeared to violate the conservation of energy when it was first observed. While some (Bohr) suggested that this fundamental conservation of energy might have to go, in the end Pauli suggested a Unicorn to save the principle. Decades later someone found it.

    what did they name the unicorn?

  147. Steven Mosher says:

    And Frank, go re-read ATTP’s post on Eddington.

  148. JCH says:

    Sorry, but when the discussion of a warming pause first began, the most likely cause was bad observations, and the longer it went on, the more likely that became.

  149. Clive Best says:

    One correction:
    “B) using SST instead of SAT over the ocean.”

    This is actually a correction made using the very models that are then to be compared to data. So it is a bit incestuous.

  150. Clive,

    This is a actually a correction made using the very models that are then to be compared to data. So it is a bit incestuous.

    How else would you do it?

  151. Clive Best says:

    How about using the model SST values instead? So blend model values of TAS over land with model values of SST over oceans. Then compare to what is actually measured, rather than to what the models think should be measured.

    This is also true of the way sea ice is treated as if it is land and not ocean. It is not land – it is frozen ocean. In other words, it is for the models to predict reality – not so that they can ‘correct’ the measurements.

  152. How about using the model SST values instead? So blend model values of TAS over land with model values of SST over oceans.

    As I understand it, this is what is done. According to Richardson et al. (2016):

    Model temperatures are reconstructed in three ways: by using global air temperature (‘tas-only’), by blending air temperature over land and sea ice with ocean temperatures over water (‘blended’), and by blending temperatures and using the historical geographical coverage of observations in HadCRUT4 (‘blended-masked’). We assume that the modelled near-surface water temperature over oceans (‘tos’ in CMIP5 nomenclature) is equivalent to measured sea surface temperatures.
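
    The blending recipe quoted above boils down to a per-cell weighted average. A toy sketch with three invented cells (real implementations also weight by cell area and handle the land/ice masks, which vary in time, more carefully):

        import numpy as np

        tas = np.array([1.20, 0.90, 0.80])      # modelled near-surface air temp anomaly
        tos = np.array([np.nan, 0.70, 0.65])    # modelled SST anomaly (undefined on land)
        land_ice = np.array([1.0, 0.3, 0.0])    # fraction of cell that is land or sea ice

        # Use tas over land/sea ice and tos over open water, weighted per cell.
        blended = land_ice * tas + (1 - land_ice) * np.where(np.isnan(tos), 0.0, tos)
        print(blended)                          # [1.2  0.76 0.65]

    Because tos typically warms a little less than tas over the oceans, the blended series runs slightly below the tas-only series, which is the whole point of the comparison.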

  153. Clive Best says:

    Sorry, you’re right! That paper does the right thing, and I even recalculated it myself at the time to check. The effect is pretty small though.

    The blue curve is the average of 6 CMIP5 model results for the global temperature anomaly. The red curve is the blended result corresponding to Hadcrut4.

  154. Clive,

    The effect is pretty small though.

    Small relative to what? The point is that if your surface datasets are a combination of air temperatures and sea surface temperatures, and suffer from coverage bias, they could be underestimating the surface warming.

  155. angech says:

    “If the synchronicity is a straight line that would fit the very strict notion of sameness.
    izen says: So they are not the same trend, but just have the same numerical value. (within statistical uncertainty).
    Willard picked me up, correctly, on my definitions but kindly gave me an out.
    If you have two variable data sets which cross over twice, and you take the trends of both sets at the two crossover points, you will have an identical [same] trend, which will also have the same numerical value with no statistical uncertainty.
    Synchronicity as a term could be applied to the two trends, as they share the same timeline and are the same, but is probably not appropriate, as the time element was chosen, not evaluated.

    It is best used on the variations in the two data sets over any matching periods of time.
    If synchronicity is detected as in,
    “Hausfather et al shows “an exact match” between satellites and buoys and ERSSTv4 in terms of variability and the trend.”
    then alarm bells should be ringing. Very loud alarm bells. All I hear is silence.
    At least Zeke is showing graphs for illustrative purposes only, not the real graphs.
    So the matching Marco sees may well be for illustrative purposes only. I do see a slight variation around 2010, with flattening of the ship data. There should be a lot more.
    Real data will have a lot more variation than this, Marco; ask Zeke.
    I can give him matching trends if taken at, and only at, two crossover points of the data, but I will never accept synchronicity [exact matching in time] of variation in such complex derived data.
    The real graphs [not ones for illustrative purposes only] of (1) the buoy anomalies and (2) the ship anomalies, over time and over matching time, should be shown.

    The time period used to determine the offset of 0.12 C claimed for the ship/buoy differences should also be given: who worked it out, how and where, and how it is applied.
    Ron Graf says above that Zeke gave a partial answer:
    “They do have side-by-side data from ships and buoys. That’s where the ~0.1 C offset comes from. ERSSTv5 will be updating this approach to use a dynamic offset calculated each month rather than a static offset for the full period.”
    This sudden jump by matching baselines and a constant offset is wrong in principle and practice; there should be a constant merge of the changing ratio of the two data sets, as Zeke says, which invalidates the whole idea of a lump change to a common baseline.
    There is a defense here of a method which has already been thrown out by Zeke in ERSSTv5.
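The static-versus-dynamic offset point quoted above is easy to sketch. A toy Python illustration with synthetic series – this is not the ERSST code, and the 0.10 C used below is just the nominal ship–buoy difference:

```python
import numpy as np

rng = np.random.default_rng(1)
months = 240
# Synthetic anomalies: buoys track a slow warming; ships read ~0.1 C warm.
buoy = 0.002 * np.arange(months) + rng.normal(0, 0.05, months)
ship = buoy + 0.10 + rng.normal(0, 0.08, months)

# Static offset: one number for the whole record, from co-located pairs.
static_offset = np.mean(ship - buoy)
ship_adj_static = ship - static_offset

# Dynamic offset: re-estimated through time (here a 12-month running mean
# of the ship-buoy difference), so the correction can drift as the fleets change.
diff = ship - buoy
dynamic_offset = np.convolve(diff, np.ones(12) / 12, mode="same")
ship_adj_dynamic = ship - dynamic_offset
```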

  156. Clive Best says:

The net effect is only about a -0.05C overestimate of measured temperatures. However, this assumes that the enhanced CO2 effect is modelled correctly by models at the ocean surface. I don’t see any physical reason why the air temperature 2m above the sea surface should be slightly warmer than at the surface. After all, it is the ocean surface that radiates IR, so I would expect this to rise in temperature first in response to the GHE and then warm the air immediately above the ocean by direct contact.

  157. Clive,
    0.05K is, however, about 10% if we consider the period shown in your graph. There’s also coverage bias, which HadCRUT4 suffers from more than some of the other datasets.

  158. Olof R says:

Clive, the air 2 m above the sea surface is not warmer than the SST, but the warming trend is slightly higher in the air compared to the water.

    A while ago I did this quick and dirty check for oceans 60N-60S, trends for 1901-2000.
    Multimodel mean: The tas trend was 0.08 C/decade higher than the tos trend
    Observations: The HadNMAT2 trend was 0.09 C/decade higher than the HadSST3 trend
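For anyone wanting to repeat that kind of check, the trend arithmetic is a one-liner. A minimal sketch with synthetic stand-ins for the tas/tos (or HadNMAT2/HadSST3) series:

```python
import numpy as np

years = np.arange(1901, 2001)
rng = np.random.default_rng(2)
# Synthetic anomalies, with slopes chosen so the air series trend is
# ~0.08 C/decade higher, as in the multimodel-mean figure quoted above.
tos = 0.005 * (years - years[0]) + rng.normal(0, 0.1, years.size)
tas = 0.013 * (years - years[0]) + rng.normal(0, 0.1, years.size)

def trend_per_decade(t, y):
    """Least-squares slope, converted from C/yr to C/decade."""
    slope, _intercept = np.polyfit(t, y, 1)
    return 10.0 * slope

print(trend_per_decade(years, tas) - trend_per_decade(years, tos))
```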

  159. Clive Best says:

No – Hadcrut4 does not suffer from coverage bias. The other datasets suffer from an in-filling and smoothing bias, especially in the Arctic. Cowtan and Way is simply their attempt to do the same thing and infill Hadcrut4 into the Arctic as well. Why do they not also infill into Antarctica or into Africa, where coverage is very sparse?

    For better or worse, Hadcrut4 is the only series based on pure measurements. Long may it remain so.

  160. Marco says:

    “then alarm bells should be ringing. Very loud alarm bells. ”

    But *why*? You keep on saying it, but provide no explanation whatsoever, other than that you will never accept it. The methods are measuring the same thing, and NOT seeing the same patterns should get alarm bells ringing!

    If I measure the body temperature of a large population of people over time, and I see my two instruments (e.g. IR-based vs alcohol-based) giving a distinctly different temporal variability, I’d be really worried. Apparently you don’t start worrying *until* the two instruments do give the same temporal variability in this large population…

  161. Clive,

    Hadcrut4 does not suffer from coverage bias.

    Come on, of course it does. It doesn’t sample the polar regions, therefore it suffers from coverage bias. That the other datasets may use techniques to infill these regions does not somehow negate this.

  162. Marco says:

    “Cowtan and Way is simply their attempt to do the same thing and infill Hadrcut4 into the Arctic as well. Why do they not also infill into Antarctica or into Africa where coverage is very sparse.”

    Err….they did.

  163. Err….they did.

    I must admit that I am impressed that Clive is willing to retract his claims when shown to be wrong. Might be nice if he checked before making them, though 🙂

  164. Clive Best says:

    @Olof R

I think that what that is really saying is that night-time air temperatures have been rising slightly faster than daytime temperatures. Diurnal (and probably also seasonal) Tmin air temperatures have risen at a faster rate than SST. However, that can’t continue indefinitely.

  165. Joshua says:

    Anders –

=={ I must admit that I am impressed that Clive is willing to retract his claims when shown to be wrong. }==

    Where did that happen?

Frank: “Everything would be fine if confirmation bias – believing AOGCMs are correct – didn’t influence the choices scientists made when scrutinizing observations.”

    No scientist believes models are perfect.

    So even when model and observations fit within their uncertainties you would still reject climate science because of this hand-waving argument? Is there anything that could happen that would make you accept science?

Victor: “From my perspective there could be nothing more stupid than assuming that any discrepancies can only be caused by climate models.”

Frank: “Is this evidence of confirmation bias? Given the existence of problems with observational data, shouldn’t the resolution of those problems be equally likely to increase the size of the discrepancy rather than close it?”

    No. No.

There is only one reality. Models try to model it. We use observations to estimate what it looks like. It is perfectly possible to find problems that increase or produce a discrepancy, but it is more likely than not that they converge on reality. No confirmation bias needed for that, just people dedicated to understanding reality.

Frank: “Or to reduce warming rather than increase it?”

Contrary to your claim, the adjustments reduce the estimated warming and do not change the global mean much over the last decades. If that had not been the case, that would also be perfectly fine; reality is what it is, and reality could have been different.

Frank: “When a discrepancy exists between observations and hypothesis/model/theory, any assumptions about how the discrepancy will be resolved mean that you are using a hypothesis to test data, not data to test a hypothesis.”

    You are the one who makes (made?) assumptions on how the discrepancy would be resolved. I was the one pointing out that there are more possibilities. If I may get somewhat personal: Who should check for confirmation bias more?

Frank: “I mentioned radiosondes and the hot-spot because this APPEARS to be one area where confirmation bias influenced the ‘homogenization’ of records. Maybe problems will be found with RSS and UAH analysis of satellite data and I will be forced to conclude that homogenization of radiosonde data wasn’t biased.”

So Roy Spencer and John Christy of UAH appear to have a confirmation bias towards more warming, or towards making their data fit the models? Interesting.

    The first estimates by UAH of tropospheric temperature change showed a cooling. Now after many adjustments and more data they show a warming. UAH5 showed as much warming as the surface. That was the time mitigation sceptics showed the world the RSS dataset. RSS made some updates and showed about the same amount of warming as UAH5, then UAH updated its dataset to show considerably less warming. Now the mitigation sceptics show the world the UAHv6 dataset.

    But scientists cannot be trusted because of confirmation bias. Why don’t you complain with your colleagues first? Why don’t you perform some introspection first?

  167. Joshua,
    Here. To be fair, it might only be once.

  168. Joshua says:

    Anders –

    Thanks. I thought you meant he acknowledged being wrong about C&W not infilling in Africa or Antarctica (to skew the temp record).

  169. Magma says:

“There is only one reality. Models try to model it. We use observations to estimate what it looks like.” — Victor Venema

    It’s hard to see why some have difficulty grasping basic concepts that can be expressed this clearly and simply.

  170. John Hartz says:

Magma:

    It’s hard to see why some have difficulty grasping basic concepts that can be expressed this clearly and simply.

    Their understanding of basic concepts is filtered through the lens of ideology, i.e., cognitive dissonance.

John Hartz says: “Their understanding of basic concepts is filtered through the lens of ideology, i.e., cognitive dissonance.”

    That goes for everyone. We are humans. But before getting into this internet US climate “debate” I had never encountered such egregious examples of this.

    Now it is culminating in people willing to claim that the inauguration of Trump was the biggest ever. It was big, much bigger than any group I have ever talked to, but to claim it is the biggest is beyond bizarre. That really leads to the fair question of what the hell is wrong with parts of American society, and it should not be waved away by saying it is normal human behaviour.

  172. John Hartz says:

Victor: While every human being views reality through the lens of their own unique set of life experiences, not everyone adheres to a specific political and/or religious ideology.

  173. John Hartz says:

Victor: Re the size of the crowd at Trump’s inauguration, we know that Trump asserts it was the biggest ever, but I’m not sure that a significant number of Americans believe him.

  174. Clive Best says:

    I may be wrong, but to be honest after reading their paper, I can’t work out whether they do infill missing cells in Africa and Antarctica or not! Where is the gridded (lat,lon) data for C&W? One problem is that by renormalising Hadcrut4 to 1981-2010 in order to use the satellite data, they reduce the active H4 cells by nearly 4%, and most of those are concentrated in the Arctic.

  175. Where is the gridded (lat,lon) data for C&W?

    Here, I think.

  176. @ John Hartz
    https://www.washingtonpost.com/news/monkey-cage/wp/2017/01/25/we-asked-people-which-inauguration-crowd-was-bigger-heres-what-they-said/
    But what’s even more noteworthy is that 15 percent of people who voted for Trump told us that more people were in the image on the left — the photo from Trump’s inauguration — than the picture on the right [the photo from Obama’s inauguration]. We got that answer from only 2 percent of Clinton voters and 3 percent of nonvoters.

That is just a few percent of all Americans who are willing to say that Trump’s inauguration was larger. That is not normal human bias towards your pre-existing beliefs.

    The people in the climate “debate” are a self-selected group. I would guess that mitigation sceptics mostly voted for Trump. I would guess it is not far-fetched to assume that the people willing to say the Trump inauguration was larger are overrepresented in the mitigation-sceptical movement.

  177. Clive Best says:

You’re right. C&W do infill Antarctica!

  178. BBD says:

    @ Victor

    Alternative facts for an alternative reality. They will deny everything, even the evidence…

  179. Marco says:

    …and Africa.

  180. Marco says:

    Oh, P.S., Figure 6 in the paper.

  181. Clive Best says:

    Not convinced Africa has been infilled. I also still believe that real measurements are sacrosanct.

  182. Clive,

    I also still believe that real measurements are sacrosanct.

    Believe away, but I think this is silly. If you know that your measurements don’t cover everything you would like to measure, then finding ways to compensate for this is an entirely reasonable strategy.

  183. BBD says:

    I also still believe that real measurements are sacrosanct.

    Now that dogwhistles nefarious intent: the infilling stuff is suspect. So it follows by insinuation that scientists inflated warming with biased infilling methodology.

  184. Willard says:

    > and Africa.

Let’s infill furthermore:

  185. Nick Stokes says:

    Clive
    “I also still believe that real measurements are sacrosanct.”
The thing is, they aren’t even measurements. They are grid estimates based on station measurements. They are intermediates in a particular way of estimating a global average based on a few thousand station readings. That is intrinsically an infilling exercise. You are infilling the whole globe based on samples.

    Gridding isn’t a great way of doing that, but it is almost universal. When there are empty cells, the estimated grids pose a secondary sampling problem. You have to estimate the global average based on what you know. You still have to estimate the empty cells. You should do that as well as you can (infill).

  186. Steven Mosher says:

Thanks Nick.

    I think Clive doesn’t get “infilling”.

    Clive… nobody infills.

    At the heart of it, all spatial methods are predictive methods. We use the point measurements
    where we have them to estimate values where we don’t have them.

    For BE we do a regression that models the temperature at any location given latitude, elevation and season. There is no infilling; there is estimating, interpolating, predicting what the temperature is in places where we have no measurements.

    For GISS, they work in anomalies and lat/lon: based on the sample, they estimate anomalies as a function of distance and lat and lon.

    You can compare and test the effectiveness of these methods of predicting temperatures where you have none in several ways – holding samples out is the easiest.

    Now, GISS and others can be seen as a subsample of BE. They predict Africa with a smaller sample than we have, so you can quite easily check their method.

    Same with the Arctic and Greenland, where CRU and GISS have far fewer stations. They make a prediction where they have no data, and we can check how well that works… because, yes, we have data they don’t have.
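The holdout check described above can be sketched in a few lines. Inverse-distance weighting stands in for the real interpolators here (it is not the Berkeley Earth regression or the GISS scheme), and the stations and field are synthetic:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
lat = rng.uniform(-60, 60, n)
lon = rng.uniform(-180, 180, n)
truth = 0.02 * lat + rng.normal(0, 0.2, n)   # synthetic anomaly at each 'station'

withheld = rng.uniform(size=n) < 0.2         # hold out ~20% of stations

def idw(qlat, qlon, slat, slon, svals, p=2.0):
    """Predict one point from the kept stations by inverse-distance weighting."""
    d = np.hypot(slat - qlat, slon - qlon) + 1e-6   # crude flat-earth distance
    w = 1.0 / d**p
    return np.sum(w * svals) / np.sum(w)

preds = np.array([
    idw(la, lo, lat[~withheld], lon[~withheld], truth[~withheld])
    for la, lo in zip(lat[withheld], lon[withheld])
])
rmse = np.sqrt(np.mean((preds - truth[withheld]) ** 2))
print(f"holdout RMSE: {rmse:.3f} C")   # how well unseen stations are predicted
```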

  187. Steven Mosher says:

Willard, yes, Africa.

    Last I looked there were 500-600 active stations, at least in BE.

    Now, if we renamed these stations CET, no one would question that they could represent the whole planet and not just Africa.

  188. Steven Mosher says:

“No – Hadcrut4 does not suffer from coverage bias.”

    Coverage bias arises IF

    a) a non-predicted region increases or decreases more than the global average of the
    predicted regions.

    In the case of Hadcrut this happens because they don’t predict the Arctic, and it has warmed at a higher rate than the global average.

    In the case of NCDC the opposite has happened, because they don’t predict Antarctica and it has (in certain cases) warmed LESS than the global average.

    A while back folks were shocked when the warming increased from Hadcrut3 to version 4.

    Well, guess what happens when you increase coverage in Arctic regions where the warming rate is higher than the global average.

    Hadcrut is biased low because of coverage. If the Arctic starts to cool very rapidly, it will be biased high.

    Eliminating coverage bias is one of the reasons skeptics demanded that we use all the data and methods like kriging.

    Go figure.
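The arithmetic behind that argument fits in a few lines. The area fractions and trends below are made up for illustration; only the structure matters:

```python
import numpy as np

area_frac = np.array([0.96, 0.04])   # sampled globe, unsampled Arctic
trend = np.array([0.17, 0.50])       # C/decade in each region (illustrative)

true_global = np.sum(area_frac * trend)   # full-coverage average: ~0.183
covered_only = trend[0]                   # ignore the Arctic: 0.170
print(true_global - covered_only)         # ~0.013 C/decade bias, low
```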

  189. Willard says:

    > Nobody Infills.

    I do:

    If we share this nightmare, we can dream spiritus mundi.

  190. Steven Mosher says:

One correction:
    “B) using SST instead of SAT over the ocean.”

    “This is actually a correction made using the very models that are then to be compared to data. So it is a bit incestuous.”

    ##############################

    Incestuous?

    Let’s see.

    I have observations of SAT over land and SST in the ocean.

    I run a model that produces SAT everywhere, and SST.

    I want to compare my model to observations.

    A. The Clive worst way

    Take SAT over land and ocean.
    Compare it to SAT observations over land, and SST observations.
    In other words, compare the wrong things.

    B. A better way

    Take the model SAT over land.
    Compare it to the observed SAT over land.
    Take the model SST.
    Compare it to the observed SST.

    Now, style A is quick and dirty, and anyone can go to KNMI and download tas.
    Hey, Clive is in good company; many used to do it this way…
    USED TO…

    Style B, known as the right way… it’s a lot of work.

  191. Steven Mosher says:

    Thanks Willard

This was prolly the best live performance of theirs I ever witnessed.

  192. angech says:

    Marco says: February 19, 2017 at 12:13 pm
“then alarm bells should be ringing. Very loud alarm bells.” But *why*?

    Take a video camera and go to the sea. Film the waves coming in. Go home and replay the video.

    Compare the waves coming in. They are never the same.
    If you were to see exactly the same pattern you would have several explanations: you fell asleep and are watching a loop; you are watching a replay; someone has inserted the same loop; you are asleep and dreaming; or a miracle has happened.
    You know intuitively [Kant] and scientifically that it is not possible.

    “The methods are measuring the same thing.”
    No, never. They are measuring different aspects of a thing – in your case, sea temperature.
    They are different instruments measuring different water bodies at different depths, by different methods, at different times.
    And then claiming synchronicity of a sort in respect to time when graphed against each other.

    “You provide no explanation whatsoever.”
    To paraphrase: “The methods are not measuring the same thing, and seeing the same patterns should get alarm bells ringing!”
    As in this case.

  193. angech says:

    Marco,
“If I measure the body temperature of a large population of people over time, and I see my two instruments (e.g. IR-based vs alcohol-based) giving a distinctly different temporal variability, I’d be really worried.”
    True.
    Working in the health field, or having a sick child, you would appreciate the problems of taking human temperatures. The same person can have multiple different temperatures on the same device, and it can vary up and down quite quickly when sick. The difficulty of holding it under an arm, or in a mouth, or in an ear, in a wriggling, crying, protesting child, or elsewhere when you need to, is immense, not to mention how you estimate enough time.
    “Distinctly different temporal variability” per instrument is difficult enough.
    I expect distinctly different temporal variability throughout all time periods.
    I assume you mean the trend of the two instruments over time.
    There is no variability in an individual trend.
    The trend should always be similar but distinctly different [i.e. not identical] with two different types of thermometer working inside their limits with a set population and a non-cherrypicked base.
    If the trends are drastically different, then, as Victor says, check your instruments, check your population, check your observers.

    “Apparently you don’t start worrying *until* the two instruments do give the same temporal variability in this large population…”
    You are confusing variability and trends.
    “What I would expect to see is quite variable data with an overall match in trend.”
    “Alarm bells should be ringing. Very loud alarm bells.”

  194. angech says:

    Steven Mosher says:
“Coverage bias arises IF a) a non-predicted region [added – in the map*] increases or decreases more than the global average of the predicted regions.”

    Going outside the map on this one.
    You either have a map to consider or not – the thing that is covered.
    In the map you have observations, or coverage, and areas, and uncovered or non-predicted areas.
    To cover the map you predict, infill, krig, whatever.
    If these observations go above or below the global average, bias is introduced.

    So “No – Hadcrut4 does not suffer from coverage bias” is strictly and semantically correct.
    To attack this you have had to go outside the map. Interesting.

    It is interesting to see the flak Clive is taking and the different directions it is coming from.
    Like Feynman said, work out the problem from the bits that are missing.
    HADCRUT4 has to be attacked; therefore there is a problem with HADCRUT4 not agreeing with the consensus.
    Put that one down as a winner, Clive.
    A. “The Clive worst way… hey Clive is in good company, many used to do it this way.”
    I’m still extracting thermometers too.

  195. Steven Mosher says:

angech,

    “Steven Mosher says:
    ‘Coverage bias arises IF a) a non-predicted region [added – in the map*] increases or decreases more than the global average of the predicted regions.’”

    [Added “in the map”? No. There is no need for a map. The principle is the same
    for ANY prediction problem where you have a universe and sample some portion
    of it.]


    “Going outside the map on this one.
    You either have a map to consider or not, the thing that is covered.”

    Wrong. You have a thing that is coverable. The extent and distribution of the covered
    versus the not-covered, and the spatial coherence, is the issue. Not map or no map.

    “In the map you have observations or coverage and areas, and uncovered or non-predicted areas.”

    NO. You have observations at locations. You want to predict observations at places
    where you don’t have them, or where you want to gather them again, or at places
    where you have held out observations.

    “To cover the map you predict, infill, krig, whatever.
    If these observations go above or below the global average, bias is introduced.”

    NO. When you predict you will always have an error. You may or may not have a bias;
    that is dependent on the spatial correlation of the field and your sampling.

    “So ‘No – Hadcrut4 does not suffer from coverage bias’ is strictly and semantically correct.
    To attack this you have had to go outside the map. Interesting.”

    No. There is no going outside the map. The thing to be predicted is -180 to 180, -90
    to 90.

    Hadcrut effectively infills certain areas with the global average. When those areas see trends that are higher or lower than the global average, a bias is introduced.
    The easiest way to see this is by decimating the field. Alternatively, you could test the robustness of Hadcrut by changing their gridding. They grid on an equal-angle grid, which means that their extrapolation distance changes as a function of latitude. At the equator one site in a cell can represent a cell 111 km on a side… at the pole this is reduced, because they don’t use equal-area grids.
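To see the equal-angle point numerically: the east-west width of a grid cell shrinks with the cosine of latitude (111 km being roughly one degree along a meridian). A quick check:

```python
import numpy as np

# East-west extent of a 1-degree cell at several latitudes.
for lat in (0, 45, 80):
    width_km = 111.0 * np.cos(np.radians(lat))
    print(f"1-degree cell at {lat:2d}N: ~{width_km:5.1f} km east-west")
```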

#####################################
    “It is interesting to see the flak Clive is taking and the different directions it is coming from.
    Like Feynman said, work out the problem from the bits that are missing.
    HADCRUT4 has to be attacked; therefore there is a problem with HADCRUT4 not agreeing with the consensus.”

    HUH? Nobody is attacking Hadcrut. They happen to have a coverage issue. NCDC also has a coverage issue. This is of TECHNICAL interest, and the science doesn’t change one bit whether you use it or not.
    For example, and Victor can chime in, if I was looking at local data… say temperatures in a specific country, I might PREFER Hadcrut, because they rely on country series built by local experts. But it’s always good to check. Again, Robert Way did a detailed look at a small region, Labrador… he looked at various different products to get the best series he could for his study,
    and yes, we looked at CRU, BE, etc.

    Now the coverage bias is small, but if it went the other way… you’d freak out if we didn’t take account of it.

  196. angech says:

“There is no need for a map. The principle is the same for ANY prediction problem where you have a universe and sample some portion of it.”
    You are right, but the universe is kinda big.
    When you decide to choose a smaller universe, you use a map.
    You then have things inside the map and outside the map.
    HADCRUT, the map made up of the observations HADCRUT is happy to use, is defined on geographic latitude −87.5° to 87.5°.
    You only consider things in the map you are using.
    You disagree.
    It is your right to disagree, but you are wrong.
    “There is no going outside the map. The thing to be predicted is -180 to 180, -90 to 90.”
    Just saying something is real, to deny someone’s reasonable comment based on accepted parameters, does not make it real.
    Some of the Arctic is outside the HADCRUT definition as commonly used.
    Tough. In the HADCRUT map as defined there is no bias.
    Best is right.

  197. angech,

    Tough. In the HADCRUT map as defined there is no bias.
    Best is right.

    Only if you define the region of interest as being the region covered by HadCRUT. On the other hand, the term “global” means the entire globe. Therefore, as a global surface temperature dataset, it suffers from coverage bias.

  198. Marco says:

    angech, what happens to the temperature values if you do not just measure one single child, but thousands? Population statistics, angech. Clearly not your best subject in school. Try imagining a situation where your measurements show that influenza is going around. If only one of the instruments you use gives a population-averaged increase in body temperature during that influenza epidemic, you have a major problem to solve. The trend over the year for both instruments is very likely to be similar enough to not raise alarm bells!

    I also did not confuse variability and trend. There’s a good reason I added “temporal”. Look it up. Hint: it does not mean “temperature”. Then consider the following with your example of looking at the waves of the sea: I assume you have heard of the tides. A temporal (there’s that word again) variability in local sea height. If I measure sea height at a location with two different measurements, I’d better see both of them indicate a synchronous increase and decrease in sea height.

    Yes, there’s spatial coverage to consider, too, but that’s not what we were discussing.

  199. Marco says:

    “Not convinced Africa has been infilled.”

    I can only lead the horse to water…

  200. Clive Best says:

    Nick,

Yes, I agree, but the funny thing is that nobody much cares about all that before ~1950. Before then the argument is that even with a few stations before 1900 we can still ‘measure’ the global temperature. No infilling needed there. Ed Hawkins is even arguing that Callendar could measure global temperatures using a slide rule and a couple of weather stations and still get the right answer.

    At least Hadcrut4 has used the same consistent algorithm for all 166 years of data. Warts and all.

  201. Clive,

    Before then the argument is that even with a few stations before 1900 we can still ‘measure’ the global temperature. No infilling needed there.

    I don’t think this changes Nick’s argument. You’re still using the stations to estimate the grid temperatures.

At least Hadcrut4 has used the same consistent algorithm for all 166 years of data. Warts and all.

    As far as I’m aware, all of the datasets are based on a consistent algorithm for all of the data. Are you suggesting otherwise?

  202. Clive Best says:

I just found an interesting one. Somehow Cowtan and Way have managed to infill the Arctic, Antarctic, Africa AND South America for January 1864! I was unaware that satellites were already in use in the early Victorian period.

    Hadcrut4.5 shows what the actual data really look like.

    Kriging is a truly wonderful thing!

  203. paulski0 says:

    angech,

    Tough. In the HADCRUT map as defined there is no bias.
    Best is right.

    Trouble is, that argument goes both ways. You’re saying HadCRUT4 is not a global average estimate, it’s only an average of cells containing sufficient measurements. That means it would be inappropriate to compare HadCRUT4 directly with true global average estimates from models. Or to use it for estimating effective climate sensitivity.

  204. Steven Mosher says:

“Yes, I agree, but the funny thing is that nobody much cares about all that before ~1950. Before then the argument is that even with a few stations before 1900 we can still ‘measure’ the global temperature. No infilling needed there. Ed Hawkins is even arguing that Callendar could measure global temperatures using a slide rule and a couple of weather stations and still get the right answer.

    At least Hadcrut4 has used the same consistent algorithm for all 166 years of data. Warts and all.”

    Too funny. EVERYONE uses the same algorithm for all 166 years.

    Clive, you can actually test the various methods.

    HOW?

    Steve McIntyre and JeffID and other skeptics suggested the method.

    So we did it.

    First, a VISUAL example to show you the fundamental differences:

    Click to access visualizing-the-average-robert-rohde.pdf

    And then a test using data (basically suggested by skeptics – THANKS guys):

    First you define a synthetic world, a given ground truth.
    Then you sample that using GHCN-M coordinates.
    Then you decimate the time series using the temporal profile from GHCN-M
    (i.e., this station only has data for these years, etc.).

    Then you feed the same data to all three prediction codes and have them
    predict the “ground truth” you drew your sample from.

    Click to access robert-rohde-memo.pdf

  205. Magma says:

    “I was unaware that satellites were already in use in the early Victorian period.” — Clive Best

    And the proud display of ignorance rolls on… try more reading & less writing next time.

  206. Steven Mosher says:

    When Clive gets back with an actual test of methods, I think people should read what he writes.

  207. paulski0 says:

    Clive,

    The C&W kriging product going back to 1850 does not use satellites at all, even after 1979. The products which utilise satellites for spatial information are clearly labelled ‘hybrid’.

Whether you use kriging to infill or leave cells empty, if you take HadCRUT4 as an estimate of global average temperature you are infilling either way. In the latter case you’re infilling the empty cells with the global average (or the hemispheric average in the standard HadCRUT4 process) of the filled cells. Once you understand that, what matters is trying to determine the optimum approach for infilling, which is what the tests linked by Steven Mosher are checking.

  208. Nick Stokes says:

    Clive,
    Yes, HADCRUT quotes temperatures back to 1850, when they don’t have much data. Here’s the plot:

    And you see what happens. They make an inference about every point on the Earth and integrate. When the data is sparse you still have to do that; it’s what global average means. And what happens? The error goes up with the sparsity. You can’t stop making inference, but you can stop the plot, which they do at 1850. Some think that is ambitious, and I guess some might doubt their error calcs. But what you need there is a quantitative objection. No use saying, gosh I think that should be more.

  209. Clive Best says:

    Nick,

Actually HADCRUT does not make any inference about places they don’t measure. They simply calculate Sum(cos(lat_i)·T_i)/Sum(cos(lat_i)), which is the area-weighted average of all occupied grid cells. Their answer is the average change in temperature anomaly over the regions of the world that have been measured. You can’t do better than that without making gross assumptions.
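That weighted average is straightforward to write down. A minimal sketch with a synthetic anomaly grid, NaN marking empty cells – none of this is the actual HadCRUT code:

```python
import numpy as np

nlat, nlon = 36, 72
lats = np.linspace(-87.5, 87.5, nlat)           # cell-centre latitudes
rng = np.random.default_rng(4)
T = rng.normal(0.5, 1.0, (nlat, nlon))          # synthetic anomalies
T[rng.uniform(size=T.shape) < 0.3] = np.nan     # ~30% empty cells

# Sum(cos(lat_i) * T_i) / Sum(cos(lat_i)) over occupied cells only.
w = np.repeat(np.cos(np.radians(lats))[:, None], nlon, axis=1)
occupied = ~np.isnan(T)
avg = np.sum(w[occupied] * T[occupied]) / np.sum(w[occupied])
print(avg)
```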

C&W kriging, or anyone else’s kriging (BEST, GISS etc.), back to 1850 is (IMHO) nonsense, because it assumes a smooth temperature field exists across the other 90% of the earth which has no temperature information. Now, if you look at what C&W actually did, you would see that they defined anomaly = 0.0 for all grid locations without any data and too far away for interpolation. These zero values are then treated as if they were real measurements. H4 instead defines locations without any measurements as NaN and ignores them.

    Now, looking carefully at the C&W paper, I see that there is essentially no difference between their hybrid (satellite-corrected) and their kriging-alone results. The UAH temperature data is essentially a red herring as far as I can see. Really they are just doing the same as GISS.

  210. Clive Best says:

    Magma,

    You are clearly devoid of a sense of humour.

  211. Clive,

    C&W kriging or anyone else’s kriging (BEST, GISS etc.) back to 1850 is (IMHO) nonsense

    Ideally, you would actually demonstrate this, rather than just waving your hands wildly.

    You are clearly devoid of a sense of humour.

    Are you suggesting that you’ve just been joking?

  212. angech says:

    paulski0 says: February 20, 2017 at 9:59 am
    “Trouble is, that argument goes both ways. You’re saying HadCRUT4 is not a global average estimate, it’s only an average of cells containing sufficient measurements. That means it would be inappropriate to compare HadCRUT4 directly with true global average estimates from models. Or to use it for estimating effective climate sensitivity.”

    Careful what you wish to believe in.
    I’m saying HadCRUT4 is a global average.
    HadCRUT says HadCRUT4 is a global average.
People who wish to make up their own definitions should go and tell HadCRUT.
    Further, it is entirely appropriate to compare models directly with true global average estimates from HadCRUT4.
    Most models, as you would be aware, were developed precisely to compare with, and included, data sets like HadCRUT4 as their base.
    Do you really wish to say that they picked C and W out as an infilled global temperature model and modeled to that?
    I don’t think so. Happy to be corrected, of course. Lots of people here, as well as yourself, to do so if right.

  213. angech says:

    Clive Best says: February 20, 2017 at 9:43 am
    “I just found an interesting one. Somehow Cowtan and Way have managed to infill the Arctic, Antarctic, Africa AND South America for January 1864!”
I like this commentator. Tells it how it is, and packs a lot more cred than me.
    C and W suffer from cognitive positive bias.
    As I told Marco, you must not look gift horses in the mouth.
    When a study shows positive warming everywhere they look, every time, you have to get those alarm bells ringing.
    Marco, you have to have some negative excursions. I don’t mind where, I don’t care where – well, actually I do, I’m just using hyperbole. Find an Arctic measurement that goes negative with their work, or an African one, or even a European swallow for heaven’s sake.
    Good results always include outcomes positive and negative.
    I do not mind 67/33, but 100/zip positive results?

  214. paulski0 says:

    Clive,

    Actually HADCRUT does not make any inference about places they don’t measure…

    In that case HadCRUT4 is not an estimate of global average temperature and should not be treated as such.

    Now looking carefully at the C&W paper I see that there is essentially no difference between their Hybrid(Satellite corrected) and their Kriging alone results. The UAH temperature data is essentially a red herring as far as I can see. Really they are just doing the same as GISS.

    I’m a bit confused about what you’re taking as a conclusion here? Yes, in practice the kriging approach produces similar results to the GISS approach. Also utilising satellite and reanalysis data for spatial correlation information again produces similar results. On what basis do you decide that this shows the satellite data is a red herring, rather than it demonstrating robustness of the basic spatial correlation infilling approach?

  215. paulski0 says:

    angech,

    HadCRUT says HadCRUT4 is a global average.

    Yes, this is true. That’s why estimating coverage (and other sources of) bias based on HadCRUT4 sampling and method is relevant and it’s something the people involved with HadCRUT take seriously.

    I’m saying HadCRUT4 is a global average.

    No, you’re not. You’ve said ‘Some of the Arctic is outside the HADCRUT definition as commonly used. Tough. In the HADCRUT map as defined there is no bias.’

    You’re explicitly saying here that some of the Arctic (which is a part of the globe) is not included in HadCRUT4 and that’s that. This necessarily means that HadCRUT4 for you is not providing a complete estimate of the globe.

  216. angech,

    I’m saying HadCRUT4 is a global average.
    HadCRUT says HadCRUT4 is a global average.

    You can’t have it both ways. Either you regard HadCRUT4 as a global average, in which case it suffers from coverage bias (it does not cover all regions). Or, you regard it as a temperature dataset that only represents the regions for which it has coverage (i.e., there is no coverage bias by definition) but then you can’t call it a global dataset.

  217. Marco says:

    I see Clive decided to admit C&W do indeed infill Africa, too, without actually admitting it. Instead, he starts complaining *that* they infill.

    angech, I see you once again ignored the population issue and in essence come with an argument from incredulity. You expect something based on what, exactly? What is your prior knowledge to make you so confident that there just *must* be a major difference in the temporal variability? And note that I am not talking about day-to-day variability here when it comes to ocean SST measurements.

  218. paulski0 says:

    angech,

    Good results always include outcomes positive and negative,
    I do not mind 67/33 but 100/zip positive results?

    Here’s a plot of monthly anomaly differences between C&W and HadCRUT4:

  219. John Hartz says:

Perhaps it’s just me, but if someone has unresolved questions about the C&W paper, wouldn’t it make sense to pose those questions directly to the authors?

  220. izen says:

    @-“…but if somone has unresolved questions about the C&W paper, wouldn’t it make sense to pose those questions directly to the authors?”

    Someone does not have unresolved questions about the C&W paper. They perceive another opportunity to cast doubt on the integrity of the process.
    There is little interest in making the record more accurate. Only an enthusiasm for showing that ‘scientists’ use so much maths between measurement and record that there is no REAL global temperature reported, just what ‘THEY’ find politically expedient.

  221. John Hartz says:

About the validity of climate models…

    Claire Parkinson, now a senior climate change scientist at NASA, first began studying global warming’s impact on Arctic sea ice in 1978, when she was a promising new researcher at the National Center for Atmospheric Research. Back then, what she and a colleague found was not only groundbreaking, it pretty accurately predicted what is happening now in the Arctic, as sea ice levels break record low after record low.

    Parkinson’s study, which was published in 1979, found that a doubling of atmospheric carbon dioxide from preindustrial levels would cause the Arctic to become ice-free in late summer months, probably by the middle of the 21st century. It hasn’t been ice-free in more than 100,000 years.

    Although carbon dioxide levels have not yet doubled, the ice is rapidly disappearing. This record melt confirms the outlook from Parkinson’s 1979 model.

    “It was one of these landmark papers,” said Mark Serreze, director of the National Snow and Ice Data Center. “She was the first to put together the thermodynamic sea ice model.”

    What’s more, collection of better data on sea ice over recent years has strengthened the models, making their predictions even more reliable—and disturbing.

Researcher’s 1979 Arctic Model Predicted Current Sea Ice Demise, Holds Lessons for Future, by Sabrina Shankman, Inside Climate News, Feb 20, 2017

  222. Steven Mosher says:

    “C&W kriging or anyone else’s kriging (BEST, GISS etc.) back to 1850 is (IMHO) nonsense because it assumes a smooth temperature field exists across the other 90% of the earth which has no temperature information. ”

    No a smooth field is NOT assumed at ALL.

  223. Clive Best says:

@Steven Mosher Unfortunately you have provided your code in such an unreadable format that, without a huge effort, it would be almost impossible to know exactly what you actually do. Oh, and in addition I would have to fork out $1500 for a license to use Matlab.

  224. Steven Mosher says:

“@Steven Mosher Unfortunately you have provided your code in such an unreadable format that, without a huge effort, it would be almost impossible to know exactly what you actually do. Oh, and in addition I would have to fork out $1500 for a license to use Matlab.”

Not really; do what others have:

    1. Octave is free.
    2. Read the code.

    There are several people who have more than two digits in their IQ who have

    A) read the code
    B) understood it
    C) asked good questions
    D) suggested improvements

    Early on, in 2007, when we asked Hansen for code, the first complaint Gavin gave was that
    skeptics would pelt them with questions.
    I promised no questions. If given the code (yuck, Fortran) I would figure it out.

    I figured it out.
    E.M. Smith figured it out.
    Clear Climate Code rewrote it in Python.
    Peter O’Neill figured it out.

    The code is the best documentation. You have no excuse for ignorance. I work in R
    and never wrote a line of Matlab.
    In 2012, when I joined BE, I just took the time to learn.

    No snowflake whining.

    Wash your own windows, change your own diapers.

    Oh, and supply your own code as well; I don’t care what language or how it’s written.

  225. Steven Mosher says:

Absent reading the code, you can read the methodology appendix.

    Several have.

    They ask smart questions.
    They understand the real limitations of the method (heck, WE POINT THEM OUT).

    Here is what they don’t do:

    They don’t make stupid assertions based on an utter lack of knowledge.

    For me the publishing of code and data was a real turning point in my assessment of skeptics.
    I thought they shared my desire for openness and transparency, and that they would do like I did:

    put in the hard work to meet scientists on a level playing field.

    But 97% of skeptics don’t want to do the work. They don’t want to test their ideas. They don’t want to improve the science.

    Oh no… it’s Fortran or Matlab or R… can I please have a different language?

    Oh no… you used SVN, please put it in Git (one guy demanded this in mail; otherwise we were hiding things).

    Obstruction… not auditing or replicating or improving.

  226. BBD says:

    Whipsong from Mr Mosher.

First showing you do not understand the basics of interpolation (“kriging … is (IMHO) nonsense because it assumes a smooth temperature field exists”), then complaining about the readability of code.

    That was (IMHO) really rich. Please go the extra mile and read a few Wikipedia pages.

  228. angech says:

HadCRUT says HadCRUT4 is a global average. “Yes, this is true.” Thanks, paulski0.
    You can’t have it both ways.
    Meaning I can only have your definition?

    Most people regard HadCRUT4 as the global average and as a temperature dataset that only represents the regions for which it has coverage; it is a global dataset by definition and widespread usage. It does not have coverage bias when used as per its definition.
    It is not your definition of a global data set, semantically, but it has been good enough for the people who count, the scientists who decided that this was the best way to go years ago.
    “You can’t have it both ways” is apt.
    You cannot pick what the scientists say only when they agree with you and discard it when they don’t.

  229. angech,

Most people regard HadCRUT4 as the global average and as a temperature dataset that only represents the regions for which it has coverage; it is a global dataset by definition and widespread usage. It does not have coverage bias when used as per its definition.

    Global has a reasonably precise definition. You appear to be redefining to suit your narrative. This is pretty tedious.

  230. angech says:

    “If I measure sea height at a location with two different instruments, I’d better see both of them indicate a synchronous increase and decrease in sea height.”
    True.
    “If I measure sea height at two different locations and times with two different instruments or the same instrument , I’d better NOT see both of them indicate a synchronous increase and decrease in sea height.”
    is the essence of what I have been saying.

  231. angech,

You cannot pick what the scientists say only when they agree with you and discard it when they don’t.

    Okay, how about the paper called Coverage bias in the HadCRUT4 temperature series and its impact on recent temperature trends. I don’t think that coverage bias in the HadCRUT4 temperature series is disputed within the relevant scientific community.

  232. angech,

    “If I measure sea height at two different locations and times with two different instruments or the same instrument , I’d better NOT see both of them indicate a synchronous increase and decrease in sea height.”

    Who said this (other than you) and why does it make any sense (it doesn’t seem to, unless I’m missing something).

  233. As far as I know, but more knowledgeable people may correct me if I am wrong, HadCRUT takes the missing regions into account when computing the uncertainties. A continuation of the long-term warming between 1998 and 2014 fits fine within the confidence intervals provided by HadCRUT.

    So in that respect HadCRUT is a global dataset. This while it still has a coverage bias for the average values because it is missing the Arctic warming. My high school physics teacher liked to say that a number without an uncertainty is worthless. The people at Climate Etc. worship the Uncertainty Monster, except that they forget the poor monster immediately when that is convenient for their ideological narrative.

  234. The nailing New York Times interviewed several colleagues of Bates. My oh my. I live a boring life.

  235. izen says:

    First words in the NYT article

    @-“A few weeks ago, on an obscure climate-change blog,…”

    May underestimate Judith.

    As it goes on to say,

    @-“The outcry over Dr. Bates’s claims points to a push by some in the right-wing media to cast doubt on established climate science, and to dispel public support for emissions regulations.”

    But misses the connections between Lamar Smith, Judith Curry, the GWPF and David Rose at the DM/MoS.

    I have seen no claim that the new record is worse than the old version. It is not a dispute about accuracy and method. It is a means to justify doubt about the INTEGRITY of the information and the institutions that produce it.

  236. Marco says:

    “If I measure sea height at two different locations and times with two different instruments or the same instrument , I’d better NOT see both of them indicate a synchronous increase and decrease in sea height”

    And that’s largely irrelevant for the SST measurements with ships and buoys because of the large number of samples and wide sampling. Population statistics, Angech. Still not getting it, I can see.

  237. Clive Best says:

    Victor, Steven,

    To quote from the Berkeley Earth methods paper.

    “Let T(x,t) be an estimate for the temperature field at an arbitrary
    location x on the surface of the Earth, based on an interpolation from
    the existing thermometer data for time t. In the simplest approach
    the average surface land temperature estimate Tavg (t) is calculated by
    integrating over the land area:

T_{avg}(t) = \frac{1}{A} \int T(x,t) \, dA

    where A is the total land area. Note that the average is not
    an average of stations, but an average over the surface using an
    interpolation to determine the temperature at every land point.”

T(x,t) is a temperature field – a continuous function capable of interpolation.
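The quoted definition can be mimicked numerically: interpolate the station values to a grid, then take the area-weighted mean of the interpolated field. The sketch below uses scipy’s generic griddata interpolator as a stand-in (it is not the Berkeley Earth kriging-style estimator), with synthetic stations:

```python
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(6)
st_lat = rng.uniform(-60, 60, 300)
st_lon = rng.uniform(-180, 180, 300)
st_T = 0.02 * st_lat + rng.normal(0, 0.2, 300)   # synthetic station anomalies

# Interpolated field T(x, t) on a regular grid, for one time step.
glat, glon = np.meshgrid(np.linspace(-60, 60, 61),
                         np.linspace(-180, 180, 121), indexing="ij")
field = griddata((st_lat, st_lon), st_T, (glat, glon), method="linear")

# T_avg = (1/A) * integral of T dA, discretised with cos(lat) area weights.
w = np.cos(np.radians(glat))
ok = ~np.isnan(field)            # griddata leaves NaN outside the station hull
T_avg = np.sum(w[ok] * field[ok]) / np.sum(w[ok])
print(T_avg)
```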

  238. As far as I can tell, Clive is simply pointing out what Steven said here. What am I missing?

  239. Clive Best says:

    ATTP,

My point is that so far in this discussion I have been accused of 1) not knowing what I am talking about, 2) not being able to program a computer, 3) talking nonsense about kriging assuming that a temperature field on the earth’s surface exists – it does – and 4) told I should read Wikipedia.

    It is all a bit tedious. Especially since I have calculated my own global averages from scratch based on station data only, and even made a fit to GHCN minimising all station offsets.

    It is obvious that Berkeley Earth assumes there is a temperature field on the surface of the earth, and that it is a continuous function of space and time.

  240. Clive Best says:

Steven: “No a smooth field is NOT assumed at ALL.”

  241. Clive,
    1) A number of things you’ve said have simply been wrong.

    2) You were the one who complained about the code and having to pay for Matlab.

    3) If you think it would be better if people treated others with respect (as I do) then you could try doing the same yourself. It’s not as if you’ve been a paragon of virtue during this discussion.

    4) Two of those involved in this discussion (Steven and Victor) both work with these temperature datasets, and yet you seem to think you know more than they do. Maybe you do, but this is not obvious, and I’m sure there is plenty you (and I) could learn from them.

    5) You’re not necessarily alone in thinking that this can be tedious.

  242. Clive,

Steven: “No a smooth field is NOT assumed at ALL.”

    This was in response to your claim about kriging. Your quote, a few comments above, is about determining the global average.

  243. Clive Best says:

I am not complaining. I have learned that C&W do extrapolate their data into Antarctica and Africa, and I admitted I was wrong about that. The underlying data of all the series are basically the same, and the results are all similar. The oceans dominate the global picture – hence the original subject of this post. Let’s just leave it at that.

  244. Clive,
    I don’t think that’s the only thing you got wrong. You were also rather dismissive of what others have done and then seemed to complain about the responses. If you don’t like how people respond, you could always try to be less insulting of what others are doing. Just a thought, mind you.

  245. angech says:

    “Okay, how about the paper called Coverage bias in the HadCRUT4 temperature series and its impact on recent temperature trends. I don’t think that coverage bias in the HadCRUT4 temperature series is disputed within the relevant scientific community.”
Cowtan and Way?
    Basically they claim what you and others here claim.
    That is, that a perfectly adequate global temperature record, of real measurements, perfectly defined as to how “global” it is by the people using it, is not the global record that they want to acknowledge.
    In redefining the term global, they allege bias in that the reading of one is different from that of the other.
    Bias is in the eye of the beholder.
    There is no one true global temperature record.
    Hadcrut is a true record; C and W a pale extension of assumptions.
    They should not be compared as to bias, but since they have been, two things are true.
    Using the Amber definition, C and W is definitely biased warm compared to HADCRUT.
    A small point to the authors:
    their problems with Antarctica may be due to not taking the elevation of the continent into consideration when comparing central grids with the sea-level stations giving most of their results, apart from the Russians. 3000 meters causes a big temp drop. (Only joking; they are big boys, but I did not see any land elevation in their formula in the paper and there should have been – did anyone else?)

  246. angech,

    In redefining the term global they allege bias in that the reading of one is different to that of the other.

    But they don’t redefine the word global. Global means the whole globe, which HadCRUT4 does not cover. Therefore HadCRUT4 suffers from coverage bias. The reason this is relevant is because one of the portions of the globe that it does not cover happens to be warming faster than the rest of the globe and, hence, HadCRUT4 slightly underestimates the rate of global warming. This isn’t even all that contentious, so it’s amazing that we’re still having this discussion.

  247. Willard says:

    > This isn’t even all that contentious, so it’s amazing that we’re still having this discussion.

    What discussion?

    Doc remains doubtful that we can take human temperature with any kind of reliability. How could anyone convince him that ships and buoys will ever do that!

Incredibilism is incredibly hard to dislodge.

angech says: “Cowtan and Way?
    Basically they claim what you and others here claim.
    That is, that a perfectly adequate global temperature record…”

    Apart from the bad statistics,
    http://variable-variability.blogspot.com/2017/01/cherry-picking-short-term-trends.html
    it is your political movement that assumes the global temperature record is so amazingly accurate that minimal adjustments of 0.05°C or less are a political scandal and that the temperature data is so accurate that it is not possible that what you call a “hiatus” is a measurement/estimation artefact.

    Scientists acknowledge the Uncertainty Monster exists, your political movement ignores that we only know reality within our confidence limits.

  249. verytallguy says:

I did not see any land elevation in their formula in the paper and there should have been, did anyone else?

    Anomaly. “something that deviates from what is standard, normal, or expected.”

    Whilst altitude changes the absolute temperature, the anomaly, to first order, is expected to be independent of altitude.

  250. Steven Mosher says:

    “Cowtan and Way ?
    Basically they claim what you and others here claim.
    That is, a perfectly adequate global temperature record, of real measurements , perfectly defined as to how “global” it is by the people using it is not the global record that they want to acknowledge.”

    No they dont claim that.

    Fail.

Again, angech, you’ve proven that engaging with you is just a waste of time. You refuse to read,
    refuse to understand (you don’t even try), and refuse to make a cogent case.

    Done here.

  251. paulski0 says:

    angech,

    Bias is in the eye of the beholder.
    There is no one true global temperature record.

    There is one true globe. It is the whole globe (and nothing but the globe).
    If you are presenting a record to be a true global average temperature record then you need to understand how your sampling relates to the parts of the globe which aren’t sampled.

    HadCRUT4 is recognised as an estimate of true global average temperature because, as Victor says, their uncertainty estimate includes allowance for a range of potential anomalies in uncovered cells. In other words they do make assumptions about what is in the uncovered regions – it’s just incorporated into their uncertainty range. From what I can see, coverage uncertainty seems to be the largest part of their overall uncertainty. The HadCRUT4 uncertainty is then taken to be a normal distribution effectively centered on the result of infilling with hemispherical average in empty cells.

    What Cowtan and Way find is that the values in empty cells for recent years are likely to push the true global average range to the upper end of the HadCRUT4 uncertainty range, hence uncertainty and central estimates appear to be biased.

    the people who count, the scientists who decided that this was the best way to go years ago.
    You can’t have it both ways is apt.
    You cannot pick what the scientists say only when they agree with you and discard it when it doesn’t.

    Ok, let’s see what the scientists behind HadCRUT4 say:

Techniques that interpolated anomalies were found to result in smaller errors than noninterpolating techniques relative to the reanalysis reference. Kriging techniques provided the smallest errors in estimates of Arctic anomalies, and simple kriging was often the best kriging method in this study, especially over sea ice. A linear interpolation technique had, on average, root-mean-square errors (RMSEs) up to 0.55 K larger than the two kriging techniques tested. Noninterpolating techniques provided the least representative anomaly estimates.

  252. Steven Mosher says:

    Thanks paul.

One simple way I’ve explained it to people willing to understand is this: if you want to claim that Hadcrut is unbiased, then you are asserting that the true values in the blank regions have rates of warming that are precisely equal to the average of all the covered regions. In other words, you’re asserting something highly unlikely. If the uncovered regions have rates of warming even slightly different from the global average, then there will be a bias.

  253. Joshua says:

    angech –

    =={ You cannot pick what the scientists say only when they agree with you and discard it when it doesn’t. }==

My recollection (please correct me if I’m wrong) was that C&W felt that their use of cross-validation, as a way to check their treatment of under-sampled regions, was important and supported their methodology in comparison to methodologies that don’t account for under-sampled regions.

    It seems to me that you question their method for treating under-sampled regions. If so, can I assume that you likewise find a problem with the results of their cross-validation? (Keep in mind that I can only understand an answer to that question if it isn’t technical in nature.)

  254. Frank says:

    Continuing our discussion of confirmation bias…

    Victor had written: “From my perspective there could be nothing more stupid than assuming that any discrepancies can only be caused by climate models.”

    Frank had asked: “Is this evidence of confirmation bias? Given the existence of problems with observational data, shouldn’t the resolution of those problems be equally likely to increase the size of the discrepancy rather than close it?”

    Victor had replied: “No. No. There is only one reality. Models try to model it. We use observations to estimate what it looks like. It is perfectly possible to find problems that increase or produce a discrepancy, but it is more likely than not that they converge on reality. No confirmation bias needed for that, just people dedicated to understanding reality.”

    Victor, haven’t you read the history of confirmation bias in the Millikan oil drop experiment as described in Feynman’s essay “Cargo Cult Science”? He calls the history of the replication of the Millikan oil drop experiment a “shameful episode”, because it is obvious in hindsight that experimenters let their expectations influence the choices they made while refining data. (The whole essay should be required reading for every scientist.)

    http://calteches.library.caltech.edu/51/2/CargoCult.htm

    We know that some proxies show good correlation with historical local temperature in some locations or at some times, but are red noise at other times and places that appear equivalent. However, trying to narrow the confidence intervals of reconstructions (and reject a warmer MWP) by removing “bad data” is problematic. Many skeptical websites have demonstrated the folly of “ex post” screening of proxy data to separate “valid” proxies from “invalid” ones. When one screens for “valid” proxies showing warming in the 20th century, one keeps valid proxies AND red noise that show warming in the 20th century, and one discards red noise that shows cooling in the 20th century. When one tries the same approach using artificial proxy data that are all noise, one can reconstruct 20th-century warming from that noise. Two statisticians at Penn State produced reasonable reconstructions using random-walk data in place of temperature proxies. (Ask Mosher, if you don’t believe me.) IMO, the refinement of data done in climate reconstructions has been a black eye for climate science. Others may disagree.
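
    That screening fallacy is easy to demonstrate. A minimal sketch, with all parameters invented for illustration (this is not the Penn State study’s code):

    ```python
    import numpy as np

    rng = np.random.default_rng(3)
    n_proxies, n_years = 1000, 150
    target = np.linspace(0.0, 1.0, 50)   # "instrumental" warming over the last 50 years

    # Pure red noise: random walks with no temperature signal at all
    proxies = np.cumsum(rng.normal(0, 0.1, (n_proxies, n_years)), axis=1)

    # Ex-post screening: keep the walks that happen to correlate with the warming
    r = np.array([np.corrcoef(p[-50:], target)[0, 1] for p in proxies])
    screened = proxies[r > 0.5]

    recon = screened.mean(axis=0)
    print(f"{len(screened)} of {n_proxies} noise series pass screening")
    print(f"'reconstruction' rises {recon[-1] - recon[:50].mean():+.2f} over the record")
    ```

    The screened average is flat early on (the unselected portions of the walks cancel) and rises in the calibration period: a hockey stick built from nothing but noise.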

    So, I think it is essential to approach PHA with the utmost skepticism: Does it spread UHI to other locations? What happens if a breakpoint is caused by correction of a gradually growing bias? (Answer: one adds bias.) Can gradually growing bias be detected? The fact that PHA can remove artifacts that you deliberately introduce into temperature data is only the first step. The artifacts you introduced were precisely the type PHA was designed to address. You don’t know the origin of most breakpoints – they are undocumented. Mosher says that BEST is finding breakpoints in USCRN data.

    The teams that were scanning data for gravitational waves (LIGO) (and the Higgs boson?) were deliberately fed fake data with false and real positive signals. That way researchers were forced to bring an open mind to each candidate event. You admit refining observational data with the expectation that refinement will eliminate discrepancies. (And strengthen the case for mitigation?)

    To avoid any possibility of confirmation bias, it is essential to force oneself to assume that data correction is equally likely to increase the discrepancy with models or personal expectations. Otherwise, one isn’t using observations to test hypotheses/models; one is using hypotheses/models to test data. That isn’t the scientific method. That is the first step towards becoming a politician or a lawyer: accepting information that agrees with preconceptions and rejecting information that doesn’t. It is easier to trust BEST – whose adjustments may have been contrary to their expectations (if they allowed themselves to have any) – than other groups (who appear to have an agenda). It is easier to trust the combined output of UAH and RSS – because both teams are aware that every choice they make will be carefully scrutinized by others looking for bias. (I don’t accept either analysis as being the better one. In recent years, they have usually reached similar conclusions, but I may not be up-to-date.)

    As for AOGCMs: 1) There is a GFDL paper showing that changes to the entrainment parameter alone can change ECS by 1 degC/doubling without degrading their model’s ability to reproduce current climate. 2) The ensemble work of Stainforth et al. demonstrates that tuning parameters one-by-one doesn’t lead to a global optimum parameter set. Many different parameter sets are equally good (or bad, depending on your perspective). 3) Models predict very different amounts of warming from GHGs and cooling from aerosols, but all assign roughly 100% of current warming to these forcings. At least some must have been tuned to do so, which is trivial with sensitivity to aerosols and ocean heat uptake as fudge factors. 4) Isaac Held’s post discussing tuning: https://www.gfdl.noaa.gov/blog_held/73-tuning-to-the-global-mean-temperature-record/#comment-2629. It isn’t obvious to me why any scientist would let preconceptions from such models potentially interfere with doing their job as a scientist: obtaining the best data with which to test those models. The models have numerous known flaws and it is critical for society to know if their output can be trusted.

  255. Frank: “Victor, haven’t you read the history of confirmation bias in the Millikan oil drop experiment as described in Feynman’s essay ‘Cargo Cult Science’?”

    They converged on the right number. Just for the record, I am not the one claiming that there are no unknown problems, that taking the climate system humanity depends on into uncharted waters will not produce unexpected results. I am working on finding these problems in the station record and I expect such problems in the satellite tropospheric temperatures.

    Confirmation bias is human; all scientists are human. Science works with real scientists, and it has for a long time. Do you want to claim that there was no scientific progress? Otherwise please make claims that only pertain to climate science.

    May I ask why your accusations of confirmation bias only go to scientists and not to yourself or to WUWT & Co.? It is not as if their claims of an imminent ice age have not been wrong over and over again. And they keep jumping on any technicality to wrongly claim that all of climate science is wrong. Don’t you see some confirmation bias in Salby, Curry or Rose?

    https://twitter.com/thingsbreak/status/833872018940522496

    You forgot to reply to the fact that you showed much larger signs of a confirmation bias by only blaming models for any discrepancies and ignoring the possibility that the observational estimate or the comparison method is wrong. So I wonder if you are the best person to get advice from on the topic of confirmation bias:

    To avoid any possibility of confirmation bias, it is essential to force oneself to assume that data correction is equally likely to increase the discrepancy with models or personal expectations.

    I gave you an answer to that above. It is a pity you ignored that answer and just restated your wrong claim (that somehow often happens with climate “sceptics”). Do you deny that reality exists? Do you deny that removing measurement artefacts from observations gets us closer to reality? Do you deny that improving models gets us closer to reality? Or why else was my claim wrong?

    Frank: “So, I think it is essential to approach PHA with the utmost skepticism”

    As a real sceptic you probably read a lot of articles on the topic. Which are the 3 scientific articles you see as the most important ones studying the performance of PHA?

    Feel free not to answer that question, but then please also stop bringing up new topics before we have some sort of an agreement on the first question. That way we will never come to an agreement. And your Gish Gallop of shoddy or cherry-picked science does not work here; the readers of this blog are much too well informed.

    Do you agree that models, observations and comparisons can all be wrong and that we should investigate problems in all 3 of them?

  256. Steven Mosher says:

    “So, I think it is essential to approach PHA with the utmost skepticism: Does it spread UHI to other locations? What happens if a breakpoint is caused by correction of a gradually growing bias? (Answer: one adds bias.) Can gradually growing bias be detected? The fact that PHA can remove artifacts that you deliberately introduce into temperature data is only the first step. The artifacts you introduced were precisely the type PHA was designed to address. You don’t know the origin of most breakpoints – they are undocumented. Mosher says that BEST is finding breakpoints in USCRN data.”

    1. Does it spread UHI to other locations? Generally speaking, no. It MIGHT do so in situations where you had many, many urban stations all going in one consistent direction and a small number of rural stations going the other way. I’ve looked at this in BEST by looking at cases where UHI is expected or known to be high (Tokyo, etc.), and generally speaking the urban stations are adjusted in the direction of the rural, TOWARDS the truth. The exception to this is places like LA, huge metro areas where urban stations outnumber the rural. In BEST I have a couple of examples I am looking at. The other way I looked at this was by considering the best stations, stations that are hand-adjusted by specialists (CRN etc.); the “infection rate” in these cases is both small in number and small in overall effect. Working on an approach to prevent this.

    2. What happens if a breakpoint is caused by correction of a gradually growing bias?
    It’s corrected; not an issue.

    3. Can gradually growing bias be detected? Yes. Pielke once challenged me on this and I showed him how the station he claimed could never be corrected by our method was in fact corrected. It was a growing-bias problem.

    4. You don’t know the origin of most breakpoints – they are undocumented. For us, not true.
    Every breakpoint is documented, along with the reason for breaking.

    The issue is this. NOAA adjustments have been accused of

    A) Fraud intentional thumb on the scale
    B) Unconscious bias
    C) Incompetence
    D) Lack of perfection
    E) Incomplete documentation.

    We came to this problem to address A, B and C. We built a system that operates exclusively on the data and a couple of reasonable assumptions.

    A) Looking at the algorithms, we found no evidence of fraud. Our results matched theirs in a global way, indicating no fraud.

    B) Unconscious bias (not questioning an algorithm that gave you answers you liked) was also
    eliminated, both by our work and by double-blind testing. They correct both cooling and warming biases.

    C) Incompetence: the accusation was that their method misses key issues and key corrections that have a substantial influence on the result. In our first test their method was superior to ours. Yup: in an objective test they did better than we did in detecting bias and correcting it.

    D) Lack of perfection: Guilty. Send money please.

    E) Incomplete documentation: a never-ending audit.

  257. paulski0 says:

    Frank,

    So, I think it is essential to approach PHA with the utmost skepticism: Does it spread UHI to other locations? What happens if a breakpoint is caused by correction of a gradually growing bias? (Answer: one adds bias.) Can gradually growing bias be detected?

    I’m not sure why you’d approach it with any more or less skepticism than any other method or result. Are you suggesting you assign more skepticism to PHA because it slightly increases the apparent warming trend overall?

    You admit refining observational data with the expectation that refinement will eliminate discrepancies. (And strengthen the case for mitigation?)

    Victor said that refinements would move towards eliminating discrepancy with reality. Why else would adjustments be made to data in any scientific field?

    To avoid any possibility of confirmation bias, it is essential to force oneself to assume that data correction is equally likely to increase the discrepancy with models or personal expectations.

    How does this statement apply to an objective algorithm like PHA? There’s nothing built into the method which determines the magnitude or sign of any individual adjustment or the overall effect of adjustment. Fundamentally, PHA is a priori equally likely as not to increase discrepancy with models and personal expectations.
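
    The core pairwise idea is easy to sketch. This is not NOAA’s PHA code, just a made-up candidate-minus-neighbours breakpoint test (with invented numbers) to show that the sign of an adjustment comes from the data:

    ```python
    import numpy as np

    def detect_break(diff):
        """Most likely breakpoint in a difference series: the split that
        maximises the t-like score of the shift in segment means."""
        best_k, best_score = None, 0.0
        for k in range(2, len(diff) - 2):
            a, b = diff[:k], diff[k:]
            se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
            score = abs(a.mean() - b.mean()) / se
            if score > best_score:
                best_k, best_score = k, score
        return best_k, best_score

    rng = np.random.default_rng(1)
    n = 60
    neighbours = rng.normal(0, 0.1, n)             # stand-in for the regional signal
    candidate = neighbours + rng.normal(0, 0.1, n)
    candidate[30:] -= 0.5                          # e.g. a station move with a cool shift

    k, score = detect_break(candidate - neighbours)
    print(f"breakpoint at index {k} (true: 30), score {score:.1f}")
    # The adjustment is just the detected mean shift; nothing in the procedure
    # prefers warming corrections over cooling ones.
    ```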

    As it turns out the result of PHA is pretty much irrelevant to discussion of any perceived model-data discrepancy. The effect it has on the historical global average temperature is much smaller than the inter-model spread of trends. It may even often be smaller than the spread between different runs of the same model – i.e. internal variability.

    It is easier to trust BEST – whose adjustments may have been contrary to their expectations (if they allowed themselves to have any) than other groups (who appear to have an agenda).

    So again you apply more skepticism to one group than another based purely on a subjective perception of trustworthiness. I understand BEST actually found more warming than other groups. Did that data not alter your model for perceiving the nature of the agendas at work?

    It is easier to trust the combined output of UAH and RSS – because both teams are aware that every choice they make will be carefully scrutinised by others looking for bias.

    I’m struggling to believe you wrote this with a straight face. The surface temperature datasets have a good claim to being among the most scrutinized records in scientific history.

    Models predict very different amount of warming from GHGs and cooling from aerosols, but all assign roughly 100% of current warming.

    Not even close to true. Here’s a plot of individual models against an obs dataset (NOAA). I’ve clipped to 60S-60N as a simple way to minimise coverage and sea ice issues, and used model SAT+SST. It was made in 2014 so I guess it must be using ERSSTv3b, and I didn’t add on the projection periods so models stop in 2005. Extrapolating forwards a bit, the model spread up to present is around 0.5-1.5K: a factor of three difference.

  258. izen says:

    It is clear that those with some direct knowledge of the subject would regard the new NOAA record, post K15, as more accurate than the old.

    But is there anyone who thinks that the OLD NOAA record was a BETTER match to reality ?
    Can there be any justification for reverting back to the previous, pre-K15 version because of the doubts raised ?

    And what difference would it make if we did?

  259. angech says:

    paulski0 says: February 21, 2017 at 4:06 pm
    “There is one true globe. It is the whole globe (and nothing but the globe).”
    We are limited in our knowledge of said globe and we choose to represent it by what we have available at the time. A poor example would be that 600 years ago most global representations did not have Australia on them. HadCRUT is and was a representation of the known globe.

    You assert that “HadCRUT4 is then taken to be a normal distribution effectively centered on the result of infilling with hemispherical average in empty cells.” This is interesting.

    In which case you and Steven and ATTP would be right. But in this case all three of you would have already pointed this out. So it cannot be right.

    My understanding is that HADCRUT is not a full hemispheric data set. It is infilled to a certain latitude only, and parts of the Arctic and Antarctic are not uncovered cells but excluded cells.
    There are uncovered cells in parts of the Arctic and Antarctic and Africa within their latitude range, due to poorly observed areas, which are “infilled”.
    Could you clarify this?
    Please.

    Steven Mosher says: “One simple way I’ve explained it to people willing to understand is this. If you want to claim that HadCRUT is unbiased, then you are asserting that the true values in the blank regions have rates of warming that are precisely equal to the average of all the covered regions. In other words, you’re asserting something highly unlikely.”
    Except HADCRUT say that they infill by using the global average, hence it is very likely. Why is this so hard to understand?

  260. Steven Mosher says:

    “Except HADCRUT say that they infill by using the global average, hence it is very likely. Why is this so hard to understand?”

    It is highly unlikely and near impossible that the unpredicted locations have the exact same value as the global mean.

    In fact none of the observed locations have values that are exactly equal to the global mean.

    The warming rate for the observed locations is for example 0.156743 C per decade

    For HadCRUT to have no bias, the unobserved locations have to have warming rates exactly equal to this global average. If the true warming rate differs, then HadCRUT will be biased high or biased low. But it will, of necessity and in strictly precise terms, be biased.

    That the guys who do the series agree should give you pause.

  261. paulski0 says:

    angech,

    We are limited in our knowledge of said globe and we choose to represent it by what we have available at the time. A poor example would be that 600 years ago most global representations did not have Australia on them. HadCRUT is and was a representation of the known globe.

    Not really relevant to today though, is it? Unless you have better information I’m going to assume the HadCRUT4 team are not deniers of the existence of the Arctic.

    A large part of science is about precisely defining terms. The globe with Australia is different from the globe without Australia even if you refer to both as “the globe”. If you produced a “global” temperature record without including Australia, insisting that it would have been accurate 600 years ago, that would not really be a problem scientifically – you’ve defined your terms, that’s the main thing. However, that “global” record would not be the same thing as a global record which did include Australia, and therefore would not be truly directly comparable.

    It’s true that the word “global” in practice is used to refer to different spatial configurations. For example most “global” altimeter sea level records clip to 60S-60N, RSS “global” TLT clips to 70S-82.5N. I’m not actually aware whether or not these records account for the uncovered regions in any way. I suspect probably not. Again, that is not really an issue since they have defined the terms of their coverage. Analysis of these datasets should then account for those specific coverage terms.

    In the case of HadCRUT4 I’m pretty sure it is intended to be an estimate of the whole globe: 90S-90N, 180W-180E. I can infer this from the nature of their large coverage uncertainty estimate. If it weren’t, and was simply using the word “global” to refer to a near-global area of grid cells containing collected measurements, that would still not be justification for directly comparing to an actual global record as if it were the same thing.

  262. Steven Mosher says:

    “It is clear that those with some direct knowledge of the subject would regard the new NOAA record, post K15, as more accurate than the old.”

    The hilarious thing is that in due course K15 data will find its way into a CDR (Climate Data Record).

    Then what?

    I don’t believe some folks have fully thought through what it means to finally have power.

    The FOIA shoe is on the other foot.

  263. angech says:

    Paulskio,
    Thanks for trying
    “I’m going to assume the HadCRUT4 team are not deniers of the existence of the Arctic.
    In the case of HadCRUT4 I’m pretty sure it is intended to be an estimate of the whole globe: 90S-90N, 180W-180E.”
    So not definite?
    Mosher “In the case of Hadcrut this happens because they dont predict the arctic”
    ATTP ” It doesn’t sample the polar regions,”
    Gerg’s net states “In HadCRUT, the data holes are just excluded from the average, resulting in slightly lower recent warming estimates than some other series, because of exclusion of the rapidly warming arctic.”
    HADCRUT, the map made up of the observations HADCRUT is happy to use, is defined on geographic latitudes 87.5°S to 87.5°N.
    Still happy to agree with you if you can prove this is wrong.
    ” I can infer this from the nature of their large coverage uncertainty estimate.”
    Or it could just be the amount of uncovered area infilled in their data set.

  264. Steven Mosher says:

    “‘In the case of HadCRUT4 I’m pretty sure it is intended to be an estimate of the whole globe: 90S-90N, 180W-180E.’ So not definite?”

    Yes definite.

    Look, when the producers of the data themselves say that they have a cool bias, you are kinda fighting a losing battle. When they improve the Arctic coverage and say this leads to a warmer series, you are fighting a losing battle.

  265. angech says:

    “Fail. Again, angech, you’ve proven that engaging with you is just a waste of time. You refuse to read, refuse to understand, and refuse to make a cogent case.” And here?

    Steven Mosher says:
    “It is highly unlikely and near impossible that the unpredicted locations have the exact same value as the global mean. In fact none of the observed locations have values that are exactly equal to the global mean.”

    So true. How can they?
    When you take the value down to 6 decimal figures??
    As in,
    “The warming rate for the observed locations is for example 0.156743 C per decade”.
    This is not an argument.
    This is not a cogent case.
    Cogent reasons?
    No location has temperature instruments reading to anywhere near this degree of accuracy, so it should be rounded down to a reality commensurate with the thermometer ranges.
    To argue that one site out of 5,000 real sites, or 40,000, should have a one-in-a-million agreement is ludicrous.
    If you reduced the warming rate to 0.16 C per decade, a figure most people here would be happy to contemplate, there would be hundreds of stations matching that figure and you know it.
    I am impressed that you had to go to 6 figures to avoid a match though!
    Feynman would say this shows you are aware that at least 1 and maybe 2 sites did show a 5-figure match. Imagine stations getting to 0.15674.
    I understand and agree with the rest of your comment.
    The fact remains, though, that without other stations with data all you can do is assert that coverage bias should be present but never prove it.
    Without other stations, HADCRUT is perfectly entitled to put in their global average temperature, which does not introduce any known coverage bias into the average they give for the map of the world that they actually use.
    Not the real world; no one knows that. The real map they use as a world temperature map, with their qualifiers.

  266. angech says:

    Steven, it is 2 pm over here. Do you ever get any sleep?
    Sorry to be so aggressive in my replies.
    Yes, the full-world series are warmer than those without full-world coverage.

  267. Willard says:

    I think you said your piece, Doc & Mosh. Time to give it a rest.

  268. Steven Mosher says:

    Thanks willard.
    Doc… sleep is for normies.

  269. angech says:

    ok see you both 2 blogs up perhaps

  270. Frank says:

    Victor: The fact that physicists eventually found the correct mass/charge ratio despite their ignorance of confirmation bias isn’t a good excuse for ignoring it today. If the amount of 20th-century warming is going to be increased to 1.5 K or decreased to 0.5 K as observations are scrutinized more carefully, we want to know about it now, not gradually creep towards the right answer over the next decade or two. The same goes for satellite measurements of the troposphere. In another area, it took many years and an outsider (Nic Lewis) for the IPCC to recognize that the central estimates for energy balance models and AOGCMs (currently) disagree. Now maybe we can learn something about the cause(s) of the discrepancy. Some, but not all, models show a low apparent climate sensitivity in early decades and higher sensitivity later.

    Science is held back by confirmation bias. If you haven’t noticed, there is a crisis in the credibility of science – within the scientific community. Ioannidis has published extensively on this subject, including his infamous “Why Most Published Research Findings Are False”. After challenges to the reliability of key papers on the role played (or not played) by various oncogenes, a systematic effort is being made to reproduce the results of about 50 key papers. “Data mining” without appropriate controls led to many false linkages between mutations and diseases. IMO (but probably not yours), the hockey stick is another example. Science is both your and my profession, and I care about its credibility – which is essential for all scientists – not just climate scientists – when dealing with political leaders and the public.

    As for WUWT, I don’t believe anything I read there without checking the source or the math/physics. Lots of errors in the latter. I’ve wasted time twice double-checking articles by Rose; he has no credibility with me. Don’t know much about Salby. I’m a little disappointed at Curry’s post on Bates. She didn’t make it clear whether the issue was purely procedural or involved a “thumb on the scales”. Most of her scientific posts seem reasonable, but she is mostly interested in policy. I’ve found Scienceofdoom, ClimateAudit and the Blackboard to be reliable when I’ve checked things I thought might be wrong. (RealClimate and SKS deleted my questions.)

    Victor wrote: “You forgot to reply to the fact that you showed much larger signs of a confirmation bias by only blaming models for any discrepancies and ignoring the possibility that the observational estimate or the comparison method is wrong.”

    You didn’t understand my answer. Scientists are SUPPOSED to be skeptical of hypotheses and models until they have withstood repeated testing. We aren’t supposed to let our biases about them interfere with properly testing them. Sure, part of the testing process is carefully scrutinizing and refining the observations used to test a model or hypothesis. Obviously the comparison must be done right. In some of my past work, I’ve had the luxury of working with data that is fairly reliable and with experiments that can be repeated. So yes, I probably place too much faith in existing climate observations. But I never forget we are often dealing with a few tenths of a degC over decades. For that reason, the Manabe PNAS paper I cited above is a favorite, because it is based on the 3.5 K seasonal cycle and 10+ W/m2 changes in flux – and the experiment is replicated every year! That’s reliable data. If (when) a new value for ocean heat uptake (from ARGO) is used to update CERES-EBAF results, the results in this paper won’t change appreciably. In contrast, climate model output doesn’t show lengthy pauses in warming driven by the current rise in forcing seen in observations. Then along comes Karl15 and the pause is shortened by 5 years. Whether Karl15 is right or wrong (and Zeke’s paper says right), it is difficult to have any confidence in discrepancies or agreement, a problem that is amplified by concerns about confirmation bias.

    As for knowledge of PHA, I’ll refer you to my comments when Zeke’s 2013 UHI/PHA paper was discussed at the Blackboard. I didn’t receive a reply to the issues I raised then or since:

    http://rankexploits.com/musings/2013/uhi-paper-finally-published-in-jgr/#comment-110147

    Victor asked: “Do you agree that models, observations and comparisons can all be wrong and that we should investigate problems in all 3 of them?”

    Only if those investigations are carried out without the expectation that the investigation will eliminate discrepancies. Otherwise confirmation bias can lead investigators to make choices that improperly eliminate discrepancies and validate incorrect theories and models.

  271. Frank,

    “Why Most Published Research Findings Are False”.

    FWIW, I think Ioannidis’s article is rubbish. Anyone who can write the above doesn’t – IMO – understand how research/science is done. Everything is probably wrong at some level. The important point is how the research we do influences our understanding of the system being studied. Some studies are wrong, but useful; some studies are wrong in some respects but still advance our understanding; some studies are almost right but just can’t capture all the complexity. Scientific research isn’t really about being right, it’s about developing understanding.

    In another area, it took many years and an outsider (Nic Lewis) for the IPCC to recognize that the central estimates for energy balance models and AOGCMs (currently) disagree.

    I don’t think this is true. As I understand it, this was known before Nic Lewis started publishing.

  272. Frank says:

    Steve Mosher: Thank you for your lengthy reply (February 21, 2017 at 10:07 pm). I don’t disagree with most of what you wrote about Karl15, Bates, BEST or the increased transparency of other groups. FWIW, I admire your contributions. My serious issue is with BEST’s splitting records at breakpoints and others correcting them with PHA.

    After asking about the spread of UHI, I came across Zeke’s UHI/PHA (2013) paper that showed PHA removed much of the difference between urban and rural stations. In the Supplemental Material, you can see that PHA added ca. 10% to the 1895-2010 Tmin trend of TOB-adjusted rural stations, while reducing the urban-rural difference. For 1960-2010, that increase in trend is negligible.

    However, PHA doubles the trend in Tmax for the full period and increases it by 50% for the later period. This adds a substantial 0.3 K to overall warming in Tmax. What is going on? In theory, UHI affects Tmin more than Tmax. What kind of station move or change could be responsible for a massive shift in Tmax, but not Tmin? (Perhaps undocumented TOB is responsible, but more stations were moving to morning readings, which I assume would reduce Tmax and Tmin.)

    One possibility is that the albedo of the station screen gradually decreases with time (from dirt and surface aging) and/or station ventilation gradually becomes less effective (leaves, spiders, etc.). On calm sunny days, there would be a warm bias. When the screen was cleaned or repaired, unbiased measuring conditions would be restored. PHA would likely eliminate this return to unbiased conditions – thereby introducing bias.

    BEST would split the record: keeping the biased trend and discarding the correction.

    From the larger perspective, by correcting or splitting an undocumented breakpoint, you are making a hypothesis about what caused that breakpoint – a permanent shift to new measuring conditions. However, you have no way of knowing what really happened: a shift to new conditions or restoration of earlier unbiased conditions. Detecting a gradually increasing bias in trend is far more difficult than finding a breakpoint at the end of it. (Menne explored looking for biased trends, but found many false positives.) My scenario appears to be a possibility. A station move away from growing biases to a nearby unbiased site is another possibility. (A station move from a city to a large city park can eliminate bias from the full record – if the full record started at an unbiased site.) Shadows from a growing tree will produce a gradually increasing cold bias. Correcting for documented TOB or instrument changes using a validated method is good science. Correcting artifacts without understanding their origin is risky.
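
    The screen-soiling scenario can be simulated with toy numbers. A minimal sketch, assuming a bias that grows 0.02 C/yr and is reset by cleaning every decade, and a crude splice rather than any particular published algorithm:

    ```python
    import numpy as np

    years = np.arange(1950, 2011)
    true_temp = 0.01 * (years - years[0])      # true trend: 0.10 C/decade
    drift = 0.02 * ((years - years[0]) % 10)   # bias grows, reset by cleaning each decade
    measured = true_temp + drift

    # Naive breakpoint "correction": treat every cleaning (the downward jump)
    # as a station change and splice the later segment back up.
    resets = np.where(np.diff(drift) < 0)[0] + 1
    corrected = measured.copy()
    for r in resets:
        corrected[r:] += corrected[r - 1] - corrected[r]

    trend = lambda y: np.polyfit(years, y, 1)[0] * 10
    print(f"true trend:        {trend(true_temp):.2f} C/decade")
    print(f"raw measured:      {trend(measured):.2f} C/decade")
    print(f"break-'corrected': {trend(corrected):.2f} C/decade")
    ```

    In this toy case the raw sawtooth averages out to roughly the true trend, while removing the resets locks the drift in and roughly triples the trend. Whether real algorithms do this depends on how they handle such sawtooth histories.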

    Many of these bias scenarios require calm conditions. If you have daily data, consider looking for differences between windy and calm days, perhaps near USCRN stations.

  273. Marco says:

    FWIW, *I* think Ioannidis’s article is just not understood by many who cite it as some kind of justification for their own ‘skepticism’. There is a problem in some areas of science where “statistically significant” differences are too often an implicit requirement to get your work published, or the necessity to find something “new” (like a new oncogene – finding the same isn’t interesting enough). The above significantly increases the chances of confirmation bias contributing to your findings. Add on top of this the ‘human’ aspect in much of that work. I have used the example here before, a long time ago, that laboratory animals react differently when you have a male vs a female handler.

    A lot of these problems are in my view much less of an issue in the physical and chemical sciences.

    Frank, maybe this blog helps you:
    http://berkeleyearth.org/understanding-adjustments-temperature-data/

  274. JCH says:

    “Why Most Published Research Findings Are False”

    Oh boy, I bet skeptics vetted this beauty with massive amounts of auditing.

    For whatever it is worth, a large number of the “false” findings in one of the big fields covered by the above statement have been redone, and they are no longer false.

    And try getting that through to the Judith Curry faithful.

  275. Joshua says:

    Anders –

    =={ FWIW, I think Ioannidis’s article is rubbish. }==

    Marco’s point above about misuse of Ioannidis aside, could you briefly explain why you think his article is “rubbish”?

  276. Joshua,
    It’s possible I’ve done what Marco suggested. I’ll have another look at it, but – IIRC – I was put off by the suggestion that almost half are false, which made me think he didn’t really get the basics of fundamental research.

    Apologies, I read this, which I thought was rubbish – mostly because it assumes that it can apply to “most scientific papers” rather than – as Marco says – to areas where you need to set up some kind of study design and then apply some kind of statistical test to – hopefully (from the researcher’s perspective) – produce a significant result.

    Ioannidis’s article might actually be quite clever – I read, I think, something similar recently, but can’t find it now. If I do I will post it. As I understand it, it mainly applies to the idea that p=0.05 is some kind of boundary between right and wrong (which is not the case).

  278. Joshua says:

    I think that both of these articles might help show why chuckling at “skeptics” and taking Ioannidis’ work seriously are not mutually exclusive.

    https://www.google.com/amp/scienceblogs.com/insolence/2007/09/24/the-cranks-pile-on-john-ioannidis-work-o/amp/?client=ms-android-verizon

    https://www.painscience.com/articles/ioannidis.php

    I think this is the article I read recently – it’s time for science to abandon the term “statistically significant”.

    This bit I found interesting

    But the dichotomy between ‘significant’ and ‘not significant’ is absurd. There’s obviously very little difference between the implication of a p-value of 4.7 per cent and of 5.3 per cent, yet the former has come to be regarded as success and the latter as failure. And ‘success’ will get your work published, even in the most prestigious journals. That’s bad enough, but the real killer is that, if you observe a ‘just significant’ result, say P = 0.047 (4.7 per cent) in a single test, and claim to have made a discovery, the chance that you are wrong is at least 26 per cent, and could easily be more than 80 per cent. How can this be so?
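
    A quick simulation shows how this can happen. The prior fraction of true effects and the study power below are illustrative assumptions I’ve picked, in the spirit of the quoted argument:

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_tests, n, prior = 200_000, 16, 0.1   # 10% of tested hypotheses are truly non-null

    real = rng.random(n_tests) < prior
    means = np.where(real, 1.0, 0.0)       # modest true effect, sd = 2 (moderate power)
    data = rng.normal(means[:, None], 2.0, size=(n_tests, n))
    t = data.mean(1) / (data.std(1, ddof=1) / np.sqrt(n))
    p = 2 * stats.t.sf(np.abs(t), df=n - 1)

    just_sig = (p > 0.045) & (p < 0.05)    # the "just significant" band
    print(f"false positives among p ~ 0.047: {np.mean(~real[just_sig]):.0%}")
    ```

    With these assumptions well over half of the “just significant” results are false positives, even though every one of them passed p < 0.05.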

    Apologies to John Ioannidis, I don’t think his article is rubbish, I think how it is sometimes used/interpreted is rubbish.

  280. Joshua,
    Thanks, those articles are very good. This is very close to my view

    What Ioannidis really says is much less ominous: he argues that it should take rather a lot of good quality and convergent scientific evidence before we can be reasonably sure of a “scientific fact,” and he presents good (scientific!) evidence that a lot of so-called conclusions are premature, not as ready for prime time as we would hope.

    The work of Ioannidis is very important and helpful. The problem is the over-interpretation of his work and its thoughtless extension to other sciences. In the case of a clinical study, all you have are observations. Humans are very complicated, so there are nearly no theoretical constraints on the outcomes, especially on how strong the effect is (and how strong side effects are).

    This is very different from physics where theory binds all the lines of evidence together.

    That is also why I think Frank’s idea of how research should be done is counterproductive. If you pretend your measurements are perfect and, without any evidence, always put all the blame on models (our understanding of the climate system), you are ignoring all the understanding of the problem you have. You are ignoring what binds everything together: the spatial and temporal patterns, the relationships between the atmospheric variables, the relationships between atmosphere, ocean, ice and vegetation.

    If there is one way to produce biases it would be to make observations without considering theory or to do theory without considering observations.
    http://variable-variability.blogspot.com/2016/08/naive-empiricism-and-what-theory.html

    Frank, here is a recent podcast on climate science, interviewing a climate modeller. The host summarised the work of the modeller, I paraphrase: so your working assumption is that your model is wrong. (Also recommended for others who want to understand the mindset of a modeller better.)
    http://forecastpod.org/index.php/2017/02/21/gabe-vecchi/

    I would prefer not to keep on getting into other debates before we agree on this one, but let me just tell you, Frank, that your example of all stations having a bias due to, for example, soiling of the screen was introduced by me on WUWT when I was sick and tired of all the nonsense they were telling about homogenisation algorithms. This is at least a theoretical possibility where homogenisation algorithms would go wrong and where, when you homogenise fully automatically, you would produce the wrong results.

    When homogenising smaller datasets by hand you would naturally see this happening in the difference time series, unless you want to make the unreasonable assumption that every single observer does not take care of cleaning and painting their screens and that this produces exactly the same artificial trend at every station. Without these additional assumptions you would see a lot of gradual trends in the difference series; we do see some, mostly related to urbanisation, but not that many, not that big, not that short.

    Furthermore, regulations of the weather services naturally specify that the screens need to be cleaned, painted and replaced regularly. I cannot guarantee this for every country on Earth, but at least in Europe this is done well before the error becomes large enough and is present long enough that we could detect it as an inhomogeneity. It would thus “just” add to the noise.

  282. Willard says:

    Thanks, Marco.

    Now I feel like a laboratory animal.

  283. Frank: “However, PHA doubles the trend [for conterminous USA] in Tmax for the full period and increases it by 50% for the later period. This adds a substantial 0.3 K to overall warming in Tmax. What is going on?”

    The introduction of automatic weather stations in the USA would fit:
    http://variable-variability.blogspot.com/2016/01/transition-automatic-weather-stations-parallel-measurements-ISTI-POST.html

    Frank: “Detecting a gradually increasing bias in trend is far more difficult than finding a breakpoint at the end of it.”

    In some sciences people only homogenise/segment their series based on one series. In that case you cannot remove the gradual inhomogeneities and can only see large break inhomogeneities.

    In climatology we compare one station to its neighbours. Then you can also see it if one station gradually becomes different from its neighbours. What counts in that case is the variance of the break/inhomogeneity signal relative to the noise variance of the difference time series. Whether the inhomogeneities are gradual or abrupt is not important for detection. In the (blind) validation studies we also included gradual inhomogeneities.
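
    A small sketch of why the difference series helps (toy numbers of my own, not any real station pair):

    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    n = 40
    regional = np.cumsum(rng.normal(0.02, 0.15, n))   # shared regional signal (toy random walk)
    station = regional + rng.normal(0, 0.05, n) + 0.01 * np.arange(n)  # plus gradual local drift

    diff = station - regional                          # the shared signal cancels out
    drift = np.polyfit(np.arange(n), diff, 1)[0] * 10
    print(f"regional variability (std): {regional.std():.2f}")
    print(f"drift visible in the difference series: {drift:.2f} per decade")
    ```

    In the raw station series the 0.1-per-decade drift is buried in the regional variability; in the difference series it is the dominant feature, whether it is gradual or abrupt.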

    I would expect that Berkeley Earth does not remove gradual inhomogeneities effectively: it would detect breaks and scalpel such series, but the way they “correct” does not remove the different warming rates of these segments, if I understand their paper right. It does downweight such segments for not fitting well to the regional climate signal, but the gradual inhomogeneity will be small compared to the noise variance and thus the downweighting will be limited.

    That Berkeley Earth still gets basically the same temperature signal is another indication that gradual inhomogeneities are not that important for the global mean temperature. Regionally, for example in China due to strong urbanization, they can be important.

  284. Steven Mosher says:

    For Clive.
    The field is not necessarily smooth as you imply.
    For example if you use 1 km elevation data the field varies with elevation. We did this a while back for Google.

    Look at the equation. T is a function of latitude, elevation and season, and the residual is kriged.

  285. Frank says:

    ATTP and Victor: I agree with you that John Ioannidis’s article is misused by some. It was published in a medical journal and applies most clearly to that field. However, I cited it among several other examples of how there is generally less confidence in science (based on what I read in Nature and Science). I think it is fair to use in that context. In the case of climate science, one of the bigger dangers (besides confirmation bias) is the use of the terms “likely”, “very likely”, and worst of all, “more likely than not”, the fact that the use of these terms is negotiated with politicians, plus the fact that the public and policymakers may not understand that the normal standard for scientific publications is statistically significant or “virtually certain”. The work criticized by Ioannidis meets the “virtually certain” standard – and it turns out to be far from certain for a variety of complicated reasons. In the SPM for AR5 WG1, a text search found the term “virtually certain” (in a positive or negative sense) only seven times. Most statements are judged to be only “likely”, meaning dozens of them probably are wrong (if one approaches the situation from Ioannidis’ Bayesian perspective).

    Policymakers deserve access to information that doesn’t meet the traditional scientific standard for publication. Nevertheless, I fear they either don’t understand or ignore the difference between likely and very likely and believe what they want to believe.

    Victor: I read the linked post with the Darwin quote. Unfortunately, it is from a time before the problem of confirmation bias was recognized and before the biased Millikan oil drop experiments. As Feynman says, we have learned how not to fool ourselves in this way – or we are supposed to have (which is one reason why he gave his lecture on Cargo Cult Science). Nevertheless, many early climate reconstructions failed to provide well-defined criteria for what sites should be analyzed and used dubious “ex post” tests for deciding which sites had proxies that behaved like thermometers.

    I’m glad you recognize the possibility that a gradual decrease in screen albedo could be handled incorrectly by PHA. I hope you remember that deteriorating ventilation, increasing shadows and a station move away from gradual urbanization are also phenomena that could be handled improperly by PHA. Most of all, I hope you remember that turbulent mixing should eliminate or minimize these issues. Daily or hourly temperature and wind data might provide a way to get a handle on whether this problem exists – to test the hypothesis – my version of how science is supposed to be done. (I wish I had the experience needed to do so.) Without evidence, we are both speculating about breakpoint corrections that add about 0.2? degC to global warming.

    FWIW, When you asked me which papers I remembered, I did a quick review at Google Scholar and found your paper, and two of Menne’s that I remembered. However I didn’t remember Zeke’s at all. When I ran into difficulty fully understanding it, I checked the discussion at the Blackboard and was reminded about what I understood back in 2013. One of Zeke’s answers pointed to the Supplementary Material I had never seen before and cited above. I’ve been asking about these possibilities without acknowledgement for a long time. Finally someone has been gracious enough to listen.

  286. Marco says:

    No, Frank, Ioannidis isn’t talking about “virtually certain” but “extremely likely” in the terminology that Moss & Schneider explained in 2000, and is not “negotiated with politicians”.

    I’ll also repeat my earlier comment: he mainly criticizes those research areas where finding “statistically significant” results is almost a necessity to get anything published, and where you often have multiple confounding hypotheses you could (should) test.

    And, no, Frank, your summary of my comments does not honestly represent what I wrote. I will assume you have an extreme case of confirmation bias or are not interested in the truth, like most of your peers in the mitigation-sceptical movement. Maybe it is about time to ask why you are actually against mitigation; that may be more productive than this proxy war about climate science.

  288. Frank says:

    Marco: Thank you for correcting me on the IPCC’s terminology. Statistically significant is equivalent to the IPCC’s “extremely likely”.

  289. Frank says:

    Victor: Why do you refer to me as a “mitigation skeptic”? Mitigation is about politics; I’m a scientist. I don’t have any special insight into politics. Perhaps Congress should make a grand compromise: institute a carbon tax, use the revenue and political capital to eliminate all the existing uneconomic subsidies and regulations, fix the tax code, repair infrastructure and balance the budget. If climate sensitivity is high, a carbon tax will be more valuable than current policies. If it isn’t, a “consumption tax” that addresses many other problems is OK with me. (As I said, I don’t have any special insight into politics.) So, let’s stop pretending my comments are politically motivated. For me, it’s all about how science is supposed to be done. I’m not supposed to let my preconceptions bias my analysis of the data. I started with the expectation that conventional wisdom about climate was correct – until I read David Archer’s review of AIT at RealClimate. “Correlation is not causation” is a critical principle of science, so Archer’s dismissal of Gore’s misuse of ice core data was shocking. Especially in a movie targeted to schools. The danger of confirmation bias is another important aspect of science. To quote Feynman: “The easiest person to fool is yourself”. Skepticism is an essential part of science. IMO, your comments above reek of confirmation bias – but most climate “skeptics” are far worse.

    Victor wrote above: “That is also why I think Frank’s idea of how research should be done is counterproductive. If you pretend your measurements are perfect and, without any evidence, always put all the blame on models (our understanding of the climate system), you are ignoring all the understanding of the problem you have. You are ignoring what binds everything together: the spatial and temporal patterns, the relationships between the atmospheric variables, the relationships between atmosphere, ocean, ice and vegetation.”

    “If there is one way to produce biases it would be to make observations without considering theory or to do theory without considering observations.”
    http://variable-variability.blogspot.com/2016/08/naive-empiricism-and-what-theory.html

    From the blog post linked by Victor: “The graph was compute[d] by Andrew Poppick and colleagues and it looks as if the manuscript is not published yet. They model the temperature for the instrumental period based on the known human forcings — mainly increases in greenhouse gasses and aerosols (small airborne particles from combustion) — and natural forcings — volcanoes and solar variations. The blue line is the model, the grey line the temperature estimate from NASA GISS (GISTEMP). THE FIT IS ASTONISHING [Frank’s emphasis]. There are two periods, however, where the fit could be better: world war II and the first 40 to 50 years. So either the theory (this statistical model) is incomplete or the observations have problems.”

    First, I’m not pretending data is perfect. It contains random variation that can be analyzed with statistics and systematic errors that can’t be identified by statistics. Eliminating systematic biases from data can be critical, but it must be done in an unbiased manner.

    Frank, citing Cargo Cult Science: “There is also a more subtle problem. When you have put a lot of ideas together to make an elaborate theory, you want to make sure, when explaining what it fits, that those things it fits are not just the things that gave you the idea for the theory; but that the finished theory makes something else come out right, in addition.”

    Von Neumann: “With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.”

    What Victor perceives as an “astonishing fit” is nearly meaningless given all of the adjustable parameters in the model. The model hasn’t made “something else come out right” – yet. We don’t have the slightest shred of evidence why there are discrepancies.

    We DO know that climate behaves chaotically. If the WWII discrepancy could be assigned to an ENSO event, then Victor would be perfectly happy ignoring it. He is making the ASSUMPTION that a chaotic fluctuation in upwelling and downwelling could not have been responsible for the warmth in the current data for this period. We know that other forms of unforced variability exist (AMO, D/O events, etc). If temperature data is refined/tortured with the objective of removing discrepancies, we could lose track of real phenomena. Mann and company wanted to eliminate the MWP. The shenanigans they pulled made skeptics out of people like Steve Mosher. He wrote a book about it. (Now he is offended by the shenanigans of skeptics.)

    Rightly or wrongly (probably rightly), the change from ERSST3 to ERSST4 has diminished, but not eliminated, variability in the 2000s. If the re-analysis was done with the expectation of eliminating the hiatus, the chances are much greater that the change included confirmation bias. If Zeke et al. expected to prove ERSST4 superior, the same problem exists. Hopefully, none of the scientists responsible for this work were as confident as Victor appears to be that his work will eliminate the discrepancies. Unfortunately, such confidence can become a self-fulfilling prediction.

    Victor accuses me of ignoring all of the “understanding of the climate system we have”, but he is ignoring our understanding of the behavior of chaotic systems: such systems exhibit change without external or obvious cause. Reasoning about chaotic systems is dangerous.

    Rather than rely upon models, I suggest Victor look to OBSERVATIONS to support his hypothesis that there may be a problem with SSTs during WWII. We have a land record for this period. It is reasonable to temporarily assume that warming in the ocean in the first half of the 1940’s would spread over land. El Nino warming does. The two ocean datasets at Moyhu show 0.18 and 0.23 K of warming between 9/39 and 1/46 (based on linear regression), but that change depends significantly on what start and stop dates one picks. (Normal shipping and monitoring didn’t stop in Sept 1939 or resume in Sept 1945.) FWIW (which is almost nothing), the NOAA ocean record has a very suspicious bulge during the war. The four land records showed changes of -0.12 K, -0.10 K, +0.02 K, and -0.02 K during the same period. Year to year variations of 0.05 K aren’t meaningful today and the uncertainty is about twice this big around 1940. The next question is: Is a divergence between land and ocean this big in a 6+ year period unusual? If yes, then we have observational reasons to be suspicious about this data; suspicions that are INDEPENDENT from the model we wish to evaluate with this data.

    Victor, perhaps we should simply agree to disagree about this subject. My sensitivity to errors in climate data has been enhanced. Perhaps you have benefited from the discussion. Unless you explicitly request another response, I’ll try to leave you the last word.

    (Enjoyed the Vecchi podcast. Much more recognition of the limitations of models than I get from reading about models in IPCC reports. Looked up his 2007 paper with Soden on the slowdown of the Walker circulation.)

  290. JCH says:

    Mann and company wanted to eliminate the MWP. The shenanigans they pulled made skeptics out of people like Steve Mosher. He wrote a book about it. (Now he is offended by the shenanigans of skeptics.)

    Maybe SM will comment, but imo this is bad history.

  291. JCH says:

    Deming actually wrote me and backed away from his claim. But I misplaced the email.
    you buy that of course? … – SM

    Hard to do it better.

    Happy to agree to disagree. Also happy to respond, because otherwise the conversation would stop with an entire village of strawmen about my positions.

    Wonderful that we agree on the political solutions to climate change, which are mainstream in the entire world apart from the published opinion and the political establishment in the USA.

    I am happy to mirror: “The danger of confirmation bias is another important aspect of science. To quote Feynman: ‘The easiest person to fool is yourself’. Skepticism is an essential part of science. IMO, your comments above reek of confirmation bias – but most climate ‘skeptics’ are far worse.”

    Wonderful that we agree on the larger confirmation bias of the mitigation-sceptical movement, who lapped up the deceptions of Salby/Harde on CO2, David Rose on the pause, and Wegman and the auditor on proxy reconstructions. That does make me wonder why you are here complaining about scientists and not at WUWT & Co.

    For the innocent reader I would like to add that science has always been performed by humans. Those scientists who guard themselves against their biases best are the ones that make most progress, but single scientists who make errors are no problem. The scientific culture and method is there to make scientific progress with real humans doing the science.

    Let me repeat the link to my blog post, from which you cherry picked some sentences, so that any innocent reader can easily read the entire piece.
    http://variable-variability.blogspot.com/2016/08/naive-empiricism-and-what-theory.html

    The two main arguments were 1) that you cannot do science without theory. You would not know what to observe to make progress. 2) that there may well be remaining problems in the observed temperature data.

    Frank chose to cherry-pick and emphasise a side-statement that the fit of the statistical model and the observations was ASTONISHING, in isolation suggesting that this says more than my surprise at the goodness of fit, given my scepticism of the quality of climate data. I hope that readers of the full piece, which also states which kinds of data problems you cannot see this way, got the gist.

    Frank: “Von Neumann: “With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.”

    That is exactly why theory is so important. Without it you could fit any functional relationship and have any strength of the relationship. The understanding of the problem, the theory needing to fit many different problems, and the relationships between all the different components strongly limit the freedom to naively fit some curve. Understanding guards against confirmation bias. Frank’s preference to ignore our understanding of the problem opens the floodgates to more (confirmation) bias.

    Frank: “We DO know that climate behaves chaotically. If the WWII discrepancy could be assigned to an ENSO event, then Victor would be perfectly happy ignoring it.

    No. I am perfectly happy to ignore it if it is not a measurement artefact. I work on data problems.

    Frank: “He is making the ASSUMPTION that a chaotic fluctuation in upwelling and downwelling could not have been responsible for the warmth in the current data for this period.”

    No, I am not making that ASSUMPTION. I am open to the possibility that the peak, which is at least partially a measurement artefact (we already know that from studying it; there were more ERI observations, which have a known warm bias), is also partially real. We will only know this after studying it. That is how science works. We do not refrain from studying something because it makes for such a wonderful cherished discrepancy.

    That study would naturally use observations. It is only in your adversarial, biased mind that I wrote that the adjustment of the WWII observations should be based on the difference with that statistical model. I did not write that; I only warned people that observations in this period are less reliable, gave several lines of evidence for that, and suggested we should study this period better.

    One potential bias problem would be that a freak real peak which happens to be masked by a compensating data problem would receive less attention. Fortunately, experts know their data and would have known about the fast changes in observational methods in WWII even if this had been a period of strong La Niñas masked by the warm bias. You also have not only the global mean, but also the spatial patterns and the relationships with other measurements. Thus I am reasonably confident that my colleagues would have studied the potential problem anyway, though unfortunately likely with less intensity.

    I wrote about such problems in comparisons of models with observations:
    http://variable-variability.blogspot.com/2015/09/model-spread-is-not-uncertainty-nwp.html

    Frank: “Enjoyed the Vecchi podcast. Much more recognition of the limitations of models …”

    Glad you enjoyed it. The way Vecchi views his model results is the way I look at the limitations of observations. That is my job, that is what I know, that is where I can contribute, that is my passion.

  293. Frank says:

    Frank: “We DO know that climate behaves chaotically. If the WWII discrepancy could be assigned to an ENSO event, then Victor would be perfectly happy ignoring it.”

    Victor wrote: “No. I am perfectly happy to ignore it if it is not a measurement artefact. I work on data problems.”

    Frank replies: In chaotic systems, unforced internal variability is the difference between theory (external forcing) and observation. As best I can tell, there is no simple way to tell the difference between a “measurement artifact”, unforced variability, and an error in calculating the temperature response to forcing (i.e. in climate sensitivity). Likewise, there is no way to distinguish between a breakpoint caused by a permanent shift to new measurement conditions (like an undocumented TOB change) and one caused by the sudden elimination of a gradually increasing bias.

  294. That is possible if you know your data. You have more information than just the global average temperature curve.

    You can compute the difference between bucket measurements and engine room intake measurements by comparing ships making these two types of measurements that happen to be at the same spot. Or you can do experiments comparing the engine room intake temperature with various historically used buckets during a scientific ocean cruise.

    Currently it looks as if the size of the warming bias of the engine room intake is getting smaller. You can see that by comparing this type of ship measurement with buoys.
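
    As a minimal sketch of that collocation idea (Python; the pairs and numbers are invented for illustration, not a real dataset or any group’s actual procedure):

        import numpy as np

        # Hypothetical collocated pairs: ships at (nearly) the same spot and
        # time, one reporting an engine room intake (ERI) temperature and
        # one a bucket (or buoy) temperature, in degrees C.
        eri    = np.array([18.31, 22.05, 15.12, 19.87, 25.40])
        bucket = np.array([18.19, 21.97, 15.01, 19.74, 25.31])

        diffs = eri - bucket
        bias = diffs.mean()
        stderr = diffs.std(ddof=1) / np.sqrt(len(diffs))
        print(f"estimated ERI warm bias: {bias:.3f} +/- {stderr:.3f} C")

    With many such pairs the mean difference estimates the relative bias, and tracking it in moving time windows shows whether the bias is changing.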

  295. Frank says:

    Victor: IMO, there is no way to distinguish AHEAD of time between artifacts in data and unforced variability. By definition, unforced variability is the difference between theory and observation. The only things that can help us distinguish artifact from unforced variability are additional observations relevant to your data – which is presumably what you mean by “knowing your data” (and returns us to the possibility of confirmation bias).

    As an “empiricist”, I think the correct answer is to go into the field and make simultaneous measurements with all forms of historically used equipment. (The same for radiosondes and the effect of maintenance and environment on screened stations.) Then we would have some quantitative empirical measure of relative bias. With one (?) minor exception, however, no one has taken the scientific sea cruise YOU suggested.

    Karl dealt with TOB bias in a fully rigorous manner. He proved a problem existed using hourly OBSERVATIONS, he quantified the problem using half of the available sites, he proposed a method for correcting the problem, and he validated his correction methodology using the other half of his sites (including quantifying the uncertainty in the correction). That is the way science “should” be done. Skeptics who refuse to understand TOB are ignorant (or worse). However, without knowing the cause of an undocumented breakpoint (permanent shift or return to unbiased observation), similar rigor appears impossible. If breakpoint correction by PHA had lowered warming by 0.2 K, I suspect it would have received far more scrutiny than it has.
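
    The split-sample logic described here can be sketched in a few lines (Python; the synthetic data and the simple mean-shift correction are invented for illustration and are not Karl’s actual TOB method):

        import numpy as np

        rng = np.random.default_rng(42)

        # Hypothetical: 100 stations whose reference anomalies are known
        # (in the real exercise, from hourly observations); a change in
        # observing practice adds a bias of roughly +0.2 C.
        reference = rng.normal(0.0, 0.3, size=100)
        biased = reference + 0.2 + rng.normal(0.0, 0.05, size=100)

        # Estimate the correction on one half of the sites...
        train, test = slice(0, 50), slice(50, 100)
        correction = (biased[train] - reference[train]).mean()

        # ...then validate it on the held-out half, including uncertainty.
        residual = biased[test] - correction - reference[test]
        stderr = residual.std(ddof=1) / np.sqrt(residual.size)
        print(f"estimated bias: {correction:.3f} C")
        print(f"held-out residual: {residual.mean():.3f} +/- {stderr:.3f} C")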

    Why do you feel that skepticism about this subject must have political origins? In many areas of science, we can address challenging problems by repeating the experiment or doing it with better equipment or better controls. People who have the good fortune of being able to do such rigorous experiments may tend to be more skeptical of climate science, because they aren’t accustomed to the difficulties of your field. They may also feel that modelers pay far too much attention to models (instead of the “real world”), because they don’t understand how much more detailed and certain model output is than observation. The AR4 chapter on “Evaluation of Climate Models” contains 50+ pages comparing models to models, but the “real science” of comparing models to observations is covered by one paragraph referring the reader to unspecified locations in other chapters.

    Frank: “Why do you feel that skepticism about this subject must have political origins?”

    Where else does your motivation come from to anonymously state that the results are wrong before you have made yourself an expert? Why else do you write this:

    Frank: “If breakpoint correction by PHA had lowered warming by 0.2 K, I suspect it would have received far more scrutiny than it has.”

    From a scientific perspective that makes no sense. It seems to say more about you and the blogs you read than about homogenization methods and the efforts to understand how well they work.

    You claim to be a scientist. It also does not fit the scientific reality, in which scientists like to make a strong case and are thus more comfortable understating a problem. If scientists want to claim that the world is warming, they need to be sure that measurement problems were not the cause of the warming. That is what you have to show to make a strong case. Whether measurement problems may lead to an artificial cooling is something that can be studied later. While there are hundreds of studies on urbanization, artificial cooling problems are understudied:
    http://variable-variability.blogspot.de/2015/04/irrigation-paint-cooling-bias.html
    (And links therein.)

    If scientists are initially careful and, with better understanding, later dare to make stronger statements, the mitigation-sceptical movement jumps up and down that the results “always” become more “alarmist”. There is nothing science can do to pacify the unreasonable.

    Frank: “Victor: IMO, there is no way to distinguish AHEAD of time between artifacts in data and unforced variability. By definition, unforced variability is the difference between theory and observation.”

    No, that is not the definition. Even if the temperature observations were perfect, there would still be deviations for many reasons: unforced variations, forced natural variations, model uncertainties. There is also theory about unforced variability, for example theory about El Niño, the NAO and the QBO.

    Frank: “The only things that can help us distinguish artifact from unforced variability are additional observations relevant to your data – which is presumably what you mean by ‘knowing your data’ (and returns us to the possibility of confirmation bias).”

    Comparisons of measurement methods, and comparisons of stations measuring a similar climate, help estimate the impact of observational problems. For background on how homogenization works for land station data:
    http://variable-variability.blogspot.de/2012/08/statistical-homogenisation-for-dummies.html
    How we validated that the algorithms work, including gradual inhomogeneities:
    http://variable-variability.blogspot.com/2012/01/new-article-benchmarking-homogenization.html
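
    The core idea of relative homogenization can be sketched as follows (Python, with synthetic data; real algorithms such as PHA use many neighbours and more careful break statistics than this illustration):

        import numpy as np

        rng = np.random.default_rng(1)
        years = 60

        # Nearby stations share the regional climate signal.
        climate = np.cumsum(rng.normal(0.0, 0.1, size=years))
        candidate = climate + rng.normal(0.0, 0.1, size=years)
        neighbour = climate + rng.normal(0.0, 0.1, size=years)

        # An undocumented 0.5 C jump at year 30, e.g. a relocation.
        candidate[30:] += 0.5

        # The difference series removes the shared climate, exposing the break.
        diff = candidate - neighbour

        # Crude detection: the split year that maximizes the mean shift.
        shifts = [abs(diff[:k].mean() - diff[k:].mean())
                  for k in range(5, years - 5)]
        k = int(np.argmax(shifts)) + 5
        size = diff[k:].mean() - diff[:k].mean()
        print(f"detected break at year {k}, size {size:+.2f} C")

    The detected jump is then removed from the candidate series; the validation work linked above tests how well real algorithms recover inhomogeneities that were artificially inserted into benchmark data.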

    I am sorry you do not like the structure of the IPCC report. The information is still in there. At the conferences I visit, model work mostly consists of confronting models with observations, trying to understand the processes and the uncertainties.

  297. Pingback: 2017: A year in review | …and Then There's Physics
