The uncertainty on the mean

I wrote a quick post about Gavin Schmidt’s post comparing models to the satellite datasets. I thought Gavin’s post was very good, and explained the various issues really well. Steve McIntyre, however, is claiming that Schmidt’s histogram doesn’t refute Christy. This isn’t a surprise and isn’t what I was planning on discussing.

What I found more interesting was his criticism of a post Gavin wrote in 2007 (he also seems to have not got over the delayed publication of a comment in 2005). Nic Lewis also seems to think that Gavin’s 2007 post was wrong. So, I thought I’d have a look.

In discussing a paper by Douglass, Christy, Pearson & Singer, Gavin says:

the formula given defines the uncertainty on the estimate of the mean – i.e. how well we know what the average trend really is. But it only takes a moment to realise why that is irrelevant. Imagine there were 1000’s of simulations drawn from the same distribution, then our estimate of the mean trend would get sharper and sharper as N increased. However, the chances that any one realisation would be within those error bars, would become smaller and smaller.

In other words, in comparing the models and observations, Douglass et al. assumed that the uncertainty in the model trends was the uncertainty in the mean of those trends, not the uncertainty (or standard deviation) in the trends. This seems obviously wrong – as Gavin says – but Steve McIntyre and Nic Lewis appear to disagree.

The key point, though, is that we only have one realisation of the real world, which is very unlikely to match the mean of all possible realisations. With enough model realisations, however, we could produce a very accurate estimate of the mean model trend. Given, however, that the observations are very unlikely to produce a trend that matches the mean of all possible real trends, the model mean is very unlikely to match the observed trend, even if the model is a good representation of reality.
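
To make this concrete, here is a minimal sketch of the argument in Python. It is a toy, not anything taken from Douglass et al. or from Gavin’s post: the names (`rejection_rates`, `true_trend`, `spread`) and the numbers are my own assumptions, the trends are simply drawn from a Gaussian, and I use sd/sqrt(N) for the uncertainty in the mean. The “model” is perfect by construction, because the single observed trend is drawn from the same distribution as the model trends.

```python
import random
import statistics


def rejection_rates(n_models, n_trials=1000, true_trend=0.2, spread=0.1, seed=0):
    """Fraction of trials in which a perfect ensemble fails each test.

    Model trends and the single observed trend come from the same Gaussian
    (an assumption of this toy), so any failure is a false rejection.
    Returns (rate for +/- 2 standard errors, rate for +/- 2 standard deviations).
    """
    rng = random.Random(seed)
    fail_sem = 0  # observation outside ensemble mean +/- 2 standard errors
    fail_sd = 0   # observation outside ensemble mean +/- 2 standard deviations
    for _ in range(n_trials):
        models = [rng.gauss(true_trend, spread) for _ in range(n_models)]
        obs = rng.gauss(true_trend, spread)  # the one realisation of "reality"
        mean = statistics.fmean(models)
        sd = statistics.stdev(models)        # spread of the individual trends
        sem = sd / n_models ** 0.5           # uncertainty in the mean trend
        if abs(obs - mean) > 2 * sem:
            fail_sem += 1
        if abs(obs - mean) > 2 * sd:
            fail_sd += 1
    return fail_sem / n_trials, fail_sd / n_trials


if __name__ == "__main__":
    for n in (5, 20, 100, 500):
        sem_rate, sd_rate = rejection_rates(n)
        print(f"N={n:4d}  fail(2 x SE of mean)={sem_rate:.2f}  fail(2 x SD)={sd_rate:.2f}")
```

If the argument above is right, the fraction failing the ±2 standard error test should climb towards 1 as N grows, while the fraction failing the ±2 standard deviation test should stay small.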

Gavin Cawley – who is also mentioned in Steve McIntyre’s post – discusses it in more detail here saying:

It is worth noting that the statistical test used in Douglass et al. (2008) is obviously inappropriate as a perfect climate model is almost guaranteed to fail it! This is because the uncertainty is measured by the standard error of the mean, rather than the standard deviation, which falls to zero as the number of models in the ensemble goes to infinity. If we could visit parallel universes, we could construct a perfect climate model by observing the climate on those parallel Earths with identical forcings and climate physics, but which differed only in variations in initial conditions. We could perfectly characterise the remaining uncertainty by using an infinite ensemble of these parallel Earths (showing the range of outcomes that are consistent with the forcings). Clearly as the actual Earth is statistically interchangeable with any of the parallel Earths, there is no reason to expect the climate on the actual Earth to be any closer to the ensemble mean than any randomly selected parallel Earth. However, as the Douglass et al test requires the observations to lie within +/- 2 standard errors of the mean, the perfect ensemble will fail the test unless the observations exactly match the ensemble mean as the standard error is zero (because it is an infinite ensemble). Had we used +/- twice the standard deviation, on the other hand, the perfect model would be very likely to pass the test. Having a test that becomes more and more difficult to pass as the size of the ensemble grows is clearly unreasonable. The spread of the ensemble is essentially an indication of the outcomes that are consistent with the forcings, given our ignorance of the initial conditions and our best understanding of the physics. Adding members to the ensemble does not reduce this uncertainty, but it does help to characterise it.

I should probably clarify something, though. If you want to characterise the uncertainty in the model mean, then of course you would want to use the uncertainty in the mean. However, if you want to compare models and observations, you can’t use this as the uncertainty, if the observed trend is not the mean of all possible observed trends.

To do such a comparison the standard deviation of the trends would seem more appropriate. However, even this may not be quite right, because as Victor points out the model spread is not the uncertainty. Typically, what is presented is the 95% model spread (5% of the models would fall outside this range at any time if the distribution were Gaussian). However, to take into account other possible uncertainties, this is typically presented as a likely range (66%), rather than as an extremely likely range (95%). Of course, if the assumed forcings turn out to be correct, and the models are regarded as a good representation of reality, then the model spread will start to approximate the actual uncertainty.
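
To put rough numbers on the two ranges, this short sketch works out how many standard deviations either side of the mean each one corresponds to, assuming a Gaussian. The helper name `central_interval_halfwidth` is just something I made up for illustration.

```python
from statistics import NormalDist


def central_interval_halfwidth(coverage):
    """Half-width, in standard deviations, of the central interval of a
    standard Gaussian containing the given fraction of outcomes."""
    return NormalDist().inv_cdf(0.5 + coverage / 2)


print(round(central_interval_halfwidth(0.95), 2))  # 1.96
print(round(central_interval_halfwidth(0.66), 2))  # 0.95
```

So, under that Gaussian assumption, treating a range of about ±2 model standard deviations as only a 66% range is roughly equivalent to saying the actual uncertainty is about twice the model spread.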

As usual, maybe I’m missing something, but it seems that the criticism of Gavin’s 2007 post is not correct, in the sense that Gavin is quite right to point out that using the uncertainty in the mean, when comparing models and observations, is wrong. This seems like another example of people with relevant related expertise making a technical criticism of what someone else has said, without really considering the details of what is being done, and doing so in a way that makes it hard for non-experts to recognise the issue with their criticism. Feature, not bug?

Update: It seems that some are arguing that the end of my post is wrong because a paper by Santer et al. did indeed use the uncertainty on the mean when comparing models and observations. However, Santer et al. also included the uncertainty in the observed trend in their comparison, which makes it more reasonable than what was done in Douglass et al., who did not include the uncertainty in the observed trend (this post is about Gavin’s comment about Douglass et al.). However, having said that, I’m still not convinced that the Santer et al. test is sufficient to establish if models are biased, given that the distribution of the observed trends (i.e., trend plus uncertainty in trend) will not necessarily be equivalent to the distribution of all possible trends for the system being observed.
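
A rough sketch of why the extra term matters, and why I’m still not fully convinced, is below. It assumes a test statistic of the general form described for Santer et al. (the variance of the model trends divided by the number of models, plus a variance for the observed trend). The function name, the Gaussian toy trends, and the `obs_sigma` parameter are hypothetical and chosen by me for illustration. In particular, `obs_sigma` is simply a number I pick, whereas a real analysis would estimate the observed-trend uncertainty from the observed time series.

```python
import random
import statistics


def rejection_rate_with_obs_term(n_models, obs_sigma, n_trials=1000,
                                 true_trend=0.2, spread=0.1, seed=1):
    """Fraction of trials in which a perfect ensemble is rejected when the
    test includes an observed-trend variance term.

    obs_sigma is the ASSUMED uncertainty attached to the observed trend;
    spread is the true spread of possible trends in this toy setup.
    """
    rng = random.Random(seed)
    rejected = 0
    for _ in range(n_trials):
        models = [rng.gauss(true_trend, spread) for _ in range(n_models)]
        obs = rng.gauss(true_trend, spread)  # one realisation of "reality"
        model_var = statistics.variance(models)
        # Difference in trends, scaled by the combined uncertainty:
        # variance of the model mean plus the assumed observed-trend variance.
        d = (statistics.fmean(models) - obs) / (model_var / n_models + obs_sigma ** 2) ** 0.5
        if abs(d) > 2:
            rejected += 1
    return rejected / n_trials


if __name__ == "__main__":
    for n in (20, 200):
        matched = rejection_rate_with_obs_term(n, obs_sigma=0.1)
        too_small = rejection_rate_with_obs_term(n, obs_sigma=0.025)
        print(f"N={n:3d}  obs_sigma=spread: {matched:.2f}  obs_sigma=spread/4: {too_small:.2f}")
```

When `obs_sigma` equals the true spread of possible trends, a perfect model should be rejected only about as often as the nominal level suggests, however large the ensemble. If `obs_sigma` is smaller than that spread, the rejection rate of the perfect model should rise again, which is the concern in the last sentence of the update.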

I realise that there might even be more confusion. Apart from when I quoted Gavin Cawley’s comment, the Gavin I’m mentioning is Gavin Schmidt.


270 Responses to The uncertainty on the mean

  1. Maybe my tweet was too cryptic.

    I wanted to say that I can understand that a beginner confuses the standard deviation of a set of values with the error of the mean of this set expressed as a standard deviation. That can be an honest mistake and I would not be surprised if I once made it. In this light I can imagine that people without statistical background do not understand a word of the above post.

    However, when someone points out the mistake, there is no excuse whatsoever for a statistically trained person or a scientist to repeat this mistake or defend it.

  2. Steve McIntyre says:

    You are sneering without knowing the facts. Santer et al 2008, of which Schmidt was a coauthor, used exactly the same formula for the standard error of models in their t-test as Douglass et al 2008 had used. They differed from Douglass et al by including a standard error for observations. The latter point is reasonable enough, but doesn’t change the fact that they used the same formula as Douglass et al 2008 for the standard error of models. As I observed in my post, Wigley said that Schmidt’s RC post was “simply wrong”. Also, as I observed in my post, Santer wrote back to Cawley, confirming that the formula of Santer et al 2008. Zwiers peer reviewed Santer et al.

    So you’re not so much arguing against me and Lewis, but Wigley, Santer, von Storch, Zwiers.

    One of the reasons why I examined the distribution of the difference in mean – citing both Jaynes and Gelman as authorities for trying to estimate posterior distributions – was to avoid the arid pontificating of Cawley and others, who fail to provide any statistical authorities for what ought to be a fairly simple problem. If you can cite any authority for Cawley and Schmidt’s pontificating, I’d be happy to review it.

  3. Dikran Marsupial says:

    In statistics it is very important to understand the problem well enough to be able to formulate the question properly before trying to answer it. In this case, we have two means, one of the observations and one of the model runs, but that doesn’t mean the correct test of model-observation consistency is to see if the means are plausibly the same. This is where thought experiments (such as the parallel Earth ensemble) can be useful in helping to understand what the models can be reasonably expected to tell you.

    This is another example of “null ritual” statistics. There is no good reason to expect the observations to exactly match the mean of the model ensemble, so if we are unable to reject the null hypothesis that there is a difference in the model and observed means, it just tells us that we don’t have enough data to confirm what we already know – that they are different.

  4. Dikran Marsupial says:

    Steve McIntyre, do you agree that a perfect climate model ensemble (e.g. from the parallel Earths thought experiment) is essentially guaranteed to fail the Douglass et al test?

  5. Steve McIntyre says:

    You say: “Given, however, that the observations are very unlikely to produce a trend that matches the mean of all possible real trends, the model mean is very unlikely to match the observed trend, even if the model is a good representation of reality.”

    This is a well understood point among statisticians, one which Gelman has mentioned in a different context, even if it seems like a paradox to climate scientists unfamiliar with statistics. That’s why one looks at the distribution of the differences between the distributions, as I did in my post. The results are very clear. You should try reading what I wrote, before editorializing against it.

  6. Steve McIntyre says:

    I’m not arguing for the Douglass et al test. They should have had a term for standard error of observations. I’ve made the narrow point that Santer et al used the same formula for standard error of models as Douglass et al had used and that Schmidt’s excoriation of that formula, as used in Douglass et al, is unjustified, or alternatively, applies equally to Santer et al 2008, of which Schmidt was coauthor.

    Nor do I recommend that the analysis be reduced to a single frequentist t-test. I think that it is more informative to look at the distribution of differences, as I did in my article.

  7. matayaya says:

    This discussion of models vs observations reminds me of the Nightly Business Report I watch on PBS every night. They routinely bring on an analyst from one of the many businesses that sell stocks and bonds. Before the analyst begins to speak about what stocks and bonds they are recommending, the moderator shows what stocks and bonds they had recommended the last time they were on the show. The predictions were right only a little better than half the time. Even knowing that they were hardly perfect, the analyst maintains credibility and I still want to know what his predictions are now. Even when they are wrong they impart useful investment information. The more informed we are, the closer we maintain our decisions to the reality.

  8. Dikran Marsupial says:

    Steve McIntyre, it would help us to reach agreement more quickly if you were to give a direct answer to the question; that way I know your position on an issue that I see as relevant. A “yes” or a “no” would suffice, but if the answer is “no” then an explanation of the error would be appreciated.

  9. Steve,
    I don’t see what Santer, Wigley, or anyone else I haven’t mentioned has anything to do with this. The point is remarkably simple. We can clearly run a large suite of models which encompassed the range of possible initial conditions, to produce a large number of model realisations which would then allow us to determine the model mean very accurately. If we had a similarly large suite of observational realisations that also covered the range of possible initial conditions, we could compare the model mean with the mean from the observational realisations. However, we don’t have a large suite of observational [Edit: changed “model” to “observational”] realisations; we only have one. Also, this one observational realisation is extremely unlikely to match the mean of all possible observational realisations. Therefore comparing this observational realisation with the mean of the models (using the uncertainty in the mean as the uncertainty) is a test that will almost certainly fail and is therefore clearly the wrong test.

    If you agree with the above, then what did Gavin say in his 2007 post that was wrong? That’s all I was focusing on.

  10. Dikran Marsupial says:

    Steve McIntyre wrote “was to avoid the arid pontificating of Cawley and others, who fail to provide any statistical authorities for what ought to be a fairly simple problem.”

    The problem doesn’t lie in the statistics, so a statistical authority cannot help, the problem lies in understanding the operation of a GCM ensemble and what the output actually tells you. If the mean of the ensemble could be expected to converge to the observations then the Douglass et al test would be a reasonable test for consistency. However, that is not the case, and the correct test for consistency is to see if the observations are plausibly a realisation of the model physics (i.e. do the observations lie in the spread of the model runs).

    BTW, I don’t think the Santer test is identical to the Douglass et al test, but I will need to re-read the paper to remember the details.

  11. Steve,

    That’s why one looks at the distribution of the differences between the distributions, as I did in my post. The results are very clear. You should try reading what I wrote, before editorializing against it.

    This, however, still doesn’t necessarily resolve the issue, because you still only have a single realisation of reality, and even the distribution of the trends for this single realisation is unlikely to match the distribution of all possible trends.

  12. https://andthentheresphysics.wordpress.com/2016/05/10/the-uncertainty-on-the-mean/#comment-78682
    Physics: “However, we don’t have a large suite of model realisations; we only have one.”
    should probably be: “However, we don’t have a large suite of observational realisations; we only have one.”

  13. I don’t know much about statistics, but I think I get aTTP’s key point. To paraphrase.

    If we take thousands of model runs, only one of them is likely to come close to following observations. The problem is we cannot know which of the many model runs was the ‘perfect’ one until after the event. And the one that most closely traces the natural variations in the observations is very unlikely to be the mean of all the model runs. But all model runs are valid because their tortuous traces are all produced by modelling the same inherently-random natural variations that influenced the observations.

    Is that correct?

  14. john,

    But all model runs are valid because their tortuous traces are all produced by modelling the same inherently-random natural variations that influenced the observations.

    If we were confident that the model parameters were correct, then this would be a fair thing to say. However, we don’t know that (they don’t all produce the same climate sensitivity, for example). The variability in the models can therefore be because of internal variability and different model parameters.

    I think the way I would think of this is that the observations are for a single realisation and therefore are unlikely to be close to the mean of all possible realisations. Therefore comparing the model mean with the observations is unlikely to be a fair test, because even if the model were a perfect model, it would almost always fail this test.

  15. Hyperactive Hydrologist says:

    Each climate model run is an equally plausible future, assuming correct emission scenarios. Therefore the full range of the climate model ensemble should be considered when being compared to observations. The ensemble effectively encompasses the uncertainty in the climate projections.

    It should also be noted that the exact same climate model can give quite a large spread in temperature with only small changes to the initial conditions. However, the temperatures tend to converge over a longer time scale. This demonstrates the importance of natural variability in models and the fact that we are modelling a chaotic system, which contributes to the uncertainty.

  16. “Therefore comparing the model mean with the observations is unlikely to be a fair test, because even if the model were a perfect model, it would almost always fail this test.”

    I’ve actually been saying this, or a variation on this, for years. (I’m glad I got it right!) Above the mean or below the mean is irrelevant. Inside or outside the model bounds is relevant.

    One thing that bothers me about the model/obs discussions is, many seem to assume that the obs are correct and the models are wrong. That fails to recognize the challenges inherent in the observations. It seems as likely as not that models could be giving us a better picture of what is actually occurring than the observations do.

    Perhaps Spencer, Christy, Lewis, et al should be considering the possibility that they need to find their own “missing heat.”

  17. Dikran Marsupial says:

    “Above the mean or below the mean is irrelevant.” I wouldn’t go that far, but certainly from the perspective of establishing consistency, it is whether the observations are inside the model bounds or not. If the observations spend a lot of time above or below the model mean, then that may mean there is something interesting to study either in the forced response of the climate, or internal climate variability, or with the observations or all three. It is only actually irrelevant if your only interest is in falsifying the climate models, rather than understanding the science!

  18. In a genuine scientific conversation, irrelevant would be a poor choice of words. “ClimateBall” is played in a different arena altogether. 😉

  19. Steven Mosher says:

    “I’ve actually been saying this, or a variation on this, for years. (I’m glad I got it right!) Above the mean or below the mean is irrelevant. Inside or outside the model bounds is relevant.”

    Here is my simple GCM

    T= 0K

    here is another simple GCM

    T= 400K

    there now observations fall inside the bounds.

    Typically we define in advance the “bounds” that are acceptable.

    For example. If you ask me to build a GCM, the first question I will ask is.
    How closely do I have to match the absolute temperature? If you want me to get
    “ice” right and other temperature driven processes correct, then I’d ask for a
    Specification. I have to get absolute temperature correct to within what?

    1K?

    How does that sound to you? If a model can’t get the absolute temperature correct to within 1K, we should probably work on it some more before using it.

  20. > With enough model realisations […]

  21. Steven,
    I agree with what I think you’re saying; we should avoid making overly simplistic interpretations of the quality of the output from GCMs.

    If a model can’t get the absolute temperature correct to within 1K, we should probably work on it some more before using it.

    It’s an interesting question, and I don’t know the answer. However, if we’re interested in how the system responds to changes and if the response likely depends linearly on these changes, then the absolute temperature may not be that crucial – within reason.

  22. Steven Mosher says:

    “If we take thousands of model runs, only one of them is likely to come close to following observations. The problem is we cannot know which of the many model runs was the ‘perfect’ one until after the event. And the one that most closely traces the natural variations in the observations is very unlikely to be the mean of all the model runs. But all model runs are valid because their tortuous traces are all produced by modelling the same inherently-random natural variations that influenced the observations.

    Is that correct?”

    ##############

    yes and it also undermines using the mean of models as guidance.

  23. Steven Mosher says:

    “It’s an interesting question, and I don’t know the answer. However, if we’re interested in how the system responds to changes and if the response likely depends linearly on these changes, then the absolute temperature may not be that crucial – within reason.”

    I think the key is within reason.

    A while back I tried to define what I thought were ‘reasonable’ specifications.
    Then I stopped. Because I have no clue.
    Ideally, the folks building models would formalize this for themselves. document it.
    and measure their progress toward hitting their self imposed specifications.
    yes. I trust them to do their best. but I also know that you cannot improve what you dont measure
    and document.

  24. If I’m interpreting your intent correctly, aren’t you saying pretty much what Schmidt says when he states that, ‘Models are wrong, but they are instructive.’ (May not have the exact wording correct.)

    The question seems to be what the difference between the model mean and the obs tell us. Contrarians seem to believe that means models are wrong. Others are saying the difference, if they’re within the model bounds, doesn’t really tell us much about the predictive capacity of the models. I’m adding that, especially since we’re discussing satellite data in this case, the difference probably tells us even less because of the high degree of uncertainty in those observations.

  25. Rob,
    I think I largely agree. We know that there is a range of climate sensitivity for the models, so they can’t all be reliably representing long-term warming. They also, I think, can have different levels of internal variability. However, they do allow us to investigate what might happen under different future scenarios, even if we do have to accept that we can’t constrain this precisely. Given the uncertainty in the observations and that we only have one realisation of reality, we’re – however – not yet in a position to reject any of the models.

  26. Steve McIntyre says:

    You say: “BTW, I don’t think the Santer test is identical to the Douglass et al test, but I will need to re-read the paper to remember the details.” Jeez, can’t any of you read. I said that its formula for the standard error of models was identical to Douglass’ formula for the standard error of models (which is what Schmidt had ranted against). The Santer test added a term for standard error of observations, and differs from the Douglass test on that point, as I stated on several occasions.

    Schmidt appears to have recanted from his 2007 opposition to this method when he coauthored Santer et al 2008.

    As I pointed out in my blog article, the divergence is now even more substantial and the t-test fails even if one uses the incorrect formula advocated by Schmidt in 2007.

  27. Steve,
    I still fail to see what Santer et al. have to do with this. Either what Gavin said in his 2007 post is correct, or it’s not. I don’t think that whether or not he authored a later paper is remotely relevant.

    even if one uses the incorrect formula advocated by Schmidt in 2007.

    What formula did Gavin advocate in 2007? All I can see that seems relevant is

    That defines the likelihood that one realisation (i.e. the real world) is conceivably drawn from the distribution defined by the models.

    which doesn’t appear to be suggesting any kind of t-test.

  28. BBD says:

    Before this moves too far away from the physical underpinning let’s recall that natural variability in the single-instance real world overprints the forced trend in the short term. In the long term, the forced trend dominates. So very little can be inferred about sensitivity from looking at short-term climate behaviour.

  29. Steven Mosher says:

    “Dikran,
    True, it’s normal to not want to admit an error, but good scientists typically do.”

    “Steve,
    I still fail to see what Santer et al. have to do with this. Either what Gavin said in his 2007 post is correct, or it’s not. ”

    [Mod: Sorry, I’m not interested in trawling through leaked emails.]

  30. Steve McIntyre says:

    You asserted/implied that the view that Schmidt’s 2007 analysis was incorrect was associated with me and Lewis, thereby falsely personalizing it. Santer, Wigley, Zwiers all also opposed Schmidt’s analysis, with Wigley specifically stating that Schmidt’s RC article was “simply wrong”. Santer et al, 2008 has quite a bit to do with it, since it took exactly the same position on standard error of the models as had been criticized by Schmidt in 2007. Santer was aware of the irony.

    I’ve pointed this out several times, because commenters disbelieved that Santer et al 2008 had used the same formula for the standard error of models as Douglass et al. However, I can assure you that I am not mistaken on this point. As to whether this “matters”, Santer, Wigley and others thought it did.

  31. MMM says:

    Doesn’t Santer et al. 2008 actually say essentially the same thing that Gavin said in 2007? E.g., “DCPS07’s use of σSE is incorrect. While σSE is an appropriate measure of how well the multi-model mean trend can be estimated from a finite sample of model results, it is not an appropriate measure for deciding whether this trend is consistent with a single observed trend” (from http://pubs.giss.nasa.gov/docs/2008/2008_Santer_etal_1.pdf, literally 4 lines above the part that McIntyre quotes in his own post). This was then demonstrated in Santer et al. 2008: “Application of the unmodified DCPS07 test to synthetic data leads to alarmingly large rejection rates of H2 (Figure 5B; red lines). Rejection rates are a function of N. For 5% significance tests, rejection rates rise from 65 to 84% (for N = 19 and N = 100, respectively)”. Which appears to demonstrate quite nicely exactly the argument that Gavin was making (e.g., that the larger the number of model runs, the more in error you appear to be, if you are using standard errors based on the number of model runs).

    McIntyre seems to focus in on the fact that there is still a term in the equation where the variance of the model average trend is divided by the number of models – while casually noting the inclusion of the variance of the observed trend. But, in my opinion, the inclusion of the variance of the observed trend changes the whole discussion: wouldn’t that variance be equal to the variation you’d expect in many runs of a perfect model? So if you run your perfect model infinite times, it makes sense that the divisor would collapse to the variance of the observed trend… but, importantly, no further!

    The whole point Gavin was originally making was that the DCPS formulation leads to a world in which if you run your perfect model an infinite number of times, then no individual model run will be consistent with the model mean: demonstrated nicely in Santer et al. 2008. Include the variance of the observed trend (which I think would be similar to the variance derived from a single perfect model run), and then 95% of your individual runs are consistent with the 95% uncertainty bounds. Voila!

    -MMM

  32. wheelism says:

    Mmm…closure.

  33. mrooijer says:

    Some comments after looking quickly at the articles mentioned:

    (a) Douglass e.a. use sigma/sqrt(N-1) (par 2.3). But that is in itself not a t-test. Using a t-test here for two independent groups (one group with one observation) would mean that the sqrt(N-1) disappears again from the weighting of the two groups. But they did not do that. So: using this here is just wrong. It is (hence) very easy to write a simulation (I just wrote one in 10 minutes) to produce counter examples.

    (b) Santer e.a. 2008 (“Consistency of modelled and observed temperature…”) introduce as eq. 12 a modified t-test-like formula to replace the incorrect formula from Douglass e.a. So Steve McIntyre is wrong: they do not use the same formula. The Santer formula just happens to have a sqrt(N) in it in the correct place, whereas it should not have appeared in the Douglass e.a. formula at all.

    (c) Santer e.a. also show with a simulation (“synthetic data” in par 6) and the results in fig. 5 why the Douglass method is wrong.

  34. Tom Curtis says:

    Steve McIntyre: “Jeez, can’t any of you read.”
    Well evidently you cannot as you have entirely missed the point of Schmidt’s argument; not to mention ATTP’s and (in particular) Dikran Marsupial’s very clear exposition above. They do not disagree on the method of determining the error margin of the mean of model trends. Rather, they claim that considering models to be falsified if observations fall outside the error margins of the mean of model trends is an error.

  35. Tom Curtis says:

    Steven Mosher: “Typically we define in advance the “bounds” that are acceptable.”

    How about +/- 0.5% of absolute temperature (Kelvin):

    Of course, that is not entirely realistic in that the determination of the observed absolute GMST has an error margin approximately as large as the distribution of the model absolute GMST.

  36. wheelism says:

    Someday we’ll find it,
    The McMosh Connection –
    The Audits, the CRUtapes, and me.

    (…and Then There’s mod:
    Apologies, and I’m out.
    – Your Humble “B” Author)

  37. Ron Graf says:

    Steven Mosher: “Typically we define in advance the “bounds” that are acceptable.”

    Apparently that is done with GCMs during their construction and used to tune and test their workings. Once they are delivered to the IPCC, however, they are not accompanied by any certification that includes a warranty test. That would have solved a lot of grief (and political sensitivities about later rejection.)

    Allowing the protocols to be what they are now allows anyone who is skeptical to say the models are not valid and anyone who is a true believer in them to say “you can’t prove that.”

    ATTP: However, we don’t have a large suite of observational [Edit: changed “model” to “observational”] realisations; we only have one. Also, this one observational realisation is extremely unlikely to match the mean of all possible observational realisations. Therefore comparing this observational realisation with the mean of the models (using the uncertainty in the mean as the uncertainty) is a test that will almost certainly fail and is therefore clearly the wrong test.

    If we had more than one realization of reality, the model mean test criteria would be expected to match their mean with increasing precision proportionally for N realizations. The fact that we have only one but it falls outside the 95% population of the model means is not anything to crow about. BTW, if you read Steve Mc’s post there were many issues he covered, not just this.

    Steve Mosher, I’m glad to see your objectivity here.

  38. > You are sneering without knowing the facts.

    Where’s the sneering, and is sneering justified when knowing the facts?

    Speaking of reading skillz, perhaps auditors might show how it’s done:

    https://mrooijer.wordpress.com/2015/11/09/why-validation-in-principle-components-analysis/

  39. KR says:

    Note that climate is about the _statistics_ of weather – if the model ensemble is good, their SD range will encompass the observations. But their fidelity cannot be measured without comparing the _statistics_ of yearly observations. Comparing details of short term observation trajectories and from those claiming model success or (hello, Steve) failure is a basic categorical error. ATTP is quite correct, as is Gavin.

  40. Steve Mosher writes: “Typically we define in advance the “bounds” that are acceptable.”

    This may be true in engineering, but it is not true everywhere – especially at the edge of what can be done at all. Oftentimes we are happy to make any measurement at all; i.e., we have to start somewhere. Hopefully we can then refine the method, or find a better method, reduce uncertainties and eventually reach a point where a given set of expectations has a reasonable chance of being met.

  41. Ron Graf says:

    KR says:

    Comparing details of short term observation trajectories and from those claiming model success or (hello, Steve) failure is a basic categorical error. ATTP is quite correct, as is Gavin.

    So was it a mistake for the IPCC not to pre-establish a 15-year benchmark criterion for CMIP3 or CMIP5? If it was, how would you say they should correct it now? If it was scientifically correct not to pre-establish a criterion for each model or for the ensemble as a whole, what would you propose constitutes scientific validation?

    oneillsinwisconsin says:

    This may be true in engineering, but it is not true everywhere – especially at the edge of what can be done at all. Oftentimes we are happy to make any measurement at all…

    Same question to you.

  42. Ron – when we’re working in areas that are at the edges of what’s possible we simply accept the best we can do. The Wright Bros. didn’t start out with performance specifications for an F-22. If they had, they would have just shook their heads and stayed home thinking, “Sorry, this isn’t possible.”

    For mature technologies it’s easy to write specifications for what’s ‘acceptable.’ And in the commercial marketplace the consumer will tell you what is or isn’t acceptable. But for fields that are in their infancy or working on the edge of possibility, ‘acceptable’ often means that they work – period.

  43. Steven Mosher says:

    “This may be true in engineering, but it is not true everywhere – especially at the edge of what can be done at all. Oftentimes we are happy to make any measurement at all; i.e., we have to start somewhere. Hopefully we can then refine the method, or find a better method, reduce uncertainties and eventually reach a point where a given set of expectations has a reasonable chance of being met.”

    yes, I recognize that. That is one of the reasons why I stopped what I was doing.
    Still, it’s not necessarily true that one must choose NOT to establish bounds.
    The question is what can be gained by establishing bounds. So, for example.
    We have pretty good knowledge that the earth average temperature is around 15°C.
    Setting up a boundary: “we will get the temperature right to within 5K” can serve a purpose, even if its arbitrary. You’d clearly reject a model that got an answer of 0K. or would you?
    In other words saying “its not true in all cases” merely states a fact about current practice.
    My sense is that current practice would be improved by self imposed boundaries. documenting them, and working toward continuous improvement. you dont argue against that by saying
    “we currently dont do that”. You argue against that by:

    A) pointing out that it’s already being done.
    B) arguing that setting boundaries and documenting improvement will lead to worse science.

  44. Steve McIntyre says:

    I tried to comment six hours ago, but the comment is still in moderation, while numerous other comments have appeared. One more time, Santer et al 2008 rejected the argument of Schmidt’s RC post and, for the calculation of the standard error of models in the denominator of their t-test, used the same formula for standard error of models as Douglass et al 2008, while adding a term for the standard error of observations. Please do not keep insisting that I am wrong about this or that I am making some elementary mistake about it – I am not. If you think that the above comments are in error, then you don’t understand them.

    If you need further clarification on this, simply read Climategate correspondence about the topic between Jan 2008 and Oct 2008.

    In saying this, I am not saying that the calculations of Douglass et al 2008 were “right”. The formula of Santer et al 2008, as I observed at the time, was better if one were looking for a frequentist formula. The defect of Santer et al was that they used obsolete data and their results were invalid with up-to-date data. Given the recriminations at Real Climate when Courtillot used obsolete data, it was hypocritical that they shut their eyes to Santer’s use of obsolete data. Santer’s claims are even more invalid now, using up-to-date data.

  45. Steven Mosher says:

    “The Wright Bros. didn’t start out with performance specifications for an F-22. If they had, they would have just shook their heads and stayed home thinking, “Sorry, this isn’t possible.”

    Actually they did start out with a spec, unwritten of course. That spec was “the device shall fly”
    As for the F-22 spec, there were certain things that were impossible. So the spec was relaxed.
    Safe ejection throughout the full performance envelope was not possible within the cost and weight spec. Try ejecting at 800 KEAS on the deck. Simulating this was a bitch.
    In the end we looked at historical data for where people ejected and simulated battles for the aircraft state at which engagements occurred. Basically there were parts of the flight envelope where there was no safe ejection. You basically lost arms and legs.

  46. Steve McIntyre says:

    Re Willard’s comment about principal components analysis: your link was a nothingburger. Mann had claimed that the HS shape (arising from the peculiar strip bark bristlecone chronologies) was the “dominant pattern of variance”, whereas we observed that it was a lower order (PC4) pattern associated with bristlecones, about which the original authors said that the shape did not arise from temperature. The NAS panel said that these proxies should be avoided in temperature reconstructions. I would have thought that that would have put an end to further foolish discussion about their backdoor inclusion in a reconstruction as a lower order PC, but people seem to want to endlessly litigate this. Our comments on this topic were completely accurate and untouched by this silly critique. Beyond this comment, I don’t have a whole lot of interest in arguing about it with people who have not familiarized themselves with the topics. Nor do I regularly keep up with comments at blogs other than Climate Audit, so, if I don’t respond further on this issue, please do not interpret that as acquiescence.

  47. Steven Mosher says:

    “Steven Mosher: “Typically we define in advance the “bounds” that are acceptable.”

    How about +/- 0.5% of absolute temperature (Kelvin):”

    ###################

    Tom.

    I started to build a spec and then I stopped. Basically it seemed like it was way above my pay grade. Not my job. Not my area of expertise. I would end up saying something stupid or wrong. So I stopped. Hence, the point I would make is very limited.
    Its a point about the process.

    I’m pretty sure that folks do this kind of checking in an informal sense. or maybe ad hoc after the fact.

    An example would be acceptable drift during control runs.

    For attribution studies the IPCC (AR4) did limit the models they used by winnowing out those that
    had large drift during control. That it seems to me could be something to impose on all models
    “Drift shall be less than X”

  48. Steven Mosher says:

    “[Mod: Sorry, I’m not interested in trawling through leaked emails.]”

    That’s fine.

    “Dikran It would help us to reach agreement more quickly if you were to give a direct answer to the question, that way I know your position on an issue ”

    At any time did you think Santer 2008 had an error in it?

  49. Steven Mosher says:

    think of that as a Tol test. You know ask a direct yes or no question and get answer.
    goose gander kind of thing.

  50. Willard says:

    > your link was a nothingburger […] untouched by this silly critique.

    Strong arguments we have there.

    ***

    > I don’t have a whole lot of interest in arguing about it with people who have not familiarized themselves with the topics.

    That’s OK, I guess, since the author seemed more interested in something other than sneering:

    Reactions that are directly related to the statistics and R code in this draft are welcome. If you have something else to say wait until after the first revision.

    https://mrooijer.wordpress.com/2015/11/09/why-validation-in-principle-components-analysis/

    This sneering may be justified because the facts pertaining to that critique are arguably known.

  51. Marco says:

    “Given the recriminations at Real Climate when Courtillot used obsolete data”

    Ah yes, Steve McIntyre reminding us of another Climateball episode at Climate Audit, where Steve McIntyre *misrepresented* what Ray Pierrehumbert complained about. It wasn’t the use of obsolete data, it was presenting data as data that it wasn’t! And to add injury to insult, the data they actually used was not fit for the purpose.

    McIntyre also managed to twist a statement in the Note Added in Proof:
    “For the global temperature Tglobe curve cited from Jones et al. (1999) in Courtillot et al. (2007), these authors now state in their response that they had used the following data file: monthly_land_and_ocean_90S_90N_df_1901-2001mean_dat.txt We were unable to find this file even by contacting its putative author who specifically stated to us that it is not one of his files (Dr. Philip D. Jones, written communication dated Oct. 23, 2007).”

    McIntyre claims that since the data that Courtillot et al *actually* used was data from a paper on which Jones was a co-author, it was an *incorrect* claim of Jones. Clearly, anyone with basic English skills can see that Jones was asked whether he recognized the filename that Courtillot et al had given as the source. Jones said it isn’t one of his files, and at no point does McIntyre show *any* evidence that this is untrue.

    Of course, it turned out that the first response by Courtillot et al was untrue, and they had actually used a reconstruction from Briffa et al, rather than the HADCRUT reconstruction for which they had given the file name. In that sense it was maybe OK to remove this particular part of the Note Added in Proof – however, the way it was done meant that the Note Added in Proof could not indicate that the data chosen now made even less sense, because it was an April-September proxy-based reconstruction of part of the NH.

  52. Steve,

    You asserted/implied that the view that Schmidt’s 2007 analysis was incorrect was associated with me and Lewis, thereby falsely personalizing it.

    Well, because as far as I can tell, both you and Nic Lewis said it was. I’m more interested in whether or not what was said is correct, than personalising this.

    I tried to comment six hours ago, but the comment is still in moderation, while numerous other comments have appeared.

    I went to bed. My system seems to catch some things and not others.

    If you need further clarification on this, simply read Climategate correspondence about the topic between Jan 2008 and Oct 2008.

    I’d really rather not. I’m not interested in what people said ~10 years ago, I’m more interested in whether or not what I wrote in this post with respect to using the uncertainty on the mean is correct, or not. Just to be clear, the argument is simply that the observations are a single realisation and therefore comparing the means is a test that is almost certainly going to fail. I don’t really think that you’ve actually addressed that.

  53. Ron,

    if you read Steve Mc’s post there were many issues he covered, not just this.

    Indeed, but this was the bit I found interesting and hence wrote about.

  54. Steve McIntyre wrote “You say: “BTW, I don’t think the Santer test is identical to the Douglass et al test, but I will need to re-read the paper to remember the details.” Jeez, can’t any of you read.”

    Rather ironic I thought that Steve appears not to have read what I actually wrote sufficiently assiduously. I was pointing out that my memory of a paper I had last read seven years ago might not be reliable.

  55. Steven Mosher asked “At any time did you think Santer 2008 had an error in it?”

    Yes, I did. I recall having my concerns addressed, but can’t remember the details; having re-read the paper my concerns have re-appeared once more. The test in Santer et al. (If I understand it correctly) is much better than that in Douglass et al., but I don’t think it is correct as a test of consistency (as I suspect it would fail a perfect model ensemble). The test used by Schmidt in the RC articles (subject to the caveats mentioned in Victor’s excellent blog article) is the test I would use.

    BTW, you don’t need to remind me of what I wrote, I am generally happy to answer scientific questions (although being only human, I may fail in that sometimes).

  56. Steven Mosher wrote “think of that as a Tol test. You know ask a direct yes or no question and get answer. goose gander kind of thing.”

    Shame I only read that after answering, it suggests you were not interested in the answer, but just playing rhetorical games. You will note that I did answer the question, both unambiguously and with more detail than was asked for. Compare and contrast if that is the sort of thing that interests you.

  57. Steve McIntyre wrote “In saying this, I am not saying that the calculations of Douglass et al 2008 were “right”.”

    It would help the discussion make progress if you were to say whether you think it is “right” or “wrong”, and likewise whether you think the Santer et al test is “right” or “wrong”. Surely the important issue is whether Gavin’s RC articles are correct, rather than merely whether there is an inconsistency between the RC article and the Santer et al paper. If the Santer et al paper is wrong, then the RC articles would be correct to be inconsistent with it.

  58. Steven Mosher wrote “yes and it also undermines using the mean of models as guidance.”

    I agree with this, however it is worth pointing out that climatologists generally don’t use only the mean as guidance (e.g. for assessing impacts), they generally take the uncertainty into account as well (e.g. by running all of the ensemble runs through their impacts model, so that the uncertainty in the projected climate is propagated through to give the uncertainty in the impacts).

    Where they do look only at the mean it is because they are interested in only the forced response of the climate (ignoring internal climate variability), which is the relevant information for some questions.

  59. Steve Mosher wrote “For example. If you ask me to build a GCM, the first question I will ask is.
    How closely do I have to match the absolute temperature?”

    This is not a workable approach because we can’t determine the plausible effects of internal climate variability from a single observed realisation that is being forced by our activities. This means we can’t estimate, without the model we want to create, how closely we can reasonably expect the model to match the absolute temperature.

    Of course this is not an ideal situation, but it is the one forced upon us by a lack of a time machine or access to parallel universes. So the best we can do is to encode our best estimate of climate physics into a model that can make testable projections and use that as our best estimate of the likely consequences of our actions. However the real problem in deciding future action is not the reliability of the model, the socio-politico-economic issues are far more difficult to deal with (see e.g. Mike Hulme’s book “Why We Disagree About Climate Change”) and science can’t help much with that.

  60. Tom Curtis makes a very important point “They do not disagree on the method of determining the error margin of the mean of model trends. Rather, they claim that considering models to be falsified if observations fall outside the error margins of the mean of model trends is an error.”

    Testing for consistency is not a test of model skill, it is a test of whether the model is “falsified” by the observations (in the sense that the model cannot “explain” the observations). Asserting consistency is not a claim of great model skill, and I don’t recall seeing anyone claim it is, as consistency is a rather low hurdle. However, the fact that it is a low hurdle means that it is a very harsh criticism of a model if (including all relevant uncertainties) an inconsistency can be shown. That is why the Douglass et al paper was making a very strong claim, and thus needed to have a strong argument. Unfortunately (for them) they didn’t as their test for consistency was obviously false, as a perfect climate ensemble would be guaranteed to fail it.

    There are lots of ways that model skill can be measured, or the difference between observations and models, but that is not the same question as consistency. FWIW I think Gavin’s bar charts do a good job of depicting how well the observations agree with the models, taking into account the uncertainties, which is “not particularly well”. However the reason for this is not necessarily that the models are biased warm; it might also be that the models underestimate internal climate variability, or that the observations are biased cool, or some combination of the three, and research is required to look into it (and indeed is in progress).

  61. However the reason for this is not necessarily that the models are biased warm; it might also be that the models underestimate internal climate variability, or that the observations are biased cool, or some combination of the three, and research is required to look into it (and indeed is in progress).

    I think this is a key point. One can compare models and observations to see how consistent they are. Showing that they aren’t, or that there are indications that they aren’t, doesn’t immediately allow one to conclude that it is the models that are wrong. As you say, it could be the models, the observations, or some combination of both.

  62. Steve Mosher wrote “I trust them to do their best. but I also know that you cannot improve what you dont measure and document.”

    Indeed, that is what the *MIP projects achieve. I’d also say that you can’t improve anything by measuring and documenting alone (not relevant to the climate science issue, just something that tends to be forgotten in QA ;o).

  63. semyorka says:

    If the “models are running hot” why is the primary focus of these people not to produce better models then? Especially if it’s just a simple explanation, such as some cloud feedback. My experience with the creationists is that they are far more focused on raising minor issues to present as doubts and “falsifying” evolution than actually producing science. How would our friends feel that they are any different?

  64. angech says:

    So many comments;
    Dikran Marsupial says: May 10, 2016 at 4:14 pm
    “In statistics it is very important to understand the problem well enough to be able to formulate the question properly before trying to answer it.”

    Well said.

    “We have two means, one of the observations and one of the model runs, but that doesn’t mean the correct test of model-observation consistency is to see if the means are plausibly the same.”

    1, We only have one observation, so technically not a mean.
    2 We have many model runs and can do infinite combinations with them to obtain infinite means not just one. This is a very important statistical point in view of the comments re possible divergence of models others quote later on.
    3 The correct test of consistency is to show plausibility of the result being the same.
    True you can argue that there are good reasons why a model mean diverges from the reality it is supposed to be estimating. But this in no way implies that divergence is a good thing ever in proving consistency.

  65. We have many model runs and can do infinite combinations with them to obtain infinite means not just one.

    I’m not sure what you mean by this. You can clearly produce a mean trend for every simulation. You could produce a mean for ensembles of a particular model. However, if you consider the ensemble of all model runs, then there is only one mean, and if we carry a very large number of model runs, the uncertainty in this mean can become very small.

    The correct test of consistency is to show plausibility of the result being the same.

    Again, not quite sure what you mean. If you select models on the basis of their assumptions matching what happened in reality, then this may be a fair point. The point here, though, is that across the ensemble of models run we know this isn’t true. So, that the mean of the ensemble of model runs doesn’t match what happened in reality isn’t necessarily an indication of some kind of major inconsistency. We clearly want to understand differences, and it may be an indication of some kind of inconsistency, but jumping to that conclusion would be wrong.

  66. angech wrote 1, We only have one observation, so technically not a mean.

    We have one observable tropical troposphere, but we have multiple observations of it (satellites, radiosondes etc.), each of which have multiple competing products (e.g. UAH and RSS and others for satellite data), each of which have their own biases and issues, so taking a mean and a variance gives an indication of what they say and the structural uncertainties.

    2 We have many model runs and can do infinite combinations with them to obtain infinite means not just one. This is a very important statistical point in view of the comments re possible divergence of models others quote later on.

    No, this is not correct, there are only a few reasonable methods for constructing an average (ensemble mean and mean-of-means being the most obvious). I don’t think anyone uses anything else, do they?

    3 The correct test of consistency is to show plausibility of the result being the same.

    No, that is not the correct test, because if you understand the operation of a climate model ensemble, then there is no good reason to expect the observations to exactly match the ensemble mean. As I discussed above, a perfect model ensemble would be guaranteed to fail that test.

  67. angech says:

    Basically ATTP is defending an argument that any run of models can produce a result that is different from the expected mean.
    The more model runs the more likely that any particular model mean will deviate from the expected mean.
    This is true and is an expected outcome in statistics.
    It would be much more remarkable if any model or model mean actually faithfully [correctly] modeled the single observation in this case or the exact model mean.

    Gavin said “the formula given defines the uncertainty on the estimate of the mean – i.e. how well we know what the average trend really is. But it only takes a moment to realise why that is irrelevant. Imagine there were 1000’s of simulations drawn from the same distribution, then our estimate of the mean trend would get sharper and sharper as N increased. However, the chances that any one realization would be within those error bars, would become smaller and smaller.”

    In practice the probability would remain exactly the same. Statistically the probability of the particular run falling within range of the observation or expected mean can be described by standard normal distribution.
    Thus it is far more likely that any one model run will fall within one standard deviation [68.3%] and 95.4% fall within 2 standard deviations. There should in fact be 50% of the possible distributions below the actual observation in most model runs.
    Perhaps the inability to incorporate larger Natural variation parameters is the fourth reason for the model divergence.
    Gavin is right and McIntyre is right.

  68. semyorka said, “If the “models are running hot” why is the primary focus of these people not to produce better models then?”

    But this brings up my question again. Why do we assume that the models are hot instead of the observations being cool? Or, even some relative combination of both?

  69. angech wrote “In practice the probability would remain exactly the same. “

    That is incorrect. The standard error of the mean is the standard deviation divided by the square root of the ensemble size. So as the ensemble grows towards the infinite ideal, the probability of remaining within 2 standard deviations (Gavin’s test) stays about the same, but the probability of staying within 2 standard errors of the mean (the Douglass et al test) goes to zero.
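
The contrast between the two tests can be checked numerically. The following is a minimal Python sketch, not anything taken from the papers under discussion. It assumes a toy “perfect model” case in which the ensemble trends and the single observed trend are all drawn from the same normal distribution (mean 0, standard deviation 1); the function name `pass_rates`, the trial count and the ensemble sizes are illustrative choices.

```python
import numpy as np


def pass_rates(n_models, n_trials=5000, seed=0):
    """Return (rate within 2 SD, rate within 2 SE) for a toy perfect-model case."""
    rng = np.random.default_rng(seed)
    # Assumption: model trends and the observed trend share one distribution.
    models = rng.normal(0.0, 1.0, size=(n_trials, n_models))
    obs = rng.normal(0.0, 1.0, size=n_trials)

    mean = models.mean(axis=1)           # ensemble-mean trend per trial
    sd = models.std(axis=1, ddof=1)      # spread of the ensemble
    se = sd / np.sqrt(n_models)          # standard error of the ensemble mean
    dist = np.abs(obs - mean)            # how far the observation is from the mean

    within_sd = float(np.mean(dist < 2 * sd))  # test based on the ensemble spread
    within_se = float(np.mean(dist < 2 * se))  # test based on the error of the mean
    return within_sd, within_se


if __name__ == "__main__":
    for n in (5, 20, 100, 1000):
        sd_rate, se_rate = pass_rates(n)
        print(f"N={n:5d}  within 2 SD: {sd_rate:.3f}  within 2 SE: {se_rate:.3f}")
```

Under these assumptions the 2-standard-deviation pass rate should settle near 95% as the ensemble grows, while the 2-standard-error pass rate should shrink towards zero, even though the “observation” comes from exactly the same distribution as the models.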

  70. Anders

    Just to be clear, the argument is simply that the observations are a single realisation and therefore comparing the means is a test that is almost certainly going to fail. I don’t really think that you’ve actually addressed that.

    The test would fail if you omit the term that estimates ‘the uncertainty in the observations’. (This is predominantly due to “weather noise”.) The test would not almost certainly fail if one accounts for that term.

    It’s important to understand the estimate of the uncertainty in the observations is an estimate of the difference between the actual earth trend that happened and the “expected value”. Often this estimate is obtained from time series– but it could be obtained from model spread. But in neither case do you get the result that one will always reject the model mean when the number of models approaches infinity. That just doesn’t happen.

    Heck…you can do Monte Carlo on this and see it doesn’t happen. (Santer did btw. I’ve ginned it up in R. Doesn’t happen. This is a standard test. But Douglass et al. screwed the pooch by leaving out a term.)
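
A Monte Carlo check of this kind can be sketched in a few lines. The Python below is a toy illustration only and is not the Santer et al. calculation: it assumes a perfect-model setup with normally distributed trends, and it takes the uncertainty of the observed trend from the model spread (one of the two options mentioned above) rather than from a time series. The function name `rejection_rates` and all parameter values are hypothetical.

```python
import numpy as np


def rejection_rates(n_models, n_trials=5000, sigma=1.0, seed=0):
    """Return (rejection rate without the obs term, rejection rate with it)."""
    rng = np.random.default_rng(seed)
    # Assumption: models and the single observation share the same distribution.
    models = rng.normal(0.0, sigma, size=(n_trials, n_models))
    obs = rng.normal(0.0, sigma, size=n_trials)

    diff = np.abs(models.mean(axis=1) - obs)
    sd = models.std(axis=1, ddof=1)
    se_mean_sq = sd**2 / n_models   # squared standard error of the model mean
    s_obs_sq = sd**2                # assumed obs-trend variance, from model spread

    # Reject when the difference exceeds 2 x the denominator of the test statistic.
    without_term = float(np.mean(diff > 2 * np.sqrt(se_mean_sq)))
    with_term = float(np.mean(diff > 2 * np.sqrt(se_mean_sq + s_obs_sq)))
    return without_term, with_term


if __name__ == "__main__":
    for n in (5, 20, 100, 1000):
        without_term, with_term = rejection_rates(n)
        print(f"N={n:5d}  reject (no obs term): {without_term:.3f}  "
              f"reject (with obs term): {with_term:.3f}")
```

In this toy setting the rejection rate without the observational term should climb towards 100% as the ensemble grows, while the rate with the term should stay close to the nominal level (roughly 5% for large ensembles).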

  71. Lucia,

    The test would fail if you omit the term that estimates ‘the uncertainty in the observations’. (This is predominantly due to “weather noise”.) The test would not almost certainly fail if one accounts for that term.

    Indeed, but the context of this post was what Gavin said about Douglass et al. (2008) who did not include the uncertainty in the observations.

  72. angech,
    I’m having to answer this in two places.

    Basically ATTP is defending an argument that any run of models can produce a result that is different from the expected mean.

    No, this is not what I’m saying. We only have observations of a single realisation of the real system (i.e., what actually happened). Therefore that single realisation is very unlikely to be close to the mean of all possible realisations. Therefore we cannot know what the expected mean should be. Therefore simply comparing the observations with the ensemble mean of the models (which we could represent very accurately if we ran very many models) is a test that would typically fail. As Lucia says (and as Santer et al did) you could then include the uncertainty in the observations and this test would be less likely to fail. However, even that may not be a fair test given that the distribution of the observed trends may not be (and probably isn’t) the same as the distribution of all possible real trends.

  73. Lucia,

    It’s important to understand the estimate of the uncertainty in the observations is an estimate of the difference between the actual earth trend that happened and the “expected value”.

    I don’t completely agree with this. It is probably true that the variability in the observed data gives an indication of how much internal variability could influence the mean trend (or the forced response). Hence, including the uncertainties in the trend is clearly an improvement over not doing so. However, the distribution of trends from a single observational realisation is probably not a good representation of the distribution of all possible trends for this system (i.e., the mean of the single observation is probably not the same as the mean of all possible realisations of this system).

  74. mrooijer says:

    But Douglass et al. screwed the pooch by leaving out a term. Exactly, and that second term is the variance of the observations (as in the t-test). Removing that term is the reason that his method will always reject the observation when the number of models approaches infinity.

    And that is why McIntyre’s comments are completely beside the point. He refers to Gelman’s comment on the difference of two means. That does not apply here, because one of those means comes from one observation with zero variance according to the calculations of Douglass et al. He also refers to the fact that the same formula for the standard error of the mean was used in the other paper. They also both calculate the mean in the same way – what does that tell you? Nothing. A nothingburger avant la lettre.

  75. Joshua says:

    ===>> I tried to comment six hours ago, but the comment is still in moderation, while numerous other comments have appeared.

    I’m glad to see that I’m not the only one that Anders censors.

  76. Joshua… I don’t think he’s censoring anyone when he’s asleep.

  77. Steve,

    You asserted/implied that the view that Schmidt’s 2007 analysis was incorrect was associated with me and Lewis, thereby falsely personalizing it.

    No, it’s because you and Nic Lewis said it. I don’t really care who else thinks it’s wrong, your post was the first time I’d come across someone who said it was wrong. So, either Gavin is wrong, or he isn’t. That’s the key point, IMO.

  78. Willard says:

    > I tried to comment six hours ago, but the comment is still in moderation, while numerous other comments have appeared.

    Since this happened an uncountable number of times on your own website over the years, that “observation” is well within the means of WP. It also happens a lot at Judy’s. And then there are spam filters.

    The truth is out there.

  79. MMM says:

    McIntyre: “One more time, Santer et al 2008 rejected the argument of Schmidt’s RC post”: But they DIDN’T! If you look at my previous comment, Santer et al explicitly stated that “DCPS07’s use of σSE is incorrect” for much the same reason that Schmidt was critiquing them (e.g., that the more model runs you have, the higher a percentage of potential observations you reject).

    Now, if you want to say that _Cawley_ was wrong because he misinterpreted Schmidt as disagreeing with Santer… well, now we’re diving into Climategate emails, which I’m not doing here. And Wigley was responding to Cawley in another Climategate email, and is known for shooting from the hip, so I wouldn’t put the same weight on his “simply wrong” that you do.

    Dikran: “I suspect it would fail a perfect model ensemble”: But Santer et al. does a synthetic data test, and while it wasn’t a perfect match, it was pretty close. I think you are falling into the same trap as Cawley (and in a weird way, McIntyre): you see the division by the number of models and you say, aha! this goes to zero as the number of models goes to infinity, therefore, it fails! Except that the addition of the variance of the observed trend means that as the number of models goes to infinity, the denominator in the test statistic no longer goes to zero but rather approaches b0. See below.

    Lucia, ATTP: I think there’s an interesting question here – is b0 really the right measure? If we had a perfect model, maybe it would be better to use some measure of the standard deviation of the runs of the perfect model to get that term (rather than the variance of a single realization of that model, even if they might be expected to be equal on average). But we don’t have a perfect model: we don’t even have a single model: so I think it might be reasonable to guess that the standard deviation of a bunch of random models would not come as close to matching the standard deviation of a perfect model as well as the variance of the observed data… but I don’t know. It depends on how good the models are, and whether I am properly understanding the role of b0.

    -MMM
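
To make the limiting behaviour explicit, the two test statistics being contrasted can be sketched as follows. The symbols are assumptions introduced here for illustration: $b_o$ is the observed trend, $\langle b_m \rangle$ the mean of the $N$ model trends, $s_m$ the standard deviation of the model trends, and $s_o$ the standard error of the observed trend (what the comment above loosely calls b0). This follows the form described in this thread and is not transcribed from either paper.

```latex
d_{\mathrm{SE}} = \frac{\langle b_m \rangle - b_o}{s_m / \sqrt{N}},
\qquad
d_{\mathrm{obs}} = \frac{\langle b_m \rangle - b_o}{\sqrt{s_m^2 / N + s_o^2}}

\lim_{N \to \infty} \frac{s_m}{\sqrt{N}} = 0,
\qquad
\lim_{N \to \infty} \sqrt{\frac{s_m^2}{N} + s_o^2} = s_o
```

The first denominator vanishes as $N$ grows, while the second is bounded below by $s_o$. Whether $s_o$ is an adequate stand-in for internal variability is the question taken up in the replies.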

  80. Lucia wrote “It’s important to understand the estimate of the uncertainty in the observations is an estimate of the difference between the actual earth trend that happened and the “expected value”. Often this estimate is obtained from time series– but it could be obtained from model spread.”

    I don’t see how this can be reliably estimated from time series when we have only one observable realisation of the process that is currently being vigorously forced. I don’t see how we can reliably distinguish between the forcing and the internal climate variability with a statistical, rather than a physical model.

    It is probably a good idea to avoid terms with specific statistical meanings, such as “expected value”, as a lot of the difficulty is caused by not focussing on the physical meaning of the projections. “Forced component of the trend” is probably a better representation of what the ensemble mean actually tells you, calling it the “expected value” gives the impression that it is the value we expect to observe, which it isn’t, we only expect (in the everyday sense) that the observations lie in the spread of the models.

    If the uncertainty in the observations were an estimate of the unforced component of the trend, rather than merely structural uncertainty in its measurement, then the Santer et al. test would be fine. Perhaps that was the explanation that reassured me back in 2008.

  81. MMM,

    Except that the addition of the variance of the observed trend means that as the number of models goes to infinity, the denominator in the test statistic no longer goes to zero but rather approaches b0. See below.

    I think it is an interesting question. As I was trying to respond to Lucia, I can see how the uncertainty in the observed trend can give some indication of the possible range of trends, but the distribution of trends from this single observation would seem to be unlikely to match the distribution of all possible observed trends, which – in a sense – is what running a large suite of models would be trying to represent.

    I also think that the uncertainty in the observed trend may still not be quite sufficient, given that internal variability can enhance, or suppress, forced warming and so the mean of the observations and the uncertainty in the observed trends may still not be a completely fair representation of the range of possible trends.

  82. MMM wrote “Dikran: “I suspect it would fail a perfect model ensemble”: But Santer et al. does a synthetic data test, and while it wasn’t a perfect match, it was pretty close. I think you are falling into the same trap as Cawley (and in a weird way, McIntyre): you see the division by the number of models and you say, aha! this goes to zero as the number of models goes to infinity, therefore, it fails! Except that the addition of the variance of the observed trend means that as the number of models goes to infinity, the denominator in the test statistic no longer goes to zero but rather approaches b0. See below.”

    First of all, it would be logically impossible for Cawley and I not to simultaneously fall into the same trap! ;o)

    If we had a perfect ensemble, it shouldn’t rely on uncertainty in the observations to prevent it from failing the test. If we had a perfect model and perfect observations, the perfect model should still pass the Santer et al. test with high probability, but I don’t think it does (I may have missed something). As I said, the Santer et al test is still vastly better than the Douglass et al test.
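
The perfect-ensemble argument can be checked with a small Monte Carlo sketch. Everything here is an assumption made for illustration: trends are drawn from a unit normal distribution, the "observation" is one more draw from that same distribution (a perfect model with perfect observations), and a threshold of two standard errors or two standard deviations stands in for a roughly 95% test. It is not the procedure of either paper.

```python
import random
import statistics


def rejection_rates(n_models, n_trials=5000, seed=0):
    """Return (se_rate, spread_rate) for a perfect-model ensemble.

    se_rate:     fraction of trials where the observation lies more than
                 2 standard errors of the ensemble mean from that mean.
    spread_rate: fraction of trials where it lies more than 2 ensemble
                 standard deviations from the mean.
    """
    rng = random.Random(seed)
    se_rejects = 0
    spread_rejects = 0
    for _ in range(n_trials):
        # Model trends and the single observed trend share one distribution.
        models = [rng.gauss(0.0, 1.0) for _ in range(n_models)]
        obs = rng.gauss(0.0, 1.0)
        mean = statistics.fmean(models)
        sd = statistics.stdev(models)
        gap = abs(obs - mean)
        if gap > 2.0 * sd / n_models ** 0.5:
            se_rejects += 1
        if gap > 2.0 * sd:
            spread_rejects += 1
    return se_rejects / n_trials, spread_rejects / n_trials


if __name__ == "__main__":
    for n in (5, 20, 100):
        se_rate, spread_rate = rejection_rates(n)
        print(f"N={n:3d}  SE test rejects {se_rate:.2f}  "
              f"spread test rejects {spread_rate:.2f}")
```

Under these assumptions the standard-error comparison rejects the perfect ensemble more often as the ensemble grows, while the spread comparison settles near its nominal rate. Adding observational uncertainty to the denominator, as in the Santer et al. form discussed above, would change the numbers but is deliberately left out here.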

  83. Joshua says:

    Rob –

    ==> Joshua… I don’t think he’s censoring anyone when he’s asleep.

    Indeed. I was making a joke in a sort-of-Poe.

    1) I was mocking the self-important attitude that I see so often, that if someone’s comment is deliberately moderated by a blog host, it amounts to “censorship.”

    2) I was mocking Steve’s use of a variation on “But, RealClimate moderation..” technique of climate wars bickering.

    3) I was pointing out Steve’s facile reasoning, cloaked under a rather typical cover of plausible deniability. Steve moderates comments at his own blog based on what I consider to be arbitrary (in the sense of not being objective) application of criteria, so for him to whine about it happening to him, even if it were, which wasn’t the case, is rather ironic. Further, that Steve would even go there, given that he knows that many times comments wind up in non-intended moderation at his own blog, and given that we all know that most blog software has seemingly capricious filtering algorithms, makes it rather interesting that he would react with an implication of victimhood and persecution without first doing a good faith investigation of the situation. Just because he does that with respect to what happens with his blog comments doesn’t necessarily generalize to his thinking and behavior in other contexts, but it is “information.”

  84. Eli Rabett says:

    An interesting question wrt model ensembles is that as the number of realizations grows does the knowledge of the distribution of results improve. For an ensemble of different models with only single runs no. For the same model with different initial conditions or an ensemble of models each with several runs based on different conditions yes, but the statistical properties of the variance will not be simple and will depend on the details of the model.

    This comment by Kevin O’Neill at moyhu gets at some of this
    http://moyhu.blogspot.com/2016/04/averaging-temperature-data-improves.html?showComment=1461420363354#c6502775355858010117

  85. pbjamm says:

    I am not a statistician so have not much of value to offer to this discussion but would like to note that @Steve McIntyre’s unrelenting rudeness would make me reluctant to engage even if I were.

    “You are sneering without knowing the facts.”
    “Jeez, can’t any of you read.”
    “Please do not keep insisting that I am wrong about this or that I am making some elementary mistake about it – I am not. If you think that the above comments are in error, then you don’t understand them.”

  86. John Bills says:

    Take a good look at the graph (from 1880-2012) that Tom Curtis upthread provided.
    It gives a good representation of the state of the models.

  87. pbjamm,
    Well, I was kind of impressed that he came over here. Seems rare these days, so I decided to ignore that, for the moment at least 🙂

  88. BTW, as the CMIP{3,5} ensemble is an ensemble of opportunity, I suspect the models use the parameter values that the modelling groups consider best for projecting future climate. In practice there will be some variation in the parameters that are also plausible, but this source of uncertainty is unlikely to be properly represented in the model runs (IIRC a “perturbed physics” experiment is required for that), which seems to me a good reason to think that the spread of the models underestimates the structural uncertainties, if not the plausible effects of internal climate variability.

  89. Willard says:

    Hu chimed in without sneering, which may mean he doesn’t know the facts:

    I’m afraid that I agree with Gavin here with regard to the sqrt(n) issue.

    https://climateaudit.org/2016/05/05/schmidts-histogram-diagram-doesnt-refute-christy/#comment-768973

    INTEGRITY ™ – Be Afraid

  90. Frank says:

    ATTP wrote: “Given the uncertainty in the observations and that we only have one realisation of reality, we’re – however – not yet in a position to reject any of the models.”

    We do have only “one realization of reality”, but that does NOT prevent us from concluding that models are inconsistent with reality. McIntyre is using a standard statistical method for determining if the difference in two means is “significant” – i.e. for rejecting the null hypothesis that the difference could be zero or less. A traditional and analogous problem is to measure the heights of groups of men and women and see if the results reject a null hypothesis that the difference in the POPULATION mean for both men and women could include zero. To do so, we find a pdf for the difference in means and see what fraction of that pdf lies on the opposite side of zero from the mean. Or we use the standard error of the means to do the statistical equivalent. In this analogy, the men are the climate models (bigger) and the women are observations (smaller). We are trying to infer whether the mean warming of all possible climate models (as a group) is different from all possible realizations (as a group) of observed forced warming during the satellite era using REPRESENTATIVE samples of each.

    The availability of climate models makes it practical to repeat measuring the height of representative groups of men or the warming projected by AOGCMs roughly a hundred times. We can also contemplate doing so until the uncertainty in the mean of the men or the models is negligible. There is nothing fundamentally wrong with the idea that the uncertainty in a mean can be negligible. Systematic errors can cause our mean to be precise, but not accurate, but statistical analysis can only inform us about random error, not systematic error. Climate models certainly contain systematic errors. We are trying to use observations to determine how big those errors might be!

    In this analogy, we only have a “single realization” of the experiment measuring the heights of a representative group of women. And a single realization of warming during the satellite era. Nevertheless one can still determine the standard deviation of the difference of two means (σ_d) in both cases using the formula below. (McIntyre used Monte Carlo techniques to create a pdf for the difference in means.)

    σ_d = sqrt( σ1^2 / n1 + σ2^2 / n2 )

    Continuing the analogy, we INFER the likely distribution of all heights of all women from the scatter in a single experiment measuring the heights of a representative group of women. From the SCATTER in the temperature vs time observed during one realization of forced warming during the satellite era, we make inferences about other possible realizations of our climate. To use ATTP’s terminology, both represent only “one realization of reality”. If there were a constant radiative imbalance at the TOA and perfect measurement of temperature, all of the points would be expected to lie on a straight line. We have 120 data points per decade (reduced by autocorrelation) that tell us how unforced variability (chaos) has perturbed the expected relationship between time and temperature. The possibility that a strong El Nino or La Nina at the beginning or end of the measurement period has perturbed the warming rate is included in the confidence intervals for the slope and intercept. The 95% confidence interval for the observed warming rate is nearly 0.1 K/decade wide DUE to unforced variability (ENSO and other chaotic processes) – not measurement error! We have not ignored the possibility of “other realizations of reality”, we have properly accounted for this possibility in the confidence interval for the warming rate.

    Let’s ask the right question: Is our one realization of forced warming in the satellite era REPRESENTATIVE of the population of all realizations? Was a typical distribution of El Nino and La Nina events experienced during this period? (Yes, as best we can tell.) Did changes in solar or volcanic activity bias the results? (No, but Pinatubo would have been a problem at the beginning or end of the observations). Is there significant low-frequency unforced variability in our climate (say from the AMO) that hasn’t been sampled in the satellite era? (A great question, but AOGCM’s say no.) As long as we have properly sampled typical unforced variability during the satellite era, the confidence interval for the warming rate includes all of the uncertainty in the warming rate we would expect to see if we had more “realizations of reality”.

    If we could repeat measuring the heights of groups of women or experience additional realizations of the reality of forced warming, we would narrow – not broaden – the confidence interval for these means and increase our confidence that a significant difference exists.

  91. Frank wrote We do have only “one realization of reality”, but that does NOT prevent us from concluding that models are inconsistent with reality.

    Indeed, if the observations lay outside the spread of the models, taking into account all relevant uncertainties, then a model-observation inconsistency would have been demonstrated.

    McIntyre is using a standard statistical method for determining if the difference in two means is “significant” – i.e. for rejecting the null hypothesis that the difference could be zero or less.

    It is indeed a standard test for a significant difference of two means, but this is not a problem where consistency requires the two means to be plausibly the same. Indeed, the parallel Earths thought experiment shows we can’t reasonably expect the observations to lie any closer to the ensemble mean than a randomly selected run comprising the ensemble.

    A traditional and analogous problem is to measure the heights of groups of men and women and see if the results reject a null hypothesis the difference in the POPULATION mean for both men and women could include zero.

    This is not an analogous problem, an analogous problem would be to measure the heights of a population of men, and then measure the height of a human (we only have one observable reality, not a population) and see if the height of the human in question was consistent with what we know about the heights of men.
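
The difference between the two analogies comes down to which denominator is used. A minimal sketch, with entirely hypothetical height figures chosen only to show the arithmetic: the first function applies the difference-of-two-means formula quoted above (σ_d), the second asks whether one individual is consistent with a population.

```python
import math


def sigma_d(sd1, n1, sd2, n2):
    """Standard deviation of the difference of two sample means."""
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


def z_two_means(mean1, sd1, n1, mean2, sd2, n2):
    """How many sigma_d apart two group means are."""
    return (mean1 - mean2) / sigma_d(sd1, n1, sd2, n2)


def z_single_member(value, pop_mean, pop_sd):
    """How many population standard deviations one individual is from the mean."""
    return (value - pop_mean) / pop_sd


if __name__ == "__main__":
    # Hypothetical numbers (cm), not taken from any dataset.
    men_mean, men_sd, n_men = 176.0, 7.0, 100
    women_mean, women_sd, n_women = 163.0, 6.5, 100

    # Two groups: the denominator shrinks as the samples grow.
    print("two-means z:", z_two_means(men_mean, men_sd, n_men,
                                      women_mean, women_sd, n_women))

    # One person of height 163 cm against the men's distribution:
    # the denominator is the population spread and does not shrink.
    print("single-member z:", z_single_member(163.0, men_mean, men_sd))
```

With these made-up numbers the group means are clearly different, yet a single person at 163 cm is less than two standard deviations from the men's mean, so that one measurement alone would not show that the person is not a man.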

  92. Joshua says:

    I won’t clutter up the technical discussion further, but I did want to provide for some interesting background reading….

    Speaking of Hu, I just wanted to link a most fascinating thread which, IMO, speaks to Steve McIntyre’s approach in these discussions:

    https://climateaudit.org/2009/08/14/steig-professes-ignorance/

    The comment with this date stamp, in particular, was quite interesting.

    Posted Aug 14, 2009 at 5:01 PM

    Given the comment at this date stamp:

    Posted Aug 17, 2009 at 9:00 AM

    I guess some audits require more diligence than others.

  93. Frank wrote “In this analogy, we only have a “single realization” of the experiment measuring the heights of a representative group of women. And a single realization of warming during the satellite era. Nevertheless one can still determine the standard deviation of the difference of two means (σ_d) in both cases using the formula below. “

    If you have a sample of one, its standard deviation is zero (using the uncorrected formulation) as the sample is identical to the mean, and infinite for the unbiased version (due to the division by N – 1).
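
The sample-of-one point is easy to confirm with Python's standard `statistics` module. The single value below is a hypothetical trend used only as a placeholder. Strictly, the N − 1 form gives 0/0 for one data point, which is undefined; Python reports this by raising an error.

```python
import statistics

sample = [0.17]  # hypothetical single trend value, K/decade

# Uncorrected (population) form: divides by N, so one point gives zero spread.
population_sd = statistics.pstdev(sample)

# Unbiased (sample) form: divides by N - 1, which is zero for one point,
# so the module refuses to compute it.
try:
    sample_sd = statistics.stdev(sample)
except statistics.StatisticsError as err:
    sample_sd = None
    print("stdev undefined for a single value:", err)

print("pstdev:", population_sd)
```

Either way, a single realisation carries no information about the spread of the population it was drawn from; that has to come from somewhere else.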

  94. I guess the key difference is that for the observations the trend distribution comes from the mean trend for a single timeseries and the uncertainty in that trend. For the models the distribution comes from the distribution of mean trends for many time series. So, that suggests that they’re not strictly equivalent. I can see that the variability in the observations can be indicative of the range of possible trends, but it would seem that the distribution that you get from a single time series is unlikely to be a good representation of the distribution you would get if you had multiple realisations of the observational time series.

  95. Joshua… “Indeed. I was making a joke in a sort-of-Poe.”

    Sorry, I don’t frequent ATTP as much as I should. 🙂

  96. Willard says:

    RonG, on May 10th, 2016 at 4:58 pm, I suspect CDT:

    For those familiar with the most recent CA post on Gavin’s statistical mean oops ATTP seems to have raised McIntyre’s hackles enough to lure him over to a debate at ATTP’s. Very entertaining.

    http://rankexploits.com/musings/2016/human-caused-forcing-and-climate-sensitivity/#comment-147428

    On the May 10th, 2016 at 10:24 pm, a response to RonG’s observation:

    Ya,, maybe Lucia should head over.
    FWIW I think this problem is intractable.

    http://rankexploits.com/musings/2016/human-caused-forcing-and-climate-sensitivity/#comment-147437

    And then there was a comment on May 11, 2016 at 11:57 am, BST:

    Small world.

    Before that, on the May 11th, 2016 at 5:43 am CDT:

    I should add, in comments at his own blog, it appears Anders doesn’t understand how comparisons to the model mean are made.

    http://rankexploits.com/musings/2016/human-caused-forcing-and-climate-sensitivity/#comment-147451

    Putting “but Santer” in AT’s mouth seems to be a good way to start the day.

    Please note that there’s a six hours difference between CDT and BST.

  97. Willard,
    Yes, the ‘but Santer’ gambit has been rather irritating. Apparently my post has to be in the context of Santer et al. despite me not mentioning it at all and despite it being pretty clear that it was written to discuss Gavin’s claim about Douglass et al. (2008) – an entirely different paper.

  98. niclewis says:

    ATTP, the main part of my recent comment at CA about Gavin Schmidt’s calculation of uncertainty in the model mean that you referred to was this:

    “The confidence intervals for efficacies in the recent Marvel et al paper, of which he is second-named author (and Kate Marvel’s boss) appear to be calculated by dividing the square root of the sum of the squared differences in the five (six in one case) individual run values, by the square root of the number of runs of the (single) model involved, supposedly giving the standard error for each individual run. The correct divisor for this calculation is smaller by one, as estimating the mean uses one degree of freedom. The thus (mis)calculated standard error for each model run is then used as the standard error for the ensemble mean, instead of dividing it by the square root of the number of runs. That results in the confidence intervals being too high by a factor of two (more in one case)…”

    Maybe you will also seek to defend Gavin Schmidt’s calculations in this case.
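
The "factor of two" in the quoted passage follows from the arithmetic as Nic Lewis describes it. In the sketch below, $x_i$ are the $n$ individual run values and $S$ is their sum of squared deviations; these symbols are introduced here for illustration, and the sketch only reproduces the arithmetic of the claim, not the calculation in the paper itself.

```latex
S = \sum_{i=1}^{n} (x_i - \bar{x})^2

\text{as described:}\quad \sigma_{\mathrm{used}} = \sqrt{\frac{S}{n}}
\qquad
\text{standard error of the mean:}\quad
\sigma_{\bar{x}} = \frac{1}{\sqrt{n}} \sqrt{\frac{S}{n-1}}

\frac{\sigma_{\mathrm{used}}}{\sigma_{\bar{x}}} = \sqrt{n-1}
\quad\Rightarrow\quad
n = 5:\ 2, \qquad n = 6:\ \sqrt{5} \approx 2.24
```

This only shows that the stated factor is what one gets if the ensemble-mean standard error is the appropriate quantity; whether it is the appropriate quantity is the point in dispute.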

  99. Nic,
    What has that got to do with what Gavin said in 2007 about Douglass et al.?

  100. Willard says:

    > Apparently my post has to be in the context of Santer et al […]

    FWIW, it’s the other way around: the Auditor looked at what Gavin said, and injected “but Santer” all by himself.

    You can clearly see that “but Santer” is the only context that matters when you see the Auditor’s title: Schmidt’s Histogram Doesn’t Refute Christy.

    INTEGRITY ™ – It’s All About Context

    Let it be noted that this audit has been issued on May 5, 2016 – 4:42 PM EDT, which is 1 hour earlier than CDT and 5 hours later than BST.

    ***

    I rather liked Lucia’s recent:

    I can only read what you seemed to claim.

    http://rankexploits.com/musings/2016/human-caused-forcing-and-climate-sensitivity/#comment-147514

    This seems to mean that once you seem to claim something, it seems you can’t get out of what seems to be parsomatics.

    ClimateBall ™ is oftentimes a seeming game.

  101. Parsomatics, that’s the word I was trying to remember.

  102. Willard says:

    Let’s hope that if you don’t respond further, AT, Lucia won’t interpret that as acquiescence.

    Billions upon billions of (Inhofe or nothing) burgers.

  103. Who knows. It seems that anything could happen.

  104. niclewis says:

    ATTP,
    They both seem to show the same misconception by Gavin as to what is the correct measure of uncertainty related to the model(s) when a (multi)model-ensemble mean is involved. The problem with Douglass et al, as I understand it, is that they did not correctly allow for observational uncertainty. But that is a quite different thing from the standard deviation of individual model results. Gavin confused the two issues in his 2007 article. Whether lumping all models together in an ensemble and taking its mean and standard deviation makes much sense is another question, but it is commonly done.

  105. Willard says:

    FWIW, “parsomatic” is Eli’s term introduced in a ClimateBall ™ episode about law blogging, a concept that may seem to be related to the unclean hands move the Auditor seems to be trying to pull at the moment.

  106. Nic,

    Gavin confused the two issues in his 2007 article.

    Where?

  107. niclewis wrote “The problem with Douglass et al, as I understand it, is that they did not correctly allow for observational uncertainty.”

    It depends what you mean by “observational uncertainty”. If you mean the uncertainty in estimating the trend in the observational data, then no, that isn’t the error that Douglass et al made. The error they made was in assuming that the observed trend should match the ensemble mean trend in order for the models to be consistent with the observations. This isn’t correct, as the parallel Earths thought experiment demonstrates. A consistency test that would be almost certain to reject a perfect model ensemble is probably not very useful.

    “Whether lumping all models together in an ensemble and taking its mean and standard deviation makes much sense is another question, …”

    It does make sense, as a way of trying to incorporate the structural uncertainties involved. The climatologists are aware of the statistical deficiencies, which is why they point out that CMIP5 is an “ensemble of opportunity” (implying that if they had unlimited computational resources it is not the ensemble they would design by choice). It is better than ignoring the structural uncertainties altogether.

  108. niclewis says:

    dikranmarsupial wrote:
    ‘It depends what you mean by “observational uncertainty”.’

    I was including uncertainty arising from internal variability.

  109. niclewis “I was including uncertainty arising from internal variability.”

    yes that is more reasonable. How do you determine the uncertainty arising from internal variability without using a physical climate model?

  110. Tom Curtis says:

    “This is not an analogous problem, an analogous problem would be to measure the heights of a population of men, and then measure the height of a human (we only have one observable reality, not a population) and see if the height of the human in question was consistent with what we know about the heights of men.”

    To flesh out this analogy a bit more, what we have is several estimates of the height of one person; where said estimates differ in method, value and uncertainty – but they are still estimates of the height of a single member of the population. That is, they represent (if we restrict ourselves to the estimates used by Christy) six estimates of the value of one member of the population, not a sample of six members of the population. That is a crucial distinction.

  111. Tom, yes that does indeed make the analogy more complete, good point!

  112. Joshua says:

    I was trying to not participate in the echo-chamber/group think/preach to the choir dynamic that’s going on…

    But this was just too funny to pass up.

    Lucia says to Anders:

    The beginning of your post links to Gavin Schmidt’s recent discussion of Christy’s presentation to Congress http://www.realclimate.org/ind…..-datasets/
    This recent post has nothing to do with Douglas nor Gavin Cawly or Gavin Schmidts criticism of Douglas 2007. Which is fine but a reader might develop the impression your post was about more than just what Gavin said about Douglas 2007

    In a comment where she quotes Anders extensively, yet somehow failed to include the following in her list of excerpts:

    This isn’t a surprise and isn’t what I was planning on discussing.

    Yes, indeed, a reader might develop the impression that Anders’ post was about more than what he said he wanted to discuss, and was actually also about something that he said he wasn’t planning on discussing.

    This reminds me a bit of when Lucia asserted that she was in a better position than I am to judge what I find interesting and/or important (she insisted that I think that quantifying the magnitude of the “consensus” is important and/or interesting)….presumably because a reader of what I wrote “might develop [a particular] impression” about what I find interesting or important.

    I have to give kudos for the sophisticated use of passive tenses and plausible deniability, however.

  113. angech says:

    dikranmarsupial says niclewis “I was including uncertainty arising from internal variability.”
    ” How do you determine the uncertainty arising from internal variability without using a physical climate model?”
    Well, you have the earth observations over time, they vary from year to year, you take the same day each year for your sample period, 15 years say, and you look at the variance in the figures to determine the uncertainty.
    No model, just the data.

    On a separate point raised on occasion of observational data being wrong [ATTP].
    Your observation is right: the measurements may be biased high or low, but practically the models work off the biased data [or data bias] anyway so this problem more or less self excludes itself?

  114. It is distressing to find some of the usual unskeptical “skeptic” authorities taking advantage of aTTP’s hospitality to colonize and assert. Having been subjected to a Mac attack on the subject of the extension of the climate record (Marcott and Shakun), which was personal and nasty and a revelation as to how low things can get, it was made clear to me that in the case of the future vs. fossil forever this is a one-sided effort, without regard to what is true, but rather what they can get away with and which entities will host the plausible but dishonest.

    aTTP is much better equipped technically to deal with false quantities than I was. Nonetheless the time is past when it was OK to give credence to biased information and attacks. It’s a dedicated attack, and not to the benefit of the rest of us™.

  115. Eli Rabett says:

    Josh, the term of art is IMplausible deniability. Plausibility merely gets in the way of the Tol’s, Lewis’ and [Mod: redacteds]

  116. Anders,

    I can see that the variability in the observations can be indicative of the range of possible trends, but it would seem that the distribution that you get from a single time series is unlikely to be a good representation of the distribution you would get if you had multiple realisations of the observational time series.

    I am perhaps more than unusually lost at sea in this discussion, however HadleyCRU make available a 100-member ensemble of GMST realizations on their download page for HADCRUT4. They caution: Additional quoted uncertainties are estimates of measurement and sampling uncertainty and of uncertainty in global and regional averages with incomplete regional coverage. These additional uncertainties are identical for all ensemble members.

    Still, it seems to me that using those multiple realizations it should be possible to construct a histogram similar to what Schmidt et al. did for the CMIP5 historical ensemble.

  117. brandonrgates – HADCRUT provides multiple measurements of *one* observational time series. For ATTP’s “multiple realisations of the observational time series” we would need multiple planet earths.

    Tom Curtis explained it quite nicely in his comment above:

    what we have is several estimates of the height of one person; where said estimates differ in method, value and uncertainty – but they are still estimates of the height of a single member of the population.

    The HADCRUT ensemble are multiple measurements of the one ‘person’ (earth) we have data on. It provides more information on the uncertainties in measuring our one member, but it is not a substitute for having measurements of multiple earths.

  118. oneillsinwisconsin,

    Thanks, your comment helped me better understand Tom’s analogy. I guess the distinction is that the multi-model ensemble is functionally a representation of different planets because the models are sufficiently different from each other. Ideally then, we’d have ~100 realizations from the *same* model to compare to the HADCRUT4 ensemble from which we might be able to compare the structural uncertainties of both the modelled and observational time series in more apples-to-apples fashion.

    We obviously don’t have that ideal, and what Gavin & Co. have done seems to accept that, for what they have indeed done is treat the ensemble of multiple model runs as a single model, and calculated the distribution of trends as a measure of structural uncertainty in the model ensemble. Thus it seems appropriate to compare that to a similar histogram constructed by using the HADCRUT4 ensemble. Or, as McI mentions in his post, the RSS ensemble.

  119. Tom Curtis says:

    brandongates, the models are effectively representations of “different planets” (more accurately, distinct parallel Earths) but they differ not only because of slight variations in the encoded physics but also because of different initial conditions (which can alter the sequence and timing of ENSO events, and other climate variability) and also model specific flaws due to resolution issues.

    In an ideal world we would run several hundred runs of each model using different languages on different computers for significant sections of the runs; and then compare the model spread to observations. Doing so would allow us to cull models based on poor encoded physics. In practice that would take several centuries to do with current resources, so we make do with one run per model and test the common elements of the models against observations.

    Of course, the ideal world described would run straight into the problem described by Dikran Marsupial above, which is why you need to test against the model spread, not the model mean.

  120. Tom,

    Thanks for your note, it sounds like my understanding of the issues here is more or less on the mark, particularly the limited resources to do it ideally … or “right”. Annoying that such resource constraints so often go missed in this discussion.

    In the past, I have done “simple” analyses based only on the model ensemble mean, such as this one using annual means, linear regression to scale model output to observation, and taking the standard deviations of the annual residuals to compute an error envelope:

    Its (over)simplicity may be its undoing, but it’s easy for me to explain and defend — plus I think it tells relevant stories: assuming we follow the RCP6.0 emissions pathway and the ensemble prediction really is ~10% for that forcing scenario, we have approximately six additional years to avoid overshooting a two-degree target.

    Comments on whether this easy method has any merit would be appreciated.

  121. angech,

    Well, you have the earth observations over time, they vary from year to year, you take the same day each year for your sample period, 15 years say, and you look at the variance in the figures to determine the uncertainty.

    Except internal variability can influence both the mean trend and the variability about this trend.

  122. Angech wrote “Well, you have the earth observations over time, they vary from year to year, you take the same day each year for your sample period, 15 years say, and you look at the variance in the figures to determine the uncertainty.”

    Fifteen years is not nearly enough, even to properly capture the effects of ENSO (a major source of internal variability), and you are still left with the problem of deciding which part of the trend is due to the forcings and which to internal variability.

  123. you are still left with the problem, of deciding what part of the trend is due to the forcings and which internal variability.

    Exactly, that’s why I think that the distribution of trends you get by considering the trend plus uncertainties in this trend is still not going to be representative of the distribution of all possible trends for the system over that time interval.

  124. Nic,
    Gavin has responded to your point here.

  125. angech says:

    brandonrgates says: May 12, 2016 at 3:29 am
    ” I have done “simple” analyses based only on the model ensemble mean, such as this one using annual means, linear regression to scale model output to observation, and taking the standard deviations of the annual residuals to compute an error envelope:”

    Eyeballing the graph with the envelope the minor issue is the fact that the range of the envelope seems to stay the same over 240 years.
    Not meaning to be a pain but the envelope should expand backwards and forwards as the degree of uncertainty rises in both directions from today.
    One from increasing possible errors going forwards and one from lessening observations going back.
    Otherwise well done.

  126. Dikran Marsupial says:

    angtech wrote “Not meaning to be a pain but the envelope should expand backwards and forwards as the degree of uncertainty rises in both directions from today”

    I don’t think that is really true. If the effects of internal climate variability are similar in magnitude over time, then there is no expansion in error bars from that. The uncertainty in the forced component would stem from the forcings, which are prescribed as part of the scenario, so there would be no expansion from that. The remainder is the effects of differences in the model feedbacks, which I suspect cannot give rise to ever increasing divergence due to physical constraints. I suspect much of the narrowing is due to the baselining, which (by definition) reduces the apparent variability during the baseline period.

    If the forcings remain the same, then there would (in principle) be no change in the envelope, no matter how long you leave the simulations to run.
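The remark about baselining can be checked with a toy ensemble. The sketch below assumes every member shares the same forced trend and differs only by a constant offset plus white noise; the function name `spread_after_baselining` and all numbers are illustrative assumptions, not properties of any real model ensemble.

```python
import numpy as np

def spread_after_baselining(n_members=500, n_years=100,
                            baseline=slice(40, 50), sigma=0.1, seed=2):
    """Return the mean across-member spread inside and outside the
    baseline period after converting each run to anomalies."""
    rng = np.random.default_rng(seed)
    years = np.arange(n_years)
    forced = 0.01 * years                                  # identical forced trend
    offsets = rng.normal(0.0, 0.3, size=(n_members, 1))    # per-member offset
    noise = rng.normal(0.0, sigma, size=(n_members, n_years))
    runs = forced + offsets + noise

    # Baselining: subtract each member's own mean over the baseline years.
    anomalies = runs - runs[:, baseline].mean(axis=1, keepdims=True)

    spread = anomalies.std(axis=0, ddof=1)                 # spread per year
    inside = spread[baseline].mean()
    outside = np.delete(spread, years[baseline]).mean()
    return inside, outside

inside, outside = spread_after_baselining()
print(f"spread inside baseline: {inside:.3f}, outside: {outside:.3f}")
```

In this toy case the spread is smallest inside the baseline window, because each member's anomalies are forced to average to zero there, and it is somewhat larger everywhere else even though nothing physical changes.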

  127. Dikran,
    But there is a range for climate sensitivity, though, which I think would increase the spread with time. [Edit: across all models, I mean. This wouldn’t apply to a single model.]

  128. Dikran Marsupial says:

    Oops, didn’t read that carefully enough, for regression based extrapolations of course there ought to be a broadening of the error bars. Apologies to angech.
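For an ordinary least-squares fit, the broadening comes from the standard prediction-interval formula, in which the half-width grows with the distance of the new point from the mean of the fitted data. The sketch below uses synthetic data, a hypothetical function name `prediction_half_width`, and a fixed multiplier `k=2` in place of the exact Student-t quantile, which is an approximation made for brevity.

```python
import numpy as np

def prediction_half_width(x, y, x0, k=2.0):
    """Approximate half-width of the prediction interval of a straight-line
    fit, evaluated at the point(s) x0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    n = x.size
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    s = np.sqrt(np.sum(resid ** 2) / (n - 2))      # residual standard error
    sxx = np.sum((x - x.mean()) ** 2)
    # The (x0 - mean)^2 term is what widens the interval away from the data.
    return k * s * np.sqrt(1.0 + 1.0 / n + (x0 - x.mean()) ** 2 / sxx)

rng = np.random.default_rng(3)
x = np.arange(50.0)
y = 0.02 * x + rng.normal(0.0, 0.1, size=x.size)
print(prediction_half_width(x, y, [24.5, 100.0, 200.0]))
```

The interval is narrowest at the centre of the fitted period and widens in both directions, forwards and backwards.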

  129. Dikran Marsupial says:

    ATTP yes, I think that is what I meant by feedbacks being different, but I think I misunderstood what angech was getting at anyway.

  130. angech says:

    dikranmarsupial says: May 12, 2016 at 7:40 am
    “Fifteen years is not nearly enough, you are still left with the problem, of deciding what part of the trend is due to the forcings and which internal variability”.
    True.
    I am not arguing about how reliable or accurate the determination is, just saying that there is a simple way to do it that is not a model.
    The degree of uncertainty can be much better defined with a longer observational run.

    “15 years is not nearly long enough”… is not an argument that most skeptics or warmists would want to listen to. I agree very strongly with you on this comment.


  131. angech says:

    Dikran Marsupial it’s OK. most of what I say should be shot down. Evening in Oz, East so computer time.

  132. Dikran Marsupial says:

    angtech “I am not arguing about how reliable or accurate the determination is, just saying that there is a simple way to do it that is not a model.”

    O.K., but that is essentially pedantry; the point I was making is that there is no way of doing this without the models that is not so questionable as to make the resulting test meaningless. Unlike the test based on the spread of the ensemble.

    “The degree of uncertainty can be much better defined with a longer observational run.”

    saying that doesn’t make it true, you are still left with the problem of separating the forced component from the internal variability, which we can’t do using statistics.

    ““15 years is not nearly long enough”… is not an argument that most skeptics or warmists would want to listen to.”

    Can you give some examples of “warmists” disagreeing with that point?

  133. angech says:

    No, I am happy with the construct and meaning of what I have said.
    I think under other circumstances you might find these statements acceptable rather than as attacks.
    I do not want to go further on the 15 years statement. I was surprised that you were as definite as you were on that point as the corollary is that model outcomes might not be expected to show AGW after just 15 years but they do.

  134. Dikran Marsupial says:

    Angech wrote “I was surprised that you were as definite as you were on that point as the corollary is that model outcomes might not be expected to show AGW after just 15 years but they do.”

    The WMO suggest thirty years is a reasonable period for this kind of question:

    “Climate, sometimes understood as the “average weather,” is defined as the measurement of the mean and variability of relevant quantities of certain variables (such as temperature, precipitation or wind) over a period of time, ranging from months to thousands or millions of years.

    The classical period is 30 years, as defined by the World Meteorological Organization (WMO). Climate in a wider sense is the state, including a statistical description, of the climate system. “

    and this seems fairly well accepted amongst climatologists (ISTR the WMO guidelines have been mentioned in IPCC WG1 reports), so I don’t see why you should be surprised that I suggest that 15 years is too short, especially as I had given a physical reason:

    “Fifteen years is not nearly enough, even to properly capture the effects of ENSO (a major source of internal variability) “

    Angech wrote “as the corollary is that model outcomes might not be expected to show AGW after just 15 years but they do.”

    You might expect to wait a long time flipping a coin and waiting for eight consecutive heads, but that doesn’t mean it won’t happen within (say) the first 100 flips.
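The coin-flip analogy can be made concrete. For a fair coin the expected wait for eight consecutive heads is 2^9 - 2 = 510 flips, yet the chance of seeing such a run within the first 100 flips is far from negligible. The sketch below computes that probability exactly by tracking the length of the current run of heads; the function name `prob_run_of_heads` is hypothetical.

```python
def prob_run_of_heads(n_flips, run_length, p_heads=0.5):
    """Probability of at least one run of `run_length` consecutive heads
    somewhere in `n_flips` independent flips."""
    # state[r] = probability that no full run has occurred yet and the
    # current trailing run of heads has length r.
    state = [0.0] * run_length
    state[0] = 1.0
    hit = 0.0
    for _ in range(n_flips):
        new_state = [0.0] * run_length
        for r, prob in enumerate(state):
            new_state[0] += prob * (1.0 - p_heads)   # tails resets the run
            if r + 1 == run_length:
                hit += prob * p_heads                # run completed
            else:
                new_state[r + 1] += prob * p_heads   # run extended
        state = new_state
    return hit

print(f"P(8 heads in a row within 100 flips) = {prob_run_of_heads(100, 8):.3f}")
```

A long expected waiting time and a non-trivial chance of an early occurrence are therefore perfectly compatible.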

  135. Willard says:

    Jeez, can’t any auditor read.

  136. The Very Reverend Jebediah Hypotenuse says:

    If The Auditor does not respond further on this issue, please do not interpret that as acquiescence.

    And if you think that the above comment is in error, then you don’t understand it.

  137. jamesannan says:

    Yes, it’s all a bit of a mess. It’s related to the confusion over whether the models can be considered “truth-centred”. I recall Santer et al’s mistake from a few years ago, which added to the whole Santer/Douglass clusterfuck, though I don’t think I commented on it at the time.

  138. Joshua says:

    So today’s an [unlucky] day for anyone who actually reads my comments… as I have a philosophical musing that wants to get out. The rest of you will, of course, go on about your business as usual.

    Looking at the bickering over at Lucia’s I came across an idea – essentially, “Someone is wrong on the Internet.”

    Now I can’t come close to even parsing the statistical argument taking place, and indeed, perhaps if I had the skills necessary I would see some clear right/wrong balance in the dynamic….but from my vantage point of trying to identify a rhetorical framework, it seems to me that both “sides” are right in what they’re arguing. Which also means that both “sides” are wrong.

    Given that from a logical framework if people are actually arguing about the same topic that would be impossible, my conclusion is that what’s going on is that the participants are making arguments that only partially overlap and focusing, somewhat pedantically, on trying to prove the other “sides'” error by conflating non-congruent arguments.

    I come from a family where devil’s advocacy has long been a primary mode of exchange about all variety of issues. Family members by “marriage” often just roll their eyes and leave the room once we get started, because they don’t completely understand how arguing about perspective is our way of exploring our own viewpoints (or just find it annoying/uncomfortable to watch the sausage getting made)..

    Anyway, IMO, the main problem here lies in the basic organizing structure for the discussions, as well as a lack of commitment to good faith dialogue. With committed good faith dialogue, it seems to me, the first place to start is to build a shared understanding of the problem at hand, and a shared vocabulary and definitions of the terms being utilized. To help that along, a structured approach to the discussion, where first points of common view are identified and discussed, seems almost a requirement.

    What we see in these discussions is pretty much the inverse – where people start out with different problems in mind, and different vocabulary and definitions, with some inchoate desire to be proven “right” in their view by an acknowledgement from the other side that they were wrong…which of course is impossible because even though the other side is wrong, so are you, and you are also both right.

    It boils down to no one is ever right on the Internet because “Someone is [always] wrong on the Internet.”

  139. Dikran Marsupial says:

    jamesannan wrote “Yes, it’s all a bit of a mess. It’s related to the confusion over whether the models can be considered “truth-centred”. “

    and truth about what exactly? If it means the true forced response of the Earth’s climate system, it isn’t an unreasonable idea, if it means the true full response of the Earth’s climate system (including the forced and unforced components), then that is rather questionable.

  140. Willard says:

    > I can’t come close to even parsing the statistical argument taking place […]

    Was there a statistical argument? The only statistical argument I’ve seen has been ignored by everyone. Tying oneself in a logical knot (“but Santer,” “but in the context of Santer,” “I mean Santer is a counterexample to your general claim,” “no, not that one – this one is trivial, the other one I read from your words,” “I don’t care about your clarifications, you wrote what you wrote and that’s that,” etc.) is not logic.

    ***

    > To help that along, a structured approach to the discussion, where first points of common view are identified and discussed, seems almost a requirement.

    Start here:

    https://en.wikipedia.org/wiki/Universal_pragmatics

    Note that the word “faith” doesn’t appear in that Wiki entry.

  141. anoilman says:

    Willard… did you just make up a word? Parsomatics? Would that make you the Shakespeare of Climate Change?
    http://neverendingaudit.tumblr.com/tagged/parsomatics

  142. Dikran Marsupial says:

    Joshua, I think in this case a large part of the problem is that not everybody is making the distinction between consistency and model skill/bias and hence there is a fair bit of talking past each other.

  143. Willard says:

    As I said earlier, Oily One, the word is Eli’s.

    Let’s also bear in mind that the Auditor’s “after removing the sleight-of-hand and when read by someone familiar with statistical distributions” was first and foremost an excuse to out Dikran and try to divide and conquer using the good ol’ dirty hands.

  144. wheelism says:

    As was demonstrated in an earlier thread, There’s Something About JAQing with dirty hands that causes Tol hair.

  145. Joshua says:

    Thanks for that link, willard.

    I think that added to the mix is something that goes like this:

    “The easiest person to fool is yourself.”

    So then how do you know when you’re right?

    I see a whole lot of people absolutely convinced that they are “right.” I mean totally fucking convinced. No way that they’re wrong, and anyone who thinks that possible is either immoral, fooling themselves or just plain ignorant. Hmmmmmm.

    Seems to me that given that we all kinda know that the easiest person to fool is yourself, then the only way that we can know if we’re right is if we’ve convinced someone else…who started out in disagreement (or at least wasn’t predisposed to agree [because they might just be fooling themselves otherwise]).

    And there’s this:

    Haidt approvingly quotes Phil Tetlock who argues that “conscious reasoning is carried out for the purpose of persuasion, rather than discovery.” Tetlock adds, Haidt notes, that we are also trying to persuade ourselves. “We want to believe the things we are saying to others,” Haidt writes. And he adds, “Our moral thinking is much more like a politician searching for votes than a scientist searching for truth.”

  146. Joshua says:

    Sven –

    Since you read the thread previously, maybe you’ll do so again. In that case…

    The total lack of self awareness is astonishing
    Too funny, to quote the classics 🙂

    I’ve been disinvited to participate at Lucia’s. Since you quoted my comment and editorialized, I’m wondering if you’d write an explanation here. In what way is the “lack of self-awareness astonishing”?

    Did you (wrongly) interpret my characterization to be describing only one group of participants?
    Do you doubt that I was trying not to participate in the echo-chamber/group think/preach to the choir dynamic, but found Lucia’s comment too amusing to pass up?

    Some other explanation?

  147. Willard says:

    > So then how do you know when you’re right?

    An hypothesis – when you correct a wrong:

    Emma Green: What is anger?

    Martha Nussbaum: A good place to begin is Aristotle’s famous definition. The basic ingredients of anger are:

    (1) You think you’ve been wronged,
    (2) The damage was wrongfully inflicted, and
    (3) It was serious damage to something you care about.

    Aristotle also thinks the damage is always a kind of insult—what he calls a down-ranking, or a kind of slighting that puts you lower in the scheme of things. I ended up saying that’s not always the case. However, I think that’s an important ingredient in a lot of anger that people have.

    The last thing—and this is the crucial one, I think: Aristotle, and every other philosopher known to me who writes about anger, says that part of anger itself is a desire for payback. It’s not really anger—it’s something else.

    http://www.theatlantic.com/politics/archive/2016/05/martha-nussbaum-anger/481464/

    Under that light, it’s more “somebody wronged me on the Internet” than “somebody’s wrong on the Internet.” Here’s BrandonG trying to explain just about the same thing in terms of reputation management:

    When the discussion becomes about how ignorant I am of physics relative to you, I see that as an indication that you’re more interested in managing your own reputations, sullying mine, and not actually interested in resolving our differences. I in turn infer that you don’t actually want to “do something” about AGW, because you don’t actually believe that there’s a problem to be solved.
    .
    Your most effective way to prove me wrong about that last bit is to NOT re-assert your intent, but to actually do it by discussing mutually acceptable solutions to the problem you claim exists, albeit not to the same “biased” extent that I do.

    http://rankexploits.com/musings/2016/ban-bing-is-too-long/#comment-146768

    BrandonG’s anger led to a spirited exchange, with more and more parsomatics of course, each comment by BrandonG attracting a series of response void of acknowledgement, with a side discussion about BrandonG having the discussion.

    A beautiful comedy of menace.

  148. anoilman says:

    Willard… that’s quite interesting. I’m actually motivated by the fact that a lot of folks of the denier stripe work so hard at undermining or ‘wronging’ others. It makes no sense to me. They just show up like lookie loos, and slam it all to hell with nothing to offer.

    I really cherish the opposition that doesn’t spout garbage or propaganda. One fellow who was very pro oil\fracking, was anti Keystone because he knew it would cause back ups in the US side of the system. (If keystone goes through, Americans in the mid west will pay more for gas.) When he was laid off in the downturn, I offered my genuine condolences. I dunno but I felt as though I was dealing with a genuine human. That kind of opposition doesn’t bother me.

  149. Willard says:

    > I’m actually motivated by the fact that a lot of folks of the denier stripe work so hard at undermining or ‘wronging’ others.

    They got [little] else, Oily One. Without a daily ClimateBall ™ episode for the auditing fans, the established viewpoint prevails. When you don’t have a good position, you try to stir things up, otherwise you lose by inertia.

    There’s at least one scientist who gets suckered in each day. If that doesn’t suffice, there’s always journalists, e.g.:

    Since the MSMs are powered by anger, it recurse furiously enough for audits to never end.

  150. Joshua says:

    Sven-

    My comment was not meant to characterize either side as being disproportionately more of an echo-chamber.

    imo, both sides wrongly characterize their own side as more open-minded and less biased. imo, such characterizations typically fail a basic test of skeptical due diligence. Such findings of disproportionately greater negative attributes on the “other side” are easily predictable based on the tendency of humans towards motivated reasoning, confirmation bias, identity-protective cognition, etc., particularly in a context which is highly polarized and where people identify strongly with an in-group in opposition to an out-group. It is possible that there is some objectively quantifiable disproportion, but I would need to see actual evidence to be convinced. Anecdotal speculation without any empirical support, particularly when offered by members of the outlier groups that are heavily engaged in hostile interactions, does not sway my presumption of proportionality (we are all prone towards the same human foibles).

    If you have actual evidence to support your conclusions/speculations of disproportionality, I mean evidence where there is at least a cursory attempt to control for naturally biasing influences – I’d love to see it.

  151. Joshua says:

    Oh, and thanks for the response.

  152. Joshua says:

    And one more point (unless I think of another). It is precisely because I think there is a proportionality in the echo-chamber aspect that I was somewhat reluctant to participate in the openly partisan bickering.

    But I thought that the obviously lame rhetorical device employed by a smart person who is motivated provided an amusing example of my pet topic of how “motivation” and identity-protective and identity-defensive reflexes bias reasoning.

  153. Joshua, FWIW as far as the structure of discussion goes, I think something like Socratic method ought to work if it is a good faith discussion. The problem is that people often tend not to be willing to answer questions that define their position, which is a good way to “buil[d] a shared understanding of the problem at hand, and a shared vocabulary and definitions of the terms being utilized.”. The problem is that this often ends up being a “hostage to fortune” that may lead to them being proven wrong later. Of course if you are truth-seeking, you will welcome that (although perhaps not as much as being proven right ;o).

    BTW if anyone at Lucia’s wants to ask me any scientific or statistical questions about the Douglass/Santer/Schmidt tests, ask them here and I’ll do my best to answer them.

  154. James,
    Thanks for the comment. I’ve just come across an older post of yours that seems relevant (H/T Dikran).

  155. Angech,

    Eyeballing the graph with the envelope the minor issue is the fact that the range of the envelope seems to stay the same over 240 years.

    Your eyeballs are correct. The error bars for each model curve are the same across their entire respective intervals by design. They are, however, slightly different, with the scaled model ensemble slightly smaller, +/- 0.22 K vs +/- 0.23 K.

    Dikran,

    I don’t think that is really true. If the effects of internal climate variability are similar in magnitude over time, then there is no expansion in error bars from that.

    That’s the assumption I’m making with this method, such as it is. One place it breaks is that apparent variability increases in the observational time series the further back one goes in time, I reckon because interannual variability increases because of sensitivity to fluctuations in fewer observations.

    Anders,

    But there is a range for climate sensitivity, though, which I think would increase the spread with time. [Edit: across all models, I mean. This wouldn’t apply to a single model.]

    I agree. I’m implicitly treating the ensemble as a single model. I originally did this (at least a year ago) for simplicity. Over at Lucia’s, SteveF said something to the effect of: we can make the envelope as large as we want simply by increasing the spread of the ensemble members. So one possible method to my simpleton madness is that my error envelopes are completely insensitive to ensemble spread.

    Thanks all for the feedback. Please keep it coming if you happen to think I’m doing something egregiously incorrect or indefensible.

  156. Willard,

    A beautiful comedy of menace.

    I’m still a bit too pissed off to find the humour, and not really sure I want to walk it off by having a larf at it. Time may change that, however — fury is tiring.

  157. … which is not to say I’m put out that you referenced it. You know I don’t mind being quoted. 🙂

  158. Mal Adapted says:

    Semyorka:

    If the “models are running hot” why is the primary focus of these people not to produce better models then? Especially if it’s just a simple explanation, such as some cloud feedback. My experience with the creationists is that they are far more focused on raising minor issues to present as doubts and “falsifying” evolution than actually producing science. How would our friends feel that they are any different?

    That’s a key insight. Creationists and AGW-deniers like to claim they have science on their side. Yet their lack of interest in contributing to the scientific enterprise is revealing. Science is a way of trying not to fool yourself, and the pseudo-skeptics are trying desperately to fool themselves. They can hold existential terror at bay only by telling themselves the Universe exists for their benefit. They’ll never produce any worthwhile science, because that would require them to let go of their comforting illusions.

  159. Mal Adapted says:

    I feel compelled to add that I, OTOH, will never produce any worthwhile science because that would require me to work too hard. I’m more than happy to let someone else do it ;^)!

  160. Brandon,

    I’m still a bit too pissed off to find the humour

    I find myself in a similar frame of mind. I think I get lulled into a false sense of security only to be rudely awakened when you realise the discussion you thought you were having isn’t the one everyone else is having; they’re criticising all sorts of things you hadn’t even considered and – in some cases – still can’t work out.

  161. wheelism says:

    Mal: But how can you resist those sweet, sweet grants?!

  162. anoilman says:

    Willard… there’s a lot of blaming the victims going on with Fort Mac. You have enviros claiming it was a just punishment. (no) Given Alberta’s efforts to deny global warming, it’s not a surprising reaction. There’s also way too much attribution of the fire to Global Warming, and El Nino.

  163. aom: “There’s also way too much attribution of the fire to Global Warming, and El Nino.”

    A co-worker suggested the Ft McMurray fire was started by Syrian refugees. Not in jest. I’m guessing a potential Trump voter.

  164. Anders,

    In the post you’ve been active on at The Blackboard, SteveF leads off with:

    Some recent exchanges in comments on The Blackboard suggest some confusion on how a heat balance based empirical estimate of climate sensitivity is done, and how that generates a probability density function for Earth’s climate sensitivity.

    That misunderstanding was mine, and arguably so.

    SteveF (Comment #146401)
    April 23rd, 2016 at 9:07 am
    […]
    BTW, the ‘time to reach 2C’ is a very different question. Nothing much to do with an estimate of sensitivity.

    SteveF’s facepalm was warranted; I did completely goof on interpreting the calculation he was making wrt heat uptake and copped to it in my response.

    See #146414 from me attempting to direct the conversation back to my original point — that ECS/TCR do affect warming rate under a given forcing regime — #146415 from Lucia, and #146418 my response to her. Unsurprisingly, SteveF’s article wasn’t at all responsive to one of the central, salient issues to the topic up to that point.

    TL;DR: it’s a battle over how to frame the argument. You said it well: I give up.

  165. Joshua says:

    Sven –

    It’s certainly an odd situation where we have a convo where I post here and you post there.

    I don’t particularly agree with Anders’ moderation policy, but I don’t think that we can generalize about much from its existence, and certainly not something important, IMO, outside of the distorted view from a bubble of fanatics leaving comments on blogs. Perhaps it would be important in evaluating whether there is some disproportionality of openness on the different sides of the “climate-o-sphere,” but a lot of bloggers have various types of moderation policies so finding some kind of pattern, I think, would be unlikely. At least I haven’t found any generalizations about disproportionality that I think are valid. People on both sides complain and are abso-fucking-lutely convinced that they and their tribe-mates are getting the short end of the stick.

    But reactions to moderation make one of the best examples, IMO, of the identity-aggressive and identity-defensive (identity-protective cognition) pattern that I was talking about. People get outraged about “censorship.” That seems pretty silly to me as a concept (is your free speech really being infringed if someone disinvites you to comment at their blog?) and I think that the wails about “censorship” are an example of how partisans, via motivated reasoning, exploit a real world problem to handwring about what is, for the most part, self-victimhood.

    IMO, it is quite possible for people to engage here in disagreement with Anders if they think strategically about how to engage in good faith. In a sense, it might require self-censorship, but only in that you’d have to think about how to conduct yourself in a way that demonstrates good faith. I have been moderated at a few blogs, and it has been because I made the choice to not engage in good faith (because I felt that people weren’t engaging in good faith with me – I always try to engage in good faith and will continue to try to do so if I feel that the effort is mutual. My reason for not bothering with good faith exchange with people who I think are not exchanging in good faith with me is that it is not possible to engage in good faith, IMO, with people who don’t make a reciprocal effort). If I make such a choice, then I am content to deal with the consequences. It doesn’t mean that I won’t point it out if I think that the exercise of moderation criteria is arbitrary (in the sense of being biased)… but them’s the rules of the game. We all accept those rules, implicitly, when we sign on to comment at someone’s blog.

    I don’t know if you have a history at this blog, and I have no idea why your comments aren’t passing moderation. But I am confident that it isn’t merely because you have opinions that are different than Anders’.

  166. Joshua says:

    willard –

    I’m a little confused about the connection you’re making in your 6:35 to my comment. I wasn’t sure what this meant:

    ==> An hypothesis – when you correct a wrong.

    As for:

    ==> it’s more “somebody wronged me on the Internet.”

    Yeah…I think of that relating to the overriding sense of victimhood, mostly self-inflicted, IMO, that runs throughout public discourse on polarized issues….which in turn, IMO, boils down to identity-aggression and identity-defense.

    I go back and forth between being disturbed by the “menace” aspect and thinking that it isn’t really menace, but just insecurity being manifest. Doesn’t insecurity lie at the root of all this narcissism?

  167. Joshua says:

    Oh, and as for knowing when you’re right if you are the easiest one for you to fool….

    Obviously, the scientific method is the gold standard for determining when you’re right…although also obviously, it’s not a foolproof way to avoid fooling yourself (pun intended).

  168. Joshua says:

    Dikran –

    I agree about Socratic dialogue as a good frame for good-faith exchange. But I think that when there is a pre-existing polarized context, a more deliberative, proactive, and tightly organized structure can work where a Socratic dialogue might fall apart. Stakeholder engagement and participatory democracy offer some interesting models.

  169. Joshua says:

    Sven –

    ==> No, Joshua, I do not have any history on aTTP that I know of.

    Hmmm. Interesting.

  170. Joshua,

    Doesn’t insecurity lie at the root of all this narcissism?

    I can only speak for myself, and the honest truth is yes, that’s a factor. We don’t need to know the root cause to observe the symptoms, or the result: when the conversation becomes about the conversation, there is no longer a conversation happening.

  171. Willard says:

    A blast from the past, Joshua. Perhaps this counts as an history, or that Sven knows what he writes.

    You are right about the “odd situation” – it only attracts Sven’s peddling, and there’s very little else than sameol’ sameol,’ good faith, identity stuff, motivated reasoning, etc.

  172. Selective amnesia is the best sort, donchaknow.

  173. Willard says:

    > I’m a little confused about the connection your’re making in your 6:35 to my comment.

    Correcting a wrong may indicate that you’re right. That’s an impression it gives to the audience. Basic rhetoric, the framework of every Disney movie. Vindication also boosts the ego, but we shouldn’t care less about public psychologizing. We all have our pride and our prejudices. Discussing otters’ is nobody’s business.

    I would also argue against romanticizing the Socratic elenchus, as it is asymmetric and ultimately self-defeating. Reasoning by epagogue has limits, and besides Plato’s own theory of Ideas, there’s very little it can’t dissolve. Trolling people into thinking may also get you moderated away from websites.

    I’m not sure how disagreeing with about everything you repeat over and over again can be interpreted as an echo chamber phenomenon, Joshua, but here you go.

  174. Joshua says:

    No, willard, you disagreeing with everything I repeat over and over would not be part of an echo chamber phenomenon. Did I suggest it would be?

  175. Ken Fabian says:

    I might be missing much (very likely) and showing my ignorance here, but is this (really? seriously?) about an expectation that the average of an increasing number of climate models should converge with real world conditions (at ever shorter, i.e. <decade, time scales) to be considered 'good enough'? If so, I would have to say that appears to be wrong simply on the basis of known internal variability – which precludes any such convergence of an average with a single (real world) realisation over any period too short to average out ('smooth') the natural variability.

    If the models work reasonably well, then the ones that put the variables that resist prediction year to year (but tend to average out to near net zero on multidecadal scales – like ENSO) in the right places, with close to actual values, ought to align most closely with the real world. The average of all such models ought to lie near the middle of that range of variability, whereas a single realisation (such as the real world) will only do so when the internal factors happen to combine to put it at the middle of that range.
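
    The point above can be checked with a toy simulation. This is only a sketch under stated assumptions: every model run and the "real world" share one forced trend plus independent Gaussian internal variability (a perfect-model setup), and the function name and all parameter values are invented for illustration, not taken from any paper discussed here.

```python
import random
import statistics

def simulate(n_models, forced_trend=0.2, internal_sd=0.1, n_worlds=2000, seed=0):
    """Return (standard error of the ensemble mean, fraction of 'real worlds'
    whose trend lies within 2 standard errors of that mean).

    All values are hypothetical; units could be read as degrees per decade.
    """
    rng = random.Random(seed)
    # Each model run = the same forced trend + its own internal variability.
    runs = [forced_trend + rng.gauss(0.0, internal_sd) for _ in range(n_models)]
    mean = statistics.fmean(runs)
    sem = statistics.stdev(runs) / n_models ** 0.5
    # Possible "real worlds" drawn from the very same distribution as the models.
    worlds = [forced_trend + rng.gauss(0.0, internal_sd) for _ in range(n_worlds)]
    inside = sum(abs(w - mean) <= 2 * sem for w in worlds) / n_worlds
    return sem, inside

for n in (5, 50, 500):
    sem, inside = simulate(n)
    print(f"N={n:4d}  SEM={sem:.4f}  fraction within 2*SEM={inside:.2f}")
```

    As the ensemble grows, the standard error of the mean shrinks, so fewer and fewer single realisations fall inside the error bars, even though every "world" here comes from exactly the same distribution as the models.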

  176. Joshua says:

    Sven –

    Like I said, I wouldn’t want to generalize from poorly crafted sampling…but it would be rather ironic if you were banned because the host thought you were being “rude.”

  177. Steven Mosher says:

    “Steven Mosher asked “At any time did you think Santer 2008 had an error in it?”

    Yes, I did. I recall having my concerns addressed, but can’t remember the details; having re-read the paper my concerns have re-appeared once more. The test in Santer et al. (If I understand it correctly) is much better than that in Douglass et al., but I don’t think it is correct as a test of consistency (as I suspect it would fail a perfect model ensemble). The test used by Schmidt in the RC articles (subject to the caveats mentioned in Victor’s excellent blog article) is the test I would use.

    BTW, you don’t need to remind me of what I wrote, I am generally happy to answer scientific questions (although being only human, I may fail in that sometimes).”

    Cool. Thanks for the answer. At some point I’m hoping someone will compare and contrast the various approaches – probably best with the names removed!!! since bringing up the names just inaugurates climateball.

    So.. Approach A, Approach B, Approach C..

    I’ll read what Victor has to say

  178. Willard says:

    > Did I suggest it would be?

    Of course you did not, Joshua. It was my way to address Sven’s concerns regarding AT’s, which I believe he tried to peddle in his parallel exchange with you. See how he tried to special plead with his “but AT moderation” right after Lucia recalled you were disinvited (perhaps it should be “uninvited,” unless you were formally invited in the first place) because, reasons.

    I think it’s easier to hear echoes from places with which we are less familiar. It takes a while to hear the different voices and viewpoints from the blog contributors. What passes as an echo chamber might simply be pure piling on. Fionn Regan has a thing to say about games with minorities facing a faceless majority:

  179. Joshua says:

    ==> What passes as an echo chamber might simply be pure piling on.

    Yes, piling on may be a more accurate description much of the time – perhaps somewhat similar in some ways to, and easily confused for, but distinctly different from an echo chamber effect.

    ==> I think it’s easier to hear echoes from places with which we are less familiar.

    For someone from the “other” camp, basically the only thing to see is the different identity orientation. Any other distinctions pale in comparison, and so everyone “over there” is basically identical (and thus echoing each other). Certainly, it is similar to “They all look alike,” or “You don’t look Jewish!”

  180. Ken,
    Based on James’s post it seems to be the case that some argue that the mean of the models should match reality (truth-centred). James’s post explains quite well why this is almost certainly not a reasonable assumption.

  181. Joshua,
    I haven’t been following the other side of your conversation, and I had completely forgotten about Sven’s past comment here.

  182. Dikran Marsupial says:

    Steven Mosher wrote “Cool. Thanks for the answer.”

    no problem.

  183. Dikran Marsupial says:

    Willard, agreed. I said “something like Socratic method” (or something like that) rather than Socratic method itself, because I want to borrow the bits that ought to work, for instance asking questions that discover your interlocutor’s true position, providing sound answers to them to make your own position clear, exploring the consequences of that position, working towards contradictions etc. There is no reason it should be asymmetric though as far as I can see (except perhaps if you want a clear story to write down later). At the end of the day though, the only way that will happen is if you are at least willing to answer questions yourself and be willing to explicitly state when you are wrong and where you agree with your “opponent”. This seems to me a requirement of rational discussion (although we are all only human).

  184. Eli Rabett says:

    The Socratic method psss the sht out of a whole lot of people who think they are being patronized. Look how well it worked for Socrates

  185. Dikran Marsupial says:

    Eli, my main point is that we should ask questions rather than assume we know the other’s position and that we should give direct answers to questions, to make our position clear, whether they lead to a contradiction or not. In a symmetric version where both are asking and answering questions, there is no reason to feel patronized by it. I agree the form where only one person’s position is questioned is not the best approach. At the end of the day, if someone asks you a relevant question and you don’t want to answer it, then perhaps you need to ask yourself why and what actually motivates your participation in the discussion.

  186. Dikran Marsupial says:

    Joshua, yes, something like that, but perhaps rather less formal, particularly:

    “It is a cooperative search for a universal truth, which will be discovered, if at all, by the group”

    which surely should be what science is about? Questioning is not necessarily confrontational, and is capable of being used in a co-operative manner. If I am not sure I understand someone’s position, and have (say) two plausible interpretations, then it seems to me a very sensible approach to resolving the ambiguity to formulate a question that most sharply distinguishes between the two interpretations and ask it. That way I am actively helping my “opponent” to explain their position to me (or to show that they are right etc.). In which case, the best thing to do is to answer the question as directly as possible (and perhaps then add some caveats that might be relevant or give a third interpretation etc.). Equally, every time I ask a question, I run the “risk” of my opponent having a good answer that I can’t just walk away from.

    Now of course, sometimes (sadly rather too often) I will draw attention to the fact that someone is not answering my questions, but then again I can’t do that without them actually being evasive, in which case what else can I do? It doesn’t help a search for the truth if evasion goes unchallenged. AFAICS, all that can be done is to show that a cooperative search for the truth is not possible because at least one side of the discussion doesn’t want it.

  187. Willard says:

    Here’s one asymmetry, Joshua:

    There are three levels (or orders) of discourse in a Socratic dialogue: first, the discourse of the dialogue itself; second, strategic discourse about the direction or shape of the dialogue as it unfolds; third, meta-discourse about the rules governing the dialogue. The facilitator plays no contributory role in the actual first-order discourse; he simply transcribes the proceedings at each stage, according to the prescribed structure (see next section). The facilitator plays a minimal role in second-order strategic discourse; but he may (if asked) offer some suggestions about viable strategies. The facilitator does play a role in third-order meta-dialogue. A meta-dialogue may be requested at any time, by any group member who seeks clarification about a rule or any other matter governing the dialogue as a whole. The facilitator is responsible for answering meta-dialogical questions. The facilitator may also initiate a meta-dialogue at any time, if in his judgment some procedural point requires clarification. Thus the facilitator of a Socratic dialogue is like the conductor of an orchestra: he has no explicit voice in the score, but has a meta-voice in conducting the performance.

    In ClimateBall ™, the facilitators are those who create editorial lines. They are also the subject of most editorials. This is why ClimateBall ™ is first and foremost a credibility race. Sven’s still at it this morning, BTW, and his hypotheses still conspire to amuse him.

    In ClimateBall ™, there is such a thing as playing home and playing visitor. This affects the goals and the balls being played. It would be hard to play Socratic in our actual episode – one group of protagonists channel their inner Chewbaccas to convey that AT “makes no sense.” As the author of the quote describes, it’s mostly strategic.

    There are lots of merits to more formal approaches for sure. However, we need, if only for sanity’s sake, to assume that this is Sparta – we’re into eristic for the most part. Perhaps the most expedient would be to abide by Postel’s law:

    Be liberal in what you accept, and conservative in what you send

    https://en.wikipedia.org/wiki/Jon_Postel#Postel.27s_law

    Even this law carries an asymmetry. At least it should, in principle, lead to more robust communications. Not robust in the sense of the movie which originated the “this is Sparta” meme – the messenger scene may not represent Postel’s law.

  188. Joshua says:

    willard –

    Yes, I noted that the presence of a facilitator creates an asymmetry of sorts…I actually meant to write something like “…and doesn’t seem too terribly asymmetrical…” but forgot.

    I would be for such a process that didn’t employ a facilitator, but where participants agreed on the framework for discussion and accepted responsibility for the facilitator’s role. Of course, there would be a trade-off between the benefit of flattening the discourse and the (perhaps) cost of diminished quality of facilitation if no one person is assigned the role.

    I actually like a system used in some couples therapy. Discussants follow a very simple process:

    1) First person makes a statement (e.g., “If A and B happen, C is most likely to be the result”)
    2) Second person restates and confirms that the restatement captures first person’s meaning (“So as I understand it, you think that C is a likely occurrence, is that right?”)
    3) First person clarifies, if necessary (“Well, not exactly. I only think that C is a likely occurrence if A and B happen. If A and B don’t happen, I think that C is not likely to happen at all.”)
    4) 2nd person tries to correctly state the clarified position (“OK, so C is most likely to happen if A and B happen, but otherwise you wouldn’t expect C”)
    5) First person acknowledges correct statement of her position
    6) Second person asks if there’s anything else to add.
    7) First person may add something here (e.g., “But I want to make it clear that I think that would be the most likely sequence of events should A and B happen. I am not at all certain that A and B will happen, or that C would definitely occur if they did..”).
    8) Recycle back to step 2 as necessary.

    In other words, the goal would be to avoid reaching the stage where one of the participants says something like this:

    “All that’s happened is you and several other people have repeatedly taken parts of my comments out of context and claimed they proved I was conflating my opinion with fact. In reality, the parts of my comments you ignored gave the justification for my claims in which I explain why they are true, not just my opinion..”

    Consensus-based engagement seems to be an impossible goal if there isn’t a shared interest in reaching consensus. My own personal belief is that you haven’t fully argued a thesis if you haven’t faithfully reproduced and accounted for (at least obvious) counter-arguments. But of course, we will never see a consensus process in the climate wars, IMO, because there is a moral struggle taking place. IMO, the proximal goal is usually to confirm a belief that the other side is immoral, ignorant, blah, blah, blah.

  189. Joshua says:

    Ah…forgot the link (for when my comment clears moderation): http://www.hi-izuru.org/wp_blog/2016/04/a-critical-comment/#comment-9956

    I would also add for step 8….that the 2nd person might say “So I get that you think that C is likely to happen should A and B occur, but what I really want to know is whether or not you think that C is likely to happen. That is the more interesting question, in my opinion.”

  190. Dikran Marsupial says:

    Joshua “My own personal belief is that you haven’t fully argued a thesis if you haven’t faithfully reproduced and accounted for (at least obvious) counter-arguments.”

    Indeed, I would welcome a discussion of my parallel Earth ensemble thought experiment, it’s interesting that my actual argument seems to be attracting so little discussion. ;o)

  191. Dikran,
    Based on what I’ve read at Lucia’s place, that’s very simply because Ben Santer said you’re wrong. I still have to work out when a statistician criticising a climate scientist illustrates that climate scientists don’t understand statistics, and when it doesn’t. It seems somewhat random.

  192. Dikran Marsupial says:

    Indeed ;o) It is an interesting question how the “truth-centered” interpretation of the ensemble persists even though the “exchangeable” interpretation is clearly right (my parallel Earth ensemble is only one argument that demonstrates that, and I am not claiming it is the best, just the one that makes sense to me). It is curious that my position on the Douglass et al. test is of so much interest when my reasoning apparently is not (too arid perhaps ;o).

  193. markbofill says:

    Based on what I’ve read at Lucia’s place, that’s very simply because Ben Santer said you’re wrong. I still have to work out when a statistician criticising a climate scientist illustrates that climate scientists don’t understand statistics, and when it doesn’t.

    Right, well, if this refers to my remarks, I’ve apologized and withdrawn them. Certainly, if anyone is under the impression that I think anybody has to just take Ben Santer’s word for it, I’d like to disabuse them of that notion.
    This said, I am not an expert in statistics and can’t intelligently engage on the issue. I’m told that there is one, but. I’m not going to be the guy to talk about it.
    Thanks.

  194. Joshua says:

    It’s interesting (to me at least) how difficult it is for (at least some) people to say something like….

    “This is how I interpret what you were saying there, […] am I right that’s what you were trying to say?” And then go on to calmly explain why they are in disagreement.

    Instead, it seems that (some) people are much more interested in saying something like…. “Are you insane? The only way that someone could think that is if they’re crazy. Oh, I suppose it could also be that you’re stupid. Or maybe it’s just that you’re trying to pull a fast one on me. Or perhaps it’s just that you’re engaging in propaganda, and trying to fool people? Or maybe it’s not that you’re really stupid, but you just really do believe that, in which case it’s quite obvious that you don’t have any idea what you’re talking about – you are absolutely wrong. There simply is no way that anyone who isn’t stupid and who knows what they’re talking about could have said that unless there is some devious motivation.”

    …at least in these blogospheric discussions. I assume that most people who act that way online don’t act that way in real world interpersonal interactions, at least for the most part.

  195. Mark,
    Actually, that wasn’t based on your comments. It seems many have claimed that Dikran was shown to be wrong by Santer. If you want to read a pretty clear explanation for why the Douglass and – probably – the Santer method are wrong, you could read James’s post.

  196. Joshua,
    It seems that Lucia is completely unable – or unwilling? – to say anything like this

    “This is how I interpret what you were saying there, […] am I right that’s what you were trying to say?”

    I think I’ve had two exchanges with Lucia, both of which have ended up with her claiming that I said something very specific and that that very specific thing was wrong, and me trying to explain that that isn’t what I said. Even agreeing that her interpretation was wrong had no effect. To be fair, Lucia isn’t alone in behaving like this, but – to me – this is a style that I have no interest in having to deal with – I actually think there is no way to deal with it, given that they can always choose the least charitable interpretation of what you’ve said and then stick with that.

  197. markbofill says:

    Joshua,
    .
    I recently found myself in just that interesting circumstance and produced just that unfortunate response. For my part, a certain amount of surprise and/or shock upon discovering something I didn’t know but felt like I ought to have been told contributed to an entirely visceral impression of bad faith. What’s interesting (to me) is that looking back on it, Anders has a point. That nobody mentioned that Dikran is Cawley or that Cawley discussed these issues with Santer doesn’t actually constitute bad faith. This said, I think readers might have appreciated knowing, but.
    ~shrug~
    FWIW.

  198. markbofill says:

    thank you Anders.

  199. Dikran Marsupial says:

    Well, I am happy to discuss it here with anyone that is interested. This paper on “Good Practice Guidance Paper on Assessing and Combining Multi Model Climate Projections” from an IPCC expert meeting may be of interest. On page 4, it discusses two statistical frameworks for quantifying ensemble uncertainty, namely the “truth-centred” and “exchangeable” frameworks. The Santer et al. test (as I understand it) would be reasonable under the “truth-centred” framework, but invalid under the “exchangeable” framework. Sadly the “truth-centred” framework is untenable, as demonstrated by the “parallel Earth ensemble” thought experiment (see also James Annan’s blog article, discussed above; unlike me, he is a real expert on this topic). The “truth-centred” view also seems rather inconsistent with IPCC AR5 WG1 report Box 11.1 (page 959), which says:

    “… In this sense climate system variables such as annual mean temperatures (as in Box 11.1, Figure 1 for instance) may be characterized as a combination of externally forced and internally generated components with T(t) = Tf(t) + Ti(t). …

    In practice, and in Box 11.1, Figure 1, the forced component of the temperature variation is estimated by averaging over the different simulations of T(t) with Tf(t) the component that survives ensemble averaging (the red curve) while Ti(t) averages to near zero for a large enough ensemble. The spread among individual ensemble members (from these or pre-industrial simulations) and their behaviour with time provides some information on the statistics of the internally generated variability.”

    Note if averaging preserves Tf(t) and attenuates Ti(t), then the ensemble mean only represents the forced component of the observed T(t), but not the internally generated component Ti(t), which rather refutes the “truth-centered” framework as the ensemble mean is only an estimate of Tf(t) and not T(t) itself.
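
    The decomposition T(t) = Tf(t) + Ti(t) quoted above can be illustrated numerically. What follows is a rough sketch, not a climate model: it assumes a linear forced component and independent Gaussian internal variability, and the function name and every number in it are hypothetical.

```python
import random

def ensemble_demo(n_members=100, n_years=50, trend=0.02, internal_sd=0.15, seed=1):
    """Compare the ensemble mean with the forced signal Tf(t) and with one
    'observed' realisation T(t). Returns the two RMS differences."""
    rng = random.Random(seed)
    forced = [trend * t for t in range(n_years)]  # Tf(t), assumed linear

    def realisation():
        # T(t) = Tf(t) + Ti(t), with Ti(t) drawn independently each year
        return [f + rng.gauss(0.0, internal_sd) for f in forced]

    members = [realisation() for _ in range(n_members)]
    observed = realisation()  # a single "real world" from the same distribution
    ens_mean = [sum(m[t] for m in members) / n_members for t in range(n_years)]

    def rms(a, b):
        return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5

    return rms(ens_mean, forced), rms(ens_mean, observed)

err_forced, err_observed = ensemble_demo()
print(f"RMS(ensemble mean - Tf)       = {err_forced:.3f}")
print(f"RMS(ensemble mean - observed) = {err_observed:.3f}")
```

    In this toy setup the ensemble mean tracks Tf(t) closely, but it stays roughly one internal standard deviation away from the single observed series, because the observed Ti(t) does not average out.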

  200. Dikran Marsupial says:

    Forgot to add the link to the IPCC report (not that it is difficult to find).

  201. markbofill says:

    TY Dikran. I’ll certainly read it, although it might be past my pay grade, I appreciate it.

  202. Dikran Marsupial says:

    Mark wrote “That nobody mentioned that Dikran is Cawley”

    FWIW I gave a pretty unsubtle hint the first time it seemed to matter. The Dikran Marsupial pseudonym has not been a secret for a long time, I revealed who was behind it after I wrote the comment on Essenhigh’s paper. However I think that those who wish to discuss things on blogs anonymously should be able to do so, and exposing their pseudonyms is not really the done thing. It generally is not a good sign in a scientific discussion if the topic of who is making the argument is the focus rather than the scientific merits of their argument.

  203. Dikran Marsupial says:

    Mark – Reading the first IPCC report is a fairly good way of getting into the science; as the science has progressed the IPCC reports seem to me to have become rather more terse and less accessible for the non-climatologist. Most of the basics have changed very little, so there is still lots to be gained from the FAR (IMHO).

  204. markbofill says:

    Dikran,

    Yes. I agree that at the end of the day the fact that you are Cawley in RL isn’t an issue. I also noticed and acknowledge that it is both an ‘open secret’, in that you identify yourself openly on the SkS site on the author’s page, and that you did hint at it here. It’s part of what I thought was interesting and what I thought might interest Joshua, that mine was not a reasoned response. I’m not honestly sure how to categorize it, but. It seemed to have bearing on his question, so I thought I’d mention it.

    Thanks.

  205. Steven Mosher says:

    “I think it’s easier to hear echoes from places with which we are less familiar. It takes a while to hear the different voices and viewpoints from the blog contributors. What passes as an echo chamber might simply be pure piling on. ”

    https://www.industrydocumentslibrary.ucsf.edu/tobacco/docs/#id=mgxn0061

  206. Dikran Marsupial says:

    Perhaps blogs should auto-generate a new pseudonym for each of the participants for each discussion (perhaps the identities/fixed pseudonyms revealed when the discussion is closed) so that we focus on the arguments rather than the personalities? ;o)

  207. Willard says:

    > That nobody mentioned that Dikran is Cawley […]

    You could also shrug over the Auditor’s missed opportunity right at the beginning, MarkB:

    Schmidt’s December 2007 argument caused some confusion in October 2008 when Santer et al 2008 was released, on which thus far undiscussed Climategate emails shed interesting light. Gavin Cawley, commenting at Lucia’s and Climate Audit in October 2008 as “beaker” […]

    https://climateaudit.org/2016/05/05/schmidts-histogram-diagram-doesnt-refute-christy/

    Fascinatingly, no one from the auditing crowd made any accusation of sockpuppetry.

    I wonder why.

  208. Joshua says:

    Anders –

    FWIW,

    https://hiizuru.wordpress.com/2014/07/23/interesting-perspective-on-the-consensus-debate/#comment-3206

    and

    https://hiizuru.wordpress.com/2014/07/23/interesting-perspective-on-the-consensus-debate/#comment-3220

    tl;dr (probably not worth the time spent reading)…Lucia explains how she knows what I find interesting better than I know what I find interesting.

    It was also interesting to read her description of why I was dis un-invited from her blog. Apparently she doesn’t need to actually consider what I say about my intent when she determines what my intent is…

  209. Dikran Marsupial says:

    Joshua, likewise I know a t-test when I see one (although that doesn’t mean a t-test is the appropriate test of model-observation consistency). This is why we should ask questions, and answer them, so we all know what each other thinks. ;o)

  210. Joshua says:

    Mark –

    FWIW,

    Your criticism of Anders wasn’t what stood out for me, so much as your apology for doing so. I understand that there were reasons behind why you drew your conclusions. What was interesting, IMO, was that even though those reasons were not resolved, you still showed accountability for how you extended your perceptions beyond what they could really sustain.

    I wouldn’t characterize your response as not being “reasoned,” exactly.

  211. Dikran,
    It does seem common to mistake someone disagreeing with something, with them not understanding what that something is. I’ve encountered someone who once claimed to know infinitely more about computational fluid dynamics than I do. Given that I’ve taught it and published numerous papers based on it, it would be surprising if I knew nothing, so either they really know an infinite amount about it (which would be very impressive), or they don’t understand the concept of infinity.

    Joshua,
    I couldn’t find where you were un-invited from Lucia’s blog.

  212. Joshua says:

    Anders –

    ==> I couldn’t find where you were un-invited from Lucia’s blog.

    That wasn’t in the links I gave. I was uninvited from her blog on another occasion when I engaged with her at her blog. I’m not really interested in getting back into it…as it was essentially just a repeat, IMO, of the same sort of pattern as what I describe here:

    https://hiizuru.wordpress.com/2014/07/23/interesting-perspective-on-the-consensus-debate/#comment-3252

    But it was an interesting experience in that shortly after that I watched a dust-up that took place here with Danny Thomas and to be honest, saw him being treated in a way that was very similar to how I was treated at Lucia’s…. where there was a presumption of bad faith* which made any good faith exchange impossible.

    Also rather interesting was that that was hardly the first protracted exchange that Lucia and I have had over a couple of years (with the same pattern displaying)…yet, she didn’t remember having any exchanges with me at a later time…and then after a protracted exchange where she uninvited me, she later couldn’t quite remember the name of that person she uninvited.

    Imagine how I feel to be so unremarkable. 🙂

    * I should point out that my definition of good- or bad-faith is probably a bit idiosyncratic. To me, you can think someone is an asshole or even think that they are deviously motivated but still think that they are exchanging in good faith. For me, “good faith,” in this context means that you are taking someone’s argument seriously, and assuming that if someone says something, it’s what they mean and if you didn’t understand what they said it could be because they didn’t word it clearly or it could be because you weren’t really thinking through what they said. Good faith in these discussions means to me that you really are trying to have a discussion, to understand what someone is trying to say, to get them to understand what you’re saying and that you are committed to that proximal goal.

  213. Joshua,

    But it was an interesting experience in that shortly after that I watched a dust-up that took place here with Danny Thomas and to be honest, saw him being treated in a way that was very similar to how I was treated at Lucia’s…. where there was a presumption of bad faith* which made any good faith exchange impossible.

    I don’t specifically remember that, but I do remember someone of that name commenting. There are certainly parallels and I don’t have some kind of special ability to avoid the things I find annoying elsewhere. Just do the best I can, which isn’t always very good.

  214. Joshua says:

    Anders –

    ==> It does seem common to mistake someone disagreeing with something, with them not understanding what that something is.

    I see that constantly. I see smart people who (from what I can tell) are clearly very knowledgeable being told that no one who is smart and knowledgeable could hold the opinions they hold.

    I remember once when it happened here with Tisdale. Now I get that people think that Tisdale is obviously wrong about fundamental issues related to climate change (I’m not in a position to judge), but I read his posts and as near as I can tell he does have a lot of knowledge and he is intelligent. Of course, not being knowledgeable or very bright myself, I could just be getting fooled by bells and whistles (nice graphs and gish gallops of endless information), but I kinda doubt it. As poor as my skills are, I usually have the ability to parse what people are saying enough to suss out whether there is any logic to the flow of ideas. I don’t know whether he makes fundamental mistakes in his reasoning, but I’m reasonably certain that if he does, it is neither because he’s dumb nor because he’s uninformed.

    Anyway, this is an incredibly common phenomenon. Smart and knowledgeable people telling other smart and knowledgeable people that they aren’t smart and/or knowledgeable. When I read that, I think to myself, “Now why would someone who is smart and knowledgeable say something that is so obviously false, so much so that even I, as someone not nearly as smart or knowledgeable, could see the error as plain as day?”

    I think it’s something about the nature of this community. It tends to be a collection of smart and knowledgeable people, and with that there is an underlying character of narcissism. Everyone wants to establish that they are more smarter and more knowledgeable.

  215. Dikran Marsupial says:

    Mark wrote “or that Cawley discussed these issues with Santer” I think it should be a reasonable expectation that someone’s private email correspondence should be exactly that. I actually don’t really want to see anybody’s email correspondence, whether the contents are relevant to the discussion or not; what matters is the validity of the argument that is presented in public.

    Mark seems like a “good faith” sort of chap to me, I’m only mentioning this to clarify what I think are reasonable expectations.

  216. Joshua,
    I largely agree that there are lots of smart people who engage in this topic. What still surprises me, though, is to encounter essentially anonymous blog commenters who will argue that they have some skill set that allows them to judge someone else’s work, who then don’t seem to recognise that many others have the same skill set. In many cases, their credentials are completely unverifiable, while those they’re criticising often have public profiles where one can actually check their credentials.

  217. Joshua says:

    ==> Mark seems like a “good faith” sort of chap to me, I’m only mentioning this to clarify what I think are reasonable expectations.

    That has always been my experience so far, even though he has called me a Poopy-head. (I kid you not.) But even though he seems to kind of think I’m pretty much an asshole (based on his impression that I was inhumanly unfair to Judith), he has always sought to interrogate what I’ve said with an intent to understand what I’ve said and to explain his reaction for my consideration.

    What’s interesting is that I have always found Lucia to be quite different in her approach – and Mark is quite convinced that Lucia engages in good faith as a matter of course.

    I haven’t quite resolved that contrast (and Mark, I know that you don’t want to discuss that issue and I respect that…)

  218. Dikran Marsupial says:

    I think it is worth pointing out that bright and/or knowledgeable people have blind spots just like everybody else (and are just as blind to their existence), such that when they have an incorrect mental model of something it can be very difficult to shake (“God does not play dice”, cosmological constant, big bang discussion with Lemaitre [caveat: my understanding of these examples might not be correct]). Having said which, thinking you are more knowledgeable than someone else isn’t a good way of avoiding your blind-spots (or at least of allowing others to help you out of them).

  219. Mal Adapted says:

    Wheelism:

    Mal: But how can you resist those sweet, sweet grants?!

    Easy. If I got the grant, I’d have to do the work!

  220. Joshua says:

    Anders-

    ==> In many cases, their credentials are completely unverifiable, while that of those they’re criticising often have public profiles where one can actually check their credentials.

    I’ve been pretty amazed at the number of people who have a vast range of knowledge. It is way beyond what I expected to encounter, and I think that along with narcissism, there is an overriding character of iconoclastism (is that a word)…maybe they kind of go hand-in-hand… Part of what drives people to acquire so much knowledge is an internal drive to tear down institutions, so as to establish their narcissistic identity.

  221. Joshua says:

    Unbelievable. Why don’t I just give up on using html tags? [Mod: fixed :-)]

  222. Dikran Marsupial says:

    I would hope the pleasure of finding things out is a more common motivation. I suspect if you look at a contentious topic then you will be selecting for contrarian tendencies.

  223. markbofill says:

    Joshua, well sometimes you are a poopyhead. You’re not always nice. That said, I think you demonstrate both integrity and honesty in general. I’d be proud to call you a friend given the impression I get from you online, although I don’t know you like that and for all I know you might be the worst sort of diabolical scum in RL. I hope that doesn’t come across the wrong way.
    Dikran, TY.
    Alright I don’t do kumbyama and all the sweetness is getting cloying, so all you no good dirty gosh darn warmists can just go bleep bleep and bleep the bleeping bleep bleep already. And don’t bleep the bleep if you think we deniers are going to bleep bleep bleep the bleep…

  224. Joshua says:

    Mark –

    ==> That said, I think you demonstrate both integrity and honesty in general.

    I gotta show that to my “wife.” I’ve been waiting for years to show her someone who agrees with me about that.

    ==> and for all I know you might be the worst sort of diabolical scum in RL.

    Although I won’t show her that part.

    ==> I hope that doesn’t come across the wrong way.

    Not at all. I think it’s pretty funny that people think that they can judge other people’s character from blog comments. I tend to think it’s important to know someone in real life to judge their character…like how they treat their family, neighbors…

  225. > What’s interesting is that I have always found Lucia to be quite different in her approach […]

    Eli’s short and sour:

    Define something your opponent has said, awkwardly enough that you can say anything you want about it. It’s not what is said, but figuring out where the twist is that tells the story. In this case, the take away is from what is not said and how it is not said.

    http://rabett.blogspot.com/2011/04/exposing-herself-to-art.html

  226. Joshua says:

    ==> Define something your opponent has said, awkwardly enough that you can say anything you want about it.

    Here we go…

    https://hiizuru.wordpress.com/2014/07/23/interesting-perspective-on-the-consensus-debate/#comment-3252

    such a target rich environment, that thread…

    Lucia says:

    ==> You are the one who introduced the notion that other people are “dramaqueen”.

    After she said:

    ==> just as you claim you are amused by these “drama queens”, I am laughing hilariously at the degree to which you are being a huge drama queen.

    After what I actually said when I first discussed drama queening:

    ==> The bickering and drama queening about Cook et al. is also information,…

    Suppose I had said, rather that people are drama queening, that people were being dramatic? How, then, would my words have been twisted to tell a story?

  227. Steven Mosher says:

    ” Part of what drives people to acquire so much knowledge is an internal drive to tear down institutions, so as to establish their narcissistic identity.”

    too funny.

  228. Steven Mosher says:

    Best Line of the train wreck

    ‘When you actually know something about a subject (like, it appears, education), what you write is not bad… and reading what you write is not painful.”

    http://rankexploits.com/musings/2015/climate-communication-a-more-likely-scenarios/

  229. Roger Jones says:

    I agree with James Annan on models not being truth centred. I cannot understand philosophically, how many climate scientists can sustain the view that they are – it makes them sound like (conventional) economists. One of the biggest problems with the whole detection and attribution caper is that both the model baseline and the model anomaly are considered to be truth centred. The get out of jail card justifying this is to use anomalies to remove the model bias from both estimates.

    This may have some justification for temperature means in restricted circumstances but it does not work for variability, particularly for anything hydrodynamic. There is nothing wrong with analysing these things but to assume direct statistical transfer between those estimates and observations is highly questionable.

  230. Dikran Marsupial,

    If I am not sure I understand someone’s position, and have (say) two plausible interpretations, then it seems to me a very sensible approach to resolving the ambiguity to formulate a question that most sharply distinguishes between the two interpretations and ask it.

    I have been trying the “this is how I’m reading you” approach:

    [Lucia] (In fact, it likely if only through people reasoning that if a model does thus and so, then we’ll pick aerosols of thus and such which results in better agreement. Nothing wrong with that, but it means that hindcasts aren’t good tests of skill and should be eliminated or minimized when testing or adjusting projections.

    I just don’t know what to say to this. The only way I can think of to get a truly representative test outside of the training period is to wait for a sufficiently long period of time to do a reasonable skill test against the projection. At that point, another elephant which has been lurking in the shadowy corner of this room rears its ugly head: a projection is also only as good as its assumed future forcings. As well, wait-and-see also completely defeats the purpose of doing the forward-looking projection to begin with.

    Please tell me that defeating the purpose of model projections is not your actual aim here, because from where I’m sitting, that looks exactly like what you’re attempting to do.

    There’s some backstory spanning two prior threads. Anyway, the response (Comment #147717):

    Defeating the purpose of model projections is not my aim. It seems you are inferring that by my pointing out that your method of tweaking them seems unreliable. Yet, you also tell me you know it’s flawed. It’s hard for me to know what to say other than: your method seems to be an unreliable method of tweaking. Saying something that suggests you assume those who observe this are trying to defeat the purpose of model projections suggests you aren’t willing to believe that, perhaps, your method of tweaking doesn’t tweak in a way that is likely to result in reasonable results.

    I have a song for this dance:

  231. Dikran Marsupial,

    Well, I am happy to discuss it here with anyone that is interested. This paper on “Good Practice Guidance Paper on Assessing and Combining Multi Model Climate Projections” from an IPCC expert meeting may be of interest.

    Missed that one in the fray, it is interesting, reading it now. Thanks.

  232. Frank says:

    dikranmarsupial wrote: “It is indeed a standard test for a significant difference of two means, but this is not a problem where consistency requires the two means to be plausibly the same. Indeed, the parallel Earths thought experiment shows we can’t reasonably expect the observations to lie any closer to the ensemble mean than a randomly selected run comprising the ensemble.”

    This is an appealing argument, but completely false when you examine it closely. If you use the DIFFERENCE BETWEEN TWO MEANS FORMULA to determine the statistical significance of the difference between one model run and the multi-model mean, you will usually not find a significant difference. The tricky part is determining the standard deviation for that single model run. By analogy with observations, you need to do a linear regression on that model run to obtain the uncertainty (corrected for auto-correlation) in its trend. That is what has been done to calculate the uncertainty in the observed trend. Or you can take the sample standard deviation for all model runs to represent the uncertainty in one model run. Try that. Embarrassing, isn’t it! A single model run with uncertainty typical of all of the model runs shows a significant difference (p less than 5%) from the multi-model mean about 5% of the time. If you do the stats correctly, everything works properly.

    dikranmarsupial wrote: “but this is not a problem where consistency requires the two means to be plausibly the same”.

    As I wrote above, the key question is whether the confidence interval for the trend in observed warming properly reflects the uncertainty associated with “parallel Earths” or “multiple realizations”. Our climate could have taken many different paths in the satellite era due to unforced (internal) variability associated with chaotic fluid flow. Our climate DID take a circuitous path from 1979 to present due to that unforced variability, especially El Ninos and La Ninas. The width of the confidence interval for the observed warming trend reflects this chaotic behavior, not measurement noise. We have sampled, not ignored, the uncertainty associated with “parallel Earths” via the confidence interval for the observed trend! IF we have a representative sample of the population of all possible parallel Earths and a representative sample of the populations of all possible model output, the difference between their means is significant. Let’s argue about the “if’s”.

  233. Frank,
    You could dial down the condescension.

    We have sampled, not ignored, the uncertainty associated with “parallel Earths” via the confidence interval for the observed trend!

    You’re assuming that the trend plus uncertainty in the trend somehow represents the distribution of all possible trends. This is not the case; internal variability can produce variability about the mean trend, and it can influence the mean trend itself. Therefore the distribution of trends for a single realisation is unlikely to be the same as the distribution of all possible trends (from parallel Earths).

  234. Frank says:

    dikranmarsupial says: “we only have one observable reality, not a population”.

    We have one realization of climate change from 1979 to present, but that consists of 120 months/decade of forced warming and the unforced variability that creates the possibility of “parallel Earths”. We have sampled those alternatives by experiencing a typical assortment of El Ninos and La Ninas.

    We also have the ability to look at the confidence interval for the trend from individual model runs and see how well it represents the trend from parallel runs from the same model.

  235. Frank,

    We have sampled those alternatives by experiencing a typical assortment of El Ninos and La Ninas.

    No, we have not. We have sampled a single realisation of internal variability. You cannot claim that what we experienced was somehow a good representation of the mean of what we could have experienced. This is the key point; we don’t know that the mean of the observations is somehow likely to fall close to the mean of all possible realisations.

  236. Ron Graf says:

    It’s an interesting thought experiment that if we had no information of surface temp until a year ago and then took a 99.9% accurate daily sampling we would have a perfect realization of one year but have little confidence in terms of what that represented for 100 years in terms of forced and unforced randomness. How would we create and test a model? What would be the validation criteria for the next year? If we had a 99.9% accurate record for 100 years couldn’t we ask the same question about creating a model projection for another 100 years? At what scale does the amplitude of randomness begin to diminish?

    BrandonRGates:

    …a projection is also only as good as its assumed future forcings. As well, wait-and-see also completely defeats the purpose of doing the forward-looking projection to begin with.

    What are the stated goals of the models? How can the models be validated as having skill beyond that of the human investigators constructing the models? (not rhetorical)

  237. wheelism says:

    “Nothing is possible, because math.”

    – Xeno’s lament.

  238. wheelism says:

    (Er…Zeno. Xeno’s stranger still.)

  239. wheelism says:

    Since Ron “Six ?” Graf and I are JAQing non-rhetorical on a Sunday night:

    If six was nine and we could see
    Jimi Hendrix, would he still be
    The screaming Stratocaster symphony that drew ya?
    Or would he burn it all down,
    Throw himself on the blaze
    And, consumed, reassume in a camouflaged haze,
    Granted freedom and peace and whispering, “Hallelujah?”

    (For Vinny. Vielleicht.)

  240. Ron Graf,

    [fuller context restored] The only way I can think of to get a truly representative test outside of the training period is to wait for a sufficiently long period of time to do a reasonable skill test against the projection. At that point, another elephant which has been lurking in the shadowy corner of this room rears its ugly head: a projection is also only as good as its assumed future forcings. As well, wait-and-see also completely defeats the purpose of doing the forward-looking projection to begin with.

    What are the stated goals of the models? How can the models be validated as having skill beyond that of the human investigators constructing the models? (not rhetorical)

    So far as I know, there is no legal protection here as at Lucia’s against asking rhetorical questions. JAQing off is frowned upon by some, but I don’t typically mind it because practitioners of same make it quite clear that they’re not to be taken seriously.

    That all said, you can find some of the stated goals for “teh modulz” within the publications of the IPCC. AR5 Chapter 9 is devoted to strategies for evaluating their performance, and the results of some of those kind of analyses.

    Happy reading. If it becomes too onerous a task for you to digest all of it, I offer one typically unstated (because it’s obvious) goal of climate modelling: it’s better than throwing darts at a wall whilst blindfolded.

  241. Frank says:

    ATTP wrote: “You’re assuming that the trend plus uncertainty in the trend somehow represents the distribution of all possible trends.”

    What causes the wide 95% confidence interval (almost 0.1 K/decade) in the observed trend? Is this measurement error or is it mostly unforced (internal) variability – the chaotic behavior responsible for the likelihood that a second, third or fourth observed trend from a similar change in forcing will produce different results.

    Imagine how much narrower that confidence interval would be if there were no unforced variability! Wouldn’t you agree that unforced variability is THE major contributor to the width of the confidence interval for the trend? (Think about the effect of just removing the 97/8 El Nino on the confidence interval.)

    When you perform a linear regression, the confidence interval for the slope and intercept take into account the possibility that the deviations from linearity (which are supposed to be randomly distributed) could be distributed differently. The confidence interval accounts for the possibility that a strong El Nino or El Ninos or La Ninas could have occurred by chance at the beginning or end of the record, where they perturb the slope the most.

    To use the formula for the standard deviation of the difference in two means, you need representative samples of the populations about which you want to draw inferences. (inferential statistics). One population is all possible model runs. We have about 100 of those, leaving little uncertainty associated with the model mean. The other population is all possible realizations of climate change since 1979 from the forcing we experienced. Those realizations will differ only because of unforced variability (and, to a much smaller extent, error in measuring temperature). Since 1979, we have a typical sample of the unforced variability associated with ENSO (and other phenomena that change every few years or less, such as the QBO or MJO). What other types of unforced variability could make other realizations significantly different from the one we experienced? What do climate models tell us about the existence of such phenomena?

    You can’t just say ANYTHING could have happened. We experience forced climate change (anthropogenic or natural – solar, volcanic, orbital) and unforced (or internal) variability arising from chaotic fluid flow. You can – if you want – say that unforced variability is significant on time scales longer than ENSO. However, that contradicts climate models and supports skeptics who want to blame 20th-century warming on unforced variability.

    If you are willing to concede: 1) that unforced variability causes most of the width of the confidence interval associated with the observed warming trend and 2) that unforced variability is the only way to produce other realizations of observation, then I’ll be happy to discuss whether the satellite record contains a representative sample of the unforced variability that produces other realizations. That is an interesting question. Otherwise, we will just be talking past each other.

  242. Frank,
    Yes, unforced variability is probably what causes most of the width of the confidence interval associated with the observed warming. However, the key point is that the mean trend (or best-fit) is not necessarily the same as the forced trend. If it was, we could probably have eliminated some models already. It is very likely that on decadal timescales, unforced variability influences both the width of the confidence interval in the trend AND the mean trend. Therefore a test that essentially compares the mean observed trend with the mean model trend (even if it also includes the uncertainty in the observed trend) is not necessarily a test that can really tell you if models are running too hot, or not.

  243. Frank says:

    ATTP wrote: “You could dial down the condescension.”

    You are right. I should. What tone do you and your following use when you expose an alleged “mistake” by a skeptic, say Steve McIntyre in this post?

  244. Roger Jones says:

    Forced variability is the main cause of the departures from no change on short time scales. Over the long term, this integrates into a complex trend. Externally forced climate change and internally mediated variability are not independent of each other. Most warming is occurring as part of regime changes associated with decadal variability. Most regime changes combine with ENSO events. It’s a complex, hydrothermal system. Radiative forcing traps the added heat energy but it does not distribute it.

    This isn’t mainstream thinking but it’s important information – we are probably in another shift at the moment and global temperature is likely to be about 0.3 C warmer than the 1998-2014 average over the next few years, if not warmer. In some regions, this will approach 0.6 to 1 C warming.

  245. Frank,
    I’m not sure I get the point of your question and I don’t think you found a mistake. The problem with being condescending is that it’s hard to bring things back if it turns out your condescension was unwarranted. In case you didn’t know it, Dikran Marsupial is Gavin Cawley.

  246. Roger Jones says:

    short time scales – a few decades – sorry (earth science thinking)

  247. Roger,

    Externally forced climate change and internally mediated variability are not independent of each other.

    Exactly.

    This isn’t mainstream thinking but it’s important information

    Yes, I was very surprised that many regard models as truth-centred, in the sense that even on decadal timescales we’d expect the observations to lie close to the model mean.

    we are probably in another shift at the moment and global temperature is likely to be about 0.3 C warmer than the 1998-2014 average over the next few years, if not warmer.

    I’m personally reluctant to make any strong predictions myself, but it certainly looks as though we’re undergoing a shift. We seem to be experiencing consecutive monthly records.

  248. Roger Jones says:

    ATTP, I have about six years work across a number of papers about ready for submission. We put two in late last year and they were slammed, so we have beefed them up and are nearly ready to go again. Trouble is they are individually and collectively too long for conventional publishing because of the detail the arguments require.

    The latest estimates for how much it might warm were done over the weekend and are little more than an educated guess, but as the previous ones have been pretty close, I’m happy to be held to it.

  249. Dikran Marsupial says:

    Frank wrote ” If you use the DIFFERENCE BETWEEN TWO MEANS FORMULA to determine the statistical significance of the difference between one model run and the multi-model mean, you will usually not find a significant difference.”

    This isn’t surprising. We know a-priori that there will be a difference between the individual model run and the ensemble mean (as one has a substantial unforced component and the other does not). So if we don’t get a statistically significant result that just means the period over which the trend is computed is too short to provide enough evidence to tell us what we already know. A failure to reject the null hypothesis does not imply the null hypothesis is true.

    However, you are missing the fundamental point which is that we know a-priori that the ensemble mean is not directly a prediction of the observed trend, and so testing for zero difference is not an appropriate test for consistency (which is a test of falsification rather than skill).

    “The width of the confidence interval for the observed warming trend reflects this chaotic behavior”

    It can only reflect the particular realisation of the chaotic behaviour that was seen in the observations, not what the chaotic behaviour could have been. These are not the same quantities and you can’t use them interchangeably.

    “We have sampled, not ignored, the uncertainty associated with “parallel Earths” via the confidence interval for the observed trend! “

    Yes, but this is a sample of one, which doesn’t give much of a view of the distribution.

    “We have one realization of climate change from 1979 to present, but that consists of 120 months/decade of forced warming and the unforced variability that creates the possibility of “parallel Earths”. We have sampled those alternatives by experiencing a typical assortment of El Ninos and La Ninas. “

    The data are rather autocorrelated, so while we may have 120 months per decade of data, that doesn’t mean we have “120 months per decade” of information. This is especially true for slower moving sources of variability, such as ENSO, and slower ocean circulations than that. As to whether we have experienced a typical assortment of El Ninos and La Ninas, how can you tell what is “typical” from a single sample? We can’t.
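The point about autocorrelated data carrying less information than the raw count of months can be illustrated with a rough sketch. It assumes the noise behaves like a first-order autoregressive, AR(1), process and uses the common approximation n_eff = n (1 - r1) / (1 + r1), where r1 is the lag-1 autocorrelation. The series length (444 months) and the AR(1) coefficient (0.8) are made-up values for illustration, not estimates from any temperature dataset, and the function names are hypothetical.

```python
import random


def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation of a sequence of numbers."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    num = sum(dev[i] * dev[i + 1] for i in range(n - 1))
    den = sum(d * d for d in dev)
    return num / den


def effective_sample_size(x):
    """Approximate number of independent values, assuming AR(1) noise."""
    r1 = lag1_autocorrelation(x)
    return len(x) * (1 - r1) / (1 + r1)


def ar1_series(n, phi, seed=0):
    """Synthetic AR(1) series: x[t] = phi * x[t-1] + white noise."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0, 1))
    return x


# Assumed values: 37 years of monthly anomalies with strong persistence.
monthly = ar1_series(444, phi=0.8)
print("months of data:", len(monthly))
print("approx. independent values:", round(effective_sample_size(monthly)))
```

With strong persistence the effective sample size comes out far smaller than the number of months, and variability slower than an AR(1) process would reduce it further.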

    “We also have the ability to look at the confidence interval for the trend from individual model runs and see how well it represents the trend from parallel runs from the same model.”

    Yes, indeed the best way of working out the likely consequences of internal variability is just that (rather than trying to estimate it from the observations, which are a single realisation of a chaotic process). If you do that, you end up with the “are the observations in the spread of the ensemble” test (or variations on it). As the ensemble doesn’t reflect all of the sources of uncertainty, the spread of the model runs is likely to be narrower than it should be, and the basic test seems reasonable to me.

    ” Is this measurement error or is it mostly unforced (internal) variability – the chaotic behavior responsible for the likelihood that a second, third or fourth observed trend from a similar change in forcing will produce different results. “

    How can you answer this question without knowing the true value of the forced component of the trend; the measured trend may (or may not) be an unbiased estimate of the forced component, but that doesn’t mean it is an accurate estimate of the forced component for this particular interval of this particular realisation (error is comprised of bias and variance)

    “Wouldn’t you agree that unforced variability is THE major contributor to the width of the confidence interval for the trend? “

    This is irrelevant, if you want a valid test of consistency, then it should be based on something that a perfect model might reasonably be expected to do.

    The confidence interval accounts for the possibility that a strong El Nino or El Ninos or La Ninas could have occurred by chance at the beginning or end of the record, where they perturb the slope the most.

    This is not the case, the statistical model used has no conception of physical processes like ENSO, just a statistical assumption. Consider there are also sources of internal variability such as PDO with much longer timescales.

    “To use the formula for the standard deviation of the difference in two means, “

    Again, this is a pointless test as we know a-priori that there will be a difference, so what does the test tell us that we don’t already know?

    “You can’t just say ANYTHING could have happened.”

    Nobody is doing so. The spread of the ensemble is a prediction of what plausibly could have happened. If the observations lie outside that, then something happened that the model physics suggests is impossible and hence the models are inconsistent with the observations. That is what testing for consistency is about.
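The difference between the two tests discussed in this comment can be sketched with a small simulation. It is only an illustration under stated assumptions, not anyone's published method: a "perfect model" is assumed, so that the observed trend and every model run's trend are drawn from the same normal distribution; the values `forced_trend` and `spread` are made up; and the uncertainty in the observed trend itself is left out. The first test asks whether the observation lies within two standard errors of the ensemble mean, the second whether it lies within two standard deviations of the ensemble.

```python
import random
import statistics


def rejection_rates(n_runs, n_trials=2000, forced_trend=0.2, spread=0.1, seed=0):
    """Fraction of trials in which a perfect model fails each test.

    Returns (standard-error test rate, ensemble-spread test rate).
    All numerical defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    sem_rejects = 0
    spread_rejects = 0
    for _ in range(n_trials):
        # Model runs and the single "observed" trend share one distribution.
        runs = [rng.gauss(forced_trend, spread) for _ in range(n_runs)]
        obs = rng.gauss(forced_trend, spread)
        mean = statistics.fmean(runs)
        sd = statistics.stdev(runs)
        diff = abs(obs - mean)
        # Test 1: uncertainty taken as the standard error of the mean.
        if diff > 2 * sd / n_runs ** 0.5:
            sem_rejects += 1
        # Test 2: uncertainty taken as the spread of the ensemble.
        if diff > 2 * sd:
            spread_rejects += 1
    return sem_rejects / n_trials, spread_rejects / n_trials


for n in (5, 20, 100):
    sem_rate, spread_rate = rejection_rates(n)
    print(f"runs={n:3d}  standard-error test: {sem_rate:.2f}  spread test: {spread_rate:.2f}")
```

Under these assumptions the standard-error test rejects the perfect model more often as the number of runs grows, while the spread test settles near the nominal few per cent for large ensembles.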

  250. Dikran Marsupial says:

    Roger wrote “Externally forced climate change and internally mediated variability are not independent of each other.”

    This is an important point, and IIRC is a caveat given in the section of the IPCC AR5 WG1 report that I mentioned earlier. The “are the observations within the spread of the ensemble” is able to deal with this complication correctly as far as I can see.

  251. Dikran Marsupial says:

    Brandon wrote “JAQing off is frowned upon by some, but I don’t typically mind it because practitioners of same make it quite clear that they’re not to be taken seriously.”

    I think the key issue is how they deal with the answers to the questions they are just asking. All too often they do nothing with the answers (not even acknowledge them), which is indeed an indication that they are not engaging in a rational discussion.

  252. Willard says:

    > What tone do you and your following […]

    Holy tu quoque, Frank!

    All wrapped up in a rhetorical question!

    No more playing the ref, please.

  253. Ron Graf says:

    BrandonRGates says: “Happy reading. If it becomes too onerous a task for you to digest all of it, I offer one typically unstated (because it’s obvious) goal of climate modelling: it’s better than throwing darts at a wall whilst blindfolded.”

    Firstly, if a simple logical question can’t be answered (without pointing to the bible and saying because they said so somewhere in there) it’s a sure sign one’s assumptions may be a result of under-the-dome (consensus) thinking, not critical thinking. Secondly, there are many things that are worse than throwing darts at a board, including attaching undue significance to outputs. Why would the authorities create a surrogate? Why do mediums use tea leaves and crystal balls? I am not saying there is intentional deceit but as Feynman famously said, “The first principle is that you must not fool yourself – and you are the easiest person…”

    The utility of any model or tool is to provide skill not capable of the operator before the model. That skill must be proven to validate the model. The test must be clear, agreed upon in advance, and prove skill beyond 95% confidence. Otherwise you are journeying to the Temple of Apollo to summon the Oracle of Delphi when you could just ask your wife.

  254. Ron,
    I don’t really follow what you’re trying to suggest, but here are some thoughts about climate modelling.

    1. Climate models are, at a fundamental level, based on the laws of physics. There are parametrizations, but these are either constrained by the laws of physics or by some understanding of what range of parameters are allowed by the laws of physics.

    2. We cannot really validate full climate models. We have only one system (the planet Earth) that is evolving a single realisation. We can test their skill, and most indications (despite what you will read on some sites) indicate that climate models are skillful.

    3. Climate models are really being used to try and understand what will happen if we introduce changes, such as continuing to emit CO2 into the atmosphere. We can consider various different pathways, and indeed do.

    4. Climate models are not perfect, but they are better than nothing, are actually pretty impressive, and are getting better all the time.

    5. This is science, not engineering. We’re not trying to design a new climate, we’re trying to understand the one in which we live and what might happen under various future scenarios.

    6. If all we wanted to know was climate sensitivity, climate models are not needed.

  255. Dikran Marsupial says:

    “Firstly, if a simple logical question can’t be answered (without pointing to the bible and saying because they said so somewhere in there) it’s a sure sign one’s assumptions may be a result of under-the-dome (consensus) thinking, not critical thinking. ”

    No, that is not true, and basically an ad-hominem. Some questions may look simple, and have simple (but not completely accurate) answers, but require rather more to answer fully, and the purpose of the IPCC WG1 is precisely to summarise the mainstream position on important issues, so a reference to the IPCC is a reasonable answer. Using religious metaphors is not a good response, compared to reading the chapter and offering a criticism of what it actually says.

  256. Ron Graf,

    Firstly, if a simple logical question can’t be answered (without pointing to the bible and saying because they said so somewhere in there) it’s a sure sign one’s assumptions may be a result of under-the-dome (consensus) thinking, not critical thinking.

    I gave you a simple and logical answer already: I offer one typically unstated (because it’s obvious) goal of climate modelling: it’s better than throwing darts at a wall whilst blindfolded.

    That you apparently don’t understand that trivially simple logic indicates that Holy Writ would be unfathomable to you if it sat on your face and wiggled.

  257. Dikran Marsupial,

    However, you are missing the fundamental point which is that we know a-priori that the ensemble mean is not directly a prediction of the observed trend, and so testing for zero difference is not an appropriate test for consistency (which is a test of falsification rather than skill).

    I am somewhat grudgingly (and perhaps belatedly) coming to embrace the notion that the ensemble mean does not tell us what many of us lay folk intuitively think it means. I’m going to take some liberty to potentially JAQoff:

    1) Why is it shown in IPCC reports in the first place?

    2) What is it useful for, if anything?

    3) Is there a Cliff Notes version on how either skill tests and/or falsification tests are done with such an ensemble?

    If I sound grumpy, that’s because I am. Please understand it’s not you, it’s me … call it a hard day at the office.

  258. Dikran Marsupial says:

    Brandon asks “1) Why is it shown in IPCC reports in the first place? 2) What is it useful for, if anything?”

    It is an estimate of the forced component of the change in climate (with the caveat that the forced and unforced components are not completely independent), i.e. the component that is the result of the change in radiative forcing and the resulting feedbacks. I’d say this is useful if you want to know the results of changes in emissions of GHGs (as we have no control over the chaotic components, such as ENSO). On a centennial scale, the unforced component is likely to average out to zero as well (at least the short/medium term components), so on a centennial scale it is a useful estimate of the observed climate.

    “3) Is there a Cliff Notes version on how either skill tests and/or falsification tests are done with such an ensemble?”

    None that I know of. The Douglass et al. paper shows that there ought to have been one!
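
    In lieu of a Cliff Notes version, here is a rough sketch of the distinction at issue. It is a toy illustration, not the test used in any of the papers discussed: it assumes every model run and the single observed realisation are the same forced trend plus independent Gaussian noise, and the function names and numbers (a forced trend of 0.2, a spread of 0.1) are made up for the example. It compares how often a single realisation falls within two standard errors of the ensemble mean versus within two standard deviations.

```python
import random
import statistics


def simulate_trends(forced_trend, noise_sd, n_runs, rng):
    """Toy ensemble: each run is the forced trend plus unforced Gaussian noise."""
    return [forced_trend + rng.gauss(0.0, noise_sd) for _ in range(n_runs)]


def ensemble_summary(trends):
    """Return the ensemble mean, the standard deviation of the runs,
    and the standard error of the mean (sd / sqrt(N))."""
    mean = statistics.fmean(trends)
    sd = statistics.stdev(trends)
    se = sd / len(trends) ** 0.5
    return mean, sd, se


def coverage(forced_trend, noise_sd, centre, half_width, n_draws, rng):
    """Fraction of single realisations lying within centre +/- half_width."""
    hits = 0
    for _ in range(n_draws):
        obs = forced_trend + rng.gauss(0.0, noise_sd)
        if abs(obs - centre) <= half_width:
            hits += 1
    return hits / n_draws


rng = random.Random(0)  # fixed seed so the sketch is repeatable
FORCED, NOISE_SD = 0.2, 0.1  # hypothetical values, chosen only for illustration

trends = simulate_trends(FORCED, NOISE_SD, 1000, rng)
mean, sd, se = ensemble_summary(trends)

# How often does one "real world" realisation land inside each interval?
frac_se = coverage(FORCED, NOISE_SD, mean, 2 * se, 2000, rng)
frac_sd = coverage(FORCED, NOISE_SD, mean, 2 * sd, 2000, rng)

print(f"ensemble mean = {mean:.3f}, sd = {sd:.3f}, se = {se:.4f}")
print(f"within 2 se of the mean: {frac_se:.2f}")
print(f"within 2 sd of the mean: {frac_sd:.2f}")
```

    In this toy setup the ensemble mean recovers the forced trend very well, because the unforced noise averages out, but a single realisation rarely lands inside the standard-error interval even though it was drawn from exactly the same distribution as the model runs. That is why a test for zero difference from the ensemble mean would reject a perfect model.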

  259. Dikran Marsupial,

    I think the key issue is how they deal with the answers to the questions they are just asking. All too often they do nothing with the answers (not even acknowledge them), which is indeed an indication that they are not engaging in a rational discussion.

    That does tend to be a clincher. The number of unacknowledged answers in this thread alone is astounding. Just bounce bounce bounce onto the next question already answered a thousand times since last week. You and I both know that it’s a setup; when those rude warmists resort to telling us to read The Bible … we’re WINNERS!

    Not that I’m bitter about it or anything.

  260. PS, thanks for your more substantive response above, Dikran. I made it halfway through Douglass before I could no longer ignore that I was in over my head. Again. Still, I have it bookmarked and will chip away at it. Some papers I first read four years ago are almost making sense to me, so there may be some hope.

  261. Dikran Marsupial says:

    It’s also rather interesting how few people want to ask me any questions about my comments on the statistical tests, even though my invitation appears to have been relayed to some of the blogs concerned on my behalf. ;o)

  262. > 50% of them claim to be banned 97% of the time.

  263. Eli Rabett says:

    Dikran is, in a different form, repeating the point about GCMs that, in spite of challenges, nobunny has ever produced one from which, like from the foam of the sea, a climate sensitivity of < 1.5 (certainly 1.5. That leaves the Nic Lewises of the world to jack around with statistical models of crappy data, and with noisy enough data you can’t prove crap, or to be more exact, you can prove any crap you want.

    Moreover, the misplaced emphasis on a single parameter from a complex model is an exercise in self abuse.

  264. wheelism says:

    (Eli with the hat-trick!)

  265. wheelism says:

    (I’m discounting the reference to M. Marsupial as in flagrante delicto, and “Rabett pulls the hat-trick” was the correct response.)

  266. Pingback: Model tuning | …and Then There's Physics
