Consensus on consensus

Richard Tol’s consensus paper has finally been published. Richard’s paper can probably be described as taking all possible sets of numbers from all the various other consensus studies, and plotting graphs showing various possible levels of consensus, but without paying any attention to the details of the other studies, or to who/what was included in the different surveys. For example, a survey of those who largely dispute the mainstream view unsurprisingly returns a low level of consensus.

Richard’s paper also asks a number of questions that he seems to feel need to be answered, such as

Cook et al (2013) state that 12 465 abstracts were downloaded from the Web of Science, yet their supporting data show that there were 12 876 abstracts. A later query returned 13 458, only 27 of which were added after Cook ran his query (Tol 2014a).

Not only is this simply not true (their data show that there were 12876 abstract identifiers, not 12876 abstracts), but it has been explained to Richard many, many times that this is because duplicates in the database were removed without the remaining identifiers being renumbered so as to be sequential (which – if done – would rather defeat the object of using a database).
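
To illustrate, here is a minimal sketch – with made-up data, not the actual TCP database – of how removing duplicates after identifiers have been assigned leaves the highest surviving identifier larger than the number of records:

```python
# Made-up records (hypothetical): IDs are assigned before de-duplication.
records = [
    (1, "Abstract A"),
    (2, "Abstract B"),
    (3, "Abstract B"),  # duplicate, will be dropped
    (4, "Abstract C"),
    (5, "Abstract A"),  # duplicate, will be dropped
]

# Remove duplicates, keeping the first occurrence; IDs are not renumbered.
seen, deduped = set(), []
for rec_id, title in records:
    if title not in seen:
        seen.add(title)
        deduped.append((rec_id, title))

print(len(deduped))                  # 3 abstracts remain...
print(max(i for i, _ in deduped))    # ...but the highest identifier is 4
```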

Credit: Cook et al. (2016)

I don’t actually need to go through the details, because there is also a response that includes almost all of those who’ve published consensus studies, and – at the very end – yours truly. Not only does this paper respond to Richard’s various questions (I’m sure that will be the end of it……), it also makes the key point that if you drill down into the details of the various consensus studies, then – as the figure on the right shows – the level of consensus increases as the relevant expertise increases. In other words

low estimates of consensus arise from samples that include non-experts such as scientists (or non-scientists) who are not actively publishing climate research, while samples of experts are consistent in showing overwhelming consensus.

I don’t think I need to say much more. I think it’s quite fascinating to see almost everyone who has published on a particular topic responding to a criticism in this way. Richard Tol, of course, is apparently still not happy, saying:

Unfortunately, Environmental Research Letters does not believe in open discussion and forced me to hide the rather severe methodological critique on Cook’s 2013 paper behind a superficial literature review.

……This is normally sufficient for a retraction: the data behind Cook 2013 are not what Cook 2013 claim they are.

Ahh, shame, they actually insisted on reviewing his paper, and exerted some editorial control. What a terrible thing to do. Maybe next time Richard should stamp his little feet a bit harder and maybe he’ll get his own way. It also seems that after 3 years of work, Richard still does not realise that Cook et al. (2013) was a survey of abstracts, not a survey of abstract raters. It might also be worth considering the irony of Richard apparently communicating his issues to a site that is best known for denying the mainstream scientific position with respect to climate science. It’s almost as if Richard doesn’t get the point of consensus studies; they’re mainly to refute claims made on, for example, sites like WUWT that there isn’t a consensus.

Anyway, I’ll stop there. There are links below to various other articles. I’ll try to add more as they appear. Posts about the consensus tend to have somewhat active comment threads, so can I ask everyone to – if necessary – show some restraint. The comment threads here have become much calmer in recent times, so maybe it won’t be necessary.

Links:
Consensus on consensus: 97% of experts agree people are changing the climate: University of Queensland press release.
Yes, there really is scientific consensus on climate change by John Cook in the Bulletin of the Atomic Scientists.
Consensus on consensus: a synthesis of consensus estimates on human-caused global warming by Bart Verheggen.
Consensus on consensus by Stephan Lewandowsky.
Devastating reply to Richard Tol’s nonsensus in peer-reviewed journal by Collin Maessen.
It’s settled: 90–100% of climate experts agree on human-caused global warming by Dana Nuccitelli in the Guardian.
Settled science: there is a scientific consensus that humans are causing climate change by Sou.


1,163 Responses to Consensus on consensus

  1. If the initial response from cited authors didn’t give him pause (mind you, this was in September of 2015), then I really doubt that the peer-reviewed response will. His tweeted response to my article and what was published on WUWT already indicate this.

  2. Collin,
    Absolutely. I doubt this is the end of it. Years of endless fun.

  3. Dikran Marsupial says:

    Richard Tol wrote “The consensus is of course in the high nineties. No one ever said it was not.”

    Just for reference.

  4. Dikran Marsupial says:

    Tol criticises TCP “The paper states that ‘information such as author names and affiliations, journal and publishing date were hidden’ from the abstract raters. Yet, such information can easily be looked up. Unfortunately, Cook et al (2013) omit the steps taken to prevent raters from gathering additional information, and for disqualifying ratings based on such information.”

    This seems to me to be a rather silly criticism, given that you can indeed find out this sort of information very easily, e.g. by typing the first line of the abstract into Google Scholar. One wonders what steps could sensibly have been taken to prevent this without making such an extensive survey impractical.

    Also, the issue to do with the number of unique abstract IDs has been explained to Richard, and it seems rather disingenuous not to have mentioned the explanation in his paper. Asking questions about academic papers is a perfectly reasonable thing to do, but pretending that they haven’t already been answered, rather less so.

  5. Dikran Marsupial says:

    Richard writes “Unfortunately, Environmental Research Letters does not believe in open discussion…”

    Are journals intended for discussion? Comment papers are pretty rare, not all of them have a response, for the exchange to be continued further is deeply unusual (but not unheard of). Email is quite a good way of having a discussion, or skype, or even the telephone. Wanting to have a discussion via papers in a journal seems rather 19th century to me!

  6. Dikran,
    It’s quite remarkable. It’s almost as if Richard hasn’t spent decades in academia doing research and publishing papers. In any other environment, what he has said about ERL would probably lead to some kind of response stating that he should withdraw what he has said or they will have no more dealings with him. In this environment it’s hard to do that because you can’t really stop someone from submitting a paper and you’re meant to judge submissions on their merit, not on what the authors might have said about your organisation in the past.

    In a sense, I’ve learned a lot from Richard in the last couple of years. If you have a thick skin and don’t care what others think of you, you can get away with virtually anything. It’s quite amazing, really.

  7. Which makes me wonder why you called dealing with Tol fun. 😉

  8. It can be so surreal at times, it’s hard not to find it funny 😉

  9. Dikran Marsupial says:

    I think ERL have been extremely lenient in allowing the comment to be published in its current form, given the poor analysis of the data (i.e. not distinguishing between the levels of expertise, just whether a sub-sample was used) and the obviously footling questions (see above and also datestamps etc). Journals do tend to be lenient over this sort of thing, and probably rightly so, but I don’t think it has done Richard’s academic reputation any good and I don’t think he has anything to complain about.

  10. MarkR says:

    It seems to me like the Cook data match the description in the journal. The fact that completely irrelevant ID numbers (that were not used in any analysis) were not reassigned to make them sequential is a very weird criticism.

    Richard Tol’s Energy Policy comment on Cook13 contained a calculation error, returning 91% instead of 97%. He doesn’t seem bothered that he’s published a number that’s clearly wrong and has not, as far as I can see, retracted or corrected it.

    For comparison, I discovered a calculation error in Gavin Cawley’s comment on a paper by Loehle. Gavin and coauthors submitted a corrigendum. Kevin Cowtan found a bug in one of his papers that changed the graphs by less than the width of the lines drawn on them; he submitted a corrigendum. Errors we found in papers that reject the consensus, such as those by Humlum, Gervais and Lu, are rather like Tol’s – they are not corrected even when the error completely breaks their conclusions.

  11. Richard’s argument, I think, is that the 91% is not an error, despite the fact that no one can find his missing 300 abstracts (#FreeTheTol300). I think (and I can’t find the comment anymore) that Richard argued that some fraction of all abstracts rejected the consensus. So, it appears to be a quantum mechanical argument in which abstracts can exist in a superposition of states. Just in case anyone reading this thinks that this makes any sense: no, it does not.

  12. verytallguy says:

    I think this:

    Richard Tol wrote “The consensus is of course in the high nineties. No one ever said it was not.”

    ought to be repeated at regular intervals through this comment thread. I shall continue to do so until moderated 🙂

  13. Dikran Marsupial says:

    Wasn’t the 91% the result of an incorrect simplifying assumption made in the statistical analysis (uniform distribution of disagreements between categories)? If there aren’t enough observations to draw reliable conclusions from the proper analysis, that doesn’t mean that you should draw conclusions from an analysis based on an obviously incorrect assumption; it means you shouldn’t draw a conclusion, as there aren’t enough data.

  14. Dikran,

    Wasn’t the 91% the result of an incorrect simplifying assumption made in the statistical analysis (uniform distribution of disagreements between categories)?

    Yes, I think it was something like that. I have pointed out to Richard that it seems ironic that he would be calling for the retraction of a paper, the result of which he doesn’t dispute, while his published response to that paper has a result that doesn’t make any sense. #FreeTheTol300.

  15. Pingback: Consensus on consensus: a synthesis of consensus estimates on human-caused global warming | My view on climate change

  16. MarkR says:

    Tol used the Cook13 data on disagreements between abstract raters to derive a correction which he then applied.

    If it were right then he’d get the same answer as you get from the original data when you apply it to the same original data. This is a simple and obvious test to verify your sums. Tol didn’t verify, and it turns out his sums were wrong.

    This was pointed out to him in the response and by email but he didn’t do anything to fix this published mistake.

    Correcting mistakes in journals typically requires the original authors to have an appetite for accuracy, as Cawley and Cowtan showed in my past experiences. Tol hasn’t (yet).

  17. MarkR,
    Also, IIRC, Tol’s method suggested that the initial consensus (before any error correction) would have been more than 100%, which clearly does not make any sense.

  18. Dikran Marsupial says:

    It is a pity I don’t teach a class on Bayesian statistics at the moment as it looks like a nice exercise for students (conjugate priors, problems with uninformative priors where you do actually have some relevant prior knowledge etc.).

  19. verytallguy says:

    It is a pity I don’t teach a class on Bayesian statistics at the moment as it looks like a nice exercise for students (conjugate priors, problems with uninformative priors where you do actually have some relevant prior knowledge etc.)

    I feel a guest post coming on… perhaps with an extension to application to climate sensitivity?

  20. Joshua says:

    Richard Tol says:
    May 28, 2014 at 1:37 am

    …Instead of bitching about someone else’s work, you may consider helping to solve this problem:

  21. Dikran Marsupial says:

    Supplementary irony: Richard wrote “I was trained as a statistician. I have taught statistics. I have published in statistical journals. I have written statistical software. Your null hypothesis should therefore be that I do not make elementary errors.” in order to avoid giving a straight answer to a technical question regarding one of his publications (which he continued to evade).

  22. Pingback: Crush - Ocasapiens - Blog - Repubblica.it

  23. Richard,
    I didn’t forget them, I didn’t know they existed. Just to be clear, your claim is that the agreement in the literature goes away if you consider all the data. Considering all the data includes samples of people who are known to dispute the mainstream position. So, if you include samples of people who – in advance – are known to dispute the consensus, you discover that not all samples agree? Yup, sounds about right. So what? If we sampled everyone on the planet, I’m sure the level of consensus would be small, but we’re trying to determine the level of consensus amongst relevant experts, or in the relevant literature, not simply take a vote.

    And this,

    For the remaining two, Cook 2016 admit that Cook 2013 misled the reader. This would normally imply a retraction.

    makes you sound like a blog commenter who has never published a paper in their life, not someone who has spent decades in academia and published hundreds.

    Let’s be clear, you said this in your now published paper

    Cook et al (2013) state that 12 465 abstracts were downloaded from the Web of Science, yet their supporting data show that there were 12 876 abstracts.

    This is not true, and you should have known it was not true, as it has been pointed out to you before. I personally don’t like calling for retractions of papers, but the idea of someone who has published something that they must know was not true (as it had been explained to them before on numerous occasions) calling for the retraction of another paper seems to illustrate a remarkable level of brass neck.

  24. @wotts
    There are three kinds of errors, errors that are irrelevant, errors that can be fixed through an erratum, and errors that should lead to retraction.

    Cook 2016 writes that two aspects of the data collection for Cook 2013 are different than they were portrayed in that paper. Data not being as described is not something that can be fixed in an erratum.

    Note that there are three further challenges to the Cook 2013 data that were evaded by Cook 2016. And there was the earlier challenge by Dean, where Cook & Cowtan admitted that Cook 2013 was wrong.

  25. There are three kinds of errors, errors that are irrelevant, errors that can be fixed through an erratum, and errors that should lead to retraction.

    I agree, but I am trying to work out why you think you get to decide which is which. When do you plan to submit an erratum to your most recent paper?

    Cook 2016 writes that two aspects of the data collection for Cook 2013 are different than they were portrayed in that paper. Data not being as described is not something that can be fixed in an erratum.

    No, it doesn’t. I think the data in Cook et al. (2013) is perfectly clear. IIRC, it took a few minutes to download an equivalent sample after first reading the paper. Are you sure you’re not just very confused?

    And there was the earlier challenge by Dean, where Cook & Cowtan admitted that Cook 2013 was wrong.

    You really are going to have to start backing up your assertions. I’ll repeat what I’ve said many, many, many times before. Cook et al. (2013) was a survey of abstracts, not a survey of abstract raters. I realise that Dean may be very confused about such a subtlety, but you should have been able to work out the difference by now.

    I have a feeling that you’re confusing someone acknowledging a point that you might have made with “oh my god we made a mistake”. They’re not equivalent.

  26. Richard,
    In fact, this is almost certainly untrue.

    where Cook & Cowtan admitted that Cook 2013 was wrong.

    I do not think that either of Kevin Cowtan or John Cook have ever admitted that Cook 2013 was wrong. Are you sure it’s wise to keep repeating things that are almost certainly not true?

  27. Dikran Marsupial says:

    Richard wrote “There are three kinds of errors…”

    However, “Cook et al (2013) state that 12 465 abstracts were downloaded from the Web of Science, yet their supporting data show that there were 12 876 abstracts” is not an error in Cook et al (2013); it is an error in Tol’s comment paper. The supporting data does not show that there were 12876 abstracts; it is merely that Richard doesn’t understand what a unique ID is in the context of a database. This is an error in Richard’s comment paper that he should at least acknowledge. I would suggest that this would be reasonable grounds for Richard’s comment paper to be retracted, as this was explained to him prior to submitting the comment.

  28. Magma says:

    Do not feed the Tol.

  29. Actually, I don’t really agree with this

    There are three kinds of errors, errors that are irrelevant, errors that can be fixed through an erratum, and errors that should lead to retraction.

    I don’t think we should retract papers for errors. Errors either get fixed in an erratum, or newer publications produce newer results that supercede the earlier paper which might have errors. You retract papers for fraud or misconduct. One might argue that Richard’s call for a retraction is effectively accusing the authors of Cook et al. (2013) of some kind of research misconduct. I think Richard should either back this up (which would essentially mean being shown to be right) or retract it. I don’t expect this to actually happen, so am just expressing my opinion.

  30. Magma,
    Given that it is about his paper, some leeway may be necessary.

  31. verytallguy says:

    Magma,

    Feeding seems inevitable given the OP, but we can choose what tidbits to feed.

    For instance, this seems relevant from Andrew Gelman to Tol:

    I’m sure you can go the rest of your career in this manner, but please take a moment to reflect. You’re far from retirement. Do you really want to spend two more decades doing substandard work, just because you can?… …It’s not too late to get a bit more serious with the work itself.

    http://andrewgelman.com/2014/05/27/whole-fleet-gremlins-looking-carefully-richard-tols-twice-corrected-paper-economic-effects-climate-change/

  32. Richard’s paper can probably be described as taking all possible sets of numbers from all the various other consensus studies, and plotting graphs showing various possible levels of consensus, but without paying any attention to the details of the other studies, or who/what was included in the different surveys.

    Sounds a lot like another Gremlin-ridden paper by Richard Tol.

    If I were in his field, I would think twice before citing work of such quality.

  33. Victor,
    A criticism I sometimes have of those who criticise mainstream climate science is that a lot of what is done is to take numbers and analyse these numbers without paying any attention to what the numbers actually mean, how they were collected, what the underlying assumptions were, and what underlying physics one should understand before manipulating the numbers. This is a crucial part of physics; numbers only have meaning in the context in which they were produced (i.e., you need to know things about the numbers and the system they represent before trying to draw any conclusions). It’s therefore somewhat bizarre that someone of Richard’s calibre would appear to think that all he needs to do is compare all possible survey samples, without paying any attention to the details of the surveys themselves, what was surveyed, what were the assumptions, what was the goal of the survey, etc. It just becomes a form of numerology.

  34. It’s therefore somewhat bizarre that someone of Richard’s calibre would appear to think that all he needs to do is compare all possible survey samples, without paying any attention to the details of the surveys themselves, what was surveyed, what were the assumptions, what was the goal of the survey, etc.

    Let’s dispel with this notion that Richard Tol doesn’t know what he’s doing. He knows exactly what he’s doing.

  35. John Hartz says:

    Hot off the press…

    It’s settled: 90–100% of climate experts agree on human-caused global warming by Dana Nuccitelli, Climate Consensus – the 97%, Guardian, Apr 13, 2016

  36. Magma says:

    Let’s dispel with this notion that Richard Tol doesn’t know what he’s doing. He knows exactly what he’s doing. — Victor Venema

    I came to the same conclusion a couple of years ago.

  37. Dikran Marsupial said on April 13, 2016 at 7:29 am,

    “Richard Tol wrote “The consensus is of course in the high nineties. No one ever said it was not.”

    Just for reference.”

    I believe this point is important enough to second here by quoting all of Richard’s own words for context – I believe that all should see all of what Richard actually wrote:

    “The consensus is of course in the high nineties. No one ever said it was not. We don’t need Cook’s survey to tell us that.

    Cook’s paper tries to put a precise number on something everyone knows. They failed. Their number is not very precise.”

    There is a problem with the phrase “not very precise” here. If we’re limited to the high nineties, then for all practical purposes, “not very precise” is essentially just irrelevant and the consensus in question is essentially proved beyond all reasonable doubt.

  38. lerpo says:

    It’s therefore somewhat bizarre that someone of Richard’s calibre would appear to think that all he needs to do is compare all possible survey samples, without paying any attention to the details of the surveys themselves, what was surveyed, what were the assumptions, what was the goal of the survey, etc.

    This sounds similar to one of the criticisms of Richard Tol’s twice-corrected paper, “The Economic Effects of Climate Change”. Although the error was probably more subtle there, Andrew Gelman notes: “You just can’t think of these as representing 14 or 21 different data points as if they represent observations of economic impact at different temperatures. The data being used by Tol come from some number of forecasting models, each of which implies its own curve.” – http://andrewgelman.com/2014/05/27/whole-fleet-gremlins-looking-carefully-richard-tols-twice-corrected-paper-economic-effects-climate-change/

  39. Cook and Oreskes don’t have climate backgrounds but find a way to measure the statistics of opinion. Now we’re sharing opinions about opinions. Poetic, but perhaps not elucidating.
    But that’s just my opinion.

  40. The Very Reverend Jebediah Hypotenuse says:


    I believe this point is important enough to second here by quoting all of Richard’s own words for context – I believe that all should see all of what Richard actually wrote…

    People.

    If you have time to ‘debate’ about consensus in climate science with Richard Tol, then you really do need to get out more.

    Walk the dog, play with your kids, water the garden. Doing something – anything – with a low carbon footprint would be more productive than trying to pin down GWPF-approved zombie-jello-arguments, the details of which do not really matter to anyone but Tol himself, and which will probably be ‘revised’ in time anyway…

    Unless, of course, you are a masochist.

  41. snarkrates says:

    Richard’s life’s work seems to be dedicated to providing sufficient evidence of dissent from the consensus that the denialati can give voice to their talking points with a straight face after only a couple of hours of practicing in front of a mirror and thinking about childhood cancer and very old nuns. Evidently, having 9% of experts disagree with the consensus is sufficient to do this, while 3% is not. And meanwhile back on planet reality…

  42. John Hartz says:

    The Very Reverend Jebediah Hypotenuse says:

    We must have ESP. I was thinking the very same thing while I took a break from sitting in front of my computer and did some yard work.

  43. John Hartz says:

    In keeping with the “What’s good for the goose is good for the gander” dictum…

    Do Richard Tol’s superiors at the University of Sussex and the Vrije Universiteit Amsterdam know how much time and energy he expends in the cybersphere?

  44. Marco says:

    “Also, the issue to do with the number of unique abstract IDs has been explained to Richard, and it seems rather disingenuous not to have mentioned the explanation in his paper.”

    “rather disingenuous” is the euphemism of the century. It skirts dangerously close to scientific misconduct.

  45. Marco says:

    Also, since Tol’s twice-corrected paper is mentioned, most people I talked to have agreed that the correction materially alters the conclusion. This is, according to COPE, grounds for a retraction. I am sure Tol will now request that his original paper be retracted, since he is oh-so-much concerned about ethics.

    Any day now…

  46. Willard says:

    > Cook and Oreskes don’t have climate backgrounds but find a way to measure the statistics of opinion.

    Nic Lewis doesn’t have a climate background but finds a way to raise concerns about climate sensitivity.

    What does that tell you, Turbulent One?

    ***

    > Do Richard Tol’s superiors […]

    This would caution what Richie himself does to his targets, e.g.:

    http://frankackerman.com/tol-controversy/

    ***

    Speaking of unanswered questions, I wonder if Richie ever found the time to respond to this one:

    In recent testimony economist Richard Tol claims that a study finding a 97% consensus in the academic literature on the human contribution to climate change is flawed. The original study was based on a team of volunteers rating around 12000 scientific abstracts. Tol argues that there is an 18.5% error rate in the rating process and estimates that 6.7% of abstract ratings are still in error after reconciliation, implying that 11.8% were fixed during reconciliation. Tol applies the same changes in rating which were produced by the reconciliation step to the additional 6.7% of abstracts, decreasing the consensus percentage from 97% to 91%. If correction of 6.7% of the ratings reduces the consensus by 6%, then the earlier reconciliation of 11.8% of the ratings is likely to have reduced the consensus by at least a similar amount. However given that the consensus after reconciliation is 97%, this would appear to be impossible.

    http://t.co/wXd0FjekBE
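
    For what it’s worth, here’s the back-of-envelope version of that argument, using only the numbers in the quote above (an illustration of the logic, not Richie’s actual calculation):

    ```python
    # All numbers come from the quote above.
    consensus_after_reconciliation = 97.0   # per cent (Cook et al 2013)
    drop_per_corrected_point = 6.0 / 6.7    # Tol: fixing 6.7% of ratings costs 6 points

    fixed_in_reconciliation = 11.8          # per cent of ratings changed in reconciliation
    implied_before = (consensus_after_reconciliation
                      + fixed_in_reconciliation * drop_per_corrected_point)

    print(implied_before)   # ~107.6 -- a consensus above 100%, which is impossible
    ```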

  47. Dikran Marsupial says:

    Marco, well I am English, allowances must be made for that. ;o)

  48. Steven Mosher says:

    “This seems to me to be a rather silly criticism, given that you can indeed find out this sort of information very easily, e.g. by typing the first line of the abstract into Google Scholar. One wonders what steps could sensibly have been taken to prevent this without making such an extensive survey impractical.”

    Simple.

    You do what we used to do. You put raters in a room with no access to the outside world.

    Jesus. I remember my first time doing content analysis. I read the work. I recognized the author from the details. I went to the study director and handed them the text and recused myself because I knew who the author was.

    FFS

  49. Willard says:

    > You do what we used to do.

    For something else.

    Simple.

    Jesus.

  50. Francis says:

    Query: Aren’t we at 1.0 °C of warming right now relative to “preindustrial time”? Can we detect the positive welfare-equivalent income gain that Tol predicted? Isn’t an important part of modeling to hind-cast as the years go by and re-analyze the strengths and weaknesses of your own work?

    And finally, if there are errors in the papers that showed the positive income gain, what kind of errors are these — type 1, 2 or 3?

  51. @wotts
    Ben Dean computed the inter-rater reliability test stats, submitted a comment to ERL noting the inter-rater reliability was garbage, and got rejected. He then wrote a new comment, asking ever so nicely whether perhaps the original authors could report said stats, because it was sort of crucial info and every other paper in the literature computes it and all textbooks say you should. That comment was published. Cook&Cowtan reply by computing the relevant tests, find that the results are garbage, and then spend the rest of the paper coming up with contrived arguments why the test results should not be believed.

    @steve mosher
    Indeed.

  52. guthrie says:

    One of the reasons for poking the Tol in this fashion is simply so that any curious person out there can see what a hash he has made of things. Ordinary members of the public will happily take some expert’s word for things, but if they find that said expert is wrong, or at least that everyone and their dog disagrees with them, some will take further thought and maybe realise the expert is not so expert after all. Others will of course carry on believing what they are told.

  53. Dikran Marsupial says:

    Steven Mosher wrote “You put raters in a room with no access to the outside world.”

    In which case the survey would not have been possible (and neither would a lot of citizen science projects). Putting raters in rooms requires money for a start.

  54. John Hartz says:

    Hmmm…

    Processing high-level math concepts uses the same neural networks as the basic math skills a child is born with

    How Does a Mathematician’s Brain Differ from That of a Mere Mortal? by Jordana Cepelewicz, Scientific American, Apr 12, 2016

  55. @dikran
    Cook had two options. Either he ensured that the raters were independent, that they only had access to the specified information, and that they were blind to previous ratings. Or he included a caveat in the paper that none of these conditions could be met.

    Instead, Cook claimed that the conditions were met.

  56. Willard says:

    Speaking of hash Richie makes of things:

    During that RTE interview, [Richie] called into question the organisation’s independence and condemned it for a lack of transparency.

    However, online — his favoured medium — [Richie] went for the jugular.

    Over a 48-hour period, [Richie] made a host of serious allegations into how the ESRI operates, about its transparency, its relationship with Government and how it is funded.

    Today, the ESRI hits back very strongly at the various allegations made by Tol online and during an interview with this newspaper. It vehemently denies the failings alleged by Tol, rejecting his outlook almost entirely. “The allegations made by Richard Tol are wholly unsubstantiated.”

    His criticisms of the ESRI on television were somewhat muted and restricted, no doubt by the station’s lawyers, and Tol himself is bemused that his departure was given so much prominence. “It is a slow news day if the lead story is the hairy guy packing a box,” he tweeted.

    But it was on Twitter that Tol made the most serious allegations about the organisation.

    He accused it of being a xenophobic and nepotistic body which is caught in a timewarp using antiquated technology. He also stated that he was the fifth senior person to leave the institute, implying cultural and endemic problems at the ESRI.

    http://www.independent.ie/opinion/analysis/daniel-mcconnell-a-fearless-whistleblower-or-a-disgruntled-crank-26809004.html

    All this because yet another of Richie’s studies has been suspected of containing Gremlins.

  57. Willard says:

    > Cook & Cowtan reply by computing the relevant tests, find that the results are garbage […]

    Here you go, Richie:

    The unweighted Cohen kappa is 0.35 using the seven fine-grained categories used in the initial rating process. However, the consensus statistics are based on only three categories: ‘endorse’, ‘reject’ or ‘no position’; for these categories, kappa rises to 0.46. Subdividing rating categories is known to depress kappa values. The more appropriate Fleiss kappa gives the same results. In our view, the categories should be considered as nominative (Cook et al 2014). However, if they are treated as ordered, the kappa value for the fine-grained categories approaches the value for the consensus categories. Kappa values are also depressed in the case when category counts are very uneven (Sim and Wright 2005). Our data is an extreme case with two orders of magnitude difference between the most and least populous categories.

    http://iopscience.iop.org/article/10.1088/1748-9326/10/3/039002

    Then the authors go on to show that the interpretation of Kappas ain’t that easy and that there may be more important ways to look at the data:

    Because the consensus ratio is determined by two of the three categories, differences in allocation of papers to the ‘no position’ category have minimal impact on the conclusions. The proportion of ratings in the relevant categories (i.e. endorse, no position, reject) for the 12 raters who contributed at least 500 ratings were decomposed by change of variable into consensus invariant and consensus altering terms. The inter-rater variability in the consensus invariant variable was more than twenty times larger than in the consensus altering variable. Thus the primary cause of inter-rater variability arises from differing interpretations of the no-position criteria, but at the same time the raters applied their individual criteria consistently to both the endorse and reject categories. This suggests that inter-rater variability could be substantially reduced by clarification and training on the no-position criteria, but that doing so would not affect the final consensus percentages.

    Which explains why it’s easy for Richie to put words into Cook & Cowtan’s mouth.

    For good measure, here’s Ben Dean’s citation:

    http://www.sciencedirect.com/science/article/pii/S0895435610000971
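
    For otters who want to check such numbers at home, Cohen’s kappa is a one-liner these days. A sketch with made-up ratings, not the TCP data:

    ```python
    from sklearn.metrics import cohen_kappa_score

    # Two hypothetical raters scoring ten abstracts on the three
    # consensus categories used in the response.
    rater1 = ["endorse", "endorse", "no position", "endorse", "reject",
              "no position", "endorse", "endorse", "no position", "endorse"]
    rater2 = ["endorse", "no position", "no position", "endorse", "reject",
              "endorse", "endorse", "endorse", "no position", "endorse"]

    print(cohen_kappa_score(rater1, rater2))   # ~0.61 for these toy ratings
    ```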

  58. The Very Reverend Jebediah Hypotenuse says:


    Jesus. I remember my first time doing content analysis. I read the work. I recognized the author from the details. I went to the study director and handed them the text and recused myself because I knew who the author was.

    Damn – that happened to me too, my first time doing content analysis…

    On the fifteenth of May, in the jungle of Nool,
    In the heat of the day, in the cool of the pool,
    He was splashing…enjoying the jungle’s great joys…
    When Horton the elephant heard a small noise.

    So Horton stopped splashing. He looked towards the sound.
    “That’s funny,” thought Horton. “There’s no one around.”
    Then he heard it again! Just a very faint yelp
    As if some tiny person were calling for help.

  59. Willard says:

    > Cook claimed that the conditions were met.

    A quote might be nice.

    ***

    Here are some caveats that we can read in C&C’s response to Ben Dean:

    Potential bias among the raters was tested a second way by use of the author self-ratings (bearing in mind that the authors had access to the whole paper). The author ratings were assumed to be correct and were then used to calculate a correction to the abstract ratings. This correction was then applied across all the abstracts, to estimate the consensus score which would have been obtained had the authors rated all of the papers. The results are virtually unchanged (97.2% versus 97.1%). Thus this second method of bias evaluation also suggests that bias was not a significant problem. Nonetheless, we encourage third parties to independently examine the abstracts as a further audit of our results. Tools have been made available to facilitate this task at http://www.skepticalscience.com/tcp.php.

    http://iopscience.iop.org/article/10.1088/1748-9326/10/3/039002

    The emphasized conclusion might be a bit too idealistic, since we know of at least one author who has misclassified his own papers:

    [Richie] has claimed that five out of the ten abstracts rated by Cook et al, 2013 were incorrectly rated. Let’s run through the list of those abstracts for him.

    http://bybrisbanewaters.blogspot.com/2013/05/tols-gaffe.html

    ***

    More than 20 years to go from “unprecedented” to “independent.”

    I blame adjectives.

  60. Dikran Marsupial says:

    Richard Tol, I asked “one wonders what steps could sensibly have been taken to prevent this without making such an extensive survey impractical.”

    As usual, you have dodged the question. The fact that you couldn’t come up with an answer to the question posed demonstrates that the criticism (as posed) was indeed silly.

    The point raised in your paper was:

    “The paper states that ‘information such as author names and affiliations, journal and publishing date were hidden’ from the abstract raters. Yet, such information can easily be looked up. Unfortunately, Cook et al (2013) omit the steps taken to prevent raters from gathering additional information, and for disqualifying ratings based on such information.”

    This is presenting the issue as a technical flaw in the study that calls the conclusions into question. Richard’s answer above was:

    “Cook had two options. Either he ensured that the raters were independent, that they only had access to the specified information, and that they were blind to previous ratings. Or he included a caveat in the paper that none of these conditions could be met.”

    Note the shift of the goal posts here: if a caveat would have been satisfactory (although it ought to have been obvious from the description in the paper), then he should have complained about the lack of a caveat. This is called “making a mountain out of a molehill”.

    “Instead, Cook claimed that the conditions were met.”

    Please provide an actual quote.

  61. I would normally say: do not feed to Tol, but it did produce a nice overview paper of all the consensus studies. Chapeau to all authors!

    What the paper left me curious about is whether there are also consensus studies about other sciences. How many emeritus physicists are willing to deny Quantum Mechanics in an anonymous survey? How many geologists denounce plate tectonics?

    The 97% consensus still sounds a bit low to me and the numbers of some surveys are even lower; thus it would be interesting to see whether it is possible to get a higher number with the methods used. Might it indicate that it is difficult to get a quality sample where everyone consistently has the relevant expertise? What percentage of scientists are willing to answer nonsense for ideological reasons in an anonymous survey when the answer does not affect their reputation?

    it should come as no surprise that the most common argument used in contrarian op-eds about climate change from 2007 to 2010 was that there is no scientific consensus on human-caused global warming (Elsasser and Dunlap 2012, Oreskes and Conway 2011).

    It would be interesting to repeat this study to see if enough members of the US public are now aware of the consensus that the mitigation sceptics no longer dare to use the imbecile argument that there is no consensus.

  62. Thanks Willard

    The standard response to a kappa of 0.35 is to have a good cry and start working on a new paper.

  63. Richard,
    I’ll repeat what I’ve said many, many times. Cook et al. (2013) was a survey of abstracts. It used people to convert the abstracts into a rating. There were initially two ratings per abstract. If these disagreed, there was a reconciliation process to produce a final rating for each abstract. The results are public. You can check them for yourself. The abstracts surveyed are also available. You can do it again. etc, etc, etc….
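
    Schematically, the procedure looks something like this (a toy sketch, not the actual TCP code):

    ```python
    # Hypothetical data: abstract_id -> (first rating, second rating).
    ratings = {
        101: ("endorse", "endorse"),
        102: ("endorse", "no position"),   # disagreement -> reconciliation
        103: ("no position", "no position"),
    }

    def reconcile(r1, r2):
        # Stand-in for the actual reconciliation step (discussion or a
        # third rating); here we just flag the abstract for review.
        return "reconciled({} vs {})".format(r1, r2)

    # One final rating per *abstract*; the raters themselves are not the
    # thing being surveyed.
    final = {aid: (r1 if r1 == r2 else reconcile(r1, r2))
             for aid, (r1, r2) in ratings.items()}
    print(final)
    ```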

    Cook&Cowtan reply by computing the relevant tests, find that the results are garbage, and then spend the rest of the paper coming up with contrived arguments why the test results should not be believed.

    In other words, Cook & Cowtan did not admit that Cook 2013 was wrong (as you earlier claimed). Why do you think you’re in a position to suggest that others may be saying things that aren’t true?

  64. Dikran Marsupial says:

    “The standard response to a kappa of 0.35 is to have a good cry and start working on a new paper.”

    A bit like a paper where some of the conclusions are not robust to the deletion of a single datapoint (and where that datapoint is likely an outlier and contributed by the author of the paper)? Good job it appears to be just a working [sic] paper.

    Brass neck indeed.

  65. ATTP, many have tried to get that point across but Tol keeps ignoring it. It doesn’t really matter what you say to him; he will either say something that doesn’t match reality or attempt to dodge the question.

    Probably also the reason why he’s commenting here and hasn’t commented over at my place. He knows I’ll hold him to properly citing where he gets his claims from. As demonstrated by the context other commenters here have provided, it’s obvious why he isn’t linking to his sources.

  66. Mike Pollard says:

    At present Tol’s paper has been downloaded 35 times, the reply by Cook et al. 258 times. Pretty clear as to who generates interest.

  67. Willard says:

    You’re welcome, Richie.

    You are referring to a “standard response” – does it mean you actually used Kappa in one of your previous publications? I’m not sure Kappa’s that “standard” in an econometrician’s toolkit – do you have citations in econometrics or even economics where Kappa’s used?

    Also, let’s recall that you never addressed KR’s criticisms except by waving your hands:

    Kappa is inappropriate across ratings of different data sets (by definition), you yourself have demonstrated autocorrelation/clustering via alphabetic sorting (your figure S14, weakened only slightly by the additional year sort), rater variance and drift cannot say anything about consistent bias (i.e., any error in Cook et al conclusions) without some ground-truth check (which you have declined to do), [AT] has (IIRC) has computed the skew from ‘7’s in your rolling stats, any trends in rolling stats vanish under random ordering, and the timing of composition and consensus change do not come close to overlap (as you have shown in your Figure 1).

    Done and done. Rolling statistics of an unrelated ordering, and kappa, are irrelevant to this data, and simply fail to support your comment. I feel no need to say more in that regard.

    https://wottsupwiththatblog.wordpress.com/2013/07/26/richard-tol-and-the-97-consensus/#comment-2726

    There’s also the strange fact that your own Kappa was, as you said, 8%.

    Oh, and if you could find a cite for that “< 0,7 = garbage” criterion of yours, that’d be nice.

  68. @dikran
    “[e]ach abstract was categorized by two independent […] raters.”

    “[a]ll other information such as author names and affiliations, journal and publishing date were hidden.”

    both statements from Cook 2013 — note that Cook 2016 acknowledges that both are false

    as to raters being blind to previous ratings, Cook 2013 is silent and Cook 2016 is silent too

    we know, however, that there were significant gaps in rating activity, that Cook was rating abstracts himself, that Cook regularly reported back to the raters about their progress and that of others, that raters reported their findings to other raters, and that one rater stopped rating because the results were not to his liking

  69. Willard says:

    One random citation about Kappa’s measures:

    Kappa Agreement
    < 0 Less than chance agreement
    0.01–0.20 Slight agreement
    0.21– 0.40 Fair agreement
    0.41–0.60 Moderate agreement
    0.61–0.80 Substantial agreement
    0.81–0.99 Almost perfect agreement

    Source: http://www1.cs.columbia.edu/~julia/courses/CS6998/Interrater_agreement.Kappa_statistic.pdf

    Should we conclude that what’s “garbage” for Richie may be considered “fair” for otters who published about Kappa?

    The authors also hammer that Kappa is an estimate of how much the agreement is not due to chance more than an estimate of the strength of an agreement.

    The “< 0” line is interesting for at least two reasons. First, it means it's not really a percentage, contrary to how Richie presented his 8% three years ago. Second, it represents "less than chance," which should indicate something like ultra-independence.
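
    The depressing effect of uneven category counts is easy to demonstrate. Another sketch with made-up ratings:

    ```python
    from sklearn.metrics import cohen_kappa_score

    # Made-up data: two raters agree on 97 of 100 abstracts, but almost
    # everything sits in one category, which drags kappa down.
    rater1 = ["endorse"] * 96 + ["reject", "reject", "endorse", "endorse"]
    rater2 = ["endorse"] * 96 + ["reject", "endorse", "reject", "reject"]

    raw = sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)
    print(raw)                                # 0.97 raw agreement
    print(cohen_kappa_score(rater1, rater2))  # ~0.39, merely "fair" on the scale above
    ```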

  70. both statements from Cook 2013 — note that Cook 2016 acknowledges that both are false

    No, it doesn’t.

  71. verytallguy says:

    I think now would be a good point to remind us that Tol is arguing with himself:

    Richard Tol wrote “The consensus is of course in the high nineties. No one ever said it was not.”

  72. @wotts
    According to Cook 2016, “[r]aters had access to a private discussion forum” and “raters affirm that [fuller investigation] occurred”.

  73. John Hartz says:

    Perhaps we should start a pool on how many nits Tol will pick on this comment thread before all is said and done.

  74. Here’s what it actually says:

    Tol (2016) questions what procedures were adopted to prevent communication between raters. Although collusion was technically possible, it was – in practice – virtually impossible. The rating procedure was designed so that each rater was assigned 5 abstracts selected at random from a set of more than 12,000. Consequently, the probability of two raters being assigned the same abstract at the same time was infinitesimal, making collusion practically impossible.

    Raters had access to a private discussion forum which was used to design the study, distribute rating guidelines and organise analysis and writing of the paper. As stated in C13: “some subjectivity is inherent in the abstract rating process. While criteria for determining ratings were defined prior to the rating period, some clarifications and amendments were required as specific situations presented themselves.” These “specific situations” were raised in the forum. A manual search of this forum found content from 32 abstracts consisting of 7 endorsements, 12 no position and 13 rejections, some of which were provided as examples to raters to help with abstract classification. In addition, several non-reviewed or non-climate-related abstracts were identified and raised in the forum, although these are irrelevant for the results. While some discussion may have been missed in this manual search, we are able to identify potential cross discussion of 0.26% of the sample. Excluding these papers results in an estimated consensus of 97.4%.

    I’ll stress that this indicates that 32 abstracts were discussed in the forum, out of more than 12000. Some of this was for clarification about the rating process. Some to discuss if some abstracts should be discarded as being non-climate related, or non-peer-reviewed,….
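
    The “practically impossible” claim is easy enough to check. A rough sketch, with illustrative numbers:

    ```python
    from math import comb

    # Chance that two raters' simultaneous batches of 5 abstracts, drawn
    # at random from a pool of ~12,000, share at least one abstract.
    N, k = 12000, 5                          # illustrative pool and batch sizes
    p_overlap = 1 - comb(N - k, k) / comb(N, k)
    print(p_overlap)                         # ~0.002, i.e. ~0.2% per pair of batches
    ```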

  75. Thanks, Wotts, I had not spotted that: Random drawings WITH replacement? OMFG!

  76. How can you not have seen that? It’s the only place they discussed rater independence. On what basis were you claiming that Cook 2016 acknowledged your point?

    Random drawings WITH replacement? OMFG!

    What are you talking about?

  77. Willard says:

    Compare and contrast.

    From Dean’s letter:

    I could see no mention as to how these levels were created and how reliable they were in terms of both inter-rater and intra-rater reliability (Cohen’s kappa). Best practice on rater reliability indicates that both inter-rater and intra-rater should have been measured and documented in a study such as Dr Cook’s [2] and I am surprised that this fact appears to have been neglected.

    http://iopscience.iop.org/article/10.1088/1748-9326/10/3/039001

    Here’s what we can read in that [2]:

    There are several statistical approaches that may be used in the measurement of reliability and agreement. Because they were often developed within different disciplines, no single approach can be regarded as standard. Every method is also based on assumptions concerning the type of data (nominal, ordinal, continuous), the sampling (at random, consecutive, convenience), and on the treatment of random and systematic error. Therefore, it is not possible to be too prescriptive regarding the ‘‘best’’ statistical method, with the choice depending on the purpose and the design of the study. […]  However, there are several types of kappa statistics, including Cohen’s kappa, Cohen’s weighted kappa, and the intraclass kappa statistic. Inference procedures also vary depending on the particular kappa statistic adopted, for example, the goodness-of-fit approach for the intraclass kappa statistic [56].

    http://www.sciencedirect.com/science/article/pii/S0895435610000971

    The implicit inference that insinuates itself between the two sentences of Dean’s letter quoted above does not seem to be substantiated in the single authority cited by Dean.

    Once upon a time, being an Oxonian meant something.

  78. verytallguy says:

    What are you talking about?

    I think this might just be the point Richard is making:

  79. Gator says:

    Tol is just trying to be an honest broker.

  80. John Mashey says:

    This discussion is useful, at least in one sense.
    A simple rule has served well. If someone shows they are a serious Dunning-Kruger afflictee, it is a good idea to simply ignore anything they say thereafter, essentially simulating the helpful KILLFILE of USENET days.

    For example:

    Turbulent Eddie wrote:
    “Cook and Oreskes don’t have climate backgrounds but find a way to measure the statistics of opinion. Now we’re sharing opinions about opinions. Poetic, but perhaps not elucidating.
    But that’s just my opinion.”

    I’ll just cover Oreskes (Wikipedia, see publications), since I happen to have a few extra notes handy:
    Naomi Oreskes is Professor of the History of Science and Affiliated Professor of Earth and Planetary Sciences. She recently arrived at Harvard after spending 15 years as Professor of History and Science Studies at the University of California, San Diego, and Adjunct Professor of Geosciences at the Scripps Institution of Oceanography.

    CV 1990 PhD Graduate Special Program in Geological Research and History of Science, Stanford University… Advisors: Marco T. Einaudi (Applied Earth Sciences, Geology) and Peter Galison (History, Philosophy, Physics)
    1981 B.Sc. First Class Honors, Mining Geology (IC ~M.I.T. of the UK)
    The Royal School of Mines, Imperial College, University of London …
    1991-1996 Assistant Professor of Earth Sciences and Adjunct. Asst. Prof of History, Dartmouth College, Hanover, NH

  81. Reich.Eschhaus says:

    That’s a paper? 2-2.5 pages of text complaining that the world is not perfect in the way that Tol wants it to be? And when it doesn’t conform to how Tol wants it to be, it is useless and the results can be easily dismissed. Ah! Now I see!

  82. anoilman says:

    All this and Richard STILL can’t find his missing 300 papers. Surely if the consensus is so much lower, he could find some of the missing 300 papers he claims are there. I mean, he’s been offered cold hard cash by many people if he could provide a list of them.

    Yup… I’m still fanning my $10,000 in cold hard cash, and you still haven’t got those papers. I wonder where they are?

    Maybe you should widen your scope to Pop tech, and include journals with articles on Dog Horoscopes, or my favorite, “Landscape and Urban Planning”. 🙂

  83. Seems to me there are nowadays quite a few Koch-funded positions in universities. So perhaps, since Dr. Tol is providing a variety of evidence that he is prone to error and bias, and is – at best – unable to do simple arithmetic (the alternative being deliberate misstatement), he is auditioning for one of those jobs for when his employers discover he is not qualified in his discipline – unless, of course, said dishonesty or incompetence is a feature, not a bug.

    However, I think taking the dog for a walk and not providing him with a forum for further self-disqualification is a good idea. Endless distraction is, as Marc Morano says, a win for the opposition to some very necessary positive community action.

  84. lerpo says:

    Richard,

    I would be interested in your response to the posts that show your new paper includes statements that you knew were wrong or misleading. Is there a reasonable explanation?

  85. Sou says:

    @lerpo. No. There is no reasonable explanation. Richard has been told directly, over and over again, about all the numerous places he is wrong.

    I cannot fathom how his appalling comment ever got published. He’ll never be cured of telling fibs IMO. It’s an affliction for which he’s earned a reputation among those who’ve had the misfortune of dealing with him. (I am pleased that he’s stopped accusing the 11,944 abstracts of being fatigued, though it’s disappointing that he still hasn’t figured out how to count or do arithmetic. Baby steps.)

  86. Anders,

    Cook et al. (2013) was a survey of abstracts, not a survey of abstract raters.

    I agree that was its intent. I have recently changed position on C13, and my present opinion is that the way the endorsement categories were written left too much open to subjective interpretation. Specifically, the definition of AGW is not consistent across all categories.

  87. Sou says:

    I wonder why Richard didn’t write about the tiny 0.7% of abstracts that disputed the science (using Richard Tol’s flawed arithmetic). Or the 1.9% that disputed the science using the correct arithmetic. I don’t think I’ve ever seen any science denier (fake sceptic) allude to the fact that only a minuscule number of papers reject the consensus, and Richard Tol is no exception. They are too caught up with wanting to reject the 97%, or, in Richard’s case, looking for his missing 300 (as if!).

    I expect those percentages would be even lower if one only looked at the last five years or so, instead of the past 20 years.

  88. angech says:

    …and Then There’s Physics says: April 13, 2016 at 12:29 pm
    “your claim is that the agreement in the literature goes away if you consider all the data. Considering all the data includes samples of people who are known to dispute the mainstream position. So, if you include samples of people who – in advance – are known to dispute the consensus, you discover that not all samples agree? Yup, sounds about right. So what? If we sampled everyone on the planet, I’m sure the level of consensus would be small, but we’re trying to determine the level of consensus amongst relevant experts, or in the relevant literature”

    please, a little fairness here

    A study of the degree of consensus in the literature must consider all the data.
    The data will include people who agree with the position and who disagree with the position.
    The position only becomes mainstream when one side is in a substantial majority.
    Not a priori even if it exists or is assumed to exist a priori.
    And you cannot exclude experts who disagree; that is the whole reason for doing the consensus study.

    One could paraphrase to show the problem with this approach.

    “So, if you include samples of people who – in advance – are known to agree the consensus, you discover that all samples agree? Yup, sounds about right. So what? If we sampled everyone on the planet, I’m sure the level of consensus would be small, but we’re trying to determine the level of consensus amongst relevant experts, who are already agreed with it, or in the relevant literature which already agrees with it.”

    The answer by the way is a “true” consensus of 100% but nobody would ever do such a study because the result is already known.
    Oh wait, someone has done one. Cook et al.

  89. Sou says:

    @angech If you want a public opinion poll, you could try Gallup. This discussion is about the findings of people whose job is to research climate science. That’s what’s meant by “relevant experts” – having expertise in climate science. Inserting/substituting your own words to change what was written above doesn’t actually change what was written above.

    The consensus as shown by the published literature as well as by surveys of climate scientists is close to 100% – not quite, but close. There are still a small if dwindling number of contrarians. There are also some evolution deniers among biologists, I expect. And maybe some plate tectonic deniers among geologists (not so sure about that, but possibly).

  90. Anders,

    I crossposted my previous comment at Sou’s; she noted that it wasn’t very specific. So I fleshed out my position a bit more and repeat it here (correcting some poor spelling):

    I think C13 tried to do two things at once with the same survey instrument: a literature review, and an expert opinion review. Either would have been fine on their own, or both done “separately” as part of the same paper. I think it’s all but certain that the vast majority of working climate scientists endorse the IPCC position of >50% warming since 1950 due to human activity, but very few *papers* explicitly do — as you allude — simply by virtue of the fact that most climate papers aren’t attribution studies.

    If what we want to promote — and rightfully so — is that 90+ percent of climate scientists endorse the IPCC position, I think we should use survey instruments which put those kinds of explicit questions to authors in lieu of backing into it by doing a review of their work. I could, and would, defend that kind of survey.

  91. @brandon
    Cook 2013 was, of course, an attempt to measure the content of the literature — rather than an attempt to measure the knowledge and opinions of the raters. As such, the raters did not disclose any information about themselves and there is no reason to keep their identity under wraps. Cook’s ethics approval does not bind him to do so, and there is no legal or moral ground.

    In Cook 2013, the raters were simply measurement devices — like thermometers. That said, it is common practice to demonstrate that your measurement devices are properly calibrated and accurate. Cook 2013 omits that — or rather, they claim that their calibration data show the same thing (even though there is a 63% disagreement). Cook & Cowtan look at rater calibration, and find it lacking.

  92. Brandon,
    I’m aware of your position. Maybe we should postpone a discussion of it to some other date, as it’s probably hard enough to manage this one at the moment.

    Richard,
    Any comment on this? And you still appear to be ignoring that in your quest for research integrity, you’ve managed to publish a paper with claims that are almost certainly not true and that you knew to be not true well before your paper was published.

  93. angech,

    A study of the degree of consensus in the literature must consider all the data.
    The data will include people who agree with the position and who disagree with the position.
    The position only becomes mainstream when one side is in a substantial majority.
    Not a priori even if it exists or is assumed to exist a priori.
    And you cannot exclude experts who disagree; that is the whole reason for doing the consensus study.

    Yes, but if you have a sample that is only those who dispute the mainstream position, it will return a low level of agreement with the consensus.

  94. Anders,

    Maybe we should postpone a discussion of it to some other date, as it’s probably hard enough to manage this one at the moment.

    I completely understand. I wouldn’t have felt right not saying anything, but do not have a burning urge to press the point, especially not against your wishes.

    Cheers.

  95. Richard,

    Thanks for your comment, which I have read. It’s probably best if I only lurk this thread from here on out.

  96. Brandon,
    Thanks. It might be interesting to have the discussion at some point (and I believe there is some interest in doing so even from the authors). I’m just trying to imagine a comment thread that includes discussing the current papers AND a discussion of another critique.

  97. Dikran Marsupial says:

    Richard Tol, you write “Cook et al (2013) state that 12 465 abstracts were downloaded from the Web of Science, yet their supporting data show that there were 12 876 abstracts.”

    Were you informed prior to submitting the latest version of your comment paper of the explanation of this apparent anomaly (i.e. that the numbers in the supporting data are only unique IDs and provide only an upper bound on the number of abstracts in the study, and hence do not provide evidence that there were ever more than 12465 distinct abstracts actually downloaded)?

  98. Anders,

    You are of course quite welcome. I would welcome further discussion at a later time, especially if the emphasis is on my few ideas for improvements. Not being a survey expert by any means, I would of course be gratified if the authors found some merit in them.

  99. Dikran Marsupial says:

    “Tol (2016) questions what procedures were adopted to prevent communication between raters. Although collusion was technically possible, it was – in practice – virtually impossible. The rating procedure was designed so that each rater was assigned 5 abstracts selected at random from a set of more than 12,000. Consequently, the probability of two raters being assigned the same abstract at the same time was infinitesimal, making collusion practically impossible.”

    Given the nature of the study, this seems to me to be perfectly reasonable. Of course you can always do more in any study/experiment to rule out possible complicating factors, but at the end of the day you do actually need to draw the line somewhere so you can actually run the study/experiment. No experiment is ever perfect, and Richard has provided no evidence at all that this issue has a substantial effect on the results. In fact it seems somewhat inconsistent to question methods to avoid collusion on one hand and the low agreement between the raters on the other. It seems to me this is a case of “impossible standards” (or at least “unreasonable standards”), which is a common rhetorical device, especially given that Richard does not set himself similarly high standards (examples of which are mentioned upthread, and e.g. on Andrew Gelman’s blog).
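    As a back-of-the-envelope check on “virtually impossible” (a minimal sketch, assuming two raters drawing concurrent batches of 5 abstracts from a pool of 12,000 – the figures in the quoted passage – with illustrative numbers only, nothing from the study’s logs):

    ```python
    from math import comb

    N, k = 12000, 5  # assumed pool size and batch size, per the quoted description

    # Probability that two concurrently assigned batches of k abstracts share
    # at least one abstract: the complement of the second rater drawing all k
    # abstracts from the N - k abstracts the first rater was not assigned.
    p_overlap = 1 - comb(N - k, k) / comb(N, k)
    print(f"{p_overlap:.4%}")  # roughly 0.21% per pair of concurrent batches
    ```

    And even that ~0.2% is only the chance of overlap; collusion would additionally require both raters to notice the shared abstract and communicate within the same rating window.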

  100. Dikran,

    Were you informed prior to submitting the latest version of your comment paper of the explanation of this apparent anomaly (i.e. that the numbers in the supporting data are only unique IDs and provide only an upper bound on the number of abstracts in the study, and hence do not provide evidence that there were ever more than 12465 distinct abstracts actually downloaded)?

    Well, I said the following here.

    If you state in your paper that there are ~400 missing abstracts, rather than ~400 missing abstract identifiers, then you are publishing a statement that may well not be true and that the author of the original paper has already explained as not being true.

  101. Dikran Marsupial says:

    Richard wrote “I only publish things I do not know to be true. I never publish things I know to be untrue, though.”

    This is rather cleverly (i.e. not in a good way) worded: it stops short of saying that he doesn’t publish things he is confident are untrue; if there is any uncertainty, it still fits within that criterion. I have to say, though, that for an academic to admit to having such a low standard of veracity in their published work is astonishing.

    As Richard thinks that Cook et al. should be completely explicit in giving caveats in their published work, I think it is only fair that he does likewise, and to test that I would like to know the precise circumstances when Richard wrote that particular section. Of course I think it is very unlikely that Richard will give a straight answer to the question, but the discussion makes progress either way.

  102. Tim Roberts says:

    @ Dikran,

    “I have to say though that for an academic to admit to having such a low standard of veracity in their published work is astonishing.”

    Therein lies the rub. Thanks for the excellent summary of Tol and his “musings”. His credentials as an academic critic are well below what one would expect for any serious discussion. Is he best ignored? Well, I certainly think so – unfortunately the deniosphere hangs off every word of these self-proclaimed experts, so he and others are not ignored.

  103. @dikran
    On the number of abstracts, I had seen a second-hand account — Miriam O’Brien paraphrasing an email by John Cook — and several third- and fourth-hand accounts. The second-hand account is consistent with the first-hand account published earlier this week, but does not really answer the full question: It reconciles the discrepancy between two numbers, but not the other discrepancies.

    The first-hand account is vague: No code, no dates, no lab notes.

    The explanation in Cook 2016 raises a new issue: Were the duplicates removed before being rated, or after? This matters in principle, as data may have been deleted. It also matters in practice, because we know that ratings drifted over time: There are statistically significant differences between early ratings and late ratings.

  104. Sou says:

    @Dikran – oh yes. Over and over again. As ATTP said, both here and, at some length, when he visited HW (and probably lots of other places too). Perhaps Richard has never used a database 😦

    The Evolution of a 97% Conspiracy Theory – The Case of the Abstract IDs (from March 2015 – excuse the plug, but that week he spent at HW, he was behaving in a most peculiar fashion, even for Richard.)
    http://blog.hotwhopper.com/2015/03/the-evolution-of-97-conspiracy-theory.html

  105. Richard,
    That does not change that you published something that is almost certainly not true and that you knew was probably not true before publishing it.

    Here’s something else to consider. Cook et al. (2013) made it clear that social science papers were ranked as “not climate related”. This meant that they were removed and not rated. In WoS you can select which databases to search. If you know that you only want physical/natural science papers (not social science papers) you would obviously not include the social science citation index in your search. Even today (and I’ve just done this) if you do an equivalent search to Cook et al. (2013) you get 12605 articles. Therefore, there cannot have been 12876 abstracts, rather than 12465, because there aren’t even that many matching abstracts in the database today; they simply do not exist (a bit like your missing 300). Was this basic thought process too complicated for you? You’re a Professor of Economics. You could work this one out, couldn’t you?

    Were the duplicates removed before being rated, or after? This matters in principle, as data may have been deleted. It also matters in practice, because we know that ratings drifted over time: There are statistically significant differences between early ratings and late ratings.

    Jeepers, you’re persistent. This is not a compliment. Once again, all the ratings are public. You can go and check them yourself. You can do it again. I know you obviously don’t want to, but that doesn’t mean it isn’t possible.

  106. lerpo says:

    @Richard,

    Why should we take you seriously if you are playing such a game that you would publish something you knew was very likely not true or misleading?

  107. MarkR says:

    @ Richard Tol:

    “It also matters in practice, because we know that ratings drifted over time: There are statistically significant differences between early ratings and late ratings.”

    There’s no drift in the consensus value which is the main result. Cook & Cowtan showed that the variation doesn’t affect the consensus result. When you made one of your mistakes and calculated 91% instead of the correct 97%, this came from the same thing. You also know that the consensus from each of the raters who did >500 abstracts spans “the high nineties” so inter-rater variation in consensus is small. You know that the abstract results from Cook13 give the ~same consensus as the ratings from the authors of the original papers. You know that deriving an adjustment from this correction and applying it to the full set of Cook13 abstracts gives the same consensus result too.

    Your comments helped people look into the data more. We found that raters have slightly different definitions for the borderline between, say “implicit acceptance” and “does not discuss cause” and between “implicit rejection” and “does not discuss cause”. This causes some difference in the rating values (e.g. rater A says “implicit endorsement”, B says “does not discuss cause”), but not in the consensus because the thresholds were applied symmetrically and there were fewer disagreements between, say “explicit endorsement” and “implicit rejection”. Rater differences mean there’s greater uncertainty in the _number_ of abstracts that take a position but according to self-ratings from paper authors, ALL of the Cook13 raters underestimated the number that take a position, although the reported consensus is the same (+/-1%).

    You’ve shown that, through years of slicing the data in every way you can think of, Cook13 shows a consensus “in the high nineties” in published climate science. You’ve failed to find anything that changes this result, and you were so keen to do so that you screwed up your maths, didn’t do the simplest checks on your numbers, and so published a wrong number. You’ve known it’s wrong for more than a year and you refuse to acknowledge and correct your mistake. This suggests that you’re keen on attacking Cook13 while disregarding the accuracy of your own criticisms. The fact you’ve failed to find anything that changes the Cook13 consensus result, even with disregard for the strength or accuracy of your own criticisms, suggests to me that the Cook13 result is very robust.

  108. Dikran Marsupial says:

    Richard Tol wrote “It reconciles the discrepancy between two numbers, “

    O.K. so why did your comment paper not contain the caveat that there was a reasonable explanation for the apparent discrepancy between the two numbers and that this had been pointed out to you (the fact it was second hand is irrelevant, you were aware of the explanation)?

  109. @dikran
    I gave Cook an opportunity to explain a discrepancy. It is unfortunate that the half-baked explanation raises a new question.

  110. Dikran Marsupial says:

    Richard Tol, that is irrelevant, why did you not give the caveat that there was a reasonable explanation for the apparent discrepancy?

  111. I gave Cook an opportunity to explain a discrepancy.

    Again, in what context is this even remotely relevant? It almost sounds like you’re in some kind of position to insist on things. I fail to see why you would think this.

    It is unfortunate that the half-baked explanation raises a new question.

    No, it does not. Maybe you can respond to this. It wasn’t hard to work out that there can’t have been 400 missing abstracts. Why did you struggle to do so?

    Let me stress something. You have now published a paper with a statement that is almost certainly not true and that you must have known was not true prior to publishing the paper. On what grounds do you think you’re in some kind of position to question other people’s papers? Does research integrity only apply to others?

  112. opluso says:

    Since you show Figure 1 from Cook (2016) in the body of your post could you address some of the questions/criticisms raised by Brandon Shollenberger? http://www.hi-izuru.org/wp_blog/2016/04/strangest-chart-ever-created/

    Although the graphic initially appears to make the authors’ points quite nicely, upon closer inspection it seems to be a hand-plotted contrivance. There might well be a better explanation but it was not readily apparent from reading either the paper or the supplemental material.

  113. Since you show Figure 1 from Cook (2016) in the body of your post could you address some of the questions/criticisms raised by Brandon Shollenberger?

    I really do have better things to do with my time. Like Richard Tol, BS is someone who seems to think he has some right to judge the integrity of others while ignoring lapses of his own.

  114. Dikran Marsupial says:

    ” Tell me, what scale do you think that’s on?”

    It is obviously a subjective ordinal scale; that is what “assigned qualitatively” means. The purpose is to illustrate the fact that the degree of consensus grows as the expertise of the subjects increases, nothing more. ATTP was right not to waste his time.

  115. lerpo says:

    Richard,

    The discrepancy was explained but you failed to include it. Why should we take you seriously if you are playing such games?

  116. Willard says:

    > As such, the raters did not disclose any information about themselves and there is no reason to keep their identity under wraps.

    Let me get this straight, Richie: are you claiming that researchers are not bound to respect the confidentiality of their raters because the classification task is a lichurchur review?

    If you could find any classification task (diagnostic accuracy studies, clinical trials, epidemiological surveys, etc.) where the identity of raters is given, that’d be great. If you could find some where the response times are also given, that’d be greater.

    The greatest would be to reconcile your Kappa of “8%” with C&C’s – which Kappa flavour did you use, BTW?
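    For the uninitiated: agreement statistics come in several flavours (Cohen’s kappa for two raters, Fleiss’ kappa for many, weighted kappa for ordered categories), and the flavours can give noticeably different numbers on the same ratings. A minimal sketch of plain Cohen’s kappa, with made-up counts rather than anything from C13:

    ```python
    def cohens_kappa(confusion):
        """Cohen's kappa from a square confusion matrix of two raters' labels."""
        n = len(confusion)
        total = sum(map(sum, confusion))
        p_observed = sum(confusion[i][i] for i in range(n)) / total
        row = [sum(r) for r in confusion]                       # rater A's marginals
        col = [sum(r[j] for r in confusion) for j in range(n)]  # rater B's marginals
        p_chance = sum(row[i] * col[i] for i in range(n)) / total ** 2
        return (p_observed - p_chance) / (1 - p_chance)

    # Hypothetical 3-category counts for two raters (not actual C13 data):
    print(round(cohens_kappa([[40, 5, 0], [6, 30, 4], [1, 3, 11]]), 3))  # 0.691
    ```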

  117. Eli Rabett says:

    Up above, Willard pointed to Richard’s contretemps with his then but soon-to-be-former employer at ESRI. The entire episode is instructive if you are playing DickieBall. Richard, or more precisely one of his students, developed a piece of freakynomics (which is what econometrics becomes as an unconstrained fantasy into which you insert questionable data, or perhaps more precisely irrelevant and incomplete data, turn the crank, and get the answer you and the Randys wish). Oh yes, there is no last step where you go back and check whether your answer and reality share a distant relationship. Finally, when somebunny raises a question about the validity of what you did, you attack them tooth and claw.

    Real Kzin stuff.

    Anyhow, the study asked the question of whether Irish people were better off on the dole or on the job, and the answers were, well, interesting. So interesting that questions were raised in the Dáil and the newspapers, and ESRI withdrew the paper.

    Eli made a couple of points

    The Rabett looks at this figure and says, ok, at the right end are the 1% or 10% who have a lot of income when they are employed and not so much when they are unemployed or on the dole. That’s a difference in annual income of €30,000 or more, and on the left end the folks who don’t earn much more than they would get on the dole so the baseline difference is zilch. But, dear bunnies, here is the interesting thing. If you look at the red and purple lines, they pretty much run parallel to the blue one. The implication is that folks who are well off have the same reduction in costs when unemployed as the folks who are dirt poor. The difference is about €200 per week across the board.

    A huge problem with that is that if you look at direct income, over 30% of households in Ireland had direct income of less than €200, and even considering transfer payments, a bit less than 20% had weekly income of less than €200. (This is from the 2004–5 Household Budget Survey that Crilly et al used.) Are you telling Eli that they blew it all on work-related expenses and did not spend a dime on rent (more about that later) or food cooked in the house??

    To figure out what it cost to work, Tol and friends looked at childcare, transport, take-away food and clothing costs (dress for success). For example, in Table 15 they set the average expense for transportation at €106.30 if someone is working and €23.93 if not working. Now some, not Eli to be sure, are on the floor laughing their asses off. If a bunny is well off, has a car and commutes from the suburbs, well, that €106.30 is reasonable. If not, if your weekly income is €106.30 you are not spending it all on a car (well, maybe if you are a 20 year old, on fags and Guinness). It’s Bill Gates walking into the room.

    Echt Tol

  118. Willard says:

    You may also like this exchange between BartV and Richie:

    There’s even a datapoint on Tol’s graph with a level of scientific consensus less than 10%! Something’s up here…

    https://storify.com/BVerheggen/a-discussion-with-richard-tol-about-his-alternativ

    Warning: may contain Gremlins.

  119. It makes me feel all warm and fuzzy to think that we’ve taken this opportunity to expand what started as a discussion about Richard’s most recent paper to include some of his earlier work. As a researcher myself, it’s always pleasing when people take some time to consider a broader range of my research portfolio.

  120. izen says:

    All the consensus papers have been attacked in turn by someone with some academic credentials.
    It is clear that the contrarians are unable to find any alternative to accepting the >90% consensus, but they are extremely keen to have some credible critique of it, if only an accusation that the methodology was not ‘perfect’, even if the divergence from ‘accurate and good’ is insignificant to the point of non-existence.

    The consensus on mainstream climate science is so asymmetrical that quibbles about trivia are all that is possible.

    Attacks on the consensus measurements have been made by a number of contrarian academics. It is understandable that an academic who is unlikely to achieve status and any level of name recognition without playing the role of contrarian would be tempted to do so.
    But while it may raise the profile of the academic in the short term, and all that admiration from the WUWTers is ego-boosting, it is a Faustian pact.

    Eventually the very academic credibility that made the person so popular with the rejectionists and think tanks is used up by that involvement in the controversy. Those who delight in any critique of consensus studies poison the well. Fred Seitz, Fred Singer, Soon, Lindzen, Spencer – there is a long list of people who were feted by institutes and foundations, as well as by the general population of rejectionists, who lost, or are losing, their academic credibility because of their apparent willingness to provide spurious and irrelevant comments so the rejectionists can at least claim that any study is questionable.
    It is the tactic of selling ‘DOUBT’.

    While playing the contrarian and providing fodder for the GWPF and their fellow travellers may be advantageous to some degree, it eventually destroys the very academic credibility that is being exploited by those groups. As BBD has noted before, Tol is well on the way to trading any academic integrity for respect from the WUWT crowd.
    It is an asymmetrical exchange.

  121. Dikran Marsupial says:

    “if only an accusation that the methodology was not ‘perfect’ even if the divergence from ‘accurate and good’ is insignificant to the point of non-existent.”

    indeed, the essence of the I in FLICC (Fake experts, Logical fallacies, Impossible standards, Cherry picking and Conspiracy theories)

  122. John Hartz says:

    ATTP: Now that this thread is winding down, it is time to look ahead to future OPs. Perhaps the subject matter of the following article would make a good topic to explore — not to say that there are any game players amongst us. 🙂

    Can game theory help solve the problem of climate change? by James Dyke, The Guardian, Apr 13, 2016

  123. Willard says:

    Richie’s portfolio is more persistent than diversified:

    I think what [Richie] has done here is a fantastic example of how persistence can eventually pay off. If you have some kind of agenda, or a message you’d like to promote, just be persistent; eventually you will succeed in getting it out there. It doesn’t really matter if what you’re saying is strictly correct, or not. It doesn’t really matter if what you’re saying is balanced and objective, or not. It doesn’t really matter if what you’re arguing against is something you’ve already accepted as being true. Just keep going. Eventually you will succeed.

    https://andthentheresphysics.wordpress.com/2015/03/26/persistence/

  124. John Mashey says:

    AGU Fall meetings attract 20,000+ scientists, and the last one had 15,000 posters spread over 10 sessions. There is a very low bar for poster acceptance… but of course there is no guarantee that people will pay attention.

    Thousands of posters relate to climate. One could do a survey by checking those in the relevant sections. In recent years, of the thousands I at least scan, I have seen one (1) each year from people who reject the mainstream consensus. Watts had one last year, which proves that anybody can present. Unlike most blog scientists, at least he showed up… although his poster didn’t seem swarmed by questioners 🙂

  125. Magma says:

    @ izen: Attacks can be a simple matter of generating fake dissent.

    Carbonaceous Think Tank: Quick, find something wrong with this paper!
    Fake expert #1: I don’t like the labeling of this graph. It’s too small.
    Fake expert #2: They didn’t reference my paper on the 1273.78 year solar cycle.

    Carbonaceous Think Tank: New Research Claims Dismissed by Experts

  126. JCH says:

    In the last couple of weeks they posted on Google Scholar a large number of Abstracts from the AGU meeting. They’re probably presentations. I’ve been browsing through them. I know you have all heard this before, but 9 out of 10 drive a wooden stake deep into the heart of AGW.

  127. John Hartz says:

    The title of this article states the topic of the OP quite forcefully…

    For the 97 billionth time: Yes, there is a 97 percent consensus on climate change by Suzanne Jacobs, Grist, Apr 13, 2016

  128. John Hartz says:

    JCH: Are you attempting to wrest the Court Jester role away from TurbulentEddy?

  129. The Very Reverend Jebediah Hypotenuse says:

    Willard:

    Let me get this straight, Richie: are you claiming that researchers are not bound to respect the confidentiality of their raters because the classification task is a lichurchur review?

    Are you now, or have you ever been, a rater for Cook13?

    ATTP:

    It makes me feel all warm and fuzzy to think that we’ve taken this opportunity to expand what started as a discussion about Richard’s most recent paper to include some of his earlier work. As a researcher myself, it’s always pleasing when people take some time to consider a broader range of my research portfolio.

    Richard Tol Says:
    May 1st, 2010 at 6:07 am

    Bjorn Lomborg is not a scholar. Scholars publish their research in peer-reviewed journals. Lomborg has one such paper.

    Lomborg writes books with popular science. In popular science, there is a trade-off between accuracy and sales. Lomborg sells well. In fact, his first book did so well that he can now afford to be more accurate.

    Lomborg successfully punches holes in climate hysteria. As panic is a bad adviser, Lomborg plays a useful role in the debate on climate policy. Lomborg provides counterbalance. He is therefore not balanced.

    http://www.irisheconomy.ie/index.php/2010/04/30/economics-voodoo-and-climate-policy/

    While Lomborg may be subject to the ‘no true scholar’ argument, and not balanced, both Lomborg and Tol are playing a useful role in the debate on climate policy.

    Tol’s scholarly corpus is the climate policy equivalent of the large reassuring print on the cover of the Guide.

    Someday, he may be able to afford to be more accurate.

  130. opluso says:

    DM:

    It is obviously a subjective ordinal scale, that is what “assigned qualitatively.” means. The purpose is to illustrate the fact that the degree of consensus grows as the expertise of the subjects increases, nothing more. ATTP was right not to waste his time.

    Yes, it is a subjective interpretation designed to demonstrate an objective fact. Yet it appears to have flaws or at least questionable data plotting. Clarification would be appreciated but perhaps that is not possible.

    The basic point (that a large majority of knowledgeable experts agrees that human activities contribute to warming) is trivially true. However, little support for any specific climate policy can be derived from the existence of that general consensus.

    IMO, the excessive defense of an asserted “97% consensus” seems like the real waste of time and talent.

  131. Clarification would be appreciated but perhaps that is not possible.

    It might be possible, but all the information is public, so it’s easy enough to produce something similar using your own qualitative reasoning.

    IMO, the excessive defense of an asserted “97% consensus” seems like the real waste of time and talent.

    Possibly, but it’s hard to avoid doing when people keep critiquing it in silly ways. We appear to agree that it is probably about right, and yet that doesn’t stop people from trying to criticise studies that attempt to show the level of consensus.

  132. Willard says:

    > Yet it appears to have flaws or at least questionable data plotting.

    Thankfully, raising concerns through drive-bys would never waste any time and talent.

    ***

    Of course not, Reverend.

  133. Dikran Marsupial says:

    @opluso “Yet it appears to have flaws or at least questionable data plotting. ”

    Not really. If you take a continuous ordinal scale, discretize it into broad categories, and then change the ordering within the categories, it isn’t surprising that you can reorder them in different ways that tell a different story. However, that is because you have thrown away the information that is in the orderings within the categories. Saying that the studies are represented by ordinal values from 1 to 5 doesn’t mean that the values are necessarily integers. An ordinal scale just means a scale that preserves rankings; the numeric difference between ranks is meaningless, and this doesn’t imply categorisation.

    It seems to me that the blog article in question is jumping to conclusions based on a somewhat naive view of what an ordinal scale is.
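    To make that concrete, here is a minimal sketch (with invented consensus numbers, not the paper’s data) of how two equally valid renderings of the same ordinal data can look quite different:

    ```python
    # Hypothetical consensus estimates (%) grouped by expertise category (1 = lowest).
    groups = {1: [45, 60], 2: [70, 75, 80], 3: [91, 97]}

    # Within each category the ordering carries no information, so both
    # sequences below are equally faithful renderings of the same data.
    rising = [v for cat in sorted(groups) for v in sorted(groups[cat])]
    jagged = [v for cat in sorted(groups) for v in sorted(groups[cat], reverse=True)]

    print(rising)  # [45, 60, 70, 75, 80, 91, 97] – looks smoothly monotone
    print(jagged)  # [60, 45, 80, 75, 70, 97, 91] – looks noisy; same information
    ```

    Picking the second rendering and then complaining that the trend is noisy is a choice about tie-breaking, not a fact about the data.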

  134. izen says:

    A number count of papers selected by some arbitrary criteria seems rather simplistic. I gather from those in Academe that you live and die by published papers and citations. Which gave rise to the apocryphal story of the science researcher asked by the evangelical Christian if he had heard of Jesus Christ? To which the reply was, “where did he publish and how many citations has he got?”

    A web or network of climate papers and their citations might render a more complete picture of the important science, and how it has progressed over time. Which aspects are solid settled structures of knowledge buttressed by a deep web of interconnections, and which aspects are still uncertain, or developing.
    Perhaps this has been done; if anyone knows of such work, a link would be nice…

    None of this prescribes a policy solution. The best it can do is exclude inaction as a legitimate option.
    Game theory as a means of determining best policy may not be viable. As far as I am aware it breaks down when irrational sentience is involved. The prisoner’s dilemma and the tragedy of the commons do not have neat game-theoretical solutions.

  135. Willard says:

    > A web or network of climate papers and their citations might render a more complete picture of the important science, and how it has progressed over time.

    Indeed. Yet showing how the level of agreement on AGW increases with the number of publications in climate science carries useful information. That a researcher adds papers to his portfolio in a field seems to increase the odds that the researcher has “contributing expertise” in that field, if we may follow Harry Collins on this. Even if it doesn’t indicate much more, that’s far from being irrelevant.

    Of course there are lots of things one shouldn’t conclude from that relationship. For instance, it doesn’t imply that someone who hasn’t ever contributed has no expertise. Even non-born researchers may one day have some expertise.

    Citation counts, a popular alternative, have their share of problems. They may tell us about the expertise the members of a community attribute to a researcher. But this forgets that there are many reasons to cite a paper, and not all are authoritative.

  136. izen says:

    @-opluso
    “The basic point (that a large majority of knowledgeable experts agrees that human activities contribute to warming) is trivially true….
    IMO, the excessive defense of an asserted “97% consensus” seems like the real waste of time and talent.”

    The defence is reactive, and largely a US-directed phenomenon.
    The public perception of the scientific consensus in nations other than the US is much closer to reality, and much less politically polarised. There is not the same level of doubt, or misperception of the mainstream science, outside the American-influenced culture; the US really is an outlier in global opinion.

    http://www.pewglobal.org/2015/11/05/global-concern-about-climate-change-broad-support-for-limiting-emissions/

  137. izen says:

    @-“Citation counts, a popular alternative, has its share of problems. It may tell about the expertise the members of a community attributes to a researcher. It forgets however that there are many reasons to cite a paper, and not all are authoritative.”

    That a bad paper may get cited by well-established experts in refutation is information about the development of the science. I was not envisioning as simplistic a web, or weighting system, as simple cite counts. The value of the cite in relation to origin and target, and its context in the history and future of the research field, are all relevant.

    I wonder if a future Tol will be happy to include in his bibliography of published work a reference to his ERL comment?

  138. Peter Jacobs says:

    “A web or network of climate papers and their citations might render a more complete picture of the important science, and how it has progressed over time.”

    Shwed and Bearman, 2010
    doi: 10.1177/0003122410388488

  139. John Hartz says:

    To further Izen’s point about the state of public understanding of climate science in the US, from the Grist article I cited above…

    John Cook, the lead author on the new study and a fellow at the Global Change Institute at The University of Queensland, wrote about this danger today in the Bulletin of the Atomic Scientists. He said that years of misinformation and doubt from conservatives have seriously skewed the public’s understanding of where scientists stand on climate change. Just last year, he noted, a survey revealed that a mere 12 percent of Americans knew that the consensus was above 90 percent. [My bold.]

  140. John Hartz says:

    Izen:

    A web or network of climate papers and their citations might render a more complete picture of the important science, and how it has progressed over time. Which aspects are solid settled structures of knowledge buttressed by a deep web of interconnections, and which aspects are still uncertain, or developing.

    How does what you envision differ from what the IPCC has produced in its assessment reports beginning with the first one in 1990?

  141. Ask not for whom the gremlins toil, they toil for RT.

    If, after more than a year, RT still does not understand the autoincrement function of a database and still refuses to listen to those that do, then he’s either just a dumbass or very good at playing one in a browser near you. That level of Homer Simpson stupidity doesn’t deserve anything other than ridicule. D’oh!

  142. Pingback: Three years! | …and Then There's Physics

  143. izen says:

    @-Peter Jacobs
    “Shwed and Bearman, 2010
    doi: 10.1177/0003122410388488”

    Many thanks.
    That is very much what I was thinking of. The scaling network modularity against literature size to avoid measuring benign contention is especially neat. As is the dynamic temporal window.
    The identification of consensus for tobacco, cellphones, vaccination and AGW look credible.

    This link got the full article unpaywalled.
    http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3163460/

  144. Dikran Marsupial says:

    Having looked into the SI, I suspect the orderings within the categories may* simply be random. One thing that is certain, though, is that they haven’t been cherry-picked to maximise the support for the argument, but that is precisely what Shollenberger did:

    Note the orderings within each group are as anti-correlated with expertise as they possibly could be. Had the original diagram given an ordering that maximised the correlation between expertise and consensus, then Shollenberger might have had a valid point, but clearly they didn’t, and it was he who picked the cherries to maximise the support for his claim that the diagram was misleading.

    * or, alternatively, the authors may have used their experience to rank them within categories according to their subjective views on the expertise of the groups.

  145. Dikran Marsupial says:

    I have to say that actually Shollenberger’s version of the diagram still suggests that consensus broadly increases with expertise, with one study (I suspect the survey of meteorologists) as a potential outlier.

  146. Dikran,
    I agree. It seems that the data shows a general increase with expertise, which is really all the paper was trying to illustrate.

  147. Dikran Marsupial says:

    Well, it looks like Richard is not going to answer the question “why did your comment paper not contain the caveat that there was a reasonable explanation for the apparent discrepancy between the two numbers and that this had been pointed out to you…?”, so I thought I would explain why he really needs to consider why he can’t/won’t answer it. In his comment paper he writes:

    “Cook et al (2013) state that 12 465 abstracts were downloaded from the Web of Science, yet their supporting data show that there were 12 876 abstracts.”

    This is a clear factual error. The highest abstract ID being 12876 does not show that there were at any point 12876 abstracts. It doesn’t even show that 12876 abstract IDs were assigned (although it is reasonable to assume that they were). Richard has made a claim here that is unequivocally wrong. His mistake has been explained to him, so even if he doesn’t accept the explanation, he is still not entitled to state (as a matter of fact) that the SI shows that there were 12876 abstracts. He should at the very least have given a caveat regarding the possible explanation.

    It is pure hypocrisy on Richard’s part to assert that Cook et al should contain a caveat about issues that might potentially have a (small) effect on the outcome of the study (though no evidence is provided that they did), but for him not to give a caveat when he makes an assertion of fact that he knows is, at best, questionable.

    In future I would recommend that anybody conducting a similar survey use randomly generated, rather than sequential, IDs, if only to avoid this kind of misunderstanding from people who don’t understand the purpose of unique IDs in a database and who have the hubris not to question their understanding (or indeed to listen to those who do understand).
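    The ID arithmetic is easy to demonstrate. A minimal sketch using Python’s built-in sqlite3 (the table and column names are invented for illustration, not Cook et al.’s actual schema):

    ```python
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE abstracts (id INTEGER PRIMARY KEY AUTOINCREMENT, title TEXT)")

    # Insert five rows, two of which duplicate earlier titles.
    for title in ["A", "B", "B", "C", "C"]:
        con.execute("INSERT INTO abstracts (title) VALUES (?)", (title,))

    # Remove the duplicates; the surviving rows keep their original IDs.
    con.execute("""DELETE FROM abstracts
                   WHERE id NOT IN (SELECT MIN(id) FROM abstracts GROUP BY title)""")

    count, max_id = con.execute("SELECT COUNT(*), MAX(id) FROM abstracts").fetchone()
    print(count, max_id)  # 3 4 – the highest ID exceeds the number of rows
    ```

    Once anything has been deleted, max(id) only bounds the row count from above, which is precisely the explanation Richard was given.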

  148. Willard says:

    There’s this interesting comment at Brandon’s:

    How do Cook et al. interpret their sample (“11944 climate abstracts from 1991–2011 matching the topics ‘global climate change’ or ‘global warming’” (from the abstract of Cook et al. 2013)) as showing that the surveyed papers demonstrated an expertise level of 5? (I see nothing in Cook et al 2013 that suggests they selected papers or authors by ‘expertise’).

    It looks like circular reasoning to me: assume “expert climate scientists” support the consensus, therefore high levels of consensus must signify expertise…

    http://www.hi-izuru.org/wp_blog/2016/04/strangest-chart-ever-created/#comment-9666

    And then Brandon doubles down instead of reading the article to Ruth.

    You just can’t make this up.

    Once upon a time, being an Oxonian meant something.

    John Mashey says: “Thousands of posters relate to climate. One could do a survey by checking those in the relevant sections.”

    We could make another consensus study of AGU and EGU abstracts submitted to the climate divisions.

    And we could hold surveys and ask scientists at the climate sessions at EGU and AGU.

    The more consensus studies, the more news that explains to the public that there is a clear consensus about the basics of climate change, that the mitigation skeptics are lying when they claim there is none, and that the media are actively giving the wrong impression with their false-balance aka biased “journalism”.

  150. John Hartz says:

    Why are humans reluctant to wholeheartedly embrace the body of scientific evidence about manmade climate change?

    Human psychology influences the decisions we make every day, including unwise ones. Our psychological profile can make us reluctant to pay for services that benefit everyone, including those who don’t contribute. It makes us focus on achieving short-term gains and avoiding short-term losses. And, most importantly, it prompts us to engage in rationalization and denial rather than tackle difficult challenges.

    Scientists suggest appealing to human psychology to create solutions to climate change by Rosemary Mena-Worth, Stanford News, Apr 13, 2016

  151. opluso says:

    We appear to agree that it is probably about right, and yet that doesn’t stop people from trying to criticise studies that attempt to show the level of consensus.

    There is nothing wrong with presenting a subjective analysis. There is, however, something wrong with presenting a subjective analysis as if it had precisely measured an objective fact. At least there is if you are a scientist.

    From the beginning, Cook (2013) was treated as a political document with the intent of being used to promote a 97% meme. Not “over 90%” but exactly 97%. It continues to be defended as if it were a political document since the methodology is, depending upon your political perspective, either fatally flawed or so nearly perfect as to merit no further argument.

  152. Willard says:

    > There is, however, something wrong with presenting a subjective analysis as if it had precisely measured an objective fact.

    Nic tried to pull that one recently. I don’t have time to find where, but I’ll search later. Let Richie counter it:

    As soon as you summarize, you pass judgement.

    https://andthentheresphysics.wordpress.com/2016/04/06/the-value-of-social-science/#comment-75659

    Thank you for peddling “but C13” using the fact/value dichotomy.

  153. opluso says:

    Willard:

    Since you said nothing to contradict my comment, I subjectively conclude that you objectively agree with 97% of it.

    Hey, this stuff is easy!

  154. Dikran Marsupial says:

    “There is, however, something wrong with presenting a subjective analysis as if it had precisely measured an objective fact”

    Not that anybody did any such thing. The subjectivity of the diagram was explained sufficiently clearly in the paper.

    “Hey, this stuff is easy!”

    rhetoric is easy, rational discussion is more difficult, if you just want to wisecrack, I am not that keen on indulging you by answering your questions any further.

  155. Willard says:

    > Since you said nothing to contradict my comment […]

    As one of your fellow Denizens used to say, dear opluso, read harder.

    Nice bait, though.

  156. opluso,
    What would happen if one of the authors of a consensus study tried to contact Barack Obama to point out that their papers didn’t actually show that 97% agreed that it was dangerous?

    1. Barack Obama would immediately issue a correction and apologise for getting it wrong.

    2. They would not even get through to anyone who could even point this out to Barack Obama.

    3. Even if Barack Obama was told that he hadn’t represented the studies correctly would still not bother issuing any kind of correction.

  157. Willard says:

    More importantly, what would happen with Judy’s meme presented as fact (“Obama tweeted”) if one read that the Twitter account is run by Organizing for Action staff?

    Contrarians might have a better chance with testifying that they don’t feel lonely.

  158. Hyperactive Hydrologist says:

    Whether the consensus is 90% or 97%, it doesn’t change the conclusion that the overwhelming majority of experts agree that the climate is warming and that anthropogenic activities (specifically the use of fossil fuels) are the primary cause.

    So can someone explain to me the point of this discussion?

  159. JCH says:

    So can someone explain to me the point of this discussion?

    “The consensus is of course in the high nineties. No one ever said it was not.”

    As far as I can tell, they’re trying to establish that 90% is in the high nineties and that Tol, yet again, is a “never wrong” machine.

  160. snarkrates says:

    JCH, No. Tol is a “not even wrong” machine.

  161. John Hartz says:

    JCH: Congratulations! You are now the official Court Jester of this thread.

  162. Willard says:

    Chill, please.

  163. opluso says:

    DM:

    rhetoric is easy, rational discussion is more difficult, if you just want to wisecrack, I am not that keen on indulging you by answering your questions any further.

    indeed, the essence of the I in FLICC (Fake experts, Logical fallacies, Impossible standards, Cherry picking and Conspiracy theories)

    I suppose name-calling is better than Willardesque wise-cracking. Point taken.

    To your credit, you’ve come closer than the others in responding to Shollenberger’s criticisms. My general question was whether Shollenberger had identified any valid criticisms (obviously, my reading was that he had).

    Your first response was to quote (unfavorably) Shollenberger’s rhetorical question about the scale of the x-axis. His point was that it did not have a set scale because it was subjectively derived by assigning unscaled values of 1 to 5 with no “4s” identified. Which raises the question why didn’t they just use a 4-point scale? Apparently that gap meant something to the authors, but it is left unclear to the rest of us (and invisible on the graph).

    At your comment https://andthentheresphysics.wordpress.com/2016/04/13/consensus-on-consensus/#comment-76278 you noted that:

    Having looked into the SI, I suspect the orderings within the categories may* simply be random.

    * or alternative the authors may have used their experience to rank them within categories according to their subjective views on the expertise of the groups.

    In other words, I was not the only one left wondering exactly what the authors did to produce their graphic.

    aTTP:
    The Abstract for Cook (2016) mentions 97% three times while asserting that it is “robust” and “consistent” with other studies. Repeating the 97% meme is part of the climate communications strategy for creating a political consensus. Whether it is ultimately successful or not, I will agree that the political science is more important to policies than the climate science.

  164. Eli Rabett says:

    opluso:

    All approximations are wrong, some yield nearly correct conclusions
    All surveys are really wrong, some produce nearly correct approximations

  165. opluso,
    As far as I can tell based on my own experience, reading the literature, and talking to others, there is little disagreement amongst those who I think understand this, that 97% is a fair reflection of the level of consensus. If someone would like to prove this wrong, they’re free to do so. Continually complaining about secondary aspects of this study is rather unconvincing. That some people may have used this result in ways that are not correct is not an argument against the result being a fair reflection of the level of consensus.

  166. > All surveys are really wrong, some produce nearly correct approximations.

    Yet opluso claims that referring to the 97% figure implies “exactly 97%.”

    ***

    > My general question was whether [Brandon] had identified any valid criticisms (obviously, my reading was that he had).

    opluso’s peddling does not make any explicit endorsement of Brandon’s “questions/criticisms”:

    Since you show Figure 1 from Cook (2016) in the body of your post could you address some of the questions/criticisms raised […]

    https://andthentheresphysics.wordpress.com/2016/04/13/consensus-on-consensus/#comment-76198

    The “questions” part is now gone too.

    ***

    opluso’s comment where he gives credit to answer questions fails to respond to AT’s question.

    Since opluso commented on a blog post about Richie’s latest hit piece, perhaps he could address some of the questions/criticisms AT raised in his post.

    Hey, this stuff is easy!

    Wait, is that wise-cracking?

  167. John Hartz says:

    Another well written explanation of the new “Consensus on Consensus” paper,,,

    Research shows — yet again — that there’s no scientific debate about climate change by Chelsea Harvey, Energy & Environment, Washington Post, Apr 15, 2016

  168. John Hartz says:

    It doesn’t matter much to the Earth’s climate system what the exact percentage of the scientific consensus is. The global climate system just continues to accumulate heat as the consensus science says it will. For example…

    The global temperature in March has shattered a century-long record and by the greatest margin yet seen for any month.

    February was far above the long-term average globally, driven largely by climate change, and was described by scientists as a “shocker” and signalling “a kind of climate emergency”. But data released by the Japan Meteorological Agency (JMA) shows that March was even hotter.

    Compared with the 20th-century average, March was 1.07C hotter across the globe, according to the JMA figures, while February was 1.04C higher. The JMA measurements go back to 1891 and show that every one of the past 11 months has been the hottest ever recorded for that month.

    March temperature smashes 100-year global record by Damian Carrington, Guardian, Apr 15, 2016

  169. opluso says:

    Willard:

    opluso’s comment where he gives credit to answer questions fails to respond to AT’s question.

    I did respond to his question(s). Read harder™

    Here’s a cheat sheet: aTTP asked a two-part question (attempt to contact; inaccurate interpretation of 97%) and provided three options. I chose #3.

    Perhaps the preface to my agreement threw you off the scent. Sniff harder.

  170. If you choose 3, why would you highlight Obama’s tweet?

  171. > I chose #3.

    That choice should somehow be implied by this:

    The Abstract for Cook (2016) mentions 97% three times while asserting that it is “robust” and “consistent” with other studies. Repeating the 97% meme is part of the climate communications strategy for creating a political consensus. Whether it is ultimately successful or not, I will agree that the political science is more important to policies than the climate science.

    https://andthentheresphysics.wordpress.com/2016/04/13/consensus-on-consensus/#comment-76305

    Somehow.

    AT’s question reads like one way to get opluso on track of the topic of this blog post: the fact that Richie kept repeating a claim that is clearly incorrect. A fact that can be checked by beer-mat knowledge of how databases work. Clearly not something that depends on some kind of “political perspective.”

    Perhaps AT should have said “should” instead of a UK-based “would.”

  172. Joshua says:

    opluso –

    I’m curious as to your opinion as to whether the paper was “misleading.”

    If you do think it is “misleading,” do you think that prevalence of agreement with the “consensus” position on climate change (which I interpret to be: More than 1/2 of recent warming, as well as other effects such as SLR, are extremely likely attributable to aCO2 emissions) is positively correlated with expertise (as measured by publications/citations in the field of climate science)?

    If so, do you think that the paper failed to make that case – and that is why the paper is misleading, because it makes the assertion without making a supporting argument? Or do you think that the paper failed to accurately quantify the magnitude of the correlation – and that is why the paper is misleading, because it leads the reader to think that it had?

    As for this:

    ==> IMO, the excessive defense of an asserted “97% consensus” seems like the real waste of time and talent.

    What do you think about the amount of energy spent on the other side of the “97% consensus” issue? Do you think that it is excessive? Do you think that it seems like a real waste of time and talent?

  173. opluso says:

    aTTP:

    I highlighted Obama’s tweet because it repeated the politically useful 97% meme which is, to this day, reinforced (by the various authors and others) even as they admit it’s actually just a ballpark figure. As I’ve said before, the consensus on human activities causing warming is trivially true. It is the effort to go beyond that in an effort to drive political/economic policies that I have problems with.

    Willard:
    The actual answer was #4 but for some reason it didn’t show up on the chart.

    Joshua:

    I’m curious as to your opinion as to whether the paper was “misleading.”

    If the paper itself was not intentionally misleading, the follow-up promotional effort has been (I know, blame the press). The Cook paper was a communication strategy, not a research revelation.

    …positively correlated with expertise…?

    Taking the original Cook statement of consensus (…scientific consensus that human activity is very likely causing most of the current GW…) I would expect there to be a positive correlation between agreement and knowledge of the subject. I would also expect a positive correlation between repetitive exposure to a meme and believing it.

    What do you think about the amount of energy spent on the other side of the “97% consensus” issue?

    It’s political turtles all the way down.

  174. John Hartz says:

    In a nutshell…

    “What this latest study shows, definitively and authoritatively, is that the scientific consensus behind human-caused climate change is overwhelming,” said Michael Mann, director of the Earth System Science Center at Penn State University. “It is time to end the fake debate about whether or not climate change is real, human-caused, and a threat, and get on to the worthy debate about what to do about it.”

    Consensus Affirmed: Virtually All Climate Scientists Agree Warming Is Manmade by Phil McKenna, InsideClimate News, Apr 14, 2016

  175. @opluso
    The graph in Cook 2016 is indeed very strange. Cook 2013 is ranked at the highest level of expertise. Yet papers written by geoscientists in geoscience journals about geoscience topics make up only some 20% of Cook’s data. The remaining 80% do not have any demonstrable expertise in the question whether climate change is real and human-made. In other words, Cook’s data by and large reflect endorsement of a hypothesis outside the authors’ expertise.

    Furthermore, the consensus rate among Cook’s experts is only 93%, while among Cook’s non-experts the consensus is 99%.

  176. Richard,
    You appear to be once again confused about “consensus” and “attribution”. Care to address this issue?

  177. Marco says:

    “I would also expect a positive correlation between repetitive exposure to a meme and believing it.”

    In which case you should welcome the work by Cook et al, as it is helping to destroy the meme that it is all-so-uncertain-and-climate-scientists-disagree.

    But considering your comments here and elsewhere, I think it is exactly the opposite: you do *not* welcome the work of Cook et al, because it attacks the meme you prefer, which contradicts the facts.

  178. opluso,

    I highlighted Obama’s tweet because it repeated the politically useful 97% meme which is, to this day, reinforced (by the various authors and others) even as they admit it’s actually just a ballpark figure.

    Actually, I’ve just looked at Obama’s tweet again, and it’s fine. I thought he’d included “dangerous” but he hasn’t in the one you’ve shown. So, nothing really wrong with it. As Marco points out, repeating something that is largely true so as to destroy a meme that is not should be welcomed by all who would rather we were properly informed than misinformed.

  179. John Mashey says:

    VV: re AGU & EGU … mostly, I’m just trying to get a few people dubious of the consensus to actually attend a meeting and talk to real scientists, something that many apparently rarely, if ever do.

    Next AGU is still in San Francisco, but then construction at Moscone, so New Orleans in 2017 and Washington in 2018, then back to SF.
    EGU = Vienna.

  180. Joshua says:

    opluso –

    I appreciate the response, but…

    I’m curious as to your opinion as to whether the paper was “misleading.”

    If the paper itself was not intentionally misleading the follow-up promotional effort has been (I know, blame the press). The Cook paper was a communication strategy, not a research revelation.

    Just to make sure…I was asking about the paper that is the subject of this OP, right?

    Anyway, as near as I can tell you didn’t answer my first question – which is kinda the most important one. If I ask you whether you think the paper was misleading (yes/no) and you answer “if….” I consider that non-responsive to the question.

    ==> I would also expect a positive correlation between repetitive exposure to a meme and believing it.

    As near as I can tell, you basically didn’t answer my questions and instead used my questions as a lever to make other points related to your feelings about the context of the paper.

    Could we try again?

  181. Joshua says:

    If I pray hard enough, will god magically fix my html tags (by turning off italics after the word “revelation”)? I tried something with the tags that didn’t work.

    I don’t think god will do it magically, but I can do it by editing your comment 😉

  183. Dikran Marsupial says:

    opluso “I suppose name-calling”

    I didn’t call anybody names, FLICC is an acronym that describes behaviour, not individuals.

    ” His point was that it did not have a set scale because it was subjectively derived by assigning unscaled values of 1 to 5 with no “4s” identified.”

    yes, that was explained in the paper.

    “Which raises the question why didn’t they just use a 4-point scale?”

    Because they had thought about this prior to assigning the studies to the line and found, when they came to apply their preset criterion, that there were no fours? So what? Note that it doesn’t matter anyway, because the amount of the x-axis taken up by the fours is zero. This is reasonable because distances on an ordinal scale are essentially meaningless; only the ranking is meaningful.

    This is just searching for things to complain about.

    “In other words, I was not the only one left wondering exactly what the authors did to produce their graphic. ”

    No, I was not left wondering, because the ordering within the categories isn’t very important. When I look at a plot I generally try and find the meaning that makes the most sense and is most coherent. There comes a point where we as readers need to apply (i) common sense and (ii) the golden rule (how would we want others to interpret our plots?).

    I notice you don’t comment on the fact that Brandon S did cherry pick the ordering. Perhaps you ought to ask yourself why you are not critical of that?

  184. verytallguy says:

    If the paper itself was not intentionally misleading the follow-up promotional effort has been

    Funny this.

    There certainly has been an attempt to mislead following the paper. By those who fear the truth it reveals.

    This is a general feature of climate “sceptics”. Strongest reactions aren’t against the weakest research, but rather the most revealing.

    Lewandowsky is the poster child, but Cook comes close.

  185. verytallguy says:

    Hmm. I’m not a praying type, but perhaps the great html fixer in the sky can come to my rescue too?

  186. Dikran,
    Brandon S is currently unhappy with me because the comment I made on his post (why, I hear you ask; there is no good answer to this) clarifying something very simple wasn’t done in a suitably deferential manner. Given that people are questioning this figure, maybe I’ll just repeat what you’ve pointed out.

    The x-axis is an ordinal scale with the values representing the level of expertise. These were assigned subjectively, but the details are in Table S1 in the Supplementary Information. Others can, of course, choose to disagree with the assignments, but the details are available. On the graph itself, the width of each category is simply set by the number of samples in each category, and the ordering of the samples in each category has no specific significance. It would also make little difference if they were ordered differently, as the point is just to illustrate that the level of consensus increases with expertise.
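
    To make that concrete, here is a minimal sketch of how a figure of that general shape can be built. The numbers below are invented purely for illustration – they are not the paper’s data, and I’m not claiming this is the authors’ actual plotting code:

    ```python
    import matplotlib.pyplot as plt

    # Hypothetical sub-samples: (expertise category on a 1-5 ordinal scale,
    # consensus %). Purely illustrative numbers, not Cook et al.'s data.
    samples = [
        (1, 45), (1, 62),
        (2, 66), (2, 73), (2, 78),
        (3, 82), (3, 86),
        (4, 88), (4, 91), (4, 93),
        (5, 94), (5, 97), (5, 97),
    ]

    x = 0
    for cat in sorted({c for c, _ in samples}):
        for pct in [p for c, p in samples if c == cat]:
            # Each sub-sample occupies one unit of width, so a category's
            # total width simply tracks how many samples fall within it.
            plt.hlines(pct, x, x + 1)
            x += 1
        plt.axvline(x, linestyle=":", linewidth=0.5)  # category boundary

    plt.xlabel("Expertise (ordinal categories; width = number of samples)")
    plt.ylabel("Consensus (%)")
    plt.show()
    ```

    Shuffling the samples within each category would move individual segments around, but it would leave the overall staircase – consensus rising with expertise – unchanged, which is the point.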

    Now that that is cleared up, I’m sure we can all move on. Okay, I’m only joking 🙂

  187. Dikran Marsupial says:

    “It would also make little difference if they were ordered differently”

    precisely, the only circumstance in which it might matter would be if the ordering were cherry picked to maximise the support for the argument being made… which is precisely what BS did, but Cook et al. did not – in their case the ordering is random (“…occurring without definite aim, reason, or pattern”).

  188. Vinny Burgoo says:

    Dikran: ‘I notice you don’t comment on the fact that Brandon S did cherry pick the ordering. Perhaps you ought to ask yourself why you are not critical of that?’

    Brandon S chose the most unfavourable ordering to make a point. He was quite open about that. What’s the problem?

    (Irrelevant anaesthesiology fact of the day: if you have an operation in Michigan, the gases needed to keep you under have a global warming effect equal, on average, to emitting 22 kg of CO2. Worldwide, anaesthesia has an effect equivalent to 1 million cars. Say no to anaesthesia and save the planet!)

  189. Dikran Marsupial says:

    “Brandon S chose the most unfavourable ordering to make a point. He was quite open about that. What’s the problem?”

    He didn’t demonstrate the most favourable order, and point out that the one used in the paper would be in the middle somewhere (i.e. it was a reasonable summary of the data).

    My main problem with it is that it is just uncharitable pedantry, i.e. looking for something to complain about and then making a mountain out of a molehill. That is not how science should be done IMHO.

  190. Dikran writes of “…uncharitable pedantry, i.e. looking for something to complain about and then making a mountain out of a molehill.”

    Vinny writes: “Worldwide, anaesthesia has an effect equivalent to 1 million cars. Say no to anaesthesia and save the planet!”

    There are an estimated 1.2 billion cars in the world. Using Vinny’s anaesthesia equivalent we arrive at 0.083%. Mountains out of molehills?
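
    Spelled out, taking both figures at face value:

    $$\frac{1.0\times10^{6}\ \text{car-equivalents}}{1.2\times10^{9}\ \text{cars}} \approx 8.3\times10^{-4} \approx 0.083\%$$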

  191. Dikran Marsupial says:

    I gather it is also coal that is the real long term problem, rather than oil.

  192. John Hartz says:

    ATTP: Help! I’m running out of popcorn. 🙂

  193. Joshua says:

    ==> (why, I hear you ask; there is no good answer to this)

    That was certainly the question I asked myself when I saw that you had commented over there.

    Actually, I think the phrase “There’s somebody wrong about something on the Internet.” is something of an answer. That little phrase captures a HUGE % of the dynamic that takes place in practically all blog comments, IMO.

    Of course, in some ways that only captures the surface dynamic, and doesn’t much explain the more underlying mechanics. What really explains the irrationality of why people (including myself) keep doing things (in so many different domains) over and over when it produces the same results, with no apparent benefit?

  194. Joshua,
    What I thought was that if I simply presented information that would lead them to where the 1381 number came from, they could at least clear up that one issue. A small step towards clarifying various issues. Of course, in doing it the way I did, I was accused of being rude, and the standard exchange ensued. It is quite remarkable how there are certain situations in which it appears entirely impossible to avoid such an exchange.

  195. Vinny Burgoo says:

    oneillsinwisconsin, going without anaesthesia does seem like a disproportionate response but some people like to make gestures like that.

    Take electric cars, for example. Based on a very rough calc, I reckon 22 kg CO2 is about what I’d save every 2 months if I traded my petrol-fuelled old banger for an EV. Worth the expense and nuisance? No, but it’d make me popular with the aristocracy hereabouts. They are all very big on EVs, solar panels and the rest. (Do I want to be popular with the local aristocracy? Er, no.)

  196. John Hartz says:

    The kerfuffle over Cook(13) and Cook(16) occurring on this thread and throughout the blogosphere should not distract us from what is happening in the real world. For example,

    On Friday, NASA released the latest temperature data for the globe, showing that March of 2016 was the hottest March on record since reliable measurements began in 1880. The month was 1.28 degrees Celsius, or 2.3 degrees Fahrenheit, higher than the average temperature in March from 1951 through 1980, with particularly scorching temperatures in the Arctic (as has been the case throughout this year so far).

    Scorching March temperatures set a global record — for the third straight month this year by Chris Mooney, Energy & Environment, Washington Post, Apr 15, 2016

  197. Joshua says:

    Anders –

    I have noticed that Brandon, unlike many of the participants in the climate blog wars, criticizes people across the more typical great climate divide.

    That said, I can’t see any way that YOU (let alone anyone else) simply presenting information to HIM would “clear up” anything with him about a conclusion that he’s drawn (something which would be obvious if you’ve observed him interacting in blog comment exchanges basically ever).

    There’s always the question about “lurkers” benefiting…and indeed, I (as a lurker in that exchange) like to read good faith discussions from people of differing viewpoints – I can learn from such exchanges. But you have to have good faith discussants for that to happen.

    As such, I can learn from, say, you and Dikran discussing an issue on this blog, and perhaps Brandon and Carrick discussing that same issue on his blog. But then I’m left to my own devices for working out how those discussions interact – something which becomes that much more difficult when technical issues are being discussed.

  198. OPatrick says:

    Vinny, I wonder if you understand the reasons why people are supportive of electric cars? Quite aside from the point that there is little expense (when I was doing the calculations last year, an electric car worked out as my cheapest option for a new car – fuel prices have dropped since then, but even so the difference is still small, far less than the difference between a Polo and a Golf, for example, a difference which a surprising number of people seem happy to pay; I am told you can get very good deals on second-hand electric cars too), and, for many driving patterns, no ‘nuisance’ at all, the reason most people buy electric cars is not so much the immediate savings on emissions, but the belief that in the long term we need to be moving away from transport fuelled by fossil fuels. For this to happen, the infrastructure needs to be in place and investment needs to be going into developing the ever-improving technology.

  199. Joshua,
    I agree completely. I’ve never claimed that what I choose to do in terms of commenting elsewhere is well thought out and logical.

  200. Dikran Marsupial says:

    “oneillsinwisconsin, going without anaesthesia does seem like a disproportionate response but some people like to make gestures like that.”

    Vinny, I think the point is that nobody is saying we shouldn’t use our cars at all; rather, we need to use a limited natural resource wisely and with some regard to the consequences for the future (i.e. it is a cost-benefit analysis). That is why factoids like the anaesthesia one are essentially straw men.

    The electric car one is another straw man. The main benefit of electric cars is not that they reduce CO2 emissions directly, but that they reduce emissions of pollutants in big cities, where they are a problem. They save CO2 emissions indirectly because it is easier to perform CCS at a power station than in a moving vehicle.

    “Worth the expense and nuisance?” It depends on whose point of view you adopt. It might not be worth the expense and nuisance to you, but a Bangladeshi, living in a country where the effects of climate change are likely to be felt more strongly, may take a different view of whether your inconvenience outweighs their means of providing for themselves. A key problem is that of discounting rates: most of us are too self-centered to care whether our fossil fuel use has negative effects on people we don’t know, who are least able to deal with it. There are scientific issues (most of which are sufficiently resolved), there are economic issues (rather less well resolved), and there are socio/political issues (even less so). There is no ethical dimension to the science, but there is an ethical issue in what we choose to do about it.

  201. Joshua says:

    Yeah. I get that. That sense of perspective is (one reason) why I tend to find your participation more useful than that of many others.

    Brandon, in particular, is an interesting case for his extremeness in a different direction: Even though he routinely characterizes others’ actions as irrational/crazy/whatever (paraphrasing), and he seems to have a philosophical outlook that the entire world is irrational, he seems to me to be extremely “challenged” w/r/t reflecting on his own rationality.

  202. BBD says:

    Up a bit, Dikran observed:

    My main problem with it is that it is just uncharitable pedantry, i.e. looking for something to complain about and then making a mountain out of a molehill. That is not how science should be done IMHO.

    Shollenberger isn’t doing science. He’s doing propaganda, as are all contrarians.

  203. snarkrates says:

    Let’s face it. Any paper that attempts to quantify consensus as a proportion of scientists who “believe” a given proposition is a political document. The scientific consensus is on the validity of the predominant model we use to understand Earth’s climate. That model implies a climate sensitivity to a doubling of CO2 in the 2–4.5 °C range – at least, no one has been able to formulate a self-consistent model that implies a significantly lower value. Indeed, take the models out of the picture and the upper end of the range increases a whole lot more than the lower end decreases. Anthropogenic causation of the current warming epoch is a consequence – indeed, even a prediction – of the theory of Earth’s climate.

    The contrarians are free–indeed have been encouraged all along–to formulate an alternative climate model. Their failure to offer one tells us all we need to know about the scientific consensus. The dissenting proportion of scientists is negligible precisely because their output in this direction is negligible.

  204. Dikran Marsupial says:

    Joshua wrote “he seems to me to be extremely “challenged” w/r/t reflecting on his own rationality.” Well, most of us have some problem with that; it is human nature. Having said which, accusing someone of being a liar in one post and expecting them to be courteous and respectful in the next does seem somewhat problematical.

    This really illustrates the reason why there is little constructive dialogue between mainstream blog contributors and skeptic blog contributors. If you want to keep the discussion civil, then you need to respond to [perceived] offenses by being at most equally offensive, but preferably less so. ATTP did that (IMHO), but BS did not recognise it (perhaps because it is much easier to remember being called a liar than to remember you have recently called someone a liar). If it were human nature not to escalate, we wouldn’t need reminders to avoid doing so, e.g. “an eye for an eye”. Sic transit gloria blogosphere…

  205. Willard says:

    > Cook 2013 is ranked at the highest level of expertise.

    Please read back Table 1, Richie. Your wording may be misleading you and otters.

  206. Joshua says:

    Dikran –

    ==> well most of us have some problem with that, it is human nature.

    Sure. But I think it’s reasonable to speculate that for some people, relative to others, there’s a wider gap between the frequency of concluding that others are irrational and the frequency of acknowledging irrational behavior on their own part.

    Of course, such evaluations are always subject to observer bias to some degree, but w/r/t that gap I was describing there is also concrete evidence available.

  207. Willard says:

    > I can’t see any way that YOU (let alone anyone else) simply presenting information to HIM would “clear up” anything with him about a conclusion that he’s drawn (something which would be obvious if you’ve observed him interacting in blog comment exchanges basically ever).

    I can think of a counterexample where a ClimateBall exchange cleared up something about one of Brandon’s conclusions. It starts thus:

    I have a confession to make. I was wrong. Tamino was right.

    http://rankexploits.com/musings/2012/its-fancy-sort-of/

    This state persisted for at least two days, until Brandon went to see the Sun shining. And the Sun, in his old Apollonian ways, made him see the light:

    This makes one thing clear: I was right. From the very start, I was right. […] So again, I was right.

    http://rankexploits.com/musings/2012/its-fancy-sort-of/#comment-95549

    (B’s own emphasis.)

    As we can see, this ClimateBall episode cleared up that Brandon was right all along.

  208. Dikran Marsupial says:

    Joshua, indeed – the difficulty is in deciding how we should act on that evaluation to improve the dialogue.

  209. Joshua says:

    Dikran –

    I question whether it’s possible to dialogue* with people who aren’t engaging in good faith. By definition, anything you say will become anti-dialogue.

    *using the term as a verb, not as a noun.

  210. Vinny Burgoo says:

    Honi soit qui mal y blogs.

  211. Dikran Marsupial says:

    Dico, dico, dico, blog meum nasem non habet!

  212. Willard says:

    Honi soit qui mal y blogue, Vinny. The verb is “bloguer,” so it’s third person singular of the “-er” verb group –

    https://en.wiktionary.org/wiki/bloguer

  213. Dikran Marsupial says:

    Joshua, yes, that is why I rarely comment on skeptic blogs any more, although I do occasionally read them.

  214. markbofill says:

    Rather than 3 separate comments, here’s my summary:
    BBD,

    He’s doing propaganda, as are all contrarians.

    I don’t think this is so. Avoiding the stupid literalist interpretation and taking your meaning to be ‘most’ instead of all, I still question this. I will grant you that it must seem that way, and that there seem to be (in my experience) a lot of people out there in the discussion who are not speaking in good faith. I see this on both sides; this isn’t unique to contrarians. I ‘get’ the frustration. It frustrates the heck out of me anyways.

    Joshua & Dikran,

    +1, too much to quote but I agree with just about everything you’re saying about this.

    Anders,

    FWIW, I didn’t see what was rude about your initial remark. Kudos as always to people like you who try to engage, even though, as Joshua and Dikran observe, sometimes it’s pointless. Still, it’s good to see people on the other side of the debate try to engage as well. Thanks for making the effort.

    I’ve stalled off the inevitable long enough. Time to go cut the grass. :/

  215. John Hartz says:

    ATTP: Fodder for a future OP perhaps…

    “Every morning I awake torn between a desire to save the world and an inclination to savor it. This makes it hard to plan the day.”

    This thought, by author E.B. White, captures the tension that every advocate for action on climate change should feel. This is especially true for those of us who do research and who are most knowledgeable about the problem and the role our lifestyles play in creating it.

    Eco-authenticity: advocating for a low-carbon world while living a high-carbon lifestyle by Andrew J. Hoffman*, The Conversation US, Mar 31, 2016

    *Holcim (US) Professor at the Ross School of Business and Education Director at the Graham Sustainability Institute, University of Michigan

  216. BBD says:

    Fair enough, Mark – I wrote in haste. Let’s modify that to:

    Shollenberger is doing propaganda like most of the more vocal contrarians.

  217. opluso says:

    Joshua:
    The original Cook paper was intended to strengthen the argument that scientists and other experts agree human activities have resulted in warming. The second Cook paper was a response to Prof. Tol’s criticism of the first paper and added some analysis of expert rankings (and the chart). So to me they seem to be part of the same continuing process regarding the 97% meme.

    There are always issues with measuring opinions. Such surveys should provide margins of error around any results (typically based upon sample & population sizes) because they are necessarily approximations of the “true” numbers. The 97% meme is successful because it leaves out this uncertainty and fits the preferred narrative. It has been repeated over and over. When challenged on specifics the authors can (and do) say that 97% is “consistent with” other surveys.
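
    For reference, the usual textbook margin of error for an estimated proportion p̂ from a simple random sample of size n drawn from a population of size N (the finite-population correction being where the population size enters) is

    $$\mathrm{MoE} = z\,\sqrt{\frac{\hat{p}(1-\hat{p})}{n}\cdot\frac{N-n}{N-1}}$$

    with z ≈ 2.58 for a 99% interval.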

    Prof. Tol, among others, takes issue with specific aspects of the Cook papers’ research methodologies. Here’s a quote that explains their concern better than I could:

    There are four main aspects of the research methodology: design, sampling, data collection, the data analysis. If inappropriate methodology is used, or if appropriate methodology is used poorly, the results of a study could be misleading.

    https://www.gwu.edu/~litrev/a06.html

    Even in the absence of flawed methodology (or sloppy application), do I think the Cook papers are misleading? Yes, in the sense that pseudo-precision is misleading. Yes, in the sense that they are part of a communications strategy specifically intended to push particular policy goals.

    aTTP:

    Brandon S is currently unhappy with me

    Having read some of his other posts, he seems to be unhappy with everybody. 😉

  218. Willard says:

    > I don’t think this [Brandon doing propaganda, as all contrarians] is so.

    I agree. This also applies to C13 – BBD’s point is the mirror image of opluso’s regarding C13. ClimateBall players should get over the fact/value dichotomy and realize that ideology always seems to lie in the eyes of those you observe.

    Humans are political animals that sometimes do science. Sometimes they go cut the grass. Sometimes they play ClimateBall, rehearsing socialization processes that started in third grade.

    We’re all in it together. Since audits never end, it may take a while until we stop playing ClimateBall.

    We can still part, if you think it would be better. It’s not worthwhile now. Silence. Well, shall we go? Yes, let’s go. They do not move.

  219. opluso,

    Having read some of his other posts, he seems to be unhappy with everybody.

    Indeed 😉

    The problem with Richard’s point about methodology is that it isn’t really true. There are indeed scenarios where someone’s method is so wrong that it makes the result meaningless, but there are many cases where a method turns out to have flaws, yet the result still has some merit. Part of doing research is trying to gain understanding of something, and that requires developing methods for doing so. These are continually evolving. If we eliminated all research results that came from methods that turned out to have problems, we’d never learn anything. The corollary of Richard’s argument is also not true: a method that is applied perfectly doesn’t necessarily return a result that is “correct”. It also depends on your assumptions and other factors related to how the research was carried out.

  220. BBD says:

    willard

    ClimateBall players should get over the fact/value dichotomy and realize that ideology always seems to lie in the eyes of those you observe.

    I’m not really content with this. The issue (as I see it) is that there is a strong (near unanimous) scientific consensus and the contrarians contend that there isn’t, which ain’t true. So for me / others to argue that the contrarians are pushing counterfactual FUD (I called this propagandising earlier) and the consensus messaging crowd are essentially only telling the truth seems reasonable.

  221. Dikran Marsupial says:

    opluso wrote “There are always issues with measuring opinions. Such surveys should provide margins of error around any results (typically based upon sample & population sizes) because they are necessarily approximations of the “true” numbers. The 97% meme is successful because it leaves out this uncertainty”

    cf.

    This diagram provides just such a credible interval and shows that other studies provide different results. Note that the error bar on the Cook et al study is sufficiently small that to summarise it as 97% is not at all misleading (I suspect they may even be able to write 97.0%). This is making a mountain out of a molehill. The uncertainty in this study result is negligible in a “public communication of science” setting.

    BTW opluso, you repeatedly comment on other people’s motivations for doing things. You would be better off criticizing what they actually say, rather than your opinion of their motivations. The golden rule applies here: if you speculate uncharitably about the motivations of others, that implies you do not object to others speculating uncharitably about yours.

  222. John Hartz says:

    BBD:

    I’m not really content with this. The issue (as I see it) is that there is a strong (near unanimous) scientific consensus and the contrarians contend that there isn’t, which ain’t true. So for me / others to argue that the contrarians are pushing counterfactual FUD (I called this propagandising earlier) and the consensus messaging crowd are essentially only telling the truth seems reasonable.

    Well said!

  223. Willard says:

    > The issue (as I see it) is that there is a strong (near unanimous) scientific consensus and the contrarians contend that there isn’t, which ain’t true.

    Only some of the contrarians, BBD, for we already know that Richie contends that

    [T]he consensus is of course in the high nineties.

    https://andthentheresphysics.wordpress.com/2013/06/10/richard-tols-fourth-draft/#comment-822

    His “No one ever said it was not” begs the same kind of refutation as your own hyperbole. That JH rubberstamped your incorrect remark reinforces my point.

    ***

    > The original Cook paper was intended to strengthen the argument that scientists and other experts agree human activities have resulted in warming.

    From the horse’s mouth:

    An accurate perception of the degree of scientific consensus is an essential element to public support for climate policy (Ding et al 2011). Communicating the scientific consensus also increases people’s acceptance that climate change (CC) is happening (Lewandowsky et al 2012). Despite numerous indicators of a consensus, there is wide public perception that climate scientists disagree over the fundamental cause of global warming (GW; Leiserowitz et al 2012, Pew 2012).

    Click to access Cook_2013_consensus.pdf

    ***

    Putting these two remarks together, we get that the problem C13 addresses ain’t about the contrarians’ perception, but the public’s, and more precisely the American public. While contrarians may not feel as lonely as Barack suggests, they’re not the public either. The public entertains beer-mat beliefs, while contrarians are more pro-active in their ClimateBall entertainment. Disney may have been prophetic in saying that laughter was America’s most important export.

    For everything related to public perception, I’d defer to LarryH.

  224. Dikran Marsupial says:

    Brad Keyes puts the politeness question into perspective 😦

    It was questioned by BS though.

  225. BBD says:

    Willard

    Er, what hyperbole?

    Also remind me where I argued that C13 was for the benefit of contrarians? My understanding is that C13 got written because contrarians have been falsely claiming that there is a lack of consensus amongst scientists in order to confuse the public. So C13 was supposed to provide an improved public understanding of the fact that a very strong consensus does in fact exist.

  226. Willard says:

    The hyperbole is contained in what I quoted, BBD:

    [T]here is a strong (near unanimous) scientific consensus and the contrarians contend that there isn’t[.]

    I showed you an instance in which a contrarian did not contend that: Richie himself. Here’s the quote again, if only to please Very Tall:

    [T]he consensus is of course in the high nineties.

    https://andthentheresphysics.wordpress.com/2013/06/10/richard-tols-fourth-draft/#comment-822

    ***

    I suspect you’re conflating what contrarians do and the effect their doings produce. Contrarians mostly raise concerns. That raising these concerns may create FUD is only a by-product of the ClimateBall episodes where concerns are raised. Many contrarians wash their hands of that side-effect. They’d rather quote something about methodology instead, which enwraps their concerns in some kind of noble cause. For everything else, they let megaphones such as Willard Tony do their thing.

    And then there’s more ClimateBall.

  227. Willard says:

    > Also remind me where I argued that C13 was for the benefit of contrarians?

    The second part of my comment did not address something you raised, BBD, only the first. The third part being a synthesis of the first two parts, it mostly related to opluso’s latest peddling, which interestingly reiterates yet again what Joshua deplored earlier by hammering in what was contained in his previous comments: C13 political, C13 political, C13 political.

  228. opluso says:

    DM:

    Thanks for the advice, which I will work diligently to adopt as my own.

    In the meantime, could you provide specific details of the analysis behind the Bayesian 99% credible intervals? Although you have suggested sufficient information was provided in the paper to answer my questions, I honestly couldn’t find it beyond the reference to it being “largely a function of sample size.”

  229. John Hartz says:

    While we wax eloquent (and perhaps ad nauseam) about scientific consensus, millions of people throughout the world are experiencing the consequences of manmade climate change. For example…

    A heatwave across most of India has led to a hotter than usual summer in much of the country.

    Temperatures have risen above 40C in at least seven states, which is highly unusual, and local media reports say that more than 100 people have died of sun stroke.

    How Indians are coping with a dangerously hot summer, BBC News, Apr 15, 2016

  230. BBD says:

    Willard

    The hyperbole is contained in what I quoted, BBD:

    Oh, okay, I see.

    I suspect you’re conflating what contrarians do and the effect their doings produce.

    Yes, that’s probably true. I may also be conflating two different types of contrarianism as well. The blog-standard variety and the kind that wrote the API memo.

  231. John Mashey: VV: re AGU & EGU … mostly, I’m just trying to get a few people dubious of the consensus to actually attend a meeting and talk to real scientists, something that many apparently rarely, if ever do.

    Good luck. I get the impression that mitigation sceptics put a lot of energy, time and reputation in their effort to keep themselves in a state of misinformation. I would be very surprised if they would be willing to put energy into an effort that could challenge their preciously held misconceptions and get them into a contrary position to their friends, readers and the leaders they admire.

    I would focus the communication on people who have not made this stupidity their identity.

  232. Dikran Marsupial says:

    opluso, if you want to know those details and they are not in the paper, then send the corresponding author of the paper a polite enquiry.

    However, even then the fact that they publish a diagram showing that there is variation in the consensus between studies is an acknowledgment of the uncertainty. Indeed (as you can see from the diagram) the credible intervals for individual studies do not reflect the full structural uncertainties due to issues such as the exact question on which consensus is estimated. The diagram does a good job of showing both forms of uncertainty.
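
    For illustration only, here is one simple way such a credible interval could be obtained for a single study – assuming a flat Beta(1, 1) prior on the endorsement proportion and round counts of my own invention (roughly the size of Cook et al.’s position-taking sample); I’m not claiming this is the exact calculation behind the paper’s figure:

    ```python
    from scipy.stats import beta

    # Hypothetical counts, roughly Cook et al. (2013)-sized:
    # ~4000 abstracts taking a position, ~97% endorsing.
    endorse, not_endorse = 3880, 120

    # Posterior for the endorsement proportion under a flat Beta(1, 1)
    # prior is Beta(1 + endorse, 1 + not_endorse).
    a, b = 1 + endorse, 1 + not_endorse

    # Central 99% credible interval.
    lo, hi = beta.ppf([0.005, 0.995], a, b)
    print(f"99% credible interval: {100 * lo:.1f}% - {100 * hi:.1f}%")
    ```

    On counts of that size the interval spans less than a percentage point either side of 97%, which is why it is “largely a function of sample size” and why summarising it as 97% is not misleading.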

    The more important point however is that there is a gap in the public perception of the scientific consensus and the actual consensus, and it is large in comparison to the uncertainty in estimating the actual scientific consensus. This means that complaining about the use of a specific figure is detracting from the really important issue, which is that public opinion is affected by an incorrect view of scientific consensus. In other words, it is making a mountain out of a molehill.

    It isn’t even correct that it is only the 97% figure that is being promoted; see for instance this graphic produced by Skeptical Science.

    This was tweeted by John Cook here, and he used it in his article at the Conversation, so the idea that the 97% figure is being pushed while ignoring uncertainties seems to be taking a rather selective view of what is actually being said. It is also in John’s YouTube video, which likewise demonstrates that it is the “consensus gap” that is the issue.

  233. Dikran Marsupial says:

    opluso, John Cook said “we found that the expert scientific consensus on human caused global warming is between 90 to 100%”.

    How much clearer does he need to be?

    Note “consensus on human caused global warming” is not the same as the criterion “humans are contributing more than 50% of global warming, consistent with the 2007 IPCC statement that most of the global warming since the mid-20th century is very likely due to the observed increase in anthropogenic greenhouse gas concentrations”, which is why the uncertainty in the consensus on the former (vague) position is broader than that of the latter.

  234. Dikran Marsupial says:

    Re Brad Keyes, it turns out it was a typo (I gather there is something called “predictive text”, which sounds like a plausible explanation).

  235. Willard says:

    > I may also be conflating two different types of contrarianism as well. The blog-standard variety and the kind that wrote the API memo.

    Indeed, we need to distinguish practitioners of the auditing sciences from meme machines, i.e. think tanks. In fairness, we should bear in mind that the two categories are not exclusive: some think tanks do contain auditors. For instance, here’s one person who sits on the Academic Advisory Council of the GWPF:

    [Richie] is a climate change economist and a Professor at the Department of Economics at the University of Sussex. He is an editor of the journal Energy Economics.

    http://www.thegwpf.org/who-we-are/academic-advisory-council/

    As you can see, Richie’s playing both ClimateBall variants.

  236. Vinny Burgoo says:

    I tweeted
    You twittered
    She twat

  237. Vinny Burgoo says:

    Willard, in courtly circles in 14th century England ‘she blogs’ was ‘ele blogs’, not ‘ele blogue’. You are thinking of modern French.

  238. Willard says:

    Look at the coat of arms, Vinny:

    https://fr.wikipedia.org/wiki/Ordre_de_la_Jarreti%C3%A8re

    One does not simply close a posh emprunt with an English word.

  239. BBD says:

    @ Dikran

    I agree with Vinny, especially after reading BK’s response to Brandon S. Besides, that’s pure Brad.

  240. Dikran Marsupial says:

    “Willard, in courtly circles in 14th century England ‘she blogs’ was ‘ele blogs’”

    I somehow doubt that. Something along the lines of ‘ele florilegiated’ perhaps

  241. BBD says:

    Willard

    As you can see, Richie’s playing both ClimateBall variants.

    Well, RP Jnr did describe Richard as a climate polymath once. Before they fell out.

  242. Dikran Marsupial says:

    BBD, that also works for me; I’m keen on Hanlon’s razor (or at least more charitably worded versions), but it isn’t always easy to find the key (pun not intended).

    one of those “I late-cut, you nurdle, he edges” things.

  243. Willard says:

    Come to think of it, we ought to write “honi soit qui mal en blogue.”

  244. BBD says:

    My wife, from the wings:

    “What are you laughing at? I thought it was all supposed to be deadly serious?”

  245. John Hartz says:

    I greatly admire Bill McKibben’s ability to frame the enormity of the task confronting the human race with respect to mitigating and adapting to manmade climate change. Here’s a recent example…

    Climate change isn’t like other political issues, said Bill McKibben, an environmental activist who co-founded 350.org and led the opposition to the Keystone XL pipeline. With most debates between Democrats and Republicans, he said, “the best response is usually to figure out a compromise in the middle,” and then revisit the problem as needed. But with climate change, such a strategy could be catastrophic, he said. At current rates of fossil fuel consumption, the world has a few decades at most before it burns its way past 2 degrees Celsius.

    “Our problem here is that the two sides, fundamentally, are not industry and environmentalists, or Republicans and Democrats,” McKibben said. “At the very bottom, the two sides are physics and human beings. And that’s an extremely tough negotiation, because physics doesn’t care. We have no leverage on physics.”

    The new climate rallying cry: keep it in the ground by Sammy Roth, USA Today, Apr 13, 2016

  246. > Before they fell out.

    Citation needed.

  247. Vinny Burgoo says:

    OPatrick, what does a second-hand EV cost these days and what would its insurance, maintenance and 5k-miles-a-year ‘fuel’ costs be for five years, which is perhaps the longest time I can reasonably hope to get low-maintenance motoring out of my current petrol-fuelled old banger?

    And how much does pothole damage cost to repair for a newish, low-volume EV rather than, say, a still-very-popular petrol-fuelled car manufactured throughout the 1990s and 2000s? (Pothole damage is unavoidable around here unless you’ve got a robust gas-guzzling ‘Chelsea tractor’.)

    As for individuals buying expensive and imperfect EVs now in order to push down the cost and improve technology and infrastructure so that EVs will be a good choice for the masses a decade or two from now – fine. Your choice. (Whaddya want? A medal?)

  248. opluso says:

    Ed Maibach was added as a co-author of Cook 2016. Prof. Maibach’s “research currently focuses exclusively on how to mobilize populations to adopt behaviors and support public policies that reduce greenhouse gas emissions…” http://communication.gmu.edu/people/emaibach

    Maibach has studied the impact of precise, numeric descriptions on public beliefs. His findings are instructive and relevant, IMO, to the overall 97% debate. Long quotes are provided but you should, of course, read the entire paper before drawing your own conclusions.

    In this paper we report the results of two experiments that seek to answer the question: How can scientists and scientific organizations effectively communicate the level of scientific agreement about human-caused climate change? In the first experiment, we tested statements that express the level of agreement through verbal and numeric descriptions, varying the level of precision in the statements to assess how precision influences perceptions of scientific agreement and message credibility.

    To test H2 and RQ2—that numeric scientific agreement statements will result in higher estimates of scientific agreement than non-numeric statements, and whether there is a difference in estimation confidence between numeric and non-numeric statements—we predicted these same two dependent variables from numeric vs. non-numeric messages. Participants in the numeric statement conditions gave estimates of scientific agreement that were approximately 13 points higher than those of participants in the non-numeric message condition, b = 12.61, p < .001; they were also approximately 20 points more confident in their estimations, b = 19.46, p < .001. Therefore, results were consistent with H2.


    http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0120985

  249. BBD says:

    @ willard

    > Before they fell out.

    Citation needed.

    I thought there had been a spat, several years back (2010?) but I can’t find the evidence, so file as unsubstantiated waffle.

  250. BBD says:

    opluso

    Since the numerical and the actual are self-evidently in close agreement, what harm can it do to give the public a numerical description? We may quibble over a percentage point or two, but since a near-unanimous scientific consensus does in fact exist, how can the public be misinformed? The only potential for misinforming the public is the one illustrated by Maibach: the weakness of non-numeric statements as an effective communication of the actual strength of the scientific consensus.

  251. Joshua says:

    Mark Bofill –

    ==> I don’t think this is so. Avoiding the stupid literalist interpretation and taking your meaning to be ‘most’ instead of all, I still question this. I will grant you that it must seem that way, and that there seem to be (in my experience) a lot of people out there in the discussion who are not speaking in good faith

    I also don’t think that spreading propaganda is a likely explanation. Too nefarious to be plausible, IMO. I heard a joke on the radio today where a commentator said, in the context of a discussion about recycling, that it’s hard for Republicans to reach across the aisle when they’re afraid of what the Democrats have been touching (while recycling). I think that’s a reasonable metaphor for understanding the prevalence of poor faith exchange: It’s driven by group aggression and group defense, not a deliberate and knowing attempt to present false information. I have little doubt that most “skeptics” who engage with “realists” are fully convinced that they are just presenting the “true” state of the science to people who aren’t capable of accepting, or willing to accept, that “truth.” No different than “realists.” IMO, it’s less a conscious attempt to push an agenda than an un-self-aware process of confirmation bias and identity defense/aggression.

    Keep in mind that BBD tends not to look favorably on my suggestions of some measure of moral relativism.

  252. BBD says:

    Joshua

    Everybody should look at the peanut they are pushing and contemplate.

  253. Joshua says:

    Opluso –

    I’m going to skip over the preliminary stuff (some of which, again, seems not particularly related to the questions I asked you) and go to this part of your comment to me:

    ==> Even in the absence of flawed methodology (or sloppy application), do I think the Cook papers are misleading? Yes, in the sense that pseudo-precision is misleading. Yes, in the sense that they are part of a communications strategy specifically intended to push particular policy goals.

    And looking past your assertion of “pseudo-precision” for now (Dikran wrote a response that seems to point out a significant problem with it)….

    Let’s say you’re right, and that the work lacks sufficient treatment of uncertainty, and is presented in a way to suggest “pseudo-precision.” Consider the relationship between “pseudo-precision” and the attribute of being “misleading” in context.

    On the one hand (again, assuming you’re right for now), we have some papers that are overconfident in their assessment of the “precise” prevalence of shared opinion among experts about climate change, but which present an assessment that largely coincides with the assessment of prevalence made by their critics (how many times has Tol estimated the “consensus” to be very much in the range of Cook et al.’s assessments?). If there is disagreement about the magnitude of the assessment between Tol and Cook, it is by a relatively small amount when compared to the overall estimate.

    If people are “misled” by Cook et al., what is the effective impact? Perhaps some people might think that the prevalence is definitely 97% instead of almost certainly in the high 90s (paraphrasing Tol). Hmmm.

    But on the other hand, from the other side of the great climate divide, we have many people attacking the “consensus” studies as being unethically researched, with a clear and intended implication that that lack of ethics can be generalized to the entire field of climate science, to the point where the notion that we are at risk of dangerous climate change should we continue emitting aCO2 unabated is considered to be a “hoax.”

    IMO, actually, in terms of real impact, in the real world, even if I were to grant you that Cook et al. were “misleading” by virtue of employing “pseudo-precision,” it is significantly less “misleading” in impact than the “anti-consensus” campaign that is ubiquitous in the climate wars.

    Personally, I think all this “consensus” bickering is non-productive. I don’t agree with those “realists” who think that “consensus-messaging” will have some material impact on public opinion about climate change. IMO, the driving mechanisms behind the public’s view on climate change are far more complex. But it is interesting to note that many “skeptics” seem to agree with the “realists” about the magnitude of impact from consensus messaging. They’re agreeing with those folks that they think are dishonest and who regularly employ poor reasoning.

    But what seems clear to me is that the obsessive focus among some “skeptics” on “consensus-messaging” is just more evidence of pervasive identity orientation (as expressed by identity-defensive and identity-aggressive behaviors). An interesting extension of that is to see so many “skeptics” present (IMO fallacious) arguments about how considering the prevalence of shared opinion among experts is antithetical to valid scientific process, or make arguments that seem obviously non-reflective (since evaluating “consensus” is a heuristic that everyone uses, pretty much every day, because it is a fundamental component of how humans reason), even as they amusingly spend heaps of time in comment thread after comment thread arguing about minor differences in the overall assessment of the prevalence of shared opinion among climate science “experts”).

    All of that is only magnified if your focus of “misleading” is that chart.

  254. markbofill says:

    Joshua,

    It’s driven by group aggression and group defense, not a deliberate and knowing attempt to present false information. I have little doubt that most “skeptics” who engage with “realists” are fully convinced that they are just presenting the “true” state of the science to people who aren’t capable or willing to accept that “truth.”

    I think this is a shrewd observation that is probably correct. Not to say that there’s nobody out there knowingly and deliberately spreading propaganda that they don’t believe, on both sides; the world is a big place full of lots of people. But I think the majority of cases are as you describe.

    Sometimes people resort to dishonest tactics. I think it’s mostly bad defense mechanisms; they can’t figure out why they disagree with a clear point. They are unwilling to concede, thinking that they’re misunderstanding or that they’ve forgotten something pertinent. It takes time to think things through, and the pace of conversations sometimes doesn’t accommodate that leisure.

    Regarding moral relativism, I don’t think that’s what we’re talking about. We’re the same animal regardless of our abstract convictions or the moral validity of those convictions. IMO, all of us having our humanity in common, we have common mental blind spots. Many of us often make the same mistakes.

    I don’t know where any of this gets us. 🙂 It seems like it ought to get us someplace. Maybe at some point we’ll figure it out.

    Thanks Joshua, it’s always interesting talking with you.

  255. markbofill says:

    Uhm, I don’t mean to post a bunch of comments on this subject but on re-reading I worry that my remark might be unclear:

    Sometimes people resort to dishonest tactics. I think it’s mostly bad defense mechanisms; they can’t figure out why they disagree with a clear point. They are unwilling to concede, thinking that they’re misunderstanding or that they’ve forgotten something pertinent. It takes time to think things through, and the pace of conversations sometimes doesn’t accommodate that leisure.

    This isn’t intended to excuse or justify anything. I may be wrong, but I suspect there are readers here who want to figure out how to ‘reach’ people who disagree with them. IMO understanding why people do what they do without resorting to contempt is an important key to reaching people.
    Ok I’ll shush now. Thanks.

  256. > Maibach has studied the impact of precise, numeric descriptions on public beliefs.

    Here’s the abstract:

    Human-caused climate change is happening; nearly all climate scientists are convinced of this basic fact according to surveys of experts and reviews of the peer-reviewed literature. Yet, among the American public, there is widespread misunderstanding of this scientific consensus. In this paper, we report results from two experiments, conducted with national samples of American adults, that tested messages designed to convey the high level of agreement in the climate science community about human-caused climate change. The first experiment tested hypotheses about providing numeric versus non-numeric assertions concerning the level of scientific agreement. We found that numeric statements resulted in higher estimates of the scientific agreement. The second experiment tested the effect of eliciting respondents’ estimates of scientific agreement prior to presenting them with a statement about the level of scientific agreement. Participants who estimated the level of agreement prior to being shown the corrective statement gave higher estimates of the scientific consensus than respondents who were not asked to estimate in advance, indicating that incorporating an “estimation and reveal” technique into public communication about scientific consensus may be effective. The interaction of messages with political ideology was also tested, and demonstrated that messages were approximately equally effective among liberals and conservatives. Implications for theory and practice are discussed.

    http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0120985

    Sigh.

  257. @joshua
    From the very start, Cook 2013 was a PR stunt rather than serious research. It is small wonder that this debate is had in public.

    The problem with Cook 2013 is that anybody with a limited understanding of surveys or statistics would recognize this as a deeply flawed paper — yet one of the most prominent ones in climate research. If you want to argue that climate research is flawed, Cook 2013 is exhibit A.

  258. OPatrick says:

    Vinny, I don’t know how much second-hand EVs cost – look it up; I’m told you can get some good deals. I don’t know why you think the pothole issue is relevant for EVs as opposed to any other car. Fuel costs, with battery rental, are marginally higher than for petrol in an efficient car at present – roughly equivalent with petrol at £1.20 per litre. If you buy the battery, fuel costs will be cheaper, but there is uncertainty involved.

    Whaddya want? A medal?

    I want people to stop spreading the sort of memes you demonstrate. Electric cars aren’t particularly expensive anymore and they aren’t any more imperfect than any other car. They are a realistic choice for a large minority, possibly already a majority, of people.

  259. Richard,

    From the very start, Cook 2013 was a PR stunt rather than serious research. It is small wonder that this debate is had in public.

    A few points.

    1. That you would assert this with little, if any, evidence would seem to bring into question your overall objectivity.

    2. That you’ve waged a 3-year campaign against Cook et al., despite not disputing the results, would suggest that yours is little other than a PR stunt itself.

    3. Even if Cook et al. was a PR stunt, it is at least promoting something that most regard as true.

    4. If yours is a PR stunt, it is attempting to undermine something that is regarded by most as true.

    5. In your attempt to query Cook et al., you’ve managed to publish a paper which has a claim that is almost certainly not true, and that you must have known was not true before publishing this paper.

    Some serious questions. You maintain that you are simply defending research integrity by challenging what you call a deeply flawed paper. Why is it acceptable for you to do so by repeating things that are almost certainly not true? Does research integrity only apply to others? You criticise Cook et al. for supposedly being a PR stunt, and yet your own behaviour is hard to interpret as anything other than a propaganda war against a paper that has presented a result that most regard as essentially correct. Is it only others who should avoid promoting their views publicly?

    I don’t expect you to answer these, or to recognise the irony in what you say.

    The problem with Cook 2013 is that anybody with a limited understanding of surveys or statistics would recognize this as a deeply flawed paper — yet one of the most prominent ones in climate research.

    I think there are many who feel that you could replace Cook 2013 with Tol 2009. Fortunately for you, most others endeavour to conduct themselves in a manner that is both decent and reasonable. I’m fully aware that if anyone had conducted a campaign against your 2009 paper that was similar to the campaign you’ve waged against Cook 2013, you’d be complaining vigorously about being attacked by the consensus police.

  260. Anders,

    Brandon S is currently unhappy with me because the comment I made on his post (why, I hear you ask …

    … because … sea lions.)

  261. verytallguy says:

    From the very start, Cook 2013 was a PR stunt rather than serious research.

    From the person who has assiduously tried to undermine the results with an asinine set of comments, this is pure comedy gold. And speaking of serious, let’s remind ourselves of Gelman on Tol:

    I’m sure you can go the rest of your career in this manner, but please take a moment to reflect. You’re far from retirement. Do you really want to spend two more decades doing substandard work, just because you can?… …It’s not too late to get a bit more serious with the work itself.

    Healer, heal thyself.

  262. For the PR nature of Cook 2013, I defer to Ari Jokimaki.

  263. Richard,
    Ari Jokimaki is not an author of Cook et al. (2013), so it is puzzling why you would defer to Ari.

  264. Marco says:

    “you’d be complaining vigorously about being attacked by the consensus police.”

    Ask Frank Ackerman what “complaining” you could have expected in that case. An e-mail to your employer is the absolute minimum you could expect, and that e-mail may contain legal threats.

    People here may enjoy the irony that the Tol–Ackerman controversy hinges on a claim by Ackerman that Tol says Ackerman knew was wrong. Compare that to Tol’s inclusion, in his comment on Cook et al., of a claim that he knew was wrong.

  265. Marco,
    Indeed. Given that I’m an author of the new consensus paper, I’ve been pondering how I might respond if Richard chooses to wage the same kind of campaign against it that he has against Cook et al. (2013). One might imagine that Richard would aim to only say things that are either true, or suitably qualified. Such an assumption might, however, be wrong.

  266. Joshua says:

    Mark Bofill –

    Consider the following, completely hypothetical, scenario.

    Imagine running across a person, I’ll call her Brenda Shultz, who is quite intelligent in certain domains and who enjoys applying her reasoning in those domains to figure out challenges and exercise her intellect. Imagine that she is a little quirky, perhaps even somewhere on “the spectrum,” and part of that quirkiness manifests as an intense attachment to the “correctness” of her reasoning. She is extremely confident in her reasoning skills (not entirely without merit) and this leads her to frame points of disagreement as situations where the other person “doesn’t make sense” or is a “liar” or is “insane” or makes “insane arguments.” The quirkiness, because she is rigid in her own perspective formation (she tends towards a binary view of inherently complex issues), seems to create something of a confusion for her as to what is a matter of perspective/opinion and what is objective understanding. When people present differing perspectives on issues to her, she tends to focus on and isolate pedantic and picayune points of disagreement instead of clarifying her understanding of the more important elements of what other people meant (in a way it makes sense, because she’s so confident of her reasoning skills that it doesn’t seem possible to her that someone could mean something other than what she interpreted). She tends to be prescriptivist in terms of what words mean (they can only mean what she interprets them to mean) and so trying to clarify a misunderstanding on her part is impossible, and if anyone interprets her words to mean something other than what she intended, it isn’t because of the inherent ambiguities in communicating, but because her interlocutor is (or must be) “insane” or “lying.”

    She tends to have a particular political ideological orientation, and while she isn’t uniformly enslaved by that orientation, because she has that particular ideological orientation others might see her as a propagandist when she relentlessly argues that her perspective on a politicized issue is the only valid perspective, and spends a great deal of time presenting that perspective in various fora while calling people who have different perspectives liars, insane, etc.

    Now in some ways, Brenda reminds me of many people engaged in the climate wars…but perhaps some more than others.

  267. Now in some ways, Brenda reminds me of many people engaged in the climate wars…but perhaps some more than others.

    Hmmm, let me think????

  268. Joshua says:

    Richard –

    ==> From the very start, Cook 2013 was a PR stunt rather than serious research. It is small wonder that this debate is had in public.

    Your use of “PR stunt” is ambiguous, IMO. YOU seem to be very focused on the societal impact of your research and on influencing the societal impact of others’ research. Does that mean that your work is a “PR stunt?” What is the difference between research that is intended to inform public opinion and a “PR stunt?”

    Clearly, Cook et al. are interested in the impact of their research on public opinion, but you seem to be trying to establish some condition of mutual exclusivity between a focus on the impact of research on public opinion and “serious research.” What would lead you to emphasize such a simplistic, some might even say propagandistic, false distinction?

    ==> The problem with Cook 2013 is that anybody with a limited understanding of surveys or statistics would recognize this as a deeply flawed paper

    Well, you have established credentials in understanding statistics (I don’t particularly know about your credentials w/r/t conducting surveys), but I have seen others who have similar credentials and who disagree with you about statistical arguments on this particular topic, and perhaps more importantly, on other topics as well (e.g., Gelman).

    Further, I have seen you put forth some (what appear to me to be) god-awful arguments in these pages – arguments that contain logical gaps that you could push a small planet (or large meteorite) through. As such, I’m not particularly inclined to accept your embedded appeal to self-authority/appeal to incredulity at face value. But what makes that even more interesting is that the obsessive bickering about the “consensus” issue takes place when it seems to me to be plainly obvious that there is a strong prevalence of shared opinion among climate experts about the risk of BAU w/r/t aCO2 emissions, and that the existence of that prevalence is of some value (but isn’t dispositive) even as people spend gobs of time promoting fallacious (IMO) arguments to underestimate or overestimate its value, respectively.

    I won’t say that you can’t make this up because you could make this up if you wanted to create a textbook case for establishing how identity-orientation influences reasoning.

    ==> — yet one of the most prominent ones in climate research. If you want to argue that climate research is flawed, Cook 2013 is exhibit A.

    And here is where we find the meat of your argument. Yes, if you want to argue that climate research is flawed then Cook et al. is exhibit A – because it has become a badge of identity within a polarized context. What was very nice about that little gem of your statement there, is the implication about what happens if one’s goal is to argue that climate research is flawed. So then the question I find interesting is why do those “skeptics” who obsessively focus on Cook et al. want to argue that “climate research” is flawed?

    IMO, Cook et al. is not “climate research.” That anyone would want to apply guilt by association from non-climate research to the field of climate research just goes back, IMO, to the picture of bickering about Cook et al and climate research as a junior high school lunchroom food fight, where people are locked into an identity-orientation battle.

    The arguments about Cook et al, and in particular the component of those arguments that you’re engaged in, look to me to be predominantly about identity/personality politics. It’s about leveraging science and analysis in order to fight about identity and personalities.

  269. @wotts
    Indeed. I assume Ari thought it improper to be a rater and an author.

  270. opluso says:

    Joshua:

    I appreciate your thoughtful comments. But we do not seem to be on the same wavelength regarding “misleading” or the problem I have with pseudo-precision. Perhaps this is due to shifting contexts.

    Scientists and academicians typically hedge their conclusions with phrases like “not inconsistent with” etc. Yet it is widely believed by policy wonks (and has been empirically tested by Maibach and others) that precise numerical claims are more effective in moving the policy debate than non-numerical or less precise statements. The 97% meme is an excellent example of this effect.

    The Cook 2013 et seq. papers exist in two contexts. One is the academic literature context and this was largely the arena in which Tol and others criticised the original methodology (though I see from his recent comments that he has a broader critique now). The other context is political. I happen to believe the latter is the more important context in this situation and also happen to believe that is the primary context in which Cook, et al. sought to have impact.

    You asked:

    If people are “misled” by Cook et al., what is the effective impact?

    The impact is to push particular domestic and international policies.

    You also stated:

    IMO, actually, in terms of real impact, in the real world, even if I were to grant you that Cook et al. were “misleading” by virtue of employing “pseudo-precision,” it is significantly less “misleading” in impact than the “anti-consensus” campaign that is ubiquitous in the climate wars.

    As I said before, it’s political turtles all the way down. https://en.wikipedia.org/wiki/Turtles_all_the_way_down

  271. Joshua says:

    Opluso –

    ==> The impact is to push particular domestic and international policies.

    That doesn’t fit with how I understand impact. You might feel that is the intent, or an accurate description of the effective argument, but equating intent or effective argument with impact doesn’t add up, IMO.

    As far as I can see, the impact, at least in the sense that the term impact is generally used, would at most be as I described above: some policy developers and/or members of the general public might confuse certainty of a precise figure with an extreme likelihood of a very similar figure. To that I say, “meh”; there must be some reason beneath the surface that people spend so much of their time and energy on this issue.

    That seems like a rather obvious point to me… as is the point that when I asked you a yes/no question, responding with an “if…” clause isn’t really responsive.

    As for turtles, I will respond on that point even though, IMO, the way in which you’re focusing on it is mostly tangential to the point I’m trying to discuss. The climate wars are an inherently political context. IMO, it is very difficult to disaggregate politics and science in many contexts. Science, being performed by humans, is inherently political at least to some degree. Pointing that out, IMO, tends to be rather banal… but it gets problematic when people use selective reasoning about the linkage between politics and science to support an agenda (in the service of an identity struggle).

  272. Joshua says:

    Opluso –

    And btw…

    ==> …that precise numerical claims are more effective in moving the policy debate than non-numerical or less precise statements.

    I have not been convinced that “97% consensus-messaging” has a significant impact in the real world. I think that there are legitimate methodological questions about the research that suggests such an impact, but the biggest issue I have is that, as near as I can tell, there is a big gap between researching that question under experimental conditions and proving the impact in the real world. In the real world, “consensus-messaging” is inherently linked to identity/ideological orientation. While, in general, “consensus-messaging” works (look at the ubiquity of “reviews” in on-line marketing), applying that general principle to this particular context seems to me to be highly problematic. Who is hearing the messaging, from whom they are hearing it, and in what context they are hearing it are very difficult parameters to control.

  273. Richard,
    I’ve no idea what Ari thought or why his views have any relevance whatsoever. He is not an author of the paper!

  274. opluso,
    You’re going on about precision. One way to determine the 99% confidence interval for such a survey is to use

    2.58 \sqrt{\dfrac{p(1 - p)}{n}},

    where p is the proportion that endorses the consensus and n is the sample size. For the abstracts, p = 0.97 with n = 4014; for the self-rated papers, p = 0.97 with n = 1381. This gives 99% confidence intervals of ±0.7% and ±1.2% respectively.
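
    For anyone wanting to check those numbers, here is a minimal sketch in Python (mine, not from any of the papers; it simply evaluates the formula above, and the function name and labels are illustrative):

    ```python
    import math

    def ci_halfwidth(p, n, z=2.58):
        # Half-width of the normal-approximation confidence interval
        # for a proportion p estimated from a sample of size n.
        # z = 2.58 corresponds to a 99% confidence level.
        return z * math.sqrt(p * (1 - p) / n)

    for label, n in [("abstract ratings", 4014), ("self-ratings", 1381)]:
        hw = ci_halfwidth(0.97, n)
        print(f"{label}: 97% ± {100 * hw:.1f}% (99% CI)")
    ```

    Running this prints ±0.7% for the abstract ratings and ±1.2% for the self-ratings, matching the figures quoted above.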

  275. snarkrates says:

    As I tried to point out above, none of these papers measure scientific consensus–which has nothing to do with who believes what. It has, rather, to do with which theories scientists use in their research. Thus, a scientist who dismisses anthropogenic causation in op eds implicitly supports the consensus on anthropogenic warming every time they use the standard theories of Earth’s climate. Anthropogenic warming is an inevitable consequence of those theories unless you want to believe some really strange stuff is true. This means that Dick Lindzen, Judy Curry, et al. implicitly support the consensus, because they do not see the current models as problematic enough to require something better. The real consensus is nearly 100% give or take a few cranks with measure zero on the probability space.

  276. Joshua says:

    Opluso –

    One more comment and then I’m out for a while.

    It occurred to me that I may have missed part of your point. Perhaps “definitely 97%” messaging is inherently more effective than “extremely likely in the high 90s” messaging because a claim of precision is, in itself, more effective than acknowledging some degree of uncertainty.

    I suppose that’s quite possible, but I would think that for some people (at least I know it’s the case for me), a claim of “precisely 97%” is more implausible than a claim of “extremely likely in the high 90s.” I would be open to evidence that, on balance, the former is significantly more effective than the latter in general circumstances, but such results would likely be context-specific to at least some degree and, I would guess, given heuristics such as the one I use to evaluate claims of precision, the net effect would be rather small in almost any context.

    The bottom line here, IMO, is what is the actual impact as measured in people being “misled” by a pseudo-precise 97% figure (accepting, for the sake of argument, your claim that it is pseudo) – and in particular in comparison to Richard’s opinion that the consensus is certainly in the high nineties (I believe he once described it as almost unanimous?).

    I think that, considering the differential impact (the number of people who would be meaningfully less misinformed if the phenomenon in question were not there), “anti-consensus” messaging is, in the real world, “misleading” to a greater extent than Cook et al.

  277. BBD says:

    opluso

    Thanks for completely blanking my earlier comment.

    Let me repeat it for you:

    Since the numerical and the actual are self-evidently in close agreement, what harm can it do to give the public a numerical description? We may quibble over a percentage point or two, but since a near-unanimous scientific consensus does in fact exist, how can the public be misinformed? The only potential for misinforming the public is illustrated by Maibach: the weakness of non-numeric statements as an effective communication of the actual strength of the scientific consensus.

  278. @wotts
    Ari was the first to note that Cook 2013 was putting the PR before the science.

  279. Willard says:

    The problem with Richie is that anybody with a limited understanding of ClimateBall would recognize his contributions as a PR stunt. If you want to argue that contrarians are mostly preoccupied with PR stunts, Richie’s your man.

    Warning: may contain Gremlins.

  280. John Hartz says:

    Thomas Levenson, a professor of science writing at MIT, has penned a powerful Op-ed directed at the two leading Republican candidates for US President. The admonition contained in his concluding paragraph (below) should remind all of us that incessant and excessively repetitive arguing over scientific consensus is merely a sideshow to the main event.

    Much of contemporary science has accumulated into a deep understanding of the natural world that is inconvenient for the leading Republican candidates for president. Willed ignorance is a disaster for climate policy in particular. It is worse as an approach to science in the public sphere. For centuries, human curiosity led us to the point where we know so much; it would be good — more, it may well be a matter of survival — to put all that knowledge to use. [My bold.]

    Doubting climate change is not enough, Op-ed by Thomas Levenson, Ideas, Boston Globe, Apr 17, 2016

  281. Richard,
    So what? Ari is not an author. Therefore his views about the paper have as much relevance as anyone else who isn’t an author.

    Let’s also explain something about “science”. The goal of “science” (by which I mean research in general) is to try to develop understanding. Not even you dispute that the consensus is probably in the high nineties. However, you continue to publish papers criticising a study that illustrates that the consensus is in the high nineties. Given that your papers almost certainly do not improve our understanding of the level of consensus, the only reason one would not regard your work as putting PR before the science is that – arguably – it’s not science.

  282. Willard says:

    > Perhaps “definitely 97%” messaging is inherently more effective than “extremely likely in the high 90s” messaging because a claim of precision is, in itself, more effective than acknowledging some degree of uncertainty.

    That’s not what Maibach & al found:

    [T]he results revealed that, numeric descriptions are more effective than non-numeric or verbal descriptions. Number precision may also matter, but the operationalization of precision in the present study was conflated with numeric magnitude and therefore might have been conflated with message strength, such that the more precise, numerical messages may have communicated consensus information in a stronger manner. Furthermore, the more precise numeric messages may have been more attention grabbing in the context of the newspaper advertisement used in our stimulus. Further testing is needed to assess whether the effect we observe is driven by message precision or by message strength.

    http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0120985

    No wonder opluso fails to provide precise quotes.

    Other researchers have studied the impact of raising concerns in comment threads.

  283. Willard says:

    > I’ve no idea what Ari thought or why his views have any relevance whatsoever.

    Richie’s baiting you into stolen content, AT.

    That indicates how Richie’s into Sound Science ™ and Sound Science ™ only.

  284. Willard,
    Yes, that’s what I expected. If Richard wants to use stolen content to back up his assertion, then he’s welcome to do so. His argument against Cook et al. would then include a paper with a claim that is almost certainly not true and which he almost certainly knew was not true prior to publication, and evidence from content that was stolen. That’s a surprising way to promote research integrity, but each to their own, I guess.

  285. John Hartz says:

    Willard:

    Richie’s baiting you into stolen content, AT.

    You beat me to pointing this out. ESP?

  286. Willard says:

    > One is the academic literature context and this was largely the arena in which [Richie] and others criticised the original methodology […]

    Actually, Richie started over Twitter. One random criticism from that arena:

    A blast from the past:

    The word “distraction” might very well represent what’s happening at Lucia’s, at least since the beginning of May. Here are the relevant titles in http://rankexploits.com/musings/2013/05

    Dear John. I have questions.
    2 May, 2013 (21:14) | Data Comparisons

    The papers: I think I have all titles.
    3 May, 2013 (11:35) | politics

    Links to John Cook’s Survey.
    4 May, 2013 (12:12) | politics

    A Random Failure
    5 May, 2013 (12:51) | Data Comparisons

    I Tried
    6 May, 2013 (04:43) | Data Comparisons

    Cookies Cookies (& Bleg.)
    7 May, 2013 (08:31) | politics

    Survey Privacy Plugin
    9 May, 2013 (15:08) | politics

    Happy Hour: Time to Play!
    10 May, 2013 (16:22) | Data Comparisons, politics

    SkS Survey Over Haiku
    13 May, 2013 (19:13) | Haiku, politics

    U of Queensland Application for Ethical Clearance.
    14 May, 2013 (10:09) | politics

    I Do Not Think it Means What You Think it Means
    15 May, 2013 (12:26) | Data Comparisons

    On the Consensus
    17 May, 2013 (00:49) | Data Comparisons

    Nir Shaviv: One of the 97%
    17 May, 2013 (15:33) | Data Comparisons

    Better way to remove the effect of “non attribution papers”.
    18 May, 2013 (09:20) | Data Comparisons

    Why Symmetry is Bad
    19 May, 2013 (19:21) | Data Comparisons

    Possible Self-Selection Bias in Cook: Author responses.
    20 May, 2013 (12:02) | politics

    Bias Author Survey: Pro AGW
    21 May, 2013 (11:43) | Data Comparisons

    Climate Science P0rn
    23 May, 2013 (13:31) | Data Comparisons

    Tol-Nuccitelli Twitter War: The beginning
    23 May, 2013 (16:23) | politics

    The “D” word: Alternative definitions.
    25 May, 2013 (11:40) | Data Comparisons

    https://ourchangingclimate.wordpress.com/2013/05/17/consensus-behind-the-numbers/

    But yes, the “academic literature context” arena, of course.

  287. John Hartz says:

    ATTP & Willard:

    Has Richard Tol posted anything on this comment thread that he hasn’t said before?

  288. opluso says:

    The lack of comment nesting on this site can make following an individual thread a bit difficult. I’m probably missing others, as well. C’est la guerre.

    DM:

    It isn’t even correct that it is only the 97% figure that is being promoted

    Of course, I never said it was.

    Regardless, I don’t think you could disagree that the 97% meme has, in fact, been promoted. It is promoted for a reason. That reason is not because of a mistaken assumption of superior accuracy. It is because of the accurate perception of superior efficacy.

    BBD:

    Since the numerical and the actual are self-evidently in close agreement, what harm can it do to give the public a numerical description?

    “Harm” may be the wrong word, but see above response to DM.

    Willard:

    No wonder opluso fails to provide precise quotes.

    I did provide precise quotes regarding the precise portion of the paper precisely relevant to my point (see: https://andthentheresphysics.wordpress.com/2016/04/13/consensus-on-consensus/#comment-76413). I further encouraged others to read the full paper lest I be accused (know your audience) of selective bias. Alas, to no avail.

    Furthermore, although you were responding to a quote of Joshua (not me), the language you highlighted from Maibach shows that both aspects (precision and magnitude) support my position on the 97% meme.

  289. Willard says:

    Another blast from the past. It comes from BartV’s ur-thread. It’s the third of my Frequently Expressed Concerns (FEC).

    ***

    Third Concern

    Isn’t the GOAL of (Cook et al) political?

    Answer to the Third Concern

    The short answer is: no and yes.

    ***

    The study is about the degree of endorsement of AGW in the scientific community. To endorse AGW is a scientific stance. Here’s how the authors state the background of their research:

    An accurate perception of the degree of scientific consensus is an essential element to public support for climate policy (Ding et al 2011). Communicating the scientific consensus also increases people’s acceptance that climate change (CC) is happening (Lewandowsky et al 2012). Despite numerous indicators of a consensus, there is wide public perception that climate scientists disagree over the fundamental cause of global warming (GW; Leiserowitz et al 2012, Pew 2012).

    From this statement of motivation, we can conclude that the authors seek to improve the knowledge we have of the degree of scientific consensus. In the Conclusion, the authors state what they directly seek to refute:

    The narrative presented by some dissenters is that the scientific consensus is ‘…on the point of collapse’ (Oddie 2012) while ‘…the number of scientific “heretics” is growing with each passing year’ (Allègre et al 2012). A systematic, comprehensive review of the literature provides quantitative evidence countering this assertion.

    The emphasized sentence shows that the authors understate their work: not only do they provide a review of the literature that seeks some kind of systematicity, they also validated this review with the results of a survey where authors self-classified their own work. This part of the study is not something to be dismissed lightly, more so if we are sensible to raising concerns.

    This was the “no” part.

    ***

    The authors’ GOAL was to refute a meme. The authors do seem to imply that satisfying this goal should eventually improve the perception of the degree of scientific consensus. But the communication task is not accomplished by the paper itself.

    To promote a consensus can be political:

    Consensus decision-making is a group decision making process that seeks the consent of all participants. Consensus may be defined professionally as an acceptable resolution, one that can be supported, even if not the “favourite” of each individual. […] Consensus decision-making is thus concerned with the process of deliberating and finalizing a decision, and the social and political effects of using this process.

    http://en.wikipedia.org/wiki/Consensus

    On the other hand, the meaning of consensus studied by Cook & al could satisfy a more mundane definition:

    A process of decision-making that seeks widespread agreement among group members.

    General agreement among the members of a given group or community, each of which exercises some discretion in decision-making and follow-up action.

    http://en.wiktionary.org/wiki/consensus

    The concept of consensus can thus be more epistemic than political. Science is still something different from politics, and were the concept of consensus in science the same as in politics, we’d be having a more consensual decision process right now.

    On the other hand, the study serves as a backbone for this other project:

    http://theconsensusproject.com

    So we conclude that this project has a political aim.

    ***

    So the debate is over: AGW is happening.

    We certainly should not discount all the concerns that might be raised.

    Which is very good, as otherwise we would not be here.

    And we all want [Sound] Science ™.

    So we thank everyone for their concerns.

    ===

    Source: https://ourchangingclimate.wordpress.com/2013/05/17/consensus-behind-the-numbers/#comment-18862

  290. Willard says:

    > I did provide precise quotes regarding the precise portion of the paper precisely relevant to my point.

    The abstract contradicts the claim that Maibach & al “studied the impact of precise, numeric descriptions on public beliefs.” They compared numeric with non-numeric descriptions.

    This is confirmed by a quote where the authors say that [T]he results revealed that, numeric descriptions are more effective than non-numeric or verbal descriptions.

    The same quote also shows that the authors acknowledge that the operationalization of precision in the present study was conflated with numeric magnitude.

    Let’s show the relevant figure:

    If opluso’s point is that numbers are more convincing than verbal descriptions, then he can use Maibach & al.

    If opluso’s point is that more precise numbers have a more substantial impact than less precise numbers, then he needs to find another study.

  291. John Hartz says:

    Willard:

    Thank you for delving into the definition of the word, “consensus.” Like “beauty”, it means different things to different people.

    Having said that, here’s one variation of “consensus” that I was exposed to during my professional career in transportation. It had to do with achieving agreement to proceed with a controversial large project. The underlying assumption was that each stakeholder had the ability to torpedo the project. Therefore, the goal was to either convince each stakeholder to endorse the project or convince them not to launch their respective torpedoes to sink it. In this context, “consensus” can be achieved with less than 100% agreement by all stakeholders.

    Given the overwhelming body of scientific evidence supporting the reality of manmade climate change, I doubt that any resident of Deniersville has a torpedo that can sink the scientific consensus.

  292. BBD says:

    opluso

    “Harm” may be the wrong word, but see above response to DM.

    That’s a non-response.

  293. opluso says:

    Willard:

    They compared numeric with non-numeric descriptions.

    They also compared 97% to 97.5% and found that 97.5 (being even more “precise”) was slightly more effective. You show the relevant bar graph and yet continue to accuse me of misrepresenting/misunderstanding Maibach et al.

    Because of the structure of the study they were largely limited to drawing general conclusions. I still feel it was a reasonable (and relevant) example supporting my original point, your parsings to the contrary notwithstanding.

  294. John Hartz says:

    opluso:

    Which “original point” are you referring to in your most recent comment?

  295. Joshua says:

    Willard –

    Thanks for that information.

  296. snarkrates says:

    I’m wondering whether this little microtempest reminds anyone else of the reaction to another paper: Mann et al. 1998? As with that paper, the aim seems to be more to tarnish the reputation of the authors than to correct any substantive errors.

  297. > They also compared 97% to 97.5% and found that 97.5 (being even more “precise”) was slightly more effective.

    From 77% to 78%. Just imagine if C13 claimed 98% because that’s what they found. You just can’t make this up.

    Let’s repeat what the authors say they found. From the abstract:

    We found that numeric statements resulted in higher estimates of the scientific agreement.

    No mention of precision. From their discussion:

    [T]he results revealed that, numeric descriptions are more effective than non-numeric or verbal descriptions.

    Again, no mention of precision. The graph clearly shows what the authors say they found and opluso’s “slightly more effective” carries little weight. The claim that “Maibach has studied the impact of precise, numeric descriptions on public beliefs” misrepresents what Maibach & al did if we insist, like opluso does, on precision.

    Notice, BTW, that “definitely 97%” or “extremely likely in the high 90s” suggested earlier by opluso have not been tested by Maibach & al.

  298. On a more general note, opluso’s “questions/criticisms” might very well go against the whole idea of using science to improve scientific communication:

    Scientists and science communicators have appropriately turned to the science of science communication for guidance in overcoming public conflict over climate change. The value of the knowledge that this science can impart, however, depends on its being used scientifically. It is a mistake to believe that either social scientists or science communicators can intuit effective communication strategies by simply consulting compendiums of psychological mechanisms. Social scientists have used empirical methods to identify which of the myriad mechanisms that could plausibly be responsible for public conflict over climate change actually are. Science communicators should now use valid empirical methods to identify which plausible real-world strategies for counteracting those mechanisms actually work. Collaboration between social scientists and communicators on evidence-based field experiments is the best means of using and expanding our knowledge of how to communicate climate science.

    http://www.culturalcognition.net/browse-papers/making-climate-science-communication-evidence-basedall-the-w.html

    Why not lukewarmingly applaud what Dan suggests?

  299. Reich says:

    Tol: “From the very start, Cook 2013 was a PR stunt rather than serious research. It is small wonder that this debate is had in public.”

    Yes, designed for PR. Cook et al: “An accurate perception of the degree of scientific consensus is an essential element to public support for climate policy”.

    However, you still haven’t shown that it isn’t serious research at the same time. You still haven’t shown their numbers are invalid. You even say you agree with the estimate. Anything else is just blathering.

  300. John Hartz says:

    snarkrates: Now that the “pause” meme has crashed and burned, the climate science denier drones need something else to swarm to. 🙂

  301. John Hartz says:

    Here’s an interesting assessment of the value of Cook 2013:

    “It’s important to have [it] on the record,” said Will Cantrell, professor of physics at Michigan Technological University, who was not part of the study. “I don’t think any one study is going to change a lot of people’s minds, but it’s better to have the information than to not have it,” he told ThinkProgress.

    Scientists Just Confirmed The Scientific Consensus On Climate Change by Alejandro Davila Fragoso, Climate Progress, Apr 13, 2016

  302. Vinny Burgoo says:

    National Badger Week doesn’t start until 25th June.

  303. Ken Fabian says:

    The consensus study is political in the sense that fact checking can be political; take away the campaigning to undermine public trust in climate science and obstruct government policy based upon it – campaigning based on misinformation, in which Richard Tol is an active participant – and there would be no incentive for Cook and others to do it. Given the ongoing success of such organised campaigning, some well done fact checking is very welcome.

    As far as I am aware, government-commissioned studies and reports from relevant agencies, institutes and organisations would meet or exceed such a ‘high-90s percentage’ bar for strong scientific agreement on the fundamentals; the obstructors ignore such reports and advice… not so much at their own peril, but at the peril of dangerous and damaging consequences affecting the entire global population, the remnant natural ecosystems and a productive global economy. And people like Richard Tol, who readily pass over the extraordinary and blatant use of misinformation and far more dubious ‘research’ than this by fellow climate action obstructors, blithely carry on as if this is a simple academic disagreement.

  304. John Hartz says:

    About an hour ago, John Cook posted the following on the Skeptical Science Facebook page…

    Nice outcome, our Reddit AMA on the scientific consensus on human-caused global warming makes it to the top of the Reddit Science page! https://www.reddit.com/r/science/

  305. John Mashey says:

    VV:
    “Good luck. I get the impression that mitigation sceptics put a lot of energy, time and reputation in their effort to keep themselves in a state of misinformation.”
    Yes, that’s why I said a few people.
    See “Pseudoskeptics exposed in the SalbyStorm”:
    “By contrast, of the 400+ dismissive commenters (who reject mainstream consensus), about 40% explicitly supported Salby’s erroneous CO2 ideas, seemingly desperate to believe the current rise in CO2 was natural. That idea was rejected by a mere handful, of whom one apologized and said he expected to be downvoted for doing so, and indeed he was.

    Dismissives reacted to Salby’s Macquarie story in varying ways:
    ~5% were consistently cautious or dubious from the start, commendably able to think skeptically on Salby’s story, if not on climate science or his CO2 ideas.
    ~5% accepted Salby’s story at first, but were able to change their minds, at least to being cautious about Salby’s story.
    ~10% said nothing about Salby’s story.”

    Since this was a sample from intensely dismissive blogs, it tells us little about the large population, but it does say that there was about 5-10% of the dismissive posters able to demonstrate some (classical) skepticism, rather than intense pseudoskepticism.

  306. Dikran Marsupial says:

    I wrote “It isn’t even correct that it is only the 97% figure that is being promoted”

    opluso replied Of course, I never said it was.

    Not explicitly, perhaps, but you repeatedly complained about the repetition of the 97% meme and questioned its accuracy.

    There are always issues with measuring opinions. Such surveys should provide margins of error around any results (typically based upon sample & population sizes) because they are necessarily approximations of the “true” numbers. The 97% meme is successful because it leaves out this uncertainty and fits the preferred narrative. It has been repeated over and over. When challenged on specifics the authors can (and do) say that 97% is “consistent with” other surveys.

    From the beginning, Cook (2013) was treated as a political document with the intent of being used to promote a 97% meme. Not “over 90%” but exactly 97%.

    The Abstract for Cook (2016) mentions 97% three times while asserting that it is “robust” and “consistent” with other studies. Repeating the 97% meme is part of the climate communications strategy for creating a political consensus.

    opluso has now shown me beyond doubt that he/she is not interested in substantive discussion and is just engaging in rhetoric. I have better uses of my time than to indulge this. The first indication was when opluso incorrectly accused me of name calling, and didn’t admit error when this was pointed out.

  307. Dikran Marsupial says:

    Richard wrote “If you want to argue that climate research is flawed, Cook 2013 is exhibit A.”

    If you want to look at a flaw in a scientific paper, you still haven’t responded to my post where I point out a clear factual error in your paper. When you write

    “Cook et al (2013) state that 12 465 abstracts were downloaded from the Web of Science, yet their supporting data show that there were 12 876 abstracts.”

    That is factually incorrect, as the highest abstractID being 12876 does not mean that there were more than 12465 unique abstracts downloaded. That is a factual error in your paper, Richard. You ought to submit a corrigendum.

  308. @john h
    There has indeed been a repetition of moves. Cook 2016 does provide
    some non-news: ducking three critiques on Cook 2013
    some surprises: Cook admitting to two well-known faults in Cook 2013
    a new graph that is wrong: no surprise given the incompetence of the tree hut gang

  309. Richard,

    Cook admitting to two well-known faults in Cook 2013

    Not true.

    a new graph that is wrong:

    Not true.

    no surprise given the incompetence of the tree hut gang

    A bit Rich coming from you.

  310. Dikran Marsupial says:

    Richard wrote “ducking three critiques on Cook 2013” which is rather ironic given that the preceding comment was posted to remind him that he has been repeatedly ducking the discussion of a factual error in his comment paper (it wasn’t the first time he had been reminded either).

  311. Marco says:

    “a new graph that is wrong:”

    Reminds me of Tol’s twice correct JEP paper…and yet he only complains about other people’s supposed incompetence. Matthew 7:4-5 comes to mind.

  312. Marco says:

    *corrected*

  313. @wotts
    Thanks for confirming that three critiques were unanswered.

  314. Richard,
    I did no such thing. I’m trying very hard to not explicitly call you dishonest and trying very hard to not explicitly accuse you of research misconduct. If you carry on like this, my attempts to avoid this might fail.

  315. Dikran Marsupial says:

    Reminder #4

    “Cook et al (2013) state that 12 465 abstracts were downloaded from the Web of Science, yet their supporting data show that there were 12 876 abstracts.”

    is a factual error in Richard’s comment paper; that is an unanswered critique.

  316. @dikran
    We’ve been over this.

    This particular discrepancy was caused by Cook downloading some abstracts twice. This explanation is now, first-hand, in the open literature. Previously, there were second- and third-hand accounts on blogs. This is progress.

    Unfortunately, Cook 2016 does not explain the strange patterns of missing IDs. It does not volunteer information on when the overlapping data were downloaded, nor on whether the deleted abstracts had been rated or not.

    And sadly, Cook 2016 ducks the question about the other discrepancies in their sample size.

  317. @dikran
    I formulated the question as I did because ERL does not like references to unsubstantiated rumours on blogs.

  318. Dikran Marsupial says:

    Richard, you are missing the point: “yet their supporting data show that there were 12 876 abstracts” is factually incorrect; the supporting data do not show that, all they show is that the largest abstractID was 12876, nothing more. And yet you assert, as an unqualified fact, that there were 12876 papers. That is a factual error.

    “This particular discrepancy was caused by Cook downloading some abstracts twice. This explanation is now, first-hand, in the open literature. Previously, there were second- and third-hand accounts on blogs. This is progress.”

    No, it isn’t progress. The apparent discrepancy was of no importance whatsoever, and much time has been wasted dealing with your ignorance of unique IDs. You can’t ignore an explanation even if it is second hand. You knew, when you wrote “yet their supporting data show that there were 12 876 abstracts”, that this wasn’t true, as there was a plausible explanation.

    “Unfortunately, Cook 2016 does not explain the strange patterns of missing IDs. “

    There is no need for this to be clarified, as it is of no importance at all.

    This is all pure hypocrisy, given that your own papers contain much larger discrepancies, e.g. what penalty term was used in fitting your piecewise regression analysis, and you refuse to give a straight answer to that one.

  319. Richard,

    We’ve been over this.

    Indeed. You published a claim in a peer-reviewed paper that turned out not to be true. Not only this, but it was pretty obviously not true prior to you publishing this paper. Also, you were aware that it was probably not true prior to publishing this claim.

    This explanation is now, first-hand, in the open literature.

    Ahh, yes, let’s use the peer-reviewed literature to ask questions, the answers to which are pretty obvious and could be established in a matter of minutes by anyone who thinks about it for a little while. It’s not as if we don’t publish enough already and don’t already have enough to do reviewing papers and grant applications.

    I have no idea whether your other queries are worth addressing or not. My guess is that they aren’t – it’s the continued confusion of a survey to rate abstracts (which it is) with a survey of abstract raters (which it isn’t). They’re the standard type of things one sees on blogs, not the kind of thing one would expect from a seasoned academic. Of course, if one were dealing with someone one regarded as decent and as acting with integrity, one might put some effort into addressing these. Given that you appear happy consorting with those who steal and publicise information that is not intended to be in the public domain, I think it is reasonable to view your behaviour as lacking in decency (there are other examples, such as your conduct with regards to early criticism of your 2009 paper, your accusations against Frank Ackerman, your accusations against your previous employer in Ireland, your behaviour towards Bob Ward).

    Also, now that you have published a claim in a paper that is not true, and that you almost certainly knew to not be true, I think it is fair to view you as lacking in integrity (especially as I did point out this issue to you well before the paper appeared). Therefore, I think it is quite reasonable to regard you as someone who is not in a position to expect responses from John Cook – or anyone else related to the consensus project – to your questions. I appreciate that this will not stop you from continuing to make the accusations that you’re making. However, I will happily continue to point out that you appear to not be in a position to expect a response from the authors.

  320. Dikran Marsupial says:

    Richard, you wrote “yet their supporting data show that there were 12 876 abstracts.”

    Do you think that this unqualified statement, given without any caveats, is factually correct? “Yes” or “No”.

  321. Richard,

    I formulated the question as I did because ERL does not like references to unsubstantiated rumours on blogs.

    It wasn’t a question. Pretending that it was doesn’t change that you made a claim in a peer-reviewed paper that turned out not to be true, and that you almost certainly knew to not be true prior to publishing this claim.

  322. Dikran Marsupial says:

    Richard: “@dikran
    I formulated the question as I did because ERL does not like references to unsubstantiated rumours on blogs.”

    That doesn’t mean you cannot give the alternate explanation as a caveat without referencing the blog, or give a more qualified version of your question, e.g.

    ” yet their supporting data show that the highest abstract ID actually assigned was 12 876, which implies there may have been as many abstracts downloaded, although it is possible that some were deleted due to duplication”.

    The reason you didn’t do that is that it would demonstrate that your complaint is merely footling.

  323. Dikran,
    And let’s also stress that the way in which Richard formulated the question turned it into something that was no longer a question, but a claim that turned out to almost certainly not be true – something Richard was at least aware was a possibility prior to formulating his question as an untrue claim.

  324. Perhaps I should have written “yet their supporting data show that there were 12 876 abstracts (including potential duplicates).”

    But then I should also have written that Cook 2013 rated 11 944 abstracts (including three duplicates).

    Which then of course raises the question why some duplicates were removed but not others.

  325. Richard,
    Perhaps you simply shouldn’t have written it in a way that was almost certainly going to end up with you publishing a claim in a peer-reviewed paper that turned out to almost certainly not be true, and that you almost certainly knew was not true prior to publishing the claim in the peer-reviewed paper?

  326. Dikran Marsupial says:

    Richard “Perhaps I should have written “yet their supporting data show that there were 12 876 abstracts (including potential duplicates).”

    Right, so do you agree that your actual wording was factually incorrect?

  327. Dikran Marsupial says:

    Richard wrote “But then I should also have written that Cook 2013 rated 11 944 abstracts (including three duplicates).”

    This is a very silly justification for not putting a caveat in “yet their supporting data show that there were 12 876 abstracts”. The reason, of course, is that one is intended as a criticism of Cook16, and the other would just be footling pedantry (3 out of 11944 is not going to have a substantial effect on the conclusions). If you are going to make a criticism, then the criticism needs to be fair and moderate (unless you are happy for criticisms of your work to be unfair or immoderate).

    “Which then of course raises the question why some duplicates were removed but not others.”

    This is just silly, the reason is obviously that they were not detected. No study is ever perfect, because the human beings performing them are never perfect. Raising a question about the cause of something that affects three datapoints from nearly twelve thousand is obviously pedantic to the point of sophistry.

    Do you agree that your actual wording was factually incorrect?

  328. Dikran Marsupial says:

    Lets insert Richard’s revision:

    Cook et al (2013) state that 12 465 abstracts were downloaded from the Web of Science, yet their supporting data show that there were 12 876 abstracts (including potential duplicates). A later query returned 13 458, only 27 of which were added after Cook ran his query (Tol 2014a).

    the reader then thinks “so the difference between 12876 and 12465 is probably down to the removal of duplicates, so what?” and “the difference between 12458 and 12876 is probably due to the bibliographic database being updated between the two queries (i) so what, that sort of thing happens all the time? (ii) in what way is that the responsibility of Cook et al”?. In other words, adding the qualification shows that the question was footling and not really worth discussion, never mind publication.

  329. Dikran Marsupial says:

    Sorry, the 12458 should have been 13458 and I should have closed the italic tag after the revised quote. [Mod: italics tags fixed]

  330. The reason for the difference between 13458 and 12465 is that Richard includes both the science citation index and the social science citation index in his search. Cook et al. (2013) categorised social science papers as not climate related and did not rate them. To almost anyone else, this would suggest that they did not include the social science citation index in their search. This does not seem to be obvious to Richard.

    If you are going to make a criticism, then the criticisim needs to be fair and moderate (unless you are happy for criticisms of your work to be unfair or immoderate).

    Indeed, but my impression is that Richard really does think that there is one rule for him and another rule for everyone else.

  331. Dikran Marsupial says:

    ATTP ah, in that case the question becomes footlingly footling.

    Richard Tol, do you agree that your actual wording was factually incorrect?

  332. lerpo says:

    Perhaps I should have written “yet their supporting data show that there were 12 876 abstracts (including potential duplicates).”

    Depends on whether you value your reputation more than your spot on the GWPF advisory council.

  333. Joshua says:

    Dikran –

    ==> Richard Tol, do you agree that your actual wording was factually incorrect?

    Was that a joke?

  334. Willard says:

    > Perhaps I should have written […]

    Perhaps you should have dropped that one, Richie.

  335. Dikran Marsupial says:

    Joshua, no, it is a straight question, which would benefit Prof. Tol to answer directly and unequivocally (although I accept that is perhaps unlikely).

  336. Joshua says:

    Dikran –

    Well, allow me to just recommend that you not hold your breath waiting.

  337. Willard says:

    At least don’t bate it.

  338. opluso says:

    DM:

    I respect your knowledge of statistical matters and appreciate your technical input. However, the fact that you criticised my wise-cracking while tacitly tolerating similar behavior by your allies suggests that you are vulnerable to the same impulses as the rest of us. Welcome to the internet.

    DM wrote:

    “It isn’t even correct that it is only the 97% figure that is being promoted”

    opluso replied Of course, I never said it was.

    Not explicitly, perhaps, but you repeatedly complained about the repetition of the 97% meme and questioned its accuracy.

    You impugn my character even as you hold yourself out to be above the fray. So be it. But at least be good enough to take your own advice and resist the temptation to falsely claim I am saying something I didn’t say.

    BTW opluso, you repeatedly comment on other people’s motivations for doing things. You would be better off criticizing what they actually say, rather than your opinion of their motivations. The golden rule applies here: if you speculate uncharitably about the motivations of others, that implies you do not object to others speculating uncharitably about yours.

    You did not provide any of the “repeated” examples of my improper behavior, but perhaps you were referring to the other part of the particular comment you cited, where I stated:

    Even in the absence of flawed methodology (or sloppy application), do I think the Cook papers are misleading? Yes, in the sense that pseudo-precision is misleading. Yes, in the sense that they are part of a communications strategy specifically intended to push particular policy goals.

    You seem unwilling to admit, or unable to grasp, that I make a distinction between the trivially true high-percentage consensus explored by Cook (and several other studies) and the politically motivated effort to support carbon control policies. That a consensus exists is an abstract concept with relatively low policy efficacy. Therefore, proponents of carbon/GHG reduction policies strategically desire a more efficacious (i.e., convincing to public and policy-maker) numeric meme. Thus the recurrent resort to the “precisely 97%” meme (by NASA, Pres. Obama, the media, etc.).

    Interestingly, when I cited Pres. Obama’s recent Tweet of the 97% meme as supporting evidence for my contention, aTTP initially assumed it was the original 2013 Obama Tweet that had gratuitously expanded the Cook meme to encompass “dangerous” warming. Some scientists, it would appear, are so involved in the policy debate that they can recall 3-year-old tweets. But I suppose that doesn’t prove anything, does it?

    You stated:

    Note that the error bar on the Cook et al. study is sufficiently small that to summarise it as 97% is not at all misleading (I suspect they may even be able to write 97.0%). This is making a mountain out of a molehill. The uncertainty in this study result is negligible in a “public communication of science” setting.

    Whether or not uncertainty SHOULD be fully and honestly expressed in public is something we can debate. But I strongly disagree that it is a negligible factor in public communications. Perhaps you have strayed beyond your area of expertise.

    Regardless, the 97% meme is not, IMO, the result of brevity or laziness. It is designed and deployed for the purpose of changing people’s minds. Pertinent research informs us that such techniques work. YMMV.

  339. John Hartz says:

    Willard: How does Climateball address the “grasping at straws” tactic employed by Tol?

  340. Willard says:

    You know I don’t like this kind of piling on, JH.

    Audits never end, that’s all.

  341. BBD says:

    opluso

    You are wittering. The facts are simple:

    – a near-universal scientific consensus exists

    – the public have been misinformed on this point by vested interest over a number of years

    – C13 and other studies attempt to redress this fact by presenting a quantified estimate of the level of scientific consensus

    End of.

  342. Yes, let’s avoid piling on. This post does discuss one of Richard’s papers and so we should be thankful that he is willing to spend time discussing it with us here.

  343. opluso,
    Also, it may be true that an understanding of the level of consensus can play a role in influencing the public’s, and policy makers’, views about a topic (and hence influence policy making). That doesn’t, however, imply that anyone has engineered some kind of apparently high level of consensus so as to influence policy making. I don’t quite know if this is what you were implying, but it does seem that way.

  344. BBD says:

    opluso

    and the politically motivated effort to support carbon control policies.

    Physics is the motivation for mitigation policy, not politics.

    Politics is the motivation to *oppose* mitigation policy.

    Let’s get the facts straight.

  345. Willard says:

    > Pertinent research informs us that such techniques work.

    Techniques such as using numeric statements.

    Yup.

    Billions upon billions of black helicopters.

  346. Willard says:

    > when I cited Pres. Obama’s recent Tweet

    The Obama tweet meme again:

    This account is run by Organizing for Action staff. Tweets from the President are signed -bo.

    https://twitter.com/BarackObama

    The truth is out there.

  347. Willard says:

    > I make a distinction between the trivially true high-percentage consensus explored by Cook […]

    How can a consensus be trivially true?

    Note that opluso makes more than a distinction.

  348. John Hartz says:

    Willard & ATTP:

    If you are going to flag me for “piling on” then you should also flag both Tol and opluso for excessive repetition.

  349. JH,
    I’m giving Richard – and maybe opluso – a bit more leeway than I might others. Partly, because if it degenerates on all sides, then the discussion can’t really continue. Partly, because it’s useful to highlight how their arguments (in some cases) appear to be based on things that are either not true, or simplistic interpretations of what actually took place. It can be instructive 🙂

  350. Dikran Marsupial says:

    Opluso wrote “You impugn my character”.

    “Not explicitly, perhaps, but you repeatedly complained about the repetition of the 97% meme and questioned its accuracy.”

    is not impugning your character; it is pointing out that what you had previously written implied that you were objecting to the promotion of the 97% figure to the exclusion of (essentially) all else.

    You are turning this from a discussion of the substantive issues regarding the paper into a rhetorical debate, complete with personal attacks e.g. “Perhaps you have strayed beyond your area of expertise”. Sorry, I have better things to do with my time.

  351. Willard says:

> If you are going to flag me for “piling on” then you should also flag both Tol and opluso for excessive repetition.

    That’s not what “piling on” means, JH, and you’re playing the ref.

  352. John Hartz says:

    ATTP:

    Your site, your call.

I do, however, suspect that a lot of people who were following the discussion at the beginning of this thread have since tuned out because of the repetitive nature of the discourse.

  353. John Hartz says:

Willard: You need not tell me what I already know. I never said that “piling on” equaled “excessive repetition.”

    If I am “playing the ref”, what the heck are you doing?

  354. Willard says:

    There’s nothing special about the repetitiveness of this thread, JH.

C13 has been discussed since at least May 2013.

How many people have the time and the fortitude to read all this?

  355. Okay, maybe we can drop this. All Willard was getting at was that we shouldn’t all just pile onto Richard. I think that’s a fair request, even if we think it might be – at times – deserved.

  356. Willard says:

    > If I am “playing the ref”, what the heck are you doing?

    I don’t think “playing the ref” means what you think it means, JH.

    Thank you for your tu quoque, which yet again plays the ref.

  357. Dikran Marsupial says:

I just thought I’d make a comment on the (pseudo-) precision of the 97% figure and the difficulty of public communication of science with an audience that is at least partly hostile.

In science we generally aim to express numerical information using the significant-digits convention, i.e. using only as many digits as are meaningful given the measurement resolution (i.e. error bars). Numbers are often rounded to avoid spurious implied precision. Cook13 seems to do this, giving the figures 97.1% for the survey of abstracts that expressed a position and 97.2% for the survey of author self-rated papers. This implies that the uncertainty of the estimates was low enough to resolve the third significant digit in both cases.

Now, this makes it clear that saying just 97% slightly understates the actual estimated consensus in both cases, which is in accordance with scientific skepticism (i.e. don’t over-state your evidence). Is it misleading to understate it this way? I would say “no”; for a start, the .1% and .2% in each case is not going to make a substantial difference to anybody reconsidering their position on this issue. Secondly, Cook13 contains two studies, not one, and their estimates only agree to the first two significant digits, not the third. This means that just 97% accommodates some of the structural uncertainty in the question in addition to the sampling uncertainty arising from the finite samples of data.

So, given that the public communication on this issue has deeply hostile elements, the question ought to be how we can phrase this in an accessible but accurate manner, to which a hostile opponent cannot object. I think the 97% figure does this as well as is actually possible: even the two studies contained in Cook13 only agree to the first two digits, several other studies give a figure of about 97% (though I rather doubt they agree on the third significant digit either), and the spread of estimates across studies demonstrates that the third digit is smaller than the structural uncertainties anyway.

    Personally I don’t think this complaint has much merit.

Footnote: I once had a referee complain about spurious precision in a table in one of my papers. The reason I gave more figures than were significant was that it helps people trying to replicate my results to be confident that they had implemented it correctly (as an error in their implementation might result in a difference that is smaller than the sampling resolution of the statistic – but where near-bitwise replicability might be possible). I did, however, also give the error bars to make the actual precision explicit. Also, if someone wants to perform a rank-based test then you perhaps need the actual scores, rather than the rounded ones. This shows that no matter how hard you try to be informative, you can’t please everybody!
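
    To put some numbers on the precision argument above, here is a minimal Python sketch. The sample sizes are illustrative assumptions of roughly the right magnitude (abstracts and self-rated papers stating a position), not the exact Cook13 counts:

    import math

    def standard_error(p, n):
        # Binomial standard error of an estimated proportion
        return math.sqrt(p * (1 - p) / n)

    for p, n in [(0.971, 4000), (0.972, 1400)]:
        se = standard_error(p, n)
        print(f"estimate {p}: standard error {100 * se:.2f} percentage points")

    # Prints roughly 0.27 and 0.44 percentage points: far larger than the
    # 0.05 points needed to pin down a third significant digit, so rounding
    # both figures to 97% is honest rather than misleading.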

  358. pbjamm says:

Willard is playing (the role) of the ref.

  359. @dikran
    No, I do not accept that. In this context, the number of abstracts includes duplicates. After all, Cook 2013 counts duplicates too.

    The issue, however, is not the number of abstracts, or why they were removed, but rather whether they were removed before or after rating.

    The duplicates that were not removed reveal something about rater reliability.

    The duplicates that were removed are sufficiently many to change Cook’s main result. If these were removed at random before being rated, then all is fine. If these duplicates were removed selectively after being rated (e.g., because duplication emerged between the March and May downloads), then things are not so fine.

    Cook 2016 had a chance to reassure us that all is fine.

  360. Richard,

    No, I do not accept that. In this context, the number of abstracts includes duplicates.

Wow, you answered. Let’s consider the following scenario. Someone adds abstracts to a database. When they get to 1000, they realise that there are about 400 duplicates, which they then remove without reordering the database identifiers. They then continue adding abstracts until there are a total of 12465 abstracts in the database, but with identifiers that go from 1 to 12876, and not sequentially (there are gaps). Were there ever 12876 abstracts in the database? (A minimal sketch at the end of this comment makes this concrete.)

    The issue, however, is not the number of abstracts, or why they were removed, but rather whether they were removed before or after rating.

No, this is your issue. Most others regard it as irrelevant and an indication that, having recognised that you said something untrue in your peer-reviewed paper, you will now move on to the next question and waste everyone’s time for the next 3 years.

    If these duplicates were removed selectively after being rated (e.g., because duplication emerged between the March and May downloads), then things are not so fine.

    Your null hypothesis should be that everything is fine. [Mod: edited for context] Let’s bear in mind that every abstract has initial, intermediate, and final ratings. You are, of course, free to disagree with these ratings. However, for some reason, instead of deciding to comment on the ratings of the abstracts that actually exist, you’ve decided to question the possible ratings of abstracts that don’t actually exist. I guess it is easier to question things for which there isn’t any evidence that you’re wrong, but it’s not particularly constructive.

    Cook 2016 had a chance to reassure us that all is fine.

    I think this is now falling pretty close to the “when did you stop beating your wife?” style of questioning.
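
    To make the identifier scenario above concrete, here is a minimal Python sketch (toy numbers, not the actual Cook13 database):

    # IDs keep counting upwards; deleting duplicate rows leaves gaps,
    # so max(ID) exceeds the number of distinct abstracts ever held.
    rows = {i: f"abstract {i}" for i in range(1, 1001)}   # IDs 1..1000
    for dup_id in range(601, 1001):                       # drop ~400 duplicates
        del rows[dup_id]
    rows.update({i: f"abstract {i}" for i in range(1001, 1501)})
    print(len(rows), max(rows))   # 1100 abstracts, but the top ID is 1500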

  361. Willard says:

    > The duplicates that were not removed reveal something about rater reliability.

    What does it reveal, Richie?

  362. Dikran Marsupial says:

    Richard Tol wrote ” In this context, the number of abstracts includes duplicates.”

This is obvious sophistry; your actual statement was:

    “Cook et al (2013) state that 12 465 abstracts were downloaded from the Web of Science, yet their supporting data show that there were 12 876 abstracts.”

    If the context included duplicates then the word “yet” would not have been used as that implies an unresolved incongruity.

  363. verytallguy says:

    I’m certain it won’t happen, but this might be a good point to stop posting and simply acknowledge the comedy value in watching Richard Tol thrash around attempting to put an academic veneer on his true objection to Cook 13: that it is correct.

    Further on the subject of comedy, I see following the twitter links on the sidebar that Judith Curry has joined in on the laughs front by publishing an “analysis” which claims to show that the more people agree on something, the less likely it is to be true. Honestly.

  364. Dikran,
Actually, that is true. If Richard had said “Cook et al (2013) state that 12 465 abstracts were input to their database, yet their supporting data show that there were 12 876 abstracts”, he might have a point – sophistry, still, but potentially defensible. Saying “downloaded from Web of Science”, however, makes his claim much more specific and – once again – almost certainly untrue; even if Richard will never admit this. I realise that this is difficult, because admitting to having said something untrue in a peer-reviewed paper motivated by a desire to promote research integrity is going to be a difficult thing to do. Admittedly, this is a good example of why sitting on a high horse judging others can be unwise if you’re not extremely careful.

  365. vtg,
    I saw Judith’s post. If I’m up for it, I might have to comment on this deep philosophical issue that Judith raises. It’s clear that we can never really know anything, given that an increasing level of agreement immediately implies a reducing level of knowledge, while a low level of agreement means we don’t know which bit of knowledge we agree on.

  366. BBD says:

    Shades of Zeno’s paradox.

  367. Dikran Marsupial says:

    Richard Tol wrote ”In this context, the number of abstracts includes duplicates.”

    If the context included duplicates, please could you explain your use of “yet” in your statement:

    “Cook et al (2013) state that 12 465 abstracts were downloaded from the Web of Science, yet their supporting data show that there were 12 876 abstracts.”

    given that there is no incongruity (implied by “yet”) between the two figures if duplicates are present?

    This gives Richard the opportunity to show that his response to my previous question was not sophistry, which is what I should have done in the first place, mea culpa.

  368. Magma says:

Over at Sou’s I passed on the news (news to me, since I had missed a brief note here soon afterward) that regular ATTP commenter Pekka Pirilä died on November 24, 2015, at age 70, following a short illness.

    An obituary (in Finnish) can be found here: http://espoonsuunta.fi/wp-content/uploads/2015/11/Pekka-Piril%C3%A4-in-Memoriam.pdf

    I believe a closing line from one of Pekka’s final comments here may be appropriate:
    Building on bad opponents leads nowhere.

  369. Magma says:

    @ Willard: That was the note I’d missed. The obituary I linked to has more biographical details for those interested.

  370. Willard says:

    Thanks for the obituary and the quote, Magma.

  371. verytallguy says:

    AT,

I have a different take on Judith’s – I think her post bears comparison to Edward Lear on climate. Except Lear makes more sense.

    Cold are the crabs that crawl on yonder hills,
    Colder the cucumbers that grow beneath,
    And colder still the brazen chops that wreathe
    The tedious gloom of philosophic pills!
    For when the tardy film of nectar fills
    The simple bowls of demons and of men,
    There lurks the feeble mouse, the homely hen,
    And there the porcupine with all her quills.
    Yet much remains – to weave a solemn strain
    That lingering sadly – slowly dies away,
    Daily departing with departing day
    A pea-green gamut on a distant plain
    When wily walrusses in congresses meet –
    Such such is life –

    My post to that effect seems not to have surfaced over there. Anyway, perhaps we could expand this thread to look for poems related to Tol on Cook. How about

    “Stubborness” – Russell Sivey

    I do it my way
    Stubbornness inherited
    I rarely give in

  372. Magma says:

    In response to a comment by David Appell (“I haven’t been able to identify these Brumberg chaps, but I’d bet [they are nonscientists]”) I looked around for information on the Brumbergs.

    They are brothers from New York in their early 30s and working in finance. (David) Ryan Brumberg ran unsuccessfully for New York’s 14th Congressional District in 2010, as a Republican, naturally. Ryan Brumberg lists a law degree from Stanford and Matthew Brumberg a bachelor’s degree in mathematics and economics from Dartmouth.

    Both brothers are being sued by San Francisco investment firm Thiel Macro LLC, which alleged they misappropriated source code and other confidential information in December 2013 and January 2014 while employed on a team developing a quantitative investment algorithm.

  373. John Hartz says:

    ATTP:

    I think this is now falling pretty close to the “when did you stop beating your wife?” style of questioning.

    Perhaps that line was crossed a few days ago. 🙂

  374. JH,
    Indeed.

    Magma,
    Fascinating. Maybe they’re hoping that as the evidence against them mounts, the knowledge that they’re guilty will reduce? Has anyone highlighted this to Judith?

  375. Andy Skuce says:

    Yes, if it’s a jury trial a unanimous verdict against them will be an exoneration.

  376. snarkrates says:

    It is to Judy’s advantage that everyone else be ignorant. Then she stands out less.

  377. The Very Reverend Jebediah Hypotenuse says:

    2 + 2 = anything except 4.
    Because, consensus.

    CO2 is not a greenhouse gas.
    Because, consensus.

    Richard Tol and Judith Curry are always correct.
    Because, consensus.

    I think we now have formed a consensus: Truth is everything that cannot be agreed upon.

  378. Willard says:

    Impossibilities are the only certitudes.

  379. verytallguy says:

    Revd,

    At least, following Judith’s new paradigm of consensus, Spike Milligan’s famous theory of precipitation has finally been proved correct.

    “There Are Holes in the Sky”

    There are holes in the sky
    Where the rain gets in
    But they’re ever so small
    That’s why the rain is thin.

  380. guthrie says:

Points go to whoever can identify the science fiction story, and the legal system within it, in which being found guilty means you are innocent and free to go.

  381. pbjamm says:

    @guthrie: The original animated Transformers movie (1986) had the opposite where someone is found innocent and then executed.

  382. opluso says:

    DM:

    In one of your responses to me you incorrectly placed your punctuation/html code and it appears to attribute your quote to me. Not a biggie, but you do seem to be a stickler for details.

If you believe that I implied (your version of my actual beliefs) then perhaps I have failed to convey my meaning clearly. Given your surprising sensitivities (for a blog participant) to all manner of implied insults (both of you and of others) I will avoid responding to you any further, though I wish to end on a note of agreement:

    Elsewhere you stated (regarding communicating 97% with the public):

    Personally I don’t think this complaint has much merit.

    Thank you. We can agree to disagree and move on.

    All others:

    In reference to criticism of Cook 2013, John Cook wrote in February of 2014:

The purpose of communicating scientific consensus is straightforward. It removes a roadblock.

Studies in 2011 and 2013 found that public perception of scientific consensus about human-caused global warming is associated with support for mitigation policies. If the public think scientists disagree about whether humans are causing global warming, then they don’t support climate action.

That doesn’t mean the consensus message should be simply “trust us”. On the contrary, it should be communicated that the consensus of scientists is built on a consilience of evidence. Ed Maibach recently conducted a test of many different consensus messages and found the most effective variant began with “Based on the evidence, there is 97% agreement…”


    http://climatechangenationalforum.org/quantifying-the-consensus-on-anthropogenic-global-warming-in-the-scientific-literature-by-dr-john-cook-et-al-97-1-agree-that-humans-are-causing-global-warming/#comment-397

  383. opluso,
I’m not quite sure where you’re going with this, but I don’t really think that a discussion of motives will be constructive, or worthwhile.

  384. Vinny Burgoo says:

    Willard: ‘The truth is out there.’

    Up to a point:

    At about 1:45 Obama said he tweeted about Cook 2013. The words he used don’t, in my opinion, prove that he personally approved the tweet – but who am I? A Cook 2013 co-author, Andy ‘We are all in sales’ Skuce (see above), has said that that was what Obama meant:

    https://critical-angle.net/2014/06/26/consensus-criticism-communication/

  385. Willard says:

    “Just bragging a little bit,” Vinny.

    Once your team does something, you own it, even when it’s good.

  386. Willard says:

> I’m not quite sure where you’re going with this […]

    Where he started:

    From the beginning, Cook (2013) was treated as a political document with the intent of being used to promote a 97% meme.

    Right after his “you and him fight” transaction fizzled.

    And now he’s peddling Brandon’s recent discovery without attribution.

  387. Over on Brenda Schultz, more creative representation of the facts by John Cook:
    http://www.hi-izuru.org/wp_blog/2016/04/remarkable-remarks-by-cook-et-al/

  388. Willard says:

    > Who?

    Look at the initials, AT.

  389. I really think that people should read Table 2 of Cook et al. (2013) properly. A few things to consider.

    1. No abstract can end up with more than one rating.

    2. The ordering of the table would appear to suggest a sequence from 1 (Explicit endorsement with quantification – humans primary cause) to 7 (Explicit rejection with quantification – humans are causing less than half of global warming).

  390. John Hartz says:

ATTP: Richard Tol’s incessant campaigning against Cook 2013 and Cook 2016 likely gets his name in front of many more people than does his own body of work. Sad.

  391. Magma says:

    “Brenda Schultz” for Brandon Shollenberger? I’ve previously noticed certain unpleasant similarities between Richard Tol and Luboš Motl; the difference being that at this point Tol could still conceivably salvage his academic career and reputation.

  392. @magma
    Cf. Joshua.

    @wotts
    Three years later, Cook’s Table 2 is still open for interpretation and reinterpretation. The poor raters had to master the rating system in a matter of hours, and stick with a rigid understanding for months!

  393. Richard,

    Three years later, Cook’s Table 2 is still open for interpretation and reinterpretation.

    Sure, that may even be true, but that doesn’t mean that the interpretations that some are choosing to make are correct, especially if they are based on looking at individual ratings without considering it in the context of the full rating system.

    The poor raters had to master the rating system in a matter of hours, and stick with a rigid understanding for months!

    Hmm, so this is all because of the poor raters?

  394. Dikran Marsupial says:

    Richard Tol, perhaps you missed my question, I’ll repeat it here:

    Richard Tol wrote ”In this context, the number of abstracts includes duplicates.”

    If the context included duplicates, please could you explain your use of “yet” in your statement:

    “Cook et al (2013) state that 12 465 abstracts were downloaded from the Web of Science, yet their supporting data show that there were 12 876 abstracts.”

    given that there is no incongruity (implied by “yet”) between the two figures if duplicates are present?

  395. And, let’s be clear, if someone downloads the same abstract twice from a database, they don’t suddenly have two abstracts. They might have two copies of the same abstract.

  396. Dikran Marsupial says:

John Cook (apparently) wrote “The purpose of communicating scientific consensus is straightforward. It removes a roadblock.”

Given that the roadblock is the gap between public perception of the scientific consensus and the reality of the scientific consensus, I would say that removing the roadblock is a good thing to do. The last thing we need is for society to decide on a course of (in-)action based on an incorrect view of what the majority of scientists believe.

John Cook also (apparently) wrote “Ed Maibach recently conducted a test of many different consensus messages and found the most effective variant began with “Based on the evidence, there is 97% agreement…”

I don’t see any problem with this. The 97% figure is well supported by Cook13 and Cook16, as well as some other surveys, and the precision given is reasonable (see my earlier comment). Now if the purpose is to remove a widely held public misconception, using the most effective message seems to be common sense. It seems to me that if you want to complain about the way political agendas are promulgated, then start by complaining about those who do it using arguments that are factually incorrect, rather than those that are correct and presented reasonably.

  397. @wotts
    I’m less concerned about the welfare of the raters than about their consistency and reliability. Comparing your interpretation above with John Cook’s on Reddit, one may suspect that different raters read the instructions differently. That neatly fits with the finding that there are systematic differences between raters.

Most raters also systematically drift over time, just like Cook’s account of the ratings is different every time he talks about it.

  398. Dikran Marsupial says:

    Richard Tol wrote “Over on Brenda Schultz, more creative representation of the facts by John Cook: http://www.hi-izuru.org/wp_blog/2016/04/remarkable-remarks-by-cook-et-al/

Richard Tol wrote “For the record, as a matter of courtesy, I always use the female form to refer to someone of unknown gender.”

    … and sometimes when you know they are not female as well. No inconsistency there, no sir/madam! ;o)

  399. Richard,
    Again, I’m confused as to why your concern about their consistency and reliability involves having to say things about their paper that almost certainly isn’t true. Maybe you could explain this apparent inconsistency?

  400. Dikran,
    Reading Richard’s comment on Brandon’s post shows that – despite you pointing this out – Richard still does not understand that the statistical test he used only applies if the two samples are drawn from the same population. Since abstracts and papers are not drawn from the same population, his criticism is invalid.

  401. Sorry, Wotts, the null hypothesis, per Cook 2013, is that paper and abstract ratings are the same. That’s why the latter were presented as a validation of the former.

  402. Dikran Marsupial says:

    Richard Tol wrote “I wrote repeatedly about this. Perhaps I’m not a critic, but I did note that the two ratings strongly disagree.”

    IIRC one gives a consensus of 97.1% and the other gives a consensus of 97.2%. That doesn’t seem like strong disagreement.

  403. Richard,
    No, I do not think that is true (what a surprise?). I do not think Cook et al. made any claim about the self-ratings validating the abstract ratings. In fact, in Section 3.2 they directly compare the abstract ratings of those abstracts, the papers of which received self-ratings, with the self-ratings.

Ultimately, you appear to be someone who claims extensive statistical expertise and yet seems to be quite happy using inappropriate statistical tests. I have no idea why you think this should be the null hypothesis.

  404. Dikran Marsupial says:

    Richard Tol wrote “Sorry, Wotts, the null hypothesis, per Cook 2013, is that paper and abstract ratings are the same.”

    No Richard, the null hypothesis is the hypothesis to be nullified in order to promulgate your research hypothesis. The author ratings do provide a validation of the abstract ratings, common sense should be enough to tell you that, not a null hypothesis statistical test.

    As has been pointed out, constructing a valid NHST to compare the two figures would not be at all straightforward because the contents of the papers contain more information than the abstracts alone and also because the author ratings have the benefit of a greater understanding of the work. It would be statistically naive (to say the least) to expect the two consensus rates to agree within their sampling uncertainty. They are not based on the same information, so one wouldn’t expect them to be (even asymptotically) identical.

  405. @wotts
    I see. These are independent measures of completely different things that just happened to end up in the same table in the same paper.

    In Cook’s words: “Next, and here is the crucial part that every critic of our paper has conveniently ignored or avoided, we replicated our result by inviting the authors of the scientific papers to rate their own research. If we had mis-characterised a significant number of papers (e.g., rated them as endorsing AGW when they didn’t), then there would’ve been a significant discrepancy between our abstract rating and the self-rating. 1200 scientists responded to our invitation, resulting in over 2000 papers receiving a self-rating. Amongst papers that were self-rated as stating a position on human-caused global warming, 97.2% endorsed the consensus.”

    The intriguing bit of this cite is that Cook has endorsed me as not-a-critic of his work.

  406. Richard,

    These are independent measures of completely different things that just happened to end up in the same table in the same paper.

No, they are two different ways in which to try and measure the same thing. Jeebus, this is so basic I can’t believe that you’re actually making this argument. Eventually I am going to have to email you at work just to check that you really are Richard Tol, Professor of Economics at Sussex University, rather than someone pretending to be Richard Tol, Professor of Economics at Sussex University.

Also, where did John Cook say “validate”? Replicating results by considering alternative ways in which to measure the same thing is perfectly good science. If I measure the temperature with a digital thermometer and with a mercury thermometer and get a consistent result, I might be more confident that I have a good measure of the temperature.

  407. Dikran Marsupial says:

    Richard Tol wrote “I see. These are independent measures of completely different things that just happened to end up in the same table in the same paper.”

    No, they are measures of very similar, but not identical things; they are both in the paper because they are consilient which suggests the findings are sound. However because they are not identical (even asymptotically), a naive null hypothesis statistical test would be deeply misleading.

  408. @wotts
    Exactly. You try to measure the same thing in two different ways. (That’s what Cook did.) And then you apply a statistical test to the null hypothesis that they are the same. (That’s where Cook needed a bit of help.)

  409. Richard,
    You don’t apply a test to see if they’re drawn from the same population, if they are very obviously not drawn from the same population.

    (That’s where Cook needed a bit of help.)

    I don’t think it is Cook who needs some help with this.

  410. Dikran Marsupial says:

    Richard wrote “And then you apply a statistical test to the null hypothesis that they are the same.”

    Only if you had reason to think they should be identical, which in this case there is no reason to expect them to be (indeed we would expect there to be some difference).

The problem with NHSTs is that if the sample size is large you can detect increasingly small effect sizes. Thus if you know a priori that the two things should not be expected to be asymptotically identical, you can always increase the sample size enough to detect a difference, even if it is of no practical significance whatsoever. Is the difference between 97.1% and 97.2% of any practical significance? No, of course not. I really shouldn’t have to explain this to someone who has taught statistics.
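
    A minimal simulation of that pathology, with illustrative (assumed) sample sizes and a standard pooled two-proportion z statistic:

    import math

    def two_proportion_z(p1, n1, p2, n2):
        # Pooled z statistic for comparing two estimated proportions
        p = (p1 * n1 + p2 * n2) / (n1 + n2)
        se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
        return (p1 - p2) / se

    for scale in (1, 1000):
        z = two_proportion_z(0.971, 4000 * scale, 0.972, 1400 * scale)
        print(f"samples scaled by {scale}: z = {z:.2f}")

    # At realistic sample sizes |z| is about 0.19 (nowhere near significant);
    # inflate the same data a thousandfold and |z| is about 6.1. The same
    # 0.1-point gap becomes "significant" with no practical relevance at all.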

  411. @wotts
    So what statistical test should be applied then?

  412. Richard,
    To test for what? Whether or not the two samples are drawn from the same population? Well, we know that they aren’t, so I don’t think we need a test to show that they aren’t.

  413. verytallguy says:

    Tol has set out his stall to prove his own intellectual superiority by critiquing Cook. The more abstract and pedantic the nit-picking, the more his prowess is demonstrated. Debating these nits merely serves to offer further opportunities for extending this pointless exemplification of his genius. The tragedy is the self-defeating nature of the debate in the long-term, as the undoubtedly talented Tol becomes notorious, a by-word for trolling, unable to be taken seriously.

His publishing of an academic critique also allows suitable doubt to be shed on the conclusions of Cook, an objective firmly in line with his membership of the GWPF, an anti-scientific body so nakedly political that its charitable status was removed.

    In the light of this highly predictable narcissistic reprise of many blog threads past, obviously doomed forever to demur from any conclusion or catharsis, it is probably time, yet again, to point out that Tol does, in fact agree with Cook:

    Richard Tol wrote “The consensus is of course in the high nineties. No one ever said it was not.”

  414. @wotts
    What statistical test should be used to infer whether measurement device A returns the same result as measurement device B?

  415. Richard,
    Before moving on, let’s first clarify if using a statistical test to see if two samples are drawn from the same population is appropriate if we know – in advance – that they are not.

  416. @wotts
But if you know that the two populations are different, why would you compare them in the first place?

  417. Richard,

But if you know that the two populations are different, why would you compare them in the first place?

That is essentially my point. However, if I have two different samples, both of which can be used to determine the same thing, then I might want to see if the results are consistent, or not. I think I did the calculation earlier, but the result from the abstracts is 97.1% ± 0.7%, and for the self-rated papers was 97.2% ± 1.2% (99% confidence intervals). Seems statistically consistent to me.

    Of course, if one was being sensible, you would simply draw the obvious conclusion that whether you assess abstracts, or get authors to rate their own papers, the level of consensus is in the high nineties (as I think you agree). I can imagine some would argue that if the results were not statistically consistent, that they were wrong, but that would be silly.
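
    For anyone wanting to check the arithmetic, a sketch of the normal-approximation intervals (the sample sizes of roughly 4000 abstracts and 1400 self-rated papers stating a position are assumptions, so the half-widths are approximate):

    import math

    def ci99(p, n):
        # Normal-approximation 99% confidence interval for a proportion
        half = 2.576 * math.sqrt(p * (1 - p) / n)
        return p - half, p + half

    print(ci99(0.971, 4000))   # about (0.964, 0.978), i.e. 97.1% ± 0.7%
    print(ci99(0.972, 1400))   # about (0.961, 0.983), i.e. 97.2% ± 1.1%

    # The two intervals overlap substantially, which is all "statistically
    # consistent" means here.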

  418. Dikran Marsupial says:

Richard Tol wrote “So what statistical test should be applied then?”

None, the difference is too small to be of practical significance, so testing for a statistically significant difference would be pointless.

    Richard Tol wrote “What statistical test should be used to infer whether measurement device A returns the same result as measurement device B?”

    None, we know there should be a difference, we don’t expect the result to be exactly the same, just “similar”.

    Richard Tol wrote “But if you know that the two populations are different,why would you compare them in the first place?”

    To see if they are consilient.

This is very basic statistics; Grant Foster (Tamino) has written a very good primer, “Understanding Statistics”, which I would recommend.

  419. @wotts
    So, you cannot compare the distributions, but you can compare statistics of those distributions?

  420. Richard,
    You can compare whatever you like, but you do have to understand what those distributions represent, how they were produced, the limits of the comparison that you used, and what the results of those comparisons might tell you about the two distributions. Why am I explaining this to you? Surely you get this already?

  421. Dikran Marsupial says:

    Richard Tol would you expect the consensus estimates from the abstract survey to be asymptotically* identical to those from the author survey? Yes or No.

    * i.e. in the limit that all relevant papers were rated, rather than just a finite sample.

  422. @wotts
No, I do not get you. A statistic is a characteristic of a distribution. If you cannot compare two distributions, then you cannot compare their statistics. If you can compare their statistics, then you can compare the underlying distributions.

  423. Dikran Marsupial says:

    Richard wrote “No, I do not get you.”

    Answering my question might help with that.

  424. Richard,
    It’s pretty clear you’re not getting it. Not really my problem. But I will try and illustrate it once more.

Imagine you have some piece of information, let’s call it a paper. You can read that paper to determine its position with respect to some topic; maybe you actually ask the authors what position their paper takes. Now imagine you have lots of such papers and you repeat this for all the papers. Let’s imagine that the 3 possible outcomes are endorse, take no position, reject. Once you have your results you can analyse them to determine, for example, the level of agreement, and report some level for all those papers that actually take a position.

Now imagine that instead of considering the papers as a whole, you consider only their abstracts. You repeat the above, but now assessing only the abstracts, again rating them as endorse, take no position, or reject. Again, once you have your results you can analyse them and return, for example, the level of agreement amongst all abstracts that actually take a position.

Let’s also imagine that the level of agreement amongst the papers is very close to that determined using abstracts only. Let’s also imagine that the fraction of each sample that takes no position is very different between the abstracts and the papers. Is the latter a surprise, and does it somehow bring into question the agreement between the two different analyses? Answers on a postcard.

  425. John Hartz says:

    VTG:

    Tol has set out his stall to prove his own intellectual superiority by critiquing Cook.

    [Mod: Sorry, but I appreciate you may feel this to be true, but there isn’t really evidence for this and they have publicly claimed it is not true.]

  426. @wotts
    Let’s formalize.

    Let’s denote the information in paper i by P_i. There is a sample of papers i=1, 2, …, I. Let’s denote the distribution of information by F_P(P). The consensus statistic is then C_P = f(F_P).

Let’s denote the information in abstract i by A_i. This depends on P_i, so A_i = A_i(P_i). The distribution of A is F_A(A). The consensus statistic is C_A = f(F_A) = g(F_P).

In words, if you compare C_A to C_P, then you compare a functional of F_A to that functional of F_P. In other words, you compare parts of the distributions.

  427. Willard says:

> If you cannot compare two distributions, then you cannot compare their statistics.

    Comparing two distributions doesn’t imply they’re the same, Richie.

    Your “completely different” reveals your trick.

  428. Richard,
    Or, you could simply try answering the question that I posed?

  429. Magma says:

    @ Richard Tol: I had overlooked Joshua’s earlier hypothetical example of a “Brenda Schultz”. However his context and usage was different than yours, and could not be construed as an insult, whether intended or not.

  430. Willard says:

    Seems that opluso has learned some French:

    What is odd is that anyone denies or even downplays this fact. […] The paper may be mostly merde but it has been a very useful meme.

    https://judithcurry.com/2016/04/17/the-paradox-of-the-climate-change-consensus/#comment-779247

    His “C13, political” meme is evolving. The impression he had all along is now confirmed.

  431. opluso says:

    Hi Willard. You left out the thread context where Steven Mosher refers to Cook’s paper as “poop”. Probably just another minor oversight on your part.

  432. Dikran Marsupial says:

    oh well, that’s all right then..?

  433. @wotts
    Suppose you rate paper 1, 2, 3 as endorse, no-position, reject.
    Endorsement rate is 0.5.

    Suppose you rate the abstracts of papers 1, 2, 3 as reject, no-position, endorse.
    Endorsement rate is 0.5.

    You may conclude that the endorsement rate is the same. You may also conclude that 2/3 of ratings are different.

    In Cook 2013, the consensus rate is 97% for both papers and abstracts. Yet, ratings are different in 63% of cases.

    Because the consensus rate is not a sufficient statistic, you select a particular aspect of the distribution while throwing away most information in the data. It’s a bit like saying “most everybody looks the same because most people have two ears”.

  434. Willard says:

    > You left out the thread context […]

    Moshpit Made opluso Do It, no doubt.

    The first sentence was far more interesting.

  435. Richard,

    In Cook 2013, the consensus rate is 97% for both papers and abstracts. Yet, ratings are different in 63% of cases.

Hmmm, I believe that this is only true if you regard an abstract rated as 2 to be different to a paper rated as 1, or an abstract rated as 5 to be different to a paper rated as 6, etc. What is the difference if you only consider endorse, no position, reject?
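
    A toy sketch of why those two ways of counting can diverge so sharply (the paired ratings are made up for illustration, not Cook13 data):

    # Pairs of (abstract rating, self-rating) on the 7-point scale.
    pairs = [(1, 2), (2, 1), (3, 2), (2, 3), (4, 3), (3, 4), (7, 6), (6, 7)]

    def bucket(r):
        # Collapse 1-3 to endorse, 4 to no position, 5-7 to reject.
        return "endorse" if r <= 3 else "no position" if r == 4 else "reject"

    exact = sum(a != b for a, b in pairs) / len(pairs)
    coarse = sum(bucket(a) != bucket(b) for a, b in pairs) / len(pairs)
    print(f"disagree on the 7-point label: {exact:.0%}")            # 100%
    print(f"disagree on endorse/no position/reject: {coarse:.0%}")  # 25%

    # Every pair differs on the fine-grained scale, yet none crosses from
    # endorsement to rejection; only the ratings that move across the
    # no-position boundary register at the coarse level.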

  436. Willard says:

    > ratings are different in 63% of cases.

    After pulling the “same” trick, you’re pulling the “different” one, Richie.

    There’s “different,” and then there’s “different”. It’s quite possible to take no position in an abstract while endorsing explicitly AGW in a paper. Which means your Kappa stuff is pure crap. If there were substantially more authors who’d reject AGW in their papers than we can see in the abstracts, it would show in the surveys.

    I can’t recall the number of times we’ve been through this.

  437. Dikran Marsupial says:

    Richard, if the ratings of the abstracts were the same as the author ratings, that would imply that (i) the body of the paper carried no relevant information that wasn’t already in the abstract and (ii) the citizen science raters of the abstracts are just as good at deducing the authors position as the authors themselves. These are both obviously absurd assumptions, so why perform a statistical test to determine if they are true?

    This is the problem with “cookbook statistics”, you need to know which recipe you ought to be following.

  438. izen says:

    @-“Suppose you rate paper 1, 2, 3 as endorse, no-position, reject.
    Endorsement rate is 0.5.
    Suppose you rate the abstracts of papers 1, 2, 3 as reject, no-position, endorse.
    Endorsement rate is 0.5.”

    If the papers author rated as ‘reject’ (type 3) are outnumbered by the papers author rated endorse (type 1) by 98-to-2 then that scenario is mathematically impossible.

  439. @wotts
    The consensus rate is 97% for paper ratings. The consensus rate is 97% for all abstract ratings, but 99% for the matched sample.

    63% of abstract ratings differ by at least 1 point, 25% differ by more than 1 point, and 5% by more than 2 points; 0.7% of ratings were rejections in one case and endorsement in the other.

  440. Dikran Marsupial says:

    Richard wrote “0.7% of ratings were rejections in one case and endorsement in the other.”

    This is the only one that really matters for comparing the consensus rates, and as you can see, the difference is not very large, which is presumably why Richard opted for the 63% figure.

  441. Dikran Marsupial says:

    Reminder #3:

    Richard Tol would you expect the consensus estimates from the abstract survey to be asymptotically* identical to those from the author survey? Yes or No.

    * i.e. in the limit that all relevant papers were rated, rather than just a finite sample.

  442. Dikran Marsupial says:

    Reminder #2:

    Richard Tol wrote ”In this context, the number of abstracts includes duplicates.”

    If the context included duplicates, please could you explain your use of “yet” in your statement:

    “Cook et al (2013) state that 12 465 abstracts were downloaded from the Web of Science, yet their supporting data show that there were 12 876 abstracts.”

    given that there is no incongruity (implied by “yet”) between the two figures if duplicates are present?

  443. Richard,
    Okay, so what’s your problem?

  444. @dikran
    The consensus rate is defined as (R1+R2+R3)/(R1+R2+R3+R5+R6+R7). By construction, it is insensitive to measurement error. For example, you can reclassify all R1 as R3 without affecting the result. It is a silly statistic.
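
    A minimal sketch of the statistic as just defined, confirming the invariance described (the counts are made up):

    def consensus_rate(R):
        # R maps rating category (1-7) to counts; category 4 (no position)
        # is excluded, exactly as in the definition above.
        endorse = R[1] + R[2] + R[3]
        reject = R[5] + R[6] + R[7]
        return endorse / (endorse + reject)

    R = {1: 64, 2: 900, 3: 2900, 4: 7900, 5: 50, 6: 15, 7: 10}
    print(consensus_rate(R))    # about 0.981
    R[3] += R[1]; R[1] = 0      # reclassify every category-1 rating as 3
    print(consensus_rate(R))    # unchanged

    # Whether that insensitivity makes it "silly" depends on the question:
    # it is exactly the right statistic if the endorse/reject split is what
    # you want to measure.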

  445. Richard,
    It’s not silly if it is what you want to know.

  446. I think I’ve just checked Richard’s numbers. I get that 0.6% of ratings were rejections in one case and endorsements in another. And I get that 0.65% were endorsements in one case and rejections in the other. Okay, so what?

  447. Dikran Marsupial says:

    Richard, that is not answering my questions. Have you never heard of hierarchical classification? The existence of confusion between sub-categories doesn’t mean the higher levels of the hierarchy are meaningless.

For example, if you are interested in classifying cycads, then the top-level classification could be to distinguish between the Cycadaceae and Zamiaceae families. Does it really mean the classifier cannot distinguish between Cycadaceae and Zamiaceae because it has difficulty distinguishing Cycas revoluta from Cycas sphaerica? No, of course not, that would be silly; your argument is no better.

  448. Willard says:

    > 63% of abstract ratings differ by at least 1 point

    The difference between 1 and 2 is not the same as the difference between 2 and 1, Richie. The difference between 1 and 2 is not the same as the difference between 3 and 4. Et cetera.

    Is that how you got a Kappa of “8%”?

  449. opluso says:

    Willard:

    Cribbing from Curry is OK but it would be more impressive if you could score your points with full and accurate quotes. Let’s see if I can find an appropriate Willard quotation for this situation:

    INTEGRITY ™ – That’s What It Takes

Yet even after disregarding your own aphorism, you still weren’t finished with your cryptic insinuations:

    The first sentence was far more interesting.

    Yes, it was. So I will quote myself, as you seem reluctant to do so:

    Cook’s survey was specifically intended as a climate communications tool. What is odd is that anyone denies or even downplays this fact. Even Cook placed his own paper in this context.

In other words, the same point I’ve made on this blog, to a resounding chorus of confirmation bias boos.

    And here are a couple of Mosher quotes so that you will have even more reason to blush and seek the fainting couch. Taken from the lead post of the thread to which I responded (nesting is your friend).

Cooks study is a pile of Dung.
    Said so from day 1. You need to read more

    Example: Folks like Me and Richard Tol, both believe in AGW
    Both recognize the danger
Both think Cooks study is Poop.

You may now recontextualize to your heart’s content.

    DM:

    oh well, that’s all right then..?

    I thought we were still on a timeout. Does this mean we can play together again? Good, because I missed you, too.

  450. Maybe we could avoid going over old ground again, and again, though.

  451. Well, Wotts, you’ve looked at the data. If you’re happy with that quality …

  452. Willard says:

    > Yes, it [the first sentence] was [interesting]. So I will quote myself,

    No need, for I was referring to the one I quoted:

    What is odd is that anyone denies or even downplays this fact.

    Cf. also the comment I left at Judy’s:

    https://judithcurry.com/2016/04/17/the-paradox-of-the-climate-change-consensus/#comment-779250

    Read harder.

    ***

> You may now recontextualize to your heart’s content.

I underlined that opluso used the word “merde” because there was a discussion about an old French saying earlier. The Moshpit attribution was unrequired because opluso used a counterfactual. It is already obvious that opluso is economical with his commitments, e.g.:

    It continues to be defended as if it were a political document since the methodology is, depending upon your political perspective, either fatally flawed or so nearly perfect as to merit no further argument.

    https://andthentheresphysics.wordpress.com/2016/04/13/consensus-on-consensus/#comment-76287

Sound Science ™ at work.

  453. opluso,

    Cook’s survey was specifically intended as a climate communications tool.

    Why do you think people publish papers?

  454. BBD says:

    To mislead the public about the strength of the scientific consensus, obviously, ATTP. Do keep up.

  455. Willard says:

    That can’t be that, BBD, for as Richie himself said:

    The consensus is of course in the high nineties. No one ever said it was not

    Note that Richie’s using a verbal description.

    Also recall that according to opluso, the consensus is “trivial,” whatever that means.

  456. Joshua says:

Just think. Children in the future might think that the consensus is 97% when in reality it is in the high nineties.

  457. The Very Reverend Jebediah Hypotenuse says:

    Richard Tol (@RichardTol) says:

    Well, Wotts, you’ve looked at the data. If you’re happy with that quality …

    Someone is not happy with the quality? Aw, shucks.

    Here’s an idea…
    Instead of Richard Tol being all unhappy about the quality of data use in the published work of others, why doesn’t he publish his own super-duper-high-quality-data study?

    Gremlin-free, if possible.

    Sometimes, the moral high-ground gets really over-crowded.

  458. why doesn’t he publish his own super-duper-high-quality-data study?

    I think that would take actual effort.

  459. Willard says:

    I’d settle for Richie to free the 300.

  460. John Hartz says:

    ATTP, Dikran, Willard, et al:

A reliable source* inside Deniersville has informed me that you have been interacting with the Artful Dodger algorithm created by Richard Tol in 2013 and not with Richard Tol himself.

    *A double-agent recruited by John Cook a few years ago.

  461. guthrie says:

    Pbjamm- hmm, that’s interesting, but a little before I was paying attention to such things.
I was rather thinking of “The Dosadi Experiment” and the Gowachin legal system. Being found innocent meant you were in league with the powerful, so you should be executed, and being guilty meant you were standing up for ideas or people, and so should go free. Also the losing lawyer gets killed.

  462. Francis says:

At 11:45 RT wrote: “A statistic is a characteristic of a distribution. If you cannot compare two distributions, then you cannot compare their statistics.”

    This appears to me to be a straightforward admission that Andrew Gelman’s critique of his earlier work was correct.

  463. opluso wrote “I thought we were still on a timeout. Does this mean we can play together again? Good, because I missed you, too.”

    You are just demonstrating that my assessment was correct and that you are only interested in rhetoric and not substantive discussion.

  464. Richard Tol wrote “Well, Wotts, you’ve looked at the data. If you’re happy with that quality …”

    Reminder #4:

    Richard Tol would you expect the consensus estimates from the abstract survey to be asymptotically* identical to those from the author survey? Yes or No.

    * i.e. in the limit that all relevant papers were rated, rather than just a finite sample.

    Reminder #3:

    Richard Tol wrote ”In this context, the number of abstracts includes duplicates.”

    If the context included duplicates, please could you explain your use of “yet” in your statement:

    “Cook et al (2013) state that 12 465 abstracts were downloaded from the Web of Science, yet their supporting data show that there were 12 876 abstracts.”

    given that there is no incongruity (implied by “yet”) between the two figures if duplicates are present?

Richard’s also reversed the burden here. I don’t need to have views about the quality of Cook et al.’s data to point out that Richard’s criticisms are weak and – in some cases – almost certainly untrue. I also don’t need to have views about the quality to point out that all the relevant data is available. I also don’t need views about the data to point out that there are now a number of studies producing broadly similar results – the level of consensus is probably in the high nineties (if one considers papers/abstracts or relevant experts). I also don’t need views about the quality of the data to point out that – as yet – no one has done an equivalent study showing that the conclusion that there is a high consensus with respect to AGW is somehow wrong.

To be clear, I don’t really have any issues with the data quality. Having had a closer look last night, it appears that the abstract raters were better at identifying papers that endorse the consensus than at identifying papers that reject it. However, even if one takes a very conservative view as to how wrong the number of reject abstracts could have been (i.e., how many could have been mis-rated, thereby under-estimating the number of reject abstracts), it would still only change the consensus at the percent-or-so level.
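
    As a rough sensitivity sketch of that last point, using the published Cook13 category totals for abstracts stating a position (about 3896 endorse, 78 reject, 40 uncertain; treat the exact figures as approximate):

    endorse, reject, uncertain = 3896, 78, 40
    stated = endorse + reject + uncertain

    def consensus(e):
        # Share of position-taking abstracts that endorse
        return e / stated

    print(f"as published: {consensus(endorse):.1%}")                # 97.1%
    moved = 78    # suppose raters missed enough rejections to double them
    print(f"rejections doubled: {consensus(endorse - moved):.1%}")  # 95.1%

    # Even this deliberately pessimistic assumption moves the headline
    # figure by only a couple of percentage points.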

  466. verytallguy says:

    [Mod: Although this is amusing, I get your point, and you were technically applying it to everyone else, I’m still going to moderate it. I try to stick to the idea that if I mention someone’s paper in a post and they appear in the comments, then I should try to treat them with some courtesy (I don’t always succeed) even if they would not do the same themselves.]

  467. verytallguy says:

    AT, very wise.

  468. @wotts
    There I was thinking that data quality is an important issue in any empirical research.

  469. Richard,
    I’m trying to see where it is that I suggested that data quality isn’t important. Oh, that’s right, I didn’t say it wasn’t important. I think I may have pointed out that you’re reversing the burden.

  470. Willard says:

    Data quality is more than an issue, Richie.

    It’s a concern.

    Thank you for your overall concerns.

    If you could tell us how you got your Kappa of “8%,” that’d be nice.

  471. The Very Reverend Jebediah Hypotenuse says:

    Tol:

    There I was thinking that data quality is an important issue in any empirical research.

    OK, Richard, let’s go there…

    First, you haven’t proven a case that data quality is lacking in Cook13.
    You’ve merely insinuated it repeatedly in public venues.

    Second, your drive-by implication that you are the only person on planet Earth who cares about data quality is vapid.

    Third, a literature review is not empirical research.

    Fourth and finally, you have very clearly demonstrated an inability to think clearly about which issues are important and which are not.

    Thanks for stopping by!

  472. pbjamm says:

Just because the rating system is somewhat subjective does not mean that the quality of their work is poor. I can think of no objective way to rate the position taken in the abstract of a paper with regard to endorsement of a position. If Prof Tol can think of a way to improve the process or do it in a more objective manner then I should think that he would endeavor to redo the study in a way he deems correct. “Your methods suck” is nowhere near as effective as “Your methods suck and here is my proof that they suck”.

  473. @VRJH
    “First, you haven’t proven a case that data quality is lacking in Cook13.”

    I actually have:
    – there are inexplicable patterns in the data
    – original ratings disagreed in 33% of cases, implying an error rate of 7% in the reconciled data
    – paper ratings and abstract ratings disagree in 63% of cases
– there are systematic differences between raters
    – there are systematic differences between early and late raters

    All documented: http://www.sciencedirect.com/science/article/pii/S0301421514002821

  474. Richard,

    I actually have:

    No, you haven’t. I think this is self-evident. If you’d actually proven something, it would be a great deal more convincing.

    – there are inexplicable patterns in the data

    There might be patterns you don’t understand, but I don’t remember the “all peer-reviewed papers must have patterns understood by Richard Tol” rule.

    – original ratings disagreed in 33% of cases, implying an error rate of 7% in the reconciled data

    IIRC, your error correction calculation implied an initial level of consensus that was greater than 100%. This is self-evidently silly.

    – paper ratings and abstract ratings disagree in 63% of cases

    Yes, we’ve been over this already.

– there are systematic differences between raters

    Oh no, my goodness, it wasn’t perfect, as I think the paper discussed.

– there are systematic differences between early and late raters

    Oh my goodness, more evidence of a lack of absolute perfection. What shall we do?

  475. I can’t believe this is still going on…

ATTP: the author self-ratings show that the rating process isn’t inherently flawed. If it were, there wouldn’t have been a difference of only 0.1 percentage points.

  476. Richard Tol wrote There I was thinking that data quality is an important issue in any empirical research.

    In that case you probably should investigate the obvious outliers in your studies that mean the conclusions drawn from the piecewise linear model are unreliable.

    You still haven’t answered my questions, just as you avoided them on the thread linked above. Ironic that you seek clarifications from Cook et al. but refuse to give them regarding your own work.

  477. Collin,
    As I’ve said before, you can’t fault Richard’s persistence.

  478. Willard says:

    > there are inexplicable patterns in the data

    That concern sure means something, Richie, but what exactly?

    ***

    > there are systematic difference between raters

    The concept of “raters” might deserve due diligence in that claim.

    “Systematic” is also intriguing.

  479. Willard says:

Here’s the only occurrence of “stem” in Richie’s political hit job:

    It is not possible to test whether individual raters systematically deviate from the average.

    No mention that only rater pairs (adjudicated or not) matter for the final ratings.

  480. @wotts
    As I said, there was I thinking that data quality matters.

  481. Vinny Burgoo says:

    Why do people keep hamsters as pets? The consensus view among hamster-owners is that hamster-ownership is driven primarily by the cuteness of hamsters but a few owners and many non-owners (particularly in America) favour other explanations, such as the annoying nocturnal squeak of their exercise wheels being a perverse comfort for lonely insomniacs or that hamsters are pets only while they are being fattened up and the main reason for owning them is to eat them.

    You can’t have too many consensuses about hamsters, particularly if they are strong consensuses. Consequently, we measured three consensuses using two different methods and found that all three are strong.

    Method 1

    We got volunteers to examine 11,944 Tweets containing the hashtags ‘#hamster’, ‘#hamsters’ or ‘#hammieonhiswheel’ and rate them according to the Tweets’ level of agreement with two versions of the hamster consensus: that hamster-cuteness accounts for more than 50% of the motivation for owning a hamster; and a weaker version saying only that hamster-cuteness is a factor in hamster-ownership.

    Here is the rating system used by our volunteers. (Based on Table 2 and our guidelines to raters.)

    Level 1: The Tweet explicitly says that hamster-cuteness is the primary reason for hamster-ownership.
    Example: ‘People keep #hamsters mostly because they are furry and have big eyes.’
    (Guidance: Include here if the Tweet says that hamster-cuteness is >50% of the motivation for hamster-ownership.)

    Level 2: The Tweet explicitly says that hamster-cuteness is a reason for hamster-ownership.
    Example: ‘#Hamsters! Their furriness and big eyes contribute to their popularity as pets.’
    (Guidance: Mention of furriness and big eyes without explicitly tying those attributes to cuteness relegates a statement to implicit endorsement, Level 3.)

    Level 3: The Tweet implies that hamster-cuteness is a reason for hamster-ownership.
    Example: ‘Making #hamsters less furry would make them less popular as pets! #LOL’
    (Guidance: For Tweets that don’t fit Levels 1 and 2 but can safely be assumed to have been motivated by concerns about hamster-cuteness driving hamster-ownership – e.g. Tweets about ways to reduce hamster-cuteness or about modelling hamster-ownership or about what happened to very old hamsters before cuteness, as understood by humans, even existed.)

    Level 4: The Tweet takes no position on hamster-ownership or says that the role of hamster-cuteness is uncertain/undefined.
    Example: ‘Although the extent of cuteness-driven hamster-ownership is inconclusive, here’s #hammieonhiswheel.’

    Level 5: The Tweet implies that hamster-cuteness has a minimal impact on hamster-ownership.
    Example: ‘All or almost all #hamster-ownership worldwide could plausibly be motivated by #hunger.’

    Level 6: The Tweet explicitly minimizes or rejects the view that hamster-cuteness drives hamster-ownership.
    Example: ‘There is no evidence that it’s dangerous to keep #hamsters as pets.’
    (Guidance: Explicitly rejects or minimises cuteness-driven hamster-ownership without putting a figure on it.)

    Level 7: The Tweet explicitly states that hamster-cuteness is a minor reason for hamster-ownership.
    Example: ‘The cuteness of #hamsters as a driver of hamster-ownership is negligible in comparison to other motivators.’
    (Guidance: Explicitly rejects or minimises cuteness-driven hamster-ownership with a specific figure.)

    Our results: 87% of #hamster etc. Tweets saying explicitly that hamster-cuteness’s contribution to hamster-ownership was either >50% or <50% said it was >50%; and 97% of Tweets expressing a view on whether hamster-cuteness played any sort of positive role said that it does.

    (Some people have said that our rating system was confusing and self-contradictory. Well, all I can say to that is that those people must be hamster-haters! Plus see Method 2, for example.)

    Method 2

    We asked the authors of about half of our #hamster Tweets whether they think that hamster-cuteness is a thing. 97% of those who replied said that they do.

    So take that, hamster-haters! It’s called science. Get used to it.

    *

    Not a perfect analogy but it might help some here see the flaws in the Cook 2013 rating system.

  482. Willard says:

    > Not a perfect analogy but it might help some here see the flaws in the Cook 2013 rating system.

    Telling what are those flaws might be more expedient, Vinny.

    Jim Bouldin already tried storytelling. If memory serves right, it ends up with the usual “why not ask those who did attribution studies instead” or something along those lines.

  483. Richard,

    As I said, there was I thinking that data quality matters.

    I agree, it does matter. What doesn’t really matter are the opinions of a single Economics Professor from the University of Sussex, however much he might think that his views trump the views of everyone else.

  485. Vinny Burgoo says:

    Willard, I have already tried the explicit route here. Didn’t work.

    But I might spell out the flaws again later if nobody can spot anything wrong with the hamster system.

  486. Let me repeat the categories here, and I’ll have to type it out, which will be tedious.

    1. Explicit endorsement with quantification – Explicitly states that humans are the primary cause of recent global warming.

    2. Explicit endorsement without quantification – Explicitly states humans are causing global warming or refers to anthropogenic global warming/climate change as a known fact.

    3. Implicit endorsement – Implies humans are causing global warming. E.g., research assumes greenhouse gas emissions cause warming without explicitly stating humans are the cause.

    4. …….

    5. Implicit rejection – Implies humans have had a minimal impact on global warming without saying so explicitly. E.g., proposing that a natural mechanism is the main cause of global warming.

    6. Explicit rejection without quantification – Explicitly minimizes or rejects that humans are causing global warming.

    7. Explicit rejection with quantification – Explicitly states that humans are causing less than half of global warming.

    So, one of the criticisms is that it’s not clear if 2 and 3 mean more than 50%, or could also include less than 50%. Well, this is clearly a sequence from 1 (strongest level of endorsement) to 7 (strongest level of rejection). If the strongest level of rejection is anything that explicitly quantifies the anthropogenic contribution as less than 50%, how does it make any sense to assume that levels 2 or 3 (which are simply the two weaker levels of endorsement) were intended to include abstracts in which the anthropogenic contribution was assessed to be less than 50%? That doesn’t really make sense, as it would imply endorsement categories in which the anthropogenic contribution was less than in the strongest rejection category. Of course, this doesn’t mean that all the raters got this right, but to suggest that 2 and 3 were intended to include anthropogenic contributions of less than 50% just seems bizarre.

  487. VB,

    Willard, I have already tried the explicit route here. Didn’t work.

    What you really mean, I think, is that people didn’t agree. We don’t all have to agree.

  488. Vinny Burgoo says:

    ATTP, for starters, look at the published exemplar for Level 6. All of a sudden, the rating system is about the magnitude of global warming, not its attribution.

    Then look at the Level 1 exemplar, which according to Level 2’s guidelines should be in Level 3. Level 2’s own exemplar too, probably.

    And there’s more.

  489. VB,
    I’m not hugely interested in arguing/discussing this again. If you think each rating category should be judged in isolation, go ahead. Personally, I suspect that such a set of rating categories has to be viewed in context. I’ve expressed my view in my earlier comment. You’re, of course, free to disagree.

    And there’s more.

    Yes, of course there’s more. The audit never ends.

  491. Vinny Burgoo says:

    Familiar territory, ATTP. You’re happy to proclaim the rightness of Cook 2013 but not to look at its weaknesses, even when they are so fundamental that they render the whole exercise pointless.

    But I’ll butt out again and leave it to the hamsters. Think about them for a while. If anything odd strikes you, let me know and I’ll be happy to help.

  492. Familiar territory, ATTP. You’re happy to proclaim the rightness of Cook 2013

    Where have I done this? I don’t think I have. The closest I’ve come is simply pointing out that I think the level of consensus is high. The key view that I do hold strongly is that if someone thinks a study is flawed in some way, then the appropriate response is to do it properly, not spend years complaining about the original study. Showing Cook et al. is wrong really requires doing another study that gets a very different result. Given how much energy has been expended criticising Cook 2013, it should have been possible to have done this by now. That it hasn’t happened is probably telling.

    but not to look at its weaknesses, even when they are so fundamental that they render the whole exercise pointless.

    I disagree with many of the criticisms, which isn’t the same as not looking at them. It’s also slightly ironic that you claim I’ve proclaimed the rightness of Cook 2013 (when I haven’t really) while you seem comfortable proclaiming the wrongness of Cook 2013.

  494. Richard Tol wrote “As I said, there was I thinking that data quality matters.”

    As I said, you probably should investigate the obvious outliers in your studies that mean the conclusions drawn from the piecewise linear model are unreliable.

    You still haven’t answered my questions, just as you avoided them on the thread linked above. Ironic that you seek clarifications from Cook et al. but refuse to give them regarding your own work.

  495. John Hartz says:

    To better understand the propaganda game that Richard Tol is playing, read:

    Misinformation spreads rapidly — and intentionally — in right-wing echo chambers. Here’s how they do it

    Lie big, lie often, never back down: Donald Trump, Fox News and the real reason why right-wing lies stick by Ari Rabin-Havt, Salon, Apr 18, 2016

  496. Willard says:

    > I have already tried the explicit route here. Didn’t work.

    A link might be nice.

  497. @wotts
    I agree: My opinion is but one. I therefore invite you to assess the quality of Cook’s data yourself. Stop attacking me for a while. Just sit down, look at what they did, and wonder what you’d do if an undergraduate would want to use these data for a minor paper.

  498. Richard,

    Stop attacking me for a while.

    Firstly, I’m not really attacking you. That’s more your style. Secondly, you have published something that is almost certainly untrue in a peer-reviewed paper, and that you almost certainly knew to be untrue before publishing. I think that’s a pretty clear indication of the quality of your response to Cook et al. (2013). Additionally, your earlier paper implied an initial consensus that was greater than 100%. This is clearly nonsensical and – again – would say something about the quality of your work.

    Just sit down, look at what they did, and wonder what you’d do if an undergraduate would want to use these data for a minor paper.

    I have looked at it. I’ve even rated > 100 abstracts myself. I’ve even repeated a number of your tests. Is the data quality perfect? No. Are the problems so severe that I would dismiss this study? No. Given that I’m more interested in understanding the answer to the question they’re trying to pose, how different could it be? Not much. Of course, if someone actually went and did a new study in a way that was better, that would be great. That no one has even attempted this probably tells us a lot.

    So, Richard, why don’t you sit down and think about this for a while. You’ve conducted a campaign against a paper, the result of which you don’t dispute. During this campaign, your behaviour has been – IMO – not that of a senior academic who wishes to appear objective and wishes to be taken seriously. During this campaign you’ve consorted with people who’ve stolen and publicised private material (in fact, I think you may have even published something from stolen communication). You’ve now managed to publish a paper with a claim that is almost certainly not true. So, if anyone should sit down and assess the quality of some of your work, it’s – IMO – you and your work.

  499. Dikran Marsupial says:

    I think Richard should also consider why he continually refuses to answer questions that clarify his position or address the objections to his criticisms.

  500. The Very Reverend Jebediah Hypotenuse says:


    Stop attacking me for a while. Just sit down, look at what they did, and wonder what you’d do if an undergraduate would want to use these data for a minor paper.

    Nobody is attacking you, Richard Tol.
    The work you have published has been criticized and found wanting, just as you are attempting to do to Cook13. See how that works?
    Please go play your poor-me victim card somewhere else.

    Your insinuation that Cook13 is a failure-quality undergrad paper, and that you happen to be the only person here who can see that, is asinine.

    Your moral posturing is particularly insipid coming from a GWPF member. It’s a pity that you don’t apply your amazing powers of deconstruction to the stuff that comes out of Montford, Ridley or Essex. Oh, well.

    Cook13 is one of many ‘consensus’ papers – that all reach essentially the same conclusion.
    Even you do not dispute that professional acceptance of ‘the consensus’, variously defined, is in the 90s.
    Nor does any reasonable and well-informed person.

    Given the above, observers are left to conclude (if, in fact, they bother to care at all about your beliefs) that your critique of Cook13, and your performance-art on this blog, are the products of an attention-seeking concern-troll.

    The fact is that none of this ‘consensus quantification to arbitrary precision’ BS really matters.

    In case you haven’t noticed, human beings are currently running a global experiment that includes thermal, chemical, and ecological rates of change that have no precedent in the Earth’s history save for mass extinction events.

    That anyone is taking the opportunity to seriously engage with the opinions of Richard Tol on the subject of a 3-year-old study by Cook on the declared degree of consensus among a particular set of raters just goes to show how easy it is to use the internet as a weapon of mass distraction.

    That is all.

  501. John Hartz says:

    The Very Reverend Jebediah Hypotenuse:

    Well said! Bravo!

  502. John Hartz says:

    Fodder for a new OP:

    Non-linear events can affect climate change, Op-ed by Gwynne Dyer, The Border Mail (Australia), Apr 20, 2016

    PS – Although the title of Dyer’s Op-ed is a tad confusing (to me at least), his text is not.

  503. matt says:

    510 comments! ATTP, is this a record?

  504. matt,
    No, surprisingly, it’s not. I think this post still holds the record. However, if I do want lots of comments, all I need to do is write about the consensus 🙂

  505. John Hartz says:

    ATTP: …or “nonsense” papers posted on Curry’s blog. 🙂

  506. Eli Rabett says:

    They shoot overlong comment threads, don’t they?

  507. Shollenberger has now completed part 3 of the Cook Six Demolition Project:
    http://www.hi-izuru.org/wp_blog/2016/04/consensus-chart-craziness-part-3/

  508. Richard,
    Indeed, let’s promote the work of someone who regards stealing and publicising private information as acceptable. Especially given that he seems incapable of understanding the idea of an ordinal scale and hasn’t bothered trying to add the numbers 5.5 and 42.45 together.

    I also noticed that you’re promoting the silly publication bias paper and that you’ve been commenting on Lubos Motl’s post about it. Why would you want to associate with someone who was apparently fired from Harvard for unprofessional behaviour related to publicly smearing other academics? Okay, sorry, silly question.

  509. Alternatively, Wotts, you may want to engage with the arguments put forward.

  510. Richard,

    Alternatively, Wotts, you may want to engage with the arguments put forward.

    With Ridley or with Shollenberger?

  511. Shollenberger (here), Ridley (on the other thread) and Motl (on thread #3 — he’s a physicist like you).

  512. Richard,
    IMO, anyone who starts with accusations of dishonesty should not expect those he’s accused of lying to then engage with him. This is especially true if that person appears to then ignore their own lapses in integrity. Shollenberger is simply a mouthy blogger who has strong opinions and is virtually never swayed. It would be an utter waste of time. He’s welcome to his views. You’re welcome to promote them. Apart from a vocal minority, I don’t think many others care.

    Ridley didn’t make an argument that can be engaged with. He’s just moaning about the fact that he writes stuff that is then criticised. It’s all a bit pathetic.

    The less said about Lubos Motl the better.

  513. Marco says:

    Sorry, ATTP, but Lubos Motl *should* be mentioned, especially considering Richard’s prior comments about ‘environmentalists’ having to denounce certain people. One wonders why Richard even dares to mention Motl, considering that the latter has repeatedly called for violence against and even the death of certain people he disagrees with.

  514. @marco
    If you go over to Motl’s place, you’d see a sub-discussion on misogyny.

  515. Richard,
    Your point? How does the existence of a sub-discussion on misogyny on Motl’s blog address the fact that he has called for violence against and even the death of certain people he disagrees with?

  516. verytallguy says:

    @Tol
    If you go over to Ackerman’s place you’ll see a discussion on obsessive feuds

  517. @wottsywotts
    Marco wondered whether I call out bad behaviour when I see it, and I provided a recent example to ease his mind.

  518. Richard,
    I certainly haven’t found it. Not that I looked very hard. I think Marco’s point was somewhat subtler than that, but feel free to ignore that if you wish.

  519. @wotts
    “subtle” may not be the right word. Marco seems to argue that because someone wrote something bad once, all her writings should be ignored.

    Ars Rhetorica used to be on the high school curriculum.

  520. Richard,
    I think that’s a rather simplistic interpretation. I think it’s not just once, and it also depends on whether or not the person has since distanced themselves from (apologised for?) saying what they said. Also, the implication was that you feel more than comfortable suggesting guilt by association (even when there is no known association, in some cases) while associating with others whose behaviour leaves much to be desired. It comes across as somewhat hypocritical. However, you’re – of course – free to associate with whomever you would like.

    I also realise that you tend to use “her” when gender is not known (or claim to, at least). However, in this case, we are talking about someone who is almost certainly male.

  521. Richard, since you are back…perhaps you would like to answer these questions:

    Reminder #5:

    Richard Tol would you expect the consensus estimates from the abstract survey to be asymptotically* identical to those from the author survey? Yes or No.

    * i.e. in the limit that all relevant papers were rated, rather than just a finite sample.

    Reminder #4:

    Richard Tol wrote ”In this context, the number of abstracts includes duplicates.”

    If the context included duplicates, please could you explain your use of “yet” in your statement:

    “Cook et al (2013) state that 12 465 abstracts were downloaded from the Web of Science, yet their supporting data show that there were 12 876 abstracts.”

    given that there is no incongruity (implied by “yet”) between the two figures if duplicates are present?

  522. Marco says:

    ““subtle” may not be the right word. Marco seems to argue that because someone wrote something bad once, all her writings should be ignored.”

    I am quite certain that you, Richard Tol, expressed a clear wish that certain people were to be ostracised because they had done something bad. ATTP was called out by you because he hadn’t done that (even though he had nothing to do with these people).

    What I thus did was to point out your blatant hypocritical behaviour. Motl isn’t ostracised for writing “something bad once”, he is ostracised because he consistently behaves like a buffoon, and has repeatedly recommended violence against people who don’t think like him. He may have something valid to say at times, but he has not earned that anyone listens to him.

    I think the pattern has become quite clear: you attack people for doing something you disagree with, citing all kinds of moral and ethical issues, and then happily defend those you agree with, even though they more blatantly violate those same moral and ethical issues you brought up earlier. Yourself first, of course, since no one should doubt that you should retract your comment to Cook et al 2013 for making statements you knew were likely not true (see DM’s comment).

    ATTP, Richard knows full well that Lubos is male. The two have interacted so many times on Motl’s website that he cannot have missed this (I notably found no evidence of Tol denouncing Motl on Motl’s latest misogynistic thread, but there are so many to choose from; who knows, maybe he did on a slightly earlier thread).

  523. Joshua says:

    I think I may see a pattern:

    From Richard…

    ==> Marco seems to argue that because someone wrote something bad once, all her writings should be ignored.”

    From Matt:

    ==> Yet here he seems to be saying that The Times should censor inconvenient stories.

  524. @Dikran
    #5
    Yes. If the rating of abstracts and papers would be unbiased or at least consistent, then abstract and paper ratings would converge. (In Cook’s case, we know that the matched sample disagrees.)

    #4
    I think you are reading a lot into a three-letter word.

  525. Richard,

    If the rating of abstracts and papers would be unbiased or at least consistent, then abstract and paper ratings would converge.

    That would seem to suggest that abstracts say everything that could be said in a paper. This is clearly not the case.

    I think you are reading a lot into a three-letter word.

    I think you’re dodging the question. Let’s remind ourselves of the background. You made a claim in a peer-reviewed paper that is almost certainly not true and that you almost certainly knew to not be true before publication.

  526. John Hartz says:

    I see that Richard Tol’s Artful Dodger algorithm is back up and running! Should we laugh or cry?

  527. @wotts
    It works fine if the information in the abstract is a subset of the information in the paper. It even works fine if abstract and paper show a subset of all true statements. It only goes wrong if abstracts systematically contradict the corresponding papers.

  528. Richard,

    It works fine if the information in the abstract is a subset of the information in the paper.

    Indeed.

    It only goes wrong if abstracts systematically contradict the corresponding papers.

    Indeed, although you can’t rule out a poorly written abstract.

    Now what is your point? I think we’ve already established that even though some abstract ratings contradicted self-rated papers, the impact of this would be minimal. You wouldn’t be arguing for perfection, would you?

  529. @wotts
    The null hypothesis is that abstract and paper ratings are both unbiased, and the distribution of abstract ratings is the same as the distribution of paper ratings. In Cook’s case, the null hypothesis is rejected.

  530. Richard,

    The null hypothesis is that abstract and paper ratings are both unbiased, and the distribution of abstract ratings is the same as the distribution of paper ratings.

    No, it’s not. This would require that the abstracts and papers were the same, which would rather defeat the point of having an abstract.

  531. lerpo says:

    Tol published: “Cook et al (2013) state that 12 465 abstracts were downloaded from the Web of Science, yet their supporting data show that there were 12 876 abstracts.”

    To me it seems like he is implying that Cook deleted inconvenient data, but Tol knew that this was not the case. He had been informed that duplicates were removed.

    In his next paragraph he expresses doubt that it would even be possible to explain the incongruity: “It would be of considerable benefit to readers if these four issues would be clarified, if at all possible.”

    But he already knew the explanation.

    As far as I can tell, his only real answer is this:

    “Were the duplicates removed before being rated, or after? This matters in principle, as data may have been deleted. It also matters in practice, because we know that ratings drifted over time: there are statistically significant differences between early ratings and late ratings.”

    This answer is not satisfying to me. Publishing something that seems to imply inconvenient results were deleted – when he knew this was not the case – is not the best way to resolve questions about where in the process the duplicates were removed.

    I want to give him the benefit of the doubt. His continued caginess on this issue is not helping his case. A clear answer would be appreciated.

  532. lerpo,
    I think the manner in which one resolves these issues is that you insult the person to whom you’re addressing these questions so that they avoid interacting with you directly. Then you submit a formal response to the journal. However, because you can’t simply ask questions, you frame them as claims/statements, even if you know in advance that your claims/statements will not be true. Well, okay, that’s how Richard seems to think it should be done. Personally, I think there are much easier ways to resolve these things.

  533. Marco says:

    “The null hypothesis is that abstract and paper ratings are both unbiased, and the distribution of abstract ratings is the same as the distribution of paper ratings.”

    In my experience, in fields with a strong consensus you do not see nor would you expect such a distribution. It would be those few papers that contradict the consensus view that are more likely to highlight this in the abstract, whereas those that built on the consensus view would not. After all,

  534. @wotts
    In my data generating process, detailed above, the null hypothesis is that the two are the same. Please write down a data generating process in which the null hypothesis is different.

    @lerpo
    According to the paper, abstracts were downloaded in two batches, the first in March and the second in May. The second batch was downloaded after the first ratings were completed, and its focus was on “papers added to the Web of Science up to that date.” The Web of Science presents data in reverse chronological order. The missing abstract IDs, removed because of duplication, are concentrated among the lower numbers. All this suggests that the duplication arose due to the May download, and that already rated abstracts were deleted.

    Unfortunately, John Cook has yet to release his lab notes, so there is no way of knowing whether the above explanation is correct. It should be noted, though, that Cook and co-authors are well aware of the above but have yet to offer a coherent explanation of the missing IDs.

  535. @marco
    “It would be those few papers that contradict the consensus view that are more likely to highlight this in the abstract, whereas those that built on the consensus view would not.”

    In the matched sample (N=2136), 10 abstracts (0.5%) and 39 papers (1.8%) reject anthropogenic warming.

  536. Richard,

    In my data generating process, detailed above, the null hypothesis is that the two are the same.

    I’ve no idea what you’re talking about. I’m talking about the real world in which it is quite possible (in fact, almost certain) that the abstract of a paper will not say all that is said in the paper itself.

    Again, I still have no idea why you think John Cook should answer your questions. Especially as you have made a number of claims that have turned out to be untrue, and have publicly referred to the authors as the “fraud team”.

  537. pbjamm says:

    I think the only logical answer is for Dr Tol to implement all his superior methods himself and then compare his results to those found by Cook. Certainly he could find some competent and non-biased raters to follow his exacting instructions and put this to bed once and for all.

  538. The Very Reverend Jebediah Hypotenuse says:

    pbjamm:

    …and put this to bed once and for all.

    That would be really swell.

    As everyone should know by now, Richard Tol and the rest of the highly-skilled and ethically-spotless GWPF gang exist to help the scientifically underprivileged.

    If only they could get a fair hearing.

  539. @wotts
    The data generating process is the hypothesized mechanism that led to the observations at hand. It is the basis for any hypothesis to be tested.

    I put my null hypothesis on the table, as did Marco. Our null hypotheses were roundly rejected by the data. What’s your null hypothesis?

  540. Richard,
    It’s certainly not this. You do realise that rejecting your chosen null doesn’t necessarily mean that you can conclude anything other than you’ve rejected that null? Your null hypothesis has to have some relation to reality. This is – obviously – a rhetorical question.

  541. Wotts: All agreed. So, what’s your reality-related null?

  542. Richard,
    Alternatively, what does your null tell you? It tells you that the two distributions are not the same. Given that we wouldn’t expect them to be the same, this is not a surprise. In some cases you don’t need statistics to illustrate the bleedin’ obvious.

  543. Dikran Marsupial says:

    Richard wrote “#4 I think you are reading a lot into a three-letter word.”

    No, that *is* what “yet” means. This is obvious evasion; I have now asked this question FIVE times, and yet you can still only reply without actually answering the question. Here, have another try:

    Reminder #5

    Richard Tol wrote ”In this context, the number of abstracts includes duplicates.”

    If the context included duplicates, please could you explain your use of “yet” in your statement:

    “Cook et al (2013) state that 12 465 abstracts were downloaded from the Web of Science, yet their supporting data show that there were 12 876 abstracts.”

    given that there is no incongruity (implied by “yet”) between the two figures if duplicates are present?

  544. Dikran Marsupial says:

    Richard wrote “The null hypothesis is that abstract and paper ratings are both unbiased, and the distribution of abstract ratings is the same as the distribution of paper ratings.”

    It has been pointed out to you repeatedly that we know a-priori that the distributions of abstract ratings and paper ratings cannot be reasonably expected to be identical (firstly because the paper contains more information than the abstract and secondly because the authors are more expert on their own work than the volunteer raters). So what is the point of performing a test when we know a-priori that the null hypothesis is false from the outset?

    “In Cook’s case, the null hypothesis is rejected.”

    Which is not in the least surprising or interesting to anyone with a reasonable grasp of statistics.

  545. Dikran Marsupial says:

    I asked “Richard Tol would you expect the consensus estimates from the abstract survey to be asymptotically* identical to those from the author survey? Yes or No.

    * i.e. in the limit that all relevant papers were rated, rather than just a finite sample.”

    Richard eventually replied (after the question was asked SIX times):

    “Yes. If the rating of abstracts and papers would be unbiased or at least consistent, then abstract and paper ratings would converge. (In Cook’s case, we know that the matched sample disagrees.)”

    Thank you for a straight answer.

    Unfortunately the answer shows that you do not understand the problem very well. There is more information in the body of the paper than in the abstract, furthermore the abstract is a limited space in which to convey the topic/findings of the paper, so authors may be more likely to put background information (such as statements of the consensus position) into the introduction and conclusions, rather than the abstract. That alone would be enough to cause the distributions to be different (even asymptotically).

  546. @wotts, dikran
    So, your premise is that the two ratings cannot be compared. That implies that their comparison in Cook 2013 and Cook 2016 is meaningless.

  547. Dikran Marsupial says:

    Richard wrote “The data generating process is the hypothesized mechanism that led to the observations at hand.”

    O.K., my data generating process is that the “true” labels for the papers come from a multinomial distribution, P(paper), which reflects the authors’ true position. The abstracts however contain less information, and I would expect a statement of agreement with the consensus to be more likely to appear in the introduction of the paper than the abstract; this also depends on the “strength” of the authors’ position. Thus the abstract ratings are drawn from a multinomial conditioned on the true rating of the paper, i.e. P(abstract | paper). In this case, there is no reason to expect the marginal distribution P(abstract) to be equal to P(paper); indeed, that would be surprising.

    Of course in practice the “citizen-scientist” raters cannot reasonably be expected to know the authors’ true position based on the abstracts, so there ought to be another layer to the model, but if Richard’s null hypothesis is obviously untrue a-priori for perfect raters, I can’t see how using imperfect raters is going to change that.

    Richard is making the strong assumption that all topics (even if they are not directly the subject of the paper) are equally likely to be mentioned in the abstract as in the body of the paper, which is absurd, as anyone who has written a journal paper ought to know.

  548. Marco says:

    “In the matched sample (N=2136), 10 abstracts (0.5%) and 39 papers (1.8%) reject anthropogenic warming.”

    Considering the fact that downrating (less endorsement) by the authors was much less likely than uprating (more endorsement), I still stand by my point.

  549. Richard,

    So, your premise is that the two ratings cannot be compared.

    Wow, you really are struggling here. No, the premise is very simply that abstracts and papers are not the same, therefore one would expect their distributions to not be the same, even in the asymptotic limit.

    That implies that their comparison in Cook 2013 and Cook 2016 is meaningless.

    Here’s what Cook et al. said (paraphrasing)

    1. ~97% of abstracts that take a position endorse the consensus.

    2. ~97% of self-rated papers that take a position endorse the consensus.

    If you think these are entirely unrelated – and hence meaningless – results, feel free. I think most people would regard this as indicating that there is a strong consensus within the literature.

  550. Dikran Marsupial says:

    Richard wrote “So, your premise is that the two ratings cannot be compared. That implies that their comparison in Cook 2013 and Cook 2016 is meaningless.”

    No, of course not, as I have pointed out to you at least once already upthread:

    No, they are measures of very similar, but not identical, things; they are both in the paper because they are consilient, which suggests the findings are sound. However because they are not identical (even asymptotically), a naive null hypothesis statistical test would be deeply misleading.

    This ought to be common sense.

  551. Dikran,

    which is absurd, as anyone who has written a journal paper ought to know.

    Bear in mind that we are dealing with someone who wrote

    Gremlins intervened in the preparation of my paper…

    in one of his recent publications.

  552. Dikran Marsupial says:

    Of course the two distributions can be compared, for instance you could compute their Kullback-Leibler divergence. The point is that if you want to perform a Null Hypothesis Statistical Test (NHST) then you need to understand the problem well enough to know what you can reasonably test for, i.e. what question are you trying to answer. In most cases, testing for something that common sense tells you isn’t true a-priori is a waste of time, and making a big song and dance about the result is likely to make you look silly.
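
    For instance, a sketch with two invented rating distributions (Python; scipy’s entropy returns the KL divergence when given two distributions):

    import numpy as np
    from scipy.stats import entropy

    # Invented distributions over (endorse, neutral, reject), standing in
    # for abstract ratings and self-ratings; the numbers are illustrative only.
    p_abstract = np.array([0.33, 0.66, 0.01])
    p_paper = np.array([0.62, 0.36, 0.02])

    # D(p_abstract || p_paper): how far apart the distributions are, reported
    # as a quantity rather than as an accept/reject verdict.
    print(entropy(p_abstract, p_paper), "nats")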

  553. @wotts
    So, they’re incomparable but the same?

  554. I am as tall as the critical value of the t-test.

  555. Richard,

    So, they’re incomparable but the same?

    No.

    I am as tall as the critical value of the t-test.

    Good for you.

  556. Dikran Marsupial says:

    Richard wrote “So, they’re incomparable but the same?”

    No, you can compare them (KL divergence), they are not the same (but they are consilient), but the naive use of “cookbook statistics” doesn’t tell you anything interesting (and you make yourself look silly by drawing strong conclusions from them).

    How many times does this need to be explained to you?

  557. “So, they’re incomparable but the same?

    No.”

    So:
    – they’re incomparable and not the same;
    – they’re comparable and the same; or
    – they’re comparable but not the same?

  558. Richard,
    They are what they are. Shall I repeat it?

    1. ~97% of abstracts that took a position endorsed the consensus.

    2. ~97% of self-rated papers that took a position endorsed the consensus.

    Of course, one can compare the ratings of papers and abstracts to see if they’re consistent (did a self-rated paper that endorsed the consensus also have an abstract that was rated as endorsing the consensus, and vice versa?), which we’ve already discussed, and discovered that although there was not perfect agreement, the impact of the disagreement was small enough to probably not change the result.
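
    That paired comparison is straightforward to sketch (Python; the matched ratings and the information-loss rate below are invented for illustration, not Cook et al.’s actual data):

    import numpy as np

    rng = np.random.default_rng(1)

    # Invented matched ratings (0=endorse, 1=neutral, 2=reject) for the same
    # 2000 papers: one rating from the abstract, one from the author.
    paper = rng.choice(3, size=2000, p=[0.62, 0.36, 0.02])
    lost = rng.random(2000) < 0.3  # some abstracts read as neutral
    abstract = np.where(lost, 1, paper)

    # Paired cross-tabulation: rows are abstract ratings, columns are paper
    # ratings. This shows who agreed with whom, which marginals alone cannot.
    table = np.zeros((3, 3), dtype=int)
    np.add.at(table, (abstract, paper), 1)
    print(table)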

    Maybe we can avoid going in circles. I realise that you seem to think that if you repeat something enough times, eventually people will agree with you. This may actually, on occasion, work but that’s probably just to get you to stop, rather than because they actually agree.

  559. Dikran Marsupial says:

    It would appear at least once more. They are comparable (and consilient), but not identical.

  560. @wotts
    We are indeed going round in circles. You are now reverting to the position that although the distributions are incomparable, functionals of those distributions are comparable.

  561. Richard,
    No, I’m repeating the fairly obvious point that if I measure something one way, and then measure it a different way, I can at least consider the significance of those results either being about the same, or not. This is not all that complicated, so I don’t really plan to explain it to you again (partly because I’m pretty confident that you do get this, but for some reason – probably your obsessive campaign against Cook et al. – are completely unwilling to acknowledge it).

  562. Dikran Marsupial says:

    “We are indeed going round in circles, You are now reverting to the position that although the distributions are incomparable, functionals of those distributions are comparable.”

    No Richard, I don’t think anybody is saying the distributions are incomparable, just that they are not (even asymptotically) identical, and hence your naive statistical test is fundamentally wrongheaded.

  563. verytallguy says:

    Of course you’re going round in circles.

    Tol’s purpose is to amplify flaws that can be argued, however small, in the Cook studies.

    Yours and Dikran’s purpose is to focus on the robustness of the overall conclusions.

    You will continue to orbit until someone decides to let the other have the last word. It could be a long wait.

  564. You will continue to orbit until someone decides to let the other have the last word. It could be a long wait.

    I could always start just deleting Richard’s comments and then he can go and complain – via Twitter – to my university about being censored.

  565. @dikran
    Fine. So the two distributions are different. Then you would also expect functionals of those distributions to be different. So what do we make of Cook’s finding that the functionals are very similar?

  566. Richard,

    Then you would also expect functionals of those distributions to be different.

    No, you wouldn’t necessarily expect this.

  567. Dikran Marsupial says:

    I am patient and don’t mind going round in circles too much, as VTG says I am interested in the robustness of the conclusions. If Richard has a valid point, I am willing to listen to his arguments (and point out the flaws I see in them). There is a possibility that Richard will see where he is going wrong and change his mind. Both of these would be a positive outcome. If Richard is playing games, then patiently pointing out his error and answering his questions, even when he doesn’t answer mine and ignores my answers, makes it clear who is playing games and who isn’t, which is also a useful outcome.

  568. @wotts
    Fine. The distributions are different, but their functionals are the same. That implies that you picked the wrong functional? After all, the functional shows similarity where dissimilarity is needed?

  569. Dikran Marsupial says:

    Richard wrote “Then you would also expect functionals of those distributions to be different. So what do we make of Cook’s finding that the functionals are very similar?”

    The fact that the distributions cannot reasonably be expected to be identical does not mean they (and their functionals) are expected to be very different. I would expect them to be similar (which indeed they are) but not identical.

    The question ought to be “how different should we expect the distributions (or their functionals) to be?”. The answer to this is clearly non-zero, so testing for that is obviously wrong-headed. Unless you can answer that, then you are not in a position to design or interpret a meaningful statistical test.

    For those with common sense, we can just consider the natures of the two studies and see their consilience without the need for a formal statistical hypothesis test. Some may say that is not statistically rigorous, but I would argue that a subjective analysis is better than the rigorous application of an obviously inappropriate null hypothesis statistical test (which is deeply misleading, while giving a false impression of rigour).

  570. Richard,
    Again, no. You really do need to stop putting words in people’s mouths.

  571. Dikran Marsupial says:

    “Fine. The distributions are different, but their functionals are the same.”

    That is not what ATTP actually said. The functionals may be less different than the distributions, but nobody is claiming that they are (even asymptotically) identical, so testing for that is still wrong-headed.

  572. OK, I’ll try and second-guess what you mean.

    Please tell me what is (a) the null hypothesis and (b) the appropriate statistical test?

  573. Richard,
    To test for what? I don’t need to do a statistical test to realise that the distribution of abstract ratings and the distribution of paper ratings are unlikely to be the same. What are you trying to determine with your statistical test?

  574. Dikran Marsupial says:

    Richard, there is more to statistics than null hypothesis statistical tests. Now before discussing tests, please tell us what it is we are testing for?

  575. “Among papers expressing a position on AGW, an overwhelming percentage (97.2% based on self-ratings, 97.1% based on abstract ratings) endorses the scientific consensus on AGW.”

    Are these two numbers comparable? If so, are they indistinguishable?

  576. Richard,

    Are these two numbers comparable? If so, are they indistinguishable?

    I don’t know, as far as I can tell 97.1% of abstracts that took a position endorsed the consensus and 97.2% of self-rated papers that took a position endorsed the consensus (there are also errors that I worked out earlier, but can’t remember). My sense of this is that it indicates a pretty high level of consensus in the literature, irrespective of whether you judge this from abstracts, or from the papers themselves. Maybe you disagree? Maybe despite these two analyses, we can still say nothing about the level of consensus in the scientific literature. Maybe, by some chance, there are vast quantities of papers and abstracts that reject the consensus, but that just didn’t turn up in the database search? Maybe there’s a world-wide conspiracy and the authors avoid saying anything that would dispute the consensus in their abstracts and were too scared to admit this when asked to self-rate their papers? Maybe all the authors whose papers rejected the consensus didn’t respond to the emails asking for their self-ratings? There are a whole host of possibilities if you’re willing to put some effort into making them up. Maybe the Moon really is made of cheese?

    On the other hand, what we do know, is that in Cook et al.

    1. 97% of abstracts that took a position endorsed the consensus.

    AND

    2. 97% of self-rated papers that took a position endorsed the consensus.

  577. Thanks, Wotts. You seem to agree with me and Cook that a comparison of these two numbers is meaningful.

    So, what statistical test would you use for the null hypothesis that the numbers are the same?

  578. Richard,
    I don’t know, let’s think. If I was to use two different methods to try and estimate the same thing, and both methods returned a result that was statistically consistent (within the uncertainties), I might conclude that the result that these two methods give provides an indication of whatever it is I was trying to measure. What would you do?

    Technically, however, in this case we have a measure of the level of consensus from abstracts and a measure of the level of consensus from the whole papers. They’re not quite measuring the same thing. However, I think one would need to be an awful pedant to conclude that this did not provide some information as to the level of consensus in the literature.

  579. Dikran Marsupial says:

    Richard wrote “Thanks, Wotts. You seem to agree with me and Cook that a comparison of these two numbers is meaningful.

    So, what statistical test would you use for the null hypothesis that the numbers are the same?”

    Two things being comparable does not imply that they are the same. We know a-priori that they are not identical, so what is the point of testing for that? What does it tell us that we don’t already know?

    I think the problem is that Prof. Tol doesn’t actually know what to test for (i.e. what question to ask) and is just naively following the standard recipe from the statistics cookbook, without stopping to consider whether that is a sensible thing to do.

  580. Wotts; Waffle is not a statistical test.

  581. Richard,

    Waffle is not a statistical test.

    Indeed, but not everything needs a statistical test.

  582. From Dikran’s link.

    The essence of the ritual is the following:
    (1) Set up a statistical null hypothesis of “no mean difference” or “zero correlation.” Don’t specify the predictions of your research hypothesis or of any alternative substantive hypotheses.
    (2) Use 5% as a convention for rejecting the null. If significant, accept your research hypothesis.
    (3) Always perform this procedure.

    I think we should start calling Richard “Pogo”.

  583. Dikran Marsupial says:

    Richard wrote ““Among papers expressing a position on AGW, an overwhelming percentage (97.2% based on self-ratings, 97.1% based on abstract ratings) endorses the scientific consensus on AGW.”

    Are these two numbers comparable?”

    Yes, 97.1% is similar to 97.2%; there, I have compared them.

    “If so, are they indistinguishable?”

    Given that we don’t expect them to be identical, if they are statistically indistinguishable that just means the sample isn’t big enough to confirm what we know a-priori to be true. So what?

    A more important question than whether there is a statistically significant difference between the two numbers is whether there is a practically significant difference between them. The answer to the second question is obviously “no”; a 0.1% difference in estimating the consensus isn’t going to change anybody’s fundamental position. Of course this is one of the questions a good statistician will think about before applying a NHST.

  584. Dikran Marsupial says:

    Richard wrote “Wotts; Waffle is not a statistical test.”

    It is however preferable to the application of an inappropriate statistical test that gives a misleading impression of a problem with the study. As has already been pointed out.

    Richard, what question are we trying to answer with the statistical test? We know a-priori that the numbers cannot reasonably be expected to be identical, so it would be pointless to ask if the results of the two studies are plausibly identical, so what question should we ask?

  585. Dikran Marsupial says:

    BTW, the importance of intuition (i.e. understanding the problem well enough to judge whether a statistical method is reasonable) and rigor has been discussed with Richard before now; see here and here.

    Sadly this discussion provides a salutary example of the need for both intuition and rigor in statistics. Prof. Tol is capable of rigor, but doesn’t seem to understand why the test he rigorously performs doesn’t tell you anything that isn’t already known, and certainly isn’t an indication of any problem with either survey.

  586. @dikran
    So, let’s turn the question on its head. How would you discover any indication of any problem with either survey?

  587. fourecks says:

    If you found, say, about 300 papers that had been completely misclassified as having endorsed the consensus when they didn’t?

  588. Dikran Marsupial says:

    Richard wrote “So, let’s turn the question on its head.”

    No, let’s not; it would be far more productive for you to realise that the test you have been advocating on this thread is meaningless, and why it is meaningless. The best way to do that is for you to answer the questions you have been asked already.

    “How would you discover any indication of any problem with either survey?”

    Before answering that, I’ll make an observation. Richard, perhaps you should ask yourself why you are trying so hard to find an indication of a problem with a survey that merely confirms what we all (including yourself) believe to be true:

    “The consensus is of course in the high nineties. No one ever said it was not.”

    I would have thought that your time would be better spent looking for an indication of problems with studies that make great claims or say something surprising (or perhaps your own studies, for instance considering the effects of outliers on models fitted to small datasets).

    To answer your question, I would start by working out what I should reasonably expect, given the natures of the two studies. I would then consider whether the results of the surveys are consistent with those expectations. I am not an expert on surveys, and neither are you AFAICS, which means neither of us is particularly well placed to look for problems.

  589. Zenghelis had a brilliant put-down for the assertion that his optimum violated its first-order conditions: Newton is long dead.

    So is Pearson.

  590. Dikran Marsupial says:

    Richard, perhaps rather than looking for brilliant put-downs, you should be trying to answer the questions that have been put to you, starting with:

    “Richard what question are we trying to answer with the statistical test. We know a-priori that the numbers cannot reasonably be expected to be identical, so it would be pointless to ask if the results of the two studies are plausibly identical, so what question should we ask?”

  591. @dikran
    On this point, I fully agree with John Cook: These are two methods for measuring the same thing, and the results can and should be compared.

    Indeed, the results can and should be compared. Let’s do that. One gives 97.1% with a 99% confidence interval of ±0.7%, and the other gives 97.2% with a 99% confidence interval of ±1.2%. Seems pretty similar to me. Do you disagree?
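
    For what it’s worth, here is roughly the arithmetic behind those intervals (a normal-approximation sketch; the position-taking sample sizes are my assumptions, chosen because they approximately reproduce the quoted half-widths):

    from math import sqrt
    from scipy.stats import norm

    z = norm.ppf(0.995)  # two-sided 99% interval

    # Assumed numbers of ratings that took a position; illustrative only.
    for label, p, n in [("abstract ratings", 0.971, 4000),
                        ("self-ratings", 0.972, 1200)]:
        half_width = z * sqrt(p * (1 - p) / n)
        print(f"{label}: {p:.1%} ± {half_width:.1%}")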

  593. @wotts
    I agree that for any two distributions, there is a functional that gives the same result.

    As I said, I am as tall as the critical value of the t-test.

  594. Dikran Marsupial says:

    Richard, yet again you haven’t answered the question.

    As I have said repeatedly, comparing them is a good thing, but testing for whether they are statistically indistinguishable is pointless, as we know a-priori that they are not (even asymptotically) the same. Trying to suggest there is a problem with the studies on the grounds that the figures are not statistically indistinguishable is deeply misleading and shows considerable statistical naivety.

    So the question is: if we know that two things are expected to be similar, e.g. both in the high nineties, but not identical, what statistical test establishes whether the results are consistent with our expectations? The null hypothesis that you keep putting forward obviously doesn’t fit the bill, as it is a test of whether the two figures are plausibly identical, which nobody believes is true. If you want to show that there is a problem, then the onus is on you to come up with a suitable test that isn’t just a straw man.

  595. Magma says:

    I would have thought that your time would be better spent looking for an indication of problems with studies that make great claims or say something surprising (or perhaps your own studies, for instance considering the effects of outliers on models fitted to small datasets). — Dikran Marsupial

    You and ATTP are very, very patient debaters. This thread has been exceptionally (maybe excessively) Tol-erant.

  596. pbjamm says:

    @ Richard Tol : So, they’re incomparable but the same?

    I am not a statistician or mathematician but I think I get what ATTP and Dikran are getting at. The abstracts and papers are not the same; you would not expect them to contain all the same information, so rating the two you would expect different results. The fact that rating them both leads to nearly identical results indicates that the abstracts tended to contain enough information to draw an accurate conclusion as to the contents of the paper as a whole. Not the same, but close enough to show the validity of the method.

  597. John Hartz says:

    Magma:

    Like the Energizer Bunny, Tol’s Artful Dodger algorithm just keeps going, and going…

  598. Dikran Marsupial says:

    @pbjamm exactly.

  599. anoilman says:

    Hey… I’m curious, but has Richard Tol fixed all his other papers yet?

  600. @pbjamm, dikran, wotts
    They’re different but the same.

    The chorus of Belle & Sebastian’s Me and the Major comes to mind.

  601. They’re different but the same.

    No, Richard, do you want to try again?

  602. They’re the same but different?

  603. No, do you want to keep trying?

  604. pbjamm says:

    @Richard Tol : No one has said they are the same except for you. They are similar, they are related, they are connected but not the same. As evidenced by the similar results of rating the papers vs rating the abstracts they are similar enough to draw conclusions about the content of the paper simply by evaluating the abstract. I would have thought this fairly obvious since the abstract is kind of the Cliffs Notes version of the full paper. At this point I have completely lost track of what you are objecting to. Please do not take that as an encouragement to start this all over again. Again.

  605. They’re similar enough to take comfort but too dissimilar to permit a formal test for similarity?

  606. No, I don’t think that’s it either. Do you want to keep trying?

  607. Francis says:

    Dr. Tol: As you are actively participating in this thread again, I was wondering if you could clear something up for me.

    Is the planet about 1 C warmer than it was in the late 60s and/or pre-industrial times?
    Do you see any evidence of a “welfare-equivalent income gain”?
    If you do, how do you reconcile that analysis with the recent European Commission report on food stress worldwide?
    If you don’t see any evidence of the income gain, do you have any professional obligation to withdraw the papers which made that argument or at least publish a re-analysis? Is your decision on that point in any way colored by your views of the Cook papers?

    Thanks so much for your attention.

  608. pbjamm says:

    Similar enough to be useful. I have no idea how you would test their similarity in any objective way. Best you could do is compare the paper to the abstract and make a subjective judgement as to whether the abstract accurately describes what is found in the paper. The assumption is that it would, otherwise it is a very poorly written abstract or some sort of error at the printers.

    I think this has degenerated into trolling.

  609. A wonderful conclusion: Similar enough to be useful (purpose undefined) but too dissimilar to allow for a formal test.

    @francis
    Validating these models is difficult because
    (a) the impact of the past climate change is the difference between the world as it was and the world as it would have been had the climate not changed
    (b) climate change is slow, so you need long series to see anything, and data quality and quantity rapidly deteriorate prior to the 1970s
    (c) many impacts are poorly observed even today (e.g., malaria)
    (d) there are many confounders.

    We have made a start: https://ideas.repec.org/p/sus/susewp/6213.html

  610. Dikran Marsupial says:

    Richard wrote “They’re similar enough to take comfort but too dissimilar to permit a formal test for similarity?”

    No, you could perform a formal test of similarity if you really wanted to, but that doesn’t mean the null ritual is the right test. We know that the two distributions are not identical, so there is no point in testing for zero difference; instead we need to test whether the difference is below some threshold, determined by a reasonable model of the difference in information content between the abstracts and papers and between the expertise of the citizen-science raters and the authors. Not being able to come up with a suitable threshold is no reason to adopt an obviously inappropriate “null ritual” test.

    A more important question is “why do we need a formal test when the difference is obviously too small to be of practical significance?”. Perhaps Prof. Tol might want to answer that?
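
    For anyone curious what the threshold test Dikran describes would look like in practice, here is a minimal sketch of a two-one-sided-tests (TOST) equivalence test for two proportions, in Python. The counts below are purely illustrative placeholders, and the margin delta is exactly the quantity that would have to come from a model of the difference in information content between abstracts and papers:

        import numpy as np
        from scipy.stats import norm

        def tost_two_proportions(c1, n1, c2, n2, delta):
            """Equivalence test: H0: |p1 - p2| >= delta vs H1: |p1 - p2| < delta.
            Normal approximation with unpooled variance."""
            p1, p2 = c1 / n1, c2 / n2
            se = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
            diff = p1 - p2
            p_upper = norm.cdf((diff - delta) / se)   # one-sided: diff < delta
            p_lower = norm.sf((diff + delta) / se)    # one-sided: diff > -delta
            return diff, max(p_upper, p_lower)        # small p => declare "similar"

        # Hypothetical counts: 970/1000 endorsements in one set of ratings,
        # 950/1000 in the other, with a made-up margin of 5 percentage points.
        print(tost_two_proportions(970, 1000, 950, 1000, delta=0.05))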

  611. We have a testable hypothesis! We should not test for Da-Dp=0 but rather for Da-Dp=c

    In the matched sample, 580 (43%) abstracts that were rated neutral were rated non-neutral as papers. This is consistent with the hypothesis that papers are more informative than abstracts.

    Of those 580, 551 (95%) went to endorse and 29 (5%) to reject.

    In the matched sample, 99% of non-neutral abstracts were endorsements, and 1% rejects.

    Is 99-1 similar to 95-5? Perhaps yes if you compare 99 to 95. Probably no if you compare 1 to 5.

  612. Richard,
    I think what is of interest is the likely level of consensus. Is there any indication that a value like 97% is wildly wrong? If you want – for some reason – to know the precise level of consensus, then maybe 95%, 97%, and 99% are different. If you want to know whether it is probably bigger than 90%, or not, then the differences probably don’t really matter, given that there is probably no realistic scenario under which the result could be a consensus level of less than 90%. So, if one is a statistical pedant (or is using one’s statistical pedantry to undermine an inconvenient result) one can probably find fault with anything. On the other hand, if one is simply interested in trying to understand something, then statistical pedantry is probably of less value.

    What is possibly intriguing is how someone who purports to be a statistical pedant has such trouble correctly interpreting what others have said. Given the precision with which this person wants to apply statistics, it almost seems intentional.
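
    To illustrate, with purely made-up numbers (not the actual counts from any of the consensus studies): if 97% of 4000 position-taking abstracts endorsed the consensus, an exact binomial lower bound on the underlying proportion would still sit comfortably above 90%. A quick sketch, assuming scipy:

        from scipy.stats import beta

        # Illustrative counts only: 3880 endorsements out of 4000 abstracts.
        n, k = 4000, 3880
        # Exact (Clopper-Pearson) one-sided 99% lower confidence bound.
        lower = beta.ppf(0.01, k, n - k + 1)
        print(f"99% lower bound: {100 * lower:.1f}%")   # roughly 96.3%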

  613. Dikran Marsupial says:

    Richard, you have missed out the “determined by a reasonable model of the difference in information content between the abstracts and papers and between the expertise of the citizen-science raters and the authors.” bit.

    “Perhaps yes if you compare 99 to 95. Probably no if you compare 1 to 5.”

    There is no statistical difference between the two questions (hint: binomial); again, your lack of statistical intuition is letting you down.

    Before trying again, please answer the question “why do we need a formal test when the difference is obviously too small to be of practical significance?” (reminder #1).

  614. @dikran
    The standard test on difference in proportions shows that the two distributions are different, with and without the no-position category.

  615. Richard,
    Yes, we know. The relevance of this is the bit that you seem unwilling to actually address. What does it mean?

  616. Dikran Marsupial says:

    Richard wrote “The standard test on difference in proportions shows that the two distributions are different, with and without the no-position category.”

    O.K., does that (the “null ritual”) tell us anything we don’t already know? As I have repeatedly pointed out, we know the distributions will be different a-priori.

    “Why do we need a formal test when the difference is obviously too small to be of practical significance?” (reminder #2).

  617. Dikran Marsupial says:

    BTW, it is a pity that you couldn’t simply admit your statistical blunder, there is no difference between the 95-99 question and the 1-5 question because they are both proportions, and you can change one into the other simply by inverting the labels.

  618. Also, for a sample of 580, a 95%, 5% split produces a 99% confidence interval of 2.3%.
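
    (That ±2.3% figure is consistent with the usual normal approximation for a proportion, if it is read as the half-width of the interval. A quick check, assuming scipy:)

        import numpy as np
        from scipy.stats import norm

        n, p = 580, 0.05            # the 95/5 split among the 580 re-rated abstracts
        z = norm.ppf(0.995)         # a two-sided 99% interval leaves 0.5% in each tail
        half_width = z * np.sqrt(p * (1 - p) / n)
        print(f"+/-{100 * half_width:.1f}%")   # +/-2.3%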

  619. Dikran Marsupial says:

    Richard Tol wrote: “I was trained as a statistician. I have taught statistics. I have published in statistical journals. I have written statistical software. Your null hypothesis should therefore be that I do not make elementary errors.”

    Richard Tol wrote: “Is 99-1 similar to 95-5? Perhaps yes if you compare 99 to 95. Probably no if you compare 1 to 5.”

    I think we can now conclusively reject Richard’s null hypothesis (p < 0.001), although again, that is not telling us anything we didn't already know.

    Say we have two biased coins, one of which comes up tails once in 100 flips and the other five times in 100 flips, and we want to compare the biases of the two coins. Richard’s statement is a bit like saying that the coins are similar if we say that one comes up heads 99% of the time and the other 95% of the time, but that they are less similar if we say that one comes up tails 5% of the time but the other only 1% of the time. Of course, in reality there is no difference whatsoever between the two comparisons; they are both just differences in the way the same thing can be expressed. To perceive them as different would be an elementary error.
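
    The symmetry is easy to check numerically: a chi-square test on a 2x2 table of Dikran’s two coins returns exactly the same statistic and p-value whichever way the outcomes are labelled. A small sketch, assuming scipy:

        import numpy as np
        from scipy.stats import chi2_contingency

        # Dikran's coins: 1 tail in 100 flips versus 5 tails in 100 flips.
        heads_tails = np.array([[99, 1],
                                [95, 5]])
        tails_heads = heads_tails[:, ::-1]   # the same data with the labels inverted

        chi2_a, p_a, _, _ = chi2_contingency(heads_tails, correction=False)
        chi2_b, p_b, _, _ = chi2_contingency(tails_heads, correction=False)
        print(p_a, p_b)   # identical: the comparison does not depend on the labelling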

  620. Willard says:

    Have you tested the hypothesis that papers are more informative than abstracts on your own papers, Richie?

    Gremlins are not known to be informative.

  621. @wotts
    Hurray! We have a hypothesis test!

    So, if we coarsen the data to two categories, there is no statistically significant difference between abstracts and papers. Cook vindicated!

  622. Richard,
    No, I don’t think that’s it either, but I suspect there’s little point in your continuing to guess.

  623. Dikran Marsupial says:

    “So, if we coarsen the data to two categories, there is no statistically significant difference between abstracts and papers. Cook vindicated!”

    It seems that Richard is determined to show that he doesn’t understand Null Hypothesis Statistical Testing:

    (i) Nobody is claiming that there is no statistically significant difference, far from it; we know (and have pointed out REPEATEDLY) a-priori that the distributions are different, so if there is no statistically significant difference, it just means the sample is too small to reveal the difference we know is there.

    (ii) Statistical significance is not the same as practical significance. You have been asked several times to explain the point of a test for statistical significance when we know the difference is too small to be of practical significance.

    (iii) A lack of a statistically significant difference would not vindicate Cook, see (i).

    (iv) The existence of a statistically significant difference would not call Cook13/16 into question, as we know a-priori that the distributions of ratings are not (even asymptotically) identical (I think I just may have mentioned that already). This is true whether you look at two categories or the next level in the hierarchical classification scheme.

    Of course this might just be an attempt to bluster away from acknowledging your elementary blunder.

    “Why do we need a formal test when the difference is obviously too small to be of practical significance?” (reminder #3).
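
    Point (ii) is easy to demonstrate numerically: hold a trivially small difference fixed and the “null ritual” flips from “not significant” to “highly significant” purely because the sample grows, which is why it says nothing about practical significance. A sketch with made-up proportions, assuming scipy:

        import numpy as np
        from scipy.stats import chi2_contingency

        # Made-up proportions: a practically meaningless half-point difference
        # (97.0% vs 97.5% endorsement) at increasing sample sizes.
        for n in (500, 5000, 50000):
            a, b = round(0.970 * n), round(0.975 * n)
            table = np.array([[a, n - a], [b, n - b]])
            _, p, _, _ = chi2_contingency(table, correction=False)
            print(n, f"p = {p:.2g}")   # falls from ~0.6 to ~1e-6 as n grows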

  624. @dikran
    Indeed. Black is white and up is down, as long as Cook is right.

  625. Richard,
    Or, alternatively, black is white and up is down, as long as Cook is wrong?

  626. Dikran Marsupial says:

    Richard wrote “The standard test on difference in proportions shows that the two distributions are different, with and without the no-position category.”

    I replied by asking “O.K., does that (the “null ritual”) tell us anything we don’t already know? As I have repeatedly pointed out, we know the distributions will be different a-priori.”

    Richard appears not to want to answer that question either.

  627. Dikran Marsupial says:

    Richard wrote “Indeed. Black is white and up is down, as long as Cook is right”

    No Richard, I pointed out that your statistical hypothesis test neither vindicates nor calls Cook et al. into question. This is because it is a pointless test for reasons that have been repeatedly explained to you in great detail. That you cannot acknowledge the flaws in your argument, and sedulously avoid answering the questions that would expose them, suggests that perhaps it is your reasoning that is unduly motivated, rather than mine?

  628. @dikran
    Alternatively, none of your arguments cut any wood? Pearson is dead, you seem to say.

  629. pbjamm says:

    It seems that Dr Tol is hung up on the individual ratings vs the overall result. Since, as Dikran has pointed out repeatedly, we knew that the ratings of papers vs abstracts would be different (in some cases), this is in no way surprising. The fact that some percentage of papers get shuffled around in the ratings may seem strange, but if the distribution stays the same (or nearly so), does that have any significance for the results? Individual voters can change positions on issues, but if the distribution is the same, the bill passes/fails the same regardless of the individual changes in opinion.
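
    pbjamm’s point is easy to illustrate: two sets of ratings can disagree on nearly half of the individual items while having exactly the same overall distribution, and it is the distribution that the consensus estimate depends on. A toy sketch, assuming numpy:

        import numpy as np

        rng = np.random.default_rng(0)
        # Toy ratings drawn from an arbitrary, made-up distribution.
        a = rng.choice(["endorse", "neutral", "reject"], size=10000,
                       p=[0.32, 0.66, 0.02])
        b = rng.permutation(a)     # identical overall distribution...
        print(np.mean(a != b))     # ...yet ~46% of individual ratings differ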

    I think I may have pointed this out to Richard before, but it would probably be possible to develop a procedure that satisfied all of his tests and yet returned a result that was clearly wrong. I can’t see much point in that, but maybe Richard thinks otherwise.

  631. Dikran Marsupial says:

    Richard wrote “Alternatively, none of your arguments cut any wood?”

    If that was the case, it would be me that was avoiding answering questions, rather than you.

    A test for a difference in the distributions “cuts no wood” as we know a-priori that the distributions are not identical. You have yet to answer this point, despite it having been pointed out to you at least an order of magnitude more often than ought to be necessary.

    “Pearson is dead, you seem to say.”

    While Pearson is indeed dead, I doubt he would have been in favour of the “null ritual” (i.e. the unthinking application of statistical procedures without consideration of whether they are meaningful or appropriate) either.

  632. Dikran Marsupial says:

    “Why do we need a formal test when the difference is obviously too small to be of practical significance?” (reminder #4).

  633. John Hartz says:

    ATTP: You wrote:

    Richard appears not to want to answer that question either.

    Apparently Tol’s Artful Dodger algorithm has not yet been programmed to do so.

  634. Willard says:

    If Richie could go drop by Nick’s and celebrate with him the anniversary of the GWPF’s inquiry, that would be nice:

    http://moyhu.blogspot.com/2016/04/gwpf-inquiry-anniversary.html

    If he could explain why the inquiry has been postponed, that’d be even nicer.

  635. @dikran
    So, why don’t you suggest a procedure for validating Cook’s data? I am obviously incompetent, so I need your help here.

  636. Dikran Marsupial says:

    Richard, the author survey already does precisely that: the results agree to within the limits of practical significance, which is all that you could reasonably expect. Statistics is not a substitute for common sense.

  637. Dikran Marsupial says:

    I should say, the unthinking application of cookbook statistics (such as the “null ritual”) is not a substitute for common sense. Statistics are very useful, but only if the reasoning behind the statistics is valid.

  638. BBD says:

    ‘Sedulously’ is spot-on, Dikran 🙂

  639. @dikran
    So, besides Pearson, Popper is dead too? Let’s just replace testable and falsifiable hypotheses with waffle?

  640. Richard,
    I think we should just start calling you Pogo.

  641. > So, besides Pearson, Popper is dead too?

    Sir Karl has been dead since at least 1994, Richie.

    He actually killed his own meme in the 50s:

    “But Popper” is more than a meme – it’s a contrarian zombie.

    Please note that testability and falsifiability ain’t the same.

  642. BBD says:

    “Shoot it in the head!”

    – George A. Romero

  643. John Hartz says:

    ATTP: For whatever reason, the Alfie theme song pops into my head when I read your exchanges with Tol. Here are the lyrics:

    What’s it all about, Alfie
    Is it just for the moment we live
    What’s it all about
    When you sort it out, Alfie

    Are we meant to take more than we give
    Or are we meant to be kind
    And if only fools are kind, Alfie
    Then I guess it is wise to be cruel

    And if life belongs only to the strong, Alfie
    What will you lend on an old golden ruler
    As sure as I believe there’s a Heaven above, Alfie
    I know there’s something much more
    Something even non-believers can believe in

    I believe in love, Alfie
    Without true love we just exist, Alfie
    Until you find the love you’ve missed you’re nothing, Alfie
    When you walk let your heart lead the way
    And you’ll find love any day, Alfie, Alfie, Alfie

    Songwriters: Burt Bacharach / Hal David

    http://www.metrolyrics.com/alfie-lyrics-barbra-streisand.html

  644. JCH says:

    Please tell me the condensus is next.

  645. BBD says:

    After that, the deluge.

  646. Dikran Marsupial says:

    Richard wrote “So, besides Pearson, Popper is dead too? Let’s just replace testable and falsifiable hypotheses with waffle?”

    No Richard, sadly it appears you don’t understand Popper either. The abstract survey performed by the citizen-science raters was tested by conducting a survey of authors, which could potentially have falsified the abstract survey. Had the author survey suggested, say, a 50% consensus rate, that would have falsified the abstract survey. However, the abstract survey survived the test and was corroborated by it. You may not like that, but it is true nevertheless.

  647. @dikran
    You’ve made your point clear: You like your research sloppy and waffly.

  648. Richard,
    I almost deleted your comment, because it is ridiculous. What’s clear is that you’re happy to publish claims in peer-reviewed journals that are almost certainly not true, and that you probably knew to be untrue before publishing. In defending your position, you’re also happy to misrepresent what others have said, and to ignore genuine questions about your own work. As I think I may have said many times already, I’m amazed that you think you’re in a position to question the research integrity of others.

  649. @wotts
    As an author, can you kindly explain why Cook 2016 shows results including don’t knows for 11 out of 14 studies, but results excluding don’t knows for the other 3?

    It makes a difference: http://richardtol.blogspot.co.uk/2016/05/consensus-on-consensus.html

  650. Richard,
    You’re going to have to provide a bit more information as to which 3 you claim didn’t include “don’t knows” and which 11 – according to you – did. I’ve had a quick look at the info and I’m not convinced that you’re correct.

  651. Don’t know / no position omitted: Cook, Verheggen, Rosenberg
    Don’t know / no position included: Bray 96, Bray 03, Bray 08, Oreskes, Doran, Anderegg, Stenhouse, Carlton, Pew, Gallup, Harris
    Omitted altogether: Milloy, Bray 15

  652. Richard,
    Oreskes (2004) says

    The 928 papers were divided into six categories: explicit endorsement of the consensus position, evaluation of impacts, mitigation proposals, methods, paleoclimate analysis, and rejection of the consensus position. Of all the papers, 75% fell into the first three categories, either explicitly or implicitly accepting the consensus view; 25% dealt with methods or paleoclimate, taking no position on current anthropogenic climate change. Remarkably, none of the papers disagreed with the consensus position.

    In Table 1, we report Oreskes as producing a 100% consensus; therefore we did not include the “don’t know/no position” category for Oreskes. I don’t plan to check the others at this stage. Maybe you should double-check your claim before moving on.

    Since you’re asking me questions about our paper, maybe you can answer my question. Why did you include a claim in your paper that is almost certainly not true and that you almost certainly know to not be true prior to publication?

  653. “Why did you include a claim in your paper that is almost certainly not true and that you almost certainly know to not be true prior to publication?”

    I didn’t.