Reproducibility?

I came across an interesting paper about the replication crisis that I thought I would briefly discuss (H/T Neuroskeptic). The paper in question is Reproducible research: a minority opinion. It’s not open access, but I have found what I think is an early draft copy.

The background is basically that there have been a number of cases in which people have been unable to replicate, or reproduce, some earlier scientific/research study. A suggested solution is that researchers should make everything available, so that others can check their results. Some have even suggested that this is a key aspect of the scientific method/process. The new paper takes a rather dissenting position and, in my view, makes some interesting, and valid, arguments.

For starters, a key aspect of science is basically to test hypotheses. Our confidence in a result increases as more and more research groups produce consistent/convergent results, ideally doing so using different methods and, in some cases, different data sets. We don’t really gain confidence if we get the same result by exactly repeating what others have done, using what they’ve provided. There’s nothing wrong with this, and there may be scenarios under which this would be important (for example, if a single study is likely to play a dominant role in determining some decision), but this isn’t really a key aspect of science.

Similarly, we often talk about the scientific method, but it’s not really a well-defined process. There are certainly aspects that we’d probably all agree on, and there are certainly philosophical descriptions of a scientific method, but there isn’t some kind of rigid set of rules. There are always likely to be exceptions to any set of rules, and I do think we should be careful about thinking in terms of some kind of checklist. We shouldn’t really trust something simply because it ticked all the boxes. Similarly, we shouldn’t simply dismiss something because it doesn’t. Again, we gain confidence when results from different groups, using different methods, converge on a consistent interpretation.

The paper also discussed the issue of misconduct. It suggested that misconduct is not really new, that it’s not responsible for this reproducibility crisis (which I would agree with), and that what’s proposed may not really be a solution. This isn’t to suggest that we shouldn’t take misconduct seriously and deal with it suitably when we become aware of it, but just that it isn’t really new and that it impacts science less than one might expect; it is typically uncovered, especially if the result in question is of particular interest.

When it comes to public confidence in science, the paper says

it would seem that any crisis of confidence would best be addressed by better informing the general public of the way science works

which I think is an important point. The idea is that people’s confidence is more impacted by apparent failures than by explicit misconduct. It’s important, therefore, to make clear that science isn’t some kind of perfect process in which each step incrementally adds to our understanding in some kind of linear fashion. We get things wrong, we go down dead ends, we try things that end up not working. We can even spend some time accepting something that later turns out to be wrong. In a sense, we learn from our mistakes, as well as from our successes. However, over time we still expect to converge towards a reasonable understanding of whatever it is that is being studied.

As usual, I’ve said more than I intended, and there is still more that could be said. I certainly have no real problem with people making everything associated with their research available. There may well be some issues with doing so (wasted resources, and people trawling for trivial errors) but I don’t think any are sufficient to strongly argue against this. However, I don’t really think that it is necessarily required, and I don’t really see it as some key part of the scientific method. What’s key is that there is enough information to allow others to test the same basic hypothesis; this does not necessarily require providing every single thing associated with some research result. There may well be cases where it is more important to do so than in others, but I’m not convinced that it should become the norm. Others may well disagree.


49 Responses to Reproducibility?

  1. I agree. The main advantage for natural sciences is that with Open Data and Open Software you help your colleagues get started and make scientific progress more efficiently.

    But the other side is that when you are not able to reproduce a finding with your own set-up, it would be nice to know in detail how the other set-up worked, and to be able to study which difference in the methods produced the difference in outcomes.

    The reproduction crisis is mainly a problem for the empirical fields of science: when you work with humans, nearly any outcome is possible and there are not many theoretical constraints. If experiments often fail to reproduce, and one often has to find out what produced the difference in outcomes, it also becomes more reasonable to ask scientists to put work into making their results exactly reproducible.

    In this respect it is quite ironic that the US culture warriors so often ask for purely empirical evidence.

  2. Victor,

    The reproduction crisis is mainly a problem for the empirical fields of science: when you work with humans, nearly any outcome is possible and there are not many theoretical constraints.

    I agree that it probably applies more to these fields than to the physical sciences. I also think that people should think about what it is that they really expect to replicate. Are you really expecting an entirely statistically consistent result with a new sample, or simply a result that shows the same trend? Given how varied the sample could be in some cases, expecting to replicate (statistically) the same result might be unrealistic. I think there was an article making this point, but I can’t seem to find it at the moment.

  3. Joshua says:

    The use of the term “crisis” can be somewhat like the term “wicked,” IMO: easily leveraged for partisan gain.

    Does an increasing awareness of the limitations of replicability of research constitute a “crisis?”

    If so, why? Couldn’t awareness of the prevalence of research that doesn’t reproduce be increasing even as the relative % of research that doesn’t reproduce is staying the same, or even going down?

    In fact, couldn’t even the % of research that doesn’t reproduce be increasing even as the amount of research produced that does provide useful information is increasing?

    So what constitutes a “crisis?” Is material wealth being destroyed by this “crisis?” Are we being made dumber as the result of this “crisis?” Is life-expectancy being driven down by this “crisis” (or is there even a provable “opportunity cost” in terms of life-expectancy that results from research failing to reproduce? Is there some cut-off point where there is a “cost” once un-reproducible research reaches a certain relative prevalence?)

    It seems to me that while the question of the reproducibility of research is fascinating and important, it is also a concept that is being gamed in the service of political expediency. The notion of a “crisis” is ill-defined at best, and more likely a concern that is being exploited to advance ideological agendas. For example, we can see it often being gamed by “skeptics” to advance their agenda in the climate wars – e.g., where they simultaneously make facile causal arguments attributing increases in life-expectancy to fossil fuels even as they rhetorically leverage the reproducibility “crisis” to erode the validity of scientific research, while ignoring the progress in life-expectancy attributable to that research. Consider all the evidence-free claims of a “crisis” in public trust in science we see made by “skeptics.”

    One of the things that drew me to this position…is there’s a crisis in public trust in science.

    Alarm over the public loss of trust in science

  4. Everett F Sargent says:

    ATTP sez …

    “Are you really expecting an entirely statistically consistent result with a new sample, or simply a result that shows the same trend? Given how varied the sample could be in some cases, expecting to replicate (statistically) the same result might be unrealistic.”

    Then it is quite likely that both ‘so called’ studies are incorrect.

    Perturbation theory, whatever the puck I’m talking about or high on, suggests that if multiple studies produce different outcomes, then there is something very wrong in the experimental design (likely due to not enough explicit controls being clearly stated in those ‘so called’ studies).

    As to what authors include with their papers, the more the better. Or are you actually taking the opposite POV? I don’t think you are, but …

  5. EFS,

    Then it is quite likely that both ‘so called’ studies are incorrect.

    Quite possibly. On the other hand, it could also be that the populations from which you’ve drawn your sample are also different. Okay, I guess one could argue that the latter study should have been careful to have drawn from the same population.

  6. Here is the article I was thinking about, which says

    Yet there are good reasons why real effects may fail to reproduce, and in many cases, we should expect replications to fail, even if the original finding is real.

  7. EFS,

    As to what authors include with their papers, the more the better. Or are you actually taking the opposite POV? I don’t think you are, but …

    No, I don’t think I am. Maybe I’ll take a step back and pose a question. In the social sciences, for example, is it possible to constrain a study population well enough that, if someone else wants to replicate a study, they can select a sample from a comparable population and have confidence that they should get a statistically consistent result?

  8. Everett F Sargent: “As to what authors include with their papers, the more the better. Or are you actually taking the opposite POV? I don’t think you are, but …”

    For the homogenisation benchmarking study I am most famous for, I made the data available. I only know of one researcher who has used this and his studies would not have been much different had he not. In retrospect a waste of my time. For smaller studies this would be even more the case.

    I like the trend towards open science and technology has made it much easier to do this, but we should not be fundamentalist about it. More bureaucracy is not always better science.

    When I made this study I did know there were some troubles in America, but had not realised how bad it was. So it was quite funny that when I blogged about the study the US culture warriors kept on asking where the data was. When I gave them the URL, they were no longer interested in analysing it; the question was intended as a bludgeon. Their disappointment may have made publishing the data worth it.

  9. Andrew E Dessler says:

    I don’t think that there is any reproducibility issue in climate science. Rather, climate must be one of the most reproduced fields in science history. The issue is not, “how did you get that result,” but rather “what does that mean?” For example, everybody can reproduce low climate sensitivity estimates from the 20th century – but what does it mean for the climate system’s actual ECS?

  10. Andrew,
    Yes, I agree. That’s probably related to what I was trying to get at above. Sometimes when you get different results when trying to determine the same thing it’s because of some actual underlying differences, not because someone has necessarily made a mistake. Understanding why there are differences can, itself, be informative.

  11. Everett F Sargent says:

    OK, so the entire paragraph reads …

    “Yet there are good reasons why real effects may fail to reproduce, and in many cases, we should expect replications to fail, even if the original finding is real. It may seem counterintuitive, but initial studies have a known bias toward overestimating the magnitude of an effect for a simple reason: They were selected for publication because of their unusually small p-values, said Veronica Vieland, a biostatistician at the Battelle Center for Mathematical Medicine in Columbus, Ohio.”
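
    A quick simulation makes that selection effect concrete (a minimal sketch of my own, assuming a simple one-sample design and a p < 0.05 publication filter; none of the numbers come from the article):

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(0)
        true_effect = 0.2            # a real but small effect (in SD units)
        n, trials = 30, 10_000

        published = []
        for _ in range(trials):
            sample = rng.normal(true_effect, 1.0, n)
            t, p = stats.ttest_1samp(sample, 0.0)
            if p < 0.05 and t > 0:   # only "significant" positive results get written up
                published.append(sample.mean())

        print(f"true effect:           {true_effect}")
        print(f"mean published effect: {np.mean(published):.2f}")   # noticeably inflated
        print(f"publication rate:      {len(published) / trials:.0%}")

    An exact replication then chases the inflated published estimate rather than the true effect, so apparent “failures” are expected even when the effect is real.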

    I’ve seen a lot of those types of studies, p < 0.0001 and r^2 = 0.05, sample size = 123 people. WTF?

    D'oh! Seriously, everyone should understand that mostly only initial studies showing significance get published. I consider that a form of confirmation bias in the publication process. But there are also very similar things that don't get published because of less-than-adequate significance (e. g. other researchers, in the same field, have been there, done that, and have not found adequate significance). The creators of that unpublished knowledge then update their previous work, replicate the other study, and still don't find significance? That one will get published. Again, D'oh!

    What really killed off the dinosaurs (excluding avian species)?

    Then, there are those studies that have a long causal link through a chain of previous studies. Find the weak link, break that causal chain link, and poof, your study is no longer valid.

    With so many more people publishing today (and for the foreseeable future) in so many more journals (and the rise of pay-to-play predatory journals), I'm of the opinion that the normal rules have now changed somewhat.

    If, for example, the field of psychology is shown to be relatively weak (e. g. not robust) in its statistical methodologies, then that field needs an independent outside review process or needs to reevaluate its own statistical significance criteria.

    Oh wait, others have said the same thing with respect to the field of climate science. I'm a psychology studies denier. So, oh well, never mind. 😉

    House of cards.

  12. EFS,
    I’m certainly not arguing that there aren’t any problems, or that we should try to do better (especially when it comes to statistical analyses). Just not really convinced that there is some kind of crisis, or that the solution is to make everything associated with a study available.

  13. Everett F Sargent says:

    “In the social sciences, for example, is it possible to constrain a study population well enough that, if someone else wants to replicate a study, they can select a sample from a comparable population and have confidence that they should get a statistically consistent result?”

    Technically, one shouldn’t need to constrain a study population IF the original study was robust in its findings.

    However, if the original study overstates its conclusions or applies those conclusions beyond the study population demographics WITH the same statistical certainty (adequate demographics are the key area of contention IMHO, in other words, a large number of psychology studies have absurdly small sample sizes that don’t adequately cover the complete demographics in the binning sense), then I think that a lot of additional studies are required to arrive at either a narrow or broad consensus.

  14. EFS,

    Technically, one shouldn’t need to constrain a study population IF the original study was robust in its findings.

    What I’m getting at is maybe more “define” than “constrain”. If you’re going to test someone else’s results, you need to know something of the population from which their sample was drawn. What I’m asking is whether or not it is possible to define a population sufficiently well so that other studies can be confident that they’re drawing their sample from the same population and, hence, whether or not they’d expect a statistically consistent result.

  15. Joshua says:

    Technically, one shouldn’t need to constrain a study population IF the original study was robust in its findings.

    Except when you’re interested in examining the influence of demographic variables, which is often.

  16. Everett F Sargent says:

    “Just not really convinced that there is some kind of crisis, or that the solution is to make everything associated with a study available.”

    But the cat is out of the bag, so to speak. Kind of hard to put that toothpaste back in the tube, as it were. So now there will be more subject matter experts (SMEs) in their various fields who will exhibit greater skepticism towards other SMEs publishing in those same fields. And I think that will be a good thing going forward.

    Random thought: I think it can be helpful to provide the underlying data presented in plots, for example.

  17. Everett F Sargent says:

    “If you’re going to test someone else’s results, you need to know something of the population from which their sample was drawn. What I’m asking is whether or not it is possible to define a population sufficiently well so that other studies can be confident that they’re drawing their sample from the same population and, hence, whether or not they’d expect a statistically consistent result.”

    You will need to look into papers using MTurk demographics. Those population demographics are extremely skewed with respect to US Census demographics. But most of those papers do make arguments as to their representation or similarity to the US Census demographics. However, there is a small group of papers that do question MTurk representation or similarity to the US Census demographics. So, for example, don’t use MTurk to study people over, say, 35 years of age, as you will always get a too-small sample size (I have the statistics and they are robust).

    It sounds circular, but if you don’t get the same results, and the original paper did not supply their population demographics (I’ve seen that and it is very unfortunate to the point of making that study useless, because one should always supply adequate population demographics, so that others might be able to replicate said study), then the paper’s conclusions ARE dependent on the underlying population demographics. I would consider that a Bad Thing (TM), making a generalization from a specific (e. g. the conclusion is dependent on what should be an independent variable (i. e. the population)).

    In other words, does every paper become a one off, because said paper did not provide adequate demographics?
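
    To make the binning point concrete, here is a minimal sketch of the kind of representativeness check I have in mind (the age bins and all of the counts are made up, standing in for real MTurk and Census numbers):

        from scipy.stats import chisquare

        # Hypothetical sample counts per age bin (18-24, 25-34, 35-49, 50+).
        sample_counts = [48, 52, 18, 5]             # n = 123, skewed young
        census_props  = [0.15, 0.22, 0.28, 0.35]    # made-up population proportions

        n = sum(sample_counts)
        expected = [p * n for p in census_props]

        stat, p = chisquare(sample_counts, f_exp=expected)
        print(f"chi-square = {stat:.1f}, p = {p:.2g}")
        # A tiny p-value says the sample's demographics differ from the target
        # population's, so generalizing beyond the sample is suspect.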

  18. Joshua says:

    It sounds circular, but if you don’t get the same results, and the original paper did not supply their population demographics…

    Some papers have methodological problems. What is your point beyond that?

  19. Yvan Dutil says:

    This crisis comes not from the fact that some research is not reproducible, but that in some fields most is not. In biomed, non-reproducibility is 80%, and it is close to this number in psychology. Let’s not even speak about social science. But in astrophysics, it is higher than 95%.

  20. Joshua says:

    In biomed, non-reproducibility is 80%,

    Why is that a “crisis?”

  21. Everett F Sargent says:

    “What is your point beyond that?”

    I’m not sure anymore given this result (ATTP’s link) …
    https://espnfivethirtyeight.files.wordpress.com/2016/03/truth-vigilantes-soccer-calls2.png?w=1024&quality=90&strip=info

  22. Steven Mosher says:

    First it helps to understand the difference between replication and reproducibility, and then to understand the history of the new term.

    Replication typically happens in the physical sciences. For example, I describe my method: how I collected data and what tests I did. The claim is that if you try to replicate my behavior you will achieve the same result, and thereby confirm my finding and disconfirm the claim that observer bias caused the result.
    This works fine for what I would call the physical sciences or laboratory sciences.
    However, Jon Claerbout and others distinguished a unique problem in digital sciences and informational sciences that replication doesn’t address. Their insights come from work in the signal sciences, where an author describes the algorithm he uses and then presents results. In these cases the method descriptions are rarely adequate to allow any reader to replicate the work. And a negative result doesn’t tell you anything. You don’t know if you have recoded the algorithm the writer described in words. The early empirical tests showed that authors themselves could not reproduce the very charts they published as results. That is the origin of the call for supplying the code and data required to reproduce, not replicate, the science. It is tied to the unique nature of sciences or papers that are largely algorithmic.
    A good example of this was detailed in the climategate mails. McIntyre was unable to reproduce Jones’s work. He followed the methods, wrote his own code, and got different answers. He asked for the code and was denied. Jones in the mail explained why McIntyre failed: the method described in the paper skipped some steps that were in the code. Code he would never share.
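
    To see how easily that happens, here is a toy illustration (my own construction, not Jones’s actual method): two faithful codings of the sentence “compute the average temperature anomaly across stations” that diverge as soon as data are missing.

        import numpy as np

        # Two stations, five years of temperatures; station 2 has a gap (NaN).
        temps = np.array([[10.0, 10.5, 11.0, 11.5, 12.0],
                          [ 8.0,  8.4, np.nan, 9.2,  9.6]])

        # Reading 1: anomalies per station first, then average across stations.
        anoms = temps - np.nanmean(temps, axis=1, keepdims=True)
        series1 = np.nanmean(anoms, axis=0)

        # Reading 2: average the stations first, then take anomalies of the average.
        mean_series = np.nanmean(temps, axis=0)
        series2 = mean_series - np.nanmean(mean_series)

        print(series1)   # the two "same" methods disagree wherever data are missing
        print(series2)

    Both readings are defensible; only the code settles which one was meant.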

  23. Everett F Sargent: “I’ve seen a lot of those types of studies, p < 0.0001 and r^2 = 0.05, sample size = 123 people. WTF?”

    No need for a replication study. Those numbers do not fit together.
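
    (Quick check, for a simple Pearson correlation: r² = 0.05 with n = 123 fixes the p-value, and it is nowhere near 0.0001. A sketch:)

        import numpy as np
        from scipy import stats

        r2, n = 0.05, 123
        r = np.sqrt(r2)
        t = r * np.sqrt(n - 2) / np.sqrt(1 - r2)   # t-statistic implied by r and n
        p = 2 * stats.t.sf(t, df=n - 2)            # two-sided p-value
        print(f"t = {t:.2f}, p = {p:.3f}")         # t ≈ 2.52, p ≈ 0.013, not < 0.0001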

    Everett F Sargent: “D’oh! Seriously, everyone should understand that mostly only initial studies showing significance get published. I consider that a form of confirmation bias in the publication process.”

    In those highly empirical fields, maybe, or when you cannot have some confidence that your null result isn’t just because your experiment went wrong. But if you have theory or a model and you would expect a relationship, then it is highly interesting if the observations do not show one. If you analyse data and you find relationships for X, Y, and Z, you can also report that there was no relationship for A, B, and C.

    Wasn’t there recently a study that explicitly looked at publication biases in climatology and could not find any?

    Everett F Sargent: “Random thought: I think it can be helpful to provide the underlying data presented in plots, for example.”

    I think in the natural sciences, that is the strongest argument for Open Science. It helps the community as a whole progress faster.

  24. Everett F Sargent says:

    Wasn’t there recently a study that explicitly looked at publication biases in climatology and could not find any?

    No evidence of publication bias in climate change science
    https://link.springer.com/article/10.1007/s10584-016-1880-1

    VV,

    I’m talking, or rambling on about mostly psychology papers. I think I’ll backpedal on my D’oh! statement though.

  25. Jon Kirwan says:

    This has been argued (and will continue to be) forever, it seems. I remember having extended discussions about this back circa 1985. And that wasn’t the first time. Just the longest time I engaged in a discussion about it.

    In broad strokes, Chris Drummond is right that science isn’t defined by researchers providing every scrap and note required to quantitatively replicate results down to a gnat’s eyebrow. What’s desired is that there is sufficient information that someone else well-informed (and hopefully as comprehensively informed as possible) on the subject would be able to figure out how to reproduce the results sufficiently well and with some reasonable rigor.

    In physics, it is not nearly enough to say that “every particle attracts every other particle.” But if you instead say “every particle attracts every other particle, and the attractive force diminishes with distance in the same way that light intensity does, and if you double the mass of one particle the force is twice as much, and if you double the mass of both particles, then the force is four times as much,” then I could say that you’ve written entirely enough. That language is sufficiently objective that anyone sufficiently well informed can rigorously develop the necessary equations, differential or otherwise, for any shared set of specific circumstances and reach similar results as another researcher in a different time, culture, and place.
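
    In symbols, that wordy description is already enough to pin down the familiar inverse-square form; any well-informed reader would arrive at

        F \propto \frac{m_1 m_2}{r^2}

    up to a constant of proportionality that measurement supplies, with doubling either mass doubling the force and doubling both quadrupling it, exactly as the words say.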

    One doesn’t need to do a memory dump of their computer for the work product to be scientific. The main thing is that you’ve provided enough context that most of those who are well informed on the topic can replicate the work, if they wanted to do so, within a reasonable range.

    Galileo spent a great deal of time adding rigor to his work. He started out trying to time things with his pulse rate and also with water drops from containers. But he realized that (in his day) others in different places in the world might not have similar access to his tools, nor could they necessarily understand his words well enough to replicate his results. So he added the solar year to his work and provided enough information that anyone else on the planet, using whatever tools they had on hand, could get to the same place. He provided references.

    But there was no need at all that others have the same tools, the same drop buckets, the same pulse rates, or the same anything else. They might have novel ideas Galileo couldn’t even imagine at the time. It doesn’t matter. He used language SUFFICIENT to reproduce his results, if you wanted to do so. And that is all that counts.

    How all this plays into other fields, such as psychology for example, I have no idea at all. Or if you cannot experiment over and over to get things right (climate, for example, or weather?) I also don’t know. I think what is meant by “sufficient for someone well-informed to replicate” is a matter for specialists in the fields in question, perhaps. (Hopefully, they can work it out so that these things are independent of culture, time, and place.)

    I’ll end this with another thought. Not everything gets the privilege of being “replicated.” Who wants to replicate the work of another, except in extreme cases where there are certain circumstances demanding replication? One wants to contribute, do novel work, add to a field. Not go backwards replicating old results. But there are times when that’s needed, of course. But when it may be needed, isn’t it better that less is provided than more, so long as it is sufficient? You don’t want duplication. You want replication and you want creativity in approaching that. So what’s desired is enough context that what is being claimed is understood well enough for appropriate rigor, but not much more than that.

    Just my thoughts from some old, long discussions decades ago.

  26. Steven Mosher says:

    “What’s desired is that there is sufficient information that someone else well-informed (and hopefully as comprehensively informed as possible) on the subject would be able to figure out how to reproduce the results sufficiently well and with some reasonable rigor.”

    If we have empirical results that a significant percentage of researchers can’t reproduce their own work when they have first-hand access to their data and methods, what conclusion can you draw from that?

  27. Steven Mosher says:

    But when it may be needed, isn’t it better that less is provided than more, so long as it is sufficient? You don’t want duplication. You want replication and you want creativity in approaching that. So what’s desired is enough context that what is being claimed is understood well enough for appropriate rigor, but not much more than that.

    1. Who judges what is sufficient, and how is that tested?

    You write a paper claiming:
    A. I used this data.
    B. I developed a method to analyze that data.
    C. This description in words represents a sufficient set of instructions for you to write this code.
    D. I ran my code and got these tables, charts and graphs.

    I take your words. I write code. I get different answers.
    Explain the root cause.
    1. Was our data different?
    2. Was your description of the code correct and complete?
    3. Did I code what you described?
    4. How does my code differ from yours?
    5. Did either of us screw up the output?

    When you get a different answer you don’t know why.
    And no journal will force an author to get down to root cause. They will hold that the description of the method in words is correct even though they never checked it against the code.
    This leaves us trusting people to work with potential critics to get to root cause.

    Point is you learn nothing by having people re-implement code from written instructions.
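
    By contrast, when both codes are on the table, getting to root cause can be made almost mechanical: run the two implementations stage by stage and find the first divergence. A minimal sketch (the pipeline interface here is hypothetical, purely for illustration):

        import numpy as np

        def norm_by_max(x):
            return x / x.max()

        def norm_by_sum(x):
            return x / x.sum()

        def compare_pipelines(stages_a, stages_b, inputs, atol=1e-8):
            """Run two implementations stage by stage; report the first divergence."""
            xa, xb = inputs, inputs
            for i, (f, g) in enumerate(zip(stages_a, stages_b)):
                xa, xb = f(xa), g(xb)
                if not np.allclose(xa, xb, atol=atol):
                    return f"first divergence at stage {i}: {f.__name__} vs {g.__name__}"
            return "outputs agree at every stage"

        # Toy example: two codings that differ only in the final normalisation step.
        mine   = [np.sort, norm_by_max]
        theirs = [np.sort, norm_by_sum]
        print(compare_pipelines(mine, theirs, np.array([3.0, 1.0, 2.0])))
        # -> first divergence at stage 1: norm_by_max vs norm_by_sum

    None of that diagnosis is possible from a prose description alone.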

  28. Yvan,

    But in astrophysics, it is higher than 95%.

    Did you mean 95% is not reproducible, or is reproducible?

  29. Steven,

    Point is you learn nothing by having people re-implement code from written instructions.

    If I understand what you’re saying, then yes. If I follow someone else’s instructions and get the same answer, I’ve probably learned little. However, if I understand what they were trying to do and then develop my own way of testing what they were trying to test, I may well learn something.

  30. angech says:

    The background is basically that there have been a number of cases in which people have been unable to replicate, or reproduce, some earlier scientific/research study.
    Yvan “In biomed, non-reproducibility is 80%, and it is close to this number in psychology. Let’s not even speak about social science. But in astrophysics, it is higher than 95%.”
    EFS “a large number of psychology studies have absurdly small sample sizes that don’t adequately cover the complete demographics in the binning sense”
    VV “The reproduction crisis is mainly a problem for the empirical fields of science: when you work with humans, nearly any outcome is possible and there are not many theoretical constraints.”

    Most of physical science is replicated daily in schools, universities and industry; it works, hence is replicated with ease. If not, the expectation is that a mistake has occurred in the implementation.
    In new areas of physical science the expectation is that the old ideas should seamlessly add on to the new.

    In the social sciences the problem is not only small sample size but a change in the underlying item/idea you are trying to measure that does not occur in the physical sciences.
    Take a population voting in Year 1 Obama and Year 1 Trump. Suddenly the notion that people are blue or red changes because what you are measuring changes.
    Take people’s perceptions of the economy 6 months before and 6 months after 2008 financial crisis. Same units [people], Totally different outcomes.
    I do not think reproducibility should be expected to occur in the social sciences over any decent time frame and the trouble with reproducibility is that technically it should usually occur over a time frame. You can measure changes in people’s beliefs but not expect to reproduce said beliefs in a replication.

    Is there another controversial study in the social sciences field this is leading into?

  31. dikranmarsupial says:

    “Point is you learn nothing by having people re-implement code from written instructions.”

    I don’t quite agree with this. One thing you have learned is that the author’s code probably actually does what the author thinks it does (judging from the written description). This kind of replication increases confidence that the study results are actually correct. I’d argue we learn nothing by just getting the author’s code and data and running it again. Often implementing an algorithm in code is helpful in understanding it, in a way that just running it isn’t.

    One thing that is neglected here is that there is a cost associated with making code and data available, which is that if there is any interest in it, you may find yourself being asked questions about it for a decade or so. Writing code that you can still understand after a decade involves rather more work than just writing “research quality” code that allows you (alone) to run an experiment. With ever-increasing time and funding pressures on academia, wanting open science will mean less science gets done. If society wants open data/science, then the funding needs to be there; this is especially the case where the data and code are provided because people request them and then don’t actually do anything with them.

    The main reason I give code away is to encourage people to use my ideas in their work, not just so that they can reproduce the results in my paper.

  32. Dikran,

    One thing that is neglected here is that there is a cost associated with making code and data available, which is that if there is any interest in it, you may find yourself being asked questions about it for a decade or so.

    Something like this was mentioned in the paper. I think there is a sense that it should be easy to provide all your data and code. I suspect, however, that doing it properly is not as easy as some might suggest and – as you say – you may end up fielding queries for a long time after that.

  33. If I understand what you’re saying, then yes. If I follow someone else’s instructions and get the same answer, I’ve probably learned little.

    But still more than running someone else’s code and getting the same answer. At least it makes it unlikely that the result was due to a coding error.

    One thing that is neglected here is that there is a cost associated with making code and data available, which is that if there is any interest in it, you may find yourself being asked questions about it for a decade or so.

    The main user of your own work is your future self. Making sure you can reuse your work is an important argument for open science.

  34. Victor,

    But still more than running someone else’s code and getting the same answer. At least makes it unlikely that result was due to a coding error.

    Yes, true. I’ve certainly spent time developing my own versions of codes, rather than simply getting a code from someone else. In these cases, it’s more to do something new, than to simply test what others have done before.

    The main user of your own work is your future self. Making sure you can reuse your work is an important argument for open science.

    Good point. I spend quite a bit of my time trying to find bits of code I know I’ve written before. I should really do better 🙂

  35. Steven Mosher says:

    Dk. What I learn when I get your code and run it is this:

    A. Your method description matches your actual code.
    B. Your published output matches the actual output.
    C. Your result is not machine dependent.

    Remember where this started…informational sciences.

    Spending years trying to write someone’s code from their written description is just medieval and you know it.

  36. Steven,

    Spending years trying to write someone’s code from their written description is just medieval and you know it.

    Spending years might be a bit much, but I have certainly spent some time writing my own code, rather than simply using someone else’s. I have also done the latter. Depends on what you’re trying to do and what (as Victor said) you might learn by writing your own version of a code.

  37. If I can make a general comment, it seems that the real issue is that this is complicated. There are situations where having all of someone’s data and code might be appropriate, and others where it is not. There might be cases where re-writing someone’s code would be useful, and others where it is not. This all might depend partly on different disciplines (physical versus social sciences) and may even differ within a discipline. Also, there clearly are some issues that should be addressed, in particular how we present things like statistical significance. I certainly don’t have some kind of particular suggestion, but I do remain unconvinced that there should be a requirement that researchers make everything available. However, that’s not to say that I never think this would be suitable, or that I think we shouldn’t be endeavouring to be as open as is reasonably possible.

  38. Exactly, it is a nice new shiny toy and I like it a lot, but one should not exaggerate its importance, and one should still allow for human judgement.

  39. -1=e^iπ says:

    @Dikran – “wanting open science will mean less science gets done”

    Not necessarily, if there is more open science and more access to the code of other scientists then scientists may be able to use this to learn things faster, verify things faster and develop new code faster (by modifying someone else’s code).

  40. izen says:

    @-SM
    “A good example of this was detailed in the climategate mails. McIntyre was unable to reproduce Jones’s work. He followed the methods, wrote his own code, and got different answers.”

    It is an excellent example.
    There were many other researchers who used other methods with different data and got the SAME answer.
    A hockey stick shaped graph.

    And when McIntyre finally made his code and methods available it became evident he had (cherry) picked the results he claimed showed the MBH method produced ‘hockey sticks’ from random data.
    As Wegman’s attempt to replicate McIntyre’s result revealed.

  41. Steven Mosher says:

    If you give me the code there is nothing preventing me from writing my own version myself, and using your code to check my work, all without pestering you with questions.

    You give me the code. I work from your description to do my own; when I’m done I check against the real thing. Best of both worlds.

    The difference is who chooses the best way to learn.

  42. Steven Mosher says:

    Yes izen.

    Nothing prevents replication. If you share code you are not forcing people to use it. You are not preventing them from doing their own from scratch. Reproducibility does not replace replication.

  43. dikranmarsupial says:

    -1 you are being overly optimistic. Lots of code gets put in the public domain that never gets used for anything. Also in my experience, a lot of the research code that gets put online is of no use, as it requires environments or third-party libraries that you don’t have access to.

  44. dikranmarsupial says:

    SM “What I learn when I get your code and run it is this:

    A. Your method description matches your actual code.”

    No, just RUNNING it does not tell you that. Analysing the code does.

    “B. Your published output matches the actual output.”

    yes.

    “C. Your result is not machine dependent.”

    No, not unless you test it on all machines. And provided you have the same version of MATLAB and that it has been compiled with a compatible set of numeric libraries, and you have the same toolboxes etc.
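
    (Even on a single machine, merely changing the order of a floating-point sum changes the answer, which is effectively what different compilers and numeric libraries do behind your back. A minimal demonstration, not tied to any particular study:)

        import math
        import numpy as np

        rng = np.random.default_rng(42)
        x = rng.normal(size=1_000_000)

        s_fwd = float(np.sum(x))        # NumPy's pairwise summation
        s_rev = float(np.sum(x[::-1]))  # same numbers, opposite order
        s_ref = math.fsum(x)            # correctly rounded reference sum

        print(f"forward - reversed:  {s_fwd - s_rev:.2e}")   # typically non-zero
        print(f"forward - reference: {s_fwd - s_ref:.2e}")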

    Don’t get me wrong, I am generally in favour of making code available, I am also in favour of replication from the paper. The two are not mutually exclusive and both have benefits. They both have costs though as well. Requesting code or data from someone when you have no intention of using it is harassment though IMHO.

    Richard Tol’s approach to code sharing shows the limitations: for the paper I investigated, it was an unannotated spreadsheet that just created more questions than it answered. Presumably that is because adding the annotations would be more work.

  45. dikranmarsupial says:

    I should add that, a couple of years ago, I wanted to do an experimental evaluation of a particular topic in machine learning, so I downloaded all of the publicly available (MATLAB) code I could find. I could only get one of the toolboxes to work on my computer because so many of them used third-party optimisation toolboxes that I didn’t have and couldn’t afford. Fortunately the one that did work was the best direct competitor for my approach, but it did put an end to the idea of performing an experimental survey. Your mileage may vary, as they say.

  46. izen says:

    It seems unlikely that the idealistic notion of open data and methods will gain traction anytime soon in medical research.

    Commercial confidentiality and non-disclosure of proprietary information is still the default position despite over a decade of trying to get a, at least voluntary, public record of all research carried out.

  47. JCH says:

    And Professor Curry and her ilk will continue to use the problems in medical research to smear. That’s how integrity works.

  48. Willard says:

    Something like a simple checksum should suffice.
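
    For instance, something as small as this would let a reader confirm they are running byte-identical artifacts (a sketch; the manifest convention is my own, not any particular tool’s):

        import hashlib
        import sys

        def sha256_of(path, chunk=1 << 20):
            """Hash a file in chunks so even large data files are cheap to verify."""
            h = hashlib.sha256()
            with open(path, "rb") as f:
                while block := f.read(chunk):
                    h.update(block)
            return h.hexdigest()

        # Publish the output alongside the paper; anyone reproducing the run can
        # compare the hashes of their data, code, and outputs against this manifest.
        for path in sys.argv[1:]:
            print(f"{sha256_of(path)}  {path}")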

  49. Willard says:

    If there’s no checksum-like solution, we may have a problem:

    Why the Reward Structure of Science Makes Reproducibility Problems Inevitable

    Recent empirical work has shown that many scientific results may not be reproducible. By itself, this does not entail that there is a problem (or “crisis”). However, I argue that there is a problem: the reward structure of science incentivizes scientists to focus on speed and impact at the expense of the reproducibility of their work. I illustrate this using a well-known failure of reproducibility: Fleischmann and Pons’ work on cold fusion. I then use a rational choice model to identify a set of sufficient conditions for this problem to arise, and I argue that these conditions plausibly apply to a wide range of research situations. In the conclusion I consider possible solutions and implications for how Fleischmann and Pons’ work should be evaluated.

    Click to access rewards-and-reproducibility2.pdf
