Unless you’re a UK-based academic, you may not know that yesterday was an important day in UK academia; it was the release of the results from the 2014 Research Excellence Framework (REF) exercise. This is an assessment of most of the academic departments at all UK universities, that takes place about every 7 years. It involves around 300 senior academics spending about a year reading hundreds of papers each, so as to produce an overall score for each department; a result that could probably be roughly reproduced in a few hours using any of the available research databases. Amongst many other reasons, that’s one of the issues that I have with the whole exercise. As pointed out in this Guardian article:

‘Excellence’ tells us nothing about how important the science is and everything about who decides.

Essentially we end up quantifying excellence using metrics that the system itself has decided are some kind of measure of quality, but may be no such thing.

The problem as I see it is that the desire is to quantify something that is probably unquantifiable, and so what happens is that it’s done using simplistic metrics that may not be a particularly good indicator of excellence, but are more an indicator of what is regarded as interesting – by the system itself – at that time. There are certainly people who publish lots of papers and get lots of citations, but don’t do particularly excellent work. There are others who publish less and don’t get lots of citations, but do work of a very high quality. You may think that the latter may not be worthwhile if it isn’t noticed, but sometimes you need to do high-quality work to make that crucial incremental step that allows others to solve the next big problem.

Also, there are impacts that are even harder to quantify. Did some work capture the public’s imagination and encourage an interest in science and the humanities? Did some research lead to the development of some new technology? Did some research simply satisfy our desire to better understand the world around us? Will something we do today play a small but important role in something amazing that will happen in 50 years’ time? Some things just aren’t really quantifiable and I think sometimes we should be careful of trying to do so.

Another problem with such an assessment exercise is that it involves so much money, and doing well makes such a large difference to a university’s finances, that you start being encouraged to think about how your work could score well in such an exercise, rather than thinking about what interesting problem you could be trying to solve. Not only does it reward those who just happen to score well in such an exercise, it encourages everyone else to try and do the same. In some sense, it risks reducing diversity. Of course, some might argue that that’s okay because it will all be excellent, but that’s only because we’ve defined excellence in that way, not because it is excellent in some intrinsic sense. We also run the risk of missing out on potentially excellent research that isn’t done because it might not score well on an assessment exercise.

Additionally, it runs the risk of focusing resources in an ever decreasing number of institutions. If you do very well you get a lot more money per person than if you do poorly. Hence those that do well in one exercise could do even better the next time, and could collect an even bigger fraction of the available funding. If it was truly a good measure of excellence, maybe this would be fine, but it probably isn’t. We risk damaging pockets of excellence in small universities if we make the difference between doing well and doing poorly too great.

There are even other issues with the whole process. It’s research only. Universities are also places where we teach and it’s certainly my view that the biggest impact I’ll have in the short (and maybe medium) term is that I will be involved in teaching people who will hopefully benefit from their education, and who will hopefully contribute to society as a result of their education. There’s no incentive to reward good teaching because doing so doesn’t actually make any immediate difference to a university’s finances. Someone who can publish papers that will score well on an assessment exercise is, however, worth rewarding even if their teaching is poor.

Anyway, that’s my little rant over. We actually did quite well in REF2014, so this isn’t some kind of sour grapes. I just think that universities are important and complex organisations, and that trying to quantify their value in some simplistic way could end up doing more harm than good. I don’t think that we should be funding things that don’t have any value, but I also don’t think there’s much point in justifying our funding decisions using a complicated and time-consuming process that probably doesn’t really have the ability to judge intrinsic value. I don’t really have a solution, but – fortunately – it’s not my job to find one.


70 Responses to Excellence!

  1. dikranmarsupial says:

    I agree with much of that (UEA did pretty well too, so no sour grapes here either). Reminds me of an interview Peter Higgs gave:

    “He also had little enthusiasm for the changes that had transformed academia by the time he retired in 1996. “Today, I wouldn’t get an academic job. It’s as simple as that. I don’t think I would be regarded as productive enough.” His published papers can be counted on two hands, whereas academics now are expected to churn out several a year, and when I ask if he feels this has come at the cost of space for intellectual thought, he says: “I was certainly uncomfortable with it. After I retired, it was quite a long time before I went back to my department. I thought I was well out of it. It wasn’t my way of doing things any more. It’s difficult to imagine how I would ever have enough peace and quiet in the present sort of climate to do what I did in 1964.”

    Chuckling, he goes on, “I was an embarrassment to the department when they did research assessment exercises. A message would go round the department: ‘Please give a list of your recent publications.’ And I would send back a statement: ‘None.'” This is clearly not a source of embarrassment to Higgs, who lets out a mischievous cackle when he adds that a researcher studying Nobel prize winners’ publications emailed him recently to say that he’d looked at the papers listed on Higgs’ website, but could he please have the full list?”


  2. Dikran,
    Thanks. I was actually visiting somewhere recently and they were telling a story of the previous REF (called an RAE back then) in which an academic had been left out because their publication record wasn’t good enough. There was apparently a mad scramble two weeks before the deadline, when this person was awarded a Nobel prize.

  3. dikranmarsupial says:

    Would probably score quite well as an “indicator of esteem” for the RAE! ;o)

  4. guthrie says:

    I read a few years ago of a Canadian study which looked at the awarding of grant money, and came to the conclusion that the best thing to do would be to just dole it out as requested, because that way people didn’t spend half their time writing grant applications but instead got on with research.
    Here of course with the REF, you only need to look at the previous examples of such things, e.g. NHS waiting times and lists, school exam results etc etc, to know that it isn’t going to work.

  5. Neither such an evaluation nor the views about its problems are unique to the UK. It’s not done as regularly in Finland but the rest is the same.

    I have also been involved with a similar evaluation in an organization of applied research (VTT, The Technical Research Centre of Finland), where the applied and target-oriented nature of the research might make evaluation easier, but the problems appeared to be about as difficult to resolve.

  6. Working in Germany, no sour grapes for me as well.

  7. Victor,
    Very good, I agree. You end up being forced to find a way to make your result seem as positive as you possibly can.


    Here of course with the REF, you only need to look at the previous examples of such things, e.g. NHS waiting times and lists, school exam results etc etc, to know that it isn’t going to work.

    Yes, that’s roughly what I was thinking too.

    I’m not surprised similar things happen elsewhere. It’s partly human nature to want some way to measure success and find ways to justify how you spend money in the future.

  8. I’ll add an addendum to my post here. What motivated this was the Guardian article to which I linked, but I’ve always had an issue with the whole “excellence” claim. In particular, I often hear it when people discuss diversity in the sciences. The argument goes something like “it’s okay if we’re not diverse, because our hiring policy is to hire on the basis of excellence and therefore we’re not discriminating”. The problem I have with that is that excellence is defined in terms of the system that already exists; it’s not intrinsic and may not be the same type of “excellence” as would be possible in a more diverse environment. It seems like a cop-out and a way of justifying something that people probably know isn’t quite right. I think we should stop falling for it and realise that what we think is excellent may not be universally true. It’s hard to define excellence and there are probably many ways to do something that might be regarded as excellent, but that we just don’t realise yet.

  9. I just had my end-of-year assessment. I wrote “I was awesome” on it. And some other words too.

    Of course the larger scale assessment is done by the market.

  10. William,

    Of course the larger scale assessment is done by the market.

    Yes, but I suspect that the “market” in the university sense is our economy and society in the coming decades, rather than something we can quantify easily today.

  11. philipmoriarty says:

    Great post. The problem with the term “excellence” is that it’s entirely ill-defined, and therefore means anything (and everything) to everybody, particularly university managers/PVCs.

    After receiving a series of e-mails earlier this year riddled with the “excel?!*ce” word (and which had been, ahem, “cascaded” down from the upper echelons at the “centre”), I finally cracked and had to vent my spleen: http://physicsfocus.org/philip-moriarty-vacuity-excellence/

  12. Layzej says:

    This is probably not a problem that is unique to academia. I’ve been in situations where the building is burning down around us – but look at the metrics! The metrics are great! (and vice versa). They measure what is measurable, but not necessarily what is important. They need to be taken in context and your post does a good job of providing some of that context.

  13. Philip,
    Thanks, and thanks for the link to your post. I’d actually just seen that after seeing Jack Stilgoe link to it in a tweet. Also a good post.

    Thanks. I’m sure it’s not unique to academia, although I think that – as William suggests – other sectors can have more obvious signs of success (making a profit). Admittedly, the same kind of thing does happen in healthcare and school education in the UK and that doesn’t appear to have been wonderfully successful.

  14. Joshua says:

    ==> “The problem as I see it is that the desire is to quantify something that is probably unquantifiable, and so what happens is that it’s done using simplistic metrics that may not be a particularly good indicator of excellence, but are more an indicator of what is regarded as interesting – by the system itself – at that time”

    See this all the time. Want to measure something? Make a scale and assign some numbers. Why bother to assess validity (that the numbers measure what they purportedly measure)? All you need to know is the number.

  15. They measure what is measurable, but not necessarily what is important.

    And once you put this into a reward system, people will start to do what is measurable and not what is important.

    It sure is not unique to academia, it might be a bit more prevalent in the UK than elsewhere. Which at least rewarded us with an excellent Reith lecture on trust, which is to a large part on micromanagement via quantifiable measures to increase trust, which destroys trust.

    Like Philip, I had expected a bit more about the horror term: excellence, given the ironic title of the post. Imagine a university with only Einsteins, where an Einstein is debugging a net-cdf library to read some data that later turns out to be a measurement error. We need people at all the different skill levels, with a range of different skills and interests.

  16. MWS says:

    Not everything that counts can be counted, and not everything that can be counted, counts.

  17. Eunice says:

    Excellence in academia is akin to Quality in business:

  18. John Mashey says:

    But it is your job to find one, in this “your” meaning the generic academic research community. If you do not propose better ways of allocating money to research activities, others will keep deciding this.

    (Evaluating research professionals is far harder than doing it for development engineers or especially salespeople. Nevertheless, some entities have some history of being able to do it OK and maybe such might be studied, although academe is not exactly the same as industry or government.)

    Without a clear alternative, people might be tempted to apply Churchill’s comment on democracy.

    Note: this is not a defense of the existing system.
    I went industrial rather than academic long ago since I had concerns about evaluation systems even then.

  19. Victor,

    We need people at all the different skill levels, with a range of different skills and interests.

    Yes, and is partly what motivated my mention of diversity.


    If you do not propose better ways of allocating money to research activities, others will keep deciding this.

    Partly I agree. Partly I think we do have to be careful of assuming that we do know what’s best. A form of feedback is what I think is needed. Governments will try and push in one direction and universities should – in my opinion – be willing to push in a different direction if they think that’s what would be better. Over time you may evolve towards something sensible. The problem is that governments dangle pots of money and university management likes money. As far as they’re concerned this has probably been a success, especially for the bigger universities that have done well. You need to have people in senior positions who are willing to think about the long term and not the short term; and that is probably quite difficult for someone who is trying to balance next year’s budget.

  20. Bwana_mkubwa says:

    The term excellence, when used in corporate speak, now has as much meaning as the word passionate as used by every talent show would-be and advertizing copywriter. Research excellence has as much resonance as We are passionate about [making baked beans/singing songs/baking brownies].
    Coupled to current UK research council policies, the ultimate effect of the REF system will be to produce a highly-raked pyramid structure. The only new appointees that will succeed and move up the pyramid will be those in the “right department” who are gophers for the Big-shots defining what is Excellent research.
    Rather than having a system that works through responsive modes, in which individual academics come up with ideas and explore new ideas, the UK is slowly moving to a command economy for research, where politicians and administrators define what research is important and tell researchers what their ideas should be.

    I have always contended that if all the effort that was expended in the REF game was applied to actual research, Academia-UK would probably have 20-25% more papers published and the quality would also be lifted.

  21. Bwana,
    Yes, I think that you’re right. I’m not sure we need 20-25% more papers, but I agree that if we allocated the money in a more responsive fashion that rewarded genuine risk-taking, the quality would be lifted.

  22. John Mashey says:

    Try R2-D2 and other lessons from Bell Labs. That is not exactly the academic problem, but it overlaps. Academic research would mostly fit R1+R2, with some D1 and D2, but generally not D3/D4. Bell Labs Area 10 was mostly R1+R2, and often had people working on basic science research, for things that might or might not pay off in 20 years. Of course, the research areas were narrower than the breadth in universities, but within the domains covered, people had a lot of flexibility. (Penzias & Wilson’s Nobel discovery of Cosmic Background Radiation from the Big Bang might not be an obvious research direction for a telephone company). Sadly, by the time of Plastic Fantastic, things didn’t work so well.

    The US academic/government relationship is much more distributed than in the UK, but there are certainly examples of government being able to do the right thing on R. DARPA folks have long done OK.

    But again, if people don’t think the system works well, there need to be concrete proposals to change it or avoid it getting worse. Back in the 1970s, the software content of Bell Labs’ work was rising quickly, and we did leading edge systems and software engineering.

    (A) The Bell System was used to creating metrics for everything, and some took a very heavy top-down metrics approach, trying to push Lines of Code/day for evaluation, so that managers who didn’t really know what was going on could have numbers. Does this sound familiar?

    (B) Some of us emphasized small teams, continually-improving sets of tools and support environments, re-use of code vs writing new code all the time, but writing at the highest level possible when new code was needed.

    Anyway, it takes hard work and continual evaluation to make sure that system-wide metrics do not have unintended side-effects. We used to worry about this all the time, for example in such mundane things as telephone repair. If you used metrics that just encouraged managers to respond quickly to repair calls, they’d tend to slack off on preventative maintenance, which in the long run would increase trouble rates, so we’d diddle metrics to incent them better to balance longer-term issues.

  23. dikranmarsupial says:

    I think making more money available for responsive mode grants would greatly improve the system. On the bright side, the way the REF looks at only four papers per researcher is a good thing in that it rewards quality over quantity, and so is not feeding the growth of predatory open access journals (I would expect papers in such journals to be likely to be rated as 1* or 2*, as good quality work is likely to be publishable in a more highly ranked journal). An “excellence framework” that looked at total volume of research output would be worse.

  24. dikran,
    Yes, I agree. Only assessing 4 papers is clearly better than simply looking at volume. However, I think one could get a very similar result by simply looking at a Department’s h-index, so I suspect it’s better but still being done in a rather simplistic way (despite what might be being claimed). Knowing my luck, I’ll focus on getting 4 strong papers out in the next 7 years and they’ll suddenly decide that it should be 6 🙂

  25. dikranmarsupial says:

    I just try to do good work and not worry too much about REF etc. Sadly much easier said than done!

  26. Yes, I tend to do the same. Has worked so far, I think.

  27. Bwana_Mrefu says:

    Quality over quantity is the way forward.

    Got my four papers for the next REF in the bag already. So now I am just going to focus on getting at least one Science, Nature, Nature-something paper….he says cockily (The little Devil in my head is saying “it is going to end in tears.”)

  28. Quality over quantity is important. And actually reading the papers is a lot better than using some bibliographic indices. If only because that is much harder to game. Also using 6 (or 2) papers next time rather than 4 is great to catch people trying to game the system. In this respect I must say that REF is structured better than I had thought.

    Bibliographic indices should never be computed for individuals, the code of ethics of the German Science Foundation even forbids this. But also if you compute it for an entire school or university computing these indices becomes problematic when you spread money using them. Then you are creating a system in which the schools and universities start to apply pressure on individual researchers about their personal bibliographic indices. And researchers start to be forced to optimise them (and business volume) rather than simply doing the best research they can.

  29. dikranmarsupial says:

    I’m not convinced that reading the papers is necessarily any better than bibliographic indices, simply because it requires the reader to be sufficiently aware of the sub-field of each paper to properly appreciate it, which may be quite nuanced. I would prefer to see a system based on the impact factors (normalized by refined subject area) of the journals in which the papers were published (rather than bibliographic indices of the papers themselves). Good journals tend to have high standards and highly expert reviewers, so it is questionable whether a more generalist panel re-reviewing them will result in a better judgement of their quality. Hats off to the panel though, a herculean effort for sure, and I think they did a good job.

  30. I was going to say something similar to Dikran. I think one can produce a ranking using bibliographic data that is similar to what is achieved by reading the papers, which either means that the bibliographic data is not a bad indicator of quality, or the paper assessment isn’t as independent as one might hope (my, not completely ill-informed, view). Additionally, if memory serves me right, not only does the panel not include enough people to cover all areas in sufficient depth (an impossibility I suspect) each paper was read by two people, which meant they all had to read and rate 600 papers in a year. A difficult, if not impossible, task (not to do, but to do properly).

  31. Okay, yes, if the reviewers are not able to judge the papers well, and almost nobody is smart enough to be able to judge 600 papers (in a year), then it starts to make less sense. I am still highly sceptical of using bibliographic data to judge people. Probably a main reason why science has become less productive.

  32. philipmoriarty says:

    I agree entirely with Victor on this. The problem with using bibliographic data such as H-indices and journal impact factors is that they are not a measure of quality — they’re a measure of popularity. These need not necessarily be well-correlated.

    Moreover, if a system is introduced whereby bibliometrics form the basis of the REF (or whatever the next assessment exercise will be called), then university management will focus simple-mindedly on those simplistic metrics. See the following blog posts (and associated comments threads) on the dangers of using simplistic metrics to assess staff.

  33. Philip,
    Thanks. I’ve also just read David Colquhoun’s most recent post about the death of Stefan Grimm and life at Imperial College. It’s a tragic story and appears to not reflect well on Imperial College at all.

  34. dikranmarsupial says:

    Philip, bibliographic indices of journals do measure quality, which is why my suggestion was based on the indices of the journals in which the paper was published, rather than of the papers themselves. In my field (machine learning) the impact factors of the various journals are a reasonably good guide to their quality and don’t suffer from the biases (and we all have them) of individual reviewers. Personally, I’d rather see more money allocated via responsive mode grants, where it is the quality of the proposed research that matters, rather than the standing of the institution where it takes place.

  35. philipmoriarty says:

    No, I’m afraid I vigorously disagree. An impact factor does not measure quality. We do PhD students and early career researchers (and, indeed, science as a whole) a great disservice by assuming that impact factors and the quality of a paper are necessarily positively correlated. Stephen Curry’s “I’m sick of impact factors” post — linked to in my comment above — is very important. I strongly recommend that anyone with a commitment to the ‘value’ of impact factors reads it.

    Good papers are published in low impact factor journals. And some awfully poor quality papers are published in extremely high impact factor journals. (And, of course, vice versa). Over the last couple of years I have been embroiled in a debate in my research field (nanoscience) regarding the publication of thirty papers in many of the highest quality journals there are. All based on the misinterpretation of simple artefacts. See http://physicsfocus.org/philip-moriarty-peer-review-cyber-bullies/

    The key problem is that the journal impact factor — or, in some cases, the journal ‘brand’ — is now used as a proxy for quality. I’ve sat on a number of fellowship/lectureship panel meetings during which the publications of two candidates are compared. “This chap has got better publications than this one”. How do they know? They’ve never read the papers! They just assume that the papers are better because of the brand-name/impact factor of the journals.

    This type of thinking is insidious. It’s clear in the “stripy nanoparticle” example linked to above that a major contributing factor to fundamentally flawed work making it into “top tier” journals was the perception that “Well, they published their last paper in (e.g.) Nature Materials. That’s a good journal. This work must be sound”. And yet the work shouldn’t have been passed for an undergraduate report, let alone appear in what are considered to be the highest quality journals in the field.

    Where I will agree with you is on responsive mode funding. But that doesn’t really exist any more. How could it? Every grant proposal for EPSRC must be accompanied by a National Importance and a “Pathways to Impact” statement, so the work needs to align with strategic priorities to stand a chance of being successful. That’s not responsive mode…

  36. dikranmarsupial says:

    philipmoriarty, indeed, barely worth writing what it is you actually want to do! ;o)

    While there are good papers published in bad journals, and vice versa, on average the quality of papers published in top-tier journals is better than that in lesser journals which in turn is higher than that in predatory open access journals. It is not a matter of “must be sound” but “more likely to be sound”, and I would argue that is largely true. Likewise there will be good papers that are rated poorly by individual reviewers and bad papers that are reviewed highly, especially if the reviewers are not specialists on the particular sub-topic. Peer review of journal papers demonstrates this to be true, I suspect most of us have had sets of mutually contradictory reviews, where some of the reviewers have liked the paper and others hated it, and in that case the reviewers ought to be experts at the sub-topic level. Both bibliographic indices and reviews will be high-variance estimates of quality. Bibliographic indices at least have the advantage of being cheap and objective (so the rules of the game are made explicit and constant) and could be applied on a continual basis, which would be less wasteful than the cyclical stress nature of the current scheme.

    I don’t think any scheme is perfect, but I am not convinced that the current system is sufficiently better than a bibliographic index (of the journal not the papers) based approach to be worth the expense and (entirely admirable) effort.

    Personally I think citations of papers are very important, I want my research to be useful to others. If nobody wants to take what I have done and build on it, what was the point in having done it in the first place?

  37. Dikran,

    Personally I think citations of papers are very important, I want my research to be useful to others. If nobody wants to take what I have done and build on it, what was the point in having done it in the first place?

    Likewise, but there are always issues with this. There are areas within physics where papers can easily collect 10s or even 100s of citations in a year. There are others where a paper collecting over 100 over many years would be very good. This clearly reflects the interests in those individual areas, but does mean that it’s hard to even compare papers using bibliographic info even within the same general areas (physics, for example), because the sizes of sub-fields can be quite variable. Of course, it does tell us something about how interested we are in these different sub-fields at this point of time, but doesn’t necessarily mean we would still be interested in those sub-fields in 10 or 20 years’ time.

  38. dikranmarsupial says:

    ATTP, yes very true, also sometimes a paper is in advance of its time and only collects citations after everybody else has caught up. There is no ideal metric, but writing papers in such a way as to make it as easy for others to take up and use (and hence cite) is a pretty good thing to aim for. I used to teach a post-grad module on how to review a paper, and I would usually start by asking why we write them first, and this was usually quite a long way down the list!

  39. Dikran,
    Yes, a good abstract and a clear conclusion are quite important parts of a paper. I guess my view is that we don’t need the REF. We’d be much better off properly funding universities to do quality teaching, provide some amount of non-responsive mode funding so universities can pump-prime some areas of research and have some ability to take risks in areas that may not currently be popular, and then have the rest coming through responsive-mode funding that is little influenced by “pathways to impact”.

  40. philipmoriarty says:

    “Personally I think citations of papers are very important, I want my research to be useful to others. If nobody wants to take what I have done and build on it, what was the point in having done it in the first place?”

    So the quality of a piece of science is defined by its utility to others? I guess we’ll have to agree to disagree on this. A piece of science can stand alone as being of high quality even if no-one cites it. There are interesting parallels here with the arts. Is the quality of a piece of music or literature defined by the total number of “downloads” or sales? As I say in the blog post linked to below, One Direction have outsold The Beatles. Does this mean that they’re artistically more important?

    A high quality piece of science is a high quality piece of science regardless of whether it picks up 10 citations or 1000 citations. And when the number of citations is directly linked with the perceived prestige of the journal – and the majority of researchers use impact factor as a proxy for quality — then we clearly have a “self-referential” and rather flawed system.


  41. dikranmarsupial says:

    philipmoriarty “So the quality of a piece of science is defined by its utility to others?” In the sentence you quote I am talking about what is important not what makes something of high quality. As an employee of the public sector, it is reasonable for me to be expected to produce work that is useful to society (although not necessarily immediately useful) rather than (or as well as) work that is “merely” of high quality. There is also nothing wrong with solid incremental science that is useful to society or the research field and that should be rewarded as well.

    However, as I was arguing that the criterion should be the quality of the journal in which the work is published, rather than the citations of the individual paper, my scheme would reward both intrinsic quality (as high quality work is generally published in high quality journals, even if it isn’t highly cited) and solid incremental work (as that usually ends up in a decent journal as well). The point is that for high variance estimators (both bibliometric and reviewer-based) you need to do a lot of aggregation to average out the variance.

  42. dikranmarsupial says:

    aTTP, exactly!

  43. philipmoriarty says:


    (i) But a piece of fundamental science — eg in astronomy or particle physics — *is* of value to society.

    (ii) “However, as I was arguing that the criterion should be the quality of the journal in which the work is published” . But a journal impact factor is not an intrinsic measure of the quality of a paper published in that journal (or, indeed, of the journal itself). Take a look at PubPeer and/or Retraction Watch and note how many deeply flawed papers get published in those high impact factor journals…

  44. John Mashey says:

    Retraction Watch: Peer review isn’t good at “dealing with exceptional or unconventional submissions,” says study is somewhat related, although as I noted, I’m not sure the title describes the paper so well.

  45. dikranmarsupial says:

    philipmoriarty Impact factors may not be an intrinsic measure of the quality of a paper published in that journal, but they are quite well correlated in my experience. As I am saying you need to aggregate to average out the variance, and that is as true of bibliometrics as it is of human reviewers. Have you never had reviews of a paper where the reviewers have disagreed on its quality?

    As for Retraction Watch, you need to compare that with scholarlyoa.com predatory open access journals that regularly publish deeply flawed papers but very rarely retract them. Top journals have high readership, including people willing to try and replicate the work, so any failures of peer review are less likely to go unnoticed, so there will be some bias there.

  46. philipmoriarty says:

    “Have you never had reviews of a paper where the reviewers have disagreed on its quality?”

    Yes, very often. But I don’t see how that strengthens the argument for using journal impact factor as a proxy for quality.

    Top journals have high readership, including people willing to try and replicate the work.

    This is not the case, at all, in my field of research (nanoscience/condensed matter physics/physical chemistry) and I am confident that it is not the case in many other fields. What is the incentive for repeating work? None at all. Work which aims to replicate a previous study will be seen as “incremental” at best, and derivative at worst. There are huge barriers in place if researchers want to repeat a previous study — what “high profile” journal publishes a study designed to repeat/test previous work?

    Here’s what the Editor-in-Chief of a very prestigious journal in my field had to say when we tried to submit a paper which critiqued work previously published in their pages:

    “However, [our journal] does NOT publish papers that rely only on existing published data. In other words. [our journal] does NOT publish papers that correct, correlate, reinterpret, or in any way use existing published literature data. We only publish papers with original experimental data.”

    See https://pubpeer.com/publications/58199FBEA31FB5755C750144088886#fb17386 for the full story.

  47. dikranmarsupial says:

    JM interesting paper, will have to read more than the abstract in the new year. It kind of makes my point in that human reviewers are not that reliable, and at least in the case of journal papers you have the option of submitting it to another journal and (most likely) another set of reviewers. The chances of a good quality piece of work not making it into a decent journal are not that high*. If you only have two reviewers, and no chance to address the reviewers’ concerns, as in REF, the chances of a good piece of work being rated poorly are likely to be higher. Unless there is reason to believe that the REF reviewers will do a better job than the original reviewers of the journal papers.

    *of course it also increases the chance of a flawed paper making it into a decent journal.

  48. dikranmarsupial says:

    philipmoriarty wrote “Yes, very often. But I don’t see how that strengthens the argument for using journal impact factor as a proxy for quality. ”

    so why should the two reviews of the paper from REF be any more reliable an indicator of quality?

    “What is the incentive for repeating work? None at all. ”

    To make sure you understand it, or so that you can build on it. If I see an interesting algorithm or idea in a journal it is not that unusual for me to try it out, quite often the authors of a paper provide software to encourage their readers to do just that.

    “Work which aims to replicate a previous study will be seen as “incremental” at best”

    I think this is not a good attitude. A lot of progress in science is made by incremental work and it shouldn’t be looked down on if it is a useful step forward.

  49. dikranmarsupial says:

    BTW I think it is very sad to see a journal that does not accept criticisms of published work, that ought to be an important element of quality control in academic publishing.

  50. I kind of feel that this discussion is getting slightly confused. As I see it, if one accepts the existence of something like the REF, one could design something using bibliographic data that would do as good a job as 300 people each reading 600 papers in a year (given that you want to produce some kind of ranking for a large diverse organisation). However, it would at best be a proxy for quality and would probably (as I suggested in the post) be an extremely poor indicator of intrinsic value.

    If one actually wants to assess intrinsic value/quality, then I don’t really think it is truly possible to do so, given the numerous ways in which a piece of research could have value. So, any attempt to assess quality is likely to fail to actually do so, but bibliographic data is probably no worse than what we’re doing for REF. IMO, we shouldn’t really be doing it in the first place.

  51. philipmoriarty says:


    You misunderstood what I was saying, I’m afraid. I understand entirely why work should be repeated. Indeed, I’ve just spent the last two years with colleagues at Nottingham, Liverpool, and NIMS (Tsukuba, Japan) reproducing experiments/reanalysing flawed published data and then attempting to get the critique published in those supposedly high quality high IF journals (see previous links).

    The point is that the publication system actively discourages the publication of work which is critical of previous studies. This is what I meant by asking why work should be repeated.

  52. dikranmarsupial says:

    aTTP, seems reasonable to me. The key advantage of bibliographic indices is not that they are any more reliable, just that they are quicker and cheaper and we could devote more of our time to performing the research, rather than assessing it.

  53. dikranmarsupial says:

    It has been an interesting discussion, but I really must go (especially as my simulations have now finished). Merry Christmas and a productive new year!

  54. philipmoriarty says:


    I agree that the discussion is getting confused. The problem is that if we go down the bibliometric route, we will lock ourselves into our work being defined by simplistic metrics. Universities already do far too much of this; it is depressing as hell to see league tables trotted out where institutions are “differentiated” on the basis of numbers given to three or four (in)significant figures. If we go down the metrics route, this level of statistical illiteracy (or straightforward innumeracy in some cases) will be exacerbated.

    My preferred route would be to get rid of dual support and put the funding into the research councils **but if, and only if** the “QR-diverted” funding was used to support fundamental research (in whatever field). That is, no “Pathways to Impact”, no “National Importance”, no “Shaping Capability” etc…

  55. philip,
    Yes, I agree with your last paragraph. I really wish that we could start to recognise (again) the intrinsic value of doing quality, fundamental research in a university environment (i.e., one in which academics see themselves as teachers as well as researchers).

    Something I have wondered is whether an issue in the UK is that we don’t have long-term soft-money positions (as they do in the US) or long-term, permanent, non-teaching positions (as they do in much of Europe) and so we’re trying to do it all in a university environment, and rather than getting the best of all possible worlds, we’re heading towards the worst (well, maybe not quite that bad, but we’re slightly losing the beauty of research in a university environment).

  56. philipmoriarty says:


    Agree wholeheartedly. There’s a great post from Richard Jones on just why universities in the UK are expected to pick up the slack for deficiencies elsewhere in the innovation “ecosystem”: http://www.softmachines.org/wordpress/?p=1213

    And with that, I’ll have to call it a night. Have a great Christmas holiday.


  57. I think we should distinguish two factors (next year).

    1. Whether bibliographic indices (and number of articles with it) are an accurate measure of research quality (and just being better than REF may not yet be a sign that it is accurate).

    2. Maybe more importantly, what steering science on bibliographic indices does to science. Once you tell people that this is your micro-management instrument, you produce a powerful feedback.

    Fröhliche Weihnachten!!

  58. Victor,

    Once you tell people that this is your micro-management instrument, you produce a powerful feedback.

    Yes, I think this is the crucial point. There’s nothing fundamentally wrong with using bibliographic data to get a sense of someone’s scientific/academic/research credentials. However, once it becomes known that this is a major factor in determining someone’s career path or the finances of an institution, you introduce a feedback where improving the bibliographic numbers starts to become more of a priority than it really should be.

  59. dikranmarsupial, you are not on twitter? In that case, this one is for you.

  60. dikranmarsupial says:

    Victor, I am indeed on Twitter, but under my real name (in fact we are now mutual followers).

    Interesting article by Prof. Leyser, however the problem is that university management do require some means of assessing performance but don’t have the ability to evaluate the quality of the actual work themselves, so they must adopt some proxy measure. My position is that if some proxy measure is going to be used, I would rather it was based on the output of my research rather than the quantity of funding that has gone into it (which is the other major estimator used in practice). Having a cheap, continuous estimator is better than a cyclic expensive one because it would mean that we don’t have the pressure to meet an arbitrary deadline, which I would suggest is at least as likely to cause some of the research problems noted in the article. It isn’t as good as assessment by peer-review (i.e. by those in the same specialism as you), but that is impractical (the report already acknowledges the shortage of reviewers) and open to the same sorts of abuses.

    The real problem (IMHO) is that academic research is grossly underfunded, which means there is so much competition for funding that the optimal compromise between funding as much high quality research as possible whilst at the same time not wasting money on lesser quality work is over-shot. Lots of high quality proposals are not funded, not because there is something wrong with them, but simply because the money had already run out by the time that point in the ranked list is reached (note there will be a fair bit of noise in the ranked list for the reasons discussed above). The solution is to provide more funding and make more of it available in responsive mode, rather than directed, but I suspect that is not on the table.

    Note that the article discusses the need for assessment and an objective question is “Does this researcher have an established track record in using these techniques?”. How exactly can this be established without publications? If someone has several articles in top journals on the topic in question, isn’t that good evidence of an “established track record in using these techniques”?

    Sometimes we have to settle for something achievable and “better” if what is “best” is not achievable in practice.

  61. David Young says:

    One gets the feeling that this assessment is really a self assessment of academics by academics. That may be fine for what it’s worth, perhaps that is not much. The invasion of research by big money is of relatively recent origin, perhaps since WWII when it became obvious that science was very important for national security. Money corrupts and big money corrupts big.

    I doubt that research is really underfunded in the Western world. If anything, there are just far too many PhDs chasing the very large pool of money available and the money givers are all trying to “sell” their research program to someone through “impact.” That often leads to overselling and failure to meet stated objectives. I believe the overpowering need to sell your work is perhaps the single most corrupting influence on modern science.

    Is there any attempt here to deal with the substantive critiques of the modern research system emerging recently from the community itself? Some in the community will deny that there is a problem, or soft-pedal their own doubts. Denial is a phenomenon not discussed in this post it would seem, but perhaps very relevant.

  62. DY,

    One gets the feeling that this assessment is really a self assessment of academics by academics.

    That might be true, but it’s one that I think many would rather didn’t actually happen. I don’t think that the process was suggested by academics, even if they are the ones who do the actual work.

    If anything, there are just far too many PhDs chasing the very large pool of money available and the money givers are all trying to “sell” their research program to someone through “impact.”

    If you’re suggesting that academic researchers are somehow desperate to sell their research through impact, then I think you’re woefully under-informed. In the UK at least, the impact agenda has been imposed upon academia, and is not something that most academics particularly value.

    Is there any attempt here to deal with the substantive critiques of the modern research system emerging recently from the community itself?

    There are indeed problems with the modern research system. If you’d actually read the post and some of the other comments, you may have noticed that this was indeed touched upon. In my view, the main problem is the ever increasing corporate culture that is being imposed upon universities rather than anything intrinsically wrong with academic research.

  63. Over a couple of decades much expansion has taken place. More and more PhDs are educated, costly research has developed to be a major industry, … Big science and other high cost projects have taken an increasing share of the funding. Similar expansion has taken place in many areas other than science, putting pressure on government budgets. It is always easier to add new ways of using funds for worthwhile purposes than to cut down spending in areas that are no longer as essential. For this reason governments and other sources of funding for science have found out that they must make cuts. We are now in this phase.

    At the same time requirements have been presented that a larger and larger number of universities and other research organizations must be allowed to compete for the remaining funds on equal footing. This combination has unavoidably led to more and more competition, and to more and more need to use measurable criteria in support of decisions on the funding.

    Going several decades back in time, the procedures were very different. The overall funding of science was much less, but those lucky enough to be in the right position got their share with much less effort. It was usually impossible to get much more, but very easy to keep the basic funding at a level sufficient for continuing research at some level.

  64. David Young says:

    AT, I agree that most honest academics find the selling of impact and the fanatical pursuit of soft money uncongenial. In the US it has become the road to advancement in academia, and administrators do not even attempt to disguise it. One could mention the Palo Alto marketing machine otherwise known as the cardinal.

  65. dikranmarsupial, I got an anonymous tip that you were on twitter.

    I do not agree that there is a lack of funding. Both absolutely and relative to GDP science is well funded. There is naturally always more one could do, there is always less money than one would like to have, but that is not the same as being underfunded. It could be that a bigger part goes to relatively unproductive Big science projects nowadays.

    Given that the salaries in science are low relative to the qualifications and relative to what most could make in industry, I think it is not unreasonable to assume that most scientists work because they are intrinsically motivated. I am not sure whether strong competition makes them better. In fact most research seems to suggest that in creative and cognitive professions, less competition and fewer differences in rewards are more productive. For simple manual tasks a carrot helps a lot, but for our kind of tasks it may well be counterproductive.

    Some people may lose interest in science, your interests change over the decades. Thus it might be good to have a mechanism to be able to get rid of dead wood. Having such a mechanism will also prevent much dead wood from building up. But otherwise, I would expect that science thrives with only little competition for funds.


    Going several decades back in time, the procedures were very different. The overall funding of science was much less, but those lucky enough to be in the right position got their share with much less effort. It was usually impossible to get much more, but very easy to keep the basic funding at a level sufficient for continuing research at some level.

    I feel it would be good to go back to that system. It allows scientists to spend more time on science, rather than on acquiring funding. With a limited budget a scientist will have to set his own priorities, and this scientist is best able to judge what is important. With project funding you can always apply for something that sounds nice, but is not really that important. If you get it, good; if you do not get it, it is just a pity about the writing, but you do not suffer a loss of scientific output, like you would if you invested your own limited funding in such a less important project.

    No need to increase funding, but have more stable funding and reserve the project funding to reward exceptional researchers with a really nice idea. This would probably hurt climatology. In the system with much project funding it could grow fast, whereas in a system where universities or professors simply have their budget you cannot change things as fast. That is a price I am willing to pay. Politicians probably do not like such a system; with more flexible funding they can now steer more money into projects they think are important to stimulate economic growth within 4 years (which anyway does not work that fast).

  66. Victor,

    I do not agree that there is a lack of funding. Both absolutely and relative to GDP science is well funded.

    I suspect this is broadly true, although I think the UK is lower, in terms of GDP at least, than many other comparable countries. Also, in the last few decades we’ve seen a shift from having a combination of national labs, industry labs, and universities, to expecting more and more from the university sector. This is forcing the sector to focus on things like impact which, in my view, is not healthy.

  67. Victor,
    My experience is that in many cases more stable funding at a level that depends (almost) only on past performance would be a better alternative. There are also many cases, where empirical work requires project type management and funding. Thus one model is not the best for all, and some way should be devised that would allow for the coexistence of alternative funding models.

  68. David Young says:

    Victor, I agree with what you say. In the old days, Universities paid professors to do research as part of their job and professors were not particularly well paid. In the US, endowments are huge and certainly sufficient for that to continue essentially indefinitely at the older private universities. The new soft money regime is something I don’t fully understand but a part of it is simply due to a new breed of administrators who see a way to increase their own salaries and prestige. University presidents can draw very large salaries, as large as good football coaches. Ironically, the same pressures seem to be adversely affecting both groups. And the “best” professors at garnering soft money are also amply rewarded, the salaries being a shock to me when I saw some of them a decade ago.

    A real problem here is that it is virtually impossible I believe to be an entrepreneur and a top notch original researcher at the same time. Paraphrasing Lincoln, they cannot remain forever, but will become either all the one or all the other.

  69. In the European situations I know, the salary of a professor does not depend on how much soft money you acquire. In Germany there are two types of professors. If you acquire much soft money it becomes easier to get the version with the highest salary, but still hard and you often would have to move to another university to do so. Once you are there, you “only” gain in prestige.

    (A pity about your personal attack against me at Climate Etc. I would have preferred to hear some arguments.)
