Yesterday, a group in Oxford released a paper implying that a significant fraction of people in the UK may already have been infected. This was quickly picked up by numerous media outlets, who highlighted that coronavirus could already have infected half the British population. James Annan has already discussed it in a couple of posts, but I thought I would comment briefly myself.

To be clear, I certainly have no expertise in epidemiology, but I do have expertise in computational modelling. So, I coded up their model, which is described in Equations 1-4 of their paper. They also carried out a parameter estimation, while I'm simply going to run the model with their parameters.
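For anyone who wants to follow along, below is a minimal sketch of the kind of SIR-style model I ran: an infectious fraction, a cumulative infected fraction, and reported deaths as a delayed fraction ρθ of those infected. The structure follows the paper's equations, but the specific parameter values here (β, σ, the reporting delay ψ, and the population size) are illustrative assumptions on my part, not the paper's fitted values.

```python
import numpy as np

def run_model(rho, beta=0.5, sigma=1/4.5, theta=0.14, psi=17,
              N=66.44e6, days=120, dt=0.01):
    """Minimal SIR-style model, integrated with simple Euler steps.

    y: infectious fraction, z: cumulative infected fraction,
    so (1 - z) is the fraction still susceptible.
    Reported deaths are a fraction rho * theta of those infected
    psi days earlier.  All parameter values are illustrative.
    """
    steps = int(days / dt)
    y = np.zeros(steps)
    z = np.zeros(steps)
    y[0] = z[0] = 1.0 / N               # seed with a single infection
    for i in range(steps - 1):
        new_inf = beta * y[i] * (1.0 - z[i]) * dt
        y[i + 1] = y[i] + new_inf - sigma * y[i] * dt
        z[i + 1] = z[i] + new_inf
    lag = int(psi / dt)                 # deaths lag infection by psi days
    deaths = np.zeros(steps)
    deaths[lag:] = rho * theta * z[:-lag] * N
    return np.arange(steps) * dt, y, z, deaths
```

One can then, for example, compare the cumulative infected fraction at the point where reported deaths pass 400 for a small and a large value of ρ, which is essentially what the figures below do.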

The key parameter is ρ, the proportion of the population that is at risk of severe disease, a fraction of whom (14%) will die. They explicitly assume that *only a very small proportion of the population is at risk of hospitalisable illness.* Consequently, they focus on scenarios where the proportion requiring hospitalisation is 1% (ρ = 0.01) and 0.1% (ρ = 0.001). The Figure on the right, which considers a range of values of ρ, is from my model and seems to largely match what's been presented in the paper.

The curves that start at 1 and then drop are the proportion of the population that is still susceptible (left-hand y-axis), while the diagonal straight lines are the logs of the cumulative deaths (right-hand y-axis). I've also shifted the models so that the latter overlap. This Figure illustrates why the study was picked up by the media. Cumulative deaths to date are just over 400. If the proportion of the population at risk of hospitalisation is small (ρ = 0.001), then just over 30% of the total population would still be susceptible. In other words, more than half of the UK population would already have been infected. On the other hand, if the proportion at risk of hospitalisation is large (ρ = 0.01), then the proportion susceptible is still large and the fraction that has already been infected is small.

One way to estimate ρ is from the date at which the first case is reported. If ρ is small, then the lag between the first case and the first death is larger than if ρ is large. The paper implies that the current data is more consistent with a small ρ than a large ρ. The problem, as this critique highlights, is that this implies that the first case is the progenitor of most of the subsequent cases. Given the small numbers involved, this may well not be so, since a localised outbreak may not have taken hold. Hence, there doesn't really seem to be strong evidence in support of ρ being small and, consequently, there is little evidence to suggest that a significant fraction of the UK population has already been infected.
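A rough back-of-envelope illustrates why this lag depends on ρ. In the exponential phase, infections grow at rate r ≈ β - σ, and the first death is expected once roughly 1/(ρθ) people have been infected, reported ψ days after infection. The sketch below uses illustrative values (R0 ≈ 2.25, an infectious period of 4.5 days, θ = 0.14, ψ = 17 days), which are my assumptions rather than the paper's fits.

```python
import math

# Illustrative values: growth rate r = beta - sigma, with
# beta = R0 * sigma and an infectious period of 1/sigma days.
sigma = 1 / 4.5
r = 2.25 * sigma - sigma        # ~0.28 per day
theta, psi = 0.14, 17           # fatality fraction among severe; death delay

def case_to_death_lag(rho):
    # First death expected once ~1/(rho * theta) infections have occurred,
    # reported psi days after infection.
    return psi + math.log(1.0 / (rho * theta)) / r

for rho in (0.01, 0.001):
    print(f"rho = {rho}: lag ~ {case_to_death_lag(rho):.0f} days")
```

With these numbers, the lag is roughly a week longer for ρ = 0.001 than for ρ = 0.01; that difference is essentially the signal the paper is leaning on.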

Okay, despite the lengthy preamble, this is really what I wanted to focus on in this post. I think it's perfectly fine to play around with models and to try to estimate various parameters. However, especially when the results have societal significance, it's very important to be clear about what's been done when presenting the work publicly. This research has not demonstrated that more than half the UK population has already been infected; it's simply illustrated that this is possible. Clearly, if most of the UK population has already been infected, then this virtual lockdown could probably be relaxed. However, if ρ is not small, then the lockdown would seem justified. As James points out in this post, even though the paper implies that the current data is consistent with ρ being small, there do seem to be regions where this is not the case.

So, I think it’s highly irresponsible to present a result like this without being extremely careful to minimise the chances of it being misconstrued. It’s clearly not possible to completely avoid research being misrepresented, but researchers do – in my view – have a responsibility to ensure that this is not an easy thing to do. It would be great if the impact of this virus were far less severe than we currently think. However, until we have more evidence to support such a conclusion, we really should be very careful about presenting results that imply that this is the case.

**Addendum:**

This post ended up being much longer than I intended. I mostly wanted to highlight how I think the presentation of this result was highly irresponsible; the first part was just meant to illustrate what they'd done in their model. Since I'm not an expert in this field, and have no interest in spreading misinformation about an important topic, if any experts think I've made some kind of mistake, feel free to point it out.

I also wanted to post another figure, which is essentially the same as the one James highlighted in this post. The curves that rise and fall are the number of people who are infectious (left-hand y-axis), while the curves that rise and then level off are the cumulative deaths (right-hand y-axis).

This again illustrates (given that cumulative deaths to date are just over 400) that if the proportion requiring hospitalisation is small (ρ = 0.001), then the number of people who have already been infected is already quite high, while if the proportion needing hospitalisation is large (ρ = 0.01), then the number of people who have already been infected is much smaller. It also illustrates that the overall cumulative deaths depend quite strongly on this parameter; if we relax current conditions based on this work and it turns out that ρ isn't small, the impact could be substantial.

In the interests of transparency, if you would like the code that produced the two figures, you can download it from here.