My previous post strayed a little from the norm in that it drew on my knowledge and experience more than it tried to expand them. I am going to do that again – because “models” are seemingly all the rage and, to be honest, I am seeing so much in the current situation that is familiar from my days using (and justifying) models at work.
Who would have thought that someone rejoicing in the title “Senior Data Visualisation Journalist” would become a much needed guest on the news shows – but I have just seen that happen and, to his credit, he tried really hard to portray everything ‘correctly’ but was facing severe problems from the host who – not surprisingly – had seemingly little knowledge of what such models involved.
I should say that he was showing ways of presenting the results rather than the model itself – but even so it quickly became clear that all the hurdles that I have faced in the past were bouncing around there trying to make life difficult.
In addition I have come across a few different bits of journalism over the last few days that have tried to explain at least some of the difficulties of looking at numbers and making a snap decision about them. There is no way of covering absolutely all the same ground here, and no point in trying – but it is worth concentrating on just some of the uncertainties so that, perhaps, one or two people may think twice before taking the “numbers” either as a reason for rejoicing or as a reason for depression.
The biggie is perhaps the mortality rate. This is put forward all the time – it is the most telling “number” – how many people have died. For some, this is a pretty stark number – but I think that almost everyone gets the fact that it is related to how big the population size is and how many people have been infected. Even knowing that, there are still some very strange ways in which the numbers are presented.
Take yesterday when it was announced “Spain has overtaken China in number of deaths” – yes – true – there were now more deaths in Spain than there have been in China – but wait a minute – those two countries are rather different in size are they not!! As it stands if you measure the number of deaths per million people Spain is over 40 times worse. Rather than the 3288th death being the cause of that headline – perhaps it should have been the 95th!! Yes – it was that long ago that – proportionately – Spain overtook China in the numbers of deaths.
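The per-head comparison can be sketched in a few lines of Python. This is only an illustration: the populations are round-number assumptions of mine rather than official figures, and the death count is simply the headline number applied to both countries, so the output will not reproduce the multiples quoted above exactly. The function names are made up for this sketch.

```python
def deaths_per_million(deaths, population):
    """Scale a raw death count to deaths per million inhabitants."""
    return deaths * 1_000_000 / population


def proportional_crossover(deaths_large, pop_large, pop_small):
    """Death count at which the smaller country matches the
    per-million rate of the larger country."""
    return deaths_large * pop_small / pop_large


# ASSUMPTION: rough, round-number populations, not official statistics.
POP_SPAIN = 47_000_000
POP_CHINA = 1_400_000_000

# ASSUMPTION: the headline figure, used for both countries at the
# moment Spain "overtook" China in raw deaths.
DEATHS = 3288

spain_rate = deaths_per_million(DEATHS, POP_SPAIN)
china_rate = deaths_per_million(DEATHS, POP_CHINA)
crossover = proportional_crossover(DEATHS, POP_CHINA, POP_SPAIN)

print(f"Spain: {spain_rate:.1f} deaths per million")
print(f"China: {china_rate:.1f} deaths per million")
print(f"Spain is {spain_rate / china_rate:.0f} times worse per head")
print(f"Proportionate overtaking happened near death number {crossover:.0f}")
```

Change the population figures a little and the "times worse" multiple moves with them, which is rather the point: the headline depends on what you choose to divide by.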
Of course, as soon as this is pointed out it becomes obvious. Not all the strange things are quite so clear to see – even in hindsight.
Complex Adaptive Systems (CAS) are all around us – even if we have never heard the term before and for reasons that I have long forgotten (not really!) I have had an interest in this topic for many years. They are – undoubtedly – difficult to understand. They are well nigh impossible to model accurately. They rarely offer much hope of accurate prediction of future events. Unfortunately they are often misconstrued as something much simpler.
World health is undoubtedly a CAS – a simple definition of a CAS is a system where understanding all the bits does not in itself yield an understanding of the whole.
Modelling of any CAS is made difficult in a number of ways – but for the present I want to focus on just one aspect. Every model is based on assumptions – it is inevitable. The first assumption is what the system boundary is – and this is ALWAYS an arbitrary choice. Worse (as far as we are concerned here) it isn’t always an explicitly made choice. What is included in the model? What is excluded? Why? All of these need to be addressed.
The assumptions can make HUGE differences and this can be readily shown through just one number from the covid-19 database. Italy has – compared to everywhere else – a huge number of deaths per million and a case fatality rate of around 10% – i.e. 1 in 10 of those identified as suffering from covid-19 die as a result. Taken in isolation – this is a worrying number. However, there is an assumption built in to the Italian reporting which means that this number is (compared to other countries) an inflated one. It is neither right nor wrong, but only around 10% of those reported as dying of covid-19 actually had that as the cause of death – the others died from other causes where covid-19 was not a major contributory factor.
This changes things a bit – Italy is still a “high death rate” country, but stops being a complete outlier. Of course there are lots of other things that affect the accuracy of the rate – not least the fact that in reality no one knows how many people have the virus – only full population testing could determine that and that is neither worthwhile nor sensible (nor achievable!!). So, the best we can do is say that the fatality rate is almost certainly lower than the numbers suggest.
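To see how much the rate discussed above depends on its assumptions, here is a small Python sketch. The 10% reported rate and the 10% "actually the cause of death" share come from the paragraphs above; the multipliers for undetected infections are purely hypothetical values chosen for illustration, and the function name is my own.

```python
def adjusted_rate(reported_rate, attributable_fraction=1.0,
                  undetected_multiplier=1.0):
    """Adjust a reported fatality rate for two assumptions.

    attributable_fraction: share of reported deaths where the disease
        really was the cause of death.
    undetected_multiplier: true infections per confirmed case
        (1.0 means every infection has been detected).
    """
    return reported_rate * attributable_fraction / undetected_multiplier


REPORTED = 0.10  # the roughly 10% figure quoted for Italy above

# HYPOTHETICAL scenarios: only the 0.1 attributable share comes from
# the text; the multipliers are illustrative guesses, not estimates.
for fraction in (1.0, 0.1):
    for multiplier in (1, 2, 10):
        rate = adjusted_rate(REPORTED, fraction, multiplier)
        print(f"attributable={fraction:.0%}  "
              f"infections per confirmed case={multiplier:>2}  "
              f"rate={rate:.2%}")
```

None of these rows is "the" answer. The spread between them is what the single headline percentage hides.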
There is a lot more that could be said – but the point is not to criticise the numbers or the reporting but to highlight the fact that whatever you are hearing MUST be treated as an approximation at best – perhaps some of it will be a complete fabrication!!