With statistics and numerical data being constantly televised during the current coronavirus outbreak, it may be difficult to determine which data is credible and what the data even means. If there is anything that the Data Science and Bio Workshop held at UTM last Thursday emphasized, it is that context matters.
The workshop was organized by the Women in Science and Computing Club (WiSC) and hosted by an epidemiologist who preferred to remain anonymous and by lecturer Samantha-Jo Caetano from the UTM statistics faculty. It aimed to introduce students to the role statistics plays in tracking common outbreak patterns.
The speakers began the workshop with a scenario in which a public health unit noticed an increase in diarrhea, fever, and other common symptoms. Attendees were given a lengthy table listing the subjects, their genders, and the foods they reported eating before their symptoms appeared. The task was to determine which foods were causing the rise in illness.
To solve the scenario, students were advised to calculate the relative risk, a measure that compares the risk of developing an illness among people exposed to a particular factor with the risk among people who were not exposed. The relative risk is calculated separately for each circumstance and can only be used when the number of people who got ill and what they ate beforehand are known. In this case, the circumstances were the types of food subjects recalled eating.
The relative risk is expressed as the ratio of the probability of illness in an exposed group to the probability of illness in an unexposed group. It is the standard calculation health units use to assess an influx of illness. It is important to note, however, that it is not absolute. For example, if the relative risk for those who ate potatoes is zero, that does not mean an individual who ate potatoes has a zero percent chance of getting sick. The individual could still get sick, and if they did, they would be considered an outlier. Context is very important when analyzing data about an illness. News stations often report that there is, for example, a fifty percent increase in an illness or in the risk of contracting it. The question is: what is that percentage relative to?
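The calculation described above can be sketched in a few lines of Python. This is an illustrative example only: the function names `risk` and `relative_risk` are hypothetical, and every count is invented rather than taken from the workshop's table. It also shows why a "fifty percent increase" means little without its baseline.

```python
# Hypothetical sketch: relative risk from a 2x2 table of food exposure vs. illness.
# All counts are invented for illustration; they are not the workshop's data.

def risk(ill: int, well: int) -> float:
    """Proportion of a group that became ill."""
    total = ill + well
    if total == 0:
        raise ValueError("group is empty")
    return ill / total


def relative_risk(ill_exposed: int, well_exposed: int,
                  ill_unexposed: int, well_unexposed: int) -> float:
    """Risk among people who ate a food divided by risk among those who did not."""
    unexposed = risk(ill_unexposed, well_unexposed)
    if unexposed == 0:
        raise ValueError("risk in the unexposed group is zero; ratio undefined")
    return risk(ill_exposed, well_exposed) / unexposed


# Hypothetical food A: 30 of 40 eaters fell ill, 5 of 40 non-eaters fell ill.
print(relative_risk(30, 10, 5, 35))  # 0.75 / 0.125 = 6.0

# Hypothetical potatoes: no eater fell ill, so the ratio is 0.0, which still
# says nothing certain about whether one particular future eater will get sick.
print(relative_risk(0, 25, 35, 20))

# The same relative risk of 1.5 ("a fifty percent increase") can describe
# very different absolute changes depending on the baseline risk.
for baseline in (2 / 100_000, 20 / 100):
    exposed = baseline * 1.5
    print(f"risk {baseline:.5f} -> {exposed:.5f}, extra risk {exposed - baseline:.5f}")
```

The last loop is the point of the news-report example: the ratio alone hides whether the added risk is one extra case per hundred thousand people or ten per hundred.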
When health units do not know how many people got ill or what they ate before becoming sick, health professionals use the odds ratio instead. This normally happens when a disease spreads on a larger scale, such as nationally or internationally. The odds ratio compares the odds of a particular exposure (in the scenario above, a type of food) among people who got sick with the odds among people who did not. While the odds ratio is less precise than the relative risk, approximating it closely only when an illness is rare, it is the only method available in certain situations, such as when COVID-19 first broke out. With COVID-19, it was difficult to determine whether the virus had a continuous source, where exposure occurred over multiple, separate time periods, or a point source, where exposure occurred at a single source over a single time period.
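A minimal sketch of the odds ratio follows, again with a hypothetical function name and invented counts. It assumes a case-control setup, where only the sampled sick people (cases) and healthy people (controls) are counted, so the size of the whole population at risk is never needed. The second half illustrates, with made-up numbers, why the odds ratio is less precise than the relative risk when an illness is common but close to it when the illness is rare.

```python
# Hypothetical sketch: odds ratio from a case-control 2x2 table.
# Counts are invented for illustration, not taken from the workshop.

def odds_ratio(cases_exposed: int, cases_unexposed: int,
               controls_exposed: int, controls_unexposed: int) -> float:
    """Odds of exposure among the ill divided by odds of exposure among the well.

    Only the sampled cases and controls are needed, not the total number of
    people at risk, which is why this works when that number is unknown.
    """
    if cases_unexposed == 0 or controls_exposed == 0:
        raise ValueError("a zero cell makes the ratio undefined")
    return (cases_exposed * controls_unexposed) / (cases_unexposed * controls_exposed)


# 20 of 30 sick people recall eating a hypothetical food, versus 10 of 50 healthy controls.
print(odds_ratio(20, 10, 10, 40))  # (20*40) / (10*10) = 8.0

# Common illness (hypothetical cohort: 30 of 40 eaters ill, 5 of 40 non-eaters ill):
rr_common = (30 / 40) / (5 / 40)            # relative risk 6.0
or_common = odds_ratio(30, 5, 10, 35)       # (30*35) / (5*10) = 21.0
print(rr_common, or_common)

# Rare illness (hypothetical: 3 of 1000 eaters ill, 1 of 1000 non-eaters ill):
rr_rare = (3 / 1000) / (1 / 1000)           # relative risk of about 3
or_rare = odds_ratio(3, 1, 997, 999)        # about 3.006, close to the relative risk
print(round(rr_rare, 3), round(or_rare, 3))
```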
Bias is another challenge in tracking illnesses. A common type is recall bias, in which subjects who have experienced an outcome are more likely to report having been exposed to a suspected cause. People often need visual aids to remember what they were, and were not, exposed to. One of the speakers gave the example of asking subjects about medication use. If subjects are given only the name of a medication when questioned about exposure, they are more likely to say that they took it. Health units therefore have to physically show subjects the medication container and the tablets in order to get an accurate yes-or-no response.
With COVID-19, it may seem as if the number of cases multiplied overnight. However, the workshop explained that part of this jump was due to improved detection methods. When hospitals are able to test for the virus more efficiently, the number of confirmed cases rises rapidly even if the actual spread has not changed. The speaker also noted that the number of COVID-19 cases is beginning to plateau in China but gradually increasing in other countries.
By the end of the workshop, it became clear that the media can portray the spread of a disease inaccurately in a number of ways. Students were introduced to the methods public health professionals use, some of which are more accurate than others. While viruses remain a threat, it is important to understand the context behind the numbers being widely reported.