COVID-19 and the epistemology of epidemiological models at the dawn of AI


Bibliographic Details
Published in:Annals of Human Biology
Main Author: Ellison, George
Format: Article in Journal/Newspaper
Language:English
Published: Taylor and Francis 2020
Subjects:
Online Access:http://clok.uclan.ac.uk/35068/
http://clok.uclan.ac.uk/35068/1/200805%20COVID-19%20Commentary%20-%20Manuscript%20-%20PrePrints.org.pdf
http://clok.uclan.ac.uk/35068/2/200804%20COVID-19%20Commentary%20-%20Supplementary%20Material%20-%20PrePrints.org.pdf
https://doi.org/10.1080/03014460.2020.1839132
id ftunivclancas:oai:clok.uclan.ac.uk:35068
record_format openpolar
institution Open Polar
collection University of Central Lancashire: CLOK - Central Lancashire Online Knowledge
op_collection_id ftunivclancas
language English
topic Public health engineering
Machine learning
description The models used to estimate disease transmission, susceptibility and severity determine what epidemiology can (and cannot) tell us about COVID-19. These include: ‘model organisms’ chosen for their phylogenetic/aetiological similarities; multivariable statistical models to estimate the strength/direction of (potentially causal) relationships between variables (through ‘causal inference’), and the (past/future) value of unmeasured variables (through ‘classification/prediction’); and a range of modelling techniques to predict beyond the available data (through ‘extrapolation’), compare different hypothetical scenarios (through ‘simulation’), and estimate key features of dynamic processes (through ‘projection’). Each of these models addresses different questions using different techniques; involves assumptions that require careful assessment; and is vulnerable to generic and specific biases that can undermine the validity and interpretation of its findings. It is therefore necessary that the models used can actually address the questions posed, and that they have been competently applied. In this regard, it is important to stress that extrapolation, simulation and projection cannot offer accurate predictions of future events when the underlying mechanisms (and the contexts involved) are poorly understood and subject to change. Given the importance of understanding such mechanisms/contexts, and the limited opportunity for experimentation during outbreaks of novel diseases, the use of multivariable statistical models to estimate the strength/direction of potentially causal relationships between two variables (and the biases incurred through their misapplication/misinterpretation) warrants particular attention. 
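As a minimal illustration of the ‘projection’ of dynamic processes mentioned above (and not the article's own model), a deterministic SIR compartmental model can be stepped forward in time to sketch an outbreak's trajectory. All parameter values here (transmission rate, recovery rate, population size) are hypothetical placeholders:

```python
def sir_projection(beta=0.3, gamma=0.1, n=1000.0, i0=1.0, days=200):
    """Project a hypothetical outbreak with a deterministic SIR model,
    stepped forward with simple one-day Euler updates."""
    s, i, r = n - i0, i0, 0.0
    history = []
    for _ in range(days):
        new_infections = beta * s * i / n   # transmission term
        new_recoveries = gamma * i          # recovery term
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history

traj = sir_projection()
peak_infected = max(i for _, i, _ in traj)
```

Such a projection is only as good as its assumptions: if the mechanisms behind `beta` and `gamma` are poorly understood or change over time (as the abstract stresses), the projected trajectory can be badly wrong even though the arithmetic is exact.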
Such models must be carefully designed to address: ‘selection-collider bias’, ‘unadjusted confounding bias’ and ‘inferential mediator adjustment bias’ – all of which can introduce effects capable of enhancing, masking or reversing the estimated (true) causal relationship between the two variables examined. Selection-collider bias occurs when these two variables independently cause a third (the ‘collider’), and when this collider determines/reflects the basis for selection in the analysis. It is likely to affect all incompletely representative samples, although its effects will be most pronounced wherever selection is constrained (e.g. analyses focusing on infected/hospitalised individuals). Unadjusted confounding bias disrupts the estimated (true) causal relationship between two variables when: these share one (or more) common cause(s); and when the effects of these causes have not been adjusted for in the analyses (e.g. whenever confounders are unknown/unmeasured). Inferentially similar biases can occur when: one (or more) variable(s) (or ‘mediators’) fall on the causal path between the two variables examined (i.e. when such mediators are caused by one of the variables and are causes of the other); and when these mediators are adjusted for in the analysis. Such adjustment is commonplace when: mediators are mistaken for confounders; prediction models are mistakenly repurposed for causal inference; or mediator adjustment is used to estimate direct and indirect causal relationships (in a mistaken attempt at ‘mediation analysis’). These three biases are central to ongoing and unresolved epistemological tensions within epidemiology. All have substantive implications for our understanding of COVID-19, and the future application of artificial intelligence to ‘data-driven’ modelling of similar phenomena. 
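The selection-collider mechanism described above can be demonstrated with a small simulation (a hedged sketch with hypothetical variables, not an analysis from the article): two independent causes of a collider become spuriously associated once the analysis is restricted to cases selected on that collider, e.g. hospitalised individuals:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Two independent causes: there is no true association between them.
x = rng.standard_normal(n)
y = rng.standard_normal(n)

# A collider caused by both (e.g. 'being hospitalised'), plus noise.
collider = x + y + 0.5 * rng.standard_normal(n)

# Restricting the analysis to units selected on the collider
# induces a spurious (here negative) association between x and y.
selected = collider > 1.0
r_full = np.corrcoef(x, y)[0, 1]              # near zero
r_selected = np.corrcoef(x[selected], y[selected])[0, 1]  # clearly negative
```

The induced association can enhance, mask or even reverse a true effect, which is why the abstract warns that any incompletely representative sample (not only hospital-based ones) is vulnerable to this bias.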
Nonetheless, competently applied and carefully interpreted, multivariable statistical models may yet provide sufficient insight into mechanisms and contexts to permit more accurate projections of future disease outbreaks.

1. These biases, and the terminology involved, may be challenging to readers who are unfamiliar with the use of causal path diagrams (such as Directed Acyclic Graphs; DAGs), which have been instrumental in identifying the different roles that variables can play in causal processes (whether as ‘exposures’, ‘outcomes’, ‘confounders’, ‘mediators’, ‘colliders’, ‘competing exposures’ or ‘consequences of the outcome’) and in revealing hitherto under-acknowledged sources of bias in analyses designed to support causal inference. For what we hope are accessible introductions to DAGs (and how [not] to use these), please see: Ellison (2020); and Tennant et al. (2019). For more technical detail on ‘collider bias’, ‘unadjusted confounding bias’ and ‘inferential mediator adjustment bias’ (and its related concern, the ‘Table 2 fallacy’), please refer to: Cook and Ranstam (2017); Munafò et al. (2018); Tennant et al. (2017); VanderWeele and Arah (2011); and Westreich and Greenland (2013).
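The mediator-adjustment bias discussed above can likewise be sketched numerically (all coefficients and variable names here are hypothetical): when a mediator on the causal path is adjusted for, the regression recovers only the direct effect, so it underestimates the total causal effect it is often mistaken for:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Causal chain x -> m -> y, plus a direct path x -> y.
x = rng.standard_normal(n)
m = 0.8 * x + rng.standard_normal(n)            # mediator caused by x
y = 0.5 * x + 0.6 * m + rng.standard_normal(n)  # total effect of x: 0.5 + 0.8*0.6 = 0.98

def ols(design, target):
    """Least-squares coefficients with an intercept column."""
    a = np.column_stack([np.ones(len(target))] + design)
    return np.linalg.lstsq(a, target, rcond=None)[0]

total = ols([x], y)[1]      # ~0.98: the unadjusted model recovers the total effect
direct = ols([x, m], y)[1]  # ~0.50: adjusting for the mediator removes the indirect path
```

This is the mechanism behind the ‘Table 2 fallacy’ the footnote cites: coefficients from a single mutually adjusted model do not all estimate the same kind of (total) effect, so reporting them side by side invites misinterpretation.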
format Article in Journal/Newspaper
author Ellison, George
title COVID-19 and the epistemology of epidemiological models at the dawn of AI
publisher Taylor and Francis
publishDate 2020
url http://clok.uclan.ac.uk/35068/
http://clok.uclan.ac.uk/35068/1/200805%20COVID-19%20Commentary%20-%20Manuscript%20-%20PrePrints.org.pdf
http://clok.uclan.ac.uk/35068/2/200804%20COVID-19%20Commentary%20-%20Supplementary%20Material%20-%20PrePrints.org.pdf
https://doi.org/10.1080/03014460.2020.1839132
op_relation http://clok.uclan.ac.uk/35068/1/200805%20COVID-19%20Commentary%20-%20Manuscript%20-%20PrePrints.org.pdf
http://clok.uclan.ac.uk/35068/2/200804%20COVID-19%20Commentary%20-%20Supplementary%20Material%20-%20PrePrints.org.pdf
Ellison, George orcid:0000-0001-8914-6812 (2020) COVID-19 and the epistemology of epidemiological models at the dawn of AI. Annals of Human Biology, 47 (6). pp. 506-513. ISSN 0301-4460
doi:10.1080/03014460.2020.1839132
op_rights cc_by_nc_nd_4
op_rightsnorm CC-BY-NC-ND
op_doi https://doi.org/10.1080/03014460.2020.1839132
container_title Annals of Human Biology
container_volume 47
container_issue 6
container_start_page 506
op_container_end_page 513