Exploring the variation in associations between socioeconomic indicators and non-communicable diseases in the Tromsø Study: an algorithmic approach

Bibliographic Details
Published in: Scandinavian Journal of Public Health
Main Authors: Svalestuen, Sigbjørn; Sari, Emre; Langholz, Petja Lyn; Vo, Chi Quynh
Format: Article in Journal/Newspaper
Language: English
Published: Sage 2024
Online Access: https://hdl.handle.net/10037/33893
https://doi.org/10.1177/14034948241249519
Description
Summary: Aims: We contribute to the methodological literature on the assessment of health inequalities by applying an algorithmic approach to evaluate how well socioeconomic variables predict the prevalence of non-communicable diseases in a Norwegian health survey. Methods: We use data from the seventh survey of the population-based Tromsø Study (2015–2016), including 11,074 women and 10,009 men aged 40 years and above. We apply the random forest algorithm to predict four non-communicable disease outcomes (heart attack, cancer, diabetes and stroke) based on information on a number of social root causes and health behaviours. We evaluate our results using the classification error, the mean decrease in accuracy and partial dependence statistics. Results: The results suggest that education, household income and occupation contribute to varying extents to predicting non-communicable disease outcomes. The misclassification rate ranges from 25.1% to 35.4%, depending on the non-communicable disease under study. Partial dependences reveal mostly expected health gradients, with some examples of complex functional relationships. Out-of-sample model validation shows that the predictions generalise to new data. Conclusions: Algorithmic modelling can provide additional empirical detail and metrics for evaluating heterogeneous inequalities in morbidity. The extent to which education, income and occupation contribute to predicting binary non-communicable disease outcomes depends on both the non-communicable disease and the socioeconomic indicator. Partial dependences reveal that social gradients in non-communicable disease outcomes vary in shape between combinations of non-communicable disease outcome and socioeconomic status indicator. Misclassification rates highlight the extent of variation within socioeconomic groups, suggesting that future studies may improve predictive accuracy by exploring further subpopulation heterogeneity.
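
Two of the evaluation metrics named in the summary, the classification error and the mean decrease in accuracy, can be illustrated with a short sketch. This is not the authors' code and does not use the Tromsø Study data: the records are synthetic, a hypothetical one-rule classifier (`predict`) stands in for a fitted random forest, and every name below (`misclassification_rate`, `mean_decrease_in_accuracy`, the `education_years` column) is an assumption made for illustration. The mean decrease in accuracy is computed here in its common permutation form: shuffle one predictor across records and measure how much the error rises.

```python
import random


def misclassification_rate(predict, rows, labels):
    """Share of records whose predicted label differs from the observed one."""
    wrong = sum(1 for row, y in zip(rows, labels) if predict(row) != y)
    return wrong / len(rows)


def mean_decrease_in_accuracy(predict, rows, labels, column, repeats=20, seed=0):
    """Average rise in error after shuffling one predictor column across records."""
    rng = random.Random(seed)
    baseline = misclassification_rate(predict, rows, labels)
    increases = []
    for _ in range(repeats):
        shuffled = [row[column] for row in rows]
        rng.shuffle(shuffled)
        # Rebuild each record with the shuffled value in the chosen column.
        permuted = [
            row[:column] + (value,) + row[column + 1:]
            for row, value in zip(rows, shuffled)
        ]
        increases.append(misclassification_rate(predict, permuted, labels) - baseline)
    return sum(increases) / repeats


# Synthetic records (assumption): (education_years, unrelated_noise).
rng = random.Random(42)
rows = [(rng.randint(7, 20), rng.random()) for _ in range(500)]
# Synthetic outcome (assumption): more likely when education_years is below 12.
labels = [1 if rng.random() < (0.7 if edu < 12 else 0.2) else 0 for edu, _ in rows]


def predict(row):
    """Hypothetical stand-in for a fitted model; uses education_years only."""
    return 1 if row[0] < 12 else 0


error = misclassification_rate(predict, rows, labels)
mda_education = mean_decrease_in_accuracy(predict, rows, labels, column=0)
mda_noise = mean_decrease_in_accuracy(predict, rows, labels, column=1)

print(f"misclassification rate: {error:.3f}")
print(f"mean decrease in accuracy, education_years: {mda_education:.3f}")
print(f"mean decrease in accuracy, unrelated_noise: {mda_noise:.3f}")
```

Because the stand-in rule ignores the noise column, shuffling that column leaves the error unchanged, whereas shuffling `education_years` raises it. The same contrast is what lets a mean-decrease-in-accuracy ranking separate informative socioeconomic indicators from uninformative ones.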