Validation of a province-wide commercial food store dataset in a heterogeneous predominantly rural food environment

Bibliographic Details
Published in: Public Health Nutrition
Main Authors: Taylor, Nathan GA; Stymest, Jillian; Mah, Catherine L
Format: Text
Language: English
Published: Cambridge University Press 2020
Online Access: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10200544/
http://www.ncbi.nlm.nih.gov/pubmed/32295655
https://doi.org/10.1017/S1368980019004506
Description
Summary:
OBJECTIVE: Commercially available business (CAB) datasets for food environments have been investigated for error in large urban contexts and some rural areas, but relatively little literature reports error across regions of variable rurality. The objective of the current study was to assess the validity of a CAB dataset using a government dataset at the provincial scale.
DESIGN: A ground-truthed dataset provided by the government of Newfoundland and Labrador (NL) was used to assess a popular commercial dataset. Concordance, sensitivity, positive-predictive value (PPV) and geocoding errors were calculated. Measures were stratified by store type and rurality to investigate any association between these variables and database accuracy.
SETTING: NL, Canada.
PARTICIPANTS: The current analysis used store-level (ecological) data.
RESULTS: Of 1125 stores, 380 existed in both datasets and were considered true-positive stores. The mean positional error between a ground-truthed point and its test point was 17·72 km. When compared with the provincial dataset of businesses, grocery stores had the greatest agreement (sensitivity = 0·64, PPV = 0·60 and concordance = 0·45). Gas stations had the least agreement (sensitivity = 0·26, PPV = 0·32 and concordance = 0·17). Only 4 % of commercial data points in rural areas matched every criterion examined.
CONCLUSIONS: The commercial dataset exhibits a low level of agreement with the ground-truthed provincial data. Retailers in rural areas or in the gas station category were particularly affected by misclassification and/or geocoding errors. Overall, how well the commercial dataset represents the ground-truthed reality differs by store type and rurality/urbanity.
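
The summary does not state how the agreement measures were computed. As a minimal sketch, assuming the definitions commonly used in food environment validation work (sensitivity = TP / (TP + FN), PPV = TP / (TP + FP), concordance = TP / (TP + FP + FN)), the three measures are linked, so concordance can be recovered from sensitivity and PPV alone. The function names below are hypothetical, and only the rates quoted in the summary are used as inputs; no store counts are assumed.

```python
# Sketch only: these definitions are an assumption, not taken from the summary.
# TP = stores in both datasets, FN = stores only in the ground-truthed dataset,
# FP = stores only in the commercial dataset.

def sensitivity(tp, fn):
    """Share of ground-truthed stores that the commercial dataset lists."""
    return tp / (tp + fn)


def ppv(tp, fp):
    """Share of commercial listings that exist in the ground-truthed dataset."""
    return tp / (tp + fp)


def concordance(tp, fp, fn):
    """Share of all stores seen in either dataset that appear in both."""
    return tp / (tp + fp + fn)


def concordance_from_rates(sens, ppv_value):
    """Concordance implied by sensitivity and PPV.

    Follows from 1/sens = (TP + FN)/TP and 1/ppv = (TP + FP)/TP,
    so 1/sens + 1/ppv - 1 = (TP + FP + FN)/TP.
    """
    return 1.0 / (1.0 / sens + 1.0 / ppv_value - 1.0)


# Rates quoted in the summary (rounded to two decimals there).
grocery = concordance_from_rates(0.64, 0.60)
gas_station = concordance_from_rates(0.26, 0.32)
print(round(grocery, 2), round(gas_station, 2))
```

Under these assumed definitions, the implied values round to the concordance figures reported in the summary for grocery stores and gas stations, which suggests (but does not confirm) that the study used the same formulas.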