A real‐world dataset and data simulation algorithm for automated fish species identification

Abstract: Developing high‐performing machine learning algorithms requires large amounts of annotated data. Manual annotation of data is labour‐intensive, and the cost and effort needed are an important obstacle to the development and deployment of automated analysis. In a previous work, we have shown...

Bibliographic Details
Published in: Geoscience Data Journal
Main Authors: Allken, Vaneeda; Rosen, Shale; Handegard, Nils Olav; Malde, Ketil
Other Authors: Norges Forskningsråd
Format: Article in Journal/Newspaper
Language: English
Published: Wiley 2021
Online Access: http://dx.doi.org/10.1002/gdj3.114
https://onlinelibrary.wiley.com/doi/pdf/10.1002/gdj3.114
https://onlinelibrary.wiley.com/doi/full-xml/10.1002/gdj3.114
https://rmets.onlinelibrary.wiley.com/doi/pdf/10.1002/gdj3.114
id crwiley:10.1002/gdj3.114
record_format openpolar
institution Open Polar
collection Wiley Online Library
op_collection_id crwiley
language English
description Abstract Developing high‐performing machine learning algorithms requires large amounts of annotated data. Manual annotation of data is labour‐intensive, and the cost and effort needed are an important obstacle to the development and deployment of automated analysis. In a previous work, we have shown that deep learning classifiers can successfully be trained on synthetic images and annotations. Here, we provide a curated set of fish image data and backgrounds, the necessary software tools to generate synthetic images and annotations, and annotated real datasets to test classifier performance. The dataset is constructed from images collected using the Deep Vision system during two surveys from 2017 and 2018 that targeted economically important pelagic species in the Northeast Atlantic Ocean. We annotated a total of 1,879 images, randomly selected across trawl stations from both surveys, comprising 482 images of blue whiting, 456 images of Atlantic herring, 341 images of Atlantic mackerel, 335 images of mesopelagic fishes and 265 images containing a mixture of the four categories.
author2 Norges Forskningsråd
format Article in Journal/Newspaper
author Allken, Vaneeda
Rosen, Shale
Handegard, Nils Olav
Malde, Ketil
title A real‐world dataset and data simulation algorithm for automated fish species identification
publisher Wiley
publishDate 2021
genre Northeast Atlantic
op_source Geoscience Data Journal
volume 8, issue 2, page 199-209
ISSN 2049-6060 2049-6060
op_rights http://creativecommons.org/licenses/by/4.0/
op_doi https://doi.org/10.1002/gdj3.114