Optimizing Distributed Machine Learning for Large Scale EEG Data Set

Distributed Machine Learning (DML) has become more important than ever in this era of Big Data, yet scaling machine learning techniques on distributed platforms poses many challenges. Improving processor technology for faster computation is reaching its limits, whereas adding machine nodes and distributing both data and computation is a viable path to scalability. Several frameworks and platforms exist for solving DML problems, but they distribute datasets across nodes randomly and automatically, forgoing user-defined, intelligent data partitioning based on domain knowledge. We conducted an empirical study using an EEG data set collected through the P300 Speller, a component of the Event-Related Potential (ERP) paradigm widely used in brain-computer interface (BCI) problems, where it helps translate a subject's intention while performing a cognitive task. EEG data contains noise from waves generated by other brain activity, which contaminates the true P300 signal. Machine learning techniques can help detect errors made by the P300 Speller. We address this classification problem by partitioning the data into chunks and training distributed models with an Elastic CV classifier. To present a case for optimizing distributed machine learning, we propose an intelligent, user-defined data partitioning approach that improves the average accuracy of distributed learners through domain-specific partitioning. Our results show a better average AUC than random data partitioning, which gives the user no control over how data is split: the customized approach achieves 0.66 AUC on individual sessions and 0.75 AUC on mixed sessions, whereas random, uncontrolled data distribution records 0.63 AUC.
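The abstract's core contrast is between framework-style random chunking and domain-informed chunking that respects EEG recording sessions. The paper's actual data and Elastic CV classifier are not reproduced here; the following is a minimal, stdlib-only Python sketch of the two partitioning strategies, with illustrative function names and toy records that are not from the paper:

```python
import random
from collections import defaultdict

def random_partition(records, n_chunks, seed=0):
    """Framework-style partitioning: shuffle and split evenly,
    ignoring any domain structure in the data."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    return [shuffled[i::n_chunks] for i in range(n_chunks)]

def session_partition(records):
    """Domain-aware partitioning: keep each EEG recording session in
    one chunk, so each worker's model sees internally consistent data."""
    by_session = defaultdict(list)
    for rec in records:
        by_session[rec["session"]].append(rec)
    return list(by_session.values())

# Toy records standing in for EEG epochs tagged with their recording session.
records = [{"session": s, "epoch": i} for s in ("S1", "S2", "S3") for i in range(4)]

random_chunks = random_partition(records, n_chunks=3)
session_chunks = session_partition(records)

# Every session-based chunk is pure; random chunks usually mix sessions.
assert all(len({r["session"] for r in c}) == 1 for c in session_chunks)
```

In a distributed setting, each chunk would then be shipped to a worker that trains its own classifier, and the per-worker AUCs would be averaged, which is the quantity the abstract compares across the two strategies.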

Bibliographic Details
Published in: Sukkur IBA Journal of Computing and Mathematical Sciences, Vol. 1, Iss. 1, pp. 114-121 (2017)
Main Authors: M Bilal Shaikh, M Abdul Rehman, Attaullah Sahito
Format: Article in Journal/Newspaper
Language: English
Published: Sukkur IBA University, 2017
ISSN: 2520-0755, 2522-3003
Subjects: DML
Online Access: https://doi.org/10.30537/sjcms.v1i1.14
https://doaj.org/article/3e64e1ed30894d5eab4a26dbf30a4726