Towards federated learning over large-scale streaming data

Bibliographic Details
Main Authors: Pereira, Aaron (author); Pallickara, Sangmi (advisor); Pallickara, Shrideep (committee member); Zahran, Sammy (committee member)
Format: Text
Language: English
Published: Colorado State University. Libraries 2020
Online Access: https://hdl.handle.net/10217/208427
Description
Summary: 2020 Spring. Includes bibliographical references. Distributed Stream Processing Engines (DSPEs) have seen significant deployment growth along with an increase in streaming data sources such as sensor networks. These DSPEs enable processing large amounts of streaming data on a cluster of commodity machines to extract knowledge and insights in real time. Due to fluctuating data arrival rates in real-world applications, modern DSPEs often provide auto-scaling. However, the existing designs of advanced analytical frameworks are not effectively aligned with scalable streaming computing environments. We have designed and developed ORCA, a federated learning architecture that supports the training of traditional Artificial Neural Networks as well as Convolutional Neural Network and Long Short-Term Memory (LSTM) network-based models while ensuring resiliency during scaling. ORCA also introduces dynamic adjustment of the 'elasticity' hyper-parameter for rescaled computing environments. We estimate this elasticity hyper-parameter using reinforcement learning. Our empirical benchmarks show that ORCA is capable of achieving a mean squared error (MSE) of 0.038 over real-world streaming datasets.