Evaluating a modern in-memory columnar data management system with a contemporary OLTP workload

Bibliographic Details
Main Author: Javadi Isfahani, Bahman
Other Authors: Lappeenrannan teknillinen yliopisto, School of Business and Management, Tietotekniikka / Lappeenranta University of Technology, School of Business and Management, Computer Science
Format: Master Thesis
Language: English
Published: 2017
Subjects:
DML
Online Access: http://lutpub.lut.fi/handle/10024/144190
Description
Summary: Because transactional and analytical workloads differ considerably, a “one size does not fit all” paradigm is typically applied, isolating transactional and analytical data in separate database management systems. Although this separation has its advantages, it compromises real-time analytics. To blur the boundary between analytical and transactional data management systems, hybrid transactional/analytical processing (HTAP) systems have become a reality. HTAP systems mostly rely on in-memory computation to deliver high performance, and columnar data layouts have become popular, particularly for analytical use cases. In this thesis, a quantitative empirical study is conducted to evaluate the performance of an HTAP system under a transactional workload. HANA (High-Performance Analytic Appliance), an in-memory HTAP system, serves as the underlying data management system; it provides two data stores, a columnar store and a row store. First, the performance of HANA’s columnar store is compared with that of its row store. To generate the required workload, an industry-grade transactional benchmark (TPC-E) is implemented. Second, a profiling tool is employed to analyze the primary cost drivers of the HTAP system while it runs the benchmark. Finally, the thesis investigates how well suited an HTAP-oriented stored procedure language (SQLScript) is to the transactional workload. To this end, several transactions are designed on top of the TPC-E schema and implemented both with and without SQLScript iterative constructs; they are then studied with regard to response time and growth rate. The experiment shows that, for the transactional workload, the row store achieves 26% higher throughput than the columnar store. Furthermore, the profiling results show that the transactional workload mainly breaks down into eight components of HANA: query compilation and validation, data store access and predicate evaluation, index access and join processing, memory management, sorting operations, data manipulation language (DML) operations, network transfer and communication, and SQLScript execution. Lastly, the experiment reveals that native SQL set-based operations outperform the iterative paradigm offered by SQLScript.