Comparative study of data transformation tools: An investigation of functionalities supported in common tools and case study of declarative and procedural data manipulation languages

Today, organizations are collecting and storing huge amounts of data that could potentially be very valuable. Finding trends and patterns in historical data can allow businesses to make more informed decisions. Data scientists are therefore working to extract meaning from these massive amounts of data. Ho...

Bibliographic Details
Main Author: Storvoll, Tine-Lovise
Other Authors: Soylu, Ahmet, Martin-Recuerda, Francisco
Format: Master Thesis
Language: English
Published: OsloMet - storbyuniversitetet 2022
Subjects:
DML
Online Access: https://hdl.handle.net/11250/3017395
id fthsosloakersoda:oai:oda.oslomet.no:11250/3017395
record_format openpolar
spelling fthsosloakersoda:oai:oda.oslomet.no:11250/3017395 2023-05-15T16:02:09+02:00 application/pdf eng 2022-09-14T22:36:09Z
institution Open Polar
collection OsloMet (Oslo Metropolitan University): ODA (Open Digital Archive)
op_collection_id fthsosloakersoda
language English
topic Data transformation
Manipulation
Tools
Preparation
description Today, organizations are collecting and storing huge amounts of data that could potentially be very valuable. Finding trends and patterns in historical data can allow businesses to make more informed decisions. Data scientists are therefore working to extract meaning from these massive amounts of data. However, 80% of the time in data science projects is spent preparing the data for analysis. Selecting an efficient tool for the job can help reduce the time spent on data transformation. Thus, this thesis will provide some insights into existing tools and their performance. A selection of common tools is made in Chapter 3. The tools are reviewed against a framework to identify their support for common data preparation tasks, and an evaluation of the tools is given at the end of the chapter. In Chapter 4, one declarative and one procedural Data Manipulation Language (DML) are selected from the common data transformation tools. Python pandas, a procedural language, and SQL, a declarative language, are evaluated and compared in a case study. The case study delves deeper into the tools through a use case, and the comparative analysis at the end will provide some insights into the differences between the two DMLs. Thus, the first contribution of this thesis is a review of the support for common data preparation tasks provided by a selection of prevalent data transformation tools. The second contribution is an analysis of the differences between a declarative and a procedural approach to data manipulation through a case study comparing two popular DMLs. The findings of the review of tools in Chapter 3 revealed that the most prevalent data transformation tools support the majority of the common data preparation tasks. This review gives some general insight into which tasks are supported, which tasks need more effort to perform, and which are not supported at all. The review is exclusively based on information found in technical documentation of the tools, and no further experimentation is done to investigate the ...
author2 Soylu, Ahmet
Martin-Recuerda, Francisco
format Master Thesis
author Storvoll, Tine-Lovise
title Comparative study of data transformation tools: An investigation of functionalities supported in common tools and case study of declarative and procedural data manipulation languages
publisher OsloMet - storbyuniversitetet
publishDate 2022
url https://hdl.handle.net/11250/3017395
genre DML
op_relation ACIT;2022
https://hdl.handle.net/11250/3017395
_version_ 1766397750049505280