Space-efficient implementation of a compressed trie

학위논문 (석사)-- 서울대학교 대학원 : 컴퓨터공학부, 2016. 2. Srinivasa Rao Satti. Trie is the typical data structure for keyword searching algorithms. The algorithms are classified into two ways. One is the array representation, the other one is the succinct representation. In array representation, one can access the c...

Full description

Bibliographic Details
Main Author: Jeongsoo Shin
Other Authors: Srinivasa Rao Satti, 신정수, 공과대학 컴퓨터공학부
Format: Thesis
Language:English
Published: 서울대학교 대학원 2016
Subjects:
621
Online Access:http://hdl.handle.net/10371/122659
Description
Summary:학위논문 (석사)-- 서울대학교 대학원 : 컴퓨터공학부, 2016. 2. Srinivasa Rao Satti. Trie is the typical data structure for keyword searching algorithms. The algorithms are classified into two ways. One is the array representation, the other one is the succinct representation. In array representation, one can access the child node using the one dimensional array which stores the pointers from the node to its child. This shows fast lookup time, but takes a lot of space. In succinct representation, one separates a tree structure and index array, and represents a tree structure by succinct representation. This shows slower query time than array representation, but takes less space. In this thesis, we first use the degree sequence and its variants to implement the space-efficient trie on the small fixed alphabet. The experimental result shows that for small alphabet ( Σ ≤10), our implementation gives fast query time and less space usage compared to the exist implementations. Chapter 1 Introduction 1 1.1 Previous Work 3 1.2 Contribution of this thesis 4 1.3 Organization of this thesis 5 Chapter 2 Preliminaries 6 2.1 Array representation of trie 6 2.1.1 Double Array 7 2.1.2 Compacted Double-Array (CDA) 7 2.1.3 DALF 8 2.2 Succinct representations of trie 8 2.2.1 Level-order unary degree sequence (LOUDS) 10 2.2.2 Balanced parentheses (BP) 11 2.2.3 Depth-first unary degree sequence (DFUDS) 11 Chapter 3 Degree sequence and its variants 13 3.1 Degree sequence method 13 3.2 Splitting method 15 3.3 Prefix code 17 3.4 Huffman encoding 18 Chapter 4 Evaluation of Implementations 19 4.1 Experimental environment 19 4.2 Data sets 19 4.3 Experimental Result 22 4.3.1 Space usage 22 4.3.2 Lookup time 25 Chapter 5 Conclusion 29 Bibliography 30 Abstract in Korean 32 Master