What's the Difference? Incremental Processing with Change Queries in Snowflake

Bibliographic Details
Published in: Proceedings of the ACM on Management of Data
Main Authors: Akidau, Tyler; Barbier, Paul; Cseri, Istvan; Hueske, Fabian; Jones, Tyler; Lionheart, Sasha; Mills, Daniel; Pauliukevich, Dzmitry; Probst, Lukas; Semmler, Niklas; Sotolongo, Dan; Zhang, Boyuan
Format: Article in Journal/Newspaper
Language: English
Published: Association for Computing Machinery (ACM) 2023
Subjects: DML
Online Access: http://dx.doi.org/10.1145/3589776
https://dl.acm.org/doi/pdf/10.1145/3589776
Description
Summary: Incremental algorithms are the heart and soul of stream processing. Low-latency results depend on the ability to react to the subset of changes in a dataset over time rather than reprocessing the entirety of a dataset as it evolves. But while the SQL language is well suited for representing streams of changes (via tables) and their application to tables over time (via DML), it entirely lacks a method to query the changes to a table or view in the first place. In this paper, we present CHANGES queries and STREAM objects, Snowflake's primitives for querying and consuming incremental changes to table objects over time. CHANGES queries and STREAMs have been in use within Snowflake for three years and see broad adoption across our customers. We describe the semantics of these primitives, discuss the implementation challenges, present an analysis of their usage at Snowflake, and contrast them with other offerings.
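
Illustration: the idea of reacting to changes instead of reprocessing a whole dataset can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not Snowflake's implementation or the semantics defined in the paper: a table version is modeled as a list of tuples, the hypothetical helper `changes` computes the rows inserted and deleted between two versions as a multiset difference, and the hypothetical helper `apply_changes` uses that delta to update a running SUM without rescanning the table.

```python
from collections import Counter


def changes(old_rows, new_rows):
    """Return (inserted, deleted) rows between two table versions.

    Rows are compared as multisets, so duplicate rows are counted.
    An updated row shows up as one deletion plus one insertion.
    """
    old, new = Counter(old_rows), Counter(new_rows)
    inserted = sorted((new - old).elements())
    deleted = sorted((old - new).elements())
    return inserted, deleted


def apply_changes(total, inserted, deleted, column=1):
    """Maintain a SUM over one column using only the delta."""
    added = sum(row[column] for row in inserted)
    removed = sum(row[column] for row in deleted)
    return total + added - removed


# Two versions of a hypothetical (key, amount) table.
v1 = [("a", 10), ("b", 20), ("c", 30)]
v2 = [("a", 10), ("b", 25), ("d", 5)]

inserted, deleted = changes(v1, v2)
total_v1 = sum(row[1] for row in v1)
total_v2 = apply_changes(total_v1, inserted, deleted)
```

In this sketch the unchanged row `("a", 10)` never appears in the delta, so the cost of updating the aggregate depends on the number of changed rows rather than on the size of the table.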