Scalability of Distributed Version Control Systems

Source at https://ojs.bibsys.no/index.php/NIK/article/view/434 . Distributed version control systems are popular for storing source code, but they are notoriously ill suited for storing large binary files. We report on the results from a set of experiments designed to characterize the behavior of so...

Bibliographic Details
Main Authors: Murphy, Mike; Bjørndalen, John Markus; Anshus, Otto
Format: Article in Journal/Newspaper
Language: English
Published: Norsk informatikkonferanse 2017
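Subjects: VDP::Teknologi: 500::Informasjons- og kommunikasjonsteknologi: 550; VDP::Technology: 500::Information and communication technology: 550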
Online Access: https://hdl.handle.net/10037/19015
id ftunivtroemsoe:oai:munin.uit.no:10037/19015
record_format openpolar
spelling ftunivtroemsoe:oai:munin.uit.no:10037/19015 2023-05-15T14:22:43+02:00 Scalability of Distributed Version Control Systems Murphy, Mike Bjørndalen, John Markus Anshus, Otto 2017-11-26 https://hdl.handle.net/10037/19015 eng eng Norsk informatikkonferanse NIK: Norsk Informatikkonferanse info:eu-repo/grantAgreement/RCN/IKTPLUSS/270672/Distributed Arctic Observatory (DAO): A Cyber-Physical System for Ubiquitous Data and Services Covering the Arctic Tundra// Murphy, M. J., Bjørndalen, J. M. & Anshus, O. (2017). Scalability of Distributed Version Control Systems. NIK: Norsk Informatikkonferanse. FRIDAID 1522935 1892-0713 1892-0721 https://hdl.handle.net/10037/19015 openAccess Copyright 2017 The Authors VDP::Teknologi: 500::Informasjons- og kommunikasjonsteknologi: 550 VDP::Technology: 500::Information and communication technology: 550 Journal article Tidsskriftartikkel publishedVersion 2017 ftunivtroemsoe 2021-06-25T17:55:44Z Article in Journal/Newspaper Arctic University of Tromsø: Munin Open Research Archive
institution Open Polar
collection University of Tromsø: Munin Open Research Archive
op_collection_id ftunivtroemsoe
language English
topic VDP::Teknologi: 500::Informasjons- og kommunikasjonsteknologi: 550
VDP::Technology: 500::Information and communication technology: 550
description Source at https://ojs.bibsys.no/index.php/NIK/article/view/434 . Distributed version control systems are popular for storing source code, but they are notoriously ill suited for storing large binary files. We report on the results from a set of experiments designed to characterize the behavior of some widely used distributed version control systems with respect to scaling. The experiments measured commit times and repository sizes when storing single files of increasing size, and when storing increasing numbers of single-kilobyte files. The goal is to build a distributed storage system with characteristics similar to version control but for much larger data sets. An early prototype of such a system, Distributed Media Versioning (DMV), is briefly described and compared with Git, Mercurial, and the Git-based backup tool Bup. We find that processing large files without splitting them into smaller parts will limit maximum file size to what can fit in RAM. Storing millions of small files will result in inefficient use of disk space. And storing files with hash-based file and directory names will result in high-latency write operations, due to having to switch between directories rather than performing a sequential write. The next-phase strategy for DMV will be to break files into chunks by content for de-duplication, then re-aggregating the chunks into append-only log files for low-latency write operations and efficient use of disk space.
format Article in Journal/Newspaper
author Murphy, Mike
Bjørndalen, John Markus
Anshus, Otto
title Scalability of Distributed Version Control Systems
publisher Norsk informatikkonferanse
publishDate 2017
url https://hdl.handle.net/10037/19015
genre Arctic
op_relation NIK: Norsk Informatikkonferanse
info:eu-repo/grantAgreement/RCN/IKTPLUSS/270672/Distributed Arctic Observatory (DAO): A Cyber-Physical System for Ubiquitous Data and Services Covering the Arctic Tundra//
Murphy, M. J., Bjørndalen, J. M. & Anshus, O. (2017). Scalability of Distributed Version Control Systems. NIK: Norsk Informatikkonferanse.
FRIDAID 1522935
1892-0713
1892-0721
https://hdl.handle.net/10037/19015
op_rights openAccess
Copyright 2017 The Authors
_version_ 1766295247724216320
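
Illustrative note (not part of the catalogued record): the description field outlines a next-phase strategy for DMV of breaking files into chunks by content for de-duplication, then re-aggregating the chunks into append-only log files. The Python sketch below shows one way that idea could look. It is not the authors' implementation. The rolling-sum boundary rule, the constants WINDOW, MASK, and MIN_CHUNK, the in-memory bytearray standing in for a log file, and the names split_by_content, ChunkLog, store, and load are all assumptions made for illustration.

```python
import hashlib

# All constants below are assumed values chosen for a small demonstration.
WINDOW = 16           # number of trailing bytes covered by the rolling sum
MASK = (1 << 6) - 1   # cut when the low 6 bits of the sum are all ones
MIN_CHUNK = 16        # never emit a chunk shorter than this (except the tail)


def split_by_content(data):
    """Split data at positions chosen by local content, not by fixed offsets."""
    chunks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        # Maintain the sum of the last WINDOW bytes.
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]
        length = i + 1 - start
        if length >= MIN_CHUNK and (rolling & MASK) == MASK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing chunk without a boundary
    return chunks


class ChunkLog:
    """Append-only chunk store: new chunks are only ever added at the end."""

    def __init__(self):
        self.log = bytearray()   # stands in for a sequentially written log file
        self.index = {}          # SHA-256 hex digest -> (offset, length)

    def put(self, chunk):
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in self.index:         # de-duplicate by content hash
            self.index[digest] = (len(self.log), len(chunk))
            self.log += chunk                # sequential append, no rewrite
        return digest

    def get(self, digest):
        offset, length = self.index[digest]
        return bytes(self.log[offset:offset + length])


def store(log, data):
    """Store data and return its recipe: the ordered list of chunk digests."""
    return [log.put(chunk) for chunk in split_by_content(data)]


def load(log, recipe):
    """Rebuild the original bytes from a recipe."""
    return b"".join(log.get(digest) for digest in recipe)
```

Because boundaries depend on the bytes near them rather than on absolute offsets, a small edit near the start of a file usually leaves most later chunks unchanged, so storing the edited version appends only the few chunks that differ. Writing every new chunk to the end of a single log also avoids the per-object directory switching that the description identifies as a source of write latency.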