Flexible parsing and preprocessing of technical sequences with splitcode

Next-generation sequencing libraries are constructed with numerous synthetic constructs such as sequencing adapters, barcodes, and unique molecular identifiers. Such sequences can be essential for interpreting results of sequencing assays, and when they contain information pertinent to an experiment...

Bibliographic Details
Main Authors: Sullivan, Delaney K., Pachter, Lior
Format: Report
Language: unknown
Published: 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/PMC10055216
https://doi.org/10.1101/2023.03.20.533521
id ftcaltechauth:oai:authors.library.caltech.edu:s54v6-nf783
record_format openpolar
spelling ftcaltechauth:oai:authors.library.caltech.edu:s54v6-nf783 2024-06-23T07:54:06+00:00 Flexible parsing and preprocessing of technical sequences with splitcode Sullivan, Delaney K. Pachter, Lior 2023-03-23 https://www.ncbi.nlm.nih.gov/pmc/PMC10055216 https://doi.org/10.1101/2023.03.20.533521 unknown https://doi.org/10.1101/2023.03.20.533521 http://github.com/pachterlab/splitcode oai:authors.library.caltech.edu:s54v6-nf783 https://www.ncbi.nlm.nih.gov/pmc/PMC10055216 eprintid:120411 resolverid:CaltechAUTHORS:20230327-441955000.1 info:eu-repo/semantics/openAccess Other info:eu-repo/semantics/report 2023 ftcaltechauth https://doi.org/10.1101/2023.03.20.533521 2024-06-12T04:32:38Z Report Iceland Caltech Authors (California Institute of Technology)
institution Open Polar
collection Caltech Authors (California Institute of Technology)
op_collection_id ftcaltechauth
language unknown
description Next-generation sequencing libraries are constructed with numerous synthetic constructs such as sequencing adapters, barcodes, and unique molecular identifiers. Such sequences can be essential for interpreting results of sequencing assays, and when they contain information pertinent to an experiment, they must be processed and analyzed. We present a tool called splitcode that enables flexible and efficient preprocessing, parsing, and manipulation of sequencing reads. The splitcode program is free, open source, and available for download at http://github.com/pachterlab/splitcode. This versatile tool will facilitate simple, reproducible preprocessing of reads from libraries constructed for a large array of single-cell and bulk sequencing assays. The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license. We thank the laboratory of Mitchell Guttman (Caltech) for discussions that motivated this project. Some of the splitcode code is derived from code written by Páll Melsted (University of Iceland), and we are grateful to him for sharing his code with us. Thanks to A. Sina Booeshaghi for helpful discussions regarding seqspec and splitcode. Illustrations were created with BioRender: http://biorender.com. D.K.S. was funded by the UCLA-Caltech Medical Scientist Training Program (NIH NIGMS training grant T32 GM008042). L.P. was supported in part by the National Institutes of Health (NIH) grants U19MH114830 and 5UM1HG012077-02. The authors declare no conflicts of interest. Contributions. D.K.S. conceived of the work, developed the methods and software, and drafted the manuscript. L.P. supervised the work. Both authors reviewed and edited the manuscript. Code Availability. The splitcode software is available at http://github.com/pachterlab/splitcode. The version of the splitcode software referred to throughout this paper is version 0.28.0. Submitted - nihpp-2023.03.20.533521v2.pdf
format Report
author Sullivan, Delaney K.
Pachter, Lior
spellingShingle Sullivan, Delaney K.
Pachter, Lior
Flexible parsing and preprocessing of technical sequences with splitcode
author_facet Sullivan, Delaney K.
Pachter, Lior
author_sort Sullivan, Delaney K.
title Flexible parsing and preprocessing of technical sequences with splitcode
title_short Flexible parsing and preprocessing of technical sequences with splitcode
title_full Flexible parsing and preprocessing of technical sequences with splitcode
title_fullStr Flexible parsing and preprocessing of technical sequences with splitcode
title_full_unstemmed Flexible parsing and preprocessing of technical sequences with splitcode
title_sort flexible parsing and preprocessing of technical sequences with splitcode
publishDate 2023
url https://www.ncbi.nlm.nih.gov/pmc/PMC10055216
https://doi.org/10.1101/2023.03.20.533521
genre Iceland
genre_facet Iceland
op_relation https://doi.org/10.1101/2023.03.20.533521
http://github.com/pachterlab/splitcode
oai:authors.library.caltech.edu:s54v6-nf783
https://www.ncbi.nlm.nih.gov/pmc/PMC10055216
eprintid:120411
resolverid:CaltechAUTHORS:20230327-441955000.1
op_rights info:eu-repo/semantics/openAccess
Other
op_doi https://doi.org/10.1101/2023.03.20.533521
_version_ 1802646058225893376