Decoding Phenotypes via Transcriptomics and Proteomics: Cancer and beyond

While genomics approaches are important in studying host phenotype alterations in response to environmental changes or disease, proteomics approaches offer a complementary perspective by providing a direct readout of expressed functional pathways. Proteogenomic strategies utilizing RNA-sequencing da...

Bibliographic Details
Main Author: Lin, Miin Sophia
Other Authors: Bafna, Vineet
Format: Software
Language: English
Published: eScholarship, University of California 2022
Subjects: Bioinformatics
Online Access: https://escholarship.org/uc/item/1s98z674
id ftcdlib:oai:escholarship.org:ark:/13030/qt1s98z674
record_format openpolar
spelling ftcdlib:oai:escholarship.org:ark:/13030/qt1s98z674 2024-09-15T17:56:32+00:00 Decoding Phenotypes via Transcriptomics and Proteomics: Cancer and beyond Lin, Miin Sophia Bafna, Vineet 2022-01-01 https://escholarship.org/uc/item/1s98z674 en eng eScholarship, University of California qt1s98z674 https://escholarship.org/uc/item/1s98z674 public Bioinformatics multimedia 2022 ftcdlib 2024-06-28T06:28:21Z Software Atlantic salmon University of California: eScholarship
institution Open Polar
collection University of California: eScholarship
op_collection_id ftcdlib
language English
topic Bioinformatics
spellingShingle Bioinformatics
Lin, Miin Sophia
Decoding Phenotypes via Transcriptomics and Proteomics: Cancer and beyond
topic_facet Bioinformatics
description While genomics approaches are important in studying host phenotype alterations in response to environmental changes or disease, proteomics approaches offer a complementary perspective by providing a direct readout of expressed functional pathways. Proteogenomic strategies utilizing RNA-sequencing data to construct splice graph databases have been used in a variety of applications to identify novel splice junctions and mutated peptides. The work in this dissertation begins with the integration of splice databases into a proteogenomic pipeline for the validation of the recently released annotation of the Atlantic salmon genome, and the validation of primary hepatocytes as in vitro models for salmon toxicity studies. Searching in-house generated LC-MS/MS datasets against splice databases constructed from publicly available and in-house-generated salmon transcriptomics data, our proteogenomic pipeline identified 183 events in support of 71 transcript predictions. These included novel genes, corrections to current annotations, and support for Ensembl transcripts. In addition to host-expressed proteins, microbial-expressed proteins can also alter host phenotype. In the absence of prior taxonomic information, tandem mass spectra would be searched against large pan-microbial databases, requiring heavy computational workload and reducing sensitivity. Using both software and algorithmic methods, we developed ProteoStorm, an efficient database search framework for large-scale metaproteomics studies, that significantly reduced runtime from 22 weeks to 9.7 hours while retaining 96% of peptide identifications when compared to MSGF+. A reanalysis of a urinary tract infection dataset revealed a complex pattern of polymicrobial expression, including previously identified microbes. In the final chapter, we used transcriptomics data from TCGA to identify a set of genes that may be involved in the maintenance of ecDNA amplicons in cancer. Specifically, we applied the Boruta algorithm, which incorporates the Random Forest classifier ...
author2 Bafna, Vineet
format Software
author Lin, Miin Sophia
author_facet Lin, Miin Sophia
author_sort Lin, Miin Sophia
title Decoding Phenotypes via Transcriptomics and Proteomics: Cancer and beyond
title_sort decoding phenotypes via transcriptomics and proteomics: cancer and beyond
publisher eScholarship, University of California
publishDate 2022
url https://escholarship.org/uc/item/1s98z674
genre Atlantic salmon
genre_facet Atlantic salmon
op_relation qt1s98z674
https://escholarship.org/uc/item/1s98z674
op_rights public
_version_ 1810432732963012608