Fasta filtering and DNAm scoring pipeline

Bibliographic Details
Main Authors: De Paoli-Iseppi, Ricardo, Deagle, Bruce, Polanowski, Andrea, McMahon, Clive, Dickinson, Joanne, Hindell, Mark, Jarman, Simon
Format: Dataset
Language: unknown
Published: Dryad Digital Repository 2018
Subjects: Age
Online Access: https://dx.doi.org/10.5061/dryad.n4h3672/3
https://datadryad.org/resource/doi:10.5061/dryad.n4h3672/3
Description
Summary: This file takes maxee-filtered and dereplicated sample FASTA files as input. Each sequence is checked for the correct site, “CCGGG” or “GGG”, at its start. Sequences are then trimmed to the same length and dereplicated using USEARCH10. A database of unique sequences is created for each start site (CCGGG or GGG), e.g. a methylated and an unmethylated database. Each sample is compared against these two databases, and the match counts are recorded to generate DNAm scores between 0 and 1. DNAm scores can then be filtered further on read depth or standard deviation, or output as needed.
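
The steps in the summary can be illustrated with a minimal Python sketch. It is not the deposited script and does not call USEARCH10: trimming and dereplication are approximated with plain string slicing and sets. The function names (`classify`, `build_databases`, `dnam_score`), the `trim_len` parameter, and the scoring formula (matches to one database divided by matches to both) are assumptions made for illustration. The record does not state which start site corresponds to the methylated database, so treating CCGGG as methylated here is also an assumption.

```python
from collections import Counter

# Start sites named in the summary.
SITES = ("CCGGG", "GGG")


def classify(seq):
    """Return the start site a read begins with, or None if it has neither."""
    for site in SITES:
        if seq.startswith(site):
            return site
    return None


def build_databases(reads, trim_len):
    """Trim reads to trim_len and collect the unique sequences per start site.

    Stand-in for the trimming and dereplication done with USEARCH10.
    """
    dbs = {site: set() for site in SITES}
    for seq in reads:
        site = classify(seq)
        if site is not None and len(seq) >= trim_len:
            dbs[site].add(seq[:trim_len])
    return dbs


def dnam_score(sample_reads, dbs, trim_len, methylated_site="CCGGG"):
    """Return (score, depth) for one sample.

    ASSUMPTION: score = methylated matches / all matches, and the
    methylated database is the CCGGG one. Neither is stated in the record.
    """
    counts = Counter()
    for seq in sample_reads:
        if len(seq) < trim_len:
            continue  # too short to compare at the common length
        trimmed = seq[:trim_len]
        for site, db in dbs.items():
            if trimmed in db:
                counts[site] += 1
    depth = sum(counts.values())
    if depth == 0:
        return None, 0  # no reads matched either database
    return counts[methylated_site] / depth, depth


# Toy data, invented for illustration only.
reference_reads = ["CCGGGAAAT", "CCGGGAAAC", "GGGTTTTAA", "ACGTACGTA", "GGGTT"]
sample_reads = ["CCGGGAAAT", "CCGGGAAAG", "CCGGGAAAT", "GGGTTTTAC"]

dbs = build_databases(reference_reads, trim_len=8)
score, depth = dnam_score(sample_reads, dbs, trim_len=8)
```

Returning the read depth alongside the score keeps the later filtering step simple, for example dropping any sample whose depth falls below a chosen threshold.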