Fasta filtering and DNAm scoring pipeline
Main Authors: | , , , , , , |
---|---|
Format: | Dataset |
Language: | unknown |
Published: | Dryad Digital Repository, 2018 |
Subjects: | |
Online Access: | https://dx.doi.org/10.5061/dryad.n4h3672/3 https://datadryad.org/resource/doi:10.5061/dryad.n4h3672/3 |
Summary: | This file takes maxee-filtered, dereplicated per-sample FASTA files as input. Each sequence is checked for the expected site, “CCGGG” or “GGG”, at its start. Sequences are then trimmed to a common length and dereplicated with USEARCH10. One database of unique sequences is built for each start site (CCGGG or GGG), e.g. a methylated and an unmethylated database. Each sample is compared against both databases and the match counts are recorded to generate DNAm scores between 0 and 1. The scores can then be filtered further on read depth or standard deviation, or output as needed. |
---|
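
The scoring step described in the summary can be illustrated with a minimal Python sketch. It is not the deposited pipeline, which uses USEARCH10, and it rests on several assumptions that the record does not state: that a locus is identified by the bases following the start site, that the score is the fraction of reads at a locus carrying one of the two sites, and that “CCGGG” is the site counted as methylated (the `methylated_site` parameter can be switched). The names `classify`, `score_reads`, `length` and `methylated_site` are hypothetical.

```python
from collections import Counter

# Start sites named in the summary; the longer one is tested first.
SITES = ("CCGGG", "GGG")


def classify(seq):
    """Return (site, remainder) if seq starts with a known site, else None."""
    seq = seq.upper()
    for site in SITES:
        if seq.startswith(site):
            return site, seq[len(site):]
    return None


def score_reads(reads, length=10, methylated_site="CCGGG"):
    """Return {locus: (score, depth)} for an iterable of read sequences.

    Assumptions (not taken from the record): the locus key is the first
    `length` bases after the start site, and the score is the share of
    reads at that locus that begin with `methylated_site`.
    """
    counts = {}
    for seq in reads:
        hit = classify(seq)
        if hit is None:
            continue  # read does not begin with CCGGG or GGG
        site, rest = hit
        if len(rest) < length:
            continue  # too short to trim to the common length
        locus = rest[:length]
        counts.setdefault(locus, Counter())[site] += 1

    scores = {}
    for locus, by_site in counts.items():
        depth = sum(by_site.values())
        scores[locus] = (by_site[methylated_site] / depth, depth)
    return scores
```

Calling `score_reads(reads)` returns a score between 0 and 1 together with the read depth for each locus, so a depth filter can be applied afterwards, for example by keeping only loci whose depth exceeds a chosen threshold.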