Does deblur trim reads to length using a multiple sequence alignment? #196

Open
biofuture opened this issue Jan 15, 2019 · 9 comments

@biofuture

Since deblur requires reads of uniform length, and there is usually a parameter to trim sequences, I am wondering whether, during trimming, deblur first performs an alignment and then trims to the target length?

If long sequences are simply trimmed at the front or the end, the remaining fragments are probably not well suited for comparison across different sequences.

Thank you very much.

@amnona
Contributor

amnona commented Jan 16, 2019

Hi,
the trimming step in deblur is done without alignment (i.e. it just keeps the first X nucleotides).
Since deblur is intended for amplicon sequencing, all sequences from the same bacteria should begin at the same position (following the primer). Usually, all amplicons are also of the same length. The two purposes of trimming are:

  1. To remove all sequences that are shorter due to quality control. For example, if the read length of the sequencing is 150bp, after some quality control methods a small fraction of reads will be a bit shorter (last nucleotides removed). Deblur discards these sequences, since for a shorter sequence we cannot guess what the removed nucleotides are supposed to be, so we cannot combine them with the longer reads.
  2. Read errors increase as a function of the read length (e.g. the mean read error rate at position 10 is lower than the mean read error rate at position 300). So sometimes you can opt to use only the first part (say 150bp) of your reads. You pay the price of losing some information (the nucleotides at positions >150), but gain a lower noise level, which enables better discrimination between close sequences.
    I usually recommend using the first 150-200bp (trimming at that length).
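
The two points above can be illustrated with a minimal sketch. This is not deblur's own code: `trim_reads` is a hypothetical helper that assumes reads are available as (label, sequence) pairs and simply applies the rule described here (keep the first X nucleotides, drop anything shorter).

```python
from typing import Iterable, Iterator, Tuple


def trim_reads(reads: Iterable[Tuple[str, str]],
               trim_length: int) -> Iterator[Tuple[str, str]]:
    """Yield (label, sequence) pairs cut to exactly trim_length nucleotides.

    Hypothetical helper for illustration only, not part of deblur.
    """
    for label, seq in reads:
        if len(seq) < trim_length:
            # Shorter than the target: the missing bases cannot be guessed,
            # so the read is discarded.
            continue
        # No alignment: just keep the first trim_length nucleotides.
        yield label, seq[:trim_length]


reads = [("r1", "ACGTACGTAC"), ("r2", "ACGTAC"), ("r3", "ACGTACGTACGT")]
print(list(trim_reads(reads, 8)))
# [('r1', 'ACGTACGT'), ('r3', 'ACGTACGT')]
```

Here `r2` is dropped because it is shorter than the trim length, while `r1` and `r3` become directly comparable.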

Does this make sense?

Cheers
Amnon

@biofuture
Author

It makes sense! Thank you for your explanation.

@AnalissaFSarno

Hi Deblur team!
I have a similar trimming question. Is it possible to use only the sequence trimming function of the Deblur workflow? Under Example Usage on the homepage, the command deblur workflow --seqs-fp all_samples.fna --output-dir output -t 150 is shown, and the description makes it seem as though the sequences are only being trimmed and not subjected to the positive or negative reference libraries or other functions.

Thank you!

@wasade
Member

wasade commented Oct 2, 2020

Hi @AnalissaFSarno, yup! See below

$ deblur trim --help
Usage: deblur trim [OPTIONS] SEQS_FP OUTPUT_FP

  Trim FASTA sequences

Options:
  -t, --trim-length INTEGER  Sequence trim length  [required]
  --log-level INTEGER RANGE  Level of messages for log file(range 1-debug to
                             5-critical  [default: 2]

  --log-file PATH            log file name  [default: deblur.log]
  --help                     Show this message and exit.

@AnalissaFSarno

Hi @wasade, thank you for your quick reply.

I tried the following command:
deblur trim --trim-length 355 100seqs.fna Trim.100seq.fna

And got the following error:
Traceback (most recent call last):
  File "/packages/7x/anaconda3/5.3.0/envs/deblurenv/bin/deblur", line 684, in <module>
    deblur_cmds()
  File "/packages/7x/anaconda3/5.3.0/envs/deblurenv/lib/python3.5/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/packages/7x/anaconda3/5.3.0/envs/deblurenv/lib/python3.5/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/packages/7x/anaconda3/5.3.0/envs/deblurenv/lib/python3.5/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/packages/7x/anaconda3/5.3.0/envs/deblurenv/lib/python3.5/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/packages/7x/anaconda3/5.3.0/envs/deblurenv/lib/python3.5/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/packages/7x/anaconda3/5.3.0/envs/deblurenv/bin/deblur", line 140, in trim
    for label, seq in trim_seqs(sequence_generator(seqs_fp), trim_length):
TypeError: trim_seqs() missing 1 required positional argument: 'left_trim_len'

Then when I try:
deblur trim --trim-length 355 --left_trim_len 0 100seqs.fna Trim.100seq.fna

I get the following error:
Usage: deblur trim [OPTIONS] SEQS_FP OUTPUT_FP
Try "deblur trim --help" for help.

Error: no such option: --left_trim_len

Any guidance would be greatly appreciated!

@wasade
Member

wasade commented Oct 2, 2020

That looks like it may be a bug. As far as I know, this isn't a commonly used deblur command, since deblur is generally run via "workflow".

A separate option, assuming your FASTA identifiers are short, is to use Linux commands:

$ cut -c 1-355 100seqs.fna > Trim.100seq.fna

...that will keep the first 355 columns of each line in the file
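
If the identifiers are long, or sequences are wrapped over several lines, the cut one-liner above would clip headers or trim each line separately. A FASTA-aware alternative can be sketched in plain Python. This is a minimal sketch, not deblur code: `read_fasta` and `trim_fasta` are hypothetical helper names, and dropping reads shorter than the trim length is an assumption that mirrors the behaviour described earlier in this thread.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (label, sequence) pairs, joining sequences wrapped over lines."""
    label, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if label is not None:
                yield label, "".join(chunks)
            label, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if label is not None:
        yield label, "".join(chunks)


def trim_fasta(src: TextIO, dst: TextIO, trim_length: int) -> int:
    """Write reads cut to trim_length; return the number of reads kept."""
    kept = 0
    for label, seq in read_fasta(src):
        if len(seq) >= trim_length:  # assumption: shorter reads are dropped
            dst.write(f">{label}\n{seq[:trim_length]}\n")
            kept += 1
    return kept


# Small in-memory demonstration; header lines are left untouched.
src = io.StringIO(">seq1 a long identifier\nACGT\nACGT\n>seq2\nACG\n")
dst = io.StringIO()
print(trim_fasta(src, dst, 6))  # 1
print(dst.getvalue(), end="")
```

For real files, the same function could be called as `trim_fasta(open("100seqs.fna"), open("Trim.100seq.fna", "w"), 355)`, ideally inside a `with` block so both files are closed.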

@AnalissaFSarno

AnalissaFSarno commented Oct 4, 2020 via email

@julibeg

julibeg commented Mar 19, 2021

I have a question along similar lines, and a quick Google search didn't really turn up anything useful: do the reads need to start strictly at the same position, or can I do some quality trimming at the front after removing the primers?

@wasade
Member

wasade commented Mar 19, 2021

Reads should be the same length, and relative to the same 5' position; I recommend against quality trimming.
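
One way to check the "same length" requirement before running deblur is to count read lengths. The snippet below is only a sketch: `length_counts` is a hypothetical helper and assumes reads are already loaded as (label, sequence) pairs.

```python
from collections import Counter
from typing import Iterable, Tuple


def length_counts(reads: Iterable[Tuple[str, str]]) -> Counter:
    """Count how many reads there are of each length."""
    return Counter(len(seq) for _, seq in reads)


reads = [("r1", "ACGTACGT"), ("r2", "ACGTAC"), ("r3", "ACGTACGT")]
counts = length_counts(reads)
if len(counts) > 1:
    # More than one distinct length: the reads are not uniform.
    print("Reads differ in length:", dict(counts))
```

A single entry in the result means all reads already share one length. Note that this cannot tell whether they start at the same 5' position.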
