Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only outputting one file? #64

Open
theheking opened this issue Mar 24, 2019 · 5 comments
Open

Only outputting one file? #64

theheking opened this issue Mar 24, 2019 · 5 comments

Comments

@theheking
Copy link

Hi,

I am running polyester using the following code:

simulate_experiment(EnsemblfastaFile, numreps=c(1,1),fold_changes=fold_changes, reads_per_transcript=readspertx, paired=FALSE, outdir="/mnt/lustre/users/k1632479/polyester/simulatedread", distr="empirical", error_model="illumina5", bias="rnaf")

It runs for hours and outputs a large 8 GB fasta file.
I don't understand what I'm running wrong.

@JMF47
Copy link
Collaborator

JMF47 commented Mar 24, 2019

It’s likely that this is an input issue and would be more appropriate on the bioconductor user forum.

What does your readspertx look like?

@theheking
Copy link
Author

If it is an input issue, I don't know why I would get a output fasta file for the first sample_01.fasta which is 14G.

I have already removed the zeroes from the readspertx. So I don't think it is that.
`fasta_File <- readDNAStringSet(EnsemblfastaFile)
#replace all 0 counts with 1
fastaFile_nozero <- replace(width(fasta_File), width(fasta_File) == 0, 1)

readspertx = round(20 * fastaFile_nozero / 100)`

@JMF47
Copy link
Collaborator

JMF47 commented Mar 24, 2019

What does the following show:

head(readspertx)
sum(readspertx)

@theheking
Copy link
Author

head(readspertx)
[1] 214 22 831 598 727 96
sum(readspertx)
[1] 47182274

@JMF47
Copy link
Collaborator

JMF47 commented Mar 25, 2019

If I’m understanding your query and the output you’ve shared with me, the output file size is within the realm of expectation. The input specified generates one file per sample, since paired=F. If I recall correctly, 47 million 100bp reads stored in fasta format should be >10Gb. If you could let me know a bit more about what your goal of the simulation is (such as simulating 20x coverage of every transcript in the transcriptome) I could try and help you tweak your commands.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants