Is it suitable for whole gene? #37

Open
326reborn opened this issue Sep 22, 2023 · 2 comments

Comments

@326reborn

Is this package suitable for full-length sequences?
When I run the test_data after converting it to amino acid genotypes, it works well. However, when I extend the test sequence to 242 aa, it fails.

The error is raised at the SequenceSpace() step:
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 14.2 GiB for an array with shape (1912602624,) and data type float64

I hope to use it to analyze random mutations in a gene that is 239 amino acids long.
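
The size in the error message can be checked by hand: 1,912,602,624 float64 elements at 8 bytes each is about 14.25 GiB, which NumPy reports as 14.2 GiB. Below is a minimal sketch of that arithmetic. The helper name `array_gib` is made up for illustration and is not part of gpmap; the float32 line only shows what halving the element size would save.

```python
# Hypothetical helper (not part of gpmap): size of a dense 1-D array in GiB.
def array_gib(n_elements: int, itemsize_bytes: int) -> float:
    return n_elements * itemsize_bytes / 2**30


N = 1_912_602_624  # element count copied from the error message

float64_gib = array_gib(N, 8)  # float64 uses 8 bytes per element
float32_gib = array_gib(N, 4)  # float32 uses 4 bytes per element

print(f"float64: {float64_gib:.3f} GiB")
print(f"float32: {float32_gib:.3f} GiB")
```

Even with float32 the array would need roughly 7 GiB, so a smaller dtype alone may not be enough on a machine with limited RAM.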

@lperezmo
Member

A simple way to get around that would be to use a system with more RAM, since it looks like the program tried to hold the whole array in memory and ran out of space. Another option would be to store the array on disk instead of in memory using numpy's memmap. If you could share the code you used, that would help to see where something like that could be implemented. Using float32 for the array is another option.
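
As a rough illustration of the memmap and float32 suggestions, here is a sketch of the NumPy mechanics only. It does not touch gpmap: whether SequenceSpace can work with a disk-backed array is not established in this thread, and the change would have to be made wherever the library allocates the large array. The file name and the small array size are placeholders chosen so the sketch runs anywhere.

```python
import os
import tempfile

import numpy as np

# Small placeholder size; the array in the error message had
# 1,912,602,624 elements.
n = 1_000_000

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "values.dat")  # hypothetical file name

# mode="w+" creates the backing file on disk; float32 halves the
# footprint relative to float64.
values = np.memmap(path, dtype=np.float32, mode="w+", shape=(n,))

# Fill in chunks so only one chunk has to be built in RAM at a time.
chunk = 100_000
for start in range(0, n, chunk):
    stop = min(start + chunk, n)
    values[start:stop] = np.arange(start, stop, dtype=np.float32)

values.flush()  # push pending writes to the file

file_size = os.path.getsize(path)
print(f"{file_size / 2**20:.1f} MiB on disk for {n} float32 values")
```

A memmap trades RAM for disk I/O, so operations that sweep the whole array repeatedly can become much slower than they would be in memory.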

@326reborn
Author

Hello Morales,
Thanks for your reply! This is the code I use to build the space:

import pandas as pd
import numpy as np
import seaborn as sns
import holoviews as hv

import gpmap.src.plot as plot

from gpmap.src.inference import VCregression
from gpmap.src.space import SequenceSpace
from gpmap.src.randwalk import WMWSWalk

data=pd.read_csv('test_data.csv',sep=',',header=0)
space = SequenceSpace(X=data['genotypes'].values, y=data['phenotypes'].values)

test_data.csv

I think the long sequence may have caused the memory error.
By the way, some packages can't be imported because my scipy version is 1.10. I tried other scipy versions and it still didn't work, so I changed the imports in space.py and other scripts to forms like 'from scipy.sparse import csr_matrix'. I'm not sure whether this will introduce bugs.

Best,
Yu
