Issue with generate_spark_graph() #5

Open
eirikhoye opened this issue Nov 19, 2020 · 1 comment
@eirikhoye

Hi, I managed to get sparkhpc and imnet running on our institute's HPC cluster. However, when I run the code to generate a distributed graph:

import findspark; findspark.init()
import sparkhpc

# launch a standalone Spark cluster on the SLURM queue
template_path = '/cluster/home/eirikhoy/sparkhpc/build/lib/sparkhpc/templates/sparkjob.slurm.template'
sj = sparkhpc.sparkjob.SLURMSparkJob(ncores=4, template=template_path)

from pyspark import SparkContext
sc = SparkContext(master=sj.master_url())

import imnet
import numpy as np
from scipy.sparse import csr_matrix
import pyspark

# generate 5000 random sequences and build the distance graph
strings = imnet.random_strings.generate_random_sequences(5000)
g_rdd = imnet.process_strings.generate_spark_graph(strings, sc, max_ld=2).cache()

I get the error:

UnboundLocalError                         Traceback (most recent call last)
<ipython-input-15-af167cc949f4> in <module>()
----> 1 g_rdd = imnet.process_strings.generate_spark_graph(strings, sc, max_ld=2).cache()

/cluster/home/eirikhoy/.conda/envs/imnet_v0.2/lib/python2.7/site-packages/imnet/process_strings.pyc in generate_spark_graph(strings, sc, mat, min_ld, max_ld)
    189         warn("Problem importing pyspark -- are you sure your SPARK_HOME is set?")
    190 
--> 191     sqc = SQLContext(sc)
    192 
    193     strings_b = sc.broadcast(strings)

UnboundLocalError: local variable 'SQLContext' referenced before assignment
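The traceback itself hints at the mechanism: the warning at line 189 fires when the pyspark import inside generate_spark_graph fails, but the function carries on anyway, so the name SQLContext is never bound. A minimal sketch of that pattern (my reading of the traceback, not the verified imnet source; the signature is copied from the traceback):

from warnings import warn

def generate_spark_graph(strings, sc, mat, min_ld, max_ld):
    try:
        from pyspark.sql import SQLContext
    except ImportError:
        # the exception is swallowed here and execution continues
        warn("Problem importing pyspark -- are you sure your SPARK_HOME is set?")

    # if the import failed, SQLContext is an unbound local at this point,
    # so this line raises UnboundLocalError instead of the underlying ImportError
    sqc = SQLContext(sc)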

Note that I tested it on a local VM and got the same error, so perhaps the issue is not one of incorrect dependencies?

Both the SPARK_HOME and JAVA_HOME environment variables are set:

>>> os.environ['SPARK_HOME']
'/cluster/software/Spark/2.4.0-intel-2018b-Python-3.6.6'
>>> os.environ['JAVA_HOME']
'/cluster/software/Java/1.8.0_212'
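One more check that might help (an added suggestion, not in the original report): the site-packages path in the traceback mentions python2.7 while the Spark module name mentions Python-3.6.6, so it could be worth confirming which pyspark build the interpreter actually picks up after findspark.init():

import findspark; findspark.init()  # adds SPARK_HOME's python dirs to sys.path
import pyspark

print(pyspark.__file__)     # should point somewhere under SPARK_HOME/python
print(pyspark.__version__)  # should match the Spark build, e.g. 2.4.0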

The rest of the code examples ran fine.

@rokroskar (Owner)

Hi @eirikhoye, apologies, I missed this issue. Have you managed to resolve it? Can you try running these lines one by one in your session to pinpoint where the problem lies?
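For example, a step-by-step version of that check might look like this (a suggested sketch, not part of the original reply; it simply isolates the import that the warning in the traceback points at):

import os
print(os.environ.get('SPARK_HOME'))  # is it visible inside this session?

import findspark
findspark.init()                     # wires SPARK_HOME into sys.path

import pyspark                       # step 1: is the top-level package importable?
from pyspark.sql import SQLContext   # step 2: the import imnet appears to wrap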
