Common or shared variables for different notebooks in a pipeline #3179
Comments
There are individual node properties and pipeline "global" properties... you can find them as different tabs on the properties panel.
@lresende thanks for the response. I believe I was not clear in my message when I mentioned pipeline parameters. Yes, these parameters are available, but only when the notebooks are scheduled as a pipeline. That is the final stage, after someone has developed all those notebooks. How would those notebooks get those variables when they are not running as a pipeline? In the example above, let's say I am developing the Part 1 notebook interactively. In my case especially, a new Kubernetes pod is started for that notebook. It does not have access to any other file that one can see in the explorer. In a normal Python/Jupyter environment I can create a simple .py file with some configs and access it from any other notebook. How can I do that in this case?
Hi @arundeep78. You might try looking at using volume mounts, either conditional or unconditional, by adjusting the kernel-pod template. This, in combination with …
We would usually specify environment variables and, in the notebook, look up the env vars with default values for local runs.
This way, you will get the correct value from MY_VAR when running as a pipeline and otherwise default to the local value.
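A minimal sketch of that pattern, assuming MY_VAR is set as a node environment variable when the pipeline runs; the fallback value is only an illustration:

```python
import os

# When running as a pipeline node, MY_VAR is injected into the kernel's
# environment; during interactive development it is absent, so the
# local default is used instead.
my_var = os.environ.get("MY_VAR", "local-default-value")
```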
@lresende Sorry, I probably do not understand the whole architecture completely. I start an IBM Elyra environment with a JupyterLab interface. This has some directory structure and notebooks. If I want to use environment variables, then I would have to customize this image to get those variables in the interactive development phase, which would not make sense, as in the JupyterLab environment I may have different pipelines configured in different folders. Either I am missing something, or people just do not use it this way and instead configure a working notebook that runs independently in its own environment.
@kevin-bates thanks. I will read that documentation and come back. But just to clarify, we don't have 'local pipelines'. We develop notebooks interactively using EG, which starts the kernel from its standard image, in our case the kernel named "python_kubernetes". Once a notebook is working, we schedule it through the Elyra interface on Airflow. Since the notebooks have all variables inside, without dependencies on environment variables and such, they run fine. The only challenge we have is that in scenarios where functionality is split across multiple notebooks, we are duplicating those variables in all notebooks.
@arundeep78 - Thanks for the additional information. So your notebook "nodes" must be requiring resources that are not available locally, yet are available in Airflow. And combining them onto one server (as would be the case if you, say, used JupyterHub to host each elyra-server) and using that server's kernels locally would still leave insufficient resources - is that correct? If so, then yes, looking into the volume mounts approach is probably your best bet. It seems like you might be able to use unconditional volume mounts and just enter the necessary instructions into the kernel-pod templates directly (rather than relying on …)
@kevin-bates what do you mean by "insufficient resources" for nodes? Are you referring to CPU, memory, etc.? In any case, we do not lack compute power.

In the Elyra development environment we have a GitHub repo that syncs all notebooks into Elyra. In it we have multiple folders, each containing all the notebooks relevant for a single pipeline. During development of, let's say, notebook1, Elyra will start a kernel which will not have access to "common_parameters.py", which may contain common variables needed by several notebooks in that folder. I assume that if I schedule it as a pipeline and add the file as a dependency, it works. But during interactive development of these notebooks, the kernel that has been started as a Kubernetes pod does not have access to this 'common_parameters.py' file. This means I cannot just develop a notebook in interactive mode and use the same notebook in a pipeline (unless there is a way).

These parameters I cannot define in the kernel images, as their values are different for each notebook. I am also not sure whether they are really environment variables. They are more like "application variables", where the application is defined by the pipeline; an environment variable is something like a database connection, which can differ between development and production while still using the same variable for the table name.

I hope I have made the situation clearer than before, as to what I am trying to achieve and how to find a way to get there.
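For what it's worth, here is a sketch of one way to bridge the two modes, assuming the shared file is either visible to the kernel (local kernel or a mounted volume) or its values are also passed as environment variables when running as a pipeline; all names here are illustrative:

```python
import os

try:
    # Works when common_parameters.py is on the kernel's Python path,
    # e.g. a local kernel or a kernel pod with the repo folder mounted.
    from common_parameters import table_name
except ImportError:
    # Fall back to an environment variable (set as a node property when
    # running as a pipeline) or a hard-coded default for ad-hoc runs.
    table_name = os.environ.get("TABLE_NAME", "t_save_data_here")
```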
My comment was essentially asking "why do you need to use EG to develop your notebooks?" If you have enough resources locally, you don't need EG, and the (local) files are available to all notebooks (nodes).
This is probably more a "how-to" question than an "issue", but I am not sure where it falls.
I have Elyra installed in Kubernetes with Enterprise Gateway, configured so that every notebook execution in interactive mode starts a new pod in Kubernetes.
I want to know how I can share common variables across multiple notebooks using Elyra, both during development and when scheduling them in a pipeline. I am talking about Python kernels here.
I will take this standard pipeline from the Elyra website as an example.
In this case, let's assume there is a database table named "t_save_data_here" that is to be used in all the notebooks and scripts.
Currently, I am declaring a variable in all the notebooks to connect to it.
e.g. table_name = "t_save_data_here".
But how can I do it by declaring it just once, e.g. in a parameters.py that is imported in the first line of every notebook so that all global variables are defined? That way, if the table name changes, I only need to change it once.
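For illustration, a minimal sketch of the intended pattern (the file and variable names are just the ones from this example):

```python
# parameters.py -- shared definitions, kept next to the notebooks
table_name = "t_save_data_here"
```

and then, as the first cell of each notebook:

```python
from parameters import table_name
```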
I read through the documentation and some threads, but they mostly talk about pipeline parameters, env variables, or exporting "output" from a previous stage to the next.
These become very specific to the pipeline config and won't be available when I am developing a given notebook or script...
Is there documentation that I could not find?
Or is this just the wrong way of trying to solve the problem? Is there a better way?