
Using nbclient to talk to jupyter lab running remotely #213

Open

amit-chandak-unskript opened this issue Mar 15, 2022 · 10 comments

@amit-chandak-unskript

Hi,
I have a use case where I have a JupyterLab server running on an EC2 instance and I want to run a .ipynb file against a kernel inside that JupyterLab. I was wondering if I can use nbclient to achieve that? I have used nbclient to talk to Enterprise Gateway and run notebooks, but when I try the same approach with a standalone JupyterLab server, it doesn't work.

@davidbrochart
Member

Hi @amit-chandak-unskript,
I think you will need to pass a kernel manager to nbclient, with a custom kernel provisioner that allows talking to the kernel remotely. If you don't want to control the life cycle of this kernel, but only execute code, I think you just need to get the kernel's connection info. The ZMQ sockets are TCP sockets that can be accessed remotely, so this should work. But I don't know if this has been done before; maybe @kevin-bates has ideas?
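
A minimal sketch (not from this thread) of the connection-info approach described above, assuming you can copy the remote kernel's connection file off the EC2 host and reach its ZMQ ports, for example through an SSH tunnel; the filename kernel-1234.json is a placeholder:

from jupyter_client import BlockingKernelClient

kc = BlockingKernelClient()
# Load the ports, signing key, and transport from the copied connection file.
# The "ip" field is usually 127.0.0.1, so either edit it to the remote host's
# address or tunnel the kernel's ZMQ ports to localhost first.
kc.load_connection_file("kernel-1234.json")  # placeholder filename
kc.start_channels()
kc.execute("print('hello from the remote kernel')")
kc.stop_channels()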

@kevin-bates
Member

Thanks @davidbrochart, hello @amit-chandak-unskript.

Hmm. One of the primary differences between running a kernel via the jupyter-server (Lab) REST API and the gateway REST API is that jupyter-server is session-centric, while the gateways are more kernel-centric, meaning that everything starts from a session in jupyter-server whereas the gateway doesn't require one. What happens when you use the GatewayKernelManager with nbclient and point it at your jupyter-server instance?

Might you be able to deploy (and expose) a gateway instance beside your lab instance in EC2? They could both share the same kernel specifications, but have their own managed space (i.e., Lab won't see the Gateway's kernels and vice versa).

@amit-chandak-unskript
Author

Thanks @kevin-bates @davidbrochart,

  File "/Users/amit/miniconda3/envs/connectors/lib/python3.7/site-packages/jupyter_server/gateway/gateway_client.py", line 404, in gateway_request
    response = await client.fetch(endpoint, **kwargs)
tornado.httpclient.HTTPClientError: HTTP 403: Forbidden

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "temp.py", line 84, in <module>
    main()
  File "temp.py", line 79, in main
    run_notebook(eg_url, bucket, input_key, output_key)
  File "temp.py", line 44, in run_notebook
    resp = client.execute(kernel_name="python3")
  File "/Users/amit/miniconda3/envs/connectors/lib/python3.7/site-packages/nbclient/util.py", line 78, in wrapped
    return just_run(coro(*args, **kwargs))
  File "/Users/amit/miniconda3/envs/connectors/lib/python3.7/site-packages/nbclient/util.py", line 57, in just_run
    return loop.run_until_complete(coro)
  File "/Users/amit/miniconda3/envs/connectors/lib/python3.7/asyncio/base_events.py", line 587, in run_until_complete
    return future.result()
  File "/Users/amit/miniconda3/envs/connectors/lib/python3.7/site-packages/nbclient/client.py", line 542, in async_execute
    async with self.async_setup_kernel(**kwargs):
  File "/Users/amit/miniconda3/envs/connectors/lib/python3.7/contextlib.py", line 170, in __aenter__
    return await self.gen.__anext__()
  File "/Users/amit/miniconda3/envs/connectors/lib/python3.7/site-packages/nbclient/client.py", line 500, in async_setup_kernel
    await self.async_start_new_kernel(**kwargs)
  File "/Users/amit/miniconda3/envs/connectors/lib/python3.7/site-packages/nbclient/client.py", line 412, in async_start_new_kernel
    await ensure_async(self.km.start_kernel(extra_arguments=self.extra_arguments, **kwargs))
  File "/Users/amit/miniconda3/envs/connectors/lib/python3.7/site-packages/nbclient/util.py", line 89, in ensure_async
    result = await obj
  File "/Users/amit/miniconda3/envs/connectors/lib/python3.7/site-packages/jupyter_server/gateway/managers.py", line 438, in start_kernel
    response = await gateway_request(self.kernels_url, method="POST", body=json_body)
  File "/Users/amit/miniconda3/envs/connectors/lib/python3.7/site-packages/jupyter_server/gateway/gateway_client.py", line 425, in gateway_request
    ) from e
tornado.web.HTTPError: HTTP 403: Forbidden (Error attempting to connect to Gateway server url

I see the above error when I am using the JupyterLab server URL. Do I need to add some path to the base URL?

Here is my code snippet:

import os
import logging
import argparse
import nbformat
import boto3
from nbclient import NotebookClient
from nbclient.exceptions import CellExecutionError
from jupyter_server.gateway.managers import GatewayKernelManager

logger = logging.getLogger()
logger.setLevel(logging.INFO)

# The function runs the notebook in S3 against the specified EG.
# It expects the following arguments:
# eg_url - This is the url of the Enterprise gateway.
# bucket - Name of the bucket where the notebook is kept.
# input_key - S3 notebook to run.
# output_key - S3 notebook to store the output of the run.
def run_notebook(eg_url, bucket, input_key, output_key) -> bool:
    os.environ["JUPYTER_GATEWAY_URL"] = eg_url

    s3 = boto3.client('s3')
    try:
        getObject = s3.get_object(Bucket=bucket, Key=input_key)
    except Exception:
        logger.error(f'GetObject {input_key} failed')
        return False

    fp = getObject['Body']
    try:
        run_notebook = nbformat.read(fp, as_version=4)
    except Exception:
        logger.error('Wrong notebook format')
        return False

    client = NotebookClient(nb=run_notebook, kernel_manager_class=GatewayKernelManager)
    try:
        resp = client.execute(kernel_name="python3")
    except CellExecutionError:
        pass

    # Serialize the executed notebook to a string.
    bodyStr = nbformat.writes(run_notebook)

    # Upload the output nb file to s3.
    try:
        s3.put_object(Body=str.encode(bodyStr), Bucket=bucket, Key=output_key)
    except Exception:
        logger.error(f'S3 upload {output_key} failed')
        return False

    return True

def main():
    parser = argparse.ArgumentParser(description='Execute runbook nb file.')

    parser.add_argument('bucket', metavar='bucket', type=str,
                        help='AWS S3 bucket name where nb files are stored')
    parser.add_argument('input_key', metavar='input_key', type=str,
                        help='S3 key to runbook nb file')
    parser.add_argument('output_key', metavar='output_key', type=str,
                        help='S3 key to where output nb file will be stored')
    parser.add_argument('eg_url', metavar='gateway_url', type=str,
                        help='Url to Jupyterlab gateway')

    args = parser.parse_args()

    bucket = args.bucket
    input_key = args.input_key
    output_key = args.output_key
    eg_url = args.eg_url

    run_notebook(eg_url, bucket, input_key, output_key)

    return

if __name__ == '__main__':
    main()

The above script works fine with Enterprise Gateway. I am just trying the same script with the JupyterLab URL passed as the eg_url argument.

@amit-chandak-unskript
Author

It's not a reachability issue, as I confirmed that the JupyterLab server URL works if I do the following:

 curl -XGET https://<jupyterlab base url>/api/kernelspecs
{"default": "python3", "kernelspecs": {"python3": {"name": "python3", "spec": {"argv": ["/opt/conda/bin/python", "-m", "ipykernel_launcher", "-f", "{connection_file}"], "env": {}, "display_name": "python3", "language": "python", "interrupt_mode": "signal", "metadata": {"debugger": true}}, "resources": {"logo-32x32": "/d3c71c1b-d075-4b5c-a6dd-9f417f483c3e/kernelspecs/python3/logo-32x32.png", "logo-64x64": "/d3c71c1b-d075-4b5c-a6dd-9f417f483c3e/kernelspecs/python3/logo-64x64.png"}}}}(con

@amit-chandak-unskript
Author

@kevin-bates I like your idea of having an Enterprise Gateway beside the JupyterLab server with the same kernelspecs and using that for nbclient. But I would like to avoid maintaining two instances. So, nbclient was never meant to be used with a remote JupyterLab server, is it?

@davidbrochart
Member

So, nbclient was never meant to be used with a remote JupyterLab server, is it?

No, it wasn't. Nbclient doesn't talk any HTTP, and jupyter-server is an HTTP server. All nbclient speaks is ZMQ over TCP sockets, which are used to connect to a (remote) kernel. If it had access to the kernel's connection info, that would be possible.

@amit-chandak-unskript
Author

Thanks @davidbrochart, one more question: is it possible to make nbclient use a custom kernel to run? As in, does it take a kernel spec to run the notebook?

@davidbrochart
Member

Yes, you will need to pass your own kernel manager (km). You can get one like this:

from jupyter_client.manager import KernelManager

km = KernelManager(kernel_name="python3")
km.start_kernel()

# pass km to nbclient

km.shutdown_kernel()
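
For completeness, a hedged sketch of the "pass km to nbclient" step: NotebookClient accepts an existing kernel manager through its km argument, so the notebook runs on that kernel instead of one nbclient starts itself (notebook.ipynb is a placeholder path):

import nbformat
from jupyter_client.manager import KernelManager
from nbclient import NotebookClient

nb = nbformat.read("notebook.ipynb", as_version=4)  # placeholder path

km = KernelManager(kernel_name="python3")
km.start_kernel()

# Reuse the already-started kernel; nbclient skips starting its own.
client = NotebookClient(nb, km=km)
client.execute()

km.shutdown_kernel()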

@kevin-bates
Member

Right, but, as David points out, this won't get you to the remote server. You'd essentially be rewriting GatewayKernelManager (and GatewayKernelClient). Since you received a 403, you might need to explore adding the token to your headers via JUPYTER_GATEWAY_AUTH_TOKEN. I suspect there will be other issues as well - probably in the session management aspect of things.
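
A small sketch of the token approach mentioned above, assuming the JupyterLab server was started with a token: export it via JUPYTER_GATEWAY_AUTH_TOKEN (next to JUPYTER_GATEWAY_URL) before the GatewayKernelManager is constructed, so the gateway client sends it with each request. Both values below are placeholders; the token is the one reported by jupyter server list on the EC2 instance, and the 403 may still have other causes, as noted above.

import os

# Placeholders: substitute your server's base URL and token.
os.environ["JUPYTER_GATEWAY_URL"] = "https://<jupyterlab-base-url>"
os.environ["JUPYTER_GATEWAY_AUTH_TOKEN"] = "<server-token>"
# ...then build NotebookClient(nb, kernel_manager_class=GatewayKernelManager) as in the script above.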

@davidbrochart
Member

@amit-chandak-unskript I'm working on a new project that will let you do just what you want, see davidbrochart/jpterm#2.
