'Device or Resource Busy' errors on directory rename are not handled correctly. #1458

Open
kkapper opened this issue Sep 18, 2024 · 1 comment

kkapper commented Sep 18, 2024

Bug description

If Jupyter Server hits a 'Device or resource busy' error (Errno 16) while renaming a directory, it does not abort: the contents of the directory are still moved to the new location.

This is a big deal in Kubernetes environments, where we often persist only certain directories. If a JupyterLab user were to rename a persisted directory, this bug would cause data loss, because the files end up in a directory that is not persisted.

The UI reports that the rename failed, but the source directory ends up empty and its contents are in the new directory.

How to reproduce

  1. You'll need some way to generate this specific kind of error (this is easy to do in a Kubernetes environment, where this was observed):
Internal Server Error (Unknown error renaming file: private [Errno 16] Device or resource busy:
  2. Create a directory in JupyterLab.

  3. Rename the directory in the UI.

image

  4. Receive error:

image

  5. Notice files have been moved to the new directory.

image

  6. Old directory is now empty.

image
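
For anyone without a Kubernetes setup, the failure can probably be approximated locally by patching `os.rename` and `os.rmdir` so that they raise Errno 16 for the source directory. This is only a simulation under that assumption, not how a real busy mount behaves in every detail; `simulate_busy_move` and the file name `notebook.ipynb` are made up for illustration.

```python
import errno
import os
import shutil
import tempfile
from unittest import mock


def simulate_busy_move():
    """Run shutil.move with a simulated EBUSY on the source directory.

    Returns (error, source_entries, target_entries).
    Hypothetical reproduction helper, not part of jupyter_server.
    """
    base = tempfile.mkdtemp()
    src = os.path.join(base, "private")
    dst = os.path.join(base, "private-copy")
    os.mkdir(src)
    with open(os.path.join(src, "notebook.ipynb"), "w") as f:
        f.write("{}")

    real_rmdir = os.rmdir

    def busy_rename(*args, **kwargs):
        # Assumption: the mount point refuses to be renamed.
        raise OSError(errno.EBUSY, "Device or resource busy")

    def busy_rmdir(path, *args, **kwargs):
        # Assumption: only the source directory itself refuses removal.
        if os.fspath(path) == src:
            raise OSError(errno.EBUSY, "Device or resource busy", src)
        return real_rmdir(path, *args, **kwargs)

    error = None
    with mock.patch("os.rename", busy_rename), mock.patch("os.rmdir", busy_rmdir):
        try:
            shutil.move(src, dst)
        except OSError as e:
            error = e

    result = (error, os.listdir(src), os.listdir(dst))
    shutil.rmtree(base)  # clean up with the real os functions restored
    return result
```

With both calls patched, `shutil.move` falls back to copying the tree and then deleting the source, so the function should report an EBUSY error, an empty source directory, and the file sitting in the target directory.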

Expected behaviour

If the rename fails for any reason, the move should stop immediately and leave the source directory untouched.
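
One way to get that behaviour might be to call `os.rename` directly instead of `shutil.move`, so there is no copy-and-delete fallback at all. The sketch below is an assumption about what a fix could look like, not the actual jupyter_server code; `strict_rename` is a hypothetical name.

```python
import errno
import os


def strict_rename(src, dst):
    """Rename src to dst with a single os.rename call and no copy fallback.

    Hypothetical helper for illustration; not part of jupyter_server.
    """
    if os.path.lexists(dst):
        raise FileExistsError(errno.EEXIST, "Destination already exists", dst)
    # os.rename either renames the whole directory or raises OSError
    # (EBUSY, EXDEV, ...) without touching the contents of src.
    os.rename(src, dst)
```

The trade-off is that renames across filesystems would fail with EXDEV instead of being turned into a copy, which may or may not be acceptable here.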

Actual behaviour

The rename errors, but all the files have been moved to the target directory.

Your personal set up

This error as reported happened in a Kubernetes environment, but it likely happens in all versions of Jupyter server with varying degrees of severity.

Specifically we are guaranteeing that the /private and /shared directories are backed by persistent storage, and any content elsewhere is just in memory and will be deleted on server restart.

We use the standard helm chart here: https://github.com/jupyterhub/zero-to-jupyterhub-k8s

current chart version: 3.0.3


[W 2024-09-16 18:13:48.483 ServerApp] wrote error: "Unknown error renaming file: private [Errno 16] Device or resource busy: '/home/jovyan/private'"
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.10/shutil.py", line 816, in move
        os.rename(src, real_dst)
    OSError: [Errno 16] Device or resource busy: '/home/jovyan/private' -> '/home/jovyan/private-copy'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.10/site-packages/jupyter_server/services/contents/filemanager.py", line 1050, in rename_file
        await run_sync(shutil.move, old_os_path, new_os_path)
      File "/opt/conda/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
        return await get_asynclib().run_sync_in_worker_thread(
      File "/opt/conda/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
        return await future
      File "/opt/conda/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
        result = context.run(func, *args)
      File "/opt/conda/lib/python3.10/shutil.py", line 834, in move
        rmtree(src)
      File "/opt/conda/lib/python3.10/shutil.py", line 731, in rmtree
        onerror(os.rmdir, path, sys.exc_info())
      File "/opt/conda/lib/python3.10/shutil.py", line 729, in rmtree
        os.rmdir(path)
    OSError: [Errno 16] Device or resource busy: '/home/jovyan/private'
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.10/site-packages/tornado/web.py", line 1786, in _execute
        result = await result
      File "/opt/conda/lib/python3.10/site-packages/jupyter_server/services/contents/handlers.py", line 151, in patch
        model = await ensure_async(cm.update(model, path))
      File "/opt/conda/lib/python3.10/site-packages/jupyter_core/utils/__init__.py", line 182, in ensure_async
        result = await obj
      File "/opt/conda/lib/python3.10/site-packages/jupyter_server/services/contents/manager.py", line 901, in update
        await self.rename(path, new_path)
      File "/opt/conda/lib/python3.10/site-packages/jupyter_server/services/contents/manager.py", line 888, in rename
        await self.rename_file(old_path, new_path)
      File "/opt/conda/lib/python3.10/site-packages/jupyter_server/services/contents/filemanager.py", line 1054, in rename_file
        raise web.HTTPError(500, f"Unknown error renaming file: {old_path} {e}") from e
    tornado.web.HTTPError: HTTP 500: Internal Server Error (Unknown error renaming file: private [Errno 16] Device or resource busy: '/home/jovyan/private')
@kkapper kkapper added the bug label Sep 18, 2024
krassowski (Collaborator) commented

This looks like a limitation of shutil.move. That said, there seems to be a precedent for wrapping file system operations with some extra safety logic:

@contextmanager
def atomic_writing(path, text=True, encoding="utf-8", log=None, **kwargs):
    """Context manager to write to a file only if the entire write is successful.
    This works by copying the previous file contents to a temporary file in the
    same directory, and renaming that file back to the target if the context
    exits with an error. If the context is successful, the new data is synced to
    disk and the temporary file is removed.
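
In the same spirit as `atomic_writing`, a directory move could perhaps be wrapped so that the source is never left half-emptied. The following is a rough sketch under that assumption, not an existing jupyter_server API: `move_dir_all_or_nothing` and the `.moved-away` suffix are hypothetical. The idea is to copy first and make the only destructive step a single rename of the source, rolling the copy back if that rename fails.

```python
import errno
import os
import shutil


def move_dir_all_or_nothing(src, dst):
    """Move directory src to dst without leaving src partially emptied.

    Hypothetical helper for illustration; not part of jupyter_server.
    """
    if os.path.lexists(dst):
        raise FileExistsError(errno.EEXIST, "Destination already exists", dst)

    try:
        os.rename(src, dst)  # fast path: a plain rename
        return
    except OSError:
        pass  # fall through to the copy-based path

    # Copy first; this step does not modify src.
    try:
        shutil.copytree(src, dst, symlinks=True)
    except Exception:
        shutil.rmtree(dst, ignore_errors=True)  # drop a partial copy
        raise

    # The only destructive step is one rename of src to a sibling name.
    # If src cannot be renamed (for example because it is a mount point),
    # discard the copy and re-raise with src intact.
    trash = src.rstrip(os.sep) + ".moved-away"
    try:
        os.rename(src, trash)
    except OSError:
        shutil.rmtree(dst, ignore_errors=True)
        raise
    shutil.rmtree(trash, ignore_errors=True)
```

In the busy-directory case from this issue, the second rename would presumably fail as well, so the copy is removed and the caller sees the error with the original directory unchanged.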
