[BUG] temp_dask folder is sometimes not found and raises an error #8902

Open
gargantuadev opened this issue Oct 21, 2024 · 1 comment

@gargantuadev

I cannot provide code since it is against my company policies.

I have a large number of small .parquet files (about 60 KB each) that I read with Dask. When I call ".compute()" on a Dask dataframe, it sometimes raises this error:

Traceback (most recent call last):
    df = df[[Key, "Index"]].reset_index(drop=True).compute() 
  File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\base.py", line 286, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\base.py", line 568, in compute
    results = schedule(dsk, keys, **kwargs)
  File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\local.py", line 560, in get_sync
    return get_async(
  File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\local.py", line 503, in get_async
    for key, res_info, failed in queue_get(queue).result():
  File "D:\Python38\lib\concurrent\futures\_base.py", line 437, in result
    return self.__get_result()
  File "D:\Python38\lib\concurrent\futures\_base.py", line 389, in __get_result
    raise self._exception
  File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\local.py", line 545, in submit
    fut.set_result(fn(*args, **kwargs))
  File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\local.py", line 237, in batch_execute_tasks
    return [execute_task(*a) for a in it]
  File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\local.py", line 237, in <listcomp>
    return [execute_task(*a) for a in it]
  File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\local.py", line 228, in execute_task
    result = pack_exception(e, dumps)
  File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\local.py", line 223, in execute_task
    result = _execute_task(task, data)
  File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\core.py", line 121, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\dataframe\shuffle.py", line 448, in __call__
    path = tempfile.mkdtemp(suffix=".partd", dir=self.tempdir)
  File "D:\Python38\lib\tempfile.py", line 358, in mkdtemp
    _os.mkdir(file, 0o700)
FileNotFoundError: [WinError 3] The system cannot find the path specified: 'E:/temp_dask/1729501515.144889\\tmpol0vhvzl.partd'
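
The last Dask frame in the traceback is a plain `tempfile.mkdtemp(suffix=".partd", dir=self.tempdir)` call, and `mkdtemp` does not create a missing parent directory. The following is a minimal sketch that reproduces the same exception without Dask. It assumes (this is not confirmed anywhere in the report) that the timestamp-named directory was missing at the moment of the call; `make_partd_dir` and the paths are hypothetical names used only for illustration.

```python
import os
import shutil
import tempfile

# Hypothetical stand-in for a per-run directory such as 'E:/temp_dask/<timestamp>'.
base = tempfile.mkdtemp()
spill_dir = os.path.join(base, "1729501515.144889")


def make_partd_dir(parent):
    """Mirror the mkdtemp call shown in the last Dask frame of the traceback."""
    return tempfile.mkdtemp(suffix=".partd", dir=parent)


# Case 1: the parent directory does not exist, so mkdtemp fails.
try:
    make_partd_dir(spill_dir)
    missing_parent_failed = False
except FileNotFoundError:
    missing_parent_failed = True

# Case 2: once the parent exists, the same call succeeds.
os.makedirs(spill_dir, exist_ok=True)
created = make_partd_dir(spill_dir)
created_ok = os.path.isdir(created)

print(missing_parent_failed, created_ok)

# Clean up the demo directories.
shutil.rmtree(base)
```

If this matches what happens in the real run, the open question is what removes (or never creates) the directory between runs, for example cleanup code in the application or a second process sharing the same path.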

Anything else we need to know?: It happens when there are many small files and ".compute()" is called once for each of them.
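
Since the compute runs once per file, one possible workaround is to make sure the temporary directory exists immediately before every call. This is only a sketch under that assumption, not a confirmed fix: `ensure_dir`, `compute_with_tempdir`, and `fake_compute` are hypothetical helpers, and `fake_compute` stands in for the real `.compute()` call by performing the same `mkdtemp` step that fails in the traceback.

```python
import os
import shutil
import tempfile


def ensure_dir(path):
    """Create `path` (and any missing parents) and return it unchanged."""
    os.makedirs(path, exist_ok=True)
    return path


def compute_with_tempdir(compute, tempdir):
    """Call `compute()` only after making sure `tempdir` exists."""
    ensure_dir(tempdir)
    return compute()


# Demo with stand-in paths; a real run would use the configured Dask temp directory.
root = tempfile.mkdtemp()
spill = os.path.join(root, "temp_dask", "run-1")
spill_existed_before = os.path.isdir(spill)


def fake_compute():
    # Stand-in for df.compute(): performs the step that failed in the traceback.
    return tempfile.mkdtemp(suffix=".partd", dir=spill)


# One call per "file", as described in the report.
results = [compute_with_tempdir(fake_compute, spill) for _ in range(3)]
all_created = all(os.path.isdir(p) for p in results)

# Clean up the demo directories.
shutil.rmtree(root)
```

In a real script the wrapped callable would be something like `lambda: df.compute()`, with `tempdir` set to whatever path Dask is configured to use (for instance through `dask.config.set({"temporary_directory": path})`). I have not verified this against Dask 2021.7.0.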

Environment:

  • Dask version: 2021.7.0
  • Python version: 3.8.0
  • Operating System: Windows
  • Install method (conda, pip, source): pip
@hendrikmakait
Member

Thanks for reporting this issue! It looks like your Dask version lags by several years. Please try updating to the latest release and see if the error still occurs. There's a lot of development activity, and your problem may have already been fixed.
