scandir is slow when access stat #288

Open
darouwan opened this issue Sep 5, 2024 · 2 comments

darouwan commented Sep 5, 2024

My folder contains more than 6000 files, and I need to list them ordered by last modified time.
I use this code:

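```python
from smbclient import scandir

file_list = []
for file_info in scandir(fr"\\{self._hostname}\{self._service_name}\{path}"):
    # stat() sends an additional SMB request for every file
    file_list.append((file_info.name, file_info.stat().st_mtime))
```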

It takes about 60 seconds.
If I remove the file_info.stat().st_mtime call, it only takes 1 second, so the slowdown seems to be caused by stat(). Is there any way to speed up stat(), or another way to list files ordered by last modified time?

adiroiban (Contributor) commented Sep 5, 2024

Instead of calling stat() for each entry, have you tried using the information already available on SMBDirEntry? For example, SMBDirEntry.smb_info.last_write_time?

You can check the source code of scandir() here:

```python
def scandir(path, search_pattern="*", **kwargs):
    """
    Return an iterator of DirEntry objects corresponding to the entries in the directory given by path. The entries are
    yielded in arbitrary order, and the special entries '.' and '..' are not included.
    Using scandir() instead of listdir() can significantly increase the performance of code that also needs file type
    or file attribute information, because DirEntry objects expose this information if the SMB server provides it when
    scanning a directory. All DirEntry methods may perform a SMB request, but is_dir(), is_file(), is_symlink() usually
    only require one system call unless the file or directory is a reparse point which requires 2 calls. See the
    Python documentation for how DirEntry is set up and the methods and attributes that are available.
    :param path: The path to a directory to scan.
    :param search_pattern: The search string to match against the names of directories or files. This pattern can use
        '*' as a wildcard for multiple chars and '?' as a wildcard for a single char. Does not support regex patterns.
    :param kwargs: Common SMB Session arguments for smbclient.
    :return: An iterator of DirEntry objects in the directory.
    """
    connection_cache = kwargs.get("connection_cache", None)
    with SMBDirectoryIO(path, share_access="rwd", **kwargs) as fd:
        for raw_dir_info in fd.query_directory(search_pattern, FileInformationClass.FILE_ID_FULL_DIRECTORY_INFORMATION):
            filename = raw_dir_info["file_name"].get_value().decode("utf-16-le")
            if filename in [".", ".."]:
                continue
            dir_info = SMBDirEntryInformation(
                creation_time=raw_dir_info["creation_time"].get_value(),
                last_access_time=raw_dir_info["last_access_time"].get_value(),
                last_write_time=raw_dir_info["last_write_time"].get_value(),
                change_time=raw_dir_info["change_time"].get_value(),
                end_of_file=raw_dir_info["end_of_file"].get_value(),
                allocation_size=raw_dir_info["allocation_size"].get_value(),
                file_attributes=raw_dir_info["file_attributes"].get_value(),
                ea_size=raw_dir_info["ea_size"].get_value(),
                file_id=raw_dir_info["file_id"].get_value(),
                file_name=filename,
            )
            dir_entry = SMBDirEntry(
                SMBRawIO(rf"{path}\{filename}", **kwargs),
                dir_info,
                connection_cache=connection_cache,
            )
            yield dir_entry
```
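A minimal sketch of the smb_info.last_write_time suggestion, assuming smbclient's scandir() behaves as in the source quoted in this thread, so last_write_time is already populated from the single directory query and no per-file stat() request is needed. The helper names sort_entries_by_mtime and list_by_mtime, and the newest_first parameter, are hypothetical and not part of smbclient.

```python
from datetime import datetime
from typing import Iterable, List, Tuple


def sort_entries_by_mtime(entries: Iterable, newest_first: bool = False) -> List[Tuple[str, datetime]]:
    """Return (name, last_write_time) pairs sorted by last write time.

    Reads entry.smb_info.last_write_time, which scandir() fills in from the
    directory query itself, so no extra SMB request is sent per entry
    (unlike entry.stat()).
    """
    pairs = [(entry.name, entry.smb_info.last_write_time) for entry in entries]
    pairs.sort(key=lambda pair: pair[1], reverse=newest_first)
    return pairs


def list_by_mtime(unc_path: str, newest_first: bool = False, **kwargs) -> List[Tuple[str, datetime]]:
    """Hypothetical helper: list a share directory ordered by last write time."""
    # Imported here so sort_entries_by_mtime() can be used without smbprotocol installed.
    from smbclient import scandir

    return sort_entries_by_mtime(scandir(unc_path, **kwargs), newest_first=newest_first)
```

Usage would look like list_by_mtime(r"\\server\share\folder"), with any common session arguments (such as credentials) passed through kwargs to scandir().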

jborean93 (Owner) commented

Thanks @adiroiban for sharing that info.

The benefit of using scandir is that enumerating the directory already gives you the raw information, accessible through the smb_info attribute (the dir_info object passed to SMBDirEntry). The stat() method on the yielded SMBDirEntry can return some additional information not available in smb_info, but it involves another network request for every file. In this case, the last_write_time in smb_info should contain the information you need.
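To keep the shape of the original loop, which stored st_mtime as a float, the last_write_time datetime can be converted to a POSIX timestamp. This is a sketch under two assumptions: last_write_time is a datetime expressed in UTC, and it may be either naive or timezone-aware (I have not confirmed which one your smbprotocol version returns), so the hypothetical helper to_epoch treats a naive value as UTC.

```python
from datetime import datetime, timezone


def to_epoch(dt: datetime) -> float:
    """Convert a last_write_time datetime to a float timestamp like st_mtime.

    Assumption: a naive datetime holds UTC. Attaching UTC first prevents
    datetime.timestamp() from interpreting it as local time.
    """
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.timestamp()
```

In the original loop this becomes file_list.append((file_info.name, to_epoch(file_info.smb_info.last_write_time))), followed by file_list.sort(key=lambda item: item[1]) to order by modification time.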
