Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix wildcard sync command if files have special characters #761

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

briceflaceliere
Copy link

@briceflaceliere briceflaceliere commented Oct 1, 2024

Problem encountered:
I am using s5cmd to sync several million files, some of which contain special characters (such as \u00a0, quotes, etc.). However, the sprintf("%q") function used in generateCommand modifies these file names, causing the cp command to no longer find the files correctly.

Solution provided:
To resolve this issue, I used the github.com/kballard/go-shellquote library to properly escape URL parameters without converting UTF-8 characters. This process is applied only when necessary, ensuring file names are preserved.

Test changes:
I modified the tests accordingly. However, I am not certain of the potential impact on other parts of the project. I will be able to confirm if this resolves the synchronization issues once the migration of our millions of files is completed.

Related issues:

@briceflaceliere briceflaceliere requested a review from a team as a code owner October 1, 2024 16:00
@briceflaceliere briceflaceliere requested review from igungor and seruman and removed request for a team October 1, 2024 16:00
@igungor
Copy link
Member

igungor commented Oct 18, 2024

Hey @briceflaceliere, thanks for sending a patch! Appreciated. Using shellquote looks good to me.

Could you add your scenario to cp integration test case to demonstrate the fix please? Thank you.

s5cmd/e2e/cp_test.go

Lines 2142 to 2146 in 12d1038

0: equals(`cp s3://%v/sub&@$/test+1.txt s3://%v/sub&@$/test+1.txt`, srcbucket, dstbucket),
1: equals(`cp s3://%v/sub///test&@$:,?;= 4.txt s3://%v/sub///test&@$:,?;= 4.txt`, srcbucket, dstbucket),
2: equals(`cp s3://%v/sub/this-is-normal-file.txt s3://%v/sub/this-is-normal-file.txt`, srcbucket, dstbucket),
3: equals(`cp s3://%v/sub:,?/test; =2.txt s3://%v/sub:,?/test; =2.txt`, srcbucket, dstbucket),
4: equals(`cp s3://%v/test&@$:,?;= 3.txt s3://%v/test&@$:,?;= 3.txt`, srcbucket, dstbucket),

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants