[#4994][#4369] feat(core): support S3 credential vending #4966

Merged
merged 10 commits into from
Oct 29, 2024

Conversation

FANNG1
Contributor

@FANNG1 FANNG1 commented Sep 19, 2024

What changes were proposed in this pull request?

Support S3 credential vending, including the S3 token and the S3 static key.

Why are the changes needed?

Fix: #4994
Fix: #4369

Does this PR introduce any user-facing change?

no

How was this patch tested?

Added an integration test (IT) that performs Iceberg operations using an S3 token.

@FANNG1 FANNG1 marked this pull request as draft September 19, 2024 11:18
jerryshao pushed a commit that referenced this pull request Oct 15, 2024
### What changes were proposed in this pull request?
support credential vending framework 

### Why are the changes needed?

Fix: #4992 

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
1. Added unit tests (UT).
2. Proposed a draft PR in #4966, which runs successfully using an S3 token with the Gravitino IcebergRESTServer.
@FANNG1 FANNG1 changed the title [SIP] support S3 credential vending for IcebergRESTServer [#4994] feat(core): support S3 credential vending Oct 24, 2024
@FANNG1 FANNG1 self-assigned this Oct 24, 2024
@FANNG1 FANNG1 marked this pull request as ready for review October 24, 2024 02:23
@FANNG1
Contributor Author

FANNG1 commented Oct 24, 2024

@yuqi1129 @jerqi @jerryshao please help to review when you have time, thanks

public class S3TokenCredential implements Credential {

/** S3 token credential type. */
public static final String S3_TOKEN_CREDENTIAL_TYPE = "s3-token";
Contributor Author

What do you mean by token and session-token?

return uri.getHost();
}

private Credentials getS3Token(
Collaborator

createS3Token?

Contributor Author

updated

StaticCredentialsProvider.create(
AwsBasicCredentials.create(
s3CredentialConfig.accessKeyID(), s3CredentialConfig.secretAccessKey()));
String region = s3CredentialConfig.region();
Collaborator

@jerqi jerqi Oct 25, 2024

StsClientBuilder builder = StsClient.builder()
    .credentialsProvider(credentialsProvider);
// Only set the region when one is configured.
if (StringUtils.isNotBlank(region)) {
    builder.region(Region.of(region));
}
return builder.build();

Contributor Author

updated

}
}

private IamPolicy getPolicy(
Collaborator

createPolicy?

Contributor Author

updated
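
The excerpt shows only the signature of the policy method, so the following is a minimal sketch of what a location-scoped policy could look like, not the code in this PR. It uses plain string building instead of the AWS SDK `IamPolicy` type so that it runs without dependencies. The class name, method names, and the choice of read-only actions are all assumptions made for illustration.

```java
import java.net.URI;

/** Hypothetical helper: builds a read-only IAM policy document scoped to one S3 location. */
final class ScopedS3PolicySketch {

  /** Extracts the bucket name, similar to the uri.getHost() call in the excerpt. */
  static String bucketName(String location) {
    return URI.create(location).getHost();
  }

  /** Returns the object key prefix without leading or trailing slashes. */
  static String keyPrefix(String location) {
    String path = URI.create(location).getPath();
    return path.replaceAll("^/+", "").replaceAll("/+$", "");
  }

  /** Allows GetObject under the prefix and ListBucket restricted to the same prefix. */
  static String readOnlyPolicy(String location) {
    String bucketArn = "arn:aws:s3:::" + bucketName(location);
    String prefix = keyPrefix(location);
    // An empty prefix means the whole bucket, so avoid producing "bucket//*".
    String pattern = prefix.isEmpty() ? "*" : prefix + "/*";
    return "{\"Version\":\"2012-10-17\",\"Statement\":["
        + "{\"Effect\":\"Allow\",\"Action\":\"s3:GetObject\","
        + "\"Resource\":\"" + bucketArn + "/" + pattern + "\"},"
        + "{\"Effect\":\"Allow\",\"Action\":\"s3:ListBucket\","
        + "\"Resource\":\"" + bucketArn + "\","
        + "\"Condition\":{\"StringLike\":{\"s3:prefix\":\"" + pattern + "\"}}}"
        + "]}";
  }

  public static void main(String[] args) {
    System.out.println(readOnlyPolicy("s3://my-bucket/warehouse/db/table"));
  }
}
```

A policy like this would typically be passed along with the AssumeRole request so that the vended token is limited to the table location rather than the whole bucket.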

# specific language governing permissions and limitations
# under the License.
#
org.apache.gravitino.s3.credential.S3TokenProvider
Collaborator

These are the AWS Hadoop credential provider names:

    org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider,
    org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider,
    software.amazon.awssdk.auth.credentials.EnvironmentVariableCredentialsProvider,
    org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider

Do we need to refer to them? Many people may be used to using them.

Contributor Author

This is for developers only; users only care about the credential provider type, such as s3-token in this case. I'm OK with using this name. Do you have any suggestions?

Collaborator

It's OK if it's only used by developers. But we need to make sure that s3-token is easy for users to understand.

Contributor Author

Do you think s3-token and s3-secret-key are clear for users?

Collaborator

@jerqi jerqi Oct 28, 2024

What names do other systems use?

Contributor Author

Polaris uses storage-type = s3; please refer to https://polaris.io/#tag/Polaris-Entities/Catalog. Unity Catalog (UC) reads the S3 properties if the location uses the s3 scheme; see https://docs.unitycatalog.io/server/configuration/

Collaborator

Two points to consider:

  1. Is the GCP one gcp-token? Should this one be aws-token?
  2. I am not sure that token is precise enough. Does S3 have multiple kinds of tokens? Do we support all of them?

Contributor Author

@FANNG1 FANNG1 Oct 28, 2024

  1. The GCP one is gcs-token. I prefer to keep s3-token because it's used for S3 storage.
  2. We only support the token from AssumeRole, and I think the name is simple and clear. You could suggest an alternative if you have a better name.

Collaborator

OK. I just wanted to discuss some possible names to make sure. I'm OK with the current name.
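
The thread above separates the provider class name, which only developers see, from the credential type string that users configure (`s3-token`, `s3-secret-key`). A minimal sketch of that mapping follows. The interface and registry are hypothetical and much simpler than whatever Gravitino actually uses; in particular, the PR appears to register its provider through a service file, whereas this sketch registers providers by hand to stay self-contained.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical provider contract; the real Gravitino interface may differ. */
interface CredentialProviderSketch {
  /** The user-facing type string, for example "s3-token". */
  String credentialType();
}

/** Hypothetical registry that resolves a user-facing type string to a provider. */
final class CredentialProviderRegistrySketch {
  private final Map<String, CredentialProviderSketch> providers = new HashMap<>();

  /** Registers a provider and rejects a second provider for the same type. */
  void register(CredentialProviderSketch provider) {
    String type = provider.credentialType();
    if (providers.putIfAbsent(type, provider) != null) {
      throw new IllegalArgumentException("Duplicate credential type: " + type);
    }
  }

  /** Looks up a provider by the type string taken from user configuration. */
  CredentialProviderSketch lookup(String credentialType) {
    CredentialProviderSketch provider = providers.get(credentialType);
    if (provider == null) {
      throw new IllegalArgumentException("Unknown credential type: " + credentialType);
    }
    return provider;
  }

  public static void main(String[] args) {
    CredentialProviderRegistrySketch registry = new CredentialProviderRegistrySketch();
    registry.register(() -> "s3-token");
    registry.register(() -> "s3-secret-key");
    System.out.println(registry.lookup("s3-token").credentialType());
  }
}
```

With this shape, renaming a provider class never affects users, while renaming a type string (for example to `aws-token`) would be a user-facing change.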

@FANNG1
Contributor Author

FANNG1 commented Oct 28, 2024

@jerqi any other comments?

Collaborator

@jerqi jerqi left a comment

LGTM.

@FANNG1
Contributor Author

FANNG1 commented Oct 28, 2024

@jerryshao @yuqi1129 , do you have time to take another look?

@jerryshao jerryshao added the branch-0.7 Automatically cherry-pick commit to branch-0.7 label Oct 28, 2024
@jerryshao jerryshao changed the title [#4994] feat(core): support S3 credential vending [#4994][#4369] feat(core): support S3 credential vending Oct 28, 2024
Comment on lines 32 to 35
/** The static access key ID used to access S3 data. */
public static final String GRAVITINO_S3_ACCESS_KEY_ID = "s3-access-key-id";
/** The static secret access key used to access S3 data. */
public static final String GRAVITINO_S3_SECRET_ACCESS_KEY = "s3-secret-access-key";
Contributor

This is duplicated here and above, can you please consolidate them?

Contributor Author

Yes, they are different properties in S3TokenCredential and S3SecretKeyCredential that share the same name.

Contributor

So please tell me what's the difference? If they're different, why do you use the same name and the same definition here?

Contributor

You didn't address this comment.

Contributor Author

Sorry for missing this comment. STS responds with a session token together with an AccessKeyId and a secretAccessKey, which are different from the static AK/SK. Do you think we should use a different name, like sessionAccessKeyId, to make this clearer? That might be inconsistent with traditional S3 naming, though.

Contributor Author

I updated the comments for the properties.

Comment on lines 36 to 37
private String accessKeyId;
private String secretAccessKey;
Contributor

Make them final here and below.

Contributor Author

updated
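
As a minimal sketch of the requested change, the credential fields can be declared `final` and assigned once in the constructor, which makes the object immutable after creation. The class below is hypothetical and is not the `S3TokenCredential` from this PR; the `sessionToken` field is an assumption based on the STS discussion above.

```java
import java.util.Objects;

/** Hypothetical immutable holder for STS-issued session credentials. */
final class SessionCredentialSketch {
  // Final fields can only be assigned in the constructor.
  private final String accessKeyId;
  private final String secretAccessKey;
  private final String sessionToken;

  SessionCredentialSketch(String accessKeyId, String secretAccessKey, String sessionToken) {
    // Fail fast on missing values instead of vending a half-filled credential.
    this.accessKeyId = Objects.requireNonNull(accessKeyId, "accessKeyId");
    this.secretAccessKey = Objects.requireNonNull(secretAccessKey, "secretAccessKey");
    this.sessionToken = Objects.requireNonNull(sessionToken, "sessionToken");
  }

  String accessKeyId() {
    return accessKeyId;
  }

  String secretAccessKey() {
    return secretAccessKey;
  }

  String sessionToken() {
    return sessionToken;
  }
}
```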

@jerryshao
Contributor

@FANNG1 please fix the ci problem.

@FANNG1
Contributor Author

FANNG1 commented Oct 28, 2024

@FANNG1 please fix the ci problem.

Caused by a temporary network error; fixed by rerunning.

@FANNG1
Contributor Author

FANNG1 commented Oct 28, 2024

@jerryshao any other comments?

/** The session access key ID used to access S3 data. */
public static final String GRAVITINO_S3_ACCESS_KEY_ID = "s3-access-key-id";
/** The session secret access key used to access S3 data. */
public static final String GRAVITINO_S3_SECRET_ACCESS_KEY = "s3-secret-access-key";
Contributor

Although the names of the two kinds of key, "static" and "session", are the same in S3, on the Gravitino side we can choose different names for the definitions, like "GRAVITINO_STATIC_S3_SECRET_ACCESS_KEY" and "GRAVITINO_SESSION_S3_SECRET_ACCESS_KEY".

Contributor Author

ok

/** The static secret access key used to access S3 data. */
- public static final String GRAVITINO_S3_SECRET_ACCESS_KEY = "s3-secret-access-key";
+ public static final String GRAVITINO_S3_STATIC_SECRET_ACCESS_KEY = "s3-static-secret-access-key";
Contributor

I mean that we can still follow the AWS convention and use "s3-secret-access-key" as the value, but the constant defined here in Gravitino can have a more distinguishable name, like:

public static final String GRAVITINO_S3_STATIC_SECRET_ACCESS_KEY = "s3-secret-access-key";

Also for other keys, what do you think? @FANNG1

Contributor Author

ok

Contributor Author

updated
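
A minimal sketch of the naming scheme agreed on in this thread: the Java constants get distinct static and session names, while the property strings keep the AWS-style keys. `GRAVITINO_S3_STATIC_SECRET_ACCESS_KEY` comes from the review comment; the other constant names, the session token key, and the helper methods are assumptions for illustration and may not match the merged code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical constants holder; only the naming pattern is taken from the review. */
final class S3CredentialKeysSketch {
  // Static (long-lived) AK/SK.
  static final String GRAVITINO_S3_STATIC_ACCESS_KEY_ID = "s3-access-key-id";
  static final String GRAVITINO_S3_STATIC_SECRET_ACCESS_KEY = "s3-secret-access-key";

  // Session keys returned by STS: different constant names, same property strings.
  static final String GRAVITINO_S3_SESSION_ACCESS_KEY_ID = "s3-access-key-id";
  static final String GRAVITINO_S3_SESSION_SECRET_ACCESS_KEY = "s3-secret-access-key";

  // Assumed key for the session token; this string does not appear in the thread.
  static final String GRAVITINO_S3_TOKEN = "s3-session-token";

  /** Properties for a static key credential. */
  static Map<String, String> staticKeyInfo(String accessKeyId, String secretAccessKey) {
    Map<String, String> info = new LinkedHashMap<>();
    info.put(GRAVITINO_S3_STATIC_ACCESS_KEY_ID, accessKeyId);
    info.put(GRAVITINO_S3_STATIC_SECRET_ACCESS_KEY, secretAccessKey);
    return info;
  }

  /** Properties for a session credential, which additionally carries the token. */
  static Map<String, String> sessionKeyInfo(
      String accessKeyId, String secretAccessKey, String sessionToken) {
    Map<String, String> info = new LinkedHashMap<>();
    info.put(GRAVITINO_S3_SESSION_ACCESS_KEY_ID, accessKeyId);
    info.put(GRAVITINO_S3_SESSION_SECRET_ACCESS_KEY, secretAccessKey);
    info.put(GRAVITINO_S3_TOKEN, sessionToken);
    return info;
  }
}
```

Clients reading the properties see the same key strings for both credential kinds, so the distinction lives only in the Java source, which is what the reviewer asked for.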

@jerryshao
Contributor

Please make the CI pass.

@jerryshao jerryshao merged commit f265e50 into apache:main Oct 29, 2024
26 checks passed
github-actions bot pushed a commit that referenced this pull request Oct 29, 2024
Labels
branch-0.7 Automatically cherry-pick commit to branch-0.7
Development

Successfully merging this pull request may close these issues.

[Subtask] support S3 token credential provider
[Subtask] support using S3 token to access IcebergRESTservice
3 participants