part1: Preparation
1. Create IAM Role
2. Add S3 bucket in Lake Formation
3. Create catalog database
4. Grant the permissions to the database and tables in Lake Formation (see the sketch after this list)
   - why do I need to grant permission on the database -> [case 1]()
     - when I run the crawler to create a table in the catalog database, the crawler needs permission on the catalog database
   - why do I need to grant permission on the tables -> [case 2]()
     - when I want to delete a table, I find that I need permission on the table itself
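For reference, the part 1 steps map onto a few boto3 calls. This is only a sketch: the bucket ARN, role ARN, and database name below are placeholders, not the values used in this project.

```python
import boto3

glue = boto3.client("glue")
lakeformation = boto3.client("lakeformation")

DATA_BUCKET_ARN = "arn:aws:s3:::my-data-lake-bucket"                 # placeholder bucket
CRAWLER_ROLE_ARN = "arn:aws:iam::123456789012:role/GlueCrawlerRole"  # placeholder role
DATABASE_NAME = "my_catalog_db"                                      # placeholder database

# Step 2: register the S3 location so Lake Formation governs access to it.
lakeformation.register_resource(
    ResourceArn=DATA_BUCKET_ARN,
    UseServiceLinkedRole=True,
)

# Step 3: create the Glue Data Catalog database.
glue.create_database(DatabaseInput={"Name": DATABASE_NAME})

# Step 4, case 1: the crawler role needs database permissions to create tables.
lakeformation.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": CRAWLER_ROLE_ARN},
    Resource={"Database": {"Name": DATABASE_NAME}},
    Permissions=["CREATE_TABLE", "ALTER", "DROP"],
)

# Step 4, case 2: table-level permissions are needed to alter or drop tables later.
lakeformation.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": CRAWLER_ROLE_ARN},
    Resource={"Table": {"DatabaseName": DATABASE_NAME, "TableWildcard": {}}},
    Permissions=["SELECT", "ALTER", "DROP"],
)
```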
part2: Create a Crawler
5. create table using crawler (see the sketch after this list)
   - read the schema in the S3 bucket
   - read the schema in the RDB database (PostgreSQL core)
   - create a connection to the RDB database
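A rough boto3 sketch of the connection and crawler setup. The connection name, JDBC URL, credentials, subnet/security group IDs, crawler name, and S3 path are all placeholders.

```python
import boto3

glue = boto3.client("glue")

# Connection the crawler (and later the ETL job) uses to reach Postgres.
glue.create_connection(
    ConnectionInput={
        "Name": "postgres-connection",
        "ConnectionType": "JDBC",
        "ConnectionProperties": {
            "JDBC_CONNECTION_URL": "jdbc:postgresql://my-rds-host:5432/coredb",
            "USERNAME": "glue_user",
            "PASSWORD": "change-me",
        },
        "PhysicalConnectionRequirements": {
            "SubnetId": "subnet-0123456789abcdef0",
            "SecurityGroupIdList": ["sg-0123456789abcdef0"],
            "AvailabilityZone": "us-east-1a",
        },
    }
)

# Step 5: one crawler with two targets, the S3 prefix and the Postgres schema.
glue.create_crawler(
    Name="demo-crawler",
    Role="GlueCrawlerRole",
    DatabaseName="my_catalog_db",
    Targets={
        "S3Targets": [{"Path": "s3://my-data-lake-bucket/raw/"}],
        "JdbcTargets": [
            {"ConnectionName": "postgres-connection", "Path": "coredb/public/%"}
        ],
    },
)
glue.start_crawler(Name="demo-crawler")
```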
part3: Create a Glue Visual ETL Job
6. Build the ETL Job
   - Read the data from the S3 bucket in CSV format and save it back to the S3 bucket in JSON format
   - Read the data from the S3 JSON files and save it to the RDS Postgres database
part4: Create a Glue script ETL Job
- Read the data from the S3 bucket in CSV format and save it back to the S3 bucket in JSON format (script sketch below)
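A minimal Glue script along these lines might look like the following (the visual job from part 3 generates a similar script). The S3 paths are placeholders.

```python
import sys
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the CSV files, treating the first row as the header.
source = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://my-data-lake-bucket/raw/"]},
    format="csv",
    format_options={"withHeader": True},
)

# Write the same records back out as JSON.
glue_context.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": "s3://my-data-lake-bucket/json/"},
    format="json",
)
job.commit()
```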
part5: How to solve the connection failure in a Glue Job
- Read the data from the S3 JSON files and save it to the RDS Postgres database (see the sketch below)
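The connection failure is typically a networking issue rather than a code issue: the Glue connection's security group usually needs a self-referencing inbound rule, and the job's subnet needs a route to S3 (gateway endpoint or NAT). Once the connection works, the write itself can be sketched as below; the connection name, target table, database, and S3 path are placeholders.

```python
import sys
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the JSON files produced by the previous job.
records = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://my-data-lake-bucket/json/"]},
    format="json",
)

# Write into Postgres through the catalog connection created in part 2;
# the job must run in a VPC that can actually reach the RDS instance.
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=records,
    catalog_connection="postgres-connection",
    connection_options={"dbtable": "public.events", "database": "coredb"},
)
job.commit()
```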
- Build the workflow on demand (boto3 sketch at the end of this list)
- Build a workflow whose first trigger depends on an S3 event when new folders are uploaded (sketch at the end of this list)
  - Create a CloudTrail trail for the S3 data events
  - Create an EventBridge rule for the "AWS API Call via CloudTrail" event
- Incremental Glue crawling using Amazon S3 event notifications (SQS) (sketch at the end of this list)
- Test and check the behavior of the Crawler
- Create a workflow with an hourly schedule (sketch at the end of this list)
  - create catalog database
  - Grant Lake Formation permission
  - Create connection
  - Create Crawler to RDB database
  - Create Glue ETL job
  - Create Workflow
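The on-demand workflow can be sketched with boto3 as follows; the workflow, crawler, and job names are placeholders.

```python
import boto3

glue = boto3.client("glue")

glue.create_workflow(Name="demo-workflow")

# First trigger: started on demand, runs the crawler.
glue.create_trigger(
    Name="start-crawl",
    WorkflowName="demo-workflow",
    Type="ON_DEMAND",
    Actions=[{"CrawlerName": "demo-crawler"}],
)

# Second trigger: fires when the crawl succeeds and starts the ETL job.
glue.create_trigger(
    Name="run-etl",
    WorkflowName="demo-workflow",
    Type="CONDITIONAL",
    StartOnCreation=True,
    Predicate={
        "Conditions": [
            {
                "LogicalOperator": "EQUALS",
                "CrawlerName": "demo-crawler",
                "CrawlState": "SUCCEEDED",
            }
        ]
    },
    Actions=[{"JobName": "csv-to-json-job"}],
)

# Kick the whole workflow off on demand.
glue.start_workflow_run(Name="demo-workflow")
```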
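For the event-driven workflow, the sketch below assumes a CloudTrail trail that logs S3 data events already exists; the rule name, ARNs, bucket, and workflow name are placeholders.

```python
import json
import boto3

events = boto3.client("events")
glue = boto3.client("glue")

WORKFLOW_ARN = "arn:aws:glue:us-east-1:123456789012:workflow/event-workflow"
EVENTBRIDGE_ROLE_ARN = "arn:aws:iam::123456789012:role/EventBridgeStartGlueWorkflow"

# Match PutObject calls recorded by CloudTrail for the landing bucket.
events.put_rule(
    Name="s3-new-folder-to-glue",
    EventPattern=json.dumps(
        {
            "source": ["aws.s3"],
            "detail-type": ["AWS API Call via CloudTrail"],
            "detail": {
                "eventSource": ["s3.amazonaws.com"],
                "eventName": ["PutObject", "CopyObject", "CompleteMultipartUpload"],
                "requestParameters": {"bucketName": ["my-data-lake-bucket"]},
            },
        }
    ),
)

# Point the rule at the Glue workflow.
events.put_targets(
    Rule="s3-new-folder-to-glue",
    Targets=[
        {"Id": "glue-workflow", "Arn": WORKFLOW_ARN, "RoleArn": EVENTBRIDGE_ROLE_ARN}
    ],
)

# The workflow's first trigger must be an EVENT trigger so EventBridge can start it.
glue.create_trigger(
    Name="on-s3-event",
    WorkflowName="event-workflow",
    Type="EVENT",
    Actions=[{"CrawlerName": "demo-crawler"}],
)
```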
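Incremental (S3 event mode) crawling is mostly crawler configuration. The sketch assumes the bucket already publishes event notifications to the SQS queue; the queue ARN, crawler name, and paths are placeholders.

```python
import boto3

glue = boto3.client("glue")

glue.create_crawler(
    Name="incremental-crawler",
    Role="GlueCrawlerRole",
    DatabaseName="my_catalog_db",
    Targets={
        "S3Targets": [
            {
                "Path": "s3://my-data-lake-bucket/raw/",
                "EventQueueArn": "arn:aws:sqs:us-east-1:123456789012:s3-events",
            }
        ]
    },
    # Consume S3 events from the queue instead of re-listing the whole prefix.
    RecrawlPolicy={"RecrawlBehavior": "CRAWL_EVENT_MODE"},
)
```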
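The hourly-scheduled variant only changes the first trigger; the workflow and crawler names are placeholders.

```python
import boto3

glue = boto3.client("glue")

glue.create_workflow(Name="hourly-workflow")

glue.create_trigger(
    Name="hourly-start",
    WorkflowName="hourly-workflow",
    Type="SCHEDULED",
    Schedule="cron(0 * * * ? *)",   # top of every hour
    StartOnCreation=True,
    Actions=[{"CrawlerName": "rdb-crawler"}],
)
```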