Search before asking
I had searched in the feature and found no similar feature requirement.
Description
Flag to decide whether to use overwrite mode when inserting data into Hive. If set to true, for non-partitioned tables the existing data in the table will be deleted before the new data is inserted; for partitioned tables, only the data in the relevant partitions will be deleted before the new data is inserted.
When performing Hive insert operations, the current mode is append, but in practice there may be a requirement to overwrite the existing data, similar to insert overwrite, or to delete it before inserting. There are several possible implementation approaches, such as:
- Scheduling workflows: as a temporary solution, configure a scheduling workflow whose first step deletes the data in the target table and whose next step performs the insertion.
- Upper-layer data integration products: delete the data before insertion through pipelines or similar mechanisms.
- Native support for an "overwrite" mode in SeaTunnel: implementing this feature directly in the SeaTunnel core is currently the most convenient option. Leveraging SeaTunnel's two-phase commit logic, data is first written to a temporary directory, the existing target directory is then deleted (using deleteFile(directory)), and the temporary directory is finally renamed into place. This gives better data consistency, since the window between deleting and renaming the directory is on the order of milliseconds; it reuses existing utility classes, requires minimal code changes, and performs significantly better than the upper-layer approaches. A rough sketch of this commit sequence is shown below.
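The following is only an illustrative sketch of that delete-then-rename commit step, written against the plain Hadoop FileSystem API; the class and method names (OverwriteCommitSketch, commitWithOverwrite) are invented for this example, and the actual connector would go through SeaTunnel's existing file-system utilities such as the deleteFile(directory) helper mentioned above.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OverwriteCommitSketch {

    /**
     * Illustrative commit step: the two-phase-commit writer has already staged the
     * data in tmpDir; when overwrite is enabled, the existing target directory is
     * deleted immediately before the rename, so the table (or partition) is without
     * data only for the short window between delete and rename.
     */
    public static void commitWithOverwrite(
            Configuration conf, Path tmpDir, Path targetDir, boolean overwrite)
            throws IOException {
        FileSystem fs = targetDir.getFileSystem(conf);

        if (overwrite && fs.exists(targetDir)) {
            // For a partitioned table, targetDir would point at the partition
            // directory instead of the table root.
            fs.delete(targetDir, true); // recursively remove the existing data
        }

        Path parent = targetDir.getParent();
        if (parent != null && !fs.exists(parent)) {
            fs.mkdirs(parent);
        }

        // Move the staged data into place. This sketch covers the overwrite path;
        // with overwrite = false and an existing targetDir, the staged files would
        // have to be moved into the directory individually instead.
        if (!fs.rename(tmpDir, targetDir)) {
            throw new IOException("Failed to rename " + tmpDir + " to " + targetDir);
        }
    }
}
```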
During the implementation, the logic of Flink's overwrite operator was referenced.
Simply adding an overwrite parameter on the Hive side (defaulting to false) would suffice.
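As a concrete illustration, a sink configuration with the proposed flag might look like the snippet below. The overwrite option is the new parameter suggested in this issue, not something the connector supports today, and the table name and metastore address are placeholders.

```
sink {
  Hive {
    table_name = "test_db.test_table"                 # placeholder target table
    metastore_uri = "thrift://metastore-host:9083"    # placeholder metastore address
    # Proposed flag (default false): true deletes the existing table data, or only
    # the affected partitions for a partitioned table, before the newly written
    # data is moved into place; false keeps today's append behavior.
    overwrite = true
  }
}
```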
Adamyuanyuan changed the title from "[Feature][connector-hive] hive connector support overwrite mode" to "[Feature][connector-hive] hive sink connector support overwrite mode" on Oct 15, 2024.
Expected Logic
Currently, Hive inserts are performed in append mode, but in practice some requirements call for overwriting the data, similar to insert overwrite, or for deleting the existing data before inserting, and there are several ways this could be implemented. Comparing the options, implementing this feature directly in the SeaTunnel core is the most convenient: leveraging SeaTunnel's two-phase commit logic, the data is first written to a temporary directory, the target directory is then deleted (deleteFile(directory)), and the temporary directory is renamed into place. Data consistency is better this way, since the time between deleting the directory and renaming it is at the millisecond level; because existing utility classes are reused, the code changes are small and the result is far better than going through an upper layer.
During the implementation, the logic of Flink's overwrite operator was used as a reference.
Only a new overwrite parameter needs to be added on the Hive side (default: false).
Usage Scenario
When performing Hive insert operations, the current mode is append, but in reality there may be requirements to overwrite the data, similar to insert overwrite, or to delete before insertion.
Related issues
No response
Are you willing to submit a PR?
Code of Conduct