Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Task Instance primary key be a UUID #43243

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

kaxil
Copy link
Member

@kaxil kaxil commented Oct 22, 2024

[skip ci]

closes #43161 part of AIP-72.

As part of the ongoing work for AIP-72: Task Execution Interface, we are migrating the task_instance table to use a UUID primary key. This change is being made to simplify task instance identification, especially when communicating between the executor and workers.

Currently, the primary key of task_instance is a composite key consisting of dag_id, task_id, run_id, and map_index as shown below. This migration introduces a UUID v7 column (id) as the new primary key.

__tablename__ = "task_instance"
task_id = Column(StringID(), primary_key=True, nullable=False)
dag_id = Column(StringID(), primary_key=True, nullable=False)
run_id = Column(StringID(), primary_key=True, nullable=False)
map_index = Column(Integer, primary_key=True, nullable=False, server_default=text("-1"))

Why UUID v7?

The UUID v7 format was chosen because of its improved temporal sorting capabilities. For existing records, UUID v7 will be generated using either the queued_dttm, start_date, or the current timestamp.

image

(From this blog post.)

This change also includes:

  • A new UUID v7 function for both PostgreSQL and MySQL.
  • Backward-compatible indexes for easier querying by dag_id and run_id.
  • Modifications to support the new primary key structure while maintaining foreign key constraints.

TODOs

  • Check if the DB user has permissions to add CREATE extension for Postgres as we install pgcrypto
  • Convert the raw SQL for dropping the primary key (CASCADE) into an Alembic operation. This will allow capturing foreign key constraints and managing them in a downgrade scenario. MySQL does not support dropping foreign keys using CASCADE with primary keys anyway.
  • SQLite Support: Add SQLite support for UUID v7 generation using the uuid6 Python package.
  • Add changes to TaskInstance model file to add id
  • Is the same change needed for task_instance_history table?
  • Batching: Add batching for handling large tables to avoid locking when updating records. Maybe a batch of 1000 rows, TBD!
  • Ensure foreign key constraints are re-added during downgrades after the primary key has been changed back to the composite key.

@kaxil kaxil requested a review from ashb October 22, 2024 01:26
@kaxil kaxil added this to the Airflow 3.0.0 milestone Oct 22, 2024
kaxil added a commit to astronomer/airflow that referenced this pull request Oct 22, 2024
While working on this apache#43243, I was following https://github.com/apache/airflow/blob/main/contributing-docs/13_metadata_database_updates.rst and I ran into an error with  pre-commit hook.

When running the revision command as follows:

```
root@f1f78138ad78:/opt/airflow/airflow# alembic revision -m "New revision"
  Generating /opt/airflow/airflow/migrations/versions/01b38be821e9_new_revision.py ...  done
```

It creates a file as follows:

```python
"""New revision

Revision ID: cd7be1ae8b80
Revises: 05234396c6fc
Create Date: 2024-10-22 01:44:17.873864

"""

import sqlalchemy as sa
from alembic import op

# revision identifiers, used by Alembic.
revision = 'cd7be1ae8b80'
down_revision = '05234396c6fc'
branch_labels = None
depends_on = None
```

Notice single quotes in `revision` & `down_revision`.

Now if I just run that single pre-commit hook (`update-migration-references`), it fails

```
❯ pre-commit run "update-migration-references" --all-files
Update migration ref doc.................................................Failed
- hook id: update-migration-references
- exit code: 1

Using 'uv' to install Airflow

Using airflow version from current sources

Updating migration reference for airflow
Making sure airflow version updated
Making sure there's no mismatching revision numbers
Traceback (most recent call last):
  File "/opt/airflow/scripts/in_container/run_migration_reference.py", line 246, in <module>
    correct_mismatching_revision_nums(revisions=revisions)
  File "/opt/airflow/scripts/in_container/run_migration_reference.py", line 230, in correct_mismatching_revision_nums
    new_content = content.replace(revision_id_match.group(1), revision_match.group(1), 1)
AttributeError: 'NoneType' object has no attribute 'group'
Error 1 returned

If you see strange stacktraces above, run `breeze ci-image build --python 3.9` and try again.
```

That isn't a problem generally as `ruff` will fail before and convert it into double quotes. But rather than doing that, we fix it at source.
@ashb
Copy link
Member

ashb commented Oct 22, 2024

A neat trick to avoid the need for pgcrypto extension: https://brandur.org/fragments/secure-bytes-without-pgcrypto

Though uuid_send() fn it uses doesn't exist on Pg10, it only comes from Pg11 onwards. However I do think it's reasonable to say 12 or 13 is the min required Pg version for Airflow 3 (Postgres only supports v13 onwards).

Oh, Pg 12 is officially all we support anyway, so I think using the uuid_send() function approach could work fine.

kaxil added a commit to astronomer/airflow that referenced this pull request Oct 22, 2024
While working on this apache#43243, I was following https://github.com/apache/airflow/blob/main/contributing-docs/13_metadata_database_updates.rst and I ran into an error with  pre-commit hook.

When running the revision command as follows:

```
root@f1f78138ad78:/opt/airflow/airflow# alembic revision -m "New revision"
  Generating /opt/airflow/airflow/migrations/versions/01b38be821e9_new_revision.py ...  done
```

It creates a file as follows:

```python
"""New revision

Revision ID: cd7be1ae8b80
Revises: 05234396c6fc
Create Date: 2024-10-22 01:44:17.873864

"""

import sqlalchemy as sa
from alembic import op

# revision identifiers, used by Alembic.
revision = 'cd7be1ae8b80'
down_revision = '05234396c6fc'
branch_labels = None
depends_on = None
```

Notice single quotes in `revision` & `down_revision`.

Now if I just run that single pre-commit hook (`update-migration-references`), it fails

```
❯ pre-commit run "update-migration-references" --all-files
Update migration ref doc.................................................Failed
- hook id: update-migration-references
- exit code: 1

Using 'uv' to install Airflow

Using airflow version from current sources

Updating migration reference for airflow
Making sure airflow version updated
Making sure there's no mismatching revision numbers
Traceback (most recent call last):
  File "/opt/airflow/scripts/in_container/run_migration_reference.py", line 246, in <module>
    correct_mismatching_revision_nums(revisions=revisions)
  File "/opt/airflow/scripts/in_container/run_migration_reference.py", line 230, in correct_mismatching_revision_nums
    new_content = content.replace(revision_id_match.group(1), revision_match.group(1), 1)
AttributeError: 'NoneType' object has no attribute 'group'
Error 1 returned

If you see strange stacktraces above, run `breeze ci-image build --python 3.9` and try again.
```

That isn't a problem generally as `ruff` will fail before and convert it into double quotes. But rather than doing that, we fix it at source.
Lee-W pushed a commit that referenced this pull request Oct 22, 2024
While working on this #43243, I was following https://github.com/apache/airflow/blob/main/contributing-docs/13_metadata_database_updates.rst and I ran into an error with  pre-commit hook.

When running the revision command as follows:

```
root@f1f78138ad78:/opt/airflow/airflow# alembic revision -m "New revision"
  Generating /opt/airflow/airflow/migrations/versions/01b38be821e9_new_revision.py ...  done
```

It creates a file as follows:

```python
"""New revision

Revision ID: cd7be1ae8b80
Revises: 05234396c6fc
Create Date: 2024-10-22 01:44:17.873864

"""

import sqlalchemy as sa
from alembic import op

# revision identifiers, used by Alembic.
revision = 'cd7be1ae8b80'
down_revision = '05234396c6fc'
branch_labels = None
depends_on = None
```

Notice single quotes in `revision` & `down_revision`.

Now if I just run that single pre-commit hook (`update-migration-references`), it fails

```
❯ pre-commit run "update-migration-references" --all-files
Update migration ref doc.................................................Failed
- hook id: update-migration-references
- exit code: 1

Using 'uv' to install Airflow

Using airflow version from current sources

Updating migration reference for airflow
Making sure airflow version updated
Making sure there's no mismatching revision numbers
Traceback (most recent call last):
  File "/opt/airflow/scripts/in_container/run_migration_reference.py", line 246, in <module>
    correct_mismatching_revision_nums(revisions=revisions)
  File "/opt/airflow/scripts/in_container/run_migration_reference.py", line 230, in correct_mismatching_revision_nums
    new_content = content.replace(revision_id_match.group(1), revision_match.group(1), 1)
AttributeError: 'NoneType' object has no attribute 'group'
Error 1 returned

If you see strange stacktraces above, run `breeze ci-image build --python 3.9` and try again.
```

That isn't a problem generally as `ruff` will fail before and convert it into double quotes. But rather than doing that, we fix it at source.
harjeevanmaan pushed a commit to harjeevanmaan/airflow that referenced this pull request Oct 23, 2024
While working on this apache#43243, I was following https://github.com/apache/airflow/blob/main/contributing-docs/13_metadata_database_updates.rst and I ran into an error with  pre-commit hook.

When running the revision command as follows:

```
root@f1f78138ad78:/opt/airflow/airflow# alembic revision -m "New revision"
  Generating /opt/airflow/airflow/migrations/versions/01b38be821e9_new_revision.py ...  done
```

It creates a file as follows:

```python
"""New revision

Revision ID: cd7be1ae8b80
Revises: 05234396c6fc
Create Date: 2024-10-22 01:44:17.873864

"""

import sqlalchemy as sa
from alembic import op

# revision identifiers, used by Alembic.
revision = 'cd7be1ae8b80'
down_revision = '05234396c6fc'
branch_labels = None
depends_on = None
```

Notice single quotes in `revision` & `down_revision`.

Now if I just run that single pre-commit hook (`update-migration-references`), it fails

```
❯ pre-commit run "update-migration-references" --all-files
Update migration ref doc.................................................Failed
- hook id: update-migration-references
- exit code: 1

Using 'uv' to install Airflow

Using airflow version from current sources

Updating migration reference for airflow
Making sure airflow version updated
Making sure there's no mismatching revision numbers
Traceback (most recent call last):
  File "/opt/airflow/scripts/in_container/run_migration_reference.py", line 246, in <module>
    correct_mismatching_revision_nums(revisions=revisions)
  File "/opt/airflow/scripts/in_container/run_migration_reference.py", line 230, in correct_mismatching_revision_nums
    new_content = content.replace(revision_id_match.group(1), revision_match.group(1), 1)
AttributeError: 'NoneType' object has no attribute 'group'
Error 1 returned

If you see strange stacktraces above, run `breeze ci-image build --python 3.9` and try again.
```

That isn't a problem generally as `ruff` will fail before and convert it into double quotes. But rather than doing that, we fix it at source.
PaulKobow7536 pushed a commit to PaulKobow7536/airflow that referenced this pull request Oct 24, 2024
While working on this apache#43243, I was following https://github.com/apache/airflow/blob/main/contributing-docs/13_metadata_database_updates.rst and I ran into an error with  pre-commit hook.

When running the revision command as follows:

```
root@f1f78138ad78:/opt/airflow/airflow# alembic revision -m "New revision"
  Generating /opt/airflow/airflow/migrations/versions/01b38be821e9_new_revision.py ...  done
```

It creates a file as follows:

```python
"""New revision

Revision ID: cd7be1ae8b80
Revises: 05234396c6fc
Create Date: 2024-10-22 01:44:17.873864

"""

import sqlalchemy as sa
from alembic import op

# revision identifiers, used by Alembic.
revision = 'cd7be1ae8b80'
down_revision = '05234396c6fc'
branch_labels = None
depends_on = None
```

Notice single quotes in `revision` & `down_revision`.

Now if I just run that single pre-commit hook (`update-migration-references`), it fails

```
❯ pre-commit run "update-migration-references" --all-files
Update migration ref doc.................................................Failed
- hook id: update-migration-references
- exit code: 1

Using 'uv' to install Airflow

Using airflow version from current sources

Updating migration reference for airflow
Making sure airflow version updated
Making sure there's no mismatching revision numbers
Traceback (most recent call last):
  File "/opt/airflow/scripts/in_container/run_migration_reference.py", line 246, in <module>
    correct_mismatching_revision_nums(revisions=revisions)
  File "/opt/airflow/scripts/in_container/run_migration_reference.py", line 230, in correct_mismatching_revision_nums
    new_content = content.replace(revision_id_match.group(1), revision_match.group(1), 1)
AttributeError: 'NoneType' object has no attribute 'group'
Error 1 returned

If you see strange stacktraces above, run `breeze ci-image build --python 3.9` and try again.
```

That isn't a problem generally as `ruff` will fail before and convert it into double quotes. But rather than doing that, we fix it at source.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Development

Successfully merging this pull request may close these issues.

Make Task Instance primary key be a UUID
2 participants