Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create/update a macro to verify sum of row counts of n relations #940

Open
aranke opened this issue Jul 25, 2024 · 3 comments
Open

Create/update a macro to verify sum of row counts of n relations #940

aranke opened this issue Jul 25, 2024 · 3 comments
Labels
awaiting_response enhancement New feature or request

Comments

@aranke
Copy link
Member

aranke commented Jul 25, 2024

Describe the feature

There are situations where a model simply is a union all of various other models:

-- int_my_source
select * from {{ ref('stg_my_source_a') }}
union all
select * from {{ ref('stg_my_source_b') }}
union all
select * from {{ ref('stg_my_source_c') }}

In this example, it would be useful to have a 1:many test to verify that:

count(int_my_source) = count(stg_my_source_a) + count(stg_my_source_b) + count(stg_my_source_c)

Describe alternatives you've considered

equal_rowcount is the closest thing in dbt-utils, but only does 1:1 relations

Additional context

I don't think this feature is database-specific.

Who will this benefit?

This feature has been requested several times over the years (dbt internal Slack, previous GitHub issue); this feature will benefit those users.

Are you interested in contributing to this feature?

Sure, happy to help.

@aranke aranke added enhancement New feature or request triage labels Jul 25, 2024
@dbeatty10
Copy link
Contributor

@aranke Thanks for raising this issue and linking those two other places where this has come up!

If the model is really that simple, what would be the value of a data test here? Wouldn't the counts be guaranteed to be equal? I don't see how the count of rows could be anything but exactly the same in this case.

@edmei1
Copy link

edmei1 commented Jul 27, 2024

Hi adding my 2 cents here and I raised this with dbt support too. It's useful in our case when the unioned model is incremental.
i.e. we're selecting from stg_my_source_a where timestamp > now - 2 days for example. And want to see if the unioned model still has all the rows in the individual staging models.
Hope that makes sense.

@dbeatty10
Copy link
Contributor

Thanks for that info @edmei1 !

Could you share an example model that shows more detail? Something like this:

models/int_my_source.sql

{{
    config(
        materialized='incremental'
    )
}}

with unioned_models as (

    select * from {{ ref('stg_my_source_a') }}
    union all
    select * from {{ ref('stg_my_source_b') }}
    union all
    select * from {{ ref('stg_my_source_c') }}

)

select *
from unioned_models

{% if is_incremental() %}

where event_time >= (select coalesce(max(event_time),'1900-01-01') from {{ this }} )

{% endif %}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting_response enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants