serialbandicoot edited this page Jan 24, 2022 · 12 revisions

Welcome to the great-assertions wiki!

This library is inspired by the Great Expectations library and makes many of the expectations found in Great Expectations available alongside the built-in Python unittest assertions. For example, if you wanted to use expect_column_values_to_be_between, you can access assertExpectColumnValuesToBeBetween.

The library also adds further expectations of its own, some similar to those in Great Expectations and some new.

About Great Assertions

A list of the available assertions can be found here. Assertions that have a direct or similar mapping to Great Expectations are labelled as such, and assertions that are not found in the GE library are also noted.

The major difference between GE and this library is that this library is intended to be very lightweight and to work within a familiar testing framework. It is therefore integrated with unittest, and because unittest is core to Python, the library should be easier to maintain.

How to use the library

The code snippet below shows the basic interaction with great-assertions. Instead of inheriting from unittest.TestCase, we exchange this for GreatAssertions. This means that we still get access to everything in unittest; it is just that we also now have access to the great-assertions expectations.

from great_assertions import GreatAssertions
import pandas as pd

class GreatAssertionTests(GreatAssertions):
    def test_expect_table_row_count_to_equal(self):
        df = pd.DataFrame({"col_1": [100, 200, 300], "col_2": [10, 20, 30]})
        self.expect_table_row_count_to_equal(df, 3)

In the example above, if the row-count check fails, we would receive an error message e.g. expected row count is 4 the actual was 3 : . An additional msg can be tacked on the end.

self.expect_table_row_count_to_equal(df, 3, "my bespoke message")

The full response would be expected row count is 4 the actual was 3 : my bespoke message
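The shape of that failure message can be illustrated with a minimal, plain-Python sketch. This only mimics the message format shown above; it is not the library's actual implementation, and check_row_count is a hypothetical helper:

```python
import pandas as pd

def check_row_count(df, expected, msg=""):
    """Hypothetical sketch of a row-count check in the style of great-assertions."""
    actual = len(df)
    assert actual == expected, f"expected row count is {expected} the actual was {actual} : {msg}"

df = pd.DataFrame({"col_1": [100, 200, 300], "col_2": [10, 20, 30]})
check_row_count(df, 3)  # passes silently

try:
    check_row_count(df, 4, "my bespoke message")
except AssertionError as e:
    print(e)  # expected row count is 4 the actual was 3 : my bespoke message
```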

Complex Expectations

In practice we have found that using several expectations is a good way to provide coverage when verifying the quality of the data-source. For example if we had a data set:

col_1  col_2  col_3
1      Y      Hello
2      Y      Hello
2      N      World
1      N      World
7             Bye

If we were looking at ranges for col_1, we could see that 1 and 2 are the most common values; however, there is an outlier of 7. We could therefore use the range expectation:

expect_column_values_to_be_between(df, min_value=1, max_value=7)

Although this would assert correctly, we might want to add some additional confirmation. A tighter expect_column_values_to_be_between range would provide a secondary check that the overall values sit closer to 1 or 2:

expect_column_values_to_be_between(df, min_value=1, max_value=3)
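The effect of tightening the range can be seen with plain pandas. This is a sketch using Series.between, not the library's implementation, and `between` here is a hypothetical helper:

```python
import pandas as pd

# The col_1 values from the data set above, including the outlier of 7.
col_1 = pd.Series([1, 2, 2, 1, 7])

def between(series, min_value, max_value):
    """Hypothetical sketch: True if every value lies in [min_value, max_value]."""
    return bool(series.between(min_value, max_value).all())

print(between(col_1, 1, 7))  # True  - the outlier is inside the loose range
print(between(col_1, 1, 3))  # False - the tighter range catches the outlier 7
```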

Expectations with Multiple Layers of Assertions

If we wanted to test the value counts (a pandas function) of a column, we can use the self.expect_column_value_counts_percent_to_be_between assertion.

df = pd.DataFrame(
    {
        "col_1": ["Y", "Y", "N", "Y", "Y", "N", "N", "Y", "N", "Maybe"],
    }
)

value_counts = {
    "Y": {"min": 45, "max": 55},
    "N": {"min": 35, "max": 45},
    "Maybe": {"min": 5, "max": 15},
}

self.expect_column_value_counts_percent_to_be_between(df, "col_1", value_counts)

This allows a percentage range for the occurrences of a particular entry. In this example, we know that the majority, though slim, is 'Y'. However, if we combined this with an assertExpectTableColumnsToMatchSet assertion, we could also check that Y/N/Maybe are the only available values, while the value counts assertion checks the overall grouping percentages.

Therefore, if only 1% of results were Maybe, we would easily be able to check both the set and the percentages with these two assertions.
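The percentage check above can be sketched in plain pandas to show what is being measured. This is a sketch of the idea, not great-assertions' actual implementation:

```python
import pandas as pd

df = pd.DataFrame(
    {"col_1": ["Y", "Y", "N", "Y", "Y", "N", "N", "Y", "N", "Maybe"]}
)

# Percentage of occurrences per entry: Y -> 50.0, N -> 40.0, Maybe -> 10.0
percents = df["col_1"].value_counts(normalize=True) * 100

value_counts = {
    "Y": {"min": 45, "max": 55},
    "N": {"min": 35, "max": 45},
    "Maybe": {"min": 5, "max": 15},
}

# Each entry's percentage must fall inside its min/max band.
for key, bounds in value_counts.items():
    pct = percents[key]
    assert bounds["min"] <= pct <= bounds["max"], f"{key}: {pct}% out of range"
```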