bug fix when all columns match but no rows match #277

fdosani · 2024-03-12T00:26:53Z

Fixes #276

@SimonBFrank would you mind pulling down this branch and seeing if it fixes your issue? Long story short, there are no results, so the subsequent dictionary of values that gets displayed can't be generated because of None values.

This is more of a hack, since we will be deprecating this legacy Spark implementation (#275) in favor of a much more readable version that is logically similar to the Pandas one. It just catches the TypeError for this situation and outputs {} for columns_with_any_diffs and columns_fully_matching.
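A minimal sketch of the workaround described above, not the actual datacompy code: the function name `split_columns_by_diffs` and the shape of `match_counts` are hypothetical, and the sketch assumes that the per-column match/mismatch counts come back as None when no rows join.

```python
def split_columns_by_diffs(match_counts):
    """Split columns into (columns_with_any_diffs, columns_fully_matching).

    match_counts is a hypothetical mapping of
    column name -> (n_matches, n_mismatches).
    When no rows joined, the counts are assumed to be None.
    """
    columns_with_any_diffs = {}
    columns_fully_matching = {}
    try:
        for column, (n_matches, n_mismatches) in match_counts.items():
            # Comparing None with an int raises TypeError in Python 3.
            if n_mismatches > 0:
                columns_with_any_diffs[column] = (n_matches, n_mismatches)
            else:
                columns_fully_matching[column] = (n_matches, n_mismatches)
    except TypeError:
        # No rows in common: report nothing for either group.
        return {}, {}
    return columns_with_any_diffs, columns_fully_matching


# Counts are None because no rows matched on the join columns.
no_rows = split_columns_by_diffs({"tmp": (None, None)})

# Ordinary case with real counts.
some_rows = split_columns_by_diffs({"tmp": (2, 0), "amount": (1, 1)})
```

Catching the exception keeps the change small, which fits a code path that is about to be deprecated; a longer-lived implementation would more likely check for the empty join explicitly.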

SimonBFrank · 2024-03-12T02:14:26Z

When I compare the two dataframes with join columns ["id", "label"] this is the result:

Dataframe 1:

id	label	tmp
1	foo	1
2	bar	1

Dataframe 2:

id	label	tmp
3	foo	1
4	bar	1

****** Column Summary ******
Number of columns in common with matching schemas: 3
Number of columns in common with schema differences: 0
Number of columns in base but not compare: 0
Number of columns in compare but not base: 0

****** Row Summary ******
Number of rows in common: 0
Number of rows in base but not compare: 2
Number of rows in compare but not base: 2
Number of duplicate rows found in base: 0
Number of duplicate rows found in compare: 0

****** Row Comparison ******
Number of rows with some columns unequal: 0
Number of rows with all columns equal: 0

****** Column Comparison ******
Number of columns compared with some values unequal: 0
Number of columns compared with all values equal: 0

****** Columns with Unequal Values ******
Base Column Name  Compare Column Name  Base Dtype     Compare Dtype  # Matches  # Mismatches
----------------  -------------------  -------------  -------------  ---------  ------------

I believe Number of rows with some columns unequal and Number of columns compared with some values unequal should be 2 since only the values in the column id are different. Additionally, Columns with Unequal Values should have id.

fdosani · 2024-03-12T02:32:56Z

> I believe Number of rows with some columns unequal and Number of columns compared with some values unequal should be 2 since only the values in the column id are different. Additionally, Columns with Unequal Values should have id.

I might not be following right, but I think since none of the join columns match (["id", "label"]) in this situation it should all be 0. It is joining on both the fields, not just one.
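To illustrate the point about joining on both fields, here is a plain-Python sketch (not datacompy's implementation) that counts rows by their join key for the two dataframes posted above. The helper `row_summary` is hypothetical, and the sketch assumes there are no duplicate keys.

```python
def row_summary(base, compare, join_columns):
    """Count rows in common / only in base / only in compare by join key."""
    def key(row):
        # A row's key is the tuple of ALL join column values.
        return tuple(row[column] for column in join_columns)

    base_keys = {key(row) for row in base}
    compare_keys = {key(row) for row in compare}
    return {
        "in_common": len(base_keys & compare_keys),
        "base_only": len(base_keys - compare_keys),
        "compare_only": len(compare_keys - base_keys),
    }


base = [
    {"id": 1, "label": "foo", "tmp": 1},
    {"id": 2, "label": "bar", "tmp": 1},
]
compare = [
    {"id": 3, "label": "foo", "tmp": 1},
    {"id": 4, "label": "bar", "tmp": 1},
]

# Joining on both columns: no (id, label) pair appears in both inputs.
both = row_summary(base, compare, ["id", "label"])

# Joining on label alone would pair the rows up.
label_only = row_summary(base, compare, ["label"])
```

With `["id", "label"]` as the key, no rows are in common, so there is nothing to compare column by column and the comparison counts stay at 0. Joining on `label` alone would pair the rows and leave `id` as a compared column.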

SimonBFrank · 2024-03-12T13:24:40Z

> > I believe Number of rows with some columns unequal and Number of columns compared with some values unequal should be 2 since only the values in the column id are different. Additionally, Columns with Unequal Values should have id.
>
> I might not be following right, but I think since none of the join columns match (["id", "label"]) in this situation it should all be 0. It is joining on both the fields, not just one.

Whoops, it must've been late and I didn't understand it correctly. LGTM

* refactor SparkCompare
* tweaking SparkCompare and adding back Legacy
* conditional import
* cleaning up tests and using pytest-spark for legacy
* adding docs
* caching and some typo fixes
* adding in doc and pandas 2 changes
* adding pandas to testing matrix
* drop 3.8
* drop 3.8
* refactoring ^
* rebase fix for #277
* fixing legacy uncode column names
* unicode fix for legacy
* unicode test for new spark logic
* typo fix
* changes from PR review


fixes #276

7f37f5d

fdosani added the bug label Mar 12, 2024

fdosani marked this pull request as ready for review March 12, 2024 00:30

fdosani requested review from ak-gupta, jdawang, gladysteh99 and NikhilJArora as code owners March 12, 2024 00:30

bump version

41aaf8e

ak-gupta approved these changes Mar 12, 2024


fdosani merged commit 930e038 into develop Mar 12, 2024
28 checks passed

fdosani deleted the spark-no-rows-match branch March 12, 2024 14:10

fdosani pushed a commit that referenced this pull request Mar 12, 2024

rebase fix for #277

6813cd2

fdosani pushed a commit that referenced this pull request Mar 12, 2024

rebase fix for #277

6c59e0b

fdosani pushed a commit that referenced this pull request Mar 25, 2024

rebase fix for #277

c81315d

rhaffar pushed a commit to rhaffar/datacompy that referenced this pull request Sep 12, 2024

bug fix when all columns match but no rows match (capitalone#277)

b8f4be3

* fixes capitalone#276 * bump version
