Fix for comparison issue in columns_equal when ignore_case=True (Generated by Ana - AI SDE) #329

Closed
wants to merge 1 commit into from
7 changes: 5 additions & 2 deletions datacompy/core.py
@@ -749,6 +749,9 @@ def render(filename: str, *fields: Union[int, float, str]) -> str:
return file_open.read().format(*fields)


+ import pandas as pd
Member

These imports should be placed at the top of the file; placing them here doesn't conform to our code quality guidelines.

Author

Hi @fdosani
Thanks for highlighting this. We will look into why this happened.

BR,
Team Ana

+ import numpy as np

def columns_equal(
col_1: "pd.Series[Any]",
col_2: "pd.Series[Any]",
@@ -816,9 +819,9 @@ def columns_equal(
col_2 = col_2.str.strip()

if ignore_case:
- if col_1.dtype.kind == "O":
+ if col_1.dtype == 'object' and col_1.apply(lambda x: isinstance(x, str)).all():
Member

This seems incorrect. AFAIK the dtype would return O, not object.

Author

Hi @fdosani

The comment is partially correct. dtype.kind returns 'O' for object dtypes, while the dtype itself is displayed as object and compares equal to the string 'object'. For NumPy object columns, both col_1.dtype.kind == "O" and col_1.dtype == 'object' are valid checks. The proposed change additionally requires every element to be a string before upper-casing, which may be useful depending on the specific requirements of the comparison.

BR,
Team Ana
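
For readers following this thread, the two dtype checks and the extra element-wise check can be compared directly. This is a minimal sketch, assuming NumPy-backed object columns; the sample Series and the helper name is_all_str are illustrative and not part of datacompy.

```python
import pandas as pd

# Hypothetical sample data, not taken from the PR.
all_text = pd.Series(["a", "B", "c"], dtype=object)
mixed = pd.Series(["a", 1, None], dtype=object)


def is_all_str(col: pd.Series) -> bool:
    """Mirror the element-wise condition proposed in the diff (hypothetical helper)."""
    return bool(
        col.dtype == "object" and col.apply(lambda x: isinstance(x, str)).all()
    )


# Both spellings identify a NumPy object column.
assert all_text.dtype.kind == "O"
assert all_text.dtype == "object"
assert mixed.dtype.kind == "O"
assert mixed.dtype == "object"

# Only the element-wise check separates all-string data from mixed data.
assert is_all_str(all_text)
assert not is_all_str(mixed)
```

The dtype checks alone cannot tell the two columns apart, so the behavioural difference in the proposed change comes entirely from the isinstance condition.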

col_1 = col_1.str.upper()
- if col_2.dtype.kind == "O":
+ if col_2.dtype == 'object' and col_2.apply(lambda x: isinstance(x, str)).all():
col_2 = col_2.str.upper()

if {col_1.dtype.kind, col_2.dtype.kind} == {"M", "O"}:
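
The PR does not state which failure it targets, so the following is only a sketch of what the guarded condition changes in practice, under the assumption that the concern is object columns holding non-string values. The helper upper_if_all_str and the sample data are hypothetical and not part of datacompy.

```python
import pandas as pd


def upper_if_all_str(col: pd.Series) -> pd.Series:
    """Upper-case a column only when every element is a str (hypothetical helper)."""
    if col.dtype == "object" and col.apply(lambda x: isinstance(x, str)).all():
        return col.str.upper()
    return col


# Hypothetical sample data, not taken from the PR.
text = pd.Series(["ab", "Cd"], dtype=object)
mixed = pd.Series(["ab", 1], dtype=object)
ints = pd.Series([1, 2], dtype=object)

# All-string data is upper-cased as usual.
assert upper_if_all_str(text).tolist() == ["AB", "CD"]

# Unguarded, .str.upper() turns the non-string element into a missing value.
assert mixed.str.upper().isna().tolist() == [False, True]

# Unguarded, an object column holding no strings rejects the .str accessor.
try:
    ints.str.upper()
    accessor_failed = False
except AttributeError:
    accessor_failed = True
assert accessor_failed

# Guarded, both columns pass through unchanged.
assert upper_if_all_str(mixed).tolist() == ["ab", 1]
assert upper_if_all_str(ints).tolist() == [1, 2]
```

One trade-off of this guard: a column of strings that also contains a missing value such as None fails the isinstance check, so it is not upper-cased at all. Whether that is acceptable depends on how the comparison should treat missing values.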