Some Unicode emoji (🪩🫠, maybe others?) are categorized as None, breaking HTML rendering #1325

Open
3 tasks done
fizmat opened this issue May 15, 2023 · 5 comments
Labels
feature request 💬 Requests for new features

Comments

@fizmat

fizmat commented May 15, 2023

Current Behaviour

Rendering a report to HTML fails completely:

Summarize dataset: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 15.10it/s, Completed]
Generate report structure:   0%|                                                                                                                                                    | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/Users/fizmat/Desktop/profiling-bug/main.py", line 4, in <module>
    rep.to_html()
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/ydata-bug/lib/python3.10/site-packages/typeguard/__init__.py", line 1033, in wrapper
    retval = func(*args, **kwargs)
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/ydata-bug/lib/python3.10/site-packages/ydata_profiling/profile_report.py", line 461, in to_html
    return self.html
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/ydata-bug/lib/python3.10/site-packages/typeguard/__init__.py", line 1033, in wrapper
    retval = func(*args, **kwargs)
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/ydata-bug/lib/python3.10/site-packages/ydata_profiling/profile_report.py", line 272, in html
    self._html = self._render_html()
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/ydata-bug/lib/python3.10/site-packages/typeguard/__init__.py", line 1033, in wrapper
    retval = func(*args, **kwargs)
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/ydata-bug/lib/python3.10/site-packages/ydata_profiling/profile_report.py", line 380, in _render_html
    report = self.report
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/ydata-bug/lib/python3.10/site-packages/typeguard/__init__.py", line 1033, in wrapper
    retval = func(*args, **kwargs)
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/ydata-bug/lib/python3.10/site-packages/ydata_profiling/profile_report.py", line 266, in report
    self._report = get_report_structure(self.config, self.description_set)
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/ydata-bug/lib/python3.10/site-packages/ydata_profiling/report/structure/report.py", line 383, in get_report_structure
    render_variables_section(config, summary),
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/ydata-bug/lib/python3.10/site-packages/ydata_profiling/report/structure/report.py", line 159, in render_variables_section
    template_variables.update(render_map_type(config, template_variables))
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/ydata-bug/lib/python3.10/site-packages/ydata_profiling/report/structure/variables/render_categorical.py", line 413, in render_categorical
    overview_table_char, unitab = render_categorical_unicode(config, summary, varid)
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/ydata-bug/lib/python3.10/site-packages/ydata_profiling/report/structure/variables/render_categorical.py", line 139, in render_categorical_unicode
    category_alias_name = category_alias_name.replace("_", " ")
AttributeError: 'NoneType' object has no attribute 'replace'

Expected Behaviour

  • A report should be generated. These emoji should probably be categorized as "Other Symbol", just like 😀 and 🔥.
  • In general, an unexpected emoji in the data should not completely break rendering. Either of the following would work:
    1. unicode_summary_vc() should check whether the returned category is None and replace it with a string, for example "None" or "Other Symbol", depending on your design philosophy.
    2. render_categorical_unicode() should work correctly when summary["category_alias_char_counts"] contains a None key instead of a string.
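
Option 1 could look roughly like the sketch below. It is not ydata-profiling code: count_category_aliases and toy_lookup are hypothetical names, and toy_lookup only stands in for whatever Unicode lookup the library really uses. The point is just that a None result is replaced with a string before it becomes a dictionary key.

```python
from collections import Counter
from typing import Callable, Dict, Optional


def count_category_aliases(
    text: str,
    lookup: Callable[[str], Optional[str]],
    default: str = "Other Symbol",
) -> Dict[str, int]:
    """Count characters per category alias, never using None as a key."""
    counts: Counter = Counter()
    for char in text:
        alias = lookup(char)
        # Fall back to a string so later code can call str methods on the key.
        counts[alias if alias is not None else default] += 1
    return dict(counts)


def toy_lookup(char: str) -> Optional[str]:
    """Hypothetical lookup that, like the one in this report, knows nothing about 🪩."""
    known = {"😀": "Other Symbol", "🔥": "Other Symbol", "a": "Lowercase Letter"}
    return known.get(char)


result = count_category_aliases("a😀🪩", toy_lookup)
print(result)
```

Whether the fallback should be "None" or "Other Symbol" is the design decision mentioned above; the sketch takes it as a parameter.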

Maybe this is related to #1068 and #1070 and the two supported Unicode dependencies behaving differently?

Data Description

Originally encountered with https://www.kaggle.com/datasets/salvatorerastelli/spotify-and-youtube, but even a minimal example reproduces the problem.

Code that reproduces the bug

import pandas as pd
from ydata_profiling import ProfileReport
rep = ProfileReport(pd.DataFrame({'a': ['🪩']}))
rep.to_html()
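
A quick diagnostic, as a sketch only: it uses Python's standard-library unicodedata module, not the Unicode package that ydata-profiling relies on, so it can only hint at whether an outdated Unicode database is a plausible cause. Both 🪩 (U+1FAA9) and 🫠 (U+1FAE0) were added in Unicode 14.0, so an older database reports them as unassigned ("Cn") rather than "So" (Other Symbol). The describe helper is a hypothetical name.

```python
import unicodedata
from typing import Dict, Optional


def describe(char: str) -> Dict[str, Optional[str]]:
    """Return the code point, general category and name of one character."""
    return {
        "codepoint": f"U+{ord(char):04X}",
        "category": unicodedata.category(char),
        # name() raises ValueError for unknown characters unless a default is given.
        "name": unicodedata.name(char, None),
    }


print("Unicode database version:", unicodedata.unidata_version)
for ch in "😀🔥🪩🫠":
    print(ch, describe(ch))
```

If the two newer emoji show up as "Cn" here while 😀 and 🔥 show "So", the interpreter's own database predates Unicode 14.0; whether the same applies to the package used by the report is not established in this thread.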

pandas-profiling version

4.1.2

Dependencies

appnope                   0.1.3              pyhd8ed1ab_0    conda-forge
asttokens                 2.2.1              pyhd8ed1ab_0    conda-forge
attrs                     23.1.0             pyh71513ae_1    conda-forge
backcall                  0.2.0              pyh9f0ad1d_0    conda-forge
backports                 1.0                pyhd8ed1ab_3    conda-forge
backports.functools_lru_cache 1.6.4              pyhd8ed1ab_0    conda-forge
brotli                    1.0.9                h1a8c8d9_8    conda-forge
brotli-bin                1.0.9                h1a8c8d9_8    conda-forge
brotlipy                  0.7.0           py310h8e9501a_1005    conda-forge
bzip2                     1.0.8                h3422bc3_4    conda-forge
ca-certificates           2023.5.7             hf0a4a13_0    conda-forge
certifi                   2023.5.7           pyhd8ed1ab_0    conda-forge
cffi                      1.15.1          py310h2399d43_3    conda-forge
charset-normalizer        3.1.0              pyhd8ed1ab_0    conda-forge
colorama                  0.4.6              pyhd8ed1ab_0    conda-forge
comm                      0.1.3              pyhd8ed1ab_0    conda-forge
contourpy                 1.0.7           py310h2887b22_0    conda-forge
cryptography              40.0.2          py310hfc83b78_0    conda-forge
cycler                    0.11.0             pyhd8ed1ab_0    conda-forge
debugpy                   1.6.7           py310h0f1eb42_0    conda-forge
decorator                 5.1.1              pyhd8ed1ab_0    conda-forge
executing                 1.2.0              pyhd8ed1ab_0    conda-forge
fonttools                 4.39.4          py310h2aa6e3c_0    conda-forge
freetype                  2.12.1               hd633e50_1    conda-forge
htmlmin                   0.1.12                     py_1    conda-forge
idna                      3.4                pyhd8ed1ab_0    conda-forge
imagehash                 4.3.1              pyhd8ed1ab_0    conda-forge
importlib-metadata        6.6.0              pyha770c72_0    conda-forge
importlib_metadata        6.6.0                hd8ed1ab_0    conda-forge
ipykernel                 6.23.1             pyh736e0ef_0    conda-forge
ipython                   8.13.2             pyhd1c38e8_0    conda-forge
ipywidgets                8.0.6              pyhd8ed1ab_0    conda-forge
jedi                      0.18.2             pyhd8ed1ab_0    conda-forge
jinja2                    3.1.2              pyhd8ed1ab_1    conda-forge
joblib                    1.2.0              pyhd8ed1ab_0    conda-forge
jupyter_client            8.2.0              pyhd8ed1ab_0    conda-forge
jupyter_core              5.3.0           py310hbe9552e_0    conda-forge
jupyterlab_widgets        3.0.7              pyhd8ed1ab_0    conda-forge
kiwisolver                1.4.4           py310h2887b22_1    conda-forge
lcms2                     2.15                 hd835a16_1    conda-forge
lerc                      4.0.0                h9a09cb3_0    conda-forge
libblas                   3.9.0           16_osxarm64_openblas    conda-forge
libbrotlicommon           1.0.9                h1a8c8d9_8    conda-forge
libbrotlidec              1.0.9                h1a8c8d9_8    conda-forge
libbrotlienc              1.0.9                h1a8c8d9_8    conda-forge
libcblas                  3.9.0           16_osxarm64_openblas    conda-forge
libcxx                    16.0.3               h4653b0c_0    conda-forge
libdeflate                1.18                 h1a8c8d9_0    conda-forge
libffi                    3.4.2                h3422bc3_5    conda-forge
libgfortran               5.0.0           12_2_0_hd922786_31    conda-forge
libgfortran5              12.2.0              h0eea778_31    conda-forge
libjpeg-turbo             2.1.5.1              h1a8c8d9_0    conda-forge
liblapack                 3.9.0           16_osxarm64_openblas    conda-forge
libllvm11                 11.1.0               hfa12f05_5    conda-forge
libopenblas               0.3.21          openmp_hc731615_3    conda-forge
libpng                    1.6.39               h76d750c_0    conda-forge
libsodium                 1.0.18               h27ca646_1    conda-forge
libsqlite                 3.41.2               hb31c410_1    conda-forge
libtiff                   4.5.0                h4f7d55c_6    conda-forge
libwebp-base              1.3.0                h1a8c8d9_0    conda-forge
libxcb                    1.13              h9b22ae9_1004    conda-forge
libzlib                   1.2.13               h03a7124_4    conda-forge
llvm-openmp               16.0.3               h1c12783_0    conda-forge
llvmlite                  0.39.1          py310h1e34944_1    conda-forge
markupsafe                2.1.2           py310h8e9501a_0    conda-forge
matplotlib-base           3.6.3           py310h78c5c2f_0    conda-forge
matplotlib-inline         0.1.6              pyhd8ed1ab_0    conda-forge
multimethod               1.4                        py_0    conda-forge
munkres                   1.1.4              pyh9f0ad1d_0    conda-forge
ncurses                   6.3                  h07bb92c_1    conda-forge
nest-asyncio              1.5.6              pyhd8ed1ab_0    conda-forge
networkx                  3.1                pyhd8ed1ab_0    conda-forge
numba                     0.56.4          py310h3124f1e_1    conda-forge
numpy                     1.23.5          py310h5d7c261_0    conda-forge
openjpeg                  2.5.0                hbc2ba62_2    conda-forge
openssl                   3.1.0                h53f4e23_3    conda-forge
packaging                 23.1               pyhd8ed1ab_0    conda-forge
pandas                    1.5.3           py310h2b830bf_1    conda-forge
parso                     0.8.3              pyhd8ed1ab_0    conda-forge
patsy                     0.5.3              pyhd8ed1ab_0    conda-forge
pexpect                   4.8.0              pyh1a96a4e_2    conda-forge
phik                      0.11.2             pyhd8ed1ab_0    conda-forge
pickleshare               0.7.5                   py_1003    conda-forge
pillow                    9.5.0           py310h07496d3_0    conda-forge
pip                       23.1.2             pyhd8ed1ab_0    conda-forge
platformdirs              3.5.1              pyhd8ed1ab_0    conda-forge
prompt-toolkit            3.0.38             pyha770c72_0    conda-forge
prompt_toolkit            3.0.38               hd8ed1ab_0    conda-forge
psutil                    5.9.5           py310h8e9501a_0    conda-forge
pthread-stubs             0.4               h27ca646_1001    conda-forge
ptyprocess                0.7.0              pyhd3deb0d_0    conda-forge
pure_eval                 0.2.2              pyhd8ed1ab_0    conda-forge
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pydantic                  1.10.7          py310h8e9501a_0    conda-forge
pygments                  2.15.1             pyhd8ed1ab_0    conda-forge
pyopenssl                 23.1.1             pyhd8ed1ab_0    conda-forge
pyparsing                 3.0.9              pyhd8ed1ab_0    conda-forge
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
python                    3.10.11         h3ba56d0_0_cpython    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python_abi                3.10                    3_cp310    conda-forge
pytz                      2023.3             pyhd8ed1ab_0    conda-forge
pywavelets                1.4.1           py310hf1a086a_0    conda-forge
pyyaml                    6.0             py310h8e9501a_5    conda-forge
pyzmq                     25.0.2          py310hc407298_0    conda-forge
readline                  8.2                  h92ec313_1    conda-forge
requests                  2.28.2             pyhd8ed1ab_1    conda-forge
scipy                     1.9.3           py310ha0d8a01_2    conda-forge
seaborn-base              0.12.2             pyhd8ed1ab_0    conda-forge
setuptools                67.7.2             pyhd8ed1ab_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
stack_data                0.6.2              pyhd8ed1ab_0    conda-forge
statsmodels               0.13.5          py310hf1a086a_2    conda-forge
tangled-up-in-unicode     0.2.0              pyhd8ed1ab_0    conda-forge
tk                        8.6.12               he1e0b03_0    conda-forge
tornado                   6.3.2           py310h2aa6e3c_0    conda-forge
tqdm                      4.64.1             pyhd8ed1ab_0    conda-forge
traitlets                 5.9.0              pyhd8ed1ab_0    conda-forge
typeguard                 2.13.3             pyhd8ed1ab_0    conda-forge
typing-extensions         4.5.0                hd8ed1ab_0    conda-forge
typing_extensions         4.5.0              pyha770c72_0    conda-forge
tzdata                    2023c                h71feb2d_0    conda-forge
unicodedata2              15.0.0          py310h8e9501a_0    conda-forge
urllib3                   1.26.15            pyhd8ed1ab_0    conda-forge
visions                   0.7.5              pyhd8ed1ab_0    conda-forge
wcwidth                   0.2.6              pyhd8ed1ab_0    conda-forge
wheel                     0.40.0             pyhd8ed1ab_0    conda-forge
widgetsnbextension        4.0.7              pyhd8ed1ab_0    conda-forge
xorg-libxau               1.0.9                h27ca646_0    conda-forge
xorg-libxdmcp             1.1.3                h27ca646_0    conda-forge
xz                        5.2.6                h57fd34a_0    conda-forge
yaml                      0.2.5                h3422bc3_2    conda-forge
ydata-profiling           4.1.2              pyhd8ed1ab_0    conda-forge
zeromq                    4.3.4                hbdafb3b_1    conda-forge
zipp                      3.15.0             pyhd8ed1ab_0    conda-forge
zstd                      1.5.2                hf913c23_6    conda-forge

OS

macOS 13.3.1, Google Colab

Checklist

  • There is not yet another bug report for this issue in the issue tracker
  • The problem is reproducible from this bug report. This guide can help to craft a minimal bug report.
  • The issue has not been resolved by the entries listed under Common Issues.

fabclmnt added the "feature request 💬" label and removed the "needs-triage" label on May 16, 2023

@lala7573

Same here. It happens not only with the HTML output but also in the notebook widget.

@fabclmnt
Contributor

fabclmnt commented Jun 3, 2023

@fizmat and @lala7573 have you tried following the instructions to install the unicode tangler?

pip install -U ydata-profiling[unicode]

@fayewu

fayewu commented Jun 5, 2023

same here

@rhelmeczi

@fabclmnt I am also encountering this problem, and that installation did not solve the problem.

As a workaround, one can simply ignore these keys. At this line:

    for category_alias_name, category_alias_counts in sorted(
        summary["category_alias_char_counts"].items(), key=lambda x: -len(x[1])
    ):
        category_alias_name = category_alias_name.replace("_", " ")

Replace it with

    for category_alias_name, category_alias_counts in sorted(
        summary["category_alias_char_counts"].items(), key=lambda x: -len(x[1])
    ):
        if category_alias_name is None:
            continue
        category_alias_name = category_alias_name.replace("_", " ")

or

    for category_alias_name, category_alias_counts in sorted(
        summary["category_alias_char_counts"].items(), key=lambda x: -len(x[1])
    ):
        if category_alias_name is None:
            category_alias_name = "None"
        category_alias_name = category_alias_name.replace("_", " ")
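
To try both variants without patching the installed package, here is a self-contained sketch. The dictionary is a hand-built stand-in for summary["category_alias_char_counts"] (its exact shape in ydata-profiling is an assumption), and iter_category_aliases is a hypothetical helper, not a library function.

```python
from typing import Dict, Iterator, Optional, Tuple


def iter_category_aliases(
    counts: Dict[Optional[str], Dict[str, int]],
    skip_unknown: bool = False,
    placeholder: str = "None",
) -> Iterator[Tuple[str, Dict[str, int]]]:
    """Yield (display_name, char_counts), categories with most characters first."""
    for name, char_counts in sorted(counts.items(), key=lambda x: -len(x[1])):
        if name is None:
            if skip_unknown:
                continue  # first variant: drop the uncategorized characters
            name = placeholder  # second variant: show them under a placeholder
        yield name.replace("_", " "), char_counts


# Hand-built example data; None is the key that triggers the AttributeError.
example = {"Other_Symbol": {"😀": 2, "🔥": 1}, None: {"🪩": 1}}

labelled = list(iter_category_aliases(example))
skipped = list(iter_category_aliases(example, skip_unknown=True))
print(labelled)
print(skipped)
```

Skipping hides the affected characters from the report, while the placeholder keeps them visible, which may be preferable when the goal is to notice uncategorized data.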

@desobolevsky

desobolevsky commented Jul 30, 2024

Hey everyone! I made PR #1632 for this, since the previous one hasn't been merged or maintained. I'll be happy to update or correct it to match the latest code changes :)


7 participants