
clean html text #286

Open
andrewjtdunn opened this issue May 22, 2024 · 2 comments · May be fixed by #295
Comments

@andrewjtdunn
Contributor

We sometimes have HTML characters in our text fields. Rather than writing specific regular expressions, perhaps there is a package that does this for us? The issue appears in comments and in summaries from the Federal Register.

@jgibson517
Contributor

Django has a strip_tags function that removes tags such as <br>: https://docs.djangoproject.com/en/5.0/ref/utils/#django.utils.html.strip_tags

And Python has html.unescape, which converts the remaining entities (such as &amp;) back to their plain characters: https://docs.python.org/3/library/html.html
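
A rough sketch of how the two steps could be combined, in case it helps. To keep it runnable without a Django install, the tag-stripping step here is a small stand-in built on the standard library's `html.parser.HTMLParser` rather than Django's own `strip_tags`; the names `_TagStripper` and `clean_html_text` are made up for illustration and are not from this repo.

```python
import html
from html.parser import HTMLParser


class _TagStripper(HTMLParser):
    """Hypothetical helper: keeps text, drops tags, leaves entities as written."""

    def __init__(self):
        # convert_charrefs=False so entities reach the handlers below untouched
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        # Named entity such as &amp; is passed through unchanged
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        # Numeric entity such as &#39; is passed through unchanged
        self.parts.append(f"&#{name};")


def clean_html_text(raw):
    """Step 1: remove tags. Step 2: convert entities with html.unescape."""
    stripper = _TagStripper()
    stripper.feed(raw)
    stripper.close()
    without_tags = "".join(stripper.parts)
    return html.unescape(without_tags)


if __name__ == "__main__":
    print(clean_html_text("<p>Fish &amp; Wildlife Service</p>"))
```

In a Django project the first step could presumably be swapped for `django.utils.html.strip_tags(raw)`, followed by the same `html.unescape` call. Note that removing a tag does not insert whitespace, so text on either side of a `<br>` ends up joined.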

@jgibson517 linked a pull request May 27, 2024 that will close this issue