Commit
set the encoding consistently to be 'utf-8'
Showing 2 changed files with 3 additions and 9 deletions.
I think we may need to pass encoding to Scrapy here.
What happens: browserHtml is always UTF-8, even if the original content used a different encoding. This means that meta tags in browserHtml may declare a different encoding, and the headers may as well. So if we just pass the UTF-8 data as the body, response.text may run encoding detection and get it wrong.
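To make the failure mode concrete, here is a minimal stdlib-only sketch. The helper `sniff_declared_encoding` is a hypothetical stand-in for encoding detection; Scrapy's real detection is more involved and is not reproduced here. The page content and the windows-1251 charset are also made up for illustration.

```python
import re

# Hypothetical stand-in for declared-encoding sniffing. This is NOT Scrapy's
# implementation; it only looks for a charset named in a <meta> tag.
META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([\w-]+)', re.IGNORECASE)


def sniff_declared_encoding(body: bytes, default: str = "utf-8") -> str:
    """Return the charset named in a <meta> tag, or `default` if none is found."""
    match = META_CHARSET.search(body)
    return match.group(1).decode("ascii") if match else default


# Assumed example: the original page was served as windows-1251 and says so
# in its markup. browserHtml arrives as already-decoded text.
browser_html = (
    '<html><head><meta charset="windows-1251"></head>'
    "<body>Привет</body></html>"
)

# Encoding the text as UTF-8 leaves the stale meta declaration in place.
body = browser_html.encode("utf-8")

sniffed = sniff_declared_encoding(body)
wrong_text = body.decode(sniffed, errors="replace")  # trusts the stale declaration
right_text = body.decode("utf-8")  # uses the encoding we actually know
```

Here `right_text` round-trips to the original string, while `wrong_text` comes out as mojibake. That is the argument for passing the encoding explicitly instead of letting it be detected.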
It should be possible to write a test for this: call ZyteAPITextResponse.from_api_response with browserHtml that comes from a non-UTF-8 page, then check that response.text is correct. The content should include some non-ASCII data.
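A rough shape for such a test is sketched below. `StandInTextResponse` is a hypothetical stand-in so the example runs without Scrapy installed; it is not the real ZyteAPITextResponse, and the real class's constructor arguments are not shown here. The Shift_JIS page and the URL are assumed fixture data. In the real test, the stand-in would be swapped for the actual class.

```python
from dataclasses import dataclass


@dataclass
class StandInTextResponse:
    """Hypothetical stand-in for ZyteAPITextResponse; not the real class."""

    body: bytes
    encoding: str

    @classmethod
    def from_api_response(cls, api_response: dict) -> "StandInTextResponse":
        # browserHtml is already text, so the body is encoded as UTF-8 and
        # that encoding is recorded explicitly instead of being sniffed later.
        return cls(body=api_response["browserHtml"].encode("utf-8"), encoding="utf-8")

    @property
    def text(self) -> str:
        # Decode with the recorded encoding; no detection step is involved.
        return self.body.decode(self.encoding)


def test_browser_html_from_non_utf8_page() -> None:
    # Assumed fixture: a page originally served as Shift_JIS whose stale
    # charset declaration survives in the rendered markup.
    html = (
        '<html><head><meta charset="shift_jis"></head>'
        "<body>こんにちは</body></html>"
    )
    api_response = {"url": "https://example.com", "browserHtml": html}

    response = StandInTextResponse.from_api_response(api_response)

    assert not html.isascii()  # the fixture must contain non-ASCII data
    assert response.encoding == "utf-8"
    assert response.text == html


test_browser_html_from_non_utf8_page()
```

The non-ASCII check on the fixture matters: with ASCII-only content, a wrong encoding guess would often still decode to the same text, and the test would pass without proving anything.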