DuckDuckGo block not detected #367

Closed
adbenitez opened this issue Apr 2, 2021 · 7 comments · Fixed by #374

Comments

@adbenitez
Contributor

I ran howdoi about 3 times with the cache disabled while testing on my laptop, and DuckDuckGo is returning this instead of HTML:

If this error persists, please let us know: [email protected]

This message is not in BLOCK_INDICATORS.

@adbenitez
Contributor Author

Analyzing the issue, I can reproduce it in Firefox by editing the normal request and removing the Referer, Host and Origin headers: the server returns a 403 error with the message mentioned above as the response body. Since howdoi uses the response text without checking for HTTP errors, this text is handled as if it were HTML.

So instead of comparing the text with the string "If this error persists, please let us know: [email protected]", I recommend checking the response status. That is a better way to detect blocking errors than adding this text to BLOCK_INDICATORS.
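
A minimal sketch of what checking the status could look like, assuming the caller already has the status code and body of the search-engine response. The name `looks_blocked` and the indicator string are placeholders for illustration, not howdoi's actual code or its real BLOCK_INDICATORS values, and treating every status of 400 or above as a block is an assumption too.

```python
# Placeholder value for illustration only; the real BLOCK_INDICATORS
# in howdoi may contain different strings.
BLOCK_INDICATORS = (
    'form id="captcha-form"',
)


def looks_blocked(status_code, body, indicators=BLOCK_INDICATORS):
    """Return True if the response should be treated as a block.

    The HTTP status is checked first, so an error page is caught even
    when its wording changes. The text indicators remain as a fallback
    for block pages that are served with a 200 status.
    """
    if status_code >= 400:
        return True
    return any(indicator in body for indicator in indicators)
```

With this ordering, the 403 response is flagged by its status alone, so its body text never needs to be added to the indicator list.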

@adbenitez
Contributor Author

A separate issue should probably be opened for the blocking caused by the missing Referer|Host|Origin headers (I didn't check which one is vital), but I've already opened quite a few issues, so I will stop spamming the project ;)
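
One way to find out which header is vital would be to drop one suspect header at a time and see which removal triggers the 403. This is only a sketch: `header_variants`, `find_vital_headers` and the `send` callable are hypothetical names, and `send` is assumed to perform the request with the given headers and return the HTTP status code.

```python
SUSPECT_HEADERS = ("Referer", "Host", "Origin")


def header_variants(full_headers, suspects=SUSPECT_HEADERS):
    """Yield (removed_name, headers_without_it) for each suspect present."""
    for name in suspects:
        if name in full_headers:
            trimmed = {k: v for k, v in full_headers.items() if k != name}
            yield name, trimmed


def find_vital_headers(full_headers, send, suspects=SUSPECT_HEADERS):
    """Return the suspect headers whose removal makes send() return 403.

    `send` is a caller-supplied function (hypothetical): it takes a dict
    of headers, performs the request, and returns the status code.
    """
    return [
        name
        for name, trimmed in header_variants(full_headers, suspects)
        if send(trimmed) == 403
    ]
```

If no single removal triggers the 403, the block may depend on a combination of headers, which this sketch does not cover.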

@gleitz
Owner

gleitz commented Apr 3, 2021

Do you think it's enough to add that message to BLOCK_INDICATORS?

@adbenitez
Contributor Author

I recommend checking the response status, because that will catch this and other errors, so it is easier to maintain if, for example, they change the message to something else in a year.

I could help if you want.

@gleitz
Owner

gleitz commented Apr 7, 2021

Sure - always happy to accept a PR :)

@gleitz
Owner

gleitz commented Jul 13, 2021

Hey @adbenitez it looks like I'm getting block errors again for DDG. Given your previous research, any thoughts on how we can avoid the 403?

Here's the howdoi result:

» howdoi format date bash --engine duckduckgo -C --explain
INFO: Version: 2.0.16
Cache cleared successfully
INFO: Fetching answers for query: format date bash
INFO: Searching duckduckgo with URL: https://duckduckgo.com/html?q=site:stackoverflow.com%20format%20date%20bash&t=hj&ia=web
ERROR: Unable to find an answer because the search engine temporarily blocked the request. Please wait a few minutes or select a different search engine.
Traceback (most recent call last):
  File "/Users/gleitz/.homebrew/bin/howdoi", line 8, in <module>
    sys.exit(command_line_runner())
  File "/Users/gleitz/.homebrew/lib/python3.9/site-packages/howdoi/howdoi.py", line 785, in command_line_runner
    utf8_result = howdoi(args).encode('utf-8', 'ignore')
  File "/Users/gleitz/.homebrew/lib/python3.9/site-packages/howdoi/howdoi.py", line 608, in howdoi
    res = _get_answers(args)
  File "/Users/gleitz/.homebrew/lib/python3.9/site-packages/howdoi/howdoi.py", line 417, in _get_answers
    question_links = _get_links_with_cache(args['query'])
  File "/Users/gleitz/.homebrew/lib/python3.9/site-packages/howdoi/howdoi.py", line 396, in _get_links_with_cache
    links = _get_links(query)
  File "/Users/gleitz/.homebrew/lib/python3.9/site-packages/howdoi/howdoi.py", line 281, in _get_links
    raise BlockError('Temporary block by search engine')
howdoi.howdoi.BlockError: Temporary block by search engine

and here's wget (also failing)

» wget "https://duckduckgo.com/html?q=site:stackoverflow.com%20format%20date%20bash&t=hj&ia=web"
--2021-07-13 09:50:21--  https://duckduckgo.com/html?q=site:stackoverflow.com%20format%20date%20bash&t=hj&ia=web
Resolving duckduckgo.com (duckduckgo.com)... 52.250.42.157
Connecting to duckduckgo.com (duckduckgo.com)|52.250.42.157|:443... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://html.duckduckgo.com/html?q=site:stackoverflow.com%20format%20date%20bash&t=hj&ia=web [following]
--2021-07-13 09:50:21--  https://html.duckduckgo.com/html?q=site:stackoverflow.com%20format%20date%20bash&t=hj&ia=web
Resolving html.duckduckgo.com (html.duckduckgo.com)... 52.250.42.157
Connecting to html.duckduckgo.com (html.duckduckgo.com)|52.250.42.157|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2021-07-13 09:50:21 ERROR 403: Forbidden.
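
For comparison, the same request can be sketched with Python's standard library. `urllib.request.urlopen` follows the 302 redirect on its own and raises `urllib.error.HTTPError` on the final 403, so the block surfaces as an exception instead of as a response body. The `fetch_html` helper, its `opener` parameter and the `block_codes` default are assumptions for illustration. `BlockError` here is a local stand-in with the same name as the exception in the traceback above, not howdoi's class.

```python
import urllib.error
import urllib.request


class BlockError(RuntimeError):
    """Local stand-in for an exception signalling a search-engine block."""


def fetch_html(url, opener=urllib.request.urlopen, timeout=10,
               block_codes=(403, 429)):
    """Fetch url and return its body as text.

    HTTP errors whose status is in block_codes are converted to
    BlockError; any other HTTP error is re-raised unchanged.
    `opener` is injectable so the function can be exercised offline.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.read().decode("utf-8", "ignore")
    except urllib.error.HTTPError as err:
        if err.code in block_codes:
            raise BlockError("Temporary block by search engine") from err
        raise
```

Which status codes belong in `block_codes` is a judgment call; 403 is the one observed in this thread.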

@gleitz
Owner

gleitz commented Sep 17, 2021

Closing in favor of #404

@gleitz gleitz closed this as completed Sep 17, 2021

2 participants