Tkinter UI for icrawler #123

Open
Patty-OFurniture opened this issue Feb 3, 2024 · 1 comment
Comments

@Patty-OFurniture

I have written a simple Tkinter UI for icrawler. It works around several open issues WITHOUT changing the core library. If you are interested, it shows some interesting things you can do with icrawler. It may look like a mess, but if you already use icrawler it should be clear. I can write Python and I am still learning Tkinter, so suggestions are welcome on my Issues list. Most things work, and I want to add more.

I forked the whole project in case I needed to make fixes, but the UI is all in /examples/:

  • FileTypes.py
  • FilenameDownloader.py
  • GoogleLanguageOptions.py
  • iCrawlerTK.py
  • iCrawlerTK.yaml
  • logging.conf

https://github.com/Patty-OFurniture/icrawler

#98 - the keep_file() override in FilenameDownloader checks the file type; you can return False if extension != "jpg"
#111 - an example of how to override set_logger() for full control (commented out in my copy)
#108 - get the file name (from Content-Disposition or the URL)
#108 - also log (INFO) the image number, filename, and URL. You can change the formatting, log to a file, or whatever else you want
#117 and #107 - log (DEBUG) the Google content when no images are found, to help diagnose it if it's still a problem
#110 - a similar log could be done for Bing. Not implemented, but easily copied from google.py
#106 - a keyword separator option, so you can enter, for example, "beans|rice" and search first for "beans", then for "rice", separately
#103 - the Google language selection fix should help Baidu, since it adds headers to look more like a web browser and avoid getting flagged
#104 - Google language selection should help. Common languages are in GoogleLanguageOptions.py; add to it if you need to
#61 - sort of fixed: it creates a directory for each keyword. "rice" goes in storage/rice/, "beans" in storage/beans/ - hopefully it is a good example
#121 - a better, but not perfect, check for disk space errors, in the core library
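
For anyone curious how the #106 and #61 items fit together, here is a minimal sketch of the idea: split the input on a separator, then give each keyword its own directory. This is not the code from iCrawlerTK.py. The helper names (split_keywords, keyword_directory, plan_searches) and the rule for cleaning up directory names are assumptions made up for illustration.

```python
import os
import re


def split_keywords(raw, separator="|"):
    """Split a raw keyword string, dropping empty and duplicate entries."""
    keywords = []
    for part in raw.split(separator):
        keyword = part.strip()
        if keyword and keyword not in keywords:
            keywords.append(keyword)
    return keywords


def keyword_directory(root, keyword):
    """Map a keyword to a subdirectory of root (hypothetical naming rule)."""
    # Replace anything that is not a word character or hyphen with "_".
    safe = re.sub(r"[^\w\-]+", "_", keyword).strip("_")
    return os.path.join(root, safe or "unnamed")


def plan_searches(raw, root="storage"):
    """Return (keyword, directory) pairs, one per separate search."""
    return [(kw, keyword_directory(root, kw)) for kw in split_keywords(raw)]


if __name__ == "__main__":
    for keyword, directory in plan_searches("beans|rice"):
        print(keyword, "->", directory)
```

Each pair could then drive one crawl, for example GoogleImageCrawler(storage={'root_dir': directory}).crawl(keyword=keyword, max_num=...), so the results for "beans" and "rice" never mix.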

Also for #108: image type detection, to find the correct file extension.
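
A rough sketch of how the #108 pieces could work together: take the name from Content-Disposition if present, otherwise from the URL path, then correct the extension by looking at the first bytes of the downloaded data. This is an illustration under assumptions, not the code in FilenameDownloader.py or FileTypes.py. The function names and the short signature table are hypothetical, and the header parsing only handles the plain filename= form.

```python
import os
import re
from urllib.parse import unquote, urlparse

# A few well-known magic numbers; a real table would cover more formats.
_SIGNATURES = [
    (b"\xff\xd8\xff", "jpg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"BM", "bmp"),
]


def detect_image_extension(data):
    """Guess the extension from leading bytes, or return None if unknown."""
    for magic, ext in _SIGNATURES:
        if data.startswith(magic):
            return ext
    # WebP is a RIFF container with "WEBP" at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


def filename_from_headers(url, content_disposition=None):
    """Prefer the Content-Disposition filename, else the last URL path part."""
    if content_disposition:
        match = re.search(r'filename="?([^";]+)"?', content_disposition,
                          re.IGNORECASE)
        if match:
            return os.path.basename(match.group(1).strip())
    return os.path.basename(unquote(urlparse(url).path))


def choose_filename(url, data, content_disposition=None, default_ext="jpg"):
    """Combine both steps: a name from headers/URL, an extension from content."""
    name = filename_from_headers(url, content_disposition) or "image"
    stem, old_ext = os.path.splitext(name)
    ext = detect_image_extension(data) or old_ext.lstrip(".") or default_ext
    return f"{stem}.{ext}"
```

With the extension known, a keep_file()-style check like the one in #98 reduces to comparing that value against the types you want to keep.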

Thanks to hellock for the library, I'm just making it easier for me to use!

Have fun!
Patty

@ZhiyuanChen
Collaborator

Thank you for your work! It looks excellent.

ZhiyuanChen pinned this issue Feb 21, 2024