Linux port using a neural net #490

Open
joihn opened this issue Feb 14, 2023 · 3 comments


@joihn

joihn commented Feb 14, 2023

I would be interested in a Linux port.
Since the accessibility API (used to retrieve button locations) is not available on Linux, something else is needed to find the button locations.

I'm thinking about building a small neural net that performs object detection to retrieve the button locations.
Recent advances have made lightweight networks that run on the CPU possible, with acceptable detection performance and input-to-output latency.
example

Alternatively, since bounding boxes are not really needed (only one coordinate per button is needed), a segmentation neural net trained to output a heatmap of button locations could be another approach,
kind of like this paper (ignoring the segmentation part, of course).
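To illustrate how a heatmap could be turned into one coordinate per button, here is a minimal sketch that picks local maxima above a threshold. The function name `heatmap_peaks`, the `threshold` default, and the plain list-of-lists heatmap are assumptions made for illustration; a real network would output a tensor, and a real pipeline would probably need smoothing or non-maximum suppression on top of this.

```python
from typing import List, Tuple

Heatmap = List[List[float]]


def heatmap_peaks(heatmap: Heatmap, threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return (row, col) cells that are strict local maxima above `threshold`.

    Hypothetical helper: a cell is kept when its value is at least
    `threshold` and strictly greater than all of its 8-connected neighbours.
    Flat plateaus therefore yield no peak in this simple version.
    """
    rows = len(heatmap)
    cols = len(heatmap[0]) if rows else 0
    peaks = []
    for r in range(rows):
        for c in range(cols):
            value = heatmap[r][c]
            if value < threshold:
                continue
            # Collect the values of the neighbours that lie inside the grid.
            neighbours = [
                heatmap[nr][nc]
                for nr in range(max(0, r - 1), min(rows, r + 2))
                for nc in range(max(0, c - 1), min(cols, c + 2))
                if (nr, nc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                peaks.append((r, c))
    return peaks


if __name__ == "__main__":
    demo = [
        [0.1, 0.2, 0.1, 0.0],
        [0.2, 0.9, 0.2, 0.0],
        [0.1, 0.2, 0.1, 0.7],
        [0.0, 0.0, 0.0, 0.1],
    ]
    print(heatmap_peaks(demo))
```

Each returned cell would then be scaled back to screen coordinates, which is all a click target needs.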

The most challenging aspect would be collecting a good dataset of varied GUIs with labeled buttons.
Web scraping and HTML parsing could be used to find the button locations, giving a big dataset for cheap.

However, one would only have "web-looking" buttons, and no "desktop-looking" buttons.
One could use the macOS GUI plus the accessibility API to further diversify the dataset.
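As a rough sketch of the HTML parsing half of that idea, the snippet below uses Python's standard `html.parser` to list button-like elements in a page. The class name `ButtonFinder`, the helper `find_buttons`, and the rule for what counts as a button are assumptions for illustration. Note that parsing alone only identifies the elements: getting their pixel coordinates for labels would additionally require rendering the page (for example in a headless browser), which is not shown here.

```python
from html.parser import HTMLParser


class ButtonFinder(HTMLParser):
    """Collect elements that look like buttons (hypothetical heuristic)."""

    BUTTON_INPUT_TYPES = {"button", "submit", "reset"}

    def __init__(self):
        super().__init__()
        self.buttons = []  # list of (tag, attribute dict) pairs

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # Attribute values can be None (e.g. <input disabled>), hence `or ""`.
        input_type = (attr_map.get("type") or "").lower()
        role = (attr_map.get("role") or "").lower()
        is_button = (
            tag == "button"
            or (tag == "input" and input_type in self.BUTTON_INPUT_TYPES)
            or role == "button"
        )
        if is_button:
            self.buttons.append((tag, attr_map))


def find_buttons(html):
    """Return the button-like elements found in an HTML string."""
    finder = ButtonFinder()
    finder.feed(html)
    finder.close()
    return finder.buttons


if __name__ == "__main__":
    page = (
        '<div><button id="ok">OK</button>'
        '<input type="submit" value="Go">'
        '<a role="button" href="#">Link</a>'
        '<input type="text"></div>'
    )
    for tag, attrs in find_buttons(page):
        print(tag, attrs)
```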

The advantage of such an approach is that the tool "should" be compatible with all apps out of the box.
What are your thoughts on such an approach?

@dexterleng
Collaborator

dexterleng commented Feb 15, 2023 via email

@joihn
Author

joihn commented Feb 15, 2023

Relevant link, haven't tested it yet:
https://github.com/phil294/vimium-everywhere

@garywill

garywill commented May 19, 2023

Last year I made this: https://github.com/garywill/vimouse
It uses OpenCV to do vision-recognition-based clicking.
The screenshot may seem ugly right now. The algorithm and parameters may need changing. I haven't implemented any AI. Currently it just finds any "object" on screen (at least almost every button is found, lol).
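For readers unfamiliar with the non-AI route, here is a minimal sketch of the general idea of finding "objects" as connected regions in a binarized screenshot. This is not vimouse's actual algorithm, which is not described in this thread; it swaps in a plain breadth-first connected-component search over a 0/1 grid instead of OpenCV calls, and the name `object_centres` is hypothetical.

```python
from collections import deque


def object_centres(mask):
    """Return the bounding-box centre (row, col) of each 4-connected blob.

    `mask` is a 2D list of 0/1 values, standing in for a thresholded
    screenshot. Illustrative only; not taken from any existing project.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    centres = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search over this blob, tracking its bounding box.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, bottom, left, right = r, r, c, c
            while queue:
                cr, cc = queue.popleft()
                top, bottom = min(top, cr), max(bottom, cr)
                left, right = min(left, cc), max(right, cc)
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    if 0 <= nr < rows and 0 <= nc < cols and mask[nr][nc] and not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            centres.append(((top + bottom) // 2, (left + right) // 2))
    return centres


if __name__ == "__main__":
    demo = [
        [1, 1, 1, 0, 0, 0],
        [1, 1, 1, 0, 0, 0],
        [1, 1, 1, 0, 1, 0],
        [0, 0, 0, 0, 0, 0],
    ]
    print(object_centres(demo))
```

Each centre is a candidate click target; as the comment above notes, an approach like this finds any object, not only buttons, so filtering by size or shape would likely be needed.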

It is at a very, very, very early stage.

Cross-platform and lightweight. I made it in 300 lines of Python code.

BTW, I listed many similar projects in that README.
