Linux port using a neural net #490
Comments
Not sure how well it would work. I don't know the first thing about ML. I'm currently exploring automated detection of buttons from screenshots of iOS apps (just the data collection part) for a freelance gig, so I'll have to see how the results turn out. I'm also not sure I would support Linux either (given the userbase size and willingness to pay for stuff).
On 15 Feb 2023, at 5:59 AM, Maxime G wrote:
I would be interested in a Linux port.
Since the accessibility APIs (used to retrieve button locations) are not available on Linux, one needs something else to gather the button locations.
I'm thinking about building a small neural net that performs object detection to retrieve the button locations.
Recent advances have made lightweight networks that run on a CPU possible, with acceptable detection performance and input-to-output latency.
Example: <https://github.com/ultralytics/ultralytics>
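A minimal sketch of what that detection step could look like with the ultralytics package, assuming a checkpoint fine-tuned on labeled GUI screenshots (the `buttons.pt` weights file is hypothetical and would still have to be trained):

```python
# Sketch: detect buttons in a screenshot with a (hypothetical) fine-tuned YOLO model.
from ultralytics import YOLO

model = YOLO("buttons.pt")  # assumed custom weights; stock yolov8n.pt only knows COCO classes

results = model("screenshot.png", conf=0.4)  # run inference on one image
for box in results[0].boxes:
    x1, y1, x2, y2 = box.xyxy[0].tolist()    # bounding box corners in pixels
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2    # only the click point is actually needed
    print(f"button at ({cx:.0f}, {cy:.0f}), confidence {float(box.conf):.2f}")
```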
Alternatively, since bounding boxes are not really needed (only one coordinate per button is needed), a segmentation-style neural net trained to output a heatmap of button locations could be another approach,
kind of like this paper <https://ieeexplore.ieee.org/abstract/document/8593678/> (ignoring the segmentation part, of course).
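For illustration, a small sketch of how click points could be recovered from such a heatmap by simple peak detection; the network that produces `heatmap` (one score per pixel) is assumed, not shown:

```python
# Sketch: turn a predicted button-location heatmap into click coordinates.
import numpy as np
from scipy.ndimage import maximum_filter

def heatmap_to_points(heatmap, threshold=0.5, window=15):
    """Return (x, y) pixel coordinates of local maxima above `threshold`."""
    # A pixel is a peak if it equals the maximum of its neighbourhood and exceeds the threshold.
    peaks = (heatmap == maximum_filter(heatmap, size=window)) & (heatmap > threshold)
    ys, xs = np.nonzero(peaks)
    return list(zip(xs.tolist(), ys.tolist()))

# Dummy example; a real heatmap would come from the network:
hm = np.zeros((100, 100), dtype=np.float32)
hm[40, 60] = 0.9                   # pretend the net sees a button at (x=60, y=40)
print(heatmap_to_points(hm))       # -> [(60, 40)]
```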
The most challenging aspect would be collecting a good dataset of varied GUIs with labeled buttons.
Web scraping and HTML parsing could be used to find the button locations, giving a big dataset cheaply.
However, one would only get "web-looking" buttons, and no "desktop-looking" buttons.
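As a rough illustration of that data-collection idea (Playwright is just one possible tool, not something settled in this thread), one could screenshot a page and dump the bounding boxes of its clickable elements as labels:

```python
# Sketch: harvest (screenshot, button bounding boxes) pairs from a web page.
import json
from playwright.sync_api import sync_playwright

URL = "https://example.com"  # placeholder page

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page(viewport={"width": 1280, "height": 800})
    page.goto(URL)
    page.screenshot(path="sample.png")

    labels = []
    # Treat obvious clickable elements as "buttons"; a real collector would be broader.
    for el in page.query_selector_all("button, input[type=submit], a"):
        box = el.bounding_box()  # {'x', 'y', 'width', 'height'} or None if not rendered
        if box is not None:
            labels.append(box)

    with open("sample.json", "w") as f:
        json.dump(labels, f)
    browser.close()
```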
One could use the macOS GUI + accessibility API to further diversify the dataset.
The advantage of such an approach would be that the tool "should" be compatible with all apps out of the box.
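Tying the pieces together, a very rough sketch of what such an out-of-the-box loop might look like (mss for the screenshot and pyautogui for the cursor are only example libraries, and `buttons.pt` is the same hypothetical model as above):

```python
# Sketch: screenshot -> detect buttons -> move the cursor to the first hit.
import mss
import pyautogui
from ultralytics import YOLO

model = YOLO("buttons.pt")  # hypothetical fine-tuned weights

with mss.mss() as sct:
    path = sct.shot(output="screen.png")  # capture the primary monitor to a PNG file

results = model(path, conf=0.4)
boxes = results[0].boxes
if len(boxes) > 0:
    x1, y1, x2, y2 = boxes[0].xyxy[0].tolist()
    pyautogui.moveTo((x1 + x2) / 2, (y1 + y2) / 2)  # jump to the first detected button
```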
What are your thoughts on such an approach?
Relevant link, haven't tested yet:
Last year I made this: https://github.com/garywill/vimouse It is in a very, very early stage. Cross-platform and lightweight. I made it in 300 lines of Python code. BTW, I listed many similar projects in that readme.