[Do not merge] 1st Overview Unet Model #9
I processed the data from the diverse dataset (~1000 images) from the Dropbox.
The diff would be unreadable and GitHub does not like showing large notebooks, so I added a copy as 1stOverview.html on my web space.
It's not production ready yet: right now the notebook expects a local Dropbox folder (and the models are saved via data version control on my own web host), but I think we can change that easily when the hackathon starts.
I implemented a straightforward U-Net model with fastai (that's much faster for me, and we get all the data visualizations for free). tensorflow.keras might have the advantage that we could scale it over more than one GPU more easily (it's already painful on my not-so-bad GTX 1080 locally), but I'm afraid that if we start doing this, we won't do anything else for the rest of the week. I'd be fine with rewriting it in Keras if you like it more; prototyping is just easier in fastai...
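For reference, here is a minimal sketch of what the fastai setup looks like (not the exact notebook code; the paths, the codes file, the label naming scheme and the image size are my assumptions):

```python
# Minimal fastai U-Net sketch (placeholder paths and label naming scheme).
from fastai.vision.all import *
import numpy as np

path = Path('data/diverse_dataset')                    # hypothetical local Dropbox copy
codes = np.loadtxt(path/'codes.txt', dtype=str)        # one class name per mask value

dls = SegmentationDataLoaders.from_label_func(
    path,
    fnames=get_image_files(path/'images'),
    label_func=lambda f: path/'labels'/f'{f.stem}_mask.png',
    codes=codes,
    bs=4,                                              # small batch size, see note below
    item_tfms=Resize(512),                             # downscaled images, as mentioned above
)

learn = unet_learner(dls, resnet34, metrics=DiceMulti())
learn.fine_tune(10)
```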
The biggest technical problem IMHO right now is the low batch size (so batch normalization does not work really well), which makes training really slow. I think the best move here is to discuss it with our mentor.
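One idea we could bring to that discussion (just a sketch, nothing I've tried in the notebook yet): swap the BatchNorm layers for GroupNorm, which normalizes over channel groups instead of batch statistics, so tiny batches hurt less:

```python
# Sketch: recursively replace BatchNorm2d with GroupNorm (batch-size independent).
# num_groups must divide the channel count; 8 works for typical ResNet widths.
import torch.nn as nn

def bn_to_gn(module: nn.Module, num_groups: int = 8) -> None:
    for name, child in module.named_children():
        if isinstance(child, nn.BatchNorm2d):
            setattr(module, name, nn.GroupNorm(num_groups, child.num_features))
        else:
            bn_to_gn(child, num_groups)

# usage (hypothetical): bn_to_gn(learn.model)
```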
On the content side, I grabbed all the labels directly from the XML files (without using the API; it was easier for me this way). But I think the labels are too detailed now, and the network has a hard time distinguishing different semantic kinds of paragraphs (which are all still paragraphs). I'm looking forward to talking about this with you domain experts. I think this makes the learning process much harder (in the end, the network now also has to learn to read a bit :-o)
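Roughly how I pull the labels out of the XML; the element and attribute names below are placeholders, the real schema may differ:

```python
# Sketch of extracting region labels + polygons straight from the XML
# (tag/attribute names are assumptions -- adjust to the actual schema).
import xml.etree.ElementTree as ET

def regions_from_xml(xml_path):
    regions = []
    for region in ET.parse(xml_path).getroot().iter('TextRegion'):  # hypothetical tag
        label = region.get('type')                      # e.g. 'paragraph', 'heading'
        coords = region.find('Coords').get('points')    # 'x1,y1 x2,y2 ...'
        points = [tuple(map(int, p.split(','))) for p in coords.split()]
        regions.append((label, points))
    return regions
```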
I'll let it run a bit longer now, just to see whether it still makes progress, but close to 100 epochs is already too long to work with interactively.
But just look at the results: I think we have a good starting point for the hackathon and for what we might achieve with a segmentation model that tries to classify each pixel (right now on downscaled images).
I guess we will be able to improve further, and we might still add some classic image processing on top, but I'd say it has a lot of potential...
What is of course still missing is closing the gap to Tesseract; we still need something to do next week. I'll only look at the Tesseract API this evening, but won't implement anything.
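Just so we have a picture of that side: the Python binding is tiny, and something along these lines could consume the regions from the segmentation model (pytesseract; untested here, and the path, crop box and language are made up):

```python
# Sketch: feed a segmented region into Tesseract via pytesseract.
import pytesseract
from PIL import Image

img = Image.open('data/sample_page.png')              # hypothetical path
crop = img.crop((100, 200, 800, 400))                 # box from the segmentation output
print(pytesseract.image_to_string(crop, lang='deu'))  # language choice is an assumption
```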