[Do not merge] 1st Overview Unet Model #9
I processed the data from the diverse dataset (~1000 images) from the Dropbox.
The diff would be unreadable and GitHub does not like showing large notebooks, so I added a copy as 1stOverview.html on my web space.
It's not production ready yet: right now the notebook expects a local Dropbox folder (and the models are saved via data version control on my own web host), but I think we can change that easily when the hackathon starts.
I implemented a straightforward U-Net model with fastai (that's much faster for me, and we get all the data visualizations for free). tensorflow.keras might have the advantage that we could scale it over more than one GPU more easily (it's already painful on my not-so-bad GTX 1080 locally), but I'm afraid that if we start doing this, we won't do anything else for the rest of the week. I'd be fine with rewriting it in Keras if you like it more; prototyping is just easier in fastai...
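For reference, here is a minimal sketch of what the fastai setup looks like (not the exact notebook code; the paths, the codes file, the label naming scheme and the image size are my assumptions):

```python
# Minimal fastai U-Net sketch (placeholder paths and label naming scheme).
from fastai.vision.all import *
import numpy as np

path = Path('data/diverse_dataset')                    # hypothetical local Dropbox copy
codes = np.loadtxt(path/'codes.txt', dtype=str)        # one class name per mask value

dls = SegmentationDataLoaders.from_label_func(
    path,
    fnames=get_image_files(path/'images'),
    label_func=lambda f: path/'labels'/f'{f.stem}_mask.png',
    codes=codes,
    bs=4,                                              # small batch size, see note below
    item_tfms=Resize(512),                             # downscaled images, as mentioned above
)

learn = unet_learner(dls, resnet34, metrics=DiceMulti())
learn.fine_tune(10)
```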
The biggest technical problem IMHO right now is the low batch size (so batch normalization does not work really well), which makes training really slow. I think the best move here is to discuss it with our mentor.
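One idea we could bring to that discussion (just a sketch, nothing I've tried in the notebook yet): swap the BatchNorm layers for GroupNorm, which normalizes over channel groups instead of batch statistics, so tiny batches hurt less:

```python
# Sketch: recursively replace BatchNorm2d with GroupNorm (batch-size independent).
# num_groups must divide the channel count; 8 works for typical ResNet widths.
import torch.nn as nn

def bn_to_gn(module: nn.Module, num_groups: int = 8) -> None:
    for name, child in module.named_children():
        if isinstance(child, nn.BatchNorm2d):
            setattr(module, name, nn.GroupNorm(num_groups, child.num_features))
        else:
            bn_to_gn(child, num_groups)

# usage (hypothetical): bn_to_gn(learn.model)
```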
On the content side, I grabbed all the labels directly from the XML files (without using the API; it was easier for me this way). But I think the labels are too detailed now, and the network has a hard time distinguishing different semantic kinds of paragraphs (which are all still paragraphs). I'm looking forward to talking about this with you domain experts. I think this makes the learning process much harder (in the end, the network now also has to learn to read a bit :-o)
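Roughly how I pull the labels out of the XML; the element and attribute names below are placeholders, the real schema may differ:

```python
# Sketch of extracting region labels + polygons straight from the XML
# (tag/attribute names are assumptions -- adjust to the actual schema).
import xml.etree.ElementTree as ET

def regions_from_xml(xml_path):
    regions = []
    for region in ET.parse(xml_path).getroot().iter('TextRegion'):  # hypothetical tag
        label = region.get('type')                      # e.g. 'paragraph', 'heading'
        coords = region.find('Coords').get('points')    # 'x1,y1 x2,y2 ...'
        points = [tuple(map(int, p.split(','))) for p in coords.split()]
        regions.append((label, points))
    return regions
```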
I'll let it run a bit longer now, just to see whether it still makes progress, but close to 100 epochs is already too long to work with interactively.
But just look at the results: I think we have a good starting point for the hackathon and for what we might achieve with a segmentation model that tries to classify each pixel (right now on downscaled images).
I guess we will be able to improve further, and we might still add some classic image processing on top, but I'd say it has a lot of potential...
What is of course still missing is closing the gap to Tesseract; we still need something to do next week. I'll only look at the Tesseract API this evening, but won't implement anything.
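Just so we have a picture of that side: the Python binding is tiny, and something along these lines could consume the regions from the segmentation model (pytesseract; untested here, and the path, crop box and language are made up):

```python
# Sketch: feed a segmented region into Tesseract via pytesseract.
import pytesseract
from PIL import Image

img = Image.open('data/sample_page.png')              # hypothetical path
crop = img.crop((100, 200, 800, 400))                 # box from the segmentation output
print(pytesseract.image_to_string(crop, lang='deu'))  # language choice is an assumption
```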