
Question about crop step in Data augmentation #5

Open
HXACA opened this issue Jun 13, 2019 · 11 comments
@HXACA

HXACA commented Jun 13, 2019

Since the gt area is not pure text, I get many wrong regions when I randomly crop the resized image. Are there any tricks in this step?

@JingChaoLiu
Collaborator

Q: Since the gt area is not pure text, I get many wrong regions when I randomly crop the resized image. Are there any tricks in this step?

No, we don't apply any tricks in the cropping procedure. But you may need to pay attention to some details of cropping images and generating pyramid labels.


The steps of cropping images and generating pyramid labels are as follows:

[figure: the correct cropped bbox vs. the cropped origin bbox]

  1. Considering the training speed, we keep the mask in the form of a points list, not an H*W image, until the samples are forwarded to the mask branch.

  2. crop the origin text mask

    cropped_text_mask = crop_region ∩ origin_text_mask
                      = Polygon[cropped_points_num, {x,y}]
    

    note: the cropped_points_num may vary from 3 to 8.

  3. get the bounding box by wrapping the cropped mask with a new bounding box, rather than cropping the origin bounding box. As illustrated in the image above, the cropped origin bounding box may be larger than the correct cropped bounding box (see the sketch after this list).

  4. generate the pyramid label for the corresponding predicted bounding box. In our setting, the generation of the pyramid label is deferred to the stage of calculating the mask loss.

    predicted_bounding_box = (left, top, bottom, right)
    mask_label = cropped_text_mask ∩ predicted_bounding_box
               = Tensor[Channel=1, H=28, W=28] # pyramid label or binary label
    

    note: though the points_num of the cropped_text_mask varies from 3 to 8, the pyramid label can still handle this variance.

[figure: pyramid_label]
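
A minimal sketch of steps 2-3, assuming shapely is available (crop_text_mask and its signature are illustrative, not the actual repo code):

from shapely.geometry import Polygon, box


def crop_text_mask(origin_points, crop_box):
    """
    :param origin_points: list of (x, y) vertices of the origin text polygon
    :param crop_box: (left, top, right, bottom) of the crop region
    :return: (cropped polygon vertices, correct cropped bounding box), or (None, None)
    """
    left, top, right, bottom = crop_box
    crop_region = box(left, top, right, bottom)
    # step 2: cropped_text_mask = crop_region ∩ origin_text_mask
    cropped = Polygon(origin_points).intersection(crop_region)
    if cropped.is_empty or cropped.geom_type != 'Polygon':
        return None, None  # text lies outside the crop, or was split into pieces
    # the exterior ring repeats the first vertex at the end; drop it
    cropped_points = list(cropped.exterior.coords)[:-1]
    # step 3: wrap the cropped polygon with a fresh bounding box instead of
    # cropping the origin bounding box, which may be too large
    correct_bbox = cropped.bounds  # (min_x, min_y, max_x, max_y)
    return cropped_points, correct_bbox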

@HXACA
Author

HXACA commented Jun 15, 2019

@JingChaoLiu Thanks for your response

@soldierofhell

Hi, actually my questions refer to pyramid label generation, not the cropping, but I'll reuse this issue's quotes :)

  1. Considering the training speed, we keep the mask in the form of a points list, not an H*W image, until the samples are forwarded to the mask branch.

You mean you keep them in the form of vertices, not interior points, right? So in terms of maskrcnn_benchmark, they are PolygonInstances?

  4. generate the pyramid label for the corresponding predicted bounding box. In our setting, the generation of the pyramid label is deferred to the stage of calculating the mask loss.

So they're calculated on a 28x28 grid? Something like:

    for p in grid_28x28:
        for v in vertices:
            [alpha, beta] = A^-1 * b
            if alpha >= 0 and beta >= 0:
                score(p) = max(1 - (alpha + beta), 0)

@JingChaoLiu
Collaborator

You mean you keep them in the form of vertices, not interior points, right? So in terms of maskrcnn_benchmark, they are PolygonInstances?

Yes

So they're calculated on a 28x28 grid? Something like: ...

Denote the ground-truth mask point list as P=Tensor[points_num, {x,y}] and the predicted bounding box as pred_box = {pred_top, pred_bottom, pred_left, pred_right}. Furthermore, define pred_h = pred_bottom - pred_top and pred_w = pred_right - pred_left. We have tried two schemas:

  1. generate a mask_label within {pred_top, pred_bottom, pred_left, pred_right} based on P, then resize this mask_label from the scale of [pred_h, pred_w] to the scale of [28, 28]

  2. map pred_box from {pred_top, pred_bottom, pred_left, pred_right} to {0, 28, 0, 28} and apply the same mapping to the points list P, i.e. resized_P = (P - (pred_left, pred_top)) * (28/pred_w, 28/pred_h), finally generate a mask_label within {0, 28, 0, 28} based on resized_P

The schema you mentioned may be schema 2. In our experiments, schema 2 is lower than schema 1 by 0.3% in F-measure. But schema 2 is very efficient in both memory and computation: its training time is about two-thirds of that of schema 1.
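
A minimal numpy sketch of the schema 2 mapping (map_points_to_grid and its names are illustrative, not the actual repo code):

import numpy as np


def map_points_to_grid(P, pred_box, out_size=28):
    """
    :param np.ndarray P: shape=[points_num, {x,y}], ground-truth mask point list
    :param tuple pred_box: (pred_left, pred_top, pred_right, pred_bottom)
    :return: np.ndarray, P mapped into the out_size x out_size grid
    """
    pred_left, pred_top, pred_right, pred_bottom = pred_box
    pred_w = pred_right - pred_left
    pred_h = pred_bottom - pred_top
    # translate to the box origin, then scale x by 28/pred_w and y by 28/pred_h
    scale = np.array([out_size / pred_w, out_size / pred_h], dtype=np.float32)
    return (P - np.array([pred_left, pred_top], dtype=np.float32)) * scale

The mapped points can then be fed directly to a 28x28 pyramid-label generator, like the generate_pyramid_label function shared later in this thread, so no full-resolution mask is ever built.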

@soldierofhell

Thank you @JingChaoLiu for your valuable analysis.
It seems like the current maskrcnn-benchmark approach is closer to schema 2, because there are basically three steps:

  • crop (actually only origin translation)
  • resize (rescaling of vertices)
  • and then conversion to a mask

I don't get why they're not using RoIAlign here for efficiency.

By the way, is matrix inversion really necessary for calculating the target? I mean, this pyramid function seems very "regular", and I'm surprised there's no "analytic" formula.
If not, maybe for training efficiency some other form like a "stepwise" pyramid would be better? Actually, I guess the polygon approach is kind of a more refined idea from EAST, where the gt "mass" was uniformly concentrated in the center.

Regards,

@donglin8506

Could you share the code for generating the pyramid label?

@JingChaoLiu
Collaborator

JingChaoLiu commented Sep 10, 2019

Here is a simplified version. Adjust this code as you need. @donglin8506

import cv2
import numpy as np


def generate_pyramid_label(H, W, corner_points):
    """

    :param int H: image_H
    :param int W: image_W
    :param np.ndarray corner_points: dtype=np.float32, shape=[point_num, {x,y}], 3 <= point_num <= 8
    :return: np.ndarray ans: dtype=np.float32, shape=[H, W]

    generate a pyramid label from corner_points
      within the bounding box {box_top=0, box_bottom=H, box_left=0, box_right=W}
    """
    point_num = len(corner_points)
    center = corner_points.mean(axis=0)
    # decompose the polygon into a fan of triangles (center, v_i, v_{i+1})
    vectors = corner_points - center
    matrices = np.empty((point_num, 2, 2), dtype=np.float32)
    for i in range(point_num):
        # columns of m are the two edge vectors spanning triangle i; its
        # (pseudo-)inverse maps a pixel offset from the center to the
        # coefficients (alpha, beta) in that triangle's basis
        m = vectors[[i, (i + 1) % point_num]].T
        matrices[i] = np.linalg.pinv(m)
    points = np.empty((H, W, 2), dtype=np.float32)  # H, W, {x, y}
    points[:, :, 0] = np.arange(W)
    points[:, :, 1] = np.arange(H)[..., None]
    points -= center
    # (alpha, beta) of every pixel w.r.t. every triangle: [point_num, H, W, 2]
    ans: np.ndarray = np.matmul(matrices[:, None, None, ...], points[..., None])
    ans = ans.squeeze(-1)
    # a pixel lies in triangle i's sector iff alpha >= 0 and beta >= 0;
    # there, alpha + beta grows linearly from 0 at the center to 1 on the edge
    ans = (ans >= 0).all(axis=-1) * ans.sum(axis=-1)
    ans = np.max(ans, axis=0)
    # pyramid height: 1 at the center, decaying linearly to 0 at the polygon edge
    ans = np.maximum(1 - ans, 0)
    return ans


def main():
    H, W = 150, 224
    corner_points = np.array([
        187, 0,
        224, 80,
        30, 150,
        0, 65
    ], dtype=np.float32).reshape(-1, 2)

    ans = generate_pyramid_label(H, W, corner_points)

    cv2.imshow('image', ans)
    cv2.waitKey(0)


if __name__ == '__main__':
    main()

@donglin8506

donglin8506 commented Sep 11, 2019

@JingChaoLiu Thank you very much, this will be a lot of help. Best regards!

@insightcs

@JingChaoLiu Thank you for your great work, but I have a question about generating pyramid labels. I generated the pyramid mask in your way, but it also has a few white dots, as shown in the figure. Does this affect model training? Thanks for your help.

[figure: generated pyramid labels with stray white dots]

@JingChaoLiu
Collaborator

@insightcs It's OK. This won't hurt the model training. The phenomenon is caused by the numerical instability of the matrix inversion in matrices[i] = np.linalg.pinv(m).
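
If the dots are still undesirable, one possible cleanup (a sketch, not from the repo; clean_pyramid_label is illustrative) is to rasterize the polygon interior with cv2.fillPoly and mask the label with it:

import cv2
import numpy as np


def clean_pyramid_label(label, corner_points):
    """
    :param np.ndarray label: shape=[H, W], output of generate_pyramid_label
    :param np.ndarray corner_points: shape=[point_num, {x,y}]
    :return: np.ndarray, the label with everything outside the polygon forced to 0
    """
    inside = np.zeros(label.shape, dtype=np.uint8)
    cv2.fillPoly(inside, [np.round(corner_points).astype(np.int32)], 1)
    return label * inside  # stray values outside the polygon become exactly 0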

@xxlxx1
Copy link

xxlxx1 commented Nov 27, 2019

@insightcs hi, if I want to use this soft mask label, do I need to add this code to the project? I can't find the soft mask label anywhere in the project.
