
Question about crop step in Data augmentation #5

Open
HXACA opened this issue Jun 13, 2019 · 11 comments
@HXACA

HXACA commented Jun 13, 2019

Since the gt area is not pure text, I get many wrong regions when I randomly crop the resized image. Are there any tricks in this step?

@JingChaoLiu
Collaborator

Q: Since the gt area is not pure text, I get many wrong regions when I randomly crop the resized image. Are there any tricks in this step?

No, we don't apply any tricks in the cropping procedure. But you may need to pay attention to some details of cropping images and generating pyramid labels.


The steps of cropping images and generating pyramid labels are as follows:

[figure: the correct cropped bbox vs. the cropped origin bbox]

  1. Considering the training speed, we keep the mask in the form of a points list, not an H*W image, until the samples are forwarded to the mask branch.

  2. crop the origin text mask

    cropped_text_mask = crop_region ∩ origin_text_mask
                      = Polygon[cropped_points_num, {x,y}]
    

    note: the cropped_points_num may vary from 3 to 8.

  3. get the bounding box by wrapping the cropped mask with a new bounding box, rather than cropping the origin bounding box. As illustrated in the image above, the cropped origin bounding box may be larger than the correct cropped bounding box (see the sketch after this list).

  4. generate the pyramid label for the corresponding predicted bounding box. In our setting, the generation of the pyramid label is deferred to the stage of calculating the mask loss.

    predicted_bounding_box = (left, top, bottom, right)
    mask_label = cropped_text_mask ∩ predicted_bounding_box
               = Tensor[Channel=1, H=28, W=28] # pyramid label or binary label
    

    note: though the points_num of the cropped_text_mask varies from 3 to 8, the pyramid label can still handle this variance.

[figure: pyramid_label]
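
A minimal sketch of steps 2-3, assuming shapely is available (crop_text_mask and its signature are illustrative, not the actual repo code):

from shapely.geometry import Polygon, box


def crop_text_mask(origin_points, crop_box):
    """
    :param origin_points: list of (x, y) vertices of the origin text polygon
    :param crop_box: (left, top, right, bottom) of the crop region
    :return: (cropped polygon vertices, correct cropped bounding box), or (None, None)
    """
    left, top, right, bottom = crop_box
    crop_region = box(left, top, right, bottom)
    # step 2: cropped_text_mask = crop_region ∩ origin_text_mask
    cropped = Polygon(origin_points).intersection(crop_region)
    if cropped.is_empty or cropped.geom_type != 'Polygon':
        return None, None  # text lies outside the crop, or was split into pieces
    # the exterior ring repeats the first vertex at the end; drop it
    cropped_points = list(cropped.exterior.coords)[:-1]
    # step 3: wrap the cropped polygon with a fresh bounding box instead of
    # cropping the origin bounding box, which may be too large
    correct_bbox = cropped.bounds  # (min_x, min_y, max_x, max_y)
    return cropped_points, correct_bbox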

@HXACA
Author

HXACA commented Jun 15, 2019

@JingChaoLiu Thanks for your response

@soldierofhell

Hi, actually my questions refer to pyramid label generation, not the cropping, but I'll reuse this issue's quotes :)

  1. Considering the training speed, we keep the mask in the form of a points list, not an H*W image, until the samples are forwarded to the mask branch.

You mean you keep them in the form of vertices, not interior points, right? So in terms of maskrcnn_benchmark, they are PolygonInstances?

  4. generate the pyramid label for the corresponding predicted bounding box. In our setting, the generation of the pyramid label is deferred to the stage of calculating the mask loss.

So they're calculated on a 28x28 grid? Something like:

    for p in grid_28x28:
        for v in vertices:
            [alpha, beta] = A^-1 * b
            if alpha >= 0 and beta >= 0:
                score(p) = max(1 - (alpha + beta), 0)

@JingChaoLiu
Collaborator

You mean you keep them in the form of vertices, not interior points, right? So in terms of maskrcnn_benchmark, they are PolygonInstances?

Yes

So they're calculated on a 28x28 grid? Something like: ...

Denote the ground-truth mask point list as P=Tensor[points_num, {x,y}] and the predicted bounding box as pred_box = {pred_top, pred_bottom, pred_left, pred_right}. Furthermore, define pred_h = pred_bottom - pred_top and pred_w = pred_right - pred_left. We have tried two schemas:

  1. generate a mask_label within {pred_top, pred_bottom, pred_left, pred_right} based on P, then resize this mask_label from the scale of [pred_h, pred_w] to the scale of [28, 28]

  2. map pred_box from {pred_top, pred_bottom, pred_left, pred_right} to {0, 28, 0, 28} and apply the same mapping to the points list P, i.e. resized_P = (P - (pred_left, pred_top)) * (28/pred_w, 28/pred_h), finally generate a mask_label within {0, 28, 0, 28} based on resized_P

The schema you mentioned may be schema 2. In our experiments, schema 2 is lower than schema 1 by 0.3% in F-measure. But schema 2 is very efficient in both memory and computation: its training time is about two-thirds of that of schema 1.
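
A minimal numpy sketch of the schema 2 mapping (map_points_to_grid and its names are illustrative, not the actual repo code):

import numpy as np


def map_points_to_grid(P, pred_box, out_size=28):
    """
    :param np.ndarray P: shape=[points_num, {x,y}], ground-truth mask point list
    :param tuple pred_box: (pred_left, pred_top, pred_right, pred_bottom)
    :return: np.ndarray, P mapped into the out_size x out_size grid
    """
    pred_left, pred_top, pred_right, pred_bottom = pred_box
    pred_w = pred_right - pred_left
    pred_h = pred_bottom - pred_top
    # translate to the box origin, then scale x by 28/pred_w and y by 28/pred_h
    scale = np.array([out_size / pred_w, out_size / pred_h], dtype=np.float32)
    return (P - np.array([pred_left, pred_top], dtype=np.float32)) * scale

The mapped points can then be fed directly to a 28x28 pyramid-label generator, like the generate_pyramid_label function shared later in this thread, so no full-resolution mask is ever built.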

@soldierofhell

Thank you @JingChaoLiu for your valuable analysis.
It seems like the current maskrcnn-benchmark approach is closer to schema 2, because there are basically three steps:

  • crop (actually only origin translation)
  • resize (rescaling of vertices)
  • and then conversion to a mask

I don't get why they're not using RoIAlign here for efficiency.

By the way, is matrix inversion really necessary for calculating the target? I mean, this pyramid function seems very "regular", and I'm surprised there's no "analytic" formula.
If not, maybe for training efficiency some other form like a "stepwise" pyramid would be better? Actually, I guess the polygon approach is kind of a more refined idea from EAST, where the gt "mass" was uniformly concentrated in the center.

Regards,

@donglin8506

Could you share the code for generating the pyramid label?

@JingChaoLiu
Collaborator

JingChaoLiu commented Sep 10, 2019

Here is a simplified version. Adjust this code as you need. @donglin8506

import cv2
import numpy as np


def generate_pyramid_label(H, W, corner_points):
    """

    :param int H: image_H
    :param int W: image_W
    :param np.ndarray corner_points: dtype=np.float32, shape=[point_num, {x,y}], 3 <= point_num <= 8
    :return: np.ndarray ans: dtype=np.float32, shape=[H, W]

    generate a pyramid label from corner_points
      within the bounding box {box_top=0, box_bottom=H, box_left=0, box_right=W}
    """
    point_num = len(corner_points)
    center = corner_points.mean(axis=0)
    # decompose the polygon into a fan of triangles (center, v_i, v_{i+1})
    vectors = corner_points - center
    matrices = np.empty((point_num, 2, 2), dtype=np.float32)
    for i in range(point_num):
        # columns of m are the two edge vectors spanning triangle i; its
        # (pseudo-)inverse maps a pixel offset from the center to the
        # coefficients (alpha, beta) in that triangle's basis
        m = vectors[[i, (i + 1) % point_num]].T
        matrices[i] = np.linalg.pinv(m)
    points = np.empty((H, W, 2), dtype=np.float32)  # H, W, {x, y}
    points[:, :, 0] = np.arange(W)
    points[:, :, 1] = np.arange(H)[..., None]
    points -= center
    # (alpha, beta) of every pixel w.r.t. every triangle: [point_num, H, W, 2]
    ans: np.ndarray = np.matmul(matrices[:, None, None, ...], points[..., None])
    ans = ans.squeeze(-1)
    # a pixel lies in triangle i's sector iff alpha >= 0 and beta >= 0;
    # there, alpha + beta grows linearly from 0 at the center to 1 on the edge
    ans = (ans >= 0).all(axis=-1) * ans.sum(axis=-1)
    ans = np.max(ans, axis=0)
    # pyramid height: 1 at the center, decaying linearly to 0 at the polygon edge
    ans = np.maximum(1 - ans, 0)
    return ans


def main():
    H, W = 150, 224
    corner_points = np.array([
        187, 0,
        224, 80,
        30, 150,
        0, 65
    ], dtype=np.float32).reshape(-1, 2)

    ans = generate_pyramid_label(H, W, corner_points)

    cv2.imshow('image', ans)
    cv2.waitKey(0)


if __name__ == '__main__':
    main()

@donglin8506

donglin8506 commented Sep 11, 2019

@JingChaoLiu Thank you very much, this will be a lot of help. Best regards!

@insightcs

@JingChaoLiu Thank you for your great work, but I have a question about generating pyramid labels. I generated the pyramid mask in your way, but it also has a few white dots, as shown in the figure. Does this affect model training? Thanks for your help.

[figure: generated pyramid labels with stray white dots]

@JingChaoLiu
Collaborator

@insightcs It's OK. This won't hurt the model training. The phenomenon is caused by the numerical instability of the matrix inversion in matrices[i] = np.linalg.pinv(m).
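
If the dots are still undesirable, one possible cleanup (a sketch, not from the repo; clean_pyramid_label is illustrative) is to rasterize the polygon interior with cv2.fillPoly and mask the label with it:

import cv2
import numpy as np


def clean_pyramid_label(label, corner_points):
    """
    :param np.ndarray label: shape=[H, W], output of generate_pyramid_label
    :param np.ndarray corner_points: shape=[point_num, {x,y}]
    :return: np.ndarray, the label with everything outside the polygon forced to 0
    """
    inside = np.zeros(label.shape, dtype=np.uint8)
    cv2.fillPoly(inside, [np.round(corner_points).astype(np.int32)], 1)
    return label * inside  # stray values outside the polygon become exactly 0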

@xxlxx1
Copy link

xxlxx1 commented Nov 27, 2019

@insightcs hi, if I want to use this soft mask label, do I need to add this code to the project? I can't find the soft mask label anywhere in the project.
