Question about crop step in Data augmentation #5
Comments
No, we don't apply any tricks when cropping. But you may need to pay attention to some details of cropping images and generating pyramid labels. The steps of cropping images and generating pyramid labels are as follows:
@JingChaoLiu Thanks for your response.
Hi, my questions are actually about pyramid label generation, not cropping, but I'll quote from this issue :)
You mean you keep them in the form of vertices, not interior points, right? So in terms of maskrcnn_benchmark, they are PolygonInstances?
So they're calculated on a 28x28 grid? Something like:
Yes
Denote the ground-truth mask point list as
The schema you mentioned may be schema 2. In our experiments, schema 2 scores 0.3% lower in F-measure than schema 1, but it is much more efficient in both memory and computation: the training time of schema 2 is two-thirds that of schema 1.
Thank you @JingChaoLiu for your valuable analysis.
I don't get why they're not using RoIAlign here for efficiency. By the way, is matrix inversion really necessary for calculating the target? I mean, this pyramid function seems very "regular", and I'm surprised there's no "analytic" formula. Regards,
Could you share the code for generating the pyramid label?
Here is a simplified version. Adjust the code as needed. @donglin8506

import cv2
import numpy as np


def generate_pyramid_label(H, W, corner_points):
    """
    :param int H: image_H
    :param int W: image_W
    :param np.ndarray corner_points: dtype=np.float32, shape=[point_num, {x,y}] 3 <= point_num <= 8
    :return: np.ndarray ans: dtype=np.float32, shape=[H, W]
    generate a pyramid label from corner_points
    within the bounding box {box_top=0, box_bottom=H, box_left=0, box_right=W}
    """
    point_num = len(corner_points)
    center = corner_points.mean(axis=0)
    vectors = corner_points - center  # corner offsets relative to the center

    # One 2x2 matrix per triangle (center, corner i, corner i+1); its inverse
    # maps a pixel offset to coefficients along the two edge vectors.
    matrices = np.empty((point_num, 2, 2), dtype=np.float32)
    for i in range(point_num):
        m = vectors[[i, (i + 1) % point_num]].T
        matrices[i] = np.linalg.pinv(m)

    # Pixel coordinates relative to the center.
    points = np.empty((H, W, 2), dtype=np.float32)  # H, W, {x, y}
    points[:, :, 0] = np.arange(W)
    points[:, :, 1] = np.arange(H)[..., None]
    points -= center

    # Shape: [point_num, H, W, 2, 1] -> [point_num, H, W, 2].
    ans: np.ndarray = np.matmul(matrices[:, None, None, ...], points[..., None])
    # Squeeze only the last axis so that H == 1 or W == 1 is not collapsed too.
    ans = ans.squeeze(axis=-1)
    # Keep the coefficient sum only where the pixel lies inside the triangle's sector.
    ans = (ans >= 0).all(axis=-1) * ans.sum(axis=-1)
    ans = np.max(ans, axis=0)
    # 1 at the center, falling linearly to 0 at the polygon boundary.
    ans = np.maximum(1 - ans, 0)
    return ans


def main():
    H, W = 150, 224
    corner_points = np.array([
        187, 0,
        224, 80,
        30, 150,
        0, 65
    ], dtype=np.float32).reshape(-1, 2)
    ans = generate_pyramid_label(H, W, corner_points)
    cv2.imshow('image', ans)
    cv2.waitKey(0)


if __name__ == '__main__':
    main()
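On the earlier question of whether an analytic formula exists: each 2x2 system in `generate_pyramid_label` can be solved with Cramer's rule, which avoids the `np.linalg.pinv` call. The following is a minimal sketch, not the author's implementation. The name `pyramid_label_closed_form`, the float64 arithmetic, and the tolerance `eps` are assumptions made for illustration, and it assumes the mean of the corners is not collinear with any two consecutive corners (so no determinant is zero).

```python
import numpy as np


def pyramid_label_closed_form(H, W, corner_points, eps=1e-6):
    """Hypothetical closed-form variant: solve p = s*a + t*b per triangle."""
    pts = np.asarray(corner_points, dtype=np.float64)
    center = pts.mean(axis=0)
    a = pts - center                # (P, 2): offset of corner i
    b = np.roll(a, -1, axis=0)      # (P, 2): offset of corner i + 1
    # Determinant of [a b]; assumed non-zero for every triangle.
    det = (a[:, 0] * b[:, 1] - a[:, 1] * b[:, 0])[:, None, None]

    ys, xs = np.mgrid[0:H, 0:W].astype(np.float64)
    px = xs - center[0]             # (H, W)
    py = ys - center[1]             # (H, W)

    # Cramer's rule, broadcast to shape (P, H, W).
    s = (px * b[:, 1, None, None] - py * b[:, 0, None, None]) / det
    t = (a[:, 0, None, None] * py - a[:, 1, None, None] * px) / det

    # A small tolerance (an assumption) keeps pixels on a shared edge
    # from being rejected by both neighbouring triangles.
    inside = (s >= -eps) & (t >= -eps)
    dist = np.where(inside, s + t, 0.0).max(axis=0)
    return np.clip(1.0 - dist, 0.0, 1.0).astype(np.float32)


square = np.array([[0, 0], [8, 0], [8, 8], [0, 8]], dtype=np.float32)
label = pyramid_label_closed_form(9, 9, square)
```

For an axis-aligned square the label reduces to 1 - max(|dx|, |dy|) / half_side, which makes the result easy to check by hand.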
@JingChaoLiu Thank you very much, this will help a lot! Best regards!
@JingChaoLiu Thank you for your great work. I have a question about generating pyramid labels: I generated the pyramid mask your way, but it contains a few white dots, as shown in the figure. Does this affect model training? Thanks for your help.
@insightcs It's OK. This won't hurt model training. The phenomenon is caused by the numerical instability of the matrix inversion of
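One plausible mechanism for the white dots, inferred from the posted code rather than confirmed by the author: a pixel lying on the ray from the center to a corner has one coefficient that should be exactly 0 in each of the two adjacent triangles. If rounding makes it slightly negative in both, every triangle rejects the pixel, the maximum becomes 0, and the label becomes 1 (white). The toy sketch below replays the last steps of `generate_pyramid_label` for a single pixel; the helper `combine`, the coefficient values, and the tolerance `eps` are hypothetical.

```python
import numpy as np


def combine(coeffs, eps=0.0):
    """Label of ONE pixel given its (P, 2) coefficients for P triangles."""
    inside = (coeffs >= -eps).all(axis=-1)
    dist = (inside * coeffs.sum(axis=-1)).max()
    return float(max(1.0 - dist, 0.0))


# Hypothetical pixel halfway along a center-to-corner ray: the two adjacent
# triangles should see a coefficient of exactly 0, perturbed here to -1e-7.
coeffs = np.array([
    [-0.5, 0.0],
    [-1e-7, 0.5],
    [0.5, -1e-7],
    [0.0, -0.5],
])

strict = combine(coeffs)              # every triangle rejects the pixel
tolerant = combine(coeffs, eps=1e-6)  # adjacent triangles accept it
```

With the strict test the pixel is labeled 1.0, while a small tolerance recovers a value close to the expected 0.5.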
@insightcs Hi, if I want to use this soft mask label, do I need to add this code to the project? I can't find anything about the soft mask label in the project.
Since the ground-truth area is not pure text, I get many wrong regions when I randomly crop the resized image. Are there any tricks in this step?