Q1. Why is the input of Algorithm 1 the segmentation result of a whole image (H*W points), while its output is only a single text bounding box (4 planes)?
Q2. What are the details of the INITPLANES function? What are the parameters (A, B, D) after calling it? I cannot find this in the paper.
Thanks!
Q1. Why is the input of Algorithm 1 the segmentation result of a whole image (H*W points), while its output is only a single text bounding box (4 planes)?
A1: The plane clustering algorithm rebuilds the pyramid for one text mask, based on a single text region (text_mask = Tensor[C=1, H=28, W=28]). The input of the algorithm is therefore one text mask, not the whole image. An image normally contains around a dozen text regions, and plane clustering rebuilds one pyramid for each text mask separately.
Q2. What are the details of the INITPLANES function? What are the parameters (A, B, D) after calling it? I cannot find this in the paper.
A2: Given the positive point list P = Tensor[point_num, Channel={x, y, z}], INIT_PLANES does the following:
1. Calculate the approximate apex of the pyramid, denoted E:
E.x, E.y = mean(P[:, :2])
E.z = 1
2. Produce the initial planes. We simply link the apex E to the corner points {(0, 0, 0), (0, 28, 0), (28, 28, 0), (28, 0, 0)} to form the four inclined planes.
Note: (0, 0, 0) means x=0, y=0, z=0.
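The two steps above can be sketched in NumPy as follows. This is only an illustration, not the repository's actual implementation: the function name `init_planes`, the `size` argument, and the plane form A*x + B*y + z + D = 0 (with the z coefficient fixed to 1) are assumptions made here so that each plane is described by (A, B, D). Each plane is taken through the apex E and one edge of the 28x28 base.

```python
import numpy as np

def init_planes(points, size=28.0):
    """Hypothetical sketch of INIT_PLANES.

    points: array of shape (point_num, 3) holding (x, y, z) per positive point.
    Returns the apex E and a (4, 3) array of (A, B, D), where each plane is
    assumed to be written as A*x + B*y + z + D = 0.
    """
    points = np.asarray(points, dtype=float)

    # Step 1: approximate apex. x and y are the mean of the positive points,
    # z is fixed to 1.
    apex = np.array([points[:, 0].mean(), points[:, 1].mean(), 1.0])

    # Step 2: link the apex to the four base corners (all at z = 0).
    corners = np.array(
        [[0, 0, 0], [0, size, 0], [size, size, 0], [size, 0, 0]], dtype=float
    )
    planes = []
    for i in range(4):
        p1, p2 = corners[i], corners[(i + 1) % 4]
        # Normal of the plane through p1, p2 and the apex.
        normal = np.cross(p2 - p1, apex - p1)
        if abs(normal[2]) < 1e-12:
            # Happens when the apex lies directly above this base edge.
            raise ValueError("degenerate plane: apex lies above a base edge")
        # Scale the normal so that the z coefficient equals 1.
        a = normal[0] / normal[2]
        b = normal[1] / normal[2]
        # p1 lies on the plane and has z = 0, so D = -(A*x1 + B*y1).
        d = -(a * p1[0] + b * p1[1])
        planes.append((a, b, d))
    return apex, np.array(planes)
```

For example, with positive points whose mean is (14, 14), the plane through the edge from (0, 0, 0) to (0, 28, 0) comes out as z = x / 14, which is 0 on that edge and 1 at the apex.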
Another thing worth mentioning:
During clustering, the algorithm only cares about the four independent inclined planes. In other words, we only require the four inclined planes to be independent; they are not constrained to share a common apex. So we can rebuild both a pyramid and a square frustum from the text mask.