Google Vision OCR: converting word coordinates from 90, 180, and 270 degree document rotations to 0 degrees

Question

We have the following guidance, taken from the Google Vision OCR documentation at https://developers.google.com/resources/api-libraries/documentation/vision/v1p1beta1/python/latest/vision_v1p1beta1.files.html

"boundingBox": { # A bounding polygon for the detected image annotation. # The bounding box for the paragraph.
# The vertices are in the order of top-left, top-right, bottom-right,
# bottom-left. When a rotation of the bounding box is detected the rotation
# is represented as around the top-left corner as defined when the text is
# read in the 'natural' orientation.
# For example:
#   * when the text is horizontal it might look like:
#      0----1
#      |    |
#      3----2
#   * when it's rotated 180 degrees around the top-left corner it becomes:
#      2----3
#      |    |
#      1----0
#   and the vertex order will still be (0, 1, 2, 3).

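So, as an experiment, I scanned the same document in four different orientations (0, 90, 180, and 270 degrees) and ran each scan through Google Vision OCR (DOCUMENT_TEXT_DETECTION). The Google OCR output gave the following results.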

Document orientation 0 degrees

This is the default for horizontal text, with 0 degrees of text rotation. Its four corners are:

0----1
|    |
3----2
Document height 3508
Document width 2479

Sample text output

LEGO - {'vertices': [{'x': 755, 'y': 172}, {'x': 877, 'y': 173}, {'x': 876, 'y': 237}, {'x': 754, 'y': 236}]}
LEGOLAND - {'vertices': [{'x': 1994, 'y': 189}, {'x': 2269, 'y': 192}, {'x': 2268, 'y': 244}, {'x': 1993, 'y': 241}]}

Document orientation 90 degrees

1----2
|    |
0----3
*vertex order will still be (0, 1, 2, 3)
Document height 2479
Document width 3508

Sample text output

LEGO - {'vertices': [{'x': 170, 'y': 1730}, {'x': 171, 'y': 1604}, {'x': 241, 'y': 1604}, {'x': 240, 'y': 1730}]}
LEGOLAND - {'vertices': [{'x': 188, 'y': 486}, {'x': 192, 'y': 213}, {'x': 245, 'y': 214}, {'x': 241, 'y': 487}]}

Document orientation 180 degrees

2----3
|    |
1----0
*vertex order will still be (0, 1, 2, 3)
Document height 3508
Document width 2479

Sample text output

LEGO - {'vertices': [{'x': 1740, 'y': 3337}, {'x': 1584, 'y': 3336}, {'x': 1585, 'y': 3259}, {'x': 1741, 'y': 3260}]}
LEGOLAND - {'vertices': [{'x': 485, 'y': 3315}, {'x': 212, 'y': 3311}, {'x': 213, 'y': 3261}, {'x': 486, 'y': 3265}]}

Document orientation 270 degrees

3----0
|    |
2----1
*vertex order will still be (0, 1, 2, 3)
Document height 2479
Document width 3508

Sample text output

LEGO - {'vertices': [{'x': 3335, 'y': 738}, {'x': 3333, 'y': 893}, {'x': 3269, 'y': 892}, {'x': 3271, 'y': 737}]}
LEGOLAND - {'vertices': [{'x': 3318, 'y': 1994}, {'x': 3313, 'y': 2266}, {'x': 3261, 'y': 2265}, {'x': 3266, 'y': 1993}]}
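The four samples follow the pattern the documentation describes: the vertex order stays (0, 1, 2, 3), so the edge from vertex 0 to vertex 1 always points along the reading direction. As a rough sketch, that edge can be used to guess the page orientation when it is not already known. The helper name `estimate_orientation` is hypothetical, the 90 and 270 labels simply follow the naming used in this question, and the function assumes the rotation is close to a multiple of 90 degrees.

```python
def estimate_orientation(vertices):
    """Guess the page rotation (0, 90, 180 or 270) from one word's vertices.

    Only the edge from vertex 0 to vertex 1 is inspected. In image
    coordinates x grows to the right and y grows downward.
    """
    dx = vertices[1]["x"] - vertices[0]["x"]
    dy = vertices[1]["y"] - vertices[0]["y"]
    if abs(dx) >= abs(dy):
        # Mostly horizontal edge: left-to-right is upright, otherwise 180.
        return 0 if dx > 0 else 180
    # Mostly vertical edge. Labels follow the samples in this question:
    # y decreasing along the word is "90", y increasing is "270".
    return 90 if dy < 0 else 270


# LEGO word from each of the four scans in the question.
lego = {
    0: [{"x": 755, "y": 172}, {"x": 877, "y": 173}, {"x": 876, "y": 237}, {"x": 754, "y": 236}],
    90: [{"x": 170, "y": 1730}, {"x": 171, "y": 1604}, {"x": 241, "y": 1604}, {"x": 240, "y": 1730}],
    180: [{"x": 1740, "y": 3337}, {"x": 1584, "y": 3336}, {"x": 1585, "y": 3259}, {"x": 1741, "y": 3260}],
    270: [{"x": 3335, "y": 738}, {"x": 3333, "y": 893}, {"x": 3269, "y": 892}, {"x": 3271, "y": 737}],
}
guesses = {angle: estimate_orientation(v) for angle, v in lego.items()}
print(guesses)
```

A single word can be misleading (for example, genuinely vertical text on an upright page), so in practice one might take a majority vote over many words.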

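Now the question

Suppose the document is scanned at 90, 180, or 270 degrees. How can the coordinates be transformed mathematically so that, whichever orientation was scanned, they give the same result as the default 0-degree document? In other words, how do I correct the 90, 180, and 270 degree coordinates so they look like a 0-degree scan?

This may be simple for some people, but I have been trying various approaches for the past few days and cannot seem to solve it.

So the input parameters are the scanned page orientation in degrees (0, 90, 180, 270), the text vertices from the Google OCR output, and the page size from Google OCR (height and width).

The output must be the corrected text vertices for a 0-degree page orientation.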

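I'll give you the mathematical answer. Keep in mind that mathematics is an exact science, whereas Vision OCR scanning is an empirical technique, that is, not an exact science.

Let me use a simple example so you can see the behaviour. Imagine a document of height 10 and width 4, with a point at coordinates (1, 9). Rotating it by 90º moves the point to (9, 3); a second rotation moves it to (3, 1); a third moves it to (1, 1).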

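The reason is that, for a generic rectangle of height H and width W, rotating the point (a, b) by 90º gives:
(b, W - a), with W' = H and H' = W.

Repeating this transformation gives the 180º and 270º transformations, as the sequence
(a, b) -> (b, W - a) -> (W - a, H - b) -> (H - b, a) -> (a, b)
where W and H are the width and height of the unrotated page.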
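A quick way to check this cycle is to apply the 90º step four times to the toy example (width 4, height 10, point (1, 9)). This is only an illustrative sketch; `rotate90` is a hypothetical helper name.

```python
def rotate90(point, width, height):
    """Apply one 90-degree step: (a, b) -> (b, W - a), then swap W and H.

    Returns the new point together with the new width and new height.
    """
    a, b = point
    return (b, width - a), height, width


point, width, height = (1, 9), 4, 10
trail = [point]
for _ in range(4):
    point, width, height = rotate90(point, width, height)
    trail.append(point)

print(trail)  # [(1, 9), (9, 3), (3, 1), (1, 1), (1, 9)]
```

After four steps both the point and the page dimensions are back where they started.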

So bringing any point in the sequence back to (a, b) is just a simple equation, as long as you know all the parameters.
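Solving each step of the sequence for (a, b) gives one inverse formula per orientation. The sketch below is one possible way to write that down; `to_upright` is a hypothetical name, and it assumes `width` and `height` are the dimensions of the scanned (rotated) page, as reported with the OCR output, and that the 90 and 270 labels match the ones used in the question.

```python
def to_upright(vertices, angle, width, height):
    """Map vertices from a page scanned at `angle` back to the 0-degree frame.

    `width` and `height` describe the scanned page, not the upright one.
    """
    if angle == 0:
        points = [(v["x"], v["y"]) for v in vertices]
    elif angle == 90:
        # Forward step was (a, b) -> (b, W - a); the scanned height equals W.
        points = [(height - v["y"], v["x"]) for v in vertices]
    elif angle == 180:
        # Forward step was (a, b) -> (W - a, H - b); page size is unchanged.
        points = [(width - v["x"], height - v["y"]) for v in vertices]
    elif angle == 270:
        # Forward step was (a, b) -> (H - b, a); the scanned width equals H.
        points = [(v["y"], width - v["x"]) for v in vertices]
    else:
        raise ValueError("angle must be 0, 90, 180 or 270")
    return [{"x": x, "y": y} for x, y in points]


# LEGOLAND word from the 90-degree scan (page width 3508, height 2479).
legoland_90 = [{"x": 188, "y": 486}, {"x": 192, "y": 213},
               {"x": 245, "y": 214}, {"x": 241, "y": 487}]
print(to_upright(legoland_90, 90, 3508, 2479))
# [{'x': 1993, 'y': 188}, {'x': 2266, 'y': 192}, {'x': 2265, 'y': 245}, {'x': 1992, 'y': 241}]
```

The result is within a few pixels of the LEGOLAND vertices reported for the 0-degree scan, (1994, 189), (2269, 192), (2268, 244), (1993, 241).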

For example, for the 180-degree bounding box:

LEGO - {'vertices': [{'x': 1740, 'y': 3337}, {'x': 1584, 'y': 3336}, {'x': 1585, 'y': 3259}, {'x': 1741, 'y': 3260}]}

Each x value follows x = width - x0, so x0 = width - x
Each y value follows y = height - y0, so y0 = height - y

Which gives:

LEGO - {'vertices': [{'x': 739, 'y': 171}, {'x': 895, 'y': 172}, {'x': 894, 'y': 249}, {'x': 738, 'y': 248}]}

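Of course, this differs slightly from your original values. If you apply the simple transformation to every rotation, you will see that the results all differ slightly from one another. Remember that these are empirical "bounding boxes": they carry an associated error, and it is not possible to make them match exactly as they would in a purely "mathematical" problem.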
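To put a number on "slightly", the sketch below maps the LEGO word from each rotated scan back to the upright frame and reports the largest per-coordinate difference from the 0-degree scan. The names `INVERSE` and `max_deviation` are hypothetical; the data and page size (2479 x 3508 upright) are the ones from the question.

```python
# Size of the upright (0-degree) page, from the question.
W, H = 2479, 3508

# Inverse maps from scanned (x, y) back to upright (a, b), written in
# terms of the upright page size.
INVERSE = {
    90: lambda x, y: (W - y, x),
    180: lambda x, y: (W - x, H - y),
    270: lambda x, y: (y, H - x),
}

reference = [(755, 172), (877, 173), (876, 237), (754, 236)]  # 0-degree LEGO
scans = {
    90: [(170, 1730), (171, 1604), (241, 1604), (240, 1730)],
    180: [(1740, 3337), (1584, 3336), (1585, 3259), (1741, 3260)],
    270: [(3335, 738), (3333, 893), (3269, 892), (3271, 737)],
}


def max_deviation(angle):
    """Largest absolute x or y difference, in pixels, after correction."""
    corrected = [INVERSE[angle](x, y) for x, y in scans[angle]]
    return max(
        max(abs(cx - rx), abs(cy - ry))
        for (cx, cy), (rx, ry) in zip(corrected, reference)
    )


deviations = {angle: max_deviation(angle) for angle in scans}
print(deviations)  # {90: 6, 180: 18, 270: 17}
```

For this word the corrected boxes land within about 18 pixels of the upright scan on a page roughly 2500 pixels wide, which is consistent with scan and detection noise rather than an error in the formulas.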
