How to use the Computer Vision API to recognize runners' bib numbers

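I would like to use the Microsoft Cognitive Services Computer Vision API to recognize bib numbers in photos of runners taken during a race, whether the photo shows a single runner or a fair number of individual runners.

Is this a task the OCR feature should be able to handle? I have tried several examples with the "Getting Started" program and the test console, and it returns an empty regions array. Am I doing something wrong, or is this beyond its capabilities?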

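First, check that your image meets the requirements stated in the API description.

Supported image formats: JPEG, PNG, GIF, BMP. The image file size must be less than 4 MB. Image dimensions must be between 40 x 40 and 3200 x 3200 pixels, and the image cannot be larger than 10 megapixels.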
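The limits quoted above can be checked locally before an image is uploaded. The following is a minimal sketch, not part of the API: the function name `check_image` is made up for illustration, and it assumes that "4 MB" means 4 * 1024 * 1024 bytes and "10 megapixels" means 10,000,000 pixels.

```python
# Limits taken from the API description quoted above.
SUPPORTED_FORMATS = {"JPEG", "PNG", "GIF", "BMP"}
MAX_BYTES = 4 * 1024 * 1024   # assumption: "4 MB" read as 4 MiB
MIN_SIDE, MAX_SIDE = 40, 3200
MAX_PIXELS = 10_000_000       # assumption: "10 megapixels" read as 10^7


def check_image(fmt, size_bytes, width, height):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if fmt.upper() not in SUPPORTED_FORMATS:
        problems.append(f"unsupported format: {fmt}")
    if size_bytes >= MAX_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    if not (MIN_SIDE <= width <= MAX_SIDE and MIN_SIDE <= height <= MAX_SIDE):
        problems.append(f"dimensions out of range: {width} x {height}")
    if width * height > MAX_PIXELS:
        problems.append(f"too many pixels: {width * height}")
    return problems


if __name__ == "__main__":
    print(check_image("JPEG", 1_000_000, 2000, 1500))
    print(check_image("PNG", 100, 3200, 3200))
```

Note that under these assumptions a 3200 x 3200 image passes the dimension check but still fails the pixel-count check, so both limits need to be tested.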

OCR systems usually make some assumptions. One is that the image is not rotated beyond a certain angle, which in Microsoft's case is 40 degrees.

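Text detection is still an active research topic, and detecting text in the wild can be challenging. For example, the image in Maria's comment is quite simple: the text is black on white, and the photo is taken from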

Here are two photos for comparison.

A bad case for OCR: http://www.athletico.com/blog2/wp-content/uploads/2012/04/Runners.jpg

Below is the output of the Microsoft Cognitive Services Vision OCR API for this image:

{
  "language": "zh-Hant",
  "textAngle": 6.0999999999999641,
  "orientation": "Up",
  "regions": [
    {
      "boundingBox": "1441,490,51,41",
      "lines": [
        {
          "boundingBox": "1441,490,51,41",
          "words": [
            {
              "boundingBox": "1441,490,51,41",
              "text": "39"
            }
          ]
        }
      ]
    }
  ]
}
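For anyone who wants to reproduce a response like the one above outside the test console, here is a rough sketch of how such a request could be assembled with the standard library. It is an illustration rather than the exact call used for this answer: the `/vision/v1.0/ocr` path, the `language` and `detectOrientation` query parameters, and the `Ocp-Apim-Subscription-Key` header are assumptions based on the v1.0 REST API, and the endpoint host and key shown are placeholders you would replace with your own.

```python
import json
import urllib.parse
import urllib.request


def build_ocr_request(endpoint, subscription_key, image_url):
    """Build (but do not send) a POST request for the OCR operation.

    `endpoint` is the base URL of your Cognitive Services resource,
    e.g. "https://westus.api.cognitive.microsoft.com" (placeholder).
    """
    # Assumed v1.0 query parameters: let the service guess the language
    # and correct the image orientation before recognition.
    query = urllib.parse.urlencode(
        {"language": "unk", "detectOrientation": "true"}
    )
    url = f"{endpoint.rstrip('/')}/vision/v1.0/ocr?{query}"
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Ocp-Apim-Subscription-Key": subscription_key,
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    req = build_ocr_request(
        "https://westus.api.cognitive.microsoft.com",  # placeholder host
        "YOUR_SUBSCRIPTION_KEY",                       # placeholder key
        "http://www.athletico.com/blog2/wp-content/uploads/2012/04/Runners.jpg",
    )
    print(req.get_method(), req.full_url)
    # To actually send it:
    # with urllib.request.urlopen(req) as resp:
    #     result = json.load(resp)
```

Building the request separately from sending it makes it easy to inspect the URL and headers first, which helps when the service returns an empty regions array and you want to rule out a malformed call.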

A good case for OCR:

http://running.competitor.com/files/2014/04/HappyRunner-Raleigh14.jpg

Now let's look at the output from the same API:

{
  "language": "en",
  "textAngle": -2.900000000000035,
  "orientation": "Up",
  "regions": [
    {
      "boundingBox": "1597,1824,585,576",
      "lines": [
        {
          "boundingBox": "1654,1824,528,67",
          "words": [
            {
              "boundingBox": "1654,1829,211,62",
              "text": "7?.cek"
            },
            {
              "boundingBox": "2146,1824,36,52",
              "text": "Y'"
            }
          ]
        },
        {
          "boundingBox": "1603,1889,551,98",
          "words": [
            {
              "boundingBox": "1603,1889,551,98",
              "text": "RALEIGH"
            }
          ]
        },
        {
          "boundingBox": "1695,1990,370,37",
          "words": [
            {
              "boundingBox": "1695,1990,79,35",
              "text": "1/2"
            },
            {
              "boundingBox": "1794,1993,271,34",
              "text": "marathon"
            }
          ]
        },
        {
          "boundingBox": "1742,2052,138,26",
          "words": [
            {
              "boundingBox": "1742,2052,105,23",
              "text": "presented"
            },
            {
              "boundingBox": "1856,2053,24,25",
              "text": "by"
            }
          ]
        },
        {
          "boundingBox": "1798,2099,156,21",
          "words": [
            {
              "boundingBox": "1798,2099,65,17",
              "text": "APRIL"
            },
            {
              "boundingBox": "1872,2101,26,19",
              "text": "13,"
            },
            {
              "boundingBox": "1905,2101,49,15",
              "text": "2014"
            }
          ]
        },
        {
          "boundingBox": "1597,2160,536,159",
          "words": [
            {
              "boundingBox": "1597,2160,536,159",
              "text": "19401"
            }
          ]
        },
        {
          "boundingBox": "1749,2368,101,32",
          "words": [
            {
              "boundingBox": "1749,2368,101,32",
              "text": "benefiting"
            }
          ]
        }
      ]
    }
  ]
}

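Much better! One might think the second image would be harder to recognize. The difference is that geometric image transformations (https://www.cs.mtu.edu/~shene/COURSES/cs3621/NOTES/geometry/geo-tran.html), and affine transformations in particular, are still hard for computers to handle, whereas our brains cope with them at a very high success rate.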
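To make "affine transformation" concrete, here is a small illustrative sketch that applies one to the corners of the "19401" bounding box from the response above. It assumes the `boundingBox` string is laid out as "left,top,width,height"; the helper names are made up for this example and are not part of any API.

```python
import math


def corners_from_bounding_box(box):
    """Turn "left,top,width,height" (assumed layout) into four corner points."""
    left, top, width, height = (int(v) for v in box.split(","))
    return [
        (left, top),
        (left + width, top),
        (left + width, top + height),
        (left, top + height),
    ]


def apply_affine(point, m):
    """Apply x' = a*x + b*y + tx, y' = c*x + d*y + ty with m = (a, b, c, d, tx, ty)."""
    x, y = point
    a, b, c, d, tx, ty = m
    return (a * x + b * y + tx, c * x + d * y + ty)


def rotation(theta_deg):
    """Affine coefficients for a rotation about the origin."""
    t = math.radians(theta_deg)
    return (math.cos(t), -math.sin(t), math.sin(t), math.cos(t), 0, 0)


def shear_x(k):
    """Affine coefficients for a horizontal shear by factor k."""
    return (1, k, 0, 1, 0, 0)


if __name__ == "__main__":
    bib = corners_from_bounding_box("1597,2160,536,159")
    # A bib on a turned torso looks roughly like a rotated and sheared rectangle.
    skewed = [apply_affine(apply_affine(p, shear_x(0.3)), rotation(15)) for p in bib]
    for before, after in zip(bib, skewed):
        print(before, "->", tuple(round(v, 1) for v in after))
```

After the shear and rotation the axis-aligned rectangle becomes a tilted parallelogram, which is roughly what the OCR engine sees when a bib is not facing the camera.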

So OCR will do well on text that faces the camera, but can easily fail on text that has undergone this kind of transformation.
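When the OCR call does succeed, the bib number still has to be picked out from the other text on the bib. The sketch below is one possible heuristic, not something the API provides: it walks the regions/lines/words structure shown in the responses above, keeps the words made up only of digits, and ranks them by bounding-box area on the assumption that the bib number is the largest printed number. The function name and the `min_digits` parameter are hypothetical.

```python
def find_bib_candidates(response, min_digits=2):
    """Return digit-only words from an OCR response, largest box first."""
    candidates = []
    for region in response.get("regions", []):
        for line in region.get("lines", []):
            for word in line.get("words", []):
                text = word.get("text", "")
                if text.isdigit() and len(text) >= min_digits:
                    # boundingBox is assumed to be "left,top,width,height".
                    _, _, width, height = (
                        int(v) for v in word["boundingBox"].split(",")
                    )
                    candidates.append((text, width * height))
    candidates.sort(key=lambda c: c[1], reverse=True)
    return [text for text, _ in candidates]


if __name__ == "__main__":
    sample = {
        "regions": [
            {
                "lines": [
                    {"words": [{"boundingBox": "1905,2101,49,15", "text": "2014"}]},
                    {"words": [{"boundingBox": "1597,2160,536,159", "text": "19401"}]},
                    {"words": [{"boundingBox": "1603,1889,551,98", "text": "RALEIGH"}]},
                ]
            }
        ]
    }
    print(find_bib_candidates(sample))
```

Applied to the second response above, this ranks "19401" ahead of "2014", and an empty regions array simply yields an empty list.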
