How to make Microsoft LUIS case-sensitive?

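I have an Azure LUIS instance for NLP and I'm trying to extract alphanumeric values with a regular expression. The matching works, but the extracted value comes back in lowercase.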

For example:

Case 1

My input: "run job for AE0002"    RegEx code = [a-zA-Z]{2}\d+

Output:

{
"query": " run job for AE0002",
"topScoringIntent": {
"intent": "Run Job",
"score": 0.7897274
},
"intents": [
{
"intent": "Run Job",
"score": 0.7897274
},
{
"intent": "None",
"score": 0.00434472738
}
],
"entities": [
{
"entity": "ae0002",
"type": "Alpha Number",
"startIndex": 15,
"endIndex": 20
}
]
}

I need to preserve the casing of the input.

Case 2

My input: "extract only abreaviations like HP and IBM"    RegEx code = [A-Z]{2,}

Output:

{
"query": "extract only abreaviations like hp and ibm", // Query accepted by LUIS test window
"query": "extract only abreaviations like HP and IBM", // Query accepted as an endpoint url
"prediction": {
"normalizedQuery": "extract only abreaviations like hp and ibm",
"topIntent": "None",
"intents": {
"None": {
"score": 0.09844558
}
},
"entities": {
"Abbre": [
"extract",
"only",
"abreaviations",
"like",
"hp",
"and",
"ibm"
],
"$instance": {
"Abbre": [
{
"type": "Abbre",
"text": "extract",
"startIndex": 0,
"length": 7,
"modelTypeId": 8,
"modelType": "Regex Entity Extractor",
"recognitionSources": [
"model"
]
},
{
"type": "Abbre",
"text": "only",
"startIndex": 8,
"length": 4,
"modelTypeId": 8,
"modelType": "Regex Entity Extractor",
"recognitionSources": [
"model"
]
},....          
{
"type": "Abbre",
"text": "ibm",
"startIndex": 39,
"length": 3,
"modelTypeId": 8,
"modelType": "Regex Entity Extractor",
"recognitionSources": [
"model"
]
}
]
}
}
}
}

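This makes me suspect that all of the training is done in lowercase. What shocked me is that all the words that were originally trained to their own entities are now being labeled as abbreviations.

Any input would be a big help :)

Thanks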

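For case 1, do you actually need to preserve the case in order to look the job up on your system? As long as job identifiers always use uppercase characters, you can simply call toUpperCase(), e.g. var jobName = step._info.options.entities.Alpha_Number.toUpperCase() (I'm not sure about the underscore in Alpha Number; I've never had an entity with a space in its name before).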
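A minimal sketch of the upper-casing workaround, assuming the bot receives the LUIS v2 response as a plain object shaped like the Case 1 output (an entities array whose items have entity and type fields). The helper name getJobId is hypothetical, and it only recovers the right value if real job IDs never contain lowercase letters.

```javascript
// Hypothetical helper: finds the first entity of the given type in a
// LUIS v2-style response and upper-cases its (lower-cased) value.
// Only correct if real job IDs are always fully uppercase.
function getJobId(luisResponse, entityType = "Alpha Number") {
  const match = (luisResponse.entities || []).find((e) => e.type === entityType);
  return match ? match.entity.toUpperCase() : null;
}

// Example using the entity value and type from Case 1.
const response = {
  query: "run job for AE0002",
  entities: [{ entity: "ae0002", type: "Alpha Number" }],
};
console.log(getJobId(response)); // prints AE0002
```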

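For case 2, this is a shortcoming of LUIS. You can force case sensitivity in a regular expression with (?-i) (e.g. /(?-i)[A-Z]{2,}/g). However, LUIS appears to convert everything to lowercase first, so you will never get a match for that utterance (which is better than matching every word, but that isn't saying much!). I don't know of any way to make LUIS recognize entities the way you're asking.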
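The Case 2 output is consistent with the regex being applied case-insensitively to the lower-cased normalizedQuery: with the i flag, [A-Z]{2,} matches every word of that string, which is exactly the Abbre list, while a case-sensitive version matches nothing. The snippet reproduces that in plain JavaScript as an illustration of the observed behaviour, not a description of how LUIS works internally. JavaScript's RegExp does not accept the standalone (?-i) form, so case sensitivity is controlled with the i flag here.

```javascript
// The lower-cased text LUIS reports as "normalizedQuery" in Case 2.
const normalized = "extract only abreaviations like hp and ibm";

// Case-insensitive: [A-Z] also matches lowercase letters, so every word
// of two or more letters is returned, mirroring the Abbre entity list.
const insensitive = normalized.match(/[A-Z]{2,}/gi);

// Case-sensitive: no uppercase letters survive the lower-casing, so
// String.prototype.match returns null for this global regex.
const sensitive = normalized.match(/[A-Z]{2,}/g);

console.log(insensitive.join(", ")); // prints extract, only, abreaviations, like, hp, and, ibm
console.log(sensitive); // prints null
```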

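You could create a list entity containing all the abbreviations you expect, but depending on the input you expect, that may be too much to maintain. On top of that, abbreviations that are also ordinary words will be picked up as false positives (e.g. CAT and cat). You could also write a function outside of LUIS that does this for you, essentially building your own manual entity detection. Depending on what you're trying to do once the abbreviations are identified, there may be other solutions as well.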
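A minimal sketch of doing the abbreviation detection yourself, outside LUIS, by running a case-sensitive regex over the original utterance before any lower-casing. The function name findAbbreviations and the returned { text, startIndex, length } shape are assumptions chosen to resemble the LUIS $instance entries. The optional known set shows how a hand-maintained list could restrict matches, which is the maintenance trade-off described above.

```javascript
// Hypothetical manual entity detection on the raw utterance, so the
// original casing is preserved. \b...\b restricts matches to whole words
// made only of two or more uppercase ASCII letters ("cat" never matches).
function findAbbreviations(utterance, known = null) {
  const results = [];
  for (const m of utterance.matchAll(/\b[A-Z]{2,}\b/g)) {
    // If a set of known abbreviations is supplied, skip anything not in it.
    if (known && !known.has(m[0])) continue;
    results.push({ text: m[0], startIndex: m.index, length: m[0].length });
  }
  return results;
}

const utterance = "extract only abreaviations like HP and IBM";
console.log(JSON.stringify(findAbbreviations(utterance)));
// prints [{"text":"HP","startIndex":32,"length":2},{"text":"IBM","startIndex":39,"length":3}]
```

The startIndex of 39 for IBM lines up with the index LUIS reported for ibm in Case 2, so results from both sources can be compared position by position.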

You can simply use the indexes provided in the output to pull the value out of your original input string, with its casing intact.

{
"query": " run job for AE0002",
...
"entities": [
{
"entity": "ae0002",
"type": "Alpha Number",
"startIndex": 15,
"endIndex": 20
}
]
}

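Once you have this response, call a substring method on your original query using startIndex and endIndex to get the value you need. Note that endIndex is inclusive (15 to 20 covers the six characters of ae0002), so the exclusive end is endIndex + 1, and if your method takes a length instead of an end index, the length is endIndex - startIndex + 1.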
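A minimal sketch of the index-based approach, assuming you still have the exact string you sent to LUIS and that the returned indexes refer to positions in that string. The v2 response in Case 1 uses an inclusive endIndex, while the v3 $instance entries in Case 2 give startIndex and length. The helper names are hypothetical, and the sample indexes are computed for the sample strings shown.

```javascript
// v2-style entity: endIndex is inclusive, so add 1 to get the exclusive
// end that String.prototype.substring expects.
function originalTextV2(query, entity) {
  return query.substring(entity.startIndex, entity.endIndex + 1);
}

// v3-style $instance entry: startIndex plus length.
function originalTextV3(query, instance) {
  return query.substring(instance.startIndex, instance.startIndex + instance.length);
}

const v2Query = "run job for AE0002";
console.log(originalTextV2(v2Query, { entity: "ae0002", startIndex: 12, endIndex: 17 })); // prints AE0002

const v3Query = "extract only abreaviations like HP and IBM";
console.log(originalTextV3(v3Query, { text: "ibm", startIndex: 39, length: 3 })); // prints IBM
```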
