I have a multi-valued field in Solr that holds user names like these:
{
"counsel_for_department": [
"mr a g srivastava with mr xyz doe",
" mr johh david and mr john deo",
" mr n p smith and mr ng smith"
]
}
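When I query with fq=counsel_for_department:a g srivastava
it does not return any results. I am using the standard tokenizer for this field.
The field type of this field is text_general.
Please let me know if multi-valued fields need a different configuration.
I get the following JSON response: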
{
"responseHeader": {
"status": 0,
"QTime": 20,
"params": {
"q": "*:*",
"indent": "true",
"fl": "counsel_for_department",
"fq": [
"doc_type:source_analysis",
"counsel_for_department:*g*c*Srivastava*"
],
"rows": "100",
"wt": "json",
"debugQuery": "true",
"_": "1459351342391"
}
},
"response": {
"numFound": 0,
"start": 0,
"docs": []
},
"debug": {
"rawquerystring": "*:*",
"querystring": "*:*",
"parsedquery": "MatchAllDocsQuery(*:*)",
"parsedquery_toString": "*:*",
"explain": {},
"QParser": "LuceneQParser",
"filter_queries": [
"doc_type:source_analysis",
"counsel_for_department:*g*c*Srivastava*"
],
"parsed_filter_queries": [
"doc_type:source_analysis",
"counsel_for_department:*g*c*srivastava*"
],
"timing": {
"time": 20,
"prepare": {
"time": 16,
"query": {
"time": 16
},
"facet": {
"time": 0
},
"facet_module": {
"time": 0
},
"mlt": {
"time": 0
},
"highlight": {
"time": 0
},
"stats": {
"time": 0
},
"expand": {
"time": 0
},
"debug": {
"time": 0
}
},
"process": {
"time": 3,
"query": {
"time": 3
},
"facet": {
"time": 0
},
"facet_module": {
"time": 0
},
"mlt": {
"time": 0
},
"highlight": {
"time": 0
},
"stats": {
"time": 0
},
"expand": {
"time": 0
},
"debug": {
"time": 0
}
}
}
}
}
Thanks in advance.
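Wildcard queries are not analyzed, so in most cases you are better off staying away from them and using regular term matching instead. That way you can match documents regardless of the order of the terms, so "john oliver" will also match "oliver john", with "john oliver" being boosted based on the phrase match.
To expand on that: a wildcard only matches when it matches the actual tokens stored in the index. If the field has a tokenizer and filter chain, that usually stops being the case as soon as you throw a space into the mix.
Drop the wildcards and use proper term matching (which is what Solr is really good at).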
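To make the point about tokens concrete, here is a minimal sketch in plain Python, not Solr itself. It assumes a simplified analyzer (lowercase plus whitespace split, which only approximates text_general) and uses fnmatch as a stand-in for wildcard matching against individual indexed tokens. The function names analyze, wildcard_hits and all_terms_hit are made up for this illustration.

```python
from fnmatch import fnmatchcase

def analyze(value):
    """Rough stand-in for an analyzer chain: lowercase, then split on whitespace."""
    return value.lower().split()

def wildcard_hits(pattern, values):
    """Compare a wildcard pattern with each single token, never with the whole value."""
    pattern = pattern.lower()
    return any(fnmatchcase(tok, pattern)
               for value in values for tok in analyze(value))

def all_terms_hit(query, values):
    """Term matching: every analyzed query term must occur as a token in the field."""
    tokens = {tok for value in values for tok in analyze(value)}
    return all(term in tokens for term in analyze(query))

values = [
    "mr a g srivastava with mr xyz doe",
    " mr johh david and mr john deo",
    " mr n p smith and mr ng smith",
]

print(wildcard_hits("*g*c*Srivastava*", values))  # False: no single token contains g, c and srivastava
print(wildcard_hits("sriva*", values))            # True: matches the token "srivastava"
print(all_terms_hit("a g srivastava", values))    # True: each term is its own token
```

The pattern from the question fails in this model because it would have to match inside one token, while the name is stored as three separate tokens.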
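As a rough sketch of what a wildcard-free filter could look like, the snippet below builds the request parameters in Python. One thing to keep in mind: with the standard Lucene query syntax, a clause written as field:a g srivastava applies the field name only to the first term, and the remaining terms go to the default field, so the terms are grouped in parentheses (or quoted as a phrase) here. The helper field_terms is hypothetical, and it assumes the input contains no characters that need escaping in the query syntax.

```python
from urllib.parse import urlencode

def field_terms(field, text, phrase=False):
    """Build a clause in which every term is scoped to `field`.

    Assumes `text` has no Lucene special characters; no escaping is done.
    """
    terms = text.split()
    if phrase:
        # Exact phrase: the terms must appear next to each other, in order.
        return '{}:"{}"'.format(field, " ".join(terms))
    # Grouped terms: all must be present, in any order.
    return "{}:({})".format(field, " AND ".join(terms))

params = [
    ("q", "*:*"),
    ("fq", "doc_type:source_analysis"),
    ("fq", field_terms("counsel_for_department", "a g srivastava")),
    ("fl", "counsel_for_department"),
    ("wt", "json"),
]

# URL-encoded query string to append to the collection's /select handler.
print(urlencode(params))
```

Passing phrase=True instead yields counsel_for_department:"a g srivastava", which is stricter because it also requires the terms to be adjacent and in order.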
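For a plain text search, you should go with:
fq=counsel_for_department:*a g srivastava*
// or you can also use:
fq=counsel_for_department:*a*g*srivastava*
Use this to start with. However, this is a relatively expensive (slow) query in Solr. As an improvement, if the query turns out to be too expensive (takes too long), merge the values of the multi-valued field into a single field and query that field instead of the multi-valued one.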
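The merge step could be done on the client before indexing. The sketch below is only an illustration under assumptions: the target field name counsel_for_department_merged and the helper merge_field are hypothetical, the field would still have to be defined in the schema, and whether this actually makes the query faster is not verified here.

```python
def merge_field(doc, source, target, sep=" "):
    """Return a copy of `doc` with the values of `source` joined into one string under `target`."""
    merged = dict(doc)
    merged[target] = sep.join(v.strip() for v in doc.get(source, []))
    return merged

doc = {
    "counsel_for_department": [
        "mr a g srivastava with mr xyz doe",
        " mr johh david and mr john deo",
        " mr n p smith and mr ng smith",
    ]
}

# Hypothetical single-valued target field; the original field is left untouched.
indexed = merge_field(doc, "counsel_for_department", "counsel_for_department_merged")
print(indexed["counsel_for_department_merged"])
```

The original multi-valued field stays in the document, so existing queries keep working while the merged field is tried out.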