Lucene query returns incorrect results for long and double values

I am using Lucene 6.1.0 to index elements that have a name and a value,
for example:

<documents>
<Document>
<field name="NAME" value="Long_-1"/>
<field name="VALUE" value="-1"/>
</Document>
<Document>
<field name="NAME" value="Double_-1.0"/>
<field name="VALUE" value="-1.0"/>
</Document>
<Document>
<field name="NAME" value="Double_-0.5"/>
<field name="VALUE" value="-0.5"/>
</Document>
<Document>
<field name="NAME" value="Long_0"/>
<field name="VALUE" value="0"/>
</Document>
<Document>
<field name="NAME" value="Double_0.0"/>
<field name="VALUE" value="0.0"/>
</Document>
<Document>
<field name="NAME" value="Double_0.5"/>
<field name="VALUE" value="0.5"/>
</Document>
<Document>
<field name="NAME" value="Long_1"/>
<field name="VALUE" value="1"/>
</Document>
<Document>
<field name="NAME" value="Double_1.0"/>
<field name="VALUE" value="1.0"/>
</Document>
<Document>
<field name="NAME" value="Double_1.5"/>
<field name="VALUE" value="1.5"/>
</Document>
<Document>
<field name="NAME" value="Long_2"/>
<field name="VALUE" value="2"/>
</Document>
</documents>

Following the documentation, I use LongPoint and DoublePoint to build the index.

public static void addLongField(String name, long value, Document doc) {
    doc.add(new LongPoint(name, value));
    // since Lucene6.x a second field is required to store the values.
    doc.add(new StoredField(name, value));
}

public static void addDoubleField(String name, double value, Document doc) {
    doc.add(new DoublePoint(name, value));
    // since Lucene6.x a second field is required to store the values.
    doc.add(new StoredField(name, value));
}

Because I use the same field for long and double values, my range query returns strange results when the minimum and maximum values have different signs.

LongPoint.newRangeQuery(field, minValue, maxValue);
DoublePoint.newRangeQuery(field, minValue, maxValue);

This example works correctly:
VALUE:[1 TO 1] VALUE:[0.5 TO 1.0]

Result:
0.5    Double_0.5
1      Long_1
1.0    Double_1.0

This example gives wrong results:
VALUE:[0 TO 1] VALUE:[-0.5 TO 1.0]

Result:
0      Long_0
0.0    Double_0.0
1      Long_1
-1     Long_-1
-0.5   Double_-0.5
0.5    Double_0.5
1.0    Double_1.0
2      Long_2

In addition to the correct results, all of the long values are returned.

Does anyone know why this happens?
Is it not possible to store long and double values in the same field?
Many thanks

BR Tobias

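No, you should not keep different data types in the same field. Either put them in separate fields, or convert your longs to doubles (or vice versa) so that everything is indexed in the same format.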

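To understand what is going on, it helps to know what the numeric fields actually do. A numeric field is encoded into a binary representation that makes range searches on that type easy. The encoding for integral types and the encoding for floating-point types are not comparable with each other. For example, for the number 1: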

long 1 = lucene BytesRef: [80 0 0 0 0 0 0 1]
double 1.0 = lucene BytesRef: [bf f0 0 0 0 0 0 0]
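The two byte sequences above can be reproduced without Lucene. The following is a minimal sketch, assuming the encoding works the way I understand Lucene's sortable encoding to work: a long has its sign bit flipped, and a double is first mapped to a long that sorts like the double and then encoded the same way. The class `SortableEncodingSketch` and its method names are hypothetical and are not part of the Lucene API.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of Lucene: it re-implements the sortable
// encoding with plain JDK calls so the byte values can be inspected.
final class SortableEncodingSketch {

    // Long encoding: flip the sign bit so that unsigned, big-endian byte
    // order matches signed numeric order.
    static byte[] encodeLong(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value ^ Long.MIN_VALUE).array();
    }

    // Double encoding: turn the IEEE 754 bits into a long that sorts like
    // the double, then encode that long as above.
    static byte[] encodeDouble(double value) {
        long bits = Double.doubleToLongBits(value);
        // For negative doubles, invert the 63 non-sign bits so that a larger
        // magnitude sorts lower. Positive doubles are left unchanged.
        long sortable = bits ^ ((bits >> 63) & Long.MAX_VALUE);
        return encodeLong(sortable);
    }

    // Formats bytes the same way as the listing above, e.g. [80 0 0 0 0 0 0 1].
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(Integer.toHexString(bytes[i] & 0xff));
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        System.out.println("long 1      = " + toHex(encodeLong(1L)));
        System.out.println("double 1.0  = " + toHex(encodeDouble(1.0)));
        System.out.println("double -0.5 = " + toHex(encodeDouble(-0.5)));
    }
}
```

The same number, 1, ends up as two unrelated byte sequences depending on its type, so terms of both kinds in one field share a single sort order that has no numeric meaning across types.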

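These BytesRef binary representations are what is actually searched. Since part of your query is the double range -0.5 to 1.0, you are actually running a query for: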

encoded value: [40 1f ff ff ff ff ff ff] - [bf f0 0 0 0 0 0 0]

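That encoded range covers almost every long value you are likely to index, which is why all of the long documents show up as extra hits. A long only falls outside the range when its magnitude gets into the neighborhood of Long.MAX_VALUE/2.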
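To make the effect concrete, here is a small sketch that checks whether an indexed long would fall inside the encoded double range. It assumes the same sortable encoding as described above and compares the encoded values as unsigned 64-bit numbers, which is equivalent to comparing the big-endian bytes. The class `MixedRangeSketch` and all of its methods are hypothetical illustrations, not Lucene API.

```java
// Hypothetical illustration, not part of Lucene.
final class MixedRangeSketch {

    // Encoded form of a long: sign bit flipped.
    static long encodeLong(long value) {
        return value ^ Long.MIN_VALUE;
    }

    // Encoded form of a double: sortable bits, then sign bit flipped.
    static long encodeDouble(double value) {
        long bits = Double.doubleToLongBits(value);
        long sortable = bits ^ ((bits >> 63) & Long.MAX_VALUE);
        return sortable ^ Long.MIN_VALUE;
    }

    // True if the encoded term lies in [lower, upper] under unsigned order.
    static boolean inRange(long term, long lower, long upper) {
        return Long.compareUnsigned(term, lower) >= 0
                && Long.compareUnsigned(term, upper) <= 0;
    }

    // The problematic case: a term indexed as a long, queried with a double range.
    static boolean longMatchesDoubleRange(long indexed, double min, double max) {
        return inRange(encodeLong(indexed), encodeDouble(min), encodeDouble(max));
    }

    // The suggested fix: convert the long to a double before indexing.
    static boolean longAsDoubleMatches(long indexed, double min, double max) {
        return inRange(encodeDouble((double) indexed), encodeDouble(min), encodeDouble(max));
    }

    public static void main(String[] args) {
        long[] samples = {-1L, 0L, 1L, 2L, Long.MAX_VALUE / 2};
        for (long v : samples) {
            System.out.println(v
                    + " indexed as long matches [-0.5, 1.0]: "
                    + longMatchesDoubleRange(v, -0.5, 1.0)
                    + ", indexed as double: "
                    + longAsDoubleMatches(v, -0.5, 1.0));
        }
    }
}
```

With the range -0.5 to 1.0, the long-encoded values -1, 0, 1 and 2 all fall inside the encoded range, while the same numbers converted to doubles match only where they should (0 and 1). Note that casting a long to a double loses precision for magnitudes above 2^53, so separate fields may be the safer choice if you index very large longs.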
