Problem with a counter in an XQuery collection



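I am learning XQuery and came across the following question from 5 years ago:

How to write an XQuery FLWOR expression to calculate the probability between words?

I am learning how to adapt it to collection(), and modified the code as follows: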

<table border='1'>
<tr><td>Target</td><td>Successor</td><td>Frequency</td><td>Probability</td></tr>
{
let $target := "we"
let $occurrences := collection(".?select=*xml")//s//w[lower-case(normalize-space())=$target]
for $successor in distinct-values($occurrences/following-sibling::w[1])
let $frequency := $occurrences/following-sibling::w[1][. = $successor]
let $probability := count($frequency) div count(collection(".?select=*xml")//s//w[lower-case(normalize-space()) = lower-case(normalize-space($successor))])
order by count($frequency) descending
return <tr>
<td>{$target}</td>
<td>{$successor}</td>
<td>{count($frequency)}</td>
<td>{$probability}</td>
</tr>
}
</table>

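It works well, but it is very inefficient and takes a long time! I know this is because of the second collection() call. My question is: can you advise me how to rewrite the part that uses the second collection() call? The goal is to collect, without a second collection() call, all words that have the same value as $occurrences/following-sibling::w[1] (not necessarily words that come after $target; I need every word with the same value as that sibling, so that its count can be used as the divisor).

In an XML database such as BaseX you should try to set up an index on w; beyond that, grouping may generally help: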

<table border='1'>
<tr><td>Target</td><td>Successor</td><td>Frequency</td><td>Probability</td></tr>
{
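let $target := "we"
let $occurrences := collection(".?select=*xml")//s//w[lower-case(normalize-space())=$target]
(: end the outer FLWOR here: "group by" rebinds every earlier variable of the same FLWOR, such as $target, to one value per grouped tuple :)
return
for $successor in $occurrences/following-sibling::w[1]
group by $w := lower-case(normalize-space($successor))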
let $frequency := count($successor)
let $probability := $frequency div count(collection(".?select=*xml")//s//w[lower-case(normalize-space()) = $w])
order by $frequency descending
return <tr>
<td>{$target}</td>
<td>{$w}</td>
<td>{$frequency}</td>
<td>{$probability}</td>
</tr>
}
</table>

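Depending on the implementation of collection() and/or the optimizer, pulling let $words := collection(".?select=*xml")//s//w into a variable and using that variable from then on may or may not help: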

<table border='1'>
<tr><td>Target</td><td>Successor</td><td>Frequency</td><td>Probability</td></tr>
{
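let $target := "we",
$words := collection(".?select=*xml")//s//w
let $occurrences := $words[lower-case(normalize-space())=$target]
(: end the outer FLWOR here: otherwise "group by" would repeat $words once per grouped tuple and inflate the count below :)
return
for $successor in $occurrences/following-sibling::w[1]
group by $w := lower-case(normalize-space($successor))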
let $frequency := count($successor)
let $probability := $frequency div count($words[lower-case(normalize-space()) = $w])
order by $frequency descending
return <tr>
<td>{$target}</td>
<td>{$w}</td>
<td>{$frequency}</td>
<td>{$probability}</td>
</tr>
}
</table>

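I don't know the size of the collection or how many words you have, but when you want to find and count certain words, it may also help to count all words in a map first and then look them up in that map, for example: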

<table border='1'>
<tr><td>Target</td><td>Successor</td><td>Frequency</td><td>Probability</td></tr>
{
let $target := "we",
$words := collection(".?select=*xml")//s//w,
$word-map := map:merge(for $word in $words group by $key := lower-case(normalize-space($word)) return map { $key : count($word) })
let $occurrences := $words[lower-case(normalize-space())=$target]
(: end the outer FLWOR here: otherwise "group by" would turn $word-map into a sequence of maps, one per grouped tuple :)
return
for $successor in $occurrences/following-sibling::w[1]
group by $w := lower-case(normalize-space($successor))
let $frequency := count($successor)
let $probability := $frequency div $word-map($w)
order by $frequency descending
return <tr>
<td>{$target}</td>
<td>{$w}</td>
<td>{$frequency}</td>
<td>{$probability}</td>
</tr>
}
</table>
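
To try the map-based idea without a collection at hand, here is a minimal self-contained sketch. The inline $sample element and the row element are made up for illustration and stand in for collection(".?select=*xml") and the table rows; the s/w structure is only assumed to match your data.

```xquery
xquery version "3.1";

(: Hypothetical sample data standing in for the real collection :)
declare variable $sample :=
  <text>
    <s><w>We</w><w>are</w><w>here</w></s>
    <s><w>we</w><w>are</w><w>late</w></s>
    <s><w>They</w><w>are</w><w>we</w><w>know</w></s>
  </text>;

let $target := "we",
    $words := $sample//s//w,
    (: one pass over all words: normalized word -> total number of occurrences :)
    $word-map := map:merge(
      for $word in $words
      group by $key := lower-case(normalize-space($word))
      return map { $key : count($word) }
    )
(: the constants above live in an outer FLWOR so that the
   "group by" below cannot rebind them to per-group sequences :)
return
  for $successor in $words[lower-case(normalize-space()) = $target]/following-sibling::w[1]
  group by $w := lower-case(normalize-space($successor))
  let $frequency := count($successor)
  order by $frequency descending, $w
  return
    <row successor="{$w}"
         frequency="{$frequency}"
         total="{$word-map($w)}"
         probability="{$frequency div $word-map($w)}"/>
```

In this sample "are" follows "we" twice and occurs three times overall, so its probability is 2 div 3, while "know" follows "we" once and occurs once. The map is built once, so each lookup $word-map($w) replaces a full scan of all w elements.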

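It would also be interesting to see whether the window clause of XQuery 3.1 helps, but I would need to know the structure of the XML first.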
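
As a rough sketch of what a windowing variant might look like, assuming (and this is only an assumption about your data) that the w elements are direct children of s: a sliding window of two adjacent words produces every word pair of a sentence, and the pairs that start with the target are then grouped by their second word. The inline $sample and the pair element are again made up for illustration.

```xquery
xquery version "3.1";

(: Hypothetical sample data; assumes w elements are direct children of s :)
declare variable $sample :=
  <text>
    <s><w>We</w><w>are</w><w>here</w></s>
    <s><w>we</w><w>are</w><w>late</w></s>
    <s><w>They</w><w>are</w><w>we</w><w>know</w></s>
  </text>;

let $target := "we"
return
  for $s in $sample//s
  (: every window starts at some word and ends at the next one;
     "only end" drops the incomplete window at the last word of a sentence :)
  for sliding window $pair in $s/w
    start at $i when true()
    only end at $j when $j eq $i + 1
  let $first := lower-case(normalize-space($pair[1]))
  where $first eq $target
  group by $second := lower-case(normalize-space($pair[2]))
  (: after grouping, $first holds one entry per matching pair :)
  order by count($first) descending, $second
  return <pair target="{$target}" successor="{$second}" frequency="{count($first)}"/>
```

This only yields the frequencies; the probability would still need the total count per word, for instance from a map like the one above. Whether it is faster than following-sibling::w[1] depends on the processor, so it is worth measuring on your own collection.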
