类似查询计数不匹配



如果记录总数为x;像";查询为y,则计数为"0";不像";查询应该是x-y

我得到了";x〃;作为"0"的计数;不像";查询如下:

SELECT COUNT(DISTINCT(b.word)) 
FROM "hunspell"."oscar2_sorted" AS b

总计数:9597651

SELECT COUNT(distinct(b.word))
FROM "hunspell"."oscar2_sorted" as b 
INNER JOIN invalidswar AS a 
ON b.word LIKE (CONCAT('%', a.word,'%'))

点赞次数:73116

SELECT COUNT(distinct(b.word)) 
FROM "hunspell"."oscar2_sorted" AS b 
INNER JOIN invalidswar AS a
ON b.word NOT LIKE (CONCAT('%', a.word,'%'))

不相似计数:9597651

预期:9524535

我不确定我遗漏了什么。


更新:

左联接计数接近预期值,但看起来仍然不正确。

SELECT COUNT(DISTINCT(b.word))
FROM "hunspell"."oscar2_sorted" AS b 
LEFT JOIN (SELECT DISTINCT(b.word) AS dword 
FROM "hunspell"."oscar2_sorted" AS b 
INNER JOIN invalidswar AS a 
ON b.word LIKE (CONCAT('%', a.word,'%'))) AS d 
ON d.dword = b.word 
WHERE d.dword IS NULL

左联接计数:9536539


更新2:

12004的差异可以追溯到类似regexp_like执行方式的差异。

SELECT count(distinct(b.word)) 
FROM "hunspell"."oscar2_sorted" as b
INNER JOIN invalidswar AS a 
ON regexp_like(b.word, a.word)

类似regex的计数:61112

SELECT COUNT(word)
FROM (SELECT word 
FROM "hunspell"."oscar2_sorted"
EXCEPT DISTINCT
(SELECT b.word
FROM "hunspell"."oscar2_sorted" as b
INNER JOIN invalidswar AS a 
ON regexp_like(b.word, a.word)))

请参阅:EXCEPT子句和regexp_like

WITH
invalid_check AS
(
SELECT
o.word,
CASE WHEN
EXISTS (
SELECT *
FROM invalidswar AS i
WHERE o.word LIKE CONCAT('%', i.word,'%')
)
THEN
1
ELSE
0
END
AS is_invalid
FROM
"hunspell"."oscar2_sorted"   AS o
GROUP BY
o.word
)
SELECT
COUNT(*)            AS all_words,
SUM(is_invalid)     AS invalid_words,
SUM(1-is_invalid)   AS valid_words
FROM
valid_check

假设以上内容符合您的预期,那么只计算有效单词就可以…

SELECT
COUNT(DISTINCT o.word)
FROM
"hunspell"."oscar2_sorted"   AS o
WHERE
NOT EXISTS (
SELECT *
FROM invalidswar AS i
WHERE o.word LIKE CONCAT('%', i.word,'%')
)

相关内容

  • 没有找到相关文章

最新更新