SQL Server:仅包含基于类型的一定百分比的行



我有一个输入的问题表:Select OneTrue FalseSelect All 我正在尝试编写一个查询,该查询将返回一定数量的问题,同时将其中一种类型限制为返回类型的特定百分比。

例:

SELECT QuestionID, QuestionText, QuestionType
FROM Question
WHERE/HAVING --limit QuestionTypeID = 4 to be only 10% of total returned

我最初的想法是单独提取查询

SELECT TOP 10 PERCENT QuestionID, QuestionText, QuestionType
FROM Question
WHERE QuestionTypeID <> 4
UNION ALL
SELECT QuestionID, QuestionText, QuestionType
FROM Question
WHERE QuestionTypeID = 4

然而,似乎有一种更有效的方法来计算这一点。


编辑:

让我澄清一下我想要完成什么...我需要返回 90% 的其他类型的问题的混合,只有 10% 的类型 4。

我需要随机化结果,所以我一直在使用SET ROWCOUNTORDER BY NEWID()

我可以使用子查询,但不确定在子查询中使用SET ROWCOUNT的方式......在这一点上,似乎需要临时表,但如果有更好的方法,请告诉我......

到目前为止我有什么...带临时表

DECLARE @ReturnPercent [int] --variable that holds percentage for bottom questions
DECLARE @ReturnCount [int] --variable that holds how many questions I'd like returned
CREATE TABLE #Temp1(
        QuestionID [int], 
        QuestionText [nvarchar](256), 
        QuestionTypeID [int]
);
DECLARE @TOP [int] = @ReturnCount-CAST(@ReturnCount*@ReturnPercent AS INT);
DECLARE @BOTTOM [int] = CAST(@ReturnCount(@ReturnPercent AS INT);
SET ROWCOUNT @TOP
INSERT INTO #Temp1(QuestionID, QuestionText, QuestionTypeID)
SELECT QuestionID, QuestionText, QuestionTypeID
FROM Question
WHERE QuestionTypeID <> 4
ORDER BY NEWID()
SET ROWCOUNT 0
SET ROWCOUNT @BOTTOM
INSERT INTO #Temp1(QuestionID, QuestionText, QuestionTypeID)
SELECT QuestionID, QuestionText, QuestionTypeID
FROM Question
WHERE QuestionTypeID = 4
ORDER BY NEWID()
SET ROWCOUNT 0
--Query to join them with other data(omitted) 
SET ROWCOUNT @ReturnCount
SELECT a.QuestionID, a.QuestionText, a.QuestionTypeID
FROM #Temp1 a
JOIN --OTHER TABLES FOR FULL QUERY
ORDER BY NEWID()
SET ROWCOUNT 0

联合很可能是最有效的方法。

有一种有效的方法可以将行号分配给所有行:

SELECT row_number() over (order by (select NULL)) as seqnum

但是,如果您添加分区子句,我认为这效率不高。 如果这是有效的,并且您在问题 ID 上有一个索引,您可以使用:

select q.QuestionID, q.QuestionText, q.QuestionType
from (select QuestionID, QuestionText, QuestionType,
             row_number() over (partition by QuestionId order by (select NULL)) as seqnum
      from Question
     ) q join
     (select QuestionId, count(*) as cnt
      From Question
      group by Questionid
     ) qsum
     q.questionid = qsum.questionid
where questionid <> 4 or seqnum*1.0/cnt <= 0.1

(这个想法是,分组依据将满足索引,连接将非常便宜,并且 row_number() 高效。

但是,我认为,工会可能更快。

嗯,现在我想起来了,你可以用一个近似值走得很远:

select q.QuestionID, q.QuestionText, q.QuestionType
from (select QuestionID, QuestionText, QuestionType,
             row_number() over (order by (select NULL)) as seqnum
      from Question
     ) q
where mod(seqnum, 10) <> 0 and questionid <> 4 or
      mod(seqnum, 10) = 0 and questionid = 4

这占用了 questionid <> 4 的大约 90% 的行和 questionid = 4 的大约 10%。 这应该非常有效,因为此措辞中的 row_number() 分配一个没有排序或分区的序列号。

最新更新