I built a stored procedure that identifies duplicate rows in a table and displays them in a meaningful order. It looks like this:
CREATE PROCEDURE [dbo].[spFindDuplicates]
@tableName nvarchar(255),
@field1 nvarchar(255),
@field2 nvarchar(255) = '1',
@field3 nvarchar(255) = '2',
@field4 nvarchar(255) = '3',
@field5 nvarchar(255) = '4'
AS
BEGIN
DECLARE @query AS nvarchar(MAX);
SET @query = '
SELECT *
FROM ' + @tableName + '
WHERE CAST(' + @field1 + ' AS nvarchar(255)) + CAST(' + @field2 + ' AS nvarchar(255)) + CAST(' + @field3 + ' AS nvarchar(255)) + CAST(' + @field4 + ' AS nvarchar(255)) + CAST(' + @field5 + ' AS nvarchar(255))
IN
(
SELECT CAST(' + @field1 + ' AS nvarchar(255)) + CAST(' + @field2 + ' AS nvarchar(255)) + CAST(' + @field3 + ' AS nvarchar(255)) + CAST(' + @field4 + ' AS nvarchar(255)) + CAST(' + @field5 + ' AS nvarchar(255))
FROM ' + @tableName + '
GROUP BY CAST(' + @field1 + ' AS nvarchar(255)) + CAST(' + @field2 + ' AS nvarchar(255)) + CAST(' + @field3 + ' AS nvarchar(255)) + CAST(' + @field4 + ' AS nvarchar(255)) + CAST(' + @field5 + ' AS nvarchar(255))
HAVING COUNT(*) > 1
)
ORDER BY ' + @field1 + ', ' + @field2 + ', ' + @field3 + ', ' + @field4 + ', ' + @field5
EXECUTE(@query);
END
GO
--Example:
EXEC spFindDuplicates @tableName = 'someRandomTable', @field1 = 'firstField', @field2 = 'secondField', @field3 = 'thirdField'
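As you can see, I can concatenate up to 5 different fields to build a key that determines whether a row is a duplicate. Note that I use the CAST function so that fields of various data types (varchar, int, date, etc.) can be concatenated.
When I execute the stored procedure above with 5 different fields, it works fine. But I'd like to be able to run it with a variable number of fields (from 1 to 5), which is why I gave @field2 through @field5 default values.
However, when I execute it with the example above (3 fields supplied), I get the following error message:
A column has been specified more than once in the order by list. Columns in the order by list must be unique.
Question: How can I keep ordering the result set without getting this error?
Bonus question: If you find a dynamic way to make the stored procedure work with any number of fields (4, 17, or whatever), that would be even more useful to me.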
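As I said in the comments, injection is a huge problem and you need to take it into account. Saying "let's assume I don't mind injection" is naive, and that attitude needs to change. Always make your SQL safe; then there is no excuse, and no opportunity, for your application to be compromised.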
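To make the injection point concrete, here is a minimal sketch of what QUOTENAME does to a hostile identifier. The variable name and the payload are made up for illustration; the only thing assumed is that identifiers are passed through QUOTENAME before being concatenated into dynamic SQL.

```sql
-- Hypothetical malicious "column name" supplied by a caller.
DECLARE @field sysname = N'x]; DROP TABLE dbo.Users; --';

-- Raw concatenation would splice the payload straight into the statement text.
SELECT N'ORDER BY ' + @field AS Unsafe;

-- QUOTENAME wraps the value in brackets and doubles any closing bracket,
-- so the whole payload is treated as one (non-existent) identifier.
SELECT N'ORDER BY ' + QUOTENAME(@field) AS Safe;
-- Safe: ORDER BY [x]]; DROP TABLE dbo.Users; --]
```

With the quoted form the worst outcome is an "invalid column name" error rather than an executed DROP.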
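As for what you're after, I suspect this achieves the goal. There's no need for a subquery with IN that scans your table a second time. Instead, you can use COUNT with the OVER clause inside a CTE.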
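As a static illustration of the idea, before any dynamic SQL is involved, the following sketch counts the rows in each partition and keeps only partitions with more than one row. The table variable, its columns, and the sample data are invented for the example.

```sql
-- Hypothetical sample data: rows 1 and 2 share the same (a, b) pair.
DECLARE @t TABLE (id int, a varchar(10), b int);
INSERT INTO @t (id, a, b)
VALUES (1, 'x', 1), (2, 'x', 1), (3, 'y', 2), (4, 'x', 2);

WITH CTE AS (
    SELECT *,
           -- Number of rows sharing this row's (a, b) combination.
           COUNT(*) OVER (PARTITION BY a, b) AS DuplicateCount
    FROM @t
)
SELECT *
FROM CTE
WHERE DuplicateCount > 1
ORDER BY a, b;
-- Returns the rows with id 1 and 2, each with DuplicateCount = 2.
```

Because the count is computed in the same pass as the row itself, no concatenated key and no second scan through an IN subquery are needed.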
CREATE PROCEDURE [dbo].[FindDuplicates] --I've removed the sp prefix, as sp_ is reserved by MS
@tableName sysname,
@field1 sysname,
@field2 sysname = NULL,
@field3 sysname = NULL,
@field4 sysname = NULL,
@field5 sysname = NULL
AS BEGIN
DECLARE @query AS nvarchar(MAX);
--ROWCOUNT is a reserved keyword, so the count column is named DuplicateCount
SET @query = N'WITH CTE AS(' + NCHAR(10) +
N' SELECT *,' + NCHAR(10) +
N' COUNT(*) OVER (PARTITION BY ' + STUFF(CONCAT(N',' + QUOTENAME(@field1),N',' + QUOTENAME(@field2),N',' + QUOTENAME(@field3),N',' + QUOTENAME(@field4),N',' + QUOTENAME(@field5)),1,1,N'') + N') AS DuplicateCount' + NCHAR(10) +
N' FROM ' + QUOTENAME(@tableName) + N')' + NCHAR(10) +
N'SELECT *' + NCHAR(10) +
N'FROM CTE' + NCHAR(10) +
N'WHERE DuplicateCount > 1' + NCHAR(10) +
N'ORDER BY ' + STUFF(CONCAT(N',' + QUOTENAME(@field1),N',' + QUOTENAME(@field2),N',' + QUOTENAME(@field3),N',' + QUOTENAME(@field4),N',' + QUOTENAME(@field5)),1,1,N'') + N';';
PRINT @query;
--EXEC sys.sp_executesql @query; --Uncomment to run the actual query
END
GO
For the command you gave us, EXEC dbo.FindDuplicates @tableName = 'someRandomTable', @field1 = 'firstField', @field2 = 'secondField', @field3 = 'thirdField';, this prints the SQL:
WITH CTE AS(
 SELECT *,
 COUNT(*) OVER (PARTITION BY [firstField],[secondField],[thirdField]) AS DuplicateCount
 FROM [someRandomTable])
SELECT *
FROM CTE
WHERE DuplicateCount > 1
ORDER BY [firstField],[secondField],[thirdField];
I believe this gives you the behaviour you're after.
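The reason omitted parameters no longer cause trouble is worth spelling out. In the original procedure the defaults '1' to '4' were concatenated into the ORDER BY as bare integers, and an integer in ORDER BY is read as a column position, which can easily point at a column that is already listed. With NULL defaults, N',' + QUOTENAME(NULL) evaluates to NULL and CONCAT treats NULL as an empty string, so the missing fields simply vanish from the list. A small sketch, with variable values chosen for illustration and assuming the default CONCAT_NULL_YIELDS_NULL setting (ON):

```sql
DECLARE @field1 sysname = N'firstField',
        @field2 sysname = NULL,           -- not supplied
        @field3 sysname = N'thirdField';

SELECT STUFF(
           CONCAT(N',' + QUOTENAME(@field1),
                  N',' + QUOTENAME(@field2),   -- NULL, contributes nothing
                  N',' + QUOTENAME(@field3)),
           1, 1, N'') AS ColumnList;           -- STUFF strips the leading comma
-- ColumnList: [firstField],[thirdField]
```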
Edited the code to validate the column list against sys.columns, so that only columns that actually exist on the table are used.
CREATE FUNCTION dbo.fn_SplitString
(
@List NVARCHAR(MAX),
@Delimiter NVARCHAR(255)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
SELECT Item = y.i.value('(./text())[1]', 'nvarchar(4000)')
FROM
(
SELECT x = CONVERT(XML, '<i>'
+ REPLACE(@List, @Delimiter, '</i><i>')
+ '</i>').query('.')
) AS a CROSS APPLY x.nodes('i') AS y(i)
);
GO
ALTER PROCEDURE [dbo].[spFindDuplicates]
@tableName nvarchar(255),
@columnlist nvarchar(max)
AS
BEGIN
DECLARE @query AS nvarchar(MAX);
SET @columnlist = (SELECT STUFF((SELECT ',' + QUOTENAME([name])
FROM SYS.columns
WHERE object_id = object_id(@tableName)
AND [Name] IN
(
SELECT Item
FROM dbo.fn_SplitString(@columnlist,',')
)
FOR XML PATH('')
)
,1,1,''))
PRINT @columnlist
SET @query = 'SELECT * FROM (SELECT '+CAST(@columnlist AS NVARCHAR(MAX))+'
FROM '+CAST(@tableName AS nvarchar(MAX))+'
GROUP BY '+CAST(@columnlist AS NVARCHAR(MAX))+'
HAVING COUNT(*) > 1)Res1
ORDER BY '+@columnlist
EXEC SP_EXECUTESQL @query;
END
GO
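For the bonus question, the two approaches above can be combined: validate a comma-separated list against sys.columns, then feed the quoted names into a windowed COUNT so that the full duplicate rows come back. The following is only a sketch under stated assumptions: the procedure name dbo.FindDuplicatesByList and its parameter names are hypothetical, and it relies on STRING_SPLIT and STRING_AGG, so it needs SQL Server 2017 or later.

```sql
-- Hypothetical procedure; not part of either answer above.
CREATE OR ALTER PROCEDURE dbo.FindDuplicatesByList
    @tableName  nvarchar(517),   -- may be schema-qualified, e.g. N'dbo.someRandomTable'
    @columnList nvarchar(MAX)    -- e.g. N'firstField,secondField,thirdField'
AS
BEGIN
    -- Resolve the name to a user table; anything else is rejected.
    DECLARE @objectId int = OBJECT_ID(@tableName, N'U');
    IF @objectId IS NULL
    BEGIN
        RAISERROR(N'Table not found.', 16, 1);
        RETURN;
    END;

    -- Keep only names that exist on the table, quoted, in column order.
    -- EXISTS (rather than a join) means a name listed twice is used once.
    DECLARE @cols nvarchar(MAX);
    SELECT @cols = STRING_AGG(CAST(QUOTENAME(c.name) AS nvarchar(MAX)), N',')
                   WITHIN GROUP (ORDER BY c.column_id)
    FROM sys.columns AS c
    WHERE c.object_id = @objectId
      AND EXISTS (SELECT 1
                  FROM STRING_SPLIT(@columnList, N',') AS s
                  WHERE LTRIM(RTRIM(s.value)) = c.name);

    IF @cols IS NULL
    BEGIN
        RAISERROR(N'None of the supplied columns exist on the table.', 16, 1);
        RETURN;
    END;

    DECLARE @query nvarchar(MAX) =
        N'WITH CTE AS (SELECT *, COUNT(*) OVER (PARTITION BY ' + @cols
        + N') AS DuplicateCount FROM '
        + QUOTENAME(OBJECT_SCHEMA_NAME(@objectId)) + N'.'
        + QUOTENAME(OBJECT_NAME(@objectId))
        + N') SELECT * FROM CTE WHERE DuplicateCount > 1 ORDER BY ' + @cols + N';';

    EXEC sys.sp_executesql @query;
END;
GO
-- Usage with the table and fields from the question:
EXEC dbo.FindDuplicatesByList
     @tableName  = N'dbo.someRandomTable',
     @columnList = N'firstField,secondField,thirdField';
```

One design consequence to be aware of: the result is ordered by the table's column order, not by the order in which the names were passed, because the list is rebuilt from sys.columns.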