尝试在表上进行一些数据探索,以单独计算行中所有列和所有唯一字符串,如以下输出:
+------------+-------------+------------+----------+
| table_name | column_name | distinct | count(*) |
| | | row_string | |
+------------+-------------+------------+----------+
| customer | state | WA | 15 |
+------------+-------------+------------+----------+
| customer | state | NSW | 786 |
+------------+-------------+------------+----------+
| customer | state | SA | 51 |
+------------+-------------+------------+----------+
| ... | ... | ... | ... |
+------------+-------------+------------+----------+
| customer | zip_code | 3563 | 33 |
+------------+-------------+------------+----------+
| customer | zip_code | 7583 | 52 |
+------------+-------------+------------+----------+
| customer | zip_code | 3453 | 553 |
+------------+-------------+------------+----------+
| customer | zip_code | 2132 | 211 |
+------------+-------------+------------+----------+
| ... | ... | ... | ... |
+------------+-------------+------------+----------+
我一直在做这样的事情:
select state, count(*)
from customer
group by state
union
select zip_code, count(*)
from customer
group by zip_code
union
...
但是,如果您的表中有成堆的列,那么这是不有效的。有没有更有效的方法来实现这一点?
(下面是在我看到Martin Smith在上面评论中发布的链接之前发布的。使用XML的方法看起来更简单,并且可能具有大致相似的性能。)
您可以尝试生成如下动态SQL:
select
cast(N'customer' AS sysname) AS table_name,
C.column_name,
C.value,
count(*) AS [count]
from [customer] T
cross apply (
values
(cast(N'state' AS sysname), cast(T.[state] AS nvachar(max)),
(cast(N'zip_code' AS sysname), cast(T.[zip_code] AS nvachar(max))
) C(column_name, value)
group by column_name, value
CAST()
操作可能看起来有点过分,但是如果较短的数据后面跟着较长的数据,它们可能是必要的,以防止截断。对于某些数据类型,如DATETIME或FLOAT,您可能希望在格式化中更精确,甚至可能有意限制多余的精度。
当向生成的SQL中注入表名和列名时,使用QUOTENAME(...)
和QUOTENAME(..., '''')
安全地[quote]或'quote'注入的值