sqlserver-使用筛选统计信息的情况

我在下面的链接中查看了过滤后的统计数据。

http://blogs.msdn.com/b/psssql/archive/2010/09/28/case-of-using-filtered-statistics.aspx

数据严重倾斜，一个区域有0行，其余都来自不同的区域。以下是完整的代码来重现问题

create table Region(id int, name nvarchar(100)) 
go 
create table Sales(id int, detail int) 
go 
create clustered index d1 on Region(id) 
go 
create index ix_Region_name on Region(name) 
go 
create statistics ix_Region_id_name on Region(id, name) 
go 
create clustered index ix_Sales_id_detail on Sales(id, detail) 
go
-- only two values in this table as lookup or dim table 
insert Region values(0, 'Dallas') 
insert Region values(1, 'New York') 
go
set nocount on 
-- Sales is skewed 
insert Sales values(0, 0) 
declare @i int 
set @i = 1 
while @i <= 1000 begin 
insert Sales  values (1, @i) 
set @i = @i + 1 
end 
go
update statistics Region with fullscan 
update statistics Sales with fullscan 
go
set statistics profile on 
go 
--note that this query will over estimate 
-- it estimate there will be 500.5 rows 
select detail from Region join Sales on Region.id = Sales.id where name='Dallas' option (recompile) 
--this query will under estimate 
-- this query will also estimate 500.5 rows in fact 1000 rows returned 
select detail from Region join Sales on Region.id = Sales.id where name='New York' option (recompile) 
go
set statistics profile off 
go
create statistics Region_stats_id on Region (id) 
where name = 'Dallas' 
go 
create statistics  Region_stats_id2 on Region (id) 
where name = 'New York' 
go
set statistics profile on 
go 
--now the estimate becomes accurate (1 row) because 
select detail from Region join Sales on Region.id = Sales.id where name='Dallas' option (recompile)
--the estimate becomes accurate (1000 rows) because stats Region_stats_id2 is used to evaluate 
select detail from Region join Sales on Region.id = Sales.id where name='New York' option (recompile) 
go
set statistics profile off

我的问题是，我们在两张表上都有以下统计数据

sp_helpstats 'region','all'
sp_helpstats 'sales','all'

表格区域：

statistics_name   statistics_keys
d1                    id
ix_Region_id_name     id, name
ix_Region_name        name

餐桌销售：

statistics_name    statistics_keys
ix_Sales_id_detail     id, detail

1.为什么以下查询的估计出错

select detail from Region join Sales on Region.id = Sales.id where name='Dallas' option (recompile)
--the estimate becomes accurate (1000 rows) because stats Region_stats_id2 is used to evaluate 
select detail from Region join Sales on Region.id = Sales.id where name='New York' option (recompile)

2.当我按照作者的要求创建过滤统计数据时，我可以正确地看到估计，但为什么我们需要创建过滤统计信息，我怎么能说我的查询需要过滤统计信息呢？因为即使我创建了简单的统计数据，我也得到了相同的结果。

迄今为止我遇到的最好的1.Kimberely tripp偏斜统计视频
2.Technet统计白皮书

但仍然无法理解为什么过滤后的统计数据会在这里产生影响

提前谢谢。更新：7/4

在martin和james回答后重新表述问题：

1.有什么方法可以避免数据倾斜
除了kimberely脚本，还有一种估计方法是计算一个值的行数。

2.在你的经历中，你是否遇到过数据偏倚的问题。我想这取决于大桌子。但我正在寻找一些详细的答案

3.我们必须承担sql扫描表的IO成本，有时还会为触发更新统计时出现的查询带来一些阻塞。在维护统计时，你看到了除此之外的其他开销吗。

原因是我也在考虑基于DTA输入的几个条件来创建归档的统计数据。

再次感谢

我认为这就是发生这种情况的原因。您得到了相同的估计（500.5）行，因为SQL Server没有统计信息来判断哪些ID是与哪个区域相关的ID。统计信息ix_Region\id_name同时具有这两个字段，但由于直方图只存在于第一列，因此它确实无助于估计Sales表中的行数。

如果运行dbcc show_statistics ('Region','ix_Region_id_name')，结果将是：

RANGE_HI_KEY   RANGE_ROWS   EQ_ROWS   DISTINCT_RANGE_ROWS   AVG_RANGE_ROWS
0              0            1         0                     1
1              0            1         0                     1

因此，这告诉每个ID有一行，但没有到名称的链接。

但当您创建统计数据Region_stats_id（用于达拉斯）时，dbcc show_statistics ('Region','Region_stats_id')将显示：

RANGE_HI_KEY   RANGE_ROWS   EQ_ROWS   DISTINCT_RANGE_ROWS   AVG_RANGE_ROWS
0              0            1         0                     1

因此，SQL Server知道只有1行，并且它的ID为0。

类似地，Region_stats_id2:

RANGE_HI_KEY   RANGE_ROWS   EQ_ROWS   DISTINCT_RANGE_ROWS   AVG_RANGE_ROWS
1              0            1         0                     1

销售额中的行数在ix_sales_id_detail中，这将有助于确定每个id:的行数

RANGE_HI_KEY   RANGE_ROWS   EQ_ROWS   DISTINCT_RANGE_ROWS   AVG_RANGE_ROWS
0              0            1         0                     1
1              0            1000      0                     1

信息：这是@MartijnPieters删除的答案的副本，因为这是我想要回答的问题——我似乎对删除的答案无能为力。从今天起，我不小心把这篇文章第一次写到了TheGameiswar的另一个统计问题上，但我已经删除了自己。

相关内容

最新更新

热门标签：