What is a bad number for the skew factor in Teradata?



I determine the skew factor this way:

SELECT 
TABLENAME,
SUM(CURRENTPERM) /(1024*1024) AS CURRENTPERM, 
(100 - (AVG(CURRENTPERM)/MAX(CURRENTPERM)*100)) AS SKEWFACTOR 
FROM 
DBC.TABLESIZE 
WHERE DATABASENAME= <DATABASENAME> 
AND 
TABLENAME =<TABLENAME>  
GROUP BY 1;

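For some tables of about 600 GB, the skew factor is 30%. For tables of about 10 GB, it is as high as 98%. How bad are these numbers? Is there an official article saying that anything above 10% should be redistributed? I need it to justify the requirement to the data mart developers. I have only found this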

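A skew factor of 30 means the fullest AMP holds about 40% more data than the average AMP. That may still be acceptable (it depends, of course); discuss with your DBAs what they usually consider too large.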

On the other hand, 98 means the fullest AMP holds 40 to 50 times more data than the average, and that is a lot.
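The arithmetic behind these two statements follows from the WinDDI-style formula used in the question, skew = 100 - AVG/MAX * 100, which rearranges to MAX/AVG = 100 / (100 - skew). A minimal sketch of that conversion, with a hypothetical helper name `max_to_avg_ratio` that is not part of any Teradata tooling:

```python
def max_to_avg_ratio(skew_percent: float) -> float:
    """Convert a WinDDI-style skew factor (0 <= skew < 100) into MAX/AVG.

    Derived from: skew = 100 - (avg / max) * 100
    """
    if not 0 <= skew_percent < 100:
        raise ValueError("skew_percent must be in the range [0, 100)")
    return 100.0 / (100.0 - skew_percent)


if __name__ == "__main__":
    for skew in (10, 30, 97.5, 98):
        ratio = max_to_avg_ratio(skew)
        print(f"skew {skew}% -> fullest AMP holds {ratio:.2f}x the average")
```

A skew of 30 gives a ratio of about 1.43, which matches the "about 40% more" figure. A skew of 98 gives exactly 50, and 97.5 gives 40, so the "40 to 50 times" range is presumably what a value rounded to 98 can represent.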

This query compares the two ways of calculating skew:

SELECT
   t.DatabaseName
   ,t.TableName
   -- currently used diskspace in GB
   ,SUM(t.CurrentPerm) / 1024**3 (DEC(9,2)) AS CurrentPermGB
   -- currently needed diskspace in GB to store this table as standalone (due to Skew)
   ,MAX(t.CurrentPerm) / 1024**3 * (HASHAMP() + 1) (DEC(9,2)) AS SkewedPermGB
   ,SkewedPermGB - CurrentPermGB  AS WastedPermGB
   -- AMP with highest disk usage
   ,MAX(t.MaxPermAMP) AS SkewedAMP
   -- skew factor, 1 = even distribution, 1.1 = max AMP needs 10% more space than the average AMP
   ,MAX(t.CurrentPerm) / NULLIF(AVG(t.CurrentPerm),0) (DEC(5,2)) AS SkewFactor
   -- skew factor, between 0 and 99.  Same calculation as WinDDI/ TD Administrator
   ,(100 - (AVG(t.CurrentPerm) / NULLIF(MAX(t.CurrentPerm),0) * 100)) (DEC(3,0)) AS SkewFactor_WINDDI
FROM
 (
   SELECT
      DatabaseName,
      TableName,
      CurrentPerm,
      CASE WHEN CurrentPerm = MAX(CurrentPerm) OVER (PARTITION BY DatabaseName, TableName) THEN vproc END AS MaxPermAMP
   FROM dbc.TableSizeV
   WHERE DatabaseName = '???' --
) AS t
GROUP BY 1,2
HAVING SkewFactor > 1.1 -- or whatever
   AND SkewedPermGB > 10 -- or whatever
ORDER BY WastedPermGB DESC
;

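There is no magic number, but a table with 98% skew has its data heavily concentrated on a single AMP, which means (1) you lose the performance benefit of a parallel database and (2) you create an unbalanced load on the system.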
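How much of the table actually sits on the fullest AMP also depends on the number of AMPs, which the skew factor alone does not reveal. A rough sketch, assuming the table has one size row per AMP so that the average AMP holds 1/N of the table; the function name `max_amp_share` and the AMP counts are hypothetical, chosen only for illustration:

```python
def max_amp_share(skew_percent: float, amp_count: int) -> float:
    """Estimate the fraction of a table stored on its fullest AMP.

    Assumes the average AMP holds 1 / amp_count of the table and that
    skew_percent follows: skew = 100 - (avg / max) * 100.
    """
    if not 0 <= skew_percent < 100:
        raise ValueError("skew_percent must be in the range [0, 100)")
    if amp_count < 1:
        raise ValueError("amp_count must be at least 1")
    ratio = 100.0 / (100.0 - skew_percent)  # MAX / AVG
    if ratio > amp_count:
        # The maximum cannot exceed the total, so MAX/AVG <= amp_count.
        raise ValueError("skew_percent is not reachable with this amp_count")
    return ratio / amp_count


if __name__ == "__main__":
    for amps in (50, 100, 1000):
        share = max_amp_share(98, amps)
        print(f"{amps} AMPs, skew 98% -> fullest AMP holds {share:.1%} of the table")
```

Under these assumptions, 98% skew on a 50-AMP system means the whole table sits on one AMP, while on a larger system the same skew factor corresponds to a smaller share that is still 50 times the average.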
