Subset a table so that the sample is normally distributed around a value in a column



I have the following table, called original_table:

name      height      age
personA   180         21
personB   190         37
personC   168         27
personD   182         56
...

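My goal is to create two random samples of size 100 from original_table such that age is normally distributed around a mean of 25 and height around a mean of 175. In other words, a 25-year-old with a height of 175 should be the most likely person to be selected, but that is not guaranteed.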
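One way to get this "most likely, but not guaranteed" behavior directly on the rows of original_table is weighted sampling without replacement, where each row's weight is a Gaussian kernel centred on age 25 and height 175. The Java sketch below is an alternative technique, not the approach used in the answer further down. It assumes the table has been loaded into memory; the class WeightedSample, the Person type and the spreads sdAge/sdHeight are hypothetical. It uses exponential keys (the Efraimidis–Spirakis method): each row gets an Exp(weight) key and the k smallest keys win, which is equivalent to drawing rows one at a time with probability proportional to weight.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

final class WeightedSample {

    /** Hypothetical in-memory mirror of one row of original_table. */
    static final class Person {
        final String name;
        final double height;
        final double age;

        Person(String name, double height, double age) {
            this.name = name;
            this.height = height;
            this.age = age;
        }
    }

    /**
     * Draws min(k, rows.size()) distinct rows. A row's weight is
     * exp(-d) with d = ((age - targetAge)/sdAge)^2/2 + ((height - targetHeight)/sdHeight)^2/2,
     * so rows near the targets are the most likely to be picked.
     */
    static List<Person> sample(List<Person> rows, int k,
                               double targetAge, double sdAge,
                               double targetHeight, double sdHeight,
                               Random rnd) {
        List<double[]> keyed = new ArrayList<>(); // each entry: {key, row index}
        for (int i = 0; i < rows.size(); i++) {
            Person p = rows.get(i);
            double za = (p.age - targetAge) / sdAge;
            double zh = (p.height - targetHeight) / sdHeight;
            double d = 0.5 * (za * za + zh * zh);   // d = -log(weight), always finite
            double u = 1.0 - rnd.nextDouble();      // uniform in (0, 1]
            // log of an Exp(weight) variate -ln(u)/weight; working in logs avoids
            // underflow when a row is far from the targets
            double key = Math.log(-Math.log(u)) + d;
            keyed.add(new double[] {key, i});
        }
        keyed.sort(Comparator.comparingDouble(a -> a[0])); // smallest keys first
        List<Person> out = new ArrayList<>();
        for (int j = 0; j < Math.min(k, keyed.size()); j++) {
            out.add(rows.get((int) keyed.get(j)[1]));
        }
        return out;
    }
}
```

For example, `WeightedSample.sample(rows, 100, 25, 8, 175, 10, new Random())` draws one sample of 100 rows. Calling it twice gives two independent samples that may share rows; if they must be disjoint, one option is to draw 200 rows once and split them. The selected ages and heights cluster around 25 and 175, but their exact distribution also depends on which values actually occur in the table.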

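The trick is to take a normal-distribution function from another language and translate it into MySQL syntax. For example, the following Java code can be converted to MySQL: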

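// java.util.Random
// (fields used below, as declared in java.util.Random)
private double nextNextGaussian;
private boolean haveNextNextGaussian = false;

synchronized public double nextGaussian() {
    // See Knuth, ACP, Section 3.4.1 Algorithm C.
    if (haveNextNextGaussian) {
        // return the second value cached by the previous call
        haveNextNextGaussian = false;
        return nextNextGaussian;
    } else {
        double v1, v2, s;
        do {
            v1 = 2 * nextDouble() - 1; // between -1 and 1
            v2 = 2 * nextDouble() - 1; // between -1 and 1
            s = v1 * v1 + v2 * v2;
        } while (s >= 1 || s == 0);   // reject points on/outside the unit circle or at the origin
        double multiplier = StrictMath.sqrt(-2 * StrictMath.log(s) / s);
        nextNextGaussian = v2 * multiplier; // cache the second normal for the next call
        haveNextNextGaussian = true;
        return v1 * multiplier;
    }
}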
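The Java method above is the Marsaglia polar method (Knuth's Algorithm C). It draws a point uniformly in the square [-1, 1)², keeps it only if it lies strictly inside the unit circle and is not the origin, and turns each accepted point into two independent standard normal values. Shifting and scaling then gives any mean and spread; the query below uses μ = 175, σ = 10 for height and μ = 25, σ = 8 for age.

$$
v_1, v_2 \sim \mathcal{U}[-1, 1), \qquad s = v_1^2 + v_2^2, \qquad \text{accept only if } 0 < s < 1
$$
$$
z_1 = v_1 \sqrt{\frac{-2 \ln s}{s}}, \qquad z_2 = v_2 \sqrt{\frac{-2 \ln s}{s}}, \qquad z_1, z_2 \overset{\text{iid}}{\sim} \mathcal{N}(0, 1)
$$
$$
x = \mu + \sigma z \sim \mathcal{N}(\mu, \sigma^2)
$$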
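The same algorithm in MySQL 8.0 (recursive CTEs need 8.0 or later), written with ordinary columns instead of user variables, because assigning @variables inside expressions is deprecated in MySQL 8.0 and their evaluation order is undefined. Note that the query produces target height and age values; it does not pick rows from original_table by itself.

with recursive candidates (n, v1, v2) as (
    -- anchor row: one pair of uniforms in [-1, 1)
    select 1, rand() * 2 - 1, rand() * 2 - 1
    union all
    -- each iteration draws a fresh pair; about pi/4 (~78.5%) of pairs are
    -- accepted, so 300 candidates almost surely yield at least 100
    select n + 1, rand() * 2 - 1, rand() * 2 - 1
    from candidates
    where n < 300
),
accepted as (
    -- same test as the Java do/while loop: keep only 0 < s < 1
    select v1, v2, v1 * v1 + v2 * v2 as s
    from candidates
    where v1 * v1 + v2 * v2 > 0
      and v1 * v1 + v2 * v2 < 1
    limit 100
),
t0 as (
    -- each accepted pair yields two independent standard normals
    select v1 * sqrt(-2 * ln(s) / s) as num1,
           v2 * sqrt(-2 * ln(s) / s) as num2
    from accepted
)
-- height1/age1 share num1 (and height2/age2 share num2),
-- so within each pair of columns height and age are perfectly correlated
select 175 + t1.num1 * 10 as height1,  -- mean 175, sd 10
       175 + t1.num2 * 10 as height2,
       25 + t1.num1 * 8 as age1,       -- mean 25, sd 8
       25 + t1.num2 * 8 as age2
from t0 t1
order by t1.num1;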
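To check the arithmetic of the MySQL translation outside the database, here is a minimal Java sketch of the same pipeline: the polar-method step from nextGaussian as a pure function, rejection of pairs with s >= 1 or s == 0, and the mapping of the final SELECT to 175 + 10z and 25 + 8z. PolarGaussian, transform and heightsAndAges are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class PolarGaussian {

    /**
     * One step of the polar method: returns {z1, z2} for a candidate pair
     * (v1, v2) in [-1, 1), or null when the pair is rejected (s >= 1 or s == 0),
     * mirroring the do/while loop in java.util.Random.nextGaussian().
     */
    static double[] transform(double v1, double v2) {
        double s = v1 * v1 + v2 * v2;
        if (s >= 1 || s == 0) {
            return null;
        }
        double multiplier = Math.sqrt(-2 * Math.log(s) / s);
        return new double[] {v1 * multiplier, v2 * multiplier};
    }

    /** Draws n accepted pairs and maps them like the query's final SELECT. */
    static List<double[]> heightsAndAges(int n, Random rnd) {
        List<double[]> rows = new ArrayList<>();
        while (rows.size() < n) {
            double[] z = transform(2 * rnd.nextDouble() - 1, 2 * rnd.nextDouble() - 1);
            if (z == null) {
                continue; // rejected: on/outside the unit circle, or exactly the origin
            }
            rows.add(new double[] {
                175 + z[0] * 10, // height from num1
                175 + z[1] * 10, // height from num2
                25 + z[0] * 8,   // age from num1
                25 + z[1] * 8    // age from num2
            });
        }
        return rows;
    }
}
```

Splitting out transform keeps the rejection rule testable with fixed inputs: for example, transform(0.9, 0.9) returns null because s = 1.62.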
