如何使用人造丝并行计算PI



我的代码可以使用拉马努金的公式计算Pi,而无需Rayon,我想实现Rayon进行并行线程,因为这是我的项目。

我知道我需要使用它

use rayon::prelude::*;
fn sum_of_squares(input: &[f64]) ->f64 {
for i in total.iter() // <-- just change that!
.map(|&i| i * i)
.sum()
}

但我仍然不明白该怎么办。

这是我的代码

use rayon::prelude::*;
pub fn factorial(n: f64) -> f64 {
if n == 0.0 {
return 1.00;
} else {
let result: f64 = factorial(n - 1.0) * n;
return result;
}
}
pub fn estimate_pi() -> f64 {
let mut total: f64 = 0.0;
let mut k: f64 = 0.0;
let factor1: f64 = 2.0;
let factor: f64 = (factor1.sqrt() * 2.0) / 9801.0;
loop {
let num: f64 = factorial(4.0 * k) * (1103.0 + (26390.0 * k));
let numm: f64 = 396.0;
let den: f64 = factorial(k).powf(4.0) * numm.powf(4.0 * k);
let term: f64 = factor * num / den;
total += term;
if term.abs() < 1e-15 {
break;
}
k += 1.0;
}
return 1.0 / total;
}
fn main() {
println!("{}", estimate_pi());
}

操场

第一步是通过使每次迭代独立来使算法可并行化。我做的第一件事是添加一个调试语句来打印k的最终值:

k += 1.0;
}
dbg!(k);
return 1.0 / total;

打印k = 2,所以我可以使用它来创建独立于每次迭代的k值范围:

(0..=iterations) // [0, 1, 2] for iterations = 2

我们将遍历该范围内的元素,而不是使用 epsilon 检查:

pub fn estimate_pi(iterations: usize) -> f64 {
let mut total: f64 = 0.0;
let factor1: f64 = 2.0;
let factor: f64 = (factor1.sqrt() * 2.0) / 9801.0;
for i in 0..=iterations {
let k: f64 = i as f64;
let num: f64 = factorial(4.0 * k) * (1103.0 + (26390.0 * k));
let numm: f64 = 396.0;
let den: f64 = factorial(k).powf(4.0) * numm.powf(4.0 * k);
let term: f64 = factor * num / den;
total += term;
}
return 1.0 / total;
}
// call estimate_pi(2)

Total 只是所有迭代的总和,因此我们可以将其从循环转换为 map-reduce 操作。对于范围内的每个数字,我们计算term。然后,我们使用fold(reduce(来计算总和。

pub fn estimate_pi(iterations: usize) -> f64 {
let factor1: f64 = 2.0;
let factor: f64 = (factor1.sqrt() * 2.0) / 9801.0;
let sum: f64 = (0..=iterations).into_iter().map(|i| {
let k: f64 = i as f64;
let num: f64 = factorial(4.0 * k) * (1103.0 + (26390.0 * k));
let numm: f64 = 396.0;
let den: f64 = factorial(k).powf(4.0) * numm.powf(4.0 * k);
let term: f64 = factor * num / den;
term
}).fold(0.0, |a, b| a + b);
return 1.0 / sum;
}

现在我们可以使用 rayon 的方法将其转换为并行操作。将into_iter()替换为into_par_iter()fold(0.0, |a, b| a + b)替换为reduce(|| 0.0, |a, b| a + b)

pub fn estimate_pi(iterations: usize) -> f64 {
let factor1: f64 = 2.0;
let factor: f64 = (factor1.sqrt() * 2.0) / 9801.0;
// map is now a parallel map, and reduce is a parallel reduce
let sum: f64 = (0..=iterations).into_par_iter().map(|i| {
let k: f64 = i as f64;
let num: f64 = factorial(4.0 * k) * (1103.0 + (26390.0 * k));
let numm: f64 = 396.0;
let den: f64 = factorial(k).powf(4.0) * numm.powf(4.0 * k);
let term: f64 = factor * num / den;
term
}).reduce(|| 0.0, |a, b| a + b);
return 1.0 / sum;
}

现在稍微清理一下代码以使其更惯用:

  • 在适当的情况下删除显式键入
  • 使用隐式返回
  • 对 sqrt(2( 使用常量
  • 更有意义的变量名称
  • 在表达式中嵌入396
use std::f64::consts::*;
pub fn estimate_pi(iterations: usize) -> f64 {
let factor = (SQRT_2 * 2.0) / 9801.0;
let sum = (0..=iterations).into_par_iter().map(|i| {
let k = i as f64;
let numerator = factorial(4.0 * k) * (1103.0 + (26390.0 * k));
let denominator = factorial(k).powf(4.0) * (396_f64).powf(4.0 * k);
factor * numerator / denominator
}).reduce(|| 0.0, |a, b| a + b);
1.0 / sum
}

作为最后一步,我们也可以并行factorial

// now have to call this with a `usize`
pub fn factorial(n: usize) -> f64 {
let out = (1..=n).into_par_iter().reduce(|| 1, |a, b| a * b);
out as f64
}
pub fn estimate_pi(iterations: usize) -> f64 {
let factor = (SQRT_2 * 2.0) / 9801.0;
let sum = (0..=iterations).into_par_iter().map(|i| {
let k = i as f64;
// notice we now pass the `i: usize` in here
let numerator = factorial(4 * i) * (1103.0 + (26390.0 * k));
let denominator = factorial(i).powf(4.0) * (396_f64).powf(4.0 * k);
factor * numerator / denominator
}).reduce(|| 0.0, |a, b| a + b);
1.0 / sum
}

最终代码

use rayon::prelude::*;
use std::f64::consts::*;
pub fn factorial(n: usize) -> f64 {
let out = (1..=n).into_par_iter().reduce(|| 1, |a, b| a * b);
out as f64
}
pub fn estimate_pi(iterations: usize) -> f64 {
let factor = (SQRT_2 * 2.0) / 9801.0;
let sum = (0..=iterations).into_par_iter().map(|i| {
let k = i as f64;
let numerator = factorial(4 * i) * (1103.0 + (26390.0 * k));
let denominator = factorial(i).powf(4.0) * (396_f64).powf(4.0 * k);
factor * numerator / denominator
}).reduce(|| 0.0, |a, b| a + b);
1.0 / sum
}
fn main() {
// our algorithm results in the same value as the constant
println!("pi_a: {:.60}", estimate_pi(2));
println!("pi_c: {:.60}", PI);
}

输出

pi_a: 3.141592653589793115997963468544185161590576171875000000000000
pi_c: 3.141592653589793115997963468544185161590576171875000000000000

操场

建议

您应该使用不同数量的并行性对不同版本进行基准测试,以查看性能如何或多或少。可能是人造丝并行迭代导致性能降低,因为您的总迭代次数很少。

您还可以考虑使用阶乘查找表,因为n <= k * 4 <= 8

pub fn factorial(n: usize) -> f64 {
const TABLE: [f64; 9] = [
1.0,     // 0!
1.0,     // 1!
2.0,     // 2!
6.0,     // 3!
24.0,    // 4!
120.0,   // 5!
720.0,   // 6!
5040.0,  // 7!
40320.0, // 8!
];
TABLE[n]
}

操场

当然,启用内联也会有所帮助。

最新更新