How to process Parallel.ForEach results as soon as possible?



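I am trying to optimize the processing of a large collection (~1 million items or so). I am considering Parallel.ForEach, but in my case it does not seem to make things any faster.

Basically the process is twofold:

  1. Process the collection with some CPU-intensive operations
  2. Perform some IO-bound work (database-related)

One optimization I thought of is to hand the items already processed in step 1 over to step 2 through a BlockingCollection.

In theory, processing step 1 with Parallel.ForEach is faster than doing it sequentially, but it would be even better if the IO could start as soon as results of the CPU-bound operations become available. The IO part has to be done sequentially because of the database locks involved, so at least step 1 would be optimized and faster.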

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

namespace ConsoleApp
{
    public static class Program
    {
        [CoreJob]
        [RPlotExporter, RankColumn]
        public class Paralleling
        {
            private IEnumerable<int> _items;

            [GlobalSetup]
            public void Setup()
            {
                _items = Enumerable.Range(0, 1000);
            }

            // Stand-in for the CPU-bound step. Ackermann(0, 2) returns n + 1 immediately.
            public static long Ackermann(long m, long n)
            {
                if (m > 0)
                {
                    if (n > 0)
                    {
                        return Ackermann(m - 1, Ackermann(m, n - 1));
                    }
                    if (n == 0)
                    {
                        return Ackermann(m - 1, 1);
                    }
                }
                else if (m == 0)
                {
                    if (n >= 0)
                    {
                        return n + 1;
                    }
                }
                throw new ArgumentOutOfRangeException();
            }

            // CPU-bound step in parallel; results are consumed by one sequential writer.
            [Benchmark]
            public async Task ParallelAndSequential()
            {
                var blockingCollection = new BlockingCollection<(bool, int, int)>();
                // Producer runs in the background (not awaited) and signals completion when done.
                Task.Run(() =>
                {
                    Parallel.ForEach(_items, item =>
                    {
                        Ackermann(0, 2);
                        var dummy = item % 2 == 0;
                        blockingCollection.Add((dummy, item, Thread.CurrentThread.ManagedThreadId));
                    });
                    blockingCollection.CompleteAdding();
                });
                using (var streamWriter = new StreamWriter(new MemoryStream()))
                {
                    // Consumer: blocks until a result is available, ends after CompleteAdding.
                    foreach (var result in blockingCollection.GetConsumingEnumerable())
                    {
                        await streamWriter.WriteLineAsync(result.ToString());
                        await Task.Delay(10); // simulated IO latency
                    }
                }
            }

            // Baseline: both steps on the same sequential loop.
            [Benchmark]
            public async Task AllSequential()
            {
                using (var streamWriter = new StreamWriter(new MemoryStream()))
                {
                    foreach (var item in _items)
                    {
                        Ackermann(0, 2);
                        var dummy = item % 2 == 0;
                        var result = (dummy, item, Thread.CurrentThread.ManagedThreadId);
                        await streamWriter.WriteLineAsync(result.ToString());
                        await Task.Delay(10); // simulated IO latency
                    }
                }
            }
        }

        public static void Main(params string[] args)
        {
            var summary = BenchmarkRunner.Run<Paralleling>();
        }
    }
}

Benchmark results:

// Validating benchmarks:
// ***** BenchmarkRunner: Start   *****
// ***** Found 2 benchmark(s) in total *****
// ***** Building 1 exe(s) in Parallel: Start   *****
// start dotnet restore  /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1 in C:\Users\eperret\Desktop\Playground\ConsoleApp\ConsoleApp\ConsoleApp\bin\Release\netcoreapp2.2\e6babe6d-16ff-42cd-aa3e-d457250f812c
// command took 1.83s and exited with 0
// start dotnet build -c Release  --no-restore /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1 in C:\Users\eperret\Desktop\Playground\ConsoleApp\ConsoleApp\ConsoleApp\bin\Release\netcoreapp2.2\e6babe6d-16ff-42cd-aa3e-d457250f812c
// command took 3.48s and exited with 0
// ***** Done, took 00:00:05 (5.46 sec)   *****
// Found 2 benchmarks:
//   Paralleling.ParallelAndSequential: Core(Runtime=Core)
//   Paralleling.AllSequential: Core(Runtime=Core)
Setup power plan (GUID: 8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c FriendlyName: High performance)
// **************************
// Benchmark: Paralleling.ParallelAndSequential: Core(Runtime=Core)
// *** Execute ***
// Launch: 1 / 1
// Execute: dotnet "e6babe6d-16ff-42cd-aa3e-d457250f812c.dll" --benchmarkName "ConsoleApp.Program+Paralleling.ParallelAndSequential" --job "Core" --benchmarkId 0 in C:\Users\eperret\Desktop\Playground\ConsoleApp\ConsoleApp\ConsoleApp\bin\Release\netcoreapp2.2\e6babe6d-16ff-42cd-aa3e-d457250f812c\bin\Release\netcoreapp2.2
// BeforeAnythingElse
// Benchmark Process Environment Information:
// Runtime=.NET Core 2.2.3 (CoreCLR 4.6.27414.05, CoreFX 4.6.27414.05), 64bit RyuJIT
// GC=Concurrent Workstation
// Job: Core(Runtime=Core)
OverheadJitting  1: 1 op, 595300.00 ns, 595.3000 us/op
WorkloadJitting  1: 1 op, 15646340800.00 ns, 15.6463 s/op
WorkloadWarmup   1: 1 op, 15623246700.00 ns, 15.6232 s/op
WorkloadWarmup   2: 1 op, 15633394200.00 ns, 15.6334 s/op
WorkloadWarmup   3: 1 op, 15621610400.00 ns, 15.6216 s/op
WorkloadWarmup   4: 1 op, 15623904400.00 ns, 15.6239 s/op
WorkloadWarmup   5: 1 op, 15628894600.00 ns, 15.6289 s/op
WorkloadWarmup   6: 1 op, 15619927500.00 ns, 15.6199 s/op
// BeforeActualRun
WorkloadActual   1: 1 op, 15622656400.00 ns, 15.6227 s/op
WorkloadActual   2: 1 op, 15625515000.00 ns, 15.6255 s/op
WorkloadActual   3: 1 op, 15615469600.00 ns, 15.6155 s/op
WorkloadActual   4: 1 op, 15631936300.00 ns, 15.6319 s/op
WorkloadActual   5: 1 op, 15619036800.00 ns, 15.6190 s/op
WorkloadActual   6: 1 op, 15622770800.00 ns, 15.6228 s/op
WorkloadActual   7: 1 op, 15625282100.00 ns, 15.6253 s/op
WorkloadActual   8: 1 op, 15621714600.00 ns, 15.6217 s/op
WorkloadActual   9: 1 op, 15641690200.00 ns, 15.6417 s/op
WorkloadActual  10: 1 op, 15661029200.00 ns, 15.6610 s/op
WorkloadActual  11: 1 op, 15625002000.00 ns, 15.6250 s/op
WorkloadActual  12: 1 op, 15614647200.00 ns, 15.6146 s/op
WorkloadActual  13: 1 op, 15630444900.00 ns, 15.6304 s/op
WorkloadActual  14: 1 op, 15620751600.00 ns, 15.6208 s/op
WorkloadActual  15: 1 op, 15639731400.00 ns, 15.6397 s/op
// AfterActualRun
WorkloadResult   1: 1 op, 15622656400.00 ns, 15.6227 s/op
WorkloadResult   2: 1 op, 15625515000.00 ns, 15.6255 s/op
WorkloadResult   3: 1 op, 15615469600.00 ns, 15.6155 s/op
WorkloadResult   4: 1 op, 15631936300.00 ns, 15.6319 s/op
WorkloadResult   5: 1 op, 15619036800.00 ns, 15.6190 s/op
WorkloadResult   6: 1 op, 15622770800.00 ns, 15.6228 s/op
WorkloadResult   7: 1 op, 15625282100.00 ns, 15.6253 s/op
WorkloadResult   8: 1 op, 15621714600.00 ns, 15.6217 s/op
WorkloadResult   9: 1 op, 15641690200.00 ns, 15.6417 s/op
WorkloadResult  10: 1 op, 15625002000.00 ns, 15.6250 s/op
WorkloadResult  11: 1 op, 15614647200.00 ns, 15.6146 s/op
WorkloadResult  12: 1 op, 15630444900.00 ns, 15.6304 s/op
WorkloadResult  13: 1 op, 15620751600.00 ns, 15.6208 s/op
WorkloadResult  14: 1 op, 15639731400.00 ns, 15.6397 s/op
GC:  0 0 0 0 0
// AfterAll
Mean = 15.6255 s, StdErr = 0.0022 s (0.01%); N = 14, StdDev = 0.0081 s
Min = 15.6146 s, Q1 = 15.6208 s, Median = 15.6239 s, Q3 = 15.6304 s, Max = 15.6417 s
IQR = 0.0097 s, LowerFence = 15.6062 s, UpperFence = 15.6450 s
ConfidenceInterval = [15.6164 s; 15.6346 s] (CI 99.9%), Margin = 0.0091 s (0.06% of Mean)
Skewness = 0.66, Kurtosis = 2.36, MValue = 2
// **************************
// Benchmark: Paralleling.AllSequential: Core(Runtime=Core)
// *** Execute ***
// Launch: 1 / 1
// Execute: dotnet "e6babe6d-16ff-42cd-aa3e-d457250f812c.dll" --benchmarkName "ConsoleApp.Program+Paralleling.AllSequential" --job "Core" --benchmarkId 1 in C:\Users\eperret\Desktop\Playground\ConsoleApp\ConsoleApp\ConsoleApp\bin\Release\netcoreapp2.2\e6babe6d-16ff-42cd-aa3e-d457250f812c\bin\Release\netcoreapp2.2
// BeforeAnythingElse
// Benchmark Process Environment Information:
// Runtime=.NET Core 2.2.3 (CoreCLR 4.6.27414.05, CoreFX 4.6.27414.05), 64bit RyuJIT
// GC=Concurrent Workstation
// Job: Core(Runtime=Core)
OverheadJitting  1: 1 op, 313300.00 ns, 313.3000 us/op
WorkloadJitting  1: 1 op, 15627659000.00 ns, 15.6277 s/op
WorkloadWarmup   1: 1 op, 15618290800.00 ns, 15.6183 s/op
WorkloadWarmup   2: 1 op, 15615060100.00 ns, 15.6151 s/op
WorkloadWarmup   3: 1 op, 15640535400.00 ns, 15.6405 s/op
WorkloadWarmup   4: 1 op, 15627643200.00 ns, 15.6276 s/op
WorkloadWarmup   5: 1 op, 15618477200.00 ns, 15.6185 s/op
WorkloadWarmup   6: 1 op, 15630480200.00 ns, 15.6305 s/op
WorkloadWarmup   7: 1 op, 15618496000.00 ns, 15.6185 s/op
// BeforeActualRun
WorkloadActual   1: 1 op, 15643436500.00 ns, 15.6434 s/op
WorkloadActual   2: 1 op, 15633023800.00 ns, 15.6330 s/op
WorkloadActual   3: 1 op, 15622361000.00 ns, 15.6224 s/op
WorkloadActual   4: 1 op, 15624673600.00 ns, 15.6247 s/op
WorkloadActual   5: 1 op, 15622833000.00 ns, 15.6228 s/op
WorkloadActual   6: 1 op, 15631459600.00 ns, 15.6315 s/op
WorkloadActual   7: 1 op, 15637421400.00 ns, 15.6374 s/op
WorkloadActual   8: 1 op, 15623196600.00 ns, 15.6232 s/op
WorkloadActual   9: 1 op, 15640573100.00 ns, 15.6406 s/op
WorkloadActual  10: 1 op, 15621312000.00 ns, 15.6213 s/op
WorkloadActual  11: 1 op, 15633047100.00 ns, 15.6330 s/op
WorkloadActual  12: 1 op, 15624742400.00 ns, 15.6247 s/op
WorkloadActual  13: 1 op, 15626075700.00 ns, 15.6261 s/op
WorkloadActual  14: 1 op, 15622062500.00 ns, 15.6221 s/op
WorkloadActual  15: 1 op, 15627008400.00 ns, 15.6270 s/op
// AfterActualRun
WorkloadResult   1: 1 op, 15643436500.00 ns, 15.6434 s/op
WorkloadResult   2: 1 op, 15633023800.00 ns, 15.6330 s/op
WorkloadResult   3: 1 op, 15622361000.00 ns, 15.6224 s/op
WorkloadResult   4: 1 op, 15624673600.00 ns, 15.6247 s/op
WorkloadResult   5: 1 op, 15622833000.00 ns, 15.6228 s/op
WorkloadResult   6: 1 op, 15631459600.00 ns, 15.6315 s/op
WorkloadResult   7: 1 op, 15637421400.00 ns, 15.6374 s/op
WorkloadResult   8: 1 op, 15623196600.00 ns, 15.6232 s/op
WorkloadResult   9: 1 op, 15640573100.00 ns, 15.6406 s/op
WorkloadResult  10: 1 op, 15621312000.00 ns, 15.6213 s/op
WorkloadResult  11: 1 op, 15633047100.00 ns, 15.6330 s/op
WorkloadResult  12: 1 op, 15624742400.00 ns, 15.6247 s/op
WorkloadResult  13: 1 op, 15626075700.00 ns, 15.6261 s/op
WorkloadResult  14: 1 op, 15622062500.00 ns, 15.6221 s/op
WorkloadResult  15: 1 op, 15627008400.00 ns, 15.6270 s/op
GC:  0 0 0 0 0
// AfterAll
Mean = 15.6289 s, StdErr = 0.0019 s (0.01%); N = 15, StdDev = 0.0072 s
Min = 15.6213 s, Q1 = 15.6228 s, Median = 15.6261 s, Q3 = 15.6330 s, Max = 15.6434 s
IQR = 0.0102 s, LowerFence = 15.6075 s, UpperFence = 15.6484 s
ConfidenceInterval = [15.6212 s; 15.6366 s] (CI 99.9%), Margin = 0.0077 s (0.05% of Mean)
Skewness = 0.69, Kurtosis = 1.99, MValue = 2
Successfully reverted power plan (GUID: 8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c FriendlyName: High performance)
// ***** BenchmarkRunner: Finish  *****
// * Export *
BenchmarkDotNet.Artifacts\results\ConsoleApp.Program.Paralleling-report.csv
BenchmarkDotNet.Artifacts\results\ConsoleApp.Program.Paralleling-report-github.md
BenchmarkDotNet.Artifacts\results\ConsoleApp.Program.Paralleling-report.html
BenchmarkDotNet.Artifacts\results\ConsoleApp.Program.Paralleling-measurements.csv
BuildPlots.R
RPlotExporter couldn't find Rscript.exe in your PATH and no R_HOME environment variable is defined
// * Detailed results *
Paralleling.ParallelAndSequential: Core(Runtime=Core)
Runtime = .NET Core 2.2.3 (CoreCLR 4.6.27414.05, CoreFX 4.6.27414.05), 64bit RyuJIT; GC = Concurrent Workstation
Mean = 15.6255 s, StdErr = 0.0022 s (0.01%); N = 14, StdDev = 0.0081 s
Min = 15.6146 s, Q1 = 15.6208 s, Median = 15.6239 s, Q3 = 15.6304 s, Max = 15.6417 s
IQR = 0.0097 s, LowerFence = 15.6062 s, UpperFence = 15.6450 s
ConfidenceInterval = [15.6164 s; 15.6346 s] (CI 99.9%), Margin = 0.0091 s (0.06% of Mean)
Skewness = 0.66, Kurtosis = 2.36, MValue = 2
-------------------- Histogram --------------------
[15.612 s ; 15.645 s) | @@@@@@@@@@@@@@
---------------------------------------------------
Paralleling.AllSequential: Core(Runtime=Core)
Runtime = .NET Core 2.2.3 (CoreCLR 4.6.27414.05, CoreFX 4.6.27414.05), 64bit RyuJIT; GC = Concurrent Workstation
Mean = 15.6289 s, StdErr = 0.0019 s (0.01%); N = 15, StdDev = 0.0072 s
Min = 15.6213 s, Q1 = 15.6228 s, Median = 15.6261 s, Q3 = 15.6330 s, Max = 15.6434 s
IQR = 0.0102 s, LowerFence = 15.6075 s, UpperFence = 15.6484 s
ConfidenceInterval = [15.6212 s; 15.6366 s] (CI 99.9%), Margin = 0.0077 s (0.05% of Mean)
Skewness = 0.69, Kurtosis = 1.99, MValue = 2
-------------------- Histogram --------------------
[15.619 s ; 15.646 s) | @@@@@@@@@@@@@@@
---------------------------------------------------
// * Summary *
BenchmarkDotNet=v0.11.5, OS=Windows 10.0.17134.407 (1803/April2018Update/Redstone4)
Intel Core i7-7820HQ CPU 2.90GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=2.2.202
[Host] : .NET Core 2.2.3 (CoreCLR 4.6.27414.05, CoreFX 4.6.27414.05), 64bit RyuJIT
Core   : .NET Core 2.2.3 (CoreCLR 4.6.27414.05, CoreFX 4.6.27414.05), 64bit RyuJIT
Job=Core  Runtime=Core  
|                Method |    Mean |    Error |   StdDev | Rank |
|---------------------- |--------:|---------:|---------:|-----:|
| ParallelAndSequential | 15.63 s | 0.0091 s | 0.0081 s |    1 |
|         AllSequential | 15.63 s | 0.0077 s | 0.0072 s |    1 |
// * Hints *
Outliers
Paralleling.ParallelAndSequential: Core -> 1 outlier  was  removed (15.66 s)
// * Legends *
Mean   : Arithmetic mean of all measurements
Error  : Half of 99.9% confidence interval
StdDev : Standard deviation of all measurements
Rank   : Relative position of current benchmark mean among all benchmarks (Arabic style)
1 s    : 1 Second (1 sec)
// ***** BenchmarkRunner: End *****
// ** Remained 0 benchmark(s) to run **
Run time: 00:11:44 (704.57 sec), executed benchmarks: 2
Global total time: 00:11:50 (710.04 sec), executed benchmarks: 2
// * Artifacts cleanup *
Process finished with exit code 0.

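I really don't understand why Parallel.ForEach is not faster than the all-sequential strategy. Is it because the BlockingCollection is the real bottleneck?

Is there another strategy that makes better use of Parallel.ForEach?

I decided to follow the suggestion to use Dataflow, and to refactor my sample code to put more emphasis on the CPU-bound part (basically a good old Thread.Sleep).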

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

namespace ConsoleApp
{
    public static class EnumerableExtensions
    {
        public static void ForEach<TSource>(this IEnumerable<TSource> source, Action<TSource> action)
        {
            foreach (var item in source)
            {
                action(item);
            }
        }
    }

    public static class Program
    {
        [ClrJob]
        [RPlotExporter, RankColumn]
        public class Paralleling
        {
            private IEnumerable<int> _items;

            [GlobalSetup]
            public void Setup()
            {
                _items = Enumerable.Range(0, 5000);
            }

            // Simulates 10 ms of work per item.
            private bool GetMeALongCpuBool(int number)
            {
                Thread.Sleep(10);
                return number % 2 == 0;
            }

            [Benchmark]
            public async Task DataFlow()
            {
                // Parallel stage: one worker per logical processor.
                var transformBlock = new TransformBlock<int, (bool, int, int)>(item =>
                {
                    var dummy = GetMeALongCpuBool(item);
                    return (dummy, item, Thread.CurrentThread.ManagedThreadId);
                }, new ExecutionDataflowBlockOptions
                {
                    MaxDegreeOfParallelism = Environment.ProcessorCount
                });
                using (var streamWriter = new StreamWriter(new MemoryStream()))
                {
                    // Sequential stage: an ActionBlock processes one message at a time by default.
                    var actionBlock = new ActionBlock<(bool, int, int)>(async result =>
                        await streamWriter.WriteLineAsync(result.ToString()));
                    // PropagateCompletion is a DataflowLinkOptions setting, not an ExecutionDataflowBlockOptions one.
                    transformBlock.LinkTo(actionBlock, new DataflowLinkOptions
                    {
                        PropagateCompletion = true
                    });
                    // The async lambda is converted to an Action, so each SendAsync is fire-and-forget.
                    // That is harmless here only because the block is unbounded and accepts every item immediately.
                    _items.ForEach(async item => await transformBlock.SendAsync(item));
                    transformBlock.Complete();
                    await actionBlock.Completion;
                }
            }

            [Benchmark]
            public async Task ParallelAndSequential()
            {
                var blockingCollection = new BlockingCollection<(bool, int, int)>();
                // Producer runs in the background (not awaited) and signals completion when done.
                Task.Run(() =>
                {
                    Parallel.ForEach(_items, item =>
                    {
                        var dummy = GetMeALongCpuBool(item);
                        blockingCollection.Add((dummy, item, Thread.CurrentThread.ManagedThreadId));
                    });
                    blockingCollection.CompleteAdding();
                });
                using (var streamWriter = new StreamWriter(new MemoryStream()))
                {
                    foreach (var result in blockingCollection.GetConsumingEnumerable())
                    {
                        await streamWriter.WriteLineAsync(result.ToString());
                    }
                }
            }

            [Benchmark]
            public async Task AllSequential()
            {
                using (var streamWriter = new StreamWriter(new MemoryStream()))
                {
                    foreach (var item in _items)
                    {
                        var dummy = GetMeALongCpuBool(item);
                        var result = (dummy, item, Thread.CurrentThread.ManagedThreadId);
                        await streamWriter.WriteLineAsync(result.ToString());
                    }
                }
            }
        }

        public static void Main(params string[] args)
        {
            var summary = BenchmarkRunner.Run<Paralleling>();
        }
    }
}

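For some reason, this time Parallel.ForEach seems to be the fastest option, followed by Dataflow. Perhaps the actor system behind the scenes adds some overhead, although it integrates more smoothly with async/await. Dataflow still seems more invasive than simply using BlockingCollection.GetConsumingEnumerable, which has a comparatively simple implementation.

In both cases (Parallel.ForEach and Dataflow), they are still much faster than the plain old sequential way, which is what I expected in the first place.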
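For comparison, here is a minimal sketch of a Dataflow pipeline that awaits each SendAsync and bounds the first block's input buffer. It is not the benchmarked code: the doubling step stands in for the CPU-bound work, and the names DataflowPipeline and RunAsync and the capacity of 100 are assumptions made for illustration. Depending on the target framework, the System.Threading.Tasks.Dataflow package may need to be referenced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class DataflowPipeline
{
    public static async Task<List<int>> RunAsync(IEnumerable<int> items)
    {
        // Parallel stage. BoundedCapacity (arbitrary value) makes SendAsync
        // wait when the block is full instead of buffering everything.
        var transform = new TransformBlock<int, int>(
            item => item * 2, // placeholder for the CPU-bound step
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = Environment.ProcessorCount,
                BoundedCapacity = 100
            });

        // Sequential stage: an ActionBlock handles one message at a time by default,
        // so the list is only ever touched by one thread at a time.
        var output = new List<int>();
        var sink = new ActionBlock<int>(value => output.Add(value));

        transform.LinkTo(sink, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (var item in items)
        {
            // Awaiting here applies back-pressure to the producer loop.
            await transform.SendAsync(item).ConfigureAwait(false);
        }

        transform.Complete();
        await sink.Completion.ConfigureAwait(false);
        return output;
    }

    public static async Task Main()
    {
        var doubled = await RunAsync(Enumerable.Range(0, 5));
        Console.WriteLine(string.Join(", ", doubled));
    }
}
```

Awaiting SendAsync in a plain foreach loop keeps the producer from calling Complete before every item has been accepted, which matters once the block is bounded.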

The new benchmark results are below:

// Validating benchmarks:
// ***** BenchmarkRunner: Start   *****
// ***** Found 3 benchmark(s) in total *****
// ***** Building 1 exe(s) in Parallel: Start   *****
BuildScript: C:\Users\eperret\Desktop\Playground\ConsoleApp\ConsoleApp\ConsoleApp\bin\Release\net472\34fab948-1750-4a20-832f-c235d6c6b967.bat
// ***** Done, took 00:00:03 (3.47 sec)   *****
// Found 3 benchmarks:
//   Paralleling.DataFlow: Clr(Runtime=Clr)
//   Paralleling.ParallelAndSequential: Clr(Runtime=Clr)
//   Paralleling.AllSequential: Clr(Runtime=Clr)
Setup power plan (GUID: 8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c FriendlyName: High performance)
// **************************
// Benchmark: Paralleling.DataFlow: Clr(Runtime=Clr)
// *** Execute ***
// Launch: 1 / 1
// Execute: C:\Users\eperret\Desktop\Playground\ConsoleApp\ConsoleApp\ConsoleApp\bin\Release\net472\34fab948-1750-4a20-832f-c235d6c6b967.exe --benchmarkName "ConsoleApp.Program+Paralleling.DataFlow" --job "Clr" --benchmarkId 0 in
// BeforeAnythingElse
// Benchmark Process Environment Information:
// Runtime=.NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3221.0
// GC=Concurrent Workstation
// Job: Clr(Runtime=Clr)
OverheadJitting  1: 1 op, 301500.00 ns, 301.5000 us/op
WorkloadJitting  1: 1 op, 6766114100.00 ns, 6.7661 s/op
WorkloadWarmup   1: 1 op, 6684374600.00 ns, 6.6844 s/op
WorkloadWarmup   2: 1 op, 6741597000.00 ns, 6.7416 s/op
WorkloadWarmup   3: 1 op, 6678205800.00 ns, 6.6782 s/op
WorkloadWarmup   4: 1 op, 6694086900.00 ns, 6.6941 s/op
WorkloadWarmup   5: 1 op, 6725310100.00 ns, 6.7253 s/op
WorkloadWarmup   6: 1 op, 6739073300.00 ns, 6.7391 s/op
WorkloadWarmup   7: 1 op, 6701728400.00 ns, 6.7017 s/op
// BeforeActualRun
WorkloadActual   1: 1 op, 6739354900.00 ns, 6.7394 s/op
WorkloadActual   2: 1 op, 6705538000.00 ns, 6.7055 s/op
WorkloadActual   3: 1 op, 6705645700.00 ns, 6.7056 s/op
WorkloadActual   4: 1 op, 6734594900.00 ns, 6.7346 s/op
WorkloadActual   5: 1 op, 6687179600.00 ns, 6.6872 s/op
WorkloadActual   6: 1 op, 6681016900.00 ns, 6.6810 s/op
WorkloadActual   7: 1 op, 6708053800.00 ns, 6.7081 s/op
WorkloadActual   8: 1 op, 6703350900.00 ns, 6.7034 s/op
WorkloadActual   9: 1 op, 6709533800.00 ns, 6.7095 s/op
WorkloadActual  10: 1 op, 6742676000.00 ns, 6.7427 s/op
WorkloadActual  11: 1 op, 6710231600.00 ns, 6.7102 s/op
WorkloadActual  12: 1 op, 6695547000.00 ns, 6.6955 s/op
WorkloadActual  13: 1 op, 6708751300.00 ns, 6.7088 s/op
WorkloadActual  14: 1 op, 6731132100.00 ns, 6.7311 s/op
WorkloadActual  15: 1 op, 6828591700.00 ns, 6.8286 s/op
// AfterActualRun
WorkloadResult   1: 1 op, 6739354900.00 ns, 6.7394 s/op
WorkloadResult   2: 1 op, 6705538000.00 ns, 6.7055 s/op
WorkloadResult   3: 1 op, 6705645700.00 ns, 6.7056 s/op
WorkloadResult   4: 1 op, 6734594900.00 ns, 6.7346 s/op
WorkloadResult   5: 1 op, 6687179600.00 ns, 6.6872 s/op
WorkloadResult   6: 1 op, 6681016900.00 ns, 6.6810 s/op
WorkloadResult   7: 1 op, 6708053800.00 ns, 6.7081 s/op
WorkloadResult   8: 1 op, 6703350900.00 ns, 6.7034 s/op
WorkloadResult   9: 1 op, 6709533800.00 ns, 6.7095 s/op
WorkloadResult  10: 1 op, 6742676000.00 ns, 6.7427 s/op
WorkloadResult  11: 1 op, 6710231600.00 ns, 6.7102 s/op
WorkloadResult  12: 1 op, 6695547000.00 ns, 6.6955 s/op
WorkloadResult  13: 1 op, 6708751300.00 ns, 6.7088 s/op
WorkloadResult  14: 1 op, 6731132100.00 ns, 6.7311 s/op
GC:  0 0 0 0 0
// AfterAll
Mean = 6.7116 s, StdErr = 0.0050 s (0.07%); N = 14, StdDev = 0.0188 s
Min = 6.6810 s, Q1 = 6.7034 s, Median = 6.7084 s, Q3 = 6.7311 s, Max = 6.7427 s
IQR = 0.0278 s, LowerFence = 6.6617 s, UpperFence = 6.7728 s
ConfidenceInterval = [6.6904 s; 6.7328 s] (CI 99.9%), Margin = 0.0212 s (0.32% of Mean)
Skewness = 0.24, Kurtosis = 1.85, MValue = 2
// **************************
// Benchmark: Paralleling.ParallelAndSequential: Clr(Runtime=Clr)
// *** Execute ***
// Launch: 1 / 1
// Execute: C:\Users\eperret\Desktop\Playground\ConsoleApp\ConsoleApp\ConsoleApp\bin\Release\net472\34fab948-1750-4a20-832f-c235d6c6b967.exe --benchmarkName "ConsoleApp.Program+Paralleling.ParallelAndSequential" --job "Clr" --benchmarkId 1 in
// BeforeAnythingElse
// Benchmark Process Environment Information:
// Runtime=.NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3221.0
// GC=Concurrent Workstation
// Job: Clr(Runtime=Clr)
OverheadJitting  1: 1 op, 374800.00 ns, 374.8000 us/op
WorkloadJitting  1: 1 op, 6956299700.00 ns, 6.9563 s/op
WorkloadWarmup   1: 1 op, 6046076100.00 ns, 6.0461 s/op
WorkloadWarmup   2: 1 op, 6852625900.00 ns, 6.8526 s/op
WorkloadWarmup   3: 1 op, 5474965800.00 ns, 5.4750 s/op
WorkloadWarmup   4: 1 op, 5473499200.00 ns, 5.4735 s/op
WorkloadWarmup   5: 1 op, 5518366900.00 ns, 5.5184 s/op
WorkloadWarmup   6: 1 op, 6859923700.00 ns, 6.8599 s/op
WorkloadWarmup   7: 1 op, 6862675600.00 ns, 6.8627 s/op
WorkloadWarmup   8: 1 op, 5536335800.00 ns, 5.5363 s/op
// BeforeActualRun
WorkloadActual   1: 1 op, 5528208700.00 ns, 5.5282 s/op
WorkloadActual   2: 1 op, 5519601400.00 ns, 5.5196 s/op
WorkloadActual   3: 1 op, 5500471200.00 ns, 5.5005 s/op
WorkloadActual   4: 1 op, 5539328200.00 ns, 5.5393 s/op
WorkloadActual   5: 1 op, 5465452200.00 ns, 5.4655 s/op
WorkloadActual   6: 1 op, 5470452100.00 ns, 5.4705 s/op
WorkloadActual   7: 1 op, 5512168300.00 ns, 5.5122 s/op
WorkloadActual   8: 1 op, 5472912400.00 ns, 5.4729 s/op
WorkloadActual   9: 1 op, 5855563000.00 ns, 5.8556 s/op
WorkloadActual  10: 1 op, 5992147700.00 ns, 5.9921 s/op
WorkloadActual  11: 1 op, 6143075000.00 ns, 6.1431 s/op
WorkloadActual  12: 1 op, 6397675600.00 ns, 6.3977 s/op
WorkloadActual  13: 1 op, 6909029100.00 ns, 6.9090 s/op
WorkloadActual  14: 1 op, 5525335600.00 ns, 5.5253 s/op
WorkloadActual  15: 1 op, 5470029700.00 ns, 5.4700 s/op
WorkloadActual  16: 1 op, 5481800300.00 ns, 5.4818 s/op
WorkloadActual  17: 1 op, 5507468600.00 ns, 5.5075 s/op
WorkloadActual  18: 1 op, 5465031900.00 ns, 5.4650 s/op
WorkloadActual  19: 1 op, 5478785200.00 ns, 5.4788 s/op
WorkloadActual  20: 1 op, 5511074600.00 ns, 5.5111 s/op
WorkloadActual  21: 1 op, 6080651400.00 ns, 6.0807 s/op
WorkloadActual  22: 1 op, 6829810900.00 ns, 6.8298 s/op
WorkloadActual  23: 1 op, 6878471900.00 ns, 6.8785 s/op
WorkloadActual  24: 1 op, 5506067800.00 ns, 5.5061 s/op
WorkloadActual  25: 1 op, 5442596200.00 ns, 5.4426 s/op
WorkloadActual  26: 1 op, 5463614300.00 ns, 5.4636 s/op
WorkloadActual  27: 1 op, 5491205500.00 ns, 5.4912 s/op
WorkloadActual  28: 1 op, 5482997900.00 ns, 5.4830 s/op
WorkloadActual  29: 1 op, 5494967800.00 ns, 5.4950 s/op
WorkloadActual  30: 1 op, 5517805900.00 ns, 5.5178 s/op
WorkloadActual  31: 1 op, 5486377000.00 ns, 5.4864 s/op
// AfterActualRun
WorkloadResult   1: 1 op, 5528208700.00 ns, 5.5282 s/op
WorkloadResult   2: 1 op, 5519601400.00 ns, 5.5196 s/op
WorkloadResult   3: 1 op, 5500471200.00 ns, 5.5005 s/op
WorkloadResult   4: 1 op, 5539328200.00 ns, 5.5393 s/op
WorkloadResult   5: 1 op, 5465452200.00 ns, 5.4655 s/op
WorkloadResult   6: 1 op, 5470452100.00 ns, 5.4705 s/op
WorkloadResult   7: 1 op, 5512168300.00 ns, 5.5122 s/op
WorkloadResult   8: 1 op, 5472912400.00 ns, 5.4729 s/op
WorkloadResult   9: 1 op, 5855563000.00 ns, 5.8556 s/op
WorkloadResult  10: 1 op, 5992147700.00 ns, 5.9921 s/op
WorkloadResult  11: 1 op, 6143075000.00 ns, 6.1431 s/op
WorkloadResult  12: 1 op, 6397675600.00 ns, 6.3977 s/op
WorkloadResult  13: 1 op, 5525335600.00 ns, 5.5253 s/op
WorkloadResult  14: 1 op, 5470029700.00 ns, 5.4700 s/op
WorkloadResult  15: 1 op, 5481800300.00 ns, 5.4818 s/op
WorkloadResult  16: 1 op, 5507468600.00 ns, 5.5075 s/op
WorkloadResult  17: 1 op, 5465031900.00 ns, 5.4650 s/op
WorkloadResult  18: 1 op, 5478785200.00 ns, 5.4788 s/op
WorkloadResult  19: 1 op, 5511074600.00 ns, 5.5111 s/op
WorkloadResult  20: 1 op, 6080651400.00 ns, 6.0807 s/op
WorkloadResult  21: 1 op, 5506067800.00 ns, 5.5061 s/op
WorkloadResult  22: 1 op, 5442596200.00 ns, 5.4426 s/op
WorkloadResult  23: 1 op, 5463614300.00 ns, 5.4636 s/op
WorkloadResult  24: 1 op, 5491205500.00 ns, 5.4912 s/op
WorkloadResult  25: 1 op, 5482997900.00 ns, 5.4830 s/op
WorkloadResult  26: 1 op, 5494967800.00 ns, 5.4950 s/op
WorkloadResult  27: 1 op, 5517805900.00 ns, 5.5178 s/op
WorkloadResult  28: 1 op, 5486377000.00 ns, 5.4864 s/op
GC:  0 0 0 0 0
// AfterAll
Mean = 5.6001 s, StdErr = 0.0469 s (0.84%); N = 28, StdDev = 0.2479 s
Min = 5.4426 s, Q1 = 5.4758 s, Median = 5.5033 s, Q3 = 5.5268 s, Max = 6.3977 s
IQR = 0.0509 s, LowerFence = 5.3995 s, UpperFence = 5.6032 s
ConfidenceInterval = [5.4272 s; 5.7730 s] (CI 99.9%), Margin = 0.1729 s (3.09% of Mean)
Skewness = 1.93, Kurtosis = 5.45, MValue = 2
// **************************
// Benchmark: Paralleling.AllSequential: Clr(Runtime=Clr)
// *** Execute ***
// Launch: 1 / 1
// Execute: C:\Users\eperret\Desktop\Playground\ConsoleApp\ConsoleApp\ConsoleApp\bin\Release\net472\34fab948-1750-4a20-832f-c235d6c6b967.exe --benchmarkName "ConsoleApp.Program+Paralleling.AllSequential" --job "Clr" --benchmarkId 2 in
// BeforeAnythingElse
// Benchmark Process Environment Information:
// Runtime=.NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3221.0
// GC=Concurrent Workstation
// Job: Clr(Runtime=Clr)
OverheadJitting  1: 1 op, 293200.00 ns, 293.2000 us/op
WorkloadJitting  1: 1 op, 53809888100.00 ns, 53.8099 s/op
WorkloadWarmup   1: 1 op, 53587422400.00 ns, 53.5874 s/op
WorkloadWarmup   2: 1 op, 53646293300.00 ns, 53.6463 s/op
WorkloadWarmup   3: 1 op, 53664071000.00 ns, 53.6641 s/op
WorkloadWarmup   4: 1 op, 53565564100.00 ns, 53.5656 s/op
WorkloadWarmup   5: 1 op, 53753653900.00 ns, 53.7537 s/op
WorkloadWarmup   6: 1 op, 53765022400.00 ns, 53.7650 s/op
WorkloadWarmup   7: 1 op, 53548369900.00 ns, 53.5484 s/op
// BeforeActualRun
WorkloadActual   1: 1 op, 53571559600.00 ns, 53.5716 s/op
WorkloadActual   2: 1 op, 53777716600.00 ns, 53.7777 s/op
WorkloadActual   3: 1 op, 53634262000.00 ns, 53.6343 s/op
WorkloadActual   4: 1 op, 53555998800.00 ns, 53.5560 s/op
WorkloadActual   5: 1 op, 53529152600.00 ns, 53.5292 s/op
WorkloadActual   6: 1 op, 53609217000.00 ns, 53.6092 s/op
WorkloadActual   7: 1 op, 53511316400.00 ns, 53.5113 s/op
WorkloadActual   8: 1 op, 53246673500.00 ns, 53.2467 s/op
WorkloadActual   9: 1 op, 53770915500.00 ns, 53.7709 s/op
WorkloadActual  10: 1 op, 54462781200.00 ns, 54.4628 s/op
WorkloadActual  11: 1 op, 54111001500.00 ns, 54.1110 s/op
WorkloadActual  12: 1 op, 54194051600.00 ns, 54.1941 s/op
WorkloadActual  13: 1 op, 53924013600.00 ns, 53.9240 s/op
WorkloadActual  14: 1 op, 53855213500.00 ns, 53.8552 s/op
WorkloadActual  15: 1 op, 53919584800.00 ns, 53.9196 s/op
// AfterActualRun
WorkloadResult   1: 1 op, 53571559600.00 ns, 53.5716 s/op
WorkloadResult   2: 1 op, 53777716600.00 ns, 53.7777 s/op
WorkloadResult   3: 1 op, 53634262000.00 ns, 53.6343 s/op
WorkloadResult   4: 1 op, 53555998800.00 ns, 53.5560 s/op
WorkloadResult   5: 1 op, 53529152600.00 ns, 53.5292 s/op
WorkloadResult   6: 1 op, 53609217000.00 ns, 53.6092 s/op
WorkloadResult   7: 1 op, 53511316400.00 ns, 53.5113 s/op
WorkloadResult   8: 1 op, 53246673500.00 ns, 53.2467 s/op
WorkloadResult   9: 1 op, 53770915500.00 ns, 53.7709 s/op
WorkloadResult  10: 1 op, 54462781200.00 ns, 54.4628 s/op
WorkloadResult  11: 1 op, 54111001500.00 ns, 54.1110 s/op
WorkloadResult  12: 1 op, 54194051600.00 ns, 54.1941 s/op
WorkloadResult  13: 1 op, 53924013600.00 ns, 53.9240 s/op
WorkloadResult  14: 1 op, 53855213500.00 ns, 53.8552 s/op
WorkloadResult  15: 1 op, 53919584800.00 ns, 53.9196 s/op
GC:  0 0 0 0 0
// AfterAll
Mean = 53.7782 s, StdErr = 0.0804 s (0.15%); N = 15, StdDev = 0.3113 s
Min = 53.2467 s, Q1 = 53.5560 s, Median = 53.7709 s, Q3 = 53.9240 s, Max = 54.4628 s
IQR = 0.3680 s, LowerFence = 53.0040 s, UpperFence = 54.4760 s
ConfidenceInterval = [53.4454 s; 54.1110 s] (CI 99.9%), Margin = 0.3328 s (0.62% of Mean)
Skewness = 0.49, Kurtosis = 2.53, MValue = 2
Successfully reverted power plan (GUID: 8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c FriendlyName: High performance)
// ***** BenchmarkRunner: Finish  *****
// * Export *
BenchmarkDotNet.Artifacts\results\ConsoleApp.Program.Paralleling-report.csv
BenchmarkDotNet.Artifacts\results\ConsoleApp.Program.Paralleling-report-github.md
BenchmarkDotNet.Artifacts\results\ConsoleApp.Program.Paralleling-report.html
BenchmarkDotNet.Artifacts\results\ConsoleApp.Program.Paralleling-measurements.csv
BuildPlots.R
RPlotExporter couldn't find Rscript.exe in your PATH and no R_HOME environment variable is defined
// * Detailed results *
Paralleling.DataFlow: Clr(Runtime=Clr)
Runtime = .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3221.0; GC = Concurrent Workstation
Mean = 6.7116 s, StdErr = 0.0050 s (0.07%); N = 14, StdDev = 0.0188 s
Min = 6.6810 s, Q1 = 6.7034 s, Median = 6.7084 s, Q3 = 6.7311 s, Max = 6.7427 s
IQR = 0.0278 s, LowerFence = 6.6617 s, UpperFence = 6.7728 s
ConfidenceInterval = [6.6904 s; 6.7328 s] (CI 99.9%), Margin = 0.0212 s (0.32% of Mean)
Skewness = 0.24, Kurtosis = 1.85, MValue = 2
-------------------- Histogram --------------------
[6.674 s ; 6.750 s) | @@@@@@@@@@@@@@
---------------------------------------------------
Paralleling.ParallelAndSequential: Clr(Runtime=Clr)
Runtime = .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3221.0; GC = Concurrent Workstation
Mean = 5.6001 s, StdErr = 0.0469 s (0.84%); N = 28, StdDev = 0.2479 s
Min = 5.4426 s, Q1 = 5.4758 s, Median = 5.5033 s, Q3 = 5.5268 s, Max = 6.3977 s
IQR = 0.0509 s, LowerFence = 5.3995 s, UpperFence = 5.6032 s
ConfidenceInterval = [5.4272 s; 5.7730 s] (CI 99.9%), Margin = 0.1729 s (3.09% of Mean)
Skewness = 1.93, Kurtosis = 5.45, MValue = 2
-------------------- Histogram --------------------
[5.420 s ; 5.562 s) | @@@@@@@@@@@@@@@@@@@@@@@
[5.562 s ; 5.705 s) | 
[5.705 s ; 5.852 s) | 
[5.852 s ; 5.995 s) | @@
[5.995 s ; 6.183 s) | @@
[6.183 s ; 6.326 s) | 
[6.326 s ; 6.469 s) | @
---------------------------------------------------
Paralleling.AllSequential: Clr(Runtime=Clr)
Runtime = .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3221.0; GC = Concurrent Workstation
Mean = 53.7782 s, StdErr = 0.0804 s (0.15%); N = 15, StdDev = 0.3113 s
Min = 53.2467 s, Q1 = 53.5560 s, Median = 53.7709 s, Q3 = 53.9240 s, Max = 54.4628 s
IQR = 0.3680 s, LowerFence = 53.0040 s, UpperFence = 54.4760 s
ConfidenceInterval = [53.4454 s; 54.1110 s] (CI 99.9%), Margin = 0.3328 s (0.62% of Mean)
Skewness = 0.49, Kurtosis = 2.53, MValue = 2
-------------------- Histogram --------------------
[53.136 s ; 54.573 s) | @@@@@@@@@@@@@@@
---------------------------------------------------
// * Summary *
BenchmarkDotNet=v0.11.5, OS=Windows 10.0.17134.407 (1803/April2018Update/Redstone4)
Intel Core i7-7820HQ CPU 2.90GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
[Host] : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3221.0
Clr    : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3221.0
Job=Clr  Runtime=Clr  
|                Method |     Mean |    Error |   StdDev | Rank |
|---------------------- |---------:|---------:|---------:|-----:|
|              DataFlow |  6.712 s | 0.0212 s | 0.0188 s |    2 |
| ParallelAndSequential |  5.600 s | 0.1729 s | 0.2479 s |    1 |
|         AllSequential | 53.778 s | 0.3328 s | 0.3113 s |    3 |
// * Hints *
Outliers
Paralleling.DataFlow: Clr              -> 1 outlier  was  removed (6.83 s)
Paralleling.ParallelAndSequential: Clr -> 3 outliers were removed (6.83 s..6.91 s)
// * Legends *
Mean   : Arithmetic mean of all measurements
Error  : Half of 99.9% confidence interval
StdDev : Standard deviation of all measurements
Rank   : Relative position of current benchmark mean among all benchmarks (Arabic style)
1 s    : 1 Second (1 sec)
// ***** BenchmarkRunner: End *****
// ** Remained 0 benchmark(s) to run **
Run time: 00:27:06 (1626.41 sec), executed benchmarks: 3
Global total time: 00:27:09 (1629.9 sec), executed benchmarks: 3
// * Artifacts cleanup *

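There is a lot going on here.

  1. The workload is unbalanced. Writing to the stream is more than 90% of the job, and it is not parallelized. It may not even be parallelizable.
  2. The embedded Task.Delay is distorting the results.
  3. The workload is too granular. The BlockingCollection becomes a point of contention.
  4. The BlockingCollection has no upper bound. Combined with the small workload of the CPU-bound task, this means that almost all the output is buffered in the BlockingCollection before the IO-bound task consumes it, resulting in large memory allocations.
  5. The awaits are not configured with ConfigureAwait(false), resulting in multiple context captures and restores.
  6. Parallel.ForEach is not configured with the MaxDegreeOfParallelism option, which aggravates the workload imbalance. All of the machine's processors are busy with the CPU-bound task, and none is left for the single IO-bound task.

After making some changes to the configuration, I obtained results that favor the ParallelAndSequential method over the AllSequential method, with a ratio of ~2/5.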

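// Fewer items, each with a heavier CPU-bound workload
_items = Enumerable.Range(0, 100);
Ackermann(m: 3, n: 6);
// Bounded buffer between the producers and the consumer
var blockingCollection = new BlockingCollection<(bool, int, int)>(boundedCapacity: 10);
// No context capture on the awaits, and the artificial delay removed
await streamWriter.WriteLineAsync(result.ToString()).ConfigureAwait(false);
//await Task.Delay(10);
// Leave one processor free for the IO-bound consumer
new ParallelOptions() {MaxDegreeOfParallelism = Environment.ProcessorCount - 1};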
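Put together, the changes listed above might look like the following sketch. It is an illustration under assumptions rather than the exact code that was benchmarked: the class and method names are hypothetical, the Ackermann helper is a compact rewrite of the one in the question, and the producer task is awaited at the end so that any exception it throws is observed.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

public static class BoundedPipeline
{
    // Compact form of the Ackermann function used as the CPU-bound step.
    static long Ackermann(long m, long n) =>
        m == 0 ? n + 1
        : n == 0 ? Ackermann(m - 1, 1)
        : Ackermann(m - 1, Ackermann(m, n - 1));

    public static async Task<int> RunAsync(IEnumerable<int> items)
    {
        // Bounded: producers block in Add once 10 results are waiting.
        using (var results = new BlockingCollection<(int Item, long Value)>(boundedCapacity: 10))
        {
            var options = new ParallelOptions
            {
                // Leave one processor for the consumer (at least one worker).
                MaxDegreeOfParallelism = Math.Max(1, Environment.ProcessorCount - 1)
            };

            var producer = Task.Run(() =>
            {
                try
                {
                    Parallel.ForEach(items, options,
                        item => results.Add((item, Ackermann(3, 6))));
                }
                finally
                {
                    // Always release the consumer, even if a worker throws.
                    results.CompleteAdding();
                }
            });

            int written = 0;
            using (var writer = new StreamWriter(new MemoryStream()))
            {
                // Single sequential consumer, standing in for the database work.
                foreach (var result in results.GetConsumingEnumerable())
                {
                    await writer.WriteLineAsync(result.ToString()).ConfigureAwait(false);
                    written++;
                }
            }

            await producer.ConfigureAwait(false); // surfaces producer exceptions
            return written;
        }
    }

    public static async Task Main()
    {
        int written = await RunAsync(Enumerable.Range(0, 100));
        Console.WriteLine($"Wrote {written} results");
    }
}
```

The capacity of 10 and the choice of leaving exactly one processor free come from the settings above. Suitable values for a real workload would have to be measured.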

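Your timings show that writing to memory is the main time-consuming part. This fits well with the fact that my processors barely reached 20% utilization during the run. When you comment out the stream.WriteLine call, you can see that the tasks are far too small to benefit from parallelization: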

|                Method |      Mean |      Error |   StdDev | Rank |
|---------------------- |----------:|-----------:|---------:|-----:|
| ParallelAndSequential | 564.50 us | 1,759.4 us | 96.44 us |    2 |
|         AllSequential |  54.31 us |   400.0 us | 21.92 us |    1 |
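When each item carries very little work, one common way to cut the per-item overhead is to hand each worker a range of items rather than a single item. The sketch below uses Partitioner.Create with an explicit range size and a per-worker accumulator. The counting task, the names ChunkedLoop and CountEven, and the range size of 10,000 are assumptions chosen only for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class ChunkedLoop
{
    // Counts the even numbers in [0, count), processing rangeSize indices per work item.
    public static long CountEven(int count, int rangeSize)
    {
        long total = 0;
        Parallel.ForEach(
            Partitioner.Create(0, count, rangeSize), // yields Tuple<int, int> ranges
            () => 0L,                                // per-worker accumulator
            (range, state, local) =>
            {
                // Plain sequential loop inside each range: no delegate call per item.
                for (int i = range.Item1; i < range.Item2; i++)
                {
                    if (i % 2 == 0) local++;
                }
                return local;
            },
            local => Interlocked.Add(ref total, local)); // one synchronized merge per worker
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(CountEven(1_000_000, 10_000));
    }
}
```

Whether chunking pays off still depends on how much work each range contains, so it is worth comparing against the sequential loop.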

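If you feel the application is slow, it is best to profile it before writing benchmarks. Profiling will show which part or line is the bottleneck.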
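A real profiler gives the most reliable picture. As a rough first check, though, the two stages can be timed separately with Stopwatch, as in the sketch below. The names StageTimer and Measure are hypothetical, the even/odd test stands in for the CPU-bound step, and the Stopwatch calls add overhead of their own for work this small, so only large differences are meaningful.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;

public static class StageTimer
{
    // Returns the accumulated time spent in the compute step and in the write step.
    public static (TimeSpan Cpu, TimeSpan Io) Measure(int count)
    {
        var cpu = new Stopwatch();
        var io = new Stopwatch();

        using (var writer = new StreamWriter(new MemoryStream()))
        {
            foreach (var item in Enumerable.Range(0, count))
            {
                cpu.Start();
                bool isEven = item % 2 == 0; // placeholder for the CPU-bound step
                cpu.Stop();

                io.Start();
                writer.WriteLine((isEven, item).ToString());
                io.Stop();
            }
        }

        return (cpu.Elapsed, io.Elapsed);
    }

    public static void Main()
    {
        var (cpuTime, ioTime) = Measure(100_000);
        Console.WriteLine($"CPU stage: {cpuTime.TotalMilliseconds} ms");
        Console.WriteLine($"IO stage:  {ioTime.TotalMilliseconds} ms");
    }
}
```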
