Spark调用ShuffleBlockFetcherIterator时发生了什么



我的spark工作似乎花了很多时间来获取块。有时它会这样做一两个小时。我的数据集有一个分区,所以我不知道为什么它要做这么多的洗牌。有人知道这里到底发生了什么吗?

15/12/16 18:05:27 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
15/12/16 18:05:27 INFO ShuffleBlockFetcherIterator: Getting 4 non-empty blocks out of 4 blocks
15/12/16 18:05:27 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
15/12/16 18:05:40 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 200 blocks
15/12/16 18:05:40 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
15/12/16 18:05:40 INFO ShuffleBlockFetcherIterator: Getting 4 non-empty blocks out of 4 blocks
15/12/16 18:05:40 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
15/12/16 18:05:59 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 200 blocks
15/12/16 18:05:59 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
15/12/16 18:05:59 INFO ShuffleBlockFetcherIterator: Getting 4 non-empty blocks out of 4 blocks
15/12/16 18:05:59 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
15/12/16 18:06:13 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 200 blocks
15/12/16 18:06:13 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
15/12/16 18:06:13 INFO ShuffleBlockFetcherIterator: Getting 4 non-empty blocks out of 4 blocks
15/12/16 18:06:13 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
15/12/16 18:06:33 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 200 blocks
15/12/16 18:06:33 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
15/12/16 18:06:33 INFO ShuffleBlockFetcherIterator: Getting 4 non-empty blocks out of 4 blocks
15/12/16 18:06:33 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
15/12/16 18:06:49 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 200 blocks
15/12/16 18:06:49 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
15/12/16 18:06:49 INFO ShuffleBlockFetcherIterator: Getting 4 non-empty blocks out of 4 blocks
15/12/16 18:06:49 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
15/12/16 18:07:14 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 200 blocks
15/12/16 18:07:14 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
15/12/16 18:07:14 INFO ShuffleBlockFetcherIterator: Getting 4 non-empty blocks out of 4 blocks
15/12/16 18:07:14 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
15/12/16 18:07:33 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 200 blocks
15/12/16 18:07:33 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
15/12/16 18:07:33 INFO ShuffleBlockFetcherIterator: Getting 4 non-empty blocks out of 4 blocks
15/12/16 18:07:33 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
15/12/16 18:07:46 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 200 blocks
15/12/16 18:07:46 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
15/12/16 18:07:47 INFO ShuffleBlockFetcherIterator: Getting 4 non-empty blocks out of 4 blocks
15/12/16 18:07:47 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
15/12/16 18:07:58 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 200 blocks
15/12/16 18:07:58 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
15/12/16 18:07:58 INFO ShuffleBlockFetcherIterator: Getting 4 non-empty blocks out of 4 blocks
15/12/16 18:07:58 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms

ShuffleBlockFetcherIterator是一个Scala迭代程序,它从本地和远程BlockManager中获取多个shuffle块(也称为shuffle映射输出)。

它允许以(BlockId,InputStream)对的形式在一系列块上迭代,这样调用方就可以在接收到混洗块时以流水线方式处理它们。

为了提高性能,您需要调整您的操作;或配置。

FYI spark可能做的不仅仅是获取块,日志记录可能对其他一切都被禁用。如果您还没有转到spark历史服务器并查看查询的SQL选项卡。很可能这是一个症状,你正在洗牌和处理过多的数据。试着减少你拖动的数据量,或者把它分成更小的部分,或者得到一个更大的集群。

"我的数据集有一个分区,所以我不确定为什么它要做这么多的洗牌。"

还要记住,分区是spark中的一个重载词。即使您有一个表分区,在处理数据时也可能有多个spark数据片分区。

相关内容

  • 没有找到相关文章

最新更新