Apache Beam Flatten Iterable<String>



在groupbyKey之后的下面的代码中,我得到了PCollection>>。如何在发送到FileIO之前使值中的Iterable变平。

.apply(GroupByKey.<String, String>create())
.apply("Write file to output",FileIO.< String, KV<String,String>>writeDynamic()
.by(KV::getKey)
.withDestinationCoder(StringUtf8Coder.of())
.via(Contextful.fn(KV::getValue), TextIO.sink())
.to("Out")
.withNaming(key -> FileIO.Write.defaultNaming("file-" + key, ".txt")));

谢谢你的好心帮助。

您需要使用ParDo来压平PCollection的Iterable部分,如下所示:-

PCollection<KV<String, Doc>> urlDocPairs = ...;
PCollection<KV<String, Iterable<Doc>>> urlToDocs =
urlDocPairs.apply(GroupByKey.<String, Doc>create());
PCollection<R> results =
urlToDocs.apply(ParDo.of(new DoFn<KV<String, Iterable<Doc>>, R>() {
{@literal @}ProcessElement
public void processElement(ProcessContext c) {
String url = c.element().getKey();
for <String,Doc> docsWithThatUrl : c.element().getValue();
c.output(docsWithThatUrl)
}}));

相关内容

最新更新