如何在齐柏林飞艇中编写Flink变量的内容



我尝试在Apache Zeppelin中运行以下简单命令。

%flink
var rabbit = env.fromElements(
"ARTHUR:  What, behind the rabbit?",
"TIM:  It is the rabbit!", 
"ARTHUR:  You silly sod!  You got us all worked up!",
"TIM:  Well, that's no ordinary rabbit.  That's the most foul, cruel, and bad-tempered rodent you ever set eyes on.",
"ROBIN:  You tit!  I soiled my armor I was so scared!", 
"TIM:  Look, that rabbit's got a vicious streak a mile wide, it's a killer!")
var counts = rabbit.flatMap { _.toLowerCase.split("\W+")}.map{ (_,1)}.groupBy(0).sum(1) 
counts.print()
我试着把结果打印在笔记本上。但不幸的是,我只得到以下输出:
rabbit: org.apache.flink.api.scala.DataSet[String] = org.apache.flink.api.scala.DataSet@37fdb65c
counts: org.apache.flink.api.scala.AggregateDataSet[(String, Int)] = org.apache.flink.api.scala.AggregateDataSet@1efc7158
res103: org.apache.flink.api.java.operators.DataSink[(String, Int)] = DataSink '<unnamed>' (Print to System.out)

我怎么能把计数的内容泄漏到齐柏林飞艇的笔记本上?

在齐柏林飞艇中打印计算结果的方法如下:

%flink
counts.collect().foreach(println(_))
//or one might prefer
//counts.collect foreach println 
输出:

(a,3)
(all,1)
(and,1)
(armor,1)
...

观察到的行为的原因在于Apache Zeppelin和Apache Flink之间的相互作用。齐柏林飞船捕获Console的所有标准输出。然而,Flink也打印输出到System.out,这正是当你调用counts.print()时发生的事情。bzz的解决方案工作的原因是它使用Console打印结果。

我打开了JIRA问题[1]并打开了拉请求[2]来纠正这种行为,以便您也可以使用counts.print()

  • [1] https://issues.apache.org/jira/browse/zeppelin - 287
  • [2] https://github.com/apache/incubator-zeppelin/pull/288

相关内容

  • 没有找到相关文章

最新更新