Suppose I have a file of the following form (one event per line):
Source,Timestamp
aa,2014-05-02 22:12:11
bb,2014-05-02 22:22:11
I want to count the events, grouped by source, over consecutive 5-minute time windows. How would I do this with Flink?
What I have so far is:
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
DataStreamSource<Event> stream = env.fromCollection(new EventFileReader(new File("path/to/file")), Event.class);
stream
    .keyBy("getSource()")
    .timeWindow(Time.minutes(5))
    .sum("getTimestamp()");
env.execute();
public class Event {
    private final String source;
    private final long timestamp;

    public Event(String source, long timestamp) {
        this.source = source;
        this.timestamp = timestamp;
    }

    public String getSource() {
        return source;
    }

    public long getTimestamp() {
        return timestamp;
    }
}
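I am missing two things. First, this fails with a message saying that the Event class is not a POJO. Second, I don't know how to count the events in a window. Right now I am using .sum("getTimestamp()"), but I am sure that is not right. Any ideas?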
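On the first point: as far as I know, Flink only treats a class as a POJO when the class is public, has a public no-argument constructor, and every field is either public or reachable through a getter and a setter. The Event class in the question has final fields, no setters, and no no-argument constructor, so it does not qualify. Field expressions passed to keyBy also name the field ("source"), not the getter call ("getSource()"). Below is a minimal sketch of a POJO-style variant. The outer class name PojoSketch and the sample values are placeholders of mine, not something from the question.

```java
// Hypothetical wrapper class; only the nested Event matters here.
class PojoSketch {

    // Public, with a no-argument constructor and a getter/setter pair per field.
    public static class Event {
        private String source;    // not final, so the setter can assign it
        private long timestamp;   // epoch milliseconds

        // No-argument constructor, needed so the object can be created reflectively.
        public Event() {
        }

        public Event(String source, long timestamp) {
            this.source = source;
            this.timestamp = timestamp;
        }

        public String getSource() {
            return source;
        }

        public void setSource(String source) {
            this.source = source;
        }

        public long getTimestamp() {
            return timestamp;
        }

        public void setTimestamp(long timestamp) {
            this.timestamp = timestamp;
        }
    }

    public static void main(String[] args) {
        Event event = new Event("aa", 1L);
        System.out.println(event.getSource() + " " + event.getTimestamp());
    }
}
```

With a class shaped like this, a field expression such as keyBy("source") should resolve, although an explicit KeySelector avoids the POJO requirement altogether.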
I would suggest using a fold function to perform the window aggregation. The following code snippet should do the job:
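import javax.annotation.Nullable;

import org.apache.flink.api.common.functions.FoldFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.AssignerWithPunctuatedWatermarks;
import org.apache.flink.streaming.api.watermark.Watermark;
import org.apache.flink.streaming.api.windowing.time.Time;

public class Job {
    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        // Take the event time from each element and emit a watermark after every element.
        DataStream<Event> stream = env
            .fromElements(new Event("a", 1), new Event("b", 2), new Event("a", 2))
            .assignTimestampsAndWatermarks(new AssignerWithPunctuatedWatermarks<Event>() {
                @Nullable
                @Override
                public Watermark checkAndGetNextWatermark(Event lastElement, long extractedTimestamp) {
                    return new Watermark(extractedTimestamp);
                }

                @Override
                public long extractTimestamp(Event element, long previousElementTimestamp) {
                    return element.getTimestamp();
                }
            });

        // Key by source with an explicit KeySelector, so Event does not have to be a POJO.
        DataStream<Tuple2<String, Integer>> count = stream
            .keyBy(new KeySelector<Event, String>() {
                @Override
                public String getKey(Event event) throws Exception {
                    return event.getSource();
                }
            })
            .timeWindow(Time.minutes(5))
            // Start from ("", 0) and add 1 per event; the result is (source, count) per window.
            .fold(Tuple2.of("", 0), new FoldFunction<Event, Tuple2<String, Integer>>() {
                @Override
                public Tuple2<String, Integer> fold(Tuple2<String, Integer> acc, Event o) throws Exception {
                    return Tuple2.of(o.getSource(), acc.f1 + 1);
                }
            });

        count.print();
        env.execute();
    }

    public static class Event {
        private final String source;
        private final long timestamp;

        public Event(String source, long timestamp) {
            this.source = source;
            this.timestamp = timestamp;
        }

        public String getSource() {
            return source;
        }

        public long getTimestamp() {
            return timestamp;
        }
    }
}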
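To make it clearer what the keyed 5-minute window count is expected to produce, here is a rough plain-Java sketch of the same computation without Flink. It assumes the timestamps in the file are UTC, that windows are aligned to the epoch with no offset, and that timestamps are non-negative. The class and method names (WindowCountSketch, toEpochMillis, windowStart, countPerWindow) and the third sample line are made up for illustration. It only mimics the grouping logic, not Flink's watermark or state handling.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class: counts events per (source, window start) in plain Java.
class WindowCountSketch {
    static final long WINDOW_MILLIS = 5 * 60 * 1000L;
    static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Parses "yyyy-MM-dd HH:mm:ss" into epoch milliseconds, assuming UTC.
    static long toEpochMillis(String text) {
        return LocalDateTime.parse(text, FORMAT).toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    // Start of the epoch-aligned 5-minute window that contains the timestamp.
    static long windowStart(long timestampMillis) {
        return timestampMillis - Math.floorMod(timestampMillis, WINDOW_MILLIS);
    }

    // Maps "source@windowStart" to the number of events in that window.
    static Map<String, Integer> countPerWindow(String[] lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.split(",", 2);
            long start = windowStart(toEpochMillis(parts[1].trim()));
            counts.merge(parts[0] + "@" + start, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] lines = {
            "aa,2014-05-02 22:12:11",
            "bb,2014-05-02 22:22:11",
            "aa,2014-05-02 22:14:59"  // extra sample line, not from the question
        };
        countPerWindow(lines).forEach((key, count) -> System.out.println(key + " -> " + count));
    }
}
```

Here the two "aa" events fall into the window starting at 22:10:00 and the "bb" event into the one starting at 22:20:00, which matches the per-key, per-window counting that the fold does.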