What should the flume.conf parameters be to save tweets to a single FlumeData file per hour?



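We save tweets in a directory hierarchy such as /user/flume/2016/06/28/13/FlumeData... . However, more than 100 FlumeData files are created every hour. I changed TwitterAgent.sinks.HDFS.hdfs.rollSize = 52428800 (50 MB), and the same thing happened again. After that I tried changing the roll parameters, but that did not work either. How should I set the parameters to get one FlumeData file per hour?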

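What about rollInterval? Have you set it to zero? If so, the problem may lie elsewhere. If rollInterval is left at a non-zero value, it applies independently of rollSize and rollCount: the file is rolled as soon as any one of the limits is reached, so rotation can happen long before the file grows to rollSize. Also check the HDFS block size you have configured; if it is set to too small a value, that can also cause files to roll.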
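To illustrate why changing only rollSize may not have helped: as far as I recall from the Flume User Guide, the HDFS sink uses the defaults sketched below, and any roll property that is not set explicitly keeps its default. The agent and sink names are taken from the question; the default values are an assumption to verify against the documentation for your Flume version.

```properties
# Assumed HDFS sink defaults (per the Flume User Guide; verify for your version).
# A file is rolled as soon as ANY one of these limits is reached.
# Roll after this many seconds:
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 30
# Roll after this many bytes:
TwitterAgent.sinks.HDFS.hdfs.rollSize = 1024
# Roll after this many events:
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10
```

If these defaults apply, raising rollSize to 52428800 alone still leaves the sink rolling every 30 seconds or every 10 events, which would be consistent with 100+ files per hour. Setting a roll property to 0 disables that trigger.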

Try this -

    TwitterAgent.sinks.HDFS.channel = MemChannel
    TwitterAgent.sinks.HDFS.type = hdfs
    TwitterAgent.sinks.HDFS.hdfs.path = hdfs://hpc01:8020/user/flume/tweets/%Y/%m/%d/%H
    TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
    TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
    TwitterAgent.sinks.HDFS.hdfs.batchSize = 100

    TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
    TwitterAgent.sinks.HDFS.hdfs.rollCount = 0
    TwitterAgent.sinks.HDFS.hdfs.rollInterval = 3600
    TwitterAgent.channels.MemChannel.type = memory
    TwitterAgent.channels.MemChannel.capacity = 1000
    TwitterAgent.channels.MemChannel.transactionCapacity = 100

I solved the problem by setting rollInterval=3600, rollCount=0 and batchSize=100 in flume.conf, as @vkgade suggested.

    TwitterAgent.sinks.HDFS.channel = MemChannel
    TwitterAgent.sinks.HDFS.type = hdfs
    TwitterAgent.sinks.HDFS.hdfs.path = hdfs://hpc01:8020/user/flume/tweets/%Y/%m/%d/%H
    TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
    TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
    TwitterAgent.sinks.HDFS.hdfs.batchSize = 1

    TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
    TwitterAgent.sinks.HDFS.hdfs.rollCount = 10
    TwitterAgent.sinks.HDFS.hdfs.rollInterval = 0
    TwitterAgent.channels.MemChannel.type = memory
    TwitterAgent.channels.MemChannel.capacity = 10000
    TwitterAgent.channels.MemChannel.transactionCapacity = 1000
