AnalysisException: Found duplicate column(s) in the data schema: `hour`, `eventTime`



I want to load data from JSON files, but I get this exception: AnalysisException: Found duplicate column(s) in the data schema: `hour`, `eventTime`. Here is my code:

// make column resolution case-sensitive
ss.sqlContext.setConf("spark.sql.caseSensitive", "true")

val pathList = buildFilePath(eid, url, startTime, endTime)
println(pathList)
val writePath = "/result/" + id + "/" + eid

// load all JSON files, keep the requested columns, and write a single CSV
ss.read
  .json(pathList: _*)
  .select(columns.split(",").map(m => new Column(m.trim)): _*)
  .repartition(1)
  .write.option("header", "true").csv(writePath)

ss.close()
def buildFilePath(eid: String, urls: String, startTime: String, endTime: String): List[String] = {
  var eventPath = ""
  if (eid.equals("1")) {
    eventPath = basePath + "/event1"
  } else if (eid.equals("2")) {
    eventPath = basePath + "/event2"
  }
  urls
    .split(",")
    .flatMap(url => {
      val dateList = getTimeRange(startTime, endTime, "yyyy-MM-dd")
      dateList
        .par
        .map(date => eventPath + "/" + url.trim + "/" + date)
        .flatMap(p => Hdfs.files(p).flatMap(f => Hdfs.files(f)))
    })
    .map(m => m.toString)
    .toList
}
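
The getTimeRange helper is not shown in the question. A minimal sketch of what it might look like, assuming it returns every date in the inclusive range between startTime and endTime, formatted with the supplied pattern:

import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Hypothetical implementation: the original is not shown in the question.
// Returns every date from startTime to endTime (inclusive), formatted
// with the given pattern, e.g. "yyyy-MM-dd".
def getTimeRange(startTime: String, endTime: String, pattern: String): List[String] = {
  val fmt = DateTimeFormatter.ofPattern(pattern)
  val start = LocalDate.parse(startTime, fmt)
  val end = LocalDate.parse(endTime, fmt)
  Iterator.iterate(start)(d => d.plusDays(1))
    .takeWhile(d => !d.isAfter(end))
    .map(_.format(fmt))
    .toList
}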

Problem solved. Because multiple files are being loaded, the read has to be done like this: .json(ss.read.textFile(pathList: _*)), that is, load the files as a Dataset[String] first and let Spark parse the JSON from that dataset.
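
For reference, a minimal sketch of how the fix slots into the original snippet (ss, pathList, columns, and writePath are the same names used above):

// Read every file as plain text first; DataFrameReader.json(Dataset[String])
// then parses the JSON records out of that single dataset.
val jsonLines = ss.read.textFile(pathList: _*)

ss.read
  .json(jsonLines)
  .select(columns.split(",").map(m => new Column(m.trim)): _*)
  .repartition(1)
  .write.option("header", "true").csv(writePath)

Note that the DataFrameReader.json(jsonDataset: Dataset[String]) overload is available from Spark 2.2 onwards; older versions only accept an RDD[String].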
