How do I create a HashMap of specific fields from an RDD[String]?
{
  count: 1,
  itemId: "1122334",
  country: {
    code: {
      preferred: "USA"
    },
    name: {
      preferred: "America"
    }
  },
  states: "50",
  self: {
    otherInfo: [
    ],
    preferred: "National Parks"
  },
  Rating: 4
}
How do I get a hash map of {itemId, self.preferred} pairs, such as:
itemId : 1122334 self.preferred : "National Parks"
itemId : 1144444 self.preferred : "State Parks"
....
I tried the following and it works, but it is not efficient, because I convert each string to a JSON object and parse it:
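// Assumption: JSONObject here is org.json.JSONObject.
import org.json.JSONObject

// Keep only records whose itemId starts with "11".
val filteredMappingsList = countryMapping.filter { x =>
  val jsonObj = new JSONObject(x)
  jsonObj.has("itemId") && jsonObj.get("itemId").toString.startsWith("11")
}
// Parse each record again and extract (itemId, self.preferred).
val finalMapping = filteredMappingsList.map { x =>
  val jsonObj = new JSONObject(x)
  val itemId = jsonObj.get("itemId").toString
  // The key is "preferred" with no trailing space.
  val preferred = jsonObj.getJSONObject("self").get("preferred").toString
  (itemId, preferred)
}.collectAsMap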
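Parsing the data with one of the many JSON libraries is probably still your best option. However, it looks like you parse each string into JSON twice, once in the filter and once in the map. I'm not sure whether it is really executed that way, but consider parsing only once: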
val result = countryMapping.map(x => new JSONObject(x)).
  filter(jsonObj => ...).
  map { jsonObj =>
    ...
    (itemId, preferred)
  }.collectAsMap
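To make the parse-once idea concrete, here is a minimal sketch that folds the filter and the extraction into one helper. It assumes JSONObject is org.json.JSONObject. The helper name extractEntry and its prefix parameter are hypothetical, introduced only for illustration. Unlike the code in the question, it silently skips records that are malformed or lack self.preferred instead of throwing.

```scala
import org.json.JSONObject
import scala.util.Try

// Hypothetical helper (not from the original posts): parses one record exactly once
// and returns Some((itemId, self.preferred)) when itemId starts with `prefix`.
def extractEntry(raw: String, prefix: String): Option[(String, String)] =
  Try(new JSONObject(raw)).toOption.flatMap { jsonObj =>
    // optString falls back to "" when the key is absent.
    val itemId = jsonObj.optString("itemId", "")
    // optJSONObject returns null when "self" is missing or is not an object.
    val preferred = Option(jsonObj.optJSONObject("self")).map(_.optString("preferred", ""))
    preferred
      .filter(p => itemId.startsWith(prefix) && p.nonEmpty)
      .map(p => (itemId, p))
  }

// Usage on the asker's RDD[String] (requires Spark, so it is left as a comment):
// val finalMapping = countryMapping
//   .flatMap(line => extractEntry(line, "11").toList)
//   .collectAsMap()
```

Because the helper returns plain String pairs, no JSONObject instance has to leave the task that created it; only the final tuples are collected to the driver.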