Looking up keys in a Scala Map when the keys come in different formats



I have a Map that contains RDBMS data types as keys and Hive data types as values.

var dataMap:Map[String, String] = dataMapper
for((k,v) <- dataMap) {
println(k + "->"+ v)
}

Output:

character varying->string
character([0-9]{1,3})->string
timestamp without time zone->timestamp
name->string
timestamp([0-9]{1,3}) without time zone->timestamp
timestamp with time zone->timestamp
timestamp->timestamp
real->double
character varying([0-9]{1,4})->string
numeric([0-9]{1,3},[1-9][0-9]{0,2})->double
smallint->int
timestamp([0-9]{1,3}) with time zone->timestamp
timestamp([0-9]{1,3})->timestamp
unknown->string
text->string
time without time zone->timestamp
bpchar->string
date->date
character->string
numeric->double
numeric([0-9]{1,3},0)->bigint
integer->int
bigint->bigint
time with time zone->timestamp
double precision->double

I also have a list of column names and their data types (the data types come from a Greenplum database, i.e. they are RDBMS types), as shown below:

Column Name             Datatype
forecast_id             bigint
period_year             numeric(15,0)
period_name             character varying(15)
org                     character varying(10)
ledger_id               bigint
currency_code           character varying(15)
source_system_name      character varying(30)
db_source_system_name   character varying(30)
year                    character varying(256)
ptd_balance             numeric
xx_creation_tms         timestamp without time zone
xx_last_update_log_id   integer
xx_data_hash_code       character varying(32)
xx_pk_id                bigint

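I need to change each column's data type by checking whether the map dataMap contains that data type as a key. If it does, I take the corresponding value and put it next to the column name. This is the code I am running: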

import scala.collection.mutable.ListBuffer

class ChangeDataTypes(var gpColumnDetails: List[String], var dataMapper: Map[String, String]) {
  var recGpDet: ListBuffer[String] = gpColumnDetails.to[ListBuffer]
  var dataMap: Map[String, String] = dataMapper

  // Print "columnName hiveType" for every "columnName:gpType" entry
  def gpDetails(): Unit = {
    val schemaString: List[String] = recGpDet.map(s => s.split(":")).map(s => s(0) + " " + dMap(s(1))).toList
    for (i <- schemaString) {
      println(i)
    }
  }

  // Exact-key lookup: returns null when the Greenplum type is not literally a key of dataMap
  def dMap(rdbmsColDataType: String): String = {
    var hiveDataType: String = null
    if (dataMap.keysIterator.contains(rdbmsColDataType)) {
      hiveDataType = dataMap(rdbmsColDataType)
    }
    hiveDataType
  }
}

When I run it, I get the following output:

forecast_id             bigint
period_year             null
period_name             null
org                     null
ledger_id               bigint
currency_code           null
source_system_name      null
db_source_system_name   null
year                    null
ptd_balance             double
xx_creation_tms         timestamp
xx_last_update_log_id   int
xx_data_hash_code       null
xx_pk_id                null

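The correct values in the output come from data types that exist in the Map as exact key strings. I get null values because of keys such as character varying([0-9]{1,4}), numeric([0-9]{1,3},[1-9][0-9]{0,2}), numeric([0-9]{1,3},0), etc., which are patterns rather than literal type names. Can someone tell me how to write a condition that finds these different kinds of keys in dataMap?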
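To make the problem concrete, here is a small REPL/script-style sketch (the names printedKeys, exactHit, etc. are only for illustration). It shows that an exact Map lookup cannot find "character varying(15)", and that the keys as printed in the question contain unescaped parentheses: used as a regex, "(" ... ")" is a capturing group, so the pattern expects "character varying15" rather than "character varying(15)". Escaping the parentheses makes them literal.

```scala
// Two keys copied from the question's printed dataMap; note that the
// parentheses in the second key are NOT escaped.
val printedKeys: Map[String, String] = Map(
  "bigint" -> "bigint",
  "character varying([0-9]{1,4})" -> "string"
)

// 1) Exact lookup succeeds only when the column type is literally a key.
val exactHit  = printedKeys.get("bigint")                // Some(bigint)
val exactMiss = printedKeys.get("character varying(15)") // None

// 2) As a regex, the unescaped "( ... )" is a group, so this pattern
//    matches "character varying15" but not "character varying(15)".
val unescapedOnParens = "character varying(15)".matches("character varying([0-9]{1,4})") // false
val unescapedOnDigits = "character varying15".matches("character varying([0-9]{1,4})")   // true

// 3) "\\(" in a Scala string literal reaches the regex engine as \( ,
//    i.e. a literal parenthesis.
val escapedOnParens = "character varying(15)".matches("character varying\\([0-9]{1,4}\\)") // true
```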

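To look up values in dataMap by key, you first need to map each Greenplum data type to the format of the keys in dataMap. This can be done by matching each Greenplum data type against the dataMap keys with Regex, as in the following example (which assembles only a subset of dataMap):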

val dataMap: Map[String, String] = Map(
  "character varying" -> "string",
  "character\\([0-9]{1,3}\\)" -> "string",
  "character varying\\([0-9]{1,4}\\)" -> "string",
  "timestamp without time zone" -> "timestamp",
  "timestamp" -> "timestamp",
  "numeric" -> "double",
  "numeric\\([0-9]{1,3},0\\)" -> "bigint",
  "integer" -> "int",
  "bigint" -> "bigint"
)
val gpSchema: List[String] = List(
  "forecast_id: bigint",
  "period_year: numeric(15,0)",
  "period_name: character varying(15)",
  "org: character varying(10)",
  "ledger_id: bigint",
  "currency_code: character varying(15)",
  "source_system_name: character varying(30)",
  "db_source_system_name: character varying(30)",
  "year: character varying(256)",
  "ptd_balance: numeric",
  "xx_creation_tms: timestamp without time zone",
  "xx_last_update_log_id: integer",
  "xx_data_hash_code: character varying(32)",
  "xx_pk_id: bigint"
)
// Every dataMap key is treated as a regex pattern
val patterns = dataMap.keySet
// Split "column: type" into (column, type), then pick the first key whose
// regex match covers the entire Greenplum type string
gpSchema.
  map( _.split(":\\s*") match { case Array(x: String, y: String) => (x, y) } ).
  map{ case (k, v) =>
    val vkey = patterns.dropWhile{ p => v != p.r.findFirstIn(v).getOrElse("") }.
      headOption match {
        case Some(p) => p
        case None => ""
      }
    (k, dataMap.getOrElse(vkey, "n/a"))
  }
// res1: List[(String, String)] = List(
//   (forecast_id,bigint), (period_year,bigint), (period_name,string), (org,string),
//   (ledger_id,bigint), (currency_code,string), (source_system_name,string),
//   (db_source_system_name,string), (year,string), (ptd_balance,double),
//   (xx_creation_tms,timestamp), (xx_last_update_log_id,int), (xx_data_hash_code,string),
//   (xx_pk_id,bigint)
// )
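A possibly shorter variant of the same idea, offered as a sketch rather than a replacement: `collectFirst` does the work of the `dropWhile`/`headOption` combination, and `String.matches` requires the whole type string to match, which is what the `findFirstIn(v) == v` comparison approximates. The helper name hiveType is hypothetical, and returning an Option leaves the fallback ("n/a" here) to the caller instead of relying on null.

```scala
// Regex-keyed subset of the map; "\\(" becomes a literal "(" in the regex.
val dataMap: Map[String, String] = Map(
  "character varying" -> "string",
  "character\\([0-9]{1,3}\\)" -> "string",
  "character varying\\([0-9]{1,4}\\)" -> "string",
  "timestamp without time zone" -> "timestamp",
  "timestamp" -> "timestamp",
  "numeric" -> "double",
  "numeric\\([0-9]{1,3},0\\)" -> "bigint",
  "integer" -> "int",
  "bigint" -> "bigint"
)

// Hive type of the first key whose regex matches the WHOLE Greenplum type
// (String.matches is anchored at both ends), or None if no key matches.
def hiveType(gpType: String): Option[String] =
  dataMap.collectFirst { case (pattern, hive) if gpType.matches(pattern) => hive }

// Split "column: type" at the first colon and convert the type.
val converted: List[(String, String)] =
  List("period_year: numeric(15,0)", "org: character varying(10)", "ptd_balance: numeric", "xx_pk_id: bigint")
    .map(_.split(":\\s*", 2))
    .collect { case Array(col, gpType) => col -> hiveType(gpType).getOrElse("n/a") }
// List((period_year,bigint), (org,string), (ptd_balance,double), (xx_pk_id,bigint))
```

Because `Map` iteration order is not guaranteed, "first matching key" is only well defined if no two keys can match the same type string; for the keys shown, full-string matching keeps them disjoint.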

To adapt the pattern matching above to your existing code, the ChangeDataTypes class could be modified as follows:

class ChangeDataTypes(val gpColumnDetails: List[String], val dataMap: Map[String, String]) {
  // Print "column<TAB>hiveType" for every "column: type" entry
  def gpDetails(): Unit =
    gpColumnDetails.map(_.split(":\\s*")).map(s => s(0) + "\t" + dMap(s(1))).
      foreach(println)

  // Hive type of the first dataMap key whose regex matches the whole gpColType, else "n/a"
  def dMap(gpColType: String): String = {
    val patterns = dataMap.keySet
    val mkey = patterns.dropWhile{
      p => gpColType != p.r.findFirstIn(gpColType).getOrElse("")
    }.
    headOption match {
      case Some(p) => p
      case None => ""
    }
    dataMap.getOrElse(mkey, "n/a")
  }
}
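If the class is reused for many tables, one possible refinement (an assumption, not part of the answer above) is to compile the key patterns once in the constructor instead of calling `p.r` on every lookup, and to return the converted lines as well as printing them. RegexTypeMapper and convert are hypothetical names; dMap here returns an Option instead of a String.

```scala
import scala.util.matching.Regex

// Hypothetical variant of ChangeDataTypes: same inputs ("column: type"
// strings and a regex-keyed Map), but every key is compiled only once.
class RegexTypeMapper(gpColumnDetails: List[String], dataMap: Map[String, String]) {
  // Pair each compiled pattern with its Hive type.
  private val compiled: List[(Regex, String)] =
    dataMap.toList.map { case (pattern, hive) => (pattern.r, hive) }

  // Hive type of the first pattern that matches the whole Greenplum type, if any.
  def dMap(gpColType: String): Option[String] =
    compiled.collectFirst { case (re, hive) if re.pattern.matcher(gpColType).matches() => hive }

  // "column<TAB>hiveType" for every input line; unmatched types become "n/a".
  def convert(): List[String] =
    gpColumnDetails.map(_.split(":\\s*", 2)).collect {
      case Array(col, gpType) => col + "\t" + dMap(gpType).getOrElse("n/a")
    }

  def gpDetails(): Unit = convert().foreach(println)
}

// Usage with a small subset of the map; there is no "numeric" key here,
// so ptd_balance falls back to "n/a".
val mapper = new RegexTypeMapper(
  List("forecast_id: bigint", "period_year: numeric(15,0)", "period_name: character varying(15)", "ptd_balance: numeric"),
  Map(
    "bigint" -> "bigint",
    "numeric\\([0-9]{1,3},0\\)" -> "bigint",
    "character varying\\([0-9]{1,4}\\)" -> "string"
  )
)
mapper.gpDetails()
```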
