我正在尝试将下面的Hive SQL语句转换为Spark数据帧,但出现错误。
case when (lower(message_txt) rlike '.*sampletext(\s?is\s?)newtext.*' ) then 'P' else 'Y'
示例数据:message_txt = "This is new sampletext, followed by newtext"
请帮助我提供等效的火花数据帧语句。
使用when(lower($"value").rlike(""".sampletext(siss?)newtext."""),lit('P')).otherwise("Y")
scala> df.withColumn("condition",when(lower($"value").rlike(""".sampletext(s?iss?)newtext."""),lit('P')).otherwise("Y")).show(false)
+-------------------------------------------+---------+
|value |condition|
+-------------------------------------------+---------+
|This is new sampletext, followed by newtext|Y |
+-------------------------------------------+---------+
在sql 中case statement
末尾添加end
。
Example:
In spark Sql:
val df=Seq(("This is new sampletext, followed by newtext")).toDF("message_txt")
df.createOrReplaceTempView("tmp")
spark.sql("select case when (lower(message_txt) rlike '.sampletext(\s?is\s?)newtext.' ) then 'P' else 'Y' end from tmp").show()
//Result
//+--------------------------------------------------------------------------------+
//|CASE WHEN lower(message_txt) RLIKE .sampletext(s?iss?)newtext. THEN P ELSE Y END|
//+--------------------------------------------------------------------------------+
//| Y|
//+--------------------------------------------------------------------------------+
In dataframe API:
df.withColumn("status", when(lower(col("message_txt")).rlike(".sampletext(\s?is\s?)newtext."),"P").otherwise("Y")).show()
//Result
//+--------------------+------+
//| message_txt|status|
//+--------------------+------+
//|This is new sampl...| Y|
//+--------------------+------+
UPDATE:
检查message_txt列中的字符串示例文本和新文本。
//using rlike
df.withColumn("status", when(lower(col("message_txt")).rlike("sampletext.*newtext"),"P").otherwise("Y")).show()
//using like
df.withColumn("status", when(lower(col("message_txt")).like("%sampletext%newtext%"),"P").otherwise("Y")).show()
//+--------------------+------+
//| message_txt|status|
//+--------------------+------+
//|This is new sampl...| P|
//+--------------------+------+