I want to split a column value on line breaks and create a new column containing the last two items (lines).
df1 = spark.createDataFrame([
    ["001\r\nLuc Krier\r\n2363 Ryan Road, Long Lake South Dakota"],
    ["002\r\nJeanny Thorn\r\n2263 Patton Lane Raleigh North Carolina"],
    ["003\r\nTeddy E Beecher\r\n2839 Hartland Avenue Fond Du Lac Wisconsin"],
    ["004\r\nPhilippe Schauss\r\n1 Im Oberdorf Allemagne"],
    ["005\r\nMeindert I Tholen\r\nHagedoornweg 138 Amsterdam"]
]).toDF("s")
This does not work (it returns null):
df1.withColumn('last_2', split(df1.s, '\r\n')[-2])
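You can simply use the substring_index function:
import pyspark.sql.functions as f
# A negative count keeps everything to the right of the 2nd delimiter counted from the end
df1.withColumn('last2', f.substring_index('s', '\r\n', -2)).drop('s').show(10, False)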
+-----------------------------------------------------------+
|last2 |
+-----------------------------------------------------------+
|Luc Krier
2363 Ryan Road, Long Lake South Dakota |
|Jeanny Thorn
2263 Patton Lane Raleigh North Carolina |
|Teddy E Beecher
2839 Hartland Avenue Fond Du Lac Wisconsin|
|Philippe Schauss
1 Im Oberdorf Allemagne |
|Meindert I Tholen
Hagedoornweg 138 Amsterdam |
+-----------------------------------------------------------+
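In case the negative count is unclear, here is a rough plain-Python imitation of what substring_index does with it. This is only a sketch of the semantics, not Spark code, and the helper name `last_n_fields` is made up for illustration.

```python
def last_n_fields(value, delim, n):
    """Mimic substring_index(value, delim, -n) for n > 0 (hypothetical helper).

    Keeps everything to the right of the n-th delimiter counted from the
    end. If the string has fewer than n delimiters, it is returned whole.
    """
    return delim.join(value.split(delim)[-n:])


row = "001\r\nLuc Krier\r\n2363 Ryan Road, Long Lake South Dakota"

# repr() makes the embedded CR LF visible instead of breaking the line
print(repr(last_n_fields(row, "\r\n", 2)))
# 'Luc Krier\r\n2363 Ryan Road, Long Lake South Dakota'
```

Note that the result is still a single string with the delimiter inside it, which is why the last2 column above spans two lines per row.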
Hope this helps.
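Yes, I ran into the same problem with negative indexes, although positive indexes work. I tried the slice function instead and it works fine. Can you try this: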
import pyspark.sql.functions as F
df1 = sqlContext.createDataFrame([
    ["001\r\nLuc Krier\r\n2363 Ryan Road, Long Lake South Dakota"],
    ["002\r\nJeanny Thorn\r\n2263 Patton Lane Raleigh North Carolina"],
    ["003\r\nTeddy E Beecher\r\n2839 Hartland Avenue Fond Du Lac Wisconsin"],
    ["004\r\nPhilippe Schauss\r\n1 Im Oberdorf Allemagne"],
    ["005\r\nMeindert I Tholen\r\nHagedoornweg 138 Amsterdam"]
]).toDF("s")
# Split on CR LF into an array column
df_r = df1.withColumn('spl', F.split(F.col('s'), '\r\n'))
# start=-2, length=2 keeps the last two elements (start=-1, length=1 would keep only the last one)
df_res = df_r.withColumn("res", F.slice(F.col("spl"), -2, 2))
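To make the start and length arguments of slice a bit more concrete, here is a small plain-Python imitation. It is only a sketch: `slice_like_spark` is a made-up helper, and the handling of a negative start that reaches past the front of the list is my assumption, not something verified against Spark.

```python
def slice_like_spark(items, start, length):
    """Plain-Python imitation of slice(array, start, length) (hypothetical helper).

    start is 1-based; a negative start counts back from the end of the list.
    """
    if start == 0:
        raise ValueError("start must not be 0; indices are 1-based")
    if length < 0:
        raise ValueError("length must not be negative")
    begin = start - 1 if start > 0 else len(items) + start
    if begin < 0:
        # Assumption: a negative start reaching past the front gives an empty result
        return []
    return items[begin:begin + length]


parts = "001\r\nLuc Krier\r\n2363 Ryan Road".split("\r\n")
print(slice_like_spark(parts, -2, 2))  # ['Luc Krier', '2363 Ryan Road']
print(slice_like_spark(parts, -1, 1))  # ['2363 Ryan Road']
```

So a start of -2 with a length of 2 yields the last two lines, while -1 with a length of 1 yields only the last one.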
Perhaps this is helpful:
import org.apache.spark.sql.functions.{col, slice, split}
import spark.implicits._ // enables .toDF on a Seq; assumes a SparkSession named spark

// Triple-quoted strings are raw, so each value holds the literal characters \r\n
val sDF = Seq("""001\r\nLuc Krier\r\n2363 Ryan Road, Long Lake South Dakota""",
"""002\r\nJeanny Thorn\r\n2263 Patton Lane Raleigh North Carolina""",
"""003\r\nTeddy E Beecher\r\n2839 Hartland Avenue Fond Du Lac Wisconsin""",
"""004\r\nPhilippe Schauss\r\n1 Im Oberdorf Allemagne""",
"""005\r\nMeindert I Tholen\r\nHagedoornweg 138 Amsterdam""").toDF("""s""")
// The regex \\r\\n matches that literal backslash sequence; slice(..., -2, 2) keeps the last two items
val processedDF = sDF.withColumn("col1", slice(split(col("s"), """\\r\\n"""), -2, 2))
processedDF.show(false)
processedDF.printSchema()
/**
* +--------------------------------------------------------------------+-------------------------------------------------------------+
* |s                                                                   |col1                                                         |
* +--------------------------------------------------------------------+-------------------------------------------------------------+
* |001\r\nLuc Krier\r\n2363 Ryan Road, Long Lake South Dakota          |[Luc Krier, 2363 Ryan Road, Long Lake South Dakota]          |
* |002\r\nJeanny Thorn\r\n2263 Patton Lane Raleigh North Carolina      |[Jeanny Thorn, 2263 Patton Lane Raleigh North Carolina]      |
* |003\r\nTeddy E Beecher\r\n2839 Hartland Avenue Fond Du Lac Wisconsin|[Teddy E Beecher, 2839 Hartland Avenue Fond Du Lac Wisconsin]|
* |004\r\nPhilippe Schauss\r\n1 Im Oberdorf Allemagne                  |[Philippe Schauss, 1 Im Oberdorf Allemagne]                  |
* |005\r\nMeindert I Tholen\r\nHagedoornweg 138 Amsterdam              |[Meindert I Tholen, Hagedoornweg 138 Amsterdam]              |
* +--------------------------------------------------------------------+-------------------------------------------------------------+
*
* root
* |-- s: string (nullable = true)
* |-- col1: array (nullable = true)
* | |-- element: string (containsNull = true)
*/