Create a column in PySpark based on filtered values of another column



I am trying to create a new variable named k whose value depends on whether metric is I; otherwise I want to return a null value.

Thanks for your answers :)


data = [["1", "Amit", "DU", "I", "8", "6"],
["2", "Mohit", "DU", "I", "4", "2"],
["3", "rohith", "BHU", "I", "5", "3"],
["4", "sridevi", "LPU", "I", "1", "6"],
["1", "sravan", "KLMP", "M", "2", "4"],
["5", "gnanesh", "IIT", "M", "6", "8"],
["6", "gnadesh", "KLM", "c", "10", "9"]]
columns = ['ID', 'NAME', 'college', 'metric', 'x', 'y']

dataframe = spark.createDataFrame(data, columns)
+---+-------+-------+------+---+---+
| ID|   NAME|college|metric|  x|  y|
+---+-------+-------+------+---+---+
|  1|   Amit|     DU|     I|  8|  6|
|  2|  Mohit|     DU|     I|  4|  2|
|  3| rohith|    BHU|     I|  5|  3|
|  4|sridevi|    LPU|     I|  1|  6|
|  1| sravan|   KLMP|     M|  2|  4|
|  5|gnanesh|    IIT|     M|  6|  8|
|  6|gnadesh|    KLM|     c| 10|  9|
+---+-------+-------+------+---+---+

I tried using this, but it does not work:

dataframe= dataframe.withColumn('k', when ((col('metric') == 'M',(dataframe['metric'] / 10)))
.when ((col('metric') == 'I',(dataframe['metric'] / 10 * 2,54)))
.otherwise (' '))
from pyspark.sql.functions import col, lit, when

# when() takes the condition and the value as two separate arguments, not as one tuple.
# Note: metric holds letters here, so a numeric column such as x is probably what was meant.
dataframe = dataframe.withColumn(
    'k',
    when(col('metric') == 'M', dataframe['metric'] / 10)
    .when(col('metric') == 'I', dataframe['metric'] / 10 * 2.54)
    .otherwise(lit(' ')))

# Same expression, returning a null instead of a blank string
dataframe = dataframe.withColumn(
    'k',
    when(col('metric') == 'M', dataframe['metric'] / 10)
    .when(col('metric') == 'I', dataframe['metric'] / 10 * 2.54)
    .otherwise(lit(None)))

I guess you are getting the error in the otherwise part of the code. The argument to DataFrame.withColumn should be of type Column, and ' ' is not.
