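I'm trying to create a new variable named k whose value depends on whether metric is I or M; otherwise I want to return an empty value.
Thanks for your answers :)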
data = [["1", "Amit", "DU", "I", "8", "6"],
["2", "Mohit", "DU", "I", "4", "2"],
["3", "rohith", "BHU", "I", "5", "3"],
["4", "sridevi", "LPU", "I", "1", "6"],
["1", "sravan", "KLMP", "M", "2", "4"],
["5", "gnanesh", "IIT", "M", "6", "8"],
["6", "gnadesh", "KLM", "c", "10", "9"]]
columns = ['ID', 'NAME', 'college', 'metric', 'x', 'y']
dataframe = spark.createDataFrame(data, columns)
+---+-------+-------+------+---+---+
| ID|   NAME|college|metric|  x|  y|
+---+-------+-------+------+---+---+
|  1|   Amit|     DU|     I|  8|  6|
|  2|  Mohit|     DU|     I|  4|  2|
|  3| rohith|    BHU|     I|  5|  3|
|  4|sridevi|    LPU|     I|  1|  6|
|  1| sravan|   KLMP|     M|  2|  4|
|  5|gnanesh|    IIT|     M|  6|  8|
|  6|gnadesh|    KLM|     c| 10|  9|
+---+-------+-------+------+---+---+
I tried the following, but it doesn't work:
dataframe= dataframe.withColumn('k', when ((col('metric') == 'M',(dataframe['metric'] / 10)))
.when ((col('metric') == 'I',(dataframe['metric'] / 10 * 2,54)))
.otherwise (' '))
from pyspark.sql.functions import col, lit, when
# when() takes the condition and the value as two separate arguments
dataframe = dataframe.withColumn('k', when(col('metric') == 'M', dataframe['metric'] / 10)
    .when(col('metric') == 'I', dataframe['metric'] / 10 * 2.54)
    .otherwise(lit(' ')))
or
from pyspark.sql.functions import col, lit, when
# lit(None) fills the non-matching rows with null instead of a blank string
dataframe = dataframe.withColumn('k', when(col('metric') == 'M', dataframe['metric'] / 10)
    .when(col('metric') == 'I', dataframe['metric'] / 10 * 2.54)
    .otherwise(lit(None)))
I guess you're getting the error in the otherwise part of your code. The argument to DataFrame.withColumn should be of type Column, and ' ' is not.
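Separately from the otherwise part, the parentheses and the decimal comma in the original attempt may also be worth a look. The snippet below is a plain-Python sketch (no Spark needed) of how that expression is parsed; the name metric and the value 50.0 are hypothetical stand-ins for a numeric column value, not something taken from the question.

```python
# Plain Python, no Spark required: shows how the attempted expression is parsed.
metric = 50.0  # hypothetical numeric stand-in for a column value

# Wrapping "condition, value" in an extra pair of parentheses builds ONE tuple,
# so a call like when((cond, value)) receives a single argument, not two.
single_arg = (metric == 50.0, metric / 10)
print(type(single_arg).__name__, len(single_arg))

# A decimal comma (2,54) also builds a tuple instead of the number 2.54.
with_comma = (metric / 10 * 2, 54)
with_point = metric / 10 * 2.54
print(with_comma)
print(with_point)
```

If that is what happens here, passing the condition and the value as two separate arguments and writing 2.54 with a decimal point should help. Also note that in the sample data metric holds strings such as I and M, so the arithmetic is presumably meant for a numeric column such as x or y; that part is a guess on my side.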