PySpark: how do I reverse a groupby count?



Input DataFrame:

class  malecount  femalecount
A      2          1
B      3          1
C      0          3
D      2          4

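You can build an array of male markers and an array of female markers for each class, sized by the respective counts, and then explode the result into one row per element.

See the example below.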

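from pyspark.sql import functions as func

# data_sdf is the input DataFrame with columns class, malecount, femalecount
(data_sdf
    # "m" repeated malecount times, joined as "m,m,..." (empty string when the count is 0)
    .withColumn('male_arr', func.expr('concat_ws(",", array_repeat("m", cast(malecount as int)))'))
    .withColumn('female_arr', func.expr('concat_ws(",", array_repeat("f", cast(femalecount as int)))'))
    # turn empty strings into null so that concat_ws skips them
    .withColumn('male_female', func.concat_ws(',',
        func.expr('if(male_arr="", null, male_arr)'),
        func.expr('if(female_arr="", null, female_arr)')
    ))
    # split back into an array and emit one row per element
    .selectExpr('class', 'explode(split(male_female, ",")) as gender')
    .show()
)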
# +-----+------+
# |class|gender|
# +-----+------+
# |    A|     m|
# |    A|     m|
# |    A|     f|
# |    B|     m|
# |    B|     m|
# |    B|     m|
# |    B|     f|
# |    C|     f|
# |    C|     f|
# |    C|     f|
# |    D|     m|
# |    D|     m|
# |    D|     f|
# |    D|     f|
# |    D|     f|
# |    D|     f|
# +-----+------+
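
To check the expected result without a Spark session, the same expansion can be modelled in plain Python. This is only a minimal sketch of the logic, not a replacement for the DataFrame code: the `rows` list and the `expand_counts` helper are hypothetical names introduced for illustration, and the counts are the ones from the input table (A: 2/1, B: 3/1, C: 0/3, D: 2/4).

```python
# Plain-Python model of "reversing" a groupby count.
# rows and expand_counts are illustrative names, not part of any PySpark API.
rows = [
    ("A", 2, 1),
    ("B", 3, 1),
    ("C", 0, 3),
    ("D", 2, 4),
]


def expand_counts(rows):
    """Turn (class, malecount, femalecount) into one (class, gender) pair per counted person."""
    expanded = []
    for cls, malecount, femalecount in rows:
        # A count of 0 contributes no rows, which matches class C having no "m" rows.
        expanded.extend((cls, "m") for _ in range(malecount))
        expanded.extend((cls, "f") for _ in range(femalecount))
    return expanded


result = expand_counts(rows)
for cls, gender in result:
    print(cls, gender)
```

The number of pairs produced equals the sum of all counts, which is a quick sanity check to run against the exploded DataFrame as well.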
