Input dataframe:
class | malecount | femalecount
---|---|---
A | 2 | 1
B | 3 | 1
C | 0 | 3
D | 2 | 4
You can create the male and female arrays for each class and then explode them.
See the example below.
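from pyspark.sql import functions as func

# Assumes data_sdf is a DataFrame with columns class, malecount, femalecount.
(
    data_sdf
    # Build "m,m,..." with one "m" per male; this is an empty string when the count is 0.
    .withColumn('male_arr', func.expr('concat_ws(",", array_repeat("m", cast(malecount as int)))'))
    .withColumn('female_arr', func.expr('concat_ws(",", array_repeat("f", cast(femalecount as int)))'))
    # Turn empty strings into null so concat_ws skips them and no stray comma appears.
    .withColumn('male_female', func.concat_ws(',',
        func.expr('if(male_arr="", null, male_arr)'),
        func.expr('if(female_arr="", null, female_arr)')
    ))
    # Split the combined string and emit one row per element.
    .selectExpr('class', 'explode(split(male_female, ",")) as gender')
    .show()
)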
# +-----+------+
# |class|gender|
# +-----+------+
# |    A|     m|
# |    A|     m|
# |    A|     f|
# |    B|     m|
# |    B|     m|
# |    B|     m|
# |    B|     f|
# |    C|     f|
# |    C|     f|
# |    C|     f|
# |    D|     m|
# |    D|     m|
# |    D|     f|
# |    D|     f|
# |    D|     f|
# |    D|     f|
# +-----+------+
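
To see why the empty strings are turned into null before the final concat_ws, here is a small plain-Python sketch of the same row-by-row logic. It does not use Spark. The helper names `concat_ws` and `expand` and the `nullify_empty` flag are made up for illustration, and the helper only imitates the null-skipping behaviour of Spark's concat_ws.

```python
def concat_ws(sep, *parts):
    """Imitate Spark's concat_ws: join parts with sep, skipping None values."""
    return sep.join(p for p in parts if p is not None)


def expand(rows, nullify_empty=True):
    """Turn (class, malecount, femalecount) rows into one (class, gender) row per person."""
    out = []
    for cls, malecount, femalecount in rows:
        # Same idea as array_repeat + concat_ws: "m,m" for 2 males, "" for 0.
        male_arr = concat_ws(",", *["m"] * malecount)
        female_arr = concat_ws(",", *["f"] * femalecount)
        if nullify_empty:
            # Same idea as if(male_arr="", null, male_arr).
            male_arr = male_arr or None
            female_arr = female_arr or None
        male_female = concat_ws(",", male_arr, female_arr)
        # Same idea as explode(split(male_female, ",")).
        out.extend((cls, gender) for gender in male_female.split(","))
    return out


rows = [("A", 2, 1), ("B", 3, 1), ("C", 0, 3), ("D", 2, 4)]
result = expand(rows)
without_null_step = expand([("C", 0, 3)], nullify_empty=False)
```

With the null step, class C produces three "f" rows. Without it, the joined string for C is ",f,f,f", so the split yields an extra empty gender before the three "f" values. In this sketch, a class whose counts are both 0 still produces one row with an empty gender; I would expect the Spark version to behave the same way, but I have not verified that, so you may want to filter such classes out first.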