date red,heavy,new blue,light,old
1-2-20 320 120
2-3-20 220 125
我想遍历所有行和列,这样我就可以解析列名并将它们用作新列的值。我想要得到如下格式的数据:
我想要日期重复。
date color weight condition. value
1-2-20 red heavy new 320
1-2-20 blue light. old. 120
2-3-20 red. heavy new. 220
我试过了,当我只有一列
时,它工作了colName = df_retransform.columns[1]
lst = colName.split(",")
color = lst[0]
weight = lst[1]
condition = lst[2]
df_retransform.rename(columns={colName: 'value'}, inplace=True)
df_retransform['color'] = color
df_retransform['weight'] = weight
df_retransform['condition'] = condition
但是我不能修改它,以便我可以对所有列都这样做。
将DataFrame.melt
与Series.str.split
一起使用,DataFrame.pop
用于使用并删除列variable
,必要时更改列名的最后顺序:
首先你可以测试是否所有没有数据的列都有2个,
:
print ([col for col in df.columns if col.count(',') != 2])
['date']
df = df.melt('date')
df[['color', 'weight', 'condition']] = df.pop('variable').str.split(',', expand=True)
df = df[['date', 'color', 'weight', 'condition', 'value']]
print (df)
date color weight condition value
0 1-2-20 red heavy new 320
1 2-3-20 red heavy new 220
2 1-2-20 blue light old 120
3 2-3-20 blue light old 125
或者对MultiIndex Series
使用DataFrame.stack
,然后拆分并为新列重新创建新的所有级别:
print (df)
date red,heavy,new blue,light,old
0 1-2-20 320 NaN
1 NaN 220 125.0
s = df.set_index('date').stack(dropna=False)
s.index = pd.MultiIndex.from_tuples([(i, *j.split(',')) for i, j in s.index],
names=['date', 'color', 'weight', 'condition'])
df = s.reset_index(name='value')
print (df)
date color weight condition value
0 1-2-20 red heavy new 320.0
1 1-2-20 blue light old NaN
2 NaN red heavy new 220.0
3 NaN blue light old 125.0
你也可以使用pyjanitor中的pivot_longer函数;目前,您必须从github安装最新的开发版本:
# install latest dev version
# pip install git+https://github.com/ericmjl/pyjanitor.git
import janitor
df.pivot_longer(index="date",
names_to=("color", "weight", "condition"),
names_sep=",")
date color weight condition value
0 1-2-20 red heavy new 320
1 2-3-20 red heavy new 220
2 1-2-20 blue light old 120
3 2-3-20 blue light old 125
将新列的名称传递给names_to
,并在names_sep
中指定分隔符(,
)。
如果您希望它按照出现的顺序返回,您可以将布尔值True
传递给sort_by_appearance
参数:
df.pivot_longer(
index="date",
names_to=("color", "weight", "condition"),
names_sep=",",
sort_by_appearance=True,
)
date color weight condition value
0 1-2-20 red heavy new 320
1 1-2-20 blue light old 120
2 2-3-20 red heavy new 220
3 2-3-20 blue light old 125