解析所有的colname并创建新的列


date     red,heavy,new  blue,light,old
1-2-20   320             120
2-3-20   220             125

我想遍历所有行和列,这样我就可以解析列名并将它们用作新列的值。我想要得到如下格式的数据:

我想要日期重复。

date     color           weight   condition.  value
1-2-20   red             heavy     new        320
1-2-20   blue            light.    old.       120
2-3-20   red.            heavy     new.       220

我试过了,当我只有一列

时,它工作了
colName = df_retransform.columns[1]
lst = colName.split(",")
color = lst[0]
weight = lst[1]
condition = lst[2]

df_retransform.rename(columns={colName: 'value'}, inplace=True)
df_retransform['color'] = color
df_retransform['weight'] = weight
df_retransform['condition'] = condition

但是我不能修改它,以便我可以对所有列都这样做。

DataFrame.meltSeries.str.split一起使用,DataFrame.pop用于使用并删除列variable,必要时更改列名的最后顺序:

首先你可以测试是否所有没有数据的列都有2个,:

print ([col for col in df.columns if col.count(',') != 2])
['date'] 

df = df.melt('date')
df[['color', 'weight', 'condition']] = df.pop('variable').str.split(',', expand=True)
df = df[['date', 'color', 'weight', 'condition', 'value']]
print (df)
date color weight condition  value
0  1-2-20   red  heavy       new    320
1  2-3-20   red  heavy       new    220
2  1-2-20  blue  light       old    120
3  2-3-20  blue  light       old    125

或者对MultiIndex Series使用DataFrame.stack,然后拆分并为新列重新创建新的所有级别:

print (df)
date  red,heavy,new  blue,light,old
0  1-2-20            320             NaN
1     NaN            220           125.0
s = df.set_index('date').stack(dropna=False)
s.index = pd.MultiIndex.from_tuples([(i, *j.split(',')) for i, j in s.index], 
names=['date', 'color', 'weight', 'condition'])
df = s.reset_index(name='value')
print (df)
date color weight condition  value
0  1-2-20   red  heavy       new  320.0
1  1-2-20  blue  light       old    NaN
2     NaN   red  heavy       new  220.0
3     NaN  blue  light       old  125.0

你也可以使用pyjanitor中的pivot_longer函数;目前,您必须从github安装最新的开发版本:

# install latest dev version
# pip install git+https://github.com/ericmjl/pyjanitor.git
import janitor
df.pivot_longer(index="date", 
names_to=("color", "weight", "condition"), 
names_sep=",")
date    color   weight  condition   value
0   1-2-20  red     heavy   new     320
1   2-3-20  red     heavy   new     220
2   1-2-20  blue    light   old     120
3   2-3-20  blue    light   old     125

将新列的名称传递给names_to,并在names_sep中指定分隔符(,)。

如果您希望它按照出现的顺序返回,您可以将布尔值True传递给sort_by_appearance参数:

df.pivot_longer(
index="date",
names_to=("color", "weight", "condition"),
names_sep=",",
sort_by_appearance=True,
)

date    color   weight  condition   value
0   1-2-20  red     heavy   new     320
1   1-2-20  blue    light   old     120
2   2-3-20  red     heavy   new     220
3   2-3-20  blue    light   old     125

最新更新