我有一个数据集,我想将特定字段名转换为值,同时执行将值分解为自己唯一的行以及执行长枢轴。
Start Date End Area Final Type Middle Stat Low Stat High Stat Middle Stat1 Low Stat1 High Stat1
8/1/2013 9/1/2013 10/1/2013 NY 3/1/2023 CC 226 20 10 0 0 0
8/1/2013 9/1/2013 10/1/2013 CA 3/1/2023 AA 130 50 0 0 0 0
data = {
"Start": ['8/1/2013', '8/1/2013'],
"Date": ['9/1/2013', '9/1/2013'],
"End": ['10/1/2013', '10/1/2013'],
"Area": ['NY', 'CA'],
"Final": ['3/1/2023', '3/1/2023'],
"Type": ['CC', 'AA'],
"Middle Stat": [226, 130],
"Low Stat": [20, 50],
"High Stat": [10, 0],
"Middle Stat1": [0, 0],
"Low Stat1": [0, 0],
"High Stat1": [0, 0]
}
的
Start Date End Area Final Type Stat Range Stat1
8/1/2013 9/1/2013 10/1/2013 NY 3/1/2023 CC 20 Low 0
8/1/2013 9/1/2013 10/1/2013 CA 3/1/2023 AA 50 Low 0
8/1/2013 9/1/2013 10/1/2013 NY 3/1/2023 CC 226 Middle 0
8/1/2013 9/1/2013 10/1/2013 CA 3/1/2023 AA 130 Middle 0
8/1/2013 9/1/2013 10/1/2013 NY 3/1/2023 CC 10 High 0
8/1/2013 9/1/2013 10/1/2013 CA 3/1/2023 AA 0 High 0
我相信我必须注入某种宽到长方法,(SO成员协助)但不确定如何将此合并,同时在目标(感兴趣的列)列名中具有相同的后缀。
pd.wide_to_long(df,
stubnames=['Low','Middle','High'],
i=['Start','Date','End','Area','Final'],
j='',
sep=' ',
suffix='(stat)'
).unstack(level=-1, fill_value=0).stack(level=0).reset_index()
欢迎提出任何建议。
#原始数据集
import pandas as pd
# create DataFrame
data = {'Start': ['9/1/2013', '10/1/2013', '11/1/2013', '12/1/2013'],
'Date': ['10/1/2016', '11/1/2016', '12/1/2016', '1/1/2017'],
'End': ['11/1/2016', '12/1/2016', '1/1/2017', '2/1/2017'],
'Area': ['NY', 'NY', 'NY', 'NY'],
'Final': ['3/1/2023', '3/1/2023', '3/1/2023', '3/1/2023'],
'Type': ['CC', 'CC', 'CC', 'CC'],
'Low Stat': ['', '', '', ''],
'Low Stat1': ['', '', '', ''],
'Middle Stat': ['0', '0', '0', '0'],
'Middle Stat1': ['0', '0', '0', '0'],
'Re': ['','','',''],
'Set': ['0', '0', '0', '0'],
'Set2': ['0', '0', '0', '0'],
'Set3': ['0', '0', '0', '0'],
'High Stat': ['', '', '', ''],
'High Stat1': ['', '', '', '']}
df = pd.DataFrame(data)
df.melt(id_vars=df.columns[:6], value_name='Values')
Start Date End Area Final Type variable Values
0 8/1/2013 9/1/2013 10/1/2013 NY 3/1/2023 CC Middle Stat 226
1 8/1/2013 9/1/2013 10/1/2013 CA 3/1/2023 AA Middle Stat 130
2 8/1/2013 9/1/2013 10/1/2013 NY 3/1/2023 CC Low Stat 20
3 8/1/2013 9/1/2013 10/1/2013 CA 3/1/2023 AA Low Stat 50
4 8/1/2013 9/1/2013 10/1/2013 NY 3/1/2023 CC High Stat 10
5 8/1/2013 9/1/2013 10/1/2013 CA 3/1/2023 AA High Stat 0
一个选项是使用pyjanitor的pivot_longer -在这种情况下,我们使用特殊占位符.value
来标识我们希望保留为标题的列的部分,而其余部分则被整理成一个新列:
# pip install pyjanitor
import pandas as pd
import janitor
(df
.pivot_longer(
index = slice('Start', 'Type'),
names_to = ("Range", ".value"),
names_sep = " ")
)
Start Date End Area Final Type Range Stat Stat1
0 8/1/2013 9/1/2013 10/1/2013 NY 3/1/2023 CC Middle 226 0
1 8/1/2013 9/1/2013 10/1/2013 CA 3/1/2023 AA Middle 130 0
2 8/1/2013 9/1/2013 10/1/2013 NY 3/1/2023 CC Low 20 0
3 8/1/2013 9/1/2013 10/1/2013 CA 3/1/2023 AA Low 50 0
4 8/1/2013 9/1/2013 10/1/2013 NY 3/1/2023 CC High 10 0
5 8/1/2013 9/1/2013 10/1/2013 CA 3/1/2023 AA High 0 0
您可以尝试先重命名列:
import re
df = df.rename(columns=lambda x: re.sub(r'(Low|Middle|High) Stat', r'Stat1', x))
x = pd.wide_to_long(df,
stubnames='Stat',
i=['Start','Date','End','Area','Final'],
j='Range', suffix=r'(?:Low|Middle|High)').reset_index()
print(x)
打印:
Start Date End Area Final Range Type Stat
0 8/1/2013 9/1/2013 10/1/2013 NY 3/1/2023 Middle CC 226
1 8/1/2013 9/1/2013 10/1/2013 NY 3/1/2023 Low CC 20
2 8/1/2013 9/1/2013 10/1/2013 NY 3/1/2023 High CC 10
3 8/1/2013 9/1/2013 10/1/2013 CA 3/1/2023 Middle AA 130
4 8/1/2013 9/1/2013 10/1/2013 CA 3/1/2023 Low AA 50
5 8/1/2013 9/1/2013 10/1/2013 CA 3/1/2023 High AA 0
编辑:有更多的存根名称:
import re
df = df.rename(columns=lambda x: re.sub(r"(Low|Middle|High) Stat$", r"Stat1", x))
df = df.rename(columns=lambda x: re.sub(r"(Low|Middle|High) Stat1$", r"1Stat1", x))
x = (
pd.wide_to_long(
df,
stubnames=["Stat", "1Stat"],
i=["Start", "Date", "End", "Area", "Final"],
j="Range",
suffix=r"(?:Low|Middle|High)",
)
.reset_index()
.rename(columns={"1Stat": "Stat1"})
)
打印:
Start Date End Area Final Range Type Stat Stat1
0 8/1/2013 9/1/2013 10/1/2013 NY 3/1/2023 Middle CC 226 0
1 8/1/2013 9/1/2013 10/1/2013 NY 3/1/2023 Low CC 20 0
2 8/1/2013 9/1/2013 10/1/2013 NY 3/1/2023 High CC 10 0
3 8/1/2013 9/1/2013 10/1/2013 CA 3/1/2023 Middle AA 130 0
4 8/1/2013 9/1/2013 10/1/2013 CA 3/1/2023 Low AA 50 0
5 8/1/2013 9/1/2013 10/1/2013 CA 3/1/2023 High AA 0 0