Python Pandas: extract duplicates from a CSV and aggregate the values of one column



I have a CSV file with four columns: A, B, C, and D. I want to:

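  • Find all duplicate rows, i.e. rows that have the same values in columns A, B, and C
  • For each such group, create a single row without duplicates whose D column is the union of the D values of all the duplicates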

Sample CSV input:

John,Yes,123,street 1
John,Yes,123,street 2
Tom,No,345,street 1
Tom,No,345,street 2
Tom,No,345,street 3
Jason,Yes,567,street 1
Thomas,No,123,street 1
Jess,No,999,street 1

Expected result:

John,Yes,123,street 1 street 2
Tom,No,345,street 1 street 2 street 3
Jason,Yes,567,street 1
Thomas,No,123,street 1
Jess,No,999,street 1

df.groupby(['A','B','C'])['D'].apply(' '.join).reset_index()

Full code:

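from io import StringIO

import pandas as pd

# Sample data from the question, with a header row added
csv_data = """A,B,C,D
John,Yes,123,street 1
John,Yes,123,street 2
Tom,No,345,street 1
Tom,No,345,street 2
Tom,No,345,street 3
Jason,Yes,567,street 1
Thomas,No,123,street 1
Jess,No,999,street 1"""
df = pd.read_csv(StringIO(csv_data))
# Group rows that share A, B, and C, then join their D values with spaces
df.groupby(['A','B','C'])['D'].apply(' '.join).reset_index()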

Output:

        A    B    C                           D
0   Jason  Yes  567                    street 1
1    Jess   No  999                    street 1
2    John  Yes  123           street 1 street 2
3  Thomas   No  123                    street 1
4     Tom   No  345  street 1 street 2 street 3
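
Note that `groupby` sorts the group keys by default, so the result is ordered alphabetically by A rather than in the order of the input. The following is a minimal sketch of a variant that keeps the input order and reads the header-less CSV from the question directly. It assumes the column labels A to D have to be supplied by hand, and it interprets "union" as dropping repeated D values within a group; the names `raw`, `result`, and `csv_text` are only illustrative.

```python
from io import StringIO

import pandas as pd

# The sample input from the question, which has no header row.
raw = """John,Yes,123,street 1
John,Yes,123,street 2
Tom,No,345,street 1
Tom,No,345,street 2
Tom,No,345,street 3
Jason,Yes,567,street 1
Thomas,No,123,street 1
Jess,No,999,street 1
"""

# header=None plus names= assigns the column labels explicitly.
df = pd.read_csv(StringIO(raw), header=None, names=["A", "B", "C", "D"])

# sort=False keeps the groups in order of first appearance.
# dict.fromkeys drops repeated D values while preserving their order
# (an assumption about what "union" should mean here).
result = (
    df.groupby(["A", "B", "C"], sort=False)["D"]
    .agg(lambda s: " ".join(dict.fromkeys(s)))
    .reset_index()
)

# Serialize without header or index, matching the layout of the input.
csv_text = result.to_csv(index=False, header=False)
print(csv_text)
```

To write a file instead of a string, a path can be passed as the first argument of `to_csv`.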
