Using Python/Pandas to merge rows that share some values but differ in individual columns



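I've been digging through Stack Overflow trying to solve this problem. I get close every time, but never quite get what I need. Here is a generic CSV file I made up for the example (something.csv):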

lastName, firstName, address, tool, description
Franks, James, 321 Hammond, hammer, "It hammers"
Franks, James, 321 Hammond, nails, "It Nails stuff"
Phiilips, Tom, 773 James St, mower, "It mows"
Phiilips, Tom, 773 James St, weed-wacker, "It whacks"

I'm trying to merge these rows into a dictionary so that they read something like this:

Franks: [(hammer, "It hammers"), (nails, "It Nails stuff")]
Phiilips: [(mower, "It mows"),  (weed-wacker, "It whacks")]

I'm wondering whether this is possible, or whether I'm just making things too hard...

Here is what I've tried so far:

import pandas as pd
df3 = pd.read_csv("results.csv", encoding="utf-8", skipinitialspace=True)
df3.groupby("lastName")[["tool","description"]].apply(lambda g: list(map(tuple, g.values.tolist()))).to_dict()

Result:

{Franks: [("hammer", "It hammers"), ("nails", "It Nails stuff")]}
{Franks: [("hammer", "It hammers"), ("nails", "It Nails stuff")]}
{Phiilips:[("mower", "It mows"), ("weed-wacker", "It whacks")]}
{Phiilips:[("mower", "It mows"), ("weed-wacker", "It whacks")]}

I'm not experienced enough to figure out why I'm getting duplicate rows, but something like this without the duplicates is what I'm aiming for.

You can use the csv module and its DictReader:

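import csv
from collections import defaultdict

# Map each last name to a list of (tool, description) tuples.
dd = defaultdict(list)
with open('results.csv', 'r', newline='') as fin:
    # skipinitialspace=True strips the space after each comma, so the header
    # keys are 'tool' and 'description' rather than ' tool' and ' description'.
    reader = csv.DictReader(fin, skipinitialspace=True)
    for row in reader:
        dd[row['lastName']].append((row['tool'], row['description']))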

Output:

defaultdict(list,
        {'Franks': [('hammer', 'It hammers'), ('nails', 'It Nails stuff')],
         'Phiilips': [('mower', 'It mows'), ('weed-wacker', 'It whacks')]})
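
If you would rather stay in pandas, the same grouping can be sketched with groupby and a dict comprehension. This is only a minimal sketch under some assumptions: the sample rows from the question are inlined through io.StringIO in place of results.csv, the names csv_text and result are made up for illustration, and the drop_duplicates() call is a guess at one way to guard against repeated rows in the input, not a diagnosis of where the duplicates in the question came from.

```python
import io

import pandas as pd

# Sample rows copied from the question; io.StringIO stands in for results.csv.
csv_text = '''lastName, firstName, address, tool, description
Franks, James, 321 Hammond, hammer, "It hammers"
Franks, James, 321 Hammond, nails, "It Nails stuff"
Phiilips, Tom, 773 James St, mower, "It mows"
Phiilips, Tom, 773 James St, weed-wacker, "It whacks"
'''

# skipinitialspace=True strips the space after each comma, including in the header.
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# Assumption: if the file itself contains repeated rows, drop them before grouping.
df = df.drop_duplicates()

# Build one dictionary entry per last name: a list of (tool, description) tuples.
result = {
    last_name: list(zip(group["tool"], group["description"]))
    for last_name, group in df.groupby("lastName")
}
print(result)
```

For the sample data this should produce a plain dict with the same two entries as the defaultdict shown above.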
