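I have a question about my DataFrame. In each row, one column (personlist) holds a list of relevant people, and another column (speech) holds a list of speeches made by both relevant and irrelevant people. I want to select only the speeches of the relevant people, i.e. those whose surname appears in personlist, and then join those speeches together while ignoring the irrelevant ones. So one column provides the surnames I am looking for, the other provides the list of all speakers (first and last name) together with their speeches, and I want to create a new column that contains the relevant people's speeches, separated by a space, stored in the respective row.
So my initial dataset looks like this: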
ticker year quarter personlist jobposition speech
xx 2009 1 ("Angle", "Barth") CEO [("Mike Angle", "Thank you"), ("Barbara Barth", "It is"), ("Will Cook", "Yes, true")]
xx 2009 1 ("Angle", "Barth") CFO [("Mike Angle", "Thank you"), ("Barbara Barth", "It is"), ("Will Cook", "Yes, true")]
xx 2009 2 ("Angle", "Barth") CEO [("Mike Angle", "I am surprised"), ("Barbara Barth", "So am I"), ("Will Cook", "Me too")]
xx 2009 2 ("Angle", "Barth") CFO [("Mike Angle", "I am surprised"), ("Barbara Barth", "So am I"), ("Will Cook", "Me too")]
yy 2008 3 ("Cruz", "Dolm") CEO [("Damien Cruz", "Hello"), ("Lara Dolm", "Nice to meet you"), ("Lara Bel", "You too")]
yy 2008 3 ("Cruz", "Dolm") CFO [("Damien Cruz", "Hello"), ("Lara Dolm", "Nice to meet you"), ("Lara Bel", "You too")]
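For anyone who wants to reproduce this, here is a minimal sketch of building the table above in pandas. It assumes personlist holds tuples of surnames and speech holds lists of (name, text) tuples; the helper names `calls` and `persons` are illustrative only and not part of the original data.

```python
import pandas as pd

# Speeches per (ticker, year, quarter), copied from the table above.
calls = {
    ("xx", 2009, 1): [("Mike Angle", "Thank you"), ("Barbara Barth", "It is"), ("Will Cook", "Yes, true")],
    ("xx", 2009, 2): [("Mike Angle", "I am surprised"), ("Barbara Barth", "So am I"), ("Will Cook", "Me too")],
    ("yy", 2008, 3): [("Damien Cruz", "Hello"), ("Lara Dolm", "Nice to meet you"), ("Lara Bel", "You too")],
}

# Relevant surnames per ticker (assumption: they are constant per ticker, as in the table).
persons = {"xx": ("Angle", "Barth"), "yy": ("Cruz", "Dolm")}

# One row per job position, so each call appears twice (CEO and CFO).
rows = [
    {"ticker": t, "year": y, "quarter": q,
     "personlist": persons[t], "jobposition": job, "speech": speech}
    for (t, y, q), speech in calls.items()
    for job in ("CEO", "CFO")
]
df = pd.DataFrame(rows)
```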
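For example, for the first row I want to check, for each (name, speech) pair, whether the first entry of the pair ends with one of the surnames in personlist. If it does not, move on; if it does, take the speech part (i.e. the second entry of the pair) and store it in the new column, then repeat for the other pairs and join the matches together. So I want the following dataset (I have hidden the initial speech column here, but it should still be included; I do not want to replace it, just create a new column).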
ticker year quarter personlist relevantspeeches
xx 2009 1 ("Angle", "Barth") "Thank you It is"
xx 2009 1 ("Angle", "Barth") "Thank you It is"
xx 2009 2 ("Angle", "Barth") "I am surprised So am I"
xx 2009 2 ("Angle", "Barth") "I am surprised So am I"
yy 2008 3 ("Cruz", "Dolm") "Hello Nice to meet you"
yy 2008 3 ("Cruz", "Dolm") "Hello Nice to meet you"
Can anyone help me with this?
Thanks! Julia
With a list comprehension and the apply method:
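def select(row):
    # Keep every speech whose speaker name contains one of the surnames in
    # personlist. The outer loop runs over personlist, so the result is
    # ordered by personlist first, then by position in speech.
    return " ".join([said for person in row.personlist
                     for name, said in row.speech if person in name])

# axis=1 passes each row to select as a Series
df['relevant'] = df.apply(select, axis=1)
df.relevant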
gives:
"""
0 Thank you It is
1 Thank you It is
2 I am surprised So am I
3 I am surprised So am I
4 Hello Nice to meet you
5 Hello Nice to meet you
Name: relevant, dtype: object
"""
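Note that `person in name` is a substring test, so it would also match a speaker whose name merely contains a surname. If the match should be "name ends with a surname", as described in the question, a stricter variant might look like the sketch below. It is only a sketch under assumptions: the function name `select_strict`, the two-row frame, and the speaker "Tom Angleton" are invented here to show the difference. This version also keeps the speeches in the order they appear in speech rather than in personlist order.

```python
import pandas as pd

# Hypothetical two-row frame; "Tom Angleton" is made up to show a false
# positive of the substring test ("Angle" is contained in "Angleton").
df = pd.DataFrame({
    "personlist": [("Angle", "Barth"), ("Angle", "Barth")],
    "speech": [
        [("Mike Angle", "Thank you"), ("Barbara Barth", "It is"), ("Will Cook", "Yes, true")],
        [("Barbara Barth", "So am I"), ("Tom Angleton", "Not me"), ("Mike Angle", "I am surprised")],
    ],
})

def select_strict(row):
    # str.endswith accepts a tuple of suffixes, so one call checks all surnames.
    surnames = tuple(row.personlist)
    # Iterate over speech only, which preserves the original speaking order.
    return " ".join(said for name, said in row.speech if name.endswith(surnames))

df["relevantspeeches"] = df.apply(select_strict, axis=1)
```

On the second row, `select` from above would return "Not me I am surprised So am I", while `select_strict` returns "So am I I am surprised". A name such as "Hobarth" would still match "Barth" with `endswith`; comparing `name.split()[-1]` against the surnames would be one way to require an exact surname match.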