How to compare the elements of two lists



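I want to compare two lists row by row. If the two rows are equal, add only one of them to a new DataFrame; if they are not, add both rows to the new DataFrame.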

Here are my two lists:

original = data_Unos['text']
13      Speaking to Africa Review   he also pointed ou...
17      Through Gawad Kalinga   Meloto has proven to b...
21      May you attain Nibbana Sena   thank you so muc...
22      Dodgeballs were flying fast and hard at Mornin...
26      Most are from desperately poor Horn of Africa ...
...                        
3155    The statement signed by Ikonomwan Francis   le...
3159      Most of them   the homeless   have the abili...
3162      In Metro Manila   7 464 families of disabled...
3163      We are working with an aim to build a countr...
3172    Our hearts go out to the hundreds of thousands...
Name: text, Length: 794, dtype: object

And:

backTranslated = backTranslated['text']
backTranslated
0      Talking to Africa Review also noted that most ...
1      Through Gawad Kalinga Meloto has proven to be ...
2      May you reach Nibbana Sena thank you so much f...
3      Dodgeballs were flying fast and hard at Mornin...
4      Most of them are from poor countries in the Ho...
...                        
789    The declaration signed by Ikonomwan Francis le...
790    Most of them homeless have the ability to work...
791    In Metro Manila 7 464 families of disabled cyc...
792    We are working with the objective of building ...
793    Our hearts are directed to the hundreds of tho...
Name: text, Length: 794, dtype: object

Here is what I am doing:

final = pd.DataFrame()
for i in original:
    for j in backTranslated:
        if(set(i)!=set(j)):
            final = final.append(i,ignore_index=True)
            final = final.append(j,ignore_index=True)
        else:
            final = final.append(i,ignore_index=True)

But I get the following error on this line:

final = final.append(j,ignore_index=True)
TypeError: cannot concatenate object of type '<class 'str'>'; only Series and DataFrame objs are valid

How should I do this?

The simplest way is to add both and then drop the duplicates:

final = pd.concat([data_Unos, backTranslated.to_frame()], ignore_index=True)
final.drop_duplicates(subset=['text'], inplace=True)
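A toy run of this approach (the data below is made up and only stands in for data_Unos['text'] and backTranslated) shows two side effects worth knowing: all original rows come first, followed by all back-translations, and a text is dropped whenever it already appears anywhere in the combined column, not only in the same row.

```python
import pandas as pd

# Made-up stand-ins for data_Unos['text'] and backTranslated
original = pd.Series(["A", "B"], index=[13, 17], name="text")
back_translated = pd.Series(["B", "C"], name="text")

final = pd.concat([original.to_frame(), back_translated.to_frame()], ignore_index=True)
final = final.drop_duplicates(subset=["text"])

# Both row pairs differ ("A" vs "B", "B" vs "C"), yet the back-translated "B"
# is dropped because it equals the original text of a *different* row.
print(final["text"].tolist())  # ['A', 'B', 'C']
```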

Iterating in pandas should be a last resort.
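If you need the strictly row-by-row behaviour from the question (always keep the original, add the back-translation only when it differs), the comparison can still be done without a Python loop. A minimal sketch, assuming both columns have the same length and are paired by position rather than by index label (the question's indexes are 13, 17, … versus 0, 1, …); merge_rowwise is a hypothetical name:

```python
import numpy as np
import pandas as pd

def merge_rowwise(original: pd.Series, back_translated: pd.Series) -> pd.Series:
    """Hypothetical helper: keep row k of `original`, and row k of
    `back_translated` only if it differs from that original row."""
    a = np.asarray(original, dtype=object)  # positional, ignores the index labels
    b = np.asarray(back_translated, dtype=object)
    if len(a) != len(b):
        raise ValueError("both Series must have the same length")
    n = len(a)
    # Interleave: original rows at even positions, back-translations at odd ones
    interleaved = np.empty(2 * n, dtype=object)
    interleaved[0::2] = a
    interleaved[1::2] = b
    # Keep every original row; keep a back-translation only where it differs
    keep = np.ones(2 * n, dtype=bool)
    keep[1::2] = a != b
    return pd.Series(interleaved[keep], name="text")
```

With the question's names this would be called as final = merge_rowwise(data_Unos['text'], backTranslated).to_frame().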

The pandas.DataFrame.append method has been deprecated since 1.4.0 and was removed in pandas 2.0; use pandas.concat instead.

This is how pandas.concat is defined:

pandas.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True)
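As a rough migration rule (toy data, made up for illustration), a DataFrame.append call maps onto pd.concat by putting both objects in a list:

```python
import pandas as pd

df = pd.DataFrame({"text": ["a", "b"]})
other = pd.DataFrame({"text": ["c"]})

# pandas < 2.0:  result = df.append(other, ignore_index=True)
# Equivalent call that works on pandas >= 1.4 and is the only option from 2.0 on:
result = pd.concat([df, other], ignore_index=True)
print(result["text"].tolist())  # ['a', 'b', 'c']
```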

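The objs argument must be an iterable (or mapping) of Series or DataFrame objects, which is why a bare string triggers the TypeError in the question. So the correct way to write your code is: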

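import pandas as pd

final = pd.DataFrame()
# Pair the two columns by position so that row k is compared with row k
for i, j in zip(original, backTranslated):
    # pd.concat only accepts Series/DataFrame objects, so wrap each string
    series_i = pd.Series(i)
    # Compare the full strings (set(i) would only compare the characters they contain)
    if i != j:
        series_j = pd.Series(j)
        final = pd.concat((final, series_i, series_j), ignore_index=True)
    else:
        final = pd.concat((final, series_i), ignore_index=True)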
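Calling pd.concat inside the loop copies the accumulated frame on every iteration, so the total cost grows quadratically with the number of rows. A common alternative, sketched here under the assumption that both inputs have the same length and are paired by position, is to collect the strings in a plain list and build the DataFrame once at the end (merge_pairs is a hypothetical name):

```python
import pandas as pd

def merge_pairs(original, back_translated):
    """Hypothetical helper: build the result in a list, create the DataFrame once."""
    rows = []
    for i, j in zip(original, back_translated):  # row k paired with row k
        rows.append(i)
        if i != j:  # keep the back-translation only when it differs
            rows.append(j)
    return pd.DataFrame({"text": rows})

final = merge_pairs(pd.Series(["x", "y"]), pd.Series(["x", "z"]))
print(final["text"].tolist())  # ['x', 'y', 'z']
```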

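You can also label the concatenated pieces with the keys parameter of pd.concat; when concatenating Series along axis=1, the keys become the column names.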
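A small sketch of keys with axis=1, using made-up data and assuming the two columns are paired by position (hence reset_index, since the question's indexes differ). Placing the texts side by side in named columns also makes a row-wise comparison easy to inspect:

```python
import pandas as pd

# Made-up toy data; the indexes differ, as in the question
original = pd.Series(["same", "orig"], index=[13, 17], name="text")
back_translated = pd.Series(["same", "back"], name="text")

side_by_side = pd.concat(
    [original.reset_index(drop=True), back_translated.reset_index(drop=True)],
    axis=1,
    keys=["original", "backTranslated"],  # become the column names
)
side_by_side["equal"] = side_by_side["original"] == side_by_side["backTranslated"]
print(side_by_side)
```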
