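I want to compare two lists row by row. If the two rows are equal, only one of them should be added to a new DataFrame. If they differ, both rows should be added to the new DataFrame.
These are my two lists: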
original = data_Unos['text']
13 Speaking to Africa Review he also pointed ou...
17 Through Gawad Kalinga Meloto has proven to b...
21 May you attain Nibbana Sena thank you so muc...
22 Dodgeballs were flying fast and hard at Mornin...
26 Most are from desperately poor Horn of Africa ...
...
3155 The statement signed by Ikonomwan Francis le...
3159 Most of them the homeless have the abili...
3162 In Metro Manila 7 464 families of disabled...
3163 We are working with an aim to build a countr...
3172 Our hearts go out to the hundreds of thousands...
Name: text, Length: 794, dtype: object
and:
backTranslated = backTranslated['text']
backTranslated
0 Talking to Africa Review also noted that most ...
1 Through Gawad Kalinga Meloto has proven to be ...
2 May you reach Nibbana Sena thank you so much f...
3 Dodgeballs were flying fast and hard at Mornin...
4 Most of them are from poor countries in the Ho...
...
789 The declaration signed by Ikonomwan Francis le...
790 Most of them homeless have the ability to work...
791 In Metro Manila 7 464 families of disabled cyc...
792 We are working with the objective of building ...
793 Our hearts are directed to the hundreds of tho...
Name: text, Length: 794, dtype: object
This is what I am trying to do:
final = pd.DataFrame()
for i in original:
    for j in backTranslated:
        if(set(i)!=set(j)):
            final = final.append(i,ignore_index=True)
            final = final.append(j,ignore_index=True)
        else:
            final = final.append(i,ignore_index=True)
But this line raises the following error:
final = final.append(j,ignore_index=True)
TypeError: cannot concatenate object of type '<class 'str'>'; only Series and DataFrame objs are valid
How should I do this?
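The simplest approach is to combine both and then drop the duplicates:
# original and backTranslated are the two 'text' Series from the question
final = pd.concat([original, backTranslated], ignore_index=True).to_frame()
final.drop_duplicates(subset=['text'], inplace=True)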
Iterating in pandas should be a last resort.
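To illustrate what avoiding iteration could look like for the row-by-row requirement in the question, here is a minimal sketch. The toy data and the names back_translated, orig, back and differs are made up for the example, and it assumes both Series have the same length and should be paired by position rather than by index label.

```python
import pandas as pd

# Toy stand-ins for the two Series in the question (hypothetical data).
original = pd.Series(["a b c", "d e f", "g h i"], index=[13, 17, 21], name="text")
back_translated = pd.Series(["a b c", "d e x", "g h i"], name="text")

# Pair rows by position: the two Series carry different index labels.
orig = original.reset_index(drop=True)
back = back_translated.reset_index(drop=True)

# Boolean mask: True where the back-translated row differs from the original.
differs = orig.ne(back)

# Keep every original row, add only the differing back-translated rows,
# then use a stable sort so each added row lands right after its original.
final = (
    pd.concat([orig, back[differs]])
    .sort_index(kind="mergesort")
    .reset_index(drop=True)
    .to_frame()
)
print(final)
```

The stable sort matters here: both rows of a differing pair share the same index label, and a stable sort keeps the original one first.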
The pandas.DataFrame.append method has been deprecated since 1.4.0 (and was removed in pandas 2.0); use pandas.concat instead.
This is how pandas.concat is defined:
pandas.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True)
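The objs parameter has to be a sequence of Series or DataFrame objects, so a plain string must be wrapped in a Series first. The correct way to write your code is therefore: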
final = pd.DataFrame()
for i in original:
    for j in backTranslated:
        series_i = pd.Series(i)
        if(set(i)!=set(j)):
            series_j = pd.Series(j)
            final = pd.concat((final, series_i, series_j), ignore_index=True)
        else:
            final = pd.concat((final, series_i), ignore_index=True)
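In addition, you can use the keys parameter to label the concatenated pieces; with axis=1 those labels become the column names.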
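One thing to watch with the nested loops: they compare every original row against every back-translated row, not row 1 with row 1, row 2 with row 2, and so on. If a loop is still preferred, a possible sketch of the pairwise version is below. It uses zip to walk both Series in step, compares the strings directly (set() on a string compares sets of characters, which is probably not what was intended), and builds the DataFrame once at the end instead of concatenating inside the loop. The sample data and the names back_translated and rows are invented for the example.

```python
import pandas as pd

# Toy stand-ins for the two Series in the question (hypothetical data).
original = pd.Series(["a b c", "d e f", "g h i"], index=[13, 17, 21], name="text")
back_translated = pd.Series(["a b c", "d e x", "g h i"], name="text")

rows = []
# zip pairs the rows by position, ignoring the differing index labels.
for orig_text, back_text in zip(original, back_translated):
    rows.append(orig_text)          # the original row is always kept
    if orig_text != back_text:      # plain string comparison
        rows.append(back_text)      # keep the back-translation only if it differs

# Build the DataFrame once; growing it inside the loop copies data every time.
final = pd.DataFrame({"text": rows})
print(final)
```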