Pandas APPLY - Dataframe has NULL values



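I am trying to clean phone numbers using the phonenumbers library. I wrote a function that extracts the country code and national number and stores them in the 'country_code' and 'national_number' columns.

I am calling apply() on a DataFrame that contains noisy numbers. I chose apply over a loop for the performance gain. Here is the code: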

import phonenumbers
import pandas as pd
df_phone = pd.read_csv(r'D:CodeAddress-NominatimAddress-NominatimPhone_Valid.csv',encoding='utf8')
df_phone['country_code'] = ''
df_phone['national_number'] = ''
df_phone['valid']=''
def phone_valid(phone):
    try:
        #print(phone['PHONE'] + " " + phone['COUNTRY'])
        x = phonenumbers.parse(phone['PHONE'],phone['COUNTRY'])
        df_phone['country_code'] = x.country_code
        df_phone['national_number'] = x.national_number
        df_phone['valid']=phonenumbers.is_possible_number(x)
    except:
        df_phone['country_code'] = "Error"
        df_phone['national_number'] = "Error"

df_phone=df_phone.apply(phone_valid,axis=1)
print(df_phone)

However, the resulting df_phone has no values. Here is a sample of the df_phone output:

0    None
1    None
2    None

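When you use apply, you should not assign into the DataFrame from inside the applied function. (Consider the case where the function would not even have access to the global df_phone variable.) Your function also returns nothing, so apply collects None for every row, and assigning that result back to df_phone replaces your data with a Series of None values.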
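To make the cause of the None values concrete, here is a minimal sketch, assuming a small made-up DataFrame and a hypothetical function named no_return that, like the question's phone_valid, computes something but never returns it:

```python
import pandas as pd

# Made-up sample data, for illustration only.
df = pd.DataFrame({"PHONE": ["100", "200", "300"]})

def no_return(row):
    # The value is computed but never returned, so Python
    # implicitly returns None for every row.
    tripled = row["PHONE"] * 3

# With axis=1 the function is called once per row; apply collects
# the return values (all None here) into a Series.
result = df.apply(no_return, axis=1)
print(result)
```

Because the applied function returns None for each row, the collected result holds nothing but None, which matches the output shown in the question.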

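Instead, simply return the new values from the applied function and let Pandas assign them. Since you need to return multiple columns, you need something like this (a self-contained example; replace phone_valid with your implementation):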

import pandas as pd
df_phone = pd.DataFrame({
    'PHONE': ['100', '200', '300', '400', '500'],
    'COUNTRY': ['FI', 'US', 'SV', 'DE', 'FR'],
})

def parse(phone, country):
    return (phone * 3, country[::-1])

def phone_valid(phone):
    national, country = parse(phone['PHONE'], phone['COUNTRY'])
    return (national, country, True)

df_phone[['national', 'country', 'valid']] = df_phone.apply(phone_valid, axis=1, result_type="expand")
print(df_phone)

The output is

  PHONE COUNTRY   national country  valid
0   100      FI  100100100      IF   True
1   200      US  200200200      SU   True
2   300      SV  300300300      VS   True
3   400      DE  400400400      ED   True
4   500      FR  500500500      RF   True
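The question's function also fell back to "Error" when parsing failed, which the example above leaves out. A possible way to keep that behaviour is to return placeholder values for the failing row only. This is a sketch under assumptions: parse is still a hypothetical stand-in that raises ValueError on non-numeric input, and the sample data is made up. With the real phonenumbers library you would call phonenumbers.parse and catch phonenumbers.NumberParseException instead.

```python
import pandas as pd

# Made-up sample data; the second row is deliberately unparseable.
df_phone = pd.DataFrame({
    "PHONE": ["100", "abc", "300"],
    "COUNTRY": ["FI", "US", "DE"],
})

def parse(phone, country):
    # Hypothetical stand-in for a real parser: rejects non-numeric input.
    if not phone.isdigit():
        raise ValueError(f"cannot parse {phone!r}")
    return (phone * 3, country[::-1])

def phone_valid(row):
    try:
        national, country = parse(row["PHONE"], row["COUNTRY"])
        return (national, country, True)
    except ValueError:
        # Placeholders for this row only; other rows are unaffected.
        return ("Error", "Error", False)

# result_type="expand" turns each returned tuple into separate columns.
df_phone[["national", "country", "valid"]] = df_phone.apply(
    phone_valid, axis=1, result_type="expand"
)
print(df_phone)
```

Catching a specific exception rather than using a bare except keeps unrelated bugs, such as a misspelled column name, from being silently recorded as "Error".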
