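I am trying to clean up phone numbers using the phonenumbers library. I wrote a function that extracts the country code and the national number and stores them in the 'country_code' and 'national_number' columns.
I am trying to use apply() on a dataframe that contains noisy numbers. I chose apply over a loop for the performance gain. Here is the code: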
import phonenumbers
import pandas as pd

df_phone = pd.read_csv(r'D:CodeAddress-NominatimAddress-NominatimPhone_Valid.csv',encoding='utf8')
df_phone['country_code'] = ''
df_phone['national_number'] = ''
df_phone['valid']=''

def phone_valid(phone):
    try:
        #print(phone['PHONE'] + " " + phone['COUNTRY'])
        x = phonenumbers.parse(phone['PHONE'],phone['COUNTRY'])
        df_phone['country_code'] = x.country_code
        df_phone['national_number'] = x.national_number
        df_phone['valid']=phonenumbers.is_possible_number(x)
    except:
        df_phone['country_code'] = "Error"
        df_phone['national_number'] = "Error"

df_phone=df_phone.apply(phone_valid,axis=1)
print(df_phone)
But the resulting df_phone contains only None values. Here is a sample of the df_phone output:

0    None
1    None
2    None
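When you use apply, you should not assign into the dataframe from inside the applied function. (Consider a situation where the function cannot even access the global df_phone variable.)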
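The effect can be reproduced with a small sketch. The frame, the column values, and the function name `assigns_instead_of_returning` are made up for illustration; the point is only that a function without a return statement hands None back to apply for every row, and that assigning a scalar to a column overwrites the whole column.

```python
import pandas as pd

# Hypothetical toy frame standing in for the real data.
df = pd.DataFrame({'PHONE': ['100', '200', '300']})
df['national_number'] = ''

def assigns_instead_of_returning(row):
    # Assigning a scalar to a column overwrites that column for every row,
    # not just for the row currently being processed.
    df['national_number'] = row['PHONE']
    # No return statement, so the function implicitly returns None.

result = df.apply(assigns_instead_of_returning, axis=1)
print(result)  # one None per row
print(df)      # national_number holds the value from the last processed row
```

Rebinding the original name to `result`, as in the question, therefore replaces the frame with a column of None values.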
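Instead, just return the new values from the applied function so that Pandas assigns them. Since you need to return multiple columns, you need something like this (a self-contained example; replace phone_valid with your own implementation):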
import pandas as pd

df_phone = pd.DataFrame({
    'PHONE': ['100', '200', '300', '400', '500'],
    'COUNTRY': ['FI', 'US', 'SV', 'DE', 'FR'],
})

def parse(phone, country):
    return (phone * 3, country[::-1])

def phone_valid(phone):
    national, country = parse(phone['PHONE'], phone['COUNTRY'])
    return (national, country, True)

df_phone[['national', 'country', 'valid']] = df_phone.apply(phone_valid, axis=1, result_type="expand")
print(df_phone)
The output is:

  PHONE COUNTRY   national country  valid
0   100      FI  100100100      IF   True
1   200      US  200200200      SU   True
2   300      SV  300300300      VS   True
3   400      DE  400400400      ED   True
4   500      FR  500500500      RF   True