For every pandas DataFrame column with "phone" in its name, remove non-digits

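I have a pandas DataFrame (df) created from a CSV. I want to take every column whose name contains "PHONE" (or "phone" or "Phone") and change all of its rows to the format 5555555555. So: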

(555) 555-5555 would become 5555555555,

555-555-5555 would become 5555555555,

and so on.

I tried the following, but I get a syntax error. Hopefully I'm at least somewhat close:

phone_format = df.loc[:, df.columns.str.contains('PHONE')]
for col in phone_format:
    df['col'] = df.['col'].map(lambda x: x.replace('.', '').replace(' ', '').replace('-', '').replace('(', '').replace(')', ''))

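Use filter to select the columns with "phone" in their name (case-insensitively, using the (?i)phone regex), apply with str.replace to remove the non-digits, and finally update to modify the DataFrame in place.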

df.update(df.filter(regex='(?i)phone').apply(lambda s: s.str.replace(r'\D+', '', regex=True)))

Example:

# before
           pHoNe  other Phone  other col
0  (555) 55 5555  555-555-555    (55-55)
# after
       pHoNe  other Phone  other col
0  555555555    555555555    (55-55)

Reproducible input:

df = pd.DataFrame({'pHoNe': ['(555) 55 5555'], 'other Phone': ['555-555-555'], 'other col': ['(55-55)']})
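The filter / apply / update approach described above can be run end to end as a minimal sketch. It reuses the reproducible input, and the intermediate names phone_cols and cleaned are introduced here only for readability. It assumes the phone columns hold strings.

```python
import pandas as pd

df = pd.DataFrame({
    'pHoNe': ['(555) 55 5555'],
    'other Phone': ['555-555-555'],
    'other col': ['(55-55)'],
})

# Select the columns whose name contains "phone" in any letter case.
phone_cols = df.filter(regex='(?i)phone')

# \D+ matches runs of non-digit characters; replace each run with nothing.
cleaned = phone_cols.apply(lambda s: s.str.replace(r'\D+', '', regex=True))

# Write the cleaned values back into df in place; other columns are untouched.
df.update(cleaned)
print(df)
```

Note that the backslash in r'\D+' matters: without it, the pattern would match runs of the literal letter D instead of non-digits.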

# case=False so that "PHONE", "phone" and "Phone" all match
phone_format = df.loc[:, df.columns.str.contains('PHONE', case=False)]
for col in phone_format:
    df[col] = df[col].str.replace(r"\D+", "", regex=True)
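To see what the non-digit pattern does on its own, here is a small sketch using only the standard re module on plain strings. The sample values are illustrative and include the two formats from the question.

```python
import re

samples = ['(555) 555-5555', '555-555-5555', '555.555.5555']

# \D+ matches one or more characters that are not digits.
digits_only = [re.sub(r'\D+', '', s) for s in samples]
print(digits_only)
```

A single regex like this covers any separator, so there is no need to list every character to strip as in a chain of replace calls.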

Using your code as the starting point for a minimal working example:

df = pd.DataFrame([['(555) 555-5555', '555-555-5555']], columns=['phone', 'Phone'])
phone_format = df.columns[df.columns.str.contains(pat='PHONE', case=False)]
for col in phone_format:
    df[col] = df[col].map(lambda x: x.replace('.', '').replace(' ', '').replace('-', '').replace('(', '').replace(')', ''))
df
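One caveat with the map/lambda version: data read from a CSV often has missing values, and calling x.replace on a missing value (None or NaN) raises an AttributeError. The sketch below wraps the loop in a helper that uses the .str accessor, which leaves missing values as they are. The function name strip_non_digits and its name_pattern parameter are hypothetical, and it assumes the phone columns hold strings or missing values, not numbers.

```python
import pandas as pd

def strip_non_digits(df, name_pattern='phone'):
    """Hypothetical helper: remove non-digits from columns whose name
    contains name_pattern (case-insensitive). Modifies df in place."""
    mask = df.columns.str.contains(name_pattern, case=False)
    for col in df.columns[mask]:
        # .str.replace skips missing values instead of raising an error.
        df[col] = df[col].str.replace(r'\D+', '', regex=True)
    return df

df = pd.DataFrame({
    'PHONE': ['(555) 555-5555', None],   # second row is missing
    'name': ['a-b', 'c'],                # not a phone column, left alone
})
strip_non_digits(df)
print(df)
```

If the CSV parser has turned a phone column into numbers, passing dtype=str to pd.read_csv for those columns is one way to keep them as text before cleaning.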
