Modification: blocking seven-digit numbers in pandas strings



Background

I have the following df, which is a modification of an earlier question about blocking seven-digit numbers in pandas strings:

import pandas as pd
df = pd.DataFrame({'Text':['This person num is (111)888-8780 and other',
'dont block 23 here',
'two numbers: 001-002-1234 and here',
'block this (666)6636666',
'1-510-999-9999 is one more'], 
'P_ID': [1,2,3,4,5],
'N_ID' : ['A1', 'A2', 'A3','A4', 'A5']}) 

N_ID    P_ID    Text
0   A1  1   This person num is (111)888-8780 and other
1   A2  2   dont block 23 here
2   A3  3   two numbers: 001-002-1234 and here
3   A4  4   block this (666)6636666
4   A5  5   1-510-999-9999 is one more

Goal

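1) Block all seven-digit numbers together with their parentheses, e.g. (111)888-8780 and (666)6636666 should become **block**

2) Avoid blocking numbers that are not seven digits, e.g. 23

3) Create a new column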

df['New'] = df['Text'].str.replace(r'((?:[\d]-?){7,})', '**block**', regex=True)

Output

N_ID P_ID Text New
0                  This person num is (111)**block** and other
1                  dont block 23 here
2                  two numbers: **block** and here
3                  block this (666)**block**
4                   **block** is one more

But this does not fully block (111)888-8780 and (666)6636666: the parenthesized (111) and (666) are left behind.
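To see why the parentheses survive, it can help to run the original pattern through Python's `re` module directly. A minimal sketch; the list `samples` is just two rows copied from the example data:

```python
import re

# The asker's pattern, with the digit escape written as \d
pattern = r'((?:[\d]-?){7,})'

samples = ['This person num is (111)888-8780 and other',
           'block this (666)6636666']

for s in samples:
    # findall returns only the text the pattern actually matched
    print(re.findall(pattern, s))
```

This prints `['888-8780']` and `['6636666']`. Each repetition of `(?:[\d]-?)` consumes one digit optionally followed by a hyphen, so the run `111` inside `(111)` is only three repetitions long before `)` stops it. The area code is never part of a match, and `str.replace` leaves it in place.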

Question

How can I adjust str.replace(r'((?:[\d]-?){7,})' so that it fully blocks numbers with parentheses, e.g. (111)?

Answer

One possibility is to put every character you want removed into a single character class.

df['New'] = df['Text'].str.replace(r'[()\d-]{7,}', '**block**', regex=True)

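Here the character class contains parentheses, digits, and hyphens, and a run of these characters must be at least seven characters long to be replaced. This returns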

df['New']
Out[14]: 
0    This person num is **block** and other
1                        dont block 23 here
2           two numbers: **block** and here
3                      block this **block**
4                     **block** is one more
Name: New, dtype: object
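Note that `[()\d-]{7,}` counts characters, not digits: any run of seven or more parentheses, digits, or hyphens is replaced, so a string such as `1-2-3-4` (only four digits) would also be blocked. If that matters, one possible stricter variant (an alternative sketch, not part of the answer above) is to match each candidate run and replace it only when it contains at least seven digits. `Series.str.replace` accepts a compiled pattern and a callable replacement when `regex=True`; the helper `block_if_seven_digits` and the extra `version 1-2-3-4` row are hypothetical, added for illustration.

```python
import re
import pandas as pd

df = pd.DataFrame({'Text': ['This person num is (111)888-8780 and other',
                            'dont block 23 here',
                            'two numbers: 001-002-1234 and here',
                            'block this (666)6636666',
                            '1-510-999-9999 is one more',
                            'version 1-2-3-4']})  # last row added for illustration

# Candidate runs of parentheses, digits and hyphens (same class as the answer)
candidate = re.compile(r'[()\d-]+')

def block_if_seven_digits(m):
    # Hypothetical helper: replace the run only if it holds >= 7 digits
    run = m.group(0)
    digits = sum(ch.isdigit() for ch in run)
    return '**block**' if digits >= 7 else run

df['New'] = df['Text'].str.replace(candidate, block_if_seven_digits, regex=True)
print(df['New'])
```

The first five rows come out the same as with the answer's pattern, while `version 1-2-3-4` is left untouched because its run contains only four digits.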
