Regex, Pandas, and flagging rows



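I am trying to flag any records that contain user-defined "incorrect" characters. In this case, record two (2) should be returned as a non-valid record, but I seem to be capturing record 1 or 3 instead, and those would be considered "correct". Any idea why these are being flagged rather than the "incorrect" record?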

import pandas as pd
import numpy as np
import re
data = {'HOME1': ['123 Main St', '567 Country Road', 'PO Box 900']}
dft = pd.DataFrame(data)
from itertools import chain
chars = []
acceptable = [x for x in chain(range(48,58), range(32,33), range(65,91), range(97,123))]
for ch in acceptable:
    chars.append(chr(ch))
reg_list = map(re.compile, chars)
for x in dft['HOME1']:
    print(x)
    if any(re.match(x) for re in reg_list):
        conditions = [dft['HOME1'].apply(lambda x: x)!=x, dft['HOME1'].apply(lambda x: x)==x]
        choices = [0,1]
        dft['NonValidHOME1'] = np.select(conditions, choices, default=0)
try:
    print(dft.groupby(['NonValidHOME1'])[['HOME1']].filter(lambda x: len(x) ==1).agg(lambda x: x.tolist()))
except:
    print("no invalid Home1")
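
Two details of the first loop may explain why the wrong record ends up flagged. `map()` returns a single-pass iterator, so `reg_list` is consumed as the loop goes and is not reset for each address, and a compiled pattern's `match()` only tests the start of the string. The whole `NonValidHOME1` column is also reassigned on every iteration, so only the last assignment survives. The short sketch below illustrates the first two points; the three-character `chars` list is a hypothetical stand-in for the full list of allowed characters.

```python
import re

chars = ['1', '5', 'P']            # hypothetical stand-in for the full allowed list
reg_list = map(re.compile, chars)  # map() returns a lazy, single-pass iterator

first_pass = [p.pattern for p in reg_list]
second_pass = [p.pattern for p in reg_list]  # nothing is left to iterate over
print(first_pass, second_pass)

# Pattern.match() only tests the beginning of the string
starts_with_1 = re.compile('1').match('123 Main St') is not None
starts_with_M = re.compile('M').match('123 Main St') is not None
print(starts_with_1, starts_with_M)
```

Wrapping the `map()` call in `list(...)` would make the patterns reusable, although a single character class is probably simpler than one pattern per character.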

for x in dft['HOME1']:
    for c in x:
        if c not in chars:
            print(c, x)
            conditions = [dft['HOME1'].apply(lambda x: x)==x, dft['HOME1'].apply(lambda x: x)!=x]
            choices = [1,0]
            dft['NonValidHOME1'] = np.select(conditions, choices, default=0)
#[print(c) for x in dft['HOME1'] for c in x if c not in chars]
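
For comparison, here is a minimal sketch of one way to flag each row independently, assuming the allowed characters are exactly the digits, ASCII letters, and space built by the `chain(...)` call above. It uses a negated character class with `Series.str.contains`, so the flag is computed per record instead of the column being overwritten inside a loop. The fourth address (`'12 Elm St. #4'`) is a hypothetical row added only to show what a flagged record looks like; it is not part of the original data.

```python
import pandas as pd

# Sample addresses from the question, plus one hypothetical row
# ('12 Elm St. #4') that contains characters outside the allowed set.
data = {'HOME1': ['123 Main St', '567 Country Road', 'PO Box 900', '12 Elm St. #4']}
dft = pd.DataFrame(data)

# Anything that is NOT a digit, an ASCII letter, or a space counts as invalid.
invalid_pattern = r'[^0-9A-Za-z ]'

# str.contains is evaluated row by row, so every record gets its own flag.
dft['NonValidHOME1'] = dft['HOME1'].str.contains(invalid_pattern, regex=True).astype(int)

invalid_rows = dft.loc[dft['NonValidHOME1'] == 1, 'HOME1'].tolist()
print(invalid_rows if invalid_rows else "no invalid Home1")
```

The pattern could be extended (for example to allow `.` or `#`) by adding those characters inside the brackets.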

Thanks for the comments. They put me on a "better" path, or at least one that led me to an answer.

Latest update