Regex validation not working for large numbers in a pandas column



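I am trying to validate a DataFrame column against a specific regular expression. The numeric limit is (20,3): an integer may have at most 20 digits, and a float may have at most 23 (20 integer digits plus 3 decimal places). However, pandas converts the original numbers into what look like random integers, so my regex validation fails. I have checked that my regex is correct.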
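As a quick sanity check of the (20,3) rule outside pandas, the sketch below tests the pattern against plain strings. The names `NUMBER_20_3` and `is_valid_number` and the sample strings are illustrative assumptions, not part of the real data. The pattern here escapes the dot so that only a literal decimal point is accepted.

```python
import re

# Assumed pattern for the (20,3) rule: optional sign, 1-20 integer digits,
# then an optional literal dot followed by 1-3 decimal digits.
NUMBER_20_3 = re.compile(r"-?[0-9]{1,20}(?:\.[0-9]{1,3})?")


def is_valid_number(text: str) -> bool:
    """Return True if the whole string satisfies the (20,3) rule."""
    return NUMBER_20_3.fullmatch(text) is not None


samples = [
    "111900987654123.123",     # 15 integer digits, 3 decimals
    "-111900987654123.1",      # negative value, 1 decimal
    "11119009876541231111",    # exactly 20 integer digits
    "111190098765412311112",   # 21 integer digits
    "1.1119009876541231e+19",  # scientific notation
    "111900987654123.1234",    # 4 decimals
]
for sample in samples:
    print(sample, is_valid_number(sample))
```

The first three samples pass and the last three fail, which suggests the rule itself behaves as intended on raw text and that any failure comes from how the values are turned into strings.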

Dataframe:

FirstColumn,SecondColumn,ThirdColumn
111900987654123.123,111900987654123.123,111900987654123.123
111900987654123.12,111900987654123.12,111900987654123.12
111900987654123.1,111900987654123.1,111900987654123.1
111900987654123,111900987654123,111900987654123
111900987654123,-111900987654123,-111900987654123
-111900987654123.123,-111900987654123.123,-111900987654123.1
-111900987654123.12,-111900987654123.12,-111900987654123.12
-111900987654123.1,-111900987654123.1,-111900987654123.1
11119009876541231111,1111900987654123,1111900987654123

Code:

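import pandas as pd

# Optional sign, 1-20 integer digits, optional literal dot plus 1-3 decimals
NumberValidationRegexnegative = r"^-?[0-9]{1,20}(?:\.[0-9]{1,3})?$"
# Raw string so the backslashes in the Windows path are not treated as escapes
df_CPCodeDF = pd.read_csv(r"D:\FTPLocalUser\NCCLCOLL\COLLATERALUPLOAD\upld\SplitFiles\AACCR6675H_22102021_07_1 - Copy.csv")
pd.set_option('display.float_format', '{:.3f}'.format)
# Keep only rows whose first column is not null
rslt_df2 = df_CPCodeDF[df_CPCodeDF.iloc[:, 0].notna()]
# Index of rows whose first column does not match the pattern
rslt_df1 = rslt_df2[~rslt_df2.iloc[:, 0].apply(str).str.contains(NumberValidationRegexnegative, regex=True)].index
print("rslt_df1", rslt_df1)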

Output:

rslt_df1 Int64Index([8], dtype='int64')

Expected output:

rslt_df1 Int64Index([], dtype='int64')

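Pass dtype=str to pd.read_csv so that every value is kept as the original text instead of being parsed as a number. With the default parsing the column is read as float64, and the 20-digit value at index 8 becomes scientific notation when converted back to a string, which the regex rejects: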

import pandas as pd

NumberValidationRegexnegative = r"^-?[0-9]{1,20}(?:\.[0-9]{1,3})?$"
# dtype=str keeps every cell as the exact text from the file
df_CPCodeDF = pd.read_csv("data.csv", dtype=str)
rslt_df2 = df_CPCodeDF[df_CPCodeDF.iloc[:, 0].notna()]
rslt_df1 = rslt_df2[~rslt_df2.iloc[:, 0].str.contains(NumberValidationRegexnegative, regex=True)].index

Output:

>>> print("rslt_df1", rslt_df1)
rslt_df1 Int64Index([], dtype='int64')
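
To compare the two behaviours side by side without the real file, here is a minimal sketch that reads a small in-memory CSV twice. `CSV_TEXT`, `PATTERN` and `invalid_rows` are hypothetical names introduced for illustration, and the three sample rows are an assumption modelled on the data in the question.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for the real file.
CSV_TEXT = (
    "FirstColumn\n"
    "111900987654123.123\n"
    "111900987654123\n"
    "11119009876541231111\n"
)
PATTERN = r"^-?[0-9]{1,20}(?:\.[0-9]{1,3})?$"


def invalid_rows(df: pd.DataFrame) -> list:
    """Return index labels whose first-column text fails PATTERN."""
    col = df.iloc[:, 0].dropna().apply(str)
    return col[~col.str.contains(PATTERN, regex=True)].index.tolist()


# Default parsing: the decimal value makes the whole column float64.
as_float = pd.read_csv(io.StringIO(CSV_TEXT))
# dtype=str: every cell stays exactly as written in the file.
as_text = pd.read_csv(io.StringIO(CSV_TEXT), dtype=str)

print(as_float.dtypes.iloc[0])                  # float64
print(as_float.iloc[:, 0].apply(str).tolist())  # 20-digit value is shown with an exponent
print(invalid_rows(as_float))                   # [2]
print(invalid_rows(as_text))                    # []
```

If only some columns need exact text, `dtype` also accepts a mapping such as `{"FirstColumn": str}`, which leaves the remaining columns to be parsed as usual.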
