打印不符合正则表达式条件或特定条件的数据帧的某些行



数据帧:

Date Clearing Member PAN Trading Member PAN   CPCode       CPPAN  ... ValueofMTM1 TCM_CM_ID TCM_PRIMARY_MEMBER_CODE  TCM_TM_ID  CCD_CLI_CD
0    1/12/2021          AAACM6094R         AAACM6094R      123  GMAPS5536A  ...  1234567890.12345    M50219                     220        220        AC79
1    1/13/2021          AAACM6094R         AAACM6094R      123  GMAPS5536A  ...  1234567890    M50219                     220        220        AD03
2    1/14/2021          AAACM6094R         AAACM6094R      123  GMAPS5536A  ...  1234567890    M50220                     220        220        N552
3    1/15/2021          AAACM6094R         AAACM6094R      123  GMAPS5536A  ...  1234567890    M50219                     220        220        KF99
4    1/16/2021          AAACM6094R         AAACM6094R      123  GMAPS5536A  ...  1234567890    M50219                     220        220        D745

我需要对每一列逐一执行验证,并打印从数据帧中删除的行,并将这些行打印到新的数据帧中。类似地在每个列验证之后过滤掉行。

例如:

NumberRegex = r"^[0-9]d{1,20}(?:.d{1,3})?$"
df=df['ValueofMTM1'].apply(str).str.contains(NumberRegex, regex=True)

这应该删除我的第一行数据帧和输出数据帧应该看起来像,不符合条件的行应该存储在其他数据帧中

Date Clearing Member PAN Trading Member PAN   CPCode       CPPAN  ... ValueofMTM1 TCM_CM_ID TCM_PRIMARY_MEMBER_CODE  TCM_TM_ID  CCD_CLI_CD
1    1/13/2021          AAACM6094R         AAACM6094R      123  GMAPS5536A  ...  1234567890    M50219                     220        220        AD03
2    1/14/2021          AAACM6094R         AAACM6094R      123  GMAPS5536A  ...  1234567890    M50220                     220        220        N552
3    1/15/2021          AAACM6094R         AAACM6094R      123  GMAPS5536A  ...  1234567890    M50219                     220        220        KF99
4    1/16/2021          AAACM6094R         AAACM6094R      123  GMAPS5536A  ...  1234567890    M50219                     220        220        D745

错误行:

Date Clearing Member PAN Trading Member PAN   CPCode       CPPAN  ... ValueofMTM1 TCM_CM_ID TCM_PRIMARY_MEMBER_CODE  TCM_TM_ID  CCD_CLI_CD
0    1/12/2021          AAACM6094R         AAACM6094R      123  GMAPS5536A  ...  1234567890.12345    M50219                     220        220        AC79

下一个示例

df=df['TCM_CM_ID ']== 'M50219'

这将给出输出数据帧,因为第二行与条件不匹配:

Date Clearing Member PAN Trading Member PAN   CPCode       CPPAN  ... ValueofMTM1 TCM_CM_ID TCM_PRIMARY_MEMBER_CODE  TCM_TM_ID  CCD_CLI_CD
1    1/13/2021          AAACM6094R         AAACM6094R      123  GMAPS5536A  ...  1234567890    M50219                     220        220        AD03
3    1/15/2021          AAACM6094R         AAACM6094R      123  GMAPS5536A  ...  1234567890    M50219                     220        220        KF99
4    1/16/2021          AAACM6094R         AAACM6094R      123  GMAPS5536A  ...  1234567890    M50219                     220        220        D745

错误行:

Date Clearing Member PAN Trading Member PAN   CPCode       CPPAN  ... ValueofMTM1 TCM_CM_ID TCM_PRIMARY_MEMBER_CODE  TCM_TM_ID  CCD_CLI_CD
0    1/12/2021          AAACM6094R         AAACM6094R      123  GMAPS5536A  ...  1234567890.12345    M50219                     220        220        AC79
2    1/14/2021          AAACM6094R         AAACM6094R      123  GMAPS5536A  ...  1234567890    M50220                    220        220        N552

与其他列验证类似。如何执行此活动:

我的代码:

vaex_df2 = pd.read_csv('D:\test\CR6645DOC\GMAPS5536C_01092021_01.csv')
vaex_df3=pd.read_csv('D:\test\CR6645DOC\CM_TM_UCC_2.csv')
df = pd.merge(vaex_df2, vaex_df3,  how='inner', left_on=['Clearing Member PAN','Trading Member PAN','Client PAN'], right_on = ['OFF_TM_PANNO','OFF_TM_PANNO_1','CCD_PAN_NO']).drop_duplicates()
df.drop(['OFF_TM_PANNO','OFF_TM_PANNO_1','CCD_PAN_NO'],axis=1, inplace=True)
NumberRegex = r"^[0-9]d{1,20}(?:.d{1,3})?$"
df=df['ValueofMTM'].apply(str).str.contains(NumberRegex, regex=True))
df=df['AType'].isin(['P','C']))

右侧表达式返回TrueFalse值的筛选器:df['ValueofMTM'].astype(str).str.contains(NumberRegex, regex=True))相反,您可以将.loc与每个条件的否定组合使用,如下所示:

df = df.loc[~df['ValueofMTM'].astype(str).str.contains(NumberRegex, regex=True)), :]
print(df)

常见的模式是:df.loc[<condition as a Series or lambda>, <columns>]这会将DF限制为满足条件的行。在您的条件下使用否定(~)运算符来选择补码。

最新更新