数据帧:
Date Clearing Member PAN Trading Member PAN CPCode CPPAN ... ValueofMTM1 TCM_CM_ID TCM_PRIMARY_MEMBER_CODE TCM_TM_ID CCD_CLI_CD
0 1/12/2021 AAACM6094R AAACM6094R 123 GMAPS5536A ... 1234567890.12345 M50219 220 220 AC79
1 1/13/2021 AAACM6094R AAACM6094R 123 GMAPS5536A ... 1234567890 M50219 220 220 AD03
2 1/14/2021 AAACM6094R AAACM6094R 123 GMAPS5536A ... 1234567890 M50220 220 220 N552
3 1/15/2021 AAACM6094R AAACM6094R 123 GMAPS5536A ... 1234567890 M50219 220 220 KF99
4 1/16/2021 AAACM6094R AAACM6094R 123 GMAPS5536A ... 1234567890 M50219 220 220 D745
我需要对每一列逐一执行验证,并打印从数据帧中删除的行,并将这些行打印到新的数据帧中。类似地在每个列验证之后过滤掉行。
例如:
NumberRegex = r"^[0-9]d{1,20}(?:.d{1,3})?$"
df=df['ValueofMTM1'].apply(str).str.contains(NumberRegex, regex=True)
这应该删除我的第一行数据帧和输出数据帧应该看起来像,不符合条件的行应该存储在其他数据帧中
Date Clearing Member PAN Trading Member PAN CPCode CPPAN ... ValueofMTM1 TCM_CM_ID TCM_PRIMARY_MEMBER_CODE TCM_TM_ID CCD_CLI_CD
1 1/13/2021 AAACM6094R AAACM6094R 123 GMAPS5536A ... 1234567890 M50219 220 220 AD03
2 1/14/2021 AAACM6094R AAACM6094R 123 GMAPS5536A ... 1234567890 M50220 220 220 N552
3 1/15/2021 AAACM6094R AAACM6094R 123 GMAPS5536A ... 1234567890 M50219 220 220 KF99
4 1/16/2021 AAACM6094R AAACM6094R 123 GMAPS5536A ... 1234567890 M50219 220 220 D745
错误行:
Date Clearing Member PAN Trading Member PAN CPCode CPPAN ... ValueofMTM1 TCM_CM_ID TCM_PRIMARY_MEMBER_CODE TCM_TM_ID CCD_CLI_CD
0 1/12/2021 AAACM6094R AAACM6094R 123 GMAPS5536A ... 1234567890.12345 M50219 220 220 AC79
下一个示例
df=df['TCM_CM_ID ']== 'M50219'
这将给出输出数据帧,因为第二行与条件不匹配:
Date Clearing Member PAN Trading Member PAN CPCode CPPAN ... ValueofMTM1 TCM_CM_ID TCM_PRIMARY_MEMBER_CODE TCM_TM_ID CCD_CLI_CD
1 1/13/2021 AAACM6094R AAACM6094R 123 GMAPS5536A ... 1234567890 M50219 220 220 AD03
3 1/15/2021 AAACM6094R AAACM6094R 123 GMAPS5536A ... 1234567890 M50219 220 220 KF99
4 1/16/2021 AAACM6094R AAACM6094R 123 GMAPS5536A ... 1234567890 M50219 220 220 D745
错误行:
Date Clearing Member PAN Trading Member PAN CPCode CPPAN ... ValueofMTM1 TCM_CM_ID TCM_PRIMARY_MEMBER_CODE TCM_TM_ID CCD_CLI_CD
0 1/12/2021 AAACM6094R AAACM6094R 123 GMAPS5536A ... 1234567890.12345 M50219 220 220 AC79
2 1/14/2021 AAACM6094R AAACM6094R 123 GMAPS5536A ... 1234567890 M50220 220 220 N552
与其他列验证类似。如何执行此活动:
我的代码:
vaex_df2 = pd.read_csv('D:\test\CR6645DOC\GMAPS5536C_01092021_01.csv')
vaex_df3=pd.read_csv('D:\test\CR6645DOC\CM_TM_UCC_2.csv')
df = pd.merge(vaex_df2, vaex_df3, how='inner', left_on=['Clearing Member PAN','Trading Member PAN','Client PAN'], right_on = ['OFF_TM_PANNO','OFF_TM_PANNO_1','CCD_PAN_NO']).drop_duplicates()
df.drop(['OFF_TM_PANNO','OFF_TM_PANNO_1','CCD_PAN_NO'],axis=1, inplace=True)
NumberRegex = r"^[0-9]d{1,20}(?:.d{1,3})?$"
df=df['ValueofMTM'].apply(str).str.contains(NumberRegex, regex=True))
df=df['AType'].isin(['P','C']))
右侧表达式返回True
或False
值的筛选器:df['ValueofMTM'].astype(str).str.contains(NumberRegex, regex=True))
相反,您可以将.loc
与每个条件的否定组合使用,如下所示:
df = df.loc[~df['ValueofMTM'].astype(str).str.contains(NumberRegex, regex=True)), :]
print(df)
常见的模式是:df.loc[<condition as a Series or lambda>, <columns>]
这会将DF限制为满足条件的行。在您的条件下使用否定(~
)运算符来选择补码。