我有一个样本数据,其中包含一些虚假的银行账户信息,其中有一个名为' account indicator'的列,我需要根据重复的账户数量填充该列。
数据框看起来像这样:
Customer code Account name Account number Account balance Account indicator
0 20041 Tom Brown 24911467 30155.00
1 20041 Tom Brown 25193860 45.00
2 20021 Kate Perry 24092030 20450.00
3 20021 Kate Perry 24092007 50.00
4 20153 Harry Smith 20641387 5000.00
5 20154 Bella Smith 20641387 5000.00
6 20149 John William 20753196 7000.00
7 20144 Emily William 20753196 5000.00
8 20146 Rosie William 20753196 2000.00
- 帐号指示灯必须以"00"和数字开头。它的工作方式是,例如(请参考下表),'20641387'的账号已经重复了两次因为哈利·史密斯和贝拉·史密斯的账号是一样的。因此,这个索引的Account指示器是'002"。这被称为联合账户。
我希望在Account指示器中看到的输出是:
Customer code Account name Account number Account balance Account indicator
0 20041 Tom Brown 24911467 30155.00 001
1 20041 Tom Brown 25193860 45.00 001
2 20021 Kate Perry 24092030 20450.00 001
3 20021 Kate Perry 24092007 50.00 001
4 20153 Harry Smith 20641387 5000.00 002
5 20154 Bella Smith 20641387 5000.00 002
6 20149 John William 20753196 7000.00 003
7 20144 Emily William 20753196 5000.00 003
8 20146 Rosie William 20753196 2000.00 003
此外,对于指标中显示的任何联名账户(例如002,003),我还需要检查其账号的账户余额是否相等。从上面我们可以看到,Harry和Bella的数字"20641387"的余额都是5000.00。这是正确的。但是,'20753196'的余额是不正确的,因为它们不相等。
我还需要建议和帮助检查是否有多个帐户。这可以通过检查Account name列对于两个不同的Account number是否具有相同的名称来识别。例如,汤姆·布朗和凯特·佩里。
下面是我尝试过的,任何建议都将非常感激!
import pandas as pd
import numpy as np
data = {
'Customer code': ['20041', '20041', '20021', '20021', '20153', '20154', '20149', '20144', '20146'],
'Account name': ['Tom Brown', 'Tom Brown', 'Kate Perry', 'Kate Perry', 'Harry Smith', 'Bella Smith', 'John William', 'Emily William', 'Rosie William'],
'Account number': ['24911467', '25193860', '24092030', '24092007', '20641387', '20641387', '20753196', '20753196', '20753196'],
'Account balance': ['30155.00', '45.00', '20450.00', '50.00', '5000.00', '5000.00', '7000.00', '5000.00', '2000.00'],
'Account indicator': ''}
account_details = pd.DataFrame(data)
dups_account_number = account_details.pivot_table(columns=['Account number'], aggfunc='size')
#Populate Account_Indicator with the right code
#each value in the Account indicator column = '00' + the count of its duplicated Account number
for i in account_details['Account indicator']:
for j in account_details['Account number']:
if account_details['Account number'].duplicated().any() == True:
account_details['Account indicator'] = account_details['Account indicator'].apply(lambda x: '00' + dups_account_number)
print('There are joint accounts')
else:
account_details['Account indicator'] = '001' #001 means 1 account number
#Here --> Code to check if the Account balances for a joint account are equal
#Could use --> if account_details['Account indicator'] != 001 to start with? Because 001 indicators are definitely not the joint account
#Here --> Code to check for multiple accounts if one name has 2 or more different Account_numbers
print(account_details)
请分享你的想法!谢谢你!
从您的描述和示例代码中可以看出,共同账户的Account Indicator
只取决于同一Account number
的分录数。因此,您可以使用groupby()
对Account number
进行分组,然后在size
上使用.transform()
来获得相同帐号的条目计数(重复和单一条目)。
通过.astype()
进一步将计数转换为字符串,并通过str.zfill()
填充为3位数字,如下所示:
account_details['Account indicator'] = (
account_details.groupby('Account number')['Account number']
.transform('size')
.astype(str)
.str.zfill(3)
)
结果:
print(account_details)
Customer code Account name Account number Account balance Account indicator
0 20041 Tom Brown 24911467 30155.0 001
1 20041 Tom Brown 25193860 45.0 001
2 20021 Kate Perry 24092030 20450.0 001
3 20021 Kate Perry 24092007 50.0 001
4 20153 Harry Smith 20641387 5000.0 002
5 20154 Bella Smith 20641387 5000.0 002
6 20149 John William 20753196 7000.0 003
7 20144 Emily William 20753196 5000.0 003
8 20146 Rosie William 20753196 2000.0 003
第2部分:检查联合帐户的帐户余额是否相等
可以在Account number
上分组检查联合账户的Account余额是否相等,在nunique
上使用.transform()
检查Account balance
的唯一计数。如果此唯一计数不等于1
,则它们是无效的联合帐户:
account_details['Joint A/C Valid?'] = (
account_details.groupby('Account number')['Account balance']
.transform('nunique')
.eq(1)
)
结果:
print(account_details)
Customer code Account name Account number Account balance Account indicator Joint A/C Valid?
0 20041 Tom Brown 24911467 30155.0 001 True
1 20041 Tom Brown 25193860 45.0 001 True
2 20021 Kate Perry 24092030 20450.0 001 True
3 20021 Kate Perry 24092007 50.0 001 True
4 20153 Harry Smith 20641387 5000.0 002 True
5 20154 Bella Smith 20641387 5000.0 002 True
6 20149 John William 20753196 7000.0 003 False
7 20144 Emily William 20753196 5000.0 003 False
8 20146 Rosie William 20753196 2000.0 003 False
第3部分:检查多个帐户是否有相同的名称
可以在Account name
上分组检查多个相同名称的帐户,在nunique
上使用.transform()
检查Account number
的唯一计数。如果此唯一计数大于1
,则存在多个相同名称的帐户。
特别提示:相同姓名的不同人在同一家银行开户并不罕见。因此,您应该将此作为可能的情况同一人的多个帐户。在这种情况下,通过每个人都有唯一编号的身份证件进行检查更准确。
代码:
account_details['Multiple A/C?'] = (
account_details.groupby('Account name')['Account number']
.transform('nunique')
.gt(1)
)
结果:print(account_details)
Customer code Account name Account number Account balance Account indicator Joint A/C Valid? Multiple A/C?
0 20041 Tom Brown 24911467 30155.0 001 True True
1 20041 Tom Brown 25193860 45.0 001 True True
2 20021 Kate Perry 24092030 20450.0 001 True True
3 20021 Kate Perry 24092007 50.0 001 True True
4 20153 Harry Smith 20641387 5000.0 002 True False
5 20154 Bella Smith 20641387 5000.0 002 True False
6 20149 John William 20753196 7000.0 003 False False
7 20144 Emily William 20753196 5000.0 003 False False
8 20146 Rosie William 20753196 2000.0 003 False False
您可以尝试以下操作:
df['Account number'] = df['Account number'].astype(str)
counts_ser = df['Account number'].value_counts()
# Now we create a dictionary for using as a mapper in replace method
mapper = {k: str(counts_ser[k]).zfill(3) for k in df['Account number'].value_counts().index}
df['Account indicator'] = df['Account number'].replace(mapper)
输出:
Account number Account indicator
0 24911467 001
1 25193860 001
2 24092030 001
3 24092007 001
4 20641387 002
5 20641387 002
6 20753196 003
7 20753196 003
8 20753196 003
如何映射" Account "指示器";usinggroupby
/transform
+nunique
.
请用关于余额的问题的另一部分的示例更新您的输出:
account_details['Account indicator'] = (account_details.groupby('Account number')
['Account name']
.transform('nunique')
.map('{:03d}'.format)
)
输出:
Customer code Account name Account number Account balance Account indicator
0 20041 Tom Brown 24911467 30155.00 001
1 20041 Tom Brown 25193860 45.00 001
2 20021 Kate Perry 24092030 20450.00 001
3 20021 Kate Perry 24092007 50.00 001
4 20153 Harry Smith 20641387 5000.00 002
5 20154 Bella Smith 20641387 5000.00 002
6 20149 John William 20753196 7000.00 003
7 20144 Emily William 20753196 5000.00 003
8 20146 Rosie William 20753196 2000.00 003
平衡:
account_details['Balance indicator'] = (account_details.groupby('Account number')
['Account balance']
.transform('nunique')
.eq(1)
.map({True: 'valid balance', False: 'invalid balance'})
)