Pandas:通过遍历现有列中的值来计算重复值的数量,从而填充数据框中的特定列



我有一个样本数据,其中包含一些虚假的银行账户信息,其中有一个名为' account indicator'的列,我需要根据重复的账户数量填充该列。

数据框看起来像这样:

Customer code   Account name  Account number  Account balance  Account indicator
0         20041      Tom Brown       24911467        30155.00                  
1         20041      Tom Brown       25193860           45.00                  
2         20021     Kate Perry       24092030        20450.00                  
3         20021     Kate Perry       24092007           50.00                  
4         20153    Harry Smith       20641387         5000.00                  
5         20154    Bella Smith       20641387         5000.00                  
6         20149   John William       20753196         7000.00                  
7         20144  Emily William       20753196         5000.00                  
8         20146  Rosie William       20753196         2000.00   
  1. 帐号指示灯必须以"00"和数字开头。它的工作方式是,例如(请参考下表),'20641387'的账号已经重复了两次因为哈利·史密斯和贝拉·史密斯的账号是一样的。因此,这个索引的Account指示器是'002"。这被称为联合账户。

我希望在Account指示器中看到的输出是:

Customer code   Account name  Account number  Account balance  Account indicator
0         20041      Tom Brown       24911467       30155.00      001             
1         20041      Tom Brown       25193860          45.00      001             
2         20021     Kate Perry       24092030       20450.00      001             
3         20021     Kate Perry       24092007          50.00      001             
4         20153    Harry Smith       20641387        5000.00      002             
5         20154    Bella Smith       20641387        5000.00      002             
6         20149   John William       20753196        7000.00      003             
7         20144  Emily William       20753196        5000.00      003             
8         20146  Rosie William       20753196        2000.00      003

此外,对于指标中显示的任何联名账户(例如002,003),我还需要检查其账号的账户余额是否相等。从上面我们可以看到,Harry和Bella的数字"20641387"的余额都是5000.00。这是正确的。但是,'20753196'的余额是不正确的,因为它们不相等。

我还需要建议和帮助检查是否有多个帐户。这可以通过检查Account name列对于两个不同的Account number是否具有相同的名称来识别。例如,汤姆·布朗和凯特·佩里。

下面是我尝试过的,任何建议都将非常感激!

import pandas as pd
import numpy as np
data = {
'Customer code': ['20041', '20041', '20021', '20021', '20153', '20154', '20149', '20144', '20146'],
'Account name': ['Tom Brown', 'Tom Brown', 'Kate Perry', 'Kate Perry', 'Harry Smith', 'Bella Smith', 'John William', 'Emily William', 'Rosie William'],
'Account number': ['24911467', '25193860', '24092030', '24092007', '20641387', '20641387', '20753196', '20753196', '20753196'],
'Account balance': ['30155.00', '45.00', '20450.00', '50.00', '5000.00', '5000.00', '7000.00', '5000.00', '2000.00'],
'Account indicator': ''}

account_details = pd.DataFrame(data)

dups_account_number = account_details.pivot_table(columns=['Account number'], aggfunc='size')
#Populate Account_Indicator with the right code
#each value in the Account indicator column = '00' + the count of its duplicated Account number
for i in account_details['Account indicator']:
for j in account_details['Account number']:
if account_details['Account number'].duplicated().any() == True:
account_details['Account indicator'] = account_details['Account indicator'].apply(lambda x: '00' + dups_account_number)
print('There are joint accounts')
else:
account_details['Account indicator'] = '001' #001 means 1 account number
#Here --> Code to check if the Account balances for a joint account are equal
#Could use --> if account_details['Account indicator'] != 001 to start with? Because 001 indicators are definitely not the joint account


#Here --> Code to check for multiple accounts if one name has 2 or more different Account_numbers


print(account_details)

请分享你的想法!谢谢你!

从您的描述和示例代码中可以看出,共同账户的Account Indicator只取决于同一Account number的分录数。因此,您可以使用groupby()Account number进行分组,然后在size上使用.transform()来获得相同帐号的条目计数(重复和单一条目)。

通过.astype()进一步将计数转换为字符串,并通过str.zfill()填充为3位数字,如下所示:

account_details['Account indicator'] = (
account_details.groupby('Account number')['Account number']
.transform('size')
.astype(str)
.str.zfill(3)
)

结果:

print(account_details)
Customer code   Account name  Account number  Account balance Account indicator
0          20041      Tom Brown        24911467          30155.0               001
1          20041      Tom Brown        25193860             45.0               001
2          20021     Kate Perry        24092030          20450.0               001
3          20021     Kate Perry        24092007             50.0               001
4          20153    Harry Smith        20641387           5000.0               002
5          20154    Bella Smith        20641387           5000.0               002
6          20149   John William        20753196           7000.0               003
7          20144  Emily William        20753196           5000.0               003
8          20146  Rosie William        20753196           2000.0               003

第2部分:检查联合帐户的帐户余额是否相等

可以在Account number上分组检查联合账户的Account余额是否相等,在nunique上使用.transform()检查Account balance的唯一计数。如果此唯一计数不等于1,则它们是无效的联合帐户:

account_details['Joint A/C Valid?'] = (
account_details.groupby('Account number')['Account balance']
.transform('nunique')
.eq(1)
)

结果:

print(account_details)
Customer code   Account name  Account number  Account balance Account indicator  Joint A/C Valid?
0          20041      Tom Brown        24911467          30155.0               001              True
1          20041      Tom Brown        25193860             45.0               001              True
2          20021     Kate Perry        24092030          20450.0               001              True
3          20021     Kate Perry        24092007             50.0               001              True
4          20153    Harry Smith        20641387           5000.0               002              True
5          20154    Bella Smith        20641387           5000.0               002              True
6          20149   John William        20753196           7000.0               003             False
7          20144  Emily William        20753196           5000.0               003             False
8          20146  Rosie William        20753196           2000.0               003             False

第3部分:检查多个帐户是否有相同的名称

可以在Account name上分组检查多个相同名称的帐户,在nunique上使用.transform()检查Account number的唯一计数。如果此唯一计数大于1,则存在多个相同名称的帐户。

特别提示:相同姓名的不同人在同一家银行开户并不罕见。因此,您应该将此作为可能的情况同一人的多个帐户。在这种情况下,通过每个人都有唯一编号的身份证件进行检查更准确。

代码:

account_details['Multiple A/C?'] = (
account_details.groupby('Account name')['Account number']
.transform('nunique')
.gt(1)
)
结果:

print(account_details)

Customer code   Account name  Account number  Account balance Account indicator  Joint A/C Valid?  Multiple A/C?
0          20041      Tom Brown        24911467          30155.0               001              True           True
1          20041      Tom Brown        25193860             45.0               001              True           True
2          20021     Kate Perry        24092030          20450.0               001              True           True
3          20021     Kate Perry        24092007             50.0               001              True           True
4          20153    Harry Smith        20641387           5000.0               002              True          False
5          20154    Bella Smith        20641387           5000.0               002              True          False
6          20149   John William        20753196           7000.0               003             False          False
7          20144  Emily William        20753196           5000.0               003             False          False
8          20146  Rosie William        20753196           2000.0               003             False          False

您可以尝试以下操作:

df['Account number'] = df['Account number'].astype(str)
counts_ser = df['Account number'].value_counts()
# Now we create a dictionary for using as a mapper in replace method
mapper = {k: str(counts_ser[k]).zfill(3) for k in df['Account number'].value_counts().index}
df['Account indicator'] = df['Account number'].replace(mapper)

输出:

Account number  Account indicator
0   24911467             001
1   25193860             001
2   24092030             001
3   24092007             001
4   20641387             002
5   20641387             002
6   20753196             003
7   20753196             003
8   20753196             003

如何映射" Account "指示器";usinggroupby/transform+nunique.

请用关于余额的问题的另一部分的示例更新您的输出:

account_details['Account indicator'] = (account_details.groupby('Account number')
['Account name']
.transform('nunique')
.map('{:03d}'.format)
)

输出:

Customer code   Account name Account number Account balance Account indicator
0         20041      Tom Brown       24911467        30155.00               001
1         20041      Tom Brown       25193860           45.00               001
2         20021     Kate Perry       24092030        20450.00               001
3         20021     Kate Perry       24092007           50.00               001
4         20153    Harry Smith       20641387         5000.00               002
5         20154    Bella Smith       20641387         5000.00               002
6         20149   John William       20753196         7000.00               003
7         20144  Emily William       20753196         5000.00               003
8         20146  Rosie William       20753196         2000.00               003

平衡:

account_details['Balance indicator'] = (account_details.groupby('Account number')
['Account balance']
.transform('nunique')
.eq(1)
.map({True: 'valid balance', False: 'invalid balance'})
)

最新更新