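I am trying to mask a column, and the mask needs to be based on the number of characters in each individual cell of that column.

In the code below I am trying to do full masking, and that part works. The problem is that I use a static "maskvalue", whereas the mask needs to be dynamic, based on the value of each cell in the given column.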

import pandas as pd

def masking(filename, columnname, value):
    maskvalue = "XXXXXXXXX"
    column_dataset1 = pd.read_csv(filename)
    print(column_dataset1)
    if value == 0:
        # mask the entire column with the static mask
        maskvalue = "XXXXXXXXX"
        column_dataset1[columnname] = column_dataset1[columnname].astype(str).str[:0] + maskvalue
        # print(column_dataset1)
    elif value == '':
        # no count given: also mask the entire column
        column_dataset1[columnname] = column_dataset1[columnname].astype(str).str[:0] + maskvalue
        print(column_dataset1)
    else:
        # drop the last `value` characters and append the static mask
        column_dataset1[columnname] = column_dataset1[columnname].astype(str).str[:-value] + maskvalue
        print(column_dataset1)

masking("path/to/file", "phonenumber", 0)

For example, I am using the following data:

sno,Name,Type 1,Type 2,phonenumber
1,Bulbasaur,Grass,Poison,987654321256464684846646464611631646466464
2,Ivysaur,Grass,Poison,98765432121314564645663114646464666432016364
3,Venusaur,Grass,Poison,9876543212
3,VenusaurMega Venusaur,Grass,Poison,9876543212
4,Charmander,Fire,Flying,9876543212

I get output like this:

sno                       Name Type 1  Type 2 phonenumber
0     1                  Bulbasaur  Grass  Poison   XXXXXXXXX
1     2                    Ivysaur  Grass  Poison   XXXXXXXXX
2     3                   Venusaur  Grass  Poison   XXXXXXXXX
3     3      VenusaurMega Venusaur  Grass  Poison   XXXXXXXXX
4     4                 Charmander   Fire  Flying   XXXXXXXXX

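Here, if I select the column "Type 1", the mask should be as long as each cell's value, for example "XXXXX" for a value with 5 characters. Expected output: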

sno,Name,Type 1,Type 2,phonenumber
1,Bulbasaur,XXXXX,Poison,987654321256464684846646464611631646466464
2,Ivysaur,XXXXX,Poison,98765432121314564645663114646464666432016364
3,Venusaur,XXXXX,Poison,9876543212
3,VenusaurMega Venusaur,XXXXX,Poison,9876543212
4,Charmander,XXXX,Flying,9876543212

Note: both string and integer columns should be masked.

How about this?

column_dataset1[columnname] = ['X'*len(i) for i in column_dataset1[columnname].astype(str)]
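
The one-liner above covers the full-masking case. As a rough sketch of how it might be extended to the partial case from the question (masking only the last few characters), something like the following could work. The function name mask_column and the parameter count are made up for illustration and are not part of the original code; the sample data is a shortened version of the CSV in the question.

```python
import io

import pandas as pd


def mask_column(df, columnname, count=0):
    """Return a copy of df with `columnname` masked by 'X' characters.

    `count` (hypothetical parameter) is the number of trailing characters
    to mask in each cell. A count of 0, or a count that is at least the
    length of the cell, masks the whole cell. The mask always has the same
    length as the text it replaces.
    """
    def mask_text(text):
        if count <= 0 or count >= len(text):
            return "X" * len(text)
        return text[:-count] + "X" * count

    masked = df.copy()
    # astype(str) lets integer columns be masked the same way as strings
    masked[columnname] = [mask_text(s) for s in df[columnname].astype(str)]
    return masked


csv_text = """sno,Name,Type 1,Type 2,phonenumber
1,Bulbasaur,Grass,Poison,9876543212
4,Charmander,Fire,Flying,9876543212
"""
df = pd.read_csv(io.StringIO(csv_text))

full = mask_column(df, "Type 1")              # mask every character
partial = mask_column(df, "phonenumber", 4)   # mask only the last 4 digits
print(full)
print(partial)
```

Note that astype(str) turns missing values into the text "nan", so empty cells would be masked as three characters; handle them separately if that matters for your data.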
