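I want to use Python to apply this function, hideEmail, to a specific column of my CSV file (a large file). Example function:
import re

def hideEmail(email):
    # hide email: replace every character except '@' and '.' with 'x'
    text = re.sub(r'[^@.]', 'x', email)
    return text
CSV file (large, > 1 GB):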
id;Name;firstName;email;profession
100;toto;tata;test@test.com;developer
101;titi;tete;test@test.com;doctor
..
..
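Load the CSV data into a DataFrame (the sample file is semicolon-delimited, so pass sep=';'):
import pandas as pd
df = pd.read_csv(r'/path/to/csv', sep=';')
Then you can use pd.Series.str.replace directly on the email column. It supports regular expressions, but pass regex=True explicitly, because the default changed to regex=False in pandas 2.0:
df['email'] = df['email'].str.replace(r'[^@.]', 'x', regex=True)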
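Since the file in the question is larger than 1 GB, reading it in one go may not fit in memory. The following is a minimal sketch, not a definitive implementation, of processing it in pieces with the chunksize parameter of read_csv. The helper name mask_csv_in_chunks, its parameters, and the chunk size are assumptions made for illustration; the in-memory sample stands in for the real file.

```python
import io
import re

import pandas as pd


def hideEmail(email):
    # Replace every character except '@' and '.' with 'x'
    return re.sub(r'[^@.]', 'x', email)


def mask_csv_in_chunks(src, out_handle, column="email", sep=";", chunksize=100_000):
    """Mask one column, reading `chunksize` rows at a time (hypothetical helper).

    `src` is a path or file-like object; `out_handle` is an open text handle.
    """
    reader = pd.read_csv(src, sep=sep, dtype=str, chunksize=chunksize)
    for i, chunk in enumerate(reader):
        # na_action="ignore" leaves missing values untouched
        chunk[column] = chunk[column].map(hideEmail, na_action="ignore")
        # Write the header only for the first chunk
        chunk.to_csv(out_handle, sep=sep, index=False, header=(i == 0))


# Tiny in-memory stand-in for the real file
src = io.StringIO(
    "id;Name;firstName;email;profession\n"
    "100;toto;tata;test@test.com;developer\n"
    "101;titi;tete;test@test.com;doctor\n"
)
out = io.StringIO()
mask_csv_in_chunks(src, out, chunksize=1)  # chunksize=1 only to exercise several chunks
print(out.getvalue())
```

With a real file, the output handle would typically come from open('/path/to/new_file.csv', 'w', newline=''), and a much larger chunk size is usually reasonable.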
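That said, if all you want to do is modify a large CSV file, pandas may be overkill. You could look at sed instead. Here is an example (\w is a GNU sed extension):
sed -E 's/(\w+)@(\w+)/xxx@xxx/' /path/to/file.csv > /path/to/new_file.csv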
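The same stream-editing idea can be sketched in Python: scan each line for email-like tokens and mask only those, without parsing columns. This is a rough counterpart under stated assumptions, not an exact port of the sed command. The pattern EMAIL_RE is a deliberately simple assumption about what an email looks like, and mask_match and mask_emails are hypothetical names.

```python
import io
import re

# Assumed, simplified pattern for email-like tokens (not RFC-complete)
EMAIL_RE = re.compile(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+')


def mask_match(match):
    # Same masking rule as hideEmail: keep '@' and '.', replace the rest with 'x'
    return re.sub(r'[^@.]', 'x', match.group(0))


def mask_emails(lines_in, file_out):
    """Stream lines to file_out, masking email-like tokens (hypothetical helper)."""
    for line in lines_in:
        file_out.write(EMAIL_RE.sub(mask_match, line))


sample = io.StringIO("100;toto;tata;test@test.com;developer\n")
result = io.StringIO()
mask_emails(sample, result)
print(result.getvalue(), end="")  # 100;toto;tata;xxxx@xxxx.xxx;developer
```

Unlike the sed one-liner, this keeps the length of each masked part, matching what hideEmail produces.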
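It is hard to tell without seeing the DataFrame, but you could try:
import pandas as pd  # import pandas
df = pd.read_csv('enter_file_path_here', sep=';')  # read the data (the sample file is semicolon-delimited)
df['col'] = df['col'].apply(hideEmail)  # 'col' is the column to mask, e.g. 'email'
# if you want to write it back to a CSV:
df.to_csv('name.csv', sep=';', index=False)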
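Using pandas
You can use pandas to apply a function passed as an argument, as described in an earlier question.
Then export the resulting DataFrame with the to_csv function.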
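import re
import pandas as pd

def hideEmail(email):
    # hide email: replace every character except '@' and '.' with 'x'
    text = re.sub(r'[^@.]', 'x', email)
    return text

column_name = "email"
# the sample file is semicolon-delimited, so pass sep=';'
df = pd.read_csv(r'Path of your CSV file\File Name.csv', sep=';')
df[column_name] = df[column_name].map(hideEmail)
# index=False keeps the DataFrame index out of the exported file
df.to_csv(r'Path where you want to store the exported CSV file\File Name.csv', sep=';', index=False)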
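One thing to watch for with map: if the email column has empty cells, pandas reads them as NaN (a float), and re.sub raises a TypeError on it. A small sketch, assuming such gaps can occur in the real data, of skipping them with the na_action parameter of Series.map. The in-memory sample with a missing email is made up for illustration.

```python
import io
import re

import pandas as pd


def hideEmail(email):
    # Replace every character except '@' and '.' with 'x'
    return re.sub(r'[^@.]', 'x', email)


# Small in-memory stand-in for the real file; the second row has no email
data = io.StringIO(
    "id;Name;firstName;email;profession\n"
    "100;toto;tata;test@test.com;developer\n"
    "101;titi;tete;;doctor\n"
)
df = pd.read_csv(data, sep=";")

# na_action="ignore" skips missing values instead of passing NaN to re.sub
df["email"] = df["email"].map(hideEmail, na_action="ignore")
print(df["email"].tolist())
```

The first value is masked to xxxx@xxxx.xxx and the missing one stays missing, so it is written back as an empty cell by to_csv.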
You can also do this in plain Python by splitting each line and applying the function to the email field:
import re

def hideEmail(email):
    # hide email
    text = re.sub(r'[^@.]', 'x', email)
    return text

with open('path/to/csvfile', 'r') as file:
    lines = [l.strip().split(';') for l in file.readlines()]

modifiedlines = [';'.join(lines[0])]  # keep the header line unchanged
for i in lines[1:]:  # iterating from index 1 as index 0 is the header
    i[3] = hideEmail(i[3])  # the email field is at index 3
    modifiedlines.append(';'.join(i))  # appending the modified line

with open('path/to/csvfile', 'w') as file:
    # write the lines back to the file, one per line
    file.write('\n'.join(modifiedlines) + '\n')
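For a file larger than 1 GB, reading every line into a list may use a lot of memory. A possible alternative, sketched here with the standard csv module, copies the file row by row and looks the column up by name rather than by index. The helper name mask_column and its parameters are hypothetical, and the in-memory sample stands in for the real file.

```python
import csv
import io
import re


def hideEmail(email):
    # Replace every character except '@' and '.' with 'x'
    return re.sub(r'[^@.]', 'x', email)


def mask_column(file_in, file_out, column="email", delimiter=";"):
    """Copy a CSV row by row, masking one column (hypothetical helper)."""
    reader = csv.DictReader(file_in, delimiter=delimiter)
    writer = csv.DictWriter(file_out, fieldnames=reader.fieldnames,
                            delimiter=delimiter, lineterminator="\n")
    writer.writeheader()
    for row in reader:  # only one row is held in memory at a time
        row[column] = hideEmail(row[column])
        writer.writerow(row)


source = io.StringIO(
    "id;Name;firstName;email;profession\n"
    "100;toto;tata;test@test.com;developer\n"
    "101;titi;tete;test@test.com;doctor\n"
)
target = io.StringIO()
mask_column(source, target)
print(target.getvalue(), end="")
```

With real files, both would be opened with newline='' as the csv documentation recommends, and the output should go to a different path than the input so the source is not truncated while it is being read.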
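You can use the built-in map() function to apply the function to every line of the file. Note that this masks the whole line, not just the email column:
import re

def hideEmail(email):
    # hide email
    text = re.sub(r'[^@.]', 'x', email)
    return text

with open('file.csv', 'r') as r:
    # every character except '@' and '.' becomes 'x', including ';' and the trailing newline
    r = map(hideEmail, r.readlines())

with open('file2.csv', 'w') as f:
    for line in r:
        f.write(line + '\n')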
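EDIT (thanks to juanpa.arrivillaga for pointing this out):
r = map(hideEmail, r.readlines())
can be replaced with just r = map(hideEmail, r), since a file object is already an iterable of lines. Because map() is lazy, the result must then be consumed while the input file is still open.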
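A minimal sketch of that lazy variant, assuming the goal is still to mask whole lines: both files are opened in the same with statement so the map object is consumed before the input is closed. The helper name mask_lines is hypothetical, and the temporary files exist only to make the example runnable.

```python
import os
import re
import tempfile


def hideEmail(email):
    # Replace every character except '@' and '.' with 'x'
    return re.sub(r'[^@.]', 'x', email)


def mask_lines(src_path, dst_path):
    """Mask every line of src_path into dst_path, lazily (hypothetical helper)."""
    # Both files stay open while the lazy map object is consumed
    with open(src_path, 'r') as src, open(dst_path, 'w') as dst:
        # rstrip('\n') keeps the newline out of hideEmail, which would turn it into 'x'
        for masked in map(hideEmail, (line.rstrip('\n') for line in src)):
            dst.write(masked + '\n')


with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, 'file.csv')
    dst_path = os.path.join(tmp, 'file2.csv')
    with open(src_path, 'w') as f:
        f.write('test@test.com\nfoo@bar.org\n')
    mask_lines(src_path, dst_path)
    with open(dst_path) as f:
        output = f.read()

print(output, end="")
```

Iterating the file object directly reads one line at a time, which matters for a file larger than 1 GB, whereas readlines() builds the full list of lines first.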