Python: applying a lambda function to a (large) CSV file



I want to use Python to apply this function, hideEmail, to a specific column of my (large) CSV file.

Example function:

import re

def hideEmail(email):
    # hide email: replace every character except '@' and '.' with 'x'
    text = re.sub(r'[^@.]', 'x', email)
    return text
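For reference, here is a quick check of what the function produces. The input address is the one from the sample data; the variable name masked is only illustrative.

```python
import re

def hideEmail(email):
    # replace every character except '@' and '.' with 'x'
    return re.sub(r'[^@.]', 'x', email)

masked = hideEmail('test@test.com')
print(masked)  # xxxx@xxxx.xxx
```

The length of the address and the positions of '@' and '.' are preserved; only the other characters are masked.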

CSV file (large, > 1 GB):

id;Name;firstName;email;profession
100;toto;tata;test@test.com;developer
101;titi;tete;test@test.com;doctor
..
..

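Load the CSV data into a DataFrame (the sample file is ';'-separated, so pass sep=';'):

import pandas as pd
df = pd.read_csv(r'/path/to/csv', sep=';')

Then you can use pd.Series.str.replace directly on the email column, since it supports regular expressions. Pass regex=True explicitly, because recent pandas versions no longer treat the pattern as a regex by default:

df['email'] = df['email'].astype(str).str.replace(r'[^@.]', 'x', regex=True)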
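Because the file in the question is larger than 1 GB, loading it in one go may not fit comfortably in memory. Below is a minimal sketch of processing it in chunks with the chunksize parameter of read_csv. It assumes pandas is installed, that the file is ';'-separated and has an email column as in the sample; the function name mask_email_column, the default chunk size and the temporary demo files are illustrative choices, not something taken from the question.

```python
import os
import tempfile

import pandas as pd

def mask_email_column(src, dst, column="email", chunksize=100_000):
    """Mask one column of a ';'-separated CSV, one chunk at a time."""
    first = True
    # dtype=str keeps ids and other fields exactly as written in the file
    for chunk in pd.read_csv(src, sep=";", dtype=str, chunksize=chunksize):
        chunk[column] = chunk[column].str.replace(r"[^@.]", "x", regex=True)
        # write the header only once, then append the following chunks
        chunk.to_csv(dst, sep=";", index=False,
                     mode="w" if first else "a", header=first)
        first = False

# tiny demonstration on a temporary file (rows taken from the question)
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "in.csv")
dst = os.path.join(workdir, "out.csv")
with open(src, "w") as f:
    f.write("id;Name;firstName;email;profession\n"
            "100;toto;tata;test@test.com;developer\n"
            "101;titi;tete;test@test.com;doctor\n")

mask_email_column(src, dst, chunksize=1)  # chunksize=1 only to exercise the loop
with open(dst) as f:
    result = f.read()
print(result)
```

Only one chunk is held in memory at a time, so the chunk size can be tuned to the available RAM.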

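That said, if all you want to do is modify a large CSV file, pandas may be overkill. You could take a look at sed. Here is an example (GNU sed; note that it replaces the parts around '@' with a fixed xxx rather than masking character by character):

sed -E 's/(\w+)@(\w+)/xxx@xxx/' /path/to/file.csv > /path/to/new_file.csv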

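It is hard to say without seeing the DataFrame, but you could try:

import pandas as pd  # import pandas
df = pd.read_csv('enter_file_path_here', sep=';')  # read the data; the sample file is ';'-separated
df['col'] = df['col'].apply(hideEmail)  # 'col' is the column to mask, e.g. 'email'
# if you want to write it back to a CSV:
df.to_csv('name.csv', sep=';', index=False)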

Using pandas

You can use pandas to apply a function, passed as an argument, to a column, and then export the resulting DataFrame with the to_csv function:

import re
import pandas as pd

def hideEmail(email):
    # hide email
    text = re.sub(r'[^@.]', 'x', email)
    return text

column_name = "email"
df = pd.read_csv(r'Path of your CSV file\File Name.csv', sep=';')
df[column_name] = df[column_name].map(hideEmail)
df.to_csv(r'Path where you want to store the exported CSV file\File Name.csv', sep=';', index=False)

You can also do this without pandas, by reading the file and splitting each line yourself:

import re

def hideEmail(email):
    # hide email
    text = re.sub(r'[^@.]', 'x', email)
    return text

with open('path/to/csvfile', 'r') as file:
    lines = [l.strip().split(';') for l in file.readlines()]
modifiedlines = [';'.join(lines[0])]      # keep the header (index 0) unchanged
for i in lines[1:]:                       # iterate from index 1, since index 0 is the header
    i[3] = hideEmail(i[3])                # the email field is at index 3
    modifiedlines.append(';'.join(i))     # append the modified line
with open('path/to/csvfile', 'w') as file:
    # write the lines back to the file, restoring the newlines removed by strip()
    file.writelines(line + '\n' for line in modifiedlines)
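For a file over 1 GB, holding every line in a list may use a lot of memory. The following is a minimal sketch of a streaming variant built on the standard csv module, which reads and writes one row at a time. The function name mask_column and its parameters are hypothetical, and the demo uses in-memory files with the sample rows from the question; with real files you would pass objects opened with open(path, newline='').

```python
import csv
import io
import re

def hideEmail(email):
    # replace every character except '@' and '.' with 'x'
    return re.sub(r'[^@.]', 'x', email)

def mask_column(infile, outfile, column="email", delimiter=";"):
    """Copy a CSV row by row, masking one column found by its header name."""
    reader = csv.reader(infile, delimiter=delimiter)
    writer = csv.writer(outfile, delimiter=delimiter, lineterminator="\n")
    header = next(reader)
    idx = header.index(column)   # position of the column to mask
    writer.writerow(header)      # the header is written unchanged
    for row in reader:
        if len(row) > idx:       # leave blank or short rows untouched
            row[idx] = hideEmail(row[idx])
        writer.writerow(row)

# demonstration with in-memory files
source = io.StringIO(
    "id;Name;firstName;email;profession\n"
    "100;toto;tata;test@test.com;developer\n"
    "101;titi;tete;test@test.com;doctor\n"
)
target = io.StringIO()
mask_column(source, target)
print(target.getvalue())
```

Looking the column up by name avoids hard-coding index 3, and writing to a separate output file avoids truncating the input before it has been fully read.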

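You can use the built-in map() to map the function over every line of the file. Note that this masks each whole line (ids, names and separators included), not just the email column:

import re

def hideEmail(email):
    # hide email
    text = re.sub(r'[^@.]', 'x', email)
    return text

with open('file.csv', 'r') as r, open('file2.csv', 'w') as f:
    # strip the newline first, otherwise hideEmail would turn it into an 'x'
    masked = map(hideEmail, (line.rstrip('\n') for line in r.readlines()))
    for line in masked:
        f.write(line + '\n')

EDIT (thanks to juanpa.arrivillaga for pointing this out):

r.readlines() can be replaced with just r, because iterating over a file object already yields its lines.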
