How to read a CSV file containing backslashes and double quotes with pandas read_csv



I have a CSV file like this (comma-delimited):

ID, Name,Context, Location
123,"John","{\"Organization\":{\"Id\":12345,\"IsDefault\":false},\"VersionNumber\":-1,\"NewVersionId\":\"88229ef9-e97b-4b88-8eba-31740d48fd15\",\"ApiIntegrationType\":0,\"PortalIntegrationType\":0}","Road 1"
234,"Mike","{\"Organization\":{\"Id\":23456,\"IsDefault\":false},\"VersionNumber\":-1,\"NewVersionId\":\"88229ef9-e97b-4b88-8eba-31740d48fd15\",\"ApiIntegrationType\":0,\"PortalIntegrationType\":0}","Road 2"

I want to create a DataFrame like this:

ID | Name |Context                                                               |Location
123| John |{\"Organization\":{\"Id\":12345,\"IsDefault\":false},\"VersionNumber\":-1,\"NewVersionId\":\"88229ef9-e97b-4b88-8eba-31740d48fd15\",\"ApiIntegrationType\":0,\"PortalIntegrationType\":0}|Road 1
234| Mike |{\"Organization\":{\"Id\":23456,\"IsDefault\":false},\"VersionNumber\":-1,\"NewVersionId\":\"88229ef9-e97b-4b88-8eba-31740d48fd15\",\"ApiIntegrationType\":0,\"PortalIntegrationType\":0}|Road 2

Could you tell me how to do this with pandas read_csv?

One answer, if you are willing to accept that the \ character gets stripped:

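import pandas as pd

# Treat the backslash as the escape character, so \" is read as a literal quote
pd.read_csv(your_filepath, escapechar='\\')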
ID  Name                                            Context  Location
0  123  John  {"Organization":{"Id":12345,"IsDefault":false}...    Road 1
1  234  Mike  {"Organization":{"Id":23456,"IsDefault":false}...    Road 2
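
To try this without a file on disk, here is a minimal, self-contained sketch. It assumes the file really contains `\"` sequences inside the quoted Context field; the sample rows are shortened versions of the ones in the question, and the variable names (`raw`, `parsed`) are only for illustration.

```python
import io
import json

import pandas as pd

# Shortened sample data (an assumption): same shape as the question's file,
# with every inner double quote escaped by a backslash.
raw = "\n".join([
    "ID,Name,Context,Location",
    r'123,"John","{\"Organization\":{\"Id\":12345,\"IsDefault\":false},\"VersionNumber\":-1}","Road 1"',
    r'234,"Mike","{\"Organization\":{\"Id\":23456,\"IsDefault\":false},\"VersionNumber\":-1}","Road 2"',
]) + "\n"

# escapechar='\\' tells the parser that \" inside a quoted field is a literal quote
df = pd.read_csv(io.StringIO(raw), escapechar="\\")

# With the backslashes stripped, the Context value is plain JSON text
parsed = json.loads(df.loc[0, "Context"])
print(parsed["Organization"]["Id"])
```

A side effect of letting the backslashes be stripped is that the Context column can be passed straight to `json.loads`, which may be what you want downstream anyway.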

If you really want the backslashes in, use a custom converter:

def backslash_it(x):
    # Put a backslash back in front of every double quote
    return x.replace('"', '\\"')

pd.read_csv(your_filepath, escapechar='\\', converters={'Context': backslash_it})
ID  Name                                            Context Location
0  123  John  {\"Organization\":{\"Id\":12345,\"IsDefault\":...   Road 1
1  234  Mike  {\"Organization\":{\"Id\":23456,\"IsDefault\":...   Road 2

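The escapechar argument of read_csv is what lets the CSV be parsed correctly in the first place; the custom converter then puts the backslashes back into the Context column.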
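
The two steps can be checked end to end with an in-memory sample. This is only a sketch: the data is a shortened, assumed version of the question's file, and `backslash_it` is the converter from the answer above.

```python
import io

import pandas as pd

# Shortened sample data (an assumption), with escaped quotes inside Context
raw = "\n".join([
    "ID,Name,Context,Location",
    r'123,"John","{\"Organization\":{\"Id\":12345,\"IsDefault\":false}}","Road 1"',
    r'234,"Mike","{\"Organization\":{\"Id\":23456,\"IsDefault\":false}}","Road 2"',
]) + "\n"


def backslash_it(x):
    # The converter receives the already-unescaped string,
    # so re-insert a backslash before every double quote.
    return x.replace('"', '\\"')


df = pd.read_csv(
    io.StringIO(raw),
    escapechar="\\",                        # step 1: parse \" as a literal quote
    converters={"Context": backslash_it},   # step 2: restore the backslashes
)

print(df.loc[0, "Context"])
```

The other columns are untouched by the converter, because `converters` is keyed by column name.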

Note that I adjusted the header row (removing the stray spaces) so the column names are easier to match:

ID,Name,Context,Location
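
If editing the file is not an option, one possible alternative (not part of the answer above) is to strip the whitespace from the column names after loading. A minimal sketch with a made-up one-row sample:

```python
import io

import pandas as pd

# Header copied from the question, with the stray spaces left in;
# the data row is a simplified placeholder.
raw = "ID, Name,Context, Location\n123,John,ctx,Road 1\n"

df = pd.read_csv(io.StringIO(raw))
original_columns = list(df.columns)  # ' Name' and ' Location' keep their leading space

# Remove leading and trailing whitespace from every column name
df.columns = df.columns.str.strip()
print(list(df.columns))
```

Since Context has no leading space in the original header, a converter keyed on 'Context' still matches either way.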
