How to clean data using pandas



I need to replace ' \n, *, ' with '\n *'. I tried df['Course_content']=df['Course_content'].replace(' \n, *, ','\n *',regex=True), but it does not work for me.

>>> df['Course_content'][0]
'The syllabus for this course will cover the following:, \n, *,  The nature and purpose of cost and management accounting, \n, *,  Source documents and coding, \n, *,  Cost classification and measuring, \n, *,  Recording costs, \n, *,  Spreadsheets'
>>> df['Course_content']=df['Course_content'].replace(' \n, *,  ','\n *',regex=True)
>>> df['Course_content'][0]
'The syllabus for this course will cover the following:, \n, *,  The nature and purpose of cost and management accounting, \n, *,  Source documents and coding, \n, *,  Cost classification and measuring, \n, *,  Recording costs, \n, *,  Spreadsheets'
>>>

I also tried the following code, but it does not work for me either:

d = {
    'Not Mentioned': '',
    "\r\n": "\n",
    "\r": "\n",
    '\u00a0': ' ',
    ' \n, *,': "\n * ",
    ' \n,': '\n',
}
df=df.replace(d.keys(),d.values(),regex=True)

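You can put both arguments in r-strings and add a \ before the * in the first argument. This is necessary because * is a special metacharacter in regular expressions: unescaped, it means "zero or more of the preceding character" rather than a literal asterisk. You have to use an extra \ and/or an r-string to "escape" such characters to their literal value.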
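To illustrate the escaping point, here is a minimal sketch using only the standard `re` module on a shortened, made-up sample string (`text` is an assumption, not the asker's full data). It compares the unescaped pattern, the escaped pattern, and `re.escape`, which builds the escaped pattern from the literal text.

```python
import re

# Shortened sample in the same shape as the data in the question (illustrative only).
text = "following:, \n, *,  The nature"

# Unescaped: ", *" means "a comma followed by zero or more spaces",
# so the literal asterisk in the text is never matched.
unescaped = re.sub(' \n, *,  ', '\n *', text)

# Escaped: \* matches a literal asterisk, so the separator is found.
escaped = re.sub(r' \n, \*,  ', r'\n *', text)

# re.escape derives an equivalent pattern from the literal separator.
auto = re.sub(re.escape(' \n, *,  '), '\n *', text)

print(repr(unescaped))
print(repr(escaped))
print(repr(auto))
```

The unescaped call should leave the string unchanged, while the other two should replace the separator with a newline followed by ` *`.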

You can use:

df['Course_content'] = df['Course_content'].replace(r' \n, \*,  ', r'\n *', regex=True)

Demo:

import pandas as pd

data = {'Course_content': ['The syllabus for this course will cover the following:, \n, *,  The nature and purpose of cost and management accounting, \n, *,  Source documents and coding, \n, *,  Cost classification and measuring, \n, *,  Recording costs, \n, *,  Spreadsheets']}
df = pd.DataFrame(data)
# Escape the * so it is matched literally instead of acting as a quantifier
df['Course_content'] = df['Course_content'].replace(r' \n, \*,  ', r'\n *', regex=True)

Result:

print(df['Course_content'][0])

The syllabus for this course will cover the following:,
 *The nature and purpose of cost and management accounting,
 *Source documents and coding,
 *Cost classification and measuring,
 *Recording costs,
 *Spreadsheets
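The same idea can be carried over to the dictionary-based attempt from the question. The following is a rough sketch, not the original answer's method: the sample rows and the names `literal_map`, `pattern_map`, `cleaned`, and `plain` are made up for illustration. It assumes every key is meant as literal text, so each key is passed through `re.escape` before being handed to `DataFrame.replace` via its `regex=` argument. For a single separator, `Series.str.replace(..., regex=False)` avoids regular expressions altogether.

```python
import re
import pandas as pd

# Small illustrative frame (not the asker's real data).
df = pd.DataFrame({"Course_content": [
    "Intro:, \n, *,  Recording costs, \n, *,  Spreadsheets",
    "Not Mentioned",
]})

# Hypothetical mapping of literal text -> replacement text.
literal_map = {
    " \n, *,  ": "\n *",
    "Not Mentioned": "",
}

# Escape every key so metacharacters such as * are matched literally.
pattern_map = {re.escape(key): value for key, value in literal_map.items()}
cleaned = df.replace(regex=pattern_map)

# Alternative for one separator: plain substring replacement, no regex involved.
plain = df["Course_content"].str.replace(" \n, *,  ", "\n *", regex=False)

print(repr(cleaned["Course_content"][0]))
print(repr(plain[0]))
```

Note that the replacement values here contain no backslashes. If they did, they would be interpreted as regex replacement templates in the `regex=` variant, which is one reason the plain-substring form can be simpler when no pattern matching is needed.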
