How to clean data using pandas

I need to replace ' \n, *, ' with '\n *'. I tried df['Course_content']=df['Course_content'].replace(' \n, *, ','\n *',regex=True), but it does not work for me.

>>> df['Course_content'][0]
'The syllabus for this course will cover the following:, \n, *,  The nature and purpose of cost and management accounting, \n, *,  Source documents and coding, \n, *,  Cost classification and measuring, \n, *,  Recording costs, \n, *,  Spreadsheets'
>>> df['Course_content']=df['Course_content'].replace(' \n, *,  ','\n *',regex=True)
>>> df['Course_content'][0]
'The syllabus for this course will cover the following:, \n, *,  The nature and purpose of cost and management accounting, \n, *,  Source documents and coding, \n, *,  Cost classification and measuring, \n, *,  Recording costs, \n, *,  Spreadsheets'
>>>

I also tried the following code, but it does not work for me either.

d = {
'Not Mentioned':'',
"\r\n": "\n",
"\r": "\n",
'\u00a0':' ',
' \n, *,':  "\n * ",
' \n,':'\n',
}
df=df.replace(d.keys(),d.values(),regex=True)

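You can put both arguments in raw strings (r-strings) and add a \ before the * in the first argument. This is necessary because \ and * are special metacharacters in regular expressions: an unescaped * means "zero or more of the preceding character", so the pattern never matches the literal asterisk in your data. Escaping it as \* makes it match literally, and the r-string keeps Python from interpreting the backslashes before the regex engine sees them.

You can use:

df['Course_content'] = df['Course_content'].replace(r' \n, \*,  ', r'\n *', regex=True)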
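To see why the unescaped pattern never matched, the same expressions can be tried with the standard-library `re` module, which uses the same regex syntax that pandas relies on when `regex=True`. This is only a minimal sketch: the `sample` string is a shortened, made-up stand-in for one cell of `Course_content`, and `re.escape` is shown as one optional way to avoid escaping by hand.

```python
import re

# Hypothetical, shortened stand-in for one cell of df['Course_content'].
sample = 'will cover the following:, \n, *,  The nature and purpose, \n, *,  Spreadsheets'

# Unescaped: ', *,' means "comma, zero or more spaces, comma",
# so the literal asterisk in the data is never matched.
unescaped = re.sub(' \n, *,  ', '\n *', sample)

# Escaped: '\*' matches a literal asterisk. In the raw-string replacement,
# re turns the two characters '\n' into a real newline.
escaped = re.sub(r' \n, \*,  ', r'\n *', sample)

# re.escape builds an equivalent literal pattern without hand-escaping.
auto_escaped = re.sub(re.escape(' \n, *,  '), '\n *', sample)

print(unescaped == sample)      # True: nothing was replaced
print(repr(escaped))
print(escaped == auto_escaped)  # True: both approaches agree
```

If the text to replace is always a fixed literal, building the pattern with `re.escape` may be less error-prone than escaping each metacharacter manually.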

Demo:

import pandas as pd

data = {'Course_content': ['The syllabus for this course will cover the following:, \n, *,  The nature and purpose of cost and management accounting, \n, *,  Source documents and coding, \n, *,  Cost classification and measuring, \n, *,  Recording costs, \n, *,  Spreadsheets']}
df = pd.DataFrame(data)
df['Course_content'] = df['Course_content'].replace(r' \n, \*,  ', r'\n *', regex=True)

Result:

print(repr(df['Course_content'][0]))

'The syllabus for this course will cover the following:,\n *The nature and purpose of cost and management accounting,\n *Source documents and coding,\n *Cost classification and measuring,\n *Recording costs,\n *Spreadsheets'
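The dictionary attempt from the question can be repaired along the same lines. The sketch below is one possible version, not necessarily what the asker ended up using: it assumes the goal is also to normalise line endings and non-breaking spaces, passes the mapping directly to `replace` with `regex=True`, and runs on a made-up two-row frame. The name `replacements` is introduced here purely for illustration.

```python
import pandas as pd

# Made-up sample data; only the column name comes from the question.
df = pd.DataFrame({
    'Course_content': [
        'Topics:, \n, *,  Recording costs, \n, *,  Spreadsheets',
        'Not Mentioned',
    ]
})

# Keys are regex patterns, values are replacement templates.
# Raw strings keep the backslashes intact for the regex engine.
replacements = {
    r'Not Mentioned': '',
    r'\r\n?': r'\n',          # CRLF or a lone CR becomes LF
    '\u00a0': ' ',            # an actual non-breaking space character
    r' \n, \*,  ': r'\n *',   # the bullet separator, with '*' escaped
}

cleaned = df.replace(replacements, regex=True)
print(repr(cleaned['Course_content'][0]))
print(repr(cleaned['Course_content'][1]))
```

The patterns here are chosen so that they do not depend on one another; if one replacement is meant to feed into the next, chaining separate `replace` calls is the safer assumption.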
