How to replace elements in a Pandas DataFrame with list values based on specific conditions?



I have a CSV file with 2 columns, Query and Description. Here is a sample of the file:

| Query                                        | Description |
| --------                                     | -------------- |
| What is the type of <mach-name> machine     |  <mach-name> is ...       |
| What is the use of <mach-name> machine      |  The use of <mach-name> is ...         |
| How long it takes to rain in <state-name>   | It rains for ... hours in <state-name>          |
| What is the best restaurant in <state-name> | <state-name>'s best food is in ...         |

...
etc.

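Each Query and Description cell contains unique strings like these. Assume the CSV file has been read into a DataFrame df with pandas. The goal is to replace the <mach-name>-style <...> tags based on specific conditions.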

These replacements are made by substituting the <> tags with elements from lists such as:

mach_name = ["Drilling", "ABC", "XYZ", ... etc.]
state_name = ["New York", "London", "Delhi", ... etc.]

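Example: if <mach-name> appears in both the Query column and the Description column of a row, the tag is replaced by each element of the mach_name list in turn. So if the mach_name list has 10 elements, one new sentence per element (10 in that case) needs to be appended to the DataFrame df. The expected output looks like this: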
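The condition described above (the tag appears in both columns of the same row) can be checked for the whole column at once, without looping over rows. A minimal sketch, assuming the columns are named Query and Description; the sample rows and the tag_in_both helper are hypothetical:

```python
import pandas as pd

# Hypothetical sample rows shaped like the question's CSV
df = pd.DataFrame({
    "Query": ["What is the type of <mach-name> machine",
              "How long it takes to rain in <state-name>",
              "Plain question"],
    "Description": ["<mach-name> is ...",
                    "It rains for ... hours in <state-name>",
                    "Plain answer"],
})

def tag_in_both(frame: pd.DataFrame, tag: str) -> pd.Series:
    """Hypothetical helper: True where `tag` occurs literally in both columns."""
    # regex=False treats "<mach-name>" as plain text, not a regex pattern
    in_query = frame["Query"].str.contains(tag, regex=False)
    in_desc = frame["Description"].str.contains(tag, regex=False)
    return in_query & in_desc

mach_name = ["Drilling", "ABC", "XYZ"]
mach_mask = tag_in_both(df, "<mach-name>")
state_mask = tag_in_both(df, "<state-name>")

# Each flagged row turns into one row per list element
new_rows = int(mach_mask.sum()) * len(mach_name)
print(mach_mask.tolist(), state_mask.tolist(), new_rows)
```

The mask is what decides which rows get multiplied; the number of generated rows is simply the number of flagged rows times the length of the corresponding list.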

| Query                                   | Description |
| --------                                | -------------- |
| What is the type of Drilling machine    |  Drilling is ...        |
| What is the type of ABC machine         |  ABC is ...        |
| What is the type of XYZ machine         |  XYZ is ...      |
| What is the use of Drilling machine     |  The use of Drilling is ...        |
| What is the use of ABC machine          |  The use of ABC is ...       |
| What is the use of XYZ machine          |  The use of XYZ is ...       |
| How long it takes to rain in New York   | It rains for ... hours in New York          |
| How long it takes to rain in London     | It rains for ... hours in London          |
| How long it takes to rain in Delhi      | It rains for ... hours in Delhi          |
| What is the best restaurant in New York | New York's best food is in ...         |
| What is the best restaurant in London   | London's best food is in ...         |
| What is the best restaurant in Delhi    | Delhi's best food is in ...         |

...etc.

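I was hoping to do a simple Python replacement with str.replace(), but that would probably involve a for loop iterating over the Pandas DataFrame, and the answers I found advise against iterating over DataFrames. I couldn't find a clear way to replace the values based on these conditions while also adding new rows based on the list elements. Any help/guidance is appreciated. Thanks.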
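One way to avoid iterating over the DataFrame row by row is to attach the list of replacement values to every matching row and let DataFrame.explode turn that list into one row per value. This is a sketch under a few assumptions: the replacements dict and expand_tags function are hypothetical names, a row contains at most one kind of tag (as in the examples), and pandas 1.1 or newer is available for explode(..., ignore_index=True). The loop runs over the tags, not over the rows; the final substitution is a plain list comprehension over column values rather than iterrows().

```python
import pandas as pd

# Hypothetical sample input shaped like the question's CSV
df = pd.DataFrame({
    "Query": ["What is the type of <mach-name> machine",
              "How long it takes to rain in <state-name>",
              "A row without any tag"],
    "Description": ["<mach-name> is ...",
                    "It rains for ... hours in <state-name>",
                    "No placeholder here"],
})

# Hypothetical mapping: tag -> values that should replace it
replacements = {
    "<mach-name>": ["Drilling", "ABC", "XYZ"],
    "<state-name>": ["New York", "London", "Delhi"],
}

def expand_tags(frame: pd.DataFrame, mapping: dict) -> pd.DataFrame:
    out = frame.copy()
    for tag, values in mapping.items():
        both = (out["Query"].str.contains(tag, regex=False)
                & out["Description"].str.contains(tag, regex=False))
        # Matching rows get the full value list, all other rows get [None]
        out["_value"] = pd.Series(
            [list(values) if flag else [None] for flag in both],
            index=out.index, dtype=object)
        # One output row per list element; row order is preserved
        out = out.explode("_value", ignore_index=True)
        for col in ("Query", "Description"):
            # Substitute only where a string value was attached
            out[col] = [
                text.replace(tag, val) if isinstance(val, str) else text
                for text, val in zip(out[col], out["_value"])
            ]
        out = out.drop(columns="_value")
    return out

result = expand_tags(df, replacements)
print(result)
```

Rows without any tag pass through unchanged, and each expanded group stays in the position of the row it came from.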

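It would be easier to read the raw CSV, process it, and then convert the result into a pandas DataFrame, but if you need to read the DataFrame first, this might be an option: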

import pandas as pd

data = [{"query": "What is the type of <mach-name> machine", "description": "<mach-name> is ..."},
        {"query": "What is the use of <mach-name> machine", "description": "The use of <mach-name> is ..."},
        {"query": "How long it takes to rain in <state-name>", "description": "It rains for ... hours in <state-name>"}]

df = pd.DataFrame(data)
# mark rows where the tag appears in both columns (regex=False: match the tag literally)
df["replace_mach"] = (df["query"].str.contains("<mach-name>", regex=False)
                      & df["description"].str.contains("<mach-name>", regex=False))
df["replace_state"] = (df["query"].str.contains("<state-name>", regex=False)
                       & df["description"].str.contains("<state-name>", regex=False))

dfs_list = []
mach_name = ["Drilling", "ABC", "XYZ"]
state_name = ["New York", "London", "Delhi"]

# one copy of the flagged rows per machine name, with the tag replaced
for n in mach_name:
    aux = df[df["replace_mach"]].copy()
    aux["query"] = aux["query"].str.replace("<mach-name>", n, regex=False)
    aux["description"] = aux["description"].str.replace("<mach-name>", n, regex=False)
    dfs_list.append(aux)

# one copy of the flagged rows per state name
for n in state_name:
    aux = df[df["replace_state"]].copy()
    aux["query"] = aux["query"].str.replace("<state-name>", n, regex=False)
    aux["description"] = aux["description"].str.replace("<state-name>", n, regex=False)
    dfs_list.append(aux)

# add records without wild cards to dataframe
dfs_list.append(df[~(df["replace_mach"] | df["replace_state"])])
# renumber the rows and drop the helper flag columns
replaced_df = pd.concat(dfs_list, ignore_index=True).drop(columns=["replace_mach", "replace_state"])
replaced_df
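The easier route mentioned at the start of this answer, expanding the rows while reading the raw CSV and only then building the DataFrame, could look roughly like this. It is a sketch under stated assumptions: io.StringIO stands in for the real file (the file name in the comment is hypothetical), the replacements dict is a hypothetical name, and each row is expanded by the first tag found in both fields.

```python
import csv
import io

import pandas as pd

# Stand-in for the real CSV; with a real file, something like
# open("questions.csv", newline="", encoding="utf-8") (hypothetical name)
raw_csv = io.StringIO(
    "Query,Description\n"
    "What is the type of <mach-name> machine,<mach-name> is ...\n"
    "What is the best restaurant in <state-name>,<state-name>'s best food is in ...\n"
)

# Hypothetical mapping: tag -> replacement values
replacements = {
    "<mach-name>": ["Drilling", "ABC", "XYZ"],
    "<state-name>": ["New York", "London", "Delhi"],
}

rows = []
for record in csv.DictReader(raw_csv):
    query, desc = record["Query"], record["Description"]
    # The first tag present in both fields decides how the row is expanded
    tag = next((t for t in replacements if t in query and t in desc), None)
    if tag is None:
        rows.append({"Query": query, "Description": desc})
        continue
    for value in replacements[tag]:
        rows.append({"Query": query.replace(tag, value),
                     "Description": desc.replace(tag, value)})

df = pd.DataFrame(rows)
print(df)
```

Because the expansion happens on plain Python strings before pandas is involved, there is no DataFrame iteration at all; the DataFrame is built once from the finished list of rows.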
