Replace column values in a Pandas DataFrame using a dict with tuple keys



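I have two Pandas DataFrames: one contains the data I want to update, and the other provides a lookup, keyed by a MultiIndex, for the values to set.

For example, I have two CSV files:

fruit.csv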

Fruit,Color,State,more,data
Apple,Red,Good,etc.,etc.
Apple,Green,Mouldy,etc.,etc.
Apple,Green,Excellent,etc.,etc.
Pear,Red,Excellent,etc.,etc.
Pear,Green,Good,etc.,etc.
Lime,Green,Bad,etc.,etc.

rating.csv

Fruit,State,Rating
Apple,Excellent,11
Apple,Good,8
Apple,Bad,4
Apple,Mouldy,0
Pear,Excellent,9
Pear,Good,5
Pear,Bad,2
Pear,Mouldy,1
Lime,Excellent,10
Lime,Good,7
Lime,Bad,5
Lime,Mouldy,2

I have read these into DataFrames:

from pathlib import Path
import pandas as pd

static_data_dir = Path(__file__).resolve().parent
fruit = pd.read_csv(static_data_dir.joinpath("fruit.csv"), index_col=["Fruit","Color"])
rating = pd.read_csv(static_data_dir.joinpath("rating.csv"), index_col=["Fruit","State"])

                 State  more  data
Fruit Color
Apple Red         Good  etc.  etc.
      Green     Mouldy  etc.  etc.
      Green  Excellent  etc.  etc.
Pear  Red    Excellent  etc.  etc.
      Green       Good  etc.  etc.
Lime  Green        Bad  etc.  etc.

                 Rating
Fruit State
Apple Excellent      11
      Good            8
      Bad             4
      Mouldy          0
Pear  Excellent       9
      Good            5
      Bad             2
      Mouldy          1
Lime  Excellent      10
      Good            7
      Bad             5
      Mouldy          2

I now want to replace the State values in the fruit DataFrame with the Rating values from the rating DataFrame, giving the result below.

                 State  more  data
Fruit Color
Apple Red            8  etc.  etc.
      Green          0  etc.  etc.
      Green         11  etc.  etc.
Pear  Red            9  etc.  etc.
      Green          5  etc.  etc.
Lime  Green          5  etc.  etc.

In effect, I want to use pandas.Series.replace but pass in a dict with tuple keys, which does not seem to be supported.

{'Rating': {('Apple', 'Bad'): 4,
            ('Apple', 'Excellent'): 11,
            ('Apple', 'Good'): 8,
            ('Apple', 'Mouldy'): 0,
            ('Lime', 'Bad'): 5,
            ('Lime', 'Excellent'): 10,
            ('Lime', 'Good'): 7,
            ('Lime', 'Mouldy'): 2,
            ('Pear', 'Bad'): 2,
            ('Pear', 'Excellent'): 9,
            ('Pear', 'Good'): 5,
            ('Pear', 'Mouldy'): 1}}

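What is the best way to achieve this?

Read both CSVs as ordinary DataFrames, without setting an index, then merge them on the Fruit and State columns with how="left" so that the keys come from the fruit DataFrame, and assign the resulting Rating column back to fruit['State']. Finally, set the Fruit and Color columns as the index. The assignment lines up because a left merge keeps the rows of fruit in their original order, and each (Fruit, State) pair appears only once in rating, so the merged result has exactly one row per fruit row.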

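import pandas as pd

# Read both files with the default integer index so the keys stay ordinary columns.
fruit = pd.read_csv("fruit.csv")
rating = pd.read_csv("rating.csv")
# The left merge keeps every fruit row; its Rating column overwrites State.
fruit['State'] = fruit.merge(rating, on=["Fruit", "State"], how="left")["Rating"]
# Restore the (Fruit, Color) MultiIndex.
fruit.set_index(["Fruit","Color"], inplace=True)
print(fruit)

             State  more  data
Fruit Color
Apple Red        8  etc.  etc.
      Green      0  etc.  etc.
      Green     11  etc.  etc.
Pear  Red        9  etc.  etc.
      Green      5  etc.  etc.
Lime  Green      5  etc.  etc.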
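
If you would rather keep the DataFrames indexed as in the question and really look values up through a tuple-keyed dict, the following is a minimal sketch of one way to do it, not the only one. It assumes the same data as the two CSVs, inlined here through io.StringIO so the snippet runs on its own; the names FRUIT_CSV, RATING_CSV, lookup and keys are illustrative and do not come from the original post.

```python
import io

import pandas as pd

# Inline copies of fruit.csv and rating.csv (an assumption, so the sketch is self-contained).
FRUIT_CSV = """Fruit,Color,State,more,data
Apple,Red,Good,etc.,etc.
Apple,Green,Mouldy,etc.,etc.
Apple,Green,Excellent,etc.,etc.
Pear,Red,Excellent,etc.,etc.
Pear,Green,Good,etc.,etc.
Lime,Green,Bad,etc.,etc.
"""

RATING_CSV = """Fruit,State,Rating
Apple,Excellent,11
Apple,Good,8
Apple,Bad,4
Apple,Mouldy,0
Pear,Excellent,9
Pear,Good,5
Pear,Bad,2
Pear,Mouldy,1
Lime,Excellent,10
Lime,Good,7
Lime,Bad,5
Lime,Mouldy,2
"""

# Same indexes as in the question.
fruit = pd.read_csv(io.StringIO(FRUIT_CSV), index_col=["Fruit", "Color"])
rating = pd.read_csv(io.StringIO(RATING_CSV), index_col=["Fruit", "State"])

# A Series with a MultiIndex converts to a dict keyed by (Fruit, State) tuples.
lookup = rating["Rating"].to_dict()

# Pair each row's Fruit index level with its State value to form the lookup key.
keys = zip(fruit.index.get_level_values("Fruit"), fruit["State"])

# dict.get returns None for a pair that has no rating instead of raising KeyError.
fruit["State"] = [lookup.get(key) for key in keys]

print(fruit)
```

The merge approach is usually the more idiomatic choice for large frames. The dict lookup is mostly useful when the mapping already exists as a dict or when the existing MultiIndex must stay untouched.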
