Overwrite cells from one pandas DataFrame into another based on row values



I have two datasets (blank and filled), shown below, and I need to copy information from filled into blank.

blank.head()
| Student name  | Student number    | Mark      | Grade     | Marked by     | Notes     |
|-------------- |----------------   |--------   |--------   |-----------    |--------   |
| John          |16                 | NaN       | NaN       | NaN           | NaN   |
| Mary          |19                 | NaN       | NaN       | NaN           | NaN   |
| Colm          |17                 | NaN       | NaN       | NaN           | NaN   |
| Ellen         |20                 | NaN       | NaN       | NaN           | NaN   |
| Fionna        |21                 | NaN       | NaN       | NaN           | NaN   |
filled.head()
| Student name  | Student number    | Mark  | Grade     | Marked by     | Notes                 |
|-------------- |----------------   |------ |-------    |-----------    |--------------------   |
| Tara          | 31                | 71    | B1        | JL            |        Good           |
| Leah          | 40                | 54    | C2        | CL            | Needs more dragons    |
| john          | 16                | 53    | C2        | MG            |        Good           |
| Aisling       | 200               | 60    | B3        | MOB           |    keep working       |
| Adam          | 88                | 74    | B1        | KOM           |  don't forget apa     |

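blank (df1) is the master file, and I want to keep its row order. filled (df2) contains each student's mark and other details, but its rows are in a different order from df1.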

I need to copy the "Mark", "Grade", "Marked by" and "Notes" columns from df2 to df1, leaving df1's index unchanged and making sure each student gets the correct information.
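One way to express this requirement, sketched under the assumption that "Student number" is unique in filled, is a per-column lookup with pandas' Series.map: index filled by student number once, then look up each of blank's student numbers. blank's row order and index are never touched, and students missing from filled simply get NaN. The data below is copied from the question.

```python
import numpy as np
import pandas as pd

# Data from the question.
blank = pd.DataFrame({'Student name': ['John', 'Mary', 'Colm', 'Ellen', 'Fionna'],
                      'Student number': [16, 19, 17, 20, 21],
                      'Mark': [np.nan] * 5, 'Grade': [np.nan] * 5,
                      'Marked by': [np.nan] * 5, 'Notes': [np.nan] * 5})
filled = pd.DataFrame({'Student name': ['Tara', 'Leah', 'john', 'Aisling', 'Adam'],
                       'Student number': [31, 40, 16, 200, 88],
                       'Mark': [71, 54, 53, 60, 74],
                       'Grade': ['B1', 'C2', 'C2', 'B3', 'B1'],
                       'Marked by': ['JL', 'CL', 'MG', 'MOB', 'KOM'],
                       'Notes': ['Good', 'Needs more dragons', 'Good',
                                 'keep working', 'dont forget apa']})

cols = ['Mark', 'Grade', 'Marked by', 'Notes']

# Lookup table keyed by student number (assumes each number appears once in filled).
lookup = filled.set_index('Student number')

for col in cols:
    # map() replaces each student number in blank with lookup[col] for that number;
    # numbers that are not in filled become NaN. blank's rows stay where they are.
    blank[col] = blank['Student number'].map(lookup[col])

print(blank)
```

Only John (number 16) appears in both frames, so only row 0 receives values.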

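My first thought was to use the student number as the index of both DataFrames (I assume these numbers are unique) and then copy the columns across like this: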

blank = blank.set_index('Student number')
filled = filled.set_index('Student number')
cols = ['Mark', 'Grade', 'Marked by', 'Notes']
blank[cols] = filled[cols]

But I'm not sure whether this will work...
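The index-alignment idea can be checked with a self-contained sketch, assuming student numbers are unique in filled. Two details matter: set_index returns a new DataFrame, so its result has to be assigned, and assigning a DataFrame to a list of columns aligns each column on the index. Calling reset_index afterwards restores the original 0..n-1 index and puts "Student number" back as a column; the names b, f and result are just illustrative.

```python
import numpy as np
import pandas as pd

# Data from the question.
blank = pd.DataFrame({'Student name': ['John', 'Mary', 'Colm', 'Ellen', 'Fionna'],
                      'Student number': [16, 19, 17, 20, 21],
                      'Mark': [np.nan] * 5, 'Grade': [np.nan] * 5,
                      'Marked by': [np.nan] * 5, 'Notes': [np.nan] * 5})
filled = pd.DataFrame({'Student name': ['Tara', 'Leah', 'john', 'Aisling', 'Adam'],
                       'Student number': [31, 40, 16, 200, 88],
                       'Mark': [71, 54, 53, 60, 74],
                       'Grade': ['B1', 'C2', 'C2', 'B3', 'B1'],
                       'Marked by': ['JL', 'CL', 'MG', 'MOB', 'KOM'],
                       'Notes': ['Good', 'Needs more dragons', 'Good',
                                 'keep working', 'dont forget apa']})

cols = ['Mark', 'Grade', 'Marked by', 'Notes']

# set_index returns a new DataFrame; it does not modify blank/filled in place.
b = blank.set_index('Student number')
f = filled.set_index('Student number')

# Column assignment aligns on the index: each row of b gets the values from the
# row of f with the same student number, or NaN if there is no such row.
b[cols] = f[cols]

# Move 'Student number' back into a column and restore blank's column order.
# set_index/reset_index keep the row order, so blank's order is preserved.
result = b.reset_index()[blank.columns]
print(result)
```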

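EDIT: Since a KeyError was mentioned in the comments on my previous code, I have created the filled and blank DataFrames exactly as given in the question.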

Input:

import numpy as np
import pandas as pd
blank = pd.DataFrame({'Student name': ['John','Mary','Colm','Ellen','Fionna'], 'Student number': [16,19,17,20,21], 'Mark': [np.nan]*5, 'Grade': [np.nan]*5, 'Marked by': [np.nan]*5, 'Notes': [np.nan]*5})
filled = pd.DataFrame({'Student name': ['Tara','Leah','john','Aisling','Adam'], 'Student number': [31,40,16,200,88], 'Mark': [71,54,53,60,74], 'Grade': ['B1','C2','C2','B3','B1'], 'Marked by': ['JL','CL','MG','MOB','KOM'], 'Notes': ['Good','Needs more dragons','Good','keep working','dont forget apa']})

Blank:

Student name  Student number  Mark Grade Marked by Notes
0         John              16   NaN   NaN       NaN   NaN
1         Mary              19   NaN   NaN       NaN   NaN
2         Colm              17   NaN   NaN       NaN   NaN
3        Ellen              20   NaN   NaN       NaN   NaN
4       Fionna              21   NaN   NaN       NaN   NaN

Filled:

Student name  Student number  Mark Grade Marked by               Notes
0         Tara              31    71    B1        JL                Good
1         Leah              40    54    C2        CL  Needs more dragons
2         john              16    53    C2        MG                Good
3      Aisling             200    60    B3       MOB        keep working
4         Adam              88    74    B1       KOM     dont forget apa

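Assuming "Student number" is the key shared by both DataFrames, the following code copies the columns across. A left merge (how='left') keeps every row of blank in blank's original order, so as long as each student number appears only once in filled and blank has its default 0..n-1 index, the merged result lines up row for row with blank:

blank[['Mark','Grade','Marked by','Notes']] = blank.merge(filled, on='Student number', how='left')[['Mark_y','Grade_y','Marked by_y','Notes_y']]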
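Why the join type matters: the default inner merge (how='inner') returns only the matching rows, renumbered from 0, so index alignment writes them into blank's first rows regardless of which student they belong to. With the question's data this is hidden because John happens to be row 0. The sketch below wraps the left-merge approach in a helper; copy_by_key is a hypothetical name, not a pandas function. It merges only the key and the wanted columns (so no _x/_y suffixes appear) and passes validate='many_to_one' so pandas raises MergeError if a student number is duplicated in filled. To show the ordering, it reverses blank so John becomes the last row.

```python
import numpy as np
import pandas as pd

# Data from the question.
blank = pd.DataFrame({'Student name': ['John', 'Mary', 'Colm', 'Ellen', 'Fionna'],
                      'Student number': [16, 19, 17, 20, 21],
                      'Mark': [np.nan] * 5, 'Grade': [np.nan] * 5,
                      'Marked by': [np.nan] * 5, 'Notes': [np.nan] * 5})
filled = pd.DataFrame({'Student name': ['Tara', 'Leah', 'john', 'Aisling', 'Adam'],
                       'Student number': [31, 40, 16, 200, 88],
                       'Mark': [71, 54, 53, 60, 74],
                       'Grade': ['B1', 'C2', 'C2', 'B3', 'B1'],
                       'Marked by': ['JL', 'CL', 'MG', 'MOB', 'KOM'],
                       'Notes': ['Good', 'Needs more dragons', 'Good',
                                 'keep working', 'dont forget apa']})
cols = ['Mark', 'Grade', 'Marked by', 'Notes']


def copy_by_key(target, source, key, columns):
    """Hypothetical helper: return a copy of `target` whose `columns` are
    taken from the row of `source` that has the same `key` value."""
    merged = target[[key]].merge(
        source[[key] + columns],
        on=key,
        how='left',              # keep every target row, in target's order
        validate='many_to_one',  # raise MergeError if a key repeats in source
    )
    # With unique keys in source there is exactly one merged row per target row,
    # in the same order, so merged can take target's index for alignment.
    merged.index = target.index
    out = target.copy()
    out[columns] = merged[columns]
    return out


# Reverse blank so that John, the only student present in filled, is the last row.
reversed_blank = blank.iloc[::-1].reset_index(drop=True)
result = copy_by_key(reversed_blank, filled, 'Student number', cols)
print(result)
```

John's mark, grade, marker and notes stay on his own row (row 4). Running the inner-merge version of the one-liner on reversed_blank would instead write them to row 0, which is Fionna.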

Output:

Student name  Student number  Mark Grade Marked by Notes
0         John              16  53.0    C2        MG  Good
1         Mary              19   NaN   NaN       NaN   NaN
2         Colm              17   NaN   NaN       NaN   NaN
3        Ellen              20   NaN   NaN       NaN   NaN
4       Fionna              21   NaN   NaN       NaN   NaN
