How do I reorder the rows of a CSV file by referring to a single column?



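I would like to write code that sorts all the data rows of Test1.csv (everything from the second line down, i.e. below the header) so that they follow the order of the Entry column in Test2.csv. I would appreciate your advice. Thank you for your help. The data below is a simplified version; the real files have more than 1000 rows.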

import pandas as pd
input_path1 = "Test1.csv"
input_path2 = "Test2.csv"
output_path = "output.csv"
df1 = pd.read_csv(filepath_or_buffer=input_path1, encoding="utf-8")
df2 = pd.read_csv(filepath_or_buffer=input_path2, encoding="utf-8")
(df1.merge(df2, how='left', on='Entry')
.set_index('Entry')
.drop('Number_x', axis='columns')
.rename({'Number_y': 'Number'}, axis='columns')
.to_csv(output_path))

Error message

Traceback (most recent call last):
File "narabekae.py", line 28, in <module>
.drop('Number_x', axis='columns')
File "/Users/macuser/downloads/yes/lib/python3.7/site-packages/pandas/core/frame.py", line 4102, in drop
errors=errors,
File "/Users/macuser/downloads/yes/lib/python3.7/site-packages/pandas/core/generic.py", line 3914, in drop
obj = obj._drop_axis(labels, axis, level=level, errors=errors)
File "/Users/macuser/downloads/yes/lib/python3.7/site-packages/pandas/core/generic.py", line 3946, in _drop_axis
new_axis = axis.drop(labels, errors=errors)
File "/Users/macuser/downloads/yes/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 5340, in drop
raise KeyError("{} not found in axis".format(labels[mask]))
KeyError: "['Number_x'] not found in axis"

Desired output

,V1,V2,>sp,Entry,details,PepPI
1,OS=Ha,MTNKG,>sp,A4G4K7,HFQ_HERAR,7.028864399
2,OS=Sh,MAKGQ,>sp,B4TFA6,HFQ_SALHS,7.158609631
3,OS=Oi,MAQSV,>sp,Q8EQQ9,HFQ_OCEIH,9.229953074
4,OS=Bc,MAERS,>sp,A9M5C4,HFQ_BRUC2,8.154348935
5,OS=Re,MAERS,>sp,Q2K8U6,HFQ_RHIEC,8.154348935

Test1.csv

,V1,V2,>sp,Entry,details,PepPI
1,OS=Re,MAERS,>sp,Q2K8U6,HFQ_RHIEC,8.154348935
2,OS=Sh,MAKGQ,>sp,B4TFA6,HFQ_SALHS,7.158609631
3,OS=Ha,MTNKG,>sp,A4G4K7,HFQ_HERAR,7.028864399
4,OS=Bc,MAERS,>sp,A9M5C4,HFQ_BRUC2,8.154348935
5,OS=Oi,MAQSV,>sp,Q8EQQ9,HFQ_OCEIH,9.229953074

Test2.csv

pI,Molecular weight (average),Entry,Entry name,Organism
6.82,8763.13,A4G4K7,HFQ_HERAR,Rat
6.97,11119.33,B4TFA6,HFQ_SALHS,Pig
9.22,8438.69,Q8EQQ9,HFQ_OCEIH,Bacteria
7.95,8854.28,A9M5C4,HFQ_BRUC2,Mouse
7.95,9044.5,Q2K8U6,HFQ_RHIEC,Human

Additional information

macOS 10.15.4, Python 3.7.3, Atom

To reorder the columns, just define a list of the columns in the order you want and use df[columns]:

In [17]: columns = ["V1","V2",">sp","Entry","details","PepPI"]
In [18]: df = df1.merge(df2, how='left', on='Entry')
In [19]: df[columns]
Out[19]:
      V1     V2  >sp   Entry    details     PepPI
0  OS=Re  MAERS  >sp  Q2K8U6  HFQ_RHIEC  8.154349
1  OS=Sh  MAKGQ  >sp  B4TFA6  HFQ_SALHS  7.158610
2  OS=Ha  MTNKG  >sp  A4G4K7  HFQ_HERAR  7.028864
3  OS=Bc  MAERS  >sp  A9M5C4  HFQ_BRUC2  8.154349
4  OS=Oi  MAQSV  >sp  Q8EQQ9  HFQ_OCEIH  9.229953

Of course, you can then save it as usual with the to_csv() method:

df[columns].to_csv(output_path)
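
Note that df[columns] only selects columns. Because a left merge keeps the row order of the left frame, the rows above are still in Test1's order. One possible way to get Test2's order instead is to put Test2's Entry column on the left side of the merge. The following is a minimal sketch under that assumption, not a definitive solution: the sample data is inlined with io.StringIO so it runs on its own, and the names test1_csv, test2_csv and reordered are made up for illustration.

```python
import io

import pandas as pd

# Inline copies of the sample files so the sketch runs on its own;
# with the real files, pass the paths to pd.read_csv instead.
test1_csv = """,V1,V2,>sp,Entry,details,PepPI
1,OS=Re,MAERS,>sp,Q2K8U6,HFQ_RHIEC,8.154348935
2,OS=Sh,MAKGQ,>sp,B4TFA6,HFQ_SALHS,7.158609631
3,OS=Ha,MTNKG,>sp,A4G4K7,HFQ_HERAR,7.028864399
4,OS=Bc,MAERS,>sp,A9M5C4,HFQ_BRUC2,8.154348935
5,OS=Oi,MAQSV,>sp,Q8EQQ9,HFQ_OCEIH,9.229953074
"""
test2_csv = """pI,Molecular weight (average),Entry,Entry name,Organism
6.82,8763.13,A4G4K7,HFQ_HERAR,Rat
6.97,11119.33,B4TFA6,HFQ_SALHS,Pig
9.22,8438.69,Q8EQQ9,HFQ_OCEIH,Bacteria
7.95,8854.28,A9M5C4,HFQ_BRUC2,Mouse
7.95,9044.5,Q2K8U6,HFQ_RHIEC,Human
"""

# The unnamed first column of Test1 is just a row number, so read it as the index.
df1 = pd.read_csv(io.StringIO(test1_csv), index_col=0)
df2 = pd.read_csv(io.StringIO(test2_csv))

columns = ["V1", "V2", ">sp", "Entry", "details", "PepPI"]

# Only Test2's Entry column is needed. It goes on the left so that the
# merged rows follow Test2's order; Test1 supplies all the other columns.
reordered = df2[["Entry"]].merge(df1, how="left", on="Entry")[columns]

# Renumber the rows 1..n, as in the desired output.
reordered.index = range(1, len(reordered) + 1)

print(reordered)
```

With how="left", an Entry that appears in Test2 but not in Test1 would show up as a row of missing values; how="inner" should drop such rows while still following the left frame's key order.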

Notes

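  • The code cannot work as written on the given data, because neither df1 nor df2 has a Number column. The merge therefore never creates Number_x or Number_y, and drop('Number_x') raises the KeyError.
  • If you want the saved .csv to look like the one under "Desired output", you should not call set_index("Entry"), because the desired output has a plain integer index.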
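
Both notes can be checked on a small scale. The sketch below uses tiny made-up frames (left, right and no_overlap are hypothetical, not the poster's data) to show that merge only adds the _x / _y suffixes when a non-key column name exists on both sides, and that to_csv without set_index writes the integer index into an unnamed first column.

```python
import io

import pandas as pd

# Hypothetical toy frames, for illustration only.
left = pd.DataFrame({"Entry": ["A", "B"], "Number": [1, 2]})
right = pd.DataFrame({"Entry": ["A", "B"], "Number": [10, 20]})
no_overlap = pd.DataFrame({"Entry": ["A", "B"], "pI": [6.82, 6.97]})

# "Number" exists on both sides, so merge disambiguates it with suffixes.
with_suffix = left.merge(right, how="left", on="Entry")
print(list(with_suffix.columns))

# No shared non-key column: nothing is renamed, so there is no
# "Number_x" to drop and drop("Number_x", ...) would raise KeyError.
without_suffix = left.merge(no_overlap, how="left", on="Entry")
print(list(without_suffix.columns))

# Without set_index("Entry"), Entry stays an ordinary column and the
# integer index is written as the unnamed first column of the CSV.
buffer = io.StringIO()
without_suffix.to_csv(buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

If a column may or may not be present, drop accepts errors="ignore", which skips labels that do not exist instead of raising.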
