I am using a package that, for each element in a list, writes the following lines to a file:
Entry Entry name Status Protein names Gene names Organism
A0A20CSC4 A0A20CSC4_1PHYC unreviewed Uncharacterized protein OlL7_200 Ostreococcus lucimarinus virus 7
Entry Entry name Status Protein names Gene names Organism
A0A0P0DZ8 A0A0PCDZ8_9PLYC unreviewed Uncharacterized protein OlL7_159 Ostreococcus lucimarinus virus 7
Entry Entry name Status Protein names Gene names Organism
A0A1P0BY71 A0A1P0BY71_9PHYC unreviewed Uncharacterized protein OlL7_111c Ostreococcus lucimarinus virus 7
... x 1000
So if I open this file with pandas, I get a DataFrame like this:
>>> blast
Entry Entry name Status Protein names Gene names
0 A0A20CSC4 A0A20CSC4_1PHYC unreviewed Uncharacterized protein OlL7_200
1 NaN NaN NaN NaN NaN
2 A0A0P0DZ8 A0A0PCDZ8_9PLYC unreviewed Uncharacterized protein OlL7_159
3 NaN NaN NaN NaN NaN
4 Entry Entry name Status Protein names Gene names
5 A0A1P0BY71 A0A1P0BY71_9PHYC unreviewed Uncharacterized protein OlL7_111c
I just want to create a DataFrame with a single header row of column names:
Entry Entry name Status Protein names Gene names Organism
A0A20CSC4 A0A20CSC4_1PHYC unreviewed Uncharacterized protein OlL7_200 Ostreococcus lucimarinus virus 7
A0A0P0DZ8 A0A0PCDZ8_9PLYC unreviewed Uncharacterized protein OlL7_159 Ostreococcus lucimarinus virus 7
A0A1P0BY71 A0A1P0BY71_9PHYC unreviewed Uncharacterized protein OlL7_111c Ostreococcus lucimarinus virus 7
Do you know a way to do this with pandas in Python 3?
Updated DataFrame:
Entry Entry name Status Protein names Gene names
0 A0A20CSC4 A0A20CSC4_1PHYC unreviewed Uncharacterized protein OlL7_200
2 A0A0P0DZ8 A0A0PCDZ8_9PLYC unreviewed Uncharacterized protein OlL7_159
4 Entry Entry name Status Protein names Gene names
5 A0A1P0BY71 A0A1P0BY71_9PHYC unreviewed Uncharacterized protein OlL7_111c
Row 4 still contains the column names.
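One way to get that output is to drop the NaN rows first, and then drop the rows where the header was read in again as data.
So you can do:
blast.dropna(inplace=True)
blast.drop(blast[blast['Entry'] == 'Entry'].index, inplace=True)
This should work. Note that dropna() with its default settings removes any row containing at least one NaN, so a real entry with a missing field would be dropped as well.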
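To make the two steps above easier to try out, here is a minimal, self-contained sketch. It rebuilds a small stand-in for `blast` from the rows shown in the question (the Organism column is left out, as in the printed DataFrame). The helper name `drop_repeated_headers` and its `key` parameter are made up for illustration. It also uses `dropna(how="all")` instead of the default, on the assumption that only fully empty rows should be removed.

```python
import numpy as np
import pandas as pd

# Stand-in for the `blast` DataFrame from the question (values copied from the post).
columns = ["Entry", "Entry name", "Status", "Protein names", "Gene names"]
blast = pd.DataFrame(
    [
        ["A0A20CSC4", "A0A20CSC4_1PHYC", "unreviewed", "Uncharacterized protein", "OlL7_200"],
        [np.nan] * len(columns),  # empty row
        ["A0A0P0DZ8", "A0A0PCDZ8_9PLYC", "unreviewed", "Uncharacterized protein", "OlL7_159"],
        [np.nan] * len(columns),  # empty row
        columns,                  # header line that was read in as data
        ["A0A1P0BY71", "A0A1P0BY71_9PHYC", "unreviewed", "Uncharacterized protein", "OlL7_111c"],
    ],
    columns=columns,
)


def drop_repeated_headers(df, key="Entry"):
    """Hypothetical helper: remove all-NaN rows and rows that repeat the header."""
    # Keep rows that have at least one non-NaN value.
    cleaned = df.dropna(how="all")
    # A repeated header row has the column name itself in the `key` column.
    cleaned = cleaned[cleaned[key] != key]
    # Renumber the rows 0..n-1 so the gaps left by the dropped rows disappear.
    return cleaned.reset_index(drop=True)


clean = drop_repeated_headers(blast)
print(clean)
```

Returning a new DataFrame rather than using `inplace=True` leaves the original `blast` untouched, which makes it easier to check what was removed. If the index gaps do not matter to you, the `reset_index` call can be dropped.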