Merge several dataframes, keeping only one set of column names



I am using a package that, for each element in a list, prints the following lines to a file:

Entry   Entry name  Status  Protein names   Gene names  Organism
A0A20CSC4   A0A20CSC4_1PHYC unreviewed  Uncharacterized protein OlL7_200    Ostreococcus lucimarinus virus 7
Entry   Entry name  Status  Protein names   Gene names  Organism
A0A0P0DZ8   A0A0PCDZ8_9PLYC unreviewed  Uncharacterized protein OlL7_159    Ostreococcus lucimarinus virus 7
Entry   Entry name  Status  Protein names   Gene names  Organism
A0A1P0BY71  A0A1P0BY71_9PHYC    unreviewed  Uncharacterized protein OlL7_111c   Ostreococcus lucimarinus virus 7

... x 1000

So if I open this file with pandas, I get a dataframe like this:

>>> blast
        Entry        Entry name      Status            Protein names  Gene names
0   A0A20CSC4   A0A20CSC4_1PHYC  unreviewed  Uncharacterized protein    OlL7_200
1         NaN               NaN         NaN                      NaN         NaN
2   A0A0P0DZ8   A0A0PCDZ8_9PLYC  unreviewed  Uncharacterized protein    OlL7_159
3         NaN               NaN         NaN                      NaN         NaN
4       Entry        Entry name      Status            Protein names  Gene names
5  A0A1P0BY71  A0A1P0BY71_9PHYC  unreviewed  Uncharacterized protein   OlL7_111c

I just want to create a dataframe with a single set of column names:

Entry   Entry name  Status  Protein names   Gene names  Organism
A0A20CSC4   A0A20CSC4_1PHYC unreviewed  Uncharacterized protein OlL7_200    Ostreococcus lucimarinus virus 7
A0A0P0DZ8   A0A0PCDZ8_9PLYC unreviewed  Uncharacterized protein OlL7_159    Ostreococcus lucimarinus virus 7
A0A1P0BY71  A0A1P0BY71_9PHYC    unreviewed  Uncharacterized protein OlL7_111c   Ostreococcus lucimarinus virus 7

Do you know a way to do this with pandas in Python 3?

Updated dataframe:

        Entry        Entry name      Status            Protein names  Gene names
0   A0A20CSC4   A0A20CSC4_1PHYC  unreviewed  Uncharacterized protein    OlL7_200
2   A0A0P0DZ8   A0A0PCDZ8_9PLYC  unreviewed  Uncharacterized protein    OlL7_159
4       Entry        Entry name      Status            Protein names  Gene names
5  A0A1P0BY71  A0A1P0BY71_9PHYC  unreviewed  Uncharacterized protein   OlL7_111c

Row 4 still contains the column names.

One way to get that kind of output is to drop the NaN values first:

blast.dropna(inplace=True)

Then remove the rows that repeat the header:

blast.drop(blast[blast['Entry'] == 'Entry'].index, inplace=True)

This should work.
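
For reference, here is a minimal end-to-end sketch of the two steps above. It makes a few assumptions that are not in the question: the file is taken to be tab-separated, the sample text is a small stand-in for the real file (with one delimiter-only line added to reproduce an all-NaN row), and dropna(how="all") is used as a narrower variant of the plain dropna() call so that rows with a single missing field are not lost. The final reset_index is optional and only renumbers the rows.

```python
import io

import pandas as pd

# Stand-in for the real file (assumption: fields are tab-separated).
# The header line repeats before each record, and one line holds only
# delimiters, which pandas parses as a row of NaN values.
raw = (
    "Entry\tEntry name\tStatus\tProtein names\tGene names\tOrganism\n"
    "A0A20CSC4\tA0A20CSC4_1PHYC\tunreviewed\tUncharacterized protein\tOlL7_200\tOstreococcus lucimarinus virus 7\n"
    "\t\t\t\t\t\n"
    "Entry\tEntry name\tStatus\tProtein names\tGene names\tOrganism\n"
    "A0A0P0DZ8\tA0A0PCDZ8_9PLYC\tunreviewed\tUncharacterized protein\tOlL7_159\tOstreococcus lucimarinus virus 7\n"
    "Entry\tEntry name\tStatus\tProtein names\tGene names\tOrganism\n"
    "A0A1P0BY71\tA0A1P0BY71_9PHYC\tunreviewed\tUncharacterized protein\tOlL7_111c\tOstreococcus lucimarinus virus 7\n"
)

# The first line becomes the header; later copies of it are read as data rows.
blast = pd.read_csv(io.StringIO(raw), sep="\t")

# Step 1: drop rows in which every column is NaN.
blast = blast.dropna(how="all")

# Step 2: drop the rows that merely repeat the header.
blast = blast[blast["Entry"] != "Entry"]

# Optional: renumber the remaining rows from 0.
blast = blast.reset_index(drop=True)

print(blast)
```

With a real file, the io.StringIO(raw) argument would be replaced by the file path. Filtering with a boolean mask does the same job as the blast.drop(...) call above, just without modifying the dataframe in place.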
