DendroPy: adding an internal node midway between two nodes



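I am new to DendroPy. What I want to do seems simple, but I cannot work out how to do it correctly, and I could not find anything about it online.

I want to add a node between two nodes of an existing rooted tree.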

from dendropy import Tree, Taxon, Node
t1 = Tree.get_from_string(
    "(((sp1: 0.35, sp2: 0.15):0.75, sp3:1): 0.5, (sp4: 0.5, sp5: 0.05)MRCA_sp4&sp5: 1)root;",
    "newick", rooting='force-rooted'
)
t1.print_plot()
mrca = t1.mrca(taxon_labels=["sp4", "sp5"])
print(mrca.description())

Tree plot and MRCA node description

The MRCA of sp4 and sp5 is found correctly. Now I try to add a node between the MRCA and the root, using the code below:

def add_node_midway_between_2_nodes(lowernode, taxon_label=None, node_label=None):
    newtaxon = Taxon(label=taxon_label)
    newnode = Node(taxon=newtaxon, label=node_label)
    newnode.parent_node = lowernode.parent_node
    newnode.edge_length = lowernode.edge_length/2
    lowernode.parent_node = newnode
    lowernode.edge_length = newnode.edge_length
    return newnode

node = add_node_midway_between_2_nodes(mrca, node_label="midway between root and MRCA sp4&sp5")
t1.print_plot()
str_t1 = t1.as_string(schema='newick')
print(str_t1)

Tree with a node between the root and MRCA sp4&sp5

[&R] (((sp1:0.35,sp2:0.15):0.75,sp3:1.0):0.5,((sp4:0.5,sp5:0.05)MRCA_sp4&sp5:0.5)midway_between_root_and_MRCA_sp4&sp5:0.5)root;

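Judging by the plot and the string, it seems to have worked. But when I compute the MRCA of sp4 and sp5 again, it no longer finds "MRCA sp4&sp5"; it returns the root node instead.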

mrca = t1.mrca(taxon_labels=["sp4", "sp5"])
print(mrca.description())

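Output = description of the root node

If I walk up through parent_node starting from sp5, I still find "MRCA sp4&sp5".

Out of desperation, I tried rebuilding the tree from the string str_t1, but that did not work either and even gave me a different (equally incorrect) result: the node between the root and "MRCA sp4&sp5".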

t1 = Tree.get_from_string(
    str_t1,
    "newick", rooting='force-rooted'
)
mrca = t1.mrca(taxon_labels=["sp4", "sp5"])
print(mrca.description())

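Output = description of the node between the root and MRCA sp4&sp5

So, what is a clean way to add a node between two nodes without causing strange behavior afterwards?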

Thank you very much

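Your code mostly works, but you should call update_taxon_namespace and update_bipartitions so that any change to the tree topology is applied properly, as the documentation suggests. In your example, it would look like this: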

def add_node_midway_between_2_nodes(lowernode, taxon_label=None, node_label=None):
    newtaxon = Taxon(label=taxon_label)
    newnode = Node(taxon=newtaxon, label=node_label)
    newnode.parent_node = lowernode.parent_node
    newnode.edge_length = lowernode.edge_length/2
    lowernode.parent_node = newnode
    lowernode.edge_length = newnode.edge_length
    return newnode

node = add_node_midway_between_2_nodes(
    mrca, node_label="midway between root and MRCA sp4&sp5"
)
t1.update_taxon_namespace()
t1.update_bipartitions(
    suppress_unifurcations=False, suppress_storage=True
)  # suppress_storage is optional, I just do not want to create a bipartitions list
t1.print_plot()
str_t1 = t1.as_string(schema='newick')
print(str_t1)

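Note!

Update the taxon namespace before updating the bipartitions, because the latter must work with the correct TaxonNamespace. Otherwise, you will still get strange behavior.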
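
To illustrate why the order matters, here is a toy model in plain Python, not DendroPy code. It assumes that each taxon's bit is derived from its position in the namespace; `ToyNamespace`, `encode_leafset` and the extra taxon `sp6` are hypothetical names used only for this sketch, and DendroPy's internals may differ in detail.

```python
from typing import Iterable, List


class ToyNamespace:
    """Hypothetical stand-in for a taxon namespace: an ordered list of labels."""

    def __init__(self, labels: Iterable[str]) -> None:
        self.labels: List[str] = list(labels)

    def bit(self, label: str) -> int:
        # Assumption of this sketch: a taxon's bit comes from its position
        # in the namespace. list.index raises ValueError if it is missing.
        return 1 << self.labels.index(label)


def encode_leafset(namespace: ToyNamespace, leaf_labels: Iterable[str]) -> int:
    """OR together the bits of all leaves below a node."""
    mask = 0
    for label in leaf_labels:
        mask |= namespace.bit(label)
    return mask


namespace = ToyNamespace(["sp1", "sp2", "sp3", "sp4", "sp5"])
# sp6 is a taxon added to the tree after the namespace was built.
leaves_below_new_node = ["sp4", "sp5", "sp6"]

# Wrong order: encoding before the namespace knows about sp6 fails.
try:
    encode_leafset(namespace, leaves_below_new_node)
    encoded_too_early = True
except ValueError:
    encoded_too_early = False

# Right order: register the taxon first, then encode.
namespace.labels.append("sp6")
mask = encode_leafset(namespace, leaves_below_new_node)
print(encoded_too_early, bin(mask))
```

In this toy model the bitmask cannot be built until the namespace contains every taxon in the tree, which mirrors the ordering advice above.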

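However, for fine-grained tree restructuring it is better to use the built-in Node methods. For example, I would rewrite the function like this: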

from numbers import Real
from typing import Optional

def insert_new_node_posterior(
    node: Node,
    *,
    taxon_label: Optional[str] = None,
    node_label: Optional[str] = None,
    edge_length: Real,
    # If this were production code, or at least meant for reuse,
    # I would add more arguments for specifying the proportion,
    # using height, distance from root, etc.
) -> Node:
    parent = node.parent_node
    if parent is None:
        raise ValueError("Cannot insert a node above the root: it has no parent.")
    new_taxon = Taxon(label=taxon_label)
    # Remember the position of the node so the child order is preserved.
    i = parent.child_nodes().index(node)
    parent.remove_child(node)
    intermediate_node = parent.insert_new_child(
        index=i, taxon=new_taxon, label=node_label, edge_length=edge_length
    )
    # The original edge is split: the lower part keeps the remainder.
    node.edge_length -= edge_length
    intermediate_node.add_child(node)
    return intermediate_node

node = insert_new_node_posterior(
    mrca,
    node_label="midway between root and MRCA sp4&sp5",
    edge_length=mrca.edge_length / 2
)
t1.update_taxon_namespace()
t1.update_bipartitions(
    suppress_unifurcations=False, suppress_storage=True
)
t1.print_plot()
str_t1 = t1.as_string(schema='newick')
print(str_t1)

Even so, Tree.mrca still returns a node you would not expect:

Node object at 0x1be8b959460<Node object at 0x1be8b959460: 'midway between root and MRCA sp4&sp5' (<Taxon 0x1be8cdceeb0 'None'>)>
[Edge]
Edge object at 0x1be8b959400 (1917897249792, Length=0.5)
[Taxon]
Taxon object at 0x1be8cdceeb0: <Unnamed Taxon>
[Parent]
Node object at 0x1be8c3b2b80<Node object at 0x1be8c3b2b80: 'root' (None)>
[Children]
[0] Node object at 0x1be8be4d6a0<Node object at 0x1be8be4d6a0: 'MRCA sp4&sp5' (None)>

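In this case, however, it is not a bug; it is simply how the method behaves. The new node has a single child (a unifurcation), so it covers exactly the same set of leaves as "MRCA sp4&sp5", and the search stops at the first node whose leaf set matches. From the source code: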

if cms:
    # for at least one taxon cm has 1 and bipartition has 1
    if cms == leafset_bitmask:
        # curr_node has all of the 1's that bipartition has
        if cm == leafset_bitmask:
            return curr_node  # Vovin's comment: Since there is a unifurcation,
                              # it returns the current node
                              # instead of the next iteration
        last_match = curr_node
        nd_source = iter(curr_node.child_nodes())
    else:
        # we have reached a child that has some, but not all of the
        #   required taxa as descendants, so we return the last_match
        return last_match
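
The effect of this loop on a unifurcation can be reproduced with a small self-contained sketch. It is a toy re-implementation in plain Python that follows the excerpt above, not DendroPy itself; `ToyNode`, `encode`, `toy_mrca`, `build` and the short label "midway" are hypothetical names chosen for the illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyNode:
    label: str
    children: List["ToyNode"] = field(default_factory=list)
    mask: int = 0  # bitmask of the leaves below this node


def encode(node: ToyNode, leaf_bits: Dict[str, int]) -> int:
    """Post-order pass: a leaf gets its own bit, an internal node the OR of its children."""
    if not node.children:
        node.mask = leaf_bits[node.label]
    else:
        node.mask = 0
        for child in node.children:
            node.mask |= encode(child, leaf_bits)
    return node.mask


def toy_mrca(root: ToyNode, target: int) -> ToyNode:
    """Top-down search modelled on the loop quoted above."""
    curr, last_match = root, root
    source = iter(root.children)
    try:
        while True:
            overlap = curr.mask & target
            if overlap:
                if overlap == target:
                    if curr.mask == target:
                        return curr  # first node whose leaf set is exactly the target
                    last_match = curr
                    source = iter(curr.children)
                else:
                    return last_match
            curr = next(source)
    except StopIteration:
        return last_match


LEAF_BITS = {name: 1 << i for i, name in enumerate(["sp1", "sp2", "sp3", "sp4", "sp5", "sp6"])}


def build(with_sp6: bool) -> ToyNode:
    """Build the example tree, optionally giving the inserted node a second child."""
    mrca = ToyNode("MRCA sp4&sp5", [ToyNode("sp4"), ToyNode("sp5")])
    midway = ToyNode("midway", [mrca])
    if with_sp6:
        midway.children.append(ToyNode("sp6"))
    left = ToyNode("", [ToyNode("", [ToyNode("sp1"), ToyNode("sp2")]), ToyNode("sp3")])
    root = ToyNode("root", [left, midway])
    encode(root, LEAF_BITS)
    return root


target = LEAF_BITS["sp4"] | LEAF_BITS["sp5"]
before = toy_mrca(build(with_sp6=False), target).label
after = toy_mrca(build(with_sp6=True), target).label
print(before)  # midway
print(after)   # MRCA sp4&sp5
```

With only one child, the inserted node has the same bitmask as the true MRCA and is reached first, so it is returned. Once it has a second child, its bitmask is a strict superset of the target and the search descends one level further.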

For example, if we add another child to the new node, it works fine:

node = insert_new_node_posterior(
    mrca,
    node_label="midway between root and MRCA sp4&sp5",
    edge_length=mrca.edge_length / 2
)
node.new_child(label="sp6", edge_length=1, taxon=Taxon(label="sp6"))
t1.update_taxon_namespace()
t1.update_bipartitions(
    suppress_unifurcations=False, suppress_storage=True
)
mrca = t1.mrca(taxon_labels=["sp4", "sp5"])
print(mrca.description())
Node object at 0x1be8c1e22b0<Node object at 0x1be8c1e22b0: 'MRCA sp4&sp5' (None)>
[Edge]
Edge object at 0x1be8c1e2340 (1917906199360, Length=0.5)
[Taxon]
None
[Parent]
Node object at 0x1be8ba8da30<Node object at 0x1be8ba8da30: 'midway between root and MRCA sp4&sp5' (<Taxon 0x1be8ba8d370 'None'>)>
[Children]
[0] Node object at 0x1be8c1e2910<Node object at 0x1be8c1e2910: 'None' (<Taxon 0x1be8c1e2640 'sp4'>)>
[1] Node object at 0x1be8c1e28e0<Node object at 0x1be8c1e28e0: 'None' (<Taxon 0x1be8c1e2e50 'sp5'>)>
