持久化文件树结构和节点引用



我有一个递归树数据结构,其中每个节点可以有零个或多个子节点,就像代码片段中一样。

public class TreeNode
{
public string Value { get; set; }
public List<TreeNode> Children { get; set; }
}

有一个根节点,所有其他节点都是它的子节点(或其子节点的子节点…(这是一个大集合(数百万个项目(,所以我不想在内存中加载。我正在考虑一种将其保存/持久保存到文件中的方法。对于每一项,我都要保存每个子项的值(字符串(和文件中的位置。

|value of node1|number of children|child1 location (int), child2, ...,childN|

|value of node2|number of childs|child locations of node 2|

不确定我是否解释得很好,但我希望Children的指针/引用是文件中的实际位置。我的问题是预先计算每个节点项开始的位置。我想这将是一种递归方法,为每个节点保存其值、子节点的数量和每个子节点的位置。问题是,对于每个节点,值的长度都是可变的,并且项的数量不是固定的。

public class TreeNode
{
public string Value { get; set; }
public void Test()
{
BinaryWriter binaryWriter = new BinaryWriter(File.OpenWrite("my file.dat"));
//node 1 has 2 childs
//node 1
binaryWriter.Write("node1 value");
binaryWriter.Write(2); //2 children
binaryWriter.Write(10); // location of child 1
binaryWriter.Write(20); // location of child 2
//child 1
binaryWriter.BaseStream.Position = 10;
binaryWriter.Write("child1 value");
binaryWriter.Write(5); //5 children
// write locations of childen of child 1
//child2
binaryWriter.BaseStream.Position = 20;
binaryWriter.Write("child2 value");
binaryWriter.Write(3); //3 children
// write locations of childen of child 2
//...
//continue for every 
}
}

文件的使用将类似

  1. 转到根位置0
  2. 读取值和任何子位置,即456和789
  3. 到达456号位置并阅读
  4. 转到789

这还没有经过测试,但类似的东西应该可以工作:

public long WriteNode(BinaryWriter writer, TreeNode node) 
{
// record the position where we will write this node
long initialPosition = writer.BaseStream.Position;

// write node content
writer.Write(node.Value);

// write number of children
writer.Write(node.Children.Count);

// write child position vector initialized with zeros 
long vectorPosition = writer.BaseStream.Position;
for (int i = 0; i < node.Children.Count; ++i) 
{
writer.Write(0L);
}
// write children, updating position vector as we go
for (int i = 0; i < node.Children.Count; ++i)
{
// write the child and get the position where it was written
var childNode = node.Children[i];
long childPosition = WriteNode(writer, childNode);

// compute the file position where we need to store the child's position
long childVectorPosition = vectorPosition + (i * sizeof(long));

// write the child's file position into the vector while retaining the current file position
long currentPosition = writer.BaseStream.Position;
writer.BaseStream.Seek(childVectorPosition, SeekOrigin.Begin); 
writer.Write(childPosition);
writer.BaseStream.Seek(currentPosition, SeekOrigin.Begin); 
}
// return file position where this node was written
return initialPosition;
}

递归地编写节点,将子位置初始化为零,并在必要时回溯以使用实际值更新子位置。

最新更新