How to shelve a space-separated values file

I have a huge file containing space-separated values like this:

key1 0.553 1.45 0.666
key2 2.66 1.77 0.001
...

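I would like to shelve this file as a dictionary using shelve (or any other module you consider more suitable). That way I could query by the first column as the key and get the remaining values back as a list, i.e.: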

In [1]: with shelve.open("file") as db:
   ...:    print(db["key2"])
   ...:
Out [1]: [2.66, 1.77, 0.001]

Thank you very much for your support.

Comment: ... is there a possible way to efficiently retrieve items near the end of the file?

Add an offset parameter.
If you implement the logic in class DictFloatReader, this can be automated.

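def __getitem__(self, item):
    # Accept either db[key] or db[key, offset].
    offset = 0
    if isinstance(item, tuple):
        item, offset = item
    # Start scanning at the given byte offset instead of the file start.
    # Note: an arbitrary offset may land in the middle of a line.
    self.fh.seek(offset)
    # ... then scan the lines as in the full class below ...

# Usage
print(db["key2", 300*1024])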
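The offset idea above can be sketched as a complete class. This is only a sketch under assumptions: OffsetFloatReader is a hypothetical name, the file is opened in binary mode so that any byte offset is a valid seek target, and a non-zero offset is handled by stepping back one byte and finishing that line, so the scan always starts on a line boundary.

```python
import os
import tempfile


class OffsetFloatReader(object):
    """Hypothetical variant of DictFloatReader that accepts db[key, offset]."""

    def __init__(self, fpath):
        self.fpath = fpath
        self.fh = None

    def __enter__(self):
        # Binary mode: arbitrary byte offsets can be passed to seek().
        self.fh = open(self.fpath, "rb")
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.fh.close()

    def __getitem__(self, item):
        offset = 0
        if isinstance(item, tuple):
            item, offset = item
        if offset > 0:
            # Step back one byte and finish that line, so the scan starts
            # on a line boundary even if offset points into mid-line.
            self.fh.seek(offset - 1)
            self.fh.readline()
        else:
            self.fh.seek(0)
        key = item.encode("ascii")
        for line in self.fh:
            fields = line.split()
            if fields and fields[0] == key:
                return [float(f) for f in fields[1:]]
        raise KeyError(item)


# Demo on a tiny throwaway file (made-up data from the question).
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"key1 0.553 1.45 0.666\nkey2 2.66 1.77 0.001\n")
with OffsetFloatReader(tmp.name) as db:
    from_start = db["key2"]
    from_offset = db["key2", 10]  # byte 10 falls inside the first line
os.remove(tmp.name)
print(from_start, from_offset)
```

A key that sits before the given offset is not found, so the caller has to know roughly where in the file the key lives.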

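If your keys are in sorted order, e.g. 1, 2, 3, 4 or A, B, C, you can use a btree search. This gives nearly the same search time for every key.
Switch to a real database file format that offers indexing and random access.
Hold it in memory ("holding it in memory is not an option").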
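For the sorted-keys suggestion, a real B-tree is not strictly needed: a binary search over byte offsets gives similar, roughly logarithmic lookups. The following is a minimal sketch, not a definitive implementation. It assumes the lines are sorted by key in byte order (so "key10" sorts before "key2"), that there are no blank lines, and that the file is opened in binary mode; lookup_sorted is a hypothetical helper name.

```python
import os
import tempfile


def lookup_sorted(fh, key):
    """Binary-search a key-sorted, space-separated file opened in 'rb' mode."""
    target = key.encode("ascii")
    fh.seek(0, os.SEEK_END)
    lo, hi = 0, fh.tell()
    # Invariant: the wanted line, if present, starts at a byte in [lo, hi).
    while lo < hi:
        mid = (lo + hi) // 2
        # Move to the first line that starts at or after byte mid.
        if mid > 0:
            fh.seek(mid - 1)
            fh.readline()
        else:
            fh.seek(0)
        start = fh.tell()
        line = fh.readline()
        if not line or start >= hi:
            # No line starts in [mid, hi): keep searching the left part.
            hi = mid
            continue
        fields = line.split()
        if fields[0] < target:
            lo = fh.tell()  # the wanted line starts after this one
        elif fields[0] > target:
            hi = mid        # the wanted line starts before mid
        else:
            return [float(f) for f in fields[1:]]
    raise KeyError(key)


# Demo on a tiny sorted throwaway file (made-up data).
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"apple 1.0 2.0\nbanana 3.5\ncherry 0.25 4\ndate 7\n")
with open(tmp.name, "rb") as fh:
    found = [lookup_sorted(fh, k) for k in ("apple", "banana", "cherry", "date")]
os.remove(tmp.name)
print(found)
```

Each lookup touches only a handful of lines regardless of where the key sits, which addresses the concern about items near the end of the file.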

This will do what you want, for instance:

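class DictFloatReader(object):
    """Read-only, dict-like lookup over a space-separated text file."""
    def __init__(self, fpath):
        self.fpath = fpath
        self.fh = None
    def __enter__(self):
        self.fh = open(self.fpath)
        return self
    def __exit__(self, exc_type, exc_val, exc_tb):
        self.fh.close()
    def __getitem__(self, item):
        # Linear scan from the start of the file on every lookup.
        self.fh.seek(0)
        for line in self.fh:
            fields = line.split()
            # Compare the whole first column so "key2" does not match "key20".
            if fields and fields[0] == item:
                return [float(f) for f in fields[1:]]
        # Behave like a dict when the key is missing.
        raise KeyError(item)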

Usage

with DictFloatReader('file') as db:
    print(db["key2"])
    print(db["key1"])
    print(db["key2"])

Output
[2.66, 1.77, 0.001]
[0.553, 1.45, 0.666]
[2.66, 1.77, 0.001]

Tested with Python: 3.4.2 and 2.7.9
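For comparison, the shelve route that the question asks about could look roughly like the sketch below. It is an untested-against-the-original-file illustration for Python 3.4+ (where shelve.open works as a context manager), and it assumes a one-time conversion pass is acceptable; build_shelf and the file names are hypothetical. The text file is streamed line by line, so it never has to fit in memory.

```python
import os
import shelve
import tempfile


def build_shelf(txt_path, shelf_path):
    """One-time conversion: stream the text file into a shelve database."""
    with shelve.open(shelf_path) as db:
        with open(txt_path) as fh:
            for line in fh:
                fields = line.split()
                if fields:
                    # First column is the key, the rest become a float list.
                    db[fields[0]] = [float(f) for f in fields[1:]]


# Demo in a throwaway directory (made-up data from the question).
with tempfile.TemporaryDirectory() as tmpdir:
    txt_path = os.path.join(tmpdir, "values.txt")
    shelf_path = os.path.join(tmpdir, "values_shelf")
    with open(txt_path, "w") as fh:
        fh.write("key1 0.553 1.45 0.666\nkey2 2.66 1.77 0.001\n")
    build_shelf(txt_path, shelf_path)
    # Later lookups go through the dbm index instead of scanning the file.
    with shelve.open(shelf_path, flag="r") as db:
        result = db["key2"]
print(result)  # [2.66, 1.77, 0.001]
```

The trade-off is extra disk space and the conversion time up front, in exchange for lookups that do not depend on where the key was in the original file.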
