Finding the full name of the only dataset in an HDF5 file (MATLAB)



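I have a folder full of HDF5 files (extension .h5) that I want to open in MATLAB. Each file contains only one dataset: a matrix. I can loop over the files, but to read them with h5read I need to know the dataset name. I know how to find it manually for each file using h5info, but I need a fast way to do this for hundreds of files. Unfortunately, the files were created by different people in inconsistent ways (one of them has the matrix nested deep inside "groups", for example).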

My question is: given file.h5 in MATLAB, how can I do something like

A = h5read('file.h5',...) 

so that A equals the matrix in file.h5 (the only dataset in the file)?

This seems like a simple problem, but I have not found a way to do it.

I found a MATLAB script that someone wrote to do this.

function data=h5load(filename, path)
%
% data = H5LOAD(filename)
% data = H5LOAD(filename, path_in_file)
%
% Load data in a HDF5 file to a Matlab structure.
%
% Parameters
% ----------
%
% filename
%     Name of the file to load data from
% path_in_file : optional
%     Path to the part of the HDF5 file to load
%
% Author: Pauli Virtanen <pav@iki.fi>
% This script is in the Public Domain. No warranty.
if nargin > 1
  path_parts = regexp(path, '/', 'split');
else
  path = '';
  path_parts = [];
end
loc = H5F.open(filename, 'H5F_ACC_RDONLY', 'H5P_DEFAULT');
try
  data = load_one(loc, path_parts, path);
  H5F.close(loc);
catch exc
  H5F.close(loc);
  rethrow(exc);
end

function data=load_one(loc, path_parts, full_path)
% Load a record recursively.
while ~isempty(path_parts) && strcmp(path_parts{1}, '')
  path_parts = path_parts(2:end);
end
data = struct();
num_objs = H5G.get_num_objs(loc);
% 
% Load groups and datasets
%
for j_item=0:num_objs-1,
  objtype = H5G.get_objtype_by_idx(loc, j_item);
  objname = H5G.get_objname_by_idx(loc, j_item);
  
  if objtype == 1
    % Group
    name = regexprep(objname, '.*/', '');
  
    if isempty(path_parts) || strcmp(path_parts{1}, name)
      if ~isempty(regexp(name,'^[a-zA-Z].*'))
        group_loc = H5G.open(loc, name);
        try
          sub_data = load_one(group_loc, path_parts(2:end), full_path);
          H5G.close(group_loc);
        catch exc
          H5G.close(group_loc);
          rethrow(exc);
        end
        if isempty(path_parts)
          data = setfield(data, name, sub_data);
        else
          data = sub_data;
          return
        end
      end
    end
   
  elseif objtype == 2
    % Dataset
    name = regexprep(objname, '.*/', '');
  
    if isempty(path_parts) || strcmp(path_parts{1}, name)
      if ~isempty(regexp(name,'^[a-zA-Z].*'))
        dataset_loc = H5D.open(loc, name);
        try
          sub_data = H5D.read(dataset_loc, ...
              'H5ML_DEFAULT', 'H5S_ALL','H5S_ALL','H5P_DEFAULT');
          H5D.close(dataset_loc);
        catch exc
          H5D.close(dataset_loc);
          rethrow(exc);
        end

        sub_data = fix_data(sub_data);

        if isempty(path_parts)
          data = setfield(data, name, sub_data);
        else
          data = sub_data;
          return
        end
      end
    end
  end
end
% Check that we managed to load something if path walking is in progress
if ~isempty(path_parts)
  error('Path "%s" not found in the HDF5 file', full_path);
end

function data=fix_data(data)
% Fix some common types of data to more friendly form.
if isstruct(data)
  fields = fieldnames(data);
  if length(fields) == 2 && strcmp(fields{1}, 'r') && strcmp(fields{2}, 'i')
    if isnumeric(data.r) && isnumeric(data.i)
      data = data.r + 1j*data.i;
    end
  end
end
if isnumeric(data) && ndims(data) > 1
  % permute dimensions
  data = permute(data, fliplr(1:ndims(data)));
end
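
Since h5load returns a nested structure rather than the matrix itself, the single dataset still has to be unwrapped. The following is a minimal sketch of how that might look for a whole folder. It assumes that each file really holds exactly one numeric dataset and that the dataset name and every enclosing group name start with a letter, because h5load skips any other names. The folder name 'data' and the variable names are hypothetical.

```matlab
% Sketch: load every .h5 file in a folder with h5load and unwrap the
% nested struct until the single dataset is reached.
folder = 'data';                               % hypothetical folder name
files = dir(fullfile(folder, '*.h5'));
matrices = cell(numel(files), 1);
for k = 1:numel(files)
    s = h5load(fullfile(folder, files(k).name));
    % Each struct level corresponds to one group level in the file.
    while isstruct(s)
        names = fieldnames(s);
        if numel(names) ~= 1
            error('Expected exactly one entry, found %d in %s', ...
                  numel(names), files(k).name);
        end
        s = s.(names{1});
    end
    matrices{k} = s;
end
```

Note that fix_data in h5load reverses the dimension order of numeric data, so a 2-D matrix obtained this way is the transpose of what H5D.read returned.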

The original can be found here: http://scipy-cookbook.readthedocs.io/items/hdf5_in_Matlab.html
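
If only the dataset name is needed, another possible approach is to walk the structure returned by h5info and then call h5read with the path that was found. The sketch below is not part of the script above. The function names read_only_dataset and collect_dataset_paths are hypothetical, and it assumes that group entries in the h5info output carry their full path in Name while dataset entries carry only their own name.

```matlab
function A = read_only_dataset(filename)
% READ_ONLY_DATASET  Read the single dataset of an HDF5 file (hypothetical helper).
info = h5info(filename);                 % describes the root group '/'
paths = collect_dataset_paths(info);
if numel(paths) ~= 1
    error('Expected exactly one dataset in %s, found %d', ...
          filename, numel(paths));
end
A = h5read(filename, paths{1});
end

function paths = collect_dataset_paths(group)
% Return the full paths of all datasets at or below GROUP.
paths = {};
for k = 1:numel(group.Datasets)
    if strcmp(group.Name, '/')
        % Avoid a double slash for datasets directly under the root.
        paths{end+1} = ['/' group.Datasets(k).Name]; %#ok<AGROW>
    else
        paths{end+1} = [group.Name '/' group.Datasets(k).Name]; %#ok<AGROW>
    end
end
% Recurse into subgroups, however deeply they are nested.
for k = 1:numel(group.Groups)
    paths = [paths, collect_dataset_paths(group.Groups(k))]; %#ok<AGROW>
end
end
```

With this saved as read_only_dataset.m, the call from the question would become A = read_only_dataset('file.h5'). Unlike h5load, this does not permute dimensions or skip names that start with a non-letter character.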
