How to save datasets to the correct HDF5 group



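I am running a script that walks a folder and creates a set of groups that replicate the directory structure inside an HDF5 file. I then go through the files and add the data from each file to an HDF5 dataset. I want each dataset to be saved in the correct group, but I am not sure how to do that.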

# Development script: save files in groups
import os
import h5py
import numpy as np
import pandas as pd
from numpy import savetxt

# Create HDF5 file name
TestFilename = 'N:/TestingPyhonHDF5/progress/automation/GroupTesting.h5'
# Set target directory to extract data
TargetFolder = 'N:MeasurementsT2+Rx-BB001'
# File extensions
bin = ('.bin')
csv = ('.CSV')
tmp = ('.tmp')
head = ('.head')
ext = ('.bin', '.head', '.tmp', '.CSV')

# Create HDF5 structure
with h5py.File(TestFilename, 'w') as tf:
    for root, dirs, _ in os.walk(TargetFolder, topdown=True):
        #print(f'ROOT: {root}')
        # For Windows, modify root: remove drive letter and replace backslashes
        grp_name = root[2:].replace('\\', '/')
        #print(f'grp_name: {grp_name}\n')
        tf.create_group(grp_name)

# Open HDF5 file
with h5py.File(TestFilename, 'a') as tfile:
    # Iterate files to send to HDF5 file
    for path, dirc, files in os.walk(TargetFolder):
        for file in files:
            if file.endswith(bin):
                # Read the binary file as unsigned bytes
                filePath = os.path.join(path, file)
                dt = np.dtype('B')
                data = np.fromfile(filePath, dtype=dt)
                df = pd.DataFrame(data)
                # Save as csv
                savetxt('TempData.csv', df, delimiter=',')
                # Read bin to HDF5
                dfBIN = pd.read_csv('TempData.csv')
                tfile.create_dataset(grp_name/file, data=dfBIN)  # put data in hdf file
                # add attrs
                os.remove("TempData.csv")
            else:
                continue

The current code raises this error:

TypeError                                 Traceback (most recent call last)
Cell In [51], line 39
     37 #Read bin to HDF5
     38 dfBIN = pd.read_csv('TempData.csv')
---> 39 tfile.create_dataset(grp_name/file, data=dfBIN) #put data in hdf file
     40 #add attrs
     41 os.remove("TempData.csv")

TypeError: unsupported operand type(s) for /: 'str' and 'str'

The / operator only joins pathlib.Path objects, and you have strings.

Just do

f"{grp_name}/{file}"

or

posixpath.join(grp_name, file)

(posixpath ensures forward slashes no matter which platform you are on.)
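As a minimal sketch of the difference, the snippet below reproduces the TypeError and then builds the same dataset path both ways. The group path and file name are placeholder values chosen for illustration, not taken from a real run.

```python
import posixpath

grp_name = "Measurements/T2+Rx-BB001"  # placeholder group path (assumption)
file = "example.bin"                   # placeholder file name (assumption)

# Dividing one str by another is not defined; this is the error from the question.
try:
    grp_name / file
    raised = False
except TypeError:
    raised = True

# Both alternatives produce a forward-slash path on every platform.
via_fstring = f"{grp_name}/{file}"
via_posixpath = posixpath.join(grp_name, file)

print(raised, via_fstring, via_posixpath)
```

os.path.join would not be a safe substitute here, because on Windows it joins with a backslash, which HDF5 does not treat as a group separator.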

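Naturally, you also need to determine grp_name the same way inside the second loop:

grp_name = path[2:].replace('\\', '/')

Otherwise you will be using the last value that grp_name had in the previous loop, so every dataset would end up under the same group.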
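If the slicing and replacing feels fragile, one alternative (not what the code above does) is to let pathlib parse the Windows path. The helper name group_name_for is hypothetical, and this is only a sketch: it drops the drive or root anchor and joins the remaining components with forward slashes.

```python
from pathlib import PureWindowsPath


def group_name_for(path):
    """Turn a Windows directory path into an HDF5-style group path.

    Hypothetical helper for illustration; not part of h5py.
    """
    p = PureWindowsPath(path)
    # When the path has a drive or root, parts[0] is that anchor (e.g. 'N:\\'); skip it.
    parts = p.parts[1:] if p.anchor else p.parts
    return "/".join(parts)


print(group_name_for(r"N:\Measurements\T2+Rx-BB001"))
```

Unlike path[2:].replace(...), this returns a name without a leading slash, which refers to the same group when used on the file object itself. For a bare drive root it returns an empty string, so that case would need separate handling.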

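All in all, you probably want just one loop. Since I don't know whether h5py ignores an attempt to (re)create a group that already exists, I added a set that tracks the paths already created. The intermediate CSV file also seems unnecessary.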
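On the open question about re-creating groups: in h5py, create_group raises a ValueError when the name already exists, while require_group returns the existing group or creates it if it is missing. The sketch below checks both on an in-memory file, so nothing is written to disk, and stores a NumPy array directly without a CSV round trip. The group and dataset names are placeholders.

```python
import h5py
import numpy as np

# driver="core" with backing_store=False keeps the file in memory only.
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    f.create_group("Measurements/T2")
    try:
        f.create_group("Measurements/T2")  # second attempt on the same name
        duplicate_rejected = False
    except ValueError:
        duplicate_rejected = True

    # require_group is safe to call repeatedly for the same name.
    grp = f.require_group("Measurements/T2")
    group_name = grp.name

    # Stand-in for np.fromfile(filePath, dtype="B"); stored as-is.
    raw = np.arange(4, dtype="B")
    dset = grp.create_dataset("example.bin", data=raw)
    stored_name = dset.name
    stored = dset[...].tolist()

print(duplicate_rejected, group_name, stored_name, stored)
```

With require_group, the tracking set becomes optional, although keeping it does no harm.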

import os
import posixpath

import pandas as pd
import numpy as np
import h5py

TestFilename = "N:/TestingPyhonHDF5/progress/automation/GroupTesting.h5"
TargetFolder = "N:MeasurementsT2+Rx-BB001"

# Group paths that have already been created in the HDF5 file
groups_created = set()

with h5py.File(TestFilename, "a") as tfile:
    for path, dirc, files in os.walk(TargetFolder):
        for file in files:
            if file.endswith(".bin"):
                # Remove the drive letter and convert backslashes to forward slashes
                grp_name = path[2:].replace("\\", "/")
                if grp_name not in groups_created:
                    tfile.create_group(grp_name)
                    groups_created.add(grp_name)
                # Read the binary file as unsigned bytes
                filePath = os.path.join(path, file)
                dt = np.dtype("B")
                data = np.fromfile(filePath, dtype=dt)
                df = pd.DataFrame(data)
                # Store the data under <group>/<file name>
                tfile.create_dataset(posixpath.join(grp_name, file), data=df)
