Python, NetCDF4: creating an unlimited time dimension for a NetCDF file



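Can someone explain how to create an unlimited time dimension for a NetCDF file? I tried data.createDimension('t', None), but when I look at t it is a NumPy array. If possible, please also explain how to assign values to it. I am using Python 2.7.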

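Edited question

I have several NetCDF files (3 dimensions each), and for each file I have to compute an array (also 3 dimensions). The time step between files is 3 hours. I now have to create a new NetCDF file that holds the computed array for every time step. My problem is that I do not know how to access the time axis so that I can assign each computed array to its time step.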

Edited question

I want to assign a date to the time axis. To create the dates I use datetime like this:

t_start = dt.datetime(1900,1,1)
t_delta = dt.timedelta(hours=3)

The time between two time steps is 3 hours. While looping over the files, the date of each time step is computed as follows:

t_mom = t_start + i*t_delta
t_mom_str = t_mom.strftime("%d %B %Y %H  %M  %S")
t_mom_var = netCDF4.stringtochar(np.array([t_mom_str]))

I created a variable like this:

time = data.createVariable('time', np.float32, ('time'))

Now I want to assign the date to the time variable:

time[i] = t_mom_var[:]

But it does not work this way. Thanks for your help.

Using createDimension with None should work:

import netCDF4 as nc4
import numpy as np
f = nc4.Dataset('test.nc', 'w')
# Create the unlimited time dimension:
dim_t = f.createDimension('time', None)
# Create a variable `time` using the unlimited dimension:
var_t = f.createVariable('time', 'int', ('time',))
# Add some values to the variable:
var_t[:] = np.arange(10)
f.close()

This results in (ncdump -h test.nc):

netcdf test {
dimensions:
    time = UNLIMITED ; // (10 currently)
variables:
    int64 time(time) ;
}

For the updated question, here is a minimal working example of how to merge several files into one by adding a new unlimited dimension:

import netCDF4 as nc4
import numpy as np
# Let's quickly create 3 NetCDF files with 3 dimensions
for i in range(3):
    f = nc4.Dataset('test_{0:1d}.nc'.format(i), 'w')
    # Create the 3 dimensions
    dim_x = f.createDimension('x', 2)
    dim_y = f.createDimension('y', 3)
    dim_z = f.createDimension('z', 4)
    var_t = f.createVariable('temperature', 'double', ('x','y','z'))
    # Add some dummy data
    var_t[:,:,:] = np.random.random(2*3*4).reshape(2,3,4)
    f.close()
# Now the actual merging:
# Get the dimensions (sizes) from the first file:
f_in = nc4.Dataset('test_0.nc', 'r')
dim_size_x = f_in.dimensions['x'].size
dim_size_y = f_in.dimensions['y'].size
dim_size_z = f_in.dimensions['z'].size
dim_size_t = 3
f_in.close()
# Create new NetCDF file:
f_out = nc4.Dataset('test_merged.nc', 'w')
# Add the dimensions, including an unlimited time dimension:
dim_x = f_out.createDimension('x', dim_size_x)
dim_y = f_out.createDimension('y', dim_size_y)
dim_z = f_out.createDimension('z', dim_size_z)
dim_t = f_out.createDimension('time', None)
# Create new variable with 4 dimensions
var_t = f_out.createVariable('temperature', 'double', ('time','x','y','z'))
# Add the data
for i in range(3):
    f_in = nc4.Dataset('test_{0:1d}.nc'.format(i), 'r')
    var_t[i,:,:,:] = f_in.variables['temperature'][:,:,:]
    f_in.close()
f_out.close()

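@Bart is right, but that does not answer the second part of your question. You need to create a time variable that is dimensioned by the time dimension.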

import numpy as np
import netCDF4 as nc4
import dateutil.parser
# f is assumed to be an open, writable Dataset that already has the 'time' dimension
# create a time variable, using the time dimension.
var_t = f.createVariable('time', 'int32', ('time',))
var_t.setncattr('units', 'seconds since 1970-01-01 00:00:00 UTC')
# create a start time
start = dateutil.parser.parse("2017-05-01T00:00")
ntime = nc4.date2num(start, var_t.units)
# add some hours
times = [ntime, ntime + 3600, ntime + 7200]
# convert the list to a 1-D numpy array before assigning
times = np.array(times)
var_t[:] = times

You can read the NetCDF files with xarray's xr.open_dataset():

# Get all the files as a list and open them as Datasets
import glob
import numpy as np
import xarray as xr
folder = '<folder directory with files>'
ncfiles = glob.glob(folder + '*.nc')
ds_l = [xr.open_dataset(i) for i in ncfiles]
# To make this a standalone example, I'll just create a list of Datasets too
arr = np.random.rand(50, 30)  # dummy (lon, lat) data
ds = xr.Dataset(data_vars={'data': (['lon', 'lat'], arr)},
                coords={'lat': np.arange(30), 'lon': np.arange(50)})
# Use independent copies so that each one can get its own time coordinate
ds_l = [ds.copy() for _ in range(5)]

Now you can add the dates as a new coordinate
(here I use pandas' pd.date_range() to build the list of dates):

# List of dates
import datetime
import pandas as pd
start = datetime.datetime(1900, 1, 1)
end = datetime.datetime(1900, 1, 5)
dates = pd.date_range(start, end, freq=datetime.timedelta(hours=3))
# Now add these dates to the datasets
for n, ds in enumerate(ds_l):
    ds.coords['time'] = dates[n]

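Then you can concatenate along the time axis with xr.concat() and save the result as NetCDF with the Dataset's to_netcdf() method (note that the time dimension is set to unlimited):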

# Then concatenate them:
ds = xr.concat(ds_l, dim='time')
ds.to_netcdf('mynewfile.nc', unlimited_dims=['time'])
