How can I use NetCDF data for a specific region to map the first day of the rainy season under specific conditions?



I have daily precipitation data in NetCDF with dimensions time: 153 (I cropped the NC file, so its first date is August 1), longitude: 401, latitude: 121.

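I want to calculate and map the first day of the rainy season over certain regions using this condition: the onset date is the first day of 5 consecutive days that receive at least 40 mm of rain, where every 10 consecutive days within the 30 days after the onset also receive at least 5 mm. Counting starts from August 1 onward.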

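I tried to map this spatially, but I think even one year of data will take a long time, and I have to map 10 years. So I am looking for a more convenient way to do this. For now I only have code for a single point (what I want is the onset date mapped spatially over a specific region), shown below: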

import pandas as pd
import xarray as xr
import numpy as np

file = 'CMA.nc'
data = xr.open_dataset(file)
precip = data['tp']

# Single point
point = precip.sel(lon=106.11, lat=-6.11, method='nearest')
point.plot()

def wet_onset_date(data):
    array = data.values
    count1 = 0
    count2 = 5
    wet_onset = []
    onset_date = []
    # 5-day rainfall totals, one per candidate start day
    while count2 <= array.size:
        wet_onset.append(array[count1:count2].sum())
        onset_date.append(count1)
        count1 += 1
        count2 += 1

    # dry spell: look at the 30 days that follow each 5-day window
    count3 = 5
    count4 = 5 + 30
    thirty = []
    dry_spell = []
    while count4 <= array.size:
        thirty.append(array[count3:count4])

        for each_30 in thirty:
            count5 = 0
            count6 = 11
            weekly_sum = []
            while count6 <= thirty[0].size:
                weekly_sum.append(each_30[count5:count6].sum())
                count5 += 1
                count6 += 1
            if np.min(weekly_sum) <= 5:
                dry_spell.append(True)
            else:
                dry_spell.append(False)

        count3 += 1
        count4 += 1

    wet_onset_final = wet_onset[:len(dry_spell)]
    onset_final_date = onset_date[:len(dry_spell)]
    # return the first date that is wet enough and not followed by a dry spell
    for rain, not_dry, date in zip(wet_onset_final, dry_spell, onset_final_date):
        if (rain >= 40) and (not_dry == False):
            target_date = data.isel(time=date).time.values
            return target_date

on = wet_onset_date(point)
print(on)
>> 2017-11-27T00:00:00.000000000

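Let's start with a minimal reproducible example (MRE) for this question. You need a dataset containing a precipitation array with a daily time series covering at least one full year, plus a few other dimensions: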

import xarray as xr, pandas as pd, numpy as np

x = np.arange(-110.5, 100)
y = np.arange(30.5, 40)
time = pd.date_range('2020-01-01', '2022-12-31', freq='D')

# generate random precip-ish data
random_lognorm = np.exp(np.random.random(size=(len(time), len(y), len(x)))) * 200

# random seasonal-ish mask
raining = (
    (time.dayofyear.values.reshape(-1, 1, 1)
     * np.random.random(size=random_lognorm.shape)) > 40
)

# finally, precip is the rain array * the "is raining" array
pr = random_lognorm * raining

# now we can construct an xarray Dataset with this data to form our MRE
ds = xr.Dataset(
    {'pr': (('time', 'lat', 'lon'), pr)},
    coords={'lat': y, 'lon': x, 'time': time},
)

It looks like this:

In [7]: ds
Out[7]:
<xarray.Dataset>
Dimensions:  (time: 1096, lat: 10, lon: 211)
Coordinates:
  * lat      (lat) float64 30.5 31.5 32.5 33.5 34.5 35.5 36.5 37.5 38.5 39.5
  * lon      (lon) float64 -110.5 -109.5 -108.5 -107.5 ... 96.5 97.5 98.5 99.5
  * time     (time) datetime64[ns] 2020-01-01 2020-01-02 ... 2022-12-31
Data variables:
    pr       (time, lat, lon) float64 0.0 0.0 0.0 0.0 ... 413.6 308.0 386.9

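As with performance in numpy and pandas, to work efficiently with large arrays in xarray objects it is best to figure out how to use array operations rather than looping over elements. This is definitely true for windowed/rolling operations. See the rolling window operations section of the xarray user guide; it is a helpful introduction to the topic.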
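To make the loop-versus-array point concrete, here is a minimal sketch comparing a hand-written 5-day sum with a single rolling call. The tiny series and its values are made up for illustration and are not from the question's data.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Small synthetic daily series (values are invented for illustration).
time = pd.date_range("2020-08-01", periods=10, freq="D")
pr = xr.DataArray(
    [0.0, 2.0, 10.0, 15.0, 8.0, 12.0, 0.0, 0.0, 1.0, 3.0],
    coords={"time": time},
    dims="time",
    name="pr",
)

# Loop version: sum each 5-day slice by hand.
loop_sums = [float(pr.values[i:i + 5].sum()) for i in range(pr.size - 4)]

# Array version: one rolling call. Each result is labelled with the LAST
# day of its window, so the first 4 entries (incomplete windows) are NaN.
rolling_sums = pr.rolling(time=5).sum()

# Past the incomplete windows, both approaches give the same totals.
same = np.allclose(rolling_sums.values[4:], loop_sums)
```

The rolling version works unchanged on a (time, lat, lon) array, which is what makes it attractive for gridded data.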

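I don't fully understand all the conditions you are trying to apply here, but I can put something together in a quick demo that will hopefully help.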

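A really useful feature in xarray is the construct method of the rolling classes. This method, on DataArrayRolling and DatasetRolling objects, returns a restructured DataArray or Dataset (respectively) with a rolling window into the original array. So below I specify the rolling window time=30. The construct method gives a restructured "view" into the array. This is a memory-efficient way of reshaping the data that provides a new dimension (which I have named "window") along which you can work with the rolled data.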

In [8]: rolled = ds.pr.rolling(time=30, min_periods=30).construct('window')
In [9]: rolled
Out[9]:
<xarray.DataArray 'pr' (time: 1096, lat: 10, lon: 211, window: 30)>
array([[[[         nan,          nan,          nan, ...,          nan,
nan,   0.        ],
[         nan,          nan,          nan, ...,          nan,
nan,   0.        ],
[         nan,          nan,          nan, ...,          nan,
nan,   0.        ],
...
...,
[443.96641513, 524.82969347, 419.95639311, ...,   0.        ,
500.87393858, 413.55965161],
[352.36603332, 427.1653476 , 236.46898157, ..., 469.71452213,
235.31558598, 308.02273055],
[396.360887  , 520.49089188, 242.73958665, ..., 234.32972887,
252.48534392, 386.93237596]]]])
Coordinates:
  * lat      (lat) float64 30.5 31.5 32.5 33.5 34.5 35.5 36.5 37.5 38.5 39.5
  * lon      (lon) float64 -110.5 -109.5 -108.5 -107.5 ... 96.5 97.5 98.5 99.5
  * time     (time) datetime64[ns] 2020-01-01 2020-01-02 ... 2022-12-31
Dimensions without coordinates: window
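The layout of the new dimension is easier to see on a tiny array. The sketch below uses a made-up 5-element series and a window of 3 instead of 30; the names da and win are only for illustration.

```python
import numpy as np
import xarray as xr

# Five consecutive "days" with values 0..4 (illustrative data only).
da = xr.DataArray(np.arange(5.0), dims="time")

# Each time step gets its own row of length 3 along the new "window" dimension.
win = da.rolling(time=3).construct("window")

# The LAST window position holds the current time step, and earlier positions
# hold the preceding steps. Incomplete windows are padded with NaN at the front.
expected = np.array(
    [
        [np.nan, np.nan, 0.0],
        [np.nan, 0.0, 1.0],
        [0.0, 1.0, 2.0],
        [1.0, 2.0, 3.0],
        [2.0, 3.0, 4.0],
    ]
)
matches = np.allclose(win.values, expected, equal_nan=True)
```

So window position 0 is the earliest day covered by the window and the last position is the day the row is labelled with.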

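We can treat this window dimension as if it were a group holding each 30-day period in our dataset. Now we can define an arbitrarily complex function that reduces the window dimension: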

def complex_condition(rolled):
    # first 5 days are > 40mm
    first_5d_over_40mm = (rolled.isel(window=slice(None, 5)) > 40).all(dim='window')
    # all 30 days are > 5 mm
    all_30d_over_5mm = (rolled > 5).all(dim='window')
    # result is True when both conditions are met
    return first_5d_over_40mm & all_30d_over_5mm

This can be applied directly to the rolled data:

In [11]: meets_criteria = complex_condition(rolled)
In [12]: meets_criteria
Out[12]:
<xarray.DataArray 'pr' (time: 1096, lat: 10, lon: 211)>
array([[[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False],
...
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False]]])
Coordinates:
  * lat      (lat) float64 30.5 31.5 32.5 33.5 34.5 35.5 36.5 37.5 38.5 39.5
  * lon      (lon) float64 -110.5 -109.5 -108.5 -107.5 ... 96.5 97.5 98.5 99.5
  * time     (time) datetime64[ns] 2020-01-01 2020-01-02 ... 2022-12-31

Now we can use idxmax to find the first index where these conditions are met (making sure to mask out any cells where the condition is never met):

In [13]: meets_criteria.idxmax(dim='time').where(meets_criteria.any(dim='time'))
Out[13]:
<xarray.DataArray 'time' (lat: 10, lon: 211)>
array([[                          'NaT',                           'NaT',
'NaT', ...,
'NaT',                           'NaT',
'2022-12-02T00:00:00.000000000'],
['2020-12-14T00:00:00.000000000',                           'NaT',
'2020-12-20T00:00:00.000000000', ...,
'NaT', '2021-09-22T00:00:00.000000000',
'2021-10-20T00:00:00.000000000'],
['2021-12-24T00:00:00.000000000',                           'NaT',
'2021-12-26T00:00:00.000000000', ...,
'NaT', '2022-12-18T00:00:00.000000000',
'NaT'],
...,
['2021-08-21T00:00:00.000000000',                           'NaT',
'NaT', ...,
'2021-08-06T00:00:00.000000000', '2020-11-07T00:00:00.000000000',
'2022-10-04T00:00:00.000000000'],
[                          'NaT', '2020-12-11T00:00:00.000000000',
'NaT', ...,
'2020-12-18T00:00:00.000000000', '2022-10-31T00:00:00.000000000',
'NaT'],
['2021-09-28T00:00:00.000000000', '2020-11-18T00:00:00.000000000',
'NaT', ...,
'2021-10-14T00:00:00.000000000',                           'NaT',
'NaT']], dtype='datetime64[ns]')
Coordinates:
  * lat      (lat) float64 30.5 31.5 32.5 33.5 34.5 35.5 36.5 37.5 38.5 39.5
  * lon      (lon) float64 -110.5 -109.5 -108.5 -107.5 ... 96.5 97.5 98.5 99.5
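The reason for the extra where(...any(...)) step is that idxmax on an all-False series still returns a label, namely the first one. A small sketch with two made-up cells (the "cell" dimension and the flag values are hypothetical) shows the difference:

```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("2020-08-01", periods=3, freq="D")

# Cell 0 never meets the criteria; cell 1 first meets them on the second day.
flags = xr.DataArray(
    np.array([[False, False], [False, True], [False, True]]),
    coords={"time": time},
    dims=("time", "cell"),
)

# Without masking, the all-False cell is reported as the first date.
raw = flags.idxmax(dim="time")

# With masking, cells that never meet the criteria become NaT.
masked = raw.where(flags.any(dim="time"))
```

Here raw gives 2020-08-01 for cell 0 even though nothing happened there, while masked gives NaT for that cell and 2020-08-02 for cell 1.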

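One thing to note is that rolling windows return the index at the end of the window by default. If you want the start of the window, you can reindex the result of meets_criteria with da.shift.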
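As a rough sketch of that relabelling, assuming a window of length 3 rather than 30 and a made-up flag series: shifting by -(window - 1) moves each flag from the last day of its window to the first day.

```python
import numpy as np
import pandas as pd
import xarray as xr

window = 3  # hypothetical small window; the demo above uses 30
time = pd.date_range("2020-08-01", periods=6, freq="D")

# True on Aug 3 means: the window covering Aug 1-3 met the criteria,
# because a rolling result is labelled with the window's last day.
flag = xr.DataArray(
    [False, False, True, False, False, False],
    coords={"time": time},
    dims="time",
)

# Move every flag back to the first day of its window. fill_value=False
# keeps the array boolean instead of introducing NaN at the end.
at_start = flag.shift(time=-(window - 1), fill_value=False)

first_start = at_start.idxmax(dim="time")
```

For the 30-day window used in the demo the same idea would be a shift of -29 along time.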

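You mention a lot of other things in the question, but that is a pretty large scope for a single question. Hopefully this points you in the right direction!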
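For what it's worth, here is one possible sketch that follows the question's definition more closely using rolling sums. It rests on my own reading of that definition, which is an assumption: day t is an onset if days t..t+4 total at least 40 mm and every 10-day total inside the following 30 days (t+5..t+34) is at least 5 mm. The function name onset_mask, its parameters, and the test data are all hypothetical.

```python
import numpy as np
import pandas as pd
import xarray as xr


def onset_mask(pr, wet_days=5, wet_total=40.0, dry_days=10, dry_total=5.0,
               search_days=30):
    """Boolean mask, True where a day qualifies as onset (assumed definition)."""
    # 5-day totals, relabelled from the window's last day to its first day.
    wet_sum = pr.rolling(time=wet_days).sum().shift(time=-(wet_days - 1))

    # 10-day totals, labelled with the window's last day.
    dry_sum_end = pr.rolling(time=dry_days).sum()

    # Smallest 10-day total among the windows that fit in the search period.
    n_windows = search_days - dry_days + 1
    min_dry_end = dry_sum_end.rolling(time=n_windows).min()

    # That minimum is labelled with day t + wet_days + search_days - 1;
    # shift it back so it lines up with the candidate onset day t.
    min_dry = min_dry_end.shift(time=-(wet_days + search_days - 1))

    # Comparisons with NaN are False, so days too close to the end of the
    # record (not enough data to check) are never flagged.
    return (wet_sum >= wet_total) & (min_dry >= dry_total)


# Two made-up points, 60 days starting August 1.
time = pd.date_range("2020-08-01", periods=60, freq="D")
a = np.where(np.arange(60) >= 10, 10.0, 0.0)  # dry 10 days, then 10 mm/day
b = np.full(60, 10.0)
b[5:21] = 0.0                                 # early rain, then a long dry spell
pr = xr.DataArray(
    np.stack([a, b], axis=1),
    coords={"time": time},
    dims=("time", "point"),
)

mask = onset_mask(pr)
onset = mask.idxmax(dim="time").where(mask.any(dim="time"))
```

In this toy data the first point gets an onset of 2020-08-10, and the second gets 2020-08-21 because its early wet days are followed by a dry spell. Whether the thresholds should be inclusive, and exactly which days count as "the following 30 days", are choices you would need to adjust to your own definition.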

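Also, just a tip: when you plot a map of times, you will get the numeric representation of each datetime object, which is in units of nanoseconds since 1970, so the result will be a ridiculously large number. If you like, you can use the dayofyear attribute of each datetime object to get the day of the year, e.g.: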

In [14]: (
...:     meets_criteria
...:     .groupby('time.year')
...:     .apply(lambda x: x.idxmax(dim='time').dt.dayofyear.where(x.any(dim='time')))
...: )
Out[14]:
<xarray.DataArray 'dayofyear' (year: 3, lat: 10, lon: 211)>
array([[[ nan,  nan,  nan, ...,  nan,  nan,  nan],
[349.,  nan, 355., ...,  nan,  nan,  nan],
[ nan,  nan,  nan, ...,  nan,  nan,  nan],
...,
[ nan,  nan,  nan, ...,  nan, 312.,  nan],
[ nan, 346.,  nan, ..., 353.,  nan,  nan],
[ nan, 323.,  nan, ...,  nan,  nan,  nan]],
[[ nan,  nan,  nan, ...,  nan,  nan,  nan],
[ nan,  nan,  nan, ...,  nan, 265., 293.],
[358.,  nan, 360., ...,  nan,  nan,  nan],
...,
[233.,  nan,  nan, ..., 218., 278.,  nan],
[ nan,  nan,  nan, ...,  nan,  nan,  nan],
[271.,  nan,  nan, ..., 287.,  nan,  nan]],
[[ nan,  nan,  nan, ...,  nan,  nan, 336.],
[ nan,  nan,  nan, ...,  nan,  nan,  nan],
[ nan,  nan, 305., ...,  nan, 352.,  nan],
...,
[217.,  nan,  nan, ...,  nan,  nan, 277.],
[ nan, 357.,  nan, ...,  nan, 304.,  nan],
[267., 314.,  nan, ...,  nan,  nan,  nan]]])
Coordinates:
  * lat      (lat) float64 30.5 31.5 32.5 33.5 34.5 35.5 36.5 37.5 38.5 39.5
  * lon      (lon) float64 -110.5 -109.5 -108.5 -107.5 ... 96.5 97.5 98.5 99.5
  * year     (year) int64 2020 2021 2022
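Since the data in the question starts on August 1, an alternative to day of year could be the number of days since the start of the season. This is only a sketch: the 2x2 grid, its coordinates, and the onset dates below are invented for illustration.

```python
import numpy as np
import xarray as xr

# Hypothetical onset dates on a tiny grid; NaT marks a cell with no onset.
onset = xr.DataArray(
    np.array(
        [["2020-08-10", "2020-09-01"], ["NaT", "2020-08-21"]],
        dtype="datetime64[ns]",
    ),
    coords={"lat": [-6.5, -6.0], "lon": [106.0, 106.5]},
    dims=("lat", "lon"),
)

# Reference date for the season (assumed: August 1 of the same year).
season_start = np.datetime64("2020-08-01")

# Subtracting gives timedeltas; dividing by one day gives plain floats,
# with NaN wherever the onset was NaT.
days_since_start = (onset - season_start) / np.timedelta64(1, "D")
```

The result is an ordinary float array (9, 31, NaN, 20 here), so calling its plot method gives a map with a readable color scale rather than nanosecond counts.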
