I currently have a 4D dataset ds in xarray that looks like this:
<xarray.Dataset>
Dimensions: (lat: 60, lon: 78, time: 216, pres: 395)
Coordinates:
* lat (lat) float32 0.5 1.5 2.5 3.5 4.5 5.5 ... 55.5 56.5 57.5 58.5 59.5
* lon (lon) float32 -45.5 -44.5 -43.5 -42.5 ... -69.5 -75.5 -74.5 -76.5
* time (time) float32 7.32e+05 7.32e+05 7.32e+05 ... 7.385e+05 7.385e+05
* pres (pres) float64 2.5 7.5 12.5 17.5 ... 1.962e+03 1.968e+03 1.972e+03
Data variables:
var (pres, lat, lon, time) float64 2.03e+03 2.03e+03 ... nan nan
My goal is to turn it into a pandas DataFrame that looks like this:
id time pres param 20.5-70.5 20.5-71.5 20.5-72.5
0 0 0 var 2085 2073 2057
1 0 1 var 2114 2156 2054
2 0 2 var 2039 2006 2179
3 1 0 var 2199 2144 2033
4 1 1 var 2056 2102 2191
5 1 2 var 2062 2033 2052
6 2 0 var 2001 2153 2170
7 2 1 var 2187 2120 2100
8 2 2 var 2138 2076 2002
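Here time and pres form some kind of MultiIndex, there is a param column (since I may have several variables at once), and each pixel (i.e. each lat-lon pair) is a column header, so that each pixel column holds the var values for every combination of time and pres. I need this format for the next part of my analysis, which involves some vectorization.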
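I've tried a few things, including stacked = ds.stack(coordinates=["lat", "lon"]), which I think is the start of what I want, followed by stacked.to_dataframe(). However, to_dataframe() splits my paired coordinates back into separate lat and lon levels. I think I'm missing something, but I'm not quite sure what to do.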
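For comparison, the stack idea above can be carried one step further while staying in xarray: stack the row dimensions as well as the pixel dimensions so the DataArray becomes 2D, then DataArray.to_pandas() turns it into a DataFrame with a (time, pres, param) row MultiIndex and a (lat, lon) column MultiIndex. This is only a sketch under stated assumptions: the small dataset is a hypothetical stand-in (same dimension names as the question, made-up sizes and values), and the stacked dimension names row and pixel are arbitrary.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for the question's 4D dataset:
# dimension names match, sizes and values are made up.
rng = np.random.default_rng(0)
lat = np.array([20.5, 21.5], dtype="float32")
lon = np.array([-72.5, -71.5, -70.5], dtype="float32")
time = np.arange(3, dtype="float32")
pres = np.array([2.5, 7.5])
ds = xr.Dataset(
    {"var": (("pres", "lat", "lon", "time"),
             rng.uniform(2000, 2200, size=(pres.size, lat.size, lon.size, time.size)))},
    coords={"lat": lat, "lon": lon, "time": time, "pres": pres},
)

# Turn the data variables into a 'param' dimension
# (newer xarray releases also offer this as Dataset.to_dataarray).
da = ds.to_array(dim="param")

# Stack both sides: rows = (time, pres, param), columns = one per (lat, lon) pixel
stacked = da.stack(row=["time", "pres", "param"], pixel=["lat", "lon"])

# A 2D DataArray converts directly to a DataFrame, keeping both MultiIndexes
df = stacked.transpose("row", "pixel").to_pandas()
df.columns = [f"{la}-{lo}" for la, lo in df.columns]
print(df.head())
```

Because the reshaping happens in xarray rather than through pandas' stack, missing values stay in the result as NaN instead of being dropped.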
Any help is much appreciated, thanks!

(Since you didn't provide sample data, you'll need to adapt this example.)
Use stack/unstack to reshape your dataset:
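import xarray as xr
import pandas as pd
ds = xr.tutorial.load_dataset('air_temperature')
# Long DataFrame (one column per variable) -> name the column axis 'param'
# -> move 'param' into the index -> unstack (lat, lon) into the columns
df = ds.to_dataframe().rename_axis(columns='param').stack('param').unstack(['lat', 'lon'])
# Flatten the (lat, lon) column MultiIndex into "lat-lon" labels
df.columns = [f"{lat}-{lon}" for lat, lon in df.columns]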
Output:
>>> df
75.0-200.0 75.0-202.5 75.0-205.0 75.0-207.5 ... 15.0-322.5 15.0-325.0 15.0-327.5 15.0-330.0
time param ...
2013-01-01 00:00:00 air 241.199997 242.500000 243.500000 244.000000 ... 297.600006 296.899994 296.790009 296.600006
2013-01-01 06:00:00 air 242.099991 242.699997 243.099991 243.389999 ... 296.899994 296.399994 296.399994 296.600006
2013-01-01 12:00:00 air 242.299988 242.199997 242.299988 242.500000 ... 297.600006 297.000000 297.000000 296.790009
2013-01-01 18:00:00 air 241.889999 241.799988 241.799988 242.099991 ... 298.199982 297.790009 298.000000 297.899994
2013-01-02 00:00:00 air 243.199997 243.099991 243.099991 243.299988 ... 297.699982 297.100006 297.399994 297.399994
... ... ... ... ... ... ... ... ... ...
2014-12-30 18:00:00 air 243.089996 243.389999 243.689987 243.789993 ... 297.989990 297.389984 296.889984 296.089996
2014-12-31 00:00:00 air 242.489990 242.389999 242.189987 241.689987 ... 297.290009 296.589996 295.989990 295.489990
2014-12-31 06:00:00 air 243.489990 242.989990 242.089996 240.689987 ... 297.089996 296.089996 295.790009 295.790009
2014-12-31 12:00:00 air 245.789993 244.789993 243.489990 241.889999 ... 296.589996 295.690002 295.489990 295.190002
2014-12-31 18:00:00 air 245.089996 244.289993 243.289993 242.189987 ... 297.190002 296.489990 296.190002 295.690002
[2920 rows x 1325 columns]
>>> ds
<xarray.Dataset>
Dimensions: (lat: 25, time: 2920, lon: 53)
Coordinates:
* lat (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
* lon (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
* time (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
air (time, lat, lon) float32 241.2 242.5 243.5 ... 296.5 296.2 295.7
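To adapt the answer to the 4D layout in the question, the main extra step is fixing the order of the index levels, which to_dataframe() otherwise takes from the order of ds.dims; its dim_order argument makes that explicit. The sketch below assumes a hypothetical stand-in dataset (same dimension names as the question, made-up sizes and values) and adds an invented second variable, var2, only to show several entries in the param level.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for the question's dataset: same dimension names,
# made-up sizes and values, plus an invented 'var2' to show multiple params.
rng = np.random.default_rng(1)
shape = (2, 2, 3, 3)  # (pres, lat, lon, time)
ds = xr.Dataset(
    {
        "var": (("pres", "lat", "lon", "time"), rng.uniform(2000, 2200, shape)),
        "var2": (("pres", "lat", "lon", "time"), rng.uniform(0, 1, shape)),
    },
    coords={
        "pres": [2.5, 7.5],
        "lat": [20.5, 21.5],
        "lon": [-72.5, -71.5, -70.5],
        "time": [0.0, 1.0, 2.0],
    },
)

# dim_order pins the index level order, so after moving 'param' into the index
# and (lat, lon) into the columns, the rows are indexed by (time, pres, param).
df = (
    ds.to_dataframe(dim_order=["time", "pres", "lat", "lon"])
    .rename_axis(columns="param")
    .stack("param")
    .unstack(["lat", "lon"])
)
# Flatten the (lat, lon) column MultiIndex into "lat-lon" labels
df.columns = [f"{lat}-{lon}" for lat, lon in df.columns]
print(df.head())
```

If the real data contains NaN (the repr in the question ends in nan nan), note that pandas' legacy stack implementation drops missing values by default, so a (time, pres, param) row that is entirely NaN may be missing from the result; comparing the row count with len(time) * len(pres) * (number of variables) is a quick check.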