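I have four numpy arrays, each with 5 values. I need to combine them into a DataFrame so that I can run an ANOVA test and a Tukey honestly significant difference (HSD) test.
The arrays are: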
low = np.array([59.5, 53.3, 56.8, 63.1, 58.7]) # 1.6 nmhos/cm
med = np.array([55.2, 59.1, 52.8, 54.5, np.nan]) # 3.8
medh = np.array([51.7, 48.8, 53.9, 49.0, np.nan]) # 6.0
high = np.array([44.6, 48.5, 41.0, 47.3, 46.1]) # 10.2
and I need to combine them into a DataFrame that, when printed, produces the following:
Yield EC
0 59.5 Low
1 53.3 Low
2 56.8 Low
3 63.1 Low
4 58.7 Low
5 55.2 Med
6 59.1 Med
7 52.8 Med
8 54.5 Med
9 NaN Med
10 51.7 Medh
11 48.8 Medh
12 53.9 Medh
13 49.0 Medh
14 NaN Medh
15 44.6 high
16 48.5 high
17 41.0 high
18 47.3 high
19 46.1 high
What is the best way to achieve this? I tried combining them into a single numpy array and passing it to a DataFrame, but I get the error "Must pass 2-d input":
data_vals = np.array([[low],[med],[medh],[high]])
tomato_df = pd.DataFrame(data = data_vals)
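On why the attempt above fails: wrapping each array in its own brackets adds an extra axis, so the combined array is 3-D rather than 2-D. The sketch below prints the shapes to show this; the names `data_vals_3d`, `data_vals_2d`, and `wide_df` are only illustrative. Dropping the inner brackets gives a 2-D array that pandas accepts, though in a wide layout (one row per group) rather than the long Yield/EC layout asked for.

```python
import numpy as np
import pandas as pd

low = np.array([59.5, 53.3, 56.8, 63.1, 58.7])
med = np.array([55.2, 59.1, 52.8, 54.5, np.nan])
medh = np.array([51.7, 48.8, 53.9, 49.0, np.nan])
high = np.array([44.6, 48.5, 41.0, 47.3, 46.1])

# Each array wrapped in its own list adds an axis, so the result is 3-D
data_vals_3d = np.array([[low], [med], [medh], [high]])
print(data_vals_3d.shape)  # (4, 1, 5)

# Without the inner brackets the result is 2-D: 4 groups x 5 values
data_vals_2d = np.array([low, med, medh, high])
print(data_vals_2d.shape)  # (4, 5)

# pandas accepts the 2-D array; this is a wide frame with one row per group
wide_df = pd.DataFrame(data_vals_2d, index=["Low", "Med", "Medh", "High"])
```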
One approach is to use a nested for loop (written here as a nested list comprehension):
res = (
pd.DataFrame([[v, name] for arr, name in zip([low, med, medh, high], ["Low", "Med", "Medh", "High"]) for v in arr],
columns=["Yield", "EC"]))
print(res)
Yield EC
0 59.5 Low
1 53.3 Low
2 56.8 Low
3 63.1 Low
4 58.7 Low
5 55.2 Med
6 59.1 Med
7 52.8 Med
8 54.5 Med
9 NaN Med
10 51.7 Medh
11 48.8 Medh
12 53.9 Medh
13 49.0 Medh
14 NaN Medh
15 44.6 High
16 48.5 High
17 41.0 High
18 47.3 High
19 46.1 High
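As a possible alternative to looping over individual values, the same long-format frame can be assembled with `np.concatenate` and `np.repeat`. This is only a sketch; the helper names `arrays` and `labels` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

low = np.array([59.5, 53.3, 56.8, 63.1, 58.7])
med = np.array([55.2, 59.1, 52.8, 54.5, np.nan])
medh = np.array([51.7, 48.8, 53.9, 49.0, np.nan])
high = np.array([44.6, 48.5, 41.0, 47.3, 46.1])

arrays = [low, med, medh, high]
labels = ["Low", "Med", "Medh", "High"]

res = pd.DataFrame({
    # Stack all values end to end: 20 yields in group order
    "Yield": np.concatenate(arrays),
    # Repeat each label once per value in its array
    "EC": np.repeat(labels, [len(a) for a in arrays]),
})
print(res)
```

Because the repeat counts come from the array lengths, this should also work if the groups have different sizes.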
You need to convert each of them to a DataFrame and then append them:
df_low = pd.DataFrame(low)
df_low['EC'] = 'Low'
df_med = pd.DataFrame(med)
df_med['EC'] = 'Med'
df_medh = pd.DataFrame(medh)
df_medh['EC'] = 'Medh'
df_high = pd.DataFrame(high)
df_high['EC'] = 'High'
# DataFrame.append was removed in pandas 2.0; pd.concat produces the same result
df = pd.concat([df_low, df_med, df_medh, df_high])
df.rename(columns={ df.columns[0]: 'yield'}, inplace = True)
df
yield EC
0 59.5 Low
1 53.3 Low
2 56.8 Low
3 63.1 Low
4 58.7 Low
0 55.2 Med
1 59.1 Med
2 52.8 Med
3 54.5 Med
4 NaN Med
0 51.7 Medh
1 48.8 Medh
2 53.9 Medh
3 49.0 Medh
4 NaN Medh
0 44.6 High
1 48.5 High
2 41.0 High
3 47.3 High
4 46.1 High
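Note that the index in the output above restarts at 0 for each group, whereas the question asks for a continuous 0 to 19 index. One way to get that, sketched below under the assumption of a reasonably recent pandas, is to pass `ignore_index=True` to `pd.concat`; the names `groups` and `frames` are illustrative only.

```python
import numpy as np
import pandas as pd

low = np.array([59.5, 53.3, 56.8, 63.1, 58.7])
med = np.array([55.2, 59.1, 52.8, 54.5, np.nan])
medh = np.array([51.7, 48.8, 53.9, 49.0, np.nan])
high = np.array([44.6, 48.5, 41.0, 47.3, 46.1])

groups = {"Low": low, "Med": med, "Medh": medh, "High": high}

# One small frame per group; the scalar label is broadcast to every row
frames = [pd.DataFrame({"yield": vals, "EC": name}) for name, vals in groups.items()]

# ignore_index=True discards the per-frame 0-4 indices and renumbers the rows
df = pd.concat(frames, ignore_index=True)
print(df)
```

Calling `df.reset_index(drop=True)` on an already concatenated frame should give the same renumbering.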