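I am trying to read data from a text file and apply normality tests, confidence intervals, ANOVA tests and so on.
Is there a simpler way to build conditional arrays from my data with pandas, without manually typing out 36 lines of code as I have done below?
Later I will need to access the different flavours within these packets, so I would have to repeat this pattern about 7 times.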
import numpy as np
import pandas as pd
import scipy.stats as st

revels_data = pd.read_csv("revels2.txt")
rd = revels_data
# packet sums
total_1 = (rd.loc[rd["Packet number"] == 1, "Contents"].sum())
total_2 = (rd.loc[rd["Packet number"] == 2, "Contents"].sum())
total_3 = (rd.loc[rd["Packet number"] == 3, "Contents"].sum())
total_4 = (rd.loc[rd["Packet number"] == 4, "Contents"].sum())
total_5 = (rd.loc[rd["Packet number"] == 5, "Contents"].sum())
total_6 = (rd.loc[rd["Packet number"] == 6, "Contents"].sum())
total_7 = (rd.loc[rd["Packet number"] == 7, "Contents"].sum())
total_8 = (rd.loc[rd["Packet number"] == 8, "Contents"].sum())
total_9 = (rd.loc[rd["Packet number"] == 9, "Contents"].sum())
total_10 = (rd.loc[rd["Packet number"] == 10, "Contents"].sum())
total_11 = (rd.loc[rd["Packet number"] == 11, "Contents"].sum())
total_12 = (rd.loc[rd["Packet number"] == 12, "Contents"].sum())
total_13 = (rd.loc[rd["Packet number"] == 13, "Contents"].sum())
total_14 = (rd.loc[rd["Packet number"] == 14, "Contents"].sum())
total_15 = (rd.loc[rd["Packet number"] == 15, "Contents"].sum())
total_16 = (rd.loc[rd["Packet number"] == 16, "Contents"].sum())
total_17 = (rd.loc[rd["Packet number"] == 17, "Contents"].sum())
total_18 = (rd.loc[rd["Packet number"] == 18, "Contents"].sum())
total_19 = (rd.loc[rd["Packet number"] == 19, "Contents"].sum())
total_20 = (rd.loc[rd["Packet number"] == 20, "Contents"].sum())
total_21 = (rd.loc[rd["Packet number"] == 21, "Contents"].sum())
total_22 = (rd.loc[rd["Packet number"] == 22, "Contents"].sum())
total_23 = (rd.loc[rd["Packet number"] == 23, "Contents"].sum())
total_24 = (rd.loc[rd["Packet number"] == 24, "Contents"].sum())
total_25 = (rd.loc[rd["Packet number"] == 25, "Contents"].sum())
total_26 = (rd.loc[rd["Packet number"] == 26, "Contents"].sum())
total_27 = (rd.loc[rd["Packet number"] == 27, "Contents"].sum())
total_28 = (rd.loc[rd["Packet number"] == 28, "Contents"].sum())
total_29 = (rd.loc[rd["Packet number"] == 29, "Contents"].sum())
total_30 = (rd.loc[rd["Packet number"] == 30, "Contents"].sum())
total_31 = (rd.loc[rd["Packet number"] == 31, "Contents"].sum())
total_32 = (rd.loc[rd["Packet number"] == 32, "Contents"].sum())
total_33 = (rd.loc[rd["Packet number"] == 33, "Contents"].sum())
total_34 = (rd.loc[rd["Packet number"] == 34, "Contents"].sum())
total_35 = (rd.loc[rd["Packet number"] == 35, "Contents"].sum())
total_36 = (rd.loc[rd["Packet number"] == 36, "Contents"].sum())
# create total array
a = np.array([total_1, total_2, total_3, total_4, total_5, total_6, total_7,
total_8, total_9, total_10, total_11, total_12, total_13, total_14, total_15,
total_16, total_17, total_18, total_19, total_20, total_21, total_22, total_23,
total_24, total_25, total_26, total_27, total_28, total_29, total_30, total_31,
total_32, total_33, total_34, total_35, total_36])
# mean confidence interval
print(st.t.interval(0.95, len(a)-1, loc=np.mean(a), scale=st.sem(a)))
Thanks!
Edit:
The dataset looks like:
Packet number,Flavour,Contents
1,orange,4
2,orange,3
3,orange,2
4,orange,4
5,orange,3
...
36,orange,3
1,toffee,4
2,toffee,3
...
1,chocolate,5
...
etc.
Desired data:
For each flavour I need an array/list of the contents for analysis, i.e.
orange:
4
3
2
4
...
so that I can run various tests on these newly created arrays.
IIUC, you can do the following.
If you have only 36 distinct values (from 1 to 36) in the Packet number column:
a = rd.groupby('Packet number')['Contents'].sum()
If you have more and want to filter them first:
a = rd[rd['Packet number'].between(1, 36)].groupby('Packet number')['Contents'].sum()
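To tie this back to the confidence interval in the question, here is a minimal sketch. It assumes the file has the three columns shown in the question's edit, and it uses only the handful of rows visible in that excerpt as inline sample data, so the numbers are illustrative rather than real results.

```python
import io

import numpy as np
import pandas as pd
import scipy.stats as st

# Inline sample built from the rows visible in the question's excerpt
# (an assumption standing in for the real revels2.txt).
csv_text = """Packet number,Flavour,Contents
1,orange,4
2,orange,3
3,orange,2
4,orange,4
5,orange,3
36,orange,3
1,toffee,4
2,toffee,3
1,chocolate,5
"""
rd = pd.read_csv(io.StringIO(csv_text))

# One total per packet; this replaces the total_1 ... total_36 lines.
totals = rd.groupby("Packet number")["Contents"].sum()

# Plain NumPy array, ordered by packet number.
a = totals.to_numpy()

# 95% confidence interval for the mean packet total (Student's t).
ci_low, ci_high = st.t.interval(0.95, len(a) - 1, loc=np.mean(a), scale=st.sem(a))
print(a, (ci_low, ci_high))
```

With the real file, replacing the `io.StringIO(csv_text)` argument by `"revels2.txt"` should be the only change needed.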
UPDATE:
Source DF:
In [233]: df
Out[233]:
   Packet number    Flavour  Contents
0              1     orange         4
1              2     orange         3
2              3     orange         2
3              4     orange         4
4              5     orange         3
5             36     orange         3
6              1     toffee         4
7              2     toffee         3
8              1  chocolate         5
Simple boolean indexing:
In [234]: df.loc[df.Flavour == 'orange', 'Contents']
Out[234]:
0    4
1    3
2    2
3    4
4    3
5    3
Name: Contents, dtype: int64
... plus sum:
In [235]: df.loc[df.Flavour == 'orange', 'Contents'].sum()
Out[235]: 19
Filter, groupby, aggregate:
In [237]: df.loc[df.Flavour.isin(['orange','toffee'])].groupby('Flavour')['Contents'].sum()
Out[237]:
Flavour
orange    19
toffee     7
Name: Contents, dtype: int64
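If what you need is one array of Contents per flavour, as described in the question's edit, the same groupby can hand them over in one go. This is only a sketch on the sample DF above: the names `by_flavour` and `testable` are made up for illustration, and the three-observation threshold is an assumption so that the tests are not run on groups that are too small.

```python
import pandas as pd
import scipy.stats as st

# Same sample DF as shown above.
df = pd.DataFrame({
    "Packet number": [1, 2, 3, 4, 5, 36, 1, 2, 1],
    "Flavour": ["orange"] * 6 + ["toffee"] * 2 + ["chocolate"],
    "Contents": [4, 3, 2, 4, 3, 3, 4, 3, 5],
})

# One NumPy array of Contents per flavour, keyed by flavour name.
by_flavour = {
    name: contents.to_numpy()
    for name, contents in df.groupby("Flavour")["Contents"]
}
print(by_flavour["orange"])  # [4 3 2 4 3 3]

# Keep only flavours with at least 3 observations (assumed threshold).
testable = {name: vals for name, vals in by_flavour.items() if len(vals) >= 3}

# Shapiro-Wilk normality test for each flavour that has enough data.
for name, vals in testable.items():
    stat, p_value = st.shapiro(vals)
    print(name, stat, p_value)

# One-way ANOVA across flavours, only when at least two groups qualify.
if len(testable) >= 2:
    f_stat, p_value = st.f_oneway(*testable.values())
    print(f_stat, p_value)
```

On this tiny sample only orange has enough rows, so the ANOVA step is skipped; with the full 36 packets per flavour every group should pass the filter.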