We have a pandas DataFrame as follows:
     Unnamed: 0              T1   T2   T3    ...  T120
0    cheetah Running         x1   x2   x1    ...  x3
1    Running Jaguar          x1   x10  x3    ...  x7
2    Cougar Running          x1   x2   x1    ...  x3
3    Bengal Tiger Running    x5   x2   x4    ...  x17
4    Sleeping Bali Tiger     x55  x61  x11   ...  x31
5    Javan Leopard Sleeping  x42  x67  x17   ...  x34
6    Leopard Running         x2   x5   x2    ...  x3
7    Bengal Tiger Running    x5   x2   x4    ...  x17
..   ...                     ...  ...  ...   ...  ...
199  Florida Panther Eating  x71  x80  x101  ...  x94
200  Running Eastern Cougar  x5   x1   x2    ...  x3
201  Congo Lion Sleeping     x57  x61  x14   ...  x38
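We are trying to reorganize this DataFrame as shown below. In the DataFrame above, the first column is unnamed. We want to check that unnamed column for known common behaviours such as "Running", "Sleeping", etc., and rearrange the DataFrame as follows.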
     Unnamed: 0              T1   T2   T3    ...  T120
0    cheetah Running         x1   x2   x1    ...  x3
1    Running Jaguar          x1   x10  x3    ...  x7
2    Cougar Running          x1   x2   x1    ...  x3
3    Bengal Tiger Running    x5   x2   x4    ...  x17
4    Running Eastern Cougar  x5   x1   x2    ...  x3
5    Bengal Tiger Running    x5   x2   x4    ...  x17
6    Leopard Running         x2   x5   x2    ...  x3
4    Sleeping Bali Tiger     x55  x61  x11   ...  x31
5    Javan Leopard Sleeping  x42  x67  x17   ...  x34
6    Congo Lion Sleeping     x57  x61  x14   ...  x38
..   ...                     ...  ...  ...   ...  ...
201  Florida Panther Eating  x71  x80  x101  ...  x94
I tried the approach below, but only after adding a name to the column. I also tried df[df.columns.str.contains('^Unnamed')], but had no luck.
import pandas as pd
df = pd.read_csv('a_behav_cat.csv')
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the filtered pieces instead
df_new = pd.concat([df[df["name"].str.contains("Running")],
                    df[df["name"].str.contains("Sleeping")]])
print(df_new.to_string())
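Question 1: I think there should be a better or more Pythonic way to do this. I would appreciate your help with it. Also, this checks for an exact match of the string, which is not ideal, because the dataset may contain variants such as a plain "running." or "sleeping;", etc. I tried the .lower() function, but it did not work.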
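For the case-sensitivity problem in Question 1, one possible sketch is to pass case=False to str.contains and to pick the unnamed column by position instead of renaming it. The inline CSV below is a made-up four-row sample (the lowercase "running" row is added purely for illustration), and the names label, behaviours and parts are hypothetical.

```python
import io
import pandas as pd

# Made-up sample; the lowercase "running" row is an assumption for illustration.
csv_text = """,T1,T2
cheetah Running,x1,x2
Sleeping Bali Tiger,x55,x61
Leopard running,x2,x5
Florida Panther Eating,x71,x80
"""
df = pd.read_csv(io.StringIO(csv_text))

# read_csv labels a blank header cell "Unnamed: 0"; take the first column by position.
label = df.columns[0]

# One case-insensitive filter per behaviour, stacked in the requested order.
behaviours = ["running", "sleeping"]
parts = [df[df[label].str.contains(b, case=False)] for b in behaviours]
df_new = pd.concat(parts)
print(df_new.to_string())
```

A row whose label mentions two behaviours would appear twice with this approach, so it only suits data where each row matches a single behaviour.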
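The purpose of this is to determine how many distinct x categories occur for a single observation. Here T1, T2, T3, ..., T120 are the observations. We need to determine how many times each value occurs for each observation. For example, for T1 and "Running" there are three 'x1', three 'x5' and one 'x2'.
To do this, we first thought of rearranging the DataFrame as described above.
However, we are not sure whether this rearrangement is necessary for the purpose. In addition, the output looks multidimensional: for T1 and "Running", we need to store how many x1, x3, x5 there are. The same also needs to be applied to the other behaviours, such as "Eating", "Sleeping", etc.
Question 2: What is the best way to achieve this? Is there a data structure suited to this purpose? Is there a better way to achieve the above without rearranging the DataFrame?
If you would like to test, here is a sample CSV.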
,T1,T2,T3,T4
cheetah Running,x1,x2,x1,x3
Running Jaguar,x1,x10,x3,x7
Cougar Running,x1,x2,x1,x3
Bengal Tiger Running,x5,x2,x4,x17
Sleeping Bali Tiger,x55,x61,x11,x31
Javan Leopard Sleeping,x42,x67,x17,x34
Leopard Running,x2,x5,x2,x3
Bengal Tiger Running,x5,x2,x4,x17
Florida Panther Eating,x71,x80,x101,x94
Running Eastern Cougar,x5,x1,x2,x3
Congo Lion Sleeping,x57,x61,x14,x38
IIUC, you can use a dictionary mapping to assign the right category to each row:
# Your list of patterns
MAPPING = {'S': ['sleep', 'sleeping'],
'R': ['run', 'running'],
'E': ['eat', 'eating']}
# Reverse the mapping (swap keys and values)
rev = {v: k for k, l in MAPPING.items() for v in l}
# Create the regex pattern
pat = fr"\b({'|'.join(rev)})\b"
# Extract from data
df['CAT'] = df['Unnamed: 0'].str.lower().str.extract(pat, expand=False).map(rev)
Output:
>>> df
     Unnamed: 0              T1   T2   T3    T4   CAT
0    cheetah Running         x1   x2   x1    x3   R
1    Running Jaguar          x1   x10  x3    x7   R
2    Cougar Running          x1   x2   x1    x3   R
3    Bengal Tiger Running    x5   x2   x4    x17  R
4    Sleeping Bali Tiger     x55  x61  x11   x31  S
5    Javan Leopard Sleeping  x42  x67  x17   x34  S
6    Leopard Running         x2   x5   x2    x3   R
7    Bengal Tiger Running    x5   x2   x4    x17  R
8    Florida Panther Eating  x71  x80  x101  x94  E
9    Running Eastern Cougar  x5   x1   x2    x3   R
10   Congo Lion Sleeping     x57  x61  x14   x38  S
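For Question 2, the per-behaviour counts can probably be obtained without rearranging any rows, once a CAT column like the one above exists. The following is a minimal sketch under that assumption: it reshapes the table to long format with melt and counts each (category, observation, value) combination with groupby. The inline CSV is the sample from the question; obs_cols, long_df and counts are illustrative names.

```python
import io
import pandas as pd

# Sample CSV from the question, inlined so the sketch is self-contained.
csv_text = """,T1,T2,T3,T4
cheetah Running,x1,x2,x1,x3
Running Jaguar,x1,x10,x3,x7
Cougar Running,x1,x2,x1,x3
Bengal Tiger Running,x5,x2,x4,x17
Sleeping Bali Tiger,x55,x61,x11,x31
Javan Leopard Sleeping,x42,x67,x17,x34
Leopard Running,x2,x5,x2,x3
Bengal Tiger Running,x5,x2,x4,x17
Florida Panther Eating,x71,x80,x101,x94
Running Eastern Cougar,x5,x1,x2,x3
Congo Lion Sleeping,x57,x61,x14,x38
"""
df = pd.read_csv(io.StringIO(csv_text))

# Category extraction as in the answer above.
MAPPING = {'S': ['sleep', 'sleeping'],
           'R': ['run', 'running'],
           'E': ['eat', 'eating']}
rev = {v: k for k, l in MAPPING.items() for v in l}
pat = fr"\b({'|'.join(rev)})\b"
df['CAT'] = df['Unnamed: 0'].str.lower().str.extract(pat, expand=False).map(rev)

# Long format: one row per (category, observation column, value).
obs_cols = [c for c in df.columns if c.startswith('T')]
long_df = df.melt(id_vars='CAT', value_vars=obs_cols,
                  var_name='obs', value_name='value')

# Number of occurrences of each value, per category and observation.
counts = long_df.groupby(['CAT', 'obs', 'value']).size()

# Counts for "Running" in T1.
print(counts.loc[('R', 'T1')])
```

The result is a Series with a three-level MultiIndex, which is one way to hold the "multidimensional" output mentioned in the question; counts.loc[('S', 'T2')] would give the same breakdown for "Sleeping" in T2.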