How to shuffle DataFrame blocks (of different sizes) in a list in Python



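Below is the code I have so far; my question is at the end. I want to shuffle DataFrame blocks (of different sizes) that are stored in a list in Python. Thanks.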

Set up a dummy dictionary:

import pandas as pd

dummy = {"ID":[1,2,3,4,5,6,7,8,9,10],
         "Alphabet":["A","B","C","D","E","F","G","H","I","J"],
         "Fruit":["apple","banana","coconut","date","elephant apple","feijoa","guava","honeydew","ita palm","jack fruit"]}

Turn the dictionary into a DataFrame:

dummy_df = pd.DataFrame(dummy)

Create DataFrame blocks of the desired sizes:

blocksize = [1,2,3,4]
blocks = []
i = 0
for j in range(len(blocksize)):
    a = blocksize[j]
    blocks.append(dummy_df[i:i+a])
    i += a
blocks
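The manual counter `i` can also be replaced by cumulative offsets. This is a minimal sketch, assuming the block sizes are meant to add up to the number of rows (that check is an addition, not part of the original code). It uses `itertools.accumulate` (its `initial` argument needs Python 3.8+) and positional `.iloc` slicing:

```python
from itertools import accumulate

import pandas as pd

dummy = {
    "ID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Alphabet": ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"],
    "Fruit": ["apple", "banana", "coconut", "date", "elephant apple",
              "feijoa", "guava", "honeydew", "ita palm", "jack fruit"],
}
dummy_df = pd.DataFrame(dummy)
blocksize = [1, 2, 3, 4]

# Assumption: the block sizes should cover every row exactly once.
assert sum(blocksize) == len(dummy_df)

# accumulate([1, 2, 3, 4], initial=0) yields 0, 1, 3, 6, 10;
# each consecutive pair is the (start, end) of one block.
bounds = list(accumulate(blocksize, initial=0))
blocks = [dummy_df.iloc[start:end] for start, end in zip(bounds, bounds[1:])]

print([len(b) for b in blocks])  # [1, 2, 3, 4]
```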

Below is the output of "blocks". It is a list of 4 DataFrame blocks with 1 to 4 rows each:

[   ID Alphabet  Fruit
0   1        A  apple,
   ID Alphabet    Fruit
1   2        B   banana
2   3        C  coconut,
   ID Alphabet           Fruit
3   4        D            date
4   5        E  elephant apple
5   6        F          feijoa,
   ID Alphabet       Fruit
6   7        G       guava
7   8        H    honeydew
8   9        I    ita palm
9  10        J  jack fruit]

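This is where I'm stuck.

I have tried many different things but always get errors. I want to shuffle the DataFrame blocks in the list and then combine them back into one DataFrame. Below is an example of a shuffled output. How can I do this?

Example of the desired output: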

   ID Alphabet           Fruit
1   2        B          banana
2   3        C         coconut
0   1        A           apple
6   7        G           guava
7   8        H        honeydew
8   9        I        ita palm
9  10        J      jack fruit
3   4        D            date
4   5        E  elephant apple
5   6        F          feijoa

Once you have the list, you can shuffle the blocks in place with random.shuffle. Then combine the shuffled blocks back into a single DataFrame with pd.concat. (Appending each block to a new empty DataFrame with DataFrame.append no longer works: that method was deprecated in pandas 1.4 and removed in pandas 2.0.)

Try this code:

import pandas as pd
import random

dummy = {"ID":[1,2,3,4,5,6,7,8,9,10],
         "Alphabet":["A","B","C","D","E","F","G","H","I","J"],
         "Fruit":["apple","banana","coconut","date","elephant apple","feijoa","guava","honeydew","ita palm","jack fruit"]}
dummy_df = pd.DataFrame(dummy)

blocksize = [1,2,3,4]
blocks = []
i = 0
for j in range(len(blocksize)):
    a = blocksize[j]
    blocks.append(dummy_df[i:i+a])
    i += a

random.shuffle(blocks)  # shuffle the order of the blocks in the list
dfs = pd.concat(blocks)  # stack the shuffled blocks into one DataFrame

print(dfs)

Output (the block order is random, so yours will differ):

   ID Alphabet           Fruit
3   4        D            date
4   5        E  elephant apple
5   6        F          feijoa
1   2        B          banana
2   3        C         coconut
6   7        G           guava
7   8        H        honeydew
8   9        I        ita palm
9  10        J      jack fruit
0   1        A           apple
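If you need the same shuffled order on every run (for example in tests), the steps above can be wrapped in a function that uses its own seeded random generator. This is a sketch; the helper name `shuffle_blocks` and its `seed` parameter are hypothetical, not part of pandas:

```python
import random

import pandas as pd

dummy_df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Alphabet": ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"],
    "Fruit": ["apple", "banana", "coconut", "date", "elephant apple",
              "feijoa", "guava", "honeydew", "ita palm", "jack fruit"],
})
blocksize = [1, 2, 3, 4]


def shuffle_blocks(df, sizes, seed=None):
    """Hypothetical helper: cut df into consecutive blocks of the given sizes,
    shuffle the order of the blocks, and concatenate them again."""
    blocks, start = [], 0
    for size in sizes:
        blocks.append(df.iloc[start:start + size])
        start += size
    rng = random.Random(seed)  # private generator: the same seed gives the same block order
    rng.shuffle(blocks)
    return pd.concat(blocks)  # original index labels are kept, as in the output above


shuffled = shuffle_blocks(dummy_df, blocksize, seed=42)
print(shuffled)
```

With `seed=None` (the default) the order changes on every call. Use `pd.concat(blocks, ignore_index=True)` inside the helper if you prefer the result renumbered 0..n-1.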

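You can also use .sample(frac=1) to shuffle rows directly inside a DataFrame. Applied to each slice, it shuffles the rows within every block, while the blocks themselves stay in their original order: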

blocks.append( df[start:end].sample(frac=1) )

Afterwards you can join all the DataFrames at once with pd.concat(list_of_df) (DataFrame.append, which older code uses for this, was removed in pandas 2.0):

df = pd.concat(blocks)

import pandas as pd

dummy = {
    "ID": [1,2,3,4,5,6,7,8,9,10],
    "Alphabet": ["A","B","C","D","E","F","G","H","I","J"],
    "Fruit": ["apple","banana","coconut","date","elephant apple","feijoa","guava","honeydew","ita palm","jack fruit"]
}
df = pd.DataFrame(dummy)
blocksize = [1,2,3,4]
blocks = []
start = 0
for size in blocksize:
    end = start + size
    blocks.append(df[start:end].sample(frac=1))  # shuffle the rows inside this block
    start = end

#for item in blocks:
#    print(item)

df = pd.concat(blocks)  # append .reset_index(drop=True) to renumber the rows
print(df)
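The `.sample()` approach can be made repeatable by passing `random_state`, and `pd.concat(..., ignore_index=True)` gives the same renumbering as the `.reset_index(drop=True)` hint. A minimal sketch; the value `random_state=0` is an arbitrary choice:

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Alphabet": ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"],
    "Fruit": ["apple", "banana", "coconut", "date", "elephant apple",
              "feijoa", "guava", "honeydew", "ita palm", "jack fruit"],
})
blocksize = [1, 2, 3, 4]

blocks = []
start = 0
for size in blocksize:
    end = start + size
    # frac=1 keeps every row of the block; random_state makes the row order repeatable
    blocks.append(df.iloc[start:end].sample(frac=1, random_state=0))
    start = end

# ignore_index=True renumbers the rows 0..n-1, like .reset_index(drop=True)
result = pd.concat(blocks, ignore_index=True)
print(result)
```

Note that this keeps the block order fixed; only the rows inside each block move.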

Other ways to shuffle: Shuffle DataFrame rows

Doc: pandas.DataFrame.sample


Another idea is to use .sample(frac=1) only to get the shuffled indexes:

blocks += df[start:end].sample(frac=1).index.tolist()

or to shuffle them with random.shuffle():

indexes = df[start:end].index.tolist()
random.shuffle(indexes)
blocks += indexes

and then use these indexes to create the new DataFrame:

df = df.iloc[blocks]

import pandas as pd
import random

dummy = {
    "ID": [1,2,3,4,5,6,7,8,9,10],
    "Alphabet": ["A","B","C","D","E","F","G","H","I","J"],
    "Fruit": ["apple","banana","coconut","date","elephant apple","feijoa","guava","honeydew","ita palm","jack fruit"]
}
df = pd.DataFrame(dummy)
blocksize = [1,2,3,4]
blocks = []
start = 0
for size in blocksize:
    end = start + size
    #blocks += df[start:end].sample(frac=1).index.tolist()

    indexes = df[start:end].index.tolist()
    random.shuffle(indexes)
    blocks += indexes

    start = end

#for item in blocks:
#    print(item)

df = df.iloc[blocks]
print(df)
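The code above shuffles the indexes inside each block. To shuffle the order of whole blocks instead (which is what the example output in the question shows), the same index idea can be applied one level up: keep one list of labels per block and shuffle that list. A sketch; it uses `.loc` because the values are index labels, whereas `.iloc` only matches them while the index is the default 0..n-1 RangeIndex:

```python
import random

import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Alphabet": ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"],
    "Fruit": ["apple", "banana", "coconut", "date", "elephant apple",
              "feijoa", "guava", "honeydew", "ita palm", "jack fruit"],
})
blocksize = [1, 2, 3, 4]

# One list of index labels per block: [[0], [1, 2], [3, 4, 5], [6, 7, 8, 9]]
index_blocks = []
start = 0
for size in blocksize:
    index_blocks.append(df.index[start:start + size].tolist())
    start += size

random.shuffle(index_blocks)  # shuffle the order of the blocks; rows inside a block keep their order

# Flatten the shuffled blocks into one list of labels and select the rows by label
order = [label for block in index_blocks for label in block]
result = df.loc[order]
print(result)
```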
