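Below is some pseudocode for what I want to achieve; my question is at the end. I want to shuffle DataFrame blocks (of different sizes) that are stored in a list in Python. Thanks.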
Set up a dummy dictionary:
import pandas as pd

dummy = {"ID": [1,2,3,4,5,6,7,8,9,10],
         "Alphabet": ["A","B","C","D","E","F","G","H","I","J"],
         "Fruit": ["apple","banana","coconut","date","elephant apple","feijoa","guava","honeydew","ita palm","jack fruit"]}
Turn the dictionary into a DataFrame:
dummy_df = pd.DataFrame(dummy)
Create DataFrame blocks of the desired sizes:
blocksize = [1,2,3,4]
blocks = []
i = 0
for j in range(len(blocksize)):
    a = blocksize[j]
    blocks.append(dummy_df[i:i+a])
    i += a
blocks
Below is the output of "blocks". It is a list of 4 DataFrame blocks with 1 to 4 rows each:
[ ID Alphabet Fruit
0 1 A apple,
ID Alphabet Fruit
1 2 B banana
2 3 C coconut,
ID Alphabet Fruit
3 4 D date
4 5 E elephant apple
5 6 F feijoa,
ID Alphabet Fruit
6 7 G guava
7 8 H honeydew
8 9 I ita palm
9 10 J jack fruit]
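This is where I'm stuck.
I have tried many different things but always get errors. I want to shuffle the DataFrame blocks in the list and then combine them back into one DataFrame. Below is an example of the shuffled output. How can I do this?
Example of the desired output: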
   ID  Alphabet           Fruit
1   2         B          banana
2   3         C         coconut
0   1         A           apple
6   7         G           guava
7   8         H        honeydew
8   9         I        ita palm
9  10         J      jack fruit
3   4         D            date
4   5         E  elephant apple
5   6         F          feijoa
Once you have the list, you can shuffle the blocks with random.shuffle. After that, combine the blocks from the (shuffled) list into a new DataFrame.
Try this code:
import pandas as pd
import random

dummy = {"ID": [1,2,3,4,5,6,7,8,9,10],
         "Alphabet": ["A","B","C","D","E","F","G","H","I","J"],
         "Fruit": ["apple","banana","coconut","date","elephant apple","feijoa","guava","honeydew","ita palm","jack fruit"]}

dummy_df = pd.DataFrame(dummy)

blocksize = [1,2,3,4]
blocks = []
i = 0
for j in range(len(blocksize)):
    a = blocksize[j]
    blocks.append(dummy_df[i:i+a])
    i += a

random.shuffle(blocks)  # shuffle blocks in list

# join the shuffled blocks into one new DataFrame
# (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
dfs = pd.concat(blocks)

print(dfs)
Output:
   ID  Alphabet           Fruit
3   4         D            date
4   5         E  elephant apple
5   6         F          feijoa
1   2         B          banana
2   3         C         coconut
6   7         G           guava
7   8         H        honeydew
8   9         I        ita palm
9  10         J      jack fruit
0   1         A           apple
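A variant of the same block-level shuffle, sketched under two assumptions: that a reproducible order is wanted (hence a seeded random.Random instance; the seed 42 is arbitrary), and that the block boundaries can be taken from cumulative sums of blocksize with itertools.accumulate instead of a manual counter. The names bounds, rng and shuffled are illustrative, not part of the answer above.

```python
import itertools
import random

import pandas as pd

dummy = {"ID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
         "Alphabet": ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"],
         "Fruit": ["apple", "banana", "coconut", "date", "elephant apple",
                   "feijoa", "guava", "honeydew", "ita palm", "jack fruit"]}
df = pd.DataFrame(dummy)
blocksize = [1, 2, 3, 4]

# Block boundaries from cumulative sums: [0, 1, 3, 6, 10]
bounds = [0, *itertools.accumulate(blocksize)]
blocks = [df.iloc[start:end] for start, end in zip(bounds, bounds[1:])]

# A dedicated Random instance with a fixed seed makes the shuffle repeatable
rng = random.Random(42)
rng.shuffle(blocks)

# Rows inside each block keep their original order; only the block order changes
shuffled = pd.concat(blocks)
print(shuffled)
```

Calling random.shuffle(blocks) without a seed, as in the answer, gives a different block order on every run.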
You can use .sample(frac=1) to shuffle the rows directly inside a DataFrame, for example inside each block:
blocks.append( df[start:end].sample(frac=1) )
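Afterwards you can join all the DataFrames at once with pd.concat(list_of_df). (Older pandas also accepted df.append(list_of_df), but DataFrame.append was removed in pandas 2.0.)
df = pd.concat(blocks)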
import pandas as pd

dummy = {
    "ID": [1,2,3,4,5,6,7,8,9,10],
    "Alphabet": ["A","B","C","D","E","F","G","H","I","J"],
    "Fruit": ["apple","banana","coconut","date","elephant apple","feijoa","guava","honeydew","ita palm","jack fruit"]
}

df = pd.DataFrame(dummy)

blocksize = [1,2,3,4]
blocks = []

start = 0
for size in blocksize:
    end = start + size
    blocks.append(df[start:end].sample(frac=1))  # shuffle the rows inside this block
    start = end

#for item in blocks:
#    print(item)

df = pd.concat(blocks)  # .reset_index(drop=True)

print(df)
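If the aim is this answer's behavior (blocks kept in order, rows shuffled inside each block), a possible alternative to the slicing loop is to label every row with its block number and call GroupBy.sample, which exists in pandas 1.1 and later. This sketch assumes sum(blocksize) == len(df); block_ids and random_state=0 are illustrative choices, not part of the answer.

```python
import numpy as np
import pandas as pd

dummy = {"ID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
         "Alphabet": ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"],
         "Fruit": ["apple", "banana", "coconut", "date", "elephant apple",
                   "feijoa", "guava", "honeydew", "ita palm", "jack fruit"]}
df = pd.DataFrame(dummy)
blocksize = [1, 2, 3, 4]

# One block label per row: [0, 1, 1, 2, 2, 2, 3, 3, 3, 3]
# (assumes sum(blocksize) == len(df))
block_ids = np.repeat(np.arange(len(blocksize)), blocksize)

# frac=1 draws every row of each group in random order;
# the groups themselves come out in block order 0, 1, 2, 3
shuffled = df.groupby(block_ids).sample(frac=1, random_state=0)
print(shuffled)
```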
Other ways to shuffle: Shuffle DataFrame rows
Docs: pandas.DataFrame.sample
Another idea is to use .sample(frac=1) only to get the shuffled index labels:
blocks += df[start:end].sample(frac=1).index.tolist()
or random.shuffle():
indexes = df[start:end].index.tolist()
random.shuffle(indexes)
blocks += indexes
and then use these index labels to create the new DataFrame:
df = df.loc[blocks]  # blocks holds index labels, so select them with .loc
import pandas as pd
import random

dummy = {
    "ID": [1,2,3,4,5,6,7,8,9,10],
    "Alphabet": ["A","B","C","D","E","F","G","H","I","J"],
    "Fruit": ["apple","banana","coconut","date","elephant apple","feijoa","guava","honeydew","ita palm","jack fruit"]
}

df = pd.DataFrame(dummy)

blocksize = [1,2,3,4]
blocks = []

start = 0
for size in blocksize:
    end = start + size
    #blocks += df[start:end].sample(frac=1).index.tolist()
    indexes = df[start:end].index.tolist()
    random.shuffle(indexes)  # shuffle the index labels of this block
    blocks += indexes
    start = end

#for item in blocks:
#    print(item)

df = df.loc[blocks]  # blocks holds index labels, so select them with .loc

print(df)
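The same index-list idea can also shuffle whole blocks, which is what the question asks for: build one range of row positions per block, shuffle the list of ranges, flatten it and select with .iloc. A minimal sketch; ranges and positions are illustrative names, and positional selection assumes the rows of df are already in block order.

```python
import itertools
import random

import pandas as pd

dummy = {"ID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
         "Alphabet": ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"],
         "Fruit": ["apple", "banana", "coconut", "date", "elephant apple",
                   "feijoa", "guava", "honeydew", "ita palm", "jack fruit"]}
df = pd.DataFrame(dummy)
blocksize = [1, 2, 3, 4]

# One range of row positions per block: range(0, 1), range(1, 3), range(3, 6), range(6, 10)
bounds = [0, *itertools.accumulate(blocksize)]
ranges = [range(start, end) for start, end in zip(bounds, bounds[1:])]

# Shuffle the block order; positions inside each block stay in their original order
random.shuffle(ranges)

positions = [pos for r in ranges for pos in r]
shuffled = df.iloc[positions]
print(shuffled)
```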