How to split a large dataset into smaller datasets based on the type of object in the data



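In my code, the user supplies a text file. The file has 4 columns, and the number of rows varies with the file that is loaded, so the code has to be generic. In the array generated from the text file, the first column holds the animal type, the second its X position in the field, the third its Y position, and the fourth its Z position. In case you don't want to follow the link to the picture of the data, here is the code that loads the data and a copy of the returned array: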

import numpy as np

#load the data
emplaced_animals_data = np.genfromtxt('animal_data.txt', skip_header = 1, dtype = str)
print(type(emplaced_animals_data))
print(emplaced_animals_data)
[['butterfly' '1' '1' '3']
['butterfly' '2' '2' '3']
['butterfly' '3' '3' '3']
['dragonfly' '4' '1' '1']
['dragonfly' '5' '2' '1']
['dragonfly' '6' '3' '1']
['cat' '4' '4' '2']
['cat' '5' '5' '2']
['cat' '6' '6' '2']
['cat' '7' '8' '3']
['elephant' '8' '9' '3']
['elephant' '9' '10' '4']
['elephant' '10' '10' '4']
['camel' '10' '11' '5']
['camel' '11' '6' '5']
['camel' '12' '5' '6']
['camel' '12' '3' '6']
['bear' '13' '13' '7']
['bear' '5' '15' '7']
['bear' '4' '10' '5']
['bear' '6' '9' '2']
['bear' '15' '13' '1']
['dog' '1' '3' '9']
['dog' '2' '12' '8']
['dog' '3' '10' '1']
['dog' '4' '8' '1']]

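After the data is loaded, there will always be two types of animal in it that we don't care about, so I removed the names of those animals from the first column, but I'm not sure how to remove the data for the entire row. How can I extend the selection from the animal type to its location as well, and delete the whole row for the unwanted animals? I have included some images to show the output of what I have done so far.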

#Removes unwanted animals from list
print('Original list:', emplaced_animals_data[:,0])
all_the_animals = list(emplaced_animals_data[:,0])
Butterfly = set('butterfly')
Dragonfly = set('dragonfly')
for i in range(0, len(emplaced_animals_data)):
    for animal in all_the_animals:
        if Butterfly == set(animal):
            all_the_animals.remove(animal)
        if Dragonfly == set(animal):
            all_the_animals.remove(animal)
print('Updated list:', all_the_animals)

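Next, I want to take the remaining animals and sort each animal, together with its location data, into its own array that is saved to a variable. So far I can only sort the animal types into their own arrays. How can I extend my selection of animals to include their locations, and save each animal with its locations into its own array according to animal type?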

#Groups all of the items with the same name together
setofanimals = set(all_the_animals)
animal_groups = {}
for one in setofanimals:
    ids = [one for i in emplaced_animals_data[:,0] if i == one]
    animal_groups.update({one:ids})
for one in animal_groups:
    print(one, ":", animal_groups[one])

My end goal is to be able to plot every occurrence of each animal, regardless of which text file is loaded.

Here is the data I am working with, copied from an Excel spreadsheet and saved as a text file:

Data

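The following functions should do it. Your input txt can be of any length, and both functions remove or select rows according to the list of animals you pass in: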

import numpy as np

# note that my delimiter is a tab, which might be different from yours
emplaced_animals = np.genfromtxt('animals.txt', skip_header=1, dtype=str, delimiter='\t')
listed_animals = ['cat', 'dog', 'bear', 'camel', 'elephant']

def get_specific_animals_from(list_of_all_animals, specific_animals):
    """get an array only containing rows of the specific animals"""
    # collect whole rows in a list so the result stays 2-D (one row per animal)
    list_of_specific_animals = []
    for specific_animal in specific_animals:
        for animal in list_of_all_animals:
            if animal[0] == specific_animal:
                list_of_specific_animals.append(animal)
    return np.array(list_of_specific_animals)

def delete_specific_animals_from(list_of_all_animals, bad_animals):
    """
    delete all rows of bad_animal in provided list
    takes in a list of bad animals e.g. ['dragonfly', 'butterfly']
    returns list of only desired animals
    """
    all_useful_animals = list_of_all_animals
    positions_of_bad_animals = []
    for n, animal in enumerate(list_of_all_animals):
        if animal[0] in bad_animals:
            positions_of_bad_animals.append(n)
    if len(positions_of_bad_animals):
        for position in sorted(positions_of_bad_animals, reverse=True):
            # reverse is important
            # without it, list positions change as you delete items
            all_useful_animals = np.delete(all_useful_animals, (position), 0)
    return all_useful_animals

emplaced_animals = delete_specific_animals_from(emplaced_animals, ['dragonfly', 'butterfly'])
list_of_elephants = get_specific_animals_from(emplaced_animals, ['elephant'])
list_of_needed_animals = get_specific_animals_from(emplaced_animals, listed_animals)
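
As a possible alternative to the row-by-row loops above, the same filtering can probably be done with boolean masks. The sketch below is only an illustration under assumptions: the small in-memory array stands in for what np.genfromtxt returns with dtype=str, and the helper names drop_animals and group_positions are hypothetical, not part of the original answer. It also converts the position columns to floats, since they are loaded as strings and would need to be numeric for plotting.

```python
import numpy as np

# Hypothetical stand-in for the array returned by np.genfromtxt(..., dtype=str)
emplaced_animals = np.array([
    ['butterfly', '1', '1', '3'],
    ['dragonfly', '4', '1', '1'],
    ['cat', '4', '4', '2'],
    ['cat', '5', '5', '2'],
    ['elephant', '8', '9', '3'],
])

def drop_animals(rows, unwanted):
    """Return the rows whose first column is NOT one of the unwanted names."""
    keep = ~np.isin(rows[:, 0], unwanted)
    return rows[keep]

def group_positions(rows):
    """Map each animal name to an (n, 3) float array of its X, Y, Z positions."""
    return {
        str(name): rows[rows[:, 0] == name, 1:].astype(float)
        for name in np.unique(rows[:, 0])
    }

useful = drop_animals(emplaced_animals, ['butterfly', 'dragonfly'])
groups = group_positions(useful)
for name, xyz in groups.items():
    print(name, xyz.tolist())
```

Because the groups are built from np.unique on the first column, no animal names have to be hard-coded, so this should work for whatever text file is loaded as long as it has the same four-column layout.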

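I don't know if this is exactly what you want, but take a look. First, regarding your comment, you may have to change the delimiter to "," or ";". The code has been tested and works well with a comma-separated text file.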

Input (.txt):

Animals,Xlocation,Ylocation,Zlocation
butterfly,1,1,3
butterfly,2,2,3
butterfly,3,3,3
dragonfly,4,1,1
dragonfly,5,2,1
dragonfly,6,3,1
cat,4,4,2
cat,5,5,2
cat,6,6,2
cat,7,8,3
elephant,8,9,3
elephant,9,10,4
elephant,10,10,4
camel,10,11,5
camel,11,6,5
camel,12,5,6
camel,12,3,6
bear,13,13,7
bear,5,15,7
bear,4,10,5
bear,6,9,2
bear,15,13,1
dog,1,3,9
dog,2,12,8
dog,3,10,1
dog,4,8,1

Code:

import pandas

def main():
    # raw string, so the backslashes in the Windows path are not read as escapes
    result = readFile(r"C:\Users\Desktop\animals.txt")
    # Array of animals to remove from main list
    to_remove = ["butterfly", "dragonfly"]
    # returns a new list with all rows except the 'to_remove animals'
    useful_animals = [one for one in result if one["Animals"] not in to_remove]
    cats = get_animal_group(useful_animals, "cat")
    camels = get_animal_group(useful_animals, "camel")

# returns a new list with all rows where animals_list match given animal
def get_animal_group(animal_list, animal):
    return [one for one in animal_list if one["Animals"] == animal]

def readFile(path):
    # From this you get a list of dict which is much easier to handle
    result = pandas.read_csv(path, encoding="utf-8",
                             usecols=["Animals", "Xlocation", "Ylocation", "Zlocation"]).to_dict("records")
    return result

if __name__ == "__main__":
    main()

Output:

# for animal in useful_animals:
{'Animals': 'cat', 'Xlocation': 4, 'Ylocation': 4, 'Zlocation': 2.0}
{'Animals': 'cat', 'Xlocation': 5, 'Ylocation': 5, 'Zlocation': 2.0}
{'Animals': 'cat', 'Xlocation': 6, 'Ylocation': 6, 'Zlocation': 2.0}
{'Animals': 'cat', 'Xlocation': 7, 'Ylocation': 8, 'Zlocation': 3.0}
{'Animals': 'elephant', 'Xlocation': 8, 'Ylocation': 9, 'Zlocation': 3.0}
{'Animals': 'elephant', 'Xlocation': 9, 'Ylocation': 10, 'Zlocation': 4.0}
{'Animals': 'elephant', 'Xlocation': 10, 'Ylocation': 10, 'Zlocation': 4.0}
{'Animals': 'camel', 'Xlocation': 10, 'Ylocation': 11, 'Zlocation': 5.0}
{'Animals': 'camel', 'Xlocation': 11, 'Ylocation': 6, 'Zlocation': 5.0}
{'Animals': 'camel', 'Xlocation': 12, 'Ylocation': 5, 'Zlocation': 6.0}
{'Animals': 'camel', 'Xlocation': 12, 'Ylocation': 3, 'Zlocation': 6.0}
{'Animals': 'bear', 'Xlocation': 13, 'Ylocation': 13, 'Zlocation': 7.0}
{'Animals': 'bear', 'Xlocation': 5, 'Ylocation': 15, 'Zlocation': 7.0}
{'Animals': 'bear', 'Xlocation': 4, 'Ylocation': 10, 'Zlocation': 5.0}
{'Animals': 'bear', 'Xlocation': 6, 'Ylocation': 9, 'Zlocation': 2.0}
{'Animals': 'bear', 'Xlocation': 15, 'Ylocation': 13, 'Zlocation': 1.0}
{'Animals': 'dog', 'Xlocation': 1, 'Ylocation': 3, 'Zlocation': 9.0}
{'Animals': 'dog', 'Xlocation': 2, 'Ylocation': 12, 'Zlocation': 8.0}
{'Animals': 'dog', 'Xlocation': 3, 'Ylocation': 10, 'Zlocation': 1.0}
{'Animals': 'dog', 'Xlocation': 4, 'Ylocation': 8, 'Zlocation': 1.0}
# for cat in cats:
{'Animals': 'cat', 'Xlocation': 4, 'Ylocation': 4, 'Zlocation': 2.0}
{'Animals': 'cat', 'Xlocation': 5, 'Ylocation': 5, 'Zlocation': 2.0}
{'Animals': 'cat', 'Xlocation': 6, 'Ylocation': 6, 'Zlocation': 2.0}
{'Animals': 'cat', 'Xlocation': 7, 'Ylocation': 8, 'Zlocation': 3.0}
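
Since get_animal_group has to be called once per animal name, one way to get every group without naming the animals in advance might be a dict keyed by animal. The following is a rough sketch using only the standard library; the sample records are made up to match the shape of the dicts shown above, and group_by_animal is a hypothetical helper, not part of the answer's code.

```python
from collections import defaultdict

# Hypothetical sample in the same shape as the records returned by readFile
records = [
    {"Animals": "butterfly", "Xlocation": 1, "Ylocation": 1, "Zlocation": 3},
    {"Animals": "cat", "Xlocation": 4, "Ylocation": 4, "Zlocation": 2},
    {"Animals": "cat", "Xlocation": 5, "Ylocation": 5, "Zlocation": 2},
    {"Animals": "dog", "Xlocation": 1, "Ylocation": 3, "Zlocation": 9},
]

def group_by_animal(rows, to_remove=()):
    """Group rows by their "Animals" value, skipping any name in to_remove."""
    groups = defaultdict(list)
    for row in rows:
        if row["Animals"] not in to_remove:
            groups[row["Animals"]].append(row)
    return dict(groups)

animal_groups = group_by_animal(records, to_remove=["butterfly", "dragonfly"])
for name, rows in animal_groups.items():
    # each group keeps the full rows, so the positions stay with the animal
    print(name, ":", [(r["Xlocation"], r["Ylocation"], r["Zlocation"]) for r in rows])
```

Each value in animal_groups is the list of full rows for one animal, so the X, Y and Z values for a given animal can be pulled out of its group when plotting.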

If you have any other questions, feel free to ask (in the comments).

Regards
