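I have a task to create sets of dates based on specific conditions. For example, "greater than 2" will be passed in, and I need to create a set of all dates in the month whose day is > 2. I will also be given a start time and a stop time, e.g. 10 am to 6 pm. In that case I create a set of all dates whose day is > 2, each with a time window that starts at 10 am and ends at 6 pm. Here is an example: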
greater > 2, less < 9
start time: 10 am
stop time: 6 pm
month: July
date1: 2016-07-03 10:00, 2016-07-03 18:00
date2: 2016-07-04 10:00, 2016-07-04 18:00
date3: 2016-07-05 10:00, 2016-07-05 18:00
.
.
.
date6: 2016-07-08 10:00, 2016-07-08 18:00
I decided to store these dates in a dictionary like this:
dictD = {'dates_between_2_9': [['2016-07-03 10:00', '2016-07-03 18:00'], ['2016-07-04 10:00', '2016-07-04 18:00'], ..., ['2016-07-08 10:00', '2016-07-08 18:00']]}
I use a dictionary because I will have multiple conditions that I need to create date sets for, so there will be other keys besides dates_between_2_9.
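A minimal sketch of how such a dictionary could be built with real `datetime` objects instead of text, which makes the later comparisons straightforward. The helper name `build_windows` and its parameters are invented for this illustration, and it assumes that `greater` and `less` are exclusive day-of-month bounds and that 6 pm means hour 18.

```python
from datetime import datetime

def build_windows(year, month, greater, less, start_hour, stop_hour):
    """Return one [start, stop] pair for each day with greater < day < less."""
    return [
        [datetime(year, month, day, start_hour),
         datetime(year, month, day, stop_hour)]
        for day in range(greater + 1, less)
    ]

# Hypothetical usage for the example above: days 3 to 8 of July 2016.
dictD = {'dates_between_2_9': build_windows(2016, 7, 2, 9, 10, 18)}

for start, stop in dictD['dates_between_2_9']:
    print(start, stop)
```

Each additional condition would simply add another key with its own list of windows.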
On the other hand, I also get another condition-based request to create dates that only have a start time, like this:
greater > 1, less < 12
start time : 2pm
date1: 2016-07-02 14:00
date2: 2016-07-03 14:00
date3: 2016-07-04 14:00
.
.
.
date10: 2016-07-11 14:00
I decided to store these dates in a list:
listL = ['2016-07-02 14:00', '2016-07-03 14:00', '2016-07-04 14:00', ..., '2016-07-11 14:00']
After that, I compare each date in listL against the list of dates under every key in dictD. If a date from listL falls within a start/stop window, I should remove it from the list and return only the dates from listL that do not overlap with the dates in dictD. My logic is as follows:
def remove_overlaps(listL, dictD):
    overlapped = set()
    for L in listL:
        for key in dictD:
            for item in dictD[key]:
                # check if the date from the list overlaps the start, stop time from the dictionary
                if item[0] < L < item[1]:
                    # I know I can't remove items from a list while iterating over it,
                    # so I collect the overlapped items in a set and filter them out afterwards
                    overlapped.add(L)
    return [L for L in listL if L not in overlapped]
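My question is: am I using the right data structures for my needs? I can see that my logic is not very efficient, so I would like to know whether there is a better way to approach this problem.
Any help would be greatly appreciated. Thanks in advance!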
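It sounds like you are trying to optimize your algorithm. Honestly, with data of this size it is probably not necessary. However, if you are interested, the general rule of thumb is that in Python sets are faster than lists when checking for membership.
In this case, it is not clear what your sets would be. I have assumed that you need granularity at the minute level at most, but you could go finer (at the cost of more memory), or reduce memory use and improve performance by choosing a coarser granularity such as hours. This code shows that even a relatively large set can be at least 5 times faster (and the comparison of the data sets looks a little simpler as well):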
from copy import copy
from datetime import datetime, timedelta
from timeit import timeit
import time

def make_range(start, open, close, days):
    result = []
    base_start = start + open
    base_close = start + close
    while days > 0:
        result.append([base_start, base_close])
        base_start += timedelta(days=1)
        base_close += timedelta(days=1)
        days -= 1
    return result

def make_range2(start, open, close, days):
    result = set()
    base_start = start + open
    base_close = start + close
    while days > 0:
        now = base_start
        while now <= base_close:
            result.add(now)
            now += timedelta(minutes=1)
        base_start += timedelta(days=1)
        base_close += timedelta(days=1)
        days -= 1
    return result

dateRange = {
    'range1': make_range(datetime(2016, 7, 3, 0, 0),
                         timedelta(hours=10),
                         timedelta(hours=18),
                         6),
}
dateRange2 = {
    'range1': make_range2(datetime(2016, 7, 3, 0, 0),
                          timedelta(hours=10),
                          timedelta(hours=18),
                          6),
}
dateList = [
    datetime(2016, 7, 2, 14, 0),
    datetime(2016, 7, 3, 14, 0),
    datetime(2016, 7, 4, 14, 0),
    datetime(2016, 7, 5, 14, 0),
    datetime(2016, 7, 6, 14, 0),
    datetime(2016, 7, 7, 14, 0),
    datetime(2016, 7, 8, 14, 0),
    datetime(2016, 7, 9, 14, 0),
    datetime(2016, 7, 10, 14, 0),
    datetime(2016, 7, 11, 14, 0)
]
dateSet = set(dateList)

def f1():
    result = copy(dateList)
    for a in dateList:
        for b in dateRange:
            for i in dateRange[b]:
                if i[0] <= a <= i[1]:
                    result.remove(a)
    return result

def f2():
    result = copy(dateSet)
    for b in dateRange2:
        result = result.difference(dateRange2[b])
    return result

print(f1())
print(timeit("f1()", "from __main__ import f1", number=100000))
print(f2())
print(timeit("f2()", "from __main__ import f2", number=100000))
For the record, the results are as follows:
[datetime.datetime(2016, 7, 2, 14, 0), datetime.datetime(2016, 7, 9, 14, 0), datetime.datetime(2016, 7, 10, 14, 0), datetime.datetime(2016, 7, 11, 14, 0)]
1.922587754837455
{datetime.datetime(2016, 7, 2, 14, 0), datetime.datetime(2016, 7, 9, 14, 0), datetime.datetime(2016, 7, 10, 14, 0), datetime.datetime(2016, 7, 11, 14, 0)}
0.30558400587733225
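The granularity trade-off can be sketched by making the step size a parameter. The function name `expand_window` and its `step` argument are assumptions made for this illustration and are not part of the timed code. Note the catch: with a set, a timestamp only matches if it lands exactly on the chosen step.

```python
from datetime import datetime, timedelta

def expand_window(start, stop, step=timedelta(minutes=1)):
    """Return the set of timestamps from start to stop (inclusive), spaced by step."""
    result = set()
    now = start
    while now <= stop:
        result.add(now)
        now += step
    return result

start = datetime(2016, 7, 3, 10)
stop = datetime(2016, 7, 3, 18)

by_minute = expand_window(start, stop)
by_hour = expand_window(start, stop, step=timedelta(hours=1))

# Far fewer entries at hour granularity: 481 versus 9 for one day.
print(len(by_minute), len(by_hour))

# A half-hour timestamp is found at minute granularity but missed at hour granularity.
half_past = datetime(2016, 7, 3, 14, 30)
print(half_past in by_minute, half_past in by_hour)
```

So a coarser step is only safe when every date being checked is known to fall on that step, as the 14:00 dates in this example do.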
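You could also convert the dictionary dateRange to a list, but with only 1 or 2 members this is unlikely to make any real difference in performance. It does make more logical sense, however, since you are not actually using the dictionary to look up any specific key; you are just iterating over all of its values.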
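As a rough sketch of that idea, the windows can be kept in one flat list and the filtering done with `any()`. The names `windows`, `dates`, and `without_overlaps` are invented for this example, and the data simply mirrors the values used above.

```python
from datetime import datetime

# Flat list of (start, stop) windows: 10:00 to 18:00 on 3-8 July 2016.
windows = [
    (datetime(2016, 7, day, 10), datetime(2016, 7, day, 18))
    for day in range(3, 9)
]

# Candidate dates: 14:00 on 2-11 July 2016.
dates = [datetime(2016, 7, day, 14) for day in range(2, 12)]

def without_overlaps(dates, windows):
    """Keep only the dates that fall inside none of the windows."""
    return [
        d for d in dates
        if not any(start <= d <= stop for start, stop in windows)
    ]

remaining = without_overlaps(dates, windows)
print(remaining)
```

This keeps the interval representation (no per-minute expansion), so it works for timestamps at any granularity, at the cost of scanning the windows for each date.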
Frankly, I am not sure I understand what your question is, but I tried this:
for date in dateList:
    for everyrange in dateRange:
        find = False
        for i in dateRange[everyrange]:
            # print('date={date} ,key={everyrange},i={i}'.format(date=date, everyrange=everyrange, i=i))
            if i[0] <= date <= i[1]:
                print(date)
                find = True
                break
            else:
                print(0)
        if find:
            break
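I am not sure I fully understand your question, but I assume you want to find the dates in the 'dateList' list that fall within specific ranges in the 'dateRange' dict.
I tried to build my code around your logic. This should work:
for date in dateList:
    for key, value in dateRange.items():
        for i in range(0, len(value)):
            if date >= value[i][0] and date <= value[i][1]:
                print('The date:', date, 'lies between the data points:', value[i][0], 'and', value[i][1], 'in', key)
In your data, the dateRange dict contains keys ('range') and values, which are lists of 2 datetime objects. With the code I have provided, the dateRange dict can have as many keys as you like, and the value of each key can contain as many lists of datetime objects as you like.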
I tried this example based on your requirements and it works well =). The algorithm is very similar to the one you posted; the only difference is at the end. I chose to create a new list, which would be returned by the function you are building.
Here is the code:
list_1 = ['a 1', 'a 2', 'a 3', 'a 4', 'a 5', 'b 1', 'b 2', 'b 3', 'b 4', 'b 5', 'c 1', 'c 2', 'c 3', 'c 4', 'c 5']
dict = {'example_between_2_5': [['a 3', 'a 4'], ['b 3', 'b 4'], ['c 3', 'c 4']]}
new_list = []
# Defining the number of repetitions based on how many 'lists' inside the dict you have.
for x in range(0, len(dict['example_between_2_5'])):
    dict_list_elements = dict['example_between_2_5'][x]
    # Defining the number of repetitions based on the elements inside the list of the dict.
    for y in range(0, len(dict_list_elements)):
        # Picking the element
        dict_list_element = dict_list_elements[y]
        for z in range(0, len(list_1)):
            # Comparing to all elements in list_1
            if dict_list_element == list_1[z]:
                # The element is appended only if it is not already in the new list
                if list_1[z] not in new_list:
                    new_list.append(list_1[z])
# Printing the result just to check that it worked.
print("list_1: ", list_1)
print("New_list: ", new_list)
Hope this helps =)
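I am still not entirely sure what you are trying to achieve, but please take a look at this code and tell me whether it is what you want.
You can also enter the month.
The list named list1 is the equivalent of your dictionary dictD.
The list named list2 is the equivalent of the list listL. It contains only those dates that do not overlap with the dates in list1 (dictD).
Here is the code: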
from datetime import datetime

# Converts 12-hour (am/pm) to 24-hour format
def get_time(time):
    # % 12 maps 12am to 0 and lets 12pm come out as 12 after the offset below
    digit = int(time[0:-2]) % 12
    if time[-2:] == 'am':
        return digit
    else:
        return digit + 12

month_number = {
    'january': 1, 'february': 2, 'march': 3, 'april': 4, 'may': 5, 'june': 6,
    'july': 7, 'august': 8, 'september': 9, 'october': 10, 'november': 11, 'december': 12
}
gt1 = int(input('Enter first set\ngreater > '))
lt1 = int(input('less < '))
start1 = input('start time: ')
stop1 = input('stop time: ')
month1 = input('month: ')
gt2 = int(input('\nEnter second set\ngreater > '))
lt2 = int(input('less < '))
start2 = input('start time: ')
month2 = input('month: ')
list1 = []
list2 = []
today = datetime.today()
start1 = get_time(start1)
stop1 = get_time(stop1)
start2 = get_time(start2)
key = 'dates_between_%s_%s' % (gt1, lt1)
# First set: one [start, stop] pair per day
for i in range(gt1 + 1, lt1):
    list1.append(
        [
            datetime(today.year, month_number[month1], i, start1, 0).strftime("%Y-%m-%d %H:%M"),
            datetime(today.year, month_number[month1], i, stop1, 0).strftime("%Y-%m-%d %H:%M")
        ]
    )
# Second set: skip any date that falls inside a window of the first set
for i in range(gt2 + 1, lt2):
    if (month1 == month2) and (gt1 < i < lt1) and (start1 < start2 < stop1):
        pass
    else:
        list2.append(datetime(today.year, month_number[month2], i, start2, 0).strftime("%Y-%m-%d %H:%M"))
print('List1:\n', list1)
print('\nList2:\n', list2)