Django - bypassing loading code in the app's ready function when making or applying database migrations



For starters, I have searched for this before and had no luck finding anything about it.

Background:

I have created an inventory application for work to help my team quickly view statistics about our IT infrastructure. I start a few threads when the application loads to kick off some scraping functions. That code works great, but it also kicks off whenever I make and apply database migrations (manage.py makemigrations & migrate).

Goal:

I would like the scraping code to start only when the runserver command is issued (manage.py runserver). That way I do not have resources competing between migration activity and scraping activity. It also frequently throws a lot of errors, because sometimes not all of the database models/fields exist in the database yet.

Ideas:

  1. Modify the installed django code to introduce a flag that gets checked before the scraping code runs. Not preferred: it would be overwritten whenever I update django, and it would not carry over between my dev and prod servers.

  2. Find a way to check which command is being run through manage.py and add a check so the scraping only starts when that command runs. Preferred: it lives in my codebase and can easily move between dev and prod instances (a rough sketch of what I mean is below).

I am open to other ways of accomplishing this as well. If there is a different way to kick off the scraping activity when the application starts, that could work too. The apps.ready function is the only thing I have found that runs when the application starts.
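To make idea 2 concrete, this is roughly the kind of check I have in mind inside the app config. It is only a sketch; the app label and config class name are placeholders, not my actual code:

import sys

from django.apps import AppConfig

class InventoryConfig(AppConfig):
    name = "inventory"  # placeholder app label

    def ready(self):
        # only start the scraping threads for the dev server,
        # never for makemigrations / migrate / shell / etc.
        # (the runserver autoreloader loads the app twice, so an extra
        # guard may be needed to avoid starting the threads twice)
        if "runserver" in sys.argv:
            from .threading.scraping import TimerScrape
            TimerScrape()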

Edit: here is what is inside the apps.ready() function:

def ready(self):
    set_default_database_items()
    if environment == "prod":
        from .threading.scraping import TimerScrape
        from .threading.keep_alive import KeepAliveThread
        TimerScrape()
        KeepAliveThread(1)
        KeepAliveThread(2)

And here is the TimerScrape() thread:

def run(self):
    sleep(60)
    while True:
        idle = True
        vcenters = Vcenter.objects.all()
        connection.close()
        netapps = StorageSystem.objects.all()
        connection.close()
        rubriks = BackupSystem.objects.all()
        connection.close()
        current_time = datetime.now(timezone.utc)
        # get list of current threads and their names
        threads = enumerate()
        thread_list = []
        for thread in threads:
            thread_list.append(thread.name)
        # go through each vCenter and start scrape
        for vc in vcenters:
            thread_name = vc.name + "_thread"
            if thread_name not in thread_list:
                if vc.last_updated is None:
                    self.vcscrape(vc.name, vc.user, vc.password)
                elif vc.last_updated is not None:
                    time_difference = current_time - vc.last_updated
                    if time_difference.seconds > 14400:
                        self.vcscrape(vc.name, vc.user, vc.password)
                    else:
                        print("vCenters: Too soon to update vCenter " + vc.name)
            else:
                idle = False
                print("vCenter " + vc.name + " update is in progress")
        # go through each NetApp and start scrape
        for cluster in netapps:
            thread_name = cluster.name + "_thread"
            if thread_name not in thread_list:
                if cluster.last_updated is None:
                    self.netappscrape(cluster.name, cluster.user, cluster.password)
                elif cluster.last_updated is not None:
                    time_difference = current_time - cluster.last_updated
                    if time_difference.seconds > 14400:
                        self.netappscrape(cluster.name, cluster.user, cluster.password)
                    else:
                        print("Clusters: Too soon to update Cluster " + cluster.name)
            else:
                idle = False
                print("Cluster " + cluster.name + " update is in progress")
        # go through each Rubrik and start scrape
        for cluster in rubriks:
            thread_name = "backup_" + cluster.name + "_thread"
            if thread_name not in thread_list:
                if cluster.last_updated is None:
                    self.rubrikscrape(cluster.name, cluster.user, cluster.password)
                elif cluster.last_updated is not None:
                    time_difference = current_time - cluster.last_updated
                    if time_difference.seconds > 14400:
                        self.rubrikscrape(cluster.name, cluster.user, cluster.password)
                    else:
                        print("Backups: Too soon to update Cluster " + cluster.name)
            else:
                idle = False
                print("Backups " + cluster.name + " update is in progress")
        if idle:
            platforms = Platform.objects.all()
            connection.close()
            applications = Application.objects.all()
            connection.close()
            functions = Function.objects.all()
            connection.close()
            regions = Region.objects.all()
            connection.close()
            sites = Site.objects.all()
            connection.close()
            environments = Environment.objects.all()
            connection.close()
            tag_reports = TagsReport.objects.all()
            connection.close()
            for obj in platforms:
                thread_name = "Tag_report_" + "platform_" + obj.name + "_thread"
                if thread_name not in thread_list:
                    if obj.last_updated is None:
                        self.tagscrape(obj, "platform")
                    elif obj.last_updated is not None:
                        time_difference = current_time - obj.last_updated
                        if time_difference.seconds > 14400:
                            self.tagscrape(obj, "platform")
                        else:
                            print("Too soon to update platform " + obj.name)
            for obj in applications:
                thread_name = "Tag_report" + "application_" + obj.name + "_thread"
                if thread_name not in thread_list:
                    if obj.last_updated is None:
                        self.tagscrape(obj, "application")
                    elif obj.last_updated is not None:
                        time_difference = current_time - obj.last_updated
                        if time_difference.seconds > 14400:
                            self.tagscrape(obj, "application")
                        else:
                            print("Too soon to update application " + obj.name)
            for obj in functions:
                thread_name = "Tag_report" + "function_" + obj.name + "_thread"
                if thread_name not in thread_list:
                    if obj.last_updated is None:
                        self.tagscrape(obj, "function")
                    elif obj.last_updated is not None:
                        time_difference = current_time - obj.last_updated
                        if time_difference.seconds > 14400:
                            self.tagscrape(obj, "function")
                        else:
                            print("Too soon to update function " + obj.name)
            for obj in regions:
                thread_name = "Tag_report" + "region_" + obj.name + "_thread"
                if thread_name not in thread_list:
                    if obj.last_updated is None:
                        self.tagscrape(obj, "region")
                    elif obj.last_updated is not None:
                        time_difference = current_time - obj.last_updated
                        if time_difference.seconds > 14400:
                            self.tagscrape(obj, "region")
                        else:
                            print("Too soon to update region " + obj.name)
            for obj in sites:
                thread_name = "Tag_report" + "site_" + obj.name + "_thread"
                if thread_name not in thread_list:
                    if obj.last_updated is None:
                        self.tagscrape(obj, "site")
                    elif obj.last_updated is not None:
                        time_difference = current_time - obj.last_updated
                        if time_difference.seconds > 14400:
                            self.tagscrape(obj, "site")
                        else:
                            print("Too soon to update site " + obj.name)
            for obj in environments:
                thread_name = "Tag_report" + "environment_" + obj.name + "_thread"
                if thread_name not in thread_list:
                    if obj.last_updated is None:
                        self.tagscrape(obj, "environment")
                    elif obj.last_updated is not None:
                        time_difference = current_time - obj.last_updated
                        if time_difference.seconds > 14400:
                            self.tagscrape(obj, "environment")
                        else:
                            print("Too soon to update environment " + obj.name)
            for obj in tag_reports:
                thread_name = "Missing_tags_report_thread"
                if thread_name not in thread_list:
                    if obj.last_updated is None:
                        self.missing_tag_scrape(obj)
                    elif obj.last_updated is not None:
                        time_difference = current_time - obj.last_updated
                        if time_difference.seconds > 14400:
                            self.missing_tag_scrape(obj)
                        else:
                            print("Too soon to update missing tags reports")
        sleep(900)

A little explanation: the idea is that every 15 minutes (the sleep(900) at the bottom of the loop) this thread checks whether 4 hours (14400 seconds) have passed since each of these items was last updated. If so, and no scrape thread is already running for that object, it kicks off a new scrape job to refresh the information in the database.

If no scrapes are running, it also lets some reports run, again only if 4 hours have passed since the last run.

This thread is self-contained, and, as I mention in my answer below, I found that by sticking a 60-second sleep timer at the start of the thread I can avoid the scrapes firing while apps.ready() is loaded at application startup.

The fact that you are trying to bypass the code means you have put it in the wrong place. Starting the scraper does not belong in apps.ready, because you only want it to run in some situations and not in others.

Also, kicking off the scraping code when executing ./manage.py runserver sounds like a bad idea. runserver should only be used for development, never as the actual web service in production. Instead, deploy a real web server such as Apache or Nginx.

You can start the scraping code from a cron job or another scheduler. If starting these scrapes is complicated, a bash script or a management command is a good way to encapsulate the work that needs to be done.
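For instance, a skeleton management command along these lines (the file path, app name, and command name are only illustrative) would let a scheduler start the scrapes with python manage.py runscrapes, completely outside of apps.ready():

# inventory/management/commands/runscrapes.py -- illustrative skeleton only
from django.core.management.base import BaseCommand

class Command(BaseCommand):
    help = "Start the scraping threads outside of apps.ready()"

    def handle(self, *args, **options):
        # import here so makemigrations / migrate never touch the scrapers
        from inventory.threading.scraping import TimerScrape
        from inventory.threading.keep_alive import KeepAliveThread

        TimerScrape()
        KeepAliveThread(1)
        KeepAliveThread(2)
        self.stdout.write("Scraping threads started")

Assuming the threads are not daemon threads, the command simply stays alive as its own long-running process, so whatever scheduler or service manager you already use can launch it independently of migrations and of the web server.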

After mulling this over, I found a simple solution: introduce a sleep timer at the start of the thread that manages the scraping functions. It lets me run the makemigrations and migrate commands without the scraping threads kicking off immediately.

I also looked into using Apache and mod_wsgi, but the dev and prod instances both run on Windows, and mod_wsgi is difficult to get working on Windows. I may eventually move this to a unix host, but that would involve extra work I do not want to take on right now.
