DPC_WATCHDOG_VIOLATION(133/1)可能与NdisFIndicateReceiveNetBuffe



我们有一个NDIS LWF驱动程序,在一台机器上,当他们试图连接到他们的VPN连接到互联网时,我们得到一个DPC_WATCHDOG_VIOLATION 133/1错误检查。这可能与我们的NdisFIndicateReceiveNetBufferLists有关,因为IRQL在调用它之前被提升为DISPATCH(并且显然降低到之后的值),这确实出现在!dpcwatchdog的输出中,如下所示。这是由于这里解释的另一个错误的解决方案:

IRQL_UNEXPECTED_VALUE NdisFIndicateReceiveNetBufferLists后的BSOD ?

现在是bugcheck:

*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************
DPC_WATCHDOG_VIOLATION (133)
The DPC watchdog detected a prolonged run time at an IRQL of DISPATCH_LEVEL
or above.
Arguments:
Arg1: 0000000000000001, The system cumulatively spent an extended period of time at
DISPATCH_LEVEL or above. The offending component can usually be
identified with a stack trace.
Arg2: 0000000000001e00, The watchdog period.
Arg3: fffff805422fb320, cast to nt!DPC_WATCHDOG_GLOBAL_TRIAGE_BLOCK, which contains
additional information regarding the cumulative timeout
Arg4: 0000000000000000
STACK_TEXT:  
nt!KeBugCheckEx
nt!KeAccumulateTicks+0x1846b2
nt!KiUpdateRunTime+0x5d
nt!KiUpdateTime+0x4a1
nt!KeClockInterruptNotify+0x2e3
nt!HalpTimerClockInterrupt+0xe2
nt!KiCallInterruptServiceRoutine+0xa5
nt!KiInterruptSubDispatchNoLockNoEtw+0xfa
nt!KiInterruptDispatchNoLockNoEtw+0x37
nt!KxWaitForSpinLockAndAcquire+0x2c
nt!KeAcquireSpinLockAtDpcLevel+0x5c
wanarp!WanNdisReceivePackets+0x4bb
ndis!ndisMIndicateNetBufferListsToOpen+0x141
ndis!ndisMTopReceiveNetBufferLists+0x3f0e4
ndis!ndisCallReceiveHandler+0x61
ndis!ndisInvokeNextReceiveHandler+0x1df
ndis!NdisMIndicateReceiveNetBufferLists+0x104
ndiswan!IndicateRecvPacket+0x596
ndiswan!ApplyQoSAndIndicateRecvPacket+0x20b
ndiswan!ProcessPPPFrame+0x16f
ndiswan!ReceivePPP+0xb3
ndiswan!ProtoCoReceiveNetBufferListChain+0x442
ndis!ndisMCoIndicateReceiveNetBufferListsToNetBufferLists+0xf6
ndis!NdisMCoIndicateReceiveNetBufferLists+0x11
raspptp!CallIndicateReceived+0x210
raspptp!CallProcessRxNBLs+0x199
ndis!ndisDispatchIoWorkItem+0x12
nt!IopProcessWorkItem+0x135
nt!ExpWorkerThread+0x105
nt!PspSystemThreadStartup+0x55
nt!KiStartSystemThread+0x28

SYMBOL_NAME:  wanarp!WanNdisReceivePackets+4bb
FOLLOWUP_NAME:  MachineOwner
MODULE_NAME: wanarp
IMAGE_NAME:  wanarp.sys

下面是!dpcwatchdog的输出,但我仍然找不到导致这个bug检查的原因,也找不到哪个函数在DISPATCH级别消耗了太多的时间,这导致了这个bug检查。尽管我认为这可能与wanarp完成的一些自旋锁定有关?会不会是wanarp的问题?请注意,我们没有在我们的驱动程序中使用任何自旋锁定,并且我们提高IRQL应该不会引起任何问题,因为实际上在Ndis中指示在IRQL DISPATCH中完成是非常常见的。

那么我怎样才能找到这个bug检查的根本原因呢?在ndis栈中没有其他第三方LWF。


3: kd> !dpcwatchdog 
All durations are in seconds (1 System tick = 15.625000 milliseconds)
Circular Kernel Context Logger history: !logdump 0x2
DPC and ISR stats: !intstats /d
--------------------------------------------------
CPU#0
--------------------------------------------------
Current DPC: No Active DPC
Pending DPCs:
----------------------------------------
CPU Type      KDPC       Function
dpcs: no pending DPCs found
--------------------------------------------------
CPU#1
--------------------------------------------------
Current DPC: No Active DPC
Pending DPCs:
----------------------------------------
CPU Type      KDPC       Function
1: Normal  : 0xfffff80542220e00 0xfffff805418dbf10 nt!PpmCheckPeriodicStart
1: Normal  : 0xfffff80542231d40 0xfffff8054192c730 nt!KiBalanceSetManagerDeferredRoutine
1: Normal  : 0xffffbd0146590868 0xfffff80541953200 nt!KiEntropyDpcRoutine

DPC Watchdog Captures Analysis for CPU #1.
DPC Watchdog capture size: 641 stacks.
Number of unique stacks: 1.
No common functions detected!
The captured stacks seem to indicate that only a single DPC or generic function is the culprit.
Try to analyse what other processors were doing at the time of the following reference capture:
CPU #1 DPC Watchdog Reference Stack (#0 of 641) - Time: 16 Min 17 Sec 984.38 mSec
# RetAddr           Call Site
00 fffff805418d8991  nt!KiUpdateRunTime+0x5D 
01 fffff805418d2803  nt!KiUpdateTime+0x4A1 
02 fffff805418db1c2  nt!KeClockInterruptNotify+0x2E3 
03 fffff80541808a45  nt!HalpTimerClockInterrupt+0xE2 
04 fffff805419fab9a  nt!KiCallInterruptServiceRoutine+0xA5 
05 fffff805419fb107  nt!KiInterruptSubDispatchNoLockNoEtw+0xFA 
06 fffff805418a9a9c  nt!KiInterruptDispatchNoLockNoEtw+0x37 
07 fffff805418da3cc  nt!KxWaitForSpinLockAndAcquire+0x2C 
08 fffff8054fa614cb  nt!KeAcquireSpinLockAtDpcLevel+0x5C 
09 fffff80546ba1eb1  wanarp!WanNdisReceivePackets+0x4BB 
0a fffff80546be0b84  ndis!ndisMIndicateNetBufferListsToOpen+0x141 
0b fffff80546ba7ef1  ndis!ndisMTopReceiveNetBufferLists+0x3F0E4 
0c fffff80546bddfef  ndis!ndisCallReceiveHandler+0x61 
0d fffff80546ba4a94  ndis!ndisInvokeNextReceiveHandler+0x1DF 
0e fffff8057c32d17e  ndis!NdisMIndicateReceiveNetBufferLists+0x104 
0f fffff8057c30d6c7  ndiswan!IndicateRecvPacket+0x596 
10 fffff8057c32d56b  ndiswan!ApplyQoSAndIndicateRecvPacket+0x20B 
11 fffff8057c32d823  ndiswan!ProcessPPPFrame+0x16F 
12 fffff8057c308e62  ndiswan!ReceivePPP+0xB3 
13 fffff80546c5c006  ndiswan!ProtoCoReceiveNetBufferListChain+0x442 
14 fffff80546c5c2d1  ndis!ndisMCoIndicateReceiveNetBufferListsToNetBufferLists+0xF6 
15 fffff8057c2b0064  ndis!NdisMCoIndicateReceiveNetBufferLists+0x11 
16 fffff8057c2b06a9  raspptp!CallIndicateReceived+0x210 
17 fffff80546bd9dc2  raspptp!CallProcessRxNBLs+0x199 
18 fffff80541899645  ndis!ndisDispatchIoWorkItem+0x12 
19 fffff80541852b65  nt!IopProcessWorkItem+0x135 
1a fffff80541871d25  nt!ExpWorkerThread+0x105 
1b fffff80541a00778  nt!PspSystemThreadStartup+0x55 
1c ----------------  nt!KiStartSystemThread+0x28 
--------------------------------------------------
CPU#2
--------------------------------------------------
Current DPC: No Active DPC
Pending DPCs:
----------------------------------------
CPU Type      KDPC       Function
2: Normal  : 0xffffbd01467f0868 0xfffff80541953200 nt!KiEntropyDpcRoutine

DPC Watchdog Captures Analysis for CPU #2.
DPC Watchdog capture size: 641 stacks.
Number of unique stacks: 1.
No common functions detected!
The captured stacks seem to indicate that only a single DPC or generic function is the culprit.
Try to analyse what other processors were doing at the time of the following reference capture:
CPU #2 DPC Watchdog Reference Stack (#0 of 641) - Time: 16 Min 17 Sec 984.38 mSec
# RetAddr           Call Site
00 fffff805418d245a  nt!KeClockInterruptNotify+0x453 
01 fffff80541808a45  nt!HalpTimerClockIpiRoutine+0x1A 
02 fffff805419fab9a  nt!KiCallInterruptServiceRoutine+0xA5 
03 fffff805419fb107  nt!KiInterruptSubDispatchNoLockNoEtw+0xFA 
04 fffff805418a9a9c  nt!KiInterruptDispatchNoLockNoEtw+0x37 
05 fffff805418a9a68  nt!KxWaitForSpinLockAndAcquire+0x2C 
06 fffff8054fa611cb  nt!KeAcquireSpinLockRaiseToDpc+0x88 
07 fffff80546ba1eb1  wanarp!WanNdisReceivePackets+0x1BB 
08 fffff80546be0b84  ndis!ndisMIndicateNetBufferListsToOpen+0x141 
09 fffff80546ba7ef1  ndis!ndisMTopReceiveNetBufferLists+0x3F0E4 
0a fffff80546bddfef  ndis!ndisCallReceiveHandler+0x61 
0b fffff80546be3a81  ndis!ndisInvokeNextReceiveHandler+0x1DF 
0c fffff80546ba804e  ndis!ndisFilterIndicateReceiveNetBufferLists+0x3C611 
0d fffff8054e384d77  ndis!NdisFIndicateReceiveNetBufferLists+0x6E 
0e fffff8054e3811a9  ourdriver+0x4D70 
0f fffff80546ba7d40  ourdriver+0x11A0 
10 fffff8054182a6b5  ndis!ndisDummyIrpHandler+0x100 
11 fffff80541c164c8  nt!IofCallDriver+0x55 
12 fffff80541c162c7  nt!IopSynchronousServiceTail+0x1A8 
13 fffff80541c15646  nt!IopXxxControlFile+0xC67 
14 fffff80541a0aab5  nt!NtDeviceIoControlFile+0x56 
15 ----------------  nt!KiSystemServiceCopyEnd+0x25 
--------------------------------------------------
CPU#3
--------------------------------------------------
Current DPC: No Active DPC
Pending DPCs:
----------------------------------------
CPU Type      KDPC       Function
dpcs: no pending DPCs found

目标机版本:Windows 10 Kernel Version 19041 MP(4 procs)

还注意,如果当前IRQL正在调度,我们也将NDIS_RECEIVE_FLAGS_DISPATCH_LEVEL标志传递给NdisFIndicateReceiveNetBufferLists

Edit1:这也是!locks!qlocks!ready的输出,其中一个资源上的争用计数是49135,这是正常的还是太高了?这和我们的问题有关吗?等待它或拥有它的线程是为正常进程(如chrome、csrss等)服务的。

3: kd> !kdexts.locks 
**** DUMP OF ALL RESOURCE OBJECTS ****
KD: Scanning for held locks.
Resource @ nt!ExpTimeRefreshLock (0xfffff80542219440)    Exclusively owned
Contention Count = 17
Threads: ffffcf8ce9dee640-01<*> 
KD: Scanning for held locks.....
Resource @ 0xffffcf8cde7f59f8    Shared 1 owning threads
Contention Count = 62
Threads: ffffcf8ce84ec080-01<*> 
KD: Scanning for held locks...............................................................................................
Resource @ 0xffffcf8ce08d0890    Exclusively owned
Contention Count = 49135
NumberOfSharedWaiters = 1
NumberOfExclusiveWaiters = 6
Threads: ffffcf8cf18e3080-01<*> ffffcf8ce3faf080-01    
Threads Waiting On Exclusive Access:
ffffcf8ceb6ce080       ffffcf8ce1d20080       ffffcf8ce77f1080       ffffcf8ce92f4080       
ffffcf8ce1d1f0c0       ffffcf8ced7c6080       
KD: Scanning for held locks.
Resource @ 0xffffcf8ce08d0990    Shared 1 owning threads
Threads: ffffcf8cf18e3080-01<*> 
KD: Scanning for held locks.........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
Resource @ 0xffffcf8ceff46350    Shared 1 owning threads
Threads: ffffcf8ce6de8080-01<*> 
KD: Scanning for held locks......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
Resource @ 0xffffcf8cf0cade50    Exclusively owned
Contention Count = 3
Threads: ffffcf8ce84ec080-01<*> 
KD: Scanning for held locks.........................
Resource @ 0xffffcf8cf0f76180    Shared 1 owning threads
Threads: ffffcf8ce83dc080-02<*> 
KD: Scanning for held locks.......................................................................................................................................................................................................................................................
Resource @ 0xffffcf8cf1875cb0    Shared 1 owning threads
Contention Count = 3
Threads: ffffcf8ce89db040-02<*> 
KD: Scanning for held locks.
Resource @ 0xffffcf8cf18742d0    Shared 1 owning threads
Threads: ffffcf8cee5e1080-02<*> 
KD: Scanning for held locks....................................................................................
Resource @ 0xffffcf8cdceeece0    Shared 2 owning threads
Contention Count = 4
Threads: ffffcf8ce3a1c080-01<*> ffffcf8ce5625040-01<*> 
Resource @ 0xffffcf8cdceeed48    Shared 1 owning threads
Threads: ffffcf8ce5625043-02<*> *** Actual Thread ffffcf8ce5625040
KD: Scanning for held locks...
Resource @ 0xffffcf8cf1d377d0    Exclusively owned
Threads: ffffcf8cf0ff3080-02<*> 
KD: Scanning for held locks....
Resource @ 0xffffcf8cf1807050    Exclusively owned
Threads: ffffcf8ce84ec080-01<*> 
KD: Scanning for held locks......
245594 total locks, 13 locks currently held
3: kd> !qlocks 
Key: O = Owner, 1-n = Wait order, blank = not owned/waiting, C = Corrupt
Processor Number
Lock Name         0  1  2  3
KE   - Unused Spare             
MM   - Unused Spare             
MM   - Unused Spare             
MM   - Unused Spare             
CC   - Vacb                     
CC   - Master                   
EX   - NonPagedPool             
IO   - Cancel                   
CC   - Unused Spare             
IO   - Vpb                      
IO   - Database                 
IO   - Completion               
NTFS - Struct                   
AFD  - WorkQueue                
CC   - Bcb                      
MM   - NonPagedPool       

3: kd> !ready
KSHARED_READY_QUEUE fffff8053f1ada00: (00) ****------------------------------------------------------------
SharedReadyQueue fffff8053f1ada00: No threads in READY state
Processor 0: No threads in READY state
Processor 1: Ready Threads at priority 15
THREAD ffffcf8ce9dee640  Cid 2054.2100  Teb: 000000fab7bca000 Win32Thread: 0000000000000000 READY on processor 1
Processor 2: No threads in READY state
Processor 3: No threads in READY state


3: kd> dt nt!_ERESOURCE 0xffffcf8ce08d0890
+0x000 SystemResourcesList : _LIST_ENTRY [ 0xffffcf8c`e08d0610 - 0xffffcf8c`e08cf710 ]
+0x010 OwnerTable       : 0xffffcf8c`ee6e8210 _OWNER_ENTRY
+0x018 ActiveCount      : 0n1
+0x01a Flag             : 0xf86
+0x01a ReservedLowFlags : 0x86 ''
+0x01b WaiterPriority   : 0xf ''
+0x020 SharedWaiters    : 0xffffae09`adcae8e0 Void
+0x028 ExclusiveWaiters : 0xffffae09`a9aabea0 Void
+0x030 OwnerEntry       : _OWNER_ENTRY
+0x040 ActiveEntries    : 1
+0x044 ContentionCount  : 0xbfef
+0x048 NumberOfSharedWaiters : 1
+0x04c NumberOfExclusiveWaiters : 6
+0x050 Reserved2        : (null) 
+0x058 Address          : (null) 
+0x058 CreatorBackTraceIndex : 0
+0x060 SpinLock         : 0
3: kd> dx -id 0,0,ffffcf8cdcc92040 -r1 (*((ntkrnlmp!_OWNER_ENTRY *)0xffffcf8ce08d08c0))
(*((ntkrnlmp!_OWNER_ENTRY *)0xffffcf8ce08d08c0))                 [Type: _OWNER_ENTRY]
[+0x000] OwnerThread      : 0xffffcf8cf18e3080 [Type: unsigned __int64]
[+0x008 ( 0: 0)] IoPriorityBoosted : 0x0 [Type: unsigned long]
[+0x008 ( 1: 1)] OwnerReferenced  : 0x0 [Type: unsigned long]
[+0x008 ( 2: 2)] IoQoSPriorityBoosted : 0x1 [Type: unsigned long]
[+0x008 (31: 3)] OwnerCount       : 0x1 [Type: unsigned long]
[+0x008] TableSize        : 0xc [Type: unsigned long]
3: kd> dx -id 0,0,ffffcf8cdcc92040 -r1 ((ntkrnlmp!_OWNER_ENTRY *)0xffffcf8cee6e8210)
((ntkrnlmp!_OWNER_ENTRY *)0xffffcf8cee6e8210)                 : 0xffffcf8cee6e8210 [Type: _OWNER_ENTRY *]
[+0x000] OwnerThread      : 0x0 [Type: unsigned __int64]
[+0x008 ( 0: 0)] IoPriorityBoosted : 0x1 [Type: unsigned long]
[+0x008 ( 1: 1)] OwnerReferenced  : 0x1 [Type: unsigned long]
[+0x008 ( 2: 2)] IoQoSPriorityBoosted : 0x1 [Type: unsigned long]
[+0x008 (31: 3)] OwnerCount       : 0x0 [Type: unsigned long]
[+0x008] TableSize        : 0x7 [Type: unsigned long]

谢谢你的报道。我已经追踪到这是一个操作系统的错误:在wanarp中有一个死锁。这个问题似乎影响到从Windows Vista开始的每个版本的操作系统。

我已经提交了内部问题任务。ms/42393356跟踪这个问题:如果你有微软的支持合同,你的代表可以让你在这个问题上的状态更新。

同时,你可以部分地解决这个问题:

  • 每次1个数据包(NumberOfNetBufferLists==1);或
  • 指示一次在单个CPU上运行

当2个或更多的cpu同时处理3个或更多的nbl时,wanarp中的错误会暴露。因此,任何一种解决方案都可以避免触发条件。

根据你通过这个网络接口推送的带宽大小,这些选项可能对CPU/电池/吞吐量相当不利。所以请尽量避免悲观的批处理,除非它真的是必要的。(例如,您可以将此选项设置为默认关闭,除非客户特别使用wanarp。)

请注意,您自己无法完全防止这个问题。堆栈中的其他驱动程序,包括NDIS本身,有权将数据包分组在一起,这可能会产生副作用,即重新批处理您小心地取消批处理的数据包。但是,我相信如果您一次只指示一个NBL,或者一次在一个CPU上指示多个NBL,则可以在统计上显著减少崩溃。

对不起,这种事又发生在你身上了!wanarp是……一个非常旧的代码库。

相关内容

  • 没有找到相关文章

最新更新