Azure WebJob, KeyVault configuration extension, socket error



I need some help determining whether this is a bug in my code or in the KeyVault configuration extension.

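I have a .NET Core console-based WebJob. Everything worked fine until a few weeks ago, when we started seeing occasional startup errors. They are socket error 10060 (socket timeout): "A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond."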

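These are all related to loading the configuration layers (app settings, environment, command line, and Key Vault). The error originates from Key Vault when Build is executed on the host builder.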

I initially added a retry policy with the default HttpStatusCodeErrorDetectionStrategy and exponential backoff, but it was never executed.

Eventually I added my own retry policy with my own detection strategy (see below). It still does not fire.

I have reduced the code to a hello-world-like sample and included the messages from the WebJob.

Here is a summary of the code:

public static async Task<int> Main(string[] args)
{
    var host = CreateHostBuilder(args)
        .UseConsoleLifetime()
        .Build();
    using var serviceScope = host.Services.CreateScope();
    var services = serviceScope.ServiceProvider;
    //**stripped down to logging just for debug
    var loggerFactory = host.Services.GetRequiredService<ILoggerFactory>();
    var logger = loggerFactory.CreateLogger("Main");
    logger.LogDebug("Hello Test App Started OK.  Exiting.");
    //**Normally lots of service calls go here to do real work**
    return 0;
}

HostBuilder: why HostBuilder? We use many components that were built for Web API and web apps, so using a similar service model is very convenient.

public static IHostBuilder CreateHostBuilder(string[] args)
{
    var host = Host
        .CreateDefaultBuilder(args)
        .ConfigureAppConfiguration((ctx, config) =>
        {
            //override with keyvault
            var azureServiceTokenProvider = new AzureServiceTokenProvider();   //this is awesome - it will use MSI or Visual Studio connection
            var keyVaultClient = new KeyVaultClient(new KeyVaultClient.AuthenticationCallback(azureServiceTokenProvider.KeyVaultTokenCallback));
            var retryPolicy = new RetryPolicy<ServerErrorDetectionStrategy>(
                new ExponentialBackoffRetryStrategy(
                    retryCount: 5,
                    minBackoff: TimeSpan.FromSeconds(1.0),
                    maxBackoff: TimeSpan.FromSeconds(16.0),
                    deltaBackoff: TimeSpan.FromSeconds(2.0)
                )
            );
            retryPolicy.Retrying += RetryPolicy_Retrying;
            keyVaultClient.SetRetryPolicy(retryPolicy);

            var prebuiltConfig = config.Build();
            config.AddAzureKeyVault(prebuiltConfig.GetSection("KeyVaultSettings").GetValue<string>("KeyVaultUri"), keyVaultClient, new DefaultKeyVaultSecretManager());
            config.AddCommandLine(args);
        })
        .ConfigureLogging((ctx, loggingBuilder) =>  //note - this is run AFTER app configuration - whatever the order it is in.
        {
            loggingBuilder.ClearProviders();
            loggingBuilder
                .AddConsole()
                .AddDebug()
                .AddApplicationInsightsWebJobs(config => config.InstrumentationKey = ctx.Configuration["APPINSIGHTS_INSTRUMENTATIONKEY"]);
        })
        .ConfigureServices((ctx, services) =>
        {
            services
                .AddApplicationInsightsTelemetry();
            services
                .AddOptions();
        });
    return host;
}

The event: this is never fired.

private static void RetryPolicy_Retrying(object sender, RetryingEventArgs e)
{
    Console.WriteLine($"Retrying, count = {e.CurrentRetryCount}, Last Exception={e.LastException}, Delay={e.Delay}");
}

The retry policy's detection strategy: this is only triggered on the non-MSI attempt to contact Key Vault.

public class ServerErrorDetectionStrategy : ITransientErrorDetectionStrategy
{
    public bool IsTransient(Exception ex)
    {
        if (ex != null)
        {
            Console.WriteLine($"Exception {ex.Message} received, {ex.GetType()?.FullName}");
            HttpRequestWithStatusException httpException;
            if ((httpException = ex as HttpRequestWithStatusException) != null)
            {
                switch (httpException.StatusCode)
                {
                    case HttpStatusCode.RequestTimeout:
                    case HttpStatusCode.GatewayTimeout:
                    case HttpStatusCode.InternalServerError:
                    case HttpStatusCode.ServiceUnavailable:
                        return true;
                }
            }
            SocketException socketException;
            if ((socketException = (ex as SocketException)) != null)
            {
                Console.WriteLine($"Exception {socketException.Message} received, Error Code: {socketException.ErrorCode}, SocketErrorCode: {socketException.SocketErrorCode}");
                if (socketException.SocketErrorCode == SocketError.TimedOut)
                {
                    return true;
                }
            }
        }
        return false;
    }
}

WebJob output

[SYS INFO] Status changed to Initializing
[SYS INFO] Run script 'run.cmd' with script host - 'WindowsScriptHost'
[SYS INFO] Status changed to Running
[INFO] 
[INFO] D:\local\Temp\jobs\triggered\HelloWebJob\42wj5ipx.ukj>dotnet HelloWebJob.dll  
[INFO] Exception Response status code indicates server error: 401 (Unauthorized). received,     Microsoft.Rest.TransientFaultHandling.HttpRequestWithStatusException
[INFO] Exception A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. received, System.Net.Http.HttpRequestException
[ERR ] Unhandled exception. System.Net.Http.HttpRequestException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
[ERR ]  ---> System.Net.Sockets.SocketException (10060): A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
[ERR ]    at System.Net.Http.ConnectHelper.ConnectAsync(String host, Int32 port, CancellationToken cancellationToken)
[ERR ]    --- End of inner exception stack trace ---
[ERR ]    at Microsoft.Rest.RetryDelegatingHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
[ERR ]    at System.Net.Http.HttpClient.FinishSendAsyncBuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
[ERR ]    at Microsoft.Azure.KeyVault.KeyVaultClient.GetSecretWithHttpMessagesAsync(String vaultBaseUrl, String secretName, String secretVersion, Dictionary`2 customHeaders, CancellationToken cancellationToken)
[ERR ]    at Microsoft.Azure.KeyVault.KeyVaultClientExtensions.GetSecretAsync(IKeyVaultClient operations, String secretIdentifier, CancellationToken cancellationToken)
[ERR ]    at Microsoft.Extensions.Configuration.AzureKeyVault.AzureKeyVaultConfigurationProvider.LoadAsync()
[ERR ]    at Microsoft.Extensions.Configuration.AzureKeyVault.AzureKeyVaultConfigurationProvider.Load()
[ERR ]    at Microsoft.Extensions.Configuration.ConfigurationRoot..ctor(IList`1 providers)
[ERR ]    at Microsoft.Extensions.Configuration.ConfigurationBuilder.Build()
[ERR ]    at Microsoft.Extensions.Hosting.HostBuilder.BuildAppConfiguration()
[ERR ]    at Microsoft.Extensions.Hosting.HostBuilder.Build()
[ERR ]    at HelloWebJob.Program.Main(String[] args) in C:\Users\mark\Source\Repos\HelloWebJob\HelloWebJob\Program.cs:line 21
[ERR ]    at HelloWebJob.Program.<Main>(String[] args)
[SYS INFO] Status changed to Failed
[SYS ERR ] Job failed due to exit code -532462766
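
One detail in the output above may help explain why the retry never fires. The second exception passed to IsTransient is reported as System.Net.Http.HttpRequestException, and the stack trace shows the SocketException (10060) only as its inner exception. A cast such as `ex as SocketException` on the outer exception therefore yields null, and the strategy returns false. The following is a minimal sketch of a check that walks the InnerException chain instead. The class and method names are hypothetical and not part of any library, and this illustrates the idea only; it is not a confirmed fix for the problem described here.

```csharp
using System;
using System.Net.Sockets;

// Hypothetical helper (not part of any library): looks for a socket timeout
// anywhere in an exception's InnerException chain.
public static class TransientSocketCheck
{
    public static bool IsSocketTimeout(Exception ex)
    {
        // Walk from the outermost exception down through each InnerException.
        for (var current = ex; current != null; current = current.InnerException)
        {
            if (current is SocketException se && se.SocketErrorCode == SocketError.TimedOut)
            {
                return true;
            }
        }
        return false;
    }
}
```

A detection strategy could call such a helper from IsTransient in addition to its HTTP status code checks, so that a timeout wrapped in an HttpRequestException is also treated as transient.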

This was a Key Vault connectivity issue identified by the PG. Here is the official statement from the product group:

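The Microsoft Azure App Service team has identified an issue with Key Vault references for App Service and Azure Functions, related to intermittent failures in resolving references at runtime.

Engineers identified a regression in the system that reduced the performance and availability of our scale units' ability to retrieve Key Vault references at runtime. A patch has been written and deployed to our fleet of VMs to mitigate the issue.

We are continuously taking steps to improve the Azure Web App service and our processes to ensure such incidents do not occur in the future. In this case that includes (but is not limited to): improving detection and testing of the performance and availability of the Key Vault app setting reference feature, and improving our platform to ensure high availability of this feature at runtime. We apologize for any inconvenience.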

For almost everyone, updating to the new Microsoft.Azure packages mitigated the issue, so my first suggestion is to try those packages.

Thanks @HarshitaSingh MSFT, that makes sense, although I searched for this when I ran into the problem and could not find it.

As a workaround, I added some basic retry code at startup.

Main currently looks like this:

public static async Task<int> Main(string[] args)
{
    IHost host = null;
    int retries = 5;
    while (true)
    {
        try
        {
            Console.WriteLine("Building Host...");
            var hostBuilder = CreateHostBuilder(args)
                .UseConsoleLifetime();
            host = hostBuilder.Build();
            break;
        }
        catch (HttpRequestException hEx)
        {
            Console.WriteLine($"HTTP Exception in host builder. {hEx.Message}, Name:{hEx.GetType().Name}");
            // Retry only on a socket timeout while attempts remain. Everything else is
            // rethrown, so a non-socket HTTP failure cannot loop forever.
            if (hEx.InnerException is SocketException se
                && se.SocketErrorCode == SocketError.TimedOut
                && retries > 0)
            {
                Console.WriteLine("Socket error in host builder.  Retrying...");
                retries--;
                await Task.Delay(5000);
            }
            else
            {
                throw;
            }
        }
    }
    using var serviceScope = host.Services.CreateScope();
    var services = serviceScope.ServiceProvider;
    var transferService = services.GetRequiredService<IRunPinTransfer>();
    var result = await transferService.ProcessAsync();
    return result;
}
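
If the same startup retry is needed in more than one job, the loop above could be pulled into a small reusable helper. The following is a rough sketch under stated assumptions: the StartupRetry class, its ExecuteAsync method and all parameter names are hypothetical, and the doubling delay is a plain exponential backoff chosen for illustration. It is not the algorithm used by ExponentialBackoffRetryStrategy.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper (not part of any library): runs an action and retries it
// with a doubling delay while the caller-supplied predicate calls the failure transient.
public static class StartupRetry
{
    public static async Task<T> ExecuteAsync<T>(
        Func<T> action,
        Func<Exception, bool> isTransient,
        int maxRetries,
        TimeSpan initialDelay,
        TimeSpan maxDelay)
    {
        var delay = initialDelay;
        for (var attempt = 0; ; attempt++)
        {
            try
            {
                return action();
            }
            // The filter lets non-transient errors, and the final failed attempt, propagate unchanged.
            catch (Exception ex) when (attempt < maxRetries && isTransient(ex))
            {
                Console.WriteLine($"Attempt {attempt + 1} failed: {ex.Message}. Retrying in {delay}.");
                await Task.Delay(delay);

                // Double the delay for the next attempt, capped at maxDelay.
                var doubled = TimeSpan.FromTicks(delay.Ticks * 2);
                delay = doubled < maxDelay ? doubled : maxDelay;
            }
        }
    }
}
```

In Main, the host could then be built with a call such as `await StartupRetry.ExecuteAsync(() => CreateHostBuilder(args).UseConsoleLifetime().Build(), ex => ex is HttpRequestException && ex.InnerException is SocketException, 5, TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(16))`. This keeps the decision about what counts as transient in one place.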
