Help! My Azure WebJob, Azure Function or Azure Automation job has stopped working with "Token request failed"
I work a lot with automating tasks in Microsoft 365 and Azure, and many of them involve connecting to other Microsoft 365 services like SharePoint Online. Years back, Azure WebJobs were the only viable solution for running automated .NET and PowerShell based tasks, but over the years this has moved along to Azure Functions, Azure Automation or even other solutions like Logic Apps and Power Automate. Automation jobs can run flawlessly for years in Azure, but everything great has to come to an end ;-)
Examples of error messages
From a WebJob (C#):
[05/07/2021 15:35:01 > 25cd0d: ERR ] Unhandled Exception: Microsoft.IdentityModel.SecurityTokenService.RequestFailedException: Token request failed. ---> System.Net.WebException: The remote server returned an error: (401) Unauthorized.
...
[05/07/2021 15:35:01 > 25cd0d: SYS INFO] Status changed to Failed
[05/07/2021 15:35:01 > 25cd0d: SYS ERR ] Job failed due to exit code -532462766
From a PowerShell script:
Connect-PnPOnline : Token request failed.
Common issues and easy fixes
There are a number of reasons why you can "suddenly" run into authentication trouble with an app that has been running normally for a long time. It can be hard to know where to start, so this is my recommended troubleshooting order:
- Check if the ClientSecret or certificate for the application registered in Azure AD has expired
- Check if the permissions for the app have been removed or changed
- Set up a local dev environment to check if you can reproduce the error locally to get more detailed error messages
- On your local dev computer, set up a tool such as Fiddler (Fiddler | Web Debugging Proxy and Troubleshooting Solutions (telerik.com)) to inspect the network traffic to and from your app. If you suspect that your solution no longer sends its requests to the remote API correctly, you will probably need the more in-depth approach described further down to locate the error.
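The first check in the list can be scripted. The following is a minimal sketch, assuming the Microsoft Graph PowerShell SDK (the Microsoft.Graph.Applications module) is installed and that your account is allowed to read app registrations. The $appId value is a placeholder, not something from the original jobs.

```powershell
# Placeholder: replace with the Application (client) ID of your app registration.
$appId = '00000000-0000-0000-0000-000000000000'

# Sign in interactively with permission to read app registrations.
Connect-MgGraph -Scopes 'Application.Read.All'

$app = Get-MgApplication -Filter "appId eq '$appId'"

# Client secrets are stored in PasswordCredentials, certificates in KeyCredentials.
$credentials = @($app.PasswordCredentials) + @($app.KeyCredentials)

# Graph returns expiry timestamps in UTC, so compare against UTC "now".
$now = (Get-Date).ToUniversalTime()

$credentials |
    Select-Object DisplayName, KeyId, EndDateTime,
        @{ Name = 'Expired'; Expression = { $_.EndDateTime -lt $now } } |
    Sort-Object EndDateTime |
    Format-Table -AutoSize
```

Any row with Expired set to True is a likely cause of the 401 behind "Token request failed". Note that this only covers apps registered in Azure AD; if your job authenticates some other way, the check has to be done wherever that credential was issued.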
Still no luck?
What if the solution runs just fine on my local dev computer and only fails in Azure? This is often the case, and the reason can be a tricky combination of outdated packages, modules or frameworks that are no longer supported by the runtime host or the APIs in use.
- Make sure your modules are updated
- For runbooks in Azure Automation the modules are centrally managed per automation account, so remember to update them from time to time
- Specifically for SharePoint Online, make sure to update to the latest version of PnP.PowerShell and make sure you are up to speed with the change from "SharePointPnPPowerShellOnline" to the new module "PnP.PowerShell" (Upgrading from the Legacy version of PnP PowerShell | PnP PowerShell)
- Switch over to PowerShell Core if there is nothing holding you back (Migrating from Windows PowerShell 5.1 to PowerShell 7 - PowerShell | Microsoft Docs)
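To see where you stand before updating, a quick inventory helps. This sketch uses only built-in cmdlets plus Find-Module from PowerShellGet, and assumes the machine can reach the PowerShell Gallery. The first two commands can also be run inside a runbook to see what the Azure host actually has loaded.

```powershell
# Which PowerShell version is this host running?
$PSVersionTable.PSVersion

# Installed versions of the legacy and the current PnP module, if any.
Get-Module -ListAvailable -Name 'SharePointPnPPowerShellOnline', 'PnP.PowerShell' |
    Select-Object Name, Version, Path

# Newest PnP.PowerShell version published on the PowerShell Gallery.
Find-Module -Name 'PnP.PowerShell' | Select-Object Name, Version
```

If the legacy module shows up, or the installed PnP.PowerShell version is far behind the gallery version, that is a good first thing to fix.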
.NET Framework / .NET Core
- Make sure all your NuGet packages are up to date. Sometimes (always?) updating a large number of packages will break your code and bring some extra work. Checking changelogs for breaking changes can make this easier, or you can at least update to the latest version before the breaking changes
- Check your .NET Framework or .NET Core runtime version. Even if it should be supported by your Azure App Service or Azure Function, things change. Update your project to the latest version of the runtime that is supported. In one of my cases, switching to the latest .NET runtime was all it took to get back up and running after hours of troubleshooting.
- Make sure you find and replace deprecated NuGet packages. As with PowerShell and SharePoint Online, you need to know about the change from "OfficeDevPnPCore16" to "SharePointPnPCoreOnline". But even this is legacy now, and you should try to move ahead to "PnP.Framework", which supports .NET Core apps.
Locating errors in automation tasks running in Azure can be a quick fix or, in the worst case, take days to figure out. Don't underestimate the need to regularly update running jobs even if they still work flawlessly. In the long run you will most likely save money and avoid downtime :-)
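The checks above can be started from the command line. This is a sketch that assumes the .NET SDK is installed and that the project is SDK-style with PackageReference entries; older .NET Framework projects that use packages.config are not covered by these commands and have to be checked in the NuGet package manager in Visual Studio instead.

```powershell
# Run from the folder that contains your .csproj or .sln file.

# Runtimes installed on this machine
dotnet --list-runtimes

# Packages with newer versions available
dotnet list package --outdated

# Packages the publisher has marked as deprecated
dotnet list package --deprecated
```

The deprecated list is where legacy packages such as the older PnP ones would be expected to appear, provided the publisher has flagged them on nuget.org.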
Image credits: stockvault.net