I have a ticket in with Cisco but no response yet so thought I'd check here.
I have Tidal 6.0.3, which has been relatively stable for 2 months now. I have some Windows servers with agents 220.127.116.11 that have been running for several months, and now jobs hang in "Launched" status on these.
When this has happened in the past, a restart of the agent or a reboot of the agent host resolved it and jobs could run again. This time it's not working. I even restarted all Tidal services to see if that cleared anything up, with the same results.
There have been no changes/updates to the servers. Has anyone had this before? Can someone point me to a resolution to look at? My Scheduler group is starting to hound me about getting these agents going again.
Waits in the production schedule for its dependencies to be met.
Enters a queue and waits for an execution slot to become available.
Launches on its designated agent.
Starts execution successfully on its designated agent.
So in "Launched" the agent should have been assigned the job by the master and is getting ready for execution (before it goes active).
It should be getting a PID.
Does the job status tab have an External ID? And does that ID/PID exist on the Tidal Agent as being active?
(Tidal External ID= Server PID)
Remote to Agent... open Task Manager ...Process Tab .. select menu item View ...Select columns..Choose PID (Process Identifier). Make sure the check box is checked for [x]Show processes from all users
look for the External ID in the PID column..
If it is there (probably using no CPU/memory), then the problem is likely on the agent side and could be the code itself.
If it is not there (more likely), then the master was unable to communicate with the agent and you can investigate the master logs (check the agent communication port, increase the logging level to high debug, get Cisco to assist, check the network, etc.).
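If you would rather script that PID check than click through Task Manager, here is a rough sketch in Python. It assumes Python is available on the agent host and relies on the standard Windows `tasklist /FO CSV /NH` command; the function names are made up for illustration and are not part of Tidal.

```python
import csv
import io
import subprocess


def pids_from_tasklist_csv(text):
    """Return the set of PIDs found in `tasklist /FO CSV /NH` output."""
    pids = set()
    for row in csv.reader(io.StringIO(text)):
        # Columns: Image Name, PID, Session Name, Session#, Mem Usage
        if len(row) >= 2 and row[1].strip().isdigit():
            pids.add(int(row[1]))
    return pids


def external_id_is_running(external_id):
    """Windows only: True if the job's External ID matches a live PID."""
    out = subprocess.run(
        ["tasklist", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(external_id) in pids_from_tasklist_csv(out)
```

Run it on the agent host with the External ID from the job status tab, e.g. `external_id_is_running("4321")`. True points toward the agent side, False toward master-to-agent communication, same as the manual check.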
Looks like it was a corrupted or bad file event (although these file events have been running for several months or longer). I spent 2 hours with Support, and although there were only 14 file event jobs associated with this agent, one of them was the culprit that hung up the response. We disabled all file events, restarted the agent, and jobs worked. Then we re-enabled the file events, and jobs still work.
Thanks for updating us. I always find it helpful to know different things to look for. Did they say this was bug related or a weird anomaly? We are on a different version but always something good to watch out for.