Tidal job grabbing partial files. File dependency not working.
Hello, I need some help with a file-grabbing issue in my Tidal job. The job is scheduled to run from 3 PM to 9 PM daily, repeating every 10 minutes. It looks for three files in the UNIX directory /extractdirectory/ and, when it finds them, processes and moves them. Recently the job has been grabbing partial files because it picks them up before the copy process has completed. I set up a file dependency requiring the files to be stable for 2 minutes, but when the files arrived at 4 PM, six instances of the job, all of which had been waiting on the dependency, fired at once. Is there a way to stop the job from grabbing partial files and also let only one instance of the job run on the files?
What do you mean by partial files? How are the files generated? Could you not schedule your Tidal job based on the file generation/completion time? Otherwise, you probably need to write a script that checks the file size and does the post-processing accordingly.
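As a rough illustration of the "check the file size" script idea, here is a minimal sketch that polls a file and only returns once its size has stayed the same for a chosen interval. The function name and its parameters are hypothetical; the 120-second default simply mirrors the 2-minute stable window mentioned above, and the `sleep`/`clock` arguments exist only so the wait can be simulated.

```python
import os
import time


def wait_until_stable(path, stable_secs=120, poll_secs=10, timeout_secs=3600,
                      sleep=time.sleep, clock=time.monotonic):
    """Block until `path` exists and its size has not changed for
    `stable_secs` seconds. Returns True when stable, False on timeout."""
    start = clock()
    last_size = None
    stable_since = None
    while clock() - start < timeout_secs:
        try:
            size = os.path.getsize(path)
        except FileNotFoundError:
            size = None  # file has not arrived yet
        now = clock()
        if size is None or size != last_size:
            # Missing file, first sighting, or still growing: restart the window.
            last_size = size
            stable_since = now
        elif now - stable_since >= stable_secs:
            return True  # size unchanged for the whole stable window
        sleep(poll_secs)
    return False
```

A post-processing script could call this once for each of the three files and only move them after all calls return True. Note that a size that stays constant is a heuristic, not a guarantee that the external copy has finished.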
The files are copied into my directory /extractdirectory/ by an external process, and I have no control over it. The files do not have a fixed size or a fixed arrival time. In total, the three files take around 75 seconds to be copied completely. If the Tidal job happens to run within this 75-second window, it picks up incomplete data; it does not wait for the copy to finish before starting to work on the files.
There are a couple of ways you might go about this. One is to create a queue with a job limit of 1 and a filter for this particular set of jobs. If you use the file event variable <Filemon.Filename>, you can feed each individual file instance that is picked up to the associated job/group. This effectively stops multiple instances of the same job from stepping on each other.
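The queue described above is configured inside Tidal itself. If you also want a safety net inside the job's own script, an exclusive lock file gives a similar "only one instance at a time" effect. This is a swapped-in technique (POSIX `flock` via Python's `fcntl` module), not the Tidal queue feature, and the lock path is hypothetical.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def single_instance(lock_path):
    """Yield True if this process obtained the exclusive lock on `lock_path`,
    False if another instance already holds it."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            # Non-blocking: fail immediately instead of queuing behind the holder.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            acquired = True
        except BlockingIOError:
            acquired = False  # another instance is already processing the files
        yield acquired
    finally:
        os.close(fd)  # closing the descriptor also releases the lock
```

Usage: wrap the processing step in `with single_instance("/tmp/extract_job.lock") as got_lock:` and exit early when `got_lock` is False, so the five extra instances from the 4 PM example would do nothing.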
You can also set constraints on the file event, e.g. "1 occurrence in x minutes".
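To show why such a constraint would help with the six simultaneous instances, here is a small model of "at most one occurrence per window". It assumes the constraint means that, once the event fires, further matches within the next x minutes are ignored; check the Tidal documentation for the exact semantics. The function name is hypothetical and times are in minutes.

```python
def throttle(event_times, window_minutes):
    """Return the event times that would actually trigger a run when at most
    one occurrence is allowed per `window_minutes`."""
    accepted = []
    for t in sorted(event_times):
        # Accept the first event, then only events a full window after the last one.
        if not accepted or t - accepted[-1] >= window_minutes:
            accepted.append(t)
    return accepted
```

For example, six triggers at 4 PM (minute 960 of the day) collapse to one: `throttle([960] * 6, 30)` returns `[960]`.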
The right approach will depend on your specific use case. However, I would have expected the file event option "file size stable for" to get you where you need to be. A combination of a stable file size, a minimum size ("size >"), and a delay ought to guarantee success.
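The way those three conditions combine can be sketched as a readiness check over size samples taken while the file is being copied. This is only a model of the logic, not Tidal's implementation; the parameter names are hypothetical and do not correspond to Tidal field names.

```python
def is_ready(samples, min_size, stable_secs, delay_secs):
    """samples: list of (timestamp_seconds, size_bytes), oldest first.
    Ready when the latest size exceeds min_size, the size has not changed
    for stable_secs, and at least delay_secs have passed since first seen."""
    if not samples:
        return False
    first_seen = samples[0][0]
    now, size = samples[-1]
    if size <= min_size:
        return False  # too small: probably empty or only partially copied
    # Walk backwards to find when the current size was first observed.
    unchanged_since = now
    for t, s in reversed(samples):
        if s != size:
            break
        unchanged_since = t
    return now - unchanged_since >= stable_secs and now - first_seen >= delay_secs
```

With a copy that finishes at 75 seconds, a check at 75 seconds is not ready because the size has only just stopped changing; a check at 195 seconds with `stable_secs=120` and `delay_secs=180` is ready.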
Did you ever get this resolved? I am curious what the arrival frequency of these files is. If you know there are X sets of files arriving Y minutes apart within the 3 PM to 9 PM window, then instead of scheduling your job to run every 10 minutes, you should be able to configure it to run with the following:
ALL file dependencies must be met (with constraints such as file size stable for 2 minutes, etc.)
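The "ALL file dependencies" rule can be sketched as a check that fires only when every expected file exists and none of them grew since the previous check. It assumes a wrapper calls it at the stable interval (for example every 2 minutes) and runs the job once when it returns True; the function name is hypothetical, and the actual file names in /extractdirectory/ are not given in the thread.

```python
import os


def all_files_ready(paths, previous_sizes):
    """Return (ready, current_sizes). Ready only if every path exists and
    its size equals the size recorded at the previous check."""
    current = {}
    for p in paths:
        try:
            current[p] = os.path.getsize(p)
        except FileNotFoundError:
            current[p] = None  # not arrived yet
    ready = all(
        current[p] is not None and previous_sizes.get(p) == current[p]
        for p in paths
    )
    return ready, current
```

Passing the returned sizes back in on the next call means one missing or still-growing file keeps the whole set waiting, so a single run handles all three files together.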