I recently had a requirement to traverse a set of web pages. I had to start at a given page, then follow its links to more pages, which in turn held links to still more pages, continuing until there were no more links to follow. This is basically the concept of traversing a tree. Another common way to think of a tree is the directory structure on a computer: you start at the root node, like the C:\ drive on a Windows machine, and then follow the folders to see where they lead.
The main sticking point with my problem was that I was limited to making 10 callouts within a given transaction. I needed these callouts to do an HTTP ‘GET’ on each page so I could read in its source. What I figured out was that I could set my batch size to 10, so that each run of the execute method would process 10 pages. I always want my batch size to be as high as possible so that there are fewer cycles when the batch job runs. Remember that there are three methods you must implement in an Apex batch job:
1) The ‘start’ method will run once. This is where you define which records will be a part of this batch job.
2) The ‘execute’ method will run as many times as needed to complete the job. So if there are 100 items to be processed and the batch size is set to be 20, then the ‘execute’ method will run 5 times.
3) The ‘finish’ method will run only once and this is where our magic will occur.
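Sketched out, the skeleton of such a batch class might look like this (the class, object, and field names here, PageCrawlerBatch, Page_Node__c, and so on, are placeholders for illustration):

```apex
// A rough skeleton with placeholder names (PageCrawlerBatch, Page_Node__c).
// Database.AllowsCallouts is required to make HTTP callouts from a batch job.
global class PageCrawlerBatch implements Database.Batchable<sObject>, Database.AllowsCallouts {

    global Database.QueryLocator start(Database.BatchableContext bc) {
        // Runs once: defines which records are part of this batch job.
        return Database.getQueryLocator(
            [SELECT Id, URL__c FROM Page_Node__c WHERE Processed__c = false]);
    }

    global void execute(Database.BatchableContext bc, List<Page_Node__c> scope) {
        // Runs as many times as needed, once per chunk of records.
        for (Page_Node__c node : scope) {
            // HTTP 'GET' and link extraction would go here.
        }
    }

    global void finish(Database.BatchableContext bc) {
        // Runs once, after the last execute call completes.
    }
}
```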
With a normal batch job you know how many records you are going to process, or at least how you are going to get them. Usually a simple query is run and those records are processed. But with my requirements, I did not know how many branches and leaf nodes I would find in this tree structure. I decided to create a custom object to hold the nodes of the tree. At first I only had to populate the root node to get the process started. The batch job would run the first time and process just that one page. While that page was being processed, the other nodes/links that were found on it were saved to that same custom object. Then in the finish method of the batch job, the exact same batch job was kicked off again to start the process over. On this second run the batch job would process all of the nodes/links just below the root node. I still had to keep the batch size at 10, since that is the maximum number of external callouts that can be made in a transaction. Here is an example of that finish method that kicks off the batch job again:
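Something along these lines (the object and class names, Page_Node__c and PageCrawlerBatch, are illustrative placeholders):

```apex
global void finish(Database.BatchableContext bc) {
    // Any nodes/links discovered during this run are still unprocessed.
    List<Page_Node__c> remaining =
        [SELECT Id FROM Page_Node__c WHERE Processed__c = false];
    if (!remaining.isEmpty()) {
        // 10 matches the callout limit per transaction; in a test the
        // whole list must fit into a single run of the execute method.
        Integer batchSize = Test.isRunningTest() ? remaining.size() : 10;
        Database.executeBatch(new PageCrawlerBatch(), batchSize);
    }
}
```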
Notice how, if a test was running, I set the batch size to be the same as the current size of the list. This matters because a test will only run the execute method once, so the batch size must be large enough to cover the entire list. It is especially important when your batch job is set up to run off of an Iterable instead of a Database.QueryLocator. An iterable is a list that you create and populate yourself because a simple query will not suffice; if a QueryLocator is used, an actual query returns the data.
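An Iterable-based start method can be as simple as returning a list you have built up yourself (placeholder names again):

```apex
global Iterable<sObject> start(Database.BatchableContext bc) {
    List<sObject> nodes =
        [SELECT Id, URL__c FROM Page_Node__c WHERE Processed__c = false];
    // Any filtering or ordering that a plain SOQL query cannot express
    // could be applied to the list here before handing it to the job.
    return nodes;
}
```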
But how would I know whether a given node/link had already been processed? I added a custom checkbox field to my object to mark a node/link as processed. The query that populates each run of the batch job then selects only the rows that have not yet been processed, which are exactly the nodes/links in the tree still waiting to be visited.
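Inside the execute method, that bookkeeping pattern looks roughly like this (the callout itself is elided, and the names are placeholders):

```apex
global void execute(Database.BatchableContext bc, List<Page_Node__c> scope) {
    List<Page_Node__c> discovered = new List<Page_Node__c>();
    for (Page_Node__c node : scope) {
        // ... HTTP 'GET' on node.URL__c, adding any links found on the
        // page to 'discovered' as new, unprocessed rows ...
        node.Processed__c = true;  // checkbox marking this node as done
    }
    // All callouts above happen before any DML in this transaction.
    update scope;                  // the next run's query skips these rows
    if (!discovered.isEmpty()) {
        insert discovered;         // the next level of the tree
    }
}
```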
With this solution I am able to fully traverse the tree, and I do not have to guess at some arbitrary number of iterations to make the batch job run long enough to complete.
I hope this example can help you in your requirements to deal with long-running jobs with data that is variable in length.
Gotta love those trees!