I recently had to set up a sitemap in a CM/CD environment with the typical Sitecore configuration: one CM server and two load-balanced CD servers.
I wanted to run some experiments to find the minimal set of items and configuration needed to ensure both CDs have the sitemap.xml auto-generated.
I did this because I was curious why folks said we need multiple copies of the same scheduled task, one for each server node. I was positive that I could pull this off with just one scheduled task across all environments. Turns out, I was overambitious!
Below are the experiments I ran to satisfy my curiosity.
Case 1 – Only one task across all environments
In my ambitious drive to make it work with just one scheduled task, I ensured that all the servers pointed to the same schedule root. That path is defined in our patch file and ends up in the merged configuration shown by showconfig.
For example –
<agent type="Sitecore.Tasks.DatabaseAgent" method="Run" interval="00:10:00" name="Master_Database_Agent">
  <param desc="database">master</param>
  <param desc="schedule root">/sitecore/system/tasks/schedules/CM</param>
  <LogActivity>true</LogActivity>
</agent>
Results
After this setup, all was well on CD1: the sitemap was generated, and by the looks of it, it covered the URLs of all content items and used the protocol as configured. The issue was CD2: the sitemap was not updated on this server.
I checked the logs on CD2 for errors, and there were none related to sitemap generation. One thing I did notice, though, is that the schedule for refreshing the sitemap always came back as not due.
In the logs, I would see entries like the ones below:
ManagedPoolThread #3 18:19:38 INFO Scheduling.DatabaseAgent started. Database: liveweb
ManagedPoolThread #3 18:19:38 INFO Examining schedules (count: 1)
ManagedPoolThread #3 18:19:38 INFO Not due: Refresh XML Sitemap
ManagedPoolThread #3 18:19:38 INFO Job ended: Web_Database_Agent (units processed: 1)
This means that every time the scheduling agent runs on CD2, it finds that the task has already been run for the current schedule and does not run it again.
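Here is a minimal sketch of why that can happen. It is a toy model, not Sitecore code: it assumes the schedule item keeps its last-run timestamp in the database that both CD servers share, so whichever server runs the task first marks it as done for everyone. The class, the function, and the one-hour interval are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ScheduleItem:
    """Hypothetical stand-in for a schedule item stored in the shared database."""
    name: str
    interval: timedelta
    last_run: datetime = datetime.min  # assumption: last-run time lives on the item

    def is_due(self, now: datetime) -> bool:
        return now - self.last_run >= self.interval


def run_agent(server: str, schedule: ScheduleItem, now: datetime, log: list) -> bool:
    """Simulate one pass of a scheduling agent on the given server."""
    if schedule.is_due(now):
        # The timestamp is written back to the shared item, so every
        # server reading the same item sees the task as already run.
        schedule.last_run = now
        log.append(f"{server}: Executed: {schedule.name}")
        return True
    log.append(f"{server}: Not due: {schedule.name}")
    return False


log = []
shared = ScheduleItem("Refresh XML Sitemap", timedelta(hours=1))  # interval is illustrative
start = datetime(2020, 1, 1, 18, 0, 0)

cd1_ran = run_agent("CD1", shared, start, log)
cd2_ran = run_agent("CD2", shared, start + timedelta(minutes=1), log)

for line in log:
    print(line)
```

In this model CD1 runs the task and CD2 only ever logs "Not due", which matches what the CD2 log shows. If the sitemap file is written on the server that executes the task, CD2 would never get a fresh copy.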
Case 2 – One task per server
Now for the case I did not want to do. I always strive to avoid duplication, since it means more management and more configuration, but I had to test it to ensure the load-balanced servers serve the sitemap properly when crawlers do their stuff.
To test this case, I duplicated the scheduled task, one for each server. So in my master database I would have three different schedule nodes and scheduled tasks, and on each server the schedule root path in the configuration would be different as well.
On CD1 –
<agent type="Sitecore.Tasks.DatabaseAgent" method="Run" interval="00:10:00" name="Web_Database_Agent">
  <param desc="database">liveweb</param>
  <param desc="schedule root">/sitecore/system/tasks/schedules/CD1</param>
  <LogActivity>true</LogActivity>
</agent>
On CD2 –
<agent type="Sitecore.Tasks.DatabaseAgent" method="Run" interval="00:10:00" name="Web_Database_Agent">
  <param desc="database">liveweb</param>
  <param desc="schedule root">/sitecore/system/tasks/schedules/CD2</param>
  <LogActivity>true</LogActivity>
</agent>
Results
With this setup, everything works well and both servers are happily serving the most up-to-date sitemap.xml.
So, the bottom line is: if you need a task to run at least once on every server, and cannot live with it running just once on whichever server picks it up first, then you need to duplicate the task. There seems to be no other option.
Do you know of any other way to make this work without duplicating tasks when both load-balanced servers point to the same (web) database?
I would be super curious to know, so do post comments or suggestions.