Do you remember an issue that took almost a month to resolve and sat in the back of your head the whole time, reminding you that it was still pending? This is one such incident.
Here is the whole story! I appreciate the Sitecore support team’s resilience and how proactive they were in responding. Still, I feel this should be documented, because anyone who has used xDB knows that challenges come up, and they have to be solved quickly to mitigate data latency issues.
Problem: Experience Editor search was not working on older contacts, because PII indexing was not enabled at first and was only enabled later.
Story: Human error. It happens even after you have painstakingly documented every single step that needs to be done manually after a deployment. Unfortunately, config edits on the Processing server or the Search Indexer are manual deployments on our end at this time, while anything we push to the CM/CD roles is a one-click deployment using patches as a standard. So, I missed the step noted here during the production deployment. I realized the problem above could be caused by that, changed the setting to ‘true’ on the applicable roles/locations on Managed Cloud, and hoped for the best. After that, search did start to work, but only on newer contacts that were added to xDB after my change. Why?
Resolution: I had run into this before, when I first started playing with xDB, Experience Profile, and the other xConnect-related services. I was pretty confident I needed to rebuild the xDB index, which is usually the answer to such funky behavior. I did that with the steps below:
- Open the Azure portal and open the resource group concerned
- Go to the xc-search app service resource
- Open Advanced Tools and hit Go -> this takes you to Kudu
- Select the PowerShell option -> drill down all the way to the Index Worker location and follow the steps noted here
- It said the rebuild succeeded, but nope, the issue was not resolved and search still did not work on older contacts.
I was not sure what else to do, so I logged a support ticket with Sitecore and explained what I had tried so far. They confirmed that a rebuild was what I should be doing to resolve the issue. Strange, because I had done exactly what the documentation says. It turns out the rebuild has to be run slightly differently, and from a different location, on Managed Cloud. Support suggested that I run the rebuild command from a location like the one below:
D:\local\Temp\jobs\continuous\IndexWorker\<Random Web Job Name>
The command is also slightly different from the documentation. It should be .\Sitecore.XConnectSearchIndexer.exe -rr
Then I started patiently monitoring the rebuild process using the instructions here: https://doc.sitecore.com/developers/100/sitecore-experience-platform/en/monitor-the-index-rebuild-process.html
You can also monitor it from Solr by following the steps below, which were shared by Sitecore support:
- Go to the Solr admin panel
- Switch to the xdb collection and click “Query”
- Then run the query: id:"xdb-rebuild-status"
- This will tell you the exact current rebuild status of your xdb index.
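If you would rather check from a script than click through the admin panel, the same query can be sent to Solr's standard select handler. This is only a minimal sketch: the base URL is a placeholder I made up, the helper names are my own, and it assumes the collection is really called xdb and that your Solr instance is reachable without extra authentication.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholder: replace with your own Solr endpoint.
SOLR_BASE_URL = "https://localhost:8983/solr"
# "xdb" is the collection named in the steps above; yours may be named differently.
COLLECTION = "xdb"


def build_status_url(base_url: str, collection: str) -> str:
    """Build the select URL for the query id:"xdb-rebuild-status"."""
    params = urllib.parse.urlencode({"q": 'id:"xdb-rebuild-status"', "wt": "json"})
    return f"{base_url.rstrip('/')}/{collection}/select?{params}"


def fetch_rebuild_status(base_url: str = SOLR_BASE_URL, collection: str = COLLECTION) -> list:
    """Return the matching Solr documents (an empty list if none exist)."""
    url = build_status_url(base_url, collection)
    with urllib.request.urlopen(url, timeout=30) as resp:
        payload = json.load(resp)
    return payload.get("response", {}).get("docs", [])
```

Calling fetch_rebuild_status() and printing the result shows whatever fields Solr stores on that status document. I have not listed the field names here because I did not record them.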
Yep, I did all of that, and every single time I tried, the rebuild got stuck at 95% in the finishing state and never completed. So, Sitecore support asked me to go through a lot of debugging steps to make the index worker logs more verbose and help us understand why it was stuck. They traced the issue to a setting that was higher than its default value. The funny thing is, we did not set this up ourselves; it came with Managed Cloud. Anyway, below is the setting that I had to change.
Go to the location below in Kudu: “D:\home\site\wwwroot\App_Data\jobs\continuous\IndexWorker\App_Data\config\sitecore\SearchIndexer\sc.Xdb.Collection.IndexWriter.SOLR.xml”
I could see the config below:
<Solr.SolrWriterSettings>
  <Type>Sitecore.Xdb.Collection.Search.Solr.SolrWriterSettings, Sitecore.Xdb.Collection.Search.Solr</Type>
  <LifeTime>Singleton</LifeTime>
  <Options>
    <ConnectionStringName>solrCore</ConnectionStringName>
    <RequireHttps>true</RequireHttps>
    <MaximumUpdateBatchSize>1000</MaximumUpdateBatchSize>
    <MaximumDeleteBatchSize>1000</MaximumDeleteBatchSize>
    <MaximumCommitMilliseconds>600000</MaximumCommitMilliseconds>
    <ParallelizationDegree>4</ParallelizationDegree>
    <MaximumRetryDelayMilliseconds>5000</MaximumRetryDelayMilliseconds>
    <RetryCount>5</RetryCount>
    <Encoding>utf-8</Encoding>
  </Options>
</Solr.SolrWriterSettings>
The “MaximumCommitMilliseconds” value of 600000 was too high, and support recommended changing it to “1000”, which is apparently the default value. My suspicion is that the Managed Cloud team changed it as part of their default setup steps when following the best practices from a KB article.
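I made this edit by hand in Kudu, but if you have to repeat it across environments, something like the sketch below could do it. Treat it as an illustration only: set_max_commit_ms is a helper name I invented, it works on the XML text you pass in, and Python's ElementTree drops comments and the XML declaration when it writes the result back, so keep a backup of the original file.

```python
import xml.etree.ElementTree as ET


def set_max_commit_ms(xml_text: str, new_value: int = 1000) -> str:
    """Return xml_text with every MaximumCommitMilliseconds element set to new_value.

    Hypothetical helper, not part of any Sitecore tooling.
    """
    root = ET.fromstring(xml_text)
    # iter() checks the root and all descendants, so this works on the
    # fragment shown above as well as on the full config file.
    nodes = list(root.iter("MaximumCommitMilliseconds"))
    if not nodes:
        raise ValueError("MaximumCommitMilliseconds not found in the given XML")
    for node in nodes:
        node.text = str(new_value)
    return ET.tostring(root, encoding="unicode")
```

You would read sc.Xdb.Collection.IndexWriter.SOLR.xml into a string, pass it through the function, and write the result back to the same path before queuing the rebuild again.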
I made the change above, queued the rebuild for the nth time, and monitored patiently. It worked!!! I almost cried out loud, given how many times I had to repeat these steps and how many days I waited to get the issue resolved.
I hope the Sitecore team can update the documentation around this, specifically for Managed Cloud. Until then, I hope this helps someone else who is losing sleep over a similar problem and needs to queue that index rebuild.