
Workspaces Monitoring Alert & Analyst Duty

Overview

Monitoring alerts are set up on all client instances that are in production. Whenever an instance or workspace fails, an error notification is sent to the corresponding Slack channel.

The following alert channels are under the purview of the implementation team:

Slack Channel | Types of Alerts
#alerts_cdp | Mainly cache refresh alerts and segment export failures in Business Explorer
#alerts_cps | Alerts sent whenever the CPS component in Meiro Integrations fails
#alerts_me | Alerts sent from the Meiro Events Monitoring App whenever there is no Meiro Events data for a period of time (the period depends on how the alert is set up)
#alerts_workspaces | Alerts sent when a workspace fails in Meiro Integrations
#alerts_reports | Alerts sent when a PowerBI dashboard refresh fails

 

Analysts Duty

The analyst team takes turns monitoring these alert channels on a weekly rotation. Every Monday, we post in the #team_support_duty channel which analyst is on duty that week and the week after.
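
As an illustration of the weekly rotation, here is a minimal Python sketch that works out who is on duty this week and next. The roster names, the anchor date, and the message wording are all hypothetical; the real order and announcement format are whatever the team agrees on.

```python
from datetime import date, timedelta

# Hypothetical roster; replace with the real rotation order.
ANALYSTS = ["analyst_a", "analyst_b", "analyst_c"]


def monday_of(day: date) -> date:
    """Return the Monday of the week that contains `day`."""
    return day - timedelta(days=day.weekday())


def on_duty(day: date, roster: list[str], anchor: date) -> str:
    """Return the analyst on duty in the week that contains `day`.

    `anchor` is any date in a week when roster[0] was on duty (an assumption
    made for this sketch, not something defined on this page).
    """
    weeks_elapsed = (monday_of(day) - monday_of(anchor)).days // 7
    return roster[weeks_elapsed % len(roster)]


def duty_message(today: date, roster: list[str], anchor: date) -> str:
    """Build the Monday update: this week's analyst and next week's."""
    this_week = on_duty(today, roster, anchor)
    next_week = on_duty(today + timedelta(weeks=1), roster, anchor)
    return f"On duty this week: {this_week}. Next week: {next_week}."
```

For example, `duty_message(date.today(), ANALYSTS, anchor=date(2024, 1, 1))` would produce the text to post in #team_support_duty, assuming the first analyst in the roster was on duty in the week of 1 January 2024.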

As the analyst on duty for the week, please go through the alerts every working day. When there is an error, tag the analysts responsible for that client instance, who should then investigate and resolve the error.


The list of all client instances and their stakeholders can be found here: Meiro All Instances, Project, Analysts - Google Sheets

 

SOP

When you are the analyst-on-duty:

  1. Make sure the channels listed above are unmuted, if you usually mute them, so that you receive new alerts.
  2. Check the channels at least twice a day.
  3. For each new alert, check the error message if you have access to the workspace.
  4. Tag the analyst in charge of that project or workspace.
    • If you have encountered a similar error before, include the error message and a suggested fix when tagging the analyst.
  5. If the workspace shows no errors when you check it, it has usually been re-run successfully already, and there is no need to tag the analyst.
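
The decision flow in steps 3 to 5 can be sketched in Python as follows. This is only an illustration: the `Alert` fields, the `triage` function, and the message wording are hypothetical, and in practice the tagging is done by hand in Slack.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:
    channel: str    # e.g. "#alerts_workspaces"
    workspace: str  # hypothetical workspace identifier
    owner: str      # analyst in charge, looked up in the instances sheet


def triage(
    alert: Alert,
    has_access: bool,
    current_error: Optional[str] = None,
    known_fix: Optional[str] = None,
) -> Optional[str]:
    """Return the message to post when tagging the owner, or None if no tag is needed.

    `current_error` is the error message seen in the workspace, or None when
    the workspace currently shows no error.
    """
    # Step 5: no error visible usually means the workspace was re-run successfully.
    if has_access and current_error is None:
        return None

    # Step 4: tag the analyst in charge of the project or workspace.
    message = f"@{alert.owner} alert in {alert.channel} for workspace {alert.workspace}"

    # Steps 3 and 4 (bullet): include the error, plus a suggested fix if one is known.
    if has_access and current_error:
        message += f". Error: {current_error}"
        if known_fix:
            message += f". Suggested fix: {known_fix}"
    return message
```

Without workspace access, the sketch still tags the owner but leaves out the error details, since the analyst on duty cannot read them.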

 

Cache refresh (#alerts_cdp)

Cache refreshes are run for the diagnostic dashboard and a few other things that are too expensive to calculate in real time. An alert is sent if a refresh is delayed for some reason (the DB being down or overutilized, or some other error) or if the cache has not been refreshed within the scheduled time + 20%. Usually, the alert points to either 1) the cache being slow or 2) a temporary error.
When there are cache refresh alerts in #alerts_cdp, follow this procedure:
  1. Run the cache refresh manually.
  2. If that does not solve the problem, report it as a bug.
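
To make the "scheduled time + 20%" rule concrete, here is a minimal Python sketch. It assumes the rule means "the time since the last successful refresh exceeds the scheduled refresh interval by more than 20%". That reading, the function name, and the parameters are assumptions made for illustration, not the actual alerting implementation.

```python
from datetime import datetime, timedelta

TOLERANCE = 0.20  # the "+ 20%" margin mentioned above


def refresh_is_overdue(
    last_refresh: datetime,
    interval: timedelta,
    now: datetime,
    tolerance: float = TOLERANCE,
) -> bool:
    """Return True if the cache is overdue under the assumed rule.

    `interval` is the scheduled time between refreshes (hypothetical parameter).
    """
    deadline = last_refresh + interval * (1 + tolerance)
    return now > deadline
```

With an hourly schedule, the deadline under this reading falls 72 minutes after the last refresh: a cache refreshed 70 minutes ago would not trigger an alert, while one refreshed 75 minutes ago would.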