Workspace Monitoring Alerts & Analyst Duty
Monitoring alerts are set up on all client instances in production. Whenever an instance or workspace fails, an error notification is sent to the respective Slack channel.
The alert channels under the purview of the implementation team are:
|Slack Channel|Types of Alerts|
|#alerts_cdp|Mainly cache refresh alerts and segment export failures in Business Explorer|
|#alerts_cdp|Alerts sent whenever the CPS component in Meiro Integrations fails|
|#alerts_me|Alerts sent from the Meiro Events Monitoring App when there has been no Meiro Events data for a period of time (the period depends on how the app is set up)|
|#alerts_workspaces|Alerts sent when a workspace fails in Meiro Integrations|
|#alerts_reports|Alerts sent when a PowerBI dashboard refresh fails|
Analysts take turns monitoring these alert channels on a weekly rotation. Every Monday, we post in the #team_support_duty channel which analyst is on duty that week and which analyst is on duty the following week.
As the analyst on duty, go through the alerts every working day. When there is an error, tag the analysts assigned to that client instance; they should then investigate and resolve the error.
The list of all client instances and their stakeholders is in this Google Sheet: Meiro All Instances, Project, Analysts
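The weekly rotation can be pictured as a simple round-robin over calendar weeks. The sketch below is purely illustrative: the roster, its order, and the reference Monday (`ROTATION_START`) are hypothetical placeholders, not the team's actual schedule.

```python
from datetime import date, timedelta

# Hypothetical roster; the real order is whatever the team agrees on.
ANALYSTS = ["Analyst A", "Analyst B", "Analyst C"]

# Hypothetical reference Monday on which ANALYSTS[0] was on duty.
ROTATION_START = date(2024, 1, 1)  # 1 January 2024 is a Monday


def on_duty(day: date, analysts=ANALYSTS, start=ROTATION_START) -> str:
    """Return the analyst on duty for the Monday-to-Sunday week containing `day`."""
    monday = day - timedelta(days=day.weekday())  # weekday() is 0 for Monday
    weeks_since_start = (monday - start).days // 7
    return analysts[weeks_since_start % len(analysts)]


def monday_announcement(today: date) -> str:
    """Build the Monday post: who is on duty this week and next week."""
    this_week = on_duty(today)
    next_week = on_duty(today + timedelta(weeks=1))
    return f"On duty this week: {this_week}; next week: {next_week}"


print(monday_announcement(date(2024, 1, 8)))
```

Because the week index is taken modulo the roster length, the rotation wraps around automatically, and any day of the week maps to the same analyst as its Monday.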
When you are the analyst-on-duty:
- Unmute the channels listed above if you usually mute them, so that you receive new alerts.
- Check the channels at least twice a day.
- For a new alert, check the error message if you have access to the workspace.
- Tag the analyst in charge of that project or workspace.
- If you have encountered a similar error before, include the error message and the suggested fix when tagging the analyst.
- If the workspace shows no errors, it has usually been re-run successfully, and there is no need to tag the analyst.
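The checklist above amounts to a small triage decision per alert. The following is a minimal sketch, assuming the instance-to-analyst lookup has been copied out of the Google Sheet and that past errors and fixes are kept as simple text patterns; `ANALYSTS_BY_INSTANCE`, `KNOWN_FIXES`, and the `Alert` fields are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:
    client_instance: str
    workspace: str
    error_message: Optional[str]  # None if the workspace currently shows no error


# Hypothetical lookup built from the "Meiro All Instances, Project, Analysts" sheet.
ANALYSTS_BY_INSTANCE = {"client-x": ["@analyst.a"]}

# Hypothetical notes of errors seen before, mapped to the fix that worked.
KNOWN_FIXES = {"example error text": "example suggested fix"}


def triage(alert: Alert) -> Optional[str]:
    """Return the Slack message to post, or None if nobody needs to be tagged."""
    if not alert.error_message:
        # No error now: the workspace was most likely re-run successfully.
        return None
    owners = ANALYSTS_BY_INSTANCE.get(alert.client_instance, [])
    msg = " ".join(owners + [f"{alert.workspace} failed: {alert.error_message}"])
    # Attach a suggested fix if this error matches one seen before.
    for pattern, fix in KNOWN_FIXES.items():
        if pattern in alert.error_message:
            msg += f"\nSeen before, suggested fix: {fix}"
            break
    return msg
```

Returning `None` for a clean workspace mirrors the last bullet: a successful re-run needs no tag.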
Cache refresh (#alerts_cdp)
- Run the cache refresh manually.
- If that does not solve the problem, report it as a bug.
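The two cache refresh steps follow a "retry once, then escalate" pattern. In this minimal sketch, `run_cache_refresh` and `report_bug` are hypothetical stand-ins passed in as functions, not real Meiro APIs; substitute however you actually trigger the refresh and file the bug.

```python
from typing import Callable


def handle_cache_refresh_alert(
    run_cache_refresh: Callable[[], bool],
    report_bug: Callable[[str], None],
    instance: str,
) -> str:
    """Run a manual refresh first; file a bug only if the refresh still fails.

    run_cache_refresh is a hypothetical callable that returns True on success.
    report_bug is a hypothetical callable that files a bug with the given text.
    """
    if run_cache_refresh():
        return "resolved"
    report_bug(f"Cache refresh on {instance} still fails after a manual re-run")
    return "reported"


bugs: list = []
print(handle_cache_refresh_alert(lambda: False, bugs.append, "client-x"))
```

Injecting the two actions keeps the escalation rule separate from any specific tool, so the same logic holds whichever interface is used to refresh the cache.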