
Alarms

We have a couple of sources that notify us if something’s wrong on Pixwel.

AWS CloudWatch

This is our first line of defence. If you are on the appropriate mailing list, you might receive a message like:
You are receiving this email because your Amazon CloudWatch Alarm "Production admin unhealthy host" in the US - N. Virginia region has entered the ALARM state, because "Threshold Crossed: 1 datapoint (1.0) was greater than or equal to the threshold (1.0)." at "Wednesday 09 March, 2016 21:01:29 UTC".
This means that one node was intermittently unhealthy according to the health check defined on the ELB (load balancer): it did not respond to the health check on HTTPS:443/api/system-check in a timely fashion at least twice in a row within 10 seconds. The system-check endpoint verifies things like “Can it talk to the database?” and “Is PHP happy?”, and it is the load balancer that runs this check.
So if you get this alert, it is not causing a problem as such. It is letting us know “hey, one node is not passing the system check, this might be a problem”, and while that is the case the load balancer will not route traffic to that node. Usually this gives the node long enough to recover, after which it is allowed back into the pool. A node can fail the check for a number of reasons.
If this alarm trips:
  1. Don’t panic, but don’t ignore it either
  2. Browse to https://platform.pixwel.com/ and confirm that it’s up
  3. Log into AWS and check the health of the nodes in the production cluster
  4. Check the logs
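The way the load balancer takes a node out of the pool and lets it back in, and when this alarm fires, can be sketched as a small, simplified simulation in Python. The thresholds are assumptions: an unhealthy threshold of 2 is read from “twice in a row”, the alarm threshold of 1 comes from the alarm email, and the healthy threshold (how many passing checks a node needs before it is let back in) is not documented on this page, so 2 is only a placeholder. Real ELB health checks also have an interval and a timeout, and CloudWatch evaluates the load balancer’s unhealthy host count per period; the sketch ignores both. `Node`, `record_check`, `unhealthy_host_count` and `alarm_state` are hypothetical names.

```python
from dataclasses import dataclass

# Assumed thresholds (see above): only UNHEALTHY_THRESHOLD and ALARM_THRESHOLD
# are suggested by this page; HEALTHY_THRESHOLD is a placeholder.
UNHEALTHY_THRESHOLD = 2  # consecutive failed checks before a node leaves the pool
HEALTHY_THRESHOLD = 2    # consecutive passed checks before it is let back in
ALARM_THRESHOLD = 1      # alarm when the unhealthy host count is >= 1


@dataclass
class Node:
    name: str
    in_pool: bool = True
    consecutive_failures: int = 0
    consecutive_successes: int = 0

    def record_check(self, passed: bool) -> None:
        """Apply one health-check result using consecutive-count thresholds."""
        if passed:
            self.consecutive_successes += 1
            self.consecutive_failures = 0
            if not self.in_pool and self.consecutive_successes >= HEALTHY_THRESHOLD:
                self.in_pool = True  # recovered: traffic is routed to it again
        else:
            self.consecutive_failures += 1
            self.consecutive_successes = 0
            if self.in_pool and self.consecutive_failures >= UNHEALTHY_THRESHOLD:
                self.in_pool = False  # load balancer stops routing traffic here


def unhealthy_host_count(nodes: list[Node]) -> int:
    """Number of nodes currently out of the pool."""
    return sum(1 for n in nodes if not n.in_pool)


def alarm_state(nodes: list[Node]) -> str:
    """A single datapoint at or above the threshold is enough to alarm."""
    return "ALARM" if unhealthy_host_count(nodes) >= ALARM_THRESHOLD else "OK"


if __name__ == "__main__":
    nodes = [Node("node-a"), Node("node-b")]
    # node-b fails two checks in a row, then passes two.
    rounds = [(True, True), (True, False), (True, False), (True, True), (True, True)]
    for a_ok, b_ok in rounds:
        nodes[0].record_check(a_ok)
        nodes[1].record_check(b_ok)
        print(alarm_state(nodes), [n.in_pool for n in nodes])
```

In the example run, a single failed check changes nothing; node-b only drops out of the pool (and the alarm only fires) after its second consecutive failure, and the alarm clears once it has passed `HEALTHY_THRESHOLD` checks in a row. That matches the “don’t panic” advice: a tripped alarm usually means the load balancer is already routing around the node.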

New Relic

New Relic also checks the same endpoint as CloudWatch, but it checks the whole cluster, so it’s the New Relic alerts that should make our blood run cold. If New Relic reports a problem, the application is probably not responding. If this alarm trips:
  1. Follow the same steps as above
  2. Wake up/alert your colleagues
  3. If the cluster is down, you may wish to redeploy from Jenkins whilst you’re checking the logs
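Steps 2 and 3 of the CloudWatch checklist, plus a rough version of New Relic’s whole-cluster view, can be combined into a quick terminal check. This is a minimal sketch in Python under several assumptions: that /api/system-check answers HTTP 200 when healthy and is also reachable through the public hostname (via the load balancer); that production sits behind a Classic ELB in us-east-1, the “US - N. Virginia” region named in the alarm email (if it is an Application Load Balancer, boto3’s `elbv2` client and `describe_target_health` would be needed instead); and that you have boto3 and AWS credentials for the node-level part. The helper names `public_check`, `node_states` and `triage` are hypothetical.

```python
import http.client
import urllib.request

# Hypothetical: assumes the system check is reachable through the public
# hostname, which is roughly what an external monitor such as New Relic sees.
SYSTEM_CHECK_URL = "https://platform.pixwel.com/api/system-check"

# Short labels returned by triage(), mapped to the actions on this page.
ADVICE = {
    "ok": "Public check passed and no known node is out of the pool.",
    "degraded": "Some nodes are out of the pool (the CloudWatch case): check node health and logs.",
    "cluster down": "The New Relic case: alert colleagues, check logs, consider a Jenkins redeploy.",
}


def public_check(url: str = SYSTEM_CHECK_URL, timeout: float = 10.0) -> bool:
    """Return True only if the endpoint answers HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (non-2xx responses) and socket timeouts all subclass OSError.
        return False


def node_states(load_balancer_name: str, region: str = "us-east-1") -> list[str]:
    """Return each instance's state ("InService", "OutOfService" or "Unknown").

    Assumes a Classic ELB; needs boto3 and AWS credentials.
    """
    import boto3  # imported here so public_check() and triage() work without boto3

    elb = boto3.client("elb", region_name=region)
    resp = elb.describe_instance_health(LoadBalancerName=load_balancer_name)
    return [s["State"] for s in resp["InstanceStates"]]


def triage(public_ok: bool, states: list[str]) -> str:
    """Classify the situation; an empty `states` list means node states are unknown."""
    in_service = sum(1 for s in states if s == "InService")
    if not public_ok or (states and in_service == 0):
        return "cluster down"
    if in_service < len(states):
        return "degraded"
    return "ok"
```

For example, `print(ADVICE[triage(public_check(), node_states("production-admin-elb"))])`, where `"production-admin-elb"` is a made-up name to replace with the real load balancer’s. The public check is treated as decisive because, like New Relic, it tells a single struggling node (which the load balancer routes around) apart from a cluster that is not serving anything.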
