We just added the Cape Town, South Africa 🇿🇦 and Milan, Italy regions. This is pretty epic, especially for the underserved African region. Both regions are enabled for all plans and available now.
Cape Town, South Africa and Milan, Italy regions now available
[retroactive] scheduling outage for browser checks
On Monday 18 May we had an outage in processing browser check results between 15:44 UTC and 20:38 UTC. This was caused by a bug in our release and deployment software. API checks were not impacted.
This outage had the following consequences:
- No browser results were stored in our database from that period
- You will not find browser check results in your dashboard from that period
- No alerts were triggered for failing browser checks, as these rely on the results being processed.
We published a full post mortem on the outage detailing the root cause and most importantly our actions to prevent this in the future. In a nutshell:
- Our own monitoring and alerting failed here, causing the outage to last much longer than needed.
- The bug itself was minor and easily and quickly rectified.
- We are putting three distinct measures in place to stop this from happening again.
On a more personal note: it is bitter that this outage was effectively caused by the engineering team's work on reliability and on better testing and release procedures. The code changes necessary for this sometimes have bugs, like all code.
Tim, CTO & co-founder
Find the detailed post mortem here: https://blog.checklyhq.com/post-mortem-outage-browser-check-results-alerting/
[post mortem] scheduling outage us-west-1 region
Please find a full post mortem on the recent scheduling outage in the us-west-1 region, which caused checks to fail for ~30 minutes.
[retroactive] scheduling outage 02:00 - 03:00 CET
We had a significant rise in scheduling errors, mostly in the us-west-1 region, between 02:00 and 03:00 CET. The largest peak was between 02:04 and 02:24.
This resulted in checks reporting errors with error messages like the snippet below. This incident was caused by an upstream provider and resolved itself.
503: null
    at Request.extractError (/app/api/node_modules/aws-sdk/lib/protocol/query.js:55:29)
    at Request.callListeners (/app/api/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
We are preparing a post mortem that focuses on two topics:
- How we can fail over to other regions more robustly. We already reschedule checks after an initial failure, but this is not sufficient.
- How we can be alerted sooner when similar issues arise.
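To make the first topic concrete, here is a minimal sketch of the general idea of region failover: retry a check in its primary region a couple of times, then move on to the next region. It is an illustration only and not how our scheduler is implemented. The names `RegionUnavailable`, `run_with_failover` and `fake_run_check`, the retry count and the `us-east-1` fallback region are all assumptions made up for the example.

```python
class RegionUnavailable(Exception):
    """Raised by the hypothetical runner when a region cannot schedule a check."""


def run_with_failover(run_check, regions, attempts_per_region=2):
    """Try each region in order and return (region, result) on the first success."""
    failures = []
    for region in regions:
        for attempt in range(1, attempts_per_region + 1):
            try:
                return region, run_check(region)
            except RegionUnavailable as exc:
                # Remember what went wrong, then retry or move to the next region.
                failures.append((region, attempt, str(exc)))
    raise RuntimeError(f"check could not be scheduled in any region: {failures}")


def fake_run_check(region):
    # Stand-in for a real check runner: us-west-1 always answers 503 here.
    if region == "us-west-1":
        raise RegionUnavailable("503: null")
    return "passed"


region, result = run_with_failover(fake_run_check, ["us-west-1", "us-east-1"])
print(region, result)
```

In this sketch the failures are collected rather than swallowed, so the same list could also feed an alert, which touches on the second topic.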
Bug fix on checks scheduled each 12hr and 24hr
Yesterday we shipped a bug fix for the following issue:
Checks scheduled to run on a 12hr (720m) or 24hr (1440m) schedule were prone to either not running at all or running on an hourly basis. This behaviour was effectively random. This impacted a total of 8 checks and 4 customers in our system.
Checks scheduled to run anywhere from every 1 minute to every 60 minutes were not impacted.
This was a hard bug to track down and would not have been resolved if not for the kind reporting of one of our customers. Big 🖖 and 🙌
Bug fixes on groups and check triggers
We just shipped some bug fixes!
Checks that were not enabled would still run when triggered in a group. This is now fixed. A disabled check will no longer run in the context of a group.
You can now toggle the "double check" parameter on groups. Before, this setting was not saved in our backend.
Triggering a group of checks using our command line trigger would fail when adding one of the repository query parameters to the API call. This is now fixed.
Stay safe and happy Easter holidays. 🐰
Changelog: customize webhook method, headers and query params
We just released some tweaks to our already quite awesome webhook alert channels. You can now set the method, add headers and query params.
Read the full changelog at: https://blog.checklyhq.com/changelog-customize-your-webhook-with-method-etc/ 👈
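As a rough illustration of what the method, header and query param settings change about an outgoing webhook call, the sketch below builds such a request with Python's standard library. It is not Checkly's implementation. The function name `build_webhook_request`, the `example.com` endpoint, the `X-Api-Key` header, the `source` parameter and the JSON payload are all assumptions made up for the example.

```python
import json
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import Request


def build_webhook_request(url, method="POST", headers=None, query_params=None, body=None):
    """Build (but do not send) an HTTP request for a webhook call."""
    parts = urlsplit(url)
    # Keep any query string already on the URL and append the extra params.
    query = parse_qsl(parts.query) + list((query_params or {}).items())
    full_url = urlunsplit(parts._replace(query=urlencode(query)))

    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = Request(full_url, data=data, method=method.upper())
    for name, value in (headers or {}).items():
        req.add_header(name, value)
    # urllib stores header names capitalized as "Content-type".
    if data is not None and not req.has_header("Content-type"):
        req.add_header("Content-Type", "application/json")
    return req


req = build_webhook_request(
    "https://example.com/hooks/alerts",  # hypothetical endpoint
    method="put",
    headers={"X-Api-Key": "secret"},  # hypothetical header
    query_params={"source": "checkly"},  # hypothetical query param
    body={"check": "homepage", "status": "failing"},  # hypothetical payload
)
print(req.get_method(), req.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would actually send it; the sketch stops short of that so it can run without network access.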
Changelog: Screenshots in GitHub PR's + design & stability updates
We just pushed a new feature and some bug fixes around our GitHub deployments integration.
- You can now add screenshots to the GitHub PR comment.
- The GitHub PR comment is now optimized to show the results of all checks in a group.
Read the full changelog here:
Changelog: Puppeteer 2.0 & Node.js 10
We just updated our Puppeteer check runners to Puppeteer 2.0 and Node.js 10! Check our blog post for some of the changes in this Puppeteer release.
New blog post: Using the Checkly Prometheus integration
Last week we published a brand new blog post on getting the most out of Checkly's Prometheus integration.
In this post, our friend John Arundel dives deep into Prometheus and Grafana and teaches you how to…
- Slice & dice Checkly metrics
- Alert on SLA performance
- Set up tripwire dashboards
Find the full post at: https://blog.checklyhq.com/monitoring-website-performance-with-checkly-prometheus-grafana/