Changelog: customize webhook method, headers and query params

We just released some tweaks to our already quite awesome webhook alert channels. You can now set the HTTP method and add custom headers and query params.

Read the full changelog at: https://blog.checklyhq.com/changelog-customize-your-webhook-with-method-etc/ 👈
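
For a concrete (made-up) example of what such a channel can now send, the sketch below builds the kind of request a customized webhook could fire in plain Node.js: a PUT with an extra header and a query param. The host, path, header name and payload are placeholders, not part of the feature itself.

    // Illustration only: the kind of request a customized webhook alert channel can now send.
    // Host, path, header name and payload are made-up placeholders.
    const https = require('https');

    const req = https.request(
      {
        hostname: 'hooks.example.com',
        path: '/alerts?source=checkly', // custom query param
        method: 'PUT',                  // custom HTTP method
        headers: {
          'Content-Type': 'application/json',
          'X-Alert-Source': 'checkly-webhook', // custom header
        },
      },
      (res) => console.log('status', res.statusCode)
    );

    req.write(JSON.stringify({ check: 'Homepage', status: 'failing' }));
    req.end();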

Changelog: Screenshots in GitHub PRs + design & stability updates

We just pushed a new feature and some bug fixes around our GitHub deployments integration.

  1. You can now add screenshots to the GitHub PR comment.
  2. The GitHub PR comment is now optimized to show the results of all checks in a group.

Read the full changelog here:

https://blog.checklyhq.com/changelog-screenshots-in-github-prs/

Changelog: Puppeteer 2.0 & Node.js 10

We just updated our Puppeteer check runners to Puppeteer 2.0 and Node.js 10! Check our blog post for some of the changes in this Puppeteer release.

https://blog.checklyhq.com/changelog-puppeteer-v2-0/
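
As a quick refresher (this is not taken from the blog post), a script on the updated runtime is still just a plain Puppeteer script. Something like the minimal sketch below, with an example URL, runs unchanged on Puppeteer 2.0 and Node.js 10.

    // Minimal Puppeteer script of the kind a browser check runs; the URL is an example.
    const puppeteer = require('puppeteer');

    (async () => {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      await page.goto('https://example.com', { waitUntil: 'networkidle2' });
      await page.screenshot({ path: 'homepage.png' });
      await browser.close();
    })();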

New blog post: Using the Checkly Prometheus integration

Last week we published a brand new blog post on getting the most out of Checkly's Prometheus integration.

In this post, our friend John Arundel dives deep into Prometheus and Grafana and teaches you how to…

  • Slice & dice Checkly metrics
  • Alert on SLA performance
  • Set up tripwire dashboards

Find the full post at: https://blog.checklyhq.com/monitoring-website-performance-with-checkly-prometheus-grafana/

[retroactive] Scheduling outage 18:45 - 19:45 CET

Checkly did not run any scheduled API and/or browser checks between 18:44 and 19:42 CET.

Other features like the web application, ad hoc checks and checks triggered by deployments or the API were not affected.

Root cause

The exact cause of the outage is not yet known. Two of our background daemons running on third-party cloud infrastructure were effectively stopped for reasons that are still unclear. We have escalated this to the third-party provider.

Triage

Our monitoring alerted on this outage almost exactly when the issue occurred. The reason it took almost an hour to resolve is that the on-call engineer did not have his laptop at hand.

Once logged in and online, the issue was found and resolved within 10 minutes. In this case a simple restart brought everything back to normal.

Lessons learned

  • This was partly avoidable: the on-call engineer should always have a laptop at hand or very close by.
  • Until we have an analysis from our third-party provider, we are not sure how we can prevent this from happening in the future.
  • Our monitoring was on point and alerted correctly and quickly.

Lastly…

This incident was of course very serious. I hope that by being transparent and honest about the ups & downs of the Checkly service we can continue to build trust and make Checkly better every day.

Tim Nolet, Founder

Small update: Customise channel names for the Slack integration

When adding a Slack alert channel, you can now customise the channel you want to see the alert in!

This means you can use a single Incoming Webhook URL from Slack for different checks & channels.
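
Under the hood, Slack's classic Incoming Webhooks accept a channel override in the JSON payload, which is presumably what makes the single-URL setup possible. A rough Node.js sketch with a placeholder webhook path:

    // Sketch: posting to one Slack Incoming Webhook URL while overriding the target channel.
    // Relies on legacy-style incoming webhooks honoring the "channel" field; the path is a placeholder.
    const https = require('https');

    function notify(channel, text) {
      const req = https.request(
        {
          hostname: 'hooks.slack.com',
          path: '/services/T000/B000/XXXX', // placeholder webhook path
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
        },
        (res) => console.log(channel, res.statusCode)
      );
      req.write(JSON.stringify({ channel, text }));
      req.end();
    }

    notify('#alerts-prod', 'API check failed');
    notify('#alerts-staging', 'Browser check degraded');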


Small tweak: select a region when running ad hoc checks

You can now select the region from which we run your "ad hoc" checks when building and debugging your checks. This helps when your corporate firewall blocks our default Frankfurt (eu-central-1) region, or when you want to check the latency from a specific region.


Head over to the API check editor or browser check editor and click the small flag icon in the button. Enjoy!

Webhook, public API and basic auth input

We just shipped three bug fixes for the following issues:

  1. Our Webhook alerter was not sending the correct ALERT_SSL value in the ALERT_TYPE variable when an SSL expiry alert was triggered. This is now fixed and a test has been added to our code base.

  2. The public API was not paginating the /v1/checks endpoint correctly: too few results were returned. We patched that one too, and we now monitor this with Checkly! (See the pagination sketch after this list.)

  3. Lastly, when configuring Basic Auth credentials in the UI, the credentials would not get stored correctly due to a UI bug. That is working again.
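
For reference, walking that paginated endpoint generally looks like the sketch below. The page and limit query params and the Bearer API key header are assumptions here, so verify them against the current public API docs.

    // Sketch: paging through the public /v1/checks endpoint.
    // The "page"/"limit" params and Bearer auth are assumptions; verify against the API docs.
    const https = require('https');

    function getChecks(page) {
      return new Promise((resolve, reject) => {
        https
          .get(
            {
              hostname: 'api.checklyhq.com',
              path: `/v1/checks?limit=100&page=${page}`,
              headers: { Authorization: `Bearer ${process.env.CHECKLY_API_KEY}` },
            },
            (res) => {
              let data = '';
              res.on('data', (chunk) => (data += chunk));
              res.on('end', () => resolve(JSON.parse(data)));
            }
          )
          .on('error', reject);
      });
    }

    async function allChecks() {
      const results = [];
      for (let page = 1; ; page++) {
        const batch = await getChecks(page);
        results.push(...batch);
        if (batch.length < 100) break; // last page reached
      }
      return results;
    }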

Thanks to all who reported these bugs!

Changelog: Alerting, webhook and Prometheus

We just shipped some updates:

  • We now alert on failing setup scripts that prevent a check from running.
  • We expose the "recovery from degraded" alert type on our webhooks (see the receiver sketch below).
  • We introduced a dedicated "degraded status" gauge in our Prometheus integration.

Read all about the details in this blog post.
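
To sketch how the extra webhook alert type can be consumed on the receiving end, here is a tiny Node.js receiver. The constant name ALERT_DEGRADED_RECOVERY and the payload field names are placeholders; check the webhook payload docs for the real values.

    // Sketch: a webhook receiver branching on the alert type in the incoming payload.
    // 'ALERT_DEGRADED_RECOVERY' and the field names are placeholders, not confirmed constants.
    const http = require('http');

    http
      .createServer((req, res) => {
        let body = '';
        req.on('data', (chunk) => (body += chunk));
        req.on('end', () => {
          const payload = JSON.parse(body || '{}');
          if (payload.alertType === 'ALERT_DEGRADED_RECOVERY') {
            console.log(`${payload.checkName} recovered from degraded`);
          } else {
            console.log(`${payload.checkName}: ${payload.alertType}`);
          }
          res.end('ok');
        });
      })
      .listen(3000);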

Fixes & small tweaks for GitHub deployment triggers

We just fixed some bugs and usability/reliability issues with GitHub deployment triggers.

  1. Environment URLs containing dashes ("-") were sometimes not parsed and replaced correctly in API check URLs (illustrated in the sketch after this list).

  2. Our queue worker behaved unreliably during maintenance. We moved our queue to a more reliable solution.

  3. We fixed some layout issues in the GitHub "check" markdown you see in your PRs and commits. We now also report the environment URL used in the GitHub check.
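
To make item 1 concrete: with deployment triggers, the environment URL from the GitHub deployment is substituted into the check URL via a template variable. The sketch below only illustrates that substitution; the variable name and URLs are illustrative, not our actual parsing code.

    // Illustration of the substitution that previously tripped over dashes in environment URLs.
    // Variable name and URLs are illustrative, not Checkly internals.
    const template = '{{ENVIRONMENT_URL}}/api/health';
    const environmentUrl = 'https://my-app-pr-42-preview.example.com'; // contains dashes

    const checkUrl = template.replace('{{ENVIRONMENT_URL}}', environmentUrl);
    console.log(checkUrl); // https://my-app-pr-42-preview.example.com/api/health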