Log drain improvements for high-performance, reliable delivery to external destinations

Aptible Deploy comes with built-in support for aggregating your container, SSH session, and HTTP(S) endpoint logs and routing them to the destination of your choice for record-keeping and future analysis, whether that is a popular external service like Datadog, SumoLogic, or PaperTrail, or a self-hosted Elasticsearch database.

Since 2014, customers have used Aptible log drains to send hundreds of millions of log lines to various destinations. While most customers were able to aggregate their logs without a hiccup, a few told us they ran into issues when the volume of logs they generated was extremely high. These issues ranged from inconvenient delays in logs reaching their destinations to packet loss during periods of high throughput.

So we decided to fix this by engineering and releasing a new version of Aptible log drains.

What customers can expect with this new version of log drains

The log drains of all Aptible accounts have been updated to the latest version, requiring no additional setup from customers. Customers can expect the following from the latest version.

Improved performance
With this update, users should see a noticeable improvement in the reliability and speed of their log drains. Because we increased throughput in the new version of our drains, customers can expect minimal to no lag between generating their logs and sending them, even at very high volumes.

Better internal observability for faster remediation
By collecting FluentD data and graphing the most important metrics in Grafana, we've been able to set up alerts that monitor for issues based on the number of logs waiting to be sent, the number of times customer drains retry sending logs, failed output writes to different destinations, and more. We believe these metrics let our reliability engineers quickly identify root causes as issues arise, whether on Aptible's side or the customer's side, and remediate them more efficiently.
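As a rough illustration of threshold-based alerting on the three signals named above (logs waiting to be sent, retries, and failed output writes), the following Python sketch checks a snapshot of per-drain metrics against fixed limits. The `DrainMetrics` and `Thresholds` types, the `evaluate` function, the drain names, and every threshold value are hypothetical; this only sketches the idea and does not reflect how Aptible actually configured its alerts.

```python
from dataclasses import dataclass


# Hypothetical snapshot of the three signals for one log drain.
@dataclass(frozen=True)
class DrainMetrics:
    drain: str
    queued: int         # logs waiting to be sent (buffer backlog)
    retries: int        # retry attempts observed in the current window
    failed_writes: int  # failed output writes to the destination


# Hypothetical alert limits; real values would be tuned per destination.
@dataclass(frozen=True)
class Thresholds:
    max_queued: int = 100
    max_retries: int = 5
    max_failed_writes: int = 0


def evaluate(metrics: list[DrainMetrics], limits: Thresholds = Thresholds()) -> list[str]:
    """Return one alert message per threshold breach, in input order."""
    alerts: list[str] = []
    for m in metrics:
        # Each check is independent, so one unhealthy drain can raise several alerts.
        if m.queued > limits.max_queued:
            alerts.append(f"{m.drain}: queued {m.queued} > {limits.max_queued}")
        if m.retries > limits.max_retries:
            alerts.append(f"{m.drain}: retries {m.retries} > {limits.max_retries}")
        if m.failed_writes > limits.max_failed_writes:
            alerts.append(
                f"{m.drain}: failed writes {m.failed_writes} > {limits.max_failed_writes}"
            )
    return alerts


if __name__ == "__main__":
    snapshot = [
        DrainMetrics("datadog-prod", queued=12, retries=0, failed_writes=0),
        DrainMetrics("elasticsearch-self-hosted", queued=240, retries=7, failed_writes=3),
    ]
    for alert in evaluate(snapshot):
        print(alert)
    # elasticsearch-self-hosted: queued 240 > 100
    # elasticsearch-self-hosted: retries 7 > 5
    # elasticsearch-self-hosted: failed writes 3 > 0
```

Reporting every breach rather than stopping at the first one gives an engineer a fuller picture at a glance: a growing backlog alongside failed writes and retries for a single drain points toward that destination, while a backlog across many drains at once would suggest looking elsewhere.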

Over time, we’ll evolve these metrics as we learn how our newest version of log drains performs in a wider variety of real world scenarios. Depending on how well these metrics perform, we may also choose to expose them to customers to enable more proactive, self-service remediation of log drain issues.