Those familiar with the cybersecurity space are well acquainted with the information security triad of confidentiality, integrity, and availability (the “CIA triad”). Confidentiality and integrity get all the attention and glory, and it’s pretty clear how Huntress Managed Endpoint Detection and Response (EDR) protects the confidentiality of our customers’ data and the integrity of their systems, from workstations to servers.
What can sometimes be overlooked is the "availability" aspect of the triad. We know that keeping your systems online is critical to your business and your end users. If our EDR agent is unable to maintain its availability, it’s not able to achieve its overall security objectives. Thus, for Huntress to protect our customers’ availability, we need to continually invest in our own EDR agent’s availability and stability.
Achieving this goal at the scale of millions of endpoints generating billions of data points per day, all while maintaining cost efficiency, is no small feat. While we’ve long had a variety of observability insights at our disposal using commonly used commercial tools, we knew we needed to invest more deeply in an internal observability tool that enables granularity and overall cardinality that isn’t possible using third-party tools or services.
To achieve the breadth and depth required for our detailed observability and to fully assess our EDR agent’s health and stability at scale, we decided to use ClickHouse. In the past, we’ve talked about how we used ClickHouse to build our new Managed Security Information and Event Management (SIEM) product. Overall, it’s great at ingesting large amounts of data and providing blazing-fast querying.
It was pretty clear that ClickHouse would be a great fit for our internal observability needs as well, giving us both a global view of our agent’s overall health and granular, detailed insights down to the individual endpoint.
Screenshot of our Grafana dashboard for monitoring our internal service level indicator for EDR data collection on both Windows and macOS
As with any analytics or observability system, we’re counting, averaging, or otherwise statistically aggregating measurements taken from how our agent interacts with the endpoint itself and with our backend API. ClickHouse’s AggregatingMergeTree is an excellent fit for this use case. These merge tables let you define aggregate function columns that maintain an aggregation state in the background. ClickHouse provides all the standard aggregate functions you'd expect in any database management system, plus a whole set of more advanced functions.
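To make the idea of a background-maintained aggregation state concrete, here is a minimal conceptual sketch in Python. It is not how ClickHouse is implemented and not our production code; the `AvgState` class, the `state_of` helper, and the sample durations are all hypothetical. It only illustrates the property that makes this style of table work: a partial state (here, a running total and count) can be merged with other partial states later without revisiting the raw rows, which is roughly the role ClickHouse's `-State` and `-Merge` combinators (for example `avgState` and `avgMerge`) play.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AvgState:
    """Partial state for an average: enough to merge later without raw rows."""
    total: float = 0.0
    count: int = 0

    def add(self, value: float) -> "AvgState":
        # Fold one new measurement into the state.
        return AvgState(self.total + value, self.count + 1)

    def merge(self, other: "AvgState") -> "AvgState":
        # Combine two partial states; order of merging does not matter.
        return AvgState(self.total + other.total, self.count + other.count)

    def finalize(self) -> float:
        # Turn the state into the final average.
        if self.count == 0:
            raise ValueError("no observations")
        return self.total / self.count


def state_of(values):
    """Build a partial state from an iterable of measurements (hypothetical helper)."""
    state = AvgState()
    for v in values:
        state = state.add(v)
    return state


# Hypothetical task durations (ms) arriving in two separate insert batches.
batch_a = state_of([100.0, 200.0])
batch_b = state_of([300.0])

# Merging the partial states gives the same state as aggregating all rows at once.
merged = batch_a.merge(batch_b)
```

Calling `merged.finalize()` yields the average over all three measurements, even though no single step ever saw all of the raw rows together.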
Using a combination of AggregatingMergeTree tables and Materialized Views, we’re able to aggregate observability data at hourly and daily intervals with ease. To aid with analysis and with spotting trends, we’re also using ClickHouse’s Map data type to allow for arbitrary tagging of observability data. Usually, we’ll tag our observations with versions, the endpoint’s operating system and architecture information, and other configuration information that might impact overall performance and stability. By sorting this Map structure before storing it, we’re able to group by various observational tags and aggregate by various agent and endpoint characteristics. We also see amazing levels of compression on the sorted tag data: ClickHouse can automatically take terabytes of data and compress it down to a couple of dozen gigabytes.
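The value of sorting tags before storing them can be sketched in a few lines of Python. This is an illustration under stated assumptions, not our pipeline: the function names (`canonical_tags`, `hour_bucket`, `rollup`), the tag keys, and the sample observations are all hypothetical. The point is that two observations carrying the same tags in a different order end up with the same grouping key once the tags are sorted, so they land in the same hourly bucket.

```python
from collections import defaultdict
from datetime import datetime, timezone


def canonical_tags(tags: dict[str, str]) -> tuple[tuple[str, str], ...]:
    # Sort by key so the same tag set always produces the same grouping key.
    return tuple(sorted(tags.items()))


def hour_bucket(ts: datetime) -> datetime:
    # Truncate a timestamp to the start of its hour.
    return ts.replace(minute=0, second=0, microsecond=0)


def rollup(observations):
    """Group (timestamp, tags, value) rows by hour and canonical tag set."""
    buckets = defaultdict(lambda: [0, 0.0])  # [count, sum]
    for ts, tags, value in observations:
        key = (hour_bucket(ts), canonical_tags(tags))
        buckets[key][0] += 1
        buckets[key][1] += value
    return dict(buckets)


# Hypothetical observations; the first two carry the same tags in a different order.
observations = [
    (datetime(2024, 1, 1, 9, 5, tzinfo=timezone.utc), {"os": "windows", "arch": "x64"}, 12.0),
    (datetime(2024, 1, 1, 9, 40, tzinfo=timezone.utc), {"arch": "x64", "os": "windows"}, 18.0),
    (datetime(2024, 1, 1, 10, 2, tzinfo=timezone.utc), {"os": "macos", "arch": "arm64"}, 7.0),
]
hourly = rollup(observations)
```

Here the two Windows observations collapse into a single 09:00 bucket with a count of 2, while the macOS observation forms its own 10:00 bucket. A daily rollup would follow the same pattern with a coarser bucket function.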
Screenshot of a Grafana dashboard that shows percentile breakdowns on how long tasking for investigations and remediations takes to complete for our EDR Agent
Using our rich internal observability data, we’re looking to surface new insights directly to our customers with a fully redesigned EDR dashboard coming later this year. We’re also looking to prioritize certain environmental issues, like low disk space or networking issues, that may affect our agent’s availability and, thus, our customers’ overall security.
Huntress is committed to providing a best-in-class EDR product, and we’re constantly investing in our internal and external technology in order to achieve the level of security our customers need.