Developing a Solution to Dynamic Binning for Security Reports
While developing Duo’s new reporting features, we wanted to make it easier for our customers to visualize authentications over time. This visualization lets customers see trends and spot troublesome or suspicious authentications.
The visualization we use is a basic histogram showing the number of authentications in a given time period. When displaying the last 24 hours of authentications, we show roughly 96 bins, each 15 minutes wide, and each bar counts the authentications that fall within that bin.
The Requirements
The visualization started out with a finite set of relative time ranges (e.g. the last 24 hours or 7 days). Eventually, the interface evolved to allow users to specify custom time ranges.
There were several interesting challenges that arose with custom time ranges, specifically around how the data should be visualized and binned.
In short, here are the requirements:
- Display as close to 90 bins as possible for any given time range. Knowing this, it is pretty simple to figure out the exact interval size (time_range/90=interval_size), but the following constraints are where things get a little tricky.
- Round the beginning and end of the specified time range to the calculated interval. For example, if the user specified a start time of 1:47am and the auto-calculated interval was 15 minutes, round the start time down to 1:45am. The end time is handled the same way, except that it is rounded up to the next interval boundary.
- The interval that we calculated for any time range must be a valid interval for our backend storage system.
- We need control over the set of possible intervals so that the x-axis renders correctly.
With these requirements, there are three values we need to calculate from a start and end time:
- The interval (15 minutes, 30 minutes, 1 hour, etc.) to bin the data
- The rounded beginning time, rounded down to the nearest interval block.
- The rounded end time, rounded up to the nearest interval block.
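As a rough sketch of the rounding step (not the code from the product), the start can be floored and the end ceiled to a multiple of the interval. The function name `round_range` and the choice to align boundaries to the Unix epoch in UTC are assumptions made for illustration; a real implementation may need to align to the customer's local time zone instead.

```python
import math
from datetime import datetime, timedelta, timezone


def round_range(start, end, interval):
    """Round start down and end up to a multiple of interval.

    Assumption: boundaries are aligned to the Unix epoch (UTC).
    """
    step = int(interval.total_seconds())
    start_s = math.floor(start.timestamp())
    end_s = math.ceil(end.timestamp())
    rounded_start = start_s - start_s % step   # floor to the interval
    rounded_end = -(-end_s // step) * step     # ceiling to the interval
    return (
        datetime.fromtimestamp(rounded_start, tz=timezone.utc),
        datetime.fromtimestamp(rounded_end, tz=timezone.utc),
    )


# The 1:47am example from the requirements, with a 15-minute interval.
start = datetime(2024, 1, 1, 1, 47, tzinfo=timezone.utc)
end = datetime(2024, 1, 2, 1, 47, tzinfo=timezone.utc)
rounded_start, rounded_end = round_range(start, end, timedelta(minutes=15))
print(rounded_start.time(), rounded_end.time())  # 01:45:00 02:00:00
```

An end time that already sits on a boundary is left where it is, so the range never grows by more than one interval on either side.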
Let’s dive into how we tackled this issue!
Approaching the Problem
When developing the solution, it made sense to do so in an environment that I was comfortable in and that had a quick feedback loop. Being a frontend developer, this meant using JavaScript and D3.js. So, I put together a visual scaffolding to better help me build out and test my solution against a wide variety of time ranges and inputs. Here are a few examples:
Absolute Date and Time Inputs
Since we wanted to accept any range of time input between 24 hours and 180 days, I used range inputs that mapped to timestamps and listened to event changes on those inputs. This allowed me to quickly and easily change and test a wide variety of time ranges.
Visualizing Possible Bin Sizes
As mentioned earlier, we have specific interval sizes we want to display. I calculate the raw interval based on the time range, but need to map this to an interval that we can pass to our datastore. The visualization above helped me see which two intervals I was choosing between as I changed the time range. In this case, the blue dot is the raw calculated bin size (time_range/90=interval_size) and the black dots are the possible intervals that our raw interval falls between.
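To make the lookup concrete, here is a minimal sketch of finding the two allowed intervals on either side of the raw interval. The `ALLOWED_INTERVALS` list is hypothetical (the real set depends on what the datastore supports), and `bracketing_intervals` is a name invented for this example.

```python
from bisect import bisect_left

TARGET_BINS = 90

# Hypothetical allowed intervals in seconds (15 min up to 2 days), sorted ascending.
ALLOWED_INTERVALS = [900, 1800, 3600, 7200, 14400, 28800, 43200, 86400, 172800]


def bracketing_intervals(time_range_s):
    """Return (raw, lower, upper): the raw interval and its allowed neighbors."""
    raw = time_range_s / TARGET_BINS
    i = bisect_left(ALLOWED_INTERVALS, raw)
    # Clamp at both ends so very short or very long ranges still get a result.
    lower = ALLOWED_INTERVALS[max(i - 1, 0)]
    upper = ALLOWED_INTERVALS[min(i, len(ALLOWED_INTERVALS) - 1)]
    return raw, lower, upper


# A 24-hour range: the raw interval is 960 s, between 15 and 30 minutes.
print(bracketing_intervals(24 * 3600))  # (960.0, 900, 1800)
```

Keeping the list sorted lets a binary search do the work, and the clamping means a raw interval outside the list simply returns the nearest allowed value for both neighbors.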
Closest to 90 Wins
Knowing which two possible intervals I can choose from, I compare the bin count for the lower interval and the upper interval and check the difference from our desired bin count of 90. We pick whichever one is closer to 90 bins. You can see the highlighted bar change based on which interval is closer to our target bin count.
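The comparison can be sketched in a few lines. This is an illustration rather than the production logic: `pick_interval` is a made-up name, the bin count is computed from the unrounded time range, and ties go to the lower interval, all of which are assumptions.

```python
def pick_interval(time_range_s, lower, upper, target_bins=90):
    """Return whichever candidate interval yields a bin count closest to the target."""

    def distance(interval):
        return abs(time_range_s / interval - target_bins)

    # min() keeps the first candidate on a tie, so the lower interval wins ties.
    return min((lower, upper), key=distance)


# 24 hours: 900 s gives 96 bins, 1800 s gives 48 bins, so 900 s wins.
print(pick_interval(24 * 3600, 900, 1800))  # 900

# 7 days: 3600 s gives 168 bins, 7200 s gives 84 bins, so 7200 s wins.
print(pick_interval(7 * 24 * 3600, 3600, 7200))  # 7200
```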
Seeing is Believing
To double-check my outputs from the script, I plot the calculated bins on a timeline as we would in the product. Seeing this visually helps validate that I’m on the right track.
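The same sanity check can be done without a chart by listing the bin boundaries and counting them. The sketch below assumes the start and end have already been rounded to the interval; `bin_edges` is a hypothetical helper, not part of the original script.

```python
def bin_edges(start_s, end_s, interval_s):
    """Return every bin boundary between two already-rounded timestamps (seconds)."""
    if (end_s - start_s) % interval_s != 0:
        raise ValueError("range is not a whole number of intervals")
    return list(range(start_s, end_s + interval_s, interval_s))


# 24 hours at 15-minute intervals: 97 boundaries, which means 96 bins.
edges = bin_edges(0, 24 * 3600, 900)
print(len(edges) - 1)  # 96
```

Raising an error on a misaligned range makes it obvious when the rounding step was skipped or used a different interval.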
Conclusion
Here is a full animation of the scaffolding working as I developed it.
Seeing the output in real time allowed me to test a wide range of inputs quickly and easily. Once this script was finalized, it was easily ported to Python so we could use it on our backend. This allows customers to specify arbitrary time ranges (including the relative time ranges) and view an easy-to-digest visualization of authentications over time.
This is a good example of how it can be helpful to break down a problem and approach it from a different perspective before directly coding up a solution. Having a scaffolding to interact with made this problem and solution easier to reason about and understand.