kvaranasi_ 3 hours ago
I built a tool that flags out anomalies. The rarest of the rarest logs by clustering them. This is how it works:
1. connects to existing Loki/New Relic/Datadog, etc - pulls logs from there every few minutes
2. Applies Drain3(https://github.com/logpai/Drain3) - A template miner to retract PIIs. Also, "user 1234 crashed" and "user 5678 crashed" are the same log pattern but different logs.
3. Applies IsolationForest(https://scikit-learn.org/stable/modules/generated/sklearn.en...) - to detect anomalies. It extracts features like when it happened, how many of the logs are errors/warn. What is the log volume and error rate. Then it splits them into trees(forests). The earlier the split, the farther the anomaly. And scores these anomalies.
4. Generate a snapshot of the log clusters formed. Red dots describe the most anomalous log patterns. Clicking on it gives a few samples from that cluster.
Use cases: You can answer questions like "Have we seen this log before?". We stream a compact snapshot of the clusters formed to an endpoint of your choice. Your developer can write a cheap LLM pass to check if it needs to wake a developer at 3 a.m for this? Or just store them in Slack.