LLM-Driven Risk Scoring for Next-Gen Identity Security

LLM Risk Scoring

Identity security is usually strongest when it understands behavior, not just credentials. That is the idea behind Okta’s experiment with LLM-based risk scoring: use the sequence of identity events itself as the signal, instead of forcing everything into a rigid feature table first. The article presents this as an exploratory approach for adaptive scoring, built around structured security logs and anomaly detection.

Traditional ML Scoring

A login is never just one event. It is a chain of events, each adding context about where the request came from, what device was used, which network was involved, and how the session unfolded. In Okta’s current approach, those identity and authentication events are turned into structured records and fed into a machine learning pipeline that powers risk scoring in products such as Identity Threat Protection.

The weakness of that approach is not that it is wrong. It is that it depends heavily on manual feature engineering. The raw event stream gets compressed into time windows, aggregates, and hand-built signals before the model ever sees it. That works, but it also limits how much of the sequence and its context survives the transformation.
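To make the contrast concrete, here is a minimal Python sketch of the kind of windowed aggregation described above. The `AuthEvent` fields, the 24-hour window, and the feature names are hypothetical choices for illustration, not Okta’s actual schema or features.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class AuthEvent:
    # Hypothetical, simplified identity event; real event schemas are far richer.
    timestamp: datetime
    country: str
    asn: int
    device: str
    mfa_passed: bool


def window_features(events: list[AuthEvent], now: datetime,
                    window: timedelta = timedelta(hours=24)) -> dict:
    """Collapse a raw event stream into a fixed set of hand-engineered features."""
    recent = [e for e in events if now - e.timestamp <= window]
    return {
        "event_count": len(recent),
        "distinct_countries": len({e.country for e in recent}),
        "distinct_asns": len({e.asn for e in recent}),
        "distinct_devices": len({e.device for e in recent}),
        # Share of events in the window where MFA did not pass.
        "mfa_failure_rate": (
            sum(not e.mfa_passed for e in recent) / len(recent) if recent else 0.0
        ),
    }
```

Reversing the event order produces exactly the same feature dictionary, which is the point: the ordering of events is gone before any model sees them.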

Why LLMs?

LLMs are a natural fit for security logs because logs already look like text sequences with structure. Instead of judging one event in isolation, a language model can read a user’s recent authentication story and learn what typically belongs together. It can pick up recurring patterns, weird combinations, and deviations from the normal rhythm of that identity.

That gives the model three practical advantages. It can reason over full sequences instead of snapshots. It reduces the amount of feature engineering teams need to do by hand. And it can generalize to unfamiliar combinations: because it scores how surprising the whole sequence is, it can flag an anomaly even when no single feature was explicitly designed for that exact pattern.

Fine-Tuning the LLM

The experimental setup in the article is straightforward in concept, even if the engineering is not trivial. First, Okta builds a historical profile from past events for a given identity. Then it conditions a GPT-style model on that profile. Finally, it trains the model to predict the next event and treats the model’s surprise at an event, measured as its prediction loss, as part of the risk signal.

A key detail is that the raw log stream is cleaned before training. Noisy values such as random IDs, hashes, and nonces are replaced with stable placeholders so the model focuses on meaningful behavior instead of junk tokens. The resulting profile captures the identity’s usual countries, ASNs (autonomous system numbers, which identify network operators), user agents, and access patterns in a text format the model can read directly.
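A minimal sketch of both steps, assuming simple regular-expression rules for the placeholders and a per-field frequency summary for the profile. The patterns, the `<UUID>`/`<HASH>`/`<NONCE>` tokens, and the `usual_*` profile layout are hypothetical; the article does not publish Okta’s exact rules or profile format.

```python
import re
from collections import Counter

# Illustrative patterns only: (regex, stable placeholder) pairs applied in order.
_PLACEHOLDERS = [
    (re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b", re.I), "<UUID>"),
    (re.compile(r"\b[0-9a-f]{32,64}\b", re.I), "<HASH>"),
    (re.compile(r"\bnonce=\S+"), "nonce=<NONCE>"),
]


def normalize(line: str) -> str:
    """Replace high-entropy values with stable placeholders."""
    for pattern, placeholder in _PLACEHOLDERS:
        line = pattern.sub(placeholder, line)
    return line


def build_profile(events: list[dict], top_n: int = 3) -> str:
    """Summarize an identity's history as plain text lines the model can read."""
    lines = []
    for field in ("country", "asn", "user_agent"):
        counts = Counter(e[field] for e in events if field in e)
        # most_common keeps first-seen order for ties.
        common = ", ".join(f"{v} ({c})" for v, c in counts.most_common(top_n))
        lines.append(f"usual_{field}: {common}")
    return "\n".join(lines)
```

For example, `normalize("session=3f2504e0-4f89-11d3-9a0c-0305e82c3301 nonce=xyz789")` yields `session=<UUID> nonce=<NONCE>`, so two logins that differ only in their random identifiers produce identical text.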

The model is then trained on the full narrative, not just the target event. Okta describes this as contextual prompting, where the historical sequence and the event being scored are combined into one input. The article also notes that the base model is frozen and small adapter layers are trained instead, using an in-house method called TLoRA.
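The article does not describe TLoRA’s internals, so the sketch below shows standard LoRA (low-rank adaptation) instead: the general technique of freezing a base weight matrix and training only a small low-rank correction. `LoRALinear`, the rank, and the `alpha` scaling are illustrative assumptions, not Okta’s method.

```python
import random


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A."""

    def __init__(self, weight, rank=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen: never updated during fine-tuning
        # Trainable down-projection A (rank x d_in), small random init.
        self.A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        # Trainable up-projection B (d_out x rank), zero init so the adapter starts as a no-op.
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x: list[float]) -> list[float]:
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self) -> int:
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

Initializing `B` to zero means the adapted layer starts out identical to the frozen base model. The savings appear at scale: for a 4096 × 4096 projection, full fine-tuning updates 16,777,216 weights, while a rank-8 adapter trains 2 × 8 × 4096 = 65,536.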

The training objective is next-token prediction. In simple terms, the model learns to assign high probability to the event tokens that make sense given the user’s history, and lower probability to tokens that break the pattern. Over time, that becomes a learned notion of what “normal next behavior” looks like for that identity.
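The objective can be made concrete in a few lines. Given the model’s raw scores (logits) over the vocabulary at each position and the token that actually came next, the per-token loss is the negative log-probability of that token. This is the standard cross-entropy formulation of next-token prediction; the function names are hypothetical and no particular model or framework is assumed.

```python
import math


def log_softmax(logits: list[float]) -> list[float]:
    """Convert raw scores to log-probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def token_losses(step_logits: list[list[float]], target_ids: list[int]) -> list[float]:
    """Per-token negative log-likelihood: -log p(actual next token | context)."""
    return [-log_softmax(logits)[t] for logits, t in zip(step_logits, target_ids)]


def training_loss(step_logits: list[list[float]], target_ids: list[int]) -> float:
    """Next-token objective: mean cross-entropy over the scored tokens."""
    losses = token_losses(step_logits, target_ids)
    return sum(losses) / len(losses)
```

A token the model strongly expected (logits `[5.0, 0.0, 0.0]`, actual token 0) gets a small loss; the same logits with actual token 1 give a much larger one. Training lowers the average loss over the whole narrative, and the per-token losses are what the scoring step reuses.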

How the Risk Score Is Derived

A plain average of token loss is not enough, because security logs contain a lot of boilerplate. The article’s answer is to focus on the most surprising tokens and calculate a Peak Perplexity score from those highest-loss parts of the target event. That makes the score concentrate on the meaningful fields, such as country, ISP, and device, rather than generic log structure.
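The article does not spell out the exact Peak Perplexity formula, so the following is a sketch under one plausible assumption: take the `top_k` highest per-token losses in the target event and exponentiate their mean. `top_k=3` is an arbitrary illustrative value.

```python
import math


def perplexity(losses: list[float]) -> float:
    """Ordinary perplexity: exp of the mean per-token loss."""
    return math.exp(sum(losses) / len(losses))


def peak_perplexity(losses: list[float], top_k: int = 3) -> float:
    """Perplexity over only the top_k highest-loss tokens of the target event.

    Assumption: the top_k cut-off and the exp(mean) form are illustrative,
    not the article's published definition.
    """
    if not losses:
        raise ValueError("need at least one token loss")
    peak = sorted(losses, reverse=True)[:top_k]
    return math.exp(sum(peak) / len(peak))
```

With twenty near-zero boilerplate losses (0.01 each) and three surprising field tokens (losses 4.0, 3.5, and 3.0), ordinary perplexity is diluted to about 1.6, while `peak_perplexity` returns exp(3.5), roughly 33.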

The result is easy to interpret. Low perplexity means the event looks consistent with the identity’s history. High perplexity means the event looks unusual or suspicious relative to what the model has learned about that identity. Because the score is conditioned on the user’s own profile, it can adapt as behavior changes, as long as that profile is rebuilt from recent events.

Conceptual Example

The article illustrates the idea with a simple behavioral sequence. The model first reads a short history, such as session start, MFA verification, and internal authentication, all from a consistent context. Then it evaluates a target event that suddenly shifts to a different country, OS, and device. That mismatch produces a sharp rise in loss and therefore a high risk score.
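The scenario can be mimicked without an LLM. The sketch below deliberately swaps in a simple stand-in: per-field surprisal from add-one smoothed frequencies in the identity’s history, with the maximum across fields playing the role of the “peak”. This is not how the article’s model scores events; the field names and values (US, macOS, BR, Linux, and so on) are invented to mirror the example.

```python
import math
from collections import Counter

FIELDS = ("country", "os", "device")


def field_surprisal(history: list[dict], event: dict, field: str) -> float:
    """-log p(value) under add-one smoothed frequencies from this identity's history."""
    counts = Counter(e[field] for e in history)
    vocab = len(counts) + 1  # one extra slot reserved for unseen values
    p = (counts[event[field]] + 1) / (len(history) + vocab)
    return -math.log(p)


def event_score(history: list[dict], event: dict) -> float:
    """Peak (max) field surprisal: a toy stand-in for an LLM's per-token losses."""
    return max(field_surprisal(history, event, f) for f in FIELDS)


history = [
    {"type": "session.start", "country": "US", "os": "macOS", "device": "laptop-1"},
    {"type": "mfa.verify", "country": "US", "os": "macOS", "device": "laptop-1"},
    {"type": "auth.internal", "country": "US", "os": "macOS", "device": "laptop-1"},
]
consistent = {"type": "auth.internal", "country": "US", "os": "macOS", "device": "laptop-1"}
shifted = {"type": "auth.internal", "country": "BR", "os": "Linux", "device": "unknown-7"}

print(round(event_score(history, consistent), 3))  # 0.223
print(round(event_score(history, shifted), 3))     # 1.609
```

A real model would also weigh how the fields relate to each other and to the event type; this toy version only shows why a target that contradicts every field of a stable history stands out.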

The point is not that every unusual login is malicious. The point is that the model can spot contradictions that are hard to capture with hand-written rules alone. A user who normally authenticates from one region on one device pattern does not become suspicious because of a single field. They become suspicious because the entire sequence stops making sense.

Looking Ahead

Okta presents this as an experimental path, not a finished replacement for existing risk systems. The broader direction is clear, though: move away from purely engineered feature vectors, use a model conditioned on identity history, and turn probabilistic surprise into an additional scoring layer.

The next step is calibration. The model needs better profile building, better mapping from perplexity to real-world risk, and possibly a way to explain why a particular event looked strange in plain language. That is the real value here: not hype, but a more adaptable signal that can sit beside existing detection pipelines.
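One simple calibration baseline, offered here as an assumption rather than anything the article proposes, is to compare a raw score against the empirical distribution of scores on events believed to be benign and bucket it by percentile. The 95th and 99th percentile cut-offs and the `LOW`/`MEDIUM`/`HIGH` labels are hypothetical.

```python
from bisect import bisect_right


def empirical_percentile(score: float, reference_scores: list[float]) -> float:
    """Fraction of reference scores <= score (an empirical CDF)."""
    if not reference_scores:
        raise ValueError("need at least one reference score")
    ref = sorted(reference_scores)
    return bisect_right(ref, score) / len(ref)


def risk_band(score: float, reference_scores: list[float],
              medium: float = 0.95, high: float = 0.99) -> str:
    """Map a raw peak-perplexity score to a coarse band; thresholds are hypothetical."""
    pct = empirical_percentile(score, reference_scores)
    if pct >= high:
        return "HIGH"
    if pct >= medium:
        return "MEDIUM"
    return "LOW"
```

Computing the reference scores per identity, rather than globally, would keep the bands consistent with the article’s idea of judging each event against that identity’s own history.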

Conclusion

The article’s core message is simple. Identity risk scoring gets stronger when it stops treating every event as an isolated record and starts treating the full sequence as behavior. LLMs make that possible because they are built to model sequences of structured text. The approach is still experimental, but it points toward a more flexible and context-aware way to score identity risk.

SoftwareBunch