Predictable APIs from Stochastic AI
At their core, Large Language Models are probabilistic completion engines. If you ask an LLM to extract data from an email, by default it generates freeform text whose structure is not guaranteed. It may add commentary, rename fields, or emit malformed JSON, so deterministic code cannot parse it reliably. To build robust production systems, we must enforce strict schemas at the token-generation level.
Defining Schemas with Pydantic
In Python-based LLM frameworks, we define target structures using Pydantic models. A model serves both as the type-validation layer and as the source of the JSON Schema that is passed to the model API:
```python
from typing import List

from pydantic import BaseModel, Field


class ContactInfo(BaseModel):
    name: str = Field(description="The full name of the user")
    email: str = Field(description="The primary email address")
    # default_factory gives each instance its own empty list
    tags: List[str] = Field(default_factory=list, description="Category tags")


# The client converts ContactInfo to a JSON Schema and sends it with the request, e.g.:
# client.beta.chat.completions.parse(..., response_format=ContactInfo)
```
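The sketch below shows what the API actually receives. It assumes Pydantic v2, where `model_json_schema` and `model_validate_json` are available. It restates `ContactInfo` so it runs standalone, prints the required fields of the generated JSON Schema, and validates a hypothetical raw model response. The sample strings are made up for illustration.

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError


class ContactInfo(BaseModel):
    name: str = Field(description="The full name of the user")
    email: str = Field(description="The primary email address")
    tags: List[str] = Field(default_factory=list, description="Category tags")


# 1. The JSON Schema derived from the model (what a structured-output API is sent).
schema = ContactInfo.model_json_schema()
print(schema["required"])  # ['name', 'email']  (tags has a default, so it is optional)

# 2. Validating a hypothetical raw model response back into a typed object.
raw = '{"name": "Ada Lovelace", "email": "ada@example.com"}'
contact = ContactInfo.model_validate_json(raw)
print(contact.tags)  # []  (filled in by default_factory)

# 3. A response that breaks the schema is rejected instead of silently accepted.
try:
    ContactInfo.model_validate_json('{"name": "Ada Lovelace"}')
except ValidationError as err:
    print(err.error_count())  # 1  (the missing "email" field)
```

Validation on the way back in is a cheap second check even when decoding is schema-constrained. It is also the step that fills in defaults such as the empty `tags` list.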
Under the Hood: Logit Masking
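At each decoding step, the model produces a vector of raw scores (logits), one for every token in its vocabulary. Vocabularies typically hold tens of thousands of entries, e.g. 32k for Llama 2 and about 128k for Llama 3. Under normal decoding, the logits are turned into probabilities with a softmax and the next token is sampled from them; greedy decoding simply takes the highest-scoring token. Under grammar-guided decoding: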
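- The inference engine tracks the grammar state of the output generated so far (e.g. right after an opening `{`, only whitespace, a `"` that starts a key, or a closing `}` is valid).
- It masks out every token that would be invalid in that state by setting its logit to negative infinity, so the softmax assigns it zero probability.
- As a result, the model *cannot* generate syntactically invalid JSON, which prevents parse crashes. When the grammar is derived from the schema, the output also matches the schema. The main remaining failure mode is truncation, e.g. hitting a max-token limit in the middle of an object.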
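The three steps above can be sketched as a toy constrained greedy decoder. Everything in it is hypothetical:

- the eight-token vocabulary;
- the hand-written state machine, which accepts only `{"name":"Ada"}` or `{"name":"Bob"}`;
- `stub_model`, which stands in for a real model and always scores `Hello` highest.

Real engines compile grammars from a full JSON Schema and work over a tokenizer's vocabulary, but the masking step is the same idea.

```python
import json
import math

# Hypothetical toy vocabulary; real tokenizers have tens of thousands of entries.
VOCAB = ["{", "}", '"name"', ":", '"Ada"', '"Bob"', "Hello", "<eos>"]

# Hypothetical grammar as a state machine: for each state, the token ids allowed
# next and the state each one leads to. It accepts only {"name":"Ada"} or {"name":"Bob"}.
GRAMMAR = {
    "start": {0: "after_open"},                           # '{'
    "after_open": {2: "after_key"},                       # '"name"'
    "after_key": {3: "after_colon"},                      # ':'
    "after_colon": {4: "after_value", 5: "after_value"},  # '"Ada"' or '"Bob"'
    "after_value": {1: "done"},                           # '}'
    "done": {7: "done"},                                  # '<eos>' only
}


def mask_logits(logits, allowed):
    """Set the logit of every token not allowed in the current state to -inf."""
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]


def softmax(logits):
    """Convert logits to probabilities; exp(-inf) is 0.0, so masked tokens get 0."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def constrained_greedy_decode(logit_fn, max_steps=10):
    """Greedy decoding where each step only considers grammar-valid tokens."""
    state, out = "start", []
    for _ in range(max_steps):
        allowed = GRAMMAR[state]                                # 1. current grammar state
        probs = softmax(mask_logits(logit_fn(out), allowed))    # 2. mask, then normalize
        tok = max(range(len(probs)), key=probs.__getitem__)     # 3. greedy pick
        if VOCAB[tok] == "<eos>":
            break
        out.append(VOCAB[tok])
        state = allowed[tok]                                    # advance the state machine
    return "".join(out)


def stub_model(prefix):
    """Hypothetical stand-in for a model: ignores the prefix and always favors "Hello"."""
    return [0.1, 0.2, 0.5, 0.1, 0.3, 0.9, 5.0, 0.2]


result = constrained_greedy_decode(stub_model)
print(result)              # {"name":"Bob"}
print(json.loads(result))  # {'name': 'Bob'}
```

Without the mask, greedy decoding would emit `Hello` at every step. With it, the model's preferences only matter among the tokens the grammar allows, which is why `"Bob"` (score 0.9) wins over `"Ada"` (score 0.3) here.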