MidnightTokens developer portal
Unit Study Document

Enforcing Structured JSON Output

6 min read • Visual explainer included

Predictable APIs from Stochastic AI

At their core, Large Language Models are probabilistic completion engines. If you ask an LLM to extract data from an email, it may return freeform text that deterministic code cannot parse reliably. To build robust production systems, we must enforce strict schemas at the token-generation level.

Grammar-Guided Decoding: Rather than using regex to parse raw text after generation, modern inference engines (such as llama.cpp with its GBNF grammars, or vLLM with guided decoding) use context-free grammars derived from the target schema to filter logits. At every token step, any token that would make the output violate the target JSON schema is assigned a probability of zero.
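To make the idea concrete, here is a toy sketch of constrained greedy decoding in plain Python. It rests on simplifying assumptions: the vocabulary (VOCAB), the fixed per-step logits, and the set of accepted outputs (ALLOWED_OUTPUTS) are all hypothetical, and a finite set of strings stands in for a real grammar. Real engines compile the schema into a grammar or automaton rather than enumerating outputs, but the core step is the same: any token that cannot lead to a valid output is excluded before a token is chosen.

```python
import json
import math

# Hypothetical toy vocabulary; real tokenizers use tens of thousands of subword pieces.
VOCAB = ['{', '}', '"name"', ':', '"Ada"', '"Bob"', 'Hello', ' ', ',']

# Stand-in for a grammar: the finite set of complete outputs we accept (illustrative only).
ALLOWED_OUTPUTS = {'{"name":"Ada"}', '{"name":"Bob"}'}


def is_valid_prefix(text: str) -> bool:
    """Return True if `text` can still be extended into an allowed output."""
    return any(out.startswith(text) for out in ALLOWED_OUTPUTS)


def constrained_greedy_decode(logits_per_step: list[list[float]]) -> str:
    """Greedy decoding in which grammar-violating tokens are masked to -inf."""
    output = ""
    for logits in logits_per_step:
        # Mask every token that would take the output outside the allowed language.
        masked = [
            score if is_valid_prefix(output + tok) else -math.inf
            for tok, score in zip(VOCAB, logits)
        ]
        best = max(range(len(VOCAB)), key=lambda i: masked[i])
        if masked[best] == -math.inf:
            break  # no legal continuation exists
        output += VOCAB[best]
        if output in ALLOWED_OUTPUTS:
            break  # a complete, valid output has been produced
    return output


# Fixed, made-up logits: the unconstrained favorite is 'Hello', which is never legal here.
step_logits = [0.1, 0.0, 0.2, 0.3, 1.0, 1.5, 5.0, 0.5, 0.4]
result = constrained_greedy_decode([step_logits] * 10)
print(result)
print(json.loads(result))
```

Without the mask, greedy decoding would pick 'Hello' at every step. With it, the highest-scoring legal token wins each time, so '"Bob"' beats '"Ada"' on score and the result parses as JSON.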

Defining Schemas with Pydantic

In Python-based LLM frameworks, target structures are commonly defined as Pydantic models. The model serves as both the type-validation layer and the source of the JSON Schema passed to the model API:

from pydantic import BaseModel, Field


class ContactInfo(BaseModel):
    name: str = Field(description="The full name of the user")
    email: str = Field(description="The primary email address")
    # default_factory builds a fresh empty list for each instance
    tags: list[str] = Field(default_factory=list, description="Category tags")


# The SDK converts the model to a JSON Schema and sends it with the request,
# e.g. with the OpenAI Python SDK:
# client.beta.chat.completions.parse(..., response_format=ContactInfo)
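In practice Pydantic performs this check for you (in Pydantic v2, for example, ContactInfo.model_validate_json(raw) parses and validates in one call). To show what that validation layer does without a third-party dependency, the stdlib-only sketch below checks raw model output against a hand-written JSON Schema subset mirroring ContactInfo. CONTACT_SCHEMA, matches, and parse_contact are hypothetical names, and the schema is a simplification of what Pydantic actually generates.

```python
import json

# Hand-written JSON Schema subset mirroring ContactInfo (an assumption: Pydantic's
# generated schema also carries titles, descriptions, and defaults).
CONTACT_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "email"],
}


def matches(value, schema: dict) -> bool:
    """Check `value` against a tiny JSON Schema subset: object, array, string."""
    kind = schema["type"]
    if kind == "string":
        return isinstance(value, str)
    if kind == "array":
        return isinstance(value, list) and all(matches(v, schema["items"]) for v in value)
    if kind == "object":
        if not isinstance(value, dict):
            return False
        if any(key not in value for key in schema.get("required", [])):
            return False
        # Keys not listed in the schema are left unchecked.
        return all(
            matches(value[key], sub)
            for key, sub in schema["properties"].items()
            if key in value
        )
    raise ValueError(f"unsupported schema type: {kind}")


def parse_contact(raw: str) -> dict:
    """Parse model output; raise ValueError if it is malformed or off-schema."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not matches(data, CONTACT_SCHEMA):
        raise ValueError("output does not match CONTACT_SCHEMA")
    data.setdefault("tags", [])  # mirror the empty-list default on ContactInfo.tags
    return data


print(parse_contact('{"name": "Ada", "email": "ada@example.com"}'))
```

The same function rejects a missing required field, a wrong type such as "email": 42, and a cut-off string that json.loads cannot parse.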

Under the Hood: Logits Masking

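When the model runs, it produces a vector of raw scores (logits), one per token in its vocabulary (commonly around 32k to 128k tokens, sometimes more). Under normal decoding, the logits are converted into probabilities with a softmax, and the next token is sampled from that distribution (or the top-scoring token is picked greedily). Under grammar-guided decoding: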

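  • The inference engine tracks the JSON syntax state (e.g. "currently inside an open brace, expecting a quote or whitespace").
  • It masks out every token that would violate the grammar by setting its logit to negative infinity, so the softmax assigns it a probability of exactly zero.
  • This guarantees that the model *cannot* generate syntactically invalid JSON, preventing parse crashes (provided generation is not cut off early, e.g. by a max-token limit).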
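The masking step can be sketched with the standard library alone. This is a minimal illustration, not engine code: the five-token vocabulary, the logits, and the allowed set are hypothetical, chosen to match the example state above for an object schema with required keys (right after `{`, only a quote opening a key, or whitespace, is legal). Because exp(-inf) is 0.0, masked tokens receive exactly zero probability after the softmax.

```python
import math

# Hypothetical tiny vocabulary and raw scores for one decoding step.
VOCAB = ['"', ' ', '}', 'Hello', '{']
logits = [2.0, 0.5, 1.0, 4.0, 0.1]

# State: just after '{' in an object whose schema has required keys,
# so only a quote (starting a key) or whitespace is legal.
allowed_ids = {VOCAB.index('"'), VOCAB.index(' ')}


def mask_logits(scores: list[float], allowed: set[int]) -> list[float]:
    """Set the logit of every disallowed token to negative infinity."""
    return [s if i in allowed else -math.inf for i, s in enumerate(scores)]


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax; -inf scores map to exactly 0.0."""
    peak = max(scores)  # assumes at least one finite score
    exps = [math.exp(s - peak) for s in scores]  # math.exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax(mask_logits(logits, allowed_ids))
for tok, p in zip(VOCAB, probs):
    print(f"{tok!r:8} {p:.4f}")
```

Unmasked, 'Hello' has the highest logit and would dominate; after masking, its probability is exactly zero and the quote token becomes the most likely choice.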

Active Recall

Question: How do modern APIs guarantee valid JSON output?

Answer: Grammar-guided decoding constrains the tokens generated to match the target schema.

Knowledge Check

Question: What is the primary benefit of grammar-guided decoding?
