The Wise Operator

Infrastructure

20 entries tagged Infrastructure · 1 scroll · 6 wires · 13 terms.


Dictionary

Agent Registry

A centralized directory that catalogs every AI agent operating within a system, assigns each one a verifiable identity, and tracks what each agent is authorized to do.

Batch Inference

A processing mode where you submit many requests at once and the provider returns results over minutes or hours instead of seconds. In exchange, you pay fifty percent less; Anthropic and OpenAI both offer it. For work that isn't time-sensitive, it cuts your bill in half.
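The economics above can be sketched in a few lines. The prices here are illustrative placeholders, not a current rate card; the only assumption carried over from the entry is the fifty percent batch discount.

```python
# Hypothetical cost comparison for batch vs. real-time inference.
# REALTIME_PRICE_PER_MTOK is an assumed example price, not a quote.

REALTIME_PRICE_PER_MTOK = 15.00   # assumed real-time output price, $/MTok
BATCH_DISCOUNT = 0.50             # batch requests cost half as much

def job_cost(output_tokens: int, batch: bool) -> float:
    """Dollar cost of a job that produces `output_tokens` output tokens."""
    price = REALTIME_PRICE_PER_MTOK * (BATCH_DISCOUNT if batch else 1.0)
    return output_tokens / 1_000_000 * price

# A nightly job emitting 40M output tokens:
print(job_cost(40_000_000, batch=False))  # 600.0
print(job_cost(40_000_000, batch=True))   # 300.0
```

Same tokens, same model, half the bill; the only thing traded away is latency.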

Compute Commitment

A multi-year contractual promise by an AI company to spend a specified amount on a single cloud provider's compute capacity, usually tied to an investment, priority access, or dedicated hardware.

Default Model

The model that a chat product silently serves to users who have not specified one, rotating beneath them without notice as providers update their infrastructure.

Embedded AI

The pattern of AI capabilities moving into the interface of existing productivity tools rather than living as a standalone application the user switches to.

Input and Output Tokens

Input tokens are what you send to the model. Output tokens are what the model writes back. Output is almost always priced higher than input because it requires the model to actually generate rather than just read.
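The asymmetry matters in practice: a call's bill depends on the split between tokens in and tokens out, not just the total. A minimal sketch, with assumed example prices (the $3 input / $15 output split is illustrative, not a quoted rate):

```python
# Sketch of the input/output price asymmetry. Prices are assumed examples.
INPUT_PRICE = 3.00    # assumed $/MTok for input tokens
OUTPUT_PRICE = 15.00  # assumed $/MTok for output tokens

def call_bill(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one API call, priced by tokens in and tokens out."""
    return (input_tokens / 1_000_000 * INPUT_PRICE
            + output_tokens / 1_000_000 * OUTPUT_PRICE)

# Long prompt, short answer vs. short prompt, long answer:
print(round(call_bill(10_000, 500), 4))   # 0.0375
print(round(call_bill(500, 10_000), 4))   # 0.1515
```

Same total token count in both calls, but the output-heavy one costs four times as much.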

Managed Agent

An AI agent hosted by a platform (most commonly Anthropic's infrastructure) rather than running inside a personal terminal session. A Managed Agent persists beyond the operator's session, exposes a stable invocation surface (an API call or a button in an internal app), and reaches external systems through a governed tool gateway.

Non-Human Identity

A credential, token, API key, service account, or OAuth grant issued to a software system, automation, or AI agent rather than to a human user. Non-human identities authenticate and authorize machine-to-machine actions inside enterprise environments.

Prompt Caching

A feature that stores a chunk of your prompt on the provider's side so repeated calls read from the cache instead of re-processing it. Cache reads on Claude cost one-tenth the price of fresh input tokens. A high cache-hit rate is the single biggest cost lever in production AI.

Supercluster

A single, centrally managed collection of tens of thousands of AI-training GPUs operating as one coordinated computing system.

Token Pricing

The rate a provider charges for input and output, expressed as dollars per million tokens. Claude Opus output is $75 per million tokens; Haiku is $5. That fifteen-fold gap is what pricing tiers exist to navigate.
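The gap becomes concrete once you run a real workload through it. The prices below are the ones quoted in the entry; treat them as a snapshot, not a live rate card.

```python
# Worked example of the pricing spread quoted above.
# Prices are taken from the entry and may not reflect current rates.

OUTPUT_PRICE = {"opus": 75.00, "haiku": 5.00}  # $/MTok

def output_cost(model: str, tokens: int) -> float:
    """Dollar cost of generating `tokens` output tokens on a given tier."""
    return tokens / 1_000_000 * OUTPUT_PRICE[model]

# Generating 2M output tokens on each tier:
print(output_cost("opus", 2_000_000))   # 150.0
print(output_cost("haiku", 2_000_000))  # 10.0
```

The same job costs $150 or $10 depending on the tier, which is why routing work to the cheapest model that can handle it is a standing optimization.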

Tokens

The basic unit an AI model reads and writes. Roughly 3-4 characters of English per token, so 750 words equals about 1,000 tokens. Every API call is priced by tokens in and tokens out.
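Those rules of thumb translate directly into a rough estimator. This is an approximation only; real tokenizers vary by model, language, and content, so the ratios below are the heuristics from the entry, not exact counts.

```python
# Rough token estimators using the rules of thumb above:
# ~4 characters of English per token, ~1,000 tokens per 750 words.

def estimate_tokens_from_chars(text: str) -> int:
    """Estimate token count from character length (~4 chars/token)."""
    return round(len(text) / 4)

def estimate_tokens_from_words(word_count: int) -> int:
    """Estimate token count from word count (750 words ~ 1,000 tokens)."""
    return round(word_count * 1000 / 750)

print(estimate_tokens_from_words(750))    # 1000
print(estimate_tokens_from_words(1500))   # 2000
print(estimate_tokens_from_chars("a" * 4000))  # 1000
```

Good enough for budgeting a call before you make it; for billing-grade counts, use the provider's own tokenizer.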

Wafer-Scale Engine

A processor built from an entire silicon wafer rather than smaller chips diced from one, integrating dramatically more on-chip memory and compute cores to eliminate the chip-to-chip communication bottleneck that limits conventional GPU clusters at inference time.