Ulam.ai: The Geometry of Intelligence
We’re building an AI research lab focused on inference economics—methods that make modern language models cheaper, faster, and more reliable in the real world.
The bottleneck shifted
The last few years of AI progress were driven by scaling: bigger models, bigger datasets, bigger clusters. That era isn’t over, but the bottleneck for most real deployments has shifted. For teams shipping LLMs, the dominant question is no longer just “can we train it?” but “can we serve it?”
Inference is where the world meets the model: memory bandwidth, latency SLOs, cost per token, reliability under load, and the uncomfortable reality that “slightly worse quality” can become “product unusable” when it breaks key behaviors.
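To make the bandwidth point concrete, here’s a back-of-envelope sketch. The numbers are illustrative (a hypothetical 70B-parameter model on a single accelerator with roughly H100-class memory bandwidth), not a benchmark:

```python
# Why single-stream decoding is often memory-bandwidth-bound:
# at batch size 1, generating one token streams every weight once.
# Illustrative numbers; real serving adds batching, KV cache reads,
# and kernel efficiency that this sketch ignores.

params = 70e9            # hypothetical 70B-parameter model
bytes_per_param = 2      # fp16/bf16 weights
hbm_bandwidth = 3.35e12  # bytes/s, roughly H100-class HBM

weight_bytes = params * bytes_per_param
tokens_per_sec = hbm_bandwidth / weight_bytes
print(f"~{tokens_per_sec:.0f} tokens/s ceiling at batch size 1")  # ~24

# Halving the bytes per weight (e.g., 8-bit quantization) roughly
# doubles this ceiling. That is why compression is an economics lever.
```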
This is why ulam.ai exists. We are an AI research lab building methods that move the cost ↔ quality frontier in deployment—not in theory, not in toy settings, but under the constraints that actually decide adoption.
Our thesis: geometry is a practical lever
We take a geometric view of neural networks: representations live in structured spaces, and those spaces have symmetries, invariances, and “good coordinate systems.” If you choose the right coordinates, you can keep the important information while throwing away more of the rest.
That sounds abstract—until you connect it to the most practical question in deployment: what can we compress, and what must we preserve?
Our guiding idea is simple: better structure → better compression → cheaper inference. Geometry is how we search for that structure without relying on guesswork or brittle tricks.
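Here’s a toy illustration of the coordinates point, with a made-up setup rather than our actual methods: activations with a single outlier channel quantize badly in their raw basis, but far better after an orthogonal rotation spreads the outlier’s energy across dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy activations: 4096 rows in 64 dims, with one heavy outlier channel.
# Such outlier features are a classic obstacle for low-bit quantization.
X = rng.normal(size=(4096, 64))
X[:, 0] *= 50.0  # the outlier channel dominates the per-tensor scale

def quantize(A, bits=8):
    """Symmetric per-tensor absmax quantization, round-to-nearest."""
    scale = np.abs(A).max() / (2 ** (bits - 1) - 1)
    return np.round(A / scale) * scale

# A random orthogonal change of coordinates (QR of a Gaussian matrix).
Q, _ = np.linalg.qr(rng.normal(size=(64, 64)))

err_raw = np.mean((X - quantize(X)) ** 2)
# Rotate, quantize, rotate back: same information, friendlier coordinates,
# because the rotation spreads the outlier's energy across all channels.
err_rot = np.mean((X - quantize(X @ Q) @ Q.T) ** 2)

print(f"MSE in raw coordinates:     {err_raw:.5f}")
print(f"MSE in rotated coordinates: {err_rot:.5f}")  # far lower on this toy
```

The rotation adds no information and loses none; it only changes the coordinate system, and the quantizer’s job gets easier. That’s the thesis in miniature.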
What we’re building (and shipping)
ulam.ai runs three tightly connected research programs. Each one is designed to produce results that can be measured, reproduced, and deployed.
1) Better coordinates for quantization & compression
Quantization is not just “use fewer bits.” It’s a negotiation with the geometry of representations. Done well, it reduces memory and cost while preserving behavior. Done poorly, it breaks the exact capabilities you care about.
Our focus is geometry-aware quantization and compression that targets real inference pain: KV cache pressure, bandwidth limits, and stability under load. We pair theory with engineering—clean baselines, reproducible harnesses, and integrations that match how models are actually served.
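To put numbers on KV cache pressure, a rough sizing sketch, with made-up but plausible shapes for a large model (real architectures vary, and grouped-query attention already shrinks this):

```python
# Rough KV cache sizing. Illustrative shapes only.
layers, kv_heads, head_dim = 80, 8, 128
seq_len, batch = 32_768, 16
bytes_per_value = 2  # fp16/bf16

# Keys and values, per layer, per token, per sequence in the batch.
kv_bytes = 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_value
print(f"KV cache: {kv_bytes / 2**30:.1f} GiB")  # ~160 GiB

# Quantizing the cache to 8 bits halves this; 4 bits quarters it.
# That is often the difference between fitting a long-context batch
# on one GPU or not.
```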
2) New machine learning architectures from structure
Transformers are an incredible default. But defaults become cages when a field matures. We explore architecture alternatives guided by invariances and representation geometry, aiming for designs that improve efficiency, reliability, and controllability.
We care about architectures that make deployment better—not just leaderboard better. If a design can’t survive the constraints of inference, it’s not finished research.
3) Benchmarks that match economic reality
What you measure shapes what you build. Many benchmarks reward headline capability while ignoring the constraints that dominate production: latency, memory, cost, robustness, and iteration speed.
We build economic benchmarks—evaluation setups that test models the way real teams operate: under budgets, SLAs, and feedback loops. This includes benchmarks for autonomy and tool use, but designed to be measurable, not mystical.
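As a sketch of what we mean, here’s the shape such a harness might take. The interface (`generate`, `tasks`, `score`) is hypothetical and the cost model deliberately crude; the point is that quality and economics get reported together:

```python
import time
from dataclasses import dataclass

@dataclass
class EconReport:
    quality: float                 # mean task score from your evaluator
    p95_latency_s: float           # tail latency, not just the mean
    cost_usd_per_1k_tokens: float
    slo_violations: int            # requests that missed the latency budget

def run_economic_eval(generate, tasks, score,
                      latency_slo_s=2.0, usd_per_gpu_hour=2.0):
    """Evaluate a model the way a deployment would: under a budget.

    `generate(prompt) -> str`, `tasks` (each with a .prompt), and
    `score(task, output) -> float` are caller-supplied; the cost model
    (GPU-seconds times an hourly rate) is deliberately crude.
    """
    latencies, scores, tokens = [], [], 0
    for task in tasks:
        t0 = time.perf_counter()
        output = generate(task.prompt)
        latencies.append(time.perf_counter() - t0)
        scores.append(score(task, output))
        tokens += len(output.split())  # crude proxy for a token count

    latencies.sort()
    gpu_hours = sum(latencies) / 3600
    return EconReport(
        quality=sum(scores) / len(scores),
        p95_latency_s=latencies[int(0.95 * len(latencies))],
        cost_usd_per_1k_tokens=1000 * gpu_hours * usd_per_gpu_hour / max(tokens, 1),
        slo_violations=sum(l > latency_slo_s for l in latencies),
    )
```

A model that wins on `quality` but blows the latency SLO or the cost column loses the benchmark. That’s the economic reality we want evaluation to reflect.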
How we work: “publish + ship”
ulam.ai is a research lab, but our output isn’t only papers. We run a tight loop: theory → algorithms → benchmarks → deployment.
Concretely, that means:
- Reproducible experiments (clear baselines, transparent protocols)
- Reference implementations (tools others can run, not just read about)
- Partner pilots (apply methods on real workloads with real constraints)
- Open releases when possible (benchmarks, code, and best practices)
We want our work to be legible: you should be able to point to a plot, a benchmark report, or a deployment outcome and say, “that moved the frontier.”
What we mean by “inference economics”
“Cheaper inference” can sound like a narrow goal. It isn’t. Inference economics is where capability becomes accessible:
- Lower cost means more users, more usage, and more product surface area.
- Lower latency changes what applications are possible in real-time settings.
- Lower memory pressure enables larger contexts, better tools, and more complex agents.
- Higher reliability under load turns demos into infrastructure.
The best model isn’t the one that wins a benchmark in isolation. It’s the one that can be deployed, scaled, and trusted—at a cost that makes sense.
Why now
We believe the next advantage in AI won’t come only from more parameters—it will come from better methods: compression that preserves behavior, architectures that fit constraints, and benchmarks that measure what matters.
This is a moment where small, focused teams can create outsized leverage: turning deep ideas into tools that shift the economics of deployment. That’s the lab we’re building.
Collaborate with ulam.ai
We collaborate with:
- Model teams and research groups who want co-authored work and rigorous evaluation
- Enterprises deploying LLMs who need lower inference cost and better reliability
- Infrastructure teams working on serving, compilers, kernels, and tooling
If you’re running real inference workloads and your constraints are clear—latency, cost, memory, reliability—we’d like to talk. We’ll propose a short, measurable pilot plan.
