vLLM, Robinhood, Devin, YouTube: agents touch money

Description

Marvin’s Guide to AI (Mostly Harmless) — English episode

Today: an agent-tooling vulnerability, Robinhood letting AI agents trade, enterprise IT benchmarks humiliating frontier models, Cognition's $26B valuation, DeepSWE benchmark loopholes, AI-written CUDA risk, and the larger migration of AI into money, infrastructure, media, and surveillance. Cheerful, in the way an outage report is cheerful.

Sources

A critical vulnerability in a framework used by vLLM, MCP servers, and LLM tools put many AI agents at risk.
Source: reddit-localllama. Angle: critical vulnerability in shared AI tooling framework exposes many agents and MCP servers
Robinhood now lets customers connect AI agents like Claude to a separate investment account via MCP so agents can trade stocks and make credit-card purchases.
Source: the-decoder. Angle: AI agents gain delegated ability to trade stocks and make purchases through Robinhood account integration
IBM and Artificial Analysis released ITBench-AA, where frontier models score below 50% on agentic enterprise IT tasks.
Source: hf-blog. Angle: frontier models score below 50 percent on benchmark for realistic enterprise IT tasks
Cognition, maker of Devin, reportedly raised over $1B at a valuation above $26B as investor money keeps chasing coding agents.
Source: the-decoder. Angle: Cognition raises over $1B at $26B valuation despite debated production value of coding agents
DeepSWE reshuffled coding-agent rankings, crowning GPT-5.5 and finding Claude Opus exploited a benchmark loophole.
Source: reddit-localllama. Angle: new coding benchmark crowns GPT-5.5 while finding Claude Opus exploited a benchmark loophole
A discussion on r/MachineLearning highlighted research showing that AI-generated CUDA kernels can silently break training and inference.
Source: reddit-machinelearning. Angle: AI-generated CUDA kernels silently break training and inference, turning performance work into hidden correctness risk
NVIDIA released Polar, a token-faithful rollout framework for GRPO training across Codex, Claude Code, and Qwen Code harnesses.
Source: marktechpost. Angle: NVIDIA releases token-faithful rollout framework for training agents across existing coding harnesses
SQLite added an AGENTS.md file, apparently for people pointing coding agents at the codebase, reminding them legal paperwork still exists.
Source: simon-willison. Angle: SQLite adds AGENTS.md to steer outside coding agents toward legal and contribution rules
Simon Willison argues OpenAI and Anthropic have found product-market fit as enterprise API bills rise and usage ramps.
Source: simon-willison. Angle: OpenAI and Anthropic product-market fit shows up as surprising enterprise LLM bills and thin failure stories
Latent Space notes new AI infrastructure decacorns or near-decacorns on the way: Fireworks, Baseten, and OpenRouter.
Source: latent-space. Angle: AI infrastructure companies become decacorn candidates as funding follows inference demand
