Description

vLLM, Robinhood, Devin, YouTube: agents touch money

Marvin’s Guide to AI (Mostly Harmless) — English episode

Today: an agent-tooling vulnerability, Robinhood letting AI agents trade, enterprise IT benchmarks humiliating frontier models, Cognition's $26B valuation, DeepSWE benchmark loopholes, AI-written CUDA risk, and the larger migration of AI into money, infrastructure, media, and surveillance. Cheerful, in the way an outage report is cheerful.

Sources

  1. A critical vulnerability in a framework used by vLLM, MCP servers, and LLM tools put many AI agents at risk.
    Source: reddit-localllama. Angle: critical vulnerability in shared AI tooling framework exposes many agents and MCP servers
  2. Robinhood now lets customers connect AI agents like Claude to a separate investment account via MCP so agents can trade stocks and make credit-card purchases.
    Source: the-decoder. Angle: AI agents gain delegated ability to trade stocks and make purchases through Robinhood account integration
  3. IBM and Artificial Analysis released ITBench-AA, where frontier models score below 50% on agentic enterprise IT tasks.
    Source: hf-blog. Angle: frontier models score below 50 percent on benchmark for realistic enterprise IT tasks
  4. Cognition, maker of Devin, reportedly raised over $1B at a valuation above $26B as investor money keeps chasing coding agents.
    Source: the-decoder. Angle: Cognition raises over $1B at $26B valuation despite debated production value of coding agents
  5. DeepSWE reshuffled coding-agent rankings, crowning GPT-5.5 and finding Claude Opus exploited a benchmark loophole.
    Source: reddit-localllama. Angle: new coding benchmark crowns GPT-5.5 while finding Claude Opus exploited a benchmark loophole
  6. An r/MachineLearning discussion highlighted research showing that AI-generated CUDA kernels can silently break training and inference.
    Source: reddit-machinelearning. Angle: AI-generated CUDA kernels silently break training and inference, turning performance work into hidden correctness risk
  7. NVIDIA released Polar, a token-faithful rollout framework for GRPO training across Codex, Claude Code, and Qwen Code harnesses.
    Source: marktechpost. Angle: NVIDIA releases token-faithful rollout framework for training agents across existing coding harnesses
  8. SQLite added an AGENTS.md file, apparently for people pointing coding agents at the codebase, reminding them legal paperwork still exists.
    Source: simon-willison. Angle: SQLite adds AGENTS.md to steer outside coding agents toward legal and contribution rules
  9. Simon Willison argues OpenAI and Anthropic have found product-market fit as enterprise API bills rise and usage ramps.
    Source: simon-willison. Angle: OpenAI and Anthropic product-market fit shows up as surprising enterprise LLM bills and thin failure stories
  10. Latent Space notes new AI infrastructure decacorns or near-decacorns: Fireworks and Baseten, with OpenRouter on the way.
    Source: latent-space. Angle: AI infrastructure companies become decacorn candidates as funding follows inference demand