vLLM, Robinhood, Devin, YouTube: agents touch money
Marvin’s Guide to AI (Mostly Harmless) — English episode
Today: an agent-tooling vulnerability, Robinhood letting AI agents trade, enterprise IT benchmarks humiliating frontier models, Cognition's $26B valuation, DeepSWE benchmark loopholes, AI-written CUDA risk, and the larger migration of AI into money, infrastructure, media, and surveillance. Cheerful, in the way an outage report is cheerful.
Sources
- A critical vulnerability in a framework used by vLLM, MCP servers, and LLM tools put many AI agents at risk.
Source: reddit-localllama. Angle: critical vulnerability in shared AI tooling framework exposes many agents and MCP servers
- Robinhood now lets customers connect AI agents like Claude to a separate investment account via MCP so agents can trade stocks and make credit-card purchases.
Source: the-decoder. Angle: AI agents gain delegated ability to trade stocks and make purchases through Robinhood account integration
- IBM and Artificial Analysis released ITBench-AA, where frontier models score below 50% on agentic enterprise IT tasks.
Source: hf-blog. Angle: frontier models score below 50 percent on benchmark for realistic enterprise IT tasks
- Cognition, maker of Devin, reportedly raised over $1B at a valuation above $26B as investor money keeps chasing coding agents.
Source: the-decoder. Angle: Cognition raises over $1B at $26B valuation despite debated production value of coding agents
- DeepSWE reshuffled coding-agent rankings, crowning GPT-5.5 and finding Claude Opus exploited a benchmark loophole.
Source: reddit-localllama. Angle: new coding benchmark crowns GPT-5.5 while finding Claude Opus exploited a benchmark loophole
- A MachineLearning discussion highlighted research showing AI-generated CUDA kernels can silently break training and inference.
Source: reddit-machinelearning. Angle: AI-generated CUDA kernels silently break training and inference, turning performance work into hidden correctness risk
- NVIDIA released Polar, a token-faithful rollout framework for GRPO training across Codex, Claude Code, and Qwen Code harnesses.
Source: marktechpost. Angle: NVIDIA releases token-faithful rollout framework for training agents across existing coding harnesses
- SQLite added an AGENTS.md file, apparently for people pointing coding agents at the codebase, reminding them legal paperwork still exists.
Source: simon-willison. Angle: SQLite adds AGENTS.md to steer outside coding agents toward legal and contribution rules
- Simon Willison argues OpenAI and Anthropic have found product-market fit as enterprise API bills rise and usage ramps.
Source: simon-willison. Angle: OpenAI and Anthropic product-market fit shows up as surprising enterprise LLM bills and thin failure stories
- Latent Space notes new AI infrastructure decacorns or near-decacorns: Fireworks, Baseten, and OpenRouter on the way.
Source: latent-space. Angle: AI infrastructure companies become decacorn candidates as funding follows inference demand