Description

Builders can scale ML workloads from simple API calls to full MLOps pipelines using SST on AWS, pairing Aurora with pgvector for vector search and Spot instances for up to 90 percent cost savings. When AWS-native limits are reached, external platforms such as Modal or GCP Cloud Run offer superior serverless GPU options for real-time inference.
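The pgvector search mentioned above boils down to a nearest-neighbor query over an embedding column. Below is a minimal sketch: the table and column names (`documents`, `embedding`) are illustrative assumptions, and it presumes the pgvector extension is enabled on the Aurora Postgres cluster.

```typescript
// Build a top-k similarity query using pgvector's cosine-distance
// operator `<=>` (rows closest to the query embedding sort first).
// Table name and LIMIT are parameters; the embedding itself is passed
// as a bind parameter ($1) to the driver.
export function buildSimilarityQuery(table: string, k: number): string {
  return `SELECT id, content FROM ${table} ORDER BY embedding <=> $1::vector LIMIT ${k}`;
}

// Usage with the `pg` client (connection setup omitted):
//   const { rows } = await client.query(
//     buildSimilarityQuery("documents", 5),
//     [JSON.stringify(queryEmbedding)],
//   );
```

Keeping the SQL in a small builder like this makes the distance operator and result ordering explicit, which is easy to miss when the query is inlined.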

Core Infrastructure

SST uses Pulumi to bridge high-level web components (API, Database) with low-level AWS resources (SageMaker, GPU clusters). The framework enables infrastructure-as-code in TypeScript, letting developers manage the entire ML lifecycle from a single configuration file.
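As a concrete illustration of that single-configuration approach, here is a minimal `sst.config.ts` sketch assuming SST v3's TypeScript config. The app name, handler path, and component options are illustrative assumptions, not a definitive setup.

```typescript
/// <reference path="./.sst/platform/config.d.ts" />

export default $config({
  app(input) {
    // App name and home provider are assumptions for this sketch.
    return { name: "ml-pipeline", home: "aws" };
  },
  async run() {
    // High-level component: Aurora Postgres, where pgvector would live.
    const db = new sst.aws.Aurora("Database", { engine: "postgres" });

    // High-level component: a Lambda-backed API. `link: [db]` injects
    // the database connection details into the function at runtime.
    const api = new sst.aws.Function("Api", {
      handler: "src/api.handler",
      link: [db],
      url: true,
    });

    return { api: api.url };
  },
});
```

Because SST components compile down to Pulumi resources, the same file can later grow lower-level pieces (a SageMaker endpoint, a GPU cluster) without leaving TypeScript.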

Level 1-2: Foundational Models and Edge Inference

Level 3-4: Cost-Effective CPU and Batch Processing

Level 5: Real-Time GPU Inference

Level 6-7: MLOps and Mature Production