Kimi K2 and Moonshot AI's history, avoiding loss spikes during training, the Muon optimizer, and data parallelism