The secret to cost-efficient AI inference