
Artificial Intelligence
⏱️ 2 min
Google launches ATLAS, new scaling laws for multilingual AI models
Models beyond English with optimized performance and computational efficiency
Google Research
01/27/2026

Google presented ATLAS (Adaptive Transfer Scaling Laws), an innovative set of scaling laws for large multilingual models, addressing a critical gap in public research, which has so far focused almost exclusively on English. The study includes over 774 training runs with models from 10M to 8B parameters, spanning 400+ languages and evaluations in 48 languages.
ATLAS offers practical recommendations on how to balance model size, data volume, and language mix to maximize performance in target languages such as Catalan, through cross-transfer of data between related languages. This approach adapts traditional scaling laws to complex multilingual scenarios, allowing developers to make training choices based on empirical data, not assumptions.
A central innovation is the cross-transfer matrix, which identifies which languages help or hurt the performance of others. Results show that languages from the same linguistic family or script share significant synergies, while transfer is not always symmetric. English, French, and Spanish appear as especially useful languages due to the quality and diversity of their publicly available texts.
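A cross-transfer matrix of this kind can be queried to rank auxiliary languages for a given target. The sketch below is illustrative only: the scores are invented placeholders, not values from the ATLAS study; it shows the structure (rows are source languages, columns are targets) and the asymmetry the article mentions, since `transfer[a][b]` need not equal `transfer[b][a]`.

```python
# Toy cross-transfer matrix: transfer[src][tgt] is an (invented) score for
# how much training data in `src` helps performance in `tgt`. Real values
# would come from empirical runs like those in the ATLAS study.
transfer = {
    "en": {"ca": 0.42, "fr": 0.35},
    "es": {"ca": 0.58, "fr": 0.21},
    "fr": {"ca": 0.47, "es": 0.30},
}

def best_sources(target, matrix, k=2):
    """Rank source languages by their transfer score toward `target`."""
    scored = [(src, row[target]) for src, row in matrix.items() if target in row]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# For a target like Catalan, pick the most helpful source languages:
print(best_sources("ca", transfer))  # [('es', 0.58), ('fr', 0.47)]
```

Note that the ranking is direction-specific: ranking sources for "ca" reads a column of the matrix, so it naturally captures asymmetric transfer.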
ATLAS also formalizes the "curse of multilinguality," where adding new languages can reduce performance due to capacity limitations. The study defines concrete scaling rules: when doubling the number of languages, it recommends increasing model size by 1.18× and data by 1.66×, leveraging positive transfers that compensate for lower data per language.
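The per-doubling factors above can be turned into a small planning helper. This is a hedged sketch: the 1.18× and 1.66× factors come from the article, but the assumption that they compose multiplicatively across several doublings is an extrapolation, not something the article states.

```python
import math

def scale_for_languages(model_params, data_tokens, n_langs_old, n_langs_new):
    """Scale model size and data volume when growing the language set.

    Applies the ATLAS rule of thumb (1.18x parameters and 1.66x tokens per
    doubling of languages). Assumption: the factors compose multiplicatively
    across fractional and multiple doublings.
    """
    doublings = math.log2(n_langs_new / n_langs_old)
    return (model_params * 1.18 ** doublings,
            data_tokens * 1.66 ** doublings)

# Doubling coverage from 50 to 100 languages for a 1B-parameter model
# trained on 100B tokens:
params, tokens = scale_for_languages(1e9, 100e9, 50, 100)
print(f"{params:.3g} params, {tokens:.3g} tokens")  # 1.18e+09 params, 1.66e+11 tokens
```

One doubling yields exactly the article's factors; anything beyond that relies on the compounding assumption flagged above.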
Another technical point is the choice between pre-training from scratch and fine-tuning from a multilingual checkpoint. Results indicate that fine-tuning wins under restricted token budgets, while pre-training wins when resources are sufficient, with crossover points depending on model size and number of tokens. This gives a practical rule for allocating compute efficiently in multilingual models.
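That decision rule can be expressed as a simple budget comparison. The helper below is hypothetical: the article only says the crossover depends on model size and token count, so the crossover value must be estimated empirically and is passed in as a parameter rather than hard-coded.

```python
def choose_strategy(token_budget, crossover_tokens):
    """Pick fine-tuning under a restricted budget, pre-training otherwise.

    `crossover_tokens` is an empirically estimated threshold for a given
    model size; ATLAS reports it varies with scale, and no specific value
    is given in the article.
    """
    return "fine-tune" if token_budget < crossover_tokens else "pre-train"

# Example with an assumed crossover of 50B tokens for some model size:
print(choose_strategy(20e9, crossover_tokens=50e9))   # fine-tune
print(choose_strategy(200e9, crossover_tokens=50e9))  # pre-train
```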
With ATLAS, developers can plan global multilingual models far more efficiently: choosing language combinations, sizing model and data accordingly, and deciding between pre-training from scratch and fine-tuning.
Source: Google Research