Large language models (LLMs) have rapidly entered the landscape of historical research. Their capacity to process, annotate and generate texts is transforming scholarly workflows. Yet historians are uniquely positioned to ask a deeper question – who owns the tools that shape our understanding of the past?
Most powerful LLMs today are developed by private companies. While their investments are significant, their goals – profit, platform growth and intellectual property control – rarely align with the values of historical scholarship: transparency, reproducibility, accessibility and cultural diversity.
This raises serious concerns about (a) opacity: we often lack insight into training data and embedded biases; (b) instability: access terms and capabilities may change without notice; and (c) inequity: many researchers, especially in less-resourced contexts, are excluded.
It is time to build public, open-access LLMs for the humanities – trained on curated, multilingual, historically grounded corpora from our libraries, museums and archives. These models must be transparent, accountable to academic communities and supported by public funding. Building such infrastructure is challenging but crucial. Just as we would not outsource national archives or school curricula to private firms, we should not entrust them with our most powerful interpretive technologies.
The humanities have a responsibility – and an opportunity – to create culturally aware, academically grounded artificial intelligence. Let us not only use LLMs responsibly but also own them responsibly. Scholarly integrity and the future of public knowledge may depend on it.
Prof Dr Matteo Valleriani
Max Planck Institute for the History of Science, Berlin, Germany