By Liz Warner, CTO, Percona
AI is moving fast, but it's not moving alone.
Open source, once seen as a niche contributor to AI development, is now driving a growing share of AI innovation. By releasing models, tools, and datasets under permissive licenses, the open source ecosystem has become a powerful engine for progress — lowering barriers, accelerating development, and reshaping who leads and how fast they can move.
This isn't just about openness; it's about how openness changes the game. By removing traditional barriers to participation, open source AI invites a wider range of contributors: researchers, startups, and independent developers can all inspect, test, and improve the underlying systems. This collaborative dynamic produces faster, more resilient innovation. Bugs are caught earlier. Edge cases are uncovered by diverse users. Ideas are sharpened through real-world feedback.
Take DeepSeek-R1, released in early 2025 with its weights published under the permissive MIT license, as an example. It rivals the performance of top-tier proprietary models in reasoning and code generation, despite being developed at a fraction of the cost. The market certainly took notice: Nvidia's stock fell by 17% in one day amid growing concerns about shifts in AI leadership.
While that drop reflected broader market dynamics, it captured something real: Open source is no longer catching up in the world of AI. It's out in front, forcing the rest of the field to rethink its strategy, starting with greater openness in how it releases and positions its own models.
For a long time, closed platforms had major advantages: access to massive datasets, deep engineering talent, and tight control over the full pipeline. But as open models gain ground and adoption accelerates, proprietary vendors are being forced to adapt. Some have begun to revise licensing terms, open portions of their model architectures, or publish previously internal benchmarks.
Microsoft, for example, has embraced DeepSeek's open source R1 model and integrated it into Azure while maintaining flexibility in its partnership with OpenAI. IBM has open-sourced its Granite Code models under the permissive Apache 2.0 license, allowing broad commercial and research use. The company has also expanded its watsonx.governance platform to help organizations track, audit, and manage both proprietary and open source models across the AI lifecycle.
These are signs that traditional vendors recognize the shifting expectations of developers and enterprises alike. But this shift brings new questions. Chief among them: Is greater openness enough?
Making a model open source doesn't automatically make it trustworthy. Openly released AI systems also need governance frameworks to ensure they behave as intended. Without them, even the most transparent of models can become problematic: data pipelines are often opaque, training sources go undocumented, and model behavior changes without clear reasons.
That lack of visibility invites friction. Teams hesitate to deploy models whose behavior they can't readily predict or explain. Leaders struggle to approve systems they can't audit. And users lose confidence when there's no clear accountability for how decisions are made.
This isn't a theoretical risk. Microsoft and IDC report that governance and risk concerns are now the top challenge organizations face when trying to scale AI responsibly. IBM notes that explainability gaps in tooling are a persistent obstacle, and a lack of transparency has already caused trust issues in real-world deployments. The Organization for Economic Co-operation and Development highlights that weak data governance continues to stall adoption across sectors.
Transparency without control isn't enough. If you can't show where data came from, how it was transformed, or who signed off on its use, it's hard to stand behind the outputs your AI systems generate.
If transparency and control are the foundation of trustworthy AI, infrastructure is where that foundation is built. This is where many organizations hit a wall.
Legacy architectures and even some modern managed platforms require extra tooling or vendor support to answer basic governance questions: Where did this data come from? Who accessed it? What was it used for? When something goes wrong, can we trace it back?
By exposing internal mechanisms such as schemas, logs, and policies, open source infrastructure lets you build governance directly into your system. You don't have to wait on a vendor to expose a log or implement a compliance feature. You can inspect and adapt the environment to your actual requirements.
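To make that concrete, here is a minimal sketch of that kind of direct inspection, assuming a PostgreSQL instance and the psycopg2 driver (the connection string and database name are placeholders): it reads a few governance-relevant logging settings straight from the pg_settings catalog view, with no vendor ticket required.

```python
import psycopg2

# Placeholder DSN -- point this at your own instance.
DSN = "dbname=ai_governance user=postgres"

# Logging settings PostgreSQL exposes directly via pg_settings.
SETTINGS = ["log_statement", "log_connections", "log_disconnections"]

with psycopg2.connect(DSN) as conn:
    with conn.cursor() as cur:
        cur.execute(
            "SELECT name, setting FROM pg_settings WHERE name = ANY(%s)",
            (SETTINGS,),
        )
        for name, value in cur.fetchall():
            print(f"{name} = {value}")
```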
And at the heart of all this is the database. Once treated as passive storage, it has become an active control point for AI data governance, recording provenance, enforcing access rules, and serving as a system of record for accountability. For many governance tasks, if the database can't show what happened, nothing can.
Enterprises are standardizing on open source databases such as PostgreSQL and MySQL because they can switch on native features (think row-level audit triggers or audit log plugins) that record exactly who touched each record and when, without bolted-on middleware. Unlike proprietary systems that may restrict visibility or customization, open source databases let organizations define their own controls at the schema, permission, and audit-log level. They support fine-grained access management, enable full lineage tracing, and allow teams to implement compliance workflows that align with internal policies, not just vendor defaults.
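As a sketch of what such a row-level audit trigger can look like, the following assumes a PostgreSQL database reached through psycopg2; the training_features table, the audit_log schema, and the connection string are hypothetical names chosen for illustration, not a prescribed design.

```python
import psycopg2

DSN = "dbname=ai_governance user=postgres"  # placeholder connection string

# An audit table plus a trigger that records who changed which row, and when.
# The audited table (training_features) is hypothetical.
AUDIT_DDL = """
CREATE TABLE IF NOT EXISTS audit_log (
    id         bigserial PRIMARY KEY,
    table_name text        NOT NULL,
    operation  text        NOT NULL,
    row_data   jsonb,
    changed_by text        NOT NULL DEFAULT current_user,
    changed_at timestamptz NOT NULL DEFAULT now()
);

CREATE OR REPLACE FUNCTION record_change() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO audit_log (table_name, operation, row_data)
        VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(OLD));
        RETURN OLD;
    ELSE
        INSERT INTO audit_log (table_name, operation, row_data)
        VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(NEW));
        RETURN NEW;
    END IF;
END;
$$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS features_audit ON training_features;
CREATE TRIGGER features_audit
AFTER INSERT OR UPDATE OR DELETE ON training_features
FOR EACH ROW EXECUTE FUNCTION record_change();
"""

with psycopg2.connect(DSN) as conn:
    with conn.cursor() as cur:
        cur.execute(AUDIT_DDL)
```

Every insert, update, or delete on the audited table then leaves a row recording the operation, the affected data, the acting database user, and a timestamp: exactly the kind of provenance record auditors ask for.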
However, even with open tools, responsible AI requires deliberate design. Teams need to ask foundational questions:
Can we trace our data end-to-end?
Can we limit access based on sensitivity?
Are our audit logs exposed at the infrastructure level?
Can we version and verify data as it flows through the pipeline?
If any of these are unclear, it's a sign your stack may not be ready for what's next.
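On the access question in particular, PostgreSQL's native row-level security shows what limiting access based on sensitivity can look like in practice. A minimal sketch, assuming the same hypothetical training_features table now carries a numeric sensitivity column and that an analyst role already exists (all names are illustrative):

```python
import psycopg2

DSN = "dbname=ai_governance user=postgres"  # placeholder connection string

# Row-level security: the hypothetical 'analyst' role can read only
# low-sensitivity rows (levels 0-2); other rows are invisible to it.
RLS_DDL = """
ALTER TABLE training_features ENABLE ROW LEVEL SECURITY;

DROP POLICY IF EXISTS sensitivity_read ON training_features;

CREATE POLICY sensitivity_read ON training_features
    FOR SELECT TO analyst
    USING (sensitivity <= 2);
"""

with psycopg2.connect(DSN) as conn:
    with conn.cursor() as cur:
        cur.execute(RLS_DDL)
```

Because the policy lives in the database itself, it applies to every query path, whether requests arrive from an application, a notebook, or an ad hoc session, rather than depending on each client to enforce the rule.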
AI innovation is no longer just about who ships the biggest model first. It's about who can build systems that are both powerful and accountable. Operationalizing governance without slowing down innovation has become the new competitive edge.
Open source plays a critical role in this shift by enabling full visibility into how data is stored, queried, and managed. With that visibility, enterprises gain the confidence to move faster, knowing they can audit and adapt their systems as policies evolve. The infrastructure choices organizations make today, especially at the database and data governance layers, will determine whether their AI systems can scale responsibly tomorrow.