Tooling ≠ Glue: Why changing AI workflows still feels like duct tape

December 8, 2025 · 1100 words · 6 min

There’s a weird contradiction in modern AI development. We have better tools than ever. We’re building smarter systems with cleaner abstractions. And yet, every time you try to swap out a component in your stack, things fall apart. Again.
This isn’t just an inconvenience. It’s become the norm.
You’d think with all the frameworks and libraries out there (LangChain, Hugging Face, MLflow, Airflow) we’d be past this by now. These tools were supposed to make our workflows modular and composable. Swap an embedding model? No problem. Try a new vector store? Easy. Switch from OpenAI to an open-source LLM? Go ahead. That was the dream.
But here’s the reality: we’ve traded monoliths for a brittle patchwork of microtools, each with its own assumptions, quirks, and “standard interfaces.” And every time you replace one piece, you end up chasing down broken configs, mismatched input/output formats, and buried side effects in some YAML file you forgot existed.
A lot of the tooling that’s emerged in AI came with solid intentions. Follow the UNIX philosophy. Build small pieces that do one thing well. Expose clear interfaces. Make everything swappable.
In theory, this should’ve made experimentation faster and integration smoother. But in practice, most tools were built in isolation. Everyone had their own take on what an embedding is, how prompts should be formatted, what retry logic should look like, or how to chunk a document.
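To make the "everyone has their own take" problem concrete, here is a minimal sketch. The two embedder classes are hypothetical stand-ins for real libraries that disagree on method names, batching, and return shapes; the thin adapters show what gluing them behind one shared contract actually costs.

```python
from typing import Protocol

# Hypothetical: two libraries disagree on what "an embedding" even is.
class EmbedderA:
    def embed(self, text: str) -> list[float]:  # single-text API, flat vector
        return [0.1, 0.2, 0.3]

class EmbedderB:
    def encode(self, texts: list[str]) -> list[list[float]]:  # batch-only API
        return [[0.1, 0.2, 0.3] for _ in texts]

# The interface callers should depend on instead of either library.
class Embedder(Protocol):
    def embed_one(self, text: str) -> list[float]: ...

# Thin adapters hide each library's quirks behind the shared contract.
class AdapterA:
    def __init__(self, inner: EmbedderA) -> None:
        self.inner = inner

    def embed_one(self, text: str) -> list[float]:
        return self.inner.embed(text)

class AdapterB:
    def __init__(self, inner: EmbedderB) -> None:
        self.inner = inner

    def embed_one(self, text: str) -> list[float]:
        return self.inner.encode([text])[0]  # unwrap the batch dimension
```

Two adapters for two libraries is manageable; multiply by every tool in the stack and you get the glue this post is about.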
So instead of composability, we got fragmentation. Instead of plug-and-play, we got “glue-and-hope-it-doesn’t-break.”
And this fragmentation isn’t just annoying; it slows everything down. Want to try a new RAG strategy? You might need to re-index your data, adjust your chunk sizes, tweak your scoring functions, and retrain your vector DB schema. None of that should be necessary. But it is.
AI pipelines today span a bunch of layers: data loading and chunking, embedding, vector storage and retrieval, prompt construction, LLM inference, and orchestration on top.
Each one looks like a clean block on a diagram. But under the hood, they’re often tightly coupled through undocumented assumptions about tokenization quirks, statefulness, retry behavior, latency expectations, etc.
The result? What should be a flexible stack is more like a house of cards. Change one component, and the whole thing can wobble.
Why does this keep happening? The short answer: abstractions leak — a lot.
Every abstraction simplifies something. And when that simplification doesn’t match the underlying complexity, weird things start to happen.
Take LLMs, for example. You might start with OpenAI’s API and everything just works. Predictable latency, consistent token limits, clean error handling. Then you switch to a local model. Suddenly:

- latency becomes unpredictable
- token limits and tokenizers differ
- errors surface in new, undocumented shapes

What was once a simple API call becomes a whole new engineering problem. The abstraction has leaked, and you’re writing glue code again.
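One common way to contain that leak is to normalize every backend behind a single error type and latency budget. This is a sketch under assumptions, not a real client: `backend` is a hypothetical callable standing in for an OpenAI client call or a local model's generate function.

```python
import time

class LLMError(Exception):
    """Normalized error type so callers never see backend-specific exceptions."""

def generate_text(backend, prompt: str, budget_s: float = 30.0) -> str:
    # `backend` is any callable taking a prompt string and returning text;
    # a hypothetical stand-in for whichever provider is plugged in today.
    start = time.monotonic()
    try:
        result = backend(prompt)
    except Exception as exc:
        # Each backend raises its own zoo of errors; normalize them here.
        raise LLMError(str(exc)) from exc
    elapsed = time.monotonic() - start
    if elapsed > budget_s:
        # Note: this only checks the budget after the fact; a true timeout
        # needs threads, async, or the backend's own timeout parameter.
        raise LLMError(f"backend exceeded {budget_s:.1f}s budget")
    return result
```

Callers now handle exactly one failure type, regardless of which model sits behind the function.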
This isn’t just a one-off annoyance. It’s structural. We’re trying to standardize a landscape where variability is the rule, not the exception.
One big reason for the current mess is the lack of solid standards for interoperability.
In other fields, we’ve figured this out: HTTP and REST for web services, SQL for databases, POSIX for operating systems, USB for hardware.
In AI? We’re not there yet. Most tools define their own contracts. Few agree on what’s universal. And as a result, reuse is hard, swapping is risky, and scaling becomes painful.
So yes, standards like MCP are starting to show up, and they matter. But today, most teams are still stitching things together manually. Until these protocols become part of the common tooling stack, supported by vendors and respected across libraries, the glue will keep leaking.
It’s tempting to say: “But it worked in the notebook.”
Yes, and that’s the problem.
The glue logic that works for your demo, local prototype, or proof-of-concept often breaks down in production. Why?
Much of today’s tooling is optimized for developer ergonomics during experimentation, not for durability in production. The result: we demo pipelines that look clean and modular, but behind the scenes are fragile webs of assumptions and implicit coupling.
Scaling this glue logic, making it testable, observable, and robust, requires more than clever wrappers. It requires system design, standards, and real engineering discipline.
What makes this even more dangerous is the illusion of modularity. On the surface, everything looks composable – API blocks, chain templates, toolkits – but the actual implementations are tightly coupled, poorly versioned, and frequently undocumented.
The AI stack doesn’t break because developers are careless. It breaks because the foundational abstractions are still immature, and the ecosystem hasn’t aligned on how to communicate, fail gracefully, or evolve in sync.
Until we address this, the glue will keep breaking, no matter how shiny the tools become.
Many AI tools offer SDKs filled with helper functions and syntactic sugar. But this often hides the actual interfaces and creates tight coupling between your code and a specific tool. Instead, composability means exposing formal interface contracts: typed input and output schemas, versioned APIs, and explicit error types.
These contracts make integration testable, keep your code decoupled from any one SDK, and make swapping implementations safe.
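As a sketch of what such a contract can look like in plain Python, here is a hypothetical retrieval-step interface: the field names and the score range are illustrative assumptions, but the point is that the contract is explicit and mechanically checkable.

```python
from dataclasses import dataclass

# Hypothetical contract for a retrieval step: explicit fields, explicit failure.
@dataclass(frozen=True)
class RetrievalRequest:
    query: str
    top_k: int = 5

@dataclass(frozen=True)
class RetrievalResult:
    documents: list[str]
    scores: list[float]

def validate(result: RetrievalResult) -> RetrievalResult:
    # The contract is checkable: lengths must agree, scores must be sane.
    if len(result.documents) != len(result.scores):
        raise ValueError("documents and scores must be parallel lists")
    if any(s < 0.0 or s > 1.0 for s in result.scores):
        raise ValueError("scores must be normalized to [0, 1]")
    return result
```

Any vector store that can produce a valid `RetrievalResult` becomes swappable; any that can't fails loudly at the boundary instead of silently downstream.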
Most current AI systems assume everything works smoothly (“happy path”). But in reality, services time out, rate limits kick in, models return malformed output, and downstream steps fail halfway through.
A truly composable system should validate its inputs, define its failure modes explicitly, retry transient errors with backoff, and surface the rest instead of swallowing them.
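The retry-with-backoff part can be sketched in a few lines; the attempt count and delays here are illustrative defaults, not recommendations.

```python
import time

def call_with_retries(fn, *, attempts: int = 3, base_delay: float = 0.01):
    """Retry a flaky component with exponential backoff instead of
    assuming the happy path; re-raise once the budget is exhausted."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of budget: fail loudly, not silently
            time.sleep(base_delay * (2 ** attempt))
```

In practice you would narrow the `except` clause to the transient error types the component's contract declares, so that genuine bugs still crash immediately.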
Today, most AI workflows are written in procedural code: load the data, chunk it, embed it, store it, query it, all as a hand-written sequence of imperative calls.
But this logic is hard to test, hard to version, and hard to reason about once it grows.
A declarative pipeline describes the what, not the how: you specify the steps and their parameters, and an engine handles execution.
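Here is a toy version of that idea, with the pipeline as plain data and a tiny interpreter; the step names and parameters are hypothetical, and two of the steps are stubs where real model and storage calls would go.

```python
# A pipeline described as data (the "what"), run by a small engine (the "how").
PIPELINE = [
    {"step": "chunk", "params": {"size": 512}},
    {"step": "embed", "params": {"model": "some-embedding-model"}},  # hypothetical name
    {"step": "index", "params": {"store": "some-vector-store"}},     # hypothetical name
]

# Step implementations are looked up by name, so the spec stays swappable.
REGISTRY = {
    "chunk": lambda state, params: {
        **state,
        "chunks": [state["text"][i:i + params["size"]]
                   for i in range(0, len(state["text"]), params["size"])],
    },
    "embed": lambda state, params: {**state, "embedded": True},  # stub: would call a model
    "index": lambda state, params: {**state, "indexed": True},   # stub: would write a store
}

def run(pipeline, state):
    for step in pipeline:
        state = REGISTRY[step["step"]](state, step["params"])
    return state
```

Because the pipeline is data, it can be diffed, versioned, validated, and visualized, and swapping the embedding model is a one-line change to the spec rather than a code edit.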
Be skeptical of tools that promise seamless plug-and-play but lack strong interface contracts.
If a tool markets itself as easy to integrate but doesn’t offer documented schemas, versioned interfaces, or explicit error contracts, then the “plug-and-play” claim is misleading. These tools often lock you into an SDK and hide the true cost of integration.
Design your workflows defensively: isolate components, standardize formats, and expect things to break.
Good system design assumes things will fail.
Treat every tool like an unreliable network service, even if it’s running locally.

Embrace declarative and interoperable approaches: less code, more structure. Declarative tools (e.g., YAML workflows, JSON pipelines) give you predictable interfaces and reusable components. This is the difference between wiring by hand and using a circuit board.

We’ve all seen what’s possible: modular pipelines, reusable components, and AI systems that don’t break every time you swap a model or change a backend. But let’s be honest, we’re not there yet. And we won’t get there just by waiting for someone else to fix it.

If we want a future where AI workflows are truly composable, it’s on us, the people building and maintaining these systems, to push things forward. That doesn’t mean reinventing everything. It means starting with what we already control: write clearer contracts, document your internal pipelines like someone else will use them (because someone will), choose tools that embrace interoperability, and speak up when things are too tightly coupled.

The tooling landscape doesn’t change overnight, but with every decision we make, every PR we open, and every story we share, we move one step closer to infrastructure that’s built to last, not just duct-taped together.