Technology · 04-Feb-2025 · 5 min read

AI in operational software: when it earns its presence.

Adding AI to a business application is currently a marketing trend. Adding AI where it removes a specific manual cost is a different category. The distinction matters.

By Mohammad Jamnagarwala · Simply Five Studio

Every business application vendor in 2025 advertises AI features. The features range from genuine productivity layers to button-and-popup window dressing that does not change the workflow it claims to improve. As a firm that builds operational software for a living, we have an obligation to be careful about what we ship and why.

This essay is about the test we apply when deciding whether AI belongs in a given module, and what AI looks like when it earns its presence versus when it is being added because the marketing function asked for it.

The test

AI earns its presence in operational software when three conditions are true.

First, there is a specific manual cost it removes. Not "saves time broadly" but a measurable workflow that currently takes the team minutes or hours, where the AI cuts that time substantially.

Second, the manual cost is in translation, classification, or extraction work, not in judgement. AI is good at structural translation between formats (an unstructured customer request to a structured quote line, a photograph of a part to a specification record, a handwritten document to a search index). AI is weaker at the kind of judgement an experienced practitioner exercises, and pretending otherwise is where the trouble starts.

Third, the human reviews the AI output before it commits to the system. The reviewer is the practitioner whose work the AI is assisting. The audit trail records what came from the AI versus what came from human entry. The relationship between AI and human is clear: the AI accelerates the slow, error-prone work; the human remains responsible for the decision.

When these three conditions are true, AI belongs in the module. When any one of them is missing, AI is being shoehorned in for reasons other than the work.

A working example

In the CFX system we built for three cutting-tool manufacturers in Mumbai, the field sales team uses an Android app that integrates Gemini AI at one specific layer: spec translation.

A field sales engineer at a customer factory captures a request in natural language ("they need 50 of the M16 bolts, length 80mm, stainless 316, with the standard collar fitting"). They optionally attach a photo of an existing part. The model processes both, extracts the structured specification fields the quote system expects, and presents the extracted record to the engineer.

The engineer reviews. They correct any field the model got wrong. They commit. The record lands in the headquarters CRM with the audit trail attached: which fields came from the AI extraction, which were corrected by hand.
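The review-and-commit step can be sketched in a few lines. This is a minimal illustration in Python, not the CFX implementation: the class names, field names, and the `engineer-01` identifier are all hypothetical, and the model call itself is left out (the sketch starts from a dictionary of already-extracted values). What it shows is the shape of the rule: every field carries its provenance, a manual correction flips that provenance, and nothing commits without a sign-off.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Source(Enum):
    AI = "ai"        # value proposed by the model and accepted as-is
    HUMAN = "human"  # value typed or corrected by the engineer


@dataclass
class FieldValue:
    value: str
    source: Source


@dataclass
class SpecDraft:
    """A structured spec awaiting review. All names here are illustrative."""
    fields: Dict[str, FieldValue]
    reviewed_by: Optional[str] = None

    @classmethod
    def from_extraction(cls, extracted: Dict[str, str]) -> "SpecDraft":
        # Everything the model produced starts out tagged as AI.
        return cls({k: FieldValue(v, Source.AI) for k, v in extracted.items()})

    def correct(self, name: str, value: str) -> None:
        # A manual edit replaces the value and flips its provenance.
        self.fields[name] = FieldValue(value, Source.HUMAN)

    def sign_off(self, engineer: str) -> None:
        self.reviewed_by = engineer

    def commit(self) -> dict:
        # Refuse to write anything the engineer has not signed off on.
        if self.reviewed_by is None:
            raise PermissionError("draft must be reviewed before commit")
        return {
            "reviewed_by": self.reviewed_by,
            "values": {k: f.value for k, f in self.fields.items()},
            "audit": {k: f.source.value for k, f in self.fields.items()},
        }


# Hypothetical extraction result in which the model misread the material grade.
draft = SpecDraft.from_extraction(
    {"thread": "M16", "length_mm": "80", "material": "stainless 304", "quantity": "50"}
)
draft.correct("material", "stainless 316")
draft.sign_off("engineer-01")
record = draft.commit()
print(record["audit"])
```

The design choice worth noting is that provenance lives on each field rather than on the record as a whole, so the audit trail can say exactly which values the engineer changed.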

The AI is not writing the quote. It is doing the structural translation between conversation and form. The engineer remains responsible for the specification's accuracy. The headquarters team that processes the request never sees the AI; they see the structured record the engineer signed off on.

This module passes all three tests. The manual cost being removed is the clipboard-and-translate work that field sales has always done slowly and with errors. The work is classification and extraction, not judgement. The human reviews before commit, with a clear audit trail.

The module ships in production. It works.

A counter-example we deliberately did not build

The same client asked, during scoping, whether the AI could draft the follow-up email to the customer after the spec was captured. The field sales engineer would just press send.

We declined to build that feature.

The reason is the third condition. An AI-generated customer email that the engineer just presses send on is not a human-reviewed artefact in any meaningful sense. The engineer skims it at best. The customer receives a message that the engineer did not really write. The relationship language drifts toward generic. The customer notices over time. The trust the engineer has built with the customer erodes.

The work the AI was being asked to do here was not translation. It was relationship maintenance, which is judgement work. The right answer was not to add AI; it was to leave the email writing to the engineer.

This is the harder call. The client wanted the feature. The case for it was plausible. The math of saving the engineer two minutes per email looked positive. We said no because the long-term cost was larger than the short-term saving.

What this means for buyers of operational software

A founder evaluating operational software with AI features should apply the three tests at the demo. For each AI feature the vendor shows, ask three questions.

What specific manual cost does this remove? If the answer is vague, the feature is window dressing.

Is the work translation or judgement? If the vendor claims AI handles judgement well, the vendor is selling something other than what they have.

Does a human review before commit, with a clear audit trail? If the answer is no, the system is making decisions on the firm's behalf that the firm cannot inspect.

A system that passes all three tests for each AI feature is the kind of system worth running. A system that fails one or more is the kind of system that produces marketing slides but operational disappointment.
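For a buyer who wants to keep score across a demo, the three questions reduce to a small checklist. The sketch below is one way to write it down in Python; the class and function names are hypothetical, and the two assessments simply encode the judgements this essay already made about the spec translation module and the follow-up email draft.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FeatureAssessment:
    name: str
    removes_specific_manual_cost: bool
    work_is_translation_not_judgement: bool
    human_reviews_before_commit: bool


def failed_tests(a: FeatureAssessment) -> List[str]:
    """Return the tests the feature fails; an empty list means it passes all three."""
    checks = {
        "specific manual cost": a.removes_specific_manual_cost,
        "translation, not judgement": a.work_is_translation_not_judgement,
        "human review before commit": a.human_reviews_before_commit,
    }
    return [label for label, passed in checks.items() if not passed]


# The two features discussed in this essay, scored as the essay scores them.
spec_translation = FeatureAssessment("spec translation", True, True, True)
email_draft = FeatureAssessment("follow-up email draft", True, False, False)

for feature in (spec_translation, email_draft):
    failures = failed_tests(feature)
    verdict = "earns its presence" if not failures else "fails: " + ", ".join(failures)
    print(f"{feature.name}: {verdict}")
```

The email draft does pass the first test, since two minutes per email is a measurable saving. That is the point of requiring all three: a real time saving is not enough on its own.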

The category will sort itself out

The current state of AI in business applications is messy because the technology shifted faster than vendors could thoughtfully integrate it. Most current AI features will be replaced or removed over the next two years as the practitioners using them discover what works and what does not.

The features that survive will be the ones that pass the three tests. The features that do not will quietly disappear from the marketing decks, and the vendors will pretend they were never there.

For now, the right posture for a buyer is skepticism on the AI claim and a willingness to ask the three questions. For us, the right posture as a builder is to apply the tests rigorously and ship only the AI that earns its presence.

