kiroi.org

KIROI - Artificial Intelligence Return on Invest
The AI strategy for decision-makers and managers

Business excellence for decision-makers & managers by and with Sanjay Sauldie

8 May 2025

AI Tool Check: How decision-makers test AI tools properly

Imagine investing a six-figure sum in a digital tool that is meant to revolutionise your organisation, only to find after six months that it neither fits your processes nor is accepted by your employees. Executives across numerous industries are experiencing exactly this scenario with increasing frequency, because they skip fundamental steps of the AI tool check or are dazzled by impressive presentations without verifying practical applicability in their own context. The systematic evaluation of intelligent systems, however, requires much more than a superficial glance at feature lists and pricing models.

Why the systematic AI tool check has become indispensable

The market for intelligent software solutions is growing rapidly, and new providers with promising products, covering everything from automated document processing to predictive maintenance, appear almost daily. This abundance of options overwhelms even experienced technology managers, as it becomes increasingly difficult to distinguish substantial innovations from marketing-driven promises. A structured AI tool check provides orientation and prevents costly wrong decisions, which not only tie up financial resources but also cost valuable time and employee trust [1].

In the healthcare sector, for example, hospital managers are currently evaluating systems for the automated reporting of X-ray images. A hospital in North Rhine-Westphalia tested three different providers and found that only one fully met the strict data protection requirements of the GDPR. In the retail sector, on the other hand, purchasing managers are examining intelligent inventory management solutions intended to create demand forecasts. A medium-sized drugstore chain discovered during the test phase that its preferred system showed significant weaknesses with seasonal fluctuations. In the manufacturing industry, in turn, plant managers are analysing algorithms for predictive machine maintenance. An automotive supplier realised only through systematic testing that the promised accuracy of 95 percent was achieved solely under laboratory conditions.

Understanding the dimensions of a comprehensive AI tool check

A professional evaluation of intelligent systems encompasses technical, organisational, and ethical aspects, all of which must be considered equally to enable informed decisions. From a technical perspective, this involves integration capability, scalability, and performance under real-world conditions. Organisationally, the focus is on user acceptance, training requirements, and process adjustments. Ethically, transparency, fairness, and the traceability of algorithmic decisions must be examined [2].

A financial services group recently implemented an automated claims processing system without adequately considering the ethical implications. After several months, it emerged that the algorithm was systematically disadvantaging certain postcode areas. A logistics company, on the other hand, neglected the integration capability with its existing fleet management software when selecting a route optimisation system. This resulted in months of delays and considerable additional costs. A telecommunications provider, in turn, underestimated the training requirements for a new customer service tool and faced massive resistance from employees.
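A disadvantage of the kind described in the claims-processing case can often be spotted with a very simple check during the pilot. The following is a minimal sketch, assuming the pilot decisions can be exported as pairs of a group label (for example a postcode area) and an approval flag. The function names and sample data are hypothetical, and a large gap is a prompt for closer investigation, not proof of unfair treatment.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def approval_rate_by_group(decisions: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Return the share of approved cases for each group label."""
    counts = defaultdict(lambda: [0, 0])  # group -> [approved, total]
    for group, approved in decisions:
        counts[group][0] += int(approved)
        counts[group][1] += 1
    return {group: approved / total for group, (approved, total) in counts.items()}


def max_rate_gap(rates: Dict[str, float]) -> float:
    """Return the difference between the highest and lowest approval rate."""
    return max(rates.values()) - min(rates.values())


# Made-up pilot data: 8 of 10 claims approved in one area, 5 of 10 in another.
decisions = (
    [("area_1", True)] * 8 + [("area_1", False)] * 2
    + [("area_2", True)] * 5 + [("area_2", False)] * 5
)
rates = approval_rate_by_group(decisions)
gap = max_rate_gap(rates)
print(rates, round(gap, 2))
```

Which gap is acceptable is a business and legal question that should be settled in the criteria catalogue before the pilot starts.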

Best practice with a KIROI customer

A medium-sized pharmaceutical company faced the challenge of evaluating and implementing an intelligent system for quality control in tablet production. As part of transruption coaching, we supported the project team over six months in systematically assessing four vendor solutions, with a particular focus on the regulatory requirements of the pharmaceutical industry. Together, we developed an industry-specific catalogue of criteria that considered technical performance parameters as well as GMP compliance and validation requirements. The team defined precise test scenarios that simulated real production conditions and covered various fault cases. During the pilot phase, we identified critical weaknesses in two of the vendors that had not been apparent in the standard presentations. The company ultimately opted for a solution that, while not the cheapest, offered the best integration into the existing production environment. The investment paid for itself after just eighteen months through reduced rejection rates and accelerated release processes.

Practical steps for a successful AI tool check

The first step in any sound evaluation is to precisely define your own requirements before contacting any provider. Many organisations make the mistake of getting inspired by product demonstrations and then retrospectively adapting their requirements to the available features. This approach regularly leads to disappointment because the actual pain points in day-to-day operations are not addressed. Instead, it is recommended to first conduct internal workshops where all relevant stakeholders can air their expectations and concerns.

An energy provider began its evaluation process with a two-week requirements analysis involving employees from seven different departments. The result was a detailed catalogue of criteria with weighted evaluation dimensions. A hotel chain, on the other hand, developed industry-specific test scenarios for a revenue management system, taking into account seasonal booking patterns and major events. A mechanical engineering company, in turn, defined precise interface requirements for a predictive maintenance system that was intended to integrate seamlessly into the existing IoT infrastructure [3].
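A catalogue of criteria with weighted evaluation dimensions, as in the energy provider example, can be reduced to a small scoring function. The sketch below is an illustration only: the criteria names, weights, vendors and ratings are all hypothetical, and ratings are assumed to be on a scale from 1 to 5 where higher is better.

```python
from typing import Dict

# Hypothetical criteria and weights; in practice they come out of the internal workshops.
WEIGHTS: Dict[str, float] = {
    "integration_capability": 0.35,
    "performance": 0.30,
    "user_acceptance": 0.20,
    "data_protection": 0.15,
}


def weighted_score(ratings: Dict[str, float], weights: Dict[str, float] = WEIGHTS) -> float:
    """Return the weighted sum of the ratings for one vendor."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return sum(weights[name] * ratings[name] for name in weights)


# Made-up ratings for two fictional vendors.
vendors = {
    "Vendor A": {"integration_capability": 2, "performance": 5,
                 "user_acceptance": 4, "data_protection": 3},
    "Vendor B": {"integration_capability": 4, "performance": 3,
                 "user_acceptance": 4, "data_protection": 5},
}
ranking = sorted(vendors, key=lambda name: weighted_score(vendors[name]), reverse=True)
for name in ranking:
    print(name, round(weighted_score(vendors[name]), 2))
```

Fixing the weights before the first vendor demonstration is the point of the exercise: the vendor with the most impressive single feature (here Vendor A on performance) does not automatically come out on top.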

Setting up and carrying out pilot projects correctly

The pilot phase is the core of any sound evaluation process, as it confronts theoretical promises with practical reality. Crucially, the testing environment must be as close as possible to actual production conditions without endangering ongoing business. Clients often report that hidden costs and effort, which no sales presentation had mentioned, only become visible during the pilot phase. These insights are invaluable because they enable an informed decision.

A private bank initially tested an automated investment advisory system with a small group of pilot customers before deciding on a broader rollout. It became apparent that the user interface was too complex for older customers and needed to be simplified. A food manufacturer conducted a three-month parallel test in which the new quality assurance system ran alongside established manual processes. This allowed for an objective measurement of the accuracy of the algorithmic recommendations. In turn, a municipal utility company implemented an intelligent grid management tool in a defined supply area to minimise risks and gain experience.
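A parallel test like the food manufacturer's yields pairs of verdicts for the same item: one from the new system and one from the established manual process. A minimal sketch of how such pairs could be summarised follows. It assumes the manual verdict is treated as the reference, which is itself a simplification, and all names and sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Sample:
    system_pass: bool     # verdict of the system under test
    reference_pass: bool  # verdict of the established manual process


def parallel_test_summary(samples: List[Sample]) -> Dict[str, float]:
    """Summarise agreement between the system and the reference process."""
    if not samples:
        raise ValueError("at least one sample is required")
    agree = sum(s.system_pass == s.reference_pass for s in samples)
    # System accepted an item that the manual check rejected.
    missed_defects = sum(s.system_pass and not s.reference_pass for s in samples)
    # System rejected an item that the manual check accepted.
    false_alarms = sum((not s.system_pass) and s.reference_pass for s in samples)
    return {
        "agreement_rate": agree / len(samples),
        "missed_defects": missed_defects,
        "false_alarms": false_alarms,
    }


# Made-up results for ten inspected batches.
samples = (
    [Sample(True, True)] * 6 + [Sample(False, False)] * 2
    + [Sample(True, False)] + [Sample(False, True)]
)
summary = parallel_test_summary(samples)
print(summary)
```

Reporting missed defects and false alarms separately matters because the two error types usually carry very different costs.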

Best practice with a KIROI customer

An internationally operating logistics service provider was looking for an intelligent solution for automated shipment tracking and delivery time forecasting that would proactively inform customers about delays. As part of our support, we assisted the project team in conducting a structured comparison of five providers, placing particular emphasis on forecast accuracy under various conditions. Together, we developed a scoring model that weighted technical performance, integration effort, operating costs, and user-friendliness. During the eight-week pilot phase, we tested each provider with identical datasets from daily business operations. The team systematically documented deviations between forecasted and actual delivery times, as well as the reaction speed once a disruption was detected. Particularly insightful was the analysis of system behaviour during an unforeseen weather disruption, during which three out of five systems showed significant weaknesses. The insights gained enabled a well-founded decision in favour of the provider whose solution also functioned reliably under extreme conditions.
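Documenting deviations between forecasted and actual delivery times on identical datasets can be as simple as computing one error figure per provider. The sketch below uses the mean absolute error as one possible metric. The source does not state which metric the project team used, and the provider names and figures are invented for illustration.

```python
from statistics import mean
from typing import Dict, Sequence


def mean_absolute_error(forecast_hours: Sequence[float], actual_hours: Sequence[float]) -> float:
    """Return the average absolute deviation between forecast and actual values."""
    if not actual_hours or len(forecast_hours) != len(actual_hours):
        raise ValueError("forecast and actual must be non-empty and of equal length")
    return mean(abs(f - a) for f, a in zip(forecast_hours, actual_hours))


# Identical (made-up) reference data for every provider: actual delivery times in hours.
actual = [24, 48, 36, 72]
forecasts: Dict[str, Sequence[float]] = {
    "Provider A": [26, 47, 30, 80],
    "Provider B": [25, 50, 35, 70],
}
errors = {name: mean_absolute_error(f, actual) for name, f in forecasts.items()}
best = min(errors, key=errors.get)
print(errors, best)
```

Running the same calculation separately on the days affected by a disruption would show whether a provider's accuracy holds up under extreme conditions or only on average.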

Avoiding pitfalls and typical mistakes

Even experienced leaders repeatedly fall into the same traps when evaluating intelligent systems, because certain cognitive biases are difficult to overcome. The so-called halo effect causes an impressive brand name or a charismatic sales presentation to push critical questions into the background. Confirmation bias, in turn, tempts evaluators to search primarily for information that supports a decision they have already made. Knowing these psychological mechanisms and actively countering them is an essential success factor [4].

A media company was impressed by the innovative visualisation of a content recommendation system and did not critically question the underlying algorithms. Only after implementation did it become apparent that the system favoured clickbait content. In another instance, a public authority blindly trusted a provider's references without considering that the reference customers named had completely different requirements. A retail company underestimated the complexity of data migration and faced months of delays.

Understanding AI tool checks as a continuous process

The evaluation of intelligent systems does not end with the purchasing decision, but continues throughout the entire period of use, as both one's own requirements and technological possibilities are continuously evolving. Regular reviews of system performance, structured feedback rounds with users, and monitoring of the supplier market are part of professional governance. Clients frequently report that the true strengths and weaknesses of a system only become apparent after several months of operation.

An audit firm conducts quarterly performance reviews of its audit assistance systems, comparing the results against the original expectations. A hospital group has established permanent monitoring of diagnostic support systems, analysing deviations between algorithmic recommendations and final physician decisions. A manufacturing company, in turn, uses a continuous improvement process to optimise the configuration of its quality assurance systems step by step.
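Monitoring of the kind the hospital group has set up can start from a single figure per review period: how often the final human decision differed from the system's recommendation. The following is a minimal sketch under the assumption that each case is logged as a period label, a recommendation and a final decision. The labels, data and alert threshold are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def override_rate_by_period(records: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """Return, per period, the share of cases where the final decision
    differed from the system's recommendation."""
    totals: Dict[str, int] = defaultdict(int)
    overrides: Dict[str, int] = defaultdict(int)
    for period, recommendation, final_decision in records:
        totals[period] += 1
        if recommendation != final_decision:
            overrides[period] += 1
    return {period: overrides[period] / totals[period] for period in totals}


def periods_above(rates: Dict[str, float], threshold: float) -> List[str]:
    """Return the periods whose override rate exceeds the threshold."""
    return [period for period, rate in rates.items() if rate > threshold]


# Made-up log entries: (period, system recommendation, final decision).
records = [
    ("2025-Q1", "normal", "normal"), ("2025-Q1", "abnormal", "abnormal"),
    ("2025-Q1", "normal", "abnormal"), ("2025-Q1", "normal", "normal"),
    ("2025-Q2", "normal", "abnormal"), ("2025-Q2", "abnormal", "normal"),
    ("2025-Q2", "normal", "normal"), ("2025-Q2", "abnormal", "abnormal"),
]
rates = override_rate_by_period(records)
flagged = periods_above(rates, threshold=0.4)  # assumed threshold for a closer review
print(rates, flagged)
```

A rising override rate does not by itself say whether the system or the users are drifting; it only indicates that the structured feedback round for that period deserves more attention.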

My KIROI Analysis

After years of supporting organisations in the evaluation and implementation of intelligent systems, I would like to share some key insights as input for your own projects. The most important success factor remains thorough preparation before making initial contact with suppliers, because only a clear understanding of your own requirements protects you from being dazzled. At the same time, I observe that many organisations allocate too little time to the pilot phase and consequently discover critical weaknesses only after the rollout. The systematic documentation of test scenarios and results may seem laborious, but it pays off at the latest when decisions need to be justified to various stakeholders.

I consider the involvement of future users in early project phases particularly important, because even technically brilliant solutions fail if they are not accepted. Transruption coaching provides support at precisely this interface between technology and organisation by enabling structured dialogue and making hidden resistance visible. Furthermore, experience shows that the AI tool check should never be viewed in isolation, but always within the context of the overarching digital strategy. Organisations that align their evaluation criteria with strategic goals make more sustainable decisions and avoid siloed solutions. Finally, I would like to emphasise that continuous review and adjustment remain essential even after a decision has been made, because both the technological landscape and one's own requirements are constantly evolving.