Expert AI – Competence Without Comprehension?

When we founded Cygnus a few years ago, one of our core ideas was to combine the capabilities of Artificial Intelligence with human expertise to provide better answers to complex questions about risk. Cygnus advises organizations on how to respond to complex risks arising from geopolitical instability, conflict and collusion with bad actors, and we act as security risk experts in support of litigation.

After a series of in-depth discussions with mathematicians specialising in predictive analytics and machine learning, we concluded that it wasn’t possible to use AI in the way we had envisaged. Not that it was ‘too difficult’, but that it isn’t possible in principle to model the more complex problems we deal with in our resilience & risk, and litigation support practices. This answer surprised both us and the mathematicians – we’d both started the conversation assuming that doing this would be relatively straightforward, given recent advances in AI and the claims being made about its future capabilities. In essence, the mathematicians’ view was that the uncertainty generated by humans exercising free will was the main driver of chaos (or radical uncertainty) in the types of risk scenarios we were modelling – and in their opinion free will was too slippery for their maths.

Right from the start we were clear that Cygnus would be built on the principles of honest marketing and real expertise in our field. We weren’t prepared to embellish our services – what has recently been called AI-washing – so we put the idea on ice.

Last year we attended an excellent conference on Artificial Intelligence & Expert Evidence led by Minesh Tanna, a Partner and Global AI Lead at the international law firm Simmons & Simmons. From Minesh’s presentation it was clear that AI’s rapidly evolving capabilities will have a profound impact on the world of expert evidence, at least in some fields of expertise. But we were struck more by the skepticism expressed by many of the well-respected experts in the room when confronted with ChatGPT’s actual outputs. We remain open-minded, but nothing we saw at that point led us to think that AI had strong applicability in our own fields of risk and security – essentially for the same reasons as those offered by the mathematicians when we were starting out. The experts’ consensus seemed to be that AI generates good answers when there is plenty of objective fact (quantifiable, measurable and documented), but it stumbles when ‘thinking’ and subjective assessment of the relevance and importance of facts are required.

Fast-forward to today and the emergence of OpenAI’s latest GPT models, along with the ability to customize them for specific uses through fine-tuning. These developments prompted us to revisit the question of whether AI models are approaching, matching, or even surpassing human expertise in our specific fields.

COMPLEX vs. DIFFICULT QUESTIONS – COMPETENCE WITHOUT COMPREHENSION?

Before trying to answer this question, it’s worth clarifying the difference between a difficult question and a complex one. For us, a difficult question is answerable on the basis of broad general knowledge, coupled with advanced reasoning capabilities – this is in fact OpenAI’s own definition. A complex question on the other hand requires very specific knowledge and good answers will rely on expert opinion derived from often incomplete, ambiguous, contradictory, and dynamic information. Ultimately, it seems to boil down to a balance between leveraging correlations and elaborate statistics (AI’s strength) and a sense of causality (our strength) – in our opinion neither AI nor humans can in principle do both, but that opinion is unproven and we are open to testing it.

To understand GPT-3.5’s capabilities (yes, we know about GPT-4), we asked the model a series of difficult and complex questions in our own fields of expertise, and there was a sharp difference in the quality of the answers. For difficult questions, the model did better on average than most analysts in ‘intelligence and research’ companies – plus the model is quicker and free. This result was surprising and impressive. On the other hand, its answers to complex questions were well written and structured but contained either no substantive information or, worse, wrong information.
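As a rough illustration of how such a comparison can be run, here is a minimal sketch using the OpenAI Python client (v1+); the model name reflects the version we tested, and the example questions are placeholders rather than our actual test set.

```python
# A minimal sketch of the kind of side-by-side test described above, assuming the
# OpenAI Python client (v1+) and an OPENAI_API_KEY in the environment. The example
# questions are illustrative placeholders, not our actual test set.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

QUESTIONS = {
    # 'Difficult': answerable from broad general knowledge plus reasoning.
    "difficult": "Summarise the main political risk factors facing foreign "
                 "investors in the Sahel region.",
    # 'Complex': depends on specific, incomplete and contradictory information,
    # and on expert judgement about which facts matter.
    "complex": "Assess whether a mining operator's site security arrangements "
               "met the standard of a reasonably prudent operator in 2019.",
}

for label, question in QUESTIONS.items():
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a security and geopolitical risk analyst."},
            {"role": "user", "content": question},
        ],
        temperature=0.2,  # keep outputs stable enough to compare across runs
    )
    print(f"--- {label} question ---")
    print(response.choices[0].message.content)
```

Keeping the temperature low simply makes the answers consistent enough to compare; the difference we describe showed up in the substance of the answers, not their structure.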

To be fair, these results are in line with the capabilities OpenAI claims for its models, and our own comments here aren’t really offered as criticisms. Clearly, GPT-3.5 is an excellent tool for conducting general research, with the added benefit of indicating areas where more specific research and analysis might be needed. The more interesting question that follows from this is whether, and to what extent, fine-tuning might improve the model’s ability to provide good answers to complex questions in specific applications.

FINE-TUNING – CAN AI DO BETTER THAN A HUMAN EXPERT?

Actually, this might not be the right question – it might be more sensible to ask whether a fine-tuned AI model can equal a human expert, or at least challenge the expert so that they are forced to improve their answer. Why might this be needed?

Here’s an example. It’s based on our experience of acting as expert witnesses in litigation and arbitration, but the ideas apply just the same to complex issues in risk advisory work and geopolitical analysis. Imagine that you’re an expert witness in a dispute where the opposing expert is likely to offer an opinion that differs sharply from your own. This is usual, but you have no way of knowing what facts or opinions the opposing expert will rely on – even worse, there’s no way of knowing how opposing counsel will use those facts and opinions to attack your own position. What you’d normally do at this stage is construct some adversarial arguments to challenge and test your own position. The trouble is that these arguments are inescapably subject to your own biases – in practice we’ve found that we’ll spot most of the other side’s attack options, but there’s always a good ambush lurking somewhere that only gets sprung in court. That’s all part of the game in an adversarial system, but good gamesmanship also requires building the best possible position for your client whilst being true to the facts and objective in your opinions.

It’s easy to imagine how a similar process could be used to identify and evaluate options for managing strategic and operational risks in complex and ambiguous situations outside a legal setting. For example, in the new year we’ll be updating our forecast on Russia’s strategic options after Ukraine. If we game Russia’s options using an AI system, its responses would likely diverge from either side’s biased human assumptions. Rather than being dismissed as hallucinated alternative futures, the options it generates ought to be taken seriously – the deeper and more unpalatable truth is that the AI may be spotting aspects of reality that humans can’t (or won’t) contemplate. In extremis, a winning move might be horrific and inscrutable, but once the option exists it becomes part of the game and is therefore possible in reality.

Back to the legal process: What if an AI model could act as an intrinsically unbiased virtual expert? If OpenAI’s claims about fine-tuning are correct, then this should be possible in principle. Once the model is fed enough training examples from the human expert’s facts (the ‘evidence’ in the expert’s report), it should in theory be able to offer a range of expert opinions of its own. These could be used to challenge the human expert’s own position and could help the lawyers to diligently validate the strengths and weaknesses of a case.
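As a rough illustration of what that training step might look like in practice, here is a minimal sketch using the OpenAI Python client (v1+); the file name, prompts and example content are assumptions for illustration only, not a description of any actual case material.

```python
# A minimal sketch of the fine-tuning step, assuming the OpenAI Python client (v1+).
# The file name, example messages and facts are hypothetical placeholders, never
# real case material.
import json

from openai import OpenAI

client = OpenAI()

# 1. Each training example pairs established facts from the expert's report with
#    the expert's reasoned opinion on those facts.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a security risk expert witness."},
            {"role": "user", "content": "Facts: <agreed facts from the report>. "
                                        "Question: was the site security plan adequate?"},
            {"role": "assistant", "content": "<the expert's reasoned opinion on those facts>"},
        ]
    },
    # ... many more examples drawn from the expert's evidence
]

with open("expert_examples.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")

# 2. Upload the training file and start a fine-tuning job.
training_file = client.files.create(
    file=open("expert_examples.jsonl", "rb"),
    purpose="fine-tune",
)
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)
print(job.id)  # poll client.fine_tuning.jobs.retrieve(job.id) until it completes
```

In principle, the resulting model could then be prompted to argue against the expert’s own conclusions, playing the ‘virtual opposing expert’ role described above.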

There are a lot of conditionals in all of this. The effort and expense needed to train the model probably limit its application to very high-value cases, but this would be worthwhile if the AI could demonstrate real power in terms of the quality of its ‘expert opinion’. Although we have never seen AI achieve this in our field, we’re open-minded about the possibilities.

AI’s ULTIMATE DESTINATION – A TOOL, A PARTNER, OR A RIVAL?

For the time being, our own experience with AI means that we continue to believe it needs a human expert to establish the facts used to fine-tune the model before there is any realistic possibility of AI becoming a viable tool, a partner, or a rival in truly complex domains of knowledge. We also think a human expert needs to rigorously evaluate AI’s answers, and the human must remain free to accept or reject them. We agree with the Securities and Exchange Commission’s Chair, Gary Gensler, that too many companies (including some in our field) are making unfounded marketing claims that don’t match what they are delivering – in other words, they are AI-washing. But we remain open-minded and will carefully watch how AI’s capabilities develop – like any good experts, we’re prepared to change our minds in the face of new evidence.
