
Artificial intelligence has advanced rapidly in recent years, raising expectations across the investment industry for significant gains in research efficiency, reporting and risk management. Yet recent academic and industry research offers a more sober view of the technology.
Recent evidence points to persistent reliability gaps, the continuing need for human judgment and oversight, and limits to near-term value creation, suggesting that AI’s impact may be more measured than initial enthusiasm implies. For investors, the message is clear: AI remains a significant long-term opportunity, but one best realized through disciplined, evidence-based adoption rather than early-stage exuberance.
This post is the third part of a quarterly reflection on the latest developments in AI for investment management professionals. It draws on insights from the investment professionals, academics and regulators who contribute to the bi-monthly newsletter, and builds on previous articles that examined the promises and pitfalls of AI and risk management techniques. This part moves towards a more pragmatic understanding of AI’s potential.
A detailed review of recent publications reveals three common themes that might dampen industry optimism.
1. The reliability challenge
Despite impressive advances, the reliability of AI remains a key obstacle to its use in high-stakes financial environments. A recent analysis by NewsGuard (2025) documents a sharp increase in false or misleading statements from leading AI chatbots, with error rates rising from around 10% to almost 60%.
This proliferation of “hallucinations” is not merely a statistical anomaly: an internal OpenAI study (2025) concludes that hallucinations are often a structural feature of model training, since current benchmarks reward confident answers over calibrated uncertainty and thereby incentivize plausible but false statements.
Concerns also extend to ethical alignment. In a financial decision-making simulation inspired by the governance failures at the cryptocurrency exchange and hedge fund FTX, Biancotti et al. (2024) show that several leading models are more likely to recommend ethically or legally questionable actions when there is a trade-off between personal gain and legal compliance. For investment professionals whose work relies on precision, transparency and accountability, these studies collectively underscore that AI is not yet reliable enough to operate autonomously in many regulated financial processes.
2. The premium on human judgment
A second theme in the research is that AI appears to augment rather than replace human expertise, and may even increase the importance of high-quality human oversight.
Neuroscience research from MIT (Kosmyna et al., 2025) finds that participants who interact with LLMs exhibit reduced brain activity in regions associated with memory retrieval, creativity and executive thinking. Although AI can speed up initial analysis, heavy reliance on these systems may weaken the cognitive skills that underlie sound investment decisions.
Nor does the introduction of AI reduce the need for a human presence in customer-facing contexts. Yang et al. (2025) show that customers perceive AI-generated investment advice as significantly more trustworthy when it is delivered alongside a human advisor, even when the human adds no analytical value. Similarly, Le et al. (2025) find that customer satisfaction improves when collaboration between humans and AI is explicit rather than hidden.
Automation also remains limited. In large-scale task benchmarking, Xu et al. (2024) find that advanced AI agents autonomously complete only about 30% of complex, multi-step tasks. A separate study by Tomlinson et al. (2025), analyzing more than 200,000 Copilot interactions, shows that model actions deviate significantly from user intent about 40% of the time.
Taken together, these results suggest that investment firms should view AI as a tool to augment rather than replace people, and that the quality of machine-generated outputs must be monitored continuously. Such structured, ongoing monitoring erodes part of the machine’s added value and increases complexity and costs, especially because AI outputs often appear plausible even when they are incorrect. The literature also highlights the importance of organizational policies to prevent cognitive deskilling.
3. Structural and economic constraints
Finally, macroeconomic constraints are also dampening expectations. Acemoglu (2024) argues that, even under optimistic assumptions, overall productivity gains from AI over the next decade are likely to be modest. Much of the initial evidence comes from tasks that are “easy to learn,” while harder, context-dependent tasks offer less scope for automation.
Regulation adds further friction. Foucault et al. (2025) and Prenio (2025) note that the adoption of AI in financial intermediation creates new concentration risks, infrastructure dependencies and regulatory challenges, prompting supervisors to proceed cautiously. This raises compliance costs and can slow industry-wide adoption. These structural factors suggest that AI’s impact may be more gradual and less disruptive than commonly believed.
Monitoring AI progress
The promise of AI is real, but its impact will depend on how rigorously and responsibly the industry integrates it. AI will play a central role in the industry’s future, but its development is likely to be more complex, and more dependent on effective human leadership, than initial expectations suggested.
References
Acemoglu, D., The Simple Macroeconomics of AI, NBER Working Paper 32487, May 2024
Biancotti et al., Chat Bankman-Fried: An Exploration of LLM Alignment in Finance, 2024
Foucault, T., L. Gambacorta, W. Jiang and X. Vives, Barcelona 7: Artificial Intelligence in Finance, Paris and London, 2025
Kosmyna et al., Your Brain on ChatGPT: Accumulating Cognitive Debt When Using an AI Assistant for the Essay Writing Task, June 2025
Le et al., The Future of Work: Understanding the Effectiveness of Collaboration Between Human and Digital Employees in Service, vol. 28(1), 186-205, 2025
NewsGuard, Chatbots spread falsehoods 35% of the time, September 2025
Prenio, J., Starting with the basics: a list of generative AI applications in supervision, June 2025
Tomlinson et al., Working with AI: Measuring the Occupational Implications of Generative AI, 2025
Xu et al., TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks, December 2024
Yang et al., My Advisor, Her AI and Me: Evidence from a Field Experiment on Human-AI Collaboration and Investment Decisions, June 2025
