Friday, January 24, 2025

ChatGPT and enormous language models: Their risks and limitations


Performance and data

Despite its seemingly “magical” properties, ChatGPT, like other large language models (LLMs), is just an enormous artificial neural network. Its complex architecture consists of roughly 400 core layers and 175 billion parameters (weights), all trained on human-written text from the Internet and other sources. In total, these text sources contain roughly 45 terabytes of source data. Without the training and optimization, ChatGPT would just produce nonsense.

We may think that the amazing capabilities of LLMs are limited only by the dimensions of their network and the quantity of knowledge they train on. That’s true to a certain extent. But LLM inputs cost money and even small performance improvements require significantly more computing power. It is estimated that training ChatGPT-3 used about 1.3 gigawatt-hours of electricity and price OpenAI a complete of about $4.6 million. In contrast, the larger ChatGPT-4 model may have cost $100 million or more to coach.

OpenAI researchers can have already reached a tipping point, and a few have admitted as much Further performance improvements must come from something apart from increased computing power.

Nevertheless, data availability is maybe the largest obstacle to the progress of LLMs. ChatGPT-4 has been trained on all high-quality texts available on the Internet. But much more high-quality text is stored in private and company databases and shouldn’t be accessible to OpenAI or other corporations at an affordable cost or to an affordable extent. But such curated training data, combined with additional training techniques, could refine the pre-trained LLMs to higher anticipate and reply to domain-specific tasks and queries. Such LLMs wouldn’t only outperform larger LLMs, but would even be cheaper, more accessible and safer.

But inaccessible data and the boundaries of computing power are only two of the obstacles holding back LLMs.

Hallucination, inaccuracy and abuse

The most relevant use case for basic AI applications like ChatGPT is gathering, contextualizing, and summarizing information. ChatGPT and LLMs have helped with writing dissertations, large computer codes, and even taking and passing complicated exams. Companies have commercialized LLMs to offer skilled support services. For example, the corporate Casetext has used ChatGPT in its CoCounsel application to assist lawyers write legal research memos, review and draft legal documents, and prepare for litigation.

But no matter their writing skills, ChatGPT and LLMs are statistical machines. They give “plausible” or “likely” answers based on what they “saw” during their training. They cannot at all times confirm or describe the reasoning and motivation behind their answers. Although ChatGPT-4 can have passed multiple bar exams in multiple states, an experienced attorney shouldn’t trust his legal memos any greater than those written by a first-year associate.

The statistical nature of ChatGPT is most evident on the subject of solving a math problem. Ask it to integrate a multi-term trigonometric function and ChatGPT may return a plausible-looking but incorrect answer. Ask him to explain the steps it took to reach at the reply. You may then get a seemingly plausible answer again. If you ask again, you might get a totally different answer. There needs to be just one correct answer and just one sequence of analytical steps to reach at that answer. This highlights the incontrovertible fact that ChatGPT doesn’t “understand” mathematical problems and doesn’t apply the computational algorithmic considering that mathematical solutions require.

Data Science certificate tile

The random statistical nature of LLMs also makes them vulnerable to what data scientists call “hallucinations,” flights of fancy that pass them off as reality. If they will provide false but convincing texts, LLMs may also spread misinformation and be used for illegal or unethical purposes. Bad actors could cause an LLM to write down articles within the type of a good publication after which distribute them as fake news, for instance. Or they may use it to defraud customers by obtaining sensitive personal information. For these reasons, corporations like JPMorgan Chase and Deutsche Bank have banned the usage of ChatGPT.

How can we address LLM-related inaccuracies, accidents and abuse? Fine-tuning pre-trained LLMs against curated, domain-specific data may also help improve the accuracy and appropriateness of responses. Casetext, for instance, relies on pre-trained ChatGPT-4, but supplements its CoCounsel application with additional training data – legal texts, cases, laws and regulations from all US federal and state jurisdictions – to enhance its responses. It recommends more precise prompts based on the precise legal task the user wants to finish. CoCounsel at all times cites the sources from which it obtains its answers.

Certain additional training techniques, corresponding to: B. Reinforcement Learning from Human Feedback (RLHF), applied along with initial training, may also reduce the potential for misuse or misinformation of an LLM. RLHF “scores” LLM responses based on human judgment. This data is then fed back into the neural network as part of coaching to cut back the potential for the LLM providing inaccurate or harmful answers to similar prompts in the longer term. Of course, what’s an “appropriate” response is determined by perspective, so RLHF is hardly a panacea.

“Red teaming” is one other improvement technique that enables users to “attack” the LLM to search out and fix its weaknesses. Red Teamers write requests to persuade the LLM to do what it shouldn’t be alleged to do in anticipation of comparable attempts by malicious actors in the actual world. By identifying potentially erroneous prompts, LLM developers can then set guardrails for LLM responses. While such efforts help, they are usually not foolproof. Despite extensive red teaming on ChatGPT-4, users can still create prompts to get across the guardrails.

Another possible solution is to make use of additional AI to observe the LLM by making a secondary neural network in parallel to the LLM. This second AI is trained to guage the LLM’s responses based on certain ethical principles or guidelines. The “distance” of the LLM’s response from the “correct” response in keeping with the Judge AI is fed back into the LLM as a part of its training process. In this manner, when the LLM considers its response to a prompt, it prioritizes the response that’s most ethical.

Tile for Gen Z and Investing: Social Media, Crypto, FOMO and Family Report

transparency

ChatGPT and LLMs share a typical shortcoming in AI and machine learning (ML) applications: they’re essentially black boxes. Not even OpenAI’s programmers know exactly how ChatGPT configures itself to supply its text. Model developers traditionally design their models before converting them into program code, but LLMs use data to configure themselves. The LLM network architecture itself lacks a theoretical foundation or technique: programmers have chosen many network functions just because they work, without necessarily knowing why they work.

This inherent transparency problem has led to a completely recent framework for validating AI/ML algorithms – the so-called explainable or interpretable AI. The model management community has explored various methods to develop intuition and explanations around AI/ML predictions and decisions. Many techniques aim to grasp which features of the input data generated the outputs and the way necessary they were to particular outputs. Others reverse engineer the AI ​​models to create a less complicated, more interpretable model in a localized area where only certain functions and outputs apply. Unfortunately, interpretable AI/ML methods change into exponentially more complicated as model sizes increase, so progress has been slow. To my knowledge, no interpretable AI/ML has been successfully applied to a neural network of the dimensions and complexity of ChatGPT.

Given the slow progress in explainable or interpretable AI/ML, there may be a compelling case for more regulation around LLMs to assist corporations protect against unexpected or extreme scenarios, the “unknown unknowns.” The increasing ubiquity of LLMs and the potential for productivity gains make outright bans on their use unrealistic. An organization’s risk governance policies for models should due to this fact focus not a lot on validating some of these models, but fairly on implementing comprehensive usage and security standards. These guidelines should prioritize the secure and responsible delivery of LLMs and be certain that users confirm the accuracy and appropriateness of output responses. In this model governance paradigm, independent model risk management doesn’t examine the operation of LLMs, but fairly examines the business user’s authority and justification for counting on the LLMs for a selected task and ensures that the business units using them use, have security measures in place as a part of the model output and within the business process itself.

Graphic for “Handbook of AI and Big Data Applications in Investments”.

What’s next?

ChatGPT and LLMs represent a serious leap in AI/ML technology and convey us one step closer to artificial general intelligence. However, the introduction of ChatGPT and LLMs brings with it significant limitations and risks. Companies must first adopt recent model risk governance standards corresponding to those described above before deploying LLM technology of their organizations. Good model governance policy recognizes the large potential of LLMs but ensures their secure and responsible use by mitigating the inherent risks.

If you enjoyed this post, do not forget to subscribe.


Photo credit: ©Getty Images /Yuichiro Chino


Latest news
Related news