By Jakub Lewandowski, Global Data Governance Officer at Commvault
It can’t have escaped your notice that there’s a lot of heated debate around AI at the moment, particularly around the emergence of generative AI and Large Language Models (LLMs) such as ChatGPT, Google Bard, Dolly, and others.
The rapid uptake of this technology has had a significant impact on just about every aspect of life: stimulating conversations around whether AI will take over human jobs, when and how it is ethically appropriate to use LLM tools, and how best to address the potential privacy and data security risks associated with using these tools.
Ethical and philosophical debates aside, organisations looking to deploy LLM-powered solutions to streamline and automate processes, or summarise and generate intelligence gained from massive data sets, will need to give due consideration to the risks involved.
Alongside creating internal policies and guidelines on how, when, and for what purposes employees can use these tools, awareness of the legal and regulatory landscape within which this technology currently operates will be key.
LLM – what is it, and how does it work?
An LLM is a type of AI model that uses deep learning techniques and very large data sets to understand, summarise, generate, and predict new content.
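To make that concrete, the minimal sketch below shows next-token text generation with a small, publicly available model via the open-source Hugging Face transformers library. The model choice and prompt are purely illustrative; commercial LLM services work on the same underlying principle, just at a vastly larger scale.

```python
# Minimal illustration of how an LLM generates text: the model repeatedly
# predicts the next token, conditioned on the prompt it was given.
# Assumes the Hugging Face `transformers` library is installed; "gpt2" is
# used here only as a small, publicly available stand-in for larger LLMs.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "Data protection obligations for AI systems include"
result = generator(prompt, max_new_tokens=40, num_return_sequences=1)

print(result[0]["generated_text"])
```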
Unlocking new possibilities across a range of fields, the potential applications for LLMs are vast, encompassing everything from customer service chatbots through to anomaly detection and fraud analysis in financial services. They are also being used to speed up software development, generate complex legal summaries, provide insights for investment decisions, and create models that generate new insights on molecules, proteins, and DNA for pharmaceutical and life sciences researchers.
Clearly, LLMs are already proving to be a game-changer for multiple industry sectors and have a strong appeal for any organisation looking to increase efficiency and productivity. But there are a number of well-documented challenges that come with using the technology. These include dealing with issues like fabricated or inaccurate answers, model and output bias, intellectual property and copyright infringements, as well as data protection concerns.
Considering the risks
In terms of risk assessment and due diligence, some key questions will need to be asked about the sources of the data used to train and power LLMs, how that data is obtained, and the licensing arrangements relating to it. Let’s take a look at why this is important.
LLMs can collect, store, and process any kind of data, including personal and other confidential data, at an unprecedented scale. This presents organisations with a first key challenge: determining who is responsible for the legitimacy and quality of the data used to train generative products.
Without this clarity, organisations could face significant legal or regulatory penalties if their models use personal data that has not been obtained with the appropriate permissions. For example, personal information disclosed to LLMs could subsequently be used in ways that violate the expectations, or permissions, of the people to whom that information relates.
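One practical mitigation is to screen prompts for obvious personal data before they are sent to an external LLM service. The sketch below is illustrative only: the redact_pii helper and its regex patterns are hypothetical examples, and a real deployment would rely on a dedicated PII-detection tool and a documented lawful basis rather than simple pattern matching.

```python
import re

# Illustrative patterns only; real PII detection needs a dedicated tool,
# not a handful of regular expressions.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def redact_pii(text: str) -> str:
    """Replace obvious personal identifiers with placeholders before the
    text is sent to an external LLM service (hypothetical helper)."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

prompt = "Summarise the complaint from jane.doe@example.com, tel +44 20 7946 0958."
print(redact_pii(prompt))
# -> "Summarise the complaint from [EMAIL REDACTED], tel [PHONE REDACTED]."
```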
Secondly, how does an organisation prevent, or put guardrails against, the generation of problematic content such as biased outputs, deep fakes, or outright discrimination? Responsibility for prevention, monitoring, and response, whether it sits with the solution provider or the deploying organisation, needs to be clearly understood and documented.
Thirdly, when it comes to automated decision making and other outputs, who or what will gain access to the data or results generated by LLMs, and what are the cybersecurity implications of these systems?
Finally, organisations intending to use AI will need to think carefully about how they address privacy-related obligations, such as responding to requests from data subjects to access or delete their data.
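As a simplified illustration, the sketch below shows what honouring access and erasure requests might look like for the prompt and response logs kept around an LLM-based service. The in-memory log and helper functions are hypothetical stand-ins for whatever systems actually hold the data, and the harder problem, personal data already absorbed into a model’s training set, is not addressed by deleting log records alone.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for wherever an organisation logs the
# prompts and responses exchanged with an LLM-based service, keyed by the
# data subject the records relate to.
interaction_log: dict[str, list[dict]] = defaultdict(list)

def record_interaction(subject_id: str, prompt: str, response: str) -> None:
    interaction_log[subject_id].append({"prompt": prompt, "response": response})

def export_subject_data(subject_id: str) -> list[dict]:
    """Support a subject access request: return every record held."""
    return list(interaction_log.get(subject_id, []))

def delete_subject_data(subject_id: str) -> int:
    """Support an erasure request: remove the subject's records and
    report how many were deleted."""
    return len(interaction_log.pop(subject_id, []))

record_interaction("subject-123", "Summarise my account history", "...")
print(export_subject_data("subject-123"))
print(delete_subject_data("subject-123"))  # -> 1
```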
Regulatory compliance
LLMs are subject to the same regulatory and compliance frameworks as other AI technologies, but the speed at which they are becoming ubiquitous highlights some challenges in relation to compliance with existing data privacy and protection frameworks.
Let’s take a look at the key overarching legislation that organisations will need to be mindful of.
General Data Protection Regulation (GDPR)
Encapsulating the crucial principles of data sovereignty – that digital data is subject to the laws and regulations of the country in which it is physically located, and that that country’s government has jurisdiction over it and can enforce its data protection policies – GDPR sets out a number of key requirements and principles in relation to the processing of personal data. These include:
- Consent and data subject rights – data subjects have a number of rights regarding their personal data, including the rights to access, amend, or delete it. How users exercise these rights in practice in relation to LLMs is a challenging proposition.
- Automated decision making – individuals have the right not to be subject to decisions based solely on automated processing that produce legal effects for them or similarly significantly affect them. This means that while certain decisions may be supported by an LLM, the final say will typically require human judgement.
Data regulators in Italy, France, Germany, and Ireland have already voiced concerns about whether large language models are compliant with GDPR rules. Earlier this year Italy’s data regulator took a stand, temporarily preventing ChatGPT from harnessing the personal information of millions of Italians for its training data (though access was reinstated within a month, after OpenAI successfully “addressed or clarified” the issues).
The UK Data Protection and Digital Information Bill (DPDI Bill)
Currently under review by the House of Commons, the DPDI Bill aims to provide new clarity in relation to automated decision making and the safeguards that organisations will need to put in place when implementing AI. These include respecting individuals’ rights to be informed about such decisions, to consent to them, and to request and obtain human intervention in relation to them. All of which is very much in line with current GDPR requirements.
Upcoming legislation and enforcement trends
European data protection authorities are already preparing to tackle complaints about GDPR violations resulting from the use of LLMs, ahead of the EU’s planned review of the effectiveness of GDPR in response to the rise of AI.
In France, the data protection watchdog, CNIL, has set out an action plan in relation to the deployment of generative AI systems, a move that could shape how other European regulators approach these technologies. Similarly, the European Data Protection Board (EDPB) has launched a task force focused on ensuring consistency between the EU’s new AI Act and GDPR.
The EU’s AI Act, approved by the European Parliament in June 2023, sets out to regulate AI according to its potential to cause harm and is likely to place stricter obligations on the foundation models upon which LLM solutions are built. The proposed regulation already includes a ban on certain AI uses, such as social scoring, and outlines the safeguards that will be needed in today’s rapidly evolving tech environment.
Meanwhile, the UK government has recently published its own AI white paper setting out guidance on the use of AI, designed to drive responsible innovation while maintaining public trust in the technology. The likelihood is that this will spawn further new legislation and regulation in the years ahead.
Evidently, the current flurry of data privacy and AI regulations means that organisations intending to deploy AI will need to navigate an increasingly complex legislative and regulatory landscape in the months to come, and will need to stay fully abreast of developments. For the moment, however, the focus should be on ensuring compliance measures are in place so that data is collected and processed in line with current legal and regulatory requirements.