AI and Data Governance
By Malcolm Chisholm
With the advent of AI there is a growing realization that many new Data Governance tasks are required. There are, of course, many AI Governance needs, such as which LLM to select and how to mitigate model risks. But even some of these overlap with Data Governance. For instance, AI models can produce poor-quality outputs if poor-quality data is fed into them.
In the context of AI, Data Governance tasks tend to follow the model life cycle. A lot of attention has to be paid to finding the right data to fine-tune a model, and to ensuring this data can be used for AI needs. Data quality is of paramount importance. Sensitive data, such as personal information, may have to be excluded from model inputs, as may anything that could introduce bias. On the model usage side, prompt libraries or meta-prompts may be required for the model to provide useful outputs.
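As a rough illustration of the kind of screening described above, the following is a minimal sketch in Python, not a definitive implementation. It assumes the candidate fine-tuning data is a simple list of text records. The function name prepare_records, the email pattern, and the [REDACTED_EMAIL] placeholder are all hypothetical and chosen for illustration only; real detection of personal information and of bias needs far more than a single pattern, and much of it needs human review.

```python
import re

# Hypothetical pattern: catches simple email addresses only.
# Real PII detection requires much broader rules and human review.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def prepare_records(records):
    """Return records screened for fine-tuning: non-empty, de-duplicated, emails redacted."""
    seen = set()
    prepared = []
    for text in records:
        cleaned = " ".join(text.split())  # normalize whitespace
        if not cleaned:
            continue  # basic quality check: skip empty records
        cleaned = EMAIL_PATTERN.sub("[REDACTED_EMAIL]", cleaned)  # mask sensitive data
        if cleaned in seen:
            continue  # skip exact duplicates
        seen.add(cleaned)
        prepared.append(cleaned)
    return prepared


sample = [
    "Contact jane.doe@example.com for the policy.",
    "   ",
    "Refunds are processed   within 30 days.",
    "Refunds are processed within 30 days.",
]
prepared = prepare_records(sample)
```

Here the blank record and the duplicate are dropped and the email address is masked, leaving two records. Deciding which fields count as sensitive, and which content might lead to bias, remains a governance decision rather than something the code can settle.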
Finding the right sources of data and qualifying them for use in AI can be difficult for many organizations. Crystal clear use cases are required to avoid simply throwing irrelevant content into AI. Once the sources are identified, a series of data preparation steps is required to get the content into the form a model can best consume. Tools are emerging to support these data preparation needs, but not every step is a technical task that can be automated, and human judgement is still required.
AI is no longer just generative in the sense of users asking questions and getting responses. AI agents are becoming increasingly useful and can connect with an organization’s data and even its metadata. In some respects this resembles an advanced form of Robotic Process Automation (RPA), in which AI can execute API calls or perhaps even interact with screens. Thus, a whole new level of automation is becoming possible. Interestingly, AI agents seem well positioned to do many Data Governance-related tasks that data stewards do not have the time for.
Success with AI means engaging Data Governance to help with the data-related tasks for the models being built. AI is not simply a technology into which any data in any format can be loaded. We can expect the symbiosis of AI and Data Governance to evolve considerably over the next couple of years.