How to build an AI assistant for your data governance tool

Iman Johari
10 min read · Jan 10, 2025

Introduction

Every enterprise has products that bring business value to its clients. Before clients actually buy a product, the vendor and the client go through a process to ensure that it meets the client’s needs. This can happen through show-and-tell sessions, either with sales personnel giving custom demos or with the client working with the vendor on a pilot that showcases the value the vendor’s product can bring for a specific use case.

In this article I show how we can accelerate “Time to Show” by building an AI assistant for a data governance tool and demonstrating its capabilities.

Let’s focus on software products for the moment. Each software product, through its various capabilities, can help solve a client’s problem, either through a custom-built application or through integration with other tools, products, or applications. To show the business value, a user has to follow a set of steps and complete tasks in a certain order. Let’s get into an actual example so this does not stay too abstract.

IBM Knowledge Catalog data governance process

Let’s take IBM Knowledge Catalog as the governance tool here. In IBM Client Engineering we follow certain steps to show business value for this product. IBM Knowledge Catalog, or IKC for short, has capabilities such as data governance, data quality, data lineage, and catalog management that help clients with their data governance needs.

To complete an IKC pilot or engagement in Client Engineering, we offer to showcase these capabilities for clients in our environment. The steps are similar whether we perform them in our environment or in the client’s; for a standard IKC engagement we follow the same instructions in the same order unless the client wants something custom. Here are the steps for the bulk of our data governance POCs:

1- Define groups and roles in the platform : Every platform should have a robust way to associate capabilities within the product with the users who should have the privilege to use them.

2- Create a data source connection : Choose among close to 100 out-of-the-box data source types to connect to

3- Create categories : Organize your data governance hierarchy

4- Associate users to groups in categories

5- Ingest/Create business terms

6- Ingest/Create data classes

7- Ingest/Create classifications

8- Ingest/Create policies

9- Ingest/Create governance rules

10- Create data protection rules

11- Create SLA Rules

12- Create a project

13- Associate users and groups to project

14- Connect to data sources

15- Run metadata import in the project

16- Run metadata enrichment in the project

17- Define data quality rules and definitions

18- Run data quality rules

19- Create catalogs

20- Associate users and groups to catalog

21- Publish data to catalog

22- Search for data

23- Configure and view data lineage

IKC’s platform is quite robust, which means that for each of the above steps there are APIs we can use. Some of the steps are a single API call, while others may take more than one. Now let’s imagine that for every product out there we can create such a list for showing the business value of the process in mind.

I am going to show you how I use another IBM product, IBM watsonx Assistant, to help me go through these steps.

IBM watsonx Assistant

IBM watsonx Assistant allows users to build custom chatbots through a low-code, no-code user interface. It lets users implement processes through different types of actions within the product, and it can integrate generative AI within the tool.

Within this product there is a capability called custom extensions that allows users to set up API calls to be used as part of an AI chatbot. Through actions, I can make as many API calls from various tools as I need part of the conversational experience within my process.

To achieve that, we need to identify the API calls that make up the process. The API calls for the steps above belong to different components; for example, user management is part of the IBM CP4D API, whereas creating projects and items related to data governance is part of the Watson Data API.

Let’s get those API calls identified. For the sake of showing feasibility in this article, we are going to implement just one of the calls above:

1- Create a new category

For now let’s implement this step; the rest follow the same pattern. As you can see in the figure below, we can create the same process we defined in the previous section within our AI assistant builder and provide the right transition from one action to the next.

Let’s take a deeper look at an action. When we click on an action we can define it; an action can have many steps, shown as #2 in the figure below. Each step can have a condition and can set variables from previous steps. There are capabilities such as system variables and user variables, which can be created and maintained throughout the session without any coding. These variables can be used in the conversation between the user and the AI agent.

We can define what the AI assistant should communicate to the user in each step based on our process; this can also be based on generative AI and the large language models supported by the platform.

In the end we can define what should execute. For example, in this step I am running a custom extension with the operation “Create a category in the glossary”, which runs a Watson Data Platform REST API call to create a category within my data governance platform. Once that is completed, there is a transition to step 3 of the conversation steps on the left panel, which runs the action “4- Add business terms”.

In this step I receive a CategoryName value from the user, shown in rectangle #3 in the figure above. Step 1, labeled “1.Enter the Name of the category”, stores the user’s value in this variable; the variable is then passed to the custom extension to make the API call that creates the category. CategoryName is passed to the “name” parameter of our custom extension.
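To make the variable hand-off concrete, here is a minimal Python sketch of the request the custom extension would end up sending. It assumes the `/v3/categories` path, the bearer token scheme, and the `name` body field from the OpenAPI file in this article. The host `https://cpd.example.com`, the `<token>` placeholder, and the helper name `build_create_category_request` are hypothetical and are not part of watsonx Assistant. The request is only built, not sent.

```python
import json
import urllib.request


def build_create_category_request(
    base_url: str, token: str, category_name: str
) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a category."""
    # The assistant variable CategoryName maps to the "name" field of the body.
    body = json.dumps({"name": category_name}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v3/categories",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


# Hypothetical host and token, used for illustration only.
request = build_create_category_request("https://cpd.example.com", "<token>", "Finance")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

In the assistant itself none of this code is written by hand; the custom extension builds the equivalent request from the OpenAPI definition and the variables you map to its parameters.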

To add these API calls we must prepare an OpenAPI specification JSON file for the calls in our process. For example, I have prepared the OpenAPI specification file below for these two calls:

1-Create a category

2-Create a business term

{
"openapi": "3.0.0",
"info": {
"title": "Watson Data API",
"description": "Methods for creating a category and a glossary term in the glossary.",
"version": "3.0.0"
},
"servers": [
{
"url": "https://cpd-cpd.apps.6748c7c8e8a5a32b1ae1ac56.ocp.techzone.ibm.com:443/"
}
],
"paths": {

"/v3/glossary_terms": {
"post": {
"summary": "Creates a glossary term",
"operationId": "create_glossary_term",
"security": [
{
"BearerAuth": []
}
],
"requestBody": {
"content": {
"application/json": {
"schema": {
"type": "object",
"description": "Represents a glossary term to be created.",
"properties": {
"name": {
"type": "string",
"description": "The name of the glossary term.",
"example": "Glossary Term 1",
"maxLength": 255,
"minLength": 1
},
"short_description": {
"type": "string",
"description": "The short description of the glossary term.",
"maxLength": 255,
"minLength": 1
},
"long_description": {
"type": "string",
"description": "The long description of the glossary term.",
"maxLength": 15000,
"minLength": 1
},
"parent_category_id": {
"type": "string",
"description": "The parent category ID to which the glossary term belongs.",
"maxLength": 36,
"minLength": 36,
"pattern": "[A-Za-z0-9\\-]{36}"
},
"steward_ids": {
"type": "array",
"description": "The stewards assigned to the glossary term.",
"example": ["steward1", "steward2"],
"items": {
"type": "string"
}
},
"custom_attributes": {
"type": "array",
"description": "Custom attributes with their values.",
"items": {
"type": "object",
"properties": {
"definition_id": {
"type": "string",
"description": "The ID of the custom attribute definition."
},
"value": {
"type": "string",
"description": "The value of the custom attribute."
}
},
"required": ["definition_id", "value"]
}
}
},
"required": ["name"]
}
}
}
},
"responses": {
"201": {
"description": "The glossary term has been created successfully.",
"content": {
"application/json": {
"schema": {
"type": "object",
"properties": {
"resources": {
"type": "array",
"items": {
"type": "object",
"properties": {
"href": {
"type": "string",
"description": "API reference URL for the created glossary term."
},
"artifact_id": {
"type": "string",
"description": "Unique ID of the created glossary term."
},
"version_id": {
"type": "string",
"description": "Version ID of the created glossary term."
},
"global_id": {
"type": "string",
"description": "Global ID of the created glossary term."
},
"entity_type": {
"type": "string",
"description": "Type of the entity, e.g., 'glossary_term'."
}
},
"required": ["href", "artifact_id", "version_id", "global_id", "entity_type"]
}
}
},
"required": ["resources"]
},
"example": {
"resources": [
{
"href": "/v3/glossary_terms/123e4567-e89b-12d3-a456-426614174000",
"artifact_id": "123e4567-e89b-12d3-a456-426614174000",
"version_id": "1a2b3c4d5e6f7g8h9i0j",
"global_id": "global_123e4567-e89b-12d3-a456-426614174000",
"entity_type": "glossary_term"
}
]
}
}
}
},
"400": { "description": "Bad Request. The input was invalid." },
"401": { "description": "Unauthorized. Authentication failed." },
"403": { "description": "Forbidden. Permission denied." },
"409": { "description": "Conflict. Glossary term with the given name already exists." },
"500": { "description": "Internal Server Error. An unexpected condition occurred." }
},
"tags": ["Glossary Terms"]
}
},
"/v3/categories": {
"post": {
"summary": "Creates a category in the glossary",
"operationId": "create_category",
"security": [
{
"BearerAuth": []
}
],
"requestBody": {
"content": {
"application/json": {
"schema": {
"type": "object",
"description": "Represents a category to be created.",
"properties": {
"name": {
"type": "string",
"description": "The name of the artifact.",
"example": "Name 1",
"maxLength": 255,
"minLength": 1
},
"long_description": {
"type": "string",
"description": "The long description of an artifact.",
"maxLength": 15000,
"minLength": 1
},
"short_description": {
"type": "string",
"description": "The short description of an artifact.",
"maxLength": 255,
"minLength": 1
},
"parent_category_id": {
"type": "string",
"description": "Artifact ID of a parent category",
"maxLength": 36,
"minLength": 36,
"pattern": "[A-Za-z0-9\\-]{36}"
},
"reference_copy": {
"type": "boolean",
"description": "Indicates that it is a reference copy of an artifact managed in an external metadata server."
},
"steward_ids": {
"type": "array",
"description": "The stewards assigned to an artifact.",
"example": ["steward1", "steward2"],
"items": {
"type": "string"
}
},
"steward_group_ids": {
"type": "array",
"description": "The steward groups assigned to an artifact.",
"example": ["steward_group1", "steward_group2"],
"items": {
"type": "string"
}
},
"classifications": {
"type": "array",
"description": "Relationships to classifications.",
"items": {
"type": "object",
"properties": {
"artifact_id": {
"type": "string",
"description": "ID of the classification."
},
"name": {
"type": "string",
"description": "Name of the classification."
}
},
"required": ["artifact_id"]
}
},
"custom_attributes": {
"type": "array",
"description": "List of custom attributes with their values.",
"items": {
"type": "object",
"properties": {
"definition_id": {
"type": "string",
"description": "ID of the custom attribute definition."
},
"value": {
"type": "string",
"description": "Value of the custom attribute."
}
},
"required": ["definition_id", "value"]
}
},
"custom_relationships": {
"type": "array",
"description": "Custom relationships to assets.",
"items": {
"type": "object",
"properties": {
"relationship_id": {
"type": "string",
"description": "ID of the custom relationship."
},
"related_asset_id": {
"type": "string",
"description": "ID of the related asset."
}
},
"required": ["relationship_id", "related_asset_id"]
}
}
},
"required": ["name"]
}
}
}
},
"responses": {
"201": {
"description": "The category has been created successfully.",
"content": {
"application/json": {
"schema": {
"type": "object",
"properties": {
"resources": {
"type": "array",
"items": {
"type": "object",
"properties": {
"href": {
"type": "string",
"description": "API reference URL for the created category."
},
"artifact_id": {
"type": "string",
"description": "Unique ID of the created category."
},
"version_id": {
"type": "string",
"description": "Version ID of the created category."
},
"global_id": {
"type": "string",
"description": "Global ID of the created category."
},
"entity_type": {
"type": "string",
"description": "Type of the entity, e.g., 'category'."
}
},
"required": ["href", "artifact_id", "version_id", "global_id", "entity_type"]
}
}
},
"required": ["resources"]
},
"example": {
"resources": [
{
"href": "/v3/categories/285b7b78-3d64-48e6-a9b6-e02f4c32295f",
"artifact_id": "285b7b78-3d64-48e6-a9b6-e02f4c32295f",
"version_id": "12e37d76-f28b-4dc9-a35e-0c3e30f9f8ea_0",
"global_id": "18091466-981e-4113-8943-2ddf162bff6d_285b7b78-3d64-48e6-a9b6-e02f4c32295f",
"entity_type": "category"
}
]
}
}
}
},
"400": { "description": "Bad Request. The input was invalid." },
"401": { "description": "Unauthorized. Authentication failed." },
"403": { "description": "Forbidden. Permission denied." },
"409": { "description": "Conflict. Category with the given name and parent already exists." },
"500": { "description": "Internal Server Error. An unexpected condition occurred." }
},
"tags": ["Categories"]
}
}
},
"components": {
"securitySchemes": {
"BearerAuth": {
"type": "http",
"scheme": "bearer",
"bearerFormat": "JWT",
"description": "Provide your Bearer token for authorization."
}
}
},
"security": [
{
"BearerAuth": []
}
]
}
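Before importing a file like this, it can help to sanity-check it. The following Python sketch lists the operations a spec exposes and reports any security scheme that is referenced but never defined under `components.securitySchemes`. The function names `list_operations` and `undefined_security_schemes` are my own, and the small `spec` dictionary is a trimmed stand-in with the same two paths; in practice you would load the real file with `json.load()`.

```python
from typing import Any, Dict, List, Set

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def list_operations(spec: Dict[str, Any]) -> List[str]:
    """Return 'METHOD path -> operationId' for each operation in the spec."""
    operations = []
    for path, item in spec.get("paths", {}).items():
        for method, operation in item.items():
            if method in HTTP_METHODS:
                operations.append(
                    f"{method.upper()} {path} -> {operation.get('operationId')}"
                )
    return operations


def undefined_security_schemes(spec: Dict[str, Any]) -> Set[str]:
    """Return security scheme names that are referenced but never defined."""
    defined = set(spec.get("components", {}).get("securitySchemes", {}))
    referenced: Set[str] = set()
    # Top-level requirements apply to operations that do not override them.
    for requirement in spec.get("security", []):
        referenced.update(requirement)
    for item in spec.get("paths", {}).values():
        for method, operation in item.items():
            if method in HTTP_METHODS:
                for requirement in operation.get("security", []):
                    referenced.update(requirement)
    return referenced - defined


# Trimmed, illustrative stand-in for the full specification.
spec = {
    "paths": {
        "/v3/glossary_terms": {
            "post": {"operationId": "create_glossary_term", "security": [{"BearerAuth": []}]}
        },
        "/v3/categories": {
            "post": {"operationId": "create_category", "security": [{"BearerAuth": []}]}
        },
    },
    "components": {"securitySchemes": {"BearerAuth": {"type": "http", "scheme": "bearer"}}},
    "security": [{"BearerAuth": []}],
}

print(list_operations(spec))
print(undefined_security_schemes(spec))
```

An empty set from the second function means every security requirement points at a defined scheme.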

I can set up my custom extension in my AI assistant by importing the above JSON file as an OpenAPI document; those calls will then be available in my AI assistant.

As shown below, I can choose to run that API call in my action step and pass CategoryName, a variable set by the user, to the name parameter of this REST call.

Once my configuration is complete, I can run and preview my AI assistant.

As demonstrated in the figure above, in dialog 1 the user is greeted and presented with options. For this demo I configured 3 actions, so there are 3 actions the user can choose from, or the user can say “start” to kick off the interactive conversation.

In dialog 2, after the user says “start”, the actions execute one after another until user input is needed.

First, authentication to CP4D is completed, and then a data source connection is created. As demonstrated here, we can also automate certain steps without user intervention.
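The "run until input is needed" behavior can be sketched in a few lines of Python. This is only a conceptual model of the flow described here, not how watsonx Assistant is implemented; the `Action` class, the `run_until_input` function, and the variable handling are all hypothetical names of mine.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Action:
    name: str
    needs_input: Optional[str] = None  # variable the user must supply, if any


def run_until_input(
    actions: List[Action], variables: Dict[str, str], start: int = 0
) -> Tuple[List[str], Optional[int]]:
    """Run actions in order and stop at the first one whose input is missing.

    Returns the names of the completed actions and the index of the blocked
    action, or None when every remaining action has run.
    """
    completed = []
    for index in range(start, len(actions)):
        action = actions[index]
        if action.needs_input and action.needs_input not in variables:
            return completed, index
        completed.append(action.name)
    return completed, None


flow = [
    Action("Authenticate to CP4D"),
    Action("Create data source connection"),
    Action("Create category", needs_input="CategoryName"),
]

# The first two actions run unattended; the third waits for the user.
done, blocked = run_until_input(flow, variables={})
print(done, blocked)

# Once the user has answered, the flow resumes from the blocked action.
done, blocked = run_until_input(flow, {"CategoryName": "Finance"}, start=blocked)
print(done, blocked)
```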

When creating a category, which is a folder-like object for organizing the glossary, the user is asked to enter a category name. The user enters, for example, a category called “Finance”; this value is then passed to the API that was configured through the custom extension, and our chatbot sends the request to IKC. In dialog 3 you can see the result in CP4D: a category called “Finance” has been created.
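The next action in the process adds business terms, and a term can be placed in the new category through `parent_category_id`. The Python sketch below shows one way that hand-off could look, assuming the 201 response has the shape given in the example of the OpenAPI file in this article. The helper names and the term name "Revenue" are hypothetical.

```python
import json
from typing import Any, Dict


def extract_artifact_id(response_body: str) -> str:
    """Return the artifact_id of the first resource in a creation response."""
    data = json.loads(response_body)
    return data["resources"][0]["artifact_id"]


def build_term_payload(term_name: str, category_id: str) -> Dict[str, Any]:
    """Build the body for creating a glossary term inside a category."""
    return {"name": term_name, "parent_category_id": category_id}


# Sample response copied from the example in the specification.
sample_response = json.dumps(
    {
        "resources": [
            {
                "href": "/v3/categories/285b7b78-3d64-48e6-a9b6-e02f4c32295f",
                "artifact_id": "285b7b78-3d64-48e6-a9b6-e02f4c32295f",
                "version_id": "12e37d76-f28b-4dc9-a35e-0c3e30f9f8ea_0",
                "global_id": "18091466-981e-4113-8943-2ddf162bff6d_285b7b78-3d64-48e6-a9b6-e02f4c32295f",
                "entity_type": "category",
            }
        ]
    }
)

category_id = extract_artifact_id(sample_response)
payload = build_term_payload("Revenue", category_id)  # "Revenue" is illustrative
print(payload)
```

In watsonx Assistant the same idea would be expressed by storing a value from the extension's response in a session variable and mapping it to a parameter of the next extension call.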

Conclusion

We are not limited to one tool when creating these types of chatbots. If your process spans multiple tools, and the capabilities of those tools are exposed through an API, then you can certainly use watsonx Assistant to create a chatbot that communicates with each of them, based on the process you define in a low-code, no-code AI assistant builder for mobile and desktop.

In addition, there is tooling within watsonx Assistant to help you monitor those chatbots for productivity purposes.
