A data catalog can be incredibly helpful. Take, for instance, the process of Data Acquisition illustrated above: the lifecycle of bringing an external dataset into a data lake. While it can be viewed as just one use case, it contains many use cases at a more detailed level. For example: tracking the legal signoffs on a dataset, documenting its semantics, assigning accountabilities for the dataset, and flagging any personal information it contains. And once the dataset is in the data lake, the data catalog can be used to monitor the state of its health and to track data sharing agreements made with teams that want to use it.
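To make those kinds of metadata concrete, here is a minimal sketch of what a catalog entry for this lifecycle might record. All class, field, and value names are hypothetical illustrations, not drawn from any particular catalog product:

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical sketch of the human-managed metadata a catalog entry
# might carry through the Data Acquisition lifecycle. Field names are
# illustrative only.
@dataclass
class DatasetCatalogEntry:
    name: str
    description: str                            # documented semantics
    owner: str                                  # accountable party
    legal_signoff_by: Optional[str] = None      # who approved acquisition
    contains_personal_info: bool = False        # flag for personal data
    health_status: str = "unknown"              # e.g. "healthy", "stale"
    sharing_agreements: list = field(default_factory=list)  # consuming teams

# Example entry as the dataset moves through acquisition into the lake.
entry = DatasetCatalogEntry(
    name="external_customer_feed",
    description="Daily customer records from a third-party provider",
    owner="data-governance-team",
    legal_signoff_by="legal@example.com",
    contains_personal_info=True,
)
entry.sharing_agreements.append("marketing-analytics")
```

The point of the sketch is simply that most of these fields cannot be harvested automatically; they have to be entered and maintained by people, which is a theme the next paragraph picks up.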
That said, data catalog implementations do not always go smoothly. These flexible platforms must be configured to match the specific use cases of the enterprise. While they can harvest a great deal of metadata automatically, human-entered and human-managed metadata is vital to their success. The tool must also be rolled out in a manner that its licensing model supports.
These issues should not be tackled only after a data catalog is purchased; they should factor into the discovery process for the catalog and into a robust plan for its deployment. Nobody wants these projects to become “all about the tool”. Nor does anyone want to build a vast “museum of metadata” that is supposed to answer questions but in practice is a sea of utterly impenetrable complexity.