A data catalog can be incredibly helpful. Take, for instance, the process of Data Acquisition illustrated above: the lifecycle of bringing an external dataset into a data lake. While it can be viewed as just one use case, it contains many use cases at a more detailed level. For example: tracking the legal signoffs on a dataset, documenting its semantics, assigning accountabilities for the dataset, and flagging any personal information it contains. And once the dataset is in the data lake, the data catalog can be used to monitor the state of its health and to track data sharing agreements made with teams that want to use it.
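To make those kinds of metadata concrete, here is a minimal sketch of what a catalog entry for this lifecycle might record. All class, field, and value names are hypothetical illustrations, not drawn from any particular catalog product:

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical sketch of the human-managed metadata a catalog entry
# might carry through the Data Acquisition lifecycle. Field names are
# illustrative only.
@dataclass
class DatasetCatalogEntry:
    name: str
    description: str                            # documented semantics
    owner: str                                  # accountable party
    legal_signoff_by: Optional[str] = None      # who approved acquisition
    contains_personal_info: bool = False        # flag for personal data
    health_status: str = "unknown"              # e.g. "healthy", "stale"
    sharing_agreements: list = field(default_factory=list)  # consuming teams

# Example entry as the dataset moves through acquisition into the lake.
entry = DatasetCatalogEntry(
    name="external_customer_feed",
    description="Daily customer records from a third-party provider",
    owner="data-governance-team",
    legal_signoff_by="legal@example.com",
    contains_personal_info=True,
)
entry.sharing_agreements.append("marketing-analytics")
```

The point of the sketch is simply that most of these fields cannot be harvested automatically; they have to be entered and maintained by people, which is a theme the next paragraph picks up.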
That said, data catalog implementations do not always go smoothly. These flexible platforms must be configured to match the specific use cases of the enterprise. While they can harvest a great deal of metadata automatically, human-entered and human-managed metadata is vital to their success. The tool must also be rolled out in a manner that its licensing model supports.
These issues should not be tackled only after a data catalog is purchased; they should factor into the discovery process for the catalog and into a robust plan for its deployment. Nobody wants these projects to become “all about the tool”. Nor does anyone want to build a vast “museum of metadata” that is supposed to answer questions but in practice is a sea of utterly impenetrable complexity.