At its re:Invent conference, AWS unveiled Amazon DataZone, a new data management service designed to help organizations catalog, discover, share, and govern their data.
“To unlock the full power, the full value of data, we need to make it easy for the right people and applications to find, access and share the right data when they need it — and to keep data safe and secure,” said AWS CEO Adam Selipsky.
The service will give users fine-grained controls for managing and governing access to data.
“DataZone enables you to set data free throughout the organization safely by making it easy for admins and data stewards to manage and govern access to data. And it makes it easy for data engineers, data scientists, product managers, analysts and other business users to discover, use and collaborate around that data to drive insights for your businesses,” explained Selipsky.
Organizations today gather petabytes of data scattered over various departments, services, on-premises databases, and third-party sources. Administrators and data stewards who create and manage this data must make it accessible while upholding control and governance to guarantee that it can only be accessed by the appropriate parties and in the appropriate circumstances. Only then will organizations be able to realize the full value of this data.
At the same time, data consumers, meaning employees across the organization, look to data producers for the information they need to make decisions.
Organizations must strike a balance between the need for control, to ensure that data remains secure, and the need for access, to generate new insights, but it is difficult to implement governance policies that account for the variety of data, departments, and use cases across the business. Some companies build catalogs to curate their information, but these systems take a lot of work to maintain, require data producers to manually add context (such as origin and description) to each dataset to make it discoverable, and lack the built-in access controls that would make governance straightforward.
Amazon DataZone makes it simpler for data producers to manage and govern access to data, and it enables data consumers to find, use, and collaborate on that data to generate business insights. Using the Amazon DataZone web portal, data producers can set up their own business data catalog by defining their data taxonomy, configuring governance policies, and connecting to a variety of AWS services, partner solutions, and on-premises systems.
Once the catalog has been set up, data consumers can use the Amazon DataZone web portal to search for data assets, look up context-relevant metadata, and request access to datasets. When a data consumer is ready to begin analysis, they create an Amazon DataZone Data Project, a shared area in the web portal where users can access various datasets, collaborate on analysis, and share access with colleagues.
Amazon DataZone reduces the labor-intensive work of maintaining a catalog by using machine learning to gather and recommend metadata for each dataset, and by training on a customer's taxonomy and preferences to improve over time.
AWS also introduced its Digital Sovereignty Pledge, a commitment to provide customers with the tools they need to manage where their data is stored and accessed. DataZone provides some of these controls, although digital sovereignty is not its primary emphasis.
Users of DataZone will have access to a portal where they can create their data catalog and establish its taxonomy. After a data source is connected, DataZone will use machine learning to populate the catalog with metadata, and users can add further labels and descriptions as needed.