Tuesday, August 14

Machine learning “critical” to data prep, but analysts warn of “multiple versions of the truth” if unmanaged


As data grows more complex, managing, cleaning, harmonizing and shaping it manually becomes virtually impossible or prohibitively time-consuming, according to a market guide from Gartner.

“Machine learning has become a critical feature to operationalize and automate time-consuming and error-prone data preparation tasks,” wrote Gartner analysts Ehtisham Zaidi, Rita Sallam, and Shubhangi Vashisth.

The survey found that about 70% of the data preparation tool vendors surveyed have already built some machine learning algorithms into their tools to make the data preparation process more productive.

Some tools use algorithms to identify data structures, schemas and relationships at various levels of granularity, and can structure datasets as soon as the data is first ingested.
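The guide does not describe how these vendor features work internally. As a rough illustration of what schema inference on ingestion produces, here is a minimal Python sketch. It is an assumption-laden stand-in: it uses simple rule-based heuristics rather than machine learning, and the function names (`infer_type`, `infer_schema`) and sample columns are hypothetical. It guesses each column's type, flags columns with missing values, and marks columns whose values are all distinct as candidate keys, a crude hint at possible relationships between datasets.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest of 'bool', 'int', 'float' or 'str' that fits every non-empty value.

    Hypothetical rule-based heuristic, not a vendor's ML-based method.
    """
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return "str"

    def all_parse(parser):
        # True if every non-empty value can be converted by the given parser.
        try:
            for v in non_empty:
                parser(v)
            return True
        except ValueError:
            return False

    # Try the narrowest types first, falling back to plain strings.
    if all(v.lower() in ("true", "false") for v in non_empty):
        return "bool"
    if all_parse(int):
        return "int"
    if all_parse(float):
        return "float"
    return "str"


def infer_schema(csv_text):
    """Infer a per-column schema from CSV text (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    schema = {}
    for column in reader.fieldnames or []:
        # Short rows yield None for missing fields; treat those as empty.
        values = [row[column] or "" for row in rows]
        non_empty = [v for v in values if v.strip()]
        schema[column] = {
            "type": infer_type(values),
            "nullable": len(non_empty) < len(values),
            # A fully populated column with no repeated values is a candidate key.
            "candidate_key": len(non_empty) == len(values)
            and len(set(values)) == len(values),
        }
    return schema


sample = """customer_id,signup_date,lifetime_value,active
1,2018-01-05,250.5,true
2,2018-02-11,,false
3,2018-03-20,99,true
"""
schema = infer_schema(sample)
for name, info in schema.items():
    print(name, info)
```

Here `customer_id` comes out as a non-nullable integer candidate key, while `lifetime_value` is inferred as a nullable float because one row is blank. Commercial tools go much further (date detection, cross-dataset joins, learned patterns), so treat this only as a sketch of the idea.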

At the same time, the market is crowded with a range of choices, from stand-alone specialists to vendors that embed data preparation as a capability into analytics and BI, data science, or enterprise data integration platforms.

“Although accelerating the shift toward broadly deployed modern analytics and data science, these tools, if unmanaged, can introduce multiple versions of the truth,” the analysts wrote.

Among their recommendations, the analysts advised investigating each data preparation vendor’s “roadmap on their current or planned support for extended data preparation capabilities to improve the interactive experience, facilitate timely insights and enhance enterprise readiness.”

Examples include data science libraries, more intuitive data preparation workflows, and improved governance, collaboration, machine learning and cataloging.

Gartner expects that by 2023, machine-learning-augmented master data management (MDM), data quality, data preparation and data catalogs will converge into a single modern enterprise information management (EIM) platform used for the majority of new analytics projects.
