Data Oceans

Myth or the next Data Science playground?

Jul 23, 2018

In today's complicated global environment, with a range of regulatory and data privacy requirements to meet, companies need to be creative about how they build environments where data science and other innovation can happen without affecting production systems or violating the law. In my previous article about Data Ecosystems, we dove into both the technology and the theory of how data lakes spawn certified data sources for consistent analysis and application development. What should companies do, though, if they have multiple data lakes? How can data that is dammed up in one division's lake be leveraged to add value across the whole firm?

Firms are learning that the more data they bring together in a compliant manner, the more valuable insights they can find or generate. As more data becomes available to train algorithms, firms can create more intelligent systems. Some firms are now also thinking about how to monetize that data through subscription services or other ways of selling the right data sets to the right external companies for the right price. All of this requires a strong, firm-wide leadership champion and sponsor, though, who can demonstrate the power of data to the CEO's staff.

From a technology perspective, the process is intricate, but not insurmountable. At GE (NYSE: GE), our data team created a strategy to ingest business data into one massive cloud environment. Each individual business's data was sectioned off logically to create separate secure zones. As data relevant to a given business process, such as direct material purchasing, is identified, it can be extracted from each business's zone and pumped into a larger, cross-company zone. Enter the data ocean.
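To make the zone idea concrete, here is a minimal Python sketch of pulling one business process out of several per-business zones into a shared, cross-company zone. Everything in it is an assumption for illustration: the zone names, the record fields, and the `pool_process` helper are hypothetical, not a description of GE's actual platform, and the anonymization step discussed later in this article is left out.

```python
# Hypothetical per-business zones: each business's records sit in their own
# logical partition. Names and fields are made up for illustration.
business_zones = {
    "business_a": [
        {"process": "direct_material_purchasing", "part": "fastener", "spend": 1200.0},
        {"process": "payroll", "amount": 5000.0},
    ],
    "business_b": [
        {"process": "direct_material_purchasing", "part": "fastener", "spend": 800.0},
    ],
}


def pool_process(zones, process):
    """Copy the records for one business process from every zone into one list."""
    pooled = []
    for business, records in zones.items():
        for record in records:
            if record["process"] == process:
                # Tag each row with its source zone so lineage is preserved.
                pooled.append({**record, "source_zone": business})
    return pooled


# The cross-company zone holds only the process that was selected.
cross_company_zone = pool_process(business_zones, "direct_material_purchasing")
total_spend = sum(row["spend"] for row in cross_company_zone)
```

Only the records for the chosen process leave their home zone; everything else, such as the payroll row above, stays where it was.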

Data Oceans are cross-enterprise data environments that can be constructed to analyze company-wide business processes; provide volumes of relevant data to data scientists and developers to build and train artificial and augmented intelligence applications; and do it compliantly.

The key to data oceans is that only completely anonymized or obfuscated data should go into them. Most privacy laws, especially the latest, GDPR in Europe, are concerned with personally identifiable data. Data that can implicate an individual, that can be used to affect her employment status, pay, or privacy, or that relates directly to the firm's finances are the hot coals in most company buckets. When the data elements that could be used to isolate an individual or recreate the firm's financial statements are removed, a wealth of freedom opens up in how the remaining data can be used. Think about the potential to create a true digital thread of customer activities, related product performance, enterprise parts spend and utilization, and logistical optimization algorithms.
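As a rough illustration of stripping those elements before a record enters the ocean, here is a minimal Python sketch. The field lists, the `scrub` helper, and the sample record are all hypothetical; in practice, which fields count as sensitive is a question for legal and compliance review. Note also that the keyed hash used here to keep records linkable is pseudonymization, which is generally a weaker guarantee than full anonymization, so simply dropping the field is the safer default.

```python
import hashlib
import hmac

# Assumed field lists, for illustration only.
DROP_FIELDS = {"name", "email", "salary", "revenue"}
PSEUDONYMIZE_FIELDS = {"customer_id"}


def scrub(record, secret_key):
    """Return a copy of record without sensitive fields.

    Fields in DROP_FIELDS are removed outright. Fields in PSEUDONYMIZE_FIELDS
    are replaced by a truncated keyed hash, so rows about the same customer
    can still be joined without exposing the original identifier.
    """
    clean = {}
    for field, value in record.items():
        if field in DROP_FIELDS:
            continue
        if field in PSEUDONYMIZE_FIELDS:
            digest = hmac.new(secret_key, str(value).encode("utf-8"), hashlib.sha256)
            clean[field] = digest.hexdigest()[:16]
        else:
            clean[field] = value
    return clean


raw = {"customer_id": 42, "name": "Jane Doe", "region": "EU", "salary": 90000, "part": "fastener"}
safe = scrub(raw, secret_key=b"example-key-kept-outside-the-ocean")
```

The secret key would need to live outside the ocean; anyone holding it could recompute the hashes for known identifiers.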

Another example is a retail group with stores across several brands that wants to learn how customers in a particular geography spend their money on related products. The group may want to know how discounts and special events influence customer spend in order to plan a coordinated campaign. It may also want to compare brick-and-mortar and online sales in order to structure campaigns that encourage sales through one channel or the other (or to understand how the two can be used in tandem from an augmented intelligence perspective). The possibilities extend across industries and company types wherever multiple divisions, historically disconnected by physical and logical means, can benefit from related product and customer data.

Where could you see a use for a data ocean for effective data science or monetization? As you look to scale out operations, what are your top concerns and challenges in creating a scalable and compliant environment?

Matt Brooks is a seasoned thought leader and practitioner in data and analytics; culture; product development; and transformation.
