Enterprise Storage Guide

Data Awareness Is Increasingly Popular in the Storage Biz

As you scan the emerging storage market, there are some very cool things happening, but one of the features that seems to be growing in popularity, and in necessity, revolves around analytics or at least some version of data awareness.  What is “data awareness,” you may ask?  Well, consider this: the vast majority of companies out there really don’t know much about their storage.  Sure, the storage administrator probably knows how much capacity they have, how much is left, and, in general, what applications are running against the storage.

But that’s about it.  For years, people have been bemoaning the state of unstructured data, which, by its very nature, doesn’t have a whole lot of organization around it.  Companies are running storage systems that may house millions — or even billions — of files with, literally, no idea of what they’re storing.  They may have hundreds of thousands of files with personally identifiable information (PII) or they may be wasting space storing people’s iTunes libraries.

While the second scenario is just a space-waster, the first one, storing PII or credit card numbers in a non-secure way, can open an organization to all kinds of problems, from legal trouble to PR disasters.  Further, many companies simply don’t know which files in their environments have been recently accessed or who accessed them.  If a breach takes place, those companies may not be able to determine which files were touched, and without knowing who accessed what, they can’t establish the real impact of the breach, which makes a bad situation far worse.
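To make the "recently accessed" gap a little more concrete, here is a minimal Python sketch that walks a directory tree and lists files that have not been read in a given number of days.  The function name `stale_files` and its parameters are my own, purely for illustration, and the approach leans on the filesystem's last-access timestamp, which many systems update lazily or not at all (for example when mounted with `noatime`).  It also says nothing about who accessed a file, which is exactly the kind of information the data-aware platforms discussed here aim to provide.

```python
import time
from pathlib import Path


def stale_files(root, days):
    """Return (path, size_in_bytes) for files under root not accessed in `days` days.

    Assumption: st_atime is being maintained by the filesystem. On volumes
    mounted with noatime (or similar), these results will be misleading.
    """
    cutoff = time.time() - days * 86400  # seconds in a day
    stale = []
    for path in Path(root).rglob("*"):
        if path.is_file():
            info = path.stat()
            if info.st_atime < cutoff:
                stale.append((str(path), info.st_size))
    return stale
```

Calling `stale_files("/some/share", 365)` would return the files that look untouched for a year, along with their sizes, which is a rough starting point for deciding what to archive or delete.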

At the past couple of Tech Field Day events I’ve attended, the need to keep track of data in a more structured way has been a topic of discussion.  There are a few companies that have developed products that are either designed from the ground up to solve the data awareness problem or that have features that can help organizations become more aware of potential issues in their troves of data.

I’m not going to go too in-depth into exactly what each company is doing in this post; I’ll just introduce the basics of who is doing what in general.

First up, DataGravity.  DataGravity’s value proposition is “Data Security at the Point of Storage,” and the product was built around this idea.  Deep data awareness has been part of the platform from the beginning, which lets companies gain a deep understanding of their data.  For example, with DataGravity, you can define what kinds of data you want to watch for (PII, customer data, or basically anything sensitive), and you can then determine who has accessed that information and when they accessed it.  DataGravity calls this Data Awareness.

Qumulo is another company that includes data awareness in its platform.  Qumulo is very different from DataGravity in that Qumulo is focused on companies in which the applications running on the storage are the business, whereas DataGravity targets more general-purpose storage needs.  Qumulo’s data awareness value is summed up here: “With greater visibility into which data is most valuable, where it is stored, what users or applications are accessing what files, what should be archived, backed up or deleted, and why data grows, Qumulo’s customers report significant gains in workflow performance and storage efficiency.”

Cohesity is another company that has included data awareness/analytics in their platform:

  • eDiscovery: Rapid content analysis to find relevant case information for legal requests or holds.
  • Compliance: Ensure compliance with Personally Identifiable Information (PII) requirements with Cluster-wide content scans for names, phone numbers, and credit card information that may have been stored in clear text.
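
To give a feel for what a clear-text content scan involves, here is a minimal Python sketch.  To be clear, this is not how Cohesity (or anyone else mentioned here) implements scanning; the patterns, the `scan_text` function, and the US-style phone format are my own assumptions for illustration.  It looks for candidate credit card numbers, filters them with the standard Luhn checksum to cut down on false positives, and picks up simple phone numbers.

```python
import re

# Hypothetical patterns for illustration only; a production scanner
# would need far more robust detection than these two expressions.
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")              # 13-19 digits, optional separators
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")    # e.g. 555-123-4567


def luhn_ok(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_text(text):
    """Return likely credit card and phone numbers found in clear text."""
    cards = []
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_ok(digits):
            cards.append(digits)
    return {"credit_cards": cards, "phone_numbers": PHONE_RE.findall(text)}
```

Running `scan_text` over the contents of each file in a share would flag the files worth a closer look.  Detecting names, which the Cohesity description also mentions, is a much harder problem and is left out of this sketch.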

The key difference among the platforms at present is the ease with which administrators can glean information from the system.  DataGravity has advanced querying capabilities that don’t require admins to write code to retrieve information and don’t require DataGravity to write extensions to support new analysis needs.  Qumulo and Cohesity are still developing their querying capabilities.  That said, the mere fact that the data-awareness capabilities are baked into the various platforms means that the architecture is already in place and that what remains is largely a matter of reporting.

Of course, data awareness products have been around for a while, but they’ve often been delivered as third-party products that had to be added on top of the rest of the infrastructure.  More and more, information and intelligence regarding the data are being built into storage as core features rather than relying on third-party tools.  DataGravity, Qumulo, and Cohesity aren’t the only companies that do this, but they happen to be the three I’ve reviewed recently.  Further, this kind of “Data Awareness” is not necessarily a brand new concept, but it is one that seems to be gaining new attention in the startup world.