The storage industry has an assortment of labels that describe the capability to align data storage performance, cost, and protection to the value of that data. The more valuable the data, the bigger and badder the system it’s stored on—at a higher price. As data ages and access slows or stops, it is moved to bulk storage tiers. All in all, it is a beautiful, compelling story: This type of alignment can save a bundle in the data center as data is moved off of Tier 1 storage onto bulk storage tiers, prolonging the life of Tier 1 storage investments and reducing both CAPEX and OPEX. Cue the choir of angels; this is data center storage nirvana. Or is it?
Vendors have been selling this story for years, calling it storage tiering, data lifecycle management, information lifecycle management (ILM), and, way back when, hierarchical storage management (HSM). In reality, users have not seen the promised benefits because:
- Solutions lack standards. Sure, vendors have tools to automate migration, but they typically require that all of the storage tiers come from them or in many cases even that the data stay in the Tier 1 array on denser disk drives (more on that in a follow-up blog). So if you’ve deployed what you believe to be a best of breed architecture with vendor A for Tier 1 and vendor B for Tier 2, it won’t work. You need to be homogeneous, because vendor A’s and B’s products won’t work together, nor will their engineering teams. That makes life that much harder.
- Granularity is not properly addressed. Almost everything traditional block storage vendors, and some NAS vendors, do is done at the LUN or volume level (a logical unit, or a logical representation of a disk drive). “Hot” LUNs get moved to SSD or high performance disk. That’s great, but if only a portion of the data on that LUN is hot, as is often the case, you are wasting expensive storage real estate on cold data. NAS vendors often have a file, file system, or directory level approach that again provides insufficient granularity to ensure storage resources are optimized. In both cases, for block and file data, if the container is not sufficiently granular, tiering helps some but won’t really get you all the way to the promised land.
- Data migration is not easy. Copying or moving data from primary to secondary storage consumes lots of system overhead at either the storage or server processor, depending on the technology used. That means taking a performance hit for those really important, 24 x 7 x 365 high performance applications that are staying behind on Tier 1 storage. And all the cleanup work is not pretty: moving data between storage tiers means remapping data paths or mount points to point applications to the new data location. In a world where data locations are mapped on massive Excel spreadsheets, there is a lot of room for human error in the equation and organizations risk losing data rather than moving it to a long-term storage tier.
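To put rough numbers on the granularity point, here is a minimal Python sketch comparing how much Tier 1 capacity gets consumed when whole LUNs are promoted versus when only the hot portions are promoted. Everything in it is an assumption for illustration: the `Lun` class, the function names, the LUN sizes, and the hot-data figures are hypothetical and not taken from any vendor's product.

```python
from dataclasses import dataclass


@dataclass
class Lun:
    name: str
    size_gb: int  # total capacity of the LUN
    hot_gb: int   # hypothetical amount of actively accessed data on it


def tier1_needed_lun_level(luns):
    """Whole-LUN tiering: a LUN with any hot data is promoted in full."""
    return sum(lun.size_gb for lun in luns if lun.hot_gb > 0)


def tier1_needed_sub_lun(luns, extent_gb=1):
    """Sub-LUN tiering: only hot data is promoted, rounded up to whole extents."""
    # -(-a // b) is integer ceiling division
    return sum(-(-lun.hot_gb // extent_gb) * extent_gb for lun in luns)


# Made-up example workload: two partly hot LUNs and one completely cold one.
luns = [
    Lun("db", size_gb=500, hot_gb=50),
    Lun("mail", size_gb=1000, hot_gb=100),
    Lun("archive", size_gb=2000, hot_gb=0),
]

whole = tier1_needed_lun_level(luns)
fine = tier1_needed_sub_lun(luns, extent_gb=1)
print(f"LUN-level tiering needs {whole} GB of Tier 1")
print(f"Sub-LUN tiering needs {fine} GB of Tier 1")
print(f"Cold data sitting on Tier 1 with LUN-level tiering: {whole - fine} GB")
```

With these invented figures, LUN-level promotion puts 1,500 GB on Tier 1 to serve 150 GB of hot data. Raising `extent_gb` shows the same effect in miniature: the coarser the unit of movement, the more cold data rides along.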
Will vendors solve these issues? It depends. Some of the issues, like migration, can be addressed with storage virtualization technology. And some vendors are happy collecting the “tiering tax” and leaving everything in Tier 1 arrays on denser drives. More on that tomorrow.
Hi Terri, I agree with your perspectives, but I believe that the biggest obstacle to the successful deployment of ILM has been much more basic: getting past the first (and necessary) step in the process, Data Classification. Classification is incredibly people-intensive and time-consuming, and the payoff for customers has needed to justify it for there to be value in the concept. Even those organizations that did successfully complete the Classification process and achieve real ILM value (yes, they are out there) found that the people-intensive side of ILM never really went away, especially as IT infrastructures and data volumes grow and change. I believe that the real game-changer for making some key values of ILM become real for customers will come through the elimination of the need to Classify data, particularly from a performance perspective, and the ability to automate the process of data migration. Or put simply, let software automatically do for customers what they have had to manually analyze and implement up until now. Thanks, Ken
Thanks Ken! I agree wholeheartedly. I’ve just listed some technical issues, but the manpower issues do far exceed these! Because classification is so hard, users pick a tier of storage for an application and the data tends to stay on that tier forever. On the file side we have some solutions, such as moving data to new tiers based on metadata (age, activity, owner), but it becomes much harder on the block side, where the storage has no data awareness. That’s forced us to take wholesale approaches to move data to new tiers based on policy and application (for example, email archiving) or LUN activity. Solutions like EMC’s FAST will certainly help automate the process, with “in the box” tiering based on LUN activity, and virtualization can help automate cross-platform tiering, enabling users to move cold LUNs to bulk storage.
[...] Why ILM Never Really Took Off [...]
Hi Terri: All great points, and while I think you nailed the issues with the block level approach, on the file-aware side of the ILM equation I would agree with Ken that data classification is a key item. I would also add policies that follow data for the life of the data, which means data must be classified at the moment of creation and managed until expiration, rotation, or shred. That would be a big brain saver.
Another impediment to ILM is that once you get down to it, there are too many data movers creating too many repositories (you mention email archiving; how about file archive, backup, dedupe, CDP, WORM, legal hold, etc.). For each set of data movers and repositories, you have dissimilar policies and classifications, and a separate point of administration. It quickly becomes a huge can of worms, a wasteland of redundancy, where policies, workflows, business processes, the origins of the data, and dreams go to die.
All that said, we think we figured out a large part of the problem by utilizing a single, real-time data mover and a unified, multi-function repository, with auto classification of data at the point of creation and policy tagging for the lifecycle. We also use search engine technology to handle metadata as part of the distributed, grid-style repository. So there are some added benefits to that from an “active information management” perspective (for unified tools like backup, CDP, replication, file and email archive, legal hold, etc.).
So, while you are correct that ILM never really took off, we think that the devil (consistent policies) is in the details (metadata). Vendors like us at Cofio are now approaching the problem from different angles. We are sure things will get interesting for this space in the near future. http://www.cofio.com
Terri,
I posted a comment yesterday, but it appears it got “classified” as something other than a post...
I believe the true underlying reason that managing information (and I hesitate to use the ILM term) never took off is that the people who are supposed to be stewards of the data, we IT folks, aren’t responsible for understanding the information. We deal with tools and such, but tools are useless without the ability to place them into a usable context for the business.
ILM was sold to the Storage/IT side of an organization and not the business side. We were then supposed to take what the vendor thought was a sure thing (since we understand effectively managing capacity) and translate that into a business use case for why we should properly manage our greatest asset, information.
It is those users that create the data, and hopefully they are diligent to create useful metadata in their documents, to determine what should remain as useful business intelligence and what shouldn’t. Although, many times, users opt for the KISS approach and save everything for eternity and let someone else worry about what is useful and what isn’t.
At home, I find myself guilty of the same, though I try to be more successful at work. I create something and then stick it in one of my folders, and in time, I might go and look back through it. Umm, the problem is I tend to suffer from CRS Syndrome and I recreate it. Search engines only seem to exacerbate the issue, since they find so many items with similar words that I don’t always know which is which.
The point I’m trying to make is that the tools can only do so much. People are the ones that need to understand how they are to manage information, not technology. I think we tend to rely too much on technology to do it for us and get lazy. And when we get lazy, the data piles up, and then it costs even more to go and clean it up. Look at the show Hoarders, because that’s what we are, but with data.
Oh, and the more we place policies and rules on people, the more we tend to resist the process and the rules, and in the end that makes lots of money for storage companies. Even as a steward of my company’s data, I have neither the authorization nor the authority to determine what stays, what goes, and what should reside where. All I can do is try to educate and hope my people take the initiative to empower themselves in the management of their own information.
That’s the true nature of ILM, and it has nothing to do with IT. Pun loosely intended.