This article contains a summary from most of the industry players, so I'm going to put it here in full in case it disappears from drunkendata.com:
Invitation to De-Duplication Vendors
There are some questions I would like to get answers to in the area of De-Duplication. I am hoping that some of the vendor readers of this blog will help out.
Here is an opportunity to shine, folks, and to tell the world why, what, how and where. Here is the question list. You can either respond online by pasting your answers into a comment, or email me your response at firstname.lastname@example.org and I will post it for you. From where I am sitting, these are the kinds of questions that consumers would ask.
1. Please provide the name of your company and the de-dupe product(s) you sell. Please summarize what you think are the key values and differentiators of your wares.
2. InfoPro has said that de-dupe is the number one technology that companies are seeking today — well ahead of even server or storage virtualization. Is there any appeal beyond squeezing more undifferentiated data into the storage junk drawer?
3. Every vendor seems to have its own secret sauce de-dupe algorithm and implementation. One, Diligent Technologies (just acquired by IBM), claims that theirs is best because it collapses two functions (de-dupe, then ingest) into one in-line function, achieving great throughput in the process. What should be the gating factors in selecting the right de-dupe technology?
4. Despite the nuances, it seems that all block level de-dupe technology does the same thing: removes bit string patterns and substitutes a stub. Is this technically accurate or does your product do things differently?
5. De-dupe is changing data. To return data to its original state (pre-de-dupe) seems to require access to the original algorithm plus stubs/pointers to bit patterns that have been removed to deflate data. If I am correct in this assumption, please explain how data recovery is accomplished if there is a disaster. Do I need to back up your wares and store them off site, or do I need another copy of your appliance or software at a recovery center?
6. De-dupe changes data. Is there any possibility that this will get me into trouble with the regulators or legal eagles when I respond to a subpoena or discovery request? Does de-dupe conflict with the nonrepudiation requirements of certain laws?
7. Some say that de-dupe obviates the need for encryption. What do you think?
8. Some say that de-duped data is inappropriate for tape backup, that data should be re-inflated before it is written to tape. Yet, one vendor is planning to enable an “NDMP-like” tape backup around his de-dupe system at the request of his customers. Is this smart?
9. Some vendors are claiming de-dupe is “green” — do you see it as such?
10. De-dupe and VTL seem to be joined at the hip in a lot of vendor discussions: Use de-dupe to store a lot of archival data online in less space for fast retrieval in the event of the accidental loss of files or data sets on primary storage. Are there other applications for de-duplication besides compressing data in a nearline storage repository?
11. Just suggested by a reader: What do you see as the advantages/disadvantages of software-based deduplication vs. hardware (chip-based) deduplication? Will this be a differentiating feature in the future… especially now that Hifn is pushing their Compression/DeDupe card to OEMs?
Thanks in advance for your response.
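For readers trying to picture what questions 4 and 5 are getting at, here is a minimal sketch of the "remove the repeated pattern and substitute a stub" idea. It is not any vendor's algorithm. It assumes fixed-size 4 KiB blocks, a SHA-256 digest standing in for the stub, and a plain in-memory dictionary as the block store; the names `CHUNK_SIZE`, `dedupe` and `restore` are made up for illustration. Real products may well use variable-size chunking, different fingerprints and on-disk indexes.

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 4096  # assumed fixed block size, chosen only for illustration


def dedupe(data: bytes, store: Dict[bytes, bytes]) -> List[bytes]:
    """Split data into fixed-size blocks, keep one copy of each unique block
    in `store`, and return the ordered list of digests (the "stubs")."""
    recipe = []
    for offset in range(0, len(data), CHUNK_SIZE):
        block = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(block).digest()
        store.setdefault(digest, block)  # store the block only the first time it is seen
        recipe.append(digest)            # always record the stub, in order
    return recipe


def restore(recipe: List[bytes], store: Dict[bytes, bytes]) -> bytes:
    """Re-inflate the original data by looking up each stub in the store."""
    return b"".join(store[digest] for digest in recipe)


# Four blocks go in, but only two distinct blocks need to be kept.
data = b"A" * 8192 + b"B" * 4096 + b"A" * 4096
store: Dict[bytes, bytes] = {}
recipe = dedupe(data, store)
assert restore(recipe, store) == data
```

Note that `restore` needs both the list of stubs and the block store. Lose either one and the data cannot be re-inflated, which is the concern behind the disaster recovery question.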