Industry-Leading Compression Translates to Huge Cost Savings

No feature in a database offers as many direct benefits as data compression. It can deliver significant storage savings, increase data center density, allow more data to be retained, and improve query performance where I/O is the bottleneck. All of these advantages feed into the Total Cost of Ownership (TCO) calculation.

Central to RainStor’s unique product capabilities is the ability to compress and de-duplicate large data sets, typically achieving compression ratios of 40:1 and rising to 100:1 in some cases. This is done through four distinct but complementary techniques.

  1. Field-level de-duplication: The source data is processed column by column, reducing the data set to the list of unique values that each column holds, together with a count of how many times each value appears. The storage space required for this de-duplicated form is a fraction of that needed for the original data.
  2. Pattern-level de-duplication: To store compressed data losslessly, a binary tree is built with pointers that can be used to reconstitute the data in its original form. Pattern-level de-duplication builds on field-level de-duplication by storing only the unique branches of this tree, again with a frequency count. The unique combinations are identified using exactly the same technique as at the field level.
  3. Algorithmic compression: Field-level and pattern-level de-duplication save memory as well as disk space. RainStor’s algorithmic compression adds techniques designed to further reduce the amount of disk space required for storage.
  4. Byte-level compression: Finally, components of the tree are compressed independently using industry-standard byte-compression algorithms tuned for optimal savings.
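
The first technique in the list can be illustrated with a minimal Python sketch. It is not RainStor's implementation, which is not described here in detail; the function name `field_level_dedup` and the sample columns are hypothetical and chosen only for illustration. It reduces each column to its unique values plus a frequency count.

```python
from collections import Counter

def field_level_dedup(rows, columns):
    """Return, for each column, its unique values and how often each appears."""
    return {
        col: Counter(row[i] for row in rows)
        for i, col in enumerate(columns)
    }

# Hypothetical sample data: four records, two columns.
columns = ["country", "status"]
rows = [
    ("UK", "active"),
    ("UK", "closed"),
    ("US", "active"),
    ("UK", "active"),
]

dedup = field_level_dedup(rows, columns)
for col in columns:
    print(col, dict(dedup[col]))
```

In this toy data set, eight stored field values collapse to four unique values (two per column) with their counts. The saving would be expected to grow as more records repeat the same values.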

Mark Cusack, Chief Architect at RainStor, explains how extreme data compression is achieved to deliver a roughly 50x reduction in storage footprint.

It’s important to remember that these de-duplication techniques don’t cause any loss of detail, since the data is not summarized or aggregated in RainStor. Instead, RainStor stores each record as a series of pointers to the location of a single instance of a data value, or a pattern of data values.
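
To make the "series of pointers" idea concrete, here is a small Python sketch under simplifying assumptions: each column keeps a list of its unique values, and each record is stored as a tuple of positions into those lists. The names `encode` and `decode` are hypothetical, and this is an illustration of the general idea rather than RainStor's actual storage format.

```python
def encode(rows):
    """Store each unique value once per column; records become pointer tuples."""
    if not rows:
        return [], []
    n_cols = len(rows[0])
    stores = [[] for _ in range(n_cols)]   # unique values, one list per column
    index = [{} for _ in range(n_cols)]    # value -> position in that column's list
    pointers = []
    for row in rows:
        ptr_row = []
        for c, value in enumerate(row):
            if value not in index[c]:
                index[c][value] = len(stores[c])
                stores[c].append(value)
            ptr_row.append(index[c][value])
        pointers.append(tuple(ptr_row))
    return stores, pointers

def decode(stores, pointers):
    """Follow every pointer back to its value to rebuild the original records."""
    return [
        tuple(stores[c][p] for c, p in enumerate(ptr_row))
        for ptr_row in pointers
    ]

rows = [("UK", "active"), ("UK", "closed"), ("US", "active"), ("UK", "active")]
stores, pointers = encode(rows)
restored = decode(stores, pointers)
```

Because `decode` returns exactly the records that went in, nothing is summarized or lost; only the repetition is removed.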

RainStor uses a tree-based structure to store data, linking instances of the patterns together to form data records. This means that original records can be reconstituted at any time. In addition, the bigger the data set, the higher the probability that values and patterns will be repeated, which enables even greater compression.
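
The following Python sketch shows one way such a tree could work, assuming records have already been reduced to tuples of per-column value pointers. The pairing scheme and the names `intern`, `build_tree`, and `expand` are assumptions made for illustration; the actual RainStor tree layout is not specified here. Adjacent pointers are paired into patterns, each unique pattern is stored once with a frequency count, and a record is rebuilt by walking down from its root.

```python
from collections import Counter

def intern(pattern, nodes, node_ids, counts):
    """Store a (left, right) pattern once and return a reference to it."""
    if pattern not in node_ids:
        node_ids[pattern] = len(nodes)
        nodes.append(pattern)
    counts[node_ids[pattern]] += 1
    return ("node", node_ids[pattern])

def build_tree(pointer_rows):
    """Pair up each (non-empty) record's pointers until one root remains."""
    nodes, node_ids, counts, roots = [], {}, Counter(), []
    for row in pointer_rows:
        # Leaves carry the column number so columns never collide.
        level = [("leaf", (col, ptr)) for col, ptr in enumerate(row)]
        while len(level) > 1:
            paired = [
                intern((level[i], level[i + 1]), nodes, node_ids, counts)
                for i in range(0, len(level) - 1, 2)
            ]
            if len(level) % 2:  # an odd element is carried up unchanged
                paired.append(level[-1])
            level = paired
        roots.append(level[0])
    return nodes, counts, roots

def expand(ref, nodes):
    """Walk from a reference down to the leaf pointers it represents."""
    kind, payload = ref
    if kind == "leaf":
        return [payload[1]]
    left, right = nodes[payload]
    return expand(left, nodes) + expand(right, nodes)

pointer_rows = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (0, 0, 1)]
nodes, counts, roots = build_tree(pointer_rows)
rebuilt = [tuple(expand(root, nodes)) for root in roots]

# Ten times the data, but no new patterns: the tree does not grow.
bigger_nodes, _, _ = build_tree(pointer_rows * 10)
```

In this sketch the first and last records share the same root, so the repeated record adds only to the frequency counts. Repeating the whole data set ten times leaves the number of stored patterns unchanged, which is the effect the paragraph above describes.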