RainStor

    Low Cost Scale and Analytics Leveraging HDFS – Smaller Footprint, Lower Cost.

    RainStor runs natively on the Hadoop Distributed File System (HDFS), so data storage, query, and analytics capabilities can scale out to hundreds of commodity servers and potentially multi-petabyte data volumes.

    RainStor can be deployed with an existing Hadoop MapReduce cluster, using the same hardware and file system, while adding high levels of compression and direct access to retained data via SQL queries.

    Because RainStor is a shared-everything architecture, each node must have equal access to all of the RainStor partition files. HDFS supports this through its logical “shared everything” view of the distributed file store. In addition, RainStor’s immutable data model means that RainStor partitions are only ever appended to HDFS and never need to be updated.

    RainStor can be deployed using Cloudera’s CDH3, the latest version of Cloudera’s Distribution for Hadoop. RainStor also supports the Apache Hadoop distributions from partners Hortonworks and MapR. The result is a pragmatic and scalable approach to Big Data that performs fast analytics while retaining data at a lower total cost of ownership (TCO), driven by RainStor’s ability to compress the overall data footprint and use the distributed file system for cost-effective scale.

    Hadoop generally replicates data, often three times over. To counteract this, most Hadoop deployments rely on binary compression (such as LZO), which typically yields about 5-to-1 compression but carries a re-inflation performance penalty on access. In contrast, RainStor’s compression rates of about 40 to 1 significantly reduce the overall footprint and allow data access without re-inflation.

    Example: With 2 petabytes (PB) of raw data to be stored for a 6-month period, the difference in disk usage could look like this:

    • Data in HDFS: 2 PB × 3 (for replication) = 6 PB + analysis results
    • Data in HDFS with RainStor: 50 TB (original source data compressed 40 to 1) × 3 (for replication) = 150 TB + analysis results. A physical storage savings of 5.85 PB (or 5,850 TB).
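
    The arithmetic in this example can be reproduced with a short Python sketch. It is only an illustration: the function name stored_tb is hypothetical, decimal units (1 PB = 1,000 TB) are assumed because they match the 5,850 TB figure above, and the 5-to-1 LZO case simply applies the typical ratio quoted earlier. Analysis results are left out of the totals.

```python
# Assumption: decimal units (1 PB = 1,000 TB), consistent with 5.85 PB = 5,850 TB.
TB_PER_PB = 1000


def stored_tb(raw_tb, compression_ratio=1.0, replication=3):
    """Physical storage in TB: compress the raw data first, then replicate it."""
    return raw_tb / compression_ratio * replication


raw_tb = 2 * TB_PER_PB  # 2 PB of raw source data

plain = stored_tb(raw_tb)                            # no compression
lzo = stored_tb(raw_tb, compression_ratio=5)         # typical binary compression
rainstor = stored_tb(raw_tb, compression_ratio=40)   # ratio quoted for RainStor

for label, tb in [
    ("HDFS, uncompressed", plain),
    ("HDFS, 5-to-1 compression", lzo),
    ("HDFS, 40-to-1 compression", rainstor),
]:
    print(f"{label:<28}{tb:>8,.0f} TB")

print(f"Savings vs. uncompressed: {plain - rainstor:,.0f} TB")
```

    Changing raw_tb, replication, or compression_ratio shows how the footprint would shift for a different retention scenario; actual compression ratios depend on the data.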

    Hadoop gives organizations the ability to scale for Big Data analytics but the data actually grows as it’s replicated across nodes. Reducing the size of data slated for retention makes enormous sense. The combination [of RainStor and Hadoop] changes the class of hardware and storage required, making the economics even more attractive.

    - Merv Adrian, VP Research at Gartner Group


    © Copyright 2011 RainStor Inc. All Rights Reserved.