RainStor

Feeding the Elephant Peanuts and Making Pig Fly

Posted on January 23rd, 2012

Last week we announced a new product edition that runs natively on Hadoop and HDFS. We are particularly excited as we sincerely hope it will help support the growth and enterprise adoption of Hadoop in the marketplace. Although we are not an open source vendor, we have tremendous admiration and respect for the open source community and the incredible momentum that Hadoop has garnered. A special thanks to the efforts of Cloudera, who blazed and continues to blaze the trail, evangelizing the virtues of Hadoop, and to others such as Hortonworks and MapR (all RainStor partners) who are legitimizing the technology for solving Big Data problems.

By applying our unique pattern and value de-duplication to raw data that would normally be compressed via LZO or Gzip, RainStor delivers significant savings in the number of nodes required to retain Big Data. For example, 40:1 compression could cut the number of nodes from 75 down to 2! That is not just a lower upfront purchase cost but also a significant ongoing reduction in total operating cost. Why bother if your savings in deploying Hadoop are already so significant compared to “traditional” enterprise database or data warehouse hardware and software deployments? Besides the obvious fact that saving money never goes out of style, the sheer rate of data growth is outstripping advances in physical storage media, which means feeding the elephant is a never-ending job.
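To make the node arithmetic concrete, here is a minimal sketch. It assumes the cluster is sized purely by storage and that the 40:1 ratio applies uniformly to all of the data; the function name `nodes_needed` is ours for illustration and is not part of any RainStor or Hadoop API.

```python
import math


def nodes_needed(uncompressed_nodes: int, compression_ratio: float) -> int:
    """Estimate nodes required to hold the same data after compression.

    Assumes storage is the only sizing constraint (an illustrative
    simplification) and rounds up, since you cannot deploy a fraction of a node.
    """
    if uncompressed_nodes < 0 or compression_ratio <= 0:
        raise ValueError("node count must be >= 0 and ratio must be > 0")
    return math.ceil(uncompressed_nodes / compression_ratio)


# 75 nodes' worth of data at 40:1 is 1.875 nodes, rounded up to 2.
print(nodes_needed(75, 40))
```

In practice you would still size the cluster for processing power as well, which is the subject of the next paragraph.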

Cost aside, another way to look at the challenge is to think logically about uncoupling the storage and processing requirements within each Hadoop node. If you are adding nodes purely to hold the data, you might be significantly under-utilizing the CPUs in each node. Those CPUs might also be spending effort re-inflating data compressed via LZO or Gzip, rather than being fully applied to supporting the query or business analytic calculations. RainStor, on the other hand, requires no re-inflation, and because RainStor compressed files contain more records per block, they have a magnification effect on disk performance and bandwidth upon access. So you end up in an almost surreal situation where not only is the data more compressed, but Pig and MapReduce jobs actually run faster! Even though the number of nodes is reduced, the remaining nodes are used more efficiently, allowing you to strike the correct balance between adding nodes for processing power and adding them for storage.
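The idea of answering a query without re-inflating the data can be illustrated with a generic dictionary-encoding sketch. To be clear, this is not RainStor's format or algorithm, which this post does not describe; it is a textbook form of value de-duplication, and the names `encode_column` and `count_matches` are hypothetical.

```python
def encode_column(values):
    """Store each distinct value once and represent rows as small integer codes."""
    dictionary = {}  # value -> code
    codes = []
    for value in values:
        # A new value gets the next unused code; a repeated value reuses its code.
        code = dictionary.setdefault(value, len(dictionary))
        codes.append(code)
    return dictionary, codes


def count_matches(dictionary, codes, target):
    """Count rows equal to target by comparing codes, never rebuilding the values."""
    code = dictionary.get(target)
    if code is None:
        return 0  # the value never occurs, so no row can match
    return sum(1 for c in codes if c == code)


dictionary, codes = encode_column(["ok", "ok", "fail", "ok"])
print(count_matches(dictionary, codes, "ok"))
```

The filter looks up the target once and then scans only the compact codes, which is the general reason a de-duplicated layout can reduce both I/O and CPU work compared with decompressing a Gzip or LZO stream first.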

Finally, RainStor’s ability to run natively on Hadoop is due to the fact that our architecture fits Hadoop and HDFS like a glove. As a large-block MPP database already using MapReduce capabilities internally, it was a natural fit for RainStor to run on HDFS. This enables RainStor to be part of the Hadoop deployment, rather than a database or data warehouse connecting to or transferring data out of HDFS. You get all of the security, auditing, unique compliance and data lifecycle management features, and more, that you would expect from an enterprise database that speaks perfect SQL, so your traditional BI tools can access the data without having to transform or transfer it into a separate environment. Furthermore, our data virtualization partner Composite Software allows data stored within RainStor on Hadoop to be seamlessly combined with other data sources around the enterprise without the need for large-scale copy or transfer.

In closing I have to give credit to our CFO Jamie Andrews (who is a budding marketer on the side) for the title of this blog. He knows a thing or two about saving money and articulated that RainStor’s compression and node reduction will allow enterprises to feed their Hadoop cluster peanuts, all while making Pig and MapReduce jobs fly!


The Object of Big Data Retention

Posted on December 14th, 2011

With the unparalleled explosion of unstructured content in the form of documents, emails, images and video, object storage has been rapidly gaining popularity as a leading method for enterprises and ISVs for defining new cloud storage solutions, big data repositories and general storage infrastructure.

Unlike a block-oriented interface that reads and writes fixed-size blocks of data, object storage organizes data into flexible-sized data containers. Each object has both data and metadata describing the object. Its simplicity makes it perfect for managing, retaining and allowing access to unstructured data, with a goal of delivering performance, flexibility and robustness. Object-based storage breaks the barriers of file system limitations, making it an alternative to Network Attached Storage (NAS), tape and Virtual Tape Libraries (VTL). The only caveat to storing data at an object level is the inability to query or access structured data at a fine-grained level.

With structured (relational data) and semi-structured (logs, delimited data) data growing just as rapidly, enterprises have been left to seek complementary file-based options to object-based storage to cover all their big data retention needs … until now.

Today we announced that we have ported and certified the RainStor Big Data Retention database to run on Caringo’s leading object store CAStor. Caringo has over 400 customers ranging from SMBs to Fortune 500 companies. With the integration, RainStor continues to demonstrate the ability to run on any form of storage environment and hardware configuration. RainStor is the only database to allow fine-grained query and accessibility using standard SQL and BI tools on CAStor.

The combination allows both unstructured and structured Big Data to be retained and accessed from a single environment improving compliance and search of related data. For example a Healthcare organization could store their patient medical images as well as patient documents and structured database medical records for years or decades on CAStor.

Partnering with Caringo made perfect sense from an architectural perspective; we share the philosophy of the lowest possible TCO through simplicity in administration and choice of commodity hardware and storage. In addition, both of our products are ideally suited for cloud-based deployments. The low TCO should be of particular interest to ISVs or hosted/cloud service providers, allowing them to offer new data retention services while boosting their margins through efficiencies such as RainStor’s market-leading compression, which can be applied to both user data and internal logs.

Until now, different environments were required to handle the most efficient retention of unstructured and structured data. Now with RainStor and Caringo, we believe we have solved the object of Big Data retention.


RainStor for Teradata: The Cost is Low but The Data is Big

Posted on September 29th, 2011

Big Data is simply a fact of life for most IT groups, and according to a McKinsey report published earlier this year, the average growth rate across enterprise applications is 40%, in stark contrast to the average growth of IT budgets, now hovering at 5%. More recent survey findings from DBTA (part of Unisphere Media), which polled over 600 respondents, indicate that among organizations managing 500TB or more of data, a majority report that the growth is a result of both business demand and a proliferation of data warehouse and business intelligence (BI) applications.
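To see how quickly those two rates diverge, here is a rough sketch. It assumes both figures are annual rates that compound steadily year over year, which is our simplification and not something the cited report is quoted as stating; `growth_gap` is a name made up for this example.

```python
def growth_gap(years: int, data_growth: float = 0.40, budget_growth: float = 0.05) -> float:
    """Ratio of data volume to IT budget after `years`, both normalized to 1.0 today.

    Assumes steady compound growth at the given annual rates (illustrative only).
    """
    if years < 0:
        raise ValueError("years must be >= 0")
    data = (1 + data_growth) ** years
    budget = (1 + budget_growth) ** years
    return data / budget


# How much more data each budget dollar must cover, year by year.
for year in range(6):
    print(year, round(growth_gap(year), 2))
```

Under these assumptions, each budget dollar has to cover a little over four times as much data after five years, which is why buying hardware at the same cost per terabyte cannot keep pace.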

Data warehouses are on the front lines of the Big Data trend and according to the survey findings: Many respondents report increasing issues in the performance of their applications as a result of data growth. However, many still look to hardware—additional server and storage systems—as the way to handle prolific, near-petabyte or multi-petabyte data.

The survey also goes on to point out that: As data grows, the reflex reaction by most organizations is to buy and install more disk storage. Close to one-third now embrace tiered storage strategies, and only one out of five is putting information lifecycle strategies into place to better and more cost-effectively manage their data.

Throwing more hardware at the Big Data problem is not the best approach. Aside from the obvious Opex costs, not to mention the damaging effects on the environment, there has to be a better way to corral the never-ending generation of more data – let’s be honest, it’s not going to slow down for quite some time.

Here at RainStor, we believe we know a little bit about corralling big data without compromising what the business needs, which is continued access to large volumes of historical data for better reporting and analysis. According to the same DBTA survey, 25% report that they hold on to data indefinitely. Data retention requirements are not going away, and Big Data is demanding some control.

As Teradata partners we are excited to announce that RainStor for Teradata forms the perfect complement to Teradata’s Data Warehouse by allowing even more data to be retained in support of compliance requirements or deeper historical analysis. Included with RainStor for Teradata is a FastConnect™ capability, which makes moving data bi-directionally between a Teradata database and RainStor achievable at the highest possible rates.

Data retained in RainStor benefits from high rates of compression that come from patented value and pattern de-duplication capabilities, and the data can be stored on your choice of low-cost storage platforms and environments. This significantly multiplies the amount of data that can be retained online in support of Teradata analytics. On an ongoing operational basis, the cost can be up to 10x less per TB. Of course, the tradeoff is that the data stored within RainStor is best suited for direct SQL access and viewing with standard BI tools. To perform more complex analytics, you can rapidly move data back into Teradata without missing a beat.

As an added bonus, the same RainStor database and infrastructure can act as a primary repository for large amounts of machine-generated data that does not require complex analytics. This can include communications call data records, logs and so on. RainStor’s high ingest rates, extreme compression, on-demand queryable access, scalability and low administration provide the most cost-efficient means of retaining and managing this type of data.

RainStor is sponsoring and exhibiting at the Teradata Partners conference next week, so if you would like to learn more about RainStor for Teradata, please reach out to us here or stop by our booth, #615.


5 Things You Need To Know Before You Retire … Your Apps

Posted on May 11th, 2011

Ramon Chen on 11 May 2011

The sooner you start saving and investing for retirement, the more money you may accumulate for your golden years. Ironically, a similar concept applies to your applications: the sooner you identify and shut down your old applications and retain the data elsewhere, the more you can save operationally and refocus your resources on things that will help drive your business. (more…)


A Small Price to Pay for Big (Machine-generated) Data Retention

Posted on March 8th, 2011

Ramon Chen on 08 March 2011

Big Data used to be mostly generated as a result of human-driven interaction (texting, online retail purchases, stock trades) but in this new age, more and more data is machine-generated (call data records, automated stock trades, smart meter sensors, security monitoring appliances, test and measurement devices). Machine Generated Data is widely expected to form the bulk of data growth into the future. (more…)


Big Data Requires Big Thinking

Posted on February 24th, 2011

John Bantleman on 24 February 2011

It has been well documented in the press that the age of Big Data (or Extreme Data as Gartner is coining it) is upon us. While a great amount of attention has been placed on so-called unstructured data (emails, videos, images etc), the urgent focus for most businesses is on structured or semi-structured big data, examples of which include call data records, sensor data from smart meters, automated trades, and logs and events from cyber security monitoring. (more…)


The Year Ahead – 2011 Big (but very real) Predictions

Posted on December 20th, 2010

Retention is on the Rise (the Data Kind)

The need for dedicated technology solutions to support compliant structured data will increase in 2011.

Many organizations in heavily regulated industries have already experienced pain managing large and growing data sets. Many have invested in storage compression technologies, which provide cost savings and benefits by physically compressing data at the byte or file-block level and this has certainly helped for unstructured big-data types such as documents, e-mail messages, images, and videos.

As organizations continue to retain critical, structured, transactional data in production environments far longer than legally required, their primary systems quickly bloat and require ongoing capacity planning to accommodate future growth. IT operations will attempt to stay on top of this problem by adding processing power to meet end-user performance and query-response times. (more…)


Enterprise Information Archiving: It’s Not Your Father’s Archiving Platform

Posted on November 9th, 2010

Ramon Chen on 09 November 2010

Gartner has published their “Magic Quadrant (MQ) for Enterprise Information Archiving (EIA)” – October 2010 (login required) as a direct replacement for their Email Active Archiving MQ which they have been publishing since 2002.

As stated in their report “…e-mail archiving is only one component of vendor’s overall solutions”. The report goes on to say that they are seeing significant interest through their enquiries towards archiving of multiple content types. Storage and e-discovery efficiency (single search across all content from one interface) are just a few of the benefits that customers are seeking from their archiving strategy. Rather than just e-mails and documents, other types of data such as instant messages, SMS and structured (database) data are starting to warrant compliant retention and on demand access. In a separate Gartner report titled “Enterprise Information Archiving Transforms the Strategy and Approach for Archiving” – June 2010 (login required), Gartner forecasts that EIA will become a key infrastructure component and will hold both structured data and unstructured content by 2013. (more…)


Sustainable Big Data

Posted on October 15th, 2010

Ramon Chen on 15 October 2010

I finally got round to watching the documentary The Corporation (2003) on DVD last night. If you haven’t seen it, it’s quite disturbing as it portrays a bleak picture of how large corporations generate great wealth, but can also cause great harm.

Environmental impact was of course one of the subjects reviewed in the film. Ray Anderson, founder and chairman of Interface Inc., the world’s largest manufacturer of modular carpet for commercial and residential applications, talked about the non-sustainability of the industrial revolution. He called for a strong drive towards industrial ecology and leading by example through a reduction of waste from his company’s manufacturing process. (more…)


101010 The Answer to Life, the Universe and Big Data

Posted on October 10th, 2010

Ramon Chen on 10 October 2010

October 10, 2010, a day represented as 10/10/10, raises awareness and brings back memories of many iconic events and topics. In the spirit of this day, here are 10: (more…)


The Big Data Odd Couple: Retention and Analytics

Posted on August 9th, 2010

Ramon Chen on 09 August 2010

I recently caught the 1968 movie The Odd Couple on TV. It starred Jack Lemmon and Walter Matthau, and was based on Neil Simon’s 1965 Broadway play. The movie and subsequent TV series feature a neurotic neat freak, Felix Ungar, rooming with his friend Oscar Madison, a messy sportswriter. As you may recall, Felix and Oscar get into many humorous situations that highlight their personality differences. But somehow, someway, they get along despite their different approaches. Big Data Retention and Analytics solutions are not unlike them: each has unique, differentiated capabilities, and the two can and should get along. (more…)


Make Your Applications Younger By Getting OLDR

Posted on July 8th, 2010

Ramon Chen on 08 July 2010

OLTP (Online Transaction Processing) databases and OLAP (Online Analytical Processing) tools and data warehouses are widely used in data management applications and IT infrastructures. But as George Crump of Storage Switzerland points out in his Information Week blog post “Keeping Data Forever vs. Data Retention”, the decision of what data to keep and when to purge it has always been a dilemma, and one that has yet to be resolved. (more…)


Simplification Through Specialization

Posted on July 6th, 2010

Ramon Chen on 06 July 2010

For all the advances the human race has delivered over countless centuries, one could argue that this is still a very complex world indeed. True, things can now be accomplished a lot faster, cheaper and more efficiently than ever before, but exactly how far have we come if it still takes an average person until the age of 21 to amass the skills needed to seek their fortune in the business world? Tack on additional MBA time or Doctor/Lawyer level focused education, plus on-the-job training, and all of a sudden you are pushing your late 20s before you feel that you’ve found your niche and “career”. (more…)


Diets, Discipline and Data

Posted on June 11th, 2010

Ramon Chen on 11 June 2010

According to the Wikipedia page on Weight Loss “Between $33 billion and $55 billion is spent annually on weight loss products and services, including medical procedures and pharmaceuticals, with weight loss centers garnering between 6 percent and 12 percent of total annual expenditure. About 70 percent of Americans’ dieting attempts are of a self-help nature. Although often short-lived, these diet fads are a positive trend for this sector as Americans ultimately turn to professionals to help them meet their weight loss goals.” (more…)


How to Stand Out from the Cloud

Posted on May 12th, 2010

Ramon Chen on 12 May 2010

Today EMC Corporation announced an expansion of its EMC Atmos cloud partner ecosystem to help customers manage and optimize external clouds as part of an overall private cloud strategy. RainStor participated in the early evaluation and feedback of the Atmos platform. Our close partnership with EMC has ensured that our ISV partners, who embed RainStor as part of their overall solution, immediately gain the ability to leverage Atmos as their platform of choice with little or no effort on their part. As Chuck Hollis, VP Global Marketing CTO eloquently articulated in today’s blog post Building The Atmos Storage Ecosystem Atmos is “… designed to deliver services, rather than simply storage. And if you’re a hardware type, you’re frustrated that the hardware of Atmos is relatively uninteresting—almost all of the value comes from software.” (more…)


The Other Side of the Big Data Problem

Posted on March 23rd, 2010

Ramon Chen on 23 March 2010

If you are unfamiliar with Big Data, may I suggest you use a pioneer of big data searches to find out more? Google the term “big data”, in quotes, and as of today March 21st, 2010 you’ll get the following: Results 1 – 10 of about 249,000 for “big data”. (0.24 seconds). Google again in a month and that number will have grown appreciably. This could be said to be a foundational Big Data example illustrating the use of a modern tool to analyze and present vast quantities of data in its most relevant form. (more…)



© Copyright 2011 RainStor Inc. All Rights Reserved.