If you know anything about Hadoop architecture, you can guess why the task
seemed daunting to us - and it proved to be one of the most challenging
engineering feats we have accomplished so far.
After almost 24 months of development, tens of thousands of lines of Java,
Scala and C++ code, multiple design iterations, several releases and dozens
of benchmarks, we have a product that can deliver real-time performance to
Hadoop with only minimal integration and no ETL required. It is backed by
customer deployments that prove our performance claims and validate our
architecture.
Here's how we did it.
The Idea - In-Memory Hadoop Accelerator
Hadoop is based on two key technologies: HDFS for storing data, and MapReduce
for processing that data in parallel. Everything else in Hadoop itself and
in the entire ecosystem coalesces around these two technologies.
Both - HDFS and MapReduce - were ... (more)
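To make the MapReduce half of that picture concrete, here is a minimal sketch of the classic word-count pattern, simulated in plain Java with no Hadoop dependency. The class and method names are illustrative only, not part of Hadoop's API:

```java
import java.util.*;

// A minimal sketch of the MapReduce word-count pattern, simulated
// in a single process. Hadoop runs the same three phases - map,
// shuffle, reduce - distributed across a cluster.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in every input line.
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    emitted.add(Map.entry(word, 1));
                }
            }
        }
        return emitted;
    }

    // Shuffle + reduce phase: group the emitted pairs by key and sum
    // each group's values. Hadoop does this across machines; here it
    // is a simple in-memory merge.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = List.of("to be or not to be");
        System.out.println(reduce(map(input))); // prints {be=2, not=1, or=1, to=2}
    }
}
```

The disk-bound part of real Hadoop is exactly the shuffle between those two phases, which is what an in-memory accelerator targets.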
What are the performance differences between in-memory columnar databases
like SAP HANA and GridGain's In-Memory Database (IMDB) utilizing distributed
key-value storage? This question comes up regularly in conversations with
our customers, and the answer is not obvious.
First off, let's clearly state that we are talking about the storage model
only and its implications on performance for various use cases. It's
important to note that the storage model neither dictates nor precludes
particular transactionality or consistency guarantees; there are columnar
databases tha... (more)
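The difference between the two storage models can be sketched in a few lines of Java. The example below uses a hypothetical "trades" table with a numeric price field; the names are illustrative, and this is not how SAP HANA or GridGain actually lay out memory:

```java
import java.util.*;

// Sketch contrasting a row-oriented key-value layout with a columnar
// layout, using an aggregate over a single field as the workload.
public class StorageModelSketch {

    // Key-value (row) layout: each key maps to a whole record, so an
    // aggregate over one field still touches every record.
    static double sumFieldFromRows(Map<Long, Map<String, Object>> rows, String field) {
        double sum = 0;
        for (Map<String, Object> record : rows.values()) {
            sum += (Double) record.get(field);
        }
        return sum;
    }

    // Columnar layout: each column lives in one contiguous array, so the
    // same aggregate is a sequential scan over just that column - the
    // cache-friendly access pattern that favors analytic queries.
    static double sumColumn(double[] column) {
        double sum = 0;
        for (double value : column) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        // The same two trades stored both ways.
        Map<Long, Map<String, Object>> rows = new HashMap<>();
        rows.put(1L, Map.of("symbol", "AAPL", "price", 150.0));
        rows.put(2L, Map.of("symbol", "MSFT", "price", 250.0));
        double[] priceColumn = {150.0, 250.0};

        System.out.println(sumFieldFromRows(rows, "price")); // prints 400.0
        System.out.println(sumColumn(priceColumn));          // prints 400.0
    }
}
```

Both calls return the same answer; the performance difference comes entirely from which bytes each layout forces the scan to walk through.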
Let's start at... the beginning. What is in-memory computing? Kirill
Sheynkman from RTP Ventures gave the following crisp definition, which I like:
"In-Memory Computing is based on a memory-first principle utilizing
high-performance, integrated, distributed main memory systems to compute and
transact on large-scale data sets in real-time - orders of magnitude faster
than traditional disk-based systems."
The most important part of this definition is the "memory-first principle".
Let me explain. The memory-first principle (or architecture) refers to a... (more)