If you know anything about Hadoop architecture, you will understand why the
task seemed daunting to us, and it proved to be one of the most challenging
engineering feats that we have accomplished so far.
After almost 24 months of development, tens of thousands of lines of Java,
Scala and C++ code, multiple design iterations, several releases and dozens
of benchmarks, we have a product that can deliver real-time performance to
Hadoop with only minimal integration and no ETL required. It is backed by
customer deployments that prove our performance claims and validate our
architecture.
Here's how we did it.
The Idea - In-Memory Hadoop Accelerator
Hadoop is based on two key technologies: HDFS for storing data, and MapReduce
for processing that data in parallel. Everything else in Hadoop itself and
the entire ecosystem coalesces around these two technologies.
Both - HDFS and MapReduce - were ...
Let's start at... the beginning. What is in-memory computing? Kirill
Sheynkman from RTP Ventures gave the following crisp definition, which I like:
"In-Memory Computing is based on a memory-first principle utilizing
high-performance, integrated, distributed main memory systems to compute and
transact on large-scale data sets in real-time - orders of magnitude faster
than traditional disk-based systems."
The most important part of this definition is "memory-first principle". Let
Memory-first principle (or architecture) refers to a...
In-Memory Technology Will Open the Doors to a Wave of Innovation
by Abe Kleinfeld and Nikita Ivanov
Gordon E. Moore's famously predicted tech explosion was prophetic, but it may
have hit a snag. While the number of transistors on integrated circuits has
doubled approximately every two years since his 1965 paper, the ability to
process and transact on data hasn't. We're now ingesting data faster than we
can make sense of it, leaving computing at an impasse. Without a new
approach, the innovation promised by the combination of Big Data and internet
scale may be like the flying car...
The Facts and Fiction of In-Memory Computing
In the last year, conversations about In-Memory Computing (IMC) have become
more and more prevalent in enterprise IT circles, especially with
organizations feeling the pressure to process massive quantities of data at
the speed that is now being demanded by the Internet. The hype around IMC is
justified: tasks that once took hours to execute are streamlined down to
seconds by moving the computation and data from disk directly to RAM.
Through this simple adjustment, analytics are happening in real time, and
applications (as well as th...
Today, we are proud to announce the first code drop of Apache Ignite, Apache
Ignite v1.0 RC (Release Candidate), available for download on the Apache
Ignite homepage. This is an exciting time for the project, and the committers
have been working hard since November to reach this milestone. We commend
them all. Apache Ignite v1.0 RC not only carries forward the capabilities
formerly available as the open source edition of the GridGain In-Memory Data
Fabric, but now also boasts new ease-of-use and automation features,
simplifying the deployment of an in-memory data fabric and allowi...