Monday, 21 October 2013

BangDB vs LevelDB - Performance Comparison

This post is about performance comparison for BangDB vs LevelDB. Following are high level overview of the dbs.

LevelDB

LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values. Leveldb is based on LSM (Log-Structured Merge-Tree) and uses SSTable and MemTable for the database implementation. It's written in C++ and availabe under BSD license. LevelDB treats key and value as arbitrary byte arrays and stores keys in ordered fashion. It uses snappy compression for the data compression. Write and Read are concurrent for the db, but write performs best with single thread whereas Read scales with number of cores

BangDB

BangDB is a high performance multi-flavored distributed transactional nosql database for key value store. It's written in C++ and available under BSD license. BangDB treats key and value as arbitrary byte arrays and stores keys in both ordered fashion using BTREE and un-ordered way using HASH. Write, Read are concurrent and scales well with the number of cores. BangDB used here is the embedded version as LevelDB is also an embedded db, but BangDB is also available in other flavors like client/server, clustered and Data Fabric(upcoming)

Following commodity machine ($400 commodity hardware) used for the test;
  • Model: 4 CPU cores, Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz, 64bit
  • CPU cache : 6MB
  • OS : Linux, 3.2.0-54-generic, Ubuntu, x86_64
  • RAM : 8GB
  • Disk : 500GB, 7200 RPM, 16MB cache
  • File System: ext4
Following are the configuration that are kept constant throughout the analysis (unless restated before the test)
  • Assertion: OFF
  • Compression for LevelDB: ON
  • Write and Read: Sequential and Random as mentioned before the test
  • Access method: Tree/Btree
  • Key Size: 10 bytes
  • Value Size: 16 bytes
The tests are designed to cover following;

Performance of the dbs for
  1. sequential write and read for 100M keys using 1 thread
  2. sequential write and read for 75M keys using 4 threads
  3. random write and read for 75M keys using 1 thread
  4. random write and read for 75M keys using 4 threads
  5. sequential write and read for 1 Billion keys using 1 thread

Friday, 16 August 2013

Redis vs BangDB - Performance Comparison

This post is not related to our series of posts on "distributed computing". I have digressed a bit and since I released the BangDB as master -slaves config cluster hence thought of doing a simple performance comparison with the very popular db redis. This post is about a simple performance comparison of Redis and BangDB (server)

Redis: ( http://redis.io/topics/introduction )

Redis is an open source, BSD licensed, advanced key value store. It is also referred to as a data structure server since key can contain strings, hashes, lists, and sorted sets

In order to achieve its outstanding performance, Redis works with an in-memory dataset. Depending on the use case one can persist it either by dumping the data set to disk every once in a while, or by appending each command  to a log

Redis also supports trivial-to-setup master-slave replication, with very fast non-blocking first synchronization, auto reconnection. Other features include Transaction, Pub/Sub, Lua  scripting, Keys with limited time-to-live, and configuration to make Redis behave like a cache

BangDB: ( www.iqlect.com )

BangDB is multi flavored, BSD licensed, key value store. The goal of BangDB is to be fast, reliable, robust, scalable and easy to use data store for various data management services required by applications

BangDB is transactional key value store which supports full ACID by implementing optimistic concurrency control with parallel verification for high performance and concurrency. BangDB implements it's own buffer pool, write ahead log with crash recovery and provides users with many configuration to control the execution environment including the memory budget

BangDB works as embedded, stand alone server and cluster db. It's very simple to set up master-slave configuration, with high performant non-blocking slave synchronization without ever bringing the server standstill or down

Sunday, 26 May 2013

Model of distributed system

Post #2 of the distributed computing discussion series 


Previous post gave the high level introduction of the distributed system. In this post we will discuss about model of distributed system.

Once defined, model will help us understand many features and flavours of distributed computing challenges and put them in perspective and allow us to formulate workarounds or solutions to solve or overcome those challenges. These models will be used throughout in our next set of blogs related to the topic. Hence it is important to focus on the subject and understand clearly.

How do we visualize a distributed system?

Here is how I would describe a distributed system in simple terms. 
  • message passing system - nodes interact by sending/receiving messages
  • loosely coupled - no upper bound on message arrival time
  • no shared memory - all nodes have their own private memory
  • no global clock - clocks of different nodes can't be synchronized globally
  • a graph topology - consisting of processes as graph nodes and channels for message passing as edges (directional)
  • ordering of messages are not assumed in channels 
A simple figure to describe what I wrote above;


Monday, 6 May 2013

Distributed System - a high level introduction


Post #1 of the distributed computing discussion series

This is the beginning of the series of articles/blogs on the distributed computing. I will try to first put forward some of the basic concepts of the distributed computing and then take up some of the related problems and dig deeper. In the process I would also dedicate few blogs on existing products/systems which are relevant to the discussion and try to explain why few things are done in some manners or how someone has solved or overcome a problem.

This post is to quickly give you the introduction on the distributed computing from high level as it will be referred at many places in future blogs. Please note that this subject is too vast to cover in a single blog hence I will try to focus on stuff important from practical design and implementation perspective.

Distributed System

For simplicity and for the discussion sake lets define distributed system as a collection of multiple processors (programs, computers etc...) connected by a communication network. These connected processors try to achieve a common goal by communicating with each other using messages that are sent over the network.

Monday, 15 April 2013

Welcome to Grid Quorum

A quorum is the minimum number of votes that a distributed transaction has to obtain in order to be allowed to perform an operation in a distributed scenario. A quorum-based technique is implemented to enforce consistent operation in a distributed system.

Typically mutual exclusion is sought to avoid conflict and in distributed system this becomes a rather complex job. Timestamps, Token-based algorithms are the other ways of doing the same, but they are vulnerable to failures, whereas quorum based algorithms don't have single point of failure. In the timestamps based algorithm, the process seeks permissions from all other participating processes to enter a critical section. In the token based algorithm the permission is required from one process. The quorum based algorithm, however, seeks permission from a subset of processes called request set.

There are three metrics to compare various quorum systems;