The curious case of ‘static’ in Java

Consider the following piece of code:

public class Counter {
    private static int count = 0; // Static variable shared by every Counter

    public Counter() {
        count++; // Increment count each time a new object is created
    }

    public static int getCount() { // Static method
        return count;
    }

    public static void main(String[] args) {
        Counter c1 = new Counter();
        Counter c2 = new Counter();
        Counter c3 = new Counter();
        // Accessing the static method through the class name
        System.out.println("Total count: " + Counter.getCount());
    }
}

What is going to be the output of this code? If your answer is the following, you are right.

Total count: 3

Now that you have seen a static variable in action, you may have a few questions, such as:

  1. What is a static variable? Where is it used?
  2. Where is this variable stored? How is it different from the normal instance variable?
  3. What are the restrictions on the usage of static variables or static methods?

Why do we need static variables?

Suppose you are a car company and have to assign an ID to every car you make. Ideally, the ID is a running counter: if the company has produced 2000 cars, the 2000th car should have the ID 2000.

How are you going to write the code for this?

You would need a variable that stores the total number of cars produced so far. If you make a car class as follows:

public class Car {
    private int numberOfCars = 0; // Instance field to track the number of cars
    private int id; // Instance variable for the car's ID
    private String model; // Instance variable for the car's model
    private String color; // Instance variable for the car's color

    // Constructor goes here...
}

And then, if you create an object of Car like the following:

Car car = new Car();

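The numberOfCars variable will have the value 0, because it is an instance variable: every new Car gets its own copy, initialized to 0, so no object can ever see how many cars were made before it.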

So, what should we do?

We need something that can hold the value of the number of cars produced and is not part of the Car object. Rather, it should be part of the Car class. So, its value can be increased regardless of whether there is an object or not.

Static variables are exactly for this purpose.

When you declare a variable as static, it means that all the instances of the class share the same variable. In other words, the value of the static variable is common to all objects of that class. For example, if you have a static int count variable in a class, changing its value in one object will affect the value in all other objects of that class.

Here’s the code that demonstrates the usage of static:

public class Car {
    private static int numberOfCars = 0; // Static field to track the number of cars
    private int id; // Instance variable for the car's ID
    private String model; // Instance variable for the car's model
    private String color; // Instance variable for the car's color

    public Car(String model, String color) {
        numberOfCars++; // Increment the number of cars
        id = numberOfCars; // Assign the current count as the car's ID
        this.model = model; // Assign the provided model
        this.color = color; // Assign the provided color
    }

    public int getId() {
        return id;
    }

    public String getModel() {
        return model;
    }

    public String getColor() {
        return color;
    }

    public static int getNumberOfCars() {
        return numberOfCars;
    }

    public static void main(String[] args) {
        Car car1 = new Car("Toyota Camry", "Silver");
        Car car2 = new Car("Honda Civic", "Red");
        Car car3 = new Car("Ford Mustang", "Blue");

        System.out.println("Car 1 ID: " + car1.getId());
        System.out.println("Car 1 Model: " + car1.getModel());
        System.out.println("Car 1 Color: " + car1.getColor());

        System.out.println("Car 2 ID: " + car2.getId());
        System.out.println("Car 2 Model: " + car2.getModel());
        System.out.println("Car 2 Color: " + car2.getColor());

        System.out.println("Car 3 ID: " + car3.getId());
        System.out.println("Car 3 Model: " + car3.getModel());
        System.out.println("Car 3 Color: " + car3.getColor());

        System.out.println("Total number of cars: " + Car.getNumberOfCars());
    }
}

The output of the code:

Car 1 ID: 1
Car 1 Model: Toyota Camry
Car 1 Color: Silver
Car 2 ID: 2
Car 2 Model: Honda Civic
Car 2 Color: Red
Car 3 ID: 3
Car 3 Model: Ford Mustang
Car 3 Color: Blue
Total number of cars: 3

Where is the static variable stored?

A simplified picture of JVM memory has three areas:

  1. Stack — holds each thread's method frames, with their local variables and reference variables (variables that hold the address of an object on the heap).
  2. Heap — holds every object created at runtime, together with its instance variables.
  3. Method area (sometimes loosely called the code segment) — holds per-class data such as the loaded bytecode of methods. Static members (variables or methods) are class members, so conceptually they live with the class rather than with any object.

Static variables are therefore often said to live in “static memory” or “class-level memory”: storage associated with the class itself rather than with individual objects (instances) of the class. This storage is not set up when the program starts. It is created when the Java Virtual Machine (JVM) loads the class, and the static variables receive their declared initial values when the class is initialized, which normally happens the first time the class is used.

The exact location varies with the JVM implementation and version. In current HotSpot JVMs, for example, class metadata lives in Metaspace, while the static field values themselves are stored on the heap as part of the java.lang.Class object of the class.

Since static variables are shared among all instances of a class, they are accessible from any object or method of that class, and from other classes as long as the access level (public, protected, or default) allows it. A static variable keeps its value until it is explicitly modified, the class is unloaded, or the program terminates.
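A small experiment can make the timing concrete. The sketch below is only an illustration: the names StaticInitDemo, Garage and events are made up for it. The static block records the moment the JVM initializes the class, and the log suggests that this happens once, at first use, and not once per object.

```java
import java.util.ArrayList;
import java.util.List;

class StaticInitDemo {
    // Shared log so we can observe the order of events
    static final List<String> events = new ArrayList<>();

    static class Garage {
        static int carsBuilt = 0; // One copy for the whole class

        static {
            // Runs once, when the JVM initializes Garage on first use
            events.add("Garage class initialized");
        }

        Garage() {
            carsBuilt++;
            events.add("object " + carsBuilt + " constructed");
        }
    }

    public static void main(String[] args) {
        events.add("main started");
        new Garage(); // First use triggers class initialization
        new Garage(); // The static block does not run again
        for (String event : events) {
            System.out.println(event);
        }
    }
}
```

On a fresh run the log reads: main started, Garage class initialized, object 1 constructed, object 2 constructed.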


Static Method

In Java, a static method is a method that belongs to the class itself, rather than to individual instances (objects) of the class. This means that you can call a static method directly on the class itself, without creating an object of the class.

public class MathUtils {
    public static int sum(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        int result = MathUtils.sum(5, 3); // Calling the static method directly on the class
        System.out.println("Sum: " + result);
    }
}


  • Instance methods can access instance variables and instance methods directly.
  • Instance methods can access class variables and class methods directly.
  • Static methods can access static variables and class methods directly.
  • Static methods cannot access instance variables or instance methods directly—they must use an object reference. Also, class methods cannot use the this keyword as there is no instance for this to refer to.

Static methods in Java cannot directly access instance variables because they are not associated with any specific instance (object) of the class. Instance variables are created and initialized when objects are created, and they hold unique values for each object.
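The rules above can be seen side by side in a small sketch. Thermostat and its members are hypothetical names chosen for this illustration. The instance method reads both kinds of fields directly, while the static method has no this and can only reach instance fields through the references it is given.

```java
class Thermostat {
    private static final double MAX_CELSIUS = 30.0; // Class-level, shared
    private final double target;                    // Per-object state

    Thermostat(double target) {
        this.target = target;
    }

    // Instance method: can read both instance and static members directly
    boolean isWithinLimit() {
        return target <= MAX_CELSIUS;
    }

    // Static method: there is no 'this', so the objects are passed in
    // explicitly and their fields are reached through those references
    static Thermostat warmer(Thermostat a, Thermostat b) {
        return a.target >= b.target ? a : b;
    }

    // static double broken() { return target; } // Would not compile

    double getTarget() {
        return target;
    }

    public static void main(String[] args) {
        Thermostat hall = new Thermostat(21.5);
        Thermostat sauna = new Thermostat(45.0);
        System.out.println(Thermostat.warmer(hall, sauna).getTarget()); // 45.0
        System.out.println(hall.isWithinLimit());  // true
        System.out.println(sauna.isWithinLimit()); // false
    }
}
```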

Best Practices

  1. Limited usage: Static variables and methods should be used sparingly and only when necessary. They should be reserved for functionality that is truly shared among all instances of a class or utility functions that don’t rely on instance-specific data. Overuse of static elements can lead to code that is tightly coupled, difficult to test, and lacks flexibility.
  2. Proper encapsulation: Even though static variables and methods can be accessed from anywhere, it’s good practice to limit their accessibility using access modifiers (e.g., private, protected, or package-private) to enforce encapsulation. This ensures that the static elements are only accessed and modified through appropriate methods, providing better control over their usage.
  3. Thread safety: Be cautious when working with shared static variables in a multi-threaded environment. If multiple threads are accessing and modifying the same static variable concurrently, synchronization mechanisms may be required to ensure thread safety. Alternatively, consider using thread-local variables or other strategies to avoid a shared mutable state.
  4. Avoid mutable static variables: It’s generally recommended to avoid mutable static variables as they can lead to unpredictable behaviour and make code harder to reason about. If a static variable needs to be modified, consider using thread-safe data structures or immutable objects to ensure proper concurrency and prevent unintended side effects.
  5. Refer to static members through the class name itself, as in Counter.getCount() or Car.getNumberOfCars(). This makes it clear that they are class members.
  6. You can also reach static members through an object reference, such as car1.getNumberOfCars(), but this is discouraged because it hides the fact that they belong to the class.
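As one possible way to combine the encapsulation and thread-safety points, here is a sketch of the car ID counter built on java.util.concurrent.atomic.AtomicInteger. The class CarIdGenerator and its method names are assumptions made for this example, not part of the Car class above.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CarIdGenerator {
    // Private, so other classes can only go through the methods below
    private static final AtomicInteger lastId = new AtomicInteger(0);

    // Atomically increments and returns the new value,
    // so no two callers can receive the same ID
    static int nextId() {
        return lastId.incrementAndGet();
    }

    static int issuedSoFar() {
        return lastId.get();
    }

    // Starts 'threads' workers that each request 'perThread' IDs, then waits for them
    static void requestConcurrently(int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    nextId();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
    }

    public static void main(String[] args) {
        requestConcurrently(4, 1000);
        System.out.println("IDs issued: " + issuedSoFar()); // 4000 on a fresh run
    }
}
```

With a plain static int and count++, the same experiment could lose updates, because an increment is a read followed by a write and two threads can interleave between them.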



Solving the Staircase Problem: How Many Ways Can You Climb?

Ready to solve a popular DSA problem? We are going to learn this problem called “climbing stairs”.

Problem Statement

You are faced with a staircase that has a certain number of steps, denoted by n. Each time, you can either climb 1 step or 2 steps. The goal is to figure out how many distinct ways you can reach the top of the staircase.

Visualise This Problem

At any step, we have only two choices:

a. We can take one step.

b. We can take two steps.

In other words, if I am at step 4, I could have come to step 4 only from step 3 or from step 2. There is no other way.

Let’s understand this in more detail.

Let’s consider an example where n = 4. We want to find the number of distinct ways to climb a staircase with 4 steps.

Step 0:

There is only one way to climb step 0, which is by not taking any steps.

Step 1:

Similarly, there is only one way to climb step 1, which is by taking one step.

Now, let’s move on to the remaining steps:

Step 2:

To reach step 2, we can either take two 1-step jumps or a single 2-step jump. Therefore, there are 2 distinct ways to reach step 2.

Step 3:

To reach step 3, we can either:

  • Start from step 2 and take a single 1-step jump.
  • Start from step 1 and take a single 2-step jump.

Adding the 2 ways to reach step 2 and the 1 way to reach step 1, there are 2 + 1 = 3 distinct ways to reach step 3.

Step 4:

To reach step 4, we can either:

  • Start from step 3 and take a single 1-step jump.
  • Start from step 2 and take a single 2-step jump.

Adding the 3 ways to reach step 3 and the 2 ways to reach step 2, there are 3 + 2 = 5 distinct ways to reach step 4.

By analyzing this example, you can observe a pattern emerging. The number of distinct ways to reach a particular step is the sum of the distinct ways to reach the previous two steps. This pattern continues for larger values of n as well. Using this observation, we can build a dynamic programming solution to calculate the number of distinct ways to climb to the top of the staircase for any given value of n.

But why dynamic programming?

Overlapping subproblems

The problem can be divided into smaller subproblems, and the solution to the larger problem can be built by combining the solutions to these subproblems. Moreover, the subproblems overlap, meaning the same subproblem would otherwise be solved multiple times.

In the staircase problem, the number of ways to climb to step i depends on the number of ways to reach steps i-1 and i-2. The subproblems overlap because a plain recursive solution computes the same values repeatedly: the count for step i-2, for example, is needed directly and then again while computing the count for step i-1.
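To get a feel for how much work the overlap causes, the following sketch counts recursive calls with and without a memo table. StairsCallCounter and its helper names are invented for this illustration; the recurrence is the same one used throughout this post.

```java
class StairsCallCounter {
    private static long calls; // How many times the current experiment recursed

    // Plain recursion: recomputes the same steps again and again
    private static int naive(int n) {
        calls++;
        if (n <= 1) return 1;
        return naive(n - 1) + naive(n - 2);
    }

    // Same recurrence, but each step is computed once and then reused
    private static int memoized(int n, int[] memo) {
        calls++;
        if (n <= 1) return 1;
        if (memo[n] != 0) return memo[n];
        memo[n] = memoized(n - 1, memo) + memoized(n - 2, memo);
        return memo[n];
    }

    static long countNaiveCalls(int n) {
        calls = 0;
        naive(n);
        return calls;
    }

    static long countMemoizedCalls(int n) {
        calls = 0;
        memoized(n, new int[n + 1]);
        return calls;
    }

    public static void main(String[] args) {
        System.out.println("naive:    " + countNaiveCalls(20));    // 21891
        System.out.println("memoized: " + countMemoizedCalls(20)); // 39
    }
}
```

For n = 20 the plain recursion makes 21891 calls, while the memoized version makes 39, because every step above the base cases is computed only once.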

Optimal substructure

The optimal solution to the problem can be constructed from optimal solutions to its subproblems. In other words, the optimal solution to the problem exhibits optimal solutions to its smaller subproblems.

In the staircase problem, finding the number of distinct ways to climb to the top is based on the optimal solutions for smaller steps. By combining the optimal solutions for smaller steps, we can derive the optimal solution for the larger problem.



We can break it down into subproblems and build a solution from the bottom up. Let’s consider the base cases first:

  • If there are 0 steps, there is only 1 way to climb to the top (by not taking any steps).
  • If there is 1 step, there is only 1 way to climb to the top (by taking one step).

Now, let’s consider the general case. If we are at step i, we can reach it from either step i-1 (by taking one step) or step i-2 (by taking two steps). Therefore, the total number of distinct ways to reach step i is the sum of the number of ways to reach steps i-1 and i-2. Using this recursive relation, we can build our solution iteratively. We start with the base cases and compute the number of ways to reach each step up to n.

In this example, when n is 4, there are 5 distinct ways to climb to the top of the staircase: (1, 1, 1, 1), (2, 1, 1), (1, 2, 1), (1, 1, 2), and (2, 2).

We can write mathematically:

f(n) = 1,                    if n = 0 or n = 1
f(n) = f(n-1) + f(n-2),      if n > 1
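The recurrence can also be evaluated bottom-up, starting from the base cases and keeping only the last two values. The sketch below is one way to do that. The class name StairsBottomUp is made up, and it assumes the result fits in an int, which holds up to n = 45.

```java
class StairsBottomUp {
    // Returns f(n), where f(0) = f(1) = 1 and f(n) = f(n-1) + f(n-2)
    static int climbStairs(int n) {
        if (n <= 1) {
            return 1; // Base cases
        }
        int twoBack = 1; // f(i-2), starts as f(0)
        int oneBack = 1; // f(i-1), starts as f(1)
        for (int i = 2; i <= n; i++) {
            int current = oneBack + twoBack; // f(i)
            twoBack = oneBack;
            oneBack = current;
        }
        return oneBack;
    }

    public static void main(String[] args) {
        System.out.println(climbStairs(4)); // 5
    }
}
```

This version runs in O(n) time and O(1) extra space, since it never needs more than the two previous counts.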



class Solution {
    public int climbStairs(int n) {
        int[] memo = new int[n + 1];
        // Start at step 0 and count the ways to reach step n
        return distinctWays(n, 0, memo);
    }

    // count is the step we are currently standing on
    public int distinctWays(int n, int count, int[] memo) {
        if (count > n) {
            return 0; // Overshot the top
        }
        if (count == n) {
            return 1; // Reached the top
        }
        if (memo[count] > 0) {
            return memo[count]; // Already computed
        }
        int res = distinctWays(n, count + 1, memo) + distinctWays(n, count + 2, memo);
        memo[count] = res;
        return res;
    }
}


class Solution(object):
    def climbStairsHelper(self, n, memo):
        if n == 0 or n == 1:
            return 1

        if memo[n] != 0:
            return memo[n]

        memo[n] = self.climbStairsHelper(n - 1, memo) + self.climbStairsHelper(n - 2, memo)
        return memo[n]

    def climbStairs(self, n):
        memo = [0] * (n + 1)
        return self.climbStairsHelper(n, memo)


#include <vector>

class Solution {
  int climbStairsHelper(int n, std::vector<int>& memo) {
    if (n == 0 || n == 1) {
      return 1;
    }

    if (memo[n] != 0) {
      return memo[n];
    }

    memo[n] = climbStairsHelper(n - 1, memo) + climbStairsHelper(n - 2, memo);
    return memo[n];
  }

 public:
  int climbStairs(int n) {
    std::vector<int> memo(n + 1, 0);
    return climbStairsHelper(n, memo);
  }
};



The best way to learn coding is to code (how non-trivial :P), so I would recommend that you attempt this problem on your own. In this post, we explored a beginner-friendly solution to the staircase problem using dynamic programming. By visualizing the staircase, breaking down the problem, and finding patterns, we were able to calculate the number of distinct ways to climb to the top. The dynamic programming approach helped us solve the problem efficiently and avoid redundant calculations.

Thank you!

People illustrations by Storyset


Understanding the Differences Between FLOAT and DECIMAL Data Types in MySQL: A Case Study on Approximation Errors

A few years ago, during dev testing, a fellow engineer complained that the total amount paid did not match in the API response. However, when we manually added up the values in the database, they summed to the correct amount.

What could have gone wrong?

As we went down the debugging path, we had the following few intuitions:

a. The error could have crept in while adding the values in application code.

b. The error could have been in the data type conversion from MySQL to Java.

c. The error could have been in the MySQL query.


When we ran the Hibernate-generated MySQL query from the MySQL command line, the value it returned differed from what we got by adding the rows manually. So the discrepancy was already present on the database side.

Let’s try to understand this through a hands-on example

I recommend that you do this exercise as you read along.

Let’s create a table called “Product” with two columns, “price_float” of type FLOAT and “price_decimal” of type DECIMAL:

CREATE TABLE Product (
  id INT,
  price_float FLOAT(10, 2),
  price_decimal DECIMAL(10, 2)
);

Now, let’s insert a row into the table with a price value of 162295.98:

INSERT INTO Product (id, price_float, price_decimal) VALUES (1, 162295.98, 162295.98);

If we select the values from the table, we can observe the difference:

SELECT price_float, price_decimal FROM Product;

The result would be:

+-------------+---------------+
| price_float | price_decimal |
+-------------+---------------+
|   162295.98 |     162295.98 |
+-------------+---------------+

In this example, notice that both “price_float” and “price_decimal” have the same value of 162295.98. However, when storing the value in the “price_float” column, there can be slight approximation due to the nature of the float data type.

To further illustrate this point, consider the following update:

UPDATE Product SET price_float = price_float + 0.01;
UPDATE Product SET price_decimal = price_decimal + 0.01;

If we select the values again, we will see the difference:

SELECT price_float, price_decimal FROM Product;

The result would be as follows:

+-------------+---------------+
| price_float | price_decimal |
+-------------+---------------+
|   162295.98 |     162295.99 |
+-------------+---------------+

Here, the “price_float” column has remained at 162295.98 due to floating-point approximation, while the “price_decimal” column, which uses the DECIMAL data type, changed to 162295.99.

Uh! Okay! Some simple English please.

What do we mean by floating point approximation?

The “price_float” column uses the FLOAT data type, which is a floating-point approximation. Floating-point numbers are represented in binary format and have limited precision. The FLOAT and DOUBLE types represent approximate numeric data values. MySQL uses four bytes for single-precision values and eight bytes for double-precision values. A four-byte FLOAT carries only about seven significant decimal digits, so the binary representation may introduce small rounding errors when storing decimal values.

In the example, the value 162295.98 already needs eight significant digits, more than a single-precision FLOAT can hold. The nearest value a FLOAT can represent is 162295.984375, and its neighbours are 0.015625 away on either side. The nearest FLOAT to 162295.99 is that same 162295.984375, so 162295.98 and 162295.99 are indistinguishable once stored, and adding 0.01 appears to do nothing. When many such values are summed, these small errors can accumulate and lead to a total that differs from the exact decimal sum.

On the other hand, the “price_decimal” column uses the DECIMAL data type. DECIMAL allows for exact decimal arithmetic because it stores the decimal digits themselves (packed into a binary format) rather than a binary approximation of the value. It does not suffer from the same approximation issues as the FLOAT data type. Therefore, the value stored in the “price_decimal” column is exact: 162295.98 + 0.01 is exactly 162295.99.
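The same effect can be reproduced outside the database. Java's float is a four-byte single-precision type, like MySQL's FLOAT, and java.math.BigDecimal plays the role of DECIMAL. The sketch below does not talk to MySQL at all; the class name FloatVsDecimal is made up, and it only illustrates the arithmetic.

```java
import java.math.BigDecimal;

class FloatVsDecimal {
    // Both literals round to the same single-precision value
    static boolean floatsCollide() {
        return 162295.98f == 162295.99f;
    }

    // The exact value that the float nearest to 162295.98 actually holds
    static String exactFloatValue() {
        return new BigDecimal(162295.98f).toString();
    }

    // Exact decimal arithmetic, built from strings to avoid any binary rounding
    static BigDecimal decimalSum() {
        return new BigDecimal("162295.98").add(new BigDecimal("0.01"));
    }

    public static void main(String[] args) {
        System.out.println(floatsCollide());   // true
        System.out.println(exactFloatValue()); // 162295.984375
        System.out.println(decimalSum());      // 162295.99
    }
}
```

Note that the BigDecimal values are created from strings. Passing a float or double literal to the constructor would carry the binary approximation into the decimal value.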

Bill Karwin is the author of the book on SQL antipatterns


  • Be cautious with equality comparisons: Due to the potential for rounding errors, it is generally not recommended to perform exact equality comparisons with float values. Instead, use range-based comparisons or define an acceptable tolerance level for comparisons. Don’t use float for currency.
  • Use decimal data types for precise calculations: If precise decimal calculations are critical, consider using DECIMAL data types instead of floats. DECIMAL data types store decimal values exactly and allow for precise arithmetic operations without the approximation issues of float types.
  • Handle conversions carefully: When converting between float and other data types, be cautious about the potential loss of precision. Keep in mind that converting to a lower precision data type may result in data loss or rounding errors.





Zero To One – Ready To Kickstart Your Software Engineering Career?

This course is designed for someone who has a basic understanding of coding. It aims to give you a flavour of a little bit of everything. There are also links to LeetCode problems that you should solve. The problems are selected to familiarise you with different data structures and algorithms.

Please register for the course using the form here:

Lecture 1: How to approach a low level design.

Why system design? Because to solve any industry problem in the tech domain, you need to understand system design.

Study Resources

Overview in slides

Lecture 2: Git

Git is a version control system. Heard of Linus Torvalds? He is the creator and principal developer of the Linux kernel (notice the similarity of the names Linus and Linux). Linus developed Git to help him with the development of Linux. Git has become so popular that it is now almost synonymous with version control system, or VCS for short.

Study Resources

Git for beginner

Git tutorial

Lecture 3: Database and JDBC

Now that you know about programming, it’s time to learn a bit about databases. SQL, or Structured Query Language, was initially developed by IBM, and JDBC is the API used to connect to SQL databases from Java.

Study Resources

Overview in slides.

Lecture 4: Web Server

An HTTP server is software that understands URLs (web addresses) and HTTP (the protocol your browser uses to view webpages).

Study Resources

Overview in slides.

Lecture 5: Docker

Docker allows developers to build, package, and deploy applications and services as lightweight, portable containers.

Study Resources

Overview in slides.

Lecture 6: Cloud

The cloud is the Internet—more specifically, it’s all of the things you can access remotely over the Internet.

Study Resources

Overview in slides.


Design a TTL based in-memory cache in golang


Caches are a data storage layer used to avoid repeating an operation that is expensive or hard to compute.

The following are examples of expensive operations:

  • Data which is to be fetched from database, resulting in network calls and usage of precious database resources.
  • Data which is calculated using the response from multiple API calls to different services.
  • Cryptographic operations which take a lot of computing resources.

Essentially, caches are like a water jug kept on the dining table. They will be replenished from tap water when they are empty. Having a jug reduces the effort of multiple people going to get water.

Similarly, as developers, we choose to store the result of an expensive operation in a cache and evict the data from the cache according to the specified eviction policy. The eviction policy could be of multiple kinds: LRU, LFU and time-based eviction.

We are going to talk about time-based eviction now. Simply put, we store the data for a specified duration and, once that time has elapsed, we expire the data. We will take inspiration from a wonderful yet simple library, patrickmn/go-cache, an in-memory key:value store/cache (similar to Memcached) for Go, suitable for single-machine applications. We will try to explain the code written there and build a simplified version of it.

Before we dive into the code, it’s good to see the cache in action. So, go ahead and use it in your code, or just look at the usage example in the patrickmn/go-cache repository.


Caches are usually a combination of two components:

  • Storage layer to store the key-value pair.
  • Eviction layer to remove the data after the eviction condition is met. In our case, we are going to use time-based eviction.

Now, we will see how the library has implemented the two components. We will start with the data structures.


type Item struct {
	Object     interface{}
	Expiration int64
}

Item is used to store the value part of a key-value pair. The Item struct has two fields. Object is an empty interface, which allows us to store any kind of value. Expiration stores the Unix time after which the key-value pair should no longer be visible.


type janitor struct {
	Interval time.Duration
	stop     chan bool
}

Think of janitor as a janitor in real life. The janitor struct has two fields. Interval is the time between the janitor’s periodic visits to clean up the storage, during which it purges the data that has expired. stop is a channel used to tell the janitor that cleaning is no longer needed.


type cache struct {
	defaultExpiration time.Duration
	items             map[string]Item
	mu                sync.RWMutex
	onEvicted         func(string, interface{})
	janitor           *janitor
}

The cache struct lies at the heart of this library. It stores the following fields:

  • defaultExpiration – Time after which the key-value (KV in short) pair will be expired if the key-value pair is not set with its own expiry time.
  • items is a map with string as key and Item as the value.
  • mu is a lock. Since a map is not thread-safe, we cannot guarantee the behaviour of a map under concurrent writes. The lock ensures that the map is not accessed by two different goroutines for writing at the same time. sync.RWMutex has two different kinds of lock: Lock() and RLock(). Lock() allows only one goroutine to read and write at a time. RLock() allows multiple goroutines to read at the same time, but not to write. Read more about it here – go – what is the difference between RLock() and Lock() in Golang? – Stack Overflow
  • onEvicted is a function passed by the user, which acts as a callback. This function is called whenever data is evicted. One use could be to call a method that replenishes the cache from the database, to prevent the data from getting stale.
  • janitor – janitor, as described above, is a cleaner to periodically purge data from the cache after a specified duration.


type Cache struct {
	*cache
}

A field declared with a type but no explicit field name is an anonymous field, also called an embedded field or an embedding of the type in the struct. An embedded type must be specified as a type name T or as a pointer to a non-interface type name *T, and T itself may not be a pointer type. The unqualified type name acts as the field name.

But the important question is: why does this struct contain nothing but a cache pointer? We will try to find the answer later.

Now that we have seen the important structs, let’s have a look at the important methods.


func New(defaultExpiration, cleanupInterval time.Duration) *Cache {
	items := make(map[string]Item)
	return newCacheWithJanitor(defaultExpiration, cleanupInterval, items)
}

This is the method used by developers to instantiate a new cache. It has two parameters:

  • defaultExpiration – default ttl of the KV pair, if it is not set with its own ttl.
  • cleanupInterval – time interval after which janitor purges the expired data.

The method returns a pointer to the cache. This method calls newCacheWithJanitor method.

func newCacheWithJanitor(de time.Duration, ci time.Duration, m map[string]Item) *Cache {
	c := newCache(de, m)
	C := &Cache{c}
	if ci > 0 {
		runJanitor(c, ci)
		runtime.SetFinalizer(C, stopJanitor)
	}
	return C
}

This function initialises c, a struct of type cache, and then wraps it in a Cache. The function runJanitor starts a goroutine that runs the janitor on c. In it, a Ticker is initialised with the cleanup interval, and a select waits on the ticker’s channel for a tick to be delivered after each interval. Every time a tick is delivered, the DeleteExpired method is called. Tickers are used when you want to do something repeatedly at regular intervals. They are part of the time package in Go’s standard library.

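func (j *janitor) Run(c *cache) {
	ticker := time.NewTicker(j.Interval)
	for {
		select {
		case <-ticker.C:
			// A tick arrived: purge everything that has expired
			c.DeleteExpired()
		case <-j.stop:
			// Cleaning is no longer needed: stop the ticker and exit
			ticker.Stop()
			return
		}
	}
}

func runJanitor(c *cache, ci time.Duration) {
	j := &janitor{
		Interval: ci,
		stop:     make(chan bool),
	}
	c.janitor = j
	go j.Run(c)
}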

Since the janitor is working in a goroutine on c, an object of the cache struct, c always remains reachable and will never be available for garbage collection. Hence, the Cache struct is designed to wrap cache as a field: the janitor goroutine holds no reference to the outer Cache, so the outer Cache can be garbage collected once the user drops it. runtime.SetFinalizer(C, stopJanitor) registers stopJanitor as the finalizer of C, so when C is about to be garbage collected, stopJanitor is called with C as its argument and signals the janitor to stop. After that, c can be collected too.


func (c *cache) Add(k string, x interface{}, d time.Duration) error {
	c.mu.Lock()
	_, found := c.get(k)
	if found {
		c.mu.Unlock()
		return fmt.Errorf("Item %s already exists", k)
	}
	c.set(k, x, d)
	c.mu.Unlock()
	return nil
}

func (c *cache) set(k string, x interface{}, d time.Duration) {
	var e int64
	if d == DefaultExpiration {
		d = c.defaultExpiration
	}
	if d > 0 {
		e = time.Now().Add(d).UnixNano()
	}
	c.items[k] = Item{
		Object:     x,
		Expiration: e,
	}
}

The Add method adds the key-value pair only if the key is not already stored or its entry has expired. The write lock, mu.Lock(), allows only one goroutine at a time to access the code block that it locks. This is important because the items map in cache is not thread-safe. The expiry time is stored as Unix time in nanoseconds.


func (c *cache) Get(k string) (interface{}, bool) {
	c.mu.RLock()
	item, found := c.items[k]
	if !found {
		c.mu.RUnlock()
		return nil, false
	}
	if item.Expiration > 0 {
		if time.Now().UnixNano() > item.Expiration {
			c.mu.RUnlock()
			return nil, false
		}
	}
	c.mu.RUnlock()
	return item.Object, true
}

The Get method is used to fetch the value for a given key, if it is available in the cache. Please notice that the Get method uses RLock(), as it allows multiple goroutines to enter the locked code block for reading, but not for writing.

A key-value pair is not removed immediately after its expiry time is over. The DeleteExpired() method runs periodically, once per purge interval, and deletes the expired key-value pairs. Until then, the Get method checks for expiry by comparing the current time with the expiry time.

func (c *cache) DeleteExpired() {
	var evictedItems []keyAndValue
	now := time.Now().UnixNano()
	c.mu.Lock()
	for k, v := range c.items {
		// "Inlining" of expired
		if v.Expiration > 0 && now > v.Expiration {
			ov, evicted := c.delete(k)
			if evicted {
				evictedItems = append(evictedItems, keyAndValue{k, ov})
			}
		}
	}
	c.mu.Unlock()
	for _, v := range evictedItems {
		c.onEvicted(v.key, v.value)
	}
}


Implement your own cache with time based eviction

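package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// The constant values are not shown in the post. These are assumed,
// mirroring go-cache's NoExpiration (-1) and DefaultExpiration (0).
const (
	INFINITY time.Duration = -1 // Never expire
	DEFAULT  time.Duration = 0  // Use the cache's default expiry duration
)

func main() {
	fmt.Println("Hello World")
	cache := New(10*time.Hour, 20*time.Minute)
	cache.Set("foo", "bar", 2*time.Minute)
	value, found := cache.Get("foo")
	if found {
		fmt.Println("Value is", value)
	}
}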

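type Data struct {
	Value    interface{}
	ExpireAt int64 // Unix time in nanoseconds; 0 means the entry never expires
}

type Cleaner struct {
	Interval time.Duration
	stop     chan bool
}

type cache struct {
	defaultExpiryDuration time.Duration
	kvstore               map[string]Data
	locker                sync.RWMutex
	cleaner               *Cleaner
	onRemoval             func(string, interface{})
}

// Cache wraps *cache so that the wrapper can be garbage collected
// while the cleaner goroutine still references the inner cache.
type Cache struct {
	*cache
}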

func New(defaultExpiryDuration time.Duration, cleanUpInterval time.Duration) *Cache {
	if defaultExpiryDuration == 0 {
		defaultExpiryDuration = INFINITY
	}

	c := &cache{
		defaultExpiryDuration: defaultExpiryDuration,
		kvstore:               make(map[string]Data),
	}

	C := &Cache{c}

	if cleanUpInterval > 0 {
		clean(cleanUpInterval, c)
		// Stop the cleaner goroutine once the outer Cache is garbage collected
		runtime.SetFinalizer(C, stopCleaning)
	}
	return C
}

func clean(cleanUpInterval time.Duration, cache *cache) {
	cleaner := &Cleaner{
		Interval: cleanUpInterval,
		stop:     make(chan bool),
	}

	cache.cleaner = cleaner
	go cleaner.Cleaning(cache)
}

func (c *Cleaner) Cleaning(cache *cache) {
	ticker := time.NewTicker(c.Interval)

	for {
		select {
		case <-ticker.C:
			// Periodic clean-up of expired entries
			cache.purge()
		case <-c.stop:
			ticker.Stop()
			return
		}
	}
}

func stopCleaning(cache *Cache) {
	cache.cleaner.stop <- true
}

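func (cache *cache) purge() {
	now := time.Now().UnixNano()
	cache.locker.Lock()
	defer cache.locker.Unlock()
	for key, data := range cache.kvstore {
		if data.ExpireAt > 0 && data.ExpireAt < now { // 0 means "never expires"
			delete(cache.kvstore, key)
		}
	}
}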

func (c *cache) Set(key string, value interface{}, expiryDuration time.Duration) {
	if expiryDuration == DEFAULT {
		expiryDuration = c.defaultExpiryDuration
	}
	var expireAt int64

	if expiryDuration > 0 {
		expireAt = time.Now().Add(expiryDuration).UnixNano()
	}
	// The map is not safe for concurrent writes, so take the write lock
	c.locker.Lock()
	defer c.locker.Unlock()
	c.kvstore[key] = Data{
		Value:    value,
		ExpireAt: expireAt,
	}
}

func (c *cache) Get(key string) (interface{}, bool) {
	c.locker.RLock()
	defer c.locker.RUnlock()
	data, found := c.kvstore[key]
	if !found {
		return nil, false
	}

	// ExpireAt == 0 means the entry never expires
	if data.ExpireAt > 0 && data.ExpireAt < time.Now().UnixNano() {
		return nil, false
	}

	return data.Value, true
}

Here’s a simplified version of a TTL-based cache. I recommend designing and implementing it yourself first.


Zero To One – Ready To Kickstart Your Software Engineering Career?

This course is designed for someone with little or no understanding of coding. It aims to give you a taste of a little bit of everything. There are also links to LeetCode problems that you should solve. These problems are selected to familiarize you with different data structures and algorithms.

Please register for the course using the form here:

Lecture 1 – Basic Java

Why Java? Because Java is an easy Object Oriented Programming (OOP) language that is used in many companies across the globe.

Study Resources

The following link consists of three slides. These slides introduce the reader to Java and fundamental datatypes. Once you complete it, please make sure that you attempt the problem statements in the lab given below.

Java Lecture Slides


The questions here have to be solved on leetcode. Leetcode is the platform commonly used to improve the data structure and algorithm skills. The intention behind these problems is to solve

Problem 1

Problem 2

Problem 3

Lecture 2 – Git

Git is a version control system. Heard of Linus Torvalds? He is the main developer of Linux (notice the similarity between the names Linus and Linux). Linus developed Git to help him with the development of Linux. Git has become so popular that it is now practically a synonym for version control system, or VCS for short.

Study Resources

Git for beginner

Git tutorial


1. Create a github account

2. Push a sample text file in a repository created in your github account

Why github? What is github?

GitHub is one of the largest websites for hosting software. It allows users to host their code using Git.

3. Problem 1

Lecture 3 – Java CRUD operation


Install Intellij


Hello World


Make a console-based application that supports CRUD (Create, Read, Update, Delete) operations for the e-commerce domain.

The application should support the following features:

  • Any user should be able to sign up, log in and log out.
  • Logged-in users should be able to browse products.
  • Logged-in users should have a shopping cart to which they can add multiple products.
  • Users should be able to check out, and the total payable should be displayed during checkout.
  • A user should have the following attributes: name, user id, address, date of birth.
  • A product should have the following attributes: name, product id, description, and price.
  • User and product information should be persisted in memory.
  • The console should have an option for each of the operations mentioned above.
  • Push it to your git repository on GitHub.

Lecture 4: SQL

Now that you know about programming, it’s time to learn a bit about databases. SQL, or Structured Query Language, was initially developed by IBM. It is used to query and manage data in an RDBMS.

Study Resources

SQL tutorial


Install MySQL

Make a database for school management system.

Problem statement

Lecture 5: REST APIs


What is REST API?

Best Practices

What is maven?


Learn Maven

Create your first REST API

Push code in your github repo.

Problem statement

Lecture 6: Form Submission

Study Resources



Modify the project done in week 5 and complete the tutorial given here.

Push the new code to the existing repository.


Lecture 7 : JPA

Study Resources

What is JPA?


Modify the project done in week 5 and complete the tutorial given here.

Push the new code to the existing repository.


Lecture 8: AWS

You have completed a web application on your local computer. It’s time to deploy it on the cloud. AWS allows you to use some of its services for free. We will use those.


Deploy the project of week 5 on AWS


Lecture 9: Load Balancer


What is nginx?


Use nginx in front of the application deployed in Lab 8 on AWS.


If you are stuck anywhere, feel free to comment.


Kafka – Everything that you should know before interview

Introduction to Apache Kafka Concepts

Apache Kafka – Set up your first Kafka Producer and Consumer

Kafka Internals: How does Kafka store the data?

Reliable Data Delivery In Kafka

Troubleshooting Under Replicated Kafka Partitions

Kafka Broker Metrics And Their Debugging

Kafka Monitoring Using JMX, Prometheus And Grafana

What Is Stream Processing?

What Is Stream Processing?

What is a data stream or event stream?

A data stream is an abstraction representing an unbounded dataset. Unbounded means that the data is infinite and keeps growing over time as new records are added. This definition says nothing about the data contained in the events or the number of events per second; both differ from system to system. Events can be tiny (sometimes only a few bytes) or very large (XML messages with many headers); they can also be completely unstructured, key-value pairs, semi-structured JSON, or structured Avro or Protobuf messages.

What are some examples of a data stream?

Every business transaction can be seen as a stream of events. Think about making a payment through your favourite mobile wallet app. We can summarise the business transaction as the following sequence of events:

  1. Open the app.
  2. Authenticate through your biometric details or enter the pass code.
  3. Scan the QR code or the wallet Id of the receiver.
  4. Enter the amount to be transferred.
  5. Enter the secure code.
  6. Get the payment confirmation screen.

Just like this, every other business transaction can be modelled as a sequence of events. Think of stock trades, package deliveries, network events going through a switch, events reported by sensors in manufacturing equipment, emails sent, moves in a game: all of these are essentially streams of events.

What are the properties of a data stream?

  1. Event streams are ordered – Events are ordered with respect to time. We can say with confidence that event X occurred after event Y. A business transaction where the first event is a $1000 credit and the second is a $1000 debit to a bank account is different from one where the debit occurs first and the credit follows. The second transaction involves overdraft charges, whereas the first is perfectly normal.
  2. Immutable data records – An event can never be modified after it has occurred. A cancelled financial transaction does not disappear. Rather, another event is recorded that cancels the previous transaction.
  3. Event streams are replayable – This is a desirable property. For the majority of business applications, it is critical to be able to replay a raw stream of events that occurred months (and sometimes years) earlier. This is required in order to correct errors, try new methods of analysis, or perform audits.
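The three properties above can be illustrated with a small Java sketch. It is only an illustration under stated assumptions: the class name `EventLogSketch`, its methods, and the use of signed amounts (credit positive, debit negative) are hypothetical choices, not part of any framework.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical append-only log of account events; credit > 0, debit < 0. */
final class EventLogSketch {
    private final List<Long> events = new ArrayList<>();

    /** Ordered: events are only ever appended, in the order they occur. */
    void append(long signedAmount) {
        events.add(signedAmount);
    }

    /** Immutable: a cancellation is a new event that offsets an earlier one. */
    void cancel(int index) {
        events.add(-events.get(index));
    }

    /** Replayable: a read-only view of the full history. */
    List<Long> history() {
        return Collections.unmodifiableList(events);
    }

    /** Replays a stream in order and counts how often the balance goes below zero. */
    static int countOverdrafts(List<Long> stream) {
        long balance = 0;
        int overdrafts = 0;
        for (long amount : stream) {
            balance += amount;
            if (balance < 0) {
                overdrafts++;
            }
        }
        return overdrafts;
    }

    public static void main(String[] args) {
        EventLogSketch creditFirst = new EventLogSketch();
        creditFirst.append(1000);
        creditFirst.append(-1000);

        EventLogSketch debitFirst = new EventLogSketch();
        debitFirst.append(-1000);
        debitFirst.append(1000);

        System.out.println(countOverdrafts(creditFirst.history())); // 0
        System.out.println(countOverdrafts(debitFirst.history()));  // 1
    }
}
```

Both logs contain the same two events, yet replaying them gives different results, which is why the order of events matters.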


Stream processing fills the gap between the request-response world, where we wait for events that take two milliseconds to process, and the batch-processing world, where data is processed once a day and takes many hours to complete. Many business processes do not need either request-response or batch processing. They need something that continuously reads data from an unbounded dataset, does something to it, and emits output, which can then be presented as a report to the end user or stored in a database as some business property. The processing has to be continuous and ongoing.

Stream-Processing Concepts

Stream processing is just like any other data processing, where we:

a. Get the data.

b. Do transformation on data.

c. Aggregate the data.

d. Store the data.

e. Present the data.

However, there are some key concepts which are useful for developing any stream application.


In the context of stream processing, having a common notion of time is critical because most stream applications perform operations on time windows. For example, our stream application might calculate a moving five-minute count of total orders placed. In that case, we need to know what to do when one of our data servers goes offline for two hours and returns with two hours’ worth of data: most of that data will be relevant for five-minute time windows that have long passed and for which the result was already calculated and stored.

Stream processing frameworks have the following notions of time:

Event time

This is the time when the event happened. For example, when any user visits our website, that time is the event time for that event. Event time is usually the time that matters most when processing stream data.

Processing time

This is the time at which a stream-processing application received the event in order to perform some calculation. This time can be milliseconds, hours, or days after the event occurred. This notion of time is highly unreliable and best avoided.
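To make the difference concrete, here is a minimal Java sketch that assigns orders to five-minute windows by event time, so an order that arrives late still lands in the window in which it happened. For simplicity it uses fixed (tumbling) windows rather than a moving window, and it assumes event times are non-negative epoch milliseconds. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: count orders per five-minute window using event time. */
final class EventTimeWindowSketch {
    static final long WINDOW_MILLIS = 5 * 60 * 1000L;

    /** Start of the five-minute window that contains the given event time. */
    static long windowStart(long eventTimeMillis) {
        return eventTimeMillis - (eventTimeMillis % WINDOW_MILLIS);
    }

    /** Counts events per window, keyed by window start; arrival order is irrelevant. */
    static Map<Long, Integer> countPerWindow(List<Long> eventTimesMillis) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long eventTime : eventTimesMillis) {
            counts.merge(windowStart(eventTime), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // The third order happened in the first window but arrived last.
        List<Long> eventTimes = List.of(60_000L, 400_000L, 120_000L);
        System.out.println(countPerWindow(eventTimes)); // {0=2, 300000=1}
    }
}
```

Had we used processing time instead, the late order would have been counted in whichever window was open when it arrived.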


Processing a single event may be easy, but stream processing usually involves more than that. It usually includes (but is not limited to) the following operations:

  • Counting the number of events by type
  • Moving averages over a 5-minute window
  • Joining two streams to create an enriched stream of information
  • Aggregating data over an hour
  • Sum, average, quantile over data

We call the information that is stored between events a state. State can be of the following two kinds:

Internal State

State that is accessible only by a specific instance of the stream-processing application. This state is usually maintained and managed with an embedded in-memory database running within the application, which makes it very fast. However, since it is in memory, we are limited by the amount of data it can store. As a consequence, stream processing is sometimes done by splitting the data into several sub-streams so that each can be processed using internal state.

External State

State that is kept in an external datastore such as Cassandra is called external state. Its advantages are virtually unlimited storage and data that is accessible from anywhere. However, being external means that we have to bear extra latency and the added complexity of another system.

Stream-Table Duality

A database table allows checking the state of the data at a specific point in time. Unlike tables, streams contain a history of changes. Streams are a string of events wherein each event caused a change. A table contains a current state of the world, which is the result of many changes.

Let’s assume that we are tracking the events of an ATM. The following events could happen:

  • Bank stores 10000$ in the ATM at 10:00 AM.
  • Person A withdraws 10K at 10:05 AM.
  • Person B withdraws 1K at 11:05 AM.
  • Person C withdraws 2K at 12:05 PM.
  • Person D withdraws 3K at 12:08 PM.
  • Person E withdraws 4K at 03:05 PM.

The database tells us the balance in the ATM at any point in time. The stream tells us how busy the ATM is and which hour is the busiest. Stream and table are two views of the same business transactions.
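The two views can be sketched in Java as two functions over the same list of events. This is a minimal illustration: the `AtmEvent` type is hypothetical, and the amounts in `main` are illustrative values chosen so that the withdrawals never exceed the deposit, not the figures listed above.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: one stream of ATM events, read as a table and as a stream. */
final class StreamTableSketch {
    /** Hour of day (0-23) and signed amount: deposit > 0, withdrawal < 0. */
    static final class AtmEvent {
        final int hour;
        final long amount;

        AtmEvent(int hour, long amount) {
            this.hour = hour;
            this.amount = amount;
        }
    }

    /** Table view: fold the whole stream into the current balance. */
    static long balance(List<AtmEvent> stream) {
        long total = 0;
        for (AtmEvent event : stream) {
            total += event.amount;
        }
        return total;
    }

    /** Stream view: hour with the most withdrawals, or -1 if there are none. */
    static int busiestHour(List<AtmEvent> stream) {
        Map<Integer, Integer> withdrawalsPerHour = new TreeMap<>();
        for (AtmEvent event : stream) {
            if (event.amount < 0) {
                withdrawalsPerHour.merge(event.hour, 1, Integer::sum);
            }
        }
        int bestHour = -1;
        int bestCount = 0;
        for (Map.Entry<Integer, Integer> entry : withdrawalsPerHour.entrySet()) {
            if (entry.getValue() > bestCount) { // earliest hour wins ties
                bestHour = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return bestHour;
    }

    public static void main(String[] args) {
        List<AtmEvent> stream = List.of(
                new AtmEvent(10, 10_000), // bank loads the ATM
                new AtmEvent(10, -1_000),
                new AtmEvent(11, -1_000),
                new AtmEvent(12, -2_000),
                new AtmEvent(12, -3_000));
        System.out.println(balance(stream));     // 3000
        System.out.println(busiestHour(stream)); // 12
    }
}
```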

Stream Processing Design Pattern

Single-Event Processing

Here, the stream-processing framework consumes a single message, does some data manipulation, and then writes the output to another stream. An example could be a framework that checks the fraud_probablity of each event and puts it into a stream that sends an email to the end user.

Processing with Local State

Most stream-processing applications are concerned with aggregating information, especially time-window aggregation. An example could be to find the busiest hour of the website in order to scale the infrastructure. These aggregations require maintaining a state for the stream. As in our example, in order to calculate the number of website hits in an hour, we need to keep a counter to keep track of website hits in moving window of one hour. This could be done in a local state.

Assume we want to find the hits on different pages of the website. We can partition the stream by page and then aggregate each partition using a local state.

However, local state must fit in memory, and it should be persisted so that we are able to recover the state if the infrastructure crashes.

Multiphase Processing

Multiphase processing incorporates the following phases:

  1. Aggregate data using a local state
  2. Publish the data into a new stream
  3. Aggregate the data using the new stream mentioned in phase 2.

This type of multiphase processing is very familiar to those who write map-reduce code, where you often have to resort to multiple reduce phases.
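The phases can be sketched in Java as a local count per partition followed by a merge of the partial results. This is a simplified, single-process illustration with hypothetical names; in a real deployment the partial counts would travel through an intermediate stream instead of an in-memory list.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of two-phase aggregation of page hits. */
final class MultiphaseSketch {
    /** Phase 1: each partition counts its own page hits using local state. */
    static Map<String, Integer> localCounts(List<String> pageHits) {
        Map<String, Integer> counts = new HashMap<>();
        for (String page : pageHits) {
            counts.merge(page, 1, Integer::sum);
        }
        return counts;
    }

    /** Phases 2 and 3: the published partial counts are merged into totals. */
    static Map<String, Integer> combine(List<Map<String, Integer>> partialCounts) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map<String, Integer> partial : partialCounts) {
            for (Map.Entry<String, Integer> entry : partial.entrySet()) {
                totals.merge(entry.getKey(), entry.getValue(), Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, Integer> partitionA = localCounts(List.of("home", "cart", "home"));
        Map<String, Integer> partitionB = localCounts(List.of("home", "checkout"));
        System.out.println(combine(List.of(partitionA, partitionB)));
        // {cart=1, checkout=1, home=3}
    }
}
```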

Stream-Table Join

Sometimes stream processing requires integration with data external to the stream: validating transactions against a set of rules stored in a database, or enriching clickstream information with data about the users who clicked.

Making an external database call for every event means not only extra latency but also additional load on the database. The other constraint is that, for the same sort of infrastructure, the number of events a streaming platform can process is an order of magnitude higher than what a database can handle. So this is clearly not a very scalable solution. Caching can be one strategy, but caching the data means we need to manage the cache infrastructure and the data lifecycle. For example, how would you make sure that the data is not stale? One solution is to stream the database changes and update the cache based on the data in that stream.
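A minimal Java sketch of that last idea: a local cache is kept fresh by applying change events from the database, and click events are enriched from the cache without calling the database. All names here (`EnrichmentSketch`, `onUserChange`, `enrich`) are hypothetical, and the change stream is simulated by plain method calls.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: enrich click events from a cache fed by a change stream. */
final class EnrichmentSketch {
    // Local copy of the user table: user id -> user name.
    private final Map<String, String> userNameById = new HashMap<>();

    /** Applies one change event from the database's change stream. */
    void onUserChange(String userId, String userName) {
        userNameById.put(userId, userName);
    }

    /** Enriches a click event using only the local cache. */
    String enrich(String userId, String page) {
        String userName = userNameById.getOrDefault(userId, "unknown");
        return page + " clicked by " + userName;
    }

    public static void main(String[] args) {
        EnrichmentSketch sketch = new EnrichmentSketch();
        sketch.onUserChange("u1", "Alice");
        System.out.println(sketch.enrich("u1", "/home")); // /home clicked by Alice

        // A later change event keeps the cache from going stale.
        sketch.onUserChange("u1", "Alicia");
        System.out.println(sketch.enrich("u1", "/cart")); // /cart clicked by Alicia
    }
}
```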

Streaming Join

Sometimes we want to join two event streams rather than a stream with a table. For example, let’s say that we have one stream with search queries that people entered into our website and another stream with clicks, which include clicks on search results. We want to match search queries with the results they clicked on so that we will know which result is most popular for which query. When you join two streams, you are joining the entire history, trying to match events in one stream with events in the other stream that have the same key and happened in the same time window. This is why a streaming join is also called a windowed join.

Joining two streams of events over a moving time window
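Here is a minimal Java sketch of such a join. It pairs a query with a click when both have the same key and the click happens within a fixed interval after the query. This is a simplification: it joins two finite lists with nested loops, whereas a real framework keeps windowed state per key. The `Event` type, the result labels, and the window length are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a windowed join between queries and clicks. */
final class WindowedJoinSketch {
    static final class Event {
        final String key;   // for example, a user id
        final long time;    // event time in milliseconds
        final String value; // query text or clicked result

        Event(String key, long time, String value) {
            this.key = key;
            this.time = time;
            this.value = value;
        }
    }

    /** Joins a click to a query with the same key if it follows within the window. */
    static List<String> join(List<Event> queries, List<Event> clicks, long windowMillis) {
        List<String> joined = new ArrayList<>();
        for (Event query : queries) {
            for (Event click : clicks) {
                boolean sameKey = query.key.equals(click.key);
                boolean inWindow = click.time >= query.time
                        && click.time - query.time <= windowMillis;
                if (sameKey && inWindow) {
                    joined.add(query.value + " -> " + click.value);
                }
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<Event> queries = List.of(
                new Event("u1", 1_000, "kafka"),
                new Event("u2", 2_000, "flink"));
        List<Event> clicks = List.of(
                new Event("u1", 3_000, "result-7"),
                new Event("u2", 10_000, "result-3")); // too late for a 5 s window
        System.out.println(join(queries, clicks, 5_000)); // [kafka -> result-7]
    }
}
```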

Out-of-Sequence Events

Handling events that arrive at the stream at the wrong time is a challenge not just in stream processing but also in traditional ETL systems. For example, the mobile device of an Uber driver loses signal for a few minutes and sends a few minutes’ worth of events when it reconnects.

In such a scenario, the framework has to do the following things:

a. Recognize that an event is out of sequence.

b. Define a time period during which it will attempt to reconcile. Outside the prescribed time period, the data will be considered useless.

c. Be able to update results which might mean updating a row in database.
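The three steps can be sketched in Java as follows. This is one possible reading, not how any particular framework does it: an event is treated as out of sequence when its event time is older than the newest event time seen so far, the reconciliation period is an allowed lateness measured against that newest time, and the stored result is an in-memory count per window. All names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: windowed counts that tolerate late events up to a limit. */
final class LateEventSketch {
    private final long windowMillis;
    private final long allowedLatenessMillis;
    private final Map<Long, Integer> countsByWindowStart = new TreeMap<>();
    private long maxEventTimeSeen = Long.MIN_VALUE;

    LateEventSketch(long windowMillis, long allowedLatenessMillis) {
        this.windowMillis = windowMillis;
        this.allowedLatenessMillis = allowedLatenessMillis;
    }

    /** Returns false if the event was too late and has been dropped. */
    boolean accept(long eventTimeMillis) {
        // (a) Recognize that the event is out of sequence.
        boolean outOfSequence = eventTimeMillis < maxEventTimeSeen;
        // (b) Outside the reconciliation period the event is discarded.
        if (outOfSequence && maxEventTimeSeen - eventTimeMillis > allowedLatenessMillis) {
            return false;
        }
        // (c) Update the stored result of the window the event belongs to.
        long windowStart = eventTimeMillis - (eventTimeMillis % windowMillis);
        countsByWindowStart.merge(windowStart, 1, Integer::sum);
        maxEventTimeSeen = Math.max(maxEventTimeSeen, eventTimeMillis);
        return true;
    }

    Map<Long, Integer> counts() {
        return countsByWindowStart;
    }

    public static void main(String[] args) {
        LateEventSketch sketch = new LateEventSketch(60_000, 120_000);
        System.out.println(sketch.accept(300_000)); // true
        System.out.println(sketch.accept(250_000)); // true, late but reconcilable
        System.out.println(sketch.accept(100_000)); // false, too late
        System.out.println(sketch.counts());        // {240000=1, 300000=1}
    }
}
```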

Common Stream Processing Frameworks

Apache Storm, Apache Spark Streaming, Apache Flink, Apache Samza.

Kafka also provides a stream-processing API (Kafka Streams).


Kafka – The Definitive Guide


Kafka Broker Metrics And Their Debugging

If you are new to Kafka, please read the first three posts of the series given below. Else dive in. 

Introduction to Kafka

Kafka Internals

Reliable Data Delivery in Kafka

Troubleshooting Under Replicated Kafka Partitions

If you are preparing for an interview, this post contains most of the things that you should know about Kafka.

Active Controller Count

The active controller is the broker in a Kafka cluster that is designated to do administrative tasks such as reassigning partitions. The active controller count metric tells us if the broker is the controller for the cluster or not. The value of this metric is either 0 or 1. This metric is emitted per broker.

What if two brokers say that they are the controller?

The metric is 1 on the broker that is currently the controller and 0 on every other broker. A Kafka cluster requires one broker to be the controller, and only one broker can be the controller at any given time, so the values across all brokers should add up to 1.

What should you do when more than one broker claims to be the controller?
This situation will affect the administrative tasks of the cluster. A first step could be to restart the brokers claiming to be the controller.
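As a small illustration, the check could be sketched in Java like this. It assumes the per-broker metric values have already been collected into a map from broker id to value; how they are collected (JMX, Prometheus, and so on) is outside the sketch, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: interpret active controller counts gathered from all brokers. */
final class ControllerCheckSketch {
    /** Broker ids that report themselves as the controller, in ascending order. */
    static List<Integer> claimants(Map<Integer, Integer> activeControllerCountByBroker) {
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, Integer> entry
                : new TreeMap<>(activeControllerCountByBroker).entrySet()) {
            if (entry.getValue() == 1) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    /** Exactly one claimant is healthy; zero or several need attention. */
    static String status(Map<Integer, Integer> activeControllerCountByBroker) {
        int count = claimants(activeControllerCountByBroker).size();
        if (count == 1) {
            return "OK";
        }
        return count == 0 ? "NO_CONTROLLER" : "MULTIPLE_CONTROLLERS";
    }

    public static void main(String[] args) {
        System.out.println(status(Map.of(1, 0, 2, 1, 3, 0)));    // OK
        System.out.println(status(Map.of(1, 1, 2, 1, 3, 0)));    // MULTIPLE_CONTROLLERS
        System.out.println(claimants(Map.of(1, 1, 2, 1, 3, 0))); // [1, 2]
    }
}
```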

Metric Name


Request Handler Idle Ratio

Following are the two thread pools used by Kafka to handle requests:

Network Handlers

These are responsible for reading and writing data to the clients across the network. This does not require significant processing, so network handlers don’t get exhausted easily.

Request Handlers

The request handler threads, however, are responsible for servicing the client request itself, which includes reading or writing the messages to disk. The request handler idle ratio metric indicates the percentage of time the request handlers are not in use. The lower this number, the more loaded the broker is. It is advisable to check the cluster for sizing or any other potential problem if the idle ratio goes lower than 20%.

Kafka uses purgatory to efficiently handle requests.
Read about purgatory here.

Metric Name


All Topics Bytes In

The all topics bytes in rate, expressed in bytes per second, is useful as a measurement of how much message traffic your brokers are receiving from producing clients. This is a good metric to trend over time to help you determine when you need to expand the cluster or do other growth-related work. It is also useful for evaluating if one broker in a cluster is receiving more traffic than the others, which would indicate that it is necessary to rebalance the partitions in the cluster.

Metric Name


All Topics Bytes Out

The all topics bytes out rate, similar to the bytes in rate, is another overall growth metric. In this case, the bytes out rate shows the rate at which consumers are reading messages out. The outbound bytes rate may scale differently than the inbound bytes rate.

The outbound bytes rate also includes the replica traffic. This means that if all of the topics are configured with a replication factor of 2, we will see a bytes out rate equal to the bytes in rate when there are no consumer clients.
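The arithmetic behind this can be sketched in Java. It is a rough cluster-wide estimate under simplifying assumptions that are ours, not the article’s: every topic has the same replication factor, all followers keep up, and each consumer group reads every message exactly once. The class and method names are hypothetical.

```java
/** Hypothetical sketch: rough estimate of cluster-wide bytes out from bytes in. */
final class BytesOutSketch {
    /**
     * Each message is sent out once per follower replica (replicationFactor - 1)
     * and once per consumer group that reads it.
     */
    static double expectedBytesOut(double bytesInPerSec, int replicationFactor, int consumerGroups) {
        int followerCopies = replicationFactor - 1;
        return bytesInPerSec * (followerCopies + consumerGroups);
    }

    public static void main(String[] args) {
        // Replication factor 2, no consumers: bytes out equals bytes in.
        System.out.println(expectedBytesOut(100.0, 2, 0)); // 100.0
        // Replication factor 3 and two consumer groups: four copies leave the leaders.
        System.out.println(expectedBytesOut(100.0, 3, 2)); // 400.0
    }
}
```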

Metric Name


Other Important Kafka Broker Metrics

  • All topics messages in – The messages in rate shows the number of individual messages, regardless of their size, produced per second. This is useful as a growth metric, as a different measure of producer traffic.
  • Partition count – The partition count for a broker generally doesn’t change that much, as it is the total number of partitions assigned to that broker. This includes every replica the broker has, regardless of whether it is a leader or follower for that partition.
  • Leader count – The leader count metric shows the number of partitions that the broker is currently the leader for. As with most other measurements in the brokers, this one should be generally even across the brokers in the cluster. Metric name: kafka.server:
  • Offline partitions – This measurement is only provided by the broker that is the controller for the cluster (all other brokers will report 0), and shows the number of partitions in the cluster that currently have no leader. Metric name: kafka.controller:

Kafka Broker Metrics


Kafka – The Definitive Guide by Neha Narkhede, Gwen Shapira & Todd Palino


Troubleshooting Under Replicated Kafka Partitions

There are two types of replica: leader replicas and follower replicas. Let’s say that there are three replicas of a partition. One of them is the leader. All requests from producers and consumers go to the leader in order to guarantee consistency. All replicas other than the leader are called followers. Followers do not serve any requests; their only task is to keep themselves up to date with the leader. One of the follower replicas becomes the leader if the leader fails. The process by which followers fetch data from the leader is called replication. Please go through the article here, if you wish to read more about Kafka internals.

Replication is the core of Kafka.

If replication is not happening and the leader fails, we will not have a follower that can cleanly be made the leader. That leads to inconsistency in data. So, how do we make sure that our partitions are not under-replicated?

Fortunately, Kafka exposes a JMX metric that tells us whether replication is behaving well. We can expose these Kafka metrics to Prometheus and set up an alert. This article briefly discusses the potential reasons for under-replicated partitions and how to debug and fix them.

JMX Metric Name


This is the one must-have metric to monitor. Provided by every broker in a cluster, it counts the partitions for which that broker is the leader and whose follower replicas are not caught up. This single measurement provides insight into a number of problems with the Kafka cluster, from a broker being down to resource exhaustion.

How to debug under replicated partitions?

Broker-level problems

If a broker is down, none of the replicas on it will be in sync with their leaders, and you will see a constant number of under-replicated partitions.

If the number of under-replicated partitions is not constant, or if all the brokers are up and running and you still see a constant count of under-replicated partitions, this typically indicates a performance issue in the cluster. If the under-replicated partitions are all on a single broker, then that broker is typically the problem. We can see the list of under-replicated partitions using the tool as shown in the picture below. In that list, broker 2 is common to the replica lists of all the under-replicated partitions. This means broker 2 is not working well.

Image taken from the book: Kafka – The Definitive Guide showing use of to find non-performing broker
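Spotting the common broker by eye works for a handful of partitions. For longer lists, the same idea can be sketched in Java as a set intersection over the replica lists. The replica lists in `main` are made-up values for illustration, and the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: find brokers present in every under-replicated partition. */
final class CommonBrokerSketch {
    /** Intersects the replica lists of all under-replicated partitions. */
    static Set<Integer> commonBrokers(List<List<Integer>> replicaLists) {
        Set<Integer> common = new TreeSet<>();
        if (replicaLists.isEmpty()) {
            return common;
        }
        common.addAll(replicaLists.get(0));
        for (List<Integer> replicas : replicaLists) {
            common.retainAll(replicas);
        }
        return common;
    }

    public static void main(String[] args) {
        // Replica assignments of three under-replicated partitions (illustrative).
        List<List<Integer>> replicaLists = List.of(
                List.of(5, 1, 2),
                List.of(3, 2, 4),
                List.of(2, 0, 1));
        System.out.println(commonBrokers(replicaLists)); // [2]
    }
}
```

An empty result suggests the problem is not confined to a single broker.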

Cluster-level problems

Unbalanced load

How do we define unbalanced load in a cluster? If any broker has dramatically more partitions, or significantly more bytes going in or out, than the other brokers in the cluster, then the load on the cluster can be considered unbalanced. To diagnose this problem, we need several metrics from the brokers in the cluster:

  • Leader partition count
  • All topics bytes in rate
  • All topics messages in rate
  • Partition count

In a balanced cluster, all of these metrics are approximately the same across the brokers.
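One way to turn "approximately the same" into a check is sketched below in Java. It flags brokers whose value for one metric deviates from the median across brokers by more than a tolerance. The use of the median, the tolerance value, and all names are our assumptions for illustration, not a rule from Kafka.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: flag brokers whose metric is far from the cluster median. */
final class BalanceCheckSketch {
    static double median(List<Double> values) {
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int mid = sorted.size() / 2;
        return sorted.size() % 2 == 1
                ? sorted.get(mid)
                : (sorted.get(mid - 1) + sorted.get(mid)) / 2.0;
    }

    /** Broker ids deviating from the median by more than tolerance (a fraction). */
    static List<Integer> outliers(Map<Integer, Double> valueByBroker, double tolerance) {
        List<Integer> result = new ArrayList<>();
        if (valueByBroker.isEmpty()) {
            return result;
        }
        double typical = median(new ArrayList<>(valueByBroker.values()));
        for (Map.Entry<Integer, Double> entry : new TreeMap<>(valueByBroker).entrySet()) {
            if (Math.abs(entry.getValue() - typical) > tolerance * typical) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Partition counts per broker (illustrative values).
        System.out.println(outliers(Map.of(1, 100.0, 2, 105.0, 3, 98.0), 0.2));  // []
        System.out.println(outliers(Map.of(1, 100.0, 2, 100.0, 3, 250.0), 0.2)); // [3]
    }
}
```

The same check can be run separately for each of the four metrics listed above.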

What if traffic is not balanced within the cluster and results in under replicated partitions?

We will need to move partitions from the heavily loaded brokers to the less heavily loaded brokers. This is done using the tool

Another common cluster performance issue is exceeding the capacity of the brokers to serve requests. Many possible resource deficits could slow things down; CPU, disk IO, and network throughput are some of them. Disk utilization is not one of them, as the brokers will operate properly right up until the disk is full, and then fail abruptly.

How do we diagnose a capacity problem?

There are many metrics you can track at the OS level, including:

  • Inbound network throughput
  • Outbound network throughput
  • Disk average wait time
  • Disk percent utilization
  • CPU utilization

Under-replicated partitions can be the result of exhausting any one of the resources listed above. It’s important to know that the broker replication process operates in exactly the same way that other Kafka clients do. If our cluster is having problems with replication, then our clients are most likely having problems with producing and consuming messages as well. It makes sense to develop a baseline for these metrics when our cluster is operating correctly and then set thresholds that indicate a developing problem long before we run out of capacity. We should also review the trend for these metrics as the traffic to our cluster increases over time. As far as Kafka broker metrics are concerned, the All Topics Bytes In Rate is a good guideline to show cluster usage.

Application-level problem

We should also check if there is another application running on the system that is consuming resources and putting pressure on the Kafka broker. This could be something that was installed to debug a problem, or it could be a process that is supposed to be running, like a monitoring agent, but is having problems. We can use the tools on our system, such as top, to identify if there is a process that is using more CPU or memory than expected.

There could also be a configuration problem that has crept into the broker or system configuration.

Hardware-level Problem

Hardware problems can be as obvious as a server that stops working, or they can be less obvious and show up as performance problems. The latter are usually soft failures that allow the system to keep running, but in a degraded mode. This could be a bad bit of memory, where the system has detected the problem and bypassed that segment (reducing the overall available memory). The same can happen with a CPU failure.

For problems such as these, we should use the facilities that our hardware provides, such as an intelligent platform management interface (IPMI), to monitor hardware health. When there’s an active problem, looking at the kernel ring buffer using dmesg will help us see log messages that are being written to the system console.

Disk failure
The more common type of hardware failure that leads to a performance degradation in Kafka is a disk failure. Apache Kafka is dependent on the disk for persistence of messages, and producer performance is directly tied to how fast our disks commit those writes. Any deviation in this will show up as problems with the performance of the producers and the replica fetchers. The latter is what leads to under-replicated partitions. As such, it is important to monitor the health of the disks at all times and address any problems quickly.

A single disk failure on a single broker can destroy the performance of an entire cluster. This is because the producer clients will connect to all brokers that lead partitions for a topic, and if you have followed best practices, those partitions will be evenly spread over the entire cluster. If one broker starts performing poorly and slowing down produce requests, this will cause back-pressure in the producers, slowing down requests to all brokers.

If you are new to Kafka, please read the first two posts of the series given below.

Introduction to Kafka

Kafka Internals

Setup Kafka Monitoring


Kafka: The Definitive Guide
