Introduction to Apache HBase Architecture

Published on June 9, 2022

Apache Hadoop has gained popularity for storing, managing, and processing large volumes of data. However, HDFS cannot serve random, high-speed reads and writes, and files cannot be changed without being completely rewritten. HBase is a column-oriented NoSQL database on Hadoop that overcomes these shortcomings of HDFS by enabling fast random reads and writes in an optimized way. Relational databases also struggle to serve varied workloads as data grows exponentially; the HBase architecture offers the scalability and data distribution needed for efficient storage and retrieval.

HBase is a data store that provides quick random access to huge amounts of data. It is a column-family-oriented NoSQL (Not Only SQL) database built on top of the Hadoop Distributed File System, suitable for high-throughput reads and writes of big data with low I/O latency. HBase is used where performance and scalability matter: it handles growing load and performance demands simply by adding server nodes. This gives strong performance where consistency is important, and offers developers familiar with SQL systems a predictable distributed store.

Let’s have a look at SQL and NoSQL characteristics:

SQL: rigid schema, consistency, transactions (scans every row)

NoSQL: speed, flexibility, scaling (goes directly to the column); useful for bulk data. Common NoSQL data models are:

  • Column oriented
  • Document oriented
  • Key-value store
  • Graph oriented

Because HBase runs on Hadoop/HDFS, HDFS’s features also apply to HBase.

Features:

  • Fault tolerance
  • Replication
  • Random, real-time read/write access
  • High availability
  • Fast processing
  • Accessible through the Java API, a Thrift server, or REST

HBase suits large data volumes (terabytes or petabytes), and workloads where we don’t need RDBMS features such as transactions, complex queries, and complex joins.

Facebook, Adobe, Twitter, Yahoo, and others use HBase.

Data in HBase is organized into column families, and HBase follows a master-slave architecture:

Master (HMaster)

Slave (Region Server)

The process works as follows:

Data in an HBase table is divided into regions.

The default region size is 256 MB, and it is configurable.

When the first 256 MB region fills up, subsequent data is inserted into a new region.

Region size is configurable, but it is usually better to keep the 256 MB default; raising it substantially for large files can hurt performance.
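As a rough illustration of region splitting, here is a toy Python sketch (not real HBase code; the class and attribute names are hypothetical): when the current region reaches the configured maximum size, new data goes into a fresh region.

```python
REGION_MAX_BYTES = 256 * 1024 * 1024  # the configurable region size discussed above

class Region:
    def __init__(self, start_key):
        self.start_key = start_key  # first row key served by this region
        self.bytes_used = 0

class Table:
    def __init__(self):
        # a table starts with a single region covering the whole key space
        self.regions = [Region(start_key="")]

    def put(self, row_key, value_size):
        region = self.regions[-1]  # simplification: keys arrive in order
        if region.bytes_used + value_size > REGION_MAX_BYTES:
            # current region is full: subsequent data goes into a new region
            region = Region(start_key=row_key)
            self.regions.append(region)
        region.bytes_used += value_size
        return region
```

Real HBase splits a full region into two daughter regions rather than simply appending a new one, but the idea, that a table is partitioned into bounded-size regions, is the same.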

Each column family within a region contains:

  • MemStore
  • BlockCache
  • HFile

Write Operation:

WAL: Write-Ahead Log

When data is written to HBase, it is first recorded in the HLog (the Write-Ahead Log) and then placed in the MemStore.

The Write-Ahead Log is a file maintained by each Region Server; if data on a Region Server is lost, it can be recovered by replaying the Write-Ahead Log.

The MemStore, also called the write buffer, holds data in memory before it is written to the actual disk.

When the MemStore fills up, its data is flushed to disk and a new HFile is created.
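The write path above (WAL first, then MemStore, then flush to an HFile) can be sketched as toy Python (an illustration only; names are hypothetical, and real HBase flushes by MemStore size, not row count):

```python
FLUSH_THRESHOLD = 3  # toy value; real HBase flushes by size (e.g. 128 MB)

class RegionStore:
    def __init__(self):
        self.wal = []        # Write-Ahead Log: replayed during crash recovery
        self.memstore = {}   # in-memory write buffer
        self.hfiles = []     # immutable, sorted files on disk (HDFS)

    def put(self, row, value):
        self.wal.append((row, value))  # durability first: log before buffering
        self.memstore[row] = value
        if len(self.memstore) >= FLUSH_THRESHOLD:
            self.flush()

    def flush(self):
        # an HFile stores key-value pairs in sorted order
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore.clear()
```

Because every put lands in the WAL before it is acknowledged, anything sitting only in the MemStore at crash time can be rebuilt from the log.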

One table can contain multiple regions.

One Region Server can host multiple regions.

Each region contains one or more column families, and each column family uses two in-memory structures:

  • Read (BlockCache)

The BlockCache holds frequently read data in memory (RAM), so later requests for that data can be served quickly; because memory is limited, the least recently used data is evicted from the BlockCache.

  • Write (Memstore/WriteBuffer)

As described above, the MemStore (write buffer) holds data in memory before it reaches disk; when it fills up, the data is flushed and a new HFile is created.
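The least-recently-used eviction described for the BlockCache can be sketched with a small toy Python cache (an illustration, not HBase’s actual BlockCache implementation):

```python
from collections import OrderedDict

class BlockCache:
    """Toy LRU cache: hot blocks stay in RAM, the least recently used is evicted."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # insertion order doubles as recency order

    def get(self, key):
        if key not in self.blocks:
            return None              # cache miss: would fall back to the HFile
        self.blocks.move_to_end(key) # mark as most recently used
        return self.blocks[key]

    def put(self, key, block):
        self.blocks[key] = block
        self.blocks.move_to_end(key)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used
```

For example, with capacity 2, after caching blocks a and b, reading a, and then caching c, block b is the one evicted.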

A Region Server handles multiple regions.

HMaster

In HBase, the HMaster manages multiple Region Servers.

Create, delete, and update operations are performed through the HMaster.

The HMaster assigns regions to Region Servers.

Recovery, load balancing, and reassigning regions are done by the HMaster.

The HMaster also manages the recovery of a failed Region Server.

ZooKeeper

The HMaster and all Region Servers send heartbeat signals to ZooKeeper to show they are active and alive. If a Region Server crashes, it stops sending heartbeats, and ZooKeeper detects that the server has failed.
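Heartbeat-based failure detection can be sketched as toy Python (an illustration of the idea, not ZooKeeper’s actual session mechanism; all names are hypothetical):

```python
class HeartbeatMonitor:
    """Marks a server failed if its last heartbeat is older than the timeout."""

    def __init__(self, timeout):
        self.timeout = timeout    # seconds of silence before declaring failure
        self.last_seen = {}       # server name -> timestamp of last heartbeat

    def heartbeat(self, server, now):
        self.last_seen[server] = now

    def failed_servers(self, now):
        return [s for s, t in self.last_seen.items() if now - t > self.timeout]
```

A server that keeps heartbeating stays "alive"; one that goes silent past the timeout shows up in `failed_servers`, at which point the master can reassign its regions.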

In HBase there are:

  • an active HMaster (sends heartbeats to ZooKeeper)
  • one or more inactive (standby) HMasters

If the active HMaster fails, ZooKeeper promotes a standby HMaster to take its place.

ZooKeeper also tracks the root metadata server.

To handle read and write operations, HBase uses two catalog tables:

  • Root table (only one in the whole cluster)
  • Meta table (there can be more than one)

ZooKeeper keeps track of both tables.

Both tables are stored on Region Servers, and they record which region lives on which Region Server, and therefore which data lives where.

When we need to read data, the client first asks these catalog tables where the data is, and they return the location of the region. The Region Server then checks the MemStore, the BlockCache, and the HFiles; once the data is found, HBase returns it to the user.
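The read path on the Region Server can be sketched as toy Python (an illustration only, not HBase internals): check the in-memory MemStore first, then the BlockCache, then fall back to the on-disk HFiles, newest first.

```python
def read(row, memstore, blockcache, hfiles):
    """Toy read path: MemStore -> BlockCache -> HFiles (newest file first)."""
    if row in memstore:              # most recent, not-yet-flushed writes
        return memstore[row]
    if row in blockcache:            # hot data cached in RAM
        return blockcache[row]
    for hfile in reversed(hfiles):   # newer HFiles shadow older ones
        for key, value in hfile:
            if key == row:
                return value         # (a real read would also fill the cache)
    return None                      # row not found anywhere
```

Checking the MemStore first matters for correctness: the freshest value for a row may not have been flushed to any HFile yet.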

Compactions

When data is written to HBase, it is stored in HFiles, which are initially very small (kilobytes).

HBase is designed to update and delete data easily. If a file were large, finding it and performing the operation would be harder, so small files help: once we know which file contains our records, finding the data is easy.

But when the data grows very large (terabytes), HBase ends up with a huge number of small files that become difficult to manage. That is why the concept of compactions was introduced in HBase.

There are two types of compactions:

  • Major compaction: all the HFiles of a column family within a region are combined into a single HFile (for example, 4 HFiles become 1). Major compactions are typically run by an admin during off-peak hours.

  • Minor compaction: a few HFiles at a time are merged (for example, 4 HFiles combined two at a time into 2 HFiles). The framework performs minor compactions automatically; we don’t need to do anything beyond setting the criteria that trigger them.
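The core of a compaction, merging several sorted HFiles into one while keeping only the newest value per row, can be sketched as toy Python (a simplification: real compactions also stream the merge and apply delete markers and TTLs):

```python
def compact(hfiles):
    """Merge sorted HFiles (listed oldest -> newest) into one sorted HFile."""
    merged = {}
    for hfile in hfiles:
        for row, value in hfile:
            merged[row] = value      # a value from a newer file wins
    return sorted(merged.items())    # the result is again a sorted HFile
```

Passing every HFile of a column family models a major compaction; passing only a subset models a minor compaction.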

Conclusion: HBase is a distributed, column-oriented NoSQL database. Compared with batch tools such as Hadoop MapReduce or Hive, HBase performs better for small, random reads and writes. In this article, we looked at the HBase architecture and its important components.



© 2022 Cloudaeon All Rights Reserved.