Not every business is willing to wait hours for updated analytics, and not every result needs to be refreshed instantly. A trade-off therefore has to be made between how quickly results are available and the effort and cost of generating them. In short, the trade-off is between cost and latency.
The Lambda big data architecture supports both batch analytics and speed (real-time) analytics. It also lets us merge the results of the two so that the customer gets the full value of the data. This design is called the Lambda architecture, and it is divided into three main layers:
- Batch layer
- Servicing layer
- Speed layer
Here is a quick introduction to what each of the three layers does.
The batch layer manages the master dataset, an immutable, append-only set of data, and precomputes arbitrary query functions over it to produce batch views.
The speed layer handles all requests that have low-latency requirements. It uses fast, incremental algorithms and deals with recent data only.
The servicing layer (more commonly called the serving layer) indexes the batch views so that they can be queried ad hoc and the results returned with low latency.
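To make the division of work concrete, here is a minimal sketch of the three layers in plain Python. It is an illustration only, not a production design: the page-view events, the `build_batch_view` and `query` functions, and the `SpeedLayer` class are all hypothetical names chosen for this example.

```python
from collections import Counter

# Hypothetical master dataset: (timestamp, page) events.
# It is append-only; existing records are never modified.
master_dataset = [(1, "home"), (2, "pricing"), (3, "home"), (4, "docs")]


def build_batch_view(events):
    """Batch layer: recompute the view from the full master dataset."""
    return Counter(page for _, page in events)


class SpeedLayer:
    """Speed layer: incrementally count events that arrived after the last batch run."""

    def __init__(self):
        self.realtime_view = Counter()

    def on_event(self, event):
        _, page = event
        self.realtime_view[page] += 1


def query(page, batch_view, realtime_view):
    """Servicing layer: answer a query by merging the batch and real-time views."""
    return batch_view[page] + realtime_view[page]


# The batch view is computed once over the data available at that time.
batch_view = build_batch_view(master_dataset)

# New events are appended to the master dataset and also sent to the speed layer.
speed = SpeedLayer()
for event in [(5, "home"), (6, "docs")]:
    master_dataset.append(event)
    speed.on_event(event)

print(query("home", batch_view, speed.realtime_view))  # 2 from batch + 1 from speed = 3
```

When the next batch run finishes, its view already includes the newer events, so the real-time view can be discarded and rebuilt from that point on.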
Each of these layers can be realized with various big data technologies. For instance, the batch layer's datasets can be stored in the Hadoop Distributed File System (HDFS), the distributed file system used within the Hadoop framework, while MapReduce on Hadoop can be used to create the batch views that are fed to the servicing layer.
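The way a MapReduce job turns raw records into a batch view can be sketched in plain Python. This is not Hadoop code; it only mimics the map, shuffle and reduce phases in a single process, and the tower records and function names are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical raw records from the master dataset: (tower_id, dropped_calls).
records = [("tower-1", 2), ("tower-2", 0), ("tower-1", 5), ("tower-2", 1)]


def map_phase(record):
    """Map: emit one (key, value) pair per input record."""
    tower_id, dropped_calls = record
    yield tower_id, dropped_calls


def shuffle(pairs):
    """Shuffle: group all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# The resulting batch view: total dropped calls per tower.
batch_view = run_job(records)
print(batch_view)  # {'tower-1': 7, 'tower-2': 1}
```

In a real cluster the map and reduce functions run in parallel on many nodes and the framework performs the shuffle, but the shape of the computation is the same.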
The servicing layer can be implemented with several technologies, especially NoSQL stores such as HBase or Cassandra, while querying can be implemented with technologies such as Apache Drill or Impala.
Finally, the speed layer can be realized with data streaming technologies such as Apache Storm and Spark Streaming. The number of technologies available in the big data ecosystem is very large. It is like a toolbox: it holds many tools, and the right tool should be used for the right purpose.
Use case: how Lambda architecture is implemented in real-world scenarios
The nodes of the cluster provide horizontal scalability along with massive storage and compute capacity. Data can be brought into the system in batch or in real time; in either case, the first thing to do with the data is store it. Once a large volume of historical data is available, data scientists can perform machine learning or predictive analytics on it, applying the right algorithm to the right data to extract a model or pattern. This model can then be persisted or cached in memory. A cached model can return predictions with very low latency, typically under a second.
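The idea of persisting a model and then serving it from memory can be sketched as follows. The "model" here is deliberately trivial (a mean and a standard deviation, standing in for whatever the data scientist actually trains), and the file name, the function names and the `k` threshold are assumptions made for the example.

```python
import json
import os
import statistics
import tempfile


def train_model(history):
    """Fit a stand-in 'model': the mean and standard deviation of past values."""
    return {"mean": statistics.mean(history), "stdev": statistics.pstdev(history)}


def save_model(model, path):
    """Persist the model so it survives restarts."""
    with open(path, "w") as f:
        json.dump(model, f)


_model_cache = {}


def load_model(path):
    """Read the model from disk once, then serve it from memory."""
    if path not in _model_cache:
        with open(path) as f:
            _model_cache[path] = json.load(f)
    return _model_cache[path]


def is_anomalous(value, model, k=3.0):
    """Flag values more than k standard deviations away from the mean."""
    return abs(value - model["mean"]) > k * model["stdev"]


# Hypothetical historical readings.
history = [10.0, 11.0, 9.0, 10.0, 10.0]
model_path = os.path.join(tempfile.mkdtemp(), "model.json")
save_model(train_model(history), model_path)

model = load_model(model_path)  # first call reads from disk
print(is_anomalous(10.5, model), is_anomalous(25.0, model))  # False True
```

Every later call to `load_model` with the same path returns the in-memory copy, so scoring a new value costs only a dictionary lookup and a little arithmetic.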
In this case, consider sensor data coming from telephone towers or heavy earth-moving vehicles. The data flows continuously; this is streaming data. When data is generated by these instruments, the first thing we need to do is persist it in storage. Once it is stored alongside a large volume of historical data, we can apply a machine learning algorithm suited to that dataset and generate a model of the patterns hidden within it.
Here we may be interested in the pattern that emerges before a telephone tower fails. It may be a rise or spike in the power consumed by the chilling unit, or an increase in call drops. The hidden pattern may involve any number of parameters; it can be identified from the historical data and stored as a complex function, that is, as a model.
If we feed the current streaming data into this model, it can detect whether the failure pattern is present in the stream and so predict whether a tower is about to fail. In other words, we predict using both the historical knowledge we have and what is happening right now.
This is where the three layers come together. The streaming data is the speed layer. The model generated from the historical data is the batch layer. A dashboard then raises an alarm for the user, or sends a notification to the management or administration team, saying that this particular telephone tower or earth-moving instrument is likely to fail. Together, the three layers add value for customers and vendors, and that is where the Lambda architecture comes into the picture.
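The tower example can be sketched end to end in a few lines of Python. This is a simplified illustration, not a real predictive model: instead of a machine learning algorithm it learns one threshold per feature from labelled history, and the readings, field names and function names are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    tower_id: str
    power_kw: float  # power drawn by the chilling unit
    call_drops: int  # dropped calls in the reporting interval


# Batch layer input: historical readings, labelled with whether a failure followed.
history = [
    (Reading("t1", 4.0, 1), False),
    (Reading("t1", 4.2, 0), False),
    (Reading("t2", 4.1, 2), False),
    (Reading("t2", 7.5, 9), True),
    (Reading("t3", 8.0, 12), True),
]


def train(history):
    """Batch layer: learn a threshold per feature, halfway between the
    highest normal value and the lowest pre-failure value."""

    def midpoint(feature):
        normal = max(feature(r) for r, failed in history if not failed)
        failing = min(feature(r) for r, failed in history if failed)
        return (normal + failing) / 2

    return {
        "power_kw": midpoint(lambda r: r.power_kw),
        "call_drops": midpoint(lambda r: r.call_drops),
    }


def predict_failure(model, reading):
    """Apply the stored model to one live reading."""
    return (reading.power_kw > model["power_kw"]
            and reading.call_drops > model["call_drops"])


def process_stream(model, stream):
    """Speed layer: score each reading as it arrives and yield alerts
    for the dashboard or notification system."""
    for reading in stream:
        if predict_failure(model, reading):
            yield f"ALERT: {reading.tower_id} may be about to fail"


model = train(history)
stream = [Reading("t4", 4.3, 1), Reading("t5", 7.9, 10), Reading("t6", 6.5, 3)]
alerts = list(process_stream(model, stream))
for alert in alerts:
    print(alert)  # ALERT: t5 may be about to fail
```

Only `t5` triggers an alert, because it exceeds both learned thresholds; `t6` shows raised power consumption but a normal number of call drops.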