Monday 4 January 2016

Getting started with Apache Mesos


Often, while running multiple applications or frameworks in a cluster, we face problems of isolating different services or resource conflicts between these applications. To solve these problems and to isolate the frameworks from each other, we now have the Docker technology which uses containerization and enables resource sharing.

You may ask "why do we need sharing if we could run each of the applications in a separate cluster". Suppose you have two applications, Spark and Storm, running on two different clusters. Both these applications are using the same dataset. For both of these applications to be able to work, the dataset has to be replicated, since the applications operate on different clusters. Replicating large datasets across clusters could prove to be too costly. Thus, multiplexing a cluster between frameworks helps to improve the cluster utilization and avoids per-framework data replication.

Now, how does Apache Mesos fit in? Apache Mesos is a platform that was built for sharing commodity clusters between multiple diverse cluster computing frameworks, such as Hadoop and MPI. Mesos uses containerization technology, such as Docker and Linux Containers (LXC), to accomplish this. However, Mesos provides much more than that—it provides real-time APIs for interacting with and developing for the cluster.


Mesos is a system that allows us to harness all of the different machines in your cluster or data center and treat them as a single logical entity. It allows us to treat our cluster as a single, large group of resources. In technical terms, Mesos is an orchestration platform for managing CPU, memory, and other resources across a cluster. 

Now, why would you want to use it? Let's find out.

Companies use Mesos because of its reliability, flexibility, and efficiency. Mesos clusters require very few administrators to ensure that even large clusters remain operational. Twitter has just three operators managing its tens-of-thousands-of-node Mesos clusters. This might very well tempt you to use Mesos. We could stop focusing on individual machines, either for developing our applications or 
deploying them. Mesos increases our agility for testing and deploying new distributed systems. Even organizations that only use one framework can use Mesos to run multiple instances of that framework in the same cluster, or multiple versions of the framework.

Comparison to traditional deployment tools:

Deployment tools like Chef and Puppet can now be relegated to basic roles in which they bootstrap Mesos. These tools require the lists of tasks that should be performed on the various machines in cluster. Groups of tasks can be combined to describe a "role" or "recipe", which represents a higher-level concept, like setting up a database or web server. Finally, these groups of tasks are applied to a set of hosts, which then configures them according to the specification.

These deployment tools allow for configuration to be written in a structured, understandable form and are fundamentally designed to make a fleet of machines conform to a static configuration. However, they suffer from a fundamental limitation. What if we want our cluster to dynamically reallocate resources or shift configurations depending on various external factors, such as the current load? 

This is where Mesos frameworks are more powerful than static descriptions used by more traditional deployment systems. In Mesos, each framework is essentially a role or recipe where the framework contains all the logic needed to install, start, monitor, and use a given application. Since the framework is a live, running program, it can make decisions on the fly, reacting to changing workloads and cluster conditions. For instance, a framework could observe how many resources seem to be available, and change its execution strategy to better perform given the current cluster conditions. Because the deployment system runs constantly, it can notice and adapt to failures in real time, automatically starting up new servers as previous ones fail.

So, how does Mesos dynamically reallocate the resources? The answer is simple, by using cgroups. 

Control groups are a way to group Linux processes together in order to do resource accounting and allocation. Processes on a Linux system belong to a node in the cgroup hierarchy. Each node in the hierarchy has a certain share of the resources, and each child of a node has some subset of that node’s resources. Through these resource hierarchies, anything from CPU shares and memory to network and disk I/O bandwidth can be accounted for and precisely shared. This sharing and accounting is what guarantees that one task can’t starve another of resources. When the Mesos slave uses cgroups, it actually dynamically creates and destroys nodes in the hierarchy to match the allocations that the master calculates.

Note: Every Mesos slave should have cgroups enabled, because they provide the CPU and memory isolation that ultimately makes Mesos so reliable.

In Mesos, the distributed systems are assumed to have a central controller and many workers. Also, the workers work independently from the controllers so that they're not bottlenecked by it and don’t rely on it for their availability.

Note: Successful scalable systems like Apache Hadoop, Storm, Hive have some sort of central controller that organizes their workers.

Distributed applications that run on Mesos are called frameworks. A framework has two parts: the controller portion, which is called the scheduler, and the worker portion, which are called the executors. Mesos coordinates and collaborates with frameworks to allocate slices of machines in the Mesos cluster to various roles. The capability of brokering the division of machines between many frameworks is how Mesos enables greater operational efficiency.

Mesos Architecture with two running frameworks (Hadoop and MPI)

To run a framework on a Mesos cluster, we must run its scheduler. A scheduler is a process that can speak the Mesos protocol. When a scheduler first starts up, it connects to the Mesos cluster so that it can use the cluster’s resources. As the scheduler runs, it makes requests to Mesos to launch executors as it sees fit. When a scheduler wants to do some work, it launches an executor. The executor is simply the scheduler’s worker. The scheduler then decides to send one or more tasks to the executor, which will work on those tasks independently, sending status updates to the scheduler until the tasks are complete.

Schedulers can launch tasks based on resource availability, changing workloads, or external triggers. 
This complete flexibility for the scheduler is what makes Mesos so powerful. 

A Mesos cluster itself is comprised of two components: the Mesos masters and the Mesos slaves. The masters are the software that coordinates the cluster; the slaves are what execute code in containers. Don't confuse yourself between the Mesos cluster components and the framework components. 

Mesos uses the concept of resource offers to implement fine-grained sharing across frameworks. A resource offer is a list of free resources on a slave. The frameworks' scheduler registers with the Mesos master to be offered resources and an executor process is launched on slave nodes to run the framework's tasks. The Mesos master determines how many resources to offer to each framework, while the frameworks’ schedulers select which of the offered resources to use. When a framework accepts offered resources, it passes Mesos a description of the tasks it wants to launch on them. 

Mesos does not require frameworks to specify their resource requirements or constraints. Instead, Mesos gives frameworks the ability to reject offers. A framework can reject resources that do not satisfy its constraints in order to wait for ones that do. Thus, the rejection mechanism enables frameworks to support arbitrarily complex resource constraints while keeping Mesos simple and scalable.

Since all the frameworks depend on the Mesos master, it is critical to make the master fault-tolerant. To achieve this, we run multiple masters in a hot-standby configuration using ZooKeeper for leader election. When the active master fails, the slaves and schedulers connect to the next elected master and repopulate its state. The master’s state consists of the list of active slaves, active frameworks, and running tasks. A new master can reconstruct its internal state from the information held by the slaves and the framework schedulers.

Lot of theory??? I know, but it's important to understand how Mesos works before practically trying it out.

Now, let's deploy Apache Mesos on a single node. We will be installing Mesos on Ubuntu 15.04. But the following procedure should work for Ubuntu 12.04 and above.

First we update the packages as follows:


Next, we install a few utility tools that will be required going ahead


At the time of writing this post, the latest stable release for Apache Mesos is 0.26.0. So let's first download this stable release of Mesos as a tar file using wget utility as follows:


Next, we untar the tar file as follows



Next, we install the latest OpenJDK as follows:


Installing Mesos also requires installing some of its dependencies. So let's install them all using the following apt-get command.


Next, we change our working directory to the mesos directory and try to configure and build. 


While the build is working on, some of you might encounter this error. The error message here is quire clear-- the libz library is missing and is required by Mesos to continue the build process.


In order to solve this, we simply install the libz-dev package as follows:


Then, we again try to build the Mesos as before. This time you should see something similar like below, which indicates that the Makefile has been successfully created. 


Now, Mesos master requires a working directory. So we create one destined at /var/lib/mesos and ensure it has got proper permissions. 


Next, we start the Mesos master by running the mesos-master.sh script as follows: 


Now, open a new terminal and run the mesos-slave.sh script to start the Mesos slave as follows:


As you can see, the slave has been started on the localhost. Once it starts, it tries to register itself with the Mesos master. Since we have specified the master URI running on the same host, it registers itself with the master and gets an unique id.  


Next, as you can see on the slaves' terminal, the slave sends its current resource usage to the master.  


Similarly, on the master, you could see the "Added slave" message along with the available resources on that slave. The slave regularly sends the updates regarding the resource availability to the master. 


This is all you need to setup a single node Mesos system. If you have reached here, I hope you now understand the Mesos architecture and the way it handles the different distributed frameworks. Once you have the master running, you can also visit the master's web UI at http://l27.0.0.1:5050/ to get a graphical feel of this system. As you can see, on the home page it lists the active tasks along with the completed ones. On the left sidebar, it displays the number of slaves connected and the total resources available comprising the used, offered and idle resources. Under the Tasks subtitle, it also gives an overview of the state of all the tasks.


Next, as we click on the slaves tab, we could get the detailed information about each slave and the resource it offers.  


Then, clicking on the Frameworks tab will provide us information about different frameworks launched on top of Mesos. It lists both the active and terminated frameworks. This is an easy and simple way to get a complete overview of the different frameworks running on our Mesos cluster.


That's it. I hope you are benefited after reading this post. This post was intended to get you started with Mesos. To learn more about Apache Mesos and its advanced features, you should go through the official documentation of Mesos here.  

Do comment below if you have any doubts. 
Keep Reading!




References:
https://www.cs.berkeley.edu/~alig/papers/mesos.pdf
http://mesos.apache.org/gettingstarted/

1 comment:

  1. Tritanium Gold™ for Genesis - Titsanium Art
    The Tritanium Gold™ for Genesis has been designed in Solingen, Germany, for some time. The microtouch solo titanium original titanium linear compensator in chi titanium flat iron a very microtouch titanium trim as seen on tv solid-steel frame. The €33.95 · ‎In stock titanium dioxide

    ReplyDelete