Microservice Architecture: The Real Challenge

Microservices are a buzzword in the developer community. In this article, we will explore important points about microservice architecture that are seldom addressed elsewhere, and cover the practical problems of moving to a microservice architecture.

Let’s start with some advantages of microservice architecture. The following are three major advantages that may lead you to consider moving to microservices.

  1. Engineering flexibility: Each service team can choose a different programming language or framework (provided interoperability between the languages and protocols is taken care of).
  2. Independent development: Services can be developed, deployed and maintained independently. (This isn’t 100% true; we will see why later.)
  3. Technology flexibility: It’s easier to introduce new technology, e.g. Blockchain, AI/ML, etc.

You will find many articles on the Internet discussing the advantages of microservices in detail, but my intention is to highlight some practical problems.

If you are doing microservices for the first time, it will be a Pandora’s box. You will find many surprises that you never read about in any article advocating microservices.

So, let’s jump into the practical world of microservice architecture. We will focus on the following areas:

  1. Communication between services
  2. Service Management
  3. Observability
  4. Developer’s Life

1. Communication between services

The complexity of communication between services (a.k.a. routing) grows rapidly with the number of services: with n services there are up to n × (n − 1) possible directed connections. It’s simple and manageable if the project has 10–12 services, but it becomes more and more complex once the project has 20+ services. It’s difficult to remember and understand which service communicates with whom. Imagine a star network with hundreds of nodes.

The following diagram shows a simple and a complex example.

Any evaluation of microservice architecture must answer the following questions first.

  1. How does service A make a request to service B? The answer can depend on the number of services in the product. It can be simple HTTP, RPC or WebSocket, which sends a request and gets a response back. After deciding on a communication protocol, the next questions would be…
  2. Where are the (healthy) instances running?
  3. How does service A find healthy instances of a particular service?
  4. How do you define the “health” of a service?

You need a tool that helps you solve the above issues: one that continuously monitors all instances of all services and keeps track of their health status. Basically, you need a service registry. [Consul] by HashiCorp is the most common service registry. Consul maintains the list of instances and tracks which of them are healthy and which are not.
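
To make the registry idea concrete, here is a minimal sketch of the two sides of the interaction: telling the registry about an instance (the payload shape follows Consul’s `PUT /v1/agent/service/register` agent API, but the service name, address and check path below are made-up examples) and filtering a health query down to passing instances.

```python
def registration_payload(name, address, port, health_path="/health"):
    """Build a Consul-style service registration payload.

    The field names mirror Consul's agent service registration API;
    the concrete values are illustrative only.
    """
    return {
        "Name": name,
        "Address": address,
        "Port": port,
        "Check": {
            "HTTP": f"http://{address}:{port}{health_path}",
            "Interval": "10s",  # how often the registry probes the service
            "Timeout": "1s",
        },
    }


def healthy_instances(health_entries):
    """Keep only instances whose every check reports 'passing',
    mimicking a registry's health-filtered lookup."""
    return [
        entry["Service"]
        for entry in health_entries
        if all(check["Status"] == "passing" for check in entry["Checks"])
    ]
```

Callers would then load-balance only over what `healthy_instances` returns, which is exactly the question a registry answers for you.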

You can also look at [ZooKeeper by Apache], or you can use a container orchestration engine like Kubernetes. A container orchestrator will take care of all of this.

If you don’t want to depend on 3rd-party tools and like adventure, you can use DNS as a registry. If you decide to use DNS as a service registry, remember that DNS doesn’t return a port number by default. Of course, you can do an SRV lookup, which does return the port number, but that’s not common practice; DNS is also slow to reflect changes, and it gives no health status.
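
If you do go the DNS route, an SRV answer is a list of (priority, weight, port, target) records, and the client must pick one. A minimal, illustrative picker (the record values below are made up) might look like this:

```python
import random


def pick_srv(records):
    """Pick one SRV record: lowest priority wins, and ties are
    broken by weighted random choice, in the spirit of RFC 2782.

    records: list of (priority, weight, port, target) tuples.
    """
    if not records:
        raise ValueError("no SRV records")
    best = min(r[0] for r in records)
    candidates = [r for r in records if r[0] == best]
    # Use max(weight, 1) so zero-weight records remain selectable.
    weights = [max(r[1], 1) for r in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]
```

Note this still gives you no health filtering: a dead instance stays in the answer until its record expires, which is exactly the weakness described above.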

The really simple option is to create a small EC2 instance per service and scale manually as per your need. It will work if you have a limited number of services, but it will not give you the agility and speed of a service registry. Scaling will be slow because you need to create an EC2 instance each time, and creating an instance takes time.

If you are a Redis fan and already have Redis in your infrastructure, you can explore [Hydra]. Hydra is a Node.js package that helps with routing, discovery, registry, etc., facilitating distributed applications such as microservices. Hydra does this by leveraging the power of Redis.

We have solved (sort of) the first problem of microservices, which is identifying healthy instances of a service.

Let’s move on to the second challenge, which is load balancing.

  1. You need to decide which instance of the service to send a request to.
  2. What should the timeout be?
  3. What if the request fails?
  4. If a request fails, you need to decide on a retry policy.
  5. How many times do you need to retry?
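
These decisions usually end up encoded in a small retry helper. Here is a minimal sketch with timeouts left to the caller’s `call` function and an exponential backoff between attempts (the retry count and delay numbers are placeholders you would tune):

```python
import time


def call_with_retries(call, retries=3, base_delay=0.1, backoff=2.0,
                      sleep=time.sleep):
    """Invoke `call()` with exponential backoff between attempts.

    `call` should raise on failure (e.g. a timeout); the last
    exception is re-raised once the retry budget is exhausted.
    `sleep` is injectable so tests don't actually wait.
    """
    delay = base_delay
    for attempt in range(retries + 1):
        try:
            return call()
        except Exception:
            if attempt == retries:
                raise
            sleep(delay)
            delay *= backoff  # back off: 0.1s, 0.2s, 0.4s, ...
```

Production-grade versions also add jitter and retry only idempotent requests, so a retried write isn’t applied twice.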

In microservices, defensive coding is important. Defensive coding means taking extra care with error handling. Every service communicates with the others over the network, and many things can go wrong on the network. A service may be slow to respond or may be overloaded; in these situations, you need to think about how best to handle the response.

Rate limiting is a common technique that you need to use. A service receives traffic from a bunch of other services. What should you do when a service gets overloaded with requests? What can you do to prevent overloading in the first place? A rate limit is one option for protecting a service from overload. Consul has a feature to configure rate limits.
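
A common way to implement a rate limit is a token bucket: requests spend tokens, and tokens refill at a fixed rate. A minimal in-process sketch (the rate and capacity values are illustrative, and the clock is injectable for testing):

```python
import time


class TokenBucket:
    """Allow at most `rate` requests/second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity      # start full
        self.clock = clock
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill tokens for the time elapsed since the last call.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should shed or queue the request
```

When `allow()` returns False, a service would typically respond with HTTP 429 rather than queue work it cannot handle.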

Now, there are two main popular ways to handle this complex service-to-service communication logic (rate limiting, load balancing, service discovery, health checks, etc.):

  1. Client-Side management
  2. Service Mesh

Client-side management is simple. For example, when service A wants to call service B, service A acts as a client of service B and service B acts as a server. Service A contains all the client-side logic needed to consume what service B offers: timeouts, retries, which instance to call, etc. This approach has the following benefits and challenges.

  1. Easy to understand, potentially easier to debug.
  2. Managing client libraries can be difficult, especially when services are developed in different programming languages.
  3. It creates tight coupling between services. E.g. any change in service A’s API/RPC needs to be implemented in every service that has a client of service A. This is a lot of effort if you have hundreds of services, because the client-side logic is built into all of them: you need to update every service that embeds service A’s client even though the server itself hasn’t changed. Compare this with a monolithic architecture, where you deploy an update once, and that’s it.
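
To make the coupling concrete, here is roughly what every consumer of service B ends up embedding under client-side management (the instance list and path are hypothetical): instance selection plus URL construction, on top of the timeout and retry logic discussed above.

```python
import itertools


class ServiceBClient:
    """Client-side logic that every consumer of service B must carry:
    instance selection and request construction. If B's API changes,
    every copy of this class, in every consuming service, must change."""

    def __init__(self, instances):
        # Round-robin over whatever the registry said is healthy.
        # instances: list of (host, port) pairs.
        self._cycle = itertools.cycle(instances)

    def url_for(self, path):
        host, port = next(self._cycle)
        return f"http://{host}:{port}{path}"
```

The service mesh approach below exists precisely to pull this class out of every service and into a shared sidecar.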

In a Service Mesh approach, the client-side logic is extracted into a proxy program called a “sidecar” or “mesh”. A mesh proxy runs alongside every service. In this approach, the mesh instances communicate with each other; the services do not communicate directly. The mesh talks to the service registry, so it knows how to call each service, where to find it, and which instances are healthy. It also automatically collects logs and metrics and ships them to the proper places.

The Service Mesh approach has become popular recently. I suggest you don’t jump straight into a service mesh if you are implementing microservices for the first time, because it will be the most critical piece of your infrastructure: if anything goes wrong with the service mesh, your entire system will be down. Before implementing a service mesh, make sure you assign a dedicated team that understands the service mesh framework inside out. Service meshes are a new approach, so there are not many mature, production-ready frameworks available. [Istio] is getting popular, and [Linkerd] is another popular framework.

The Service Mesh approach has the following benefits and challenges.

  1. No libraries to maintain, automatic observability and security, easier to manage a large number of services.
  2. It’s another critical piece of infrastructure to manage.

Finally, what is the solution for routing in microservices?

It depends on the size and complexity of the project, but if it’s your first microservice project, I suggest starting with Consul by HashiCorp.

2. Service Management

Management of services is challenging if you do not have the right tools and knowledge. I suggest you start with Docker: run Docker in production, debug Docker in production, fix Docker issues in production. Once you are comfortable with Docker in production, implement a service registry, e.g. Consul, and make yourself comfortable with the registry in production. Don’t jump straight to a container orchestration engine. Start slow, start with a small team, and don’t introduce a container orchestration engine until you feel you have too many services and too many servers to manage manually.

The popular container orchestration frameworks are:

  1. Kubernetes
  2. Docker Swarm
  3. Nomad
  4. Mesos + Marathon
  5. Cloud provider offerings: Amazon ECS / Fargate

Kubernetes is the most popular in the open source community. I also recommend going for it, unless you are working in .NET and deploying the project on Azure; [Microsoft Azure Fabric] Service Mesh is a ready-made service provided by Azure. Kubernetes makes your life simpler. It was developed by Google’s engineers, and it is an open source project, part of the CNCF. In Kubernetes, everything is managed using Pods. Pods are essentially like mini-VMs in which you can run one or more containers (e.g. Docker). You can run a service mesh proxy as a sidecar in the same Pod with Istio. Kubernetes has many features out of the box, and there are many tools like Helm, Minikube, Kops, etc. that will make development and deployment much easier.
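
To show the Pod-plus-sidecar idea, here is a minimal, illustrative Deployment manifest. The names and image tags are made up; in a real Istio setup the proxy container is usually injected for you rather than written by hand.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders
spec:
  replicas: 2                        # two instances of the service
  selector:
    matchLabels:
      app: orders
  template:
    metadata:
      labels:
        app: orders
    spec:
      containers:
        - name: orders               # the service itself
          image: example/orders:1.0  # hypothetical image
          ports:
            - containerPort: 8080
        - name: proxy-sidecar        # mesh proxy sharing the Pod's network
          image: example/proxy:1.0   # hypothetical; Istio injects Envoy here
          ports:
            - containerPort: 15001
```

Because both containers share the Pod’s network namespace, the proxy can intercept the service’s traffic without any change to the service’s code.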

3. Observability

Once you are using Kubernetes for your microservice architecture, your next challenges will be: How do you see things? What’s going on inside the cluster? Where do you get accurate logs? How do you scale services independently? How do you manage ingress? How do you configure a load balancer for services? How do you protect sensitive information? And many more…

In a monolithic system, you can SSH into the instance and check what’s going on, but in the world of Kubernetes and microservices, a program runs behind many layers (virtualization, the Docker engine, etc.), and many containers run on a single VM. Developers will then ask: Why is my service slow? Why are other services fast? Why is my service slow sometimes but not always? Why do my requests to another service time out sometimes? And many more…

To answer all these questions, you start collecting metrics and logs. Metrics might not help as much as you expect: even with 20 to 30 services they generate a lot of data, so you will be confused about where to look and what to look for. Which metrics do you actually care about? Maybe start with memory and CPU? You may want the containers’ CPU and memory, plus the host VMs’ CPU, memory, disk usage and network usage. Even 20–30 services with 2–3 instances each makes roughly 100 containers to watch, which generates a huge amount of data: too much information to make sense of. So, make sure to collect the metrics and logs that will help you answer questions when something critical happens. Here are a few tools that will help you increase observability:
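
Whichever metrics you pick, most metrics tooling consumes them in a simple text format. A simplified sketch of Prometheus-style exposition (gauges only, no labels; the metric name below is illustrative):

```python
def render_metrics(metrics):
    """Render {name: (help_text, value)} in a simplified version of
    Prometheus' text exposition format: HELP and TYPE comment lines
    followed by 'name value'."""
    lines = []
    for name, (help_text, value) in sorted(metrics.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"
```

A service exposes this text on an endpoint (conventionally `/metrics`), and the metrics system scrapes it on an interval.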

Logs: ELK, Splunk, Sumo Logic, cloud providers’ log collectors.

Metrics: Prometheus, InfluxDB, Datadog, Wavefront, New Relic.

Tracing: Zipkin, OpenTracing, Jaeger, Stackdriver.

Distributed tracing is the most essential part of observability. It acts as the Systrace of your services: you can see how data flows between services over time and identify various issues, e.g. which service is faster and which one is slower. Think about which logs and metrics you care about, and log only those.
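
The core of distributed tracing is propagating a request ID across every hop so logs and spans can be joined per request. A minimal sketch (the header name follows the common `X-Request-ID` convention; real tracers like Jaeger use richer context headers):

```python
import uuid

TRACE_HEADER = "X-Request-ID"


def ensure_trace_id(headers):
    """Return a copy of `headers` that carries a trace ID, generating
    one at the edge if the incoming request has none. Each service
    copies this header onto its outbound calls and into its logs."""
    headers = dict(headers)
    if TRACE_HEADER not in headers:
        headers[TRACE_HEADER] = uuid.uuid4().hex
    return headers
```

With this in place, grepping all services’ logs for one ID reconstructs the full path of a single request through the system.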

4. Developer’s Life

You want your developers to be happy. You need to understand how they work in a monolithic architecture and what will change when you move to microservices. Everyone wants the development, test and production environments to be as identical as possible, and a developer wants to run a service on his/her laptop. In a microservice architecture, developer A’s service depends on service B, which is being developed by a different team; it may even depend on 5–6 different services. It would be challenging to satisfy all those dependencies on a laptop. If you are running Docker in production, developers can run Docker on their laptops to test, but if you are running Kubernetes in production, does that mean developers need to run Kubernetes on a laptop? The answer depends on your setup. It can work if you have a limited number of services, but assume you have more than 30 services: it’s nearly impossible to install everything on a developer’s laptop.

Now, you need to think about how to provide all the dependencies locally. The bad news is that there isn’t a proper solution available yet; nobody has solved this problem perfectly. There are a few approaches you can take, and the most popular is a shared cloud environment: a development environment where all the services are deployed, and which you run your service against from your laptop. It works, with some shortcomings. Since many teams develop different services and deploy them to the same shared environment, another team may deploy the latest version of their service without you being aware of the change, and your service may break because of it. Another approach is mock services: a mock is not the full service and will not have a real data set, so you may lose corner cases, and developing and testing mocks takes additional time. The last approach is to map dependencies and load them automatically: some program identifies your service’s dependencies and procures them for you before you execute your service. This is very hard to do, and there isn’t any ready-to-use tool for such dependency management right now. I think a shared cloud environment is the easiest and most useful approach as of now.

The next challenge would be tools and integration. You need development tools that integrate with your IDE for debugging and with the orchestration framework. You also need to build a CI/CD pipeline that supports containers.

Summary

Don’t worry about new tools and technologies yet; use the tools you know. There are many new technologies (Docker, Istio, Kubernetes, Serverless, GraphQL, the TICK Stack, the MEAN Stack, Event Sourcing, the CQRS pattern, Micro Frontends, etc.), but what companies really care about is success and performance. Remember that you are trying to solve a business problem. The business only cares about solving that problem; it does not care which tools and technologies you employ. If you can solve the business problem with old, stable technologies, go for it. It’s not necessary to use cutting-edge technologies just for the sake of novelty or hype. Make the project a success and get users in the market, then think about microservices and other new technologies, and only if they significantly improve on the existing solution. Moving from a monolithic to a microservice architecture isn’t just a change in coding; it’s a much bigger change in organisational structure, communication and coding culture.

Use it when you really need it.