Accessing Internal Services

TLDR: Teams need to have the ability to deploy and access internal services easily.

In the days before microservices, teams lived in the “Admin Section” of an app. It was the dumping ground for all the functionality the team used but didn’t want users to see.

These days, that admin section should be a small constellation of microservices, focused entirely on functionality the developers and admins use to configure and monitor the cluster. Additionally, it’s not uncommon for apps to expose separate endpoints for stats or a simple API.

Accessing those internal services becomes critical. Teams might be tempted to resort to HTTP-only, password-protected APIs exposed on their public IPs. This is a security nightmare and should be rejected as soon as possible. Those internal APIs are never given the same security consideration as the public ones.

Mapping internal services to external, public ports also becomes an administrative headache. Every service needs a new port, and keeping track of that port mapping and who is using which one (and thus which ones need to be active) spirals out of control quickly.

Providing Access

VPN

One of the easiest ways to solve this problem is to provide a VPN connection that developers can use to gain access to the internal cluster network, so that they may access the internal services.

If you’re on an office network, perhaps that office network is even set up to allow routed traffic to your production network. If you already have this, great. But I actually don’t recommend setting this up. It makes it much harder to track who has access to the internal services, and it’s possible you don’t want everyone on that network to access them. Additionally, when developers aren’t on that office network, they need another solution to access those resources anyway. For that reason, start and end with a good client VPN solution. Treat your office network like it’s the wifi at a coffee shop.

There are many VPN solutions, scaling up depending on your needs: everything from a recent ssh’s native interface tunnel, to OpenVPN, and up to commercial VPN appliances.
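For reference, here’s a minimal sketch of ssh’s interface tunnel. The gateway host, subnet, and routes are placeholders, and it assumes Linux on both ends, root access to create the tun devices, and PermitTunnel enabled in the gateway’s sshd_config.

    # Request tun device 0 on both ends (requires "PermitTunnel yes" on the gateway).
    sudo ssh -f -N -w 0:0 root@gateway.example.com

    # Give each end of the tunnel an address (hypothetical 10.99.0.0/30 subnet).
    sudo ip addr add 10.99.0.1/30 dev tun0
    ssh root@gateway.example.com ip addr add 10.99.0.2/30 dev tun0

    # Route the internal cluster subnet (assumed to be 10.0.0.0/16 here) over the tunnel.
    # The gateway also needs IP forwarding (and likely NAT) turned on.
    sudo ip route add 10.0.0.0/16 via 10.99.0.2 dev tun0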

One solution that I stumbled upon and have been using more and more is ZeroTier. It’s a peer-to-peer VPN, which makes it more flexible (for instance, allowing access to services provided by other clients), and the configuration is dead simple. They recently bumped up their free accounts to allow 100 devices, and for $29.99 you can have unlimited devices. The client is open source, so what you’re paying for is the convenience of using their cloud controller. But with the free accounts now allowing 100 devices, it’s easy to see that everyone but large companies could use the service for free!

ZeroTier also provides a public API to allow querying which devices are online and their connection information, making it easy to incorporate DNS and auditing systems. The client also has a local API that can be used to control it. This also makes it easy to debug and script against, depending on your needs.

Basically, ZeroTier is the ultimate power user VPN.

Tunneling

Another mechanism that can be used is programmatic tunneling. The best example of this is Kubernetes’ kubectl port-forward. This mechanism creates a tunnel and binds a local port to access a service running within a Kubernetes pod.

When a user needs to access a service, they simply run the right incantation of kubectl port-forward and then access the bound port. The traffic is tunneled over the Kubernetes API server running within the cluster.

There is a big limitation with kubectl port-forward though. It can’t access services, only pods. This means that the name of the thing to bind to is unstable (because pod names are usually transient, based upon the name of the replication controller). So to use kubectl port-forward with any regularity, users have to script translating from a replication controller or service to a pod, then invoke kubectl port-forward. I hope that in the future this functionality is built in directly; it would vastly improve its usefulness.
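As a rough sketch of that workaround, the translation can be scripted with a label selector. The service name, label, and ports below are placeholders; adjust them to match your manifests.

    # Find a pod backing the service via its label selector, then forward to it.
    service="members"
    pod=$(kubectl get pods -l "app=${service}" -o jsonpath='{.items[0].metadata.name}')

    # Bind local port 8080 to port 80 inside the pod.
    kubectl port-forward "${pod}" 8080:80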

Directory

The number one revelation behind deploying microservices is that it’s easy to lose track of them. With dynamic scheduling and replicas, what is running where becomes a task that a computer should be tracking, not a human.

This directory of services becomes important when developers want to use these internal services. Hunting around for random IP address / port combos is a great way to lose a day and then accidentally use the wrong service altogether.

This directory concept fits directly in with my previous post about bridging the gap between local and remote services. In this case though, what is needed is basically an internal portal, listing the services and providing links to the internal HTTP ones. This concept is already done by larger engineering organizations but rarely makes its way down. Every team doing microservices, whether it’s a one person operation or Google itself, needs this functionality.

Case Study: Kubernetes + ZeroTier

I want to detail the setup that I currently use which is working fantastically, not only for developer access, but also to bridge services between clusters.

The names of the services/clusters have been changed to protect the innocent

The players

  • Cluster W, running:
    • Reader
    • Purge
  • Cluster M, running:
    • Members
  • Developer machine Z

Both clusters are running kubernetes and there are associated replication controllers and services for each of the above apps.

Installation
  1. We need a ZeroTier network. Log in and create one, giving it whatever name you like.
  2. Install the ZeroTier clients on the master instances of W and M, but not the minions.
  3. Add the master instances’ ZeroTier clients to your previously created network (a rough sketch of the install and join commands follows this list).
  4. Configure the master instances to run kube-proxy. This means that the masters are now aware of all services in the cluster and can route traffic to the minions based on those services.
  5. Install ZeroTier on Z and add it to the ZeroTier network.
  6. At this point, you should have connectivity from Z to both W and M. The ZeroTier UI will show you the IP addresses they’ve been assigned, and you should be able to ping between them.
    • If you can’t then it’s likely that there are firewalls blocking traffic getting to W and M.
    • Alter the firewall (for instance, in AWS, change the security group) to allow UDP port 9993.
    • You should now have connectivity.
  7. Z can now access any service provided by clusters W and M that uses a NodePort.
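For reference, here’s a rough sketch of steps 2–6 on one master instance (the same install and join apply on Z). The network ID and security group ID are placeholders.

    # Install the ZeroTier client and join the network created in step 1.
    curl -s https://install.zerotier.com | sudo bash
    sudo zerotier-cli join <network-id>
    sudo zerotier-cli listnetworks   # shows the network once the member is authorized

    # On AWS, open ZeroTier's UDP port in the instance's security group (step 6).
    aws ec2 authorize-security-group-ingress --group-id <sg-id> \
      --protocol udp --port 9993 --cidr 0.0.0.0/0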

Because ZeroTier provides a peer-to-peer VPN, the Reader service now has access to Members on M. But because the traffic has to flow through the master, you need to add an IP route so that traffic for the IP subnet used by ZeroTier is sent to the master. On AWS, this is done by adding a route to your VPC.
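On AWS, that route might look something like the following. The route table ID, instance ID, and ZeroTier subnet are placeholders for whatever your network was assigned.

    # Send traffic destined for the ZeroTier subnet (assumed here to be
    # 10.147.17.0/24) to the master instance running the ZeroTier client.
    aws ec2 create-route \
      --route-table-id <vpc-route-table-id> \
      --destination-cidr-block 10.147.17.0/24 \
      --instance-id <master-instance-id>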

Directory

To do basic dynamic configuration, the ZeroTier API is queried for devices, and their assigned names are added to a special int.blah.com domain. That allows each device to easily access the other nodes by name rather than by auto-assigned IP.
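A minimal sketch of that query, assuming an API token and network ID are in the environment and jq is available (the endpoint and auth header are ZeroTier Central’s hosted API as I understand it; double-check against their docs):

    # List the network's members as "name -> assigned IP" pairs, ready to be fed
    # into whatever manages the int.blah.com zone.
    curl -s -H "Authorization: bearer ${ZT_TOKEN}" \
      "https://my.zerotier.com/api/network/${ZT_NETWORK_ID}/member" |
      jq -r '.[] | "\(.name) -> \(.config.ipAssignments[0])"'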

For the apps running inside the Kubernetes cluster, those are assigned ports and tracked by Kubernetes itself. There is a simple script that can resolve the node port assigned to a service from its name. Because these ports are stable, they can be put into configuration files safely.
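The resolve script can be as small as a kubectl one-liner. This sketch assumes kubectl contexts named after the clusters (w and m) and a single port per service:

    #!/bin/sh
    # resolve <cluster> <service> - print the NodePort assigned to a service.
    cluster="$1"
    service="$2"
    kubectl --context="${cluster}" get service "${service}" \
      -o jsonpath='{.spec.ports[0].nodePort}'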

Day to day

From Z, it’s possible to access the Members service simply by doing curl m.int.blah.com:$(resolve m members), where resolve is a simple script that translates the cluster name and service into a port number.

Reader can use the same addressing mechanism to access Members as well.

Summary

Internal services are the backbone of a microservices architecture. They provide the most important functionality, and accessing them is critical. Providing secure and consistent access to them removes stumbling blocks, keeping everything moving.

Microservice Development - Who Runs What Where

TLDR: The industry needs a common set of practices for how to develop microservices. This post discusses the required features those practices provide.

“Microservices!” she shouted, exclaiming what a brave, new world we were living in.

“No more monoliths! Code bases so small you can fit them in the palm of your hand!”. The dream was surely alive.

But as quickly as the exuberance of a new development paradigm set in, the trouble began.

“Now instead of running one app to develop a feature, I need to have access to 5 different, coordinating services!”

Everyone that is doing microservices has this question. How it’s answered is as varied as the teams themselves. And so I posed the question on twitter:

The responses started to roll in:

There seem to be 2 schools represented:

  • Run everything locally
    1. in VMs
    2. in containers
    3. as regular programs
  • Run only one service locally, the rest in a remote sandbox

I’d imagine that the second case is a reaction to the complicated nature of managing the first. If 5 different services are required to do development, a developer can easily lose a week just trying to get everything set up. VMs and containers can certainly help, and help a lot, but they still put heavy resource constraints upon a single machine.

What microservice development needs is a mainstream strategy that can unify these 2 approaches so that developers can easily flow between them based on ease of use. Namely, these are the necessary features:

  • Ability for multiple services running locally to find each other
  • Ability for local services to use remote ones
  • Ability for local services to give out a url/address for themselves that remote services can use

Some features that I think would make such a tool/strategy even better:

  • A monitor that can report on traffic flowing between services
    • Could even include the full data stream, ala ngrok’s HTTP monitor support
  • Local log aggregation
  • Ability to easily rebuild and restart one service without disrupting other local services

Let’s break down these features:

Multiple Local Services

The easiest way to do this is static port mappings. Service A always runs on 20001, B on 20002, etc. You can get a long way doing this, but obviously it breaks down when B is now being run remotely instead of locally. In that case, port 20002 needs to run some kind of proxy that can route traffic to B.

One upside of static port mappings is that it simplifies internal service discovery. Rather than needing to be coded with a specific dynamic discovery strategy (DNS, Consul, etcd, etc), a simple config file or even static configuration can be used.

A place where static port mappings break down badly is when 2 versions of a service need to be used. Reverting local changes to a service to fix an unrelated bug is a huge productivity killer. For that reason, some level of dynamic configuration is preferable. It doesn’t have to be runtime dynamic; for instance, environment variable injection counts as dynamic in this context.

Dynamic Service Configuration

There are 2 elements to dynamic service configuration that we should discuss.

First is what details about a service are advertised and how the system knows that info. For all local services, that could be a simple config file or a trivial local database where services register themselves. When you add remote services into the mix as well, something like Consul, etcd, or even Kubernetes can be queried. The key here is that a program running on the local machine can answer based on local values as well as remote ones.

The second part is how service information is consumed. Simplest is environment variable injection. To find service B, simply read SERVICE_B_ADDRESS. This is a tried and true solution, and gives services a high degree of flexibility with regard to how they integrate that information into their local config. But environment variables have a huge downside: they can only be set at program start. That means that all services must be known and assigned some static value before any program starts up.
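For example, here’s a minimal sketch of injection at startup. SERVICE_B_ADDRESS is the variable named above; the wrapper, the addresses, and the other service names are hypothetical.

    # A wrapper injects every known service address before starting service A.
    export SERVICE_B_ADDRESS="127.0.0.1:20002"        # B running locally
    export SERVICE_C_ADDRESS="m.int.blah.com:31742"   # C running remotely via a NodePort
    exec ./service-a   # service A reads the variables once, at startup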

Another common technique is to use DNS to provide service information. Consul is a great example of that: it provides SRV records for services, so that given a name it’s possible to get the host and port to connect to. To use DNS service discovery though, either the DNS server that can answer with the service records needs to be in /etc/resolv.conf, or the DNS server needs to be queried directly. The latter option is basically bootstrapping DNS-based discovery via an environment variable: SERVICE_DNS_ADDRESS is injected as an environment variable, then queried to provide any further records.

A side note on DNS: Because we want to allow for dynamic port allocation for added flexibility, a service will likely need to query SRV records, not just an A record, from the DNS server. Because of that gethostbyname can’t be used, even if the DNS server is configured properly via /etc/resolv.conf. For that reason, it’s almost always cleaner to inject the DNS server address via an environment variable rather than fiddle with /etc/resolv.conf.
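Here’s a sketch of what that query might look like with Consul, using the bootstrap variable from above and a hypothetical “members” service; Consul answers DNS queries on port 8600 by default.

    # Ask the injected DNS server for the SRV record of the "members" service.
    # The answer carries priority, weight, port, and target host.
    dig @"${SERVICE_DNS_ADDRESS}" -p 8600 members.service.consul SRV +short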

Connectivity and Callback URLs

Assuming we’ve wired up the above functionality, local services can now find other services, regardless of whether they’re local or remote. Now the rubber meets the road: services need to connect and send requests to each other. If a service is remote, either there is a VPN pre-configured to allow a local service to route to it, or another tool is going to be running locally and providing some kind of proxy. Getting into all of that is a post for another day, so we’ll assume that is already set up. What I do want to go over briefly is the requirement that local services can be connected back to. Some will see this requirement as unnecessary, but if you’re building asynchronous, HTTP-based services, it’s extremely useful because it means having to run and configure fewer services locally.

This can be accomplished by integrating functionality similar to ngrok. Services will need to have a way to find out their external address so they can give it out. That can happen by injecting the information via the same service configuration we used above. So in that way, the callback proxy is just another service that assists in bridging the local/remote gap.

Additional Features

Traffic Monitoring

In development, answering the question “what exactly did I just send or receive?” is extremely common. So common that people often just add debugging code to print out these values before they’re processed. Having a tool that captures that data for analysis would be super valuable.

Local Log Aggregation

If a user is running 3 services locally, seeing the logs for each of them interlaced makes it much easier to understand what is happening. What I think is also important is that this log aggregation isn’t the ONLY way to see the logs. It’s also very useful to be able to just see the logs for one service without the noise of the others.

Service Restarts

Programs are going to get restarted really often in development as they change. Relying on frameworks to reload code (such as in Rails) is not a solution to the problem, and so providing the user the ability to stop, rebuild, and start a service without disrupting all the other features here is critical.

This means:

  • Assigned ports must not change between restarts
  • Restarting all services to restart one is unacceptable

Development is all about introducing new, iterative versions of services, and thus easy restarting is a critical function.

Summary

Microservice development is still in its infancy. Teams are slowly figuring out solutions to their specific problems and largely hacking something together. If microservices are really going to be the future (and some would say the present), then these issues have to be solved.