On April 7th, DigitalOcean announced the availability of a new feature called Virtual Private Cloud (VPC). It is essentially a private network between your droplets that lets your resources communicate over a separate interface, without needing to hop out to the public internet and back again. Put simply, it’s like a LAN between droplets.
You can enable VPC on any new droplet you create. If you have not defined a VPC before and you choose to enable it, the setup creates a default one for you along with the droplet.
This seems like a pretty good feature, and it simplifies deployments compared with manually designing custom tunneling solutions between droplets.
One use case I can imagine for this feature is running all your services behind a proxy. You could easily place the backend services and the proxy in the same VPC, while exposing only the proxy to the public internet.
However, there are some limitations. One is that VPCs cannot span data centers: if you create a VPC in one data center, it is only available to droplets residing in that particular data center. This is kinda obvious, because spanning data centers would require Layer 3 tunnels and other expensive complexities.
Another limitation is that, according to the DO docs, VPCs do not support broadcast. That means you cannot run network protocols that rely on broadcast, such as DHCP. However, this is only almost true, because ARP does actually run fine on the network (or so I thought). To test that, I fired up two droplets, assigned them to the same VPC, ran Wireshark and let them communicate.
I can clearly see ARP requests and responses, as in the following screenshot.
Seeing the destination as broadcast in the first frame, and then seeing a reply to that frame, felt weird to me. The documentation clearly states that broadcast and multicast are not supported. More on that later.
This seems all great, but is it actually that seamless?
Well, I ran into a problem connecting two droplets in the same VPC and the same subnet. Some background first: as I usually do right after creating a new droplet, I had locked it down to a single IP, my own public IP address, for security and management purposes (I do this in the inbound rules section of DO's built-in firewall).
Back to the problem: the two droplets refused to establish any sort of connection. I could see the ARP requests and responses and they looked just fine, and the ARP tables on each host looked fine too. So it had to be a unicast issue.
After spending nearly an hour pulling my hair out, I found out that it is actually a cloud firewall limitation: the rules defined in the cloud firewall affect both public and private traffic, not just public traffic. So if you want to use the firewall, you have to make sure there are permit rules between the resources you want to connect. For example, if your VPC subnet is 10.0.0.0/24, you would want to add a permit rule to the firewall with a destination of 10.0.0.0/24 to allow all resources in that firewall instance to communicate with each other.
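To make the failure mode concrete, here is a minimal sketch of the matching logic, not DO's actual implementation. It only models "is this peer inside a permitted range?". The admin address `203.0.113.10` is a placeholder from the documentation range, and `is_allowed` is a hypothetical helper name.

```python
import ipaddress

# Hypothetical values for illustration: the VPC range from the example above
# and a documentation-range address standing in for "my public IP".
VPC_SUBNET = ipaddress.ip_network("10.0.0.0/24")
ADMIN_IP = ipaddress.ip_network("203.0.113.10/32")


def is_allowed(peer_ip, permitted_networks):
    """Return True if peer_ip falls inside any permitted network."""
    address = ipaddress.ip_address(peer_ip)
    return any(address in network for network in permitted_networks)


# Locked down to the admin IP only: a VPC neighbour is rejected.
locked_down = [ADMIN_IP]
# Same rule set plus a permit rule for the VPC range.
with_vpc_rule = [ADMIN_IP, VPC_SUBNET]

print(is_allowed("10.0.0.3", locked_down))    # False
print(is_allowed("10.0.0.3", with_vpc_rule))  # True
```

With only the admin rule in place, a neighbour at 10.0.0.3 never matches, which is consistent with what I saw: ARP resolves, but no unicast connection gets through.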
After fixing the above issue, everything went smoothly and I could connect droplets directly to each other. I then decided to test the reliability of the connection by measuring the bandwidth using iperf3.
It seems that this droplet is given a shared 10 Gbit/s connection. I noticed that the bandwidth sometimes drops below 200 Mbit/s; however, this is certainly sufficient for most use cases.
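If you want to repeat the measurement over time rather than eyeballing a single run, one option is to have iperf3 emit JSON and pull the throughput out of it. This is a rough sketch and not the exact invocation used for the numbers above. It assumes iperf3 is installed on both droplets, that `iperf3 -s` is already running on the peer, and that the test is a default TCP run. The function names are my own.

```python
import json
import subprocess


def received_mbits_per_second(report):
    """Extract the receiver-side throughput in Mbit/s from an iperf3 JSON report."""
    return report["end"]["sum_received"]["bits_per_second"] / 1_000_000


def run_iperf3(server_ip, seconds=10):
    """Run the iperf3 client against server_ip and return the parsed JSON report.

    Assumes iperf3 is installed locally and `iperf3 -s` is running on the peer.
    """
    result = subprocess.run(
        ["iperf3", "-c", server_ip, "-t", str(seconds), "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)
```

Calling `received_mbits_per_second(run_iperf3("10.0.0.3"))` from the other droplet, where 10.0.0.3 is a placeholder for the peer's private address, gives one sample. Logging a sample every few minutes should show how often the dips occur.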
Getting back to the broadcast limitation: as stated before, VPCs do not support broadcast. If that's the case, then how come ARP frames reach other nodes in the network? Well, it seems that some ARP frames are an exception to the rule. Notice that I say some, because the rest actually get dropped by whatever internal firewall DO is running. So let's generate some ARP traffic:
Here I tell nmap to scan the given range, which effectively generates an ARP request for every scanned IP address.
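For reference, this is roughly what each of those requests looks like on the wire. The sketch below is not what nmap does internally. It just builds a standard ARP who-has frame by hand so the fields seen in Wireshark are easier to map. The MAC and IP values you pass in are your own, and `send_frame` is a hypothetical helper that needs Linux and root privileges.

```python
import socket
import struct

BROADCAST_MAC = b"\xff" * 6


def build_arp_request(sender_mac, sender_ip, target_ip):
    """Build a raw Ethernet frame carrying an ARP who-has request."""
    # Ethernet header: destination (broadcast), source, EtherType 0x0806 (ARP).
    ethernet_header = struct.pack("!6s6sH", BROADCAST_MAC, sender_mac, 0x0806)
    arp_payload = struct.pack(
        "!HHBBH6s4s6s4s",
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6,            # hardware address length
        4,            # protocol address length
        1,            # opcode: request
        sender_mac,
        socket.inet_aton(sender_ip),
        b"\x00" * 6,  # target MAC is unknown, that is what we are asking for
        socket.inet_aton(target_ip),
    )
    return ethernet_header + arp_payload


def send_frame(interface, frame):
    """Send a raw frame on the given interface (Linux only, requires root)."""
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW) as raw:
        raw.bind((interface, 0))
        raw.send(frame)
```

Setting `sender_ip` equal to `target_ip` turns this into a gratuitous ARP, which is handy for probing what the VPC lets through.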
Here is how it looks on Wireshark on the machine that generated the requests:
And here is how it looks on other machines:
Apparently there is some sort of internal filtering at layer 2 that limits broadcast and multicast.
From experimenting with the situation, I find that:
1. Gratuitous ARPs are filtered.
2. Only ARP requests that ask for an IP/MAC combination that has already been assigned by the VPC's DHCP server are permitted.
3. Only ARP requests that are sent from an address assigned by the VPC's DHCP server are permitted.
To sum up points 2 and 3: only DHCP-assigned addresses are able to send and receive broadcast frames.
Besides not being able to run your own DHCP server, you cannot assign a static IP address to your machines and still connect to other machines on the VPC network, unless you add static ARP entries to each of your hosts.
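The three observations can be written down as a small predicate. This is only a model of what I observed, not DO's actual filter, and it simplifies the IP/MAC pairing down to IP addresses. `ArpRequest`, `is_forwarded` and the addresses in the comments are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArpRequest:
    sender_ip: str
    target_ip: str


def is_forwarded(request, dhcp_assigned):
    """Apply the three observations to a single ARP request.

    dhcp_assigned is the set of addresses the VPC has handed out
    (an assumption of this model, tracked here as plain strings).
    """
    # 1. Gratuitous ARP (sender announces its own address) is filtered.
    if request.sender_ip == request.target_ip:
        return False
    # 2. The requested address must be one the VPC has assigned.
    if request.target_ip not in dhcp_assigned:
        return False
    # 3. The sender must itself be using an assigned address.
    return request.sender_ip in dhcp_assigned
```

With `{"10.0.0.2", "10.0.0.3"}` as the assigned set, a request from 10.0.0.2 for 10.0.0.3 passes, while a request from a statically configured 10.0.0.50 is dropped.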
I think there is still a lot to be explored on this topic, and I might have misinterpreted something during this analysis.