This is the second installment of the Smart Home Matter and Thread deep dive series. In this post we dig into IPv6 routing and how the Home Assistant Matter controller fails over at the network level between two Apple TVs. I’ll also show you how this fails with HAOS 11.0 but has been fixed in HAOS 11.1.
You can check out the other posts in this series:
- What is Matter?
- What is Thread?
- What can you deploy today?
- The Thread Network State
- Thread IPv6 Addressing
- What is Multicast?
- Discovering Your Matter Devices with mDNS
- Decoding Home Assistant IPv6 Addresses
Part 2: Smart Home Matter and Thread Deep Dive
- Exploring IPv6 Routing Details
- Home Assistant Matter IPv6 Routing Failover
- Home Assistant Deployment Model for Matter
- Home Assistant Matter Controller Logs
- Proxmox IPv6 Tips
- mDNS Networking Tips
- The Sad State of Switch/Router Firmware
- The Sad State of the Linux Kernel IPv6 Routing
- Ubiquiti Suggestions
Exploring IPv6 Routing Details
In Part 1 of this series I showed you two ways to get the IPv6 addresses of Thread devices. Let’s use one of those Thread device IPs from the last post for this experiment. Since I have two Apple TVs on the network, let’s trace the route to one of my Thread devices from two different machines: my Mac and an Ubuntu VM.
As you can see, my Mac (first screenshot) is routed via my living room Apple TV, but the Ubuntu VM (second screenshot) is routed through my loft Apple TV. This is expected. Both Apple TVs advertise a route to the Thread network, and each machine picks one of them on its own, so different machines may take the same route or different ones. There is no coordinated load balancing, so don’t expect your devices to always use different paths.
traceroute6 -w 2
Now from the Ubuntu VM let’s look at the IPv6 routing table. As you can see, there are two routes to the fd5d: prefix, and they should be one through each of the two Apple TVs.
ip -6 route
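If you’d rather not eyeball the routing table, here’s a minimal shell sketch that pulls out the next hops for a given prefix. thread_gateways is a hypothetical helper name I’m using for illustration. It is not part of iproute2. The sketch assumes each router-advertised route is printed on its own line in the usual "PREFIX via GATEWAY dev ..." form.

```shell
# thread_gateways PREFIX   (hypothetical helper, not part of iproute2)
# Reads `ip -6 route` output on stdin and prints the next hop ("via")
# of every route whose destination starts with PREFIX.
# Assumption: one route per line; multipath routes printed as indented
# "nexthop via ..." lines are NOT handled by this sketch.
thread_gateways() {
  awk -v p="$1" 'index($1, p) == 1 {
    for (i = 2; i < NF; i++)
      if ($i == "via") print $(i + 1)
  }'
}
```

Usage: ip -6 route | thread_gateways fd5d: prints one link-local next hop per border router that advertises that prefix.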
Let’s verify those two routes really go to my Apple TVs. From the prior screenshot we are looking for fe80::b5 and fe80::1826. In the screenshot below, those two addresses are associated with MAC addresses 9c:3e:53xx and a4:cf:99xx. Pulling up the details for my Apple TVs in my Firewalla firewall, we can see that, yes, the MAC addresses match. This means those two routes to the Thread network are in fact using my ATVs as Thread border routers.
ip -6 neigh
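The same kind of lookup works on the neighbor table. This sketch prints the MAC address the kernel has recorded for a given link-local router address, which you can then compare against what your firewall shows for each Apple TV. neigh_mac is a hypothetical helper name. The sketch assumes the usual "ADDR dev IFACE lladdr MAC ..." line layout.

```shell
# neigh_mac ADDR   (hypothetical helper)
# Reads `ip -6 neigh` output on stdin and prints the link-layer (MAC)
# address recorded for neighbor ADDR. Prints nothing if the entry has
# no lladdr field (e.g. a FAILED entry) or ADDR is not in the table.
neigh_mac() {
  awk -v a="$1" '$1 == a {
    for (i = 2; i < NF; i++)
      if ($i == "lladdr") print $(i + 1)
  }'
}
```

Usage: ip -6 neigh | neigh_mac fe80::1826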
Home Assistant Matter IPv6 Routing Failover
Note: I discovered an IPv6 routing failover issue in HAOS 11. It is tracked in issue #2845 and fixed in HAOS 11.1. Upgrade to at least HAOS 11.1 for fast IPv6 routing failover. Otherwise, if you have multiple Thread border routers and one becomes unavailable, all Matter devices might go offline in Home Assistant for 30 minutes on HAOS 11.0 and earlier.
Tracking down possible networking issues with Matter can be a bit tricky. For the best Matter experience with Home Assistant, you must run Home Assistant OS 11.1 or later. The HAOS kernel and Matter container have been optimized to adapt quickly to IPv6 router advertisement changes.
If you check IPv6 routes or try to ping Matter accessories from the generic HAOS CLI (or another computer on your network), the results may differ from what you see inside the Matter container. If you are running Proxmox, looking at IPv6 routing at the Proxmox host level won’t tell you much about how the Matter server Docker container is behaving. Always look at IPv6 routing from within the HAOS Matter controller Docker container, NOT your Proxmox host or your PC. The Proxmox host CAN interfere with routing messages getting passed to the Matter container, but more on that in the next post, where I go over Proxmox best practices.
Warning: SSHing into your HAOS instance and the Docker containers could cause your HA install to go belly up if you do something wrong. So I’d urge you to do a full HAOS backup, at the VM level if you are running it virtualized, before you begin this procedure.
I will assume you have an SSH add-on installed in Home Assistant and that you can SSH into your HAOS instance. You will need to disable the add-on’s Protection mode so it can access the Docker containers.
Once you have SSH’d into your HAOS instance, run the following Docker command to open a bash shell inside the Matter container. Note: We are SSHing on port 22, using the SSH add-on. We are not using the special port 22222 developer mode SSH.
docker exec -it addon_core_matter_server /bin/bash
The container doesn’t include the packages that provide ip, ping, and traceroute, so we need to install them.
apt-get update && apt-get install iproute2 iputils-ping traceroute
Now we can look at the IPv6 routing table that the HAOS Matter controller is using:
ip -6 r
As we can see, this is the exact same routing table that the Ubuntu server showed us. Each Apple TV has its own route entry for the fd5d: Thread network. Now let’s traceroute to one of my Matter devices from inside the Matter controller:
Here we can see the loft ATV is being used. For fun, let’s pull the power on that Apple TV and see what happens. I started a running ping from the Matter container and pulled the power on the loft ATV about halfway through the ping test. As you can see in the top image below, on HAOS 11.1, 50 out of 51 pings were received. This means only ONE ping was lost during the route failover. Very good! However, I ran the exact same test at literally the same time on HAOS 11.0, and when I pulled the ATV power it did NOT fail over to the other ATV. It should fail over after 30 minutes, but I didn’t sit around to find out.
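If you want to repeat this test, the summary line ping prints at the end already gives you the loss count. This sketch subtracts the received count from the transmitted count on the "packets transmitted" line. ping_lost is a hypothetical helper name. At ping’s default one-second interval, a single missing reply suggests the interruption lasted on the order of a second.

```shell
# ping_lost   (hypothetical helper)
# Reads ping output on stdin and prints how many echo requests went
# unanswered, from the "N packets transmitted, M received" summary line.
# Fields: $1 = transmitted count, $4 = received count.
ping_lost() {
  awk '/packets transmitted/ { print $1 - $4 }'
}
```

Run ping with -c so it stops on its own and prints the summary, for example ping -c 60 followed by the Thread device address, then pipe that into ping_lost.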
Two minutes after pulling the power on the loft Apple TV, let’s look at the routing table. Hmm, we still see both Apple TVs for fd5d:443b:. Yes, this is expected. There is a 30-minute timer that should kick off when the route to the Apple TV becomes unavailable. If we check back 31 minutes after I pulled the power, we should see that route dropped and only the route to the living room ATV remaining. I will show that routing table result in a bit.
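You can also watch that timer count down. For routes learned from router advertisements, ip -6 route shows an "expires NNNsec" field with the remaining lifetime. This sketch prints each next hop for a prefix next to that value. route_expiry is a hypothetical helper name. While a border router keeps advertising, its counter should keep getting refreshed. I’d expect the route through the unplugged Apple TV to be the one that runs down toward zero, assuming the roughly 30-minute (1800 second) lifetime described above.

```shell
# route_expiry PREFIX   (hypothetical helper)
# Reads `ip -6 route` output on stdin and prints "GATEWAY SECONDS" for
# every route whose destination starts with PREFIX, where SECONDS is the
# value of the "expires" field with its "sec" suffix stripped.
# Routes without an "expires" field print an empty SECONDS column.
route_expiry() {
  awk -v p="$1" 'index($1, p) == 1 {
    gw = ""; ttl = ""
    for (i = 2; i < NF; i++) {
      if ($i == "via") gw = $(i + 1)
      if ($i == "expires") { ttl = $(i + 1); sub(/sec$/, "", ttl) }
    }
    print gw, ttl
  }'
}
```

Running ip -6 route | route_expiry fd5d: every few minutes shows which route is being refreshed and which one is counting down.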
A few minutes after pulling the plug on the loft ATV, let’s look at the IPv6 neighbors. We still see both ATVs listed.
ip -6 neigh | grep fe80
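An entry in the neighbor table doesn’t mean that neighbor is still alive. What matters is its Neighbor Unreachability Detection state. STALE only means the entry hasn’t been confirmed recently. The kernel probes it again (DELAY, then PROBE) before trusting it, and marks it FAILED if nothing answers. In the ip -6 neigh output I’ve seen, that state is the last field on each line, so a hypothetical neigh_state helper can be a one-line awk filter.

```shell
# neigh_state ADDR   (hypothetical helper)
# Reads `ip -6 neigh` output on stdin and prints the NUD state
# (REACHABLE, STALE, DELAY, PROBE, FAILED, ...) for neighbor ADDR.
# Assumption: the state is the last field of each line.
neigh_state() {
  awk -v a="$1" '$1 == a { print $NF }'
}
```

Usage: ip -6 neigh | neigh_state fe80::b5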
Fast forward 31 minutes and let’s look at the new routing table inside the Matter controller container. Say what? Where did fd5d: go? Well, Apple took the liberty of completely changing the Thread network IPv6 prefix. The new Thread network prefix is fdbc:7e9c:557b. And now we can see that the Thread network has only one route, through the working ATV (remember fe80::1826 is the living room ATV).
Next I plugged the loft ATV back in, waited about two minutes, then looked at the routing table again. Yes, another route entry came back. And it happened to get the same fe80::c2e address.
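Both the old fd5d: prefix and the new fdbc: prefix are IPv6 unique local addresses (ULAs, fc00::/7 per RFC 4193). The router next hops are link-local (fe80::/10). A border router can switch to a new ULA prefix, as happened here, so it’s safer not to hard-code the Thread prefix in any checks. This small sketch classifies an address by its first 16-bit group. addr_scope is a hypothetical helper name, and the sketch assumes a well-formed address.

```shell
# addr_scope ADDR   (hypothetical helper)
# Classifies an IPv6 address by its first 16-bit group:
#   fc00::/7  -> "ula"        (unique local, e.g. a Thread mesh prefix)
#   fe80::/10 -> "link-local" (e.g. a border router's next-hop address)
#   anything else -> "other"
# Only a fully written 4-digit first group is matched, so "fd::1"
# (which is really 00fd::1) is correctly reported as "other".
addr_scope() {
  group=$(printf '%s' "${1%%:*}" | tr 'A-F' 'a-f')
  case "$group" in
    fc??|fd??) echo ula ;;
    fe8?|fe9?|fea?|feb?) echo link-local ;;
    *) echo other ;;
  esac
}
```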
Let’s use the Neighbor Discovery Protocol (NDP) to dig even further. Running the IPv6 neighbor command provides additional information about the Neighbor Unreachability Detection (NUD) states. Now both routers are back: fe80::c2e and fe80::1826 are REACHABLE and map to the two ATV MAC addresses. This is good! It means the Matter controller has recently confirmed both next hops to be reachable (working).
Sometimes you can also see lingering FAILED neighbor entries. These would be for the unreachable ATV and could remain in the neighbor table for many hours. In my case I didn’t have any FAILED entries. I’m not sure whether that had anything to do with Apple switching to a brand new IPv6 network prefix and invalidating the old one.
ip -6 neigh | grep fe80
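To spot lingering FAILED entries at a glance, you can tally the neighbor table by NUD state. Here is a minimal sketch. neigh_summary is a hypothetical helper name, and the sketch assumes, as in the output I’ve seen, that the state is the last field of each line.

```shell
# neigh_summary   (hypothetical helper)
# Reads `ip -6 neigh` output on stdin and prints one "STATE COUNT" line
# per NUD state, sorted by state name.
neigh_summary() {
  awk '{ n[$NF]++ } END { for (s in n) print s, n[s] }' | sort
}
```

Usage: ip -6 neigh | grep fe80 | neigh_summary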
If you restart HAOS or the Matter container, the networking packages that we installed will be wiped out, so you don’t need to worry about extra packages lying around. You will need to re-install them down the road if you want to do more testing after a restart.
It was a bit tedious, but I walked you through a Matter IPv6 routing failover scenario. I showed how it fails on HAOS 11.0 (for 30 minutes) but is fixed in HAOS 11.1. With HAOS 11.1 the failover is so fast that only a single ping was dropped. Check out Part 3, the final installment of this series, which will cover best practices for Proxmox, mDNS, and some Home Assistant tips.