In this final installment of my smart home deep dive on Matter and Thread, I cover a ton of practical advice on how to set yourself up for success with Matter. Even if you aren’t using Home Assistant and are entirely into Apple Home, Google Home, etc., the last section on mDNS tips can be killer for solving your Matter issues.
You can check out the other posts in this series:
- What is Matter?
- What is Thread?
- The Thread Network Mess
- Thread IPv6 Addressing
- What is Multicast?
- Discovering Your Matter Devices with mDNS
- Decoding Home Assistant IPv6 Addresses
- Exploring IPv6 Routing Details
- Home Assistant Matter IPv6 Routing Failover
Part 3: Smart Home Matter and Thread Deep Dive
- Home Assistant Deployment Model for Matter
- Home Assistant Matter Controller Logs
- Proxmox IPv6 Tips
- mDNS Networking Tips
Home Assistant Deployment Model for Matter
If you want to use Matter devices with Home Assistant, I would STRONGLY urge you to only consider HAOS and no other deployment model such as container, supervised or core. And if you really want the solution with the least risk, don’t throw a hypervisor into the mix. Run HAOS bare metal on your favorite mini PC, used thin client, etc. Check out my article Hands On: Beelink Mini-PCs for Proxmox, Home Assistant and Plex for mini-PC ideas.
Why only use HAOS? HAOS has kernel-level tweaks in the networking stack to ensure the best possible mDNS and IPv6 routing experience. As the IPv6 failover bug in HAOS 11.0 and earlier showed, kernel-level optimization is key: with the working kernel patch, Matter IPv6 routing failover went from 30+ minutes to a couple of seconds.
If you use your own OS or roll your own Docker deployment, you are asking for all kinds of potential headaches that may have you pulling your hair out and thinking Matter or Home Assistant is horrible. HAOS takes care of all that for you. Plus, hypervisors like Proxmox can interfere with IPv6 routing and create issues even with HAOS. But keep reading for my Proxmox tips if you do want to go down that path. I run HAOS on Proxmox 8 without issue.
Home Assistant Matter Controller Logs
Home Assistant has a log dedicated to Matter which you can view in the UI. You can find it at:
Settings -> System -> Logs -> Matter Server (upper right drop down menu)
When looking through the Matter logs to debug Matter issues, keep your eye on two kinds of events: mDNS timeouts and re-subscription errors. Here’s an example of an mDNS timeout entry:
core-matter-server matter_server.server.device_controller.[node 30] INFO Previous subscription failed with Error: 50, re-subscribing in 537909 ms...
core-matter-server chip.DIS ERROR Timeout waiting for mDNS resolution.
This error is usually indicative of a networking issue. This could range from mDNS issues to IPv6 routing problems. I cover some tips below for mDNS issues.
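To spot these patterns across a long log, a small script can help. The sketch below is a minimal example, assuming you have copied the Matter Server log text into a list of lines; it counts mDNS timeouts and collects each node’s announced re-subscribe delay. The regular expression is written against the sample entry above and is my own assumption, not a documented log format, so adjust it if your lines differ.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Pattern written against the sample log entry above (an assumption, not a spec).
RESUB_RE = re.compile(
    r"\[node (\d+)\] INFO Previous subscription failed with Error: (\d+), "
    r"re-subscribing in (\d+) ms"
)
MDNS_TIMEOUT = "Timeout waiting for mDNS resolution"


@dataclass
class MatterLogSummary:
    mdns_timeouts: int = 0
    # node ID -> re-subscribe delays (ms) announced by the controller
    resub_delays_ms: dict[int, list[int]] = field(default_factory=dict)


def summarize(lines: list[str]) -> MatterLogSummary:
    summary = MatterLogSummary()
    for line in lines:
        if MDNS_TIMEOUT in line:
            summary.mdns_timeouts += 1
        match = RESUB_RE.search(line)
        if match:
            node_id, _error, delay_ms = (int(g) for g in match.groups())
            summary.resub_delays_ms.setdefault(node_id, []).append(delay_ms)
    return summary


if __name__ == "__main__":
    sample = [
        "core-matter-server matter_server.server.device_controller.[node 30] INFO "
        "Previous subscription failed with Error: 50, re-subscribing in 537909 ms...",
        "core-matter-server chip.DIS ERROR Timeout waiting for mDNS resolution.",
    ]
    result = summarize(sample)
    for node, delays in result.resub_delays_ms.items():
        for delay in delays:
            print(f"node {node}: next retry in {delay / 60000:.1f} min")  # node 30: next retry in 9.0 min
    print(f"mDNS timeouts: {result.mdns_timeouts}")  # mDNS timeouts: 1
```

A delay of 537909 ms, as in the entry above, means the controller won’t even retry that node for roughly nine minutes.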
Another source of errors is subscriptions and re-subscriptions. A re-subscription happens when the Matter controller loses communication with a Matter device and tries to reconnect. Some amount of re-subscription is normal, and if they self-resolve within a few minutes, I would not worry. This can happen if your Thread network topology changes, for instance.
In the log excerpt below you can see that HA re-subscribed to node 29 (an Eve Energy outlet) within 3 seconds of the subscription failure. Another device of mine took 30 seconds yesterday to re-subscribe.
2023-10-22 13:36:14 core-matter-server chip.DMG ERROR Subscription Liveness timeout with SubscriptionID = 0xe5599d1d, Peer = 01:000000000000001D
2023-10-22 13:36:14 core-matter-server matter_server.server.device_controller.[node 29] INFO Previous subscription failed with Error: 50, re-subscribing in 0 ms...
2023-10-22 13:36:15 core-matter-server PersistentStorage INFO DeleteSdkKey: g/s/lPSHZMdGH8HCgSM+EK6H3g==
2023-10-22 13:36:15 core-matter-server PersistentStorage INFO Committing...
2023-10-22 13:36:15 core-matter-server PersistentStorage INFO SetSdkKey: f/1/s/000000000000001D = b'\x150\x03\x10\xa8\xb8>\xa2\x01~\xc6\x00\xdeo\xf0\x82U\xb0d\xed0\x04 \x02-0}:&\xd8S\xc4}\x9f;\x18C\x86\x19=\xe2\xe8\x91@\x96\xdf\r\xd9X~\x9b=\xa3\xc8\xe10\x05\x0c\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x18'
2023-10-22 13:36:15 core-matter-server PersistentStorage INFO Committing...
2023-10-22 13:36:15 core-matter-server PersistentStorage INFO SetSdkKey: g/s/qLg+ogF+xgDeb/CCVbBk7Q== = b'\x15$\x01\x01$\x02\x1d\x18'
2023-10-22 13:36:15 core-matter-server PersistentStorage INFO Committing...
2023-10-22 13:36:17 core-matter-server root INFO Re-subscription succeeded!
2023-10-22 13:36:17 core-matter-server matter_server.server.device_controller.[node 29] INFO Re-Subscription succeeded
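If you want to measure these recovery times rather than eyeballing timestamps, the sketch below pairs each “Previous subscription failed” line with the next “Re-Subscription succeeded” line for the same node. It is a minimal example that assumes the timestamped format of the excerpt above; the function name resubscribe_durations is my own, not part of any Home Assistant tool.

```python
import re
from datetime import datetime

# Timestamped lines that carry a "[node N]" tag, as in the excerpt above (assumed format).
NODE_LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) .*?\[node (\d+)\] INFO (.*)$"
)
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def resubscribe_durations(lines):
    """Return (node_id, seconds) for each failure that was followed by a success."""
    failed_at = {}  # node ID -> timestamp of the first unresolved failure
    results = []
    for line in lines:
        match = NODE_LINE_RE.match(line)
        if not match:
            continue  # PersistentStorage, chip.DMG, root lines have no node tag
        ts = datetime.strptime(match.group(1), TS_FORMAT)
        node_id = int(match.group(2))
        message = match.group(3)
        if message.startswith("Previous subscription failed"):
            failed_at.setdefault(node_id, ts)
        elif message.startswith("Re-Subscription succeeded") and node_id in failed_at:
            elapsed = ts - failed_at.pop(node_id)
            results.append((node_id, elapsed.total_seconds()))
    return results


if __name__ == "__main__":
    log = [
        "2023-10-22 13:36:14 core-matter-server matter_server.server.device_controller."
        "[node 29] INFO Previous subscription failed with Error: 50, re-subscribing in 0 ms...",
        "2023-10-22 13:36:15 core-matter-server PersistentStorage INFO Committing...",
        "2023-10-22 13:36:17 core-matter-server matter_server.server.device_controller."
        "[node 29] INFO Re-Subscription succeeded",
    ]
    print(resubscribe_durations(log))  # [(29, 3.0)]
```

Nodes that fail and never recover simply stay in failed_at, which is itself a useful signal if you extend the sketch to report them.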
Re-subscription errors should be a concern when they are frequent, when they take a long time to resolve, or when specific devices always seem to be causing you problems. This could indicate a device firmware issue, a patchy Thread network, or maybe another device on the network with shoddy firmware that is causing issues. I’ve also found that in the worst case the Matter device needs a power cycle (or its batteries pulled) to reboot it. On rare occasions an Eve or Smartwings device needs a reboot to bring it back online.
One last note about Home Assistant Matter logs. In the logs you will also see node IDs, such as [node 29]. This is an ID assigned to each Matter device on the network, and it remains constant throughout the device’s commissioned life. For troubleshooting it can be handy to make a spreadsheet mapping each physical device name to its node ID. You can map a node ID to a physical device by downloading the diagnostics file for each Matter device and looking through the JSON.
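The lower-level chip.DMG lines identify devices differently. The Peer field appears to be a fabric index and a hexadecimal node ID separated by a colon: in the excerpt above, Peer = 01:000000000000001D sits right next to the [node 29] lines, and 0x1D is 29 in decimal. The sketch below assumes that interpretation, plus a hypothetical spreadsheet exported as CSV with node_id and device_name columns (column names I made up for illustration).

```python
import csv
import io
import re

# "Peer = <fabric index hex>:<16-digit node ID hex>" as seen in chip.DMG lines (assumed format).
PEER_RE = re.compile(r"Peer = ([0-9A-Fa-f]{2}):([0-9A-Fa-f]{16})")


def parse_peer(line):
    """Return (fabric_index, node_id) as ints, or None if the line has no Peer field."""
    match = PEER_RE.search(line)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


def load_node_names(csv_text):
    """Build {node_id: device_name} from a CSV with hypothetical node_id/device_name columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {int(row["node_id"]): row["device_name"] for row in reader}


if __name__ == "__main__":
    names = load_node_names("node_id,device_name\n29,Eve Energy outlet\n")
    line = ("2023-10-22 13:36:14 core-matter-server chip.DMG ERROR Subscription Liveness "
            "timeout with SubscriptionID = 0xe5599d1d, Peer = 01:000000000000001D")
    peer = parse_peer(line)
    if peer is not None:
        fabric, node_id = peer
        print(f"fabric {fabric}, node {node_id}: {names.get(node_id, 'unknown')}")
        # fabric 1, node 29: Eve Energy outlet
```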
Proxmox IPv6 Tips
If you are running Home Assistant OS on Proxmox, which is a great solution for many reasons, beware that some non-default Proxmox host network settings might cause issues with IPv6 routing and create an unstable Matter environment.
If you have added any of the following settings manually to Proxmox (via the /etc/sysctl.conf file), then they could well be causing mDNS issues and thus impacting Matter devices within Home Assistant. Potentially problematic additions include, but are not limited to:
net.ipv6.conf.default.forwarding = 1
net.ipv6.conf.all.forwarding = 1
net.ipv6.conf.default.proxy_ndp = 1
net.ipv6.conf.all.proxy_ndp = 1
net.ipv6.conf.vmbr0.accept_ra = 2
A stock /etc/sysctl.conf file on Proxmox has all of its content commented out. So if your file has any uncommented lines, carefully review them and make sure you understand what they do. If you make any changes to the sysctl.conf file, I would strongly suggest you reboot the Proxmox host. Don’t just reload the interfaces.
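Because a stock file is all comments, a quick way to review it is to list every active line and highlight the kinds of keys called out above. This is a minimal sketch, assuming the usual sysctl.conf syntax (key = value, with # or ; starting a comment line); the set of suffixes it flags is my own heuristic based on the list above, not an official rule.

```python
from pathlib import Path

# Heuristic: key suffixes matching the potentially problematic settings listed above.
FLAGGED_SUFFIXES = (".forwarding", ".proxy_ndp", ".accept_ra")


def audit_sysctl(text):
    """Return (line_no, key, value, flagged) for every uncommented setting."""
    findings = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank or comment line, like everything in a stock Proxmox file
        key, _, value = line.partition("=")
        key = key.strip()
        flagged = key.startswith("net.ipv6.conf.") and key.endswith(FLAGGED_SUFFIXES)
        findings.append((line_no, key, value.strip(), flagged))
    return findings


if __name__ == "__main__":
    conf = Path("/etc/sysctl.conf")
    if conf.exists():
        for line_no, key, value, flagged in audit_sysctl(conf.read_text()):
            marker = "REVIEW" if flagged else "active"
            print(f"{conf}:{line_no}: [{marker}] {key} = {value}")
```

Anything the script prints at all is worth a second look on Proxmox, since a stock file would produce no output.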
It may be worth checking the VLAN aware state on the Proxmox Linux network bridge. I’ve seen enabling it have a negative interaction with the IGMP snooping settings on my QNAP switch, although it had no effect on my TP-Link switch. If you are experiencing weird multicast/mDNS issues, you might look at this setting in combination with your switch’s IGMP/MLD settings (more on this below).
Also, there’s no need to configure the Proxmox Linux bridge with an IPv6 address and gateway for the HA Matter controller to work.
Troubleshooting Tip: If you are experiencing mDNS timeouts or weird Matter device availability issues, you could get a USB Ethernet dongle and pass that through directly to the HAOS VM. Then disable the Proxmox virtual NIC for the VM. This will totally bypass Proxmox for the network connectivity and eliminate one potential source of pain. Definitely reboot the HAOS VM after making this change. This is only for troubleshooting and not intended as a permanent solution. Make sure in HAOS the USB NIC gets a valid IPv6 address, or Matter will not work.
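To sanity-check the “valid IPv6 address” part, you can paste the addresses HAOS reports for the interface into a small helper. This sketch assumes “valid” means the NIC has at least one unique local (fc00::/7) or global address rather than only a link-local fe80:: address; that threshold is my assumption, not a documented Matter requirement.

```python
import ipaddress

ULA_NETWORK = ipaddress.IPv6Network("fc00::/7")


def classify_ipv6(addr):
    """Classify an address string such as 'fe80::1%eth0' or 'fd12::10/64'."""
    # Strip any prefix length and zone index before parsing.
    bare = addr.split("/", 1)[0].split("%", 1)[0]
    ip = ipaddress.IPv6Address(bare)
    if ip.is_link_local:
        return "link-local"
    if ip in ULA_NETWORK:
        return "unique-local"
    if ip.is_global:
        return "global"
    return "other"


def has_routable_ipv6(addrs):
    """True if at least one address is unique-local or global (assumed threshold)."""
    return any(classify_ipv6(a) in ("unique-local", "global") for a in addrs)


if __name__ == "__main__":
    usb_nic = ["fe80::be24:11ff:fe12:3456/64"]
    print(has_routable_ipv6(usb_nic))  # False: link-local only
```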
mDNS Networking Tips
No matter which smart hub you use (Apple, Google, Amazon, etc.), having stable mDNS is absolutely critical. Some consumer grade switches are quite buggy when it comes to multicast and will cause you no end of headaches. And it may not be obvious what’s going on; you just see devices randomly go unavailable in your smart home app.
Configuring your network so that mDNS is stable can be a bit tricky. Advanced settings on switches or routers, buggy switch firmware, and complex network topologies can all break mDNS in various ways. Here are some best practices to follow to increase your chances of solid mDNS performance:
- Do NOT enable any mDNS reflectors/forwarders/Bonjour Gateway on ANY network devices. This includes any Unifi, Firewalla, or other gear that gives you that “feature”. Attempting IPv6 mDNS forwarding across VLANs will almost certainly cause problems. mDNS reflector implementations are crap for IPv4, and don’t even think about doing it with IPv6. TURN THAT $H1T OFF. For Firewalla, tap on the gear in the upper right corner of the home page -> Advanced -> Configurations -> mDNS Reflector. Turn it OFF for ALL networks.
- Flatten your network. All of your IoT devices, smart home hubs (like Apple TV) and Home Assistant servers should be on the SAME VLAN. If you really want to isolate your IoT devices, go for it, but put your HAOS server and smart home hubs on that IoT VLAN and just open TCP ports to HA from your primary network as needed. This is vastly safer than trying to do mDNS forwarding across VLANs. mDNS is designed to be used in a single layer 2 domain, so don’t expect it to work reliably across VLANs. Just don’t!!!!
- IGMP and MLD snooping. Depending on your network equipment, firmware, topology, etc., you may find that either enabling or disabling IGMP/MLD snooping provides the most stable network. Ideally you want MLD snooping turned ON to control multicast traffic on your network. You may need to experiment to see if turning it off helps your Matter stability, which would indicate buggy switch firmware.
Another factor that might influence your IGMP/MLD settings is whether you are using a hypervisor like Proxmox with the VLAN aware option enabled on the bridge. In addition, some switches lump IPv4 and IPv6 multicast snooping together under IGMP snooping, which is technically not correct. More advanced switches separate out IPv4 (IGMP) and IPv6 (MLD) multicast settings, as they should. Only IPv6 settings are relevant here.
If your switch supports MLD (or IGMP) Querier, turn that ON. That should give you more insight into which ports/devices on your network are in which multicast group. mDNS uses group ff02::fb. The screenshot below is from a TP-Link Jetstream switch (which is awesome BTW).
- Multicast to unicast conversion. Some WiFi APs have advanced settings that control whether multicast traffic is converted into unicast traffic. You do NOT want any multicast traffic converted to unicast. That’s just asking for problems. Disable that feature. For example, on Ruckus APs with Unleashed firmware you want to set Directed MC/BC Threshold to 0.
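Two small calculations show why these tips matter. The scope field of an IPv6 multicast address (the low four bits of its second byte, per the IPv6 addressing architecture) is 2 for ff02::fb, which means link-local: routers must not forward it beyond the local link, so mDNS stays inside one layer 2 domain unless something like a reflector re-sends it. On Ethernet, IPv6 multicast groups map to 33:33:xx:xx:xx:xx MAC addresses (IPv4 groups map to 01:00:5e:xx:xx:xx), and those are the frames your switch forwards or floods based on MLD (or IGMP) snooping. A sketch using only Python’s standard ipaddress module; the function names are my own:

```python
import ipaddress

# IPv6 multicast scope values (RFC 4291); only the common ones are named here.
SCOPE_NAMES = {1: "interface-local", 2: "link-local", 4: "admin-local",
               5: "site-local", 8: "organization-local", 14: "global"}


def ipv6_multicast_scope(addr):
    ip = ipaddress.IPv6Address(addr)
    if not ip.is_multicast:
        raise ValueError(f"{addr} is not a multicast address")
    return ip.packed[1] & 0x0F  # low nibble of the second byte is the scope


def ipv6_multicast_mac(addr):
    """Ethernet MAC for an IPv6 multicast group: 33:33 + low 32 bits of the address."""
    ip = ipaddress.IPv6Address(addr)
    return ":".join(f"{b:02x}" for b in b"\x33\x33" + ip.packed[-4:])


def ipv4_multicast_mac(addr):
    """Ethernet MAC for an IPv4 multicast group: 01:00:5e + low 23 bits of the address."""
    ip = ipaddress.IPv4Address(addr)
    if not ip.is_multicast:
        raise ValueError(f"{addr} is not a multicast address")
    p = ip.packed
    return ":".join(f"{b:02x}" for b in bytes([0x01, 0x00, 0x5E, p[1] & 0x7F, p[2], p[3]]))


if __name__ == "__main__":
    scope = ipv6_multicast_scope("ff02::fb")
    print(scope, SCOPE_NAMES[scope])          # 2 link-local
    print(ipv6_multicast_mac("ff02::fb"))     # 33:33:00:00:00:fb
    print(ipv4_multicast_mac("224.0.0.251"))  # 01:00:5e:00:00:fb
```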
The Sad State of Switch/Router Firmware
Not all switches have bug-free IPv6 multicast support. In fact, some popular brands might be exceptionally buggy. So beware… some switches are just poorly engineered, and in the worst case you might need new network gear. I did, and others have had to replace gear as well. Many vendors just don’t test or care about proper IPv6 mDNS and routing support.
- TP-Link Jetstream: My TL-SG3428X-M2 was solid as a rock. Separate IGMP/MLD settings. Excellent MLD snooping and querier support. Great real time multicast group information. Zero mDNS issues. Check out my post: Configuring TP-Link IGMP & MLD Multicast Snooping
- Netgear M4300 Series: My two M4300-16x switches have rock solid multicast support. Zero mDNS failures. Features separate IGMP and MLD snooping/queriers. Real-time IGMP group membership list, but none for MLD (unlike TP-Link). But these are fantastic enterprise grade switches that are pretty quiet and energy efficient. Check out my post: Configuring Netgear M4300 IGMP & MLD Multicast Snooping
- QNAP: My QSW-M2116P-2T2S-US was terrible. It only had IGMP settings (no MLD), and the switch was very unhappy with Proxmox VLANs enabled and mDNS. Frequent mDNS timeouts whether IGMP snooping was on or off. DO NOT USE.
- Ubiquiti: I don’t have personal experience, but a primary Home Assistant Matter developer had so many problems with his Ubiquiti gear causing significant Matter stability issues that he switched to TP-Link. Matter has been rock solid since he made the switch. Ubiquiti firmware can be a crap shoot. Beware.
Bottom Line: If your Matter environment is not stable, with IoT devices flapping or being non-responsive, it may well be your network gear. Matter can be exceptionally stable, responsive, and scalable if your network doesn’t have buggy multicast code. I have over 17 Thread/Matter devices, and after dumping the QNAP switch, it’s been rock solid. All 17 devices come up in less than 45 seconds after a Home Assistant reboot.
The Sad State of the Linux Kernel IPv6 Routing
The Linux kernel does not have all of the necessary parts of IPv6 routing baked in, let alone turned on by default, to support cross-VLAN mDNS routing. The features that need to be enabled are CONFIG_IPV6_ROUTE_INFO and CONFIG_IPV6_ROUTER_PREF. Without these flags the kernel won’t process route information from “option 24” (the Route Information Option) of a router advertisement. Do note that user space processes can process this information, which is what HAOS does by leveraging NetworkManager. But this is not bug-free.
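For the curious, here is roughly what “option 24” looks like on the wire. RFC 4191 defines the Route Information Option as type 24: a length in 8-octet units, a prefix length, a 2-bit route preference, a 32-bit route lifetime in seconds, and 0, 8, or 16 bytes of prefix. The parser below is a minimal sketch of that layout (not the kernel’s or NetworkManager’s code), and the fd12:3456:789a:1::/64 prefix in the example is a made-up Thread-style ULA.

```python
import ipaddress
import struct
from dataclasses import dataclass

RIO_TYPE = 24
# 2-bit Prf values from RFC 4191 (0b10 is reserved).
PREFERENCE = {0b01: "high", 0b00: "medium", 0b11: "low", 0b10: "reserved"}


@dataclass
class RouteInfo:
    prefix: ipaddress.IPv6Network
    preference: str
    lifetime_s: int


def parse_route_info_option(opt: bytes) -> RouteInfo:
    if len(opt) < 8 or opt[0] != RIO_TYPE:
        raise ValueError("not a Route Information Option")
    length, prefix_len, flags = opt[1], opt[2], opt[3]
    if prefix_len > 128:
        raise ValueError("prefix length out of range")
    # Minimum Length (in 8-octet units) needed to carry the prefix bits.
    min_length = 1 if prefix_len == 0 else 2 if prefix_len <= 64 else 3
    if not min_length <= length <= 3 or len(opt) < length * 8:
        raise ValueError("bad option length")
    (lifetime_s,) = struct.unpack("!I", opt[4:8])
    raw_prefix = opt[8:length * 8].ljust(16, b"\x00")  # pad to a full 128-bit address
    prefix = ipaddress.IPv6Network((raw_prefix, prefix_len), strict=False)
    preference = PREFERENCE[(flags >> 3) & 0b11]  # Prf sits in bits 3-4 of the flags byte
    return RouteInfo(prefix, preference, lifetime_s)


if __name__ == "__main__":
    prefix_bytes = ipaddress.IPv6Address("fd12:3456:789a:1::").packed[:8]
    option = bytes([RIO_TYPE, 2, 64, 0x00]) + struct.pack("!I", 1800) + prefix_bytes
    print(parse_route_info_option(option))
```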
Even with those kernel flags, there’s a sysctl setting called accept_ra_rt_info_max_plen which defaults to 0. Thread routes are /64 prefixes, so this needs to be set to at least 64; since 64 is bigger than 0, all the Thread routes get dropped with the default setting. Bottom line: router vendors need to turn on those kernel settings and set the sysctl value appropriately. Good luck finding a vendor that does that anytime soon.
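A tiny model makes the “default drops everything” behavior concrete. This is a simplified sketch of the filtering that, as I understand the kernel’s ip-sysctl documentation, the per-interface knobs under /proc/sys/net/ipv6/conf/<interface>/ apply: Route Information with a prefix shorter than accept_ra_rt_info_min_plen or longer than accept_ra_rt_info_max_plen is ignored. It is not the kernel source and ignores other conditions such as accept_ra itself; RouteInfoSysctls and read_max_plen are hypothetical names of mine.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class RouteInfoSysctls:
    """Simplified stand-ins for the per-interface accept_ra_rt_info_* sysctls."""
    min_plen: int = 0
    max_plen: int = 0  # the default described above


def route_info_accepted(prefix_len: int, cfg: RouteInfoSysctls) -> bool:
    # Prefixes shorter than min_plen or longer than max_plen are ignored.
    return cfg.min_plen <= prefix_len <= cfg.max_plen


def read_max_plen(iface: str) -> Optional[int]:
    """Read the live value on a Linux host; None if the file is absent."""
    path = Path(f"/proc/sys/net/ipv6/conf/{iface}/accept_ra_rt_info_max_plen")
    return int(path.read_text()) if path.exists() else None


if __name__ == "__main__":
    thread_route_plen = 64  # Thread routes are /64 prefixes
    print(route_info_accepted(thread_route_plen, RouteInfoSysctls()))             # False
    print(route_info_accepted(thread_route_plen, RouteInfoSysctls(max_plen=64)))  # True
```

The file may be missing entirely on kernels built without CONFIG_IPV6_ROUTE_INFO, which is why read_max_plen returns None rather than raising.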
There’s also a little issue of RFC4191, which covers IPv6 router reachability. This is disabled in the Linux kernel. However, since HAOS acts as both a host and a router (for Skyconnect), the HA team patched the kernel to explicitly opt in to RFC4191.
Only a few days ago, in HAOS 11.1, the Home Assistant team fixed a major IPv6 routing bug in the kernel that I discovered. Prior to that patch, Thread Border Router failover would take 30 minutes. Now it’s just a second or two. Yippee.
The Home Assistant team has submitted the kernel patches baked into HAOS for upstream inclusion, but they are nowhere close to being merged as of October 2023. And even if they were, they are only opt-in. And since many routers are built on Linux kernels, there’s little chance that switch vendors will enable the features and test accordingly, even once the IPv6 routing patches are merged. In fact, Ubiquiti Unifi gear requires end users to hack their gear with custom kernels to even properly process Router Advertisements from Thread Border Routers. Yes, it’s that bad.
Bottom line, do not even attempt cross-VLAN mDNS/Matter. Put ALL of your IoT, Matter, smart home hubs, and Home Assistant servers on the same darn VLAN. Except for HAOS, the Linux kernels running in nearly all other devices just can’t do routed mDNS properly.
Having a rock solid Matter experience is not always easy. Sometimes the cause of your woes is your networking gear, its firmware, or fiddling with IGMP/MLD snooping. And worst case, you may need new switches that properly support IPv6 multicast if what you have now is fundamentally broken.
In addition, if you do any network segmentation, always, always place all your Matter devices, smart home hubs and Home Assistant in the SAME VLAN. DO NOT segment your IoT devices in a different VLAN from the control plane. It just WILL NOT work as intended with Matter, due to the lack of commercially available network gear that properly routes IPv6 multicast traffic and handles Thread’s IPv6 routing requirements.
I hope this series of three posts has provided you with a good educational resource for Thread and Matter in the smart home.