I played with it a bit, and was delighted to discover that the dynamic tunnel MTU mechanism operates on a per-NBMA-neighbor basis, much the same as ip pim nbma-mode on the same interface type. Both features do all the right things, just like you'd hope they would.
Here's the topology I'm using:
[Diagram: constrained MTU in the path between R1 and R4]
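The constraint itself lives on the R2-R5 link. Judging from the "frag needed (mtu 1400)" debug later in this post, a minimal way to create it on R2 looks something like this (a sketch only; the interface name and addressing are assumptions, and any mechanism that lowers that link's IP MTU to 1400 bytes behaves the same way):
interface FastEthernet0/1
description constrained link toward R5
ip address 10.0.25.2 255.255.255.0
ip mtu 1400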
The DMVPN tunnel interface on R1 is configured with a 1400-byte IP MTU. With 24 bytes of GRE and outer IP overhead added, it will generate 1424-byte packets that can't fit through the constrained path to R4 without fragmentation. It's also configured with tunnel path MTU discovery.
interface Tunnel0
ip address 192.168.1.1 255.255.255.0
no ip redirects
ip mtu 1400
ip pim sparse-mode
ip nhrp map multicast dynamic
ip nhrp network-id 1
tunnel source FastEthernet0/0
tunnel mode gre multipoint
tunnel path-mtu-discovery
tunnel vrf TRANSIT
end
The two spokes are online with NBMA interfaces (tunnel source) using 10.x addressing. Both routers have their NBMA interfaces configured with a 1500-byte MTU and their tunnel MTU set to 1400 bytes.
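A representative spoke tunnel configuration looks something like this (a sketch only: the tunnel and NHS addresses come from the output below and the hub config above, while the interface names, the hub NBMA address 10.0.12.1, and the PIM/NHRP options are inferred or assumed):
interface Tunnel0
ip address 192.168.1.3 255.255.255.0
ip mtu 1400
ip pim sparse-mode
ip nhrp map 192.168.1.1 10.0.12.1
ip nhrp map multicast 10.0.12.1
ip nhrp network-id 1
ip nhrp nhs 192.168.1.1
tunnel source FastEthernet0/0
tunnel mode gre multipoint
Here's the hub's view of the two registered spokes: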
R1#show dmvpn
Legend: Attrb --> S - Static, D - Dynamic, I - Incomplete
N - NATed, L - Local, X - No Socket
# Ent --> Number of NHRP entries with same NBMA peer
NHS Status: E --> Expecting Replies, R --> Responding
UpDn Time --> Up or Down Time for a Tunnel
==========================================================================
Interface: Tunnel0, IPv4 NHRP Details
Type:Hub, NHRP Peers:2,
# Ent Peer NBMA Addr Peer Tunnel Add State UpDn Tm Attrb
----- --------------- --------------- ----- -------- -----
1 10.0.23.3 192.168.1.3 UP 00:18:16 D
1 10.0.45.4 192.168.1.4 UP 00:13:23 D
R1#
As Ivan noted, tunnel MTU discovery doesn't happen if the Don't Fragment bit isn't set on the encapsulated packet. If, on the other hand, the DF bit is set, then the DF bit gets copied to the GRE packet's (outer) header. Here we don't set the DF bit, and the ping gets through just fine:
R6#ping 4.4.4.4 source lo0 ti 1 re 1 size 1400
Type escape sequence to abort.
Sending 1, 1400-byte ICMP Echos to 4.4.4.4, timeout is 1 seconds:
Packet sent with a source address of 6.6.6.6
!
Success rate is 100 percent (1/1), round-trip min/avg/max = 92/92/92 ms
R6#
That ping created a 1424-byte GRE packet that doesn't fit on the link between R2 and R5. Debugs on the target (R4) show that the traffic was indeed fragmented in transit:
*Aug 15 16:33:27.059: IP: s=10.0.12.1 (FastEthernet0/0), d=10.0.45.4 (FastEthernet0/0), len 52, rcvd 3
*Aug 15 16:33:27.059: IP: recv fragment from 10.0.12.1 offset 0 bytes
*Aug 15 16:33:27.071: IP: s=10.0.12.1 (FastEthernet0/0), d=10.0.45.4 (FastEthernet0/0), len 1392, rcvd 3
*Aug 15 16:33:27.071: IP: recv fragment from 10.0.12.1 offset 32 bytes
52 bytes + 1392 bytes = 1444 bytes. Drop the extra 20-byte IP header from one of those fragments, and we're right at our expected 1424-byte packet size.
So far, no large packets with the DF bit set have been sent, so no tunnel MTU discovery has happened. The hub reports a dynamic MTU of "0" for the NBMA addresses of both spokes, which I guess means "use the MTU applied to the whole tunnel", which is 1400 bytes in this case:
R1#show interfaces tunnel 0 | include Path
Path MTU Discovery, ager 10 mins, min MTU 92
Path destination 10.0.23.3: MTU 0, expires never
Path destination 10.0.45.4: MTU 0, expires never
R1#
R6 can ping R3 (the spoke behind the unconstrained path) with an un-fragmentable 1400-byte packet without any problem:
R6#ping 3.3.3.3 source lo0 ti 1 re 1 size 1400 df-bit
Type escape sequence to abort.
Sending 1, 1400-byte ICMP Echos to 3.3.3.3, timeout is 1 seconds:
Packet sent with a source address of 6.6.6.6
Packet sent with the DF bit set
!
Success rate is 100 percent (1/1), round-trip min/avg/max = 44/44/44 ms
But when we try this over the constrained path to R4, the ping fails silently. ICMP debugs are on, but no errors rolled in:
R6#debug ip icmp
ICMP packet debugging is on
R6#ping 4.4.4.4 source lo0 ti 1 re 1 size 1400 df-bit
Type escape sequence to abort.
Sending 1, 1400-byte ICMP Echos to 4.4.4.4, timeout is 1 seconds:
Packet sent with a source address of 6.6.6.6
Packet sent with the DF bit set
.
Success rate is 0 percent (0/1)
R6#
It was R2 that couldn't forward the 1424-byte GRE packet onto the constrained link to R5, so it sent a "fragmentation needed" message not to the originator of the ping (R6), but to the source of the GRE packet (R1):
R2#
*Aug 15 16:42:18.558: ICMP: dst (10.0.45.4) frag. needed and DF set unreachable sent to 10.0.12.1
R1 reacted by reducing the tunnel MTU only for R4's NBMA address (10.0.45.4). The new value, 1376 bytes, is the 1400-byte path MTU reported by R2 minus 24 bytes of GRE and outer IP overhead. Pretty nifty.
R1#
*Aug 15 16:42:18.582: ICMP: dst (10.0.12.1) frag. needed and DF set unreachable rcv from 10.0.12.2
*Aug 15 16:42:18.582: Tunnel0: dest 10.0.45.4, received frag needed (mtu 1400), adjusting soft state MTU from 0 to 1376
*Aug 15 16:42:18.586: Tunnel0: tunnel endpoint for transport dest 10.0.45.4, change MTU from 0 to 1376
R1#show interfaces tunnel 0 | include Path
Path MTU Discovery, ager 10 mins, min MTU 92
Path destination 10.0.23.3: MTU 0, expires never
Path destination 10.0.45.4: MTU 1376, expires 00:04:05
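That discovered value is soft state: it ages out on the timer shown above ("ager 10 mins"), after which R1 should fall back to the configured 1400-byte tunnel MTU until another too-big packet triggers rediscovery. If the default aging doesn't suit you, the ager can be tuned on the tunnel; something like this should do it (keyword syntax from memory, so verify it against your IOS release):
tunnel path-mtu-discovery age-timer 30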
Because R1 only reduced its own soft-state MTU and didn't alert R6 about the problem, a second ping from R6 is required to provoke the 'frag needed' message from R1, based on R1's new knowledge of the constrained link between R2 and R5:
R6#ping 4.4.4.4 source lo0 ti 1 re 1 size 1400 df-bit
Type escape sequence to abort.
Sending 1, 1400-byte ICMP Echos to 4.4.4.4, timeout is 1 seconds:
Packet sent with a source address of 6.6.6.6
Packet sent with the DF bit set
M
Success rate is 0 percent (0/1)
R6#
*Aug 15 16:50:38.999: ICMP: dst (6.6.6.6) frag. needed and DF set unreachable rcv from 192.168.6.1
R6#
We can still send un-fragmentable 1400-byte packets from R6 to R3, since the reduced MTU applies only to the path toward R4:
R6#ping 3.3.3.3 source lo0 ti 1 re 1 size 1400 df-bit
Type escape sequence to abort.
Sending 1, 1400-byte ICMP Echos to 3.3.3.3, timeout is 1 seconds:
Packet sent with a source address of 6.6.6.6
Packet sent with the DF bit set
!
Success rate is 100 percent (1/1), round-trip min/avg/max = 48/48/48 ms
Now, I'd like to use this feature to discover end-to-end tunnel MTU for some IP multicast traffic on DMVPN, but for some reason my DMVPN interface doesn't generate unreachables in response to oversized multicast packets. They're just dropped silently. Feels like a bug. Not sure what I'm missing.
Update: What I was missing is that RFCs 1112, 1122 and 1812 all specify that ICMP unreachables must not be sent in response to packets destined to multicast addresses.
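Given that, the practical way to protect the multicast traffic here is to account for the constrained path statically rather than dynamically. A minimal sketch, assuming the 1376-byte payload ceiling discovered above, would be to pin the tunnel IP MTU at that value on the hub and spokes:
interface Tunnel0
ip mtu 1376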