Tuesday, October 12, 2010

Mapping multicast groups to MAC addresses

When an Ethernet station builds a frame for an IP packet, it needs to know what destination address to put on that frame.

For a unicast IP packet, the sending station uses the destination node's unique MAC address, which it learns through the ARP mechanism.

For a broadcast IP packet, the broadcast MAC address (ff:ff:ff:ff:ff:ff) is used.

But what about multicast IP packets?  A unicast MAC address isn't appropriate, because there might be several stations on the segment which are interested in receiving the packet.  Conversely, a broadcast frame isn't appropriate, because we'd be bothering systems that don't want to process the packet.

Sensibly, multicast IP packets get encapsulated into multicast Ethernet frames, using a block of addresses from 01:00:5e:00:00:00 through 01:00:5e:7f:ff:ff.  RFC 1112 has all the details.

Most network folks have seen this process, and then forgotten it.  The times it's come up at work, I've found that people think it's much uglier than it really is.  It's a little ugly, but worth learning, and luckily, there's an interesting story behind it.

IP multicast group numbers look like IP addresses.  They fit in the "Class D" space from 224.0.0.0 through 239.255.255.255.  There are 2^28 unique multicast groups in that range.  Unfortunately, there are only 2^23 MAC addresses in the block set aside for IP multicast, so 32 different groups land on each MAC address.  That overlap needs to be taken into consideration when handing out multicast groups to applications.
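The overlap arithmetic is easy to check.  This is just a quick sketch using Python's standard ipaddress module; the variable names are mine, and the MAC block boundaries are the ones RFC 1112 assigns.

```python
import ipaddress

# The whole "Class D" space: 224.0.0.0 through 239.255.255.255.
class_d = ipaddress.ip_network("224.0.0.0/4")
group_count = class_d.num_addresses

# The MAC block used for IP multicast, written as 48-bit integers.
mac_block_first = 0x01005E000000
mac_block_last = 0x01005E7FFFFF
mac_count = mac_block_last - mac_block_first + 1

# How many multicast groups share each MAC address.
groups_per_mac = group_count // mac_count

print(group_count == 2**28, mac_count == 2**23, groups_per_mac)
```

Five bits of the group number (28 minus 23) simply have nowhere to go, and 2^5 is 32.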

I'm going to cover two historical points here.  They're both interesting tidbits that make the multicast mapping rules make sense.

Ethernet frames are structured to make things easy on stations and bridges.
An Ethernet frame doesn't really begin with the destination MAC address.  It starts with the preamble, which can be thought of as a way to "wake up" stations on a shared media segment, and get them ready to receive an incoming frame.  I think of it like a rumble strip you'd encounter before a highway toll plaza, because it serves a similar function.  And because it looks like one.  The preamble, along with its partner the start-of-frame-delimiter (SFD), comprises a 64-bit pattern of alternating ones and zeros ending with an errant one: 101010....101011

That pattern-breaking '11' at the end of the SFD indicates that the destination address will begin in the next bit.  If you're a bridge, you're going to use the next 6 bytes to make a forwarding decision.  If you're a station, you'll use these 6 bytes to decide whether to process the frame or ignore it.  The Ethernet designers did this so that the receiving NIC can quickly determine whether the frame is worthy of processing.
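If you'd like to see the rumble strip for yourself, here's a small sketch that rebuilds it.  The byte values (seven 0x55 preamble bytes followed by a 0xD5 SFD) come from IEEE 802.3 rather than from anything above, and wire_bits is a helper name I made up.  It assumes each byte goes onto the wire least significant bit first.

```python
def wire_bits(data: bytes) -> str:
    """Return the bits of data in transmission order (LSB of each byte first)."""
    return "".join(str((byte >> i) & 1) for byte in data for i in range(8))

PREAMBLE = bytes([0x55] * 7)  # seven bytes of 0b01010101
SFD = bytes([0xD5])           # one byte of 0b11010101

pattern = wire_bits(PREAMBLE + SFD)

print(len(pattern))   # number of bits before the destination address
print(pattern[:8])    # start of the rumble strip
print(pattern[-8:])   # the SFD, ending in the pattern-breaking '11'
```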

But that's not all.  The very first bit in those 6 bytes, the bit that comes immediately after the '11' in the SFD, is special.  Each byte on Ethernet is transmitted least significant bit first, so the first bit to arrive is the least significant bit in the first byte of the address, otherwise known as the individual/group bit.  If it is a '1', a bridge knows immediately (only one bit into the frame!) that this frame will need to be flooded out all ports.  Nifty, and makes very speedy cut-through bridging decisions possible.

If you look at the various hardware addresses in one of your device's mac-address or arp tables, you shouldn't find any stations where the first byte is an odd number because stations must use unicast addresses.  An odd numbered first-byte would mean that the individual/group bit is set.  The broadcast (all-ones) Ethernet address, appropriately enough, has the bit set.  Along with all of the other bits.
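Here's a minimal sketch of that odd-first-byte test.  The function names and the sample unicast address are made up for illustration; the only assumption is that addresses are written as six colon-separated hex bytes.

```python
def parse_mac(mac: str) -> bytes:
    """Turn 'aa:bb:cc:dd:ee:ff' into six raw bytes."""
    parts = mac.split(":")
    if len(parts) != 6:
        raise ValueError(f"expected 6 colon-separated bytes, got {mac!r}")
    return bytes(int(part, 16) for part in parts)

def is_group_address(mac: str) -> bool:
    """True when the individual/group bit (LSB of the first byte) is set."""
    return bool(parse_mac(mac)[0] & 0x01)

print(is_group_address("ff:ff:ff:ff:ff:ff"))  # broadcast
print(is_group_address("01:00:5e:00:00:01"))  # IP multicast block
print(is_group_address("00:1b:21:3c:4d:5e"))  # made-up unicast station
```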


Somebody else's tight budget can become your forwarding problem.
The story goes that when Steve Deering was putting RFC 1112 together, he wanted to purchase 16 Ethernet OUIs.  Each OUI allows for 2^24 unique addresses, so 16 of them would be required to cover the whole 28-bit IP multicast space.  But the budget wouldn't cover 16 OUIs.  The budget wouldn't even cover one OUI.  Instead, he was able to procure only half of an OUI.  So, that's why we map 28 bits of multicast group into 23 bits.


Armed with these two bits of information, we know more than half of the resulting multicast address.  Here's how all 6 bytes of the multicast frame's destination address are derived:
  1. Must be an odd number (multicast/broadcast bit is set), happens to be "01".  That should be easy to remember now.
  2. Always "00".  Memorize it.
  3. Always "5E".  Memorize it.
  4. Mapped from the multicast group, keeping in mind that Dr. Deering only procured 7 of the 8 bits in this byte.
  5. Mapped directly from the multicast group.
  6. Mapped directly from the multicast group.
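
The six steps above can be sketched in a few lines.  This isn't lifted from RFC 1112; it's my own illustration of the rule (keep the low-order 23 bits of the group, drop them into the 01:00:5e block), and multicast_mac is a made-up name.

```python
import ipaddress

def multicast_mac(group: str) -> str:
    """Map an IPv4 multicast group to its Ethernet destination address."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not a multicast group (224.0.0.0/4)")
    low23 = int(addr) & 0x7FFFFF      # bytes 5 and 6, plus 7 bits of byte 4
    mac = 0x01005E000000 | low23      # bytes 1-3 are always 01:00:5e
    return ":".join(f"{b:02x}" for b in mac.to_bytes(6, "big"))

print(multicast_mac("224.0.0.1"))
print(multicast_mac("224.128.0.1"))   # different group, same MAC address
print(multicast_mac("239.255.255.255"))
```

The second example is the overlap in action: 224.0.0.1 and 224.128.0.1 differ only in a bit that doesn't survive the mapping.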



I find that knowing the origin story behind these things makes them much easier to remember than a dry rule like "The least significant bit of the most significant byte is the multicast/broadcast flag."
The budgetary reasoning behind this technical decision, and the long-term implications it has for filtering multicast at L2, are a real bummer.

10 comments:

  1. Hi Chris - this is one of the best explanations I have seen of mapping multicast IP addresses to MAC addresses.

    The picture is absolutely brilliant - truly worth a thousand words!

    Thank you very much - Sanjeev.

  2. You're welcome. Thank you for taking the time to let me know you liked the post!

  3. This is a fantastic explanation, just stumbled on it. Thank you very much for putting this together!

  4. very minor point, but in the example for 32-bit multicast group address, I believe the first octet should be 11100000 vs 11110000 (that would be 240)

    Replies
    1. Nuts. You're absolutely right. I don't think I'll be re-doing that artwork, however :)

  5. Great Story! And great explanation.

  6. Thanks for the detailed explanation.
    Your article has helped me to understand some very important points in learning multicast protocol.

    Replies
    1. I'm happy to have helped. Thank you for letting me know.

  7. I also believe the IEEE has a policy against issuing consecutive OUIs to stop application/protocol developers from coming up with all sorts of kludges.

  8. This is a great post thanks for sharing it
