MVPN

Overview

An SP determines whether a particular VPN is multicast-enabled. If it is, it corresponds to a “Multicast Domain”. A PE which attaches to a particular multicast-enabled VPN is said to belong to the corresponding Multicast Domain. For each Multicast Domain, there is a default “Multicast Distribution Tree (MDT)” through the backbone, connecting ALL of the PEs that belong to that Multicast Domain. A given PE may be in as many Multicast Domains as there are VPNs attached to that PE. However, each Multicast Domain has its own MDT. The MDTs are created by running PIM in the backbone.

In a departure from the usual multicast tree distribution procedures, the Default MDT for a Multicast Domain is constructed automatically as the PEs in the domain come up. Construction of the Default MDT does not depend on the existence of multicast traffic in the domain; it will exist before any such multicast traffic is seen. Default MDTs correspond to the “MI-PMSIs” of [MVPN-ARCH].

Inclusive and Selective PMSIs

We will distinguish between two different kinds of PMSI (there is a third type of PMSI, but I haven’t covered it here):

“Multidirectional Inclusive” PMSI (MI-PMSI) [also known as the Default MDT]: A Multidirectional Inclusive PMSI is one that enables ANY PE attaching to a particular MVPN to transmit a message such that it will be received by EVERY other PE attaching to that MVPN. There is at most one MI-PMSI per MVPN. An MI-PMSI can be thought of as an overlay broadcast network connecting the set of PEs supporting a particular MVPN.

“Selective” PMSI (S-PMSI) [also known as a Data MDT]: A Selective PMSI is one that provides a mechanism wherein a particular PE in an MVPN can multicast messages so that they will be received by a subset of the other PEs of that MVPN. There may be an arbitrary number of S-PMSIs per PE per MVPN.

In BGP/MPLS IP VPNs, each CE router is a unicast routing adjacency of a PE router, but CE routers at different sites do NOT become unicast routing adjacencies of each other. This important characteristic is retained for multicast routing: a CE router becomes a PIM adjacency of a PE router, but CE routers at different sites do NOT become PIM adjacencies of each other. Multicast packets from within a VPN are received from a CE router by an ingress PE router. The ingress PE encapsulates the multicast packets and (initially) forwards them along the Default MDT to all the PE routers connected to sites of the given VPN. Every PE router attached to a site of the given VPN thus receives all multicast packets from within that VPN. If a particular PE router is not on the path to any receiver of that multicast group, the PE simply discards that packet.

If a large amount of traffic is being sent to a particular multicast group, but that group does not have receivers at all the VPN sites, it can be wasteful to forward that group’s traffic along the Default MDT. Therefore, we also specify a method for establishing individual MDTs for specific multicast groups. We call these “Data MDTs”. A Data MDT delivers VPN data traffic for a particular multicast group only to those PE routers which are on the path to receivers of that multicast group. Using a Data MDT has the benefit of reducing the amount of multicast traffic on the backbone, as well as reducing the load on some of the PEs; it has the disadvantage of increasing the amount of state that must be maintained by the P routers. The SP has complete control over this tradeoff. Data MDTs correspond to the S-PMSIs.
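The tradeoff between the two tree types can be illustrated with a small sketch. This is only a model of which PEs end up receiving a group's traffic, not an implementation of the switchover procedure; the function name `receiving_pes` and the PE names are hypothetical.

```python
def receiving_pes(md_pes, pes_with_receivers, ingress_pe, use_data_mdt):
    """Return the set of PEs the backbone delivers one C-group's traffic to.

    md_pes: all PEs in the Multicast Domain.
    pes_with_receivers: PEs on the path to at least one receiver of the group.
    """
    others = set(md_pes) - {ingress_pe}
    if use_data_mdt:
        # Data MDT: only PEs on the path to receivers get the traffic.
        return others & set(pes_with_receivers)
    # Default MDT: every other PE in the domain gets it (and may discard it).
    return others


md = {"PE1", "PE2", "PE3", "PE4"}
interested = {"PE3"}
on_default = receiving_pes(md, interested, "PE1", use_data_mdt=False)
on_data = receiving_pes(md, interested, "PE1", use_data_mdt=True)
discarding = on_default - interested  # PEs that receive and then drop
```

With one interested PE out of three remote PEs, the Default MDT delivers the traffic to two PEs that only discard it, which is the waste a Data MDT removes at the cost of extra state in the P routers.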

Multicast VRFs

The notion of a “VRF”, defined in [RFC4364], is extended to include multicast routing entries as well as unicast routing entries. Each VRF has its own multicast routing table. When a multicast data or control packet is received from a particular CE device, multicast routing is done in the associated VRF. Each PE router runs a number of instances of PIM-SM, as many as one per VRF. In each instance of PIM-SM, the PE maintains a PIM adjacency with each of the PIM-capable CE routers associated with that VRF. The multicast routing table created by each instance is specific to the corresponding VRF. We will refer to these PIM instances as “VPN-specific PIM instances”, or “PIM C-instances”.

Each PE router also runs a “provider-wide” instance of PIM-SM (a “PIM P-instance”), in which it has a PIM adjacency with each of its IGP neighbors (i.e., with P routers), but NOT with any CE routers, and not with other PE routers (unless they happen to be adjacent in the SP’s network). The P routers also run the P-instance of PIM, but do NOT run a C-instance.

In order to help clarify when we are speaking of the PIM P-instance and when we are speaking of a PIM C-instance, we will also apply the prefixes “P-” and “C-” respectively to control messages, addresses, etc. Thus a P-Join would be a PIM Join which is processed by the PIM P-instance, and a C-Join would be a PIM Join which is processed by a C-instance. A P-group address would be a group address in the SP’s address space, and a C-group address would be a group address in a VPN’s address space.

Multicast Domains

Model of Operation

A “Multicast Domain (MD)” is essentially a set of VRFs associated with interfaces that can send multicast traffic to each other. From the standpoint of a PIM C-instance, a multicast domain is equivalent to a multi-access interface. The PE routers in a given MD become PIM adjacencies of each other in the PIM C-instance.

Each multicast VRF is assigned to one MD. Each MD is configured with a distinct, multicast P-group address, called the “Default MDT group address”. This address is used to build the Default MDT for the MD.

When a PE router needs to send PIM C-instance control traffic to the other PE routers in the MD, it encapsulates the control traffic, with its own IPv4 address as the source IP address and the Default MDT group address as the destination IP address. Note that the Default MDT is part of the P-instance of PIM, whereas the PEs that communicate over the Default MDT are PIM adjacencies in a C-instance. Within the C-instance, the Default MDT appears to be a multi-access network to which all the PEs are attached.
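The addressing of the encapsulated packet can be sketched as follows. This is a simplified model: the class and function names are hypothetical, the actual encapsulation header is not modelled, and looking up the VRF by outer destination address on receipt is an assumption that relies on each MD having a distinct Default MDT group address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncapsulatedPacket:
    outer_src: str   # P-address of the sending PE
    outer_dst: str   # Default MDT group address (a P-group address)
    payload: bytes   # the original C-instance packet, carried unchanged


def encapsulate_for_default_mdt(pe_address, default_mdt_group, c_packet):
    """Wrap a C-instance packet for transmission over the Default MDT."""
    return EncapsulatedPacket(pe_address, default_mdt_group, c_packet)


def decapsulate(packet, vrf_by_group):
    """Map a received packet back to a VRF using the outer destination."""
    vrf = vrf_by_group.get(packet.outer_dst)
    return (vrf, packet.payload) if vrf is not None else None


# Example addresses are illustrative only.
pkt = encapsulate_for_default_mdt("192.0.2.1", "239.1.1.1", b"C-Hello")
```

A receiving PE that has no VRF configured for the outer group address has nowhere to deliver the payload, so `decapsulate` returns `None` in that case.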

The Default MDT does not only carry the PIM control traffic of the MD’s PIM C-instance. It also, by default, carries the multicast data traffic of the C-instance. In some cases though, multicast data traffic in a particular MD will be sent on a Data MDT rather than on the Default MDT.

Multicast Tunnels

An MD can be thought of as a set of PE routers connected by a “multicast tunnel (MT)”. From the perspective of a VPN-specific PIM instance, an MT is a single multi-access interface. In the SP network, a single MT is realized as a Default MDT combined with zero or more Data MDTs.

Auto-Discovery

Any of the variants of PIM may be used to set up the Default MDT: PIM-SM, Bidirectional PIM [BIDIR], or PIM-SSM [SSM]. Except in the case of PIM-SSM, the PEs need only know the proper P-group address in order to begin setting up the Default MDTs. The PEs will then discover each other’s addresses by virtue of receiving PIM control traffic, e.g., PIM Hellos, sourced (and encapsulated) by each other. However, in the case of PIM-SSM, the necessary MDTs for an MD cannot be set up until each PE in the MD knows the source address of each of the other PEs in that same MD. This information needs to be auto-discovered.

A new BGP Address Family, MDT-SAFI, is defined. The NLRI for this address family consists of:

  • RD
  • IPv4 unicast address
  • multicast group address

A given PE router in a given MD constructs an NLRI in this family from:

  • Its own IPv4 address. If it has several, it uses the one which it will be placing in the IP source address field of multicast packets that it will be sending over the MDT.
  • An RD which has been assigned to the MD.
  • The P-group address, an IPv4 multicast address which is to be used as the IP destination address field of multicast packets that will be sent over the MDT.

When a PE distributes this NLRI via BGP, it may include a Route Target Extended Communities attribute. This RT must be an “Import RT” [RFC4364] of each VRF in the MD. The ordinary BGP distribution procedures used by [RFC4364] will then ensure that each PE learns the MDT-SAFI “address” of each of the other PEs in the MD, and that the learned MDT-SAFI addresses get associated with the right VRFs.

The encoding of the MDT-SAFI is specified in the following subsection:

MDT-SAFI

BGP messages in which:

  • AFI=1 and
  • SAFI=66

are “MDT-SAFI” messages.

The NLRI format is 8-byte-RD:IPv4-address followed by the MDT group address. That is, the MP_REACH attribute for this SAFI will contain one or more tuples of the following form:

+-------------------------------+
|                               |
|  RD:IPv4-address (12 octets)  |
|                               |
+-------------------------------+
|    Group Address (4 octets)   |
+-------------------------------+

The IPv4 address identifies the PE that originated this route, and the RD identifies a VRF in that PE. The group address must be an IPv4 multicast group address, and is used to build the P-tunnels. All PEs attached to a given MVPN must specify the same group address, even if the group is an SSM group. MDT-SAFI routes do not carry RTs, and the group address is used to associate a received MDT-SAFI route with a VRF.
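As a rough illustration of the 16-octet tuple described above, the sketch below packs and unpacks one tuple with Python's standard library. The function names are hypothetical, the RD is assumed to be a type 0 RD as defined in [RFC4364] (2-octet type, 2-octet AS number, 4-octet assigned number), and any surrounding BGP attribute framing is deliberately left out.

```python
import ipaddress
import struct


def pack_rd_type0(asn, assigned_number):
    """Encode a type 0 RD: 2-octet type, 2-octet ASN, 4-octet number."""
    return struct.pack("!HHI", 0, asn, assigned_number)


def pack_mdt_safi_tuple(rd, pe_ipv4, group_ipv4):
    """Pack RD (8) + PE IPv4 address (4) + group address (4) = 16 octets."""
    if len(rd) != 8:
        raise ValueError("RD must be 8 octets")
    group = ipaddress.IPv4Address(group_ipv4)
    if not group.is_multicast:
        raise ValueError("group address must be an IPv4 multicast address")
    return rd + ipaddress.IPv4Address(pe_ipv4).packed + group.packed


def unpack_mdt_safi_tuple(data):
    """Split a 16-octet tuple back into (rd, pe_ipv4, group_ipv4)."""
    if len(data) != 16:
        raise ValueError("expected exactly 16 octets")
    pe = str(ipaddress.IPv4Address(data[8:12]))
    group = str(ipaddress.IPv4Address(data[12:16]))
    return data[:8], pe, group


# Illustrative values only: AS 65000, assigned number 1.
rd = pack_rd_type0(65000, 1)
nlri = pack_mdt_safi_tuple(rd, "192.0.2.1", "232.1.1.1")
```

Rejecting a non-multicast group address at packing time mirrors the requirement that the group address be an IPv4 multicast group address.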

More detail on MVPN can be found in draft-rosen-vpn-mcast-12.txt.

Bidirectional PIM (bidir PIM)

Introduction

RFC 5015 specifies Bidirectional PIM (BIDIR-PIM), a variant of PIM Sparse-Mode (PIM-SM) that builds bidirectional shared trees connecting multicast sources and receivers.

The shared tree for each multicast group is rooted at a multicast router called the Rendezvous Point (RP). Different multicast groups can use separate RPs within a PIM domain.

Bidirectional PIM dispenses with both encapsulation and source state by allowing packets to be natively forwarded from a source to the RP using shared tree state.

Terminology

RFC 5015 introduces some terminology for bidirectional PIM.

The following terms have special significance for BIDIR-PIM:

Multicast Routing Information Base (MRIB)

The multicast topology table, which is typically derived from the unicast routing table. It is used by PIM for establishing the RPF interface. In PIM-SM, the MRIB is also used to make decisions regarding where to forward Join/Prune messages, whereas in BIDIR-PIM, it is used as a source for routing metrics for the DF election process.

Rendezvous Point Address (RPA)

An RPA is an address that is used as the root of the distribution tree for a range of multicast groups. The RPA must be routable from all routers in the PIM domain. The RPA does not need to correspond to an address for an interface of a real router. In this respect, BIDIR-PIM differs from PIM-SM, which requires an actual router to be configured as the Rendezvous Point (RP). Join messages from receivers for a BIDIR-PIM group propagate hop-by-hop towards the RPA.

Rendezvous Point Link (RPL)

An RPL for a particular RPA is the physical link to which the RPA belongs. In BIDIR-PIM, all multicast traffic to groups mapping to a specific RPA is forwarded on the RPL of that RPA. The RPL is special within a BIDIR-PIM domain as it is the only link on which a Designated Forwarder election does not take place.

Upstream

Towards the root (RPA) of the tree. The direction used by packets traveling from sources to the RPL.

Downstream

Away from the root of the tree. The direction on which packets travel from the RPL to receivers.

Designated Forwarder (DF)

BIDIR-PIM is largely based on the concept of a Designated Forwarder (DF). A single DF exists for each RPA on every link within a BIDIR-PIM domain (this includes both multi-access and point-to-point links). The only exception is the RPL, on which no DF exists. The DF is the router on the link with the best route to the RPA (determined by comparing MRIB-provided metrics). A DF for a given RPA is in charge of forwarding downstream traffic onto its link, and forwarding upstream traffic from its link towards the RPL. It does this for all the bidirectional groups that map to the RPA. The DF on a link is also responsible for processing Join messages from downstream routers on the link as well as ensuring that packets are forwarded to local receivers (discovered through a local membership mechanism such as IGMP).
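A minimal sketch of how "best route to the RPA" could be decided for one link is shown below. The text above only says that MRIB metrics are compared; the exact ordering used here (lower metric preference, then lower metric, then higher IP address as the tie-breaker) is the PIM assert-style comparison and should be read as an assumption. The `Offer` class and `elect_df` function are hypothetical names, and the election message exchange itself is not modelled.

```python
from dataclasses import dataclass
import ipaddress


@dataclass(frozen=True)
class Offer:
    router: str             # IPv4 address of the candidate on this link
    metric_preference: int  # MRIB route preference towards the RPA
    metric: int             # MRIB route metric towards the RPA


def elect_df(offers):
    """Pick the DF for one RPA on one link from the candidates' metrics."""
    if not offers:
        return None
    best = min(
        offers,
        key=lambda o: (
            o.metric_preference,
            o.metric,
            -int(ipaddress.IPv4Address(o.router)),  # higher address wins ties
        ),
    )
    return best.router


candidates = [
    Offer("10.0.0.1", 110, 20),
    Offer("10.0.0.2", 110, 10),
    Offer("10.0.0.3", 110, 10),
]
winner = elect_df(candidates)
```

In this example the second and third routers tie on preference and metric, so the one with the higher address is chosen.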

State Summarization Macros

RFC 5015 defines the following “macro” over the router’s protocol state; it is used in the descriptions of the state machines and in the forwarding pseudocode.

    olist(G) = RPF_interface(RPA(G)) (+) joins(G) (+) pim_include(G)

RPF_interface(RPA) is the interface the MRIB indicates would be used to route packets to RPA. The olist(G) is the list of interfaces on which packets to group G must be forwarded. The macro pim_include(G) indicates the interfaces to which traffic might be forwarded because of hosts that are local members on that interface.
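Reading (+) as set union, the macro can be sketched in Python as below. The argument names are hypothetical, and how `joins` and `pim_include` are derived from protocol state is outside this sketch.

```python
def olist(rpf_interface, joins, pim_include):
    """olist(G) = RPF_interface(RPA(G)) (+) joins(G) (+) pim_include(G)."""
    result = set(joins) | set(pim_include)
    if rpf_interface is not None:
        # The interface towards the RPA is always part of the list.
        result.add(rpf_interface)
    return result


# Interface names are illustrative only.
example = olist("eth0", joins={"eth1"}, pim_include={"eth1", "eth2"})
```

Using a set means an interface that appears both because of a downstream Join and because of local members is listed only once.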

Data Packet Forwarding Rules

The BIDIR-PIM packet forwarding rules are defined below in pseudocode.

iif is the incoming interface of the packet.

G is the destination address of the packet (group address).

RPA is the Rendezvous Point Address for this group.

First we check to see whether the packet should be accepted based on TIB (Tree Information Base) state and the interface that the packet arrived on. A packet is accepted if it arrives on the RPF interface to reach the RPA (downstream traveling packet) or if the router is the DF on the interface on which the packet arrives (upstream traveling packet). If the packet should be forwarded, we build an outgoing interface list for the packet. Finally, we remove the incoming interface from the outgoing interface list we’ve created, and if the resulting outgoing interface list is not empty, we forward the packet out of those interfaces.

On receipt of data to G on interface iif:

    if( iif == RPF_interface(RPA) || I_am_DF(RPA,iif) ) {
        oiflist = olist(G) (-) iif
        forward packet on all interfaces in oiflist
    }
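The same rule can be written as a small runnable sketch. The `BidirState` class is a hypothetical container for the per-RPA state of one router, and the outgoing interface list for the group is assumed to have been computed already.

```python
from dataclasses import dataclass, field


@dataclass
class BidirState:
    rpf_interface: str                               # RPF_interface(RPA)
    df_interfaces: set = field(default_factory=set)  # links where we are DF
    olist: set = field(default_factory=set)          # precomputed olist(G)


def forward(state, iif):
    """Return the interfaces a packet arriving on iif is forwarded out of."""
    accepted = iif == state.rpf_interface or iif in state.df_interfaces
    if not accepted:
        return set()  # neither a downstream nor an upstream packet: drop
    return state.olist - {iif}  # never send back out the arrival interface


# Interface names are illustrative only.
state = BidirState(
    rpf_interface="eth0",
    df_interfaces={"eth1", "eth2"},
    olist={"eth0", "eth1", "eth2"},
)
downstream = forward(state, "eth0")  # arrived from the RPA side
upstream = forward(state, "eth1")    # arrived on a link where we are DF
dropped = forward(state, "eth3")     # we are not DF on eth3
```

Note that an upstream packet is sent both towards the RPA and onto the other downstream interfaces, which is what makes the shared tree bidirectional.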

Hopefully the above makes sense. If you want further details please see RFC 5015.

(Note: The above text was taken from RFC 5015)

PIM-SSM

IP version 4 (IPv4) addresses in the 232/8 (232.0.0.0 to 232.255.255.255) range are designated as source-specific multicast (SSM) destination addresses and are reserved for use by source-specific applications and protocols. For IP version 6 (IPv6), the address prefix FF3x::/32 is reserved for source-specific multicast use.

The SSM destination address 232.0.0.0 is reserved, and it must not be used as a destination address. Similarly, FF3x::4000:0000 is also reserved. The goal of reserving these two addresses is to preserve one invalid SSM destination for IPv4 and IPv6, which can be useful in an implementation as a null value.

The address range 232.0.0.1 – 232.0.0.255 is currently reserved for allocation by IANA.
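The address rules above can be checked with Python's standard `ipaddress` module. This is a sketch with hypothetical function names; for IPv6 it treats FF3x::/32 as "first octet FF, flags nibble 3, any scope nibble x, next 16 bits zero".

```python
import ipaddress

SSM_V4 = ipaddress.ip_network("232.0.0.0/8")
IANA_V4_FIRST = ipaddress.IPv4Address("232.0.0.1")
IANA_V4_LAST = ipaddress.IPv4Address("232.0.0.255")


def is_ssm(address):
    """True if the address lies in 232/8 (IPv4) or FF3x::/32 (IPv6)."""
    ip = ipaddress.ip_address(address)
    if ip.version == 4:
        return ip in SSM_V4
    b = ip.packed
    return b[0] == 0xFF and b[1] >> 4 == 0x3 and b[2] == 0 and b[3] == 0


def is_reserved_ssm(address):
    """True for the reserved destinations 232.0.0.0 and FF3x::4000:0000."""
    if not is_ssm(address):
        return False
    ip = ipaddress.ip_address(address)
    if ip.version == 4:
        return ip == SSM_V4.network_address
    # Remaining 96 bits must be 0...0 followed by 4000:0000.
    return ip.packed[4:] == bytes(8) + bytes.fromhex("40000000")


def is_iana_reserved_v4(address):
    """True for 232.0.0.1 through 232.0.0.255."""
    ip = ipaddress.ip_address(address)
    return ip.version == 4 and IANA_V4_FIRST <= ip <= IANA_V4_LAST
```

For example, `is_ssm("232.1.2.3")` and `is_ssm("ff3e::1234")` are true, while `is_ssm("239.1.1.1")` is false.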

The policy for allocating the rest of the SSM addresses to sending applications is strictly locally determined by the sending host.

When allocating SSM addresses dynamically, a host or host operating system MUST NOT allocate sequentially starting at the first allowed address. It is RECOMMENDED to allocate SSM addresses to applications randomly, while ensuring that allocated addresses are not given simultaneously to multiple applications (and avoiding the reserved addresses).

As described in the post “Layer 2 Multicast Addresses”, the mapping of an IP packet with an SSM destination address onto a link-layer multicast address does not take into account the datagram’s full source IP address (on commonly-used link layers like Ethernet). If all hosts started at the first allowed address, then with high probability, many source-specific channels on shared-medium local area networks would use the same link-layer multicast address. As a result, traffic destined for one channel subscriber would be delivered to another’s IP module, which would then have to discard the datagram.
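The sketch below illustrates both points for IPv4 on Ethernet: the MAC address depends only on the low 23 bits of the group address (01:00:5e prefix), and a random allocator avoids every host starting at the same group. The function names are hypothetical, and skipping the whole 232.0.0.0 to 232.0.0.255 block is a simplification that covers both the reserved address and the IANA range.

```python
import ipaddress
import random


def ipv4_multicast_mac(group):
    """Map an IPv4 group to its Ethernet MAC: 01:00:5e + low 23 bits."""
    low23 = int(ipaddress.IPv4Address(group)) & 0x7FFFFF
    mac = (0x01005E << 24) | low23
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


def allocate_ssm_group(in_use, rng=random):
    """Pick a random unused group in 232/8, skipping 232.0.0.0-232.0.0.255."""
    base = int(ipaddress.IPv4Address("232.0.0.0"))
    while True:
        candidate = str(ipaddress.IPv4Address(base + rng.randrange(256, 1 << 24)))
        if candidate not in in_use:
            in_use.add(candidate)  # remember it so it is not handed out twice
            return candidate


# The source address plays no part in the mapping, and the top bits of the
# group are lost, so these two groups share one link-layer address.
mac_a = ipv4_multicast_mac("232.1.2.3")
mac_b = ipv4_multicast_mac("232.129.2.3")
```

Because the source is not part of the mapping, two channels (S1,G) and (S2,G) always share a link-layer address; random allocation simply makes it unlikely that independent senders pick the same G in the first place.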

Packet Forwarding

A router that receives an IP datagram with a source-specific destination address MUST silently drop it unless a neighboring host or router has communicated a desire to receive packets sent from the source and to the destination address of the received packet.

With PIM-SSM, successful establishment of an (S,G) forwarding path from the source S to any receiver depends on hop-by-hop forwarding of the explicit join request from the receiver toward the source. The protocol(s) and algorithms that are used to select the forwarding path for this explicit join must provide a loop-free path.
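The hop-by-hop nature of the join can be sketched as a walk over a next-hop table. This is a toy model under stated assumptions: `build_sg_path` and the router names are hypothetical, and the next-hop table stands in for whatever protocol selects the path toward the source.

```python
def build_sg_path(first_hop, source_router, next_hop_toward_source):
    """Walk an explicit join hop by hop from the receiver side to the source.

    Returns the routers that install (S,G) state, in join order.
    Raises ValueError if the next-hop table has a loop or a gap.
    """
    path = [first_hop]
    seen = {first_hop}
    current = first_hop
    while current != source_router:
        nxt = next_hop_toward_source.get(current)
        if nxt is None:
            raise ValueError(f"no route toward the source from {current}")
        if nxt in seen:
            raise ValueError(f"loop detected at {nxt}")
        path.append(nxt)
        seen.add(nxt)
        current = nxt
    return path


# R3 is the receiver's first-hop router, R1 is attached to the source.
path = build_sg_path("R3", "R1", {"R3": "R2", "R2": "R1"})
```

If the table ever points back at a router already on the path, the join can never reach the source, which is why the path selection must be loop-free.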

Protocol Behavior

A network can concurrently support SSM in the SSM address range and any-source multicast in the rest of the multicast address space, and it is expected that this will be commonplace. In such a network, a router may receive a non-source-specific, or “(*,G)” in conventional terminology, request for delivery of traffic in the SSM range from a neighbor that does not implement source-specific multicast in a manner compliant with RFC 4607. A router that receives such a non-source-specific request for data in the SSM range MUST NOT use the request to establish forwarding state and MUST NOT propagate the request to other neighboring routers. A router MAY log an error in such a case. This applies both to any request received from a host (e.g., an IGMPv1 or IGMPv2 [IGMPv2] host report) and to any request received from a routing protocol (e.g., a PIM-SM (*,G) join).
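A minimal sketch of this rule for IPv4 is shown below. The `handle_join` function and its arguments are hypothetical; a `source` of `None` stands for a non-source-specific (*,G) request, and the returned flag tells the caller whether the request may be propagated.

```python
import ipaddress

SSM_RANGE = ipaddress.ip_network("232.0.0.0/8")


def handle_join(source, group, forwarding_state, log=None):
    """Process a join request; source is None for a (*,G) request.

    Returns True if state was installed and the join may be propagated.
    """
    in_ssm_range = ipaddress.ip_address(group) in SSM_RANGE
    if in_ssm_range and source is None:
        # MUST NOT install state or propagate; logging is optional (MAY).
        if log is not None:
            log.append(f"ignored (*,{group}) request in SSM range")
        return False
    forwarding_state.add((source, group))
    return True


state = set()
errors = []
rejected = handle_join(None, "232.1.1.1", state, errors)
accepted = handle_join("192.0.2.10", "232.1.1.1", state, errors)
asm_ok = handle_join(None, "239.1.1.1", state, errors)
```

The same (*,G) request that is rejected in 232/8 is accepted for a group outside the SSM range, which is how both service models coexist on one router.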

(The majority of the text above was taken from RFC 4607.)