An update to this article: there’s a way around the issue I ran into. It isn’t well documented, and in my experience TAC knows nothing about it. As described at https://xrdocs.github.io/cloud-scale-networking/tutorials/ncs5500-routing-in-vrf/ you simply need to enable ‘label mode per-vrf’ for the BGP instance in the VRF. It then consumes one LEM entry for the whole VRF instead of one per prefix in the full route table.
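Here is a minimal sketch of that workaround. It reuses the placeholder ASN, VRF name and route distinguisher from the config later in this article (ASN, WHATEVER, 1:1). The placement under the VRF’s ipv4 unicast address family follows the xrdocs tutorial linked above, but treat it as something to verify on your own XR release:

```
router bgp ASN
 vrf WHATEVER
  rd 1:1
  address-family ipv4 unicast
   ! one label for the whole VRF instead of the default one-label-per-prefix
   label mode per-vrf
  !
 !
!
commit
```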
Two notes before we even begin:
- Run this if you’re doing full internet tables on either the 5501 or 5501SE:
hw-module fib ipv4 scale internet-optimized
The manual / config guide doesn’t even mention it; I had to find it on a Cisco employee’s blog.
- (See my update at the start of the article for a way around this issue if you’re trying to do a full internet table in a VRF.) Next, don’t try to run full internet tables on a BGP instance in a VRF; the NCS will die a s̶l̶o̶w̶ fast death. I tried this because I already had some working config in place and didn’t want the internet tables mixing with what was already defined. It doesn’t matter whether you’re running the base 5501 or the ‘scale’ edition, the 5501SE with the external TCAM. The issue is that when BGP is in a VRF, the incoming routes are no longer stored solely as plain IPv4 (and v6) prefixes. They become labeled VPN routes, because by default BGP in a VRF allocates an MPLS label per prefix. The end result is that you’ll exhaust your ipnh table almost as soon as the tables start to populate, since it can only hold 100,000 entries (check it via “show contr npu resources all location 0/0/CPU0”). If your router even stays up (one of mine kernel-panicked almost immediately), your logs will fill up with this kind of thing:
LC/0/0/CPU0:Dec 16 19:33:28.972 : fib_mgr[128]: %ROUTING-FIB-3-PD_FAIL : FIB platform error: fib_leaf_ldi_lw_platform_update 626: PD action CREATE failed for lw_ldi 0x30c7847338 flags 0x8. Shared LDI 0x30c4d73698 num_slots 1 num_buckets 1 depth 3 ldi type 4 ldi protocol mpls flags 0x108441 pfx 203.199.41.0/24 : 0xaf9c0600 ‘DPA’ detected the ‘fatal’ condition ‘DPA Server – All FEC’s already allocated’ : fib_mgr : (PID=3443) : -Traceback= (TRUNCATED)
LC/0/0/CPU0:Dec 16 19:34:12.186 : fia_driver[285]: %PLATFORM-DPA-4-OOR_YELLOW : NPU 0, Table ipnhgroup
LC/0/0/CPU0:Dec 16 19:34:21.092 : fib_mgr[128]: %ROUTING-FIB-3-PLATF_UPD_FAIL : FIB platform update failed: Obj=DATA_TYPE_LDI_LW[ptr=0x30c7935fa0,refc=0x2,flags=0x8] Action=CREATE Proto=mpls. Cerr=’DPA’ detected the ‘fatal’ condition ‘DPA Server – All FEC’s already allocated’ : fib_mgr : (PID=3443) :
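To watch for this while a table loads, a sketch along these lines should help. The first command is the unabbreviated form of the resource check above. The two include filters match the OOR_YELLOW and ROUTING-FIB messages shown in the log. 0/0/CPU0 is the location from this article, so adjust it for your chassis:

```
show controllers npu resources all location 0/0/CPU0
show logging | include OOR
show logging | include ROUTING-FIB
```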
Assuming you’re not running full tables and still want to proceed: it took a bit of work to find the right combination of directives to get BGP running in a VRF on this NCS5501(SE) with IOS XR 6.2. I kept getting this annoying error when trying to commit, and BGP in general wouldn’t come up:
!!% 'BGP' detected the 'warning' condition 'The address family has not been initialized'
Apparently this was caused by running BGP only in a VRF rather than at the root level. IOS XR doesn’t appreciate certain config bits being undefined in the root ‘router bgp’ stanza when all your real config lives in a VRF. The things I had to do to get BGP up for internal and external sessions were:
- For either type of session, a missing route distinguisher at the BGP VRF level will keep it down. If you’re not doing MPLS or a more complex deployment, you can probably just make this number up, such as 1:1.
- If you’ll have no BGP at the root (non-VRF) level, you must still define ‘address-family ipv4 unicast’ and ‘address-family vpnv4 unicast’ at the root level regardless. Tweak the following to your site specifics:
rd-set 1:1
exit
router bgp ASN
 address-family ipv4 unicast
 exit
 address-family vpnv4 unicast
 exit
 commit
 vrf WHATEVER
  rd 1:1
  bgp router-id 192.0.2.2
  address-family ipv4 unicast
  exit
 commit
Note that the ASN, VRF name, router ID and route distinguisher will of course all differ in your implementation; 192.0.2.2/32 is on my Loopback1 in VRF WHATEVER. Adjust as needed.
That should make it happy and allow you to continue with adding neighbors.
- If you’re peering from a loopback interface for iBGP (as you should, for path redundancy), make sure to add an update-source to the iBGP neighbor config on each side.
- On the eBGP side, you must have an inbound and an outbound route-policy even if you don’t need one, because IOS XR won’t accept or advertise eBGP routes without them. So you could simply add:
route-policy IN
  pass
end-policy
route-policy OUT
  pass
end-policy
router bgp ASN
 vrf WHATEVER
  neighbor 192.0.2.1
   address-family ipv4 unicast
    route-policy IN in
    route-policy OUT out
 commit
Obviously you should create real route policies, including an equivalent to the old prefix-list to ensure you’re not advertising things you don’t intend to.
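The sketch below combines the last two items. It adds a loopback-sourced iBGP neighbor and replaces the pass-everything OUT policy with one that only advertises your own space through a prefix-set, RPL’s rough equivalent of the old prefix-list. The iBGP peer 192.0.2.3, the prefix 198.51.100.0/24, the set name MY-PREFIXES and the eBGP remote-as 64500 are hypothetical placeholders:

```
prefix-set MY-PREFIXES
  198.51.100.0/24
end-set
!
route-policy OUT
  # advertise only our own space; drop everything else
  if destination in MY-PREFIXES then
    pass
  else
    drop
  endif
end-policy
!
router bgp ASN
 vrf WHATEVER
  ! iBGP peer's loopback; source our side from Loopback1 (192.0.2.2)
  neighbor 192.0.2.3
   remote-as ASN
   update-source Loopback1
   address-family ipv4 unicast
   !
  !
  ! eBGP peer with explicit in/out policies
  neighbor 192.0.2.1
   remote-as 64500
   address-family ipv4 unicast
    route-policy IN in
    route-policy OUT out
   !
  !
 !
!
commit
```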
Did you try configuring “label-allocation-mode per-ce” or “label-allocation-mode per-vrf”?
https://www.cisco.com/c/en/us/td/docs/iosxr/ncs5500/bgp/b-ncs5500-bgp-cli-reference/b-ncs5500-bgp-cli-reference_chapter_01.html#wp7965331930
label mode vs label-allocation-mode :)
Unfortunately TAC apparently isn’t aware of that setting; their advice was just to move the full internet BGP back to the default VRF. I found https://xrdocs.github.io/cloud-scale-networking/tutorials/2017-08-03-understanding-ncs5500-resources-s01e02/ which also doesn’t mention it; it just says to use internet-optimized mode. Does label allocation mode still result in the same number of CAM lookups as keeping BGP in the default VRF?
Good question. Doc says this for per-ce mode:
Each prefix that belongs to a VRF instance is advertised with a single label, causing an additional lookup to be performed in the VRF forwarding table to determine the customer edge (CE) next hop for the packet. Use the label-allocation-mode command with the per-ce keyword to avoid the additional lookup on the PE router and conserve label space. This mode allows the PE router to allocate one label for every immediate next hop. The label is directly mapped to the next hop so there is no VRF route lookup performed during data forwarding. However, the number of labels allocated is one for each CE rather than one for each prefix.
And this:
In the Resilient Per-CE label allocation scheme, BGP installs a unique rewrite label in LSD for every unique set of CE paths or next hops. There may be one or more prefixes in BGP table that points to this label. BGP also installs the CE paths (primary) and optionally a backup PE path into RIB. FIB learns about the label rewrite information from LSD and the IP paths from RIB. In steady state, labeled traffic destined to the resilient per-CE label is load balanced across all the CE next hops. When all the CE paths fail, any traffic destined to that label will result in an IP lookup and will be forwarded towards the backup PE path, if available. This action is performed on the label independently of the number of prefixes that may point to the label, resulting in the PIC behavior during primary paths failure.
Doesn’t comment on per-vrf.
I don’t currently have an NCS-55xx to test with. Do you have a contact in the NCS-55xx BU? Nicolas Fevrier is very knowledgeable.
Seems like per-vrf/per-ce is worth testing in the lab at least, despite what TAC says…
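A minimal lab sketch of that test follows. It switches the VRF to per-CE label allocation, commits, and then compares NPU resource usage against the default per-prefix behaviour as the table loads. It assumes the per-ce keyword sits in the same place as the per-vrf example from the xrdocs tutorial, and reuses this article’s placeholder ASN and VRF name. Verify both on your release:

```
router bgp ASN
 vrf WHATEVER
  address-family ipv4 unicast
   ! one label per CE next hop instead of one per prefix
   label mode per-ce
  !
 !
!
commit
end
show controllers npu resources all location 0/0/CPU0
show bgp vrf WHATEVER summary
```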