This is just a little diary entry for my first install of some NCS5501-SE devices. I continue to edit it as I find more and more that is broken on this platform. Look towards the bottom for 2022 and 2024 upgrade woes.
Initially I said there are pros and cons to going with it compared to an Arista equivalent, but after a series of probably eight edits, there’s now simply no reason at all, that I can think of, to purchase this mostly broken NCS5501 / NCS5501-SE over Arista equivalents, even more so knowing what’s on the Arista horizon in early 2018. (Edit, 2024: Arista is so freaking easy to upgrade, man. And look out, I’m about to post a new article about going Arista on the edge, with full dual stack BGP feeds.) Ugh; feels like waking up with a hangover. First, the summary of my complaints, then explanations of them, and the initial experience diary entry.
Changelog:
- 20171203 – initial
- 20171205 – uRPF update #1
- 20171205 – VRRP on BVI update
- 20171206 – MC-LAG not active-active
- 20171215 – BFD broken
- 20180103 – Version numbering issue
- 20180226 – uRPF update #2
- 20180328 – storage issue w/core dumps, more upgrade failures
- 20180413 – SNMP missing Cisco’s cbgpPeer2RemoteAddr BGP4-MIB OID
- 20220831 – Upgrade failures
- 20240513 – Upgrade failures, as usual
Gripe Summary
- Minimum 2x higher cost than the currently shipping (Jan 2018) Arista 7280R2, while physically having two fewer 100gig ports (4 vs 6), (40) 1/10 SFP+ ports vs Arista’s (48) 1/10/25 SFP+ ports, and license-wise, only having (8) 1/10 ports active in the base price.
- Potentially as much as 4x higher cost than Arista 7280R2 if you license all the ports; 100gig ports are $12k/ea retail, as are 8-count 10gig licenses.
- NO active/active MC-LAG or vPC functionality; so half the participants in your multi-chassis port channels / bundles are going to sit in a standby state costing you money and doing nothing until a failure occurs.
- uRPF is completely broken unless you intentionally disable the TCAM optimizations for route scale. This means if you’re using this thing with internet route tables, the base 5501 can’t do uRPF and a full table, and the ‘scale’ SE model just went from the highly touted 2.7M route scale to less than the 1.4M+ the Arista can do. So if you’re using this thing as an internet router, your choice is to either permit bogons into your network or intentionally downgrade the capacity; and if you’ve just discarded the main selling point of the SE model, what’s the point of buying it? Confirmed with TAC.
- VRRP is the only first hop redundancy protocol supported (which I’m fine with), but it doesn’t function on bridge virtual interfaces (BVI’s). Summary version is if you’re using these for any switching, and have a pair serving the same bridge domain (think vlan from IOS) for redundancy, they can’t also do VRRP, so they can’t be the first hop. You’re going to need to buy switches to handle your layer 2 stuff and attach these via physical interfaces to do VRRP; so your deployment just got more expensive and complex.
- Broken BFD if you use bundle interfaces (aka combining multiple interfaces, e.g. port channels from IOS, LAG from Brocade). If you use BFD for rapid link failure detection to help your routing protocols converge quicker, and you use bundle interfaces, you have to enable ‘bfd multipath’, which on this platform gets you the wonderful error
!!% 'bfd_api' detected the 'fatal' condition 'unsupported request'
You could also just try enabling it at the BGP level; similar problem, it will take the config, but if you look at your BGP neighbor detail, you’ll see crap like this:
BFD disabled (interface type not supported)
So basically you can use BFD if you’re running nothing but physical single links, otherwise you’re screwed by this platform, yet again.
- Broken BFD if you use VRRP – even if you buy switches and work around the no VRRP on bridge interfaces issue, you still can’t do BFD to gain rapid failover for your VRRP sessions.
- No support for any multi-chassis active/active forwarding (like Arista’s VARP), but I guess that’s a given since there’s no support for an active-active bundle interface.
- MST is the only spanning tree flavor supported.
- Broken SFTP software installer if software source is in a VRF; have to revert to insecure FTP.
- Rack rails, not included
- NEBS kit, to seal the front of the rack around the device’s weird sloped-down front face, not included.
- SNMP implementation is missing the cbgpPeer2RemoteAddr OID (from the CISCO-BGP4-MIB mib), so no way to query for the remote peer address of configured IPv6 peers to build dynamic monitoring rules from.
Long Review
Anyway, I wanted some nice 1U dual-stack full-table edge routers at a reasonable price, but with two caveats; one, I needed a little bit of layer 2 mixed in there, and two, I wanted route scale that would guarantee me it would be happy for at least three years from end of 2017 when the full internet tables are roughly 670k IPv4 + 45k IPv6 routes. As IPv4 becomes harder to obtain, I anticipate the quantity of advertised /24’s continuing to shoot up as people buy and sell ever-smaller blocks, so I wanted something that I’d have no worries about handling 1.2M+ routes plus whatever IPv6 is at at the time.
With Cisco SE suggestions, I went with this model, in the -SE designation, because of its ~2.7M route FIB thanks to the large external TCAM; it was the only device they have which has that scale in a 1U format and price range I was targeting. The next step(s) up in the Cisco lineup to give you similar route scale and port density would be something like Nexus 7k or ASR9k, and we’re talking couple hundred grand higher list price.
Arista has a better device from a port-count and 1U perspective, with a much lower price; however, as of December 2017 it’s only going to get you “1.4M+” routes, with no real indication whether the + will actually turn into greater scale later. If 1.4M routes is fine though, you probably already have a better solution from Arista for reasons I’ll explain further down. Additionally, if you make some NCS5501-SE config changes to get around the broken uRPF, you will actually end up being able to support slightly FEWER routes than the Arista can already do. And don’t even mention the standard 5501; you’ve got a useless box there if you need uRPF and internet route tables (see below). Bottom line: if you REQUIRE a functional uRPF along with high route scale, the NCS5501 (regular and SE) is useless.
Now, what I’ve heard is that in late Q1 2018 we’re going to see shipping start for an Arista 7280CR2K, which is a 2M route 30x100GbE device. Supposedly, the same chipset that allows it to do that is going to trickle its way down to a (48) 1/10/25 + (?) 100gig stackable device. In comparison to the NCS, this means you get 48+ ports that don’t require a license, they do 25gig if that matters, you can do all layer 2 paths active simultaneously via MLAG (vs the Cisco NCS5500-series implementation of MC-LAG, which puts the second chassis’ piece of the bundle in standby), and you can do all layer 3 forwarding paths active simultaneously via VARP (which Cisco has no counter for in the NCS line; you have to go to NX-OS-based devices that are several hundred thousand higher list price to get the same layer2+layer3 active/active via vPC+VRRP/HSRP). If I had time to wait for those, there’d be no reason to use the NCS unless there are some features Arista hasn’t implemented that I haven’t come across yet. Well, I won’t go quite that far; the Cisco RPL support in IOS-XR is pretty awesome, but I’d live without RPL to spend a quarter of the price.
First my gripes:
- Nickel & Diming pt 1: The license fees from Cisco on the NCS55xx series are simply stupid; you get nickel and dimed to death even if you negotiated a reasonable discount up front. There are 5+ add-on software licenses depending on what you want to use your hardware for, then port licenses. Yes, barf, port licenses. Cisco charges you roughly $12k list to use blocks of eight 10gig ports, or per 100gig port, so you’d better negotiate what you think you’ll need up front if you don’t have another purchase to piggyback it onto in the future where you have more negotiating room. Arista gives you a switch that, imagine this, actually lets you use all its ports without paying extra; what a concept!!
- For top of rack or leaf-spine, Arista’s MLAG and VARP stuff is so much easier to deal with than Cisco’s equivalents; you can have an active-active redundant layer 2/3 setup going in a few minutes. You don’t need a complex bundle + add interfaces to bundle + iccp + mpls ldp + mlacp (MC-LAG) + neighbor + add mlacp (MC-LAG) & iccp settings back to bundle + bridge group + bridge domain + interfaces to bridge + add BVI + add BVI to bridge. By the time you finish getting it going you’re like wtf have I been doing for the past half hour. And…. next point is a continuation from here.
- Plain and simple, Arista’s MLAG does multi-chassis LAG in an active-active fashion, like Cisco’s virtual port channel, which this platform doesn’t support. If you buy these, and spend a bunch of money on the port licenses, and had planned to use multi-chassis LAGs, well, you just spent a bunch of money for half your ports to go unused until there’s a failure, at which point the other half will be used.
- Speaking of layer 2, the NCS5501 only supports MST, so if you have a simple deployment and were hoping to keep using PVST because MST is more complex than your environment requires, well, too bad.
- (Note: I had an enlightening call with some folks at Cisco outside of TAC and there may be alternatives to get around this issue; will update soon. -20180403) Let’s move on to Layer 3. This platform has NO first hop redundancy options available if you’re trying to use it in a combination mode of layer 2 and 3. Specifically, if you are trying to use this device where you have more than one port in a bridge domain for switching, then you add a BVI interface (think VLAN interface if you’re not used to this model), you can’t do VRRP on it (and HSRP/GLBP not supported at all platform-wide). So, you can do switching, or you can do routing, but you can’t do both if you expected this router to act as a high availability default gateway for downstream devices.
- uRPF is broken. This thing is billed as a ‘scale’ device designed to hold full internet route tables. That being the case, you’d think common security features like uRPF would work; where else would you use it other than on an internet router? I turned it on and my external-facing interfaces began dropping all traffic, even though they had full BGP tables running and all relevant routes installed. To fix that (per TAC), you need to turn off all external TCAM optimizations via:
hw-module fib ipv4 scale host-optimized-disable
hw-module fib ipv6 scale internet-optimized-disable
hw-module tcam fib ipv4 scaledisable
Also confirmed by TAC is that this reduces your route capacity by half. Fantastic; a core feature that has been enabled on nearly any edge router for years turns your scale router into commodity crap.
- BFD – broken. Tried to make use of it. Added to OSPF config, didn’t come up. Added to BGP config, didn’t come up. Added to bundle ethernet, hey, it came up:
RP/0/RP0/CPU0:rtr2#show bfd session
Mon Feb 26 21:42:39.300 UTC
Interface           Dest Addr           Local det time(int*mult)      State
                                    Echo             Async   H/W   NPU
------------------- --------------- ---------------- ---------------- ----------
Te0/0/0/12          192.0.2.1       0s(0s*0)         450ms(150ms*3)   UP
                                                             Yes   0/0/CPU0
Te0/0/0/13          192.0.2.1       0s(0s*0)         450ms(150ms*3)   UP
                                                             Yes   0/0/CPU0
Te0/0/0/15          192.0.2.1       0s(0s*0)         450ms(150ms*3)   UP
                                                             Yes   0/0/CPU0
BE1                 192.0.2.1       n/a              n/a              UP
                                                             No    n/a
Well, except for the fact that the routing protocols can’t actually use it. At the OSPF neighbor level, you’ll see this:
Neighbor BFD status: Waiting to create BFD session create
At the BGP neighbor level, you’ll see this:
BFD disabled (interface type not supported)
Dig a little deeper and you’ll find that IOS XR needs ‘bfd multipath’ configured for BFD to work on bundle interfaces. If you try to add the relevant config, you’ll get a failure, as follows:
bfd multipath include location 0/0/CPU0
!!% 'bfd_api' detected the 'fatal' condition 'unsupported request'
!
So yeah, if you’re a normal person who uses link bundles for redundancy, guess you won’t be using BFD on this platform. WTF; does anything work on here?
- Nickel & Diming pt 2: Like some of the UCS Fabric boxes, these NCS5501’s slope down from the top edge to the ports. In the data centers where I have these, this creates a problem because it leaves an air gap between the NCS and the space above it if there’s no equipment above it to close that gap. Data centers employing heat containment may forbid these gaps, as does NEBS. How to solve? Buy a NEBS kit for it: NCS-1RU-NEBS-KIT. Yes, we’ll make a device with a weird shape and then sell you a kit to convert its profile back to normal.
- Nickel & Diming pt 3: Finally, it doesn’t even include the rack mount kit. Is there a contingent of customers installing these things on someone’s desktop?! Add on NCS-1RU-ACC-KIT.
- If Cisco’s developers mess up code version numbering, who cares, roll it out anyway. At the time of this writing, 6.2.3 is newer than 6.2.25, because they had meant for it to be 6.2.2.5, but since it was ready to go and already numbered, they just rolled it out anyway.
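Because of that numbering slip, a plain numeric comparison of release strings puts these two in the wrong order. Here is a minimal Python sketch of a sort key that accounts for it. The `RENUMBERED` table and `release_key` helper are hypothetical names of my own, holding only the single case described above; nothing like this is published by Cisco.

```python
# Hypothetical override table: releases whose published number differs from
# the number they were meant to carry. Only the case from this post is listed.
RENUMBERED = {"6.2.25": "6.2.2.5"}


def release_key(version: str) -> tuple:
    """Return a tuple usable as a sort key for IOS XR release strings."""
    intended = RENUMBERED.get(version, version)
    return tuple(int(part) for part in intended.split("."))


releases = ["6.2.3", "6.2.25"]

# Plain numeric comparison: treats 25 as larger than 3.
naive = sorted(releases, key=lambda v: tuple(int(p) for p in v.split(".")))

# Comparison using the intended numbering.
fixed = sorted(releases, key=release_key)

print(naive)  # ['6.2.3', '6.2.25']
print(fixed)  # ['6.2.25', '6.2.3']
```

Any script that picks "the newest image" by version string would need a table like this, kept up to date by hand.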
Okay, let’s get these bad boys installed. Two units bought at the same time, and of course they don’t come with the same IOS XR or firmware, so check it and don’t assume multiple devices will match. This was also my first foray into IOS XR, so I was flying by the seat of my pants with this one. So far, I really like it; a blank config without a million dumb things to disable before getting into the config. One thing that initially caught me off guard was a combination of no longer needing to copy running config to startup config (but you still should for backup purposes) AND interfaces resetting to an admin-down state if they have no other configuration present. So, when just testing layer 2, I flipped all the interfaces on, did some global config work, reloaded, and all my ports went back down. I wasted some time thinking the device was not saving my config when I really just needed to add a description, or any other setting, to the ports I had ‘no shut’; then they stayed up through reboots before real config was put on.
Before upgrading XR, do yourself a favor and “fpd auto-upgrade enable”
Okay, so first issue I ran into is trying to upgrade IOS XR when the source interface for pulling the new image is in a VRF. This Cisco page:
suggests:
4) File Server in a VRF? This is how an install add is performed when the file server is reachable inside a VRF, in this example the VRF name is “management”.
A9K-PE3(admin)#install add source ftp://user:[email protected];management/ asr9k-px-5.1.3.CSCef12345.pie asr9k-px-5.1.3.CSCab67890.pie activate
Well, that didn’t work. tcpdump confirmed the attempt never even went out. What I found would work instead is explicitly setting the VRF for the ssh client (using their example VRF name, management):
ssh client vrf management
Then re-running the install command without specifying the VRF succeeded:
install add source sftp://[email protected]:/home/ncsupgrade/ ncs5500-mini-x-6.2.25.iso
where this did NOT work:
install add source sftp://[email protected];management:/home/ncsupgrade/ ncs5500-mini-x-6.2.25.iso
I could find no solution to getting the install to run from a non-standard (not 22) port, so I had to move the sftp server to accommodate IOS XR. And, as luck would have it, I wasn’t out of the woods on this upgrade issue yet either. One of my NCS’s came broken, so I had to replace it before even getting started. The replacement unit, since it came from a depot, did not have the k9 image on it, so no support for SSH. The above trick of specifying the vrf for ssh did NOT work for ftp. I did “ftp client vrf management passive” then re-ran:
install add source ftp://[email protected]:/home/ncsupgrade/ ncs5500-mini-x-6.2.25.iso
and no connection attempt was made to the source server. I tried the copy command too, no bueno. Finally, I figured out that while the install command didn’t honor the VRF designation, the copy command did, so I was able to:
copy ftp://user:[email protected];management harddisk:
then filled in the answers for paths and files interactively to get the file onto harddisk:. But wait, it’s not that easy, don’t give it a full path as that seems to confuse things. It will prompt you like this:
Address or name of remote host [192.0.2.1]?
Source filename [/ftp:]?ncs5500-mini-x-6.2.25.iso
Destination filename [/harddisk:/ncs5500-mini-x-6.2.25.iso]?
Notice how in the above, it defaulted to /ftp as source and I gave it a filename without a path? The real path is /home/user/ncs5500-mini-x-6.2.25.iso but giving it that didn’t work. Specifying just the raw filename, and having the file in the remote user’s home directory seemed to be the key to solving this problem.
If I tried to put the vrf and paths in the command, it would screw up the copy somehow.
Well, of course both upgrades failed, why not; if you spend thousands on equipment, upgrades shouldn’t be easy.
#Dec 13 02:28:13 Install operation 11 aborted
RP/0/RP0/CPU0:Dec 13 02:28:13.947 : sdr_instmgr[1180]: %INSTALL-INSTMGR-3-OPERATION_ABORT : Install operation 11 aborted
This was due to my trying to install the .iso file with the extension included, since the docs are not clear. You really want to do a ‘show install repo’ and then install the mini ISO file without the extension, as will be presented in that ‘show install repo’ list, for example:
install prepare ncs5500-mini-x-6.2.25
Well, that will run a lot longer, minutes, but will still fail. Feeling a trend here?
RP/0/RP0/CPU0:Dec 13 03:00:50.694 : sdr_instmgr[1180]: %INSTALL-INSTMGR-3-OPERATION_ABORT : Install operation 13 aborted
‘show install log’:
Dec 13 02:28:12 Error! The following package(s) is/are required to be activated as part of this operation:
ncs5500-mpls
ncs5500-mgbl
ncs5500-mpls-te-rsvp
ncs5500-ospf
ncs5500-isis
ncs5500-k9sec
Okay, so, now we need to copy all the other rpm files that came out of the downloaded tar file up to the router, even though the directions on Cisco’s site beg to differ. So, from first broken router, the one with SSH available:
install prepare ncs5500-mini-x-6.2.25 ncs5500-isis-2.2.0.0-r6225.x86_64 ncs5500-mgbl-3.0.0.0-r6225.x86_64 ncs5500-k9sec-3.2.0.0-r6225.x86_64 ncs5500-ospf-2.0.0.0-r6225.x86_64 ncs5500-mpls-2.1.0.0-r6225.x86_64 ncs5500-mpls-te-rsvp-2.3.0.0-r6225.x86_64
and from second broken router, the one reliant on ftp, do the same copy command to get the six missing rpm’s onto the router’s hard drive. Then you can re-run the install prepare like the above. Finally, you can run install activate.
install prepare ncs5500-mini-x-6.2.25 ncs5500-mpls-2.1.0.0-r6225.x86_64 ncs5500-mgbl-3.0.0.0-r6225.x86_64 ncs5500-ospf-2.0.0.0-r6225.x86_64 ncs5500-mpls-te-rsvp-2.3.0.0-r6225.x86_64 ncs5500-isis-2.2.0.0-r6225.x86_64 ncs5500-k9sec-3.2.0.0-r6225.x86_64
Dec 13 03:47:53 Package list:
Dec 13 03:47:53 ncs5500-mini-x-6.2.25
Dec 13 03:47:53 ncs5500-mpls-2.1.0.0-r6225.x86_64
Dec 13 03:47:53 ncs5500-mgbl-3.0.0.0-r6225.x86_64
Dec 13 03:47:53 ncs5500-ospf-2.0.0.0-r6225.x86_64
Dec 13 03:47:53 ncs5500-mpls-te-rsvp-2.3.0.0-r6225.x86_64
Dec 13 03:47:53 ncs5500-isis-2.2.0.0-r6225.x86_64
Dec 13 03:47:53 ncs5500-k9sec-3.2.0.0-r6225.x86_64
RP/0/RP0/CPU0:router1#install activate
Wed Dec 13 03:59:32.259 UTC
Dec 13 03:59:33 Install operation 18 started by admin:
install activate
This install operation will reload the sdr, continue?
[yes/no]:[yes]
Oh Emm Gee; we have an upgraded device!
RP/0/RP0/CPU0:router1#show hw-module fpd
Wed Dec 13 04:38:42.696 UTC
                                                               FPD Versions
                                                               =================
Location   Card type        HWver FPD device       ATR Status   Running Programd
------------------------------------------------------------------------------
0/RP0      NCS-5501-SE      1.1   Bootloader           CURRENT    1.15    1.15
0/RP0      NCS-5501-SE      1.1   CPU-IOFPGA           CURRENT    1.14    1.14
0/RP0      NCS-5501-SE      1.1   MB-IOFPGA            CURRENT    1.07    1.07
0/RP0      NCS-5501-SE      1.1   MB-MIFPGA            CURRENT    1.02    1.02
RP/0/RP0/CPU0:router1#sh ver
Wed Dec 13 04:50:28.537 UTC
Cisco IOS XR Software, Version 6.2.25
Copyright (c) 2013-2017 by Cisco Systems, Inc.
Build Information:
 Built By     : ahoang
 Built On     : Thu Sep 28 20:01:45 PDT 2017
 Build Host   : iox-lnx-057
 Workspace    : /auto/srcarchive12/production/6.2.25/ncs5500/workspace
 Version      : 6.2.25
 Location     : /opt/cisco/XR/packages/
cisco NCS-5500 () processor
System uptime is 13 minutes
If everything is cool, verify, commit and then remove the older packages from the devices:
install verify packages
install commit
install remove inactive all
If you had to do any updates via copy command, then you need to remove the source rpm and iso files from harddisk: too.
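The naming rule from this upgrade (the repository names are the file names minus the `.iso` or `.rpm` extension) can be sketched in Python. This is only an illustration: `repo_name` and `prepare_command` are hypothetical helpers, the `.x86_64.rpm` file names are assumed from the package names used above, and `show install repo` on the router remains the authority on what the names actually are.

```python
from pathlib import PurePosixPath


def repo_name(filename: str) -> str:
    """Drop a trailing .iso or .rpm, leaving the rest of the name intact."""
    path = PurePosixPath(filename)
    return path.stem if path.suffix in {".iso", ".rpm"} else path.name


def prepare_command(filenames: list[str]) -> str:
    """Build one 'install prepare' line covering every package."""
    return "install prepare " + " ".join(repo_name(f) for f in filenames)


# Assumed file names as they would sit on harddisk: after the copy.
files = [
    "ncs5500-mini-x-6.2.25.iso",
    "ncs5500-isis-2.2.0.0-r6225.x86_64.rpm",
    "ncs5500-k9sec-3.2.0.0-r6225.x86_64.rpm",
]
command = prepare_command(files)
print(command)
```

Only the final extension is stripped, so the version digits and the `x86_64` part of each name are left alone.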
If you run into issues while checking the FPD versions, you may find some of them are stuck in a needing upgrade or failed to upgrade state; particularly CPU-IOFPGA. There are two common workarounds to this. From ‘admin’ mode, force an upgrade of both the Bootloader (first) and CPU-IOFPGA (second) via:
upgrade hw-module location all fpd Bootloader force
# before proceeding with the next command, you'll need to
# just watch the output of "show hw-module fpd" and wait
# for it to complete, because it does this in the background
# and there's not an option for a 'sync' argument to force
# it to foreground.
upgrade hw-module location all fpd CPU-IOFPGA force
# again, wait for the above to fully complete! Now, you must do a FULL reload of the device, not just the virtual machine.
hw-module location all reload
Update for March 2018; some co-workers have been having issues ssh’ing into these devices, and after much trying, that ultimately turned into this being logged:
0/RP0/ADMIN0:Mar 27 18:52:21.379 : mediasvr[2290]: %MEDIASVR-MEDIASVR-4-PARTITION_USAGE_ALERT : High disk usage alert : /misc/disk1 exceeded 93%
Thought that was kind of weird. I needed to update these to 6.2.3 from 6.2.25 (yes, 25 is older than 3, see above) anyway, so figured I’d just do that. Well, upgrades failed too, with:
Mar 27 21:35:30 No space left on device: (No space in /misc/disk1/tmp_staging/2/ to download pkgs from xr (default-sdr) to admin)
ERROR! No enough space to proceed with ADD operation.
1. Please ensure that the free space in 'harddisk:' of Sysadmin and SDR is at least twice the total size of all packages being added
2. Please ensure that the free space in 'rootfs' of Sysadmin and SDR is enough to hold the total size of all the packages being installed
Please consider following steps to proceed :
1. 'install remove' unwanted inactive packages and ISOs.
2. Delete old core files from 'harddisk:' on Sysadmin and SDR .
3. Remove unnecessary data from 'rootfs' and 'harddisk:' of Sysadmin and SDR
Well I’d be inclined to believe the above if the device’s own commands matched:
RP/0/RP0/CPU0:router2#show media
Tue Mar 27 21:37:17.347 UTC
Media Information for local node.
----------------------------------------------
Partition            Size     Used  Percent    Avail
rootfs:              3.9G     1.2G      33%     2.5G
apphost:             3.7G     106M       3%     3.4G
/dev/sde             969M     361M      40%     542M
harddisk:            5.6G     1.3G      24%     4.1G
log:                 459M     101M      24%     324M
config:              459M     3.6M       1%     421M
disk0:               2.0G      19M       1%     1.8G
---------------------------------------------------
rootfs:   = root file system (read-only)
log:      = system log files (read-only)
config:   = configuration storage (read-only)
Umm, nothing seems anywhere close to out of space, but especially not harddisk: where it’s trying to extract. Ultimately I figured out that the ‘show media’ output isn’t accurate; once I did ‘admin’ and then ‘run’ to get to a bash prompt, I was able to find several gigs worth of core dump files in /misc/disk1/ named with the format default-sdr–2.date-date.core.0_RP0.lxcdump.tar.lz4. I removed those, then upgrade was able to proceed.
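The hunt for core files can be sketched as a small script. This is a sketch under assumptions: it presumes a Python 3 interpreter is available wherever you run it (which may not be true of the router’s admin shell, where plain `ls` and `du` would do the same job), the `*.core.*` pattern is inferred only from the file names seen in this post, and `find_core_files` and `report` are hypothetical helper names.

```python
from pathlib import Path


def find_core_files(root: str, pattern: str = "*.core.*") -> list[tuple[int, Path]]:
    """Return (size_in_bytes, path) for matching files under root, largest first."""
    hits = [(p.stat().st_size, p) for p in Path(root).rglob(pattern) if p.is_file()]
    return sorted(hits, key=lambda item: item[0], reverse=True)


def report(root: str) -> None:
    """Print each match with its size, then the total space they occupy."""
    hits = find_core_files(root)
    total = sum(size for size, _ in hits)
    for size, path in hits:
        print(f"{size / 2**20:8.1f} MiB  {path}")
    print(f"{total / 2**20:8.1f} MiB  total in {len(hits)} file(s)")
```

Calling `report("/misc/disk1")` would list candidates under the directory named in the alert. The script deliberately only reports; deleting is left as a manual step.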
Upgrade didn’t solve the SSH issue. However, again, from within admin->run, I was able to determine that every time someone running putty on Windows tried to ssh in, ssh would crash(?!) and leave a core dump:
411 -rw-r--r-- 1 104491 Mar 27 18:51 sshd_child_handler_832.by.11.20180327-185123.xr-vm_node0_RP0_CPU0.02a5b.core.txt
We determined agent forwarding had to be off, then the crashes stopped.
Next article will be creating bundles (e.g. port channels, MC-LAG, MLAG, LACP, whatever your preferred vendor calls them), locally and cross-chassis, and then coming up with a next hop redundancy config. BGP after that.
SNMP would normally be last after everything is happy, but I ran into one issue so I’ll just mention it here. I build dynamic monitoring profiles for routers where they’re queried for their eBGP neighbors and then monitoring rules are configured based on the data returned. For IPv4, you can use the non-Cisco BGP4-MIB and query for , but for IPv6, you must use the Cisco-provided CISCO-BGP4-MIB with the cbgpPeer2RemoteAddr object. Well, this platform doesn’t implement that object, so the only way to get your IPv6 peers’ remote addresses is to query some other value each peer would have, such as cbgpPeer2State.2.16 (the .2.16 = IPv6-specific peers), take the portion following that from each child OID returned (the value itself is not what you’re looking for), then build the rest of your rules off of that. To display the actual address, you’ll need to convert the child OID data from decimal octets to hex and format it as IPv6.
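The conversion described above can be sketched in a few lines of Python. The assumptions: the index that follows the base OID is `2.16.` followed by sixteen decimal octets, as the `.2.16` note implies; `ipv6_from_index` is a hypothetical helper name; and the example address is the documentation prefix rather than a real peer.

```python
import ipaddress

# Assumed layout of a returned child OID, per the description above:
#   <base OID for cbgpPeer2State>.2.16.<16 decimal octets>
# where 2 marks an IPv6 peer and 16 is the address length in bytes.


def ipv6_from_index(index: str) -> ipaddress.IPv6Address:
    """Convert the '2.16.a.b.c...' index suffix of a child OID to an IPv6 address."""
    parts = [int(p) for p in index.strip(".").split(".")]
    if parts[:2] != [2, 16] or len(parts) != 18:
        raise ValueError(f"not an IPv6 peer index: {index!r}")
    return ipaddress.IPv6Address(bytes(parts[2:]))


# Index suffix for the documentation address 2001:db8::1, built explicitly
# so the octet count is easy to verify.
suffix = ".".join(["2", "16", "32", "1", "13", "184"] + ["0"] * 11 + ["1"])
print(ipv6_from_index(suffix))  # 2001:db8::1
```

Stripping the base OID from each walked result and passing the remainder to this function gives you an address you can feed into the monitoring rules.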
Just a 2022 update to this. It’s kind of funny but some of the above article had been improved upon, making many of the steps, and missteps, irrelevant. There was a period of time where you could just ‘install add source’ on the tar file that Cisco offered up for download, and it would do all the magic for you with the iso + SMU’s (the .rpm files). So all the above mess basically had turned into:
# install add source harddisk:/ NCS5500-iosxr-k9-7.0.2.tar synchronous
Aug 31 12:57:41 Action 1: install add action started
Aug 31 13:00:35 Action 1: install add action completed successfully
Aug 31 13:00:37 Install operation 52 finished successfully
Aug 31 13:00:37 Ending operation 52
The tar file could be local or remote. Then you’d simply “install activate”, let it reload, and do the cleanup tasks once validated.
Well, they broke that too. What had been working reverted to:
install add source harddisk:/ NCS5500-iosxr-k9-7.5.2.tar
Aug 31 14:03:07 Action 1: install add action started
Aug 31 14:03:12 Install operation will continue in the background
Aug 31 14:05:58 WARNING : the following files/packages were not added to the Software Repository due to unsupported file format
README-NCS5500-iosxr-k9-7.5.2.txt
Aug 31 14:09:53 On Node 0/RP0 Sysadmin: Failed to scp with error: /misc/disk1/tftpboot/ncs5500-mini-x-7.5.2: No space left on device
Aug 31 14:09:54 ERROR!! failed while distributing packages to sysadmin
Aug 31 14:09:57 Install operation 57 aborted
Aug 31 14:09:57 Ending operation 57
Yes, I did check that there were no inactive images, and no core files from crashed processes (you’d think their installer could look for those). I happened across this bug (https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvv35484) that shows itself as ‘resolved’ but doesn’t bother to tell you when or in what versions; given 7.5.2 is a currently recommended release and still blows up, no, it doesn’t seem to be resolved, Cisco.
So, funny thing about all of that is you need to go back to the long process I’d outlined back in 2017, where you install add source the .iso file separately after extracting it from the tar, then install add source all the other relevant files from the tar which match your system. Then, you can prepare them:
install prepare pkg ncs5500-mini-x-7.5.2 ncs5500-mgbl-3.0.0.0-r752 ncs5500-k9sec-3.2.0.0-r752 ncs5500-ospf-2.0.0.0-r752 ncs5500-li-1.0.0.0-r752 ncs5500-mpls-2.1.0.0-r752 ncs5500-isis-2.1.0.0-r752 ncs5500-mpls-te-rsvp-3.1.0.0-r752 ncs5500-mcast-3.0.0.0-r752
And finally you can install activate.
Oh, one more thing. They happily add some call home BS to your config without asking your permission. Even worse, it will reactivate itself across upgrades. It first showed up in 6.6.3, I marked it inactive. It came back in 7.0.2, marked it inactive. Came back in 7.5.2 and now they refuse to let you remove the CiscoTAC-1 profile. If you mark it inactive, you also have to remove smart-licensing now, and then it will still leave http active and attempt to call home anyway until you go into the profile and mark it ‘no active’, and then go further to mark the http method as disabled:
call-home
service active
contact smart-licensing
profile CiscoTAC-1
active
destination transport-method email disable
destination transport-method http
!
!
2024 Update
Trying to go from 7.5.2 to 7.7.21; fun fun in Cisco land like usual. What used to be a VRF specified after the IP has been removed from the examples in 7.5.2:
RP/0/RP0/CPU0:rtr1#install source ?
WORD Enter source directory for the package(s)
Example:
sftp://user[:password]@server/directory/
scp://user[:password]@server/directory/
ftp://user[:password]@server/directory/
tftp://server/directory/
http://server/directory/
/directory/
RP/0/RP0/CPU0:rtr1#install add source ?
WORD Password in cli for sftp/ftp is supported only in exr platforms
Enter source directory for the package(s)
Example:
tftp://server/directory/
harddisk:/directory
sftp://user[:password]@server:/directory/
ftp://user[:password]@server:/directory/
http://[user:password@]server/directory/
https://[user:password@]server/directory/
ftp://user[:password]@server;VRF/directory/
/dir/
Yep, and given the images keep increasing in size, you still can’t install from tar file locally like you used to be able to. Arista fixed this issue a while ago as image sizes grew, like years ago. I literally just installed a modern EOS on an Arista device I purchased in 2016, eight years ago. I used Arista’s install source option to upgrade in place, couple easy commands, done. Link.
So, literally, in that above example you clearly see that install add source suggests the VRF can be included in the ftp protocol upgrade option. Want to know what happens when you try it? Typical bullshit:
2024-05-13 23:25:06 Install operation 71 aborted
If you try via scp, in the ways that worked previously, yep that fails too. It tries to pass your vrf name into the remote host:
RP/0/RP0/CPU0:rtr1#install source scp://update@192.0.2.1;mgmtvrf/home/update ncs5500-mini-x-7.7.21.iso
Mon May 13 23:04:13.580 UTC
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2024-05-13 23:04:22 Install operation 67 started by rtradmin:
2024-05-13 23:04:22 install source scp://update@192.0.2.1;mgmtvrf/home/update ncs5500-mini-x-7.7.21.iso
2024-05-13 23:04:25 No install operation in progress at this moment
2024-05-13 23:04:25 Checking system is ready for install operation
2024-05-13 23:04:26 'install source' in progress
2024-05-13 23:04:26 ISO ncs5500-mini-x-7.7.21.iso in input package list. Going to upgrade the system to version 7.7.21.
Enter password for update@192.0.2.1;mgmtvrf:
2024-05-13 23:04:32 Scheme : scp
2024-05-13 23:04:32 Hostname : 192.0.2.1;mgmtvrf
2024-05-13 23:04:32 Username : update
2024-05-13 23:04:32 SourceDir : home/update
2024-05-13 23:04:32 Collecting software state..
2024-05-13 23:04:32 Getting platform
2024-05-13 23:04:32 Getting supported architecture
2024-05-13 23:04:32 Getting active packages from XR
2024-05-13 23:04:32 Getting inactive packages from XR
2024-05-13 23:04:38 Getting list of RPMs in local repo
2024-05-13 23:04:38 Getting list of provides of all active packages
2024-05-13 23:04:38 Getting provides of each rpm in repo
2024-05-13 23:04:38 Getting requires of each rpm in repo
2024-05-13 23:06:50 ssh: connect to host 192.0.2.1 port 22: Connection timed out
sh: mgmtvrf: command not found
2024-05-13 23:06:50 Failed to list files in /home/update
2024-05-13 23:06:50 Ending operation 67
2024-05-13 23:06:50 Install operation 67 aborted
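One plausible reading of that log, sketched in Python: the URL parser keeps `;mgmtvrf` glued to the hostname (matching the `Hostname : 192.0.2.1;mgmtvrf` line), and if that string is later dropped unquoted into a shell command, the shell treats `;` as a command separator, which would explain `sh: mgmtvrf: command not found`. This is a guess at the failure mode, not the installer's actual code; the function names here are hypothetical.

```python
import shlex
from urllib.parse import urlsplit


def naive_remote_listing(url):
    """Build an 'ssh ... ls' command by plain string formatting (unsafe)."""
    parts = urlsplit(url)
    return f"ssh {parts.username}@{parts.hostname} ls {parts.path}"


def as_seen_by_shell(command):
    """Split on ';' the way a POSIX shell separates unquoted commands."""
    return [piece.strip() for piece in command.split(";")]


def quoted_remote_listing(url):
    """Same command, but with each argument shell-quoted."""
    parts = urlsplit(url)
    target = shlex.quote(f"{parts.username}@{parts.hostname}")
    return f"ssh {target} ls {shlex.quote(parts.path)}"


url = "scp://update@192.0.2.1;mgmtvrf/home/update"
print(urlsplit(url).hostname)                        # host with the VRF still attached
print(as_seen_by_shell(naive_remote_listing(url)))   # two separate shell commands
print(quoted_remote_listing(url))                    # one command, ';' kept literal
```

Quoting would only stop the shell from splitting the command; the hostname would still be wrong, so the VRF really does need to be passed some other way.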
So, we’ll head on back to the command line to search for more options. Ah, VRF has moved to an additional command line argument:
RP/0/RP0/CPU0:rtr1#install add source sftp://192.1.0.1:/dir ?
WORD package name
vrf Name of the vrf interface to be used
(cisco-support)
Okay, so we’re back to a potential solution. Let’s try. It worked. Here are your steps. Add your packages one at a time, starting with the mini ISO:
RP/0/RP0/CPU0:rtr1#install add source sftp://cisco:[email protected]:/home/cisco/ vrf mgmtvrf ncs5500-mini-x-7.7.21.iso synchronous
2024-05-14 00:23:06 Action 1: install add action started
2024-05-14 00:30:07 Packages added:
2024-05-14 00:30:07 ncs5500-mini-x-7.7.21
2024-05-14 00:30:07 Action 1: install add action completed successfully
2024-05-14 00:30:08 Install operation 74 finished successfully
It worked!!
Moving on, one at a time (and see my note immediately after this segment):
RP/0/RP0/CPU0:bdr1#install add source sftp://cisco:[email protected]:/home/cisco/ vrf mgmtvrf ncs5500-k9sec-3.1.0.0-r7721.x86_64.rpm synchronous
RP/0/RP0/CPU0:bdr1#install add source sftp://cisco:[email protected]:/home/cisco/ vrf mgmtvrf ncs5500-li-1.0.0.0-r7721.x86_64.rpm synchronous
RP/0/RP0/CPU0:bdr1#install add source sftp://cisco:[email protected]:/home/cisco/ vrf mgmtvrf ncs5500-lictrl-1.0.0.0-r7721.x86_64.rpm synch
RP/0/RP0/CPU0:bdr1#install add source sftp://cisco:[email protected]:/home/cisco/ vrf mgmtvrf ncs5500-mcast-3.0.0.0-r7721.x86_64.rpm synch
RP/0/RP0/CPU0:bdr1#install add source sftp://cisco:[email protected]:/home/cisco/ vrf mgmtvrf ncs5500-mgbl-3.0.0.0-r7721.x86_64.rpm synch
RP/0/RP0/CPU0:bdr1#install add source sftp://cisco:[email protected]:/home/cisco/ vrf mgmtvrf ncs5500-mpls-2.1.0.0-r7721.x86_64.rpm synch
RP/0/RP0/CPU0:bdr1#install add source sftp://cisco:[email protected]:/home/cisco/ vrf mgmtvrf ncs5500-mpls-te-rsvp-3.1.0.0-r7721.x86_64.rpm synch
I messed up the above and left off the mpls-te-rsvp package, which my system requires. There appears to be yet another installer bug here: it lets you move on to the prepare stage without the package, and the prepare stage takes you past the ‘point of no return’ before failing, rather than checking for the package up front and telling you it’s missing like it would for the other packages. So, be very careful not to miss the mpls-te-rsvp package if your system currently uses it.
2024-05-14 12:21:43 The operation is past point-of-no-return and can no longer be stopped!
2024-05-14 12:21:43 Action 1: install prepare action started
2024-05-14 12:21:43 Triggering prepare operation.
This may take a while...
2024-05-14 12:23:06 Following error(s) occurred in sysadmin vm(s) during install operation:
2024-05-14 12:23:06 Preparation failed with the error:
Package /install_repo/gl/xr/ncs5500-mpls-te-rsvp-3.1.0.0-r7721 is not available in repository
2024-05-14 12:23:06
Error stack for 0/RP0 :
#1 Package /install_repo/gl/xr/ncs5500-mpls-te-rsvp-3.1.0.0-r7721 is not available in repository
Please collect 'show tech-support install one-showtech' from XR and
'show tech-support ctrace' from Admin and pass this information to
your TAC representative for support.
2024-05-14 12:23:10 Ending operation 86
2024-05-14 12:23:10 Install operation 86 aborted
Fortunately, I did not actually have to contact TAC. I removed all of the newly added inactive packages with “install remove inactive all” and started over from scratch, which wasted 90 minutes (thanks Cisco).
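A small pre-flight check could catch this kind of gap before prepare is ever run. The following is a minimal Python sketch, assuming you have pasted the currently active package names and the list of files you staged into two lists. The regular expression is inferred only from the package names shown in this post, and the old-release names (`r7311`) in the example are hypothetical placeholders.

```python
import re

# Optional-package names in this post look like:
#   ncs5500-mpls-te-rsvp-3.1.0.0-r7721[.x86_64][.rpm]
# The mini ISO (ncs5500-mini-x-7.7.21.iso) does not match and is skipped.
PKG_RE = re.compile(
    r"^(?P<feature>.+?)-\d+(?:\.\d+)+-r\w+?(?:\.x86_64)?(?:\.rpm)?$"
)


def feature_name(pkg):
    """Return the package name without version/release, or None if no match."""
    match = PKG_RE.match(pkg.strip())
    return match.group("feature") if match else None


def missing_features(active, staged):
    """Features that are active now but have no staged replacement."""
    have = {feature_name(p) for p in active} - {None}
    want = {feature_name(p) for p in staged} - {None}
    return sorted(have - want)


# Hypothetical example: the running system has mpls-te-rsvp, the staged set does not.
active = [
    "ncs5500-mpls-2.1.0.0-r7311",
    "ncs5500-mpls-te-rsvp-3.1.0.0-r7311",
    "ncs5500-ospf-2.0.0.0-r7311",
]
staged = [
    "ncs5500-mini-x-7.7.21.iso",
    "ncs5500-mpls-2.1.0.0-r7721.x86_64.rpm",
    "ncs5500-ospf-2.0.0.0-r7721.x86_64.rpm",
]
print(missing_features(active, staged))
```

Anything the function prints is a package that is active today and absent from the new set, which is exactly the situation that blew up above.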
Now, back to the good ol’ prepare operation:
install prepare ncs5500-mini-x-7.7.21.iso ncs5500-k9sec-3.1.0.0-r7721.x86_64.rpm ncs5500-li-1.0.0.0-r7721.x86_64.rpm ncs5500-lictrl-1.0.0.0-r7721.x86_64.rpm ncs5500-mcast-3.0.0.0-r7721.x86_64.rpm ncs5500-mgbl-3.0.0.0-r7721.x86_64.rpm ncs5500-mpls-2.1.0.0-r7721.x86_64.rpm ncs5500-mpls-te-rsvp-3.1.0.0-r7721.x86_64.rpm ncs5500-ospf-2.0.0.0-r7721.x86_64.rpm
Tue May 14 00:40:16.168 UTC
2024-05-14 00:40:17 Install operation 83 started by root:
install prepare pkg ncs5500-mini-x-7.7.21 ncs5500-k9sec-3.1.0.0-r7721.x86_64 ncs5500-li-1.0.0.0-r7721.x86_64 ncs5500-lictrl-1.0.0.0-r7721.x86_64 ncs5500-mcast-3.0.0.0-r7721.x86_64 ncs5500-mgbl-3.0.0.0-r7721.x86_64 ncs5500-mpls-2.1.0.0-r7721.x86_64 ncs5500-mpls-te-rsvp-3.1.0.0-r7721.x86_64 ncs5500-ospf-2.0.0.0-r7721.x86_64
2024-05-14 00:40:17 Package list:
2024-05-14 00:40:17 ncs5500-mini-x-7.7.21
2024-05-14 00:40:17 ncs5500-k9sec-3.1.0.0-r7721.x86_64
2024-05-14 00:40:17 ncs5500-li-1.0.0.0-r7721.x86_64
2024-05-14 00:40:17 ncs5500-lictrl-1.0.0.0-r7721.x86_64
2024-05-14 00:40:17 ncs5500-mcast-3.0.0.0-r7721.x86_64
2024-05-14 00:40:17 ncs5500-mgbl-3.0.0.0-r7721.x86_64
2024-05-14 00:40:17 ncs5500-mpls-2.1.0.0-r7721.x86_64
2024-05-14 00:40:17 ncs5500-mpls-te-rsvp-3.1.0.0-r7721.x86_64
2024-05-14 00:40:17 ncs5500-ospf-2.0.0.0-r7721.x86_64
2024-05-14 00:40:17 Install operation will continue in the background
RP/0/RP0/CPU0:rtr1#2024-05-14 00:44:19 Install operation 83 finished successfully
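Since the add and prepare steps have to name the same set of packages, one way to keep them from drifting apart is to generate both from a single list. This is a rough Python sketch that only assembles the command strings in the form used in this post; the function name is made up, and `USER`, `PASSWORD` and `HOST` are placeholders you would fill in yourself.

```python
def build_install_commands(source_url, vrf, packages):
    """Return (add_commands, prepare_command) built from one package list."""
    add_commands = [
        f"install add source {source_url} vrf {vrf} {pkg} synchronous"
        for pkg in packages
    ]
    prepare_command = "install prepare " + " ".join(packages)
    return add_commands, prepare_command


packages = [
    "ncs5500-mini-x-7.7.21.iso",
    "ncs5500-k9sec-3.1.0.0-r7721.x86_64.rpm",
    "ncs5500-mpls-te-rsvp-3.1.0.0-r7721.x86_64.rpm",
]
add_commands, prepare_command = build_install_commands(
    "sftp://USER:PASSWORD@HOST:/home/cisco/", "mgmtvrf", packages
)
for command in add_commands:
    print(command)
print(prepare_command)
```

Paste the printed lines into the router one at a time; the point is simply that a package cannot appear in one step and be forgotten in the other.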
Now we install activate and hope for the best. You’ve got to love this message:
2024-05-14 00:40:34 The operation is past point-of-no-return and can no longer be stopped!
Anyway, it worked, we survived another one:
RP/0/RP0/CPU0:rtr1#show ver
Tue May 14 01:13:26.990 UTC
Cisco IOS XR Software, Version 7.7.21
Copyright (c) 2013-2023 by Cisco Systems, Inc.
Build Information:
Built By : deenayak
Built On : Thu Jun 29 03:55:49 PDT 2023
Built Host : 8ef204814797
Workspace : /auto/srcarchive16/prod/7.7.21/ncs5500/ws
Version : 7.7.21
Location : /opt/cisco/XR/packages/
Label : 7.7.21
cisco NCS-5500 () processor
System uptime is 3 minutes
Now do all your normal cleanup to commit the images and get rid of the old ones. And, also as per usual, all the call home shit reactivates:
call-home
service active
contact smart-licensing
profile CiscoTAC-1
active
destination transport-method email disable
destination transport-method http
!
!
So:
RP/0/RP0/CPU0:rtr1#conf t
RP/0/RP0/CPU0:rtr1(config)#call-home
RP/0/RP0/CPU0:rtr1(config-call-home)#no service active
RP/0/RP0/CPU0:rtr1(config-call-home)#profile CiscoTAC-1
RP/0/RP0/CPU0:rtr1(config-call-home-profile)#no active
RP/0/RP0/CPU0:rtr1(config-call-home-profile)#end
Uncommitted changes found, commit them before exiting(yes/no/cancel)? [cancel]:yes
RP/0/RP0/CPU0:rtr1#show run call-home
Tue May 14 01:25:48.450 UTC
call-home
contact smart-licensing
profile CiscoTAC-1
destination transport-method email disable
destination transport-method http
!
!
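Because call-home comes back after every upgrade, it may be worth checking for it automatically. A minimal Python sketch, assuming you feed it the text of `show run call-home` exactly as pasted above (the function name is hypothetical): it simply flags the two lines that were removed in the fix.

```python
# Lines whose presence means call-home is switched on, per the config above.
ENABLING_LINES = {"service active", "active"}


def call_home_enabled_lines(show_run_call_home):
    """Return the lines in 'show run call-home' output that enable call-home."""
    return [
        line.strip()
        for line in show_run_call_home.splitlines()
        if line.strip() in ENABLING_LINES
    ]


after_upgrade = """call-home
 service active
 contact smart-licensing
 profile CiscoTAC-1
  active
  destination transport-method email disable
  destination transport-method http
 !
!
"""
print(call_home_enabled_lines(after_upgrade))
```

An empty list means the section looks like the cleaned-up version; anything else means it has reactivated again.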
Fascinating read. I’m also looking into 1U appliances for the internet edge with full routes and 100G capabilities. One thing I noted during my hardware review is that the SE version sacrifices a number of 100G ports to make room for the extra TCAM chip that handles the additional 1M+ routes. For those like me who need 100G, that’s a big deal. The 7280Rs from Arista sit in between them as far as route count goes, but without losing any ports.
Yep, I was willing to give up the two additional 40/100 gig ports for the additional route scale, begrudgingly, but Arista will likely have a 1U box with similar route scale and better port options very soon, and without the port license BS. I need to try out the suggestions made by Timothy Durack regarding EVPN/IRB for first hop redundancy, but either way, I think Arista is going to soon make the NCS5500-series irrelevant. At the very least, they need to drop the port licenses immediately; all it does is piss customers off and make them want to shop around.
hi “your mom” :),
I am xander thuijs from the cisco XR engineering team. one of my areas of focus is user experience. from your read it looks like we dropped a ball there. I would like to connect with you to discuss some of the concerns raised and to see how we can mitigate that.
the vrf/install piece is known to us and is in the works to get improved btw.
hopefully you like to connect with me so we can take care of some of the issues you are and have been facing!
cheers!
xander
This is fantastic, and absolutely helped me with my 5501-SE upgrade. Also super interesting that Xander from Cisco has chimed in (saw you at Cisco Live in 2016, great presentation/info). Great testing post, thanks!
Great read. Any reason you don’t make the switch to streaming telemetry to offset some of the SNMP issues?