Last week I invited Neeraj Malhotra, a Principal Engineer at Cisco, to present at the Bay Area Network Operators Group (BANOG) on how EVPN can be used to build multi-tenant data center fabrics. Neeraj gave a great presentation and I’m sharing his slides below.
EVPN has many use cases; this presentation focuses on EVPN as a control plane in the data center to support host/VM mobility and active/active multihoming. I’m currently evaluating EVPN as an alternative to the proprietary MLAG technology in my data centers.
The presentation abstract and slides are below. Enjoy.
Abstract:
EVPN-IRB (Integrated Routing and Bridging) is a technology that leverages BGP EVPN as a common overlay control plane to enable VPN routing and bridging services over an MPLS or IP underlay fabric. The point-to-multipoint bridging service enables VLANs to be stretched across the data center IP or MPLS fabric, while the VPN routing service enables inter-subnet routing across these stretched subnets. This allows flexible workload placement with seamless VM mobility across the stretched subnets.
This talk will provide a tutorial on the relevant EVPN constructs and procedures used to enable overlay bridged and routed connectivity between tenant workloads in a data center, and will compare three main design choices for the overlay routing architecture:
– Centralized EVPN-IRB: with a centralized first-hop anycast GW on the border leafs or DCI / DC Edge routers
– Asymmetric EVPN-IRB: with a distributed first-hop anycast GW on the ToRs
– Symmetric EVPN-IRB: with a distributed first-hop anycast GW on the ToRs
It will further focus on a symmetric EVPN-IRB design with a distributed anycast GW, and go through detailed packet walks to give a good feel for how an EVPN-IRB based DC fabric provides any-to-any L2 and L3 overlay connectivity.
Someone asked me the other day how they could automate running a command on multiple routers without accessing each router manually. Obviously an Ansible playbook can easily do that (or even an Ansible ad-hoc command without a playbook), or you can write a bash script with a for loop that iterates over the devices and connects to each one to run the command. But the question also got me wondering whether there was a simpler way to get the job done quickly without installing any configuration management tools or messing with scripting.
It turns out the answer is yes: you can do it with good old SSH, specifically parallel SSH on Linux.
Parallel SSH (PSSH) is a great tool when you want to run one or more commands on several hosts or routers at the same time. All you need is a Linux host with PSSH installed and you are good to go. On Ubuntu you can install PSSH with the Python package installer by running pip install pssh (if pip itself is not installed, you can install it by executing apt-get install python-pip). The basic syntax is:
```
pssh [OPTIONS] command [...]
```
Let’s look at a quick example. I have two routers, csr and csr3, listed in the hosts.txt file, and I want PSSH to save the running configuration on each router. The optional -l argument tells PSSH which username to use, while the -A argument tells it to prompt for a password (alternatively, you can use a private/public key pair instead of passwords).
```
➜ ~ pssh -h hosts.txt -l cisco -A "wr mem"
Warning: do not enter your password if anyone else has superuser privileges or access to your account.
Password:
[1] 20:11:59 [SUCCESS] csr3
[2] 20:12:00 [SUCCESS] csr
```
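To make the parallelism concrete, here is a rough Python sketch of what that pssh run automates: one ssh session per host, all started at once from a thread pool. It is an illustration, not PSSH's actual implementation. It assumes Python 3.7+, plain hostnames in hosts.txt, and key-based SSH authentication (it does not replicate the single password prompt of -A); read_hosts, run_on_host and run_parallel are hypothetical helper names. The max_workers cap plays a role similar to PSSH's -p option.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def read_hosts(path):
    """Read a hosts file with one hostname per line; skip blanks and '#' lines."""
    hosts = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#"):
                hosts.append(line)
    return hosts


def run_on_host(host, user, command, runner=subprocess.run):
    """Run one command on one host over ssh and capture its output."""
    proc = runner(["ssh", "-l", user, host, command],
                  capture_output=True, text=True)
    return host, proc.returncode, proc.stdout


def run_parallel(hosts, user, command, max_workers=32, runner=subprocess.run):
    """Start the same command on every host concurrently; results keep host order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_on_host, h, user, command, runner)
                   for h in hosts]
        return [f.result() for f in futures]


# Usage (assumes key-based SSH access to the routers):
# for host, rc, _ in run_parallel(read_hosts("hosts.txt"), "cisco", "wr mem"):
#     print(host, "SUCCESS" if rc == 0 else "FAILURE")
```

The runner parameter exists only so the fan-out logic can be exercised without real routers; in normal use it is subprocess.run.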
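If you want to gather some data from the routers and write the output to files, add the -o argument with an output directory, for example pssh -h hosts.txt -l cisco -A -o /tmp/out "show run". PSSH then saves each router's output in a file named after the host: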
```
Warning: do not enter your password if anyone else has superuser privileges or access to your account.
Password:
[1] 20:14:39 [SUCCESS] csr3
[2] 20:14:39 [SUCCESS] csr
➜ ~
➜ ~ ls /tmp/out/
csr csr3
➜ ~
➜ ~ more /tmp/out/csr
Building configuration...
Current configuration : 2643 bytes
!
! Last configuration change at 14:59:27 UTC Wed Feb 8 2017 by cisco
!
version 15.4
service timestamps debug datetime msec
service timestamps log datetime msec
no platform punt-keepalive disable-kernel-core
platform console virtual
!
hostname CSR
!
➜ ~
```
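PSSH's -o option writes each host's standard output to its own file in the output directory, which is why /tmp/out ends up holding csr and csr3. A minimal Python sketch of that last step, assuming the per-host output has already been collected as (host, text) pairs; write_outputs is a hypothetical name, not part of PSSH:

```python
import os


def write_outputs(results, outdir):
    """Write each (host, output) pair to <outdir>/<host>, one file per host."""
    os.makedirs(outdir, exist_ok=True)  # create the directory if needed
    paths = []
    for host, output in results:
        path = os.path.join(outdir, host)
        with open(path, "w") as f:
            f.write(output)
        paths.append(path)
    return paths
```

One file per host keeps the outputs from different routers from interleaving, which matters when all of them finish at nearly the same moment.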
The PSSH utility is lightweight and simple, and it does the job with minimal overhead. It also works through the routers in parallel, which saves time, especially when you are executing tasks that take a while to complete.
It’s amazing how much you can do with the mighty SSH. Keep PSSH in your toolbox in case you need it one day.
Did I mention there is also a PSCP (parallel SCP) utility, which you can use to copy an image to multiple devices at the same time when you are upgrading them? That is your homework: Google it and check it out.
The first half of the meetup was an introduction to Ansible for network engineers, and the second half was a series of live demos of Ansible playbooks. You can view and download the slides below.
With Cisco IOS, I had to use several modules in my playbook to be able to automate the upgrade process because there was not a single module available that could handle all the tasks.
As I was trying to expand my Ansible knowledge, I began looking at the available Ansible networking modules for Juniper devices. It turns out junos_package is a core module (it comes installed with Ansible) that can take care of the entire process: copying the package to the device flash, installing the package, committing, and rebooting.
My Setup:
– Python 2.7.6 and Ansible 2.2 running on Ubuntu 14.04.5 LTS (codename: Trusty)
– Juniper vSRX running version 12.1X47-D10.4
Requirements:
– You need to have Ansible and Junos PyEZ installed. Junos PyEZ is a Python library for remotely managing and automating Junos devices.
– Netconf and SSH enabled on the Junos device.
The playbook consists of a few tasks: Ansible first collects the device facts. Then the junos_package module compares the running version on the Junos device with the version defined in the “package” variable and upgrades the device if there is a mismatch. Once the device reboots with the new package, Ansible waits until the device becomes reachable via Netconf and then pings a root DNS server from the device to check internet connectivity. If the destination is not reachable, Ansible generates a “ping failed” error.
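The control flow described above can be sketched in plain Python (an illustration of the logic, not the playbook itself). Assumptions: versions are compared as plain strings, as the mismatch check suggests, and upgrade_if_needed and its three callables are hypothetical names standing in for the junos_package task, the wait-for-Netconf task and the ping task:

```python
def upgrade_if_needed(running_version, target_version,
                      install_and_reboot, wait_until_reachable, ping):
    """Skip the device if it already runs the target version; otherwise
    upgrade it, wait for it to come back, and verify connectivity.

    The three callables are hypothetical stand-ins for the Ansible tasks.
    """
    if running_version == target_version:
        return "skipped: already running %s" % target_version
    install_and_reboot()            # copy, install, commit and reboot
    if not wait_until_reachable():  # e.g. poll the Netconf port
        raise RuntimeError("device did not come back on Netconf")
    if not ping():                  # ping an external address from the device
        raise RuntimeError("ping failed")
    return "upgraded %s -> %s" % (running_version, target_version)
```

Passing the steps in as callables keeps the decision logic testable without a real vSRX.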
You can of course expand this playbook to include tasks to verify that routing protocols have come up after the reboot.
Ansible is a nice tool to automate the deployment and configuration of network devices. I wrote the following playbook to automate the upgrade of Cisco IOS devices. This playbook has been tested successfully to upgrade a Cisco CSR1000v router and can be easily tweaked to support Cisco Nexus and Arista switches.
My Setup:
– Python 2.7.6 and Ansible 2.2 running on Ubuntu 14.04.5 LTS (codename: Trusty)
– Cisco CSR1000v running IOS-XE 3.10.S
Requirements:
– Obviously, you must have Ansible version 2.2 or higher installed and configured properly to access the network devices defined in your hosts file via SSH.
– You must also have ntc_ansible installed. The ntc_ansible modules were written by Jason Edelman; follow the instructions on GitHub to install them.
– Some of the ntc_ansible modules depend on the pyntc library. Pyntc is an open-source multi-vendor library that makes it easier to copy files to and upgrade network devices. Follow the steps here to install the library.
Note: installing the pyntc package should also install the future library, which pyntc requires. If Ansible spits out “No module named builtins” errors when you run the playbook, the future library is missing from your system; you can install it by executing sudo pip install future. A quick way to check whether the future library is installed is to run import pyntc from the Python interpreter. If the import works, both the pyntc and future libraries have been installed successfully.
– In my setup, Ansible authenticates against the devices using username/password credentials. If you prefer public key authentication instead, here is a quick tutorial on how to enable SSH RSA authentication on a Cisco router.
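The import check from the note can be scripted so it reports exactly what is missing. A minimal sketch, assuming it is run with the same Python interpreter Ansible uses; missing_modules is a hypothetical helper name:

```python
import importlib


def missing_modules(names):
    """Return the names from `names` that this interpreter cannot import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# Usage on the Ansible host; an empty list means both are importable:
# print(missing_modules(["pyntc", "builtins"]))
```

On Python 2 the builtins module is provided by the future package, so it is a direct check for that dependency (on Python 3 builtins is always present). If future is missing, importing pyntc also fails, so pyntc can appear in the list for that reason alone.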
```yaml
---
- name: Upgrade a Cisco IOS router
  hosts: csr
  # Ansible 2.2 network modules run on the control host, not on the router
  connection: local
  gather_facts: no

  tasks:
    - name: GATHERING FACTS
      ios_facts:
        gather_subset: hardware
        provider: "{{ cli }}"
      tags: always

    - name: COPYING IMAGE TO DEVICE FLASH
      ntc_file_copy:
        platform: cisco_ios_ssh
        local_file: images/{{ new_image }}
        host: "{{ inventory_hostname }}"
        username: "{{ username }}"
        password: "{{ password }}"
      when: ansible_net_version != version
      tags: copy

    - name: SETTING BOOT IMAGE
      ios_config:
        lines:
          - no boot system
          - boot system flash bootflash:{{ new_image }}
        provider: "{{ cli }}"
        host: "{{ inventory_hostname }}"
      when: ansible_net_version != version
      tags: install

    - name: SAVING CONFIGS
      ntc_save_config:
        platform: cisco_ios_ssh
        host: "{{ inventory_hostname }}"
        username: "{{ username }}"
        password: "{{ password }}"
        local_file: backup/{{ inventory_hostname }}.cfg
      when: ansible_net_version != version
      tags: backup

    - name: RELOADING THE DEVICE
      ntc_reboot:
        platform: cisco_ios_ssh
        confirm: true
        timer: 2
        host: "{{ inventory_hostname }}"
        username: "{{ username }}"
        password: "{{ password }}"
      when: ansible_net_version != version
      tags: reload

    - name: VERIFYING CONNECTIVITY
      # The reload is scheduled 2 minutes out, so wait for SSH to go down first;
      # otherwise the checks below would succeed against the old image
      wait_for:
        port: 22
        host: "{{ inventory_hostname }}"
        state: stopped
        timeout: 300
      when: ansible_net_version != version

    # Wait up to 5 minutes for SSH to come back after the reload
    - wait_for:
        port: 22
        host: "{{ inventory_hostname }}"
        timeout: 300

    - ios_command:
        commands: ping 8.8.4.4
        provider: "{{ cli }}"
        wait_for:
          - result[0] contains "!!!"
      register: result
      failed_when: "'!!!' not in result.stdout[0]"
      tags: verify
```
The playbook is pretty straightforward and consists of 6 tasks:
1. GATHERING FACTS: the first task uses the ios_facts module (a core module that ships with Ansible itself) to gather facts about the device, including the running software version (ansible_net_version). The remaining upgrade tasks compare that version with the target version and are skipped for any device that already runs the target image.
2. COPYING IMAGE TO DEVICE FLASH: this task checks whether the image file is already on the device flash and copies it there if it isn't.
Note: this module uses SCP/Netmiko to copy the file. If you want to use this module with a Cisco Nexus switch, you will need to enable the SCP server on the switch first with feature scp-server (ip scp server enable is the equivalent command on Cisco IOS).
3. SETTING BOOT IMAGE: this task sets the boot image. Simple.
4. SAVING CONFIGS: this task saves the running configuration as the startup configuration on the device and also saves a local copy in the backup folder on the Ansible host.
5. RELOADING THE DEVICE: this task schedules a reload of the device after a 2-minute delay (timer: 2).
6. VERIFYING CONNECTIVITY: the final task waits up to 5 minutes for the device to come back up and accept SSH connections, then pings a public DNS server (8.8.4.4) from the device to verify internet connectivity. If the ping is not successful, Ansible generates an error.
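The wait_for step in the final task boils down to polling a TCP port until it accepts connections. A minimal Python sketch of that idea, similar in spirit to wait_for with its default state: started, assuming the Ansible host can reach the router's SSH port directly; wait_for_port is a hypothetical name, not an Ansible API:

```python
import socket
import time


def wait_for_port(host, port=22, timeout=300, interval=5):
    """Poll until host:port accepts a TCP connection or `timeout` seconds pass.

    Returns True as soon as a connection succeeds, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return False
            time.sleep(min(interval, remaining))  # back off before retrying
```

Calling wait_for_port("csr", 22, timeout=300) corresponds to port: 22 and timeout: 300 in the playbook; the timeout is approximate because a single connection attempt can itself take up to interval seconds.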
This is it. Now you can run a single command for all of your devices and let Ansible do its magic.
Next I will be doing some work to automate Junos devices. Stay tuned for that post. You can find the source code, including the Ansible configuration and variable files, in my GitHub repo.
Here are also some resources that helped me in my Ansible learning journey: