Presentation: Designing Multi-tenant Data Centers Using EVPN

Last week I invited Neeraj Malhotra, a Principal Engineer at Cisco, to present at the Bay Area Network Operators Group (BANOG) on how EVPN can be used to build multi-tenant data center fabrics. Neeraj gave a great presentation, and I’m sharing his slides below.

EVPN has many use cases, and this presentation focuses on EVPN as a control plane in the data center to support host/VM mobility and active/active multihoming. I’m currently looking at EVPN as an alternative to the proprietary MLAG technology in my data centers.

The presentation abstract and slides are below. Enjoy.

Abstract:

EVPN-IRB (Integrated Routing and Bridging) is a technology that leverages BGP EVPN as a common overlay control plane to enable VPN routing and bridging services over an MPLS or IP underlay fabric. The point-to-multipoint bridging service enables VLANs to be stretched across a data center IP or MPLS fabric, while the VPN routing service enables inter-subnet routing across these stretched subnets. It hence allows for flexible workloads with seamless VM mobility across the stretched subnet.

This talk will provide a tutorial of relevant EVPN constructs and procedures used to enable overlay bridged and routed connectivity between tenant workloads in a data center and compare three main design choices with respect to overlay routing architecture: 
– Centralized EVPN-IRB: with centralized first-hop any-cast GW on the border leafs OR DCI / DC Edge routers
– Asymmetric EVPN-IRB: with distributed first-hop any-cast GW on the ToRs
– Symmetric EVPN-IRB: with distributed first-hop any-cast GW on the ToRs

It will further focus on a symmetric EVPN-IRB design with distributed any-cast GW, and go through detailed packet walks to get a good feel for how an EVPN-IRB based DC fabric works to provide any-to-any L2 and L3 overlay connectivity.



Use Parallel SSH to Run Commands on Multiple Devices At the Same Time

Someone asked me the other day how they could run a command on multiple routers without logging in to each router manually. An Ansible playbook can easily do that (as can an Ansible ad-hoc command without a playbook), or you can write a bash script with a for loop that iterates over the devices and connects to each one to run the command. But the question got me wondering whether there was a simpler way to get the job done quickly without installing any configuration management tools or messing with scripting.

It turns out the answer is yes. You can do that using good old SSH, specifically parallel SSH on Linux.

Parallel SSH (PSSH) is a great tool when you want to run one or more commands on several hosts or routers at the same time. All you need is a Linux host with PSSH installed. On Ubuntu you can install it with the Python package installer: pip install pssh (if you don’t have pip installed, you can install it by executing apt-get install python-pip).

pssh [OPTIONS] command [...]

Let’s look at a quick example. I have two routers, csr and csr3, defined in the hosts.txt file and I want PSSH to save the running configuration on each router. The optional -l argument tells PSSH which username to use, while the -A argument tells it to prompt for a password (alternatively, you can use a private/public key pair instead of passwords).

➜  ~ pssh -h hosts.txt -l cisco -A "wr mem"
Warning: do not enter your password if anyone else has superuser privileges or access to your account.
Password:
[1] 20:11:59 [SUCCESS] csr3
[2] 20:12:00 [SUCCESS] csr

If you want to gather some data from the routers and write the output to a file, you can do so by adding the -o argument as follows:

➜  ~ pssh -h hosts.txt -l cisco -o /tmp/out/ -A "show run"
Warning: do not enter your password if anyone else has superuser privileges or access to your account.
Password:
[1] 20:14:39 [SUCCESS] csr3
[2] 20:14:39 [SUCCESS] csr
➜  ~ ls /tmp/out/
csr  csr3
➜  ~ more /tmp/out/csr

Building configuration...

Current configuration : 2643 bytes
!
! Last configuration change at 14:59:27 UTC Wed Feb 8 2017 by cisco
!
version 15.4
service timestamps debug datetime msec
service timestamps log datetime msec
no platform punt-keepalive disable-kernel-core
platform console virtual
!
hostname CSR
!
➜  ~

The PSSH utility is lightweight, simple, and does the job with minimal overhead. It also works through the routers in parallel, which saves time, especially when you are executing tasks that take a while to complete.
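If you end up calling PSSH from a script after all, a small helper that assembles the command line can keep the quoting straight. The following is only a sketch: the function name build_pssh_command and its parameters are made up for illustration, and it uses nothing beyond the -h, -l, -o and -A options shown in the examples above.

```python
import shlex


def build_pssh_command(hosts_file, user, command, out_dir=None, ask_password=True):
    """Return the pssh argument list for running `command` on every host.

    Hypothetical helper: it only assembles the options used in this post.
    """
    args = ["pssh", "-h", hosts_file, "-l", user]
    if out_dir is not None:
        # -o tells pssh where to write each host's output
        args += ["-o", out_dir]
    if ask_password:
        # -A makes pssh prompt for a password
        args.append("-A")
    # The remote command is passed as a single argument
    args.append(command)
    return args


save_cmd = build_pssh_command("hosts.txt", "cisco", "wr mem")
show_cmd = build_pssh_command("hosts.txt", "cisco", "show run", out_dir="/tmp/out/")

for cmd in (save_cmd, show_cmd):
    # shlex.quote keeps a multi-word command as one shell argument
    print(" ".join(shlex.quote(part) for part in cmd))
```

Passing the resulting list to subprocess.run() would launch PSSH without going through a shell, so the remote command needs no extra quoting.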

It’s amazing how much you can do with the mighty SSH. Keep PSSH in your toolbox in case you need it one day. 

Did I mention there is also a PSCP (parallel SCP) utility, which you can use to copy an image to multiple devices at the same time when you are upgrading them? That is your homework now: Google it and check it out.



Install and Upgrade Junos Software Packages Using Ansible

In a previous post, I talked about how you can use Ansible to automate Cisco IOS device upgrades. In this post I will show you how easily you can do the same thing on Junos. 

With Cisco IOS, I had to use several modules in my playbook to be able to automate the upgrade process because there was not a single module available that could handle all the tasks.  

As I was trying to expand my Ansible knowledge, I began looking at the available Ansible networking modules that run on Juniper devices. It turns out junos_package is a core module (comes installed with Ansible) and can take care of the entire process: copy the package to the device flash, install the package, commit, and reboot.

My Setup:

– Python 2.7.6 and Ansible 2.2 running on Ubuntu 14.04.5 LTS (codename: Trusty)

– Juniper vSRX running version 12.1X47-D10.4

Requirements:

– You need to have Ansible and Junos PyEZ installed. Junos PyEZ is a Python library for remotely managing and automating Junos devices.

– Netconf and SSH must be enabled on the Junos device.

The following playbook consists of a few tasks. Ansible first collects device facts. Then the junos_package module compares the running version on the Junos device with the version defined in the “package” variable and upgrades the device if there is a mismatch. Once the device reboots with the new package, Ansible waits until the device becomes reachable via Netconf and then attempts to ping a root DNS server from the device to check internet connectivity. If the destination is not reachable, Ansible generates a “ping failed” error.

You can of course expand this playbook to include tasks to verify that routing protocols have come up after the reboot. 

 



Automating Cisco Device Upgrades With Ansible

Ansible is a nice tool to automate the deployment and configuration of network devices. I wrote the following playbook to automate the upgrade of Cisco IOS devices. This playbook has been tested successfully to upgrade a Cisco CSR1000v router and can be easily tweaked to support Cisco Nexus and Arista switches.

My Setup:

– Python 2.7.6 and Ansible 2.2 running on Ubuntu 14.04.5 LTS (codename: Trusty)

– Cisco CSR1000v running IOS-XE 3.10.S

Requirements:

– You must have Ansible version 2.2 or higher installed and configured to reach the network devices defined in your hosts file over SSH.

– You must also have ntc_ansible installed. The ntc_ansible modules were written by Jason Edelman; follow the instructions on GitHub to install them.

– Some of the ntc_ansible modules depend on the pyntc library. Pyntc is an open-source multi-vendor library that makes it easier to copy files and upgrade network devices. Follow the steps here to install the library.

   Note: the pyntc package install should also install the future library which is required for pyntc to work. If Ansible spits out “No module named builtins” errors when you run the playbook, that means the future library is missing from your system and you can install it by executing sudo pip install future. A quick way to find out if the future library is installed on your system is by doing import pyntc from the Python interpreter. If the import works then both the pyntc and future libraries have been installed successfully.

– In my setup, Ansible authenticates against the devices using username/password credentials. If you prefer to use public key authentication instead, here is a quick tutorial on how to enable SSH RSA authentication on a Cisco router.
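The import check mentioned in the note above can also be scripted. This is a minimal sketch, not part of the original setup: the helper name can_import is made up here, and it simply tries to import each library and reports whether that worked.

```python
import importlib


def can_import(module_name):
    """Return True if `module_name` can be imported, False otherwise."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


# pyntc and future are the two libraries named in the requirements above
for name in ("pyntc", "future"):
    status = "installed" if can_import(name) else "missing"
    print("{}: {}".format(name, status))
```

If pyntc is reported as missing while it is installed, a failing import of one of its own dependencies (such as the future library) is a likely cause.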

 

- name: Upgrade a Cisco IOS router
  hosts: csr

  tasks:
  - name: GATHERING FACTS
    ios_facts:
      gather_subset: hardware
      provider: "{{ cli }}"
    tags: always

  - name: COPYING IMAGE TO DEVICE FLASH
    ntc_file_copy:
      platform: cisco_ios_ssh
      local_file: images/{{ new_image }}
      host: "{{ inventory_hostname }}"
      username: "{{ username }}"
      password: "{{ password }}"
    when: ansible_net_version != "{{ version }}"
    tags: copy

  - name: SETTING BOOT IMAGE
    ios_config:
      lines:
        - no boot system
        - boot system flash bootflash:{{ new_image }}
      provider: "{{ cli }}"
      host: "{{ inventory_hostname }}"
    when: ansible_net_version != "{{ version }}"
    tags: install

  - name: SAVING CONFIGS
    ntc_save_config:
      platform: cisco_ios_ssh
      host: "{{ inventory_hostname }}"
      username: "{{ username }}"
      password: "{{ password }}"
      local_file: backup/{{ inventory_hostname }}.cfg
    when: ansible_net_version != "{{ version }}"
    tags: backup

  - name: RELOADING THE DEVICE
    ntc_reboot:
      platform: cisco_ios_ssh
      confirm: true
      timer: 2
      host: "{{ inventory_hostname }}"
      username: "{{ username }}"
      password: "{{ password }}"
    when: ansible_net_version != "{{ version }}"
    tags: reload

  - name: VERIFYING CONNECTIVITY
    wait_for:
      port: 22
      host: "{{ inventory_hostname }}"
      timeout: 300

  - ios_command:
      commands: ping 8.8.4.4
      provider: "{{ cli }}"
      wait_for:
        - result[0] contains "!!!"
    register: result
    failed_when: "not '!!!' in result.stdout[0]"
    tags: verify


 
The playbook is pretty straightforward and consists of 6 tasks:
 

1. GATHERING FACTS: the first task uses ios_facts (a core module that ships with Ansible itself) to gather facts about the device, including the version it is running. The later tasks compare that version with the target version: if they match, Ansible skips those tasks for that device; otherwise it moves on to the next task.

2. COPYING IMAGE TO DEVICE FLASH: this task checks if the image file is available on the device flash and copies the file to flash if it doesn’t exist.

  Note: this module uses SCP/Netmiko to copy the file. If you want to use this module with a Cisco Nexus switch, you will need to enable SCP on the switch (not a requirement for Cisco IOS) by doing: ip scp server enable

3. SETTING BOOT IMAGE: this task sets the boot image. Simple. 

4. SAVING CONFIGS: this task saves the running config as the startup config on the device and also saves a copy of the config locally on the host in the backup folder.

5. RELOADING THE DEVICE: this task reloads the device after a time interval (2 minutes).

6. VERIFYING CONNECTIVITY: the final task waits up to 5 minutes for the device to come up and become accessible via SSH, then pings a public DNS server (8.8.4.4) to verify internet connectivity. If the ping is not successful, Ansible generates an error.
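To make the last step more concrete, here is a rough Python sketch of the same idea: poll a TCP port until it answers or a timeout expires, then look for "!!!" in the ping output. It is an illustration only, not how Ansible implements wait_for or ios_command, and the function names wait_for_port and ping_succeeded are made up for this example.

```python
import socket
import time


def wait_for_port(host, port, timeout=300.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds.

    Returns False if no connection succeeds before `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            pass  # refused or timed out: the device is not ready yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def ping_succeeded(ping_output):
    """Mirror the playbook check: IOS prints '!' for each successful echo reply."""
    return "!!!" in ping_output


# Example call (the hostname is a placeholder, so it is left commented out):
# wait_for_port("csr", 22, timeout=300)
```

The defaults mirror the playbook values (port 22 is passed by the caller, timeout 300 seconds); the one-second polling interval is an arbitrary choice.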

That’s it. Now you can run a single command for all of your devices and let Ansible do its magic.

Next I will be doing some work to automate Junos devices. Stay tuned for that post.
You can find the source code, including the Ansible configuration and variable files, in my GitHub repo.

Here are also some resources that helped me in my Ansible learning journey:

Up and Running with Ansible (eBook)

Ivan Pepelnjak’s Blog



Building Data Center Fabric: Junos Fusion vs Cisco FEX

Last month at Networking Field Day (NFD10), Juniper presented their Junos Fusion solution, which simplifies the data center by giving you a single pane of glass for managing all the switches in the fabric and lets you upgrade them from a central interface.
 
With Junos Fusion, all the access switches (called Satellite devices) are managed from a single aggregation device or a pair of them. The access devices can be either EX 4300 or QFX 5100 series switches and run a Wind River Linux distribution, known as the Linux Forwarding Operating System (LFOS), which is decoupled from the Junos operating system running on the aggregation devices. The aggregation devices are the new QFX 10000 series and run classic Junos. Juniper uses LLDP between the aggregation and access devices for auto discovery/provisioning and 802.1BR+ for configuring and monitoring the ports. They also use Netconf between the aggregation devices to sync the configurations.
 
The feature itself is not new. The Juniper MX edge routers have supported it for a while, but Juniper only extended the support to their data center switches this year.
 
From an operational perspective, Junos Fusion is very similar to Cisco Fabric Extender (FEX). They both make the fabric look like one big switch with a single IP address from the outside.

So if you are building a data center fabric, should you go with Junos Fusion or Cisco FEX? Here are a few things to consider when comparing the two architectures:
 
Port Density: how many server ports do you need? Both Junos Fusion and Cisco FEX architectures support up to 64 access switches per fabric today. The Nexus 2200 (FEX) has only 48 extended (server) ports, which gives you a total of 3072 (64 x 48) ports per fabric, while the QFX 5100-96S has 96 server ports, which gives you a maximum of 6144 (64 x 96) ports per fabric. So the Junos Fusion architecture clearly scales better when it comes to port density.
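The port density arithmetic is easy to rerun for other switch models. A quick sketch using only the figures quoted above (the names MAX_ACCESS_SWITCHES, SERVER_PORTS and fabric_ports are made up for this example):

```python
# Both architectures support up to 64 access switches per fabric (figure from the post)
MAX_ACCESS_SWITCHES = 64

# Server-facing ports per access switch, as quoted in the post
SERVER_PORTS = {
    "Nexus 2200 (FEX)": 48,
    "QFX 5100-96S": 96,
}


def fabric_ports(ports_per_switch, switches=MAX_ACCESS_SWITCHES):
    """Maximum server ports in a fabric built from identical access switches."""
    return ports_per_switch * switches


for model, ports in SERVER_PORTS.items():
    print("{}: {} server ports per fabric".format(model, fabric_ports(ports)))
# Nexus 2200 (FEX): 3072 server ports per fabric
# QFX 5100-96S: 6144 server ports per fabric
```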
 
Support For Local Switching & Other Features In the Access Layer: The Cisco Nexus 2200 has no brain and therefore has no support for local switching, VLAN tagging, or any other features you typically see in an access switch. It’s an “extender” and does not have ASICs to switch traffic. The QFX 5100/EX 4300, on the other hand, are full-blown switches with ASICs and intelligent software, and they support all the features mentioned above and more. L3 routing is not supported today on the QFX 5100/EX 4300 in Fusion mode; however, Juniper stated that this feature is on the roadmap.
 
The need for local switching is a good debate to have. Some people argue that the Nexus 2200 is not a good fit for the data center because it cannot do local switching; however, this is not a fair assessment in my opinion. Traffic patterns in the data center depend heavily on the type of workload. Some workloads, like Hadoop, generate heavy east-west traffic within the same VLAN, and in such cases it’s recommended to keep all the server nodes on the same TOR to switch traffic locally and avoid congesting the uplinks. However, many other workloads (namely Web applications) don’t generate heavy east-west traffic within the same VLAN.
 
The other thing to keep in mind is that with server virtualization the edge of the network is moving to the hypervisor, and much of that intra-VLAN traffic is getting switched in kernel by the hypervisor without leaving the physical host, which makes local switching in the access layer unnecessary. Even inter-VLAN traffic can now get routed without leaving the physical host if you have a virtual distributed firewall / router.
 
Ivan Pepelnjak has a nice blog post on the need for distributed switching in the Nexus 2000.
 
Cost: This is where the Nexus 2200 really shines. Because it’s an extender and does not have full software/hardware capabilities, it’s very affordable and can reduce your CapEx substantially.
 

My Take: 

Both the Junos Fusion and Cisco FEX architectures simplify managing data center networks. When comparing the two solutions, examine your workload requirements, determine how intelligent your TORs need to be, and from there you can decide which solution best works for you. 

Here is the Junos Fusion presentation from NFD10:

 


Your Turn Now
 
What are your thoughts on this? Have you deployed either solution? I want to hear from you.

Disclaimer: I attended Networking Field Day 10 as a delegate. Vendors sponsoring the event indirectly covered my travel expenses; however, I’m not required to write about their products or about the event. If I do write something, it’s because I want to express my opinions.



My Favorite Sessions From Cisco Live 2015

I had the opportunity to attend Cisco Live 2015 last month in San Diego, meet new people, and learn a bunch of new stuff. Most of the sessions I attended were IPv6-related, as I wanted to learn more about the protocol and be able to design IPv6 networks.

Now that the on-demand videos and presentation slides have been published, I wanted to share with you some of the sessions that I thought were really great and worth watching.

Here are some of my favorite sessions from Cisco Live 2015.

IPv6 from Intro to Intermediate: This session by Tim Martin was a great introduction to IPv6 and covered addresses, headers, and link operations. If you are just getting started with IPv6, this would be a good place to start.

Enterprise IPv6 Deployment: Another session by Tim Martin, which covered general design, host configuration, and translation techniques.

IPv6 Routing Protocols Update: This presentation by Wim Verrydt was a comprehensive overview of the IPv6 routing protocols (OSPFv3, BGP, EIGRP) along with some configuration examples. It also discussed the coexistence of IPv4 and IPv6 routing protocols.

Enterprise Multi-Homed Internet Edge Architectures: This presentation by Michael Kowal discussed BGP multi-homed deployment scenarios and covered pros and cons of each design.

Troubleshooting OSPF: This is a presentation by Faraz Shamim which I could not attend live at the conference but watched the video this week when it became available. It is a great session that covers OSPF LSAs and some of the new commands that make troubleshooting easier. If you work in operations and you deal with OSPF, you definitely want to check out this session.

Do you have favorite sessions? I want to know about them. Share them with us below.



New Additions to My Home Lab: HP MicroServer & Synology NAS

I have been preparing for my VMware Certified Professional (VCP) exam. Early this year I decided to invest in an HP ProLiant MicroServer G8 and a Synology DS414slim NAS appliance to expand my home lab.

I’m one of those who learn better by doing rather than reading, and I wanted to rely on practice labs and hands-on experience instead of books and practice tests to pass the exam.

I bought everything from newegg.com. The HP MicroServer came with 8GB of RAM installed. I then upgraded the RAM to 16GB and downloaded the ESXi 5.5 ISO directly from the HP website which comes with all the drivers required to run ESXi on HP ProLiant servers.

I’m running a few VMs on the HP server, including the VMware vCenter Server Virtual Appliance (vCSA), which manages my ESXi servers. The HP server is one of two ESXi servers I have running. The other ESXi server runs nested inside VMware Fusion on my iMac. Because my HP server and iMac desktop have two different CPU architectures, I had to enable VMware Enhanced vMotion Compatibility (EVC) to provide CPU compatibility and support for vMotion and DRS.

I populated the Synology appliance with two SSDs configured in RAID 1. On it, I have a datastore configured that provides NFS storage to my VMs. I also store all of my ISOs and OVA files there.

I’m happy so far with both devices. The HP MicroServer is relatively quiet compared to other devices I have seen. According to my iPhone noise meter, its fan generates about 40 dBA of noise on average (roughly what the fan of a Dell Latitude laptop puts out). The Synology NAS appliance is also pretty quiet. Its fan comes on for a few seconds only when the CPU is doing heavy processing. I keep both devices in my home office, which is where I do most of my work.

One thing I wanted to do was to schedule an automatic shutdown at night to save power. So I searched online for a script to do so, but the problem I ran into was that in vSphere 5.5 the host had to be put into maintenance mode before it could shut off gracefully. As a result, the server would come back up in maintenance mode when it powered on, and I would need to intervene and take it out of maintenance mode every time.

After experimenting with a few ESXi CLI commands, and with some help from the online community, I came up with the following AppleScript (hack). It shuts down the powered-on VMs, puts the host in maintenance mode, and then issues a shutdown command with a delay of 10 seconds. Before the delay timer expires, the script executes another command (the last command below) that takes the host out of maintenance mode. When the delay timer finally expires, the host shuts down gracefully.

do shell script "ssh -i sshkey [email protected] vim-cmd vmsvc/power.shutdown 1"

do shell script "ssh -i sshkey [email protected] esxcli system maintenanceMode set -e true -t 0"

do shell script "ssh -i sshkey [email protected] esxcli system shutdown poweroff -d 10 -r Shell"

do shell script "ssh -i sshkey [email protected] esxcli system maintenanceMode set -e n -t 0"

From there I scheduled an action in my Apple calendar to launch and execute the script every night.
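
For readers who are not on a Mac, the same sequence could be sketched in Python. This is only a rough equivalent under a few assumptions, not the script I run: the function names and the `target` and `key_path` parameters are hypothetical, the SSH user and host are left for you to fill in, and the last command spells out `false` instead of `n`. The `vim-cmd` and `esxcli` commands themselves mirror the AppleScript.

```python
import subprocess


def build_shutdown_commands(vm_ids, delay=10):
    """Return the ESXi commands in the order the AppleScript issues them."""
    # Ask each listed VM (by its numeric ID) to shut down first.
    commands = [f"vim-cmd vmsvc/power.shutdown {vm_id}" for vm_id in vm_ids]
    commands += [
        # Enter maintenance mode so the host can power off gracefully.
        "esxcli system maintenanceMode set -e true -t 0",
        # Schedule the power-off with a delay in seconds.
        f"esxcli system shutdown poweroff -d {delay} -r Shell",
        # Leave maintenance mode before the delay expires, so the host
        # does not come back up in maintenance mode.
        "esxcli system maintenanceMode set -e false -t 0",
    ]
    return commands


def shutdown_host(target, key_path, vm_ids, delay=10, run=subprocess.run):
    """Run each command over SSH; `target` is a hypothetical user@host string."""
    for command in build_shutdown_commands(vm_ids, delay):
        run(["ssh", "-i", key_path, target, command], check=True)
```

Keeping the command list separate from the SSH call makes it easy to print and review the sequence before pointing it at a real host.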

HP MicroServer G8 + Synology NAS

I will be sharing in future posts some of the lessons I have learned during my prep journey so stay tuned for that.

Anas

@anastarsha

 

Additional Information:

Install VMware ESXi 5.5 on HP ProLiant MicroServer G8

HP ProLiant MicroServer G8 Links 



Troubleshooting BGP Adjacency

In this post I will walk you through some steps you can take to troubleshoot BGP neighbor adjacency. These steps become even more helpful when you have access to only one side of the link (for example, when you are trying to run BGP with a service provider). We will focus on some of the reasons that may prevent two BGP routers from forming a relationship, and along the way I will demonstrate how the BGP state machine moves from Idle to Established.

Here I’m using two Cisco CSR1000v routers as my BGP speakers, but the tips below apply in general to any router from any vendor.

troubleshoot BGP adjacency

In the diagram above I have two routers, R1 and R2, with two parallel physical links between them. The two routers want to peer via BGP using their loopback addresses, which is a common way to do load sharing between two routers. However, the BGP adjacency is not coming up and is stuck in the Idle state, as you can see from the output below:

R2#sh ip bgp sum

BGP router identifier 2.2.2.2, local AS number 200

BGP table version is 1, main routing table version 1

Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd

1.1.1.1         4          100       0       0        1    0    0 never    Idle
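
If you have many peers, a short Python sketch like the one below can pull the stuck neighbors out of the summary output. It assumes the ten-column IOS layout shown above, where the last column holds a prefix count for an established peer and a state name otherwise; the function name is made up for illustration.

```python
def neighbors_not_established(summary_output):
    """Return {neighbor: state} for peers whose last column is not a prefix count."""
    down = {}
    in_table = False
    for line in summary_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Neighbor":
            # Header row: neighbor entries start on the following lines.
            in_table = True
            continue
        if in_table and len(fields) >= 10:
            # Column 10 is State/PfxRcd; a state such as "Idle (Admin)"
            # can span more than one token, so join the remainder.
            state = " ".join(fields[9:])
            if not state.isdigit():
                down[fields[0]] = state
    return down
```

Fed the output above, it would report 1.1.1.1 as Idle.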

Follow the steps below to verify that your configurations are complete and that there are no connectivity issues between the two routers:

1- First, verify that R1 is reachable from R2 and vice versa. Issue a ping from R2, sourced from its loopback0 interface, with R1’s loopback0 address (1.1.1.1) as the destination, as shown below:

R2#ping 1.1.1.1 source loopback 0

Type escape sequence to abort.

Sending 5, 100-byte ICMP Echos to 1.1.1.1, timeout is 2 seconds:

Packet sent with a source address of 2.2.2.2

!!!!!

Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/1 ms

If your ping fails then you have a connectivity problem and you need to fix that before continuing this process. You want to make sure that the static routes are there on each router and that each router is able to ARP for the other router’s IP address.
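
If you collect this output from a script, the success rate is easy to extract. The snippet below is a minimal sketch that assumes the IOS "Success rate is ..." summary line shown above; the function name is hypothetical.

```python
import re


def ping_success_rate(ping_output):
    """Return (percent, received, sent) from IOS ping output, or None if absent."""
    match = re.search(r"Success rate is (\d+) percent \((\d+)/(\d+)\)", ping_output)
    if match is None:
        return None
    percent, received, sent = (int(group) for group in match.groups())
    return percent, received, sent
```

Anything below 100 percent here is worth chasing before you look at BGP itself.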

2- You also need to verify that there is no firewall between R1 and R2 blocking TCP port 179, which is the port BGP uses to establish the connection. A quick way to check is to use the telnet command with port 179 as the destination port. Perform this test from both routers, and don’t forget to source the traffic from the loopback interface:

R2#telnet 1.1.1.1 179 /source-interface loopback 0

Trying 1.1.1.1, 179 …

% Connection refused by remote host

As you can see from the output above, I got a “Connection refused by remote host” response when I tried to telnet from R2 to R1. This means that no device in the middle is blocking the traffic: the request reached R1, and R1 itself refused it (port 179 is not a Telnet service, and R1 is not yet accepting a BGP session from R2). If a firewall in the middle were blocking the traffic, you would instead get an “Unable to connect to remote host” response.
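
The same check can be run from a server or laptop that sits on the path. The Python sketch below is one way to do it with the standard `socket` module; the function name and the four result labels are my own, and treating a timeout as a likely silent drop is an assumption rather than a certainty.

```python
import socket


def probe_tcp(host, port=179, source_ip=None, timeout=3.0):
    """Classify a TCP connection attempt as 'open', 'refused', 'timeout' or 'error'."""
    # Optionally bind to a specific local address, like sourcing from a loopback.
    source = (source_ip, 0) if source_ip else None
    try:
        with socket.create_connection((host, port), timeout=timeout,
                                      source_address=source):
            return "open"
    except ConnectionRefusedError:
        # A reset came back, so the packet reached something that answered.
        return "refused"
    except socket.timeout:
        # No answer at all: often a silent drop somewhere on the path.
        return "timeout"
    except OSError:
        # Other failures, for example no route to the host.
        return "error"
```

A "refused" result corresponds to the telnet output above: the path is clear, and the far end is the one saying no.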

3- Now that we have verified that there are no connectivity problems, let’s focus on the BGP configurations. I will turn on “debug ip bgp x.x.x.x” on the router, which shows me that the router is failing to establish a TCP connection with its peer. A good show command to use at this point is “show ip bgp neighbor x.x.x.x”:

R2#show ip bgp nei 1.1.1.1

BGP neighbor is 1.1.1.1,  remote AS 100, external link

  BGP version 4, remote router ID 0.0.0.0

 Address tracking is enabled, the RIB does have a route to 1.1.1.1

  Connections established 0; dropped 0

  Last reset never

  External BGP neighbor not directly connected.

  Transport(tcp) path-mtu-discovery is enabled

  Graceful-Restart is disabled

  No active TCP connection

The output above tells me two important things. First, the RIB does have a route to the peer, which confirms that the router has the static routes needed to reach its peer’s loopback address.

The second important thing in this output is the line “External BGP neighbor not directly connected”. By default, only directly connected eBGP peers are allowed to establish a relationship. To change this default behavior, I have to add the “neighbor x.x.x.x disable-connected-check” command on both routers.

4- Even after disabling the directly-connected check, the BGP relationship is still not coming up, so my next step is to enable “debug ip tcp transactions” to see if that tells me why the TCP connection is failing:

R1#deb ip tcp transactions

TCP special event debugging is on

*Mar 13 05:53:52.522: Reserved port 0 in Transport Port Agent for TCP IP type 0

*Mar 13 05:53:52.522: TCP: connection attempt to port 179

*Mar 13 05:53:52.522: TCP: sending RST, seq 0, ack 4262843216

*Mar 13 05:53:52.522: TCP: sent RST to 192.168.32.20:27730 from 1.1.1.1:179

The last line in the output above is interesting. It shows R1 (1.1.1.1:179) sending a TCP reset to R2 (192.168.32.20), which means that it was R2 that initiated the TCP session. It also reveals that R2 sourced the connection request from its physical interface, which is the default behavior. But since I want the routers to peer using the loopback addresses, and each router expects the connection request to come from its peer’s loopback address, I need to add the “neighbor x.x.x.x update-source Loopback0” command to BGP on both ends.
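
Steps 3 and 4 boil down to two neighbor statements that loopback-to-loopback eBGP peering needs in this setup. As a quick sanity check, a Python sketch like this one could scan a saved running config for them; the function name and the default `Loopback0` source interface are assumptions for illustration.

```python
def missing_loopback_peering_lines(config_text, neighbor, source_interface="Loopback0"):
    """Return the neighbor statements from steps 3 and 4 that the config lacks."""
    required = [
        f"neighbor {neighbor} disable-connected-check",
        f"neighbor {neighbor} update-source {source_interface}",
    ]
    # Compare against stripped lines so IOS indentation does not matter.
    present = {line.strip() for line in config_text.splitlines()}
    return [line for line in required if line not in present]
```

An empty list means both statements are present for that neighbor; it says nothing about whether the rest of the BGP configuration is right.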

5- Now if I look at “debug ip bgp”, I can see that the TCP session is getting established and BGP is transitioning from Idle -> Connect -> OpenSent -> OpenConfirm, as shown below:

*Mar 13 06:49:28.979: BGP: 1.1.1.1 passive open to 2.2.2.2

*Mar 13 06:49:28.979: BGP: Fetched peer 1.1.1.1 from tcb

*Mar 13 06:49:28.979: BGP: 1.1.1.1 passive went from Idle to Connect

*Mar 13 06:49:28.979: BGP: ses global 1.1.1.1 (0x7F028066E270:0) pas Receive OPEN

*Mar 13 06:49:28.979: BGP: ses global 1.1.1.1 (0x7F028066E270:0) pas Send OPEN

*Mar 13 06:49:28.979: BGP: 1.1.1.1 passive went from Connect to OpenSent

*Mar 13 06:49:28.979: BGP: 1.1.1.1 passive went from OpenSent to OpenConfirm

*Mar 13 06:49:28.980: %BGP-3-NOTIFICATION: received from neighbor 1.1.1.1 passive 2/2 (peer in wrong AS) 2 bytes 00C8

*Mar 13 06:49:28.980: BGP: ses global 1.1.1.1 (0x7F028066E270:0) pas Receive NOTIFICATION 2/2 (peer in wrong AS) 2 bytes 00C8

*Mar 13 06:49:28.980: %BGP-5-NBR_RESET: Neighbor 1.1.1.1

*Mar 13 06:49:28.980: BGP: 1.1.1.1 passive went from OpenConfirm to Closing

*Mar 13 06:49:28.980: BGP: 1.1.1.1 passive went from Closing to Idle

When BGP is in the OpenConfirm state, it’s one step away from its final state (Established): it waits to hear a KEEPALIVE from its peer before it moves to Established. As you can see from the output above, after reaching OpenConfirm BGP instead closes the connection and transitions back to Idle because it receives a NOTIFICATION carrying an error (peer in wrong AS).
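
To follow the state machine without reading every debug line, the transitions can be extracted with a few lines of Python. This sketch assumes the "went from X to Y" wording of the IOS debug output above; the function name is hypothetical.

```python
import re

# Matches the "went from <state> to <state>" part of each debug line.
TRANSITION = re.compile(r"went from (\w+) to (\w+)")


def state_transitions(debug_output):
    """Return the (old_state, new_state) pairs in the order they were logged."""
    return [match.groups() for match in TRANSITION.finditer(debug_output)]
```

A healthy session would end with a transition into Established; a sequence that ends back in Idle, like the one above, tells you the session was torn down.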

This is a clear indication that the AS number configured on R1 for its neighbor R2 is wrong, so I will fix that and issue a “clear ip bgp *” command to restart the process.

And now that I have corrected the AS number in the configs on R1, the BGP state machine transitions to Established, as you see below, and the two peers can start exchanging routing updates and keepalives.
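
The NOTIFICATION line itself carries the details. In “2/2”, the first number is the error code (2 is OPEN Message Error in RFC 4271) and the second is the subcode (2 is Bad Peer AS). The two data bytes, 00C8, are hexadecimal for 200, which appears to be the AS number R1 saw in R2’s OPEN and rejected. The Python sketch below decodes that tail; the function name is made up, and the subcode table only covers OPEN Message Error.

```python
import re

# OPEN Message Error subcodes as named in RFC 4271, section 4.5.
OPEN_ERROR_SUBCODES = {
    1: "Unsupported Version Number",
    2: "Bad Peer AS",
    3: "Bad BGP Identifier",
    4: "Unsupported Optional Parameter",
    6: "Unacceptable Hold Time",
}

# Matches the "<code>/<subcode> (<text>) <n> bytes <hex>" tail of the IOS log line.
NOTIFICATION = re.compile(r"(\d+)/(\d+) \(([^)]*)\) (\d+) bytes ([0-9A-Fa-f]+)")


def decode_notification(log_line):
    """Return the code, subcode, RFC name and data value, or None if no match."""
    match = NOTIFICATION.search(log_line)
    if match is None:
        return None
    code, subcode = int(match.group(1)), int(match.group(2))
    meaning = OPEN_ERROR_SUBCODES.get(subcode) if code == 2 else None
    return {
        "code": code,
        "subcode": subcode,
        "meaning": meaning,
        # The data bytes are printed in hex; 00C8 is 200 in decimal.
        "data": int(match.group(5), 16),
    }
```

Comparing the decoded value with the remote-as statements on both routers shows which side has the mismatch.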

R2#sh ip bgp neighbors 1.1.1.1

BGP neighbor is 1.1.1.1,  remote AS 100, external link

  BGP version 4, remote router ID 1.1.1.1

  BGP state = Established, up for 00:00:42

Obviously that’s not everything, and there are other reasons that could prevent a BGP relationship from being established, but I wanted to discuss the most common ones that I have seen in the field. Do you have something to share? Please respond and share below.

Here are the final configs for R1 and R2 for your reference.

R1#sh run

interface Loopback0

 ip address 1.1.1.1 255.255.255.255

!

interface GigabitEthernet4

 ip address 192.168.32.10 255.255.255.0

 negotiation auto

!

interface GigabitEthernet5

 ip address 192.168.116.10 255.255.255.0

 negotiation auto

!

router bgp 100

 bgp log-neighbor-changes

 neighbor 2.2.2.2 remote-as 200

 neighbor 2.2.2.2 disable-connected-check

 neighbor 2.2.2.2 update-source Loopback0

!

no ip http secure-server

ip route 0.0.0.0 0.0.0.0 192.168.32.1

ip route 2.2.2.2 255.255.255.255 192.168.32.20

ip route 2.2.2.2 255.255.255.255 192.168.116.20

!

R2#sh run

Building configuration…

!

hostname R2

!

!

interface Loopback0

 ip address 2.2.2.2 255.255.255.255

!

interface GigabitEthernet4

 ip address 192.168.32.20 255.255.255.0

 negotiation auto

!

interface GigabitEthernet5

 ip address 192.168.116.20 255.255.255.0

 negotiation auto

!

router bgp 200

 bgp log-neighbor-changes

 neighbor 1.1.1.1 remote-as 100

 neighbor 1.1.1.1 disable-connected-check

 neighbor 1.1.1.1 update-source Loopback0

!

ip route 0.0.0.0 0.0.0.0 192.168.32.1

ip route 1.1.1.1 255.255.255.255 192.168.32.10

ip route 1.1.1.1 255.255.255.255 192.168.116.10



Great Python Training For Beginners

Python is every network engineer’s favorite programming language. It’s simple, powerful, and open source. I started learning Python two months ago, as I will be getting into network automation next year. I figured I would share the training I have been using with those of you who are interested in learning Python.

Below are some of the training resources I have used personally. There is also a lot of other free training available online you can search for if you want to learn Python. All the trainings below except for the Rice University class are self-paced. 

  • Up and Running With Python: This online video tutorial is offered by lynda.com. It is the first Python training I have used and covers advanced topics like working with files, dates & times, and parsing & processing HTML. This class is not free; lynda.com requires a subscription to use their site, and it’s usually about $25 a month to get started. You can use either the online Python interpreter or Aptana Studio for this class.
  • Google’s Python Class: This is a free and popular Python class. It combines articles and video lectures. It’s probably the first training people use when learning Python. I’m currently looking into this training. You can use the built-in Python interpreter for Mac or download the free Python interpreter for Windows for this class.
  • Python for Network Engineers: This free ten-week class is offered via email by Kirk Byers and it’s an introduction to Python. I have not taken this training personally but have heard good things about it. Check the website to find out when the next class is available. Kirk also runs a blog which focuses on network automation.
  • An Introduction to Interactive Programming in Python: This free online course is offered by Rice University through Coursera. I took this class recently and I can tell you it’s a lot of fun. Be prepared to spend about 2 hours a day studying and writing code if you plan to take this course. You will be required to write a few games in this class, including the Memory, Pong, and Blackjack games. I did not get a chance to complete all the games but certainly learned a lot in this class. If you are in it for the challenge and have the time, this class is for you.

My implementation of the Memory game in Python

 

 

