IT (Mis)Adventures in Namibia: vSphere


Tuesday, April 23, 2013

Cisco Port Configuration Best Practices

I run into inconsistent network configurations wherever I go, from customers who just let everything live in the native VLAN to ones who horrendously over-complicate things. For the majority of Cisco deployments I have a simple set of configuration standards I adhere to. You will always get corner cases requiring special configurations, but I find that the below works very well for most use cases.

Standard Access Port Configuration

This configuration is applied to ports connecting to standard end-user equipment, like PCs, printers, etc.:
switchport mode access
switchport access vlan 5
spanning-tree portfast
spanning-tree bpduguard enable

The first two lines are self-explanatory: it’s an access port living in VLAN 5. PortFast is something we configure on an access port; it tells STP to put the port straight into a forwarding state, as opposed to taking time to go through the listening and learning states.
BPDUs are basically STP messages exchanged between switches, therefore a BPDU is not something we expect to see on a normal access port. BPDUguard tells the switch that if it receives a BPDU (for example because someone connected an unauthorized switch), it should shut the port down. BPDUFilter, which is not part of the configuration above, instead tells the switch not to send or process BPDUs on ports configured as PortFast.

ESXi Host Port Configuration

ESXi uses internal vSwitches and PortGroups, which allow VMs running on one host to reside in different VLANs:
switchport mode trunk
switchport trunk allowed vlan 5-10,200
spanning-tree port type edge trunk
spanning-tree bpduguard enable
Fairly straightforward: we configure our link as a trunk which carries VLANs 5 to 10 and 200. We then tell the switch that even though it’s a trunk, we’re not connecting to another switch, so there is no need to worry about STP on the port (the command shown is the NX-OS form; on IOS the equivalent is spanning-tree portfast trunk). Lastly, bpduguard protects us against incorrect cabling: if the port receives a BPDU (which will happen if you inadvertently hook it up to another switch) it will shut down.

Standard Trunk Port Configuration

In the Cisco world a trunked link is a link that carries multiple VLANs. Not to be confused with link aggregation, which in Cisco parlance is called a Port-Channel.
switchport trunk encapsulation dot1q
switchport mode trunk
switchport trunk native vlan 10
switchport trunk allowed vlan 10-15
Both sides of the trunk need to have the same native VLAN. By default the native VLAN is VLAN 1, but in all but the simplest deployments you will have to change this. We can also do VLAN pruning on our trunks, that is, only carry certain VLANs across our trunk.

Port-Channel Configuration

It is possible to aggregate multiple links between two switches and treat them as a single link. This gives us link redundancy and more bandwidth. As a rule we do not use LACP, which keeps the configuration compatible with, for example, vSphere vSwitches.
interface Port-channel10
switchport trunk encapsulation dot1q
switchport mode trunk
Once again simplicity is the name of the game. We create the interface, set the encapsulation (strictly speaking not necessary on switches that already default to dot1q) and set it as a trunk. Of course it does not need to be a trunk link; this is optional.

Sunday, February 24, 2013

Issues and Workarounds – vSphere Site Recovery Manager with the NetApp Storage Replication Adapter 2.01

I have just completed a project where I had to install and configure VMware vSphere Site Recovery Manager. Storage was provided by NetApp FAS and V-Series filers, thus I had to use the NetApp-provided Storage Replication Adapter. At the time of writing the latest version was 2.01. True to form I ran into a couple of bugs, which took a bit of figuring out.

Unable to add a controller: “Error: SRA command ‘discoverArrays’ failed”

Execute the following commands on your filers

options httpd.admin.enable on
options httpd.enable on
options httpd.admin.ssl.enable off

Error when adding an Array Pair: “Internal error: std::exception 'class Dr::Xml::XmlValidateException' "Element 'SourceDevices' is not valid for content model: '(SourceDevice,)”

There are two solutions to this issue

Downgrade back to NetApp SRA version 2.0.0
Manually include the lists of volumes you want discovered by the SRA. You’ll need to do this on both controllers in the pair.

This is a documented bug

Reprotect Job fails after recovering to Disaster Recovery Site

The SRM / SRA timeouts seem a bit aggressive to me. This is highlighted when you do a reprotect on a failed-over Protection Group. Part of the task sequence is to reverse the direction of replication, but this fails consistently because SRM does not wait long enough for the reversal to take place.

You can kludge it by:

Re-running the reprotect until it works
Manually refreshing the Array Manager while the reprotect job is running

Recovered DataStores have snap-xxx prefixes

More of a cosmetic irritant than a true bug, I wanted this fixed nonetheless.

Within SRM, right-click your site and select Advanced Settings
Click StorageProvider
Select the storageProvider.fixRecoveredDatastoreNames check box

Tip

I would suggest increasing your SRM SAN provider timeout settings to something a bit more sane, like double. Instructions can be found here.

Also make sure that the ALUA settings on your iGroups in both the protected and recovery sites are the same.

Wednesday, October 10, 2012

Directly Connecting a Brocade 815 HBA to an EMC VNX5300

I’m busy with a project which involves getting two ESXi hosts hooked up to a VNX5300 configured in block mode. The order we placed with Dell specified Emulex 12000 HBAs, but Dell got creative and shipped Brocade 815s instead. Only problem was that they didn’t work when directly connected to the front-end ports on the VNX. I’m documenting the symptoms here as well, so that the next person does not have to battle for two days.

The Symptoms

When directly connecting the HBAs to the VNX fiber ports the following events pop up in the SP event logs:

EV_VirtualArrayFeature::_mergeInternalObjects() - No parent for HBA,
EV_TargetMapEntry::GetHostInitiatorPort() - NULL HBAPort pointer

Running NaviSECCli.exe -Address 172.20.10.27 port -list -sfpstate outputs the following:

SP Name:             SP A
SP Port ID:          1
SP UID:              50:06:01:60:BE:A0:72:F9:50:06:01:61:3E:A0:72:F9
Link Status:         Up
Port Status:         Online
Switch Present:      NO
SFP State:           Online

This tells us that things are fine on a physical layer, but not much else is happening higher up the stack.

The Fix

First we need to upgrade the HBA firmware to version 3.1. There are various OS-specific ways to do it; the easiest is probably to download the LiveCD from Brocade. Since this HBA is not on the ESXi 5.1 HCL we also need to install the driver, and it needs to be at least v3.1. I include the steps for the sake of completeness:

Enable SSH on your ESXi host
Copy the driver to the host using an scp client for Windows, or the following command from a Linux / Mac host: scp brocade_driver_esx50_v3-1-0-0.tar root@<ip address>:/tmp
SSH into your ESXi host and navigate to the /tmp folder with cd /tmp
Execute tar xf brocade_driver_esx50_v3-1-0-0.tar
Execute ./brocade_install_esxi.sh
Wait for the installation to finish (takes about 1 – 2 mins) and reboot host once done

Now we need to configure the HBA for direct connection, or more technically, FC-AL mode

SSH into your ESXi host and navigate to /opt/brocade/bin/ by entering cd /opt/brocade/bin/
./bcu port --topology 1/0 loop
./bcu port --disable 1/0
./bcu port --enable 1/0
./bcu port --topology 2/0 loop
./bcu port --disable 2/0
./bcu port --enable 2/0
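
If you have more than a couple of ports, the same three commands can be generated in a loop. This is only a sketch: it prints the commands instead of running them so you can eyeball them first, and the function name and the port IDs passed in are my own placeholders (1/0 and 2/0 are simply the two ports on my HBA).

```shell
#!/bin/sh
# Sketch only: prints the bcu commands per HBA port rather than executing them.
# loop_topology_cmds is an illustrative helper, not part of the Brocade tools.
BCU="./bcu"   # assumes you are in /opt/brocade/bin/

loop_topology_cmds() {
    for port in "$@"; do
        echo "$BCU port --topology $port loop"   # set the port to loop (FC-AL) mode
        echo "$BCU port --disable $port"         # bounce the port, as in the steps above
        echo "$BCU port --enable $port"
    done
}

loop_topology_cmds 1/0 2/0
```

Once the output looks right, pipe it through sh on the ESXi host (for example by saving the snippet as a script and running it with | sh).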

Your ESXi host should now show up as a host on the VNX where you can add it to a storage group and assign LUNs.

Sunday, July 29, 2012

Setting up vSphere Active / Active iSCSI connections to a NetApp FAS2040

I recently had the opportunity to architect a solution consisting of 3 vSphere 5 boxes connecting to a NetApp FAS2040. Storage connectivity would be via iSCSI. The storage network would be running off of 2 Cisco 2960G switches, soon to be replaced by stacked Cisco 3750s.

The requirements were stock standard: as high a throughput as possible, with as much redundancy as possible. This meant going active/active on the iSCSI links. Here is how I did it.

NetApp FAS2040 Configuration

This little SAN has eight 1 Gb Ethernet ports. Because the Cisco 2960G switches do not support link aggregation across multiple switches (this is where the 3750s will come in) I had to come up with a simpler design, what NetApp terms a Single-Mode design. My design allows for:

Two active connections to each controller, thus a total of four active sessions
Storage path HA
Load balancing across links
Uses vSphere storage MPIO as opposed to switch-side configuration

Virtual Interface (VIF) Configuration:

All VIFs are single-mode (active/passive)
Cont1_Vif01 - e0a/e0b (e0a will be active, connected to switch 1 / e0b passive connected to switch 2) IP – 192.168.1.1
Cont1_Vif02 - e0c/e0d (e0c will be active, connected to switch 2 / e0d passive connected to switch 1) IP – 192.168.2.1
Cont2_Vif01 - e0a/e0b (e0a will be passive, connected to switch 1 / e0b active connected to switch 2) IP – 192.168.1.2
Cont2_Vif02 - e0c/e0d (e0c will be passive, connected to switch 2 / e0d active connected to switch 1) IP – 192.168.2.2

This image, courtesy of NetApp, explains it infinitely better than my wall of text:-)

I also configured partner takeover for all VIFs. In case of a controller failure this allows the remaining controller to take over the failed controller's VIFs.

Ethernet Storage Network Configuration

On the storage network I had to configure 2 critical settings:

Spanning Tree Portfast
Jumbo Frames

When connecting ESX and NetApp storage arrays to Ethernet storage networks, NetApp highly recommends configuring the Ethernet ports to which these systems connect as RSTP edge ports. This is done like so:

Switch2960(config)# interface gigabitethernet2/0/2
Switch2960(config-if)# spanning-tree portfast

Next up, Jumbo Frames:

Switch2960(config)# system mtu jumbo 9000
Switch2960(config)# exit
Switch2960# reload

vSphere Configuration

I am in love with vSphere 5, and one of the biggest reasons for that is the fact that a lot of the configuration parameters that used to be command-line only have been moved into the GUI. Another reason is Multiple TCP Session Support for iSCSI. This feature enables round robin load balancing using VMware native multipathing and requires a VMkernel port to be defined for each physical adapter port assigned to iSCSI traffic. That said, let’s get configuring:

Open your vCenter Server
Select an ESXi host
In the right pane, click the Configuration tab
In the Hardware box, select Networking
In the upper-right corner, click Add Networking to open the Add Network wizard
Select the VMkernel radio button and click Next
Configure the VMkernel by providing the required network information. NetApp requires separate subnets for active/active iSCSI connections, therefore we will create two VMkernels, on the 192.168.1.x and 192.168.2.x subnets respectively.
Configure each VMkernel to use a single active adapter that is not used by any other iSCSI VMkernel. Also, each VMkernel must not have any standby adapters. If using a single vSwitch, it is necessary to override the switch failover order for each VMkernel port used for iSCSI. There must be only one active vmnic, and all others should be set to unused.
The VMkernels created in the previous steps must be bound to the software iSCSI storage adapter. In the Hardware box for the selected ESXi server, select Storage Adapters.
Right-click the iSCSI Software Adapter and select Properties. The iSCSI Initiator Properties dialog box appears
Click the Network Configuration tab
In the top window, the VMkernel ports that are currently bound to the iSCSI software interface are listed
To bind a new VMkernel port, click the Add button. A list of eligible VMkernel ports is displayed. If no eligible ports are displayed, make sure that the VMkernel ports have a 1:1 mapping to active vmnics as described earlier
Select the desired VMkernel port and click OK.
Click Close to close the dialog box
At this point, the vSphere Client will recommend rescanning the iSCSI adapters. After doing this, go back into the Network Configuration tab to verify that the new VMkernel ports are shown as active, as per the image below.
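
For those who prefer the command line, the binding part of the steps above can also be done with esxcli on ESXi 5.x. Treat this as a sketch rather than gospel: it only prints the commands, and vmhba33, vmk1 and vmk2 are assumed names (check what your software iSCSI adapter and VMkernel ports are actually called). The function name is mine.

```shell
#!/bin/sh
# Sketch only: prints the ESXi 5.x port-binding commands instead of running them.
# vmhba33, vmk1 and vmk2 are assumptions; substitute your own adapter and vmk names.
bind_iscsi_cmds() {
    adapter="$1"; shift
    for vmk in "$@"; do
        # Bind one VMkernel port to the software iSCSI adapter
        echo "esxcli iscsi networkportal add --adapter $adapter --nic $vmk"
    done
    # List the bound ports, then rescan the adapter
    echo "esxcli iscsi networkportal list --adapter $adapter"
    echo "esxcli storage core adapter rescan --adapter $adapter"
}

bind_iscsi_cmds vmhba33 vmk1 vmk2
```

The 1:1 VMkernel-to-vmnic mapping from the earlier steps still has to be in place first, otherwise the add will be refused just like in the GUI.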

Congratulations, you now have active / active, redundant iSCSI sessions into your NetApp SAN!

Wednesday, April 11, 2012

Issues when upgrading to Veeam Backup & Replication v6

Let me start off by saying that Veeam Backup & Replication (VBR) is one of the most awesome pieces of software I know of! As with any piece of software there might be the occasional bug or issue that needs to be worked around. I’ve upgraded a couple of clients to VBR v6 already, and I’ve consistently run into two little niggles. These are:

Backup of vCenter SQL DB fails when using VBR v6

This is actually detailed on the Veeam forums here.

What it boils down to is that when using VBR for an application-aware backup of the VM which hosts the vCenter SQL DB, the backup will fail with a “VSSControl: Failed to freeze guest, wait timeout” error. This occurs because VBR has to communicate with vCenter to create a snapshot, but at the same time the vCenter SQL DB is frozen because of the VSS snapshot.

To solve this problem, add the IP-address of the ESX host which runs the vCenter Server SQL database to the Veeam console (the servers list in the left panel). Then adjust the job and select the virtual machine with the SQL database from the ESX host instead of via the vCenter Server. Veeam B&R will then use the host for communication with the VM to do the VSS snapshot.

This will, of course, cause the job to fail should the vCenter VM move to another host. You can work around this by using DRS Host Affinity rules to tie your vCenter VM to a physical box.

Replication job fails – cannot connect to port 2500

When VBR does a replication job, it might fail with “Creating snapshot Cannot connect to server [x.x.x.x:2500]”

In my case it was always happening when the destination server was running ESX (not ESXi) v4.x. This is easily resolved by logging onto the target server via SSH and issuing the following commands:

esxcfg-firewall -openport 2500:2510,tcp,in,VeeamSCP
service mgmt-vmware restart

Easy enough!

Wednesday, February 23, 2011

Partial/No Redundancy on iSCSI Datastores

Expensive fiber SANs are not price-compatible with a lot of my clients, therefore a lot of my time is spent in iSCSI environments. I’ve noticed in all instances that the Multipathing Status for all my iSCSI datastores is Partial/No Redundancy when viewed on the Storage Views tab in vCenter. This bothers me because I always go to great lengths to ensure that I set up my iSCSI multipathing correctly.

I therefore breathed a big sigh of relief when I discovered that this behaviour is a bug as confirmed by VMware Technical Support. The rule for displaying the “Multipathing Status” is as follows:

Full Redundancy – If you have 2 separate adapters and 2 separate paths to the datastore
Partial/No Redundancy – If there is one path which is Up
Unknown – If there is at least one path with an “Unknown” status
All Paths Down – No way to reach the datastore

You will always only have one adapter when using a software iSCSI initiator. This implies a single point of failure, which gives us the dreaded “Partial/No Redundancy” status. So as things stand now, software iSCSI will always be displayed with a degraded status. Methinks VMware should develop separate rules / algorithms for fiber and iSCSI SANs…
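
To make the consequence concrete, here is the display rule written out as a tiny shell function. It is purely my own illustration: the function name, its three arguments and the order in which the rules are checked are assumptions, not something VMware has published.

```shell
#!/bin/sh
# Illustration only: mimics the "Multipathing Status" rule listed above.
# Arguments (all hypothetical): number of adapters, paths that are Up,
# paths with an Unknown status. The check order is an assumption.
mp_status() {
    adapters="$1"; paths_up="$2"; paths_unknown="$3"
    if [ "$paths_unknown" -gt 0 ]; then
        echo "Unknown"
    elif [ "$paths_up" -eq 0 ]; then
        echo "All Paths Down"
    elif [ "$adapters" -ge 2 ] && [ "$paths_up" -ge 2 ]; then
        echo "Full Redundancy"
    else
        echo "Partial/No Redundancy"
    fi
}

mp_status 2 2 0   # two HBAs, two paths
mp_status 1 4 0   # software iSCSI: one adapter, four healthy paths
```

No matter how many healthy paths you feed it, a single adapter can never reach the Full Redundancy branch, which is exactly what the Storage Views tab shows for software iSCSI.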

Monday, August 30, 2010

ESX Hosts Disconnecting After Upgrade to vSphere 4.1

When you upgrade to vSphere 4.1 your hosts might start disconnecting from your vCenter Server with the following error message: “A general system error occurred: internal error: vmodl.fault.HostCommunication”. Restarting the management agents does not resolve the error, nor does rebooting the host. This VMware KB points to name resolution issues, but that is not at fault here. The issue is that a vCenter Server which has not itself been upgraded to 4.1 cannot manage an ESX 4.1 host.

Workaround / Solution

Currently there are two solutions available:

Upgrade your vCenter Server to version 4.1. (Once you've upgraded you'll have to remove the hosts from your inventory and re-add them - simply reconnecting didn't work in my case)
Downgrade your ESX hosts to version 4.0

Strangely enough I could not find this documented anywhere on the VMware Knowledge Base, even though it seems to be a pretty widely reported problem.

Upgrading to vSphere 4.1 via a SSH CLI Session

I was tasked with upgrading a client's vSphere installation from vSphere 4.0 to 4.1. Due to various external factors the client couldn't make use of the vSphere Update Manager, so I had to do it old-school style from the command line. Here's how to do it:

Download the required updates

Navigate to http://downloads.vmware.com/d/info/datacenter_downloads/vmware_vsphere_4/4
Download pre-upgrade-from-ESX4.0-to-4.1.0-0.0.260247-release.zip
Download upgrade-from-ESX4.0-to-4.1.0-0.0.260247-release.zip

Install the updates

Put your ESX host in maintenance mode with the following command: vimsh -n -e /hostsvc/maintenance_mode_enter
Install the pre-upgrade patch: esxupdate update --bundle=pre-upgrade-from-ESX4.0-to-4.1.0-0.0.260247-release.zip
Install the actual upgrade patch: esxupdate update --bundle=upgrade-from-ESX4.0-to-4.1.0-0.0.260247-release.zip
Reboot the ESX host via the reboot command
Last step is to exit maintenance mode: vimsh -n -e /hostsvc/maintenance_mode_exit
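
If you have a handful of hosts to do, it helps to have the sequence in one place. The snippet below is a sketch that just prints the commands from the steps above in order; BUNDLE_DIR (where I assume you copied the two zip files) and the function name are placeholders of mine.

```shell
#!/bin/sh
# Sketch only: prints the upgrade commands in order rather than running them.
# BUNDLE_DIR is an assumed location for the two downloaded zip files.
BUNDLE_DIR="${BUNDLE_DIR:-/tmp}"
PRE="pre-upgrade-from-ESX4.0-to-4.1.0-0.0.260247-release.zip"
UPG="upgrade-from-ESX4.0-to-4.1.0-0.0.260247-release.zip"

upgrade_cmds() {
    dir="$1"
    echo "vimsh -n -e /hostsvc/maintenance_mode_enter"
    echo "esxupdate update --bundle=$dir/$PRE"
    echo "esxupdate update --bundle=$dir/$UPG"
    echo "reboot"
    # Only run this last one once the host is back up after the reboot
    echo "vimsh -n -e /hostsvc/maintenance_mode_exit"
}

upgrade_cmds "$BUNDLE_DIR"
```

Because of the reboot in the middle this cannot be piped blindly into a shell in one go; run everything up to the reboot, wait for the host, then exit maintenance mode.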

All of the above can of course be automated using the Update Manager, but for those occasions where it's not possible to use it the above will come in handy.

Thursday, August 19, 2010

Using Veeam backup to relocate VMs

Nope, I didn't fudge up the title. Veeam Backup and Replication is a wonderful product, allowing you to replicate vSphere VMs to an offsite Disaster Recovery location. When disaster strikes, it's a pretty straightforward process to fail over to your DR site. It's what I call a forehead procedure - you only have to hit the spacebar with your forehead. Thanks - I'll be here all night!

What's not so intuitive and well documented is using Veeam Backup to move VMs to a different location, for example a Server Room / Data Center relocation, and then committing those changes, i.e. not failing back to Production. The below steps assume we've replicated and failed over our VMs to our DR location already.

Ruan's Step By Step Guide on using Veeam Backup to relocate VMs

Delete all Veeam VM snapshots using the vSphere Snapshot Manager
By default your DR replica will be named "VMname_replica", rename it back to its original name, i.e. VMname
Remove the VM replica from the list of replicas in the Veeam Backup and Replication Console
Delete the Production -> DR replication job responsible for replicating the VM in question. Recreate it to reflect the new Source and Target locations
Delete the .vrb file from the VM datastore, as we will no longer be using these restore points
Delete the replica.vrb and running.rbk files
Pat yourself on the back - you've just done the easiest VM move you'll ever do!

Monday, March 29, 2010

Enabling Jumbo Frames in vSphere

Hi Kids, today's post is brought to you by the letter J. I previously mentioned a vSphere deployment I had to do, connecting to an EMC AX4-5i iSCSI SAN. Once I got the storage hooked up to my ESX hosts I of course wanted to enable jumbo frames.

A jumbo frame, for the uninitiated heathens out there, is basically an Ethernet frame with a payload of more than 1500 bytes, up to a typical maximum of 9000 bytes per frame. Why would I want to do this? Performance, in a nutshell. The only requirement would be that your switch supports jumbo frames, which I believe most, if not all, mid to high-end kit does. So let's get down to business, shall we?

Switch Configuration

I'm fairly proficient with Cisco and HP networking gear, so I'll give a quick rundown of the commands needed to enable jumbo frames on their kit:

Cisco: Go into conf t mode and enter the following command "system mtu jumbo 9000". Once that is done issue a "reload" command to reboot your switch
HP Procurve: On these babies Jumbo frames need to be enabled per VLAN. Execute the following command: "vlan # jumbo"

That takes care of the networking side of things, let's move on to the ESX...

Configure Jumbo frames in vSphere

Log on to your ESX host using your favorite SSH client
Change your chosen vSwitch MTU with the following command "esxcfg-vswitch -m 9000 vSwitch#" Replace vSwitch# with the name of the vSwitch you want to modify
Seeing as one cannot change an existing VMKernel port MTU, you will either have to remove and recreate your existing VMKernel port, or create a new one. To delete an existing VMKernel port, enter the following command: "esxcfg-vmknic -d -p VMKernelport"
Now let's add a VMKernel port called "iSCSI01" to our vSwitch, like so: "esxcfg-vswitch -A iSCSI01 vSwitch#"
Now we enable jumbo frames on our VMKernel port: "esxcfg-vmknic -a -i 192.168.x.x -n 255.255.255.0 -m 9000 iSCSI01". You do know you need to change the IP address and mask to reflect your environment, right?
Let's make sure we didn't screw up somewhere by running "esxcfg-vmknic -l". Verify that the MTU is set to 9000
All done! You can refer to my previous iSCSI Multipathing post to ensure Jumbo frame enabled iSCSI Multipathing goodness!
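
For the copy-and-paste crowd, here are the ESX-side commands from the steps above strung together. It's a sketch that prints the commands rather than running them; the vSwitch name, port group name, IP address and netmask passed in at the bottom are placeholders, and the function name is mine.

```shell
#!/bin/sh
# Sketch only: prints the ESX 4 jumbo frame commands from the steps above.
# vSwitch1, iSCSI01, the IP address and the netmask are placeholders.
jumbo_cmds() {
    vswitch="$1"; portgroup="$2"; ip="$3"; mask="$4"
    echo "esxcfg-vswitch -m 9000 $vswitch"                          # raise the vSwitch MTU
    echo "esxcfg-vswitch -A $portgroup $vswitch"                    # add the port group
    echo "esxcfg-vmknic -a -i $ip -n $mask -m 9000 $portgroup"      # VMkernel port with MTU 9000
    echo "esxcfg-vmknic -l"                                         # verify the MTU column
}

jumbo_cmds vSwitch1 iSCSI01 192.168.10.11 255.255.255.0
```

If you are recreating an existing VMkernel port, remember to delete the old one first as described in the steps above.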

I have thought of merging this post with the iSCSI Multipathing post, but seeing as not all environments can/will support end-to-end jumbo frames, I have decided to keep them separate for the time being.

Thursday, March 25, 2010

ESX 4 iSCSI Multipathing

There I was, setting up a small DR site for a client, hardware was a blinged up ProLiant DL380, connecting to a sweet, sweet little EMC AX4-5i SAN. This being my first non-FC SAN, I obviously investigated the multipathing options. I was kind of (no, really!) taken aback when I discovered that it's not so straightforward to set up. What's a boy to do? I took a deep breath, bowed in the direction of the (not-so-sweet-anymore) AX4-5i SAN and unleashed the fury of my Google-Fu! Hee-Ya! About 3 hours later the fiendish ESX 4 and an AX4-5i had to yield to the fury of my touch typing - below I chronicle it for all eternity.

Setting up Multipathing
The DL380s I worked on here had an additional 4-port NIC installed, for a total of eight 1 Gb ports. The 4 on-board ports I dedicated to the Service Console, VMkernel et al, standard stuff. The 4 ports on the add-on NIC I used for iSCSI, perfect considering the AX4-5i also has 4 iSCSI ports. What I did then was set up a new vSwitch, add my 4 NICs to the vSwitch and then create 4 separate VMkernel ports, so that we have a one-to-one mapping between the VMkernel ports and the NICs. Here's how to do that:

Create a new vSwitch that we're going to dedicate to iSCSI
Connect your NICs to this vSwitch by going vSwitch - Properties. Click the Network Adapters tab and click Add. Select your NIC's and click Next and Finish.
Now we create VMkernel ports for all our newly added adapters, like so: Go vSwitch - Properties and click the Ports tab and click Add. Select VMkernel, give it a nice label and a suitable IP address and click Finish. Create a VMkernel port for every adapter assigned to your iSCSI vSwitch.
Now the important bit, we're going to do the one-to-one VMkernel to NIC mapping: On the Ports tab, select one of your new VMkernel ports and click Edit.
Select the Nic Teaming Tab and check the Override vSwitch failover order
Make sure only one adapter is active, move all the other NIC's down to the Unused Adapters section
Rinse and repeat for every VMkernel port on your vSwitch

Alas, we're not done yet - now I'm going to introduce you to my little friend esxcli. This fun little guy will enable us to connect our newly created VMkernel ports to the ESX iSCSI initiator

Enable Software iSCSI Multipathing

From the CLI (console, ssh, pick your poison) run the following command esxcli swiscsi nic add -n vmk# -d vmhba33
Rinse and repeat for all your ports, verify your work by executing the following command: esxcli swiscsi nic list -d vmhba33
You can then rescan your iSCSI initiator via the vSphere GUI. You can also verify multipathing via the paths view for the vmhba33 adapter.
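
With four VMkernel ports the rinse-and-repeat gets old quickly, so here is a sketch that generates the commands for you. It only prints them; the vmk numbers below are assumptions (run esxcfg-vmknic -l to see what yours are called), as is the function name.

```shell
#!/bin/sh
# Sketch only: prints the ESX 4 binding commands rather than running them.
# vmk1..vmk4 are assumed names for the four iSCSI VMkernel ports.
bind_swiscsi_cmds() {
    adapter="$1"; shift
    for vmk in "$@"; do
        echo "esxcli swiscsi nic add -n $vmk -d $adapter"   # bind one VMkernel port
    done
    echo "esxcli swiscsi nic list -d $adapter"              # verify the bindings
}

bind_swiscsi_cmds vmhba33 vmk1 vmk2 vmk3 vmk4
```

Check the output, then run the lines on the host before doing the rescan.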

Happy Multipathing! It should be mentioned that iSCSI and Jumbo frames go hand-in-hand. I will do a post detailing that can of worms in the near future.

About This Blog

This blog serves 2 purposes. Firstly, I want to share information with other IT pros about the technologies we work with and how to solve problems we often face. I work with technologies from the desktop to the data center, Active Directory, System Center, Exchange, Hyper-V, VMware, Networking and Storage.

Less altruistically, I use my blog as a reference. There's so much to learn and remember in our field that it's impossible to keep up. By blogging, I have a notebook that I can access from anywhere. It has made me look much smarter than I probably am on many occasions.