
Tag: vSphere

PowerCLI – Rescan HBAs for a Cluster on a Host by Host Basis

After having some bad luck with rescanning HBAs for entire clusters or datacenters all at the same time (Cliff's Notes: the LUNs ended up looking like array-based snapshots and were therefore unusable), it was decided that any rescans should be done on an individual host basis. Below is the script I created to achieve this goal.

When you run the script, it will ask for the Cluster name so you don’t have to modify the code.


#Rescan HBAs for a cluster on a host by host basis

$InputCluster = Read-Host "Cluster Name"

$vmhosts = Get-Cluster $InputCluster | Get-VMHost

foreach ($i in $vmhosts) {
    Write-Host "Starting rescan of all HBAs on $i"
    $i | Get-VMHostStorage -RescanAllHba | Out-Null
    Write-Host "Rescan of all HBAs on $i is complete"
    Write-Host ""
}
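For a quick one-off rescan of a single host, the same cmdlet can be run directly; the host name here is hypothetical:

#Rescan all HBAs on a single host
Get-VMHost "esx01.lab.local" | Get-VMHostStorage -RescanAllHba | Out-Null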


Note: this was a script that worked in my environment. There is no warranty or support with this script, please use at your own risk.

PowerCLI – Analyze a Cluster to Move VMs to a Resource Pool

After someone deployed a bunch of VMs, we let them know about the Resource Pool they were supposed to be deployed to. Oops. To correct this, and to avoid a couple of hours of dragging and dropping VMs into a resource pool, I created a script that detects whether a VM is outside of a Resource Pool and then moves it to the specified Resource Pool.

When you run the script, it will ask for the Cluster name and the Resource Pool name so you don’t have to modify the code.


#Analyze the cluster and check for systems outside of Resource Pools

$InputCluster = Read-Host "Cluster Name"
$InputRP = Read-Host "Resource Pool Name"

#Cluster which will be analyzed
$cluster = Get-Cluster $InputCluster

#Resource Pool where VMs should be moved
$rp = Get-ResourcePool $InputRP

#Detection of VMs not in Resource Pools (VMs directly under the cluster sit in the hidden root pool named "Resources")
$norp = Get-VM -Location $cluster | Where-Object { $_.ResourcePool.Name -eq "Resources" }

foreach ($i in $norp) {
    Write-Host "Moving $i to Resource Pool $rp"
    Get-VM $i -Location $cluster | Move-VM -Destination $rp | Out-Null
    Write-Host "Move complete"
    Write-Host ""
}
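To verify afterwards that nothing was missed, the same filter from the script can be rerun; any VM still sitting in the cluster's hidden root "Resources" pool will be listed:

#Verify: list any VMs still outside of a Resource Pool
Get-VM -Location (Get-Cluster $InputCluster) | Where-Object { $_.ResourcePool.Name -eq "Resources" }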


Note: this was a script that worked in my environment. There is no warranty or support with this script, please use at your own risk.

Fusion-IO Caching Tests…

After some issues with the setup and configuration of the Fusion-IO ioCache cards we picked up, I finally got to dig in and do some basic testing with IOMeter.

To set up the test, I configured a new 20GB VMDK on its own paravirtual SCSI controller. The drive was formatted NTFS in a full (non-quick) format as the F: drive. The IOMeter test was run with a single worker against the entire F: drive; the All-in-One test was selected and run for 20 minutes.

First up, I tested the drive all by itself with no caching enabled:

IOps: 1348.69 | Read IOps: 670.1738 | Write IOps: 678.5159

Next, I tested Volume based caching. I started off by modifying the Fusion-IO tab within vCenter to add only the F: drive to the Volume Caching Filter:
Volume Based Caching

Then I reset the F: drive by formatting it again as NTFS in a full/non-quick method. Once the format was complete, I reran the IOMeter test and received these results:

IOps: 1486.163 | Read IOps: 737.5608 | Write IOps: 748.6018

Lastly, I tested Drive based caching. I went back to the Fusion-IO tab within vCenter, removed the Volume Caching Filter on the F: drive, and then set the Drive Caching Filter to Drive1 (Drive0 was the drive the OS was installed on; Drive2 was the drive attached automatically by Fusion-IO):
Drive Based Caching

Then I reset the F: drive by formatting it again as NTFS in a full/non-quick method. Once the format was complete, I reran the IOMeter test and received these results:

IOps: 1509.644 | Read IOps: 748.7889 | Write IOps: 760.8555

I also managed to grab a shot of the performance graphs for the disk during the tests via the vSphere Client (pardon the lapse between 2PM and 3PM on the graph):
Performance Graph on Disk

So, to review, here are all the results in one table:

Test                      IOps      Read IOps  Write IOps
No Caching Enabled        1348.69   670.1738   678.5159
Volume Caching Enabled    1486.163  737.5608   748.6018
Drive Caching Enabled     1509.644  748.7889   760.8555

Remember, these are just initial results with nothing more than the card installed, drivers installed, firmware upgraded, ioTurbine installed, and the guest package installed. While some of the results weren't exactly what I was expecting, I'm pretty excited to dig in deeper and see what kind of performance we can gain from these cards.

Small update…

While this particular blog post is about caching, since that’s how these cards will be used in this environment, I couldn’t help but go back, mount the Fusion-IO card as VMFS storage, SvMotion the F: drive over to the Fusion-IO VMFS datastore and run the test again. So once again, the F: drive was formatted as NTFS in a full/non-quick method. Once the format was complete, I reran the IOMeter test and received these results:

IOps: 5443.904 | Read IOps: 2700.201 | Write IOps: 2743.703

Using the Cold Clone 3.0.3 ISO with vSphere 5

As many know, the Cold Clone ISOs have been discontinued and support for them has been removed by VMware. This is quite unfortunate, especially when you get into the business of P2V'ing items like domain controllers, SQL servers, and other pesky boxes.

I understand the converters from VMware have improved by leaps and bounds over what they were in version 3, but there's still a reasonable amount of security in P2V'ing a box while it's completely down with no running services whatsoever.

The first issue I ran into while attempting to P2V an old SQL server was finding the ISO. It has almost been removed from the internet as a whole, so I've uploaded it here: Cold Clone 3.0.3 Link

Second issue, boot time. It took literally 15 minutes from hitting a button to boot from the disc to the point of accepting the EULA.

Third issue, drivers. I was lucky enough for the internal NICs to be found, but the add-on NICs were not. To put this into perspective, the server I was working on was an HP G5. I tested with an HP G6 and it did not find any NICs at all.

Fourth issue, disappearing network settings. Every time I went to the networking settings, they would be cleared. The settings held as long as I clicked Apply and then OK, but the second I went back into the network config menu, I had to re-enter all the info.

Fifth issue, vCenter integration. It's not exactly shocking that it didn't work with vCenter, but I was hopeful. The converter would go through and recognize everything out of vCenter, but then it ran into a bunch of issues as soon as the clone actually started, such as:

Couldn’t find the Distributed vSwitch vNIC:
[managedVMCreator,2657] No network named "SYSMGMT" was found
[imageProcessingTaskStep,194] VmiImportTask::task{9} step "create VM" destroyed
[vmiImportTask,439] Error during creation of target VM
[imageProcessingTaskStep,194] VmiImportTask::task{9} step "create and clone to VM" destroyed
[imageProcessingTaskStep,194] VmiImportTask::task{9} step "Clone VM" destroyed
[imageProcessingTaskImpl,552] VmiImportTask::task{9}: Image processing task has failed with MethodFault::Exception: vim.fault.NotFound
[imageProcessingTaskImpl,154] VmiImportTask::task{9}: SetState to error

Couldn’t find the FC attached storage on a host:
[NFC ERROR] NfcNewAuthdConnectionEx: Failed to connect to peer. Error: Host address lookup for server esx01 failed: The requested name is valid, but no data of the requested type was found
NBD_ClientOpen: Couldn't connect to esx01:902 Host address lookup for server esx01 failed: The requested name is valid, but no data of the requested type was found
DISKLIB-DSCPTR: : "vpxa-nfc://[VMFS_015] VM01/VM01.vmdk@esx01:902!52 f1 8e a1 39 1c c1 f8-9c d4 05 71 1a 4f ae c6" : Failed to open NBD extent.
DISKLIB-LINK : "vpxa-nfc://[VMFS_015] VM01/VM01.vmdk@esx01:902!52 f1 8e a1 39 1c c1 f8-9c d4 05 71 1a 4f ae c6" : failed to open (NBD_ERR_NETWORK_CONNECT).
DISKLIB-CHAIN : "vpxa-nfc://[VMFS_015] VM01/VM01.vmdk@esx01:902!52 f1 8e a1 39 1c c1 f8-9c d4 05 71 1a 4f ae c6" : failed to open (NBD_ERR_NETWORK_CONNECT).
DISKLIB-LIB : Failed to open 'vpxa-nfc://[VMFS_015] VM01/VM01.vmdk@esx01:902!52 f1 8e a1 39 1c c1 f8-9c d4 05 71 1a 4f ae c6' with flags 0x2 (NBD_ERR_NETWORK_CONNECT).
[diskHandleWrapper,87] DiskLib_Open failed on vpxa-nfc://[VMFS_015] VM01/VM01.vmdk@esx01:902!52 f1 8e a1 39 1c c1 f8-9c d4 05 71 1a 4f ae c6 with error NBD_ERR_NETWORK_CONNECT.
[imageProcessingTaskImpl,552] BlockLevelCloning::task{21}: Image processing task has failed with MethodFault::Exception: sysimage.fault.DiskLibConnectionFault
[imageProcessingTaskImpl,154] BlockLevelCloning::task{21}: SetState to error

How I actually got it to work was by sending the P2V directly to a specific ESXi 5 host. Once I did that, I could P2V the system to the locally attached storage or the FC attached storage. I attached the vNIC to a standard vSwitch.

VMworld 2012 In Review

After a nice long week away, it’s nice to finally be back home from VMworld…

Let’s cover some of the good stuff:

  • Overall friendliness of people. Truly amazing how available people were and how they didn't mind talking anywhere from a couple of minutes to all night.
  • My top three sessions. Definitely grab these once they're available on the VMworld site:
    • NET2207 – vDS Deep Dive with Jason Nash from Varrow – Very in-depth; Jason had a lot of tips and demos, which even featured new functionality in vSphere 5.1.
    • VSP1504 – Ask the Expert vBloggers with Chad Sakac, Scott Lowe, and Rick Scherer from EMC, along with Duncan Epping and Frank Denneman from VMware – Interactive sessions are always great; these guys sat up on stage and answered question after question, basically being turned into a makeshift helpdesk, and it was quite entertaining.
    • VSP1353 – vCenter Deep Dive with Justin King, Ameet Jani, and Deep Bhattacharjee, all from VMware – This session had a lot of information to cover, especially with all the new features in vCenter 5.1; the speakers covered it all in great depth and featured a Q&A portion at the end with several good questions from the audience.
  • The Hang Space with the vBrownBag and theCUBE areas was terrific. It was amazing how many speakers, bloggers, and other in-demand people were hanging out freely in the Hang Space. The Community Tech Talks from the vBrownBag guys were terrific, and as soon as they're posted I'll provide links. The Hang Space also featured a charging center that worked as a coat check for your portable devices; this was brilliant.
  • The Certificate Lounge and the Fast Track VCAP:DCA samples. This was a terrific idea and a bit of a hidden gem in the Marriott Marquis. I must have spent all of Thursday afternoon doing the samples. Some great information and awesome discussions with folks coming in from cert tests or even VCDX Defenses.
  • Parties. There were some terrific parties this year. My favorites were definitely the VMunderground, HP Customer Appreciation, EMC Customer Appreciation, VMware CTO, and VMware VMworld parties. Thanks to all those who sponsored the parties, as well as those who attended, for making it such a great time.

There wasn’t a lot of negative stuff, but there were a couple items that stuck out:

  • Speakers that phoned it in from home. If you went through the time to make the slides, at least learn the content so the session attendees aren't stuck reading through the slides with you.
  • Hands-on Labs. These have been terrific in years past, but this year they just couldn't keep them up and error-free.
  • Food. Breakfast was served in Moscone West with minimal tables and chairs. Lunch was prepackaged and normally soggy by the time it was served, and it was served outside in the garden area. In the past, lunch usually consisted of a self-serve buffet with table seating. The lack of proper seating wasn't conducive to networking, and a lack of shade is never a good thing. I will say, the VMworld blanket they handed out was very nice.
  • Solutions Exchange being open on Sunday and closing on Wednesday. Having the Solutions Exchange open on Sunday instead of Thursday was a poor choice. Personally, I couldn't make it to the Solutions Exchange or the welcome reception due to other obligations, and that took away a whole day from being able to visit with the vendors.

I also took some pictures of some of the other notable items from VMworld this year:
Fixing the lack of Twitter handle on the Attendee Badges:
Fixed Badge

Kicking things off with the VMunderground Party:
VMunderground Line

VMworld Keynote Intro:
Keynote Intro

Stumping the “Expert vBloggers” (VSP1504 – Chad Sakac, Frank Denneman, Duncan Epping, Scott Lowe, a guest from the audience and Rick Scherer) on stage:
Stumping the Expert vBloggers

Paul Maritz speaking to the VMUG Leader group during the luncheon:

Paul Maritz at the VMUG Leader Luncheon

Pat Gelsinger taking the stage in front of the VMUG Leaders:
Pat Gelsinger at the VMUG Leader Luncheon

Pat Gelsinger, Paul Maritz, and Steve Herrod talking to our group:
CxO Group at the VMUG Leader Luncheon

William Lam setting up for his Unsupported vInception session in the vBrownBag area:
William Lam at vBrownBag

Jason Nash (Varrow) getting mobbed after his vDS session (NET2207):
vDS Session Aftermath

Steve Herrod at the CTO Party:
Herrod at the CTO Party

Geeky Tat found while in the Thirsty Bear (yes, it really is binary):
Binary Tat

Post VMworld Meat Coma thanks to the WPAVMUG folks:
Brazilian Buffet

Just to wrap things up, thanks to everyone I met out there and thanks to all the people that came up and introduced themselves. It was a terrific week, lots of information was learned, tons of connections were made and it was one of the best conferences I’ve been to yet… Thanks again!

VMworld Prep!

So you got your VMworld ticket… you got your hotel reservation… you got your flight booked… You're ready to go, right?

You've only scratched the surface!

  • Build up your calendar with sessions: VMworld Session Builder
    With the way the sessions are being done, I’m taking the position of registering for sessions I would go to at time slots I have available. I would much rather be registered for a session and miss it, than miss it due to it being full and unavailable. You’ll also find that if you export your calendar, it will tell you what rooms your sessions are in so that you can plan accordingly.
  • Build up your calendar with after hours events: VMworld Gatherings
    Please note that many of these events are RSVP only, so definitely make sure to get registered and/or talk to your favorite company’s reps to get you registered.
  • Make time for the Hands on Labs
    The Hands on Labs are always very well organized and provide great content, don’t miss out on an opportunity to learn in an environment you’re not responsible for!
  • Check out the Solution Expo
    Yes, the vendors are going to want to scan your badge. However, there are always some diamonds in the rough. Either some amazing technologies with capabilities I didn’t realize existed, or some awesome prizes.
  • Stop by the Hangspace
    The Hangspace will be providing an area for the VMTN Community TechTalks and other interviews as well as blogger tables and space for general networking.
  • Stay in the know by following VMworld related hashtags on twitter: VMworld Twitter Community

A map of the layout around the Moscone Center has already been released:
VMworld Map

If you fill your schedule like I do, chances are you aren’t seeing the hotel for any reason but to sleep for a couple hours. So let’s talk about the accessories that can keep you on the go…

  • Don’t worry about a backpack. You’ll get one at registration.
    They’re always quite nice and very durable. Not to mention that’s one less thing to lug on the plane on the way there or find space for on the way back. A sneak peek has already been released:
    VMworld Backpack Sneak Peek
  • Bring some comfortable shoes
    You’ll be walking or standing a majority of the day, you don’t want to miss out on something because your feet hurt
  • Bring spare batteries or external chargers
    I can't stress this one enough. With 15,000+ attendees, there aren't enough plugs for everyone. Don't miss out on a session or activity because you couldn't get to the VMworld app or your calendar. Personally, I picked up a New Trent iPulse IMP100P. Two USB outputs and 10,000 mAh in the size of an iPhone should be good enough to get me through a full day.
  • A tablet to take notes on.
    In years past I'd have my iPad and the Penultimate app, but this year I'm going with the Nexus 7 and the Evernote app. I've outfitted the Nexus 7 with the Poetic leather case and an amCase amPen stylus. So far it has been quite a combo for business meetings and various local user groups.
  • Business Cards
    VMworld is an amazing resource to gain knowledge and information and the networking opportunities are incredibly strong.

Some other tips to those that have never been:

  • Dress code is normally said to be Business Casual, however you’ll see everything from high dollar business suits down to shorts and sandals. Be comfortable, but don’t look like a mess.
  • Wireless is readily available throughout. It may not be the fastest, but it's available and free. In years past, the AT&T network (my carrier) has been quite overloaded, so use whatever connection works best.
  • Drinks (soda, water, juice) are normally readily available. Snacks are available at different times during the day as well, everything from granola down to candy bars.
  • Breakfast and lunch are provided, make sure you know how to get there. There are other options around the Moscone, but none are free and some aren’t exactly quick.

Not going to VMworld?

Well, watch the Keynotes and get access to quite a bit of other information live via VMware NOW: http://bit.ly/VMwareNOW and keep up with the action by way of the Community Tech Talks featuring myself and many other vExperts: http://www.vmworld.com/docs/DOC-6032. There is also a large list of bloggers that will be onsite and providing content as well: VMworld Blogger Coverage.

RDM Conversion Pain Points…

The latest infrastructure I've inherited is loaded full of RDMs. My first order of business was to get rid of them, especially since we aren't using them for any reason other than a possible performance improvement.

The steps we've been taking to get rid of them (a PowerCLI sketch of the conversion follows this list):

  • Convert from a physical RDM to a virtual RDM
    • Shut down the system
    • Take note of the SCSI information
    • SCSI Settings

    • Remove and Delete from Disk
    • Apply
    • Re-add the RDM as a virtual RDM instead
  • Perform a Storage Migration from one datastore to any other datastore, specifically moving the virtual RDM
  • Once complete, check the settings on the VM and verify that the hard disk is listed as "virtual disk"
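Here is a minimal PowerCLI sketch of that conversion, assuming the VM is already powered off. The VM name is hypothetical and this is not the exact tooling we used, so test it in a lab first:

#Hedged sketch: convert each physical RDM on a powered-off VM to a virtual RDM
$vm = Get-VM "FileServer01"

foreach ($disk in (Get-HardDisk -VM $vm -DiskType RawPhysical)) {
    #Note the backing LUN before removing the mapping
    $lun = Get-ScsiLun -VmHost $vm.VMHost -CanonicalName $disk.ScsiCanonicalName

    #Remove and delete the physical RDM pointer file
    Remove-HardDisk -HardDisk $disk -DeletePermanently -Confirm:$false

    #Re-add the same LUN as a virtual RDM
    New-HardDisk -VM $vm -DiskType RawVirtual -DeviceName $lun.ConsoleDeviceName | Out-Null
}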

A couple of the pain points we’ve run into:

Removing and deleting the physical RDMs did not work as planned. Roughly 10% of the VMs ran into a problem where the pointer files were not properly removed, and therefore the RDMs could not be remapped as virtual RDMs. We could still add a hard disk pointed at the pointer files and it would properly add back to the VM. We tried rescanning HBAs, we tried different SCSI controllers, etc.

Finally, we figured out that by going into the datastore and manually deleting the pointer files and then vMotioning the VMs to another ESXi host within the cluster, we could then add a new RDM to those previously used RDMs.

In the case of Storage vMotioning the virtual RDMs to a new datastore, if we SvMotioned the RDM to a Storage DRS datastore cluster, it only moved the pointer files. If we went through and checked the "Disable Storage DRS" option and selected an individual VMFS datastore, it did the conversion over to VMDK. It adds an extra step, but still gets the job done. A sketch of that final move follows the screenshot below.
Disable Storage DRS
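For completeness, here's a hedged PowerCLI equivalent of that last step, with hypothetical VM and datastore names; pointing Move-VM at a specific (non-SDRS) datastore and explicitly choosing a disk format is what performs the conversion to a VMDK:

#Storage vMotion to an individual VMFS datastore, converting the virtual RDM to a flat VMDK
Move-VM -VM (Get-VM "FileServer01") -Datastore (Get-Datastore "VMFS_Standalone01") -DiskStorageFormat Thick | Out-Null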

Only 100+ more RDMs to go… Good times.

SRM: vSphere Replicated VMs stuck in a “Sync” status

Recently I've noticed that the VMs I have replicating with vSphere Replication occasionally get stuck in a "Sync" status for an overly long time.

After pulling the logs, I was able to figure out what was happening: timeouts, lots of them. The vmware-dr.log file pulled from the remote site was full of lines like the following (local is the SRM server, peer is the vCenter server):

2012-04-02T07:35:04.077-04:00 [02784 verbose 'Default'] Timed out reading between HTTP requests. : Read timeout after approximately 50000ms. Closing stream TCPStreamWin32(socket=TCP(fd=2596) local=10.xx.xx.xxx:9085, peer=10.xx.xx.xxx:55039)

2012-04-02T11:54:34.159-04:00 [02744 verbose 'Licensing'] Asset in sync.
2012-04-02T11:58:12.527-04:00 [02868 info 'LocalVC' opID=ac2d1cb] [PCM] Received NULL results from PropertyCollector::WaitForUpdatesEx due to timeout of 900 seconds
2012-04-02T11:58:12.723-04:00 [02860 info 'LocalVC' opID=596971f7] [PCM] Received NULL results from PropertyCollector::WaitForUpdatesEx due to timeout of 900 seconds

After a brief discussion with our network engineers, it was believed that there was no problem with the connection between the local and remote sites. So I took a "when in doubt, reboot" approach. I restarted the SRM service on the remote SRM server. No luck. After that, I did a "Restart Guest" on the VRS system at the remote site. After about 5 minutes, the systems started to connect and replicate again.
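If you would rather not click through the Services console, that first step can be scripted. This is a hedged sketch to run on the SRM server itself; the service display name can vary by SRM version, so confirm it with Get-Service first:

#Restart the SRM service (confirm the display name with Get-Service beforehand)
Get-Service -DisplayName "VMware vCenter Site Recovery Manager Server" | Restart-Service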

I've noticed it a lot, and I've heard from other people who also manage their own SRM deployments that a reboot is a pretty good first step in troubleshooting. So keep that in mind as issues arise and troubleshooting is required.

Standalone ESXi 5 Host Upgrade

Have an ESXi host which is a standalone box? No VMware Update Manager? No vMA?

Well, it still requires patches. Luckily, you can still use the stripped-down version of the console included in ESXi to update it.

Start by heading to the VMware Patches portal (http://www.vmware.com/patchmgr/download.portal) and downloading the necessary patches for the server that needs to be patched.

Upload the patch zip file, via either SCP or the datastore browser, to a datastore the server can access:
Upload the patch

Next, make sure the SSH service has been started.

To do this in the vSphere Client, click on the desired host, click the "Configuration" tab followed by the "Security Profile" link in the "Software" box, then click "Properties" on the top right side.
Get to SSH Service

Highlight "SSH" and then click "Options". After the SSH Options screen pops up, click "Start", then click "OK" twice to get back to the Configuration tab.
Start SSH Service
Start SSH Service
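As an alternative to clicking through the vSphere Client, the SSH service can also be started remotely with PowerCLI. This is a hedged sketch with a hypothetical host name, not part of the original procedure:

#Start the TSM-SSH (SSH) service on the host via PowerCLI
$svc = Get-VMHostService -VMHost (Get-VMHost "esx01.lab.local") | Where-Object { $_.Key -eq "TSM-SSH" }
Start-VMHostService -HostService $svc | Out-Null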

After getting connected to the ESXi host over SSH, run the command: esxcli software vib install -d "*full path to uploaded zip*"
Example (note the quotes, needed because this datastore path contains a space): esxcli software vib install -d "/vmfs/volumes/VMO-01 Datastore/Temp/update-from-esxi5.0-5.0_update01.zip"

There should be a message showing that the update was completed successfully and that the system needs to be rebooted.
Update complete message

If ready to reboot, type in “reboot” and the system will reboot. Just remember to check to make sure that the SSH service has been stopped when it boots back up.

One error I ran into: if you don't give the full path to the zip file containing the update, the patching will fail with a "MetadataDownloadError" reading:
Could not download from depot at zip:/var/log/vmware/*update name*.zip?index.xml, skipping (('zip:/var/log/vmware/*update name*.zip?index.xml', '', "Error extracting index.xml from :/var/log/vmware/*update name*.zip: [Errno 2] No such file or directory: '/var/log/vmware/*update name*.zip?index.xml'"))
url = zip:/var/log/vmware/*update name*.zip?index.xml
Please refer to the log file for more details.
Error Message

Once I put in the full path, it worked just fine.

SRM Troubleshooting Fun!

We finally decided to get some real disaster recovery and business continuity plans in place, and after deliberating between a couple of options, we went with Site Recovery Manager 5 using hypervisor-based replication.

There’s been plenty of fun in setting it all up and starting the replications.

Database Configuration

A couple of notes (a scripted version of this SQL setup follows the list):
  • A Default Instance is absolutely required.
  • Mixed Mode Authentication is also absolutely required.
  • Create a login, a database, and a schema within the database, all with the same exact name.
    Database Settings
  • As a precaution, the created login is also a sysadmin and the db_owner of the created database.
  • Connect to the SQL Server by IP; FQDN wouldn't work.
  • The vCenter connection should be listed the same as in the site connection (i.e. if you connected the sites via FQDN, use FQDN for the VRMS configuration).
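For repeatability, here's a hedged sketch of that SQL setup using Invoke-Sqlcmd (requires the SqlServer/SQLPS module). The instance address, password, and the shared name are all hypothetical; your DBA may prefer to do this by hand:

#Hedged sketch: create the matching login, database, and schema for VRMS
$name = "srmrepl"        #login, database, and schema all share this name
$inst = "10.0.0.50"      #default instance, connected by IP as noted above

Invoke-Sqlcmd -ServerInstance $inst -Query @"
CREATE LOGIN [$name] WITH PASSWORD = N'ChangeMe!123';
EXEC sp_addsrvrolemember @loginame = N'$name', @rolename = N'sysadmin';
CREATE DATABASE [$name];
"@

Invoke-Sqlcmd -ServerInstance $inst -Database $name -Query @"
CREATE USER [$name] FOR LOGIN [$name];
CREATE SCHEMA [$name] AUTHORIZATION [$name];
EXEC sp_addrolemember @rolename = N'db_owner', @membername = N'$name';
"@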

Only after those steps could I get VRMS to connect to the SQL server (which, as you can see in the screenshot, is the same SQL server as the SRM connection).

Initial Replication Error

Now that I had green check marks across the board… while trying to set up replication on any of the VMs, I would receive this error: Call "HmsGroup.OnlineSync" for object "GID-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" on Server "x.x.x.x" failed. An unknown error has occurred.
Online Sync Error

Going through the logs, I saw lots of SSL Handshake errors and general connectivity problems, so I had to go back to our networking people and have them alter the hardware firewalls to allow connectivity across the board to all the systems involved (ESXi host, vCenter, SRM server, VRMS, VRS). Once that was done, it would successfully configure the virtual machine for replication.
Replication Success

I have yet to go back and firm up all of these port rules; I'll report back once I have it done.
Side note: I have no reason to doubt the VMware TCP/UDP ports KB, I just know that I was still having some connectivity problems after following it.

Replication Locking Failure

Now that the connections were all good, I had a couple of VMs replicating. I went to another VM, right-clicked, chose vSphere Replication, added a schedule, and then received this error: Configuring Virtual Machine for Replication… failed.
VRM Server generic error. Please check the documentation for any troubleshooting information. The detailed exception is 'Optimistic locking failure'.
Optimistic Locking failure

Searching the documentation turns up an error for the synchronize storage step; however, that is not correct for this situation, as I had not yet synchronized it.

I checked the system and it was up and running, the VMware Tools were installed and functioning properly, and everything looked good. So I went back to run the vSphere Replication wizard again and was instantly hit with an error of: Call "HmsGroup.CurrentSpec" for object "GID-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" on Server "x.x.x.x" failed. An unknown error has occurred.
HmsGroup.CurrentSpec

So I started by rebooting the system; once the VMware Tools were running, I tried it again, only to see the same error. This time, I powered off the system, removed it from inventory, added it back by browsing the datastore for the .vmx file, and powered it on. Once the VMware Tools were running, I tried it again and it worked perfectly! That was a little painful, and I wish I had noted the timestamps so I could go through the logs, but it was a success nonetheless.

Large VMDK Replication Problems

With that figured out, it was time to get the VMs replicated and SRM with VRMS worked wonderfully from that point… until we got to the file servers.

I know the "2TB minus 512B" disk sizing rule; we found that out the hard way when upgrading from 3.5 to 4 with some RDMs. It was not a fun experience. So all of the VMDKs on our file servers are 2040GB in size.

The initial replication is successful, however the re-sync is not. It gives an event of: Replication operation error: Virtual machine is in an invalid state.

As before, the system was up and running, the VMware Tools were installed and functioning properly, and everything looked good. So I went into the SRM plugin and told it to "Synchronize Now" and received another error of: Call "HmsGroup.OnlineSync" for object "GID-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" on Server "x.x.x.x" failed. An unknown error has occurred.
HmsGroup.OnlineSync

So I pulled down the logs by going to the Sites button, then the Summary tab, and looking in the Commands section for "Export System Logs". The important part here is to get the logs for the site giving you the error, i.e. if a server at your remote site is the one failing in the message, that's the log you'll want.

Unfortunately, the event logs contain items such as:
2012-01-04T18:58:34.533Z [F5993B70 error 'Main' opID=hsl-a0edc478] [0] ExcError: exception N3Vim5Fault12FileTooLarge9ExceptionE: vim.fault.FileTooLarge
2012-01-04T18:58:34.533Z [F5993B70 error 'Main' opID=hsl-a0edc478] [1] Code set to: Generic storage error.

I’m currently working with a VMware Support Engineer to fix this problem. There has been a bug filed, so hopefully there is some new news soon. I’ll update when I know something.

Large VMDK Replication Problems – Resolved

Received some unfortunate news from VMware Support yesterday. vSphere Replication uses snapshot technology to forward the deltas to the remote site, and, unbeknownst to me, that snapshot overhead counts against the VMDK file itself! So the real limit is not 2TB minus 512B, it's more like 2TB minus 512B minus 16GB: roughly 2048GB less 16GB of overhead leaves about 2032GB, for a safe maximum of around 2030GB.

So once I reduced the VMDK size down to 2030GB, it replicates and updates just fine. Now the problem is how do I shrink the VMDK files…

If you want to read more, check out the VMware Knowledge Base article: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1012384 and more specifically the “Calculating the overhead required by snapshot files” section towards the bottom.

Disable Replication of VMDK = Delete VMDK

This was one of the things I learned the hard way while testing the larger VMDK files. I went through and disabled replication on one of the disks that had already been replicated. From the local site's vCenter, it looks like the replication was simply turned off, right?
Disable Replication

Unfortunately, that was not the case. I pulled up the remote vCenter and was greeted with an event saying the virtual disk had been deleted!
VMDK Deleted

That was certainly a surprise. I guess I understand why it does that, but the first time I did it, the VMDK that was deleted was 2040GB in size. It took me almost 2 days to copy all of that information down!