Cisco today announced a new technology called Overlay Transport Virtualization (OTV).  Not a great name…not cool like vMotion or anything, but it’s still pretty neat.  My background is heavily networking so I love the fact that the networking field is now working tightly with virtualization (and storage…but not today).  If you remember back when I was at Cisco Live last year I did a quick post about VMware demonstrating long-distance vMotion.  At the time the details about everything they were doing weren’t really out there.  Now we’re starting to see the pieces of that work and their vision for what the cloud can really do.

OTV lets you connect two L2 domains that are separated by an L3 network.  Basically, it’ll encapsulate Layer 2 traffic inside an IP packet and ship it across the network to be let loose on the other side.  In this way you can make two logically separated data centers function as one large data center.  The beauty of OTV is that it does away with a lot of the overly complicated methods we previously used for this sort of thing.  It’s really, really simple.  The only catch is that you need Nexus 7000s to do it today.  How simple is it?  Here is all the configuration you need on one switch in your OTV mesh:

otv advertise-vlan 100-150
otv external-interface Ethernet1/1
interface Overlay0
description otv-demo
otv site-vlan 100
otv group-address 239.1.1.1 data-group-range 232.192.1.2/32

That’s six lines, including a description line.  Basically, you enable OTV and assign an external interface.  The switch, like all good little switches, keeps a MAC table for switching frames, but for those MACs on the other side of the L3 network it just keeps a pointer to the IP of the far-end switch instead of an interface.  When a frame arrives that is destined for a MAC address behind another switch, it knows to encapsulate the frame in an IP packet and forward it out.  The switches all talk to each other and exchange MAC information so they know who is where.  This communication of MAC information is handled via a multicast address.  Very simple, very elegant.  All done without the headaches of other tunneling or VPN technologies.
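To make the MAC table idea concrete, here is a minimal Python sketch of a table where local MACs point at an interface and remote MACs point at the IP of the far-end switch.  This is only an illustration of the behavior described above, not how NX-OS implements it.  The class names, the MAC addresses, the interface name, and the 10.0.0.2 address are all made up for the example, and unknown destinations are deliberately not modeled.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class LocalPort:
    """A MAC learned on a local interface (ordinary L2 switching)."""
    interface: str


@dataclass(frozen=True)
class RemoteEdge:
    """A MAC that lives behind another edge switch, reached over the L3 network."""
    ip: str


NextHop = Union[LocalPort, RemoteEdge]


class OtvMacTable:
    """Hypothetical model of a MAC table whose entries are ports or far-end IPs."""

    def __init__(self) -> None:
        self._table: Dict[str, NextHop] = {}

    def learn_local(self, mac: str, interface: str) -> None:
        self._table[mac.lower()] = LocalPort(interface)

    def learn_remote(self, mac: str, edge_ip: str) -> None:
        # In the post, this information is exchanged between switches
        # via the multicast group; here we just record the result.
        self._table[mac.lower()] = RemoteEdge(edge_ip)

    def forward(self, dst_mac: str, frame: bytes) -> str:
        hop = self._table.get(dst_mac.lower())
        if isinstance(hop, LocalPort):
            return f"switch {len(frame)}-byte frame out {hop.interface}"
        if isinstance(hop, RemoteEdge):
            return f"encapsulate {len(frame)}-byte frame in IP and send to {hop.ip}"
        return "unknown destination (not modeled in this sketch)"


table = OtvMacTable()
table.learn_local("00:11:22:33:44:55", "Ethernet2/1")   # made-up local host
table.learn_remote("00:AA:BB:CC:DD:EE", "10.0.0.2")     # made-up far-end switch
print(table.forward("00:aa:bb:cc:dd:ee", b"\x00" * 64))
# prints: encapsulate 64-byte frame in IP and send to 10.0.0.2
```

The only point of the sketch is that the lookup is the same either way; what changes is whether the answer is a port or an IP address.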

So why do I want to do this?  Think back to when I mentioned long distance vMotion.  Now I can vMotion a server from one data center to another 50 miles away.  But what happens when it gets there?  Most organizations don’t stretch VLANs between data centers due to design or telco reasons.  Now you don’t need to do that.  You can join those two separate data centers together as a larger logical data center and when that VM lands on the other side it can function normally.  But wait, there’s more!

What else do you need to make this work seamlessly?  Storage that also follows the VM.  Sure…we could do a complicated FCIP configuration or iSCSI or something…but what about active/active storage?  EMC doesn’t currently offer that, but it’s coming.  I quote from a Chad Sakac video after the VMworld demonstration:

Option 2: Long Distance vMotion with advanced Active/Active Storage – “one that leverages technology coming from EMC around active/active storage virtualization across distance”.

Now that’ll be cool.  Like many things in the networking arena OTV will get some press over the next few days and probably be forgotten by most server people.  But in a year or two it’ll be the underlying technology that provides the cool new functionality that lets organizations have active/active data centers with full failover functionality for a fraction of what it costs today.  Anyone that thinks virtualization is slowing down needs to look around.  These are the technologies that will let us truly build these public and private clouds.

When you’re using an HP C-Class with Flex-10 adapters there is an option called Smart Link that you can enable which passes the status of the links from the chassis switch down to the blade.  This way the blade knows if its uplinks have failed and can fail over as necessary.  There is a bug where this does not always work.  We hit this today when implementing the Nexus 1000v on several C-Class chassis at a customer site.  The customer was aware of the issue and had been using Beacon Probing in VMware to detect upstream link failures, but the 1000v doesn’t support any active failure detection.  It uses the link status.

Turns out there was a new driver released in late January that fixes the problem.  It is available here.  Install this and you should get correct Smart Link notifications and proper failover.

Well, it was formally announced.  The Apple tablet…the iPad…is here.  Like every other major announcement the reviews range from the Apple fanatics that love it no matter what to those that point out every single thing it doesn’t have, or was supposed to have depending on where you get your rumors.  It doesn’t have a camera (should it?).  It’s not true high definition.  It doesn’t have USB.  It doesn’t have a memory card slot.  But I bet it’s the best tablet device we’ve ever used.

Why?  Because Apple gets it. It’s not about the feature check list.  It’s not about the things that it doesn’t do.  It’s about the things it does do well.  The same criticisms today were brought up against the iPhone when it was first released and now it’s the #1 smartphone.  Why?  Because it just works.  You can hand it to almost anyone and they can instantly use it.  The GUI is laid out intuitively.  It’s smooth, it’s polished, and it does the things the vast majority of users need to do very well.  Does it do everything?  No, but it does what it needs to do.  The same goes for Apple’s own computer systems and OSX operating system.  I find myself using my applications and not using the system or the OS as on other platforms.  You forget it’s there until you need something.  From what I’ve seen I think we’ll find the iPad the same way.  When Steve Jobs says that Apple doesn’t release something until he feels it is finished this is what I think he means.

I never thought I’d be quoting Tycho from Penny-Arcade to prove a point on my blog, but these are strange days indeed.

It’s got to be so annoying to compete with Apple, at anything really, because it’s not like they’re doing something fucking crazy. Everybody’s had these ideas before. The difference, and this is grim if you are a competitor, but the difference is that everyone else spends a lot of time (and often, money) determining why those things aren’t possible. And then it comes out, for real, only you didn’t make it.  Some other guys did.  And when you come out with what is (on paper) a better version of the same thing, maybe even multiple times over, it’s too late.  You made a “product” to compete with their “product,” tastefully arranging your regiment, only to discover that they hadn’t made a product at all – they made a narrative.  A statement about how technology should interface with a life. -Tycho

A statement about how technology should interface with a life.  This is where Apple deviates from other technology providers.  There are other tablet and slate devices out there.  A number of Windows-based tablet devices were just shown at CES.  But every time I use one of the devices from these manufacturers it feels…well….disorganized.  They get the latest version of Windows “built” for a new purpose and quickly look at their box of commodity parts and build a device.  Sometimes that works (see my upcoming review of the HP Media Smart EX495) and sometimes it doesn’t (HP TC 1000 Tablet I used a few years ago).  There is no organized ecosystem, and that’s what makes Apple’s product so great.  The downside is that to do this Apple has to maintain control of that ecosystem which locks others out.  So you get Apple’s vision of the platform where in the Windows/Linux world you get everyone’s vision.

What I’d like to see is the creation of higher-end engineering groups in some companies like HP and Dell.  Now, they’ve tried this before but it ends the same way.  They bring in someone with a great reputation, like Alienware and Dell, and try to make it as efficient and consumable as the rest of their line and end up killing the brand.  If these companies truly want to fight Apple they  need a prestigious brand, similar to Toyota and Lexus.  This brand doesn’t have to be on stage holding a device with Steve Ballmer when the next iteration of Windows is announced.  Take the time to build a system the right way and then deliver.  If users want to go to Best Buy and play the feature checklist game where Product A wins because it has 4GB of RAM for $499 while Product B only has 3GB for $499 let them.  For the rest of us start creating really good technology that integrates with how we live and work.

It’s not unusual for a columnist to write an edgy piece and buck the norms and trends of an industry.  Sometimes they truly feel this way, sometimes it’s for hits, and sometimes it’s just to be the guy that’s different and if you’re right you hope to look like a genius (I’m looking at you Mr. Dvorak).  But sometimes people are just wrong.

The InfoWorld columnist Randall Kennedy  today published an editorial entitled “Act now to avoid the Apple tablet apocalypse”.  Anyone that knows me probably already knows which side I fall on in this discussion.  I’m an Apple user and an Apple fan, but I’m not an Apple fanboy.  Apple does many things I don’t agree with and is often slow to embrace change even while racing ahead in other areas.  But this isn’t an Apple discussion.  It’s a discussion about IT’s role and the effect of disruptive technology.

“I hate disruptive technologies. They’re antithetical to all that’s sane and stable in enterprise IT. So when I hear that one out of every five tech-savvy consumers is interested in buying the as-yet-unannounced Apple tablet device, I start to squirm a bit in my chair.” -Randall Kennedy

How can anyone in IT or technology as a whole hate or be against disruptive technology?  For about seven years I was a Network Manager at a mid-sized organization and then an architect at a large financial institution and now a Solutions Architect/Technical Consultant here at Varrow.  At no time have I seen a problem with disruptive technologies.  I seek them out.  I think all good IT personnel do the same.  The purpose of IT isn’t to keep the systems running and patch the servers but instead it is to enable the organization to work as efficiently as possible.  It’s easy for us to forget our true role in the organization and get bogged down in to the day-to-day operations of the IT environment, but we must remember who our customers really are.  It’s the people that gain benefit and efficiency from using the technologies we implement and support.

“In case you think I’m overstating the threat, consider the iPhone. Almost immediately, first-generation iPhone users were turning in their clunky old Windows Mobile devices and petitioning to have their Apple gizmos accepted as standard-issue company phones.” -Randall Kennedy

If users are knocking down your door to bring in a new technology you need to look really hard at why you aren’t able to do it.  Obviously they have found something that lets them work more efficiently, which helps the company.  While at the bank I carried a Blackberry 8830 and before that I carried almost every flavor of Windows Mobile device.  The iPhone has completely changed how I work and has greatly reduced my turnaround time on many tasks.  I don’t use an iPhone because it’s cool or because it was made by Apple.  I use one for the same reason I’m typing this on an iMac.  It’s the best tool for the job I’m doing.  In my opinion, backed up by a great deal of experience, many people in IT are happy to just float along keeping the status quo.  They don’t like disruption or technologies that cause great change.  We all know that change usually introduces risk, but without change we don’t move ahead.  If you’ve ever read this blog before you know that I’m a very strong proponent of virtualization in the data center.  Virtualization has absolutely been a disruptive technology and brought considerable benefits to the data center.  It gave us much higher consolidation while at the same time reducing physical footprint and power and cooling costs to a fraction of what they were previously.  Is this antithetical to sane and stable enterprise IT?  I think you’ll find many, many people that will argue that it’s not.

“An Apple tablet device will be even more disruptive in that consumers will insist on using the new toy as their primary computing environments. This, in turn, will force IT shops to try to shoehorn their increasingly complex enterprise desktop computing stacks onto consumer-oriented devices that were never designed to support such workloads. Basically, it’s a recipe for disaster.” -Randall Kennedy

We have no idea how disruptive an Apple tablet will be to the existing IT ecosystem.  We’ll have a clue in 24 hours and probably a good amount of information in 30 to 60 days.  If the rumors are true and the tablet is a device running the iPhone OS it may not matter as it’s not a full blown compute environment.  If it’s running a more full featured OSX-type OS it might be just another Mac in your environment.  But that’s not my problem with this statement.  Why is your IT shop’s enterprise desktop computing stack becoming more complex?  Today you can simplify your desktop computing environment more than ever before, except for maybe the old green screen terminals that aren’t really a desktop.  Why are you still worrying about what the user has on their desk or in their hands?  Embrace more of these antithetical disruptive technologies such as virtual desktops or application virtualization.  Lock down those remote systems with technologies such as VMware’s ACE.

“So what can IT do to thwart the coming Apple tablet-pocalypse? First, an outright ban is in order. Use whatever excuse you think carries the most weight. For example, claim that the devices are insecure, and that plugging them into the corporate network will compromise its integrity. Then seek to contain the situation by offering up an alternative tablet solution running the IT-supported and IT-approved Windows 7 operating system.” -Randall Kennedy

This right here is why users go around corporate IT.  There are very valid reasons to have corporate standards and corporate platforms but when users sense that they are being denied for the sake of being denied they find ways around you and the problem gets worse.  A couple of examples….  When I carried my Blackberry at a financial institution they locked down SMS text messaging because it couldn’t be logged.  So what did we do?  Instead of using SMS which goes (pretty much) direct user to user we all installed Google’s Talk client which sent communications through the Google servers.  Instead of giving users a logged and secured way to IM with people outside of the company they blocked it totally.  I couldn’t tell you how many SSH tunnels I saw through the firewalls (that looked like SSL connections) just for IM traffic.  Users will get around you and cause bigger problems.  Is it against policy?  Absolutely.  But they will do whatever they feel helps them get their work done most efficiently.  Work with your users, not against them.  It goes without saying you can’t do everything your users want but they must feel like you are trying and feel the reasons for which they are being denied are valid and legitimate.

“And pay special attention to the higher-profile users in the executive suites. Seed them early on with their own prophylactic, Windows-based tablet alternatives. Because if just one of these individuals manages to pick up Apple’s latest fruity abomination — and brings it into the office — you’ll never be rid of the things.” -Randall Kennedy

Good luck.  Let me tell you how this will go down.  They get a kludgy Windows tablet and go to a meeting with other executives who have Apple tablets.  They see how nice it is and then they go buy one.  You’re now supporting two platforms without any ability to plan.  This doesn’t work.  Ask those of us that have tried it.

My main argument is to work with your users, not put up walls.  New technology is coming so get ready.  If you aren’t looking forward to it you may be in the wrong career.  Disruptive technologies are NOT antithetical to stable corporate IT, they are good.  They get us thinking about new ways to tackle problems and become more efficient.  Was the PC disruptive?  Was the Internet disruptive?  Was robotics in manufacturing disruptive?  Absolutely, and I’m glad they were.  Personally, I hope the new tablet is great.  I hope it has a really good software-keyboard.  If it does I’ll have one to use for taking notes and doing simple “whiteboards” when meeting with customers.  My experiments with a small netbook and Windows tablets failed miserably due to the form factor and keyboard so I hope that my current methodology gets disrupted tomorrow.

The recent hack attempts and successes against many companies, including Google, have been the talk in many IT circles and commercial organizations lately. It was a wake up call on several different levels. For one, you have a pretty successful infiltration of some very large and important organizations. You also have Google, the best known data warehouse company and a household name, being exploited and admitting that the hackers were going for the email information of some people, namely human rights individuals. When many people in the virtualization, storage, and computing industry are talking about the future of cloud computing is this something we need to put more thought toward? Many people think so. I do too.

While I normally blog about VMware, storage, data center networking, Apple, or whatever cool gadget I’m playing with one of my key interests is in information security. I’ve held a CISSP for almost six years and am now working to finish up my MS in Information Technology at East Carolina University in the summer. While I always try to keep track of what’s going on in the infosec world I find it even more interesting when my interests in infrastructure collide with security. Now with the news of these successful attacks we get to think about the implications of hosting data in clouds managed, and secured, by others.

How often do these types of successful attacks happen? No one knows for sure since it’s unusual for many of these types of companies and agencies to publicly report a successful attack but I think it’s safe to say this isn’t the first time for many. What makes this particular case so unusual is the sheer number of targets infiltrated at one time and the type of data the attackers were targeting. Corporate espionage isn’t new and while it’s probably not as exciting as the movies it does occur so it’s not a great shock that many of the attacks were against specific companies for intellectual property. Many noted that it appeared the attackers were running for their source code repositories. But, what about Google?

Google has said the attackers went after proprietary source code but they also went after specific GMail users. That’s important. It’s the first time we’ve really seen, or at least know about, an attack against a cloud service such as this. In this case, as mentioned earlier, it was to obtain information about individuals working for human rights causes and groups within China. Thankfully, they “only” were able to access information in the email header: to, from, and subject. With this information they were able to social engineer their way in to even more systems and groups.

But what does this mean for cloud computing in general? I think in general it means that there must be serious consideration put around moving some applications and types of data in to the cloud. Yes, I know that’s obvious but it needs to be said. With the push to electronic medical records here in the US many healthcare organizations are looking for ways to efficiently and inexpensively process, manage, and archive all of this data. The idea of hundreds or thousands of spinning disks in a hospital data center just holding archive data for 21 years (in some cases) is not appealing. The same can be said for many other industries. Financial companies, manufacturing, research, and others have to keep records for many years. That’s why we’re seeing more and more attention being paid to cloud services such as EMC’s Atmos. Before I think we’re all safe to do this I think there needs to be more consideration given to proper cloud security and separation within cloud providers.

Look at how Google was attacked. While initially it was suspected to be a straight attack against a vulnerable system, it really wasn’t. It was a combination of an, at the time, unknown exploit against all IE versions 6 through 8 and good old fashioned social engineering. As I have seen reported, it was initially Google executives that were targeted and once their systems were compromised the attackers continued on in to the GMail system. For those that aren’t familiar with social engineering or don’t understand the true danger of it I highly suggest you read Kevin Mitnick’s book The Art of Deception. It will show you very quickly why I think too many people are involved in most IT operations and why that presents a large risk to organizations. Kevin Mitnick’s success, if you want to call it that, wasn’t due to his technical skills but instead his ability to talk his way in to the systems he accessed. The fact that the attackers were able to move, apparently quickly, from an executive’s system to the back-end GMail operation isn’t that hard to understand. If you received an email from an executive at your company with something that looked official and a URL, would you click it? As soon as you do, your system is compromised. That’s the beauty of the social engineering these attackers did. The way they got in to Google and then leveraged that person’s identity to gain access to GMail is the same way they used information gleaned from GMail to get in to human rights organizations. If you see a few emails go back and forth between people, even just the subjects, you can probably craft one that looks good enough to tempt an unsuspecting user. Then you’re in.

One positive note in all of this, to me anyway, is that Google detected the hacks first and did it pretty quickly. That tells me they had very good systems looking for unusual activity in their systems. Would it have been better to never happen or find it within a few minutes? Sure… But the amount of traffic and types of flows within Google have to be mind boggling. While I’m not usually a betting man I’d put good money on Google putting a lot of thought in to internal security measures right now. Google has been very good at releasing information that helps others and I’d like to see some of this released, but given the sensitive nature of the information I’m not sure they can or will do that. I think this also puts a lot more pressure on the cloud vendors such as VMware and Cisco to show that cloud computing can be secure. They need to release whitepapers, reference architectures, and policy/procedure guides on securely managing the operational aspects of cloud computing. They need to do this not just to calm the cloud providers but also the end users of those cloud services. I guarantee you that the news and fallout of these attacks will come up the next time I meet with a CIO and we discuss public clouds. They are going to want to see work done and attention paid. No one wants to be the hospital or bank that had thousands of customer records stolen from a cloud provider in a similar attack.


I understand that may not be a fair question.  In many cases there are things that just can’t be virtualized, and I don’t mean for performance reasons.  I’m talking about non-X86 workloads and applications with specialized hardware.  Don’t forget about the dreaded dongle that some apps still require!

One thing that I find very interesting to discuss with customers is their comfort level limit with virtualization.  At what point in their application tiering do they think that something couldn’t or shouldn’t be virtualized?  It’s really not much of a secret that I’m a big proponent of virtualization and going as far with it as you can is something that I find myself preaching a lot.  I do it for a number of reasons and I’m starting to see more and more people follow a similar train of thought.

From what I’ve observed there is usually a common migration to virtualization in an organization.  I refer to it in a three step progression.

  1. Consolidation
  2. Cool Features
  3. Disaster Recovery

Several years ago I was a Network Manager at a mid-sized company.  Like most we were in the midst of serious server sprawl and needed to do something about it.  Just saying “No” didn’t seem to work.  We still had a rack full of 1U HP DL360 servers for varying tasks and groups.  There were several for accounting apps that couldn’t run on the same system due to app conflict, then we had a couple with other apps that had Java conflicts….and even more for groups that just didn’t want to share resources or weren’t comfortable with it.  All of these systems would sit at 5% utilization all day long sucking up power (that we didn’t have) and eating in to cooling (that we had even less of).  This was the reason we first dipped in to virtualization and I refer to this as the consolidation phase.  It’s the way to contain server sprawl and do it on low tier applications so you aren’t risking anything major.

We still see a lot of companies in the midst of the consolidation phase but ultimately they move in to the Management phase, the “cool features” step.  This is where they have virtualized the low tier apps and start to see the benefit of VMware.  They now can VMotion machines around and do maintenance without downtime.  They like VMware HA for redundancy and FT even more.  Storage VMotion allows for easy storage migrations, again with no downtime.  They also get comfortable managing, backing up, and working with VMware at this level.  They start to think “Now, wouldn’t it be cool to just VMotion the Exchange server to another server for maintenance instead of that 8 hour downtime on a weekend?”.  But they are scared…..  Things like Exchange and SQL worry them.

The final stage is the Disaster Recovery stage.  I have several customers in this right now and it’s something I talk about a lot.  In fact, I did a keynote on this very subject at the Carolinas VMware Summit in the summer.  What really pushes people to the next level isn’t core VMware functionality, it’s Site Recovery Manager.  They start looking hard at their DR strategy and what they need to do to simplify it.  They get a taste of SRM and see how easy it makes DR planning and, more importantly, testing.  They see that they can easily test their DR plan any time they want without impacting production and without taking days to build an environment and then days again after the test to tear it down.  Those Tier 2, 3, and 4 apps take no time at all in the plan, but those pesky Tier 1 apps still have an inch thick play book to cover each time the plan is tested.  There are people out there running a single VM on a single ESX server just for this capability.  They get the abstraction and portability of virtual machines while still making sure that super-app gets all the resources it wants.

So what is stopping you from virtualizing those Tier 1 applications?  If you say performance I ask you to check again.  In most cases people are scared about I/O performance under any virtualization product.  Look at this white paper by VMware.  A single vSphere server can do 350K IOPS!  If you have an application that needs more than that on a single server I’d like to see it.  Here is another great comparison showing Oracle native against Oracle under VMware.  That’s also a very good blog for performance related information.

So why do we see people shy away from virtualizing Tier 1 apps?  They don’t have the necessary information to make them feel comfortable doing it.  One thing we do at the start of any engagement is to gather information, and sometimes a lot of it.  We have excellent tools to go look at a customer’s applications to see what performance requirements they have.  Too many times we see people just P2Ving a large app and having serious performance problems because they didn’t do the work ahead of time.  VMware’s own Capacity Planner tool that partners can use is really good at looking at servers to gather CPU, memory, and I/O requirements.  With this information you can really architect out your environment to handle any load.  That’s the key.  You have to build a good architecture before you start virtualizing these heavy hitter applications and it’s often something that gets overlooked.  Virtualization has gotten common and with common comes complacency.  When people get complacent they overlook the details that make or break a new deployment.
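As a rough illustration of what you do with that gathered data, here is a small Python sketch that takes peak CPU, memory, and I/O requirements and estimates how many hosts you need while leaving some headroom.  Every number here is invented for the example, the 25% headroom is just an assumption, and this is not how Capacity Planner itself works.  It is only the back-of-the-napkin version of the same idea.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Workload:
    name: str
    peak_cpu_ghz: float
    peak_mem_gb: float
    peak_iops: int


@dataclass
class HostSpec:
    cpu_ghz: float
    mem_gb: float
    iops: int


def hosts_needed(workloads: List[Workload], host: HostSpec,
                 headroom_pct: int = 25) -> Tuple[int, str]:
    """Return (host count, limiting resource) for the given peak demands."""
    usable = (100 - headroom_pct) / 100  # fraction of each host we plan to use
    demand = {
        "cpu": sum(w.peak_cpu_ghz for w in workloads) / (host.cpu_ghz * usable),
        "memory": sum(w.peak_mem_gb for w in workloads) / (host.mem_gb * usable),
        "iops": sum(w.peak_iops for w in workloads) / (host.iops * usable),
    }
    bottleneck = max(demand, key=demand.get)
    return max(1, math.ceil(demand[bottleneck])), bottleneck


# Hypothetical Tier 1 workloads and host; none of these figures are measured.
tier1 = [
    Workload("exchange", peak_cpu_ghz=8.0, peak_mem_gb=32.0, peak_iops=3000),
    Workload("sql", peak_cpu_ghz=12.0, peak_mem_gb=64.0, peak_iops=8000),
    Workload("oracle", peak_cpu_ghz=10.0, peak_mem_gb=48.0, peak_iops=6000),
]
host = HostSpec(cpu_ghz=40.0, mem_gb=96.0, iops=50000)

count, limit = hosts_needed(tier1, host)
print(count, limit)  # prints: 2 memory
```

In this made-up case memory, not I/O, is what forces the second host, which is exactly the kind of thing you only find out if you gather the numbers first.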

Once you have the information you need and the requirements for your applications you can then start specifying the equipment and I/O infrastructure.  We have customers now going full speed with 10Gb connectivity and Fibre Channel over Ethernet (FCoE).  They do this to give those really high-end applications the I/O that they need.  While most people will read that and think “We can’t possibly afford that!” they need to look at what it really costs them to deploy applications in a legacy model.  If your standard ESX deployment is 6 or 8 Gb Ethernet connections and 2 or 4 4Gb Fibre Channel connections what is that costing you in switches, cabling, power, cooling, and management?  You will find that these new consolidated fabric solutions are not much, if any, more expensive than deploying more of these split fabric infrastructures.
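To put some numbers behind that question, here is a quick Python sketch that counts cables and raw bandwidth per host for the split fabric case above (using the upper end: 8 Gb Ethernet plus 4 4Gb Fibre Channel) against a converged design.  The two 10Gb FCoE links per host and the 16 host count are my assumptions for the example, and adding Ethernet and Fibre Channel line rates together is only a rough comparison.  It doesn’t price anything; it just shows how many ports and cables you are really talking about.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LinkGroup:
    count: int      # number of links of this type per host
    speed_gb: int   # nominal line rate of each link in Gb/s


def totals(groups: List[LinkGroup]) -> Tuple[int, int]:
    """Return (cables per host, raw Gb/s per host)."""
    cables = sum(g.count for g in groups)
    bandwidth = sum(g.count * g.speed_gb for g in groups)
    return cables, bandwidth


# Upper end of the split fabric deployment described in the post.
split_fabric = [LinkGroup(count=8, speed_gb=1), LinkGroup(count=4, speed_gb=4)]
# Assumption: two 10Gb FCoE links per host in the converged design.
converged = [LinkGroup(count=2, speed_gb=10)]

HOSTS = 16  # hypothetical cluster size

for label, design in (("split", split_fabric), ("converged", converged)):
    cables, gbps = totals(design)
    print(label, cables, "cables/host,", gbps, "Gb/s/host,", cables * HOSTS, "cables total")
# prints:
# split 12 cables/host, 24 Gb/s/host, 192 cables total
# converged 2 cables/host, 20 Gb/s/host, 32 cables total
```

Every one of those cables is also a switch port, so the per-port cost of 10Gb gear has to be weighed against needing about a sixth as many ports in this example.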

In the majority of organizations the Tier 1 apps are SQL, Oracle, and Exchange-based services.  What people miss is that these really aren’t I/O heavy.  Sure, they can do a LOT of small transactions but that’s not a problem with VMware or even “legacy” Fibre Channel connectivity.  Be smart when moving those systems to VMware by planning your I/O, CPU, and memory but also pay attention to your disk layout.  Again, another common problem we see is a Tier 1 application being thrown on a datastore in use by other VMs and causing a problem.  It’s also common to see back-end spindles shared so even though the administrator has the application on a low use datastore it’s still fighting for the same spindles.  Gathering good performance requirements and a well planned architecture will stop that problem well before anything gets deployed.

So, in conclusion, get moving on those Tier 1 apps.  If you aren’t sure how to gather reliable data on performance requirements get with a good VMware and storage partner.  They can make the difference between a successful deployment and one where you spend your nights tracking down performance issues.

I use my Mid-2009 MacBook Pro all the time. As a consultant it goes everywhere with me and I run a lot of different applications all at the same time. While I’m happy with the 2.53GHz Core2Duo CPU and 4GB of RAM (though I want to up it to 8GB), the hard drive is always a problem. Having a 4GB offline mail file with Outlook can cause a bit of disk churning and even the shipping 320GB 7200RPM drive struggles when I’m doing that and other things.

So today I finally took the plunge and put a solid state drive (SSD) in my MacBook Pro. Well….this is the second time, really. The first time was a few weeks ago when I tried to install an OCZ Vertex 250GB and had a lot of problems. The drive wouldn’t even make it through an install of Snow Leopard without failing in the MacBook Pro but worked fine in my Mac Pro. After some investigating it looks like that drive with the latest firmware has a lot of issues in the latest generation MacBook Pros…so that drive went back. It’s a shame too because I liked the size (250GB) and the features such as garbage collection and TRIM support.

This time I went with the “gold standard” and got an Intel X25 G2 160GB drive. This is the state of the art SSD with Intel’s latest 34nm process. It’s not as large as the OCZ but I figured Intel would be the most compatible, and so far I’ve been happy with it. One thing to note is that there can be large differences between SSDs on the market. All SSDs are not even close to being equal. So you need to be informed when you decide to buy one. For example, most OEMs including Apple and Dell ship Samsung SSDs as a factory option but unfortunately these drives are now the slowest of the bunch. That’s why I didn’t just get one in the MBP when I ordered it. I won’t go in to a lot of detail on the technology and what’s out there because Anandtech already did a really great job here.  It’s also a good idea to read about the underlying function of SSD drives and why they can slow down over time.  Things like garbage collection and TRIM support are very valuable.  Unfortunately, while Windows 7 supports TRIM my beloved OSX does not.  I’m hoping Steve puts that in there soon so my SSD doesn’t get slower over time and require a reformat/restore.

Installation was very easy.  My MacBook Pro is the latest generation Unibody with the new 7 hour battery.  Just flip it over and use a small Phillips screwdriver to remove the ten screws.  The entire bottom panel comes off as one piece and you have easy access to the HD, RAM, and battery.  Notice how easy the battery is to remove?  Don’t let anyone say these new batteries are not “easily replaceable”.  Two screws come out that hold the old HD in and the new one goes in its place.  Done.  A quick reinstall of Snow Leopard and a restore of my apps and profile from Time Machine and I was ready to go.

Anyway…the performance difference is amazing.  I have an Xbench comparison showing the shipping drive and the new SSD here.  While numbers are good a video I made is far more compelling.  This video shows a script launching the following apps on each drive..the 7200RPM is on the left and the new SSD is on the right:

  • Adium
  • FireFox
  • iPhoto
  • iTunes
  • Evernote
  • Microsoft Excel
  • Microsoft Word
  • Fusion (Booting an XP VM)

The spinning drive does it in 1:31 while the new SSD does it all in 0:30.  The video is here.  This is, by far, the best performance improvement you can make to a system.  I recently sold my Mac Pro and am ordering an i7 27″ iMac and plan to give it the same treatment.
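To put those two launch times in perspective, here is a quick back-of-the-envelope calculation in Python. It only uses the 1:31 and 0:30 figures quoted above; the helper name to_seconds is just something I made up for the example.

```python
def to_seconds(mmss: str) -> int:
    """Convert an 'M:SS' string such as '1:31' to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


hdd = to_seconds("1:31")  # shipping 7200RPM drive
ssd = to_seconds("0:30")  # Intel SSD

speedup = hdd / ssd       # how many times faster the SSD run was
saved = hdd - ssd         # seconds saved on this one launch script

print(f"HDD {hdd}s, SSD {ssd}s, {speedup:.1f}x faster, {saved}s saved")
```

That works out to roughly a threefold improvement on this particular launch script, which says nothing about other workloads but matches what the video shows.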


This is an update and bit of a rewrite of an earlier post I made here.  I wanted to add more detail and information now that I’ve worked with the Nexus 1000v more.

If you’ve read a lot of the Varrow blogs you’ll see information and talk about Cisco’s Nexus technology and products.  To be blunt, it can be confusing and a bit convoluted.  The hardware products, the Nexus 5000 and 7000, have been out for a little while now and we’re seeing more and more interest in those as companies see the need for high speed connectivity and the benefits of FCoE, especially now that FCoE is a standard.  But I think the Nexus 1000v is still a mystery to a lot of people.  There is a lot of “It’s really cool!” information out there but not a lot on how it really works.  One thing I’ve found is that the concept of the 1000v is very nebulous to many customers.  I can white board out how it works, the pieces, how they talk but it’s hard to “get it” without seeing it.  For that reason I created a nice demonstration video that is here.  If you’re new to the 1000v I highly suggest you check it out and see if it makes things click a little better for you.

Now let’s get in to some specifics.

The New Distributed Switch

Before getting in to the 1000v you first need to understand the new distributed switch in vSphere.  The Nexus 1000v uses this framework to provide much of its functionality.  In fact, from within vCenter the 1000v looks very much like the standard shipping distributed switch (dvSwitch).  But, while you can make changes to the dvSwitch in the vCenter GUI you can’t do that with the 1000v.  All changes to that must be done via the Nexus-OS command-line environment.  The idea behind the distributed switch is you now have one place to do the configuration and management of the network connectivity for your entire ESX cluster.  In the past you had to manually create vSwitches and Port Groups on every ESX server you brought up in the cluster.  With the distributed switch you configure your Port Groups just like you want in vCenter.  When a new ESX server moves in to the cluster and is joined to the dvSwitch (distributed virtual switch) it automatically sees the configuration in place.  It’s really great.  I’d say network configuration is the hardest part of the install for most VMware administrators and often a point of contention between the server admins and the network admins..maybe almost as bad as the contention between server guys and storage guys!

The benefit for organizations with separate teams, such as network and server admins, is that this moves “the line” back.  Before we virtualized everything the demarcation point between the server and the network was pretty simple, it was at the switch port.  Now that we’ve virtualized that has moved and sits somewhere in the ESX server so now a lot of that responsibility falls, correctly or not, on to the server admin.  They are the ones configuring networking and networking policies.  While the Nexus 1000v doesn’t move the line back to the physical switch port it gives the network team a virtual switch in the cluster that looks, feels, and acts like a hardware Cisco switch just like they are used to managing.

Components of the Nexus 1000v

When deploying the 1000v there are a few moving parts, not a lot but a couple.  The two primary pieces are the Virtual Supervisor Module (VSM) and the Virtual Ethernet Module (VEM). One thing that helps a lot of people is to imagine the 1000v like a multi-slot chassis switch such as a 6509 or a 4507R.  The VSMs are the supervisor management modules and the VEMs are the I/O port blades that provide connectivity.  The 1000v can have up to two VSMs and 64 VEMs which equates to a 66-slot chassis switch.  That’s a big switch!

So let’s look at the components in a bit more detail.  First is the VSM as it is the central management authority.

Virtual Supervisor Module

The VSM is a virtual version of a hardware supervisor module.   Some switches have redundant modules and the Nexus 1000v is no different.  The VSM runs as a virtual machine  on an ESX server in the cluster.  To provide fault tolerance you can run a second VSM in a standby role.  The secondary VSM will take over if the primary should fail.  Like a physical Supervisor Module there really isn’t any extra maintenance or management needed to run the second one.  Any configuration change on the primary is automatically replicated to the secondary.  So…I don’t see why you wouldn’t run two.  Also like many physical switches the supervisor modules do not have stateful failover, meaning they don’t share current information.  When the primary supervisor fails the secondary reboots, reads in the configuration, and starts working.

It’s important to note that the VSM is not in the data path.  That means it does not stop data flow through the cluster if the VSM should go down.  You won’t be able to make management changes but your VMs will continue to talk.  As of version 1.2, 4.0(4)SV1(2), you can now vMotion the VSM as long as it’s on an ESX server with a VEM installed that the VSM itself manages.  Just be sure to exclude it from DRS so you know when it moves!

Virtual Ethernet Module

This is where it gets really cool.  On each vSphere server in the cluster you install the Nexus 1000v VEM.  This is a piece of software that ties that server in to the distributed switch of the Nexus 1000v.  When you install and attach the VEM to the VSM for the cluster that ESX server’s VEM appears as a module on the switch.  So just like you log in to a Cisco chassis switch and do a “show module” you’ll do the same here.  Each ESX server will be its own module.  And that’s why it’s called a Virtual Ethernet Module.  There is an example of the “show module” output below.

How the Modules Communicate

So you have one or more VSMs installed as the brains of the switch and you also have some VEMs installed on ESX servers to act as the access port modules.  How do they talk to each other?  This is an important thing to understand as you need to get this before you start rolling out your VSMs.  Basically the Nexus 1000v uses a couple of VLANs as layer 2 communication channels.  These are the control and packet VLANs.  Their purpose is:

  • The Packet VLAN is used by protocols such as CDP, LACP, and IGMP.

The Control VLAN is used for the following:

  • VSM configuration commands to each VEM, and their responses.
  • VEM NetFlow exports are sent to the VSM, where they are then forwarded to a NetFlow Collector.
  • VEM notifications to the VSM, for example a VEM notifies the VSM of the attachment or detachment of ports to the distributed virtual switch (DVS).

Cisco recommends that the Control VLAN and Packet VLAN be separate VLANs, and that they also be on separate VLANs from those that carry data.  In the original version of the 1000v the VSMs and VEMs had to be on the same layer 2 network.  As of version 1.2 you can have layer 3 connectivity from the VSMs to the VEMs.  Just be sure all of the VEMs that are managed by these VSMs can talk to each other via layer 2.  To put it simply, the VSM can be on another network but all the VEMs it manages must be together.  With Cisco releasing a hardware VSM appliance you’ll see more use cases for this functionality.  The vCenter server can be on a different layer 3 network than the VSM; that doesn’t matter.  Before you start deploying VSMs and VEMs you need to decide which VLANs you plan to use for packet and control and then create them on the physical switches that connect your ESX servers.  To reiterate, the Control and Packet VLANs are for layer 2 communication.  This means that no IP addresses or default gateways need to be assigned to these VLANs, so just pick any unused VLAN numbers and get them going.
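If you want to sanity check a VLAN plan before touching any switches, the recommendation above is easy to express in a few lines of Python. This is only a sketch of the rules described in this post, not a Cisco tool; the function name check_vlan_plan and the VLAN numbers in the example are hypothetical.

```python
def check_vlan_plan(control: int, packet: int, data_vlans: set[int]) -> list[str]:
    """Return a list of problems with a proposed control/packet VLAN plan.

    The checks mirror the recommendation in the text: control and packet
    should be separate VLANs, and neither should carry data traffic.
    """
    problems = []
    for name, vlan in (("control", control), ("packet", packet)):
        if not 1 <= vlan <= 4094:
            problems.append(f"{name} VLAN {vlan} is outside 1-4094")
        if vlan in data_vlans:
            problems.append(f"{name} VLAN {vlan} also carries data")
    if control == packet:
        problems.append("control and packet share one VLAN")
    return problems


# Hypothetical plan: control on 10, packet on 11, VM data on 300 and 400.
print(check_vlan_plan(10, 11, {300, 400}))   # no problems reported
print(check_vlan_plan(10, 10, {10, 300}))    # three problems reported
```

An empty list means the plan follows the recommendation. Sharing a VLAN is something Cisco recommends against rather than forbids, so treat the messages as warnings.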

Deploying the Components

This particular post is not meant to be an in-depth guide to installing the 1000v.  Cisco offers their documentation here.  The point of this post is to give you a deeper understanding of how everything fits together so you better understand the function of the 1000v and how it integrates in to your environment.

Deploying the VSM is very simple.  It’s distributed as an .ova/.ovf file set and is easily added to the cluster via the vCenter client. Version 1.2 and above make installation even easier than the original version.  On the original you just imported the .ovf and booted the VSM.  Once booted you were presented the familiar “basic setup” Cisco walkthrough.  As of 1.2 you import the VSM appliance using a .ova file which walks you through a GUI wizard to configure those same items.  It just makes it simpler and faster than before.  In the wizard you’ll be asked for the packet and control VLANs as well as a “domain ID”.  This just defines the VSMs and VEMs that are working together.  Pick an unused number, I start with 1, and continue.  If you have multiple 1000v installations you’ll need to assign unique Domain IDs to each one.

Installing the VEMs isn’t much more difficult.  You have the option of doing it manually on each ESX server or using Update Manager.  The choice is yours.  Manually is easy as it’s a single command once you get the package file on the server.  There is no manual configuration needed on each VEM.  Everything will come from the VSM or vCenter.  If you scp the .vib file to the ESX server all you need to type to install it is:

esxupdate -b *.vib update

Once that is done you can check the installation by using the “vem status” command, such as:

[root@labclt-esx01 ~]# vem status

VEM modules are loaded

Switch Name    Num Ports   Used Ports  Configured Ports  MTU     Uplinks
vSwitch0       32          10          32                1500    vmnic0
DVS Name       Num Ports   Used Ports  Configured Ports  Uplinks
nexus1kv       256         51          256               vmnic1

VEM Agent (vemdpa) is running

That’s it.  Nothing else to do on each system.  Going forward I recommend updating the VEMs using Update Manager.  You can just add a repository in to Update Manager to get patches for the VEMs.  Makes it very easy.  The VSMs take a bit more work to update…but we’ll cover that in another post soon.
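If you have a lot of hosts you may not want to eyeball that output on every one of them. Here is a minimal Python sketch that checks captured “vem status” text for the two lines that matter and pulls out the DVS name. How you collect the output (ssh, a script on the host, whatever) is up to you, and the function names are mine, not anything Cisco ships.

```python
def vem_healthy(status_output: str) -> bool:
    """True when the output reports the modules loaded and the agent running."""
    return ("VEM modules are loaded" in status_output
            and "VEM Agent (vemdpa) is running" in status_output)


def dvs_names(status_output: str) -> list[str]:
    """Return the switch names listed under the 'DVS Name' header row."""
    names, in_dvs = [], False
    for line in status_output.splitlines():
        if line.startswith("DVS Name"):
            in_dvs = True            # rows after this header are DVS entries
        elif in_dvs:
            if not line.strip():     # a blank line ends the DVS table
                break
            names.append(line.split()[0])
    return names


sample = """VEM modules are loaded

Switch Name    Num Ports   Used Ports  Configured Ports  MTU     Uplinks
vSwitch0       32          10          32                1500    vmnic0
DVS Name       Num Ports   Used Ports  Configured Ports  Uplinks
nexus1kv       256         51          256               vmnic1

VEM Agent (vemdpa) is running
"""

print(vem_healthy(sample), dvs_names(sample))
```

The parsing assumes the layout shown in the example above; if a later VEM release changes the output format the header match would need adjusting.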

The final piece is some configuration in vCenter.  You have to install the Nexus 1000v plugin as well as perform some configuration on the VSM to connect it to vCenter.  This is easily done by opening a web browser and pointing it to the IP of the VSM you installed.  From there you can install the plugin, which is just a simple XML file to allow communication between the VSMs and vCenter.  When that happens your new distributed switch appears and the real fun begins….

Beyond Installation

Assuming everything went well and all the pieces are talking you are ready for configuration.  The first step is to bring each ESX host in the cluster in to the distributed switch and assign some unused NICs to uplink ports.  You can think of uplinks as playing the role that separate vSwitches played in the standard vSwitch setup we are all used to.  Basically, an uplink defines which traffic goes over which physical NICs.  These are how you separate traffic.  So if you want vMotion over one set of NICs and VM Network traffic over another set you’d create two uplinks and assign the appropriate NICs to each.  Here is an example config for an uplink:

port-profile type ethernet system-uplink
 vmware port-group
 switchport mode trunk
 switchport trunk allowed vlan 20-30
 no shutdown
 system vlan 10-12
 state enabled

This is a very simple uplink called system-uplink.  It trunks VLANs 20-30, so traffic in any port-group on those VLANs will flow over any NICs assigned to this uplink.  Notice the “system vlan” option.  This specifies that the special VLANs such as Control and Packet can flow over this uplink.   You need to do this if you’re going to move those to the dvSwitch, which you’ll probably eventually do.  One thing worth mentioning is the “switchport trunk allowed vlan” command.  This tells the switch which VLANs can ride on this uplink.  This is how you relate port-groups to uplinks.  So a port-group used for VM traffic on VLAN 21 will ride over this uplink and these NICs.  If another port-group is used for VM traffic or a service console on VLAN 40 it will not go over this uplink and you’d need to define another uplink that could service that VLAN.  If you assign more than one NIC to an uplink they must be in a channel group.  The Nexus 1000v does not run spanning tree, so it doesn’t like loops and will disable the connections if it sees duplicates.
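The way allowed VLANs tie port-groups to uplinks can be sketched in Python. This is just an illustration of the matching logic described above, not how the VSM implements it; parse_vlan_list and uplinks_for are names I made up, and the only uplink in the example is the system-uplink profile with its 20-30 range.

```python
def parse_vlan_list(spec: str) -> set[int]:
    """Expand a VLAN list such as '20-30' or '10,20-30' into a set of IDs."""
    vlans = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-")
            vlans.update(range(int(low), int(high) + 1))  # range is inclusive
        else:
            vlans.add(int(part))
    return vlans


def uplinks_for(vlan: int, uplinks: dict[str, set[int]]) -> list[str]:
    """Return the names of the uplinks whose allowed list includes this VLAN."""
    return [name for name, allowed in uplinks.items() if vlan in allowed]


uplinks = {"system-uplink": parse_vlan_list("20-30")}

print(uplinks_for(21, uplinks))  # VLAN 21 rides over system-uplink
print(uplinks_for(40, uplinks))  # VLAN 40 has no uplink yet
```

An empty result for a VLAN is the cue that you need to define another uplink, or widen an allowed list, before port-groups on that VLAN will pass traffic.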

If you’re using 10Gb CNA adapters you may only have two ports on the server which means, usually, everything will ride over a single uplink with redundant connections.  If you’re concerned about bandwidth problems due to heavy traffic features such as vMotion or FT you can use QoS to alleviate that.  If you’re in a legacy type configuration with six or eight 1Gb connections you’ll probably have multiple uplinks just like you had multiple vSwitches before the move to the dvSwitch.  Some of the best practices we’ve had for a while still apply here, they may just be implemented a little differently.

Now that all hosts are attached to the distributed switch you can start creating port groups.  Creating port groups is done from the VSM via the Cisco NX-OS command-line.  NX-OS is the Nexus OS and is very similar to IOS.  Anyone used to IOS should feel at home and will find many nice enhancements.  Within NX-OS you create Port Profiles, which turn in to port groups within vCenter and the distributed switch.  Below is an example configuration for a Port Profile used to let VMs talk to the network.

port-profile VM_VLAN300
 vmware port-group
 switchport mode access
 switchport access vlan 300
 no shutdown
 state enabled

In many ways it is similar to a port configuration on a physical switch.  This creates a port group in vCenter named VM_VLAN300 that lets VMs talk to other devices on VLAN 300.  Below is a screenshot showing this port group available in the settings for a VM.

Screen shot 2009-07-17 at 7.35.11 PM
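Since access port profiles all follow the same pattern, they are easy to generate if you have a lot of VLANs to set up. Here is a small Python sketch that builds the same commands shown in the VM_VLAN300 example. The function name is hypothetical and the output is just text for you to review and paste in to the VSM; it does not talk to the switch.

```python
def access_port_profile(name: str, vlan: int) -> str:
    """Build the NX-OS commands for a simple access port profile."""
    if not 1 <= vlan <= 4094:
        raise ValueError(f"VLAN {vlan} is outside 1-4094")
    lines = [
        f"port-profile {name}",
        " vmware port-group",
        " switchport mode access",
        f" switchport access vlan {vlan}",
        " no shutdown",
        " state enabled",
    ]
    return "\n".join(lines)


# Generate profiles for a couple of VM VLANs.
for vlan in (300, 400):
    print(access_port_profile(f"VM_VLAN{vlan}", vlan))
    print()
```

Only the commands from the example above are emitted, so anything extra you need (QoS, security features and so on) would still have to be added by hand.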

Behind the scenes the VMs and port groups act like Cisco devices.  As mentioned before, each ESX server is like a switch module.  For example:

vc4labsw# show mod
Mod  Ports  Module-Type                      Model              Status
---  -----  -------------------------------- ------------------ ------------
1    0      Virtual Supervisor Module        Nexus1000V         active *
3    248    Virtual Ethernet Module          NA                 ok
4    248    Virtual Ethernet Module          NA                 ok

Mod  Server-IP        Server-UUID                           Server-Name
---  ---------------  ------------------------------------  --------------------
1    10.13.7.49       NA                                    NA
3    10.13.7.53       33393935-3234-5553-4539-32344e36464b  vc4lab1.varrow.com
4    10.13.7.52       33393935-3234-5553-4539-32344e36464c  vc4lab2.varrow.com

From that you can see there are two VEMs installed.  One on vc4lab1 and the other on vc4lab2, my two vSphere servers in this lab.  When a new connection is made to the dvswitch, whether it’s a VM or something like a Service Console, that connection is given a Virtual Ethernet interface or port assignment.  The connection will always keep the same port number no matter where it lives in the cluster.  Example again:

vc4labsw# show int bri

--------------------------------------------------------------------------------
Interface     VLAN   Type Mode   Status  Reason                   MTU
--------------------------------------------------------------------------------
Veth1         300    virt access up      none                     1500
Veth2         300    virt access up      none                     1500
Veth3         1      virt access up      none                     1500
Veth4         1      virt access up      none                     1500
Veth5         400    virt access up      none                     1500
Veth6         400    virt access up      none                     1500

vc4labsw# show int ve3

Vethernet3 is up
 Port description is JN_Win2k8, Network Adapter 1
 Hardware is Virtual, address is 0050.568a.31cf
 Owner is VM "JN_Win2k8", adapter is Network Adapter 1
 Active on module 3
 VMware DVS port 352
 Port-Profile is VM_VLAN1
 Port mode is access
 Rx
 13616 Input Packets 13220 Unicast Packets
 23 Multicast Packets 373 Broadcast Packets
 1039898 Bytes
 Tx
 198323 Output Packets 87263 Unicast Packets
 13016 Multicast Packets 98044 Broadcast Packets 19 Flood Packets
 80092259 Bytes
 8 Input Packet Drops 0 Output Packet Drops

In the example I performed a “show int brief” to get a concise list.  Notice it shows the interface and VLAN but not what is actually connected to each port.  If I narrow it down and look at ve3, Virtual Ethernet 3, I see that this is my Windows 2008 server named JN_Win2k8.  Notice the information displayed.  Very similar to a port on a physical switch.  No matter where this machine goes those statistics and information follow it.
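Because the output looks like any other Cisco switch, the usual scripting tricks apply. As a rough sketch, here is some Python that groups the Veth interfaces from a captured “show int brief” by VLAN. It assumes the column layout shown above, and veths_by_vlan is a name I picked for the example.

```python
from collections import defaultdict


def veths_by_vlan(output: str) -> dict[int, list[str]]:
    """Group Veth interface names by the VLAN shown in 'show int brief'."""
    groups = defaultdict(list)
    for line in output.splitlines():
        fields = line.split()
        # Data rows start with an interface name followed by a numeric VLAN.
        if len(fields) >= 2 and fields[0].startswith("Veth") and fields[1].isdigit():
            groups[int(fields[1])].append(fields[0])
    return dict(groups)


sample = """Interface     VLAN   Type Mode   Status  Reason                   MTU
Veth1         300    virt access up      none                     1500
Veth2         300    virt access up      none                     1500
Veth3         1      virt access up      none                     1500
Veth4         1      virt access up      none                     1500
Veth5         400    virt access up      none                     1500
Veth6         400    virt access up      none                     1500
"""

print(veths_by_vlan(sample))
```

Header and separator lines are skipped because their second field is not a number, so you can feed it the raw command output.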

Conclusion

There is a lot of good information out there on the Nexus 1000v now.  Cisco has a very good community site here.  If you’re starting out with the new virtual switch hopefully this post was useful and filled in some of the gaps.  Take a look at the video referenced above as well as the documentation from Cisco.  Cisco has very good documentation and install guides that will walk you through a normal deployment.  The key is to extrapolate what you need from those documents and fit it in to your environment.

On Friday Cisco finally released something I’ve been waiting on…version 1.2 of the Nexus 1000v virtual switch. This should fix a bug I’ve hit as well as add some nice new features, including a GUI for the initial configuration of the VSM.  The update is, of course, free to existing customers.

The new features include:

  • GUI for initial configuration drops install time down to 7 minutes.  FAST!  It also automates some of the configuration steps such as adding in the vCenter plug-in.  There is a video showing the GUI here.
  • Virtual Service Domains define a logical group of virtual machines protected by a virtual appliance.  All the traffic entering or leaving the group will be sent to that particular virtual appliance.  Allows for integration in to APIs such as VMsafe.  Not a lot of supporting information out there on this yet, just the configuration options.
  • Cisco standard security features such as IP Source Guard to defend against IP/MAC spoofing, DHCP Snooping to detect rogue DHCP servers, and Dynamic ARP Inspection to detect ARP attacks and MAC spoofing.
  • The connection between VSMs and VEMs can now be over a layer 3 link, but all the hosts that VSM controls must be in the same layer 2 network.  So a VSM on one network can control multiple hosts on another IP network, as long as those hosts can all talk via layer 2.  While not something I’ve hit yet I think this will be more important as we see the VSM hardware appliance.
  • You can now vMotion the VSM as long as it’s on a VEM that it is managing.  Nice!  That’s a welcome addition.  Oh yeah, you still can’t let DRS move your VSMs around but I don’t see that as a big deal.
  • iSCSI Multipathing feature to allow multipathing of iSCSI data across uplinks in a port-channel.  This means the Nexus 1000v will use different paths and routes between the server and storage.
  • You can now pin vEthernet, Control, and Packet data to specific uplinks in a vPC-HM configuration.
  • Lots more little things you can see here in the Release Notes.

The big takeaway is the GUI for installation and the security enhancements that make the virtual switch even more like a full blown hardware switch.  Those people deploying virtual desktops should be especially interested in the security features.  This really gives you physical desktop/network security in the virtual world.

The upgrade process is straightforward, but has a lot of steps.  You can upgrade the VEMs via Update Manager if you want.  The install guide recommends shutting down VMs while you do this, though rolling them to another host should be fine.  One interesting note is that the switch stays at the current feature level until you get everything updated and instruct the VSM to raise the feature level to version 1.2.  All that and step-by-step instructions are available here.

We’re doing an engineering day at Varrow tomorrow and I’m doing a demo of the 1000v for some of the guys so this is good timing.

Power supplies aren’t the most exciting thing to talk about…but have you ever arrived to install new gear and found that the plug in your hand didn’t match the outlet on the wall? Then it gets real exciting…. The great thing about Cisco is they have a module and a cable to make anything talk to almost anything else. The down side to that is that it can get complicated to work out what you need and how to put it together. One thing we’ve found lately is to pay extra attention to specifying power when implementing Cisco Nexus 7000 switches. You need to know which power supply you are spec’ing out plus which cables you need to order.

The Nexus 7000 line has two available power supply options:

  • 6KVA (6,000 volt-ampere) Power Supply
  • 7.5KVA (7,500 volt-ampere) Power Supply

Each power supply has two inputs…two cables, basically. The thing here is that they don’t just act as redundant power connections. Think of it as two power supplies inside each one with separate feeds. As you’ll see in a minute you don’t have to use both, but you can for redundancy and capacity. The first thing you need to do when looking at the 7K’s power supplies for your environment is to run the power calculator with the planned configuration and see what your options are. The link for that is http://tools.cisco.com/cpc/.

Let’s talk a little bit about the redundant power paths and what we can do with those. There are four different power redundancy modes you can use.

Redundancy Mode – Description

  • Combined – No redundancy; power available to the system is the sum of power outputs of all power supplies in the chassis
  • Power supply redundancy (N+1) – Guards against failure of one of the power supplies; power available to the system is the sum of the two least-rated power supplies
  • Input source redundancy (grid redundancy) – Guards against failure of one input circuit (grid); for grid redundancy, each input on the power supply is connected to an independent AC feed, and power available to the system is the minimum power from either of the input sources (grids)
  • Power supply and input source redundancy (full redundancy) – System default redundancy mode; guards against failure of either one power supply or one AC grid, and power available is always the minimum of input source and power supply redundancy

Some more detail:

  • Combined – The aggregate power coming in to the system will be used to power the chassis and modules. If you lose a feed or a power supply the system may go in to a fault state. No redundancy here!
  • Power Supply Redundancy – This is the default. This requires, of course, two or three installed power supplies. The available power to the system is the sum of all power supplies minus one. This way a loss of one power supply will not affect the chassis.
  • Input/Grid Redundancy – In this configuration it is assumed that each power supply has a connection to two different power grids or input sources. The system can use the amount of power provided by one grid so that the complete failure of the other does not affect the system’s functionality.
  • Full Redundancy – This is the one you want! This will size the power so that the loss of a power supply or one power grid will not affect the system. The down side here is that this option gives the least amount of usable power to the system as it has to protect against both a power supply and grid failure.

So how do these different options affect the available power to the system? Here is another great chart. Note that this chart assumes each power supply is dual connected to a 220v feed. You can use 110v on the 6KVA units..and even use both so a single power supply has a 220v feed and a 110v feed. Refer to Cisco’s documentation for those configurations, this is just to show the difference between redundancy modes.

Power Supply  Number of        Power Supply Redundancy Mode
Type          Power Supplies   Combined   N+1       Grid      Full
------------  ---------------  ---------  --------  --------  --------
6.0kW         1                6000W      6000W*    6000W*    6000W*
              2                12,000W    6000W     6000W     6000W
              3                18,000W    12,000W   9000W     9000W
              4                24,000W    18,000W   12,000W   12,000W
7.5kW         1                7500W      7500W*    7500W*    7500W*
              2                15,000W    7500W     7500W     7500W
              3                22,500W    15,000W   11,250W   11,250W
              4                30,000W    22,500W   15,000W   15,000W

So you can see that adding redundancy can reduce available power. That makes sense..as you have to have that built-in overhead.
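The numbers in the chart follow a simple pattern, which you can reproduce with a few lines of Python. To be clear, these formulas are my reading of the chart and the mode descriptions, not Cisco’s power calculator. They assume identical power supplies, each dual connected to 220v feeds, and at least two supplies installed (the single-supply rows are starred in the chart and are not modelled here). The function name is made up for the example.

```python
def available_power(rating_w: int, supplies: int) -> dict[str, int]:
    """Estimate usable watts per redundancy mode for identical supplies."""
    if supplies < 2:
        raise ValueError("this sketch only models two or more power supplies")
    combined = rating_w * supplies        # everything adds up, no reserve
    n_plus_1 = rating_w * (supplies - 1)  # hold one whole supply in reserve
    grid = combined // 2                  # losing one grid halves every supply
    full = min(n_plus_1, grid)            # must survive either kind of failure
    return {"combined": combined, "n+1": n_plus_1, "grid": grid, "full": full}


for rating in (6000, 7500):
    for count in (2, 3, 4):
        print(rating, count, available_power(rating, count))
```

With three 7.5kW supplies, for example, this gives 22,500W combined, 15,000W for N+1, and 11,250W for both grid and full redundancy, which matches the chart. Always confirm real designs with the Cisco power calculator.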

Here is a picture of the 6KVA power supply.


And here is one of the 7.5KVA power supply.


Notice the difference? The 7.5KVA power supply has to be ordered with the type of cable you want. The 6KVA unit doesn’t come with cables. You have a large selection to choose from and you can change them later. Just make sure you order the right ones! So, the takeaways here are to run your expected (and future!) configuration through the Cisco power calculator. It will spit out all the information we just went over so you know your available options and power for each redundancy mode. Then make sure you connect and power everything correctly.
