Ballad of the Converged IT Guy

The inestimable Greg Ferro once said that what’s needed in modern IT are men & women whose skillsets are shaped like a capital “T.”

“You’ve got to have broad experience and familiarity with various technologies, see? That’s the wide part at the top of the T. And then you’ve got to be deep on some things in your portfolio. Maybe it’s storage, or WAN or who the hell knows? That’s the leg of the T. Right?”

Right you are Greg.

I’m paraphrasing of course, but Ferro’s description of good IT Guys (or Gals) appeals to me because it more or less describes my career in IT: I’ve touched lots of tech and gone deep in a few things.

I have breadth and depth in my portfolio, in other words. Yay me and yay for confirmation bias!

Unfortunately for me, this reality -while good in Greg’s eyes perhaps- usually results in me being labeled with the contemptible catch-all “IT Generalist.”

“You’re sort of an IT Generalist,” the recruiter says. “Is that fair?”

Sigh.

“Yes. I’m a systems guy, but really, an IT Generalist works too,” I reply.

This is how I reluctantly describe myself to others and even on the About the Author Page.

But I hate that term, “Generalist…” it’s too prosaic, too generic, too….general. I want it banished and replaced, and I don’t want to be known as an IT Generalist.

So what to replace it with? Re-writing the About the Author page with “Hi, I’m Jeff Wilson, a T-shaped IT Guy” doesn’t exactly inspire confidence and might make the reader question my sanity. Systems Engineer is nice, but doesn’t hint at my rudimentary skills at herding packets with DSCP values intact across a WAN, does it?

What I need -no, what we Generalists need!- is some sizzle for our T-shaped career story boards. We need to make IT Generalism seem sexy, without using that lame g word. What we need is a way to converge all our skills -broad and deep- into one smart, market-aware, cloud-hip, fully-qualified and routable term that…

hey wait a second.

converge.

Converged IT Guy.

That’s it.

Contains Silicon Valley buzzword? Check.

Easy to remember? Check.

Clever, and with only a little bit of smart-ass spunk? Check.

Descriptive? Not really, but better than Generalist.

Done.

I’m a Converged IT Guy. And this is my ballad.

Ballad of the Converged IT Guy

 I’ve touched lots of tech, from VoIP to SQL,

the LAMP stack & PowerShell

I don’t fear multicast or spanning tree

I once wrote a Valentine’s to LACP 

Yay though, I’m a Converged IT Guy

Block, file, object, LUNs and Vols, NFS,

but wack-wack file sharing’s the best
seen every file extension from east to west
Kilo, Mega, Giga, Tera
I dedupe, replicate and compress

Yay though, I’m a Converged IT Guy

Gone deep on storage and virtualization

but change out the tapes from time to time

From Voice and an analog PBX

to Layer 4 Load Balancing and Cisco’s FEX

Yay though, I’m a Converged IT Guy

ITIL, HIPAA, PCI & SOX

Waterfall, Agile and now DevOps

Declarative, Imperative

Concatenate, quiesce, compile

Yay Though, I’m a Converged IT Guy

Lo, the whiteboard is my kryptonite

and IT Siloes are my enemy

Yay Though, I’m a Converged IT Guy

Thoughts on EVO:RAIL

So if you work in IT, and even better, if you’re in the virtualization space of IT as I am, you have to know that VMworld is happening this week.

VMworld is just about the biggest vCelebration of vTechnologies there is. Part trade-show, part pilgrimage, part vLollapalooza, VMworld is where all the sexy new vProducts are announced by VMware, makers of ESXi, vSphere, vCenter, and so many other vThings.

It’s an awesome show…think MacWorld at the height of Steve Jobs but with fewer hipsters and way more virtualization engineers. Awesome.

And I’ve never been :sadface:

And 2014’s VMworld was a doozy. You see, the vGiant announced a new 2U, four node vSphere & vSAN cluster-in-a-box hardware device called EVO:RAIL. I’ve been reading all about EVO:RAIL for the last two days and here’s what I think as your loyal Hyper-V blogger:

  • What’s in a name? Right off the bat, I was struck by the name for this appliance. EVO:RAIL…say what? What’s VMware trying to get across here? Am I to associate EVO with the fast Mitsubishi Lancers of my youth, or is this EVO in the more Manga/Anime sense of the word? Taken together, EVO:RAIL also calls to mind sci-fi, does it not? You could picture Lt. Cmdr Data talking about an EVO:RAIL to Cmdr Riker, as in “The Romulan bird of prey is outfitted with four EVO:RAIL phase cannons, against which the Enterprise’s shields stand no chance.” Speaking of guns: I also thought of the US Navy’s Railguns; long range kinetic weapons designed to destroy the Nutanix/SimpliVity enemy.
  • If you’re selling an appliance, do you need vExperts? One thing that struck me about VMware’s introduction of EVO:RAIL was their emphasis on how simple it is to rack, stack, install, deploy and virtualize. They claim the “hyper-converged” 2U box can be up and running in about 15 minutes; a full rack of these babies could be computing for you in less than 2 hours. They’ve built a sexy HTML 5 GUI to manage the thing, no vSphere console or PowerCLI in sight. It’s all pre-baked, pre-configured, and pre-built for you, the small-to-medium enterprise. It’s so simple a help desk guy could set it up. So with all that said, do I still need to hire vExperts and VCDX pros to build out my virtualization infrastructure? It would appear not. Is that the message VMware is trying to convey here?
  • One SKU for the Win: I can’t be the only one that thinks buying the VMware stack is a complicated & time-consuming affair. Chris Wahl points out that EVO:RAIL is one SKU, one invoice, one price to pay, and VMware’s product page confirms that, saying you can buy a Dell EVO:RAIL or a Fujitsu EVO:RAIL, but whatever you buy, it’ll be one SKU. This is really nice. But why? VMware is famous for licensing its best-in-class features…why mess with something that’s worked so well for them?
    Shades of Azure simplicity here

    One could argue that EVO:RAIL is a reaction to simplified pricing structures on rival systems…let’s be honest with ourselves. What’s more complicated: buying a full vSphere and/or vHorizon suite for a new four node cluster, or purchasing the equivalent amount of computing units in Azure/AWS/Google Compute? What model is faster to deploy, from sales call to purchasing to receiving to service? What model probably requires consulting help?

    Don’t get me wrong, I think it’s great. I like simple menus, and whereas buying VMware stuff before was like choosing from a complicated, multi-page, multi-entree menu, now it’s like buying burgers at In ‘n Out. That’s very cool, but it means something has changed in vLand.

  • I love the density: As someone who’s putting the finishing touches on my own new virtualization infrastructure, I love the density in EVO:RAIL. 2 Rack Units with E5-26xx class Xeons packing 6 cores each means you can pack about 48 cores into 2U! Not bad, not bad at all. The product page also says you can have up to 16TB of storage in those same 2U (courtesy of VSAN) and while you still need a ToR switch to jack into, each node has 2x10GbE SFP+ or Copper. Which is excellent. RAM is the only thing that’s a bit constrained; each node in an EVO:RAIL can only hold 192GB of RAM, a total of 768GB per EVO:RAIL. In comparison, my beloved 2U pizza boxes offer more density in some places, but less overall, given that 1 Pizza Box = one node. In the Supermicros I’m racking up later this week, I can match the core count (4×12 Core E5-46xx), improve upon the RAM (up to 1TB per node) and easily surpass the 16TB of storage. That’s all in 2U and all for about $15-18k. Where the EVO:RAIL appears to really shine is in VM/VDI density. VMware claims a single EVO:RAIL is built to support 100 General Purpose VMs or to support up to 250 VDI sessions, which is f*(*U#$ outstanding.
  • I wonder if I can run Hyper-V on that: Of course I thought that. Because that would really kick ass if I could.

Overall, a mighty impressive showing from VMware this week. Like my VMware colleagues, I pine for an EVO:RAIL in my lab.

I think EVO:RAIL points to something bigger though…This product marks a shift in VMware’s thinking, a strategic reaction to the changes in the marketplace. This is not just a play against Nutanix and other hyper-converged vendors, but against the simplicity and non-specialist nature of cloud Infrastructure as a Service. This is a play against complexity in other words…this is VMware telling the marketplace that you can have best-in-class virtualization without worst-in-class licensing pain and without hiring vExperts to help you deploy it.

Tales from the Hot Lane

A few brief updates & random thoughts from the last few days on all the stuff I’ve been working on.

Refreshing the Core at work: Summer’s ending, but at work, a new season is advancing, one rack unit at a time. I am gradually racking up & configuring new compute, storage, and network as it arrives; It Is Not About the Hardware™, but since you were wondering: 64 Ivy Bridge cores and about 512GB RAM, 30TB of storage, and Nexus 3k switching.

Ahh, the Nexus line. Never had the privilege to work on such fine switching infrastructure. Long time admirer, first-time NX-OS user. I have a pair of them plus a Layer 3 license so the long-term thinking involves not just connecting my compute to my storage, but connecting this dense stack northbound & out via OSPF or static routes over a fault-tolerant HSRP or VRRP config.

To do that, I need to get familiar with some Nexus-flavored acronyms that aren’t familiar to me: virtual port channels (vPC), Control Plane Policing (CoPP), VRF, and oh-so-many-more. I’ll also be attempting to answer the question once and for all: what spanning tree mode does one use to connect a Nexus switch to a virtualization host running Hyper-V’s converged switching architecture? I’ve used portfast in the lab on my Catalyst, but the lab switch is five years old, whereas this Nexus is brand new. And portfast never struck me as the right answer, just the easy one.
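
My working hunch, pending a sanity check with TAC: the NX-OS analogue to portfast is the edge port type. Here’s a sketch of what I expect the host-facing interface to look like, with the interface number & description as placeholders rather than my actual config:

```
interface Ethernet1/10
  description Hyper-V host uplink (converged virtual switch)
  switchport mode trunk
  spanning-tree port type edge trunk
```

The edge trunk type tells the Nexus not to wait out the listening/learning states on a port that faces a host rather than another switch, which is the behavior portfast gave me on the old Catalyst.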

To answer those questions and more, I have TAC and this excellent tome provided gratis by the awesome VAR who sold us much of the equipment.

Into the vCPU Blender goes Lync: Last Friday, I got a call from my former boss & friend who now heads up a fast-growing IT department on the coast. He’s been busy refreshing & rationalizing much of his infrastructure as well, but as is typical for him, he wants more. He wants total IT transformation, so as he’s built out his infrastructure, he laid the groundwork to go 100% Microsoft Lync 2013 for voice.

Yeah baby. Lync 2013 as your PBX, delivering dial tone to your endpoints, whether they are Bluetooth-connected PC headsets, desk phones, or apps on a mobile.

Forget software-defined networking. This is software-defined voice & video, with no special server hardware, cloud services, or any of the other typical expensive nonsense you’d see in a VoIP implementation.

If Lync 2013 as PBX is not on your IT Bucket List, it should be. It was something my former boss & I never managed to accomplish at our previous employer on Hyper-V.

Now he was doing it alone. On a fast VMware/Nexus/NetApp stack with distributed vSwitches. And he wanted to run something by me.

So you can imagine how pleased I was to have a chat with him about it.

He was facing one problem which threatened his Go Live date: Mean Opinion Score, or MOS, a simple 0-5 score Lync provides to its administrators that summarizes call quality. MOS is a subset of a hugely detailed Media Quality Summary Report, detailed here at TechNet.

My friend was scoring a .6 on his MOS. He wanted it to be at 4 or above prior to go-live.

So at first we suspected QoS tags were being stripped somewhere between his endpoint device and the Lync Mediation VM. Sure enough, Wireshark proved that out; a Distributed vSwitch (or was it a Nexus?) wasn’t respecting the tag, resulting in a sort of half-duplex QoS if you will.

He fixed that, ran the test again, and still: .6. Yikes! Two days to go live. He called again.

That’s when I remembered the last time we tried to tackle this together. You see, the Lync Mediation Server is sort of the real PBX component in Lync Enterprise Voice architecture. It handles signalling to your endpoints, interfaces with the PSTN or a SIP trunk, and is the one server workload that, even in 2014, I’d hesitate to virtualize.

My boss had three of them. All VMs on three different VMware hosts across two sites.

I dug up a Microsoft whitepaper on virtualizing Lync, something we didn’t have the last time we tried this. While Redmond says Lync Enterprise Voice on top of VMs can work, it’s damned expensive from a virtualization host perspective. MS advises:

  • You should disable hyperthreading on all hosts.
  • Do not use processor oversubscription; maintain a 1:1 ratio of virtual CPU to physical CPU.
  • Make sure your host servers support nested page tables (NPT) and extended page tables (EPT).
  • Disable non-uniform memory access (NUMA) spanning on the hypervisor, as this can reduce guest performance.

Talk about Harshing your vBuzz. Essentially, building Lync out virtually with Enterprise Voice forces you to go sparse on your hosts, which is akin to buying physical servers for Lync. If you don’t, into the vCPU blender goes Lync, and out comes poor voice quality, angry users, bitterness, regret and self-punishment.
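
If you’re going to virtualize those roles anyway, reservations are the bare minimum. Here’s a hedged PowerCLI sketch of the idea -the VM name and sizes are placeholders, not his actual config- assuming an existing Connect-VIServer session:

```powershell
# Reserve CPU & memory for a Lync Mediation Server VM so the scheduler can't starve it
Get-VM "lync-med-01" | Get-VMResourceConfiguration |
    Set-VMResourceConfiguration -CpuReservationMhz 8000 -MemReservationGB 16
```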

Anyway, he did as advised, put some additional vCPU & memory reservations in place on his hosts, and yesterday, whilst I was toiling in the Hot Lane, he called me from Lync via his mobile.

He’s a married man just like me, but I must say his voice sounded damn sexy as it was sliced up into packets, sent over the wire, and converted back to analog on my mobile’s speaker. A virtual chest bump over the phone was next, then we said goodbye.

Another Go Live Victory (by proxy). Sweet.

Azure Outage: Yesterday’s bruising hours-long global Azure outage affected Virtual Machines, storage blobs, web services, database services and HD Insight, Microsoft’s service for big data crunching. As it unfolded, I navel-gazed when I wanted to be helping. There was literally nothing I could do. Had I some crucial IaaS or PaaS in the Azure stack, I’d be shit out of luck, just like the rest. I felt quite helpless; refreshing Mary Jo’s page and the Azure dashboard didn’t help. I wondered what the problem was; it’s been a difficult week for Microsofties whether on-prem or in Azure. Had to be related to the update cycle, I thought.

On the plus side, Azure Active Directory services never went down, nor did several other services. Office 365 stayed up as well, though it is built atop separate-but-related infrastructure in my understanding.

Lastly, I pondered two thoughts: if you’re thinking of reducing your OpEx by replacing your DR strategy with an Azure Site Recovery strategy, does this change your mind? And if you’re building out Azure as your primary IaaS or PaaS, do you just accept such outages or do you plan a failback strategy?

Labworks : Towards a 100% Windows-defined Daisetta Lab: What’s next for the Daisetta Lab? Well, I have me an AMD Duron CPU, a suitable motherboard, a 1U enclosure with PSU, and three Keepin’ it RealTek NICs. Oh, I also have a case of the envies, envies for the VMware crowd and their VXLAN and NSX and of course VMworld next week. So I’m thinking of building a Network Virtualization Gateway appliance. For those keeping score at home, that would mean from Storage to Compute to Network Edge, I’d have a 100% Windows lab environment, infused with NVGRE which has more use cases than just multi-tenancy as I had thought.

Stack Builders ‘R Us

This is a really lame but (IMHO) effective drawing of what I think of as a modern small/medium business enterprise ‘stack’:

stack

As you can see, just about every element of a modern IT is portrayed.

Down at the base of the pyramid, you got your storage. IOPS, RAID, rotational & SSD, snapshots, dedupes, inline compression, site to site storage replication, clones and oh me oh my…all the things we really really love are right here. It’s the LUN-tastic layer and always will be.

Above that, your compute & memory. The denser the better; 2U pizza boxes don’t grow on trees and the business isn’t going to shell out more $$$ if you get it wrong.

Above that, we have what my networking friends would call the “Underlay network.” Right. Some cat 6, twinax, fiber, whatever. This is where we push some packets, whether to our storage from our compute, northbound out to the world, southbound & down the stack, or east/west across it. Leafs, spines, encapsulation, control & data planes, it’s all here.

And going higher -still in Infrastructure Land mind you- we have the virtualization layer. Yeah baby. This is what it’s all about, this is the layer that saved my career in IT and made things interesting again. This layer is designed to abstract all that is beneath it with two goals in mind: cost savings via efficiency gains & ease of provisioning/use.

And boy, has this layer changed the game, hasn’t it?

So if you’re a virtualization engineer like I am, maybe this is all you care about. I wouldn’t blame you. The infrastructure layer is, after all, the best part of the stack, the only part of the stack that can claim to be #Glorious.

But in my career, I always get roped in (willingly or not) into the upper layers of the stack. And so that is where I shall take you, if you let me.

Next up, the Platform layer. This is the layer where that special DBA in your life likes to live. He optimizes his query plans atop your Infrastructure layer, and though he is old-school in the ways of storage, he’s learned to trust you and your fancy QoS .vhdxs, or your incredibly awesome DRS fault-tolerant vCPUs.

Or maybe you don’t have a DBA in your Valentine’s card rotation. Maybe this is the layer at which the devs in your life, whether they are running Eclipse or Visual Studio, make your life hell. They’re always asking for more x (x= memory, storage, compute, IP), and though they’re highly-technical folks, their eyes kind of glaze over when you bring up NVGRE or VXLAN or Converged/Distributed Switching or whatever tech you heart at the layer below.

Then again, maybe you work in this layer. Maybe you’re responsible for building & maintaining session virtualization tech like RDS or XenApp, or maybe you maintain file shares, web farms, or something else.

Point is, the people at this layer are platform builders. To borrow from the automotive industry, platform guys build the car that travels on the road infrastructure guys build. It does no good for either of us if the road is bumpy or the car isn’t reliable, does it? The user doesn’t distinguish between ‘road’ and ‘car’, do they? They just blame IT.

Next up: software & service layer. Our users exist here, and so do we. Maybe for you this layer is about supporting & deploying Android & iPhone handsets and thinking about MDM. Or maybe you spend your day supporting old-school fat client applications, or pushing them out.

And finally, now we arrive to the top of the pyramid. User-space. The business.

This is where (and the metaphor really fits, doesn’t it?) the rubber meets the road, ladies and gentlemen. It’s where the business user drives the car (platform) on the road (infrastructure). This is where we sink or swim, where wins are tallied and heroes made, or careers are shattered and the cycle of failure>begets>blame>begets>fear>begets failure begins in earnest.

That’s the stack. And if you’re in IT, you’re in some part of that stack, whether you know it or not.

But the stack is changing. I made a silly graphic for that too. Maybe tomorrow.

Respect my Certificate Authoritah

Fellow #VFD3 Delegate and Chicago-area vExpert Eric Shanks has recently posted two great pieces on how to setup an Active Directory Certificate Authority in your home lab environment.

Say what? Why would you want the pain of standing up some certificate & security infrastructure in your home lab?

Eric explains:

Home Lab SSL Certificates aren’t exactly a high priority for most people, but they are something you might want to play with before you get into a production environment.

Exactly.

Security & Certificate infrastructure are a weak spot in my portfolio so I’ve been practicing/learning in the Daisetta Lab so that I don’t fail at work. Here’s how:

As I was building out my lab, I knew three things: I wanted a routable Fully Qualified Domain Name for my home lab, I was focused on virtualization but should also practice for the cloud and things like ADFS, and I wanted my lab to be as secure as possible (death to port 80 & NTLM!)

With those loose goals in mind, I decided I wanted DaisettaLabs.net to be legit. To have some Certificate Authority bonafides…to get some respect in the strangely federated yet authoritarian world of certificate authorities, browser and OS certificate revocations, and yellow Chrome browser warning screens.

Too legit, too legit to quit

So I purchased a real wildcard SSL certificate from a real Certificate Authority back in March. It cost about $96 for one year, and I don’t regret it at all because I’m using it now to secure all manner of things in Active Directory, and I’ll soon be using it as DaisettaLabs.net on-prem begins interfacing with DaisettaLabs.net in Azure (it already is, via Office 365 DirSync, but I need to get to the next level and the clock is ticking on the cert).
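
Once the wildcard PFX is in hand, getting it into the machine store where AD FS, IIS and friends can bind to it is a quick bit of PowerShell; a minimal sketch, with the path & filename as placeholders:

```powershell
# Import the wildcard cert into the Local Machine Personal store
$pfxPwd = Read-Host -Prompt "PFX password" -AsSecureString
Import-PfxCertificate -FilePath "C:\certs\wildcard.daisettalabs.net.pfx" `
    -CertStoreLocation Cert:\LocalMachine\My -Password $pfxPwd
```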

Building on Eric’s excellent posts, I suggest to any Microsoft-focused IT Pros that you consider doing what I did. I know it sucks to shell out money for an SSL certificate, but labwork is hard so that work-work isn’t so hard.

So, go follow Eric’s outline, buy a cert, wildcard or otherwise (got mine at Comodo, there’s also an Israeli CA that gives SSL certs for free, but it’s a drawn-out process) and stand up a subordinate CA (as opposed to an on-prem-only Root CA) and get your 443 on!

Man it sucks to get something so fundamentally wrong. Reader Chris pointed out a few inaccuracies and mistakes about my post in the comments below.

At first I was indignant, then thoughtful & reflective, and finally resigned. He’s right. I built an AD Root -not a subordinate as that’s absurd- Certificate Authority in the lab.
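
For the record, the lab root CA build itself boils down to a couple of lines of PowerShell on Server 2012 R2; a minimal sketch, with the CA name and validity period as placeholders:

```powershell
# Add the AD CS role plus the management tools
Install-WindowsFeature ADCS-Cert-Authority -IncludeManagementTools

# Stand this box up as an Enterprise Root CA (fine for a lab; plan a proper hierarchy for production)
Install-AdcsCertificationAuthority -CAType EnterpriseRootCA `
    -CACommonName "DaisettaLabs-Root-CA" -ValidityPeriod Years -ValidityPeriodUnits 5
```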

Admittedly, I’m not strong in this area. Thanks to Chris for his coaching and I regret if I misled anyone.

#StorageGlory Achieved : 30 Days on a Windows SAN

 Behold, these three remain. File. Block. Object. And the greatest of these is block.  – Sr. Systems Engineer St. Paul, in a letter to confused storage engineers in Thessalonika

Right. So a couple weeks back I teased the hardware specs of the new storage array I built for the Daisetta Lab at home.

Software-defined. x86. File and block. Multipath. Intel. And some Supermicro. Storage utopia up in the Daisetta Lab

My idea was to combine all types of disks -rotational 3.5″ & 2.5″ drives, SSDs, mSATAs, hell, I considered USB- into one tight, well-built storage box for my lab and home data needs. A sort of Storage Ark, if you will; all media types were welcome, but only if they came in twos (for mirroring & parity’s sake, of course) and only if they rotated at exactly 7200 RPM and/or leveled their wear evenly across the silica.

And onto this unholy motley crue of hard disks I slapped a software architecture that promised to abstract all the typical storage driver, interface, and controller nonsense away, far, far away in fact, to a land where the storage can be mixed, the controllers diverse, and by virtue of the software-definition bits, network & hypervisor agnostic. In short, I wanted to build an agnostic #StorageGlory box in the Daisetta Lab.

Right. So what did I use to achieve this? ZFS and Zpools?

Hell no, that’s so January.

VSAN? Ha! I’m no Chris Wahl.

I used Windows, naturally.

That’s right. Windows. Server 2012 R2 to be specific, running Core + Infrastructure GUI with 8GB of RAM, and some 17TB of raw disk space available to it. And a little technique developed by the ace Microsoft server team called Tiered Storage Spaces.

Was a #StorageGlory Achievement Unlocked, or was it a dud?

Here’s my review after 30 days on my Windows SAN: san.daisettalabs.net.

The Good

It doesn’t make you pick a side in either storage or storage-networking: Do you like abstracted pools of storage, managed entirely by software? Put another way, do you hate your RAID controller and crush on your old-school NetApp filer, which seemingly could do everything but object storage?

When I say block, do you instinctively say file? Or vice-versa?

Well then my friend, have I got a storage system for your lab (and maybe production!) environment: Windows Storage Spaces (now with Tiering!) offers just about everything guys like you or me need in a storage system for lab & home media environments. I love it not just because it’s Microsoft, but also because it doesn’t make me choose between storage & storage-networking paradigms. It’s perhaps the ultimate agnostic storage technology, and I say that as someone who thinks about agnosticism and storage.

A lot.

You know what I’m talking about. Maybe today, you’ll need some block storage for this VM or that particular job. Maybe you’re in a *nix state of mind and want to fiddle with NFS. Or perhaps you’re feeling bold & courageous and decide to try out VMware again, building some datastores on both iSCSI LUNs and NFS shares. Then again, maybe you want to see what SMB 3.0 is all about, the MS fanboys sure seem to be talking it up.

The point is this: I don’t care what your storage fancy is, but for lab-work (which makes for excellence in work-work) you need a storage platform that’s flexible and supportive of as many technologies as possible and is, hopefully, software-defined.

And that storage system is -hard to believe I’ll grant you- Windows Server 2012 R2.

I love storage and I can’t think of one other storage system -save for maybe NetApp- that lets me do crazy things like store .vmdks inside of .vhdxs (oh the vIrony!), use SMB 3 multichannel over the same NICs I’m using for iSCSI traffic, create snapshots & clones just like big filers all while giving me the performance-multiplier benefits of SSDs and caching and a reasonable level of resiliency.

File this one under WackWack\StorageGlory\Achieved\Windows boys and girls.

I can do it all with Storage Spaces in 2012 R2.
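
To make “all of it” a little more concrete, here’s roughly what serving file & block from the same box looks like; a hedged sketch that assumes the File Server and iSCSI Target Server roles are installed, with the share name, paths and initiator IQN as placeholders:

```powershell
# File cake: an SMB 3 share for Hyper-V hosts or general lab use
New-SmbShare -Name "VMs" -Path "D:\Shares\VMs" -FullAccess "DAISETTA\HyperV-Hosts"

# Block cake: a .vhdx-backed iSCSI LUN living on the same tiered virtual disk
New-IscsiVirtualDisk -Path "D:\iSCSI\csv01.vhdx" -SizeBytes 500GB
New-IscsiServerTarget -TargetName "hyperv-csv" `
    -InitiatorIds "IQN:iqn.1991-05.com.microsoft:node1.daisettalabs.net"
Add-IscsiVirtualDiskTargetMapping -TargetName "hyperv-csv" -Path "D:\iSCSI\csv01.vhdx"
```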

As I was thinking about how to write about Storage Spaces, I decided to make a chart, if only to help me keep it straight. It’s rough but maybe you’ll find it useful as you think about storage abstraction/virtualization tech:

Storage-Compared

And yes. Ex post facto dedupe is a made up term. By me. It’s Latin for “After the fact, dedupe,” because I always scheduled my dedupes for Saturday night, when the IO load on the filer was low. Ex post facto dedupe is in contrast to some newer storage companies that offer inline compression & dedupe, but none of the ones above offer this, sadly.

It’s easy to build and supports your disks & controllers: This is a Microsoft product. Which means it’s easy to deploy & build for your average server guy. Mine’s running on a very skinny, re-re-purposed SanDisk Ready Cache SSD. With Windows 2012 R2 server running the Infrastructure Management GUI (no explorer.exe, just Server Manager + your favorite snap-ins), it’s using about 6GB of space on the boot drive.

And drivers for the Intel C226 SATA controller, the LSI 9218si SAS card, and the extra ASMedia 1061 controller were all installed automagically by Windows during the build.

The only other system that came close to being this easy to install -as a server product- was Oracle Solaris 11.2 Beta. It found, installed drivers for, and exposed all controllers & disks, so I was well on my way to going the ZFS route again, but figured I’d give Windows a chance this time around.

Nexenta 4, in contrast, never loaded past the Install Community Edition screen.

It’s improved a lot over 2012: Storage Spaces debuted almost two years ago now, and I remember playing with it at work a bit. I found it to be a mind-f*** as it was a radically different approach to storage within the Windows server context.

I also found it to be slow, dreadfully slow even, and not very survivable. Though it did accept any disk I gave it, it didn’t exactly like it when I removed a USB drive during an extended write test. And it didn’t take the disk back at the conclusion of the test either.

Like everything else in Microsoft’s current generation, Storage Spaces in 2012 R2 is much better, more configurable, easier to monitor, and more tolerant of disk failures.

It also has something for the IOPS speedfreak inside all of us.

Storage Spaces, abstract this away

Tiered Storage Spaces & Adjustable write cache: Coming from ZFS & the Adaptive Replacement Cache, the ZFS Intent Log, the SLOG, and L2ARC, I was kind of hooked on the idea of using massive amounts of my ECC RAM to function as a sort of poor man’s NVRAM.

Windows can’t do that, but with Tiered Storage Spaces, you can at least drop a few SSDs in your array (in my case three x 256GB 840 EVO & one 128GB Samsung 830), mix them into your disk pool, and voila! Fast read-cache, with a Microsoft-flavored MRU/LFU algorithm of some type keeping your hottest data on the fastest disks and your old data on the cheep ‘n deep rotationals.

What’s more, going with Tiered Storage Spaces gives you a modest 1GB write cache, but as I found out, you can increase that up to 10GB.

Which I naturally did while building this guy out. I mean, who wouldn’t want more write-cache?
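
For the curious, the write-back cache size gets set at virtual disk creation time, not after the fact. A minimal sketch, assuming a pool named “DaisettaPool” (the name, sizes and tier split are placeholders):

```powershell
# Define the SSD & HDD tiers once per pool
$ssdTier = New-StorageTier -StoragePoolFriendlyName "DaisettaPool" -FriendlyName "SSDTier" -MediaType SSD
$hddTier = New-StorageTier -StoragePoolFriendlyName "DaisettaPool" -FriendlyName "HDDTier" -MediaType HDD

# Tiered, mirrored virtual disk with a 10GB write-back cache instead of the 1GB default
New-VirtualDisk -StoragePoolFriendlyName "DaisettaPool" -FriendlyName "TieredVD01" `
    -StorageTiers $ssdTier, $hddTier -StorageTierSizes 500GB, 5TB `
    -ResiliencySettingName Mirror -WriteCacheSize 10GB
```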

But there’s a huge gotcha buried in the Technet and blog posts I found about this. I wanted to pool all my disks together into as large of a single virtual disk as possible, then pack iSCSI-connected .vhdxs, SMB 3 shares, and more inside that single, durable & tiered virtual disk. What I didn’t want was several virtual disks (it helped me to think of virtual disks as a sort of Aggregate) with SMB 3 shares and vhdx files stored haphazardly between them.

Which is what you get when you adjust the write-cache size. Recall that I have a capacity of about 17TB raw among all my disks. Building a storage pool, then a virtual disk with a 10GB write cache gave me a tiered virtual disk with a maximum size of about 965GB.  More on that below.

It can be wicked fast, but so is RAID 0: Check out my standard SQLIO benchmark routine, which I run against all storage technologies that come my way. The 1.5 hour test is by no means comprehensive -and I’m not saying the IOPS counter is accurate at all (showing max values across all tests by the way)- but I like this test because it lets me kick the tires on my array, take her out for a spin, and see how she handles.

And with a “Simple” layout (no redundancy, probably equivalent to RAID 0), she handles pretty damn well, but even I’m not crazy enough to run tiered storage spaces in a simple layout config:

These three tests (1.5 hours each, identical setup against multiple configs) were done locally on the array, not over my home network

What’s odd is how poorly the array performed with 10GB of “Write Cache.” Not sure what happened here, but as you can see, latency spiked higher during the 10GB write cache write phase of the test than just about every other test segment.

Something to do with parity no doubt.

For my lab & home storage needs, I settled on a Mirror 2-way parity setup that gives me moderate performance with durability in mind, though not much, as you’ll see below.

Making the most of my lab/home network and my NICs: Recall that I have six GbE NICs on this box. Two are built into the Supermicro board itself (Intel), and the other four come by way of a quad-port Intel I350-T4 server NIC.

Anytime you’re planning to do a Microsoft cluster in the 1GbE world, you need lots of NICs. It’s a bit of a crutch in some respects, especially in iSCSI. Typically you VLAN off each iSCSI NIC for your Hyper-V hosts and those NICs do one thing and one thing only: iSCSI, or Live Migration, or CSV etc. Feels wasteful.

But on my new storage box at home, I can use them for double-duty: iSCSI (or LM/CSV) as well as SMB 3. Yes!

Usually I turn off Client for Microsoft Networks (the SMB file sharing toggle in NIC properties) on each dedicated NIC (or vEthernet), but since I want my file cake & my block cake at the same time, I decided to leave SMB on for all iSCSI vEthernet adapters (from the physical & virtual hosts) and leave SMB on the iSCSI NICs on san.daisettalabs.net as well.
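
That checkbox, in PowerShell terms, is the ms_msclient binding; a quick sketch, with the adapter name as a placeholder:

```powershell
# Leave "Client for Microsoft Networks" enabled on an iSCSI adapter so it can carry SMB 3 too
Set-NetAdapterBinding -Name "vEthernet (iSCSI-10)" -ComponentID ms_msclient -Enabled $true

# Audit the binding state across every adapter on the box
Get-NetAdapterBinding -ComponentID ms_msclient
```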

The end result? This:

Storage Networking - All of the Above Approach

NIC  Name      VLAN  IP               Function
1    MGMT      100   192.168.100.15   MGMT & SMB3
2    CLNT      102   192.168.102.15   Home net & SMB3
3    iSCSI-10  10    172.16.10.x      iSCSI & SMB3
4    iSCSI-11  10    172.16.11.x      iSCSI & SMB3
5    iSCSI-12  10    172.16.12.x      iSCSI & SMB3

That’s five, count ‘em five NICs (or discrete channels, more specifically) I can use to fully soak in the goodness that is SMB 3 multichannel, with the cost of only a slightly unsettling epistemological question about whether iSCSI NICs are truly iSCSI if they’re doing file storage protocols.

Now SMB 3 is so transparent (on by default) you almost forget that you can configure it, but there’s quite a few ways to adjust file share performance. Aidan Finn argues for constraining SMB 3 to certain NICs, while Jose Barreto details how multichannel works on standalone physical NICs, a pair in a team, and multiple teams of NICs.
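
If I do end up following Aidan’s advice, the constraint itself is a single cmdlet per target server, run from the Hyper-V host side; a sketch, with the server FQDN real but the interface aliases as placeholders:

```powershell
# Constrain SMB 3 multichannel traffic toward the SAN to specific local interfaces
New-SmbMultichannelConstraint -ServerName "san.daisettalabs.net" `
    -InterfaceAlias "vEthernet (iSCSI-10)", "vEthernet (iSCSI-11)"

# Verify which NICs multichannel is actually using
Get-SmbMultichannelConnection
```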

I haven’t decided which model to follow (though on san.daisettalabs.net, I’m not going to change anything or use Converged switching…it’s just storage), but SMB 3 is really exciting and it’s great that with Storage Spaces, you can have high performance file & block storage. I’ve hit 420MB/sec on synchronous file copies from san to host and back again. Outstanding!

I Finally got iSNS to work and it’s…meh: One nice thing about san.daisettalabs.net is that that’s all you need to know…the FQDN is now the resident iSCSI Name Server, meaning it’s all I need to set on an MS iSCSI Initiator. It’s a nice feature to have, but probably wasn’t worth the 30 minutes I spent getting it to work (hint: run set-wmiinstance before you run iSNS cmdlets in powershell!) as iSNS isn’t so great when you have…

SMI-S, which is awesome for Virtual Machine Manager fans: SMI-S, you’re thinking, what the hell is that? Well, it’s a standardized framework for communicating block storage information between your storage array and whatever interface you use to manage & deploy resources on your array. Developed by no less an august body than the Storage Networking Industry Association (SNIA), it’s one of those “standards” that seem like a good idea, but you can’t find it much in the wild as it were. I’ve used SMI-S against a NetApp Filer (in the Classic DoT days, not sure if it works against cDoT) but your Nimbles, your Pures, and other new players in the market get the same funny look on their face when you ask them if they support SMI-S.

“Is that a vCenter thing?” they ask.

Sigh.

Microsoft, to its credit, does. Right on Windows Server. It’s a simple feature you install and two or three PowerShell commands later, you can point Virtual Machine Manager at it and voila! Provision, delete, resize, and classify iSCSI LUNs on your Windows SAN, just like the big boys do (probably) in Azure, only here, we’re totally enjoying the use of our corpulent .vhdx drives, whereas in Azure, for some reason, they’re still stuck on .vhds like rookies. Haha!

Single Pane o’ glass in VMM with SMI-S, GUIDs galore and more for the Hyper-V set

It’s a very stable storage platform for Microsoft Clustering: I’ve built a lot of Microsoft Hyper-V clusters. A lot. More than half a dozen in production, and probably three times that in dev or lab environments, so it’s like second nature to me. Stable storage & networking are not just important factors in Microsoft clusters, they are the only factors.

So how is it building out a Hyper-V cluster atop a Windows SAN? It’s the same, and different at the same time, but, unlike so many other cluster builds, I passed the validation test on the first attempt with green check marks everywhere. And weeks have gone by without a single error in the Failover Clustering snap-in; it’s great.
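
For reference, the validation pass and the cluster build are both scriptable; a minimal sketch with placeholder node names and cluster IP:

```powershell
# Full validation report: storage, networking, system configuration
Test-Cluster -Node "node1.daisettalabs.net", "node2.daisettalabs.net", "node3.daisettalabs.net"

# If the report comes back clean, stand up the cluster
New-Cluster -Name "HVC1" -StaticAddress 192.168.100.20 `
    -Node "node1.daisettalabs.net", "node2.daisettalabs.net", "node3.daisettalabs.net"
```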

The Bad

It’s expensive and seemingly not as redundant as other storage tech: When you build your storage pool out of offlined disks, your first choice is going to involve (just like other storage abstraction platforms) disk redundancy. Microsoft makes it simple, but doesn’t really tell you the cost of that redundancy until later in the process.

Recall that I have 17TB of raw storage on san.daisettalabs.net, organized as follows:

Disk Type                 Quantity  Size   Format    Speed            Function
WD Red 2.5" with NASWARE  6         1TB    4KB AF    SATA 3 5400RPM   Cheep 'n deep
Samsung 840 EVO SSD       3         256GB  512 byte  250MB/read       Tiers not fears
Samsung 830 SSD           1         128GB  512 byte  250MB/read       Tiers not fears
HGST 3.5" Momentus        6         2TB    512 byte  105MB/r/w        Cheep 'n deep

Now, according to my trusty IOPS Excel calculator, if I were to use traditional RAID 5 or RAID 6 on that set of spinners, I’d get about 16.5TB usable in the former, 15TB usable in the latter (assuming RAID penalty of 5 & 6, respectively)

For much of the last year, I’ve been using ZFS & RAIDZ2 on the set of six WD Red 2.5″ drives. Those have a raw capacity of 6TB. In RAIDZ2 (roughly analogous to RAID 6), I recall getting about 4.2TB usable.

All in all, traditional RAID & ZFS’ RAIDZ cost me between 12% and  35% of my capacity respectively.

So how much does Windows Storage Spaces resiliency model (Mirrored, 2-way parity) cost me? A lot. We’re in RAID-DP territory here people:

 


Ack! With 17TB of raw storage, I get about 5.7TB usable, a cost of about 66%!

And for that, what kind of resiliency do I get?

I sure as hell can’t pull two disks simultaneously, as I did live, in production, on my ZFS box. I can suffer the loss of only a single disk. And even then, other Windows bloggers point to some pain as the array tries to adjust.

Now, I’m not the brightest on RAID & parity and such, so perhaps there’s a more resilient, less costly way to use Storage Spaces with Tiering, but wow…this strikes me as a lot of wasted disk.

Not as easy to de-abstract the storage: When a disk array is under load, one of my favorite things to do is watch how the IO hits the physical elements in the array. Modern disk arrays make what your disks are doing abstract, almost invisible, but to truly understand how these things work, sometimes you just want the modern equivalent of lun stats.

In ZFS, I loved just letting gstat run, which showed me the load my IO was placing on the ARC, the L2ARC and finally, the disks. Awesome stuff:

In this Gifcam, watch ada0-6 as they struggle under load with the “Always Sync” option enabled.

As best as I can tell, there’s no live powershell equivalent to gstat for Storage Spaces. There are teases though; you can query your disks, get their SMART vitals, and more, but peeling away the onion layers and actually watching how Windows handles your IO would make Storage Spaces the total package.
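
The teases I mean look something like this; a hedged sketch of the in-box cmdlets, not a gstat replacement:

```powershell
# Per-disk vitals: wear, temperature, error counts (the closest in-box thing to SMART data)
Get-PhysicalDisk | Get-StorageReliabilityCounter |
    Select-Object DeviceId, Wear, Temperature, ReadErrorsTotal, WriteErrorsTotal

# Watch raw throughput hit the physical disks under load, refreshed every two seconds
Get-Counter -Counter '\PhysicalDisk(*)\Disk Bytes/sec' -SampleInterval 2 -Continuous
```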

Bottom line

So that’s about it: this is the best storage box I’ve built in the Daisetta Lab. No regrets going with Windows. The platform is mature, stable, offers very good performance, and decent resiliency, if at a high disk cost.

I’m so impressed I’ve checked my Windows SAN skepticism at the door and would run this in a production environment at a small/medium business (clustered, in the Scaled Out File Server role). Cost-wise, it’s a bargain. Check out this array: it’s the same exact Hardware a certain upstart Storage vendor I like (that rhymes with Gymbal Porridge) sells, but for a lot less!

#StorageGlory achieved. At home. In my garage.

Meet my new Storage Array

So the three of you who read this blog might be wondering why I haven’t been posting much lately.

Where’s Jeff, the cloud praxis guy & Hyper-V fanboy, who says IT pros should practice their cloud skills? you might have asked.

Well, I’ll tell you where I’ve been. One, I’ve been working my tail off at my new job where Cloud Praxis is Cloud Game Time, and two, the Child Partition, as adorable and fun as he is, is now 19 months old, and when he’s not gone down for a maintenance cycle in the crib, he’s running Parent Partition and Supervisor Module spouse ragged, consuming all CPU resources in the cluster. Wow that kid has some energy!

Yet despite that (or perhaps because of that), I found some time to re-think my storage strategy for the Daisetta Lab.

Recall that for months I’ve been running a ZFS array atop a simple NAS4Free instance, using the AMD-powered box as a multi-path iSCSI target for Cluster Shared Volumes. But continuing kernel-on-iscsi-target-service homicides, a desire to combine all my spare drives & resources into a new array, and a vacation-time cash-infusion following my exit from the last job led me to build this for only about $600 all-in:

Software-defined. x86. File and block. Multipath. Intel. And some Supermicro. There’s some serious storage utopia up in the Daisetta Lab

Here are some superlatives and other interesting curios about this new box:

  • It was born on the 4th of July, just like ‘Merica and is as big, loud, ostentatious and overbearing as ‘Merica itself
  • I would name it ‘Merica.daisettalabs.net if the OS would accept it
  • It’s a real server. With a real Supermicro X10SAT server/workstation board. No more hacking Intel .inf files to get server-quality drivers
  • It has a real server SAS card, an LSI 9218i something or other with SAS-SATA breakout cables
  • It doesn’t make me choose between file or block storage, and is object-storage curious. It can even do NFS or SMB 3…at the same time.
  • It does ex post facto dedupe -the old model- rather than the new hot model of inline dedupe and/or compression, which makes me resent it, but only a little bit
  • It’s combining three storage chipsets -the LSI card, the Supermicro’s Intel C226, and ASMedia 1061- into one software-defined logical system. It’s abstracting all that hardware away using pools, similar to ZFS, but in a different, more sublime & elegant way.
  • It doesn’t have the ARC -ie RAM AS STORAGE- which makes me really resent it, but on the plus side, I’m only giving it 12GB of RAM and now have 16GB left for other uses.
  • It has 16 Disks : 12 rotational drives (6x1TB 5400 RPM & 6x2TB 7200RPM) and four SSDs (3x256GB Samsung 840 EVO & 1x128GB Samsung 830) and one boot drive (1x32GB SanDisk ReadyCache drive re-purposed as general SSD)
  • Total capacity RAW: nearly 19TB. Usable? I’ll let you know. Asking “Do I need that much?” is like asking “Does ‘Merica need to stretch from Sea to Shining Sea?” No I don’t, but yes ‘Merica does. But I had these drives in stock, as it were, so why not?
  • It uses so much energy & power that it has, in just a few days, erased any greenhouse gas savings I’ve made driving a hybrid for one year. Sorry Mother Earth, looks like I’m in your debt again
  • But seriously, under load, it’s hitting about 310 watts. At idle, 150w. Not bad all things considered. Haswell + full C states & PCIe power management work.
  • It’s built as a veritable wind-tunnel as it lives in the garage. In Southern California. And it’s summer. Under load, the CPU is hitting about 65C and the south-bridge flirts with 80C, but it’s stable.
  • It has six, yes, six, 1GbE Intel NICs. Two are on the motherboard, and I’m using a 4 port PCIe 2 card. And of course, I’ve enabled Jumbo Frames. I mean do you have to even ask at this point?
  • It uses virtual disks. Into which you can put other virtual disks. And even more virtual disks inside those virtual disks. It’s like Christopher Nolan designed this storage archetype while he wrote Inception…virtual disk within virtual disk within virtual disk. Sounds dangerous, but in the Daisetta Lab, Who Dares Wins!

So yeah. That’s what I’ve been up to. Geeking out a little bit like a gamer, but simultaneously taking the next step in my understanding, mastery & skilled manipulation of a critical next-gen storage technology I’ll be using at work soon.

Can you guess what that is?

Stay tuned. Full reveal & some benchmarks/thoughts tomorrow.

 

 

All the WANs are a stage

All the WANs are a Stage,

and all the packets and flows are players. 

They have their ingress and egress

from a vm here, through an F5 there, out the traffic shaper and then to the next hop

The Great Unknown, the Slash 8

Truly one packet in its time plays many routes

alas,  aggregate, balance or seek diverse routes

the packets do not

Into oblivion go the flows

when the WAN LED no longer glows

Let’s take a step together into a place unfamiliar and dark. A place that is, by all rights, strange and bewildering. A little place I like to think of as just one order of magnitude less rational than the Twilight Zone…a place few understand, and even fewer have mastered. A place just beyond my gateway, a place I really don’t care about except when I do, a place I like to call, the Wide Area Network.

That’s right. Let’s talk about the next hop. The land of BGP and OSPF and NAT and VPNs and QoS and CoS and DSCP and the “Goddamn ASA” and static routes and the “Goddamn firewall” all these words, phrases and acronyms you heard once, but dismissed as just so much babble out of the networking guy’s mouth, the one guy on your team who seems to age faster than all the others.

Hell, if it were up to you, Mr. Storage Networking Engineer, you’d do some LACP trunks or hook MPIO up to that WAN and call it a day, amiright? I mean what’s so complicated here? Of course links go down, that’s why teams (and virtual teams-of-teams!) are so cool!

But alas, all the world’s not a storage array, and all links to it are not teamed GigE interfaces with sub-millisecond latency.

And your business WAN, particularly the links to/from remote sites that comprise the RFC-1918d, encapsulated, virtual private wide-area network your typical mid-sized business with a large footprint depend on, fail far too often.

Or at least they have for me when I look back and survey the glories & wreckage of my 15 year IT career.

Verily I say unto you, the WAN is my White Whale, and I am an IT Ahab.

Here are some of the tools & techniques networking firms, engineers, architects and people way smarter than I have come up with to deal with the multiple pains of the WAN, followed by my snarky, yet honest, hurt, yet hopeful, lust-filled yet realistic view of them:

  • Multiprotocol Label Switching (MPLS): The go-to solution for WAN pain, particularly for businesses that can’t/won’t employ a networking wonk equal to Mr. Ivan Pepelnjak. MPLS is a god-send for some firms, but it’s very costly. To really get value out of an MPLS strategy, you almost have to couple it with a session virtualization or in-datacenter-computing model (XenApp, RDS, VDI etc). Why? While MPLS makes the WAN as reliable and as accessible as your LAN, it doesn’t defeat latency. And latency is a hard thing to explain. Go on. Try it. On your spouse or significant other.
  • MPLS part two: And just so that I can get it off my chest…when the primary link at a branch site does go down, why do MPLS providers have such a hard time failing over to a secondary? I mean for real guys? Just keep the secondary WAN/VPN link up, or do something fancy with VRRP or VARP or something. Without a failover link, a downed-MPLS is worth less than a regular commodity internet circuit.
  • MPLS part three: In previous roles, I worried that maintenance of the MPLS became an end unto itself. I can see how this would happen, and I’ve been guilty of it myself; sometimes IT guys think in IP addresses, when they should have an eye to the future and think in FQDN, as the former is and shall forever be not routable, while the latter is the future. Underlining this point is the argument (well-supported in 2014, I think) that MPLS is, at best, a transitional technology. Build your business on it if you have to, but don’t tie anything to it, in other words. Sure it’s cloud-compatible, but so is dial up.
  • Inline Compression/dedupe: As a storage networking nerd, I Heart me some Riverbed and SilverPeak. But those are tools on the WAN that, in my experience, are just one CapEx ask too much. I’ve never actually used one of them. Love the idea, can never justify the cost. Open source alternatives? There’s really none (Except for this brave guy), speaking, perhaps, to how sophisticated and well-engineered these devices are, which justifies their cost but also makes them unobtainable for SMB shops.
  • Pertino and the like: I’ve been a fan of Pertino since I first started using this “Cloud VPN” product, which I likened more to a Layer 2 switch in the sky than a traditional VPN service. It’s some great tech; not clear that it can scale to 100s and 100s of users though. But very promising nonetheless, especially for really small but geographically-diverse environments.
    It’s just like Least Queue depth, you see, only ON YOUR WAN

  • Link aggregation + VPN all in one device: If you’re going to go hub & spoke because MPLS costs too much, or you can’t quite do full-cloud yet, this is a promising strategy, and one I’ll soon be testing out. I know I’m not alone in the WAN-is-my-white-whale meme because companies like Peplink, Talari Networks, and even Cisco are still building products that address WAN problems. I have used Peplink before; was impressed, would use again, want one in my home with a second internet line, A+++++. The only thing that scuttled wider adoption in my last role was voice, a particularly difficult problem to sort out when you slap some good ol’ LACP-style magic onto your WAN ills. These devices, ranging from a few hundred bucks to several thousand, are almost too good to be true, as they tell the IT Pro that yes, he can have his cheap but rapidly-deployable commodity internet circuits aggregate into one, high speed, fault-tolerant link, and yes, that “unbreakable VPN” (as Peplink dubs it), can connect back to the HQ. Doesn’t defeat latency, true, but it sure makes the ASA look old-hat doesn’t it?
  • Cloud: The default winner, of course. But OpEx is hard to quantify. Sure, I guess I could up and move my datacenter assets to a CDN and let the network take care of the rest, or I could stand up a VM in a datacenter close to my users. But replication to on-prem assets/sources can be difficult, and, in some ways, in a really wide WAN, don’t we start worrying about version control, that what the New York branch is looking at is the same as the Seattle branch? Even so, I’m down with it, just need to fully comprehend it first.

What’s worked for you?

In defense of pizza boxes

Lately on the Twitters there has been much praise among my friends and colleagues for what I like to think of as datacenters on dollies: Cisco’s UCS, FlexPod, Dell’s vStart etc…You know what these are as I’m sure you’ve come across them: pre-configured, pre-engineered datacenters you can roll out to the datacenter floor, align carefully and then -put your back into it lads!- carefully drop onto the elevated tiles. Then you grab that bulky L14-30P and jack the stack into your 220v 30 amp circuit that has A/B power and bam! #InfrastructureGlory achieved.

Support’s not a concern because the storage vendor, the compute vendor, and the network vendor are simpatico under the terms of an MOU…you see, the vendors engineered it out so you don’t have to download and memorize the mezzanine architecture PDF. All you have to do now is turn it on and build some VMs in vSphere or VMM or what-have-you.

Where’s the fun in that?

Don’t get me wrong, I think UCS is awesome. I kind of want an old one in my lab.

But in my career, it’s always been pizza boxes. Standard 2U, 30″ deep enclosures housing drives & fans up front, two or four CPU sockets in the middle surrounded by gobs of RAM, and NICs…lots and lots of NICs guarding the rear.

mmmmm….pizza

And I wonder why that is. Maybe it’s just the market & space I tend to find employment in, but it seems to me that most IT organizations aren’t purchasing infrastructure in a strategic way…they don’t sit down at a table and say, “Right. Let’s buy some compute, storage, and network, let’s make it last five years, and then, this time five years from now, we’ll buy another stack. Hop to it lads!”

A good IT strategic planner would do that, but that’s not the reality in many organizations.

So I’ve come to love pizza boxes because they are almost infinitely configurable. Like so:

  • Say you buy five pizza boxes in year 1 but in year 2, a branch office opens and it’s suddenly very critical to get some local infrastructure on-prem. Simple: strip a node out of your handsome 10U compute cluster and drop-ship it to the branch office. Even better: you contemplated this branch when you bought the pizza boxes and pre-built a few of them with offlined but sufficiently large direct attached storage.
  • You buy a single pizza box with four sockets but only two are populated in year 1. By year three, headcount is surging and demand on the server -for whatever reason- is extraordinary. What do you do hotshot, what do you do? Easy: source some second-hand Xeons & heatsinks, drop them into the server and watch your cpu queue lengths fall (not quite in half, but still). But check your SQL licensing arrangements first and be prepared to consolidate and reduce your per-socket VMs!
  • Or maybe you need to reduce your footprint in the datacenter. If you bought pizza boxes in a strategic way, you just dump the CPUs and memory out of node 1 into node 2, node 3 into node 4 and so on. You won’t achieve the same level of VM density but maybe you don’t need to.
  • Or maybe you don’t want or need 10GbE this year; that would require new switching. But in year 2? Break a node out and drop in some PCIe SFP+ cards and Bob’s your uncle.

I guess the thing about Pizza boxes I like the most is that they are, in reality, just big, standardized PCs. They are whatever architecture you decide you want them to be in whatever circumstances you find yourself in.

A FlexPod or vStart, in contrast, feels more constricting, even if you can break an element or two out and use it in another way. I know I’d be hesitant to break apart the UCS fabric.

You’d think a FlexPod would be perfect for small to medium enterprises, and in many cases, it is. Just not in the ones I’ve worked at, where costs are tight, strategic planning rare, and the business’ need for agility outstrips my need for convenience.

Also, isn’t it interesting that when you compute at “Google-scale” (love that term, is it still en-vogue with VARs?) or if you’re Facebook, you pick a simple & flexible architecture (in-house x86/64 pizza boxes) with very little or no shared storage at all. You pick the seemingly more primitive architecture over the highly-evolved pod architecture.

30 Days hands-on with VMTurbo’s OpsMan #VFD3

Stick figure man wants his application to run faster. #WhiteboardGlory courtesy of VM Turbo’s Yuri Rabover

So you may recall that back in March, yours truly, Parent Partition, was invited as a delegate to a Tech Field Day event, specifically Virtualization Field Day #3, put on by the excellent team at Gestalt IT especially for the guys who like V.

And you may recall further that as I diligently blogged the news and views to you, that by day 3, I was getting tired and grumpy. Wear leveling algorithms intended to prevent failure could no longer cope with all this random tech field day IO, hot spots were beginning to show in the parent partition and the resource exhaustion section of the Windows event viewer, well, she was blinking red.

And so, into this pity-party I was throwing for myself walked a Russian named Yuri, a Dr. named Schmuel and a product called a “VMTurbo” as well as a MacBook that, like all Mac products, wouldn’t play nice with the projector.

You can and should read all about what happened next because 1) VMTurbo is an interesting product and I worked hard on the piece, and 2) it’s one of the most popular posts on my little blog.

Now the great thing about VMTurbo OpsMan & Yuri & Dr. Schmuel’s presentation wasn’t just that it played into my fevered fantasies of being a virtualization economics czar (though it did), or that it promised to bridge the divide via reporting between Infrastructure guys like me and the CFO & corner office finance people (though it can), or that it had lots of cool graphs, sliders, knobs and other GUI candy (though it does).

No, the great thing about VMTurbo OpsMan & Yuri & Dr. Schmuel’s presentation was that they said it would work with that other great Type 1 hypervisor, a Type 1 hypervisor I’m rather fond of: Microsoft’s Hyper-V.

I didn’t even make screenshots for this review, so suffer through the annotated .pngs from VMTurbo’s website and imagine it’s my stack

And so in the last four or five weeks of my employment with Previous Employer (PE), I had the opportunity to test these claims, not in a lab environment, but against the stack I had built, cared for, upgraded, and worried about for four years.

That’s right baby. I put VMTurbo’s economics engine up against my six node Hyper-V cluster in PE’s primary datacenter, a rationalized but aging cluster with two iSCSI storage arrays, a 6509E, and 70+ virtual machines.

Who’s the better engineer? Me, or the Boston appliance designed by a Russian named Yuri and a Dr. named Schmuel? 

Here’s what I found.

The Good

  • Thinking economically isn’t just part of the pitch: VMTurbo’s sales reps, sales engineers and product managers, several of whom I spoke with during the implementation, really believe this stuff. Just about everyone I worked with stood up to my barrage of excited-but-serious questioning and could speak literately to VMTurbo’s producer/consumer model, this resource-buys-from-that-resource idea, the virtualized datacenter as a market analogy. The company even sends out Adam Smith-themed emails (Famous economist…wrote the Wealth of Nations if you’re not aware). If your infrastructure and budget are similar to what mine were at PE, if you stress over managing virtualization infrastructure, if you fold paper again and again like I did, VMTurbo gets you.
  • Installation of the appliance was easy: download a zipped .vhd (not .vhdx), either deploy it via a VMM template or put the VHD into a CSV and import it, connect it to your VM network, and start it up (there’s a quick Hyper-V sketch of that just after this list). The appliance was hassle-free as a VM; it’s running SUSE Linux and quite a bit of Java code from what I could tell, but for you, it’s packaged up into a nice http:// site, and all you have to do is pop in the 30-day license XML key.
  • It was insightful, peering into the stack from top to nearly the bottom and delivering solid APM: After I got the product working, I immediately made the VMTurbo guys help me designate a total of about 10 virtual machines, two executables, the SQL instances supporting those .exes and more resources as Mission Critical. The applications & the terminal services VMs they run on are pounded 24 hours a day, six days a week by 200-300 users. Telling VMTurbo to adjust its recommendations in light of this application infrastructure wasn’t simple, but it wasn’t very difficult either. Finally having something that could view the stack in this way put a bounce in my step and a feather in my cap in the closing days of my time with PE. With VMTurbo, my former colleagues on the help desk could answer “Why is it slow?!?!” and I think that’s great.
  • Like mom, it points out flaws, records your mistakes and even puts a $$ figure on them, which was embarrassing yet illuminating: I was measured by this appliance and found wanting. VMTurbo, after watching the stack for a good two weeks, surprisingly told me I had overprovisioned -by two- virtual CPUs on a secondary SQL server. It recommended I turn off that SQL box (yes, yes, we in Hyper-V land can’t hot-unplug vCPU yet, save it VMware fans!) and subtract two virtual CPUs. It even said my over-provisioning cost about $1200, though I didn’t have time to figure out how it calculated that. Yikes.
  • It’s agent-less: And the Windows guys reading this just breathed a sigh of relief. But hold your golf clap…there’s color around this from a Hyper-V perspective I’ll get into below. For now, know this: VMTurbo knocked my socks off with its superb grasp & use of WMI (a trivial example of the sort of query I mean follows this list). I love Windows Management Instrumentation, but VMTurbo takes WMI to a level I hadn’t thought of, querying the stack frequently, aggregating and massaging the results, and spitting out its models. This thing takes WMI and does real math against the results, math and pivots even an Excel jockey could appreciate. One of the VMTurbo product managers I worked with told me they’d like to use PowerShell, but PowerShell queries were still too slow, whereas WMI can be queried rapidly.
  • It produces great reports I could never quite build in SCOM: By the end of day two, I had PDFs on CPU, storage & network bandwidth consumption, top consumers, projections, and a good sense of current state vs. desired state. Of course you can automate report creation and deliver the reports via email, etc. In the old days it was hard to get simple reports on CSV space free/space used; VMTurbo needed no special configuration to see how much space was left in a CSV.
  • Integrates with AD: Expected. No surprises.

    vFeng Shui for your virtual datacenter

  • It’s low impact: I gave the VM 3 CPU and 16GB of RAM. The .vhd was about 30 gigabytes. Unlike SCOM, no worries here about the Observer Effect (always loved it when SCOM & its disk-intensive SQL back-end would report high load on a LUN that, you guessed it, was attached to the SCOM VM).
  • A Eureka! style moment: A software developer I showed the product to immediately got the concept. Viewing infrastructure as a supply chain, the heat map showing current state and desired state, these were things immediately familiar to him, and as he builds software products for PE, I considered that good insight. VMTurbo may not be your traditional operations manager, but it can assist you in translating your infrastructure into terms & concepts the business understands intuitively.
  • I was comfortable with its recommendations: During #VFD3, there was some animated discussion around flipping the VMTurbo switch from a “Hey! Virtualization engineer, you should do this” mode to a “VMTurbo Optimize Automagically!” mode. After putting the APM together, I watched its recommendations closely for a few weeks. I didn’t flip the switch, but it’s there. And that’s cool.
  • You can set it against your employer’s month end schedule: Didn’t catch a lot of how to do this, but you can give VMTurbo context. If it’s the end of the month, maybe you’ll see increased utilization of your finance systems. You can model peaks and troughs in the business cycle and (I think) it will adjust recommendations accordingly ahead of time.
  • Cost: Getting sensitive here, but I will say this: it wasn’t outrageous. Licensing is by socket, and it hit the budget we had. It was a doable figure. Whether to purchase is up to my PE, but I think VMTurbo worked well for PE’s particular infrastructure and circumstances.
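
About that easy install up above: here’s roughly what the import looked like on my cluster, in PowerShell. This is a minimal sketch, not VMTurbo’s official procedure; the paths, VM name and switch name are placeholders, and the 3 vCPU / 16GB sizing is simply what I gave my appliance.

    # Minimal sketch of standing up a downloaded appliance .vhd on a Hyper-V cluster.
    # Paths, names and the vSwitch are placeholders; follow the vendor's guide for the real thing.
    Import-Module Hyper-V

    # Stage the extracted .vhd on a Cluster Shared Volume
    Copy-Item 'D:\Downloads\vmturbo_opsman.vhd' 'C:\ClusterStorage\Volume1\VMTurbo\'

    # Wrap a VM around the existing disk and wire it to the VM network
    New-VM -Name 'VMTurbo-OpsMan' -MemoryStartupBytes 16GB `
        -VHDPath 'C:\ClusterStorage\Volume1\VMTurbo\vmturbo_opsman.vhd' `
        -SwitchName 'ConvergedVMSwitch'
    Set-VM -Name 'VMTurbo-OpsMan' -ProcessorCount 3

    # Optionally make it highly available, then light it up
    Add-ClusterVirtualMachineRole -VMName 'VMTurbo-OpsMan'
    Start-VM -Name 'VMTurbo-OpsMan'

    # Then browse to http://<appliance-IP> and paste in the 30-day license XML key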

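And on the agent-less WMI point: these aren’t VMTurbo’s queries, just the flavor of frequent, agent-less WMI collection I’m talking about. The hostname is a placeholder, and your service account needs remote WMI rights on the target.

    # The flavor of agent-less WMI collection: CPU and memory pressure on a remote box.
    # 'HV-NODE01' is a placeholder; run with an account that has remote WMI rights.
    $node = 'HV-NODE01'

    Get-WmiObject -ComputerName $node -Class Win32_PerfFormattedData_PerfOS_Processor |
        Where-Object { $_.Name -eq '_Total' } |
        Select-Object PSComputerName, PercentProcessorTime

    Get-WmiObject -ComputerName $node -Class Win32_OperatingSystem |
        Select-Object CSName,
            @{ n = 'FreeMemGB';  e = { [math]::Round($_.FreePhysicalMemory / 1MB, 1) } },
            @{ n = 'TotalMemGB'; e = { [math]::Round($_.TotalVisibleMemorySize / 1MB, 1) } }
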
The Bad:

  • No sugar-coating it here, this thing’s built for VMware: All vendors, please take note. If it’s VMware, the nomenclature is “vCPU, vMem, vNIC, Datastore, vMotion.” If it’s Hyper-V, the nomenclature is “VM CPU, VM Mem, VMNic, Cluster Shared Volume (or CSV), Live Migration.” Should be simple enough to change, or to give us 29%ers a toggle. Still works, but it’s annoying to see Datastore everywhere.
  • Interface is all Flash: It’s like Adobe barfed all over the user interface. Mostly hassle-free, but occasionally a change you expected to register on screen took a manual refresh to become visible. Minor complaint.
  • Doesn’t speak SMB 3.0 yet: A conversation with one product engineer more or less took the route it usually takes. “SMB 3? You mean CIFS?” Sigh. But not enough to scuttle the product for Hyper-V shops…yet. If they still don’t know what SMB 3 is in two years…well I do declare I’d be highly offended. For now, if they want to take Hyper-V seriously as their website says they do, VMTurbo should focus some dev efforts on SMB 3 as it’s a transformative file storage tech, a few steps beyond what NFS can do. EMC called it the future of storage!
  • Didn’t talk to my storage: There is visibility down to the platter from an APM perspective, but this wasn’t in scope for the trial we engaged in. Our filer had direct support; our Nimble, as a newer storage platform, did not. So IOPS weren’t part of the APM calculations, though free/used space was.

The Ugly:

  • Trusted Installer & taking ownership of reg keys is required: So remember how I said VMTurbo was agent-less, using WMI in an ingenious way to gather its data from VMs and hosts alike? Well, yeah, about that. For Hyper-V and Windows shops who are at all current (2012 or R2, as well as 2008 R2), this means provisioning a service account with sufficient permissions, taking ownership of two reg keys away from Trusted Installer (a very important ‘user’) in HKLM\CLSID and one further down in WOW64, and assigning full-control permissions to the service account on the reg keys (a rough sketch of the change is just below). This was painful for me, no doubt, and I hesitated for a good week. In the end, Trusted Installer still keeps full control, so it’s a benign change, and I think the payoff is worth it. A senior VMTurbo product engineer told me VMTurbo is working with Microsoft to query WMI without making the customer modify the registry, but as of now, this is required. And the Group Policy I built to do this for me didn’t work entirely. On 2008 R2 VMs, you only have to modify the one CLSID key.
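
I won’t reprint the exact key paths and GUIDs here (those come from VMTurbo’s install guide), but the general shape of the change, scripted, is something like this. The GUID and the service account are placeholders; run it elevated and at your own risk.

    # Sketch: take ownership of a CLSID key from TrustedInstaller and grant a service
    # account Full Control, without disturbing the rest of the ACL. Placeholders throughout.
    $subKey  = 'SOFTWARE\Classes\CLSID\{00000000-0000-0000-0000-000000000000}'  # hypothetical key
    $svcAcct = 'YOURDOMAIN\svc-vmturbo'                                         # hypothetical account

    # Step 1: open with TakeOwnership rights and hand ownership to Administrators
    $key = [Microsoft.Win32.Registry]::LocalMachine.OpenSubKey($subKey,
        [Microsoft.Win32.RegistryKeyPermissionCheck]::ReadWriteSubTree,
        [System.Security.AccessControl.RegistryRights]::TakeOwnership)
    $ownerAcl = New-Object System.Security.AccessControl.RegistrySecurity
    $ownerAcl.SetOwner([System.Security.Principal.NTAccount]'BUILTIN\Administrators')
    $key.SetAccessControl($ownerAcl)
    $key.Close()

    # Step 2: re-open with permission-change rights and add the service account's ACE
    $rights = [System.Security.AccessControl.RegistryRights]'ChangePermissions, ReadPermissions'
    $key = [Microsoft.Win32.Registry]::LocalMachine.OpenSubKey($subKey,
        [Microsoft.Win32.RegistryKeyPermissionCheck]::ReadWriteSubTree, $rights)
    $acl  = $key.GetAccessControl()
    $rule = New-Object System.Security.AccessControl.RegistryAccessRule -ArgumentList `
        $svcAcct, 'FullControl', 'ContainerInherit', 'None', 'Allow'
    $acl.AddAccessRule($rule)          # adds our ACE; TrustedInstaller's entries stay put
    $key.SetAccessControl($acl)
    $key.Close()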

Soup to nuts, I left PE pretty impressed with VMTurbo. I’m not joking when I say it probably could optimize my former virtualized environment better than I could. And it can do it around the clock, unlike me, even when I’m jacked up on 5 Hour Energy or a triple-shot espresso with house music on in the background.

Stepping back, thinking of the concept here and setting aside the pain of install in a Hyper-V context: products like this are the future of IT. VMTurbo is awesome and unique in an on-prem context as it bridges the gap between cost & operations, but it’s also kind of a window into our future as IT pros.

That’s because if your employer is cloud-focused at all, the infrastructure-as-market-economy model is going to be in your future, like it or not. Cloud compute/storage/network, to a large extent, is all about supply, demand, consumption, production and bursting of resources against your OpEx budget.

What’s neat about VMTurbo is not just that it’s going to help you get the most out of the CapEx you spent on your gear, but also that it helps you shift your thinking a bit, away from up/down, latency, and login times to a rationalized economic model you’ll need in the years ahead.

Hyper-V 29% of Hypervisors shipped and Second Place Never Felt so Good

Click!!

I couldn’t help but cheer and raise a few virtual fist bumps to the Windows Server 2012 and 2012 R2 team as I read the latest report out of some industry group or other. Hyper-V 3.0, you see, is cracking along with just a tick under 1/3rd of the hypervisor market.

Meanwhile, VMware -founder of the genre, much respect for the Pater v-Familias- is running about 2/3rds of virtualized datacenters.

And that’s just fine with me. 

Hyper-V is still in a distant second place. But second place never felt so good as it does right now. And we’ve got some vMomentum on our side, even if we don’t have feature parity, as I’ve acknowledged before.

Hyper-V is up in your datacenter and it deserves some V.R.E.S.P.E.C.T.

Testify IDC, testify:

A growing number of shops like UMC Health System are moving more business-critical workloads to Hyper-V. In 2013, VMware accounted for 53 percent of hypervisors deployed last year, according to data released in April by IT market researcher IDC. While VMware still shipped a majority, Hyper-V accounted for 29 percent of hypervisors shipped.

The Redmond Magazine report doesn’t get into it beyond some lame analyst comments, but let me break it down for you from a practitioner point of view.

Why is Hyper-V growing in marketshare, stealing some of the vMomentum from the sharp guys at VMware?

Four reasons from a guy who’s worked it:

  • The Networking Stack: It’s not that Windows Server 2012 & 2012 R2 and, as a result, Hyper-V 3.0, have a better network stack than VMware does. It’s that the Windows Server team rebuilt the entire stack between 2008 R2 & Server 2012. And it’s OMG SO MUCH BETTER than the last version. Native support for teaming. Extensible VM switching. Superb layer 3 and layer 2 cmdlets. You can even do BGP routing with it. It’s built to work, with minimal hassle, and it’s solid on a wide range of NICs (there’s a converged-switch sketch at the bottom of this post if you want a taste). I say that as someone who ran 2008 R2 Hyper-V clusters then upgraded the cluster to 2012 in the space of about two weekends. Trust me, if you played around with Windows Server 2008 R2 and Hyper-V and broke down in hysterics, it’s time for another look.
  • SMB 3.0 & Storage Spaces/SOFS…don’t call it CIFS and also, it’s our NFS: There’s a reason beyond the obvious why guys like Aidan Finn, the Hyper-Dutchman and DidierV are constantly praising Server Message Block Three dot Zero. It kicks ass. Out of the box, multichannel is enabled on SMB 3.0, meaning that anytime you create a \\Hyper-V-Kicks-Ass\ file share on a server with at least two distinct IP addresses, you’re going to get two distinct channels to your share. And that scales. On Storage Spaces and its HA (and fault tolerant?) big brother Scale-Out File Server: what Microsoft gave us was a method by which we could abstract our rotational & SSD disks and tier them. It’s a storage virtualization system that’s quite nifty. It’s not quite VSAN, except that both Storage Spaces/SOFS & VSAN seem to share a common cause: killing your SAN.

    "Turn me on!" Hyper-V says to the curious

  • Only half the licensing headaches of VMware: I Do Not Sign the Checks, but there’s something to be said for the fact that the features I mention above are not SKUs. They are part & parcel of Server 2012 R2 Standard. You can implement them without paying more, without getting sign-off from Accounts Payable or going back to the well for more spend. Hyper-V just asks that you spend some time on TechNet, but it doesn’t ask for more $$$ as you build a converged virtual switch.
  • It’s approachable: This has always been one of Microsoft’s strengths and now, with Hyper-V 3.0, it’s really true. My own dad -radio engineer, computer hobbyist, the original TRS-80 fan- is testing versions of radio control system software within Windows 7 32-bit & 64-bit VMs right from his Windows 8.1 Professional desktop. On the IT side: if you’re a generalist with a Windows server background, some desire to learn & challenge yourself, and, most importantly, you want to win #InfrastructureGlory, Hyper-V is a tier-one hypervisor that’s approachable & forgiving if you’re just starting out in IT.

It’s also pretty damn agnostic. You can now run *BSD on it, several flavors of Linux and more. And we know it scales: Hyper-V, or some variant of it, powers the Xbox One (A Hypervisor in Every Living Room achieved), it can power your datacenter, and it’s what’s in Azure.
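
And since I teased it up in the networking bullet, here’s a minimal converged-switch sketch of the sort of thing I ran on my nodes. Adapter names, VLAN IDs and bandwidth weights are placeholders, not a copy of my production config; treat it as a starting point, not gospel.

    # Minimal converged-networking sketch for a Hyper-V 3.0 host.
    # Adapter names, VLAN IDs and weights are placeholders.
    New-NetLbfoTeam -Name 'ConvergedTeam' -TeamMembers 'NIC1','NIC2' `
        -TeamingMode SwitchIndependent -LoadBalancingAlgorithm HyperVPort -Confirm:$false

    # One extensible VM switch on top of the team, with bandwidth weighting
    New-VMSwitch -Name 'ConvergedVMSwitch' -NetAdapterName 'ConvergedTeam' `
        -MinimumBandwidthMode Weight -AllowManagementOS $false

    # Carve out host vNICs for Management, Live Migration and CSV traffic
    Add-VMNetworkAdapter -ManagementOS -Name 'Mgmt'    -SwitchName 'ConvergedVMSwitch'
    Add-VMNetworkAdapter -ManagementOS -Name 'LiveMig' -SwitchName 'ConvergedVMSwitch'
    Add-VMNetworkAdapter -ManagementOS -Name 'CSV'     -SwitchName 'ConvergedVMSwitch'

    Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName 'LiveMig' -Access -VlanId 20
    Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName 'CSV'     -Access -VlanId 30

    Set-VMNetworkAdapter -ManagementOS -Name 'LiveMig' -MinimumBandwidthWeight 40
    Set-VMNetworkAdapter -ManagementOS -Name 'CSV'     -MinimumBandwidthWeight 10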

Turning the page

Today (Thursday) I voluntarily concluded my employment with a well-known Southern California company where I’ve worked as Sr. Systems Engineer for the last four years. On Monday, I open a new page in my IT career with another firm, and I’m very excited to start.

But tonight, I’m in a mood to reminisce and reflect.

I know it’s cliche, but truly, when I consider where I was at four years ago this night compared with where I’m at professionally & personally tonight, this was the job opportunity of a lifetime. It literally lifted me out of the IT ghetto and put me on a track on which I could, if I executed properly, end up in the IT Hall of Fame, clutching my #InfrastructureGlory trophy as if it was the Stanley Cup.

And I capitalized on it in just about every way I knew how, both for myself, and for the infrastructure I fretted over constantly.

Parting is always bittersweet, but I’m resting tonight knowing that I -thanks to some IT strategery from the IT management guys who hired me- have left my former employer a higher-performing, more durable, and more cost-effective infrastructure stack than I had when I started.

Some superlatives & memories from my time with this company for the enjoyment of other engineers like me:

  • Proudest Engineering feat: Planning, wargaming and executing -in concert with my former boss- on an overnight virtual datacenter relocation involving two Dell R810s running Windows Server 2008 R2 & Hyper-V in Denver and four 2008 R2 nodes in Los Angeles over a 100meg Layer 2 VPWS circuit, with two NetApp DoT 7.3.x filers at each end doing SnapMirrors of CSVs & RDMs by the hour, then the half-hour, then by the minute during Go-Live week. Sixty+ VMs, countless direct-mapped iSCSI LUNs, 8 vFilers and the entire /24 subnet moved in the space of about four hours in spring 2012, with minimal consultant help, in a plan I nicknamed the “Double Trident” (don’t ask). And yeah. This was in Hyper-V 2.0 days, when there was nothing awesome about Hyper-V switching.
  • Most humbling defeat: Missing a key “but….” in a TechNet article about Exchange 2010 to 2013 migration. And no, it didn’t involve the basics. And yes, I’m sorry I didn’t spot the queues filling up sooner.
  • If I could make a bumper sticker from my time here: “Virtualization Engineers Find ‘em Physical and Leave ‘em Virtual,” or “Give me spindles or give me death,” or “Oh me, oh my NUMA Nodes” or, of course, “I Heart LACP”
  • Funnest project: Storage refresh & bakeoff. Picked the best array under the circumstances and achieved #StorageGlory. No regrets, and I like that Nimble is as hungry for glory & success as I am.
  • The Work/Blog effect: After storage bakeoff post, got noticed by the GestaltIT crew and invited to Virtualization Field Day #3. Sat among some incredibly sharp VMware-certified & OpenStack-familiar engineers and architects in the heart of Silicon Valley where we, in the best traditions of agnostic computing, challenged vendors on the products they try to sell guys like you and me (well, mostly guys like you if you’re VMware). And yes, we made fun of each other’s stacks. #PurpleScreenofDeath
  • Racked Gear I’ll miss the most: My old, power-hungry 6509E and its twin WS-6748-GETX blades onto which I mapped out Hyper-V 3.0’s awesome converged switching architecture. Sure, it may not be a distributed vSwitch, but I made it purr like a kitten, and I extended iSCSI to the limit. Also, Wargamming Live Storage Migration is one of my most popular posts, so I suppose it’s a somewhat famous 6509E.
  • The 3am call that woke me up the most: Session virtualization (RDS/XenApp)
  • Dipped into dev on: .NET, Visual Studio & ClickOnce architecture. Also SOAP & REST, which aren’t so dev anymore and are actually quite critical for operations guys.
  • Engineering focus: Value.
  • Started With/Ended With Pairs: ESXi 4.5/Hyper-V 3.0, Motorola Droid/Lumia Icon, TDM & Analog Circuits/Cloud-hosted VoIP, 100Mbit Cisco/Gigabit Dell
  • Worst mobile phone I used for work: Toss-up. Windows Phone 7 (HTC Trophy) or Palm Pixi. But they had ActiveSync so there you are.
  • Most Favoritiest Visualization I created: A 24-hour clock arrayed against NetFlow egress data on my 6509E, filtered by the iSCSI & Live Migration VLANs, with flags representing the regions as they put load on the infrastructure. Average Gb/s & GB/hr calculated with Excel pivot tables via a spider chart tool & 30 days of data, averaged out hour-by-hour. NetFlow v7 & ManageEngine. Wish I hadn’t left the image on the work laptop.

Those are some of my fondest memories from this employer, but of course, above & beyond the technology, the hardware, the underlay and the storage are the people. I’m leaving friends, colleagues and fellow veterans behind and it’s hard…I can’t believe how thoughtful they were at my going-away lunch. The photo at top is of my nameplate plus one they made for me. Hashtag Sickburn was something I ripped from The Vergecast and used liberally in our wild technology debates.

Most of all I’m thankful for this awesome time in my professional life and I wish my friends, colleagues and former colleagues the best.

On Monday I start a new chapter. I’m not sure where that leaves this blog, but I at least want to finish up my Cloud Praxis series, post a hands-on review of VMTurbo, and more, so look for that over the days ahead.

Cloud Praxis lifehax: Disrupting the consumer cloud with my #Office365 E1 sub

E1, like its big brothers E3 & E4, gives you real Microsoft Exchange 2013, just like the one at work. There’s all sorts of great things you can do with your own Exchange instance:

  • Practice your PowerShell remoting skills (see the sketch just below this list)
  • Get familiar with how Office 365 measures and applies storage settings among the different products
  • Run some decent reporting against device, browser and fat client usage
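
On that first bullet: this is the sort of session I practice against the E1 tenant. It’s the standard remote PowerShell connection to Exchange Online as it stood when I set this up; the UPNs and mailbox below are placeholders, not my real accounts.

    # Connect remote PowerShell to Exchange Online (the fat part of your E1 sub).
    # 'admin@yourdomain.com' and 'you@yourdomain.com' are placeholders.
    $cred    = Get-Credential 'admin@yourdomain.com'
    $session = New-PSSession -ConfigurationName Microsoft.Exchange `
        -ConnectionUri 'https://outlook.office365.com/powershell-liveid/' `
        -Credential $cred -Authentication Basic -AllowRedirection

    Import-PSSession $session

    # Now the Exchange cmdlets run locally against your tenant
    Get-Mailbox | Select-Object DisplayName, PrimarySmtpAddress
    Get-MobileDeviceStatistics -Mailbox 'you@yourdomain.com' |
        Select-Object DeviceModel, DeviceOS, LastSuccessSync

    Remove-PSSession $session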

But the greatest of these is Exchange public-facing, closed-membership distribution groups.

Whazzat, you ask?

Well, it’s a distribution group. With you in it. And it’s public-facing. Meaning you can create your own SMTP addresses that others can send to. And then you can create Exchange-based rules that drop those emails into a folder, delete them after a certain time, run scripts against them, all sorts of cool stuff before it hits your device or Outlook.
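
In cmdlet form, inside that same kind of remote session, a public-facing, closed-membership group looks roughly like this. The group and mailbox names are placeholders, and the target folder for the inbox rule has to exist before the rule does.

    # A public-facing, closed-membership distribution group, plus a rule to file its mail.
    # All names and addresses are placeholders.
    New-DistributionGroup -Name 'general' `
        -PrimarySmtpAddress 'general@yourdomain.com' `
        -Members 'you@yourdomain.com' `
        -MemberJoinRestriction Closed -MemberDepartRestriction Closed

    # The important bit: allow senders from outside the org to mail it
    Set-DistributionGroup -Identity 'general' -RequireSenderAuthenticationEnabled $false

    # File anything sent to general@ into its own folder before it hits your devices
    New-InboxRule -Name 'File general@' -Mailbox 'you@yourdomain.com' `
        -SentTo 'general@yourdomain.com' `
        -MoveToFolder 'you@yourdomain.com:\General'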

All this for my Enterprise of One, Daisetta Labs.net. For $8/month.

You might think it’s overkill to have a mighty Exchange instance for yourself, but your ability to create a public-facing distribution group is a killer app that can help you rationalize some of your cloud hassles at home and take charge & ownership of your email, which, I argue, is akin to your birth certificate in the online services world.

My public facing distribution groups, por ejemplo:

distrogroups

 

There are others, like career@, blog@ and such.

The only free service that offers something akin to this powerful feature is Microsoft’s own Outlook.com. If the prefixed email address is available @outlook.com, you can create aliases that are public-facing and use them in a similar way as I do.

But that’s a big if. @outlook.com names must be running low.

Another, perhaps even better use of these public-facing distribution groups: exploiting cloud offerings that aren’t dependent on a native email service like Gmail. You can use your public-facing distribution groups to register and rationalize the family cluster’s cloud stack!

app

It doesn’t solve everything, true, but it goes a long way. In my case, the problem was a tough one to crack. You see, ever since the child partition emerged out of dev, into the hands of a skilled QA technician, and thence, under extreme protest, into production, I’ve struggled to capture, save & properly preserve the amazing pictures & videos stored on the Supervisor Module’s iPhone 5.

Until recently, Supe had the best camera phone in the cluster (My Lumia Icon outclasses it now). She, of course, uses Gmail so her pics are backed up in G+, but 1) I can’t access them or view them, 2) they’re downsized in the upload and 3) AutoAwesome’s gone from being cool & nifty to a bit creepy while iCloud’s a joke (though they smartly announced family sharing yesterday, I understand).

She has the same problems accessing the pictures I take of Child Partition on the Icon. She wants them all, and I don’t share much to the social media sites.

And neither one of us wants to switch email providers.

So….

Consumer OneDrive via a Microsoft account registered with general@mydomain.com, with MFA. Checks all the boxes. I even got 100GB just for using Bing for a month.

Available on iPhone, Windows Phone, desktop, etc.? Check.

Easy to use, beautifully designed even? Check.

Can use a public-facing distribution group SMTP address for account creation? Check.

All tied into my E1 Exchange instance!

It works so well I’m using general@mydomain.com to sync Windows 8.1 between home, work & the lab. The only thing left is to convince the Supe to use OneNote rather than Evernote.

I do the same thing with Amazon (caveat_emptor@), finance stuff, Pandora (general@), some Apple-focused accounts: basically, anything that doesn’t require a native email account gets re-registered with an O365 public-facing distribution group.

Then I share the account credentials among the cluster, and put the service on the cluster’s devices. Now the Supe’s iPhone 5 uploads to OneDrive, which all of us can access.

So yeah. E1 & public-facing distribution groups can help soothe your personal cloud woes at home, while giving you the tools & exposure to Office 365 for #InfrastructureGlory at work.

Good stuff!

vSympathy under vDuress

An engineer in a VMware shop that’s using VMware’s new VSAN converged storage/compute tech had a near 12 hour outage this week. He reports in vivid detail at Reddit, making me feel like I’m right there with him:

At 10:30am, all hell broke loose. I received almost 1000 alert emails in just a couple minutes, as every one of the 77 VM’s in the cluster began to die – high CPU, unresponsive, applications or websites not working. All of the ESXi hosts started emitting a myriad of warnings, mostly for high CPU. DRS attempted to start migrating VM’s but all of the tasks sat “In progress”. After a few minutes, two of the ESXi hosts became “disconnected” from vCenter, but the machines were still running.

Everything appeared to be dead or dying – the VM’s that didn’t immediately stop pinging or otherwise crash had huge loads as their IO requests sat and spun. Trying to perform any action on any of the hosts or VM’s was totally unresponsive and vCenter quickly filled up with “In progress” tasks, including my request to turn off DRS in an attempt to stop it from making things worse.

I’m a Hyper-V guy and (admittedly) barely comprehend what DRS is but wow. I’ve got 77 VMs in my 6 node cluster too. And I’ve been in that same position, when something unexpected…rare…almost impossible to wargame…happens and the whole cluster falls apart. For me it was an ARP storm in the physical switch, thanks in part to an immature understanding of 2008 R2’s virtual switching.

I’m not ashamed to say that in such situations intuition plays a part. Logs are an incomprehensible firehose, not useful, and may even distract you from the real problem. Your ops manager VM, if stored within the cluster (cf. observer effect), is useless, and so, what do you have?

You have what lots of us have, no matter the platform. A support contract. You spend valuable minutes explaining your situation to a guy on the phone who handles many such calls per day. Minutes, then a half hour, then a full hour tick by. The business is getting restless & voices are being raised. If your IT group has an SLA, you’re now violating it. Your pulse is rising, you’re sweating now.

So you escalate. You engage the sales team who sold you the product…you’re desperate. This guy got a vExpert on the phone. At times, I’ve had MVPs helping me. Yet with some problems, there are no obvious answers, even for the diligent & extraordinary.

But if you’re good, you’ve a keen sense of what you know versus what you don’t know (cf. Donald Rumsfeld for the win), and you know when to abandon one path in favor of another. This engineer knew exactly the timing of his outage…what he did, when he finished the work he did, and when the outage started. Maybe he didn’t have it down in a spreadsheet, and proving it empirically in court would never work, but he knew: he was thinking about what he knew during his outage, and he was putting all his knowns and unknowns together and building a model of the outage in his head.

I feel simpatico with this guy…and I’m not too proud to say that sometimes, when nothing’s left, you’ve got to run to the server room (if it’s near, which it’s not in my case or in this engineer’s case I think) and check the blinky lights on the hard drives on each of your virtualization nodes. Are they going nuts? Does it look odd? The CPUs are redlined and the PuTTY session on the switch is slow…why’s that?

Is this signal, or is this noise?

Observe the data, no matter how you come by it. Humans are good at pattern recognition. Observe all you can, and then deduce.

Bravo to this chap for doing just that and feeling -yes feeling at times- his way through the outage, even if he couldn’t solve it.

High five from a Hyper-V guy.

Cloud Praxis #4 : Syncing our Dir to Office 365

praxis4dirsync

The Apollo-Soyuz metaphor is too rich to resist. With apologies to NASA, astronauts & cosmonauts everywhere

Right. So if you’ve been following me through Cloud Praxis #1-3 and took my advice, you now have a simple Active Directory lab on your premises (wherever that may be), and perhaps you did the right thing and purchased a domain name, then bought an Office 365 Enterprise E1 subscription for yourself. Because reading about contoso.com isn’t enough.

What am I talking about, “if”? I know you did just what I recommended you do. I know because you’re with me here, working through the Cloud Praxis program because you, like me, are an IT Infrastructurist who likes to win! You are a fellow seeker of #InfrastructureGlory, and you will pursue that ideal wherever it is: on-prem, hybrid, in the cloud, buried in a signed cmdlet, on your hybrid iSCSI array or deep inside an NVGRE-encapsulated packet, somewhere up in the Overlay.

Right. Right?

Someone tell me I’m not alone here.

You get there through this thing.

So DirSync. Or Directory Synchronization. In the grand Microsoft tradition of product names, DirSync has about the least sexy name possible. Imagine yourself as a poor Microsoft technology reseller; you’ve just done the elevator pitch for the Glories that are to be had in Office 365 Enterprise & Azure, and your mark is interested and so he asks:

Mark: “How do I get there?”

Sales guy: “DirSync”

Mark: “Pardon me?”

Sales Guy: “DirSync.”

Mark: “Are you OK? Your voice is spasming or something. Is there someone I can call?”

DirSync has been around for a long, long time. I hadn’t even heard of it or considered the possibility of using it until 2012 or 2013, but while prepping the Daisetta Lab, I realized this goes back to 2008 & Microsoft Online Services.

But today, in 2014, it’s officially called Windows Azure Active Directory Sync, and though I can’t wait to GifCam you some cool PowerShell cmdlets that show it in action, we’ve got some prep work to do first.

Lab Prep for DirSync

As I said in Cloud Praxis #3, to really simulate your workplace, I recommend you build your on-prem lab AD with a fully-routable domain name, then purchase that same name from a registrar on the internet. I said in Cloud Praxis #2 that you should have a lab computer with 16GB of RAM and you should expect to build at least two or three VMs using Client Hyper-V at the minimum.

Now’s the time to firm this all up and prep our lab. I know you’re itching to get deep into some O365, but hang on and do your due diligence, just like you would at work.

  • Lab DHCP: What do you have as your DHCP server? If it’s a consumer-level Wi-Fi router that won’t let you assign an FQDN to your devices, consider ditching it for DHCP and standing up a DHCP instance on your lab Domain Controller. Your wife will never know the difference, and you can ensure 1) that your VMs (whether 1 or 2 or several) get the proper FQDN suffix assigned, and 2) that NetBIOS is disabled via MS DHCP.
  • Get your on-prem DNS in order: This is the time to really focus on your lab DNS. I want you to test everything; make some A records, ensure your PTRs are created automatically. Create some CNAMEs and test forwarding. Download a tool like Steve Gibson’s DNS Benchmark to see which public name servers are the closest to you and answer the quickest. For me, it’s Level 3. Set your forwarders appropriately. Enable logging & automatic testing.
  • Build a second DC: Not strictly required, but best practice & wisdom dictate you do this ahead of DirSync. Do what I did: go with a Windows Server Core VM for your second DC (quick sketch below). That VM will only need 768MB of RAM or so, and a 15GB .vhdx. But with it, you will have a healthier domain on-prem.
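
Here’s the quick-sketch version of that second DC: create the small VM on your Client Hyper-V box, install Server Core in it, then promote it from PowerShell inside the guest. VM name, paths, ISO, switch and domain are placeholders from my lab (daisettalabs.net); adjust to yours.

    # On the Client Hyper-V lab machine: a small VM for the second DC
    New-VM -Name 'DC2' -MemoryStartupBytes 768MB -SwitchName 'LabSwitch' `
        -NewVHDPath 'D:\Hyper-V\DC2\DC2.vhdx' -NewVHDSizeBytes 15GB
    Add-VMDvdDrive -VMName 'DC2' -Path 'D:\ISO\WindowsServer2012R2.iso'   # install Core from this
    Start-VM -Name 'DC2'

    # Inside DC2, once Server Core is installed, named and on the lab network:
    Install-WindowsFeature AD-Domain-Services -IncludeManagementTools
    Install-ADDSDomainController -DomainName 'daisettalabs.net' -InstallDns `
        -Credential (Get-Credential 'DAISETTALABS\Administrator') `
        -SafeModeAdministratorPassword (Read-Host 'DSRM password' -AsSecureString)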

Now over to the O365 Enterprise portal. Read the official O365 Induction Process as I did, then take a look at the steps/suggestions below. I went through this in April; it’s easy, but the official guides leave out some color.

Office 365 Prep & Domain Port ahead of DirSync

  • Go to your registrar and verify to Microsoft that you own the domain via a TXT record: Process here
  • Pick from the following options for DNS and read this:
    • Easy but not realistic: Just hand over DNS to O365. Admittedly, I took the easy way: Daisetta Labs.net DNS is hosted by O365. It’s decent as DNS hosting goes, but I wouldn’t have chosen this option for my workplace, as I use an Anycast DNS service that has fast CDN propagation globally.
    • More realistic: Create the required A records, CNAMEs, TXT and SRV records at your registrar or DNS host and point them where Microsoft says to point them.
    • Balls of Steel Option: Put your lab VM in your DMZ, harden it up, point the registrar at it and host your own DNS via Windows, baby. Probably not advisable from a residential internet connection.
  • Keep your .onmicrosoft.com account for a week or two: Whether you’re starting out in O365 at work or just learning the system like I did, you’ll need your first O365 account for a few days, as porting the domain name takes 24-36 hours. Don’t assign your E1 licenses to your @domain.com account just yet.
  • I wouldn’t engage MFA just yet: let things settle before you turn on multi-factor authentication. Also be sure your backup email account (the oh-shit account Microsoft wants you to use that’s not associated with O365) is accessible and secure.
  • If you are simulating Exchange on-prem to hybrid for this exercise, you’ll have more steps than I did. Sadly, I had to give O365 the easy way out and selected “Fresh Start” in the process.

    Fresh start cause I couldn’t build out an Exchange lab :sadface:

  • Proceed with the standard O365 wizard setups, but halt at OnRamp: I’m happy to see the wizard configuration method is surviving in the cloud. Setting all this up won’t take long; the whole portal is pretty easy & obvious until you get to the SharePoint stuff.

Total work here is a couple of hours. I can’t stress enough how important your lab DNS & AD health are. Replication between your DCs needs to be rock solid, your DNS should be fast & reliably return accurate results, and you should have a good handle on your lab replication topology, a proper Sites & Services setup, and a dialed-in Group Policy and OU structure.

Daisetta Labs.net looks like this:

daisettalabsad

 

and dcdiag /e & repadmin show no errors.
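
For the record, those health checks are nothing exotic; run something like this from an elevated prompt on a DC (or anywhere the AD DS tools are installed):

    # Quick AD health pass before bolting DirSync onto the domain
    dcdiag /e /q              # test every DC in the forest, show only errors
    repadmin /replsummary     # replication summary across all DCs
    repadmin /showrepl * /csv | ConvertFrom-Csv |
        Where-Object { [int]$_.'Number of Failures' -gt 0 }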

Final Steps before DirSync Blastoff

  • With a healthy domain on-prem, you now need to create some A records, CNAMEs and TXT records so Lync, Outlook, and all your other fat clients dependent on Exchange, SharePoint and such know where to go. This is quite important; at work, you’ll run into this exact same situation. Getting this right is why we chose to use a routable domain; it’s a big chunk of the reason why we’re doing this whole Cloud Praxis thing in the first place. It’s so our users have an enjoyable and hassle-free transition to O365.
  • Follow the directions here. Not as hard as it sounds. For me it went very smoothly. In fact, the O365 Enterprise portal gives you everything you need in the Domain panel, provided you’ve waited about 36 hours after porting your domain. Here’s what mine looks like on-prem after manually creating the records, and after the screenshot there’s a rough sketch of how you’d do the same from PowerShell.

dns
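
And the promised sketch: the same on-prem records, created with the DnsServer module instead of clicking through DNS Manager. The zone and the targets shown here are examples only; copy the exact names and values out of your own O365 Domains panel.

    # Rough sketch: on-prem DNS records for O365 clients, via the DnsServer module.
    # Zone and targets are examples; use the exact values from the O365 Domains panel.
    $zone = 'daisettalabs.net'

    Add-DnsServerResourceRecordCName -ZoneName $zone -Name 'autodiscover' `
        -HostNameAlias 'autodiscover.outlook.com'
    Add-DnsServerResourceRecordCName -ZoneName $zone -Name 'lyncdiscover' `
        -HostNameAlias 'webdir.online.lync.com'
    Add-DnsServerResourceRecordCName -ZoneName $zone -Name 'sip' `
        -HostNameAlias 'sipdir.online.lync.com'

    # The SRV and TXT records O365 asks for are added the same way with
    # Add-DnsServerResourceRecord -Srv / -Txt, again straight from the portal's values.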

And that’s it. We’re ready to sync our Dirs to O365’s Dirs, to get a little closer to #InfrastructureGlory. On one side: your on-prem AD stack, on the launch pad, in your lab, ready for liftoff.

Sure, it’s a little harebrained, admittedly, but if you’re like me, this is how you learn. And I’m learning. Aren’t you?

On the other launch pad, Office 365. Superbly architected by some Microsoft engineers, no longer joke-worthy like it was in the BPOS days, a place your infrastructure is heading to whether you like it or not.

I want you to be there ahead of all the other guys, and that’s what Cloud Praxis is all about: staying sharp on this cloud stack so we can keep our jobs and find #InfrastructureGlory.

DirSync is the first step here, and I’ll show you it on the next Cloud Praxis. Thanks for reading!