Podcast: Avoiding the Most Common Errors and Mistakes that Can Have a Debilitating Effect on IT Infrastructure Performance and Budgets, Part 2

Don't let common errors cripple your IT infrastructure performance.

Jeff Gilmer of Excipio Consulting understands risk in the data center. Most of that risk is the result of human error, and not always just in operations. Poor planning and a lack of strategy are forms of human error that cause a lot of problems.

We delved into a lot of meaningful detail in a recent podcast with Jeff on this topic, and because of the length of that discussion we scheduled a second conversation to more completely cover the most common errors and mistakes Jeff has seen in his long career.

In Part 2, Jeff goes into some depth on a number of common mistakes organizations make with their IT infrastructure, including:

  • Having an incomplete understanding of the time and detail required to execute a data center migration, along with a lack of clarity on the new roles and functions of everyone involved in IT infrastructure operations.
  • Putting all effort into a production data center migration without devoting proper thought and resources to how the disaster recovery data center will be affected by the changes.
  • A lack of appreciation of how difficult it can be to migrate on-premises applications into a cloud environment, especially among upper managers who dictate such a move without properly considering the impact on IT and operations.

All of these issues, along with their potential repercussions, are covered in depth in our half-hour conversation, which brings a strong “real world” perspective to subjects that are often treated theoretically.

You can listen to the conversation with Jeff in the player, and/or you can read the full transcript beneath the player.

Transcript

Kevin O’Neill, Data Center Spotlight: This is Kevin O’Neill with Data Center Spotlight, and we’re here with Jeff Gilmer of Excipio Consulting for part two. Jeff and I did a podcast where we talked about the issues, mistakes, and errors that organizations frequently make with their IT infrastructure and their overall infrastructure strategy. There’s a lot to talk about, so we’re continuing that conversation today. But before we get started, Jeff, as an introduction for people who are joining one of our discussions for the first time, can you tell us about Excipio Consulting and what you do there?

Jeff Gilmer, Excipio Consulting: Sure. Excipio Consulting is an advisory service that helps organizations put together IT strategic plans and business case justifications. Data center lifecycle management is probably one of the largest areas where we get asked to perform strategic analysis and help organizations address their current environment: whether to retain their old data center, how to liquidate the assets in an aging data center, and what to do with it. Then there are impacts such as cloud computing, virtualization, colocation, or external parties. Do I build a new data center? Do I upgrade my current data center? All these questions seem to be out there within nearly every organization and corporation today. So we help them weed through all of those options and identify potential strategies that might or might not work for their particular, unique situation.

Data Center Spotlight: Well, that work certainly puts you in a position to be aware of the issues, mistakes, and errors that organizations are making with their IT infrastructure. Last time we talked, we got to three main topics. First was the lack of a goal or a strategy, a sort of half-baked approach undertaken without proper evaluation. Second, too many people rely on vendors and providers and simply pick the service path the provider or vendor offers, as opposed to insisting upon the proper solution for their actual needs. And third, they don’t spend the appropriate time on the migration process, in either planning or execution; there may be a lack of appreciation for the fact that a migration is a pretty significant event and should be treated as such. Those were the three main points of discussion about mistakes and errors organizations make with their data center, cloud, and overall IT infrastructure. What do you want to start with today, Jeff? What’s the next commonly made mistake or error that you see?

Jeff Gilmer: Well, let’s take the last one you mentioned, about not spending appropriate time before making a data center migration or data center decision, and understanding the planning needed for a migration. Probably one of the biggest things we see is that when organizations actually get to that migration, they completely misallocate resources and misjudge resource demand. They believe that the day-to-day resources running their operations can also complete the migration. In some cases that may be possible, but in most cases it’s not. What they really need to sit down and understand is where they need to provide supplemental resources to make sure the migration happens appropriately.

And let’s just step back and take a look at their day-to-day operations, right? They’ve got server admin people, they’ve got storage people, they’ve got network people. They’ve got technical resources operating and managing the day-to-day operations for their organization, and they’re dealing with support issues, upgrades, patching, maybe security issues, and all the other areas that allow the business to keep functioning. If you think you can take these people away and have them go through all the staging, planning, design, prep, moving, migrating, reinstallation, and on and on during a data center migration, you start to see that it really has an impact on day-to-day operations. You just can’t pull those people off.

Second, there are additional roles and responsibilities needed during the migration that organizations forget. For example, you’re going to need a security person at the shipping dock when the equipment goes out, and you may need a security person at the dock on the other end to receive it, so that you can verify and validate that no one has touched or tampered with the equipment as it goes from place to place. You’re going to need people available to receive the equipment and to rack and stack it at the facility. That can come from the provider you’re moving to, or it could come from an external organization. But really, you have three segments of resources that you need to address. You need to address your day-to-day operations. You need to address your technical operations: the server, the storage, and the network connectivity. If you’re going to physically move equipment, that’s one thing. If you’re not going to physically move, you’ve got to acquire new equipment, rack and stack it, install and test your virtualization sessions, and make sure that everything is appropriate, including your network bandwidth to sling those sessions over. So you’ve got people who need to actually do the transfer. You need people installing and implementing the applications and testing them prior to slinging the data and activating those applications.

And then you’ve got the facilities people operating those facilities: shipping, receiving, security, physical security. And one more area you need to address, if you are going to move physical equipment, is the moving company. Who’s going to actually move that equipment, and do they understand how to move it? You don’t just put it in the back of a pickup truck and drive it over to the new location.

Data Center Spotlight: All right, so we’re talking about migration issues, and a lack of understanding of the resources required, not only to migrate but to handle things on a day-to-day basis after the migration occurs. It sounds like, Jeff, you see a lot of details being overlooked that are integral to the production required of their data center.

Jeff Gilmer: Yeah, absolutely, and taking that one step further, there are the skillsets. If you honestly step back and talk to the people who work for you today, how many of them have actually moved or migrated a data center? It’s probably pretty rare. So you’re much better off bringing in people who have the expertise and are migrating data centers on a regular basis, down to simple things like what procedures have to happen first, second, third, fourth, fifth. What order? What’s the priority? What’s critical? How do you go about it? There are just so many things you need to have in place, because think about it realistically: if that data center migration does not go as planned, and the data center fails and shuts down, what’s going to happen to your business? For most organizations out there today, it’s probably going to have a significant impact.

Data Center Spotlight: A scary thought, but one that hopefully will have people minding their Ps and Qs, Jeff.

Jeff Gilmer: Absolutely.

Data Center Spotlight: What’s next? What’s the next error, mistake, or thing that people overlook in their data center?

Jeff Gilmer: Well, Kevin, I’m sure in your discussions, and I know in ours, people are pretty comfortable today with their production data center. They’ve gained some maturity through virtualization, and through storage management, maybe tiered storage. They understand the bandwidth requirements, and their network has come a long way in the last five years to meet network demands. The production center is running very effectively. However, when you start to look at their disaster recovery, that’s a whole other ballgame. Where do they start, and where are they at? Kevin, from a disaster recovery standpoint, what are you seeing out there? Are there a lot of data center organizations that are up and functioning with their DR capabilities?

Data Center Spotlight: Well, Jeff, you’re going into an area I’ve certainly run into a few times, and I don’t think it’s that rare a story: DR is a bit of an afterthought, and maybe a bit of an under-resourced afterthought at that (I’m sure under-resourced is a word). A company has taken care of modernizing its data center and achieved the data center transformation it wanted, but its DR hasn’t really caught up yet. Is that what you’re talking about?

Jeff Gilmer: Yeah, that’s the way I see it in most cases. I really see one of two things. Either one, the DR has not caught up, or two, the IT people want so badly to assure senior management that the business will be there, and that they can meet the demand required in a disaster, that they overbuild. They have 100% production and 100% disaster recovery, which can be very expensive. It’s a lot like what we were just talking about with resource demand in a migration; resource demand is no different when you have a disaster. You are only going to have so many resources, with the ability to bring up only X number of applications or X number of services for your particular business. So you can’t assume that just by having 100% duplicated there, you’re going to be able to recover. On top of that, if you have 100% duplicated to DR, every time you make a change in production you now have to make the same change in your DR environment. So between the lack of resources to actually recover 100%, the effort of maintaining 100% DR, and the cost and expense of having 100% DR, it really is not the proper way to go about it. There are better ways to go about it. We can talk about that, Kevin, if you’d like.

Data Center Spotlight: Yeah, please do.

Jeff Gilmer: Okay, well, I won’t get into too much detail, but let me give you the high-level process that most organizations really need to go through to understand this. The biggest thing we find is that they don’t truly understand their critical services and their critical applications. As the old saying goes, whoever screams the loudest gets theirs first, and you can’t function in that manner. Whether you’re building the ability to recover a single application that goes down, or to recover one item when someone with a backhoe cuts your network line, versus a complete disaster, which is very rare but which you still need to plan for, you really have to look at the criticality of those services and how they rank to know: What do I bring up first? What do I bring up second? What really needs to be in that disaster recovery center? By structuring that criticality analysis first, you can identify what is truly critical to go into that particular facility, and that should be driven by factual things: compliance, PCI data, HIPAA data, CJIS data, anything related to regulatory issues, such as regulatory requirements in the insurance, finance, utility, and healthcare industries. Your own insurance policies, which define what you have to be able to recover. Your client contracts, including any service level agreements or other agreements you have with clients, and anything that relates to penalties for the organization.

All of those are factual, measurable items that you should use to define the services your organization delivers and what should be brought up first. From there, you need to start mapping it back, right? You’ve got the primary applications; you need to map those back to the secondary and dependent applications. You need to map them back to the infrastructure, your server, storage, and network requirements, and from there you can select the type of facility you really want your organization to use for that data center structure, whether it’s just [?? 12:15] sale space, colocation, managed colo, or a cloud solution. Until you define those requirements, you really can’t select the proper type of disaster recovery data center services you need.

Data Center Spotlight: Okay, those are some very good points, and you made them pretty completely, Jeff. Now that we’ve talked about DR, what about cloud? It seems like some people, when they’re talking about DR, or production for that matter, think they can just throw everything into the cloud, and that if they transform to and adopt the cloud, their DR will be covered, and we know that it really isn’t. They really seem to overestimate what can be moved to the cloud.

Jeff Gilmer: Yeah, and it’s an even bigger picture than just DR in the cloud. There’s a wide variety in what truly counts as disaster recoverability with a cloud solution, but it even comes down to: how do we know what we can move to the cloud, and what will really function in a cloud environment? I think that’s a big issue for a lot of organizations today. You may have people in the technology area of an organization, and the CEO comes down and says, I want to close the data center in six months and move everything to the cloud. That’s not very realistic in most cases. As a matter of fact, if you look at cloud adoption, no 2016 figures have been produced yet, but at the end of 2015, only 20% of most organizations’ applications could truly function properly within a cloud type of solution. So you’re talking about 80% of them that can’t function within that solution.

So let’s go back to the beginning. How do we identify what can really move to the cloud? That’s the bottom line, and what’s interesting here is the general methodology. Last time we talked about the steps you need to take to prepare for a move and migration. We just talked a little bit about criticality and the steps you need to take to define your disaster recovery. Well, determining whether to move to the cloud follows a very similar framework to planning a migration or defining disaster recovery. Again, it comes back to the very basic thing: an enterprise application inventory. Do you have a complete inventory of all the applications in your environment? And second, do you have a complete inventory of all your infrastructure: all your servers, all your storage, all your WAN gear, your LAN gear, et cetera? Do you have all of that complete? Without an accurate asset inventory of all of those things, it’s going to be really difficult to start to put this together.

The next thing you have to do is look at application maturity. We find people who say, we’ve got this application, it’s 12 years old, we haven’t made a lot of updates to it, and we’re afraid that if we turn off the server it’s running on and then have to turn it back on, it’s not going to come back up. We want to move it to the cloud so we don’t have to worry about it. Well, that’s a huge red flag right there. If it’s 12 years old, what operating system is it on? What database version is it tied to? Where are all the dependent applications that need to function with that old legacy application? It’s very unlikely that moving it to the cloud would be the solution, and that’s before you even consider maturity, process, and some of the other factors. And you get other side effects: the manufacturers of that application, or the people supporting it, are going to tell you, no, we won’t support it in a cloud solution. Some people will tell you they won’t support it in a virtual environment, so you’ve got to make sure you understand your support structure. From there, you need to go through and map everything out. Now I’ve got my entire application inventory and my infrastructure inventory. I need to map my applications to all the secondary and dependent applications, and understand what, as a group of primary and secondary applications, must move to the cloud together.

I can then also map the workload of all of those applications in terms of processing power, memory capacity, storage requirements, and bandwidth requirements. What is the workload going to require if I move this particular application to the cloud? You can gather that from your infrastructure inventory by assessing what infrastructure that application and its secondary applications are really demanding, and what they are really utilizing. Now you’re going to have the requirements to take to that cloud provider to get a proper proposal, and to make sure it’s very clear to everyone that they can move it to the cloud. I’m sure, Kevin, you’ve seen a lot of people who get sold on the marketing side of a cloud provider and want to move to that provider because it’s a nice, fancy facility, but they haven’t considered this workload. Are you seeing people go through this type of analysis?

Data Center Spotlight: Yeah, I think there’s always an overestimate, Jeff, of what can be done with cloud when you’re dealing with currently existing infrastructure. It’s not easy to move everything to the cloud, as you’re stating. Another element I’d like to bring into the mix is that when you don’t have a thorough process, which you’re suggesting a lot of people in the world in which you work don’t, cloud can be very expensive if you just stumble into it without a plan and don’t know what you’re doing. When things are left open-ended with a public cloud provider, that is a lot of times what leads to busted budgets and significant cost overruns.

Jeff Gilmer: Yeah, absolutely, and let me expand on that with some of the things that really impact your overall cloud success: your cost, your management, your optimization, your functionality. The first thing I think people don’t understand is that every cloud provider is different, meaning they use a different infrastructure, they may structure the operating system or the virtual instances differently, and they differ in the type of storage they use and in how it all functions and ties together as a unique solution. So you go to cloud provider A and want to be on their solution, and that’s great, but if you go to cloud provider B and want them to be your disaster recovery, they don’t talk to each other. That has created a whole new segment in the IT world: developing software solutions that allow two cloud providers to talk to each other. The reality is, it doesn’t function very well. So you really need to recognize that when you’re picking a cloud provider, you’re making a long-term strategic decision, a long-term strategic position for your production, but also for your disaster recovery. And make sure you take test and dev into account, because test and dev are the environments you can turn on and turn off. Also consider whether your organization is cyclical: are there typical times in a day, a week, a month, or a year when you have higher or lower demand? Because that’s how cloud was initially thought of: well, I’ll run it in my own data center today, and when I need that extra expansion, I’ll turn on the cloud for the extra capacity I need.

What people have learned is that the cloud doesn’t necessarily mesh with what they’re running in their own environment. Either they need to move all of it to the cloud and ramp up or ramp down, or they need a private cloud solution built with the proper infrastructure to support and match the infrastructure they’re running in their own environment. And a private cloud is obviously more expensive than a public cloud, because the assets are dedicated to that client’s own organization.

To summarize, you’ve got to manage a cloud environment very deliberately, when you turn things on and when you turn them off, much more so than with virtualization. If you look at people who started virtualizing, they probably have multiple virtual sessions that have been running for three, six, nine months, maybe a year, and they haven’t turned any of them off. If a virtual session has been out there for three months, should you be shutting it down? There, you may just need to pay the license. Maybe you do, maybe you don’t. When you go to cloud computing, you’re not just paying for the virtual instance license; you’re paying for compute power, storage, network bandwidth, and on and on. You want to make sure you truly manage that from an optimized perspective, based on when you need the demand, to control your costs. You also need to understand how to connect to that cloud, and whether that particular cloud environment is going to work with your environment, because, as we mentioned earlier, only 20% of applications today could really go to the cloud; the other 80% stay in your environment. So the majority of people are going to be running a hybrid cloud, with some workloads still in their current data center and some in the cloud, and the two have got to communicate.

Now, there are some exceptions to that. When you get into what we call commodity applications today, if you look at something like Office 365 from Microsoft with SharePoint or Lync, or at Salesforce.com, or at some of the others that have been out there a while and have become commodity cloud solutions, those solutions function just fine. But when you get into an organization where you’ve got custom databases and the database demand is huge, the biggest cost in the cloud for the database is the network. When you start replicating databases into a cloud solution, your network might be 50% of the cost of that total cloud solution. So you’ve really got to look at your bandwidth and network capabilities, how those databases have been defined, how large they are, how much data you are really transferring, where you are sending it, and on and on. We could spend hours talking about it, Kevin, but you get the idea.

Data Center Spotlight: Yeah, that’s some great information, Jeff. We’ve talked about migration issues and a lack of understanding of what’s required in terms of resources and roles, both during and after migration. We’ve talked about something similar: not having the proper thinking and resources dedicated to DR following a data center upgrade or transformation, and the issues that can occur when DR is allowed to lag behind. And we’ve talked about overestimating what can go to the cloud, from a DR perspective as well as a production perspective, and the dangers of lacking a complete process to determine what can go to the cloud. Jeff, we’ve done a number of these podcasts, and there’s been a lot of valuable information, but honestly, I don’t know that you’ve ever delivered quite as much pertinent information as you have in this podcast. This has been a really good discussion, and I would love to hear you wrap it up or add anything new you’d like.

Jeff Gilmer: Well, it can get very immense and very overwhelming for an organization, and again, I’m going to go back to how we started these podcasts on issues and mistakes: we talked a lot about planning. Planning is so critical, and having an overarching strategy is so critical. If you spend the time to do the planning and the overarching strategy, and you put all of those things in place, you’re going to have success, whether it’s a migration, disaster recovery, proper resourcing, staffing, and organization to support your business, or migrating to the cloud. In any of the areas we talked about, you can greatly increase your success, greatly increase your visibility and your benefits to the organization, and help your organization grow and be profitable, all by making sure you follow the planning and put together a solid strategy.

Data Center Spotlight: And that’s a great wrap-up as well, Jeff. If people want to address some of these issues within their own organization and want to get in touch with you, what’s the best way for them to reach you at Excipio?

Jeff Gilmer: The easiest way is to go to our website at excipio.net. We have a lot of information there: some of these podcasts and others, seminar information, strategic planning documents, and solution suites that identify all the different areas that need to be addressed from a framework perspective. If you’re looking at how to put together your plan and your strategy, there’s just loads of information on our website, again, excipio.net.

Data Center Spotlight: And excipio.net is a very resource-heavy website with a lot of good information. If you enjoyed this conversation, and the past couple of conversations with Jeff about the errors, mistakes, and issues that can occur in creating and executing a data center strategy, I think you’ll like what you find at excipio.net, Excipio’s website. Jeff, as always, I appreciate the time today. It was a very informative and illuminating call. Thank you very much.

Jeff Gilmer: Yeah, thank you, again, Kevin.