Tuesday, 24 May 2016

VMware Photon Platform, Notes from the Field - Entry Two

Time to Build The Cluster Services

Is it a bird, a plane or the Photon Platform? In the early days of the Photon Platform announcements, I had it pegged in my own mind as a purpose-built alternative to 'vSphere Integrated Containers' (VIC). Keep in mind that VIC's sole purpose in life is to enable containers in vSphere, with the cool concept of a 1:1 alignment of VM to container. This means a container can have all the flexibility inherent in a container with all the control and security of a VM. Alright, come on down Photon Platform: just bigger, better and more focused, right? Well, not really...

One thing that strikes you with the Photon Platform is the flexibility of choice it provides. From a cluster scheduling perspective you can utilise Swarm, Kubernetes or Mesos out of the box. On the other side, this is ESXi after all (marketing slides that keep alluding to the slimmed down hypervisor aside), so it will happily run Virtual Machines and containers side by side (well, containers within VMs anyway).

Being able to support both is important as not all workloads are created equal in this new age. Let's be honest, not everyone working in this new 'Cloud Native' world has grown a beard, rides a bicycle and wears skinny jeans; there are a variety of personalities that need to be catered for. What is common is that we don't all need the assurance and crutches that come from the more traditional platforms. If I fall over it is OK, I will just get myself back up. In the traditional world I would have crutches, cushions and people watching me, ready to catch me if I stumble. This is because in the traditional world I am seen as needing to be treated like a VIP (i.e. Very Important Process), requiring all the care of advanced services to keep me going (and the costs associated with those advanced services). Sometimes a more traditional approach may be preferable to containerising (is that a word?) the workload, so Virtual Machines live on even if they are seen as cattle (I'll let Duncan Epping define that for you).



This mixed capability is emphasised by the fact that all the management VMs are hosted on the hypervisors that are flagged as Management hosts (you designate whether a host is 'Management', so it participates in the control plane, 'Cloud', which only provides hosting capacity, or both). Anyway, I for one am a big fan, as the ability to host both containers and VMs provides the flexible, cost and scale focused platform I want without having to look at alternatives such as OpenStack. Add to this the fact that it also includes support for the 3 main container and workload cluster scheduling services out of the box and you're covering a lot of bases for possible consumers of Photon Platform.

My personal choice is to go forward with Mesos at this stage as it gives me the open choices I want when paired with Marathon. I have played with all three on Photon though, as all it takes is to import the 3 images and enable the 3 cluster types. I am not here to tell you how to do this or how to create a cluster, as there are plenty of blogs that do that. I just want to give my view and some lessons I learnt on the way that may help someone out there.

It is important to note that when you create clusters, they align to their traditional model and not some hybrid like in VIC. What I mean by that is that if you create a Docker Swarm cluster, the slave nodes will be running the containers in a shared multi-node format. The nodes use Photon OS as their operating system but we don't have the 1:1 container to VM alignment of VIC. This is important to consider as the sizing of those nodes should reflect your needs. There are default 'Flavors' (remember, Flavors define the resource configuration profiles) aligned to the different clusters but they can be overridden at the API or CLI level when creating a cluster.

This system is built as a multi-tenant service from the ground up, so when you create workloads they need to be placed within a Project assigned to a Tenant. As you would expect, a Tenant gets an allotment of resources (vCPUs, memory, disk capacity, VMs, etc.) which is then subdivided into Resource Tickets (think Gold, Silver and Bronze classes or whatever). A Project is then assigned a Resource Ticket, out of which it carves its own resource reservation / limit. All the way down this logical structure you can add access controls via the integration with VMware Lightwave.
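
To make that hierarchy a little more concrete, here is a minimal Python sketch of the Tenant, Resource Ticket and Project relationship. It is purely an illustrative model and not Photon Controller code: the class names, the resource keys ('vms', 'memory_gb') and all the numbers are my own assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceTicket:
    """A named slice of a tenant's allotment (hypothetical model)."""
    name: str
    limits: dict                                  # e.g. {"vms": 100, "memory_gb": 256}
    reserved: dict = field(default_factory=dict)  # what projects have carved out so far

    def carve(self, request: dict) -> dict:
        """Reserve part of the ticket for a project, refusing to overcommit."""
        # Validate the whole request first so a failure leaves the ticket untouched.
        for key, amount in request.items():
            used = self.reserved.get(key, 0)
            if used + amount > self.limits.get(key, 0):
                raise ValueError(f"{key}: {used + amount} exceeds ticket limit")
        for key, amount in request.items():
            self.reserved[key] = self.reserved.get(key, 0) + amount
        return dict(request)


@dataclass
class Project:
    name: str
    ticket: ResourceTicket
    quota: dict


@dataclass
class Tenant:
    name: str
    tickets: dict = field(default_factory=dict)
    projects: dict = field(default_factory=dict)

    def add_ticket(self, name: str, limits: dict) -> ResourceTicket:
        self.tickets[name] = ResourceTicket(name, limits)
        return self.tickets[name]

    def add_project(self, name: str, ticket_name: str, request: dict) -> Project:
        ticket = self.tickets[ticket_name]
        project = Project(name, ticket, ticket.carve(request))
        self.projects[name] = project
        return project


# Hypothetical tenant with one "gold" ticket and one project carved out of it.
tenant = Tenant("engineering")
tenant.add_ticket("gold", {"vms": 100, "memory_gb": 256})
dev = tenant.add_project("dev", "gold", {"vms": 20, "memory_gb": 64})
print(dev.quota)
print(tenant.tickets["gold"].reserved)
```

The point of the model is simply that a Project can never hold more than its Resource Ticket allows, which is exactly what bit me in the notes further down when my clusters needed more than my Project had.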



So importantly, as a tenant I can create my own cluster service to host workloads or just create VMs directly based on virtual appliances or virtual disks imported into the system. I can then control my VM workloads via the CLI and API into Photon Platform's Controller VM as I normally would via vCenter, or use the cluster manager of choice to control the workloads within the clusters.

As a final statement on the installation of the clusters, what I can say is this. I have built all 3 flavours using simple sandbox methods as well as going through the full manual processes (more than once, as I like punishment) and there is a highly compelling value proposition here. To be able to create these services as you require them on a production class platform so simply is huge. There will be challenges with the way it is done now, such as how the cluster images stay close to release parity (hence the compelling part of it being open source), but this is a new era platform that brings together the traditional on-premise controls and the new cloud native era's service requirements in the one stack. This is great stuff for a version 0.8 release!

Some Notes On Cluster Enablement:


Be Wary Of Resource Stinginess
My nature is to give less rather than more when allocating resources. This sent me on a dance around the requirements of the various clusters as I did not have enough resources aligned to my Project. Just be aware of the sizing needs (they vary of course depending on your own configuration requirements) and if you are resource constrained, use custom 'Flavors'. I prefer to bang my head on the wall so got very used to the 'Photon Cluster Create' operation being closely followed by the 'Photon Cluster Delete' operation to delete the failure and start again (side note, it didn't like me trying to recreate deleted clusters with the same name; I didn't pin this down but got into the habit of using unique names each attempt) :)
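
If you end up in the same create / delete loop, a tiny helper for generating a fresh name each attempt saves some thinking. This is just a sketch of my habit in Python; the function name and the name format are my own invention and nothing Photon requires.

```python
import time
import uuid


def unique_cluster_name(prefix: str) -> str:
    """Build a cluster name from a prefix, a timestamp and a short random suffix."""
    stamp = time.strftime("%Y%m%d-%H%M%S")   # local time, sortable
    suffix = uuid.uuid4().hex[:8]            # keeps names distinct within one second
    return f"{prefix}-{stamp}-{suffix}"


print(unique_cluster_name("mesos"))
```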

Cluster Creation IP Address Requirements
Within the instructions it is emphasised that DHCP needs to be disabled / not present in the management network, which is where the clusters are installed. Ironically, if DHCP is disabled the cluster deployment fails with an 'Unable to Obtain an IP Address' error. To progress I re-enabled DHCP in the management network (the one that the Photon Controller is installed into). The more accurate description of the requirement is to ensure the static IP addresses you provide (i.e. for the 'zookeeper' servers in Mesos and the 'etcd' servers in Kubernetes and Swarm) are not also served within the active DHCP scopes.
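
That last check is easy to script before you kick off a cluster create. Below is a minimal Python sketch using the standard 'ipaddress' module; the addresses and the DHCP range are made-up examples, so substitute your own.

```python
import ipaddress


def addresses_in_scope(static_ips, scope_start, scope_end):
    """Return the static IPs that fall inside a DHCP range (inclusive)."""
    start = ipaddress.ip_address(scope_start)
    end = ipaddress.ip_address(scope_end)
    return [ip for ip in static_ips if start <= ipaddress.ip_address(ip) <= end]


# Hypothetical values: three zookeeper addresses and one DHCP scope.
zookeeper_ips = ["192.168.10.21", "192.168.10.22", "192.168.10.150"]
clashes = addresses_in_scope(zookeeper_ips, "192.168.10.100", "192.168.10.200")
print(clashes)  # ['192.168.10.150']
```

Anything the function returns is an address the DHCP server could also hand out, so either move the static address or shrink the scope.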

Those Damn Certificates
Maybe you are lucky and your company provides an open network where all traffic is created equal. That is not the case in mine: we trust no-one and inject ourselves into every certificate coming into the network. Any problems with this are mitigated by adding our own certificates into the trusted root, but this is difficult for environments that are created dynamically, as these clusters are.

When you create a cluster, Photon OS based VMs are created corresponding to your configuration requirements (i.e. how many slaves, how many provide the interconnect service) and are then configured by templates contained on the Controller. Where trusting the certificates becomes critical is that the services run as containers within the VMs, and as such the images need to be downloaded as part of the initial execution. Without the trusted certificates those downloads do not complete, and the end result is that the cluster creation operations fail with 'Time Exceeded' errors.

Anyway, it's an easy fix! I edited the template files contained in the controller directory '/usr/lib/esxcloud/deployer/scripts/clusters' (they have descriptive names such as 'kubernetes-master-user-data-template') to inject our certificates into the VMs. This at least made that problem go away, and you could of course do the same to apply any other customisations that may be required. It would be nice to see some option in the future to inject certificates into the process. I also provided this same feedback for 'vSphere Integrated Containers' as I had the same problem there!




Sunday, 22 May 2016

VMware Photon Platform, Notes from the Field - Entry One

First, some overview ramblings:


OK, first of all let me say this: I am excited about the changing landscape in the industry. This shift to containerisation is exciting to me in two ways. Firstly, as an automation focused engineer, this just seems like the natural progression / evolution of the current platform (incoming alert, Unikernels). Secondly, the technology landscape is very exciting, with lots of new tools (or is that toys?) to play with!



That said, I have been fortunate to be working with VMware (shout out to Roman Tarnavski @romant) for a while now on their rapidly evolving 'Cloud Native' program. With a couple of recent announcements, their two primary offerings, 'vSphere Integrated Containers' and 'Photon Platform', are both now open source on GitHub so everyone can have a go. A key point of differentiation between the two is that VIC leverages your existing VMware vSphere platform to host container centric workloads on a 1:1 basis with a Virtual Machine, while the other, Photon Controller, still leverages ESXi (today anyway) but has its own control plane.

Photon Controller is a replacement for vCenter focused on the operational requirements of container workloads, not virtual machines. Why, you may ask? An easy way to think about it is that the rate of change and scale usually associated with a container aligned environment, versus the long term service life and different scale models associated with traditional Virtual Machines, drives a different model.

If you are all in for containers, or alternatively, have a substantial container footprint today (or will soon) you have some considerations when assessing VIC versus Photon Platform:

  • Will you be limited by the sizing maximums of vSphere? Think vCenter: is 10,000 VMs enough when 1 VM equals 1 container? How about the change rate of those VMs?
  • You may also be questioning the commercials around vSphere in a container world. Do you really need those advanced availability services (e.g. HA/DRS)? I would say no. The idea of containers is pretty simple: let them scale out and have availability controls north of the service, such as with NLB (I know it is an oversimplification), or south with stateful data services (which should be scale out themselves, but that is for another day).

Neither of these statements covers the 'why VMware anyway?' question. Isn't this new world order a shift away from the vendors we typically associate with infrastructure? Doesn't the consumer in this new world not care about infrastructure? All I would say is that there is a world of difference between developing in a sandbox environment and having your workload run on a production class platform with all the controls, monitoring and reporting that are in place within organisations today for traditional workloads. I also do a lot of testing in the Cloud and directly on my laptop; that is the beauty of containers, you can just shift them around. But for a workload to go into production, there are certain requirements that a lot of organisations need to comply with. This may be due to regulatory demands, data sovereignty requirements or cost controls, to name a few.

The idea of either offering is to be able to provide the application developers, operators, <insert whoever else here> the same user experience, with the same toolsets but aimed at a different platform. This platform is then able to have the same operators (VMware Admins) look after it in the same / similar way that they do their vSphere environment. Everyone wins :)

Anyway, next I will look at the install and configuration stage. There is some great content on GitHub and by other bloggers such as William Lam, but I am finding some differences between my own experiences and what others have documented.

Some quick notes:


Controller Install VIB Upload fail
I kept getting file upload errors when the installer tried to upload the Photon VIB to the Management ESX host (see these issues in the log /var/log/esxcloud/deployer/deployer.log). I still have not isolated where the issue is, and have not been able to find it in the source code as yet. What I did do, though, was simplify my naming conventions and the issue went away. I noticed all the blog samples were pushing into default bare ESX hosts whereas I had aligned mine to my own pseudo naming standards. In the end I simplified my environment by taking hyphens out of my port group and datastore names and was then able to install.

YAML Install File with CLI
The examples provided and the exported file provide the 'image_datastores' field within the 'deployment' section as an array. I ended up getting errors around the conversion of the field to an array when attempting to deploy through the CLI. This occurred regardless of whether I had one or more values assigned. As such I changed the field to a straight string and it worked. Change shown below:
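
As a rough illustration of that change, here is a small Python sketch that takes a deployment configuration already loaded into a dict and turns a one-entry 'image_datastores' list into a plain string. The helper name and the datastore name are hypothetical, and it deliberately leaves multi-entry lists alone since I am not asserting how several datastores should be written as one string.

```python
def flatten_image_datastores(config: dict) -> dict:
    """Return a copy of the config with deployment.image_datastores as a string.

    Only a single-entry list is converted; anything else is left untouched.
    """
    deployment = dict(config.get("deployment", {}))
    value = deployment.get("image_datastores")
    if isinstance(value, list) and len(value) == 1:
        deployment["image_datastores"] = value[0]
    return {**config, "deployment": deployment}


# Hypothetical datastore name, for illustration only.
exported = {"deployment": {"image_datastores": ["datastore1"]}}
print(flatten_image_datastores(exported))
# {'deployment': {'image_datastores': 'datastore1'}}
```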



Uploading Images getting HTTP 413 Error
To enable management clusters (schedulers) you need to upload the corresponding disk images into the Controller (Mesos, Kubernetes or Swarm). I found that regardless of the image I was getting an NGINX 413 'Request Entity Too Large' error. You don't have to be a Google ninja to identify the error quickly, with the reported resolution being to set the 'client_max_body_size' value in the 'nginx.conf' file. On the controller VM the NGINX service runs as a docker container called 'ManagementUi', so it was easy to pull the configuration file down with a 'docker cp' command to have a look and, lo and behold, the setting is not there!

OK, now that I have just been a rat down the drainpipe, I can say that I was on the wrong track. NGINX looks after the User Interface but not the API. Looking at the docker containers, there is one running HAProxy, which is the load balancer service, called (you guessed it) 'LoadBalancer'. Jumping into this container and looking at the file 'haproxy.cfg' you can see that the API is served on ports 28080 and 9000. Switching the photon CLI target to either of these ports let the images upload successfully (e.g. photon target set http://10.63.251.150:9000).
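
If you want to try both ports quickly, the following Python sketch just rewrites a base address to each API port and prints the matching 'photon target set' command. The helper name is my own, and the address is simply the one from my lab above.

```python
from urllib.parse import urlsplit, urlunsplit

API_PORTS = (28080, 9000)  # the two ports found in haproxy.cfg above


def retarget(url: str, port: int) -> str:
    """Return the same URL pointed at a different port."""
    parts = urlsplit(url)
    netloc = f"{parts.hostname}:{port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


base = "http://10.63.251.150"
for port in API_PORTS:
    print("photon target set " + retarget(base, port))
```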

I took the long way but got there in the end, time to start playing!!!



Friday, 25 March 2016

Unbox, Power On, Watch the RackHD Magic Begin!

What am I saying, you are probably asking? I have had great fun over the last week getting down and dirty with one of the headline EMC {code} projects, RackHD. This is a very cool solution for taking care of your low level activities for bare metal infrastructure. Think configuration management for all those physical pieces in a rack in the data center. This is a way to manage the firmware through to the personas of these devices, very good stuff and it is Open Source!

I guess first up I should give some context around what I am referring to with EMC {code}. At EMC we can be perceived to be a player in the more traditional areas of IT infrastructure; EMC {code} is one example (of many) of how far that perception is from the current day EMC that I work for. EMC {code} is the landing place for developer enablement and open source projects for EMC.

This is your one stop shop to find any open source projects supported by EMC and community projects by EMC staff, partners and customers. It also includes training content and projects aimed at enabling the developer community on a wide spectrum of those new Mode 2, Platform 3, Cloud Native, SDDC technologies (you get the idea). I strongly encourage anyone to have a look and join in. There are some very smart people in this community and they are very accessible via tools such as Git and Slack.


Anyway, back to RackHD: what is it really? We know that there are a number of configuration management solutions out there today such as Puppet, Ansible, Salt, Chef, etc. but these tend to have a common trait: they look after the nodes/hosts/clients through remote agentless access (SSH, WBEM, etc.) or via agents that are installed on the target device. What this of course requires is that the device is ready to accept remote requests or have agents installed. In other words, they take over control and configuration management of the platform once it is operational. A few, such as Puppet with Razor, have the ability to control the physical world but not as an all inclusive service with multiple action workflow smarts.



Looking at RackHD you have a solution that provides:


  • Bare metal configuration management across the physical infrastructure stack. So not just with the compute but all the bits that go in a rack (hence the name) including:
    • The compute
    • The Network
    • The Storage
    • The enclosures that may contain the nodes
    • The Racks themselves and PDUs (remember, just plug the thing in)
  • A strong but intuitive RESTful API
  • Aligns to the 'Infrastructure as Code' model. Allows node definitions, associated workflows and SKUs to be fed in as JSON files via the API (or UI if that's your preference)
  • Fully self contained service providing all the mechanisms required to control a physical environment such as DHCP, PXE, TFTP, HTTP, etc.
  • A scale-out architecture that can grow to your environment needs
  • Full support of dynamic discovery and physical control through interactions with hardware via physical interaction standards like IPMI, SNMP, BMC, DMI
  • Ongoing low level configuration tracking and management through Pollers
  • Provides that one stop shop for physical telemetry data and alerts
This stuff is cool, and to watch something be discovered and then have a profile assigned and provisioning actions kicked off is very cool. It does not matter if it is using Zerotouch for network switches or building out a Docker cluster via Kickstart scripts, Ansible modules and Docker-Machine (did I mention that there is a Docker-Machine driver?), it is great to watch.

The process of discovery and workflow execution

The best way to learn about tools like this is to start playing with them, and luckily the EMC {code} team have made that very easy for all of us with a fully functional Vagrant demo setup that leverages VirtualBox on your laptop. I highly recommend anyone give this a go as I have definitely had some fun with it. The guys have also written a docker-machine driver that can be tested with RackHD within Vagrant. Get it from GitHub now and in under an hour you will have your first workload up and going!

If you want to see this in action, Kendrick Coleman did a great demo video on YouTube.




http://bit.ly/rackhd-docker