Guidelines – Virtualizing Hadoop on VMware vSphere

The following are best practices for configuring each of the main compute resources (disk storage, I/O, CPUs, and memory) when preparing them to run virtualized Hadoop-based workloads. This information should be read as an introduction to these best-practice areas.
  • The total memory configured across all VMs on a server must not exceed the physical memory of the host server. Reserve about 5-6% of total server memory for ESXi; use the remainder for the virtual machines.
  • The physical CPUs on the vSphere host should not be overcommitted. One viable approach is to make the total number of vCPUs configured across all VMs on a host equal to the physical core count of that server. This more conservative approach ensures that no vCPU waits for a physical CPU to become available before it can execute. If that type of waiting were to occur, the administrator would see a sustained increase in %Ready time as measured by the vSphere performance tools.
  • When hyperthreading is enabled at the BIOS level, as is recommended, the total number of vCPUs across all VMs on a host can be set equal to twice the number of physical cores, that is, equal to the number of logical cores on the server. This “exactly committed” approach is used in demanding situations where the best performance is a requirement. Both the conservative method and the match-to-logical-cores method are viable, with the latter being the more aggressive of the two in achieving performance results.
  • VMs whose vCPU count fits within the number of cores in a CPU socket, and that exclusively use the NUMA memory associated with that socket, have been shown to perform better than larger VMs that span multiple sockets. The recommendation is to limit the vCPUs in any VM to a number that is less than or equal to the number of cores in a CPU socket on the target hardware. This prevents the VM from being spread across multiple CPU sockets and helps it perform more efficiently.
  • Create one or more virtual machines per NUMA node.
  • Limit the number of hard disks per DataNode to increase the utilization of each disk – four to six is a good starting point.
  • Use eager-zeroed thick VMDKs along with the ext4 filesystem inside the guest.
  • Use the VMware Paravirtual SCSI (pvscsi) adapter for disk controllers; use all four virtual SCSI controllers available in vSphere 6.0.
  • Use dedicated network switches for the Hadoop cluster if possible and ensure that all physical servers are connected to a top-of-rack (ToR) switch. Use the vmxnet3 network driver; configure virtual switches with MTU=9000 for jumbo frames.
  • Use a network with a bandwidth of at least 10 gigabits per second to connect servers running virtualized Hadoop workloads.
  • When configuring ESXi host networking, connect each port to a separate switch to optimize network usage, and consider the traffic and load requirements of the following consumers:
  1. The management network
  2. VM port groups
  3. IP storage (NFS, iSCSI, FCoE)
  4. vSphere vMotion
  5. Fault tolerance
  • Configure the guest operating system for Hadoop performance, including enabling jumbo IP frames, reducing swappiness, and disabling transparent hugepage compaction.
  • Place Hadoop master roles, ZooKeeper, and JournalNodes on three virtual machines for optimum performance and to enable high availability.
  • Dedicate the worker nodes to run only the HDFS DataNode, YARN NodeManager, and Spark Executor roles.
  • Use the Hadoop rack awareness feature to place virtual machines belonging to the same physical host in the same rack for optimized HDFS block placement.
  • Run the Hive Metastore in a separate MySQL database.
  • Set the YARN cluster container memory and vcores to slightly overcommit both resources.
  • Adjust the task memory and vcore requirements to optimize the number of maps and reduces for each application.
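
The memory and vCPU sizing rules in the list above are simple arithmetic, so they can be checked before any VM is deployed. The following is a minimal Python sketch, not a VMware tool: the `HostSpec` class, the function names, and the 6% default reserve (the upper end of the 5-6% range) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class HostSpec:
    """Hypothetical description of one ESXi host (field names are illustrative)."""
    physical_memory_gb: float
    sockets: int
    cores_per_socket: int
    hyperthreading: bool = True


def vm_memory_budget_gb(host: HostSpec, esxi_reserve_fraction: float = 0.06) -> float:
    """Memory left for VMs after setting aside a fraction of host memory for ESXi."""
    return host.physical_memory_gb * (1.0 - esxi_reserve_fraction)


def vcpu_budget(host: HostSpec, match_logical_cores: bool = False) -> int:
    """Total vCPUs allowed across all VMs on the host.

    Conservative approach: one vCPU per physical core.
    Match-to-logical-cores approach: two vCPUs per physical core,
    applied only when hyperthreading is enabled.
    """
    physical_cores = host.sockets * host.cores_per_socket
    if match_logical_cores and host.hyperthreading:
        return 2 * physical_cores
    return physical_cores


def check_plan(host: HostSpec,
               vms: List[Tuple[int, float]],
               match_logical_cores: bool = False) -> List[str]:
    """Return a list of rule violations for a plan of (vcpus, memory_gb) VMs."""
    problems = []
    total_vcpus = sum(vcpus for vcpus, _ in vms)
    total_memory = sum(memory for _, memory in vms)

    if total_memory > vm_memory_budget_gb(host):
        problems.append("configured VM memory exceeds host memory minus the ESXi reserve")
    if total_vcpus > vcpu_budget(host, match_logical_cores):
        problems.append("total vCPUs exceed the chosen core budget")
    for index, (vcpus, _) in enumerate(vms):
        # A VM wider than one socket would span NUMA nodes.
        if vcpus > host.cores_per_socket:
            problems.append(f"VM {index} has more vCPUs than one socket has cores")
    return problems


if __name__ == "__main__":
    host = HostSpec(physical_memory_gb=256, sockets=2, cores_per_socket=16)
    plan = [(16, 60.0)] * 4  # four VMs, 16 vCPUs and 60 GB each
    print(check_plan(host, plan))
    print(check_plan(host, plan, match_logical_cores=True))
```

In this example, the four-VM plan breaks the conservative vCPU rule but fits the match-to-logical-cores rule, which mirrors the trade-off between the two approaches described in the list.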
The information above is covered in more detail in the articles below.