M”crosöft Internal Distribution
This white paper discusses the approach used to convert the Hotmail web server farm from UNIX to Windows 2000, and the reasons the features and techniques were chosen. It will focus primarily on the planners, developers, and system administrators. The purpose of the paper is to provide insight for similar deployments using Windows 2000. We will discuss the techniques from the viewpoint of human engineering as well as software engineering.
Early results from the conversion, which was limited to the front-end web servers, are:
n Windows 2000 provides much better throughput than UNIX.
n Windows 2000 provides slightly better performance than UNIX.
n There is potential, not yet realized, for stability of individual systems to be equal to that of UNIX. The load-balancing technology ensures that the user experience of the service is that stability is as good as it was before the conversion.
n As this paper will show, while the core features of Windows 2000 are able to run the service, its administrative model is not well suited to the conversion.
The observations related here are derived from experience gained at a single site. More work would be needed to establish whether they are representative.
Initial draft for review
Rewrote abstract; added a section on initial experiences; other updates
Version History. 1
Table of Contents. 1
Project Overview. 3
Critical Features of Hotmail as a .COM Site. 4
Advantages of UNIX. 6
Problems of Windows. 7
Strengths of Windows. 9
Hotmail Architectural Decisions. 10
Project constraints. 10
Installation Methodology Conserved. 10
Conversion to ISAPI 10
Load Balancing Technology. 11
System Creation, Mastering and Installation. 12
OS installation and configuration. 12
IIS configuration. 13
Tuning and hardening the system.. 13
Use of Active Directory. 14
Users in AD. 14
Computer systems in AD. 14
Application Installation and Update. 15
Application update styles. 15
Application update techniques. 15
Distribution mechanism and format 16
Monitoring and Logging. 16
Network Operations Center 16
Autonomous monitoring. 16
Ad-hoc Maintenance. 17
Converting the UNIX Administrator 18
M”crosöft acquired Hotmail at the end of 1997 as a going concern. The service”s creators had defined a two-layer architecture built around various UNIX systems:
n Front end web servers, built with dual Pentium systems on racked motherboards, running Apache on FreeBSD (a configuration with no need to install licensed software)
n Back end file stores, built with Sun Enterprise 4500 servers, running Solaris 2.6 (Sun”s UNIX) and with all user data stored on RAID arrays, accessed using very simple filing semantics
n Incoming mail listeners, built on Sun Sparc 5 processors, and interacting directly with the back end
n Name/password verification engines, build on Enterprise 4500 servers
n Member Directory, built on PCs with NT and SQL
The conversion of the Hotmail web servers to Windows is an ongoing project with several rationales. The team was hoping for better utilization of the existing hardware resources. The superior development and internationalization tools are important. A M”crosöft property should eat its own dogfood. Finally, we wished to use the conversion experience as a model for other UNIX conversions that we hope to carry out in the future.
The first phase of the conversion, described here, was limited to the web servers. Appropriate hardware was already in place, and the planning and development staff were confident that they already understood how to perform the conversion successfully.
There were several constraints on the conversion process, which are probably typical of the average Internet site:
n Hotmail has established an 8-week cycle of version upgrades, and there was a desire (and some partner pressure) to keep that cycle going.
n It is essential to keep the service running continuously.
n The staff is small, and there was not an opportunity to add staff.
We believe Hotmail is instructive as an example of the large Internet server site. It is one of the largest such sites on the planet, so we should be judicious in applying its principles to sites that are comparatively very small, and don”t have the issues deriving from multiplication of resources.
As stated above, we are concentrating on the front-end web servers. Although some of the following comments are also applicable to the database machines, we will not address them specifically in this paper.
1) Restricted, well-controlled application. The application under UNIX was a collection of CGI programs, serving about 100 distinct URLs, which have been converted to an ISAPI module. The programs are written in C++. The entire application is under the control of one team, and its architecture is well understood by all of the teams (dev, test and operations). Updates are only due to scheduled code releases, or hotfixes. This contrasts with a site like M”crosöft.com, which has many different authors and continuous updates.
2) Lights-out administration. All the servers are in a controlled facility that may be staffed by contractors, and it should not be necessary for skilled staff to visit the individual machines for any reason. Machines should be self-monitoring, and Operations staff should be able to maintain them remotely using minimal interaction.
3) Multiple identically configured machines. This leads to a need to have all regular system administration functions, including OS and application update, be scripted, rapid, reliable, and non-interactive. There is simply not time for an administrator to interact personally with all machines. A load-balancing mechanism routes customer requests from a virtual address to one of the real servers.
4) System costs suffer multiplicative effects. Adding a VGA monitor or a second NIC to a server, or running a serial cable to it, may be pocket change when applied to a single machine. Purchasing several thousand such devices, however, becomes a significant investment and has to be thoroughly justified.
5) 100% availability. A large Internet site must provide service 24×7. Furthermore, the full capacity should be available all the time. Hotmail”s load fluctuates daily according to the time across the US, but not by much; the international usage is high.
6) Simultaneous upgrade. The pervious two points mean that the servers must be upgraded essentially simultaneously, unless some kind of server affinity mechanism can be implemented per user session. Since a typical user interaction involves several clicks, it would not be good for a user to jump backwards and forwards between code releases; the problems would range from inconsistency in style to (apparently) half-implemented new features.
7) No personal machines or accounts. All machines are assumed to be secure because of physical location and electrical isolation. Generally speaking, when an administrator is operating on the server or a scheduled tasks runs, full administrative privileges are given. This increases the danger, but reduces the load required to maintain and synchronize accounts.
8) Remote monitoring. All performance monitoring is done by querying the server or by automated reports, and monitoring uses the single NIC. In Hotmail”s case, there is plenty of spare network headroom on each server for this monitoring not to penalize the primary operation.
9) No architectural limits on growth. An Internet site expects to keep growing, and built-in limits that seem unreasonably high in the early days will one day loom up and need to be fixed, using resources that should be enhancing the site. Hotmail has grown from 9 million accounts when it was acquired by M”crosöft, to 100 million in July 2000, without significant changes in the hardware or software architecture.
The final four items are more closely related to Hotmail”s architectural choices, but we believe they are representative of the market.
10) Scale-out. The Hotmail website is built from several modules, with each module present in different multiples and able to be scaled out almost indefinitely. In this phase of the project, we are considering the front-end, the web servers that house all of the user interface logic and some parts of the business logic. Among the servers, the majority (“front doors”) runs some code in response to each click, and these were the primary targets for conversion. The machines are single-board x86 PCs, moderately powerful, using Apache, running on FreeBSD version 3.0, to deliver content. Fortunately, these servers are good Windows 2000 hosts.
There are also some servers that serve static content and will be almost trivial to convert once the front doors have been converted. Administration of these servers will use the same methodology as the front doors. They also run on FreeBSD, using the server “boa”, which is optimized for serving static content rapidly.
11) Configuration conservatism. There are more than 3,000 front door machines, all identically configured. Having the servers essentially identical is important to the operators” ability to administer the site. The approach to the hardware is very conservative: once a hardware configuration is established, it is easy to keep rolling out copies rather than try to qualify a newer model.
This conservatism also applies to the software design. The need to run the project on Internet Time has an impact on this project in several ways: in this case, designers always need to be improving the application and there is little resource left over for redesigning the basic architecture. Furthermore, the various modules of the site are developed independently, creating a force for stability in the internal protocols.
12) Design for stability. Virtually continuous uptime and a consistent response time are crucial. This is achieved by some overcapacity, and highly reliable load-balancing hardware (Cisco Local Director). Local Director is just another module in the scale-out solution.
13) Controlled and understood systems. A fact about UNIX is that it is easy for an administrator to ensure that there are no irrelevant services running. As well as giving the potential for maximizing performance, it is useful to be sure that there are no random TCP/IP or UDP ports open that could be used as a basis for an attack. To some, this transparency is intrinsic to UNIX, but it also comes from a greater familiarity among system administrators with its internal workings.
The headless nature of the systems, and their remote location, have a profound influence on the way the systems are administered. Headless operation means that any direct interaction will be through a remote session (telnet or Terminal Server); nobody will be able to detect an important dialog on the console, and even a bluescreen is not apparent. Remote operation means that there is a specific cost associated with walking up to the machine. The site is serviced by contractors whose job is mainly limited to replacing failed servers and rebooting on demand; it is possible to attach a monitor and keyboard to a running system, but that is operationally an exception.
Commonly, although not strictly correctly, the generic term UNIX describes a family of operating systems that are deployed on a variety of systems. Although their internal design may be different, the variants appear to their end-users as the same system, with minor (and annoying) differences in usage. There are two variants in use at Hotmail: FreeBSD, which can be used without license cost and is available in source form, and Solaris, which is bundled with Sun hardware. Linux, which is just another UNIX variant, was not used at Hotmail.
The following sections will examine facts about UNIX (specifically FreeBSD) as they relate to the conversion problem. We also consider Apache as an intrinsic part of the UNIX-based solution, in the same way that IIS is an intrinsic part of Windows 2000 Server.
1) Familiarity. Entrepreneurs in the startup world are generally familiar with one version of UNIX (usually through college education), and training in one easily converts to another. When setting up a new enterprise, it”s easy to work with what you know than to take time investigating the alternatives.
2) Reputation for stability. Both the UNIX kernel, and the design techniques it encourages, are renowned for stability. A system of several thousand servers must run reliably and without intervention to restart failed systems. For Windows 2000, we must first prove the stability in the same environment, and we must then convince the rest of the world.
Apache is also designed for stability and correctness, rather than breadth of features or high performance demands.
3) FreeBSD is free. Although there are collateral costs (it”s not particularly easy to set up) the freedom from license costs is a major consideration, especially for a startup. The free availability of source also means that it can be fairly simple (or it can be very difficult) to make local changes.
4) Easy to minimize. The typical UNIX server is taking care of one task, not acting as a desktop and development platform for a user. It is particularly easy to cut down the load on the system so that only the minimum number of services is running. This reduced complexity aids stability and transparency.
5) Transparent. It”s easy to look at a UNIX system and know what is running and why. Although its configuration files may have arcane (and sometimes too-simple) syntax, they are easy to find and change.
6) Preference for text files. Most configuration setups, log files, and so on, are plain text files with reasonably short line lengths. Although this may be marginally detrimental to performance (usually in circumstances where it doesn”t matter) it is a powerful approach because a small, familiar set of tools, adapted to working with short text lines, can be used by the administrators for most of their daily tasks. In particular, favorite tools can be used to analyze all the system”s log files and error reports.
7) Powerful but simple scripting languages and tools. Again, familiarity and consistency among UNIX implementations is the key. Over the years, UNIX versions have evolved a good set of single-function commands and shell scripting languages that work well for ad-hoc and automated administration. The shell scripting languages fall just short of being a programming language (they have less power than VBScript or JScript). This may seem to be a disadvantage, but we must remember that operators are not programmers; having to learn a block-structured programming language is a resistance point. Scripts that combine executables into pipelines are simple to build incrementally and experimentally, and even the experienced Hotmail administrators seem to be taking that approach for special purpose scripts (using CMD) rather than authoring with one of the object-oriented scripts.
On the other hand, PERL (another language that has grown organically with a lot of community feedback) is more of a programming than scripting language. It is popular for repeated, automated tasks that can be developed and optimized by senior administrative staff who do have the higher level of programming expertise required.
Consider the above list of UNIX strengths to be also a list of Windows weaknesses. However, there are some specific issues that need to be called out.
1) A GUI bias. Windows 2000 server products continue to be designed with the desktop in mind. There are too many functions that are either too difficult or impossible to perform using a text-based interface.
Why is this important” There are several reasons:
n GUI operations are essentially impossible to script. With large numbers of servers, it is impractical to use the GUI to carry out installation tasks or regular maintenance tasks.
n Text-based operations are more versatile; an administrator can usually do more to a system (good and bad) than is provided by the restricted, planned methods using the GUI.
n There is in place at Hotmail an established secure channel into the production system, using a text-based secure shell interface.
n Using a GUI amounts to hiding the true system modifications from the system administrators and operators. UNIX operators like the sense of control that comes from their ability to modify system tables and configuration files more directly.
n Operating a GUI through a slow network connection can be too slow to be useful. Although this is less important, it can still be a consideration when there is a need to administer or diagnose a system through a dialup connection.
There are, indeed, many non-GUI administrative programs provided in the core Windows 2000 product and in the Resource Kit. The problem is that the collection is somewhat arbitrary, incoherent and inconsistent. Programs seem to have been written to fill an immediate need and there is stylistic inconsistency and poor feature coverage.
2) Complexity. A Windows server out of the box is an elaborate system. Although it performs specific tasks well (such as being a web server) there are many services that have a complex set of dependencies, and it is never clear which ones are necessary and which can be removed to improve the system”s efficiency.
3) Obscurity. Some parameters that control the system”s operation are hidden and difficult to fully assess. The metabase is an obvious example. The problem here is that is makes the administrator nervous; in a single-function system he wants to be able to understand all of the configuration-related choices that the system is making on his behalf.
4) Resource utilization. It”s true that Windows requires a more powerful computer than Linux or FreeBSD. In practice, this is a less important constraint. When you are building a large operation, you will use smaller numbers of relatively powerful systems. The PC systems in use at Hotmail are perfectly capable of running Windows, and the machine”s basic power is the same whether it is run with UNIX or Windows. For most of the time, it is only executing application code and most of the extra elaboration is not apparent.
5) Image size. The team was unable to reduce the size of the image below 900MB; Windows contains many complex relationships between pieces, and the team was not able to determine with safety how much could be left out of the image. Although disk space on each server was not an issue, the time taken to image thousands of servers across the internal network was significant. By comparison, the equivalent FreeBSD image size is a few tens of MB.
6) Reboot as an expectation. Windows operations still involves too many reboots. Sometimes they are unnecessary, but operators reboot a system rather than take the time to debug it. For example, a service may be hung, and rather than take the time to find and fix the problem, it is often more convenient to reboot. By contrast, UNIX administrators are conditioned to quickly identify the failing service and simply restart it; they are helped in this by the greater transparency of UNIX and the small number of interdependencies. Some reboots are demanded by an application installation, and are not strictly necessary.
7) License costs. As we will see when discussing load balancing, the license cost of Windows software is a major consideration when converting from the unencumbered UNIX implementations. Although there were no costs to the Hotmail project, as a M”crosöft department, the team did consider the software costs in order to make the conversion a useful model for future customers.
n They used Server in preference to Advanced Server (no features of Advanced Server were necessary).
n They reluctantly used Services for UNIX and Interix, to get access to features that were not adequately provided in Windows. Future releases of Windows will have the features that would make it unnecessary to add those subsystems and avoid their notional cost.
n No business analysis was undertaken to determine whether the benefit of the conversion would outweigh the notional cost of the Windows licenses.
1) Windows has more resources behind its development. It does have greater complexity than the free UNIX distributions, and used wisely (and with knowledge) that can lead to a more effective solution. For example, IIS is more self-tuning than Apache.
IIS and Windows have many more tuning parameters than Apache and FreeBSD. The problem here is to make them comprehensible to new administrators.
2) The development platform, specifically Visual Studio, is a major advantage. Even before the conversion to Windows was contemplated, Hotmail developers used Visual Studio on NT4 to develop and debug their code. The code was eventually recompiled for UNIX when the first level of testing was complete. There is nothing approaching the power of Visual Studio on any UNIX, let alone the free ones, with the possible exception of the Java development tools.
The superior development platform has also had a positive operational impact in the live site. In the first days of deployment, some server threads went into a CPU-consuming loop. Using Visual Studio, Hotmail developers were able to find the application-level problem in a few minutes. That would have been impossible using UNIX tools.
3) Vastly better monitoring infrastructure. UNIX has some rudimentary event reporting and performance monitoring tools, but nothing to approach the integrated power of the event logging and performance monitoring features. Again, it is necessary to use them wisely; event logging in particular has a human and system overhead that we”ll talk about later.
4) Better hardware detection. Setting up UNIX on a new PC is difficult, requiring a more intimate knowledge of how the hardware is built. That”s an up-front cost; given the existence of multiple identically configured systems, cloning an established system doesn”t present the same problems.
5) Internationalization. The software tools available in Windows to provide multiple localized solutions are far ahead of most UNIX systems.
The constraints called out earlier (the 8-week upgrade cycle, the need to keep the service running, and the small number of staff) produced enough pressure on the development and administrative staff that the team agreed to devote one cycle to the platform conversion and not change the application during that time. This allowed the developers and testers to focus on the specific conversion issues. During the conversion, the application itself was the same on both platforms. This means that a user may have successive pages served by either platform, and not notice the difference.
The same constraints led to a desire not to change operational practices without good reason, because of the investment in training staff at all skill levels, and the feeling that the fewer things were changed, the fewer were the potential blocking problems.
Finally, the economic necessity of not adding technical staff to the conversion means that there was no consideration given to major re-architecture of the application.
There is in place a method of remotely bootstrapping a server to a new OS and application suite, and converting one rack (21 machines) in about 20 minutes. Replicating the installation capability was a goal of the project, and conserving as much as possible of the infrastructure to do it was strongly desired.
The web server application suite consists of about 90 different transactions, each corresponding to a click on a web page. Using Apache, each one is implemented as an executable program using the CGI interface, and run in a separate process managed and owned by the web server. Processes are the natural way of encapsulating a single stateless transaction using UNIX.
Converting to Windows, the development team decided not to use the CGI interface to IIS. Creating a new Windows process is more expensive than creating a UNIX process. Instead, the team converted the CGI code to run as an ISAPI application, in which the transactions are processed by code that (in the most basic implementation) runs within the IIS process.
Running in process will be more efficient than running as a CGI, because the process creation overhead is avoided. We could have brought that advantage to UNIX. Apache supports the same concept; the equivalent to an ISAPI filter is called a module. Naturally, we did not waste time building the module implementation just to throw it away.
Conversion from CGI to ISAPI was essentially automated by using a filter that effectively presents the standard CG interface (using data streams and environment variables) to the user code. Because the application code was well written and did not make assumptions about its environment, the major part of the conversion went very smoothly and did not require significant unexpected engineering. There were some intentional pieces of re-engineering:
n The spell, dictionary, and thesaurus functions were rewritten to use M”crosöft technology from Office and Encarta. The UNIX versions use binaries from Merriam Webster. The spellcheck feature is much improved; there are coverage problems with the dictionary data that need to be addressed.
n The SMTP service of IIS was used to handle outgoing mail, replacing a UNIX standard mail service.
n Virus scanning of attachments used an external UNIX utility from McAfee; this was replaced by its NT equivalent.
The most challenging, and anticipated, problem with converting from CGI to ISAPI derives from the forgiving nature of the CGI architecture. Memory leaks, unclosed files and similar problems can be tolerated, because they are automatically cleaned up when the CGI process terminates. Even an occasional abort is tolerated; it results in an invalid page to one customer, but does not usually affect any other part of the system.
By contrast, ISAPI modules share a process with the web server, as do Apache modules. Resource leaks will accumulate, and crashes have the potential to bring down the server (although not the entire service, thanks to load balancing). There are process isolation techniques available in IIS to minimize these problems, but the team decided to use the in-process model for full efficiency. Among the actions taken:
n Use a private heap that is cleared at the end of each web transaction.
n In testing, monitor for resource leaks and fix them.
n Implement an IIS heartbeat monitor that will quickly notice and restart any failed IIS service.
Converting to ASP was not considered. That would have been a complete rewrite of the application, with no great advantage (Hotmail does not use a WinDNA infrastructure, for example). In fact, the implementation uses some ASP ideas and terms, as much of the user content is determined by template files that look like ASP files, but the interpretation engine is completely homegrown. One motivation for borrowing ASP syntax was to use M”crosöft development tools (for example, to aid internationalization).
Hotmail has a large investment in Cisco Local Director; every web access goes to an LD, which redistributes the load among real servers. Hotmail chose to continue with LD, rather than use the Windows load balancing technology, because the infrastructure was in place and did not need to be reconfigured (reducing the learning curve). Also, LD fits the Hotmail model well; it is possible to place up to 400 servers behind the virtual address, and each Hotmail cluster can have over 300 identically configured servers.
Another major issue is the potential cost. Although Hotmail uses M”crosöft software without license fees, we must consider this project as a model for real customers. Use of WLBS requires Advanced Server, but Server provides all the other features used by Hotmail. Using list prices, the cost comparison for a farm of 3500 servers is:
n Using WLBS (hence Advanced Server): $15M+
n Using LD and Server: $6M+
This does not take into account any extra PCs necessary to handle WLBS overhead (administrative, as well as the cycles needed to redirect the load) or the plans by Cisco to further reduce the cost of LD by building it into their network switches.
When considered in the context of a large web farm, WLBS has a serious economic disadvantage that can only be justified by the value of its administrative and monitoring tools. There is considerable competition in the IP load balancing market, which drives costs down; the numbers quoted above are based on the price we paid in mid-1999, around $17,000 per unit. An existing system that has load balancing in place will presumably have adequate tools, so the added value of WLBS, in terms of operational flexibility and superior monitoring, must be considerable if it is to be economically justified.
Each of several thousand systems must be converted to the new operating system and application suite, and this process must be carried out while the service is operating, and within a short timespan. Required are a mechanism for packaging the image and a method for delivering it. Among the special requirements:
n Each server already has a name and static IP address; to fit in with existing operating practices and configurations, they should retain the same name and IP address. Using a static address, compared with DHCP, makes system administration simpler and more transparent. A machine”s name relates to its physical position within a cluster.
n It should be possible to convert a machine without physical access.
n It should be possible to revert systems quickly to FreeBSD in case of serious problems with the Windows conversion.
n Downtime for reboots and service restarts should be minimized.
Several technologies were investigated and rejected. In most cases, there were blocking issues that were seemingly small, but without guarantee of resolution the team had to adopt a method that they could control. Some of the issues were:
n RIS can be used for automatically installing an image from a server when a machine is initially booted. Drawbacks include: physical access is required to the machine (to force a network boot), and the system requires that an IP address be supplied with DHCP (DHCP is not used at Hotmail, because of the requirement for static IP addresses). It was impossible to control the name of the new server as required. In addition RIS was not supported for installing Server, although it was known to work.
n AppCenter is intended for this kind of application. However, the initial release of AppCenter is targeted for small installations. It also lacks some features needed by application installation and update.
n Unattended setup performs a standard installation across the network; because of all the file copying and calculation involved, it is too slow.
The team opted to extend an existing technology, “kickstart”. This uses the OS existing on the machine to bootstrap an image, prepared using sysprep, and then run scripts to perform the remaining configuration tasks that need to be carried out after the install. The image copy is sufficiently fast, and the post-install steps are minimal.
It proves to be difficult to configure IIS in a precisely controlled way. The metabase is obscure and poorly documented, and produced too many surprises. Furthermore, a system created using sysprep does not produce a ready-to-run metabase.
Consequently, it was necessary to construct the metabase by using scripts. The scripts were a mixture of command files that repeatedly call the mdutil utility, and some special-purpose pieces of scripting code (VBScript in this case, although any language that supports COM would work). The scripts are run as part of the mini-setup step that follows construction of the operating system on the target computer.
Figuring out the metabase structure, which elements needed to be set, and how to suppress the unwanted elements (for example, the trees defining the default and administration site) was the most complex and error-prone part of the entire setup design. Considerable reverse engineering was necessary. Major improvement is needed in the way the metabase is described to users, and the way that administrators can script the commonest tasks.
The task was to tune the system for the best combination of throughput and performance, and also to harden it against attack from outside. This required attention in several areas:
n System configuration, in removing all unnecessary system services and making sure the remaining services are configured as effectively as possible.
n Registry settings for performance and security.
n Metabase settings for performance and security.
The team was unable to find a comprehensive set of published settings that covered the above areas, perhaps because there are so many sets of demands on system configuration in general. However, we feel that configuring a system to be a locked down web server will be a common enough task that it would be useful to establish and publish a set of recommended actions and settings.
Active Directory (AD) is a key addition in Windows 2000, yet it has been difficult to justify its use in the web server farm context.
AD is generally used to manage populations of users and machines. At Hotmail, it is not interesting to use AD to manage customers. User privileges and restrictions are already handled by the Hotmail application code, and there is no concept of granting or restricting access to customers within the Hotmail infrastructure. Furthermore, there is a constantly changing population of many usernames (over 100 million in July, 2000), a size that may be beyond the capabilities of Windows 2000.
The site has users in another sense: administrator accounts that are used to manage the machines by hand or by script. However, all administrators are fully trusted in the system (once they are inside the firewall), and it is normal to allow them to log in with full administrative privileges. This is the equivalent to the UNIX root account. It is useful to allow single sign-in, to allow an administrator to move from one machine to the next, and also to add new users at a central point; however, these needs are easily met by NT4″s NTLM.
There is a stronger argument for entering the servers in AD. This will provide integration with DNS, and holds out the potential for administrators to classify machines in whatever ways they find useful operationally.
The Hotmail server farm is organized as a series of clusters, each containing several hundred servers. These machines must be named systematically. In practice, server names are duplicated between clusters, as they are identified uniquely by the fully qualified domain name (each cluster is a subdomain). This presents a problem for AD, which (apparently because of NetBIOS compatibility) does not permit duplicate short names anywhere within a set of subdomains. Getting rid of the NetBIOS legacy will be a great boon for M”crosöft.
This apparently trivial restriction was enough to postpone the idea of constructing an AD, which in any case is additional work without obvious benefit. It was necessary to maintain the names of systems through the upgrade, because of legacy monitoring and administrative tools. Existing administrative mechanisms were adapted and did not need the benefits of AD. It is expected that, later, administrative staff will be able to develop tools that can make use of AD (for example, the ability to query on servers with a particular characteristic may be useful) but for now there is no need to break into the circle.
The Windows DNS service, operating without AD, proved perfectly capable of handling the load, and was able to take up the data from a UNIX BIND server easily. Windows DNS is used at the site for both internal and external name resolution.
It is naturally a requirement that a web-based service operate continually, without customer-visible degradations of service. This is not just a matter of pride; even a loss of availability for a few minutes every month can produce too much degradation in the perceptions and (assuming we publish uptime numbers) the availability measurement.
It”s a solution, but a weak one, to put servers behind load-balancing equipment and take them out of service when required for upgrade or other maintenance. The challenge is to keep each server running continuously as much as possible. Except for operating system upgrades, a system based on FreeBSD and Apache can keep operating while the application is upgraded, and Windows should be able to do the same.
Application updates at Hotmail are of two kinds: content and code. Content updates change only data files, generally those that directly determine what the customer sees on the screen, and they are carried out on their own schedule. Apache can handle both content and code updates without stopping the service. Updates can be rolled out directly, when the data is updated in place. They can also be timed, when the updates are put on the servers in a staging location together with an update batch job that will be triggered at the desired moment. The timed update is used when it is important for the application”s integrity that the entire site be updated simultaneously, something that is impossible to achieve when updating several thousand servers across a single network.
Apache running under UNIX supports both kinds of updates very simply. A CGI application can be replaced, even while the old file is being executed, and the next execution will use the new file. The same is true of content. If Apache”s own configuration files must be updated, there is a procedure to signal the server to reset itself and reread its configuration, and that takes around a second.
Unfortunately, IIS 5.0 does not support either kind of update well. When IIS accesses content directly, it locks the folders. Fortunately, this doesn”t apply to most of the Hotmail upgrades. The bigger issue is updating the ISAPI filters, which must be done while the IIS server is stopped. The entire process can take a minute or so.
The Hotmail staff has invented a technique that uses a thin ISAPI filter (the “shim”). It loads the application as a separate DLL and passes on all the ISAPI requests. It also watches for updates to the application DLL in a predetermined place, and when it is notified of an update it maps the new DLL, sends it all new requests, and allows the old requests to terminate before removing the old DLL. This technique has been made available to the IIS team.
The team investigated, but decided not to use, Intellimirror-based update. First, Intellimorror requires AD to be implemented. Second, Intellimirror (working with the Installer) only makes updates to applications when a user logs in or when the system is rebooted. Since user login is an irrelevant activity in this context, and the whole idea is to prevent a reboot, Intellimirror-based update does not meet the need.
The UNIX implementation packaged new code as a compressed file using the UNIX tar format, and distributed it (and the necessary installation code) using the UNIX rdist utility.
The team investigated use of MS Installer technology for a packaging format. Although it would probably have met the requirements (including the ability to unpack versioned files into specific locations, make registry changes, and run arbitrary code during installation) it proved too difficult to learn, despite the availability of a few decent authoring packages. The team stayed with the zipfile method of packaging.
The UNIX rdist mechanism is also well suited to installation and updates on a large number of identical machines. From a central location, the administrator can iterate over a list of servers and push packages to them. The rdist daemon (service) running on the remote systems will extract files from the packages into their specified locations and run arbitrary commands before and after installation. This is approximately equivalent to MS Installer features, with the additional ability to push distributions over a list of machines. The Hotmail team implemented a version of the rdist daemon to run on Windows.
The Hotmail infrastructure is monitored remotely, in an operations center located with the development staff in the Sunnyvale campus. There are many tools in place to monitor the performance of the server farm. Some of these measure the systems by their external behavior, and they did not need modification. Others use information gathered by the servers themselves (performance counters, disk statistics and so on). It proved to be relatively simple to write scripts that would extract the desired information from the Windows performance counters and send them to the Operations consoles.
Some of the self-test and monitoring features of the servers are performed by customized operations (usually scripts) executed at predetermined intervals. These intervals are anything between a minute and a week.
Using FreeBSD, such tasks are scheduled by the cron service. Jobs are scheduled by being listed in a file, one line per job. Changing the file is easy to accomplish using the command line (or rdist), and replacing the entire file is a good way to ensure that each server has exactly the schedule of jobs that the administrator intended. Jobs can be scheduled to execute once, or at intervals down to one minute.
Although the Windows Task Scheduler service is fundamentally able to look after such jobs, the interfaces provided in Windows does not measure up to the task.
n The usual interface is the GUI, which is appropriate for setting up jobs on a machine at a time, is labor-intensive and error-prone.
n The command at is deprecated, is not able to schedule repeated jobs at a frequency of less than one day.
n The command jt was offered by the Task Scheduler team, but it is unsupported and awkward to use (it was intended for testing).
n None of the three interfaces offers an easy way to replace the current task schedule entirely.
The team met the need by running the cron service provided in Services for UNIX. As described earlier, relying on Services for UNIX (or any other package subject to extra license costs) provides a bad model for other customer deployments. We have provided input to the Whistler command line team for an improved interface to Task Scheduler.
There was a minor issue concerning the UNIX integrated logging feature (syslog). The kernel, standard services, and application code can write lines of text to syslog, and a single configuration file is used to determine the destination of the text lines. Thus an important alert can result in a console message and email, while an informational message can be written to a log file. The administrator can change the destinations without code having to be recompiled.
An application like Hotmail often uses the application access to syslog to write statistical data of business interest (such as creation of a new account or sending of an email message). Administrators can use other tools to analyze the logs, archive them, or simply count occurrences and throw the logs away. Typical usage is at the order of one event per second; the high performance associated with the kernel log is not required.
There are no features in Windows 2000 that provide the same combination of convenience and configurability, although the kernel event log comes close. For convenience, and also to avoid recoding, the team elected to use the syslog feature from the Interix subsystem, introducing the issues about notional cost that have already been discussed.
Whistler introduces the Enterprise Event Log, a lightweight WMI feature, which seems to provide the desired functionality. A closer examination of the kernel logging may show that it too can meet the need, Any replacement should involve trivial change to the existing application code (perhaps even using a macro); it would be desirable not to have to recode calls to syslog in order to keep down the amount of source code conversion.
There are occasions when, after deployment, the administrators need to make a configuration change consistently across the entire farm. The rdist mechanism can be used for configuration changes; if the change is simple then rsh can be used. The key fact about UNIX that makes this work is, again, that all system administration tasks can be done from the command line.
Windows should provide the same functionality, given some means of aggregating a group of servers and some way of performing an operation consistently across all the servers. Single commands, pipelines, or scripts (command scripts or COM-based scripts) would be appropriate actors; however, scripts need to be downloaded, executed (and, if necessary, cleaned up). There should be the ability to defer the activity until a specific time, presumably using the improved Task Scheduler. In other words, Windows must support old-fashioned batch processing.
One specific example of a feature that is not accessible to the batch model is Network Interface Card customization; for example, there have been requirements to change the card”s speed from 10 Mbps to 100Mbps (at a specific time) or to change the MTU setting. The configuration model of an Ethernet NIC varies between manufacturers, and the standard GUI is driven by a schema that is found in the registry. Such a GUI is not at all adaptable to the batch model. It is possible to make the required changes to the registry, but that would require a subsequent reboot, which is not acceptable. A brief period off the network, while the card resets itself, is the most downtime that can be accepted.
The Hotmail team, with help from a network engineer, developed a rudimentary application that would put a specific value in the registry and (using an undocumented interface) reset the card in a way that will make it pick up the new value. We strongly urge that the feature be put into the shipping system.
Helping UNIX system administrators with the transition to Windows is an experience in itself, and much was learned. Again, this is data from a single corporate experience, but we suspect it is fairly typical. Here, then, is the human engineering overview.
Initially, the plan to convert from FreeBSD to Windows was met with responses ranging from skepticism to hostility, in a way that should be familiar to those who share the attitudes of the various UNIX communities to M”crosöft software.
We engaged with the operations staff by asking them to define what their everyday tasks are, in all areas of operating system and application maintenance. Instead of a set of tasks, we were handed a set of the UNIX commands and features that were used to carry out those tasks. While this did not directly meet the need, it gave us an opportunity to address all of the features directly, and show that Windows has an exact equivalent in the core system, or in the Resource Kit, or easily provided with a script. There were very few cases where no satisfactory alternative could be found. Essentially, this was throwaway work, as the eventual solution solved the problems in a more Windows-like way, but it was an excellent opportunity to gain the confidence of the operations staff.
It was clear from the responses that some people from the UNIX side of the house cannot distinguish our different systems that are marketed under the Windows brand; there was an inbuilt assumption that Windows 2000 shares the features and faults of Windows 95. Those who were somewhat familiar with Windows NT were not aware of the range of the non-GUI offerings (to be fair, neither were we); the set of commands in the product and the Resource Kit is fairly broad although, as we have seen, there are gaps and they lack stylistic consistency.
Other staff members, not members of the regular operations team, carried out the conversion. When deployment came near, the Operations staff had to learn the new tools and paradigms. Their existence proved enough; the main interest of Operations staff is, after all, to run and administer the system, and once they found that there were tools, whether custom-built or standard, that did the job well enough, they were able to take control and gain a sense of ownership. Some standard one-day courses were also given to the staff, to prepare them for handling system debugging, hotfix application and so on. By this time, the staff had become convinced that Windows is, after all, a real operating system with surprising richness.
n Make the message clear: Windows 2000 is a modern operating system; it”s not Windows 9x.
n Gain the confidence of the skeptics by showing them that it is a real operating system, and not so different from other operating systems in fundamental ways, by showing some basic command-line tools that monitor the system in action.
n Use the self-interest of operations staff to ensure that they have full authority over their areas of responsibility.
These are the main lessons that we can extract from the Hotmail conversion.
1) We need a consistent, comprehensive, thoughtful approach to integrated management of a set of servers. This does not necessarily mean that we should slavishly follow the UNIX model of iterating through a list of machines with an rsh command, or pushing configuration files to a list of machines. The fundamental goal is to be able to manage machines as an aggregate; doing this through a GUI is not necessarily evil, so long as it can be done remotely, and once. The point applies to application distribution as well as to system tuning.
2) NLBS is at an economic disadvantage, due to its association with Advanced Server, and Hotmail operations staff were sufficiently satisfied with the existing solution that they did not feel the need to investigate NLBS”s operational advantages.
3) The metabase needs to be ripped out and replaced with something that is much easier for an administrator to see and understand, and be confident that there are no hidden surprises. The IIS6 planners have heard this opinion.
4) It should be easier to tune and lock down a single system, and have the changes propagated to all systems in a given class.
5) Windows is too complex to understand at first, particularly during a conversion from UNIX. There are just too many things about it for a planner in a startup to understand. Typically there is little time to attend training. The problem is most Computer Science graduates come to their startups already understanding enough about UNIX to be confident that they can use it effectively. We do need to be careful to balance the complexity and transparency carefully.
6) The basic need for an Internet site, converting from UNIX to Windows, is to be able to quickly replace their application and operational methodology with something at least equally good. Improvements that come for free are good, but implementing new technologies and programming methods will need to take a back seat so long as they delay the main purpose, which is to keep a site online and competitive. Anything else is a cost that needs to bring a clear benefit.