Distributed computing concerns any kind of system in which computing nodes are connected by a network. These computing systems are an important and growing part of society. To understand how students will benefit from the European Doctorate in Distributed Computing, we first take a step back and look at the history and long-term trends in distributed computer systems and how they affect society. We then classify the different ways in which distributed computing has permeated society. Finally, we identify five key research directions in distributed computing that are covered by the present proposal.
Paradigm shift toward deeply distributed applications
Historically, we are in the midst of a paradigm shift. Thirty years ago almost all computer-based systems were built around a central computing unit accessed from terminals with minimal resources. This changed with the introduction of the personal computer, and during the last twenty years we have seen a tremendous increase in the resources available on the client side. For several years, the computational power and the number of personal computers increased more rapidly than network connectivity, leading to scenarios where services were tied to client hardware with limited or no communication capacity. As an example, consider how computer games were packaged just a few years ago: the application was installed on the personal computer and used as a stand-alone system.
Over the last ten years an important trend has emerged: connectivity has been increasing almost as fast as computational power. The increased connectivity stems both from the gradual shift of the backbone infrastructure from voice telephony to data-centric networks and from the increasing coverage and bandwidth available in the last mile. Applications therefore no longer need to be stand-alone: they can use the available connectivity for additional interaction (as in online games) or simply to draw on extra computational power deployed in data centers. In the limit, many applications can be deployed directly in data centers, with client machines acting only as powerful terminals. We may thus quickly find scenarios that resemble the centralized, terminal-based deployments of the past, but now on a global Internet scale: available anytime, anywhere, using more or less any terminal. From a service developer's perspective, the advantage is not only the near elimination of the initial hardware investment needed to deploy a service, but also the ability of the service to scale to thousands or millions of users. This has drastically lowered the threshold for developing and launching services; resource planning for demand peaks, traditionally a large part of system design, has become largely a non-issue. Moreover, high availability (providing the service despite network and computer failures), which is extremely difficult and costly to achieve, is now provided as part of the networked infrastructure.
However, the infrastructure that supports these services is now an extremely complex distributed system. The main problem is that it has to scale seamlessly with increased usage, while maintaining the image of an increasingly powerful single computing machine. The most cost-effective way of doing this is horizontal scaling: implementing the service on a growing set of small, relatively cheap computers. Instead of using one machine to its limit and then, at high cost, replacing it with a machine twice as powerful, smaller machines can be added to the set gradually. A large set of computers also provides the redundancy needed for fault tolerance, since it is far less likely that several computers fail simultaneously than that a single one does. Providing a scalable, fault-tolerant infrastructure is of course highly desirable. Clearly, we need professionals who are able to design and implement the mechanisms and software that provide it: this is where the European Doctorate in Distributed Computing and its students will find their role.
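To make the fault-tolerance argument concrete, the sketch below computes how likely a replicated service is to be unavailable. It assumes (a simplification, not a measured property) that machines fail independently, each being down with the same probability p; the function names are hypothetical.

```python
from math import comb


def prob_all_fail(p: float, n: int) -> float:
    """Probability that all n replicas are down at the same time,
    assuming independent failures with per-machine probability p."""
    return p ** n


def prob_fewer_than_k_up(p: float, n: int, k: int) -> float:
    """Probability that fewer than k of n machines are up (binomial model).
    A quorum-based service that needs k live replicas is then unavailable."""
    q = 1.0 - p  # probability that a single machine is up
    return sum(comb(n, i) * q**i * p**(n - i) for i in range(k))


if __name__ == "__main__":
    p = 0.01  # assumed: each machine is down 1% of the time
    print("one machine:       ", prob_all_fail(p, 1))
    print("three replicas:    ", prob_all_fail(p, 3))
    print("majority of five:  ", prob_fewer_than_k_up(p, 5, 3))
```

With p = 0.01, a single machine is unavailable 1% of the time, while three independent replicas are all down only about once in a million. Real failures are often correlated (shared power, network, or software bugs), so these figures overstate the benefit; they illustrate the direction of the argument rather than a guarantee.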
Distributed computing and society
We can classify the infrastructures supporting distributed computing as follows:
The Internet, which can be defined in a wide sense as all the high-level protocols, middleware, and applications built on top of a world-wide network implemented with the TCP/IP protocol family. Many important services exist on top of this network, for example the World Wide Web, which is based on the HTTP protocol family.
Mobile computing, including mobile phones and other devices connected primarily by wireless networks of all kinds (e.g., GSM, Wi-Fi, Bluetooth, ZigBee). Mobile computing is the mobile part of the telecommunications network, which also contains a large fixed network.
Pervasive and ubiquitous computing. This is a paradigm of human-computer interaction in which computing devices become part of our everyday living environment while being inconspicuous. We may not even know they exist. But they can affect our lives profoundly. One (controversial) example is the increasing presence of video surveillance cameras in public areas. Pervasive computing devices are most often networked and therefore form bona fide distributed systems.
High-performance computing infrastructures include Grid and cloud infrastructures, which can be implemented on top of the Internet, but also independently with cluster computing, multiprocessors, and multicore computing. Multicore computing has become the mainstream form of parallel computing: a growing percentage of consumer microprocessors are multicore.
It is a commonplace observation that all these areas are profoundly affecting modern society. The most visible changes are in commerce, social interaction, information retrieval, and mobility:
Web commerce is changing the face of commerce. For example, the media economy (music, video, books, and news) is changing profoundly, from a model where media are scarce physical goods (e.g., DVDs, books, newspapers) to a new model where media are cheap to store and copy on computers. We are still in the middle of this change, and media industries are still struggling to adapt their business models to it.
Social networks are changing how we socialize. Old-style letters and phone calls are increasingly giving way to social networks at all levels of granularity: from short messages (Twitter) to longer messages (blogs) to all kinds of sharing (Facebook, Flickr, YouTube, etc.). Scientific exchange is increasingly carried out through digital journals.
Global services are rapidly raising the intelligence of the Internet. Building on the Web of the early 1990s and the search engines of the late 1990s, current services include more sophisticated interactive information extraction tools such as Google News and Google Trends, geographic services such as Google Earth and Google Maps, content distribution networks and video on demand, and information-intensive services such as Google Translate and Wolfram Alpha. Other services are tied to commerce applications, such as the Amazon and Netflix recommendation services. In addition, numerous smaller services are used by consumers and businesses everywhere. All these services provide increasingly intelligent interaction with the information on the Internet.
Mobile devices such as mobile phones, smartphones, and more recently tablets have become omnipresent and are removing the barriers between us and the social services mentioned above. They are surpassing the fixed telephone network in both numbers and growth. They are deeply connected to all the developments mentioned previously, but they are not limited to them: they also have their own application ecosystem.
These developments have been accompanied by intensified research and development in distributed computing. In addition to the above areas, each of which has its own research challenges, our consortium has identified the following five more fundamental research directions, which can be seen as enablers for those areas. This list is not exhaustive, as the challenges for research in distributed computing evolve continuously:
Ubiquitous data-intensive applications;
Scalable decentralized distributed systems: P2P, large-scale (grid, cloud, overlays);
Adaptive (self-managing) distributed systems;
Advanced networking, including protocols, architectures, and intelligent networks;
Applied distributed systems.
We present each of these five key research directions below.
Key direction 1: Ubiquitous data-intensive applications
In the last couple of years we have seen a strong trend towards a network-based service infrastructure, often referred to as “cloud computing”, built on data centers that make computational, network, and storage resources available to service developers at a competitive price. Services reside in these centers and are accessed through thin clients such as smartphones. Cloud computing has been surrounded by an enormous amount of hype, so let us give a clear definition (taken from the Future of Cloud Computing Report, January 2010, European Commission Expert Group):
A cloud is an elastic execution environment of resources (computational and storage) involving multiple stakeholders and providing a metered service at multiple granularities for a specified level of quality (of service).
This definition focuses on what the cloud provides, not on how it is implemented (in particular, many clouds are implemented in data centers, but this is not a necessary property). From this definition, we see that cloud computing offers a new property with respect to earlier client/server infrastructures, namely elasticity: the ability to scale resource usage up and down rapidly according to instantaneous demand. Since a cloud provides metered service (you pay only for what you use), elasticity implies that, for a given cost, more resources can be used simultaneously as the usage time decreases. This makes possible a new class of data-intensive applications that use computational and storage resources in short, intense bursts. Observing the development of Internet applications, we see that applications with such “bursty” properties are starting to appear. Most of the global services mentioned earlier are in fact bursty applications. These applications are just the beginning: we foresee that bursty applications will bring more profound, qualitative improvements in Internet operations. For example, Google announced in February 2010 that it was working on a real-time speech translation service.
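The trade-off between resources and time under metered pricing is simple arithmetic: cost is proportional to machine-hours. The sketch below assumes a flat, hypothetical price per machine-hour and perfectly parallelizable work; real clouds add billing granularity, startup delays, and coordination overhead, so it shows only the idealized case.

```python
def cost(machines: int, hours: float, price_per_machine_hour: float) -> float:
    """Metered cost: pay only for the machine-hours actually used."""
    return machines * hours * price_per_machine_hour


def completion_time(total_machine_hours: float, machines: int) -> float:
    """Wall-clock time for perfectly parallel work spread over `machines`."""
    return total_machine_hours / machines


if __name__ == "__main__":
    PRICE = 0.10   # hypothetical price per machine-hour
    WORK = 1000.0  # hypothetical job size in machine-hours
    for m in (1, 10, 1000):
        t = completion_time(WORK, m)
        print(f"{m:>5} machines: {t:8.1f} h, cost {cost(m, t, PRICE):.2f}")
```

Every configuration costs the same 1,000 machine-hours, but the 1,000-machine run finishes in one hour instead of roughly six weeks. This is the sense in which elasticity makes bursty, data-intensive applications affordable.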
Partners of the present proposal are deeply involved in this new development, which crucially requires a combination of distributed computing (especially on clouds) and domain expertise. We call this new form of computing data-intensive computing. Project partners are active both in distributed computing and in all the relevant data-oriented disciplines, such as machine learning, data mining and analysis, databases, signal processing, and image recognition.
Key direction 2: Scalable decentralized distributed systems
Although contemporary large-scale computing infrastructures (notably for utility and cloud computing) are deployed in data centers, these are mostly proprietary and paid services. The current success of resource and content sharing in Grid and peer-to-peer scenarios motivates an alternative way to realize the cloud computing metaphor: a decentralized infrastructure that is free both from payment and from corporate control.
The next generation of resource and content sharing will employ, adapt, and integrate existing large-scale distributed systems such as institutional Grids, federated clusters, public computing infrastructures (such as BOINC and SETI@home), and content sharing on peer-to-peer overlays (e.g., BitTorrent). This implies offering cloud computing interfaces (e.g., Infrastructure-as-a-Service, Platform-as-a-Service, Software-as-a-Service) while implementing their functions on top of such decentralized infrastructures.
To offer a peer-to-peer cloud infrastructure, innovative research must be performed on several topics in distributed systems and distributed computing: resource and service discovery, scheduling and cooperative scheduling, distributed storage, replication of data and of task or virtual machine execution, application shipping, task division, reliability and security of local execution, reputation, and quality of service.
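Distributed storage and replication on a decentralized overlay are often approached with consistent hashing, one well-known technique among several (the proposal does not prescribe it). The minimal sketch below places node identifiers and data keys on a hash ring and stores each key on the next r distinct nodes clockwise. The class name, node names, and replication factor are illustrative assumptions.

```python
import bisect
import hashlib
from typing import Iterable, List


def ring_position(name: str) -> int:
    """Map a node name or data key to a point on a 2**160 ring via SHA-1."""
    return int(hashlib.sha1(name.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hashing ring: each key is stored on the first
    `replicas` distinct nodes found clockwise from the key's position."""

    def __init__(self, nodes: Iterable[str], replicas: int = 3) -> None:
        self.replicas = replicas
        self._ring = sorted((ring_position(n), n) for n in nodes)

    def add_node(self, node: str) -> None:
        # Keep the ring sorted; only keys just before the new node move.
        bisect.insort(self._ring, (ring_position(node), node))

    def remove_node(self, node: str) -> None:
        self._ring.remove((ring_position(node), node))

    def nodes_for(self, key: str) -> List[str]:
        """Return the replica set for `key` (up to `replicas` nodes)."""
        if not self._ring:
            return []
        start = bisect.bisect(self._ring, (ring_position(key), ""))
        count = min(self.replicas, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1]
                for i in range(count)]
```

Because a joining or leaving node only affects keys in its immediate neighborhood on the ring, the membership churn typical of peer-to-peer settings relocates a small fraction of the stored data rather than reshuffling everything.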
Such public computing clouds increase resilience and resource utilization, since many of the participating machines are always on and connected anyway. Reusing these machines could even yield environmental benefits.
Key direction 3: Adaptive (self-managing) distributed systems
Given the very large size, and possible geographical distribution and decentralization, of cloud computing infrastructures, manually deploying, configuring, managing, tuning, and repairing them is a daunting, if not impossible, task, regardless of the level of software support. Therefore, such large-scale computing infrastructures, and even applications aided by middleware, must be able to adapt dynamically to unavoidable variations in resource and machine availability, while avoiding significant infrastructure and application disruption and preventing shutdown. Such adaptability is usually discussed in the realm of autonomic computing, which aims to endow computing systems with a set of so-called self-* properties (e.g., self-configuration, self-healing, self-management). This adaptability can be designed by leveraging existing research on context awareness, adaptive systems, and reflective middleware.
Large-scale computing infrastructures must be designed using a principled, layered, and component-oriented approach, where each entity provides interfaces that enable introspection (i.e., inspecting configuration and performance parameters) and adaptation (e.g., reconfiguration, redeployment). Research is required to leverage such software sensors and actuators. They can be governed by declarative high-level policies, coordinated by upper layers in the infrastructure, or processed and updated in a peer-to-peer manner within network neighborhoods, ensuring eventual convergence and a balanced global behavior.
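As one illustration of peer-to-peer processing that converges to a balanced global behavior, the sketch below uses pairwise gossip averaging, a standard technique chosen here as an example rather than a method the proposal commits to. Each peer's load reading stands in for a software sensor; at every step a random peer and one of its neighbors replace both readings with their mean. The topology, initial loads, and number of rounds are hypothetical.

```python
import random
from typing import List, Sequence


def gossip_average(loads: Sequence[float],
                   neighbors: Sequence[Sequence[int]],
                   rounds: int,
                   seed: int = 0) -> List[float]:
    """Pairwise gossip averaging.

    loads: per-peer load readings (the 'sensor' values).
    neighbors: adjacency list; neighbors[i] are the peers i may contact.
    Each step preserves the sum of all values, so on a connected graph
    every value converges to the global average without a coordinator.
    """
    rng = random.Random(seed)
    values = list(loads)
    n = len(values)
    for _ in range(rounds):
        i = rng.randrange(n)           # a random peer wakes up
        j = rng.choice(neighbors[i])   # and contacts one neighbor
        mean = (values[i] + values[j]) / 2
        values[i] = values[j] = mean   # both adopt the pairwise mean
    return values


if __name__ == "__main__":
    # Ring of 8 peers; one peer starts heavily loaded.
    n = 8
    ring = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
    result = gossip_average([80.0] + [0.0] * (n - 1), ring, rounds=2000)
    print([round(v, 2) for v in result])
```

Each peer uses only local information, yet all estimates approach the global mean load (10 in this example). An actuator could then, for instance, shed work whenever its local reading exceeds the converged estimate.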
Key direction 4: Advanced networking
It can be argued that the Internet does not do any single task terribly well, but it does everything well enough. In economic terms, “well enough” is what matters. There are two problems with this position that we address in the present proposal. First, the Internet really does not do everything well enough. The limitations are well known: the Internet does not provide predictable quality of service and does not provide a sufficiently robust and secure infrastructure for critical applications. Second, companies involved in the Internet are under enormous commercial pressure, which makes it almost impossible to introduce radical changes to the network to support very demanding applications. The inevitable consequence is that the core network changes only through the accretion of point solutions that provide immediate benefits to the organizations paying the bill.
The trend is clear: the Internet can be enhanced by embedding application knowledge in the network, improving robustness, quality of service, manageability, and the intelligence of networking operations. The problem is that in doing so, we lose the generality that led to the success of the Internet in the first place. Nevertheless, our goal is simple: we wish to stimulate innovation in the Internet's networking architecture. Achieving this goal requires a careful balance of the principled and the pragmatic, and this balance is the main idea behind the networking research in this proposal. For example, one promising direction (being pursued in one of the FP7 projects in which the current partners are active) is the development of a general-purpose flow-processing architecture, to break the innovation logjam that has been developing over the last fifteen years. (Flow processing is any manipulation of packets in which the service given to those packets differs because they are part of a flow of packets.)
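The parenthetical definition of flow processing can be made concrete with a toy classifier. Here packets are grouped into flows by their 5-tuple (source and destination address, source and destination port, protocol), a common convention assumed for this sketch, and per-flow state decides how each packet is treated. The policy (demoting a flow once it exceeds a byte budget) and all names are hypothetical; this is not the architecture of the FP7 project mentioned in the text.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, Tuple

FlowKey = Tuple[str, str, int, int, str]


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    sport: int
    dport: int
    proto: str
    size: int  # bytes


def flow_key(p: Packet) -> FlowKey:
    """Identify the flow a packet belongs to by its 5-tuple."""
    return (p.src, p.dst, p.sport, p.dport, p.proto)


class FlowProcessor:
    """Treats packets differently depending on their flow's history."""

    def __init__(self, byte_budget: int) -> None:
        self.byte_budget = byte_budget
        self.bytes_seen: DefaultDict[FlowKey, int] = defaultdict(int)

    def handle(self, p: Packet) -> str:
        key = flow_key(p)
        self.bytes_seen[key] += p.size
        # Identical packets can receive different service: the decision
        # depends on how much traffic this packet's flow has already sent.
        if self.bytes_seen[key] > self.byte_budget:
            return "low-priority"
        return "forward"


if __name__ == "__main__":
    fp = FlowProcessor(byte_budget=3000)
    a = Packet("10.0.0.1", "10.0.0.2", 5000, 80, "tcp", 1500)
    b = Packet("10.0.0.3", "10.0.0.2", 5001, 80, "tcp", 1500)
    print([fp.handle(a) for _ in range(3)])  # third packet is demoted
    print(fp.handle(b))                      # separate flow, forwarded
```

The point of the example is that the service a packet receives is a function of per-flow state, not of the packet alone, which is exactly what a general-purpose flow-processing architecture would need to support in the network.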
Key direction 5: Applied distributed systems
The application of distributed computing in many important socio-economic areas creates foundational research needs, for example in context awareness, collaboration, publish-subscribe messaging, distributed storage, volunteer computing, and mobile computing. These foundational needs are important for socio-economic areas such as smart spaces, health care, intelligent transportation systems, smart energy systems, social networks, entertainment, Internet-wide storage services, future media and content delivery, distributed AI, Cyber-Physical Systems (CPS), distributed applications on hybrid clouds, and mobile applications (applications on hand-held devices and mobile phones).
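Publish-subscribe messaging, one of the foundational needs listed here, decouples producers from consumers: publishers send messages to a named topic without knowing who receives them. The in-process sketch below shows only the core interface; a real distributed broker would add networking, persistence, and delivery guarantees. The class name and the smart-energy topic names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[str, object], None]


class Broker:
    """Minimal topic-based publish-subscribe broker (single process)."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subs[topic].append(handler)

    def unsubscribe(self, topic: str, handler: Handler) -> None:
        self._subs[topic].remove(handler)

    def publish(self, topic: str, message: object) -> int:
        """Deliver `message` to every current subscriber of `topic`;
        return the number of handlers it was delivered to."""
        handlers = list(self._subs.get(topic, []))  # copy: safe if handlers unsubscribe
        for h in handlers:
            h(topic, message)
        return len(handlers)


if __name__ == "__main__":
    broker = Broker()
    broker.subscribe("energy/meter-42", lambda t, m: print("dashboard:", t, m))
    broker.subscribe("energy/meter-42", lambda t, m: print("billing:", t, m))
    broker.publish("energy/meter-42", {"kWh": 3.2})  # reaches both subscribers
    broker.publish("traffic/junction-7", "congested")  # no subscribers
```

The publisher of meter readings never references the dashboard or billing components, so new consumers can be added without changing producers, which is what makes the pattern attractive for smart-energy and transportation systems.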