Research in a Connected World

Key Concepts
Research

Introduction

Research today is often critically dependent on computation and data handling. The practice is known under various terms such as e-Science, e-Research, and cyberscience. We would like to avoid these terms, but when it is unavoidable, in the interests of brevity, we use the term e-Research in a broad sense to include all information processing support for research. Irrespective of the name, many acknowledge that the use of computational methods and data handling is central to research.

There is no question that scientific research over the past twenty years has undergone a transformation. This transformation has occurred as a result of various factors. New technologies, leading to new methods of working, have accelerated the pace of discovery and knowledge accumulation not only in the natural sciences but also in the social sciences and arts and humanities. Advances in scientific and other knowledge have generated vast amounts of data which need to be managed well so that they can be analysed, stored and preserved for future re-use. Larger scale science enabled by the Internet and other information and communication technologies (ICTs), scientific instrumentation and automation of research processes has resulted in the emergence of new research paradigms that are often summarised as data-rich science.
A feature of this new kind of research is an unprecedented increase in complexity, in terms of the sophistication of research methods used, in terms of the scale of phenomena considered as well as the granularity of investigation. e-Research applies computer-enabled methods to achieve new, better, faster or more efficient research and innovation in any discipline. It draws on developments in computing science, computation, automation and digital communications. Such computer-enabled methods are invaluable within this context of rapid change, accumulation of knowledge and increased collaboration. They can be used throughout the research cycle, from research design, data collection, and analysis to the dissemination of results. This is unlike other technological equipment, which often only proves useful at certain stages of research. Researchers from all disciplines can benefit from the use of e-Research approaches, from the physical sciences to arts and humanities and the social sciences.

The following sections in this introduction will elaborate on these transformations in research and the role played by ICT, describing research collaborations, "big research" in a globalised world and participation in research.

Research collaborations

Today's research into social and scientific issues and problems often involves increased sharing of resources, because individual research institutions cannot afford having these resources or because they are inherently distributed (for example in the case of linked radio telescopes). The research community has changed in that more work is done in international collaborations and these collaborations have become multi- or interdisciplinary.

Tackling the grand challenges of many disciplines today requires the coordinated effort of groups of researchers working on different aspects of a problem. Also, individual researchers can more rapidly advance their knowledge in a particular field if they are able to become part of an international and interdisciplinary collaborative network. Instead of working on their own or only with colleagues within their own institutions, researchers now often work in collaborations with colleagues in other institutions, who can provide specialist knowledge, skills or access to resources.

e-Research provides researchers with an environment for sharing resources, making large, distributed data sets accessible, enabling synchronous or asynchronous collaborations across geographical distances and providing access to resources regardless of location. This opening up of research means that researchers need not be held back by their location and can participate in cutting-edge projects.

e-Research Technologies Supporting Collaboration

e-Research technologies support the research collaborations described above by introducing a model for resource sharing based on the notions of "resources" that are accessed through "services". Resources can be, for example, high-performance computers, storage resources, datasets held by data archives or even remote instruments such as telescopes. To make resources available to collaborating researchers, their owners provide services that provide a well-described interface specifying the operations that can be performed on or with a resource, e.g. submitting a compute job or accessing a set of data.

This simple underlying model of collaboration is complemented by additional functionality such as authentication and authorisation to regulate access to a resource, or management functions such as resource reservation. It is important to note that the underlying model is kept simple and that any additional functionality layered on top of it is also formulated in terms of services wherever possible. Using these general principles, it is possible to build a vast range of tools and applications that support collaborative research.

Computer-enabled methods of collaboration for research take many forms, including video conferencing, wikis, social networking websites and distributed computing itself. For example, researchers might use Access Grid for video conferencing to hold virtual meetings to discuss their projects.
Access Grid and virtual research environments provide simultaneous viewing of participating groups as well as software to allow participants to interact with data on screen. Wikis have also become a valuable collaborative tool. This is perhaps best demonstrated by the OpenWetWare website, which promotes the sharing of information between researchers working in biology, biomedical research and bioengineering using the concept of a Virtual Lab Notebook. This allows researchers to publish research protocols and document experiments. It also provides information about laboratories and research groups around the world as well as courses and events of interest to the community.

Social networking sites have been used or created for research purposes. The myExperiment website is becoming an indispensable collaboration tool for sharing scientific workflows and building communities. Such sharing cuts down on the repetition of research work, saving time and effort, and leading to advances and innovation more rapidly than if researchers were working on their own without access to similar work (for comparison to their own). Other social networking sites such as Facebook have been adopted by researchers and extensions have been built to allow them to be used as a portal to access research information. For example, content in the ICEAGE Digital Library can be accessed within Facebook.

Access Grid: http://www.ja.net/services/video/agse/AGSCHome/whatisaccessgrid.html
myExperiment: myexperiment.org/
ICEAGE Digital Library: http://library.iceage-eu.org/
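
As a rough illustration of the resource-and-service model described under "e-Research Technologies Supporting Collaboration", the following Python sketch wraps a resource in a service with a small, well-described interface and then layers an authorisation check on top as another service with the same interface. All names here (ComputeResource, ComputeService, AuthorisingService, submit_job, "cluster-a") are hypothetical and chosen only for illustration; they are not taken from any e-Research toolkit.

```python
from dataclasses import dataclass, field


@dataclass
class ComputeResource:
    """A hypothetical resource: a computer that runs jobs for its owner."""
    name: str
    completed: list = field(default_factory=list)

    def run(self, job: str) -> str:
        self.completed.append(job)
        return f"{job} finished on {self.name}"


class ComputeService:
    """Service exposing one well-described operation on the resource."""

    def __init__(self, resource: ComputeResource):
        self._resource = resource

    def submit_job(self, user: str, job: str) -> str:
        return self._resource.run(job)


class AuthorisingService:
    """Extra functionality layered on top, itself offered as a service.

    It has the same interface as the service it wraps and only adds an
    access check before passing the request on.
    """

    def __init__(self, inner: ComputeService, allowed_users: set):
        self._inner = inner
        self._allowed = set(allowed_users)

    def submit_job(self, user: str, job: str) -> str:
        if user not in self._allowed:
            raise PermissionError(f"{user} may not use this resource")
        return self._inner.submit_job(user, job)


def demo() -> str:
    cluster = ComputeResource("cluster-a")  # illustrative name only
    service = AuthorisingService(ComputeService(cluster), {"alice"})
    return service.submit_job("alice", "protein-fold-42")
```

Because the authorising layer offers the same submit_job operation as the service it wraps, a collaborator's code does not need to change when access control or, say, a reservation layer is added. That is one way to read the text's remark that additional functionality is itself formulated in terms of services.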
Systems Research in a Globalised World

Many researchers now devote a significant amount of their attention to global issues that previously could not be addressed due to technological and informational limitations. These global issues include, for instance, climate change, pandemics, rainforest destruction and biodiversity. Such "big research" problems fall under wider contemporary concerns about living sustainably and understanding human biology and health (including the aetiology of diseases and the search for cures).

This ubiquitous global perspective has in large part emerged because of a worldwide exchange of information and the availability of data resulting from use of ICT, coupled with the use of ICT to organise that data. For example, the earth is seen as a system within systems, which necessitates cross-scale research. Earth systems science provides a useful example of this change to "systems research". ICT is used to model and simulate integrations of geology, oceanography and environmental sciences, generating a more complex, holistic view than was possible prior to the increased use of computer-enabled methods. There has also been a recent concerted development of systems biology, which involves the integration of mathematics, engineering and computer science to manage the data deluge in biology in order to address big questions concerning sustainable living and human health on a global level.

A significant number of researchers in the social sciences and arts and humanities have also taken up this global view. For the social sciences, this perspective is clear, for instance, in the idea of global knowledge and attempts to solve social issues relating to sustainable living through large-scale data gathering and analysis. In the arts and humanities, a global perspective is evident in the development of the Global Performing Arts Consortium, an international database of performing arts resources, and in global cultural and international studies research, which often requires access to large amounts of cross-culturally derived data to adequately substantiate conclusions.

Participation in Research - Democratising "Big Science"

e-Research not only enables scientists to tackle "big questions", but it has also allowed for wider participation in research.
Volunteer computing allows members of the public to support and take part in research conducted by teams of professional researchers by providing compute resources or by performing specific tasks that are part of the research process. For example, the SETI@home project makes use of volunteers' desktop computers to search for extraterrestrial life, while Folding@home uses the compute power provided by volunteers to study protein folding. In the case of climateprediction.net, any member of the public with appropriate computer equipment can contribute to the study of climate. In each of these cases, tasks and data are shared across a network of dispersed computers, increasing the compute power and storage capacity available far beyond the capabilities of a single computer. Several of the examples of inspiring e-Research projects we will introduce here have been successful as a result of using volunteer computing.

Open Source Science is not just about direct public participation. It is also about ensuring that the public has access to and can observe the research process. Open Notebook Science enables better collaboration among researchers at the same time that it makes research project records available online for perusal by the lay public. In this way, "big science" is democratised, no longer purely the product and tool of a cloistered research elite but an activity within a wider societal context in which members of society can take part.
Research in a Connected World - Fundamental Concepts and Inspiring Examples

We make a strong argument for these methods by illustrating their importance in a multitude of research endeavors. The Research in a Connected World brochure serves as an introduction to e-Research for those unfamiliar with such methods, revealing its potential and promise for all disciplines. The brochure consists of individual modules that give researchers a grounding in fundamental concepts and a taste of what is possible when using computer-enabled methods. We provide an introduction to distributed systems, contrasting them to desktop PCs, and then move on to detailed discussion of inspiring examples of e-Research, looking at projects in many different fields. These examples are followed by examples that show the wider impact of e-Research and explore the unique collaborations that have developed not only among other academic researchers but also between researchers and the wider public.
The subsequent section of the brochure describes elements of and issues relating to distributed systems, beginning with a short history of distributed computing and including modules on the taxonomy of research computation problems, distributed computing architectures, issues concerning managing complex data, visualisation, and the use of portals and virtual research environments. A final module contains a list of relevant services and contacts.

We hope this resource will not only inform you but also inspire you to begin to use computer-enabled methods to further your research. If you already consider yourself an e-Researcher, we hope to have introduced you to new tools that you can begin to apply in your own work.
What is a Distributed System?

Key Concepts
distributed systems

Introduction

Over the past decades, as we have sought to explain, we have moved from processing the data that we can hold in a lab notebook to working with thousands of terabytes of information. (For reference, a terabyte is a million megabytes, and a megabyte is a million letters. A plain textbook might be a few megabytes in size, as might a high-quality photograph; a terabyte is like a huge library.) And yet we keep striving to work with ever more: more genomic data; more high-energy physics data; ever more detailed astronomical photographs; ever richer seismographic measurements; ever more layers of interpretation of artistic details; greater volumes of financial data; more complex and realistic simulations. We drill deeper and deeper into the details. How are we able to do this?

We are in the middle of a huge revolution in information processing, driven by the fact that our tool of choice for working with information, the computer, has been getting exponentially better ever since its invention during the Second World War. We live in the middle of an age of wonder. And yet, despite now being able to hold immense quantities of computation and storage in our hands, our desire to work with ever more has grown even faster.

Thankfully we have been living through another revolution at the same time: the telecommunications revolution. The telecommunications revolution started with the invention of the telegraph, but accelerated with the convergence of computers and telecoms to create the Internet. This not only allows people to share information, but also computers, and it has transformed the world. The first indication of just how amazing this would be came with the World Wide Web (WWW), the first internet system to really reflect everything that people do throughout society [link/reference here to history chapter]. But it will not be the last; the ripples from the second wave are now being felt, and it is the global research community that is in the lead. This second wave is Distributed Computing.

The Way Distributed Computing Works

Simply put, distributed computing is allowing computers to work together in groups to solve a single problem too large for any one of them to perform on its own. However, to claim that this is all there is to it massively misses the point.

Distributed computing is not a simple matter of just sticking the computers together, throwing the data at them and then saying "Get on with it"! For a distributed computation to work effectively, those systems must cooperate, and must do so without lots of manual intervention by people. This is usually done by splitting problems into smaller pieces, each of which can be tackled more simply than the whole problem. The results of doing each piece are then reassembled into the full solution.
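
The split, solve and reassemble pattern just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a real distributed system: threads on a single machine stand in for separate computers, the "problem" is simply summing squares, and the function names (split, solve_piece, reassemble, distributed_sum_of_squares) are invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, pieces):
    """Cut the problem into roughly equal, independent pieces."""
    size = max(1, -(-len(data) // pieces))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def solve_piece(piece):
    """Work done on one piece; here, summing the squares it contains."""
    return sum(x * x for x in piece)


def reassemble(partial_results):
    """Combine the partial answers into the full solution."""
    return sum(partial_results)


def distributed_sum_of_squares(data, workers=4):
    pieces = split(data, workers)
    # Assumption: threads on one machine stand in for separate computers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(solve_piece, pieces))
    return reassemble(partials)
```

The sketch only works this neatly because the pieces are independent of one another. When pieces need to exchange values while they run, the cooperation the text mentions becomes the hard part.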
The power of distributed computing can clearly be seen in some of the most ubiquitous of modern applications: the Internet search engines. These use massive amounts of distributed computing to discover and index as much of the Web as possible. Then when they receive your query, they split it up into fast searches for each of the words in the query. The results of the search are then combined in the twinkling of an eye into your results. What about locating computers on which to execute the web search? That is itself a distributed computing problem, both in the process of looking up computer addresses and also in finding an actual computer to respond to the message sent on that address.

Early distributed systems worked over short distances, perhaps only within a single room, and all they could really do was to share a very few values at set points of the computation. Since then, things have evolved: networks have got faster, numbers of computers have got larger and the distances between them have got greater.

The speeding up of the networks (from the telecommunications revolution) has been extremely beneficial; it has allowed many more values to be shared effectively, and more often. The larger number of computers has only partially helped; while it has meant that it is possible to do more total computation and to split the problems into smaller pieces (allowing a larger overall problem), it has also increased the amount of time and effort that needs to be spent on communication between the computers, since the number of ways to communicate can increase (see Figure 1).

Figure 1: The growth in the number of links as the number of computers goes up.

There are, of course, ways to improve communication efficiency, for example by having a few computers specialize in handling the communications (like a post office) and letting the others focus on the work, but this does not always succeed when the overall task requires much communication.

The distance between computers has increased for different reasons. Computers consume power and produce heat. A single PC normally only consumes a small amount of power and produces a tiny amount of heat; it is typically doing nearly nothing, waiting for its users to tell it to take an action. With a computational task, it would be far busier and will be consuming electrical energy in the process; the busier it is, the more heat it produces. Ten busy PCs in a room can produce as much heat as a powerful domestic electric heater. With thousands in one place, very powerful cooling is required to prevent the systems from literally going up in smoke. Distributing the power consumption and heat production reduces that problem dramatically, but at a cost of more communications delay due to the greater distances that the data must travel.

There are many ways that a distributed system can be built. You can do it by federating traditional supercomputers (themselves the heirs to the original distributed computing experiments) to produce systems that are expensive but able to communicate within themselves very rapidly; this remains favoured for dealing with problems where the degree of internal communication is very high, such as weather modelling or fluid flow simulations. You can also make custom clusters of more traditional PCs that are still dedicated to being high-capability computers; these have slower internal communications but are cheaper, and are suited for "somewhat-parallel" problems, such as statistical analysis or searching a database for matches (e.g. searching the web). And you can even build them by, in effect, scavenging spare computer cycles from across [...]
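
The growth in communication links noted earlier in this module (Figure 1) can be made concrete with a small calculation. The sketch below assumes the worst case, in which every pair of computers may need to talk directly, giving n(n-1)/2 links, and contrasts it with the "post office" arrangement, in which one computer relays for all the others. The source does not state these formulas; they are standard counting results used here for illustration, and the function names are invented.

```python
def full_mesh_links(n: int) -> int:
    """Links needed if every pair among n computers talks directly."""
    return n * (n - 1) // 2


def hub_links(n: int) -> int:
    """Links needed if one of the n computers relays for the other n - 1."""
    return max(n - 1, 0)


def link_table(sizes):
    """Return (computers, direct links, hub links) for each size."""
    return [(n, full_mesh_links(n), hub_links(n)) for n in sizes]


if __name__ == "__main__":
    for n, mesh, hub in link_table([2, 4, 8, 16, 100]):
        print(f"{n:>4} computers: {mesh:>5} direct links, {hub:>3} via one hub")
```

With 100 computers, direct pairwise communication needs 4950 links against 99 through a single hub. The hub cuts the number of links but has to carry every message, which is consistent with the text's caution that this approach does not always succeed when the task requires much communication.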
Research in a Connected World

Collection editors:
Alex Voss, Elizabeth Vander Meer, David Fergusson

Authors:
Malcolm Atkinson, Tobias Blanke, Andy Kerr, Ana Lucia Da Costa, Erwin Laure, David De Roure, Steven Newhouse, Stuart Dunn, S[...], Donal Fellows, Martin Turner, Elizabeth Vander Meer, Paul Fisher, Alex Voss, Jeremy Frey, Katy Wolstencroft, Carole Goble, Richard Sinnott, Sarah Harris

Online: org/content/col10677/112/>

CONNEXIONS
Rice University, Houston, Texas

This selection and arrangement of content as a collection is copyrighted by Alex Voss, Elizabeth Vander Meer. It is licensed under the Creative Commons Attribution 3.0 license (http://creativecommons.org/licenses/by/3.0/).
Collection structure revised: November 22, 2009
PDF generated: October 26, 2012
For copyright and attribution information for the modules contained in this collection, see p. 94.
Table of Contents

Welcome
Editor's introduction to Research in a Connected World
Research in a Connected World
What is a Distributed System?
1 Examples of e-Research
  1.1 Archaeology
  1.2 Text Analysis in the Arts and Humanities
  1.3 Climate Prediction
  1.5 nanoCMOS Device, Circuit, and System Simulation
  1.6 Computational Chemistry
2 Distributed Systems
  2.1 The European e-Infrastructure Ecosystem
  2.2 The EGEE Distributed Computing Infrastructure
3 Managing Complex Data
  3.1 Scholarly Communication and the Web
  3.2 Scientific Workflows
  3.3 Repositories
  3.4 Resource Sharing: Trust, and Security
4 Using Distributed Systems in Research
  4.1 Portals
  4.2 Visualization Matters
  4.3 Virtual Research Environments
5 Resources
  5.1 Examples of e-Research Videos - from the eIUS project
  5.2 Virtual Research Environments - Videos
  5.3 e-Research Glossary
Glossary
Attribution
Available for free at Connexions org/content/col10677/112>
To do this, researchers must gain new skills in computational thinking and data-intensive research. This is evolving as the pace of the digital revolution throws up new questions and delivers new capabilities. This book is an excellent snapshot; a launching pad from which to get started. Its readers will find key insights and authoritative references, but they must expect to move on to rapidly develop and shape the ideas needed for research in a connected world. They will be on the lookout for claims that appear to break the fundamental principles of distributed systems, but they will also enjoy the rewards of being at the forefront as new methods and technologies make significant advances in research possible.

The major factor in the success of Homo sapiens is their ability to communicate and collaborate. The connected world enables this as never before, as both the speed and scale of collaboration have [...] pioneer new forms of global behaviour. It is vital that researchers draw on this new capacity for combining talent to address the world's most pressing challenges before it is too late.

When Sir John Taylor launched e-Science he said "e-Science is about global collaboration in key areas of science and the next generation of infrastructure that will enable it."
This book shows that Taylor's assertion was a serious understatement. It shows that the new capabilities delivered by the connected world offer new kinds of human collaboration for all forms of thinking and doing. Research has a two-fold role: to pioneer new ways of thinking and doing wherever it will achieve intellectual and practical advance, and to reflect on the deep changes that are underway in global society by recording the massive changes of the digital revolution and better understanding how they shape, and are shaped by, society. This book provides a window into research transformed by the digital revolution, revealing its benefits across disciplines and the added responsibilities that come with these new methods of working. It calls on researchers to observe, record and analyse the digital revolution. It is a valuable resource for researchers as they seize the opportunities brought by the digital age.

Malcolm Atkinson, UK e-Science Envoy and Director of the e-Science Institute
David De Roure, National Strategic Director for e-Social Science
Editor's introduction to Research in a Connected World

The massive availability of networked information and communications technologies today allows us to change the ways we go about our daily working lives as well as our lives outside work. New ways of shopping, of staying in touch with colleagues and friends, of learning or of navigating places have emerged that are enabled by the ubiquitous electronic devices and networked services that have become available over the past few years. Similarly, as researchers we are today utilising computers in many ways, be it through the use of basic services such as email or the utilisation of the most advanced digital technologies enabling new research methods. No matter what discipline we work in, there are legitimate questions about what potential use we might make of these technologies and what the implications of such use might be.

Over the past decade, funding organisations such as the UK's research councils have funded efforts to make the most advanced information and communications technologies available to researchers, and investments are made to develop persistent and sustainable infrastructures to underpin a widespread uptake of digital methods: the development of e-Research. What has been lacking, however, is the development of appropriate learning material such as textbooks that would teach the basics of advanced information systems and digital methods in a way that is accessible to researchers from a wide range of disciplines. This book is an attempt to fill this gap. Its aim is to bridge the gap between the initial interest generated by presentations of the potential of e-Research and the various training courses that convey the skills necessary to use specific technologies.

Chapter outline

The book is divided into six main sections. The first two chapters provide a general introduction to the principles behind e-Research and introduce distributed systems, showing how they differ from single-user desktop systems.
The second section discusses a number of different examples of e-Research from a range of disciplines, demonstrating how research can benefit from and be driven forward by the use of advanced information and communications technologies. The third section outlines a number of infrastructures for research that are available to researchers today and discusses the strategies behind the development of European grid initiatives that aim to provide a sustainable environment for the development of e-Research practices. Next, we discuss the role of data and its management over the research lifecycle as well as a number of relevant technologies. The fifth section discusses different ways that researchers can access infrastructure services and the ways they can be factored into actual everyday research practices. Finally, we conclude the book with a collection of resources that we hope will help the reader explore the field of e-Research further and make informed choices about the adoption of the technologies and methods described in this book.
Acknowledgements

First of all, we would like to thank our colleagues who have contributed chapters to this collection. They have given generously of their time and the essential input of expertise without which this book could not have come into existence.

We would also like to thank the organisations that have provided support in cash or in kind.

JISC
The UK's JISC has provided financial support through the funding for the e-Infrastructure Use Cases and Service Usage Models project.

Scottish Informatics and Computer Science Alliance
The Scottish Informatics and Computer Science Alliance has supported the editing process by funding contributions made by Alex Voss.

National e-Science Centre
The National e-Science Centre has supported the editing process by funding the contributions made by David Fergusson.

MeRC
The Manchester e-Research Centre has supported the editing process by funding contributions made by Alex Voss and by administering the production process of the first edition of the book.