|1. A Day in the Life of a Cyberscientist|
|2. Revolutions in Scientific Communication|
|3. Science on the Web|
Abstract
Internet technologies, including electronic mail, preprint archives, and the World Wide Web, are now ubiquitous parts of scientific practice. After reviewing the full range of these technologies and sketching the history of their development, this paper provides an epistemological appraisal of their contributions to scientific research. It uses Alvin Goldman's epistemic criteria of reliability, power, fecundity, speed and efficiency to evaluate the largely positive impact of Internet technologies on the development of scientific knowledge.
Marie now accesses the World Wide Web, first checking her local institute Web page for news about visiting speakers. More important to her research, she links to the preprint archive for her field, which contains electronic versions of papers not yet published. Marie is quickly able to see that ten new papers in her special area of research have been posted to the archive since the day before. She clicks on the names of the papers to read the summaries, and makes a note to download several of them for more thorough examination later. Marie then sends a new paper that she finished the day before to the preprint archive, knowing that it will be quickly available to all the other researchers working on similar topics. Now she is ready to use the Web for her ongoing research.
Marie begins her most important work of the day by using a Web link to an internal site at her institute that is accessible to her research group. Immediately she can see that her students and research assistants working late the previous night have collected some new experimental data that she examines on her screen. These data raise a question about similar observations made in Charles' laboratory, so she finds the link to that lab's Web site and goes into a data base that contains the results of their experiments from the previous year. To find the particular data that interest her, she uses a search engine that Charles' lab has conveniently provided to take her immediately to the part of the very large database that contains the information she wanted.
This information, along with the new experimental results from her own lab, raises some interesting questions concerning the structure of the objects that she and Charles are investigating. Fortunately, another research institute has provided a Web site that vividly displays what is known about such structures, so she moves to that site and uses a search engine to call up the relevant objects. There she can examine their structure using several valuable tools that go well beyond the presentation of simple 2-dimensional pictures found in textbooks. First she runs a special "Virtual Reality" browser to examine 3-dimensional representations of the objects, using her mouse to navigate through the 3-D representation to view the objects from different angles. Then she downloads an animation that enables her to watch the objects moving together over time. Together, the animation and the 3-D model suggest a new theoretical insight into how these objects might produce the experimental effects that she has been getting in her lab. Because testing her new theoretical ideas requires some new software that has just become available for interpreting the kind of data she has collected, Marie follows a link to another Web site that makes the software available. She is pleased to see that she does not have to download the software, but can immediately run the program on her own computer as a Java application that is automatically set up for her by her Web browser. Excited by what the program suggests about her data, she emails Charles and her students a sketch of her new ideas and results, suggesting a time when they can have a collective Web conference where they can interactively discuss new research directions.
The story I have just told is not speculative science fiction: every technology it mentions is currently available on the Internet and is in use by scientists, although of course not all scientists use every technology. The Internet, particularly the World Wide Web, is now an essential part of scientific communication. By examining the ways that scientists are now using these technologies to further the development of scientific research, we can see how new technologies can contribute to the spread of knowledge.
In the 1990s, communication underwent another dramatic revolution with the development of the World Wide Web and other Internet applications. Conceived in the 1960s as a U.S. military communications system called the ARPANET, the Internet became in the 1980s a convenient means of scientific communication, enabling scientists at major research institutions to send email, participate in news groups, and transfer files. Working at the European particle physics laboratory CERN, Tim Berners-Lee proposed in 1989 a networked project for high-energy physics collaborations, employing hypertext to provide a flexible means of linking words and pictures. By 1991 his group had produced a simple browser for their "World Wide Web" project, which was superseded in 1993 by a more sophisticated browser, Mosaic, produced in the U.S. by the National Center for Supercomputing Applications. Mosaic was in turn quickly supplanted by more sophisticated browsers such as Netscape Navigator and Internet Explorer. The number of hosts on the Internet grew from 213 in 1981 to 313,000 in 1990, then to more than 12 million in 1996; and the number of Web sites grew from 130 in mid-1993 to an estimated 230,000 in mid-1996. (This information is due to Network Wizards and is available at http://www.nw.com/. Historical information about the Internet is available on the Web, for example at http://www.cern.ch/CERN/WorldWideWeb/RCTalk/history.html.)
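To make the growth figures above easier to compare, the following minimal sketch converts each pair of counts into an average yearly multiplier. It assumes constant compound growth between the two dates, treats "more than 12 million" as exactly 12 million (so that result is a lower bound), and treats mid-1993 to mid-1996 as three years. The function name `annual_growth_factor` is illustrative only.

```python
def annual_growth_factor(start_count, end_count, years):
    """Return the constant yearly multiplier that turns start_count into end_count."""
    return (end_count / start_count) ** (1.0 / years)


# Host counts quoted in the text (12 million is a lower bound, "more than").
hosts_1981_1990 = annual_growth_factor(213, 313_000, 1990 - 1981)
hosts_1990_1996 = annual_growth_factor(313_000, 12_000_000, 1996 - 1990)

# Web site counts quoted in the text; mid-1993 to mid-1996 is taken as 3 years.
sites_1993_1996 = annual_growth_factor(130, 230_000, 3)

if __name__ == "__main__":
    results = [
        ("Hosts, 1981 to 1990", hosts_1981_1990),
        ("Hosts, 1990 to 1996", hosts_1990_1996),
        ("Web sites, mid-1993 to mid-1996", sites_1993_1996),
    ]
    for label, factor in results:
        print(f"{label}: roughly {factor:.1f}x per year")
```

On these assumptions the number of hosts roughly doubled each year, while the number of Web sites grew by roughly an order of magnitude each year over the period quoted.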
These tools have inspired thousands of scientists to create Web sites and Internet tools that are dramatically changing how science is done. To show how the Internet is transforming scientific research practices, I will describe how the Web is used at CERN where it was first invented, as well as how it makes possible rapid and effective communication in the Human Genome Project and other research. Like the application of the printing press to scientific publishing, use of the World Wide Web has enabled scientists to increase the reliability, speed, and efficiency of their work.
The Web has also become a regular tool used by many scientists in the production of their research. Especially in fields like high energy physics and genetics, contemporary science is a huge collaborative enterprise involving international teams of scientists (Thagard, 1997). It is not unusual for published articles in physics to have more than a hundred co-authors, reflecting the diversity of expertise needed to carry out large projects involving complex instruments. Located near Geneva, CERN is a collaborative project of 19 European countries involving several nuclear accelerators and dozens of experimental research projects. Each project involves numerous different researchers from a range of different institutions in the participating countries. Since it began in 1954, CERN has been the source of many of the most important discoveries in particle physics, such as the 1983 discovery of the W and Z bosons.
The World Wide Web was invented at CERN to improve information sharing among scientists from diverse institutions working on joint projects. It was conceived as a hypermedia project so that scientists could exchange pictorial information such as diagrams and data graphs as well as verbal text. Today, CERN has a World Wide Web team to support experiments, using numerous Web servers (http://www.cern.ch/).
The basic idea of the World Wide Web originated in a document written in 1989 by Tim Berners-Lee (http://www.w3.org/pub/WWW/History/1989/proposal.html). He argued:
CERN is a wonderful organisation. It involves several thousand people, many of them very creative, all working toward common goals. Although they are nominally organized into a hierarchical management structure, this does not constrain the way people will communicate, and share information, equipment and software across groups.
The actual observed working structure of the organisation is a multiply connected "web" whose interconnections evolve with time. In this environment, a new person arriving, or someone taking on a new task, is normally given a few hints as to who would be useful people to talk to. Information about what facilities exist and how to find out about them travels in the corridor gossip and occasional newsletters, and the details about what is required to be done spread in a similar way. All things considered, the result is remarkably successful, despite occasional misunderstandings and duplicated effort.
A problem, however, is the high turnover of people. When two years is a typical length of stay, information is constantly being lost. The introduction of the new people demands a fair amount of their time and that of others before they have any idea of what goes on. The technical details of past projects are sometimes lost forever, or only recovered after a detective investigation in an emergency. Often, the information has been recorded, it just cannot be found.
If a CERN experiment were a static once-only development, all the information could be written in a big book. As it is, CERN is constantly changing as new ideas are produced, as new technology becomes available, and in order to get around unforeseen technical problems. When a change is necessary, it normally affects only a small part of the organisation. A local reason arises for changing a part of the experiment or detector. At this point, one has to dig around to find out what other parts and people will be affected. Keeping a book up to date becomes impractical, and the structure of the book needs to be constantly revised.
Berners-Lee recommended that the information at CERN should be handled, not as a linear book or a hierarchical tree, but as hypertext. He had previous experience with hypertext, having written in 1980 a program for keeping track of software that he later adapted for use at CERN. He outlined how CERN could benefit from a large non-centralized hypermedia system, linking graphics, speech, video, and text in an unconstrained way that would enable users to jump from one entry to another. He stated that researchers needed remote access for the many computers used at CERN, independent of the particular kind of computer used. Berners-Lee presciently noted that CERN's diverse computer network was a miniature of the world in a few years' time, anticipating that the World Wide Web would not merely be a local application.
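The contrast between a big book and a multiply connected web can be illustrated with a small sketch that stores documents as a directed graph of links. This is not a reconstruction of Berners-Lee's system: the page names, the `links` dictionary, and both function names are hypothetical, chosen only to show how following links outward, and looking up which pages point at a changed page, become simple operations once information is held as hypertext.

```python
from collections import deque

# Hypothetical pages and links; a page may link to any other page,
# so the structure is a web rather than a strict hierarchy.
links = {
    "detector-overview": ["calibration-notes", "software-guide"],
    "calibration-notes": ["software-guide", "contact-list"],
    "software-guide": ["detector-overview"],
    "contact-list": [],
}


def reachable_from(start, links):
    """Return pages reachable from start, in breadth-first order."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order


def pages_linking_to(target, links):
    """Return the pages that link to target, i.e. those affected if it changes."""
    return sorted(page for page, targets in links.items() if target in targets)
```

For example, `pages_linking_to("software-guide", links)` lists the pages whose authors might need to know about a change to the software guide, which is the kind of digging around that the proposal describes as difficult with a book.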
CERN's various research groups now make extensive use of the World Wide Web. For example, the DELPHI (DEtector for Lepton, Photon and Hadron Identification) project at CERN involves about 550 physicists from 56 participating universities and institutes in 22 countries (http://www.cern.ch/Delphi/Welcome.html). These scientists can use the Web to access data acquired over the past eight years, including pictorial representations of important experimental events. Also available are DELPHI news bulletins, a discussion forum, and electronic versions of papers by the project's participants, as well as links to preprint servers, participating institutions, and other physics information sources.
After CERN's programmers initiated use of the World Wide Web for scientific research, they made the software they had developed freely available and international Web use expanded rapidly with the development of more sophisticated browsers. One of the most effective scientific users of the Web has been the Human Genome Project, an international consortium of research institutions working since 1989 to identify all of the approximately 100,000 genes that are responsible for human development. This project is medically important, because many diseases such as diabetes and some forms of cancer have a large genetic component. The identification of all human genes should be a substantial aid to finding genes responsible for diseases, which can potentially lead to new medical treatments.
Scientists working on the genome project are producing an astonishing amount of information. If published in books, descriptions of the DNA sequences of all the human genes would require more than 200,000 pages (http://www.ornl.gov/TechResources/Human_Genome/publicat/primer/intro.html). However, books would be a poor technology for keeping track of such information, not just because of its quantity, but also because new genes are being mapped daily and a printed text would be instantly obsolete. Fortunately, genome scientists have turned to computer databases to store the rapidly expanding information about gene locations. Storing this information would be useless, however, without effective means for accessing it, which search engines provide. Like CERN, the Human Genome Project is highly collaborative, involving dozens of different institutions in various countries. The arrival of the World Wide Web has been an immense boon to international collaboration on the genome project, with more than twenty-five contributing institutions making their data available on the Web for general access.
One of the major contributors to the genome project is the Human Genome Center at the Whitehead Institute at MIT (http://www-genome.wi.mit.edu/). Since its creation in May, 1994, the number of weekly accesses to their Web site has grown to over 100,000 (figure 1). Internal users from MIT and external users from various institutions access the gene mapping information available at the site, as well as various documents and software available there. Like other genome project sites, the MIT site contains searchable databases that researchers can consult to find the latest information.
Figure 1. Web site accesses at the MIT Human Genome Center from May, 1994 to November, 1996.
For medical researchers, a more directly useful database is Online Mendelian Inheritance in Man (OMIM), available since December, 1995 (http://www3.ncbi.nlm.nih.gov/Omim/). The reference book Mendelian Inheritance in Man (eleventh edition, McKusick and Francomano, 1994) has been a valuable source of information on genetic traits and diseases, but the World Wide Web version is even more useful. Whereas the reference book was updated approximately every two years, OMIM is updated almost daily. It has an excellent search engine that enables users to quickly access entries about characters, genes, and diseases, and the entries provide links to relevant information such as genome maps and journal references. Whereas the reference book is expensive ($165) and typically found only in research libraries, anyone with Internet access can quickly obtain the information available on OMIM.
CERN, the Human Genome Project, and OMIM illustrate only some of the technologies available to scientists over the World Wide Web (Renehan, 1996). Various sites provide animations and videos that enable viewers to see nature in motion. The Virtual Reality Modeling Language is beginning to be used to enable scientists to view objects in 3 dimensions, for example in the Image Library of Biological Macromolecules (see http://www.vrml.org/ and http://www.imb-jena.de/IMAGE.html). A different kind of virtual reality environment is AstroVR, a multi-user networked environment with access to many astronomical tools and databases (http://brando.ipac.caltech.edu:8888/). It enables astronomers and astrophysicists equipped with the proper software and hardware to talk, work together on a whiteboard, share images, make data plots, and look up astronomical data and literature. The goal of AstroVR is to enable users to interact and do collaborative research almost as if they were in the same room. Sites such as the NCSA Biology Workbench make computer programs and other tools readily available to scientific researchers (http://biology.ncsa.uiuc.edu/).
These exciting scientific uses of the World Wide Web provide examples of how it is contributing to the development of knowledge. To examine this contribution more comprehensively, we can use five standards proposed by Alvin Goldman (1986, 1992) for evaluating how well different social practices lead to true beliefs. These standards are reliability, power, fecundity, speed and efficiency. I will show how each of these standards enables us to see more clearly why the printing press was so important for communication, and then apply a similar assessment to the Web. Reliability, the most important standard for looking critically at the Web, is saved for last.
Goldman's criteria are all "veritistic," presupposing that science aims at and sometimes achieves truth understood as correspondence between beliefs and the external world. See Thagard (1988, 1992) for defenses of scientific realism and objectivity.
The World Wide Web is similarly powerful in helping scientists find answers to the questions that interest them. The full range of representational techniques now available on the Web can help people find answers that would otherwise be unavailable. Suppose, for example, that you want to understand binary pulsars. A new electronic astronomy journal will include a video simulation: "You will see how two stars rotate around each other: They evolve; one star sucks up matter from the other, explodes in a supernova explosion, and so on. It is a very beautiful way to illustrate a theoretical model" (Taubes, 1996). Similarly, if you are curious about the operation of the new kind of bacteria that have recently been found to be a major cause of stomach ulcers, you can view an animation of Helicobacter pylori (http://www.helico.com/).
Web sites can use hypertext organization to facilitate the ability of researchers to find answers. For example, the Tree of Life provides information on many species of animals and plants, organized so that browsers can easily traverse the tree up from a species to a genus or down from a genus to a species (http://phylogeny.arizona.edu/tree/phylogeny.html). Following hypertext links can serendipitously lead to new sources of information previously unknown to the user. The immense and rapidly increasing size of the Web, however, can limit its power. People can get so lost in following one link after another that they become "Web potatoes", so caught up in chasing the next bit of information that they lose track of the questions they wanted to answer.
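The kind of navigation described for the Tree of Life, moving up from a species to its genus or down from a genus to its species, can be sketched with a simple tree in which every node records its parent and its children. This is an illustrative data structure only, not the Tree of Life site's actual design; the `Taxon` class and the helper functions are hypothetical names, and only a few well-known taxa are included.

```python
class Taxon:
    """A node in a classification tree, linked to its parent and children."""

    def __init__(self, name, rank, parent=None):
        self.name = name
        self.rank = rank
        self.parent = parent
        self.children = []

    def add_child(self, name, rank):
        """Create a child taxon, attach it to this node, and return it."""
        child = Taxon(name, rank, parent=self)
        self.children.append(child)
        return child


def path_to_root(taxon):
    """Follow parent links upward, returning names from taxon to the root."""
    names = []
    node = taxon
    while node is not None:
        names.append(node.name)
        node = node.parent
    return names


def descendants(taxon):
    """Follow child links downward, returning all names below taxon."""
    found = []
    for child in taxon.children:
        found.append(child.name)
        found.extend(descendants(child))
    return found


# A tiny example tree.
family = Taxon("Hominidae", "family")
homo = family.add_child("Homo", "genus")
sapiens = homo.add_child("Homo sapiens", "species")
pan = family.add_child("Pan", "genus")
troglodytes = pan.add_child("Pan troglodytes", "species")
```

Here `path_to_root(sapiens)` walks up from species to genus to family, and `descendants(homo)` walks down from the genus to its species.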
Unlike printed materials, digital data bases can be searched quickly and thoroughly. The entire Web can be searched for information using numerous search engines such as Yahoo! and AltaVista that have become available in the past few years. Scientists can use such search engines to find sites that are presenting information relevant to their own work. For researchers on such projects as the Human Genome Project, huge data bases containing genetic information can yield useful answers because they are accompanied by search engines that enable users to find answers to their questions about particular genes and diseases. It is not just that search engines enable people to find information more quickly - in large scientific data bases they enable people to find answers that they would otherwise not find at all.
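One common way to make a large collection of records searchable is an inverted index, which maps each word to the set of records containing it. The sketch below is a minimal illustration of that general technique and is not a description of how Yahoo!, AltaVista, or any genome database actually works. The records are invented placeholders, and `build_index` and `search` are hypothetical names.

```python
from collections import defaultdict

PUNCTUATION = ".,;:()"


def build_index(records):
    """Map each lowercased word to the set of record ids that contain it."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for word in text.lower().split():
            index[word.strip(PUNCTUATION)].add(record_id)
    return index


def search(index, query):
    """Return ids of records containing every word in the query."""
    words = [word.strip(PUNCTUATION) for word in query.lower().split()]
    if not words:
        return set()
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return matches


# Placeholder records, invented for illustration; not real genetic data.
records = {
    "entry-1": "Placeholder gene A, mapped to chromosome 7, linked to disorder X.",
    "entry-2": "Placeholder gene B, mapped to chromosome 7, no known disorder.",
    "entry-3": "Placeholder gene C, mapped to chromosome 11, linked to disorder X.",
}
index = build_index(records)
```

Because each lookup consults only the sets for the query words, the cost of a search depends on the number of matching records rather than on the size of the whole collection, which is why such an index can surface entries that would be impractical to find by reading.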
Email and news groups are also potential sources of power when they are used to solicit answers to interesting questions. Many Internet users subscribe to list servers that enable them to send email automatically to people with similar interests. For example, I subscribe to a list on the psychology of science to which I can send queries or announcements. Many news groups are available for people to participate in discussions that interest them. Unmoderated news groups, to which anyone can send any message, often fill up with junk, but there are science news groups that have a moderator who screens out worthless postings, leaving entries that are likely to be relevant to researchers' work. (Compare the difference between moderated news groups such as sci.physics.research and unmoderated, junk-laden groups such as sci.physics.) Web conferences provide an even more immediate way in which researchers can communicate with each other to generate answers to questions of common interest.
Software easily available on the Web is another source of power when scientists use the programs they obtain to generate answers to statistical or other questions that would otherwise be unanswerable. Software availability will rapidly increase when more Java applications become available. The advantage of Java programs is that they run on any computer with a Web browser, eliminating the need for separate programs for Unix computers, PCs, Macintoshes, and so on.
Electronic preprint archives of the sort now available for physicists also increase the ability of scientists to find answers to interesting questions. The physics archive can be searched by author and title, enabling scientists to find papers related to their questions. A similar archive is now being established for cognitive science (http://cogprints.soton.ac.uk/). Even without a special archive, scientists can use the general search engines on the Web to find answers to questions on an astonishing array of topics, from aardvarks to medicine to zoos. Increasing numbers of scientific journals are available on-line, with searchable tables of contents and links from article to article that make it very easy to hunt down sources of information.
The Internet can encourage the development of new theoretical ideas also, as in the following example reported by Herb Brody (1996; http://web.mit.edu/afs/athena/org/t/techreview/www/articles/oct96/brody.html):
Physicist Andrew Strominger ... wrote a paper that suggested a radical departure from Einstein's conception of space-time as a smooth and continuous surface. Strominger e-mailed a question about the subject to Brian Green, who pursues similar research at Cornell. Green started to answer Strominger's question, then read the article, which Strominger had just posted on the Internet. The two scientists entered into a brief interchange of e-mail, joined by David Morrison of Duke University, and three days later all three had cowritten and posted a second paper that further refined their theory showing that tiny black holes can be transformed mathematically into infinitesimal vibrating loops of energy, called superstrings.
There are undoubtedly numerous other examples of new theoretical contributions arising from Internet-based collaborations.
The Internet and the World Wide Web satisfy the standard of fecundity to the extent that they provide answers for many people. Some critics have seen these technologies as providing information for the technological elite but generating yet another barrier to economic and social opportunity for residents of underdeveloped countries and the underprivileged in developed countries. But just as the printing press made books available to many who previously lacked access to university collections, so the World Wide Web makes information available to those who previously lacked access to good libraries. Physicists in underdeveloped countries whose libraries cannot afford increasingly expensive scientific journals can have the same instant access to papers that their peers in developed countries enjoy. Computers and Internet connections are not free, but they are much cheaper than travel to libraries or purchasing or copying numerous books and journals (see efficiency). When Java applications become more widely available, people will not require special hardware or software to run them, so many people will be able to use them to help answer their questions. News groups and email lists reach many people simultaneously, so that they can contribute to the ability of many people to find answers to their questions. Web conferences and on-line forums have the potential to increase the knowledge of many people.
The Internet and the World Wide Web have enormous advantages with respect to the speed of producing answers. Electronic mail can transfer information around the world in seconds or minutes, in contrast to the days or weeks required for communication by traditional mail. When scientific information is posted on a Web site such as OMIM or sent to a preprint archive, it becomes available instantly, in contrast to the months or even years that publication in books and journals can take. The entire Web and many Web databases have search engines that provide information with a speed that was unimaginable a decade ago. New applications such as Java offer the potential for speedy use of new software.
Of course, as everyone who has used the Web knows, the speed of Internet use is heavily affected by the extent of usage at a particular moment. Information that might be almost instantaneously available at 6 a.m. may be painfully slow to load later when many more people are accessing the Internet. The WWW has been called the "World Wide Wait". Information seekers may waste time chasing one link after another when a trip to the library would tell them what they want to know more quickly. The many entertaining sites on the Web may distract people from looking for the information they need and slow down their work rather than speeding it up. But used intelligently, the World Wide Web can be the most rapid source of information ever available.
A friend once told me that "WWW" stands for "Wicked Waste of Wesources". Using the Internet is indeed costly because of the computers and information storage required, but nevertheless it fares very well on the standard of efficiency. Email, news groups, and electronic archives are much cheaper than sending paper mail. It is much less costly for CERN's international research groups to communicate electronically and access information from a common Web server than to try to meet frequently and to exchange data using physical tapes. Storage of papers electronically can now be done for less than 1/1000 of a cent per page, far cheaper than paper storage and reproduction. Expensive computer and network connections are not efficient for an organization whose people are using them to download recipes, play multiple user games, and learn more about their favorite TV shows. But the increasing amount of valuable information on the Web, including scientific information, makes it a highly efficient source of true answers if used intelligently.
The power, fecundity, speed, and efficiency of the Internet are impressive, but they raise problems about the quality of information that are even more severe than those that arose with print and television. Anyone with a Web site can post virtually anything, and a random look at what Joe Hacker has to say about the origin of the universe may be worse than useless. Web pages and postings to unmoderated news groups (for example the claim on alt.conspiracy that flight TWA 800 was shot down by the U.S. Navy) undergo no screening and evaluation, whereas even a profit-driven book or magazine publisher or cable TV provider has to apply some standards of taste and credibility. Libraries and other purchasers apply standards when they decide what is worth buying and making available to readers. In contrast, the lack of screening on the Web is accompanied by an unprecedented degree of access by anyone who has a connection to the Internet. Compared to print and television, the Web provides less scrutiny and more access, so the problem of distinguishing knowledge and nonsense is even more acute.
Of course, the printing press was and is a mixed blessing. From the beginning, it was used to promulgate nonsense as well as knowledge. Shoddy books on worthless topics could sap the time and energy of thinkers and fill their minds with error. Books on astrology were as likely to attract the printer looking for a profitable product as books on empirical astronomy; in fact, horoscopes were published shortly after Gutenberg's Bible, well before the publication of astronomical observations. Nevertheless, the overall contribution of the printing press to the production and availability of scientific knowledge is clear, according to the standards of reliability, power, fecundity, speed, and efficiency.
How do the Web and other Internet technologies improve the reliability of the research of scientists like Marie Darwin? Various technologies can help her to avoid erroneous beliefs. By emailing notes and drafts of papers to her students and collaborators, she can get immediate feedback that can correct misconceptions before they become entrenched in her thinking. Similarly, sending her preprints off to an electronic archive gives other researchers a chance to examine her work and suggest improvements. Conferencing over the Web provides another way in which the reliability of Marie's work can benefit from the critical response of her collaborators. Science, like knowledge in general, is an inherently social enterprise in which achieving truth and avoiding error gain enormously from feedback that Internet technologies can help to provide.
Seeking information generally on the World Wide Web is not always a reliable practice. But in the hands of scientists and other careful users, posting information on the Web has several features that can increase reliability. Unlike books and journals, which are sent out into the world permanently, information on the Web is very easy to update and correct. Whereas printed information needs to wait for further publications or new editions to correct errors, changes to a Web site can be made quickly to prevent propagation of erroneous information. Experimental data bases such as those used at CERN and in the Human Genome Project can undergo continuous expansion and correction. Preprint archives are a potential source of misinformation, since the papers sent to them do not undergo the careful reviewing process that precedes journal publication. This problem may turn out to be more acute for psychology than for physics, whose journals have lower rejection rates than psychology journals: a physics paper is probably going to end up published anyway. But the potential for introduction of errors is to some extent compensated for by the ease with which new preprints and emailing among researchers can help to correct earlier mistakes.
Many scientific fields such as chemistry involve objects whose 3-dimensional and dynamic character is inadequately captured by verbal and 2-dimensional representations. More reliable information may sometimes be provided by special Web tools such as Virtual Reality browsers that provide much richer 3-dimensional information. Videos and animations can provide more realistic depictions of the motions of a system under investigation. Like any picture, virtual reality displays and animations can provide erroneous impressions, but they have the potential for giving more accurate representations of the inherently 3-dimensional and dynamic aspects of the world that they are intended to describe. These examples show that the World Wide Web can increase reliability as well as diminish it. But even more than readers of printed sources, Web users need intellectual tools for discriminating between reliable and unreliable sources of information.
In describing these various ways in which Internet technologies such as the World Wide Web can contribute to scientific knowledge, I have provided a positive model of how the technologies can be used to foster the development of knowledge in anyone, including nonscientists such as students. At the other extreme, there is the real and frightening prospect of students and other people wasting electronic resources to fill their heads with nonsense gleaned from the many worthless sites on the Web. Internet Epistemology includes the highly critical task of examining and evaluating the large quantities of pseudoscience that the Web is being used to promulgate. My purpose in this paper has been more positive, to describe the Internet at its best in aiding the development of scientific knowledge.
|Technology||Reliability||Power||Speed||Fecundity||Efficiency|
|email, news groups||feedback for corrections||many answers available||faster than mail||multiple recipients||cheaper than paper mail|
|hypertext||easily revised||follow links, use search engines||instant publishing, no wait for access, searching||widely available, distance irrelevant||storage cheap|
|animation, video, VRML||more accurate depiction of structures and motion||lots of visual information not otherwise available|
|Java||software not under local control||instant provision of software to do examination, searches||no wait for software||use by everyone regardless of kind of computer||no need to buy software, or spend time on getting it|
|databases||updatable, checkable||huge amount of information available||fast searches, instant availability||accessible to many||storage is cheap|
|preprint archives||potentially quick feedback||find out latest research results||instant access||journal access unnecessary||total cost much lower than print|
|conferencing||immediate corrections||combine new ideas||no need to meet||everyone involved||cheaper than meeting|
Table 1. Summary of the contributions of Internet technologies to scientific research.
Brody, H. (1996). Wired science. Technology Review (October).
Eisenstein, E. L. (1979). The printing press as an agent of change. Cambridge: Cambridge University Press.
Goldman, A. (1986). Epistemology and cognition. Cambridge, MA: Harvard University Press.
Goldman, A. I. (1992). Liaisons: Philosophy meets the cognitive and social sciences. Cambridge, MA: MIT Press.
McKusick, V. A., & Francomano, C. A. (Eds.). (1994). Mendelian inheritance in man: A catalog of human genes and genetic disorders. 11th edn. Baltimore: Johns Hopkins University Press.
Renehan, E. J. (1996). Science on the Web: 500 of the most essential science Web sites.
Taubes, G. (1996). Science journals go wired. Science, 271, 764.
Thagard, P. (1988). Computational philosophy of science. Cambridge, MA: MIT Press/Bradford Books.
Thagard, P. (1992). Conceptual revolutions. Princeton: Princeton University Press.
Thagard, P. (1997). Collaborative knowledge. Noûs, in press.