Persistence of web references in scientific research - Computer

COMPUTING PRACTICES
Persistence of Web References in Scientific Research
The lack of persistence of Web references has called into question the increasingly common practice of citing URLs in scientific papers. Although few critical resources have been lost to date, new strategies to manage Internet resources and improved citation practices are necessary to minimize the future loss of information.
Steve Lawrence, David M. Pennock, Gary William Flake, and Robert Krovetz, NEC Research Institute
Frans M. Coetzee, Certus International Inc.
Eric Glover, University of Michigan
Finn Årup Nielsen, Technical University of Denmark
Andries Kruger, University of Stellenbosch
C. Lee Giles, The Pennsylvania State University
Researchers have long desired immediate access to all scientific knowledge. Although there are still major hurdles to overcome [1], the Internet has brought this goal closer to reality. Scientists use the Internet to communicate their findings to a broader audience than ever before, and references to information on the Web are increasingly common.
However, the lack of reliable or stable Internet publishing sources sometimes offsets the advantage of being able to easily share various materials at minimal cost. Individuals and organizations abandon Web pages, shut down servers, and rename files.
Invalid URLs do more than contribute to user annoyance and frustration. Ultimately, they can lead to the loss of important data as cited works and research findings gradually disappear from circulation. Proposed formal approaches to bypass pages with dead links during browsing [2] or reduce their ranking when presenting search engine results [3] threaten to exacerbate the problem. The lack of persistence of Web references has led many researchers to question whether articles and other published works should continue to include URL citations.
We analyzed references to Web resources in numerous computer science publications, considering the volume of citations, validity of links, and detailed nature of invalid links. We found that URL citations have increased dramatically in recent years and that many of these references are now invalid. At the same time, we determined that most missing URLs are easy to relocate. Although formal references to published articles are always preferable, we believe that Web references facilitate scientific communication and progress. However, new Internet resource management strategies and improved citation practices are necessary to minimize future information loss.
SEARCHING FOR THE MISSING LINKS
We investigated URLs cited in research papers using NEC Research Institute's scientific digital library ResearchIndex, formerly known as CiteSeer [4,5]. This database, created in 1997, indexes PostScript and PDF research articles on the Web. Aimed at improving communication and progress in science, ResearchIndex incorporates Autonomous Citation Indexing and lets users quickly see the context in which subsequent papers refer to a given article of interest. A free service is available at http://researchindex.org/.
Search methodology
From 3 to 5 May 2000, we analyzed 270,977 computer science journal papers, conference papers, and technical reports that were available at that time on the publicly indexable Web [1]. From the 100,826 articles cited by another article in the database (thus providing us with the year of publication), we extracted 67,577 URLs. To find them, we searched for strings starting with “http:”, “https:”, or “ftp:” and, after removal of trailing punctuation, ending with a quote or white space. We then attempted to access each one, following redirected URLs to their new destination.
As Figure 1a shows, the number of URL citations has increased substantially since the inception of the Web. Figure 1b dramatically illustrates the lack of persistence of Internet resources. The percentage of invalid links in the articles we examined varied from
00189162/01/$10.00 © 2001 IEEE