Web 2.0 Applications in Medicine: Trends and Topics in the Literature

Background: The World Wide Web has changed research habits, and these changes were further expanded when “Web 2.0” became popular in 2005. Bibliometrics is a helpful tool used for describing patterns of publication, for interpreting progression over time, and the geographical distribution of research in a given field. Few studies employing bibliometrics, however, have been carried out on the correlative nature of scientific literature and Web 2.0. Objective: The aim of this bibliometric analysis was to provide an overview of Web 2.0 implications in the biomedical literature. The objectives were to assess the growth rate of literature, key journals, authors, and country contributions, and to evaluate whether the various Web 2.0 applications were expressed within this biomedical literature, and if so, how. Methods: A specific query with keywords chosen to be representative of Web 2.0 applications was built for the PubMed database. Articles related to Web 2.0 were downloaded in Extensible Markup Language (XML) and were processed through developed hypertext preprocessor (PHP) scripts, then imported to Microsoft Excel 2010 for data processing. Results: A total of 1347 articles were included in this study. The number of articles related to Web 2.0 has been increasing from 2002 to 2012 (average annual growth rate was 106.3% with a maximum of 333% in 2005). The United States was by far the predominant country for authors, with 514 articles (54.0%; 514/952). The second and third most productive countries were the United Kingdom and Australia, with 87 (9.1%; 87/952) and 44 articles (4.6%; 44/952), respectively. Distribution of number of articles per author showed that the core population of researchers working on Web 2.0 in the medical field could be estimated at approximately 75. In total, 614 journals were identified during this analysis. Using Bradford’s law, 27 core journals were identified, among which three (Studies in Health Technology and Informatics, Journal of Medical Internet Research, and Nucleic Acids Research) produced more than 35 articles related to Web 2.0 over the period studied. A total of 274 words in the field of Web 2.0 were found after manual sorting of the 15,878 words appearing in title and abstract fields for articles. Word frequency analysis reveals “blog” as the most recurrent, followed by “wiki”, “Web 2.0”, ”social media”, “Facebook”, “social networks”, “blogger”, “cloud computing”, “Twitter”, and “blogging”. All categories of Web 2.0 applications were found, indicating the successful integration of Web 2.0 into the biomedical field. Conclusions: This study shows that the biomedical community is engaged in the use of Web 2.0 and confirms its high level of interest in these tools. Therefore, changes in the ways researchers use information seem to be far from over. (Med 2.0 2015;


Introduction
Over the past two decades, the World Wide Web has changed researchers' habits. These changes were further expanded when "Web 2.0" became popular in 2005 [1], providing tools and platforms that facilitate user collaboration, user-generated content, and data sharing. These tools have gradually influenced the world of research [2,3], especially in biology and medicine [4][5][6][7][8][9], and their use is increasingly common, notably with the arrival of "digital natives" in laboratories [10,11].
Bibliometrics is a helpful and widely used tool for describing patterns of publication and interpreting temporal evolutions and the geographical distribution of research in a given field. However, few studies employing bibliometrics have been carried out on the correlative nature of scientific literature and Web 2.0. A bibliometric analysis was performed in 2009 by Chu and Xu [12] on a set of 1718 documents relating to Web 2.0 using several databases. It was found that Web 2.0 is a rapidly developing area, with medicine and sociology being the major contributing disciplines to the scholarly publications. In 2011, Aharony [13] performed a statistical descriptive analysis and a thorough content analysis of descriptors and journal titles in the field of library and information science in a study of 472 articles. They focused on the subject of Web 2.0 and its main applications. Main findings revealed that the percentage of articles related to Web 2.0 was low, and showed a close link between Web 2.0 and library topics. In the field of medicine, Van De Belt et al [6] performed a systematic literature review in 2010 of electronic databases (PubMed, Scopus, CINAHL) and gray literature on the Internet using search engines to identify unique definitions of Health 2.0/Medicine 2.0 and recurrent topics within the definitions. The analysis was done on 1937 documents and they concluded that Health 2.0/Medicine 2.0 were still developing areas, and that there was still no general consensus regarding the definition of Health 2.0/Medicine 2.0.
The aim of the present study was to provide an overview of Web 2.0 implications in the biomedical literature and to answer the following questions: What is the growth rate of biomedical literature on Web 2.0?; What are the key publications, countries, and authors in the field?; Which Web 2.0 terms are the most recurrent in biomedical literature?; and, Are the various applications of Web 2.0 expressed in the biomedical literature? Established bibliometric methods have been used to perform the present study. One example is the identification of core journals using Bradford's law of scattering which has, to the best of our knowledge, never been done to study literature related to Web 2.0.

Methods
The search for papers to be included in this study was carried out on February 7, 2013, using the PubMed database [14], developed by the National Center for Biotechnology Information (NCBI) at the National Library of Medicine (NLM). Keywords used in the search were chosen since they were known to be representative of Web 2.0 applications [7,12,13,15]. Search strategy was built around identifying keywords in medical subject headings (MeSH) and completed, in the absence of results, by a text search in the title and abstract fields. When necessary, keywords were accompanied by a truncation to bring in all possible variants. The study was limited to original research articles corresponding to "Journal articles" shown under the "Publication type" field. Data downloaded from PubMed in Extensible Markup Language (XML) were processed through developed hypertext preprocessor language (PHP) scripts, then were imported to Microsoft Excel 2010. All articles were manually reviewed by the author of this article and those not related to Web 2.0 were eliminated. When no abstract was available for a reference, PubMed "Related citations" were consulted to determine the eligibility of the article in the present study.
Microsoft Excel served for assessing the growth of literature, for journals, language of publication, authorship pattern, and number of publications per country. The average yearly growth rate was calculated as the mean percentage of annual growth for the period studied, with average yearly growth rate=(Current year total -Previous year total)/Previous year total [16,17].
Average yearly growth rate and percentage of articles published in English were also calculated for the whole PubMed database for the period 2002-2012. This period was chosen because it corresponds to the period where articles related to Web 2.0 were found in this study.
Bradford's law of scattering has been used extensively in the information science literature to describe the dispersion of articles in any scientific field [18] and to identify core journals of serial titles [16,19,20]. Bradford's law states that "if scientific journals are arranged in order of decreasing productivity of articles on a given subject, they may be divided into a nucleus of periodicals more particularly devoted to the subject and several groups or zones containing the same number of articles as the nucleus, when the numbers of periodicals in the nucleus and succeeding zones will be as 1: n : n 2 " [21]. This means that "Bradford's law predicts that the number of journals in the second and third zones will be n and n 2 times larger than the first zone respectively, and therefore, it should be possible to predict the total number of journals containing articles on a subject once the number in the core and middle zone of journals is known" [22]. To identify the core journals and predict the number of journals containing articles related to Web 2.0, we applied Bradford's law by dividing the publication frequency ranked journals into three groups, with each group containing approximately the same number of articles.
The Journal Citation Reports (Thomson Reuters) was used for Impact Factor determination. For the determination of affiliation of authors, England, Scotland, Northern Ireland, and Wales were clustered into the United Kingdom. Words from both title and abstract fields were recovered for keyword frequency calculation using TextSTAT 2.9 software [23]. Words or expressions were manually sorted to extract those relating to Web 2.0. Similar words-differing by the singular/plural, upper/lower case-were aggregated (eg, wiki/wikis, facebook/Facebook). Words thus obtained were manually sorted into eight categories: one general category and seven others corresponding to blog, cloud computing, microblogging, social bookmarking/document sharing, social network, syndication, and wiki.

Overview
The publication search turned in a total of 1578 references. After manual sorting and elimination of inappropriate references, 1347 articles were retained for inclusion in the study.

Growth of Literature
As shown in Figure 1, Web 2.0 references, starting in 2002 with one article, had risen to 1000 per year by 2009 and continued to grow throughout 2011 (2012, being incomplete, is not represented). The average annual growth rate for the period 2002-2011 was 106.30% for Web 2.0 related articles, and 6.27% for the whole PubMed database for the same period.

Journals
A total of 614 journals were identified during this analysis. As shown in Figure 2 and Table 1, one-third of the published articles were found in a mere 27 journals (27/614; 4.4%). This first third represents the journals that published the most articles (between 7 and 53 articles on the period studied) and that are presumed to be of highest interest for researchers interested in Web 2.0 ("core journals"). The middle third corresponds to the journals (178/614; 29.0% of journals) that published an average amount of articles, and the last third includes the "long tail" of journals (409/614; 66.6% of journals) that published one article and must be regarded as being of least importance. The theoretical ratio of number of journals (43.4) and the theoretical number of journals in the last third (1172) were higher than the values obtained experimentally (15.1 and 409, respectively).

Language of Publication
A total of 1355 declared languages were retrieved among the 1347 articles. This disparity could be explained by the fact that some articles have two languages declared in the language field in PubMed. The most commonly used language was English (1301/1355; 96.01%), followed by French (17/

Geographical Repartition of Authors (Country Contributions)
For 395 of the 1347 articles (29.32%), it was impossible to identify the contributing country because the author claimed no affiliation and the articles failed to name the country of publication. Therefore, only 952 of the articles studied could be linked to countries. Table 3 shows the number of papers published per country.

Number of Papers Per Author
In total, 4209 authors were found for the 1347 articles retained, corresponding to 3762 different authors. The great majority of authors (91.54%; 3444/3762) wrote only one article, 6.51% (245/3762) wrote two articles, whereas 73 (1.94%; 73/3762) wrote three or more. The maximum number of articles written by one author was 14.

Word Frequency Analysis
A total of 274 words in the field of Web 2.0 were found after manual sorting of the 15,878 words belonging to title and abstract fields. Similar words differing by singular/plural, upper/lower case were aggregated and 99 words finally obtained. As shown in Tables 4 and 5, word frequency analysis reveals that the ten most frequent words or expressions are "blog" followed by "wiki", "Web 2.0","social media", "Facebook", "social networks", "blogger", "cloud computing", "Twitter", and "blogging".
In the general category, "Web 2.0" was the most common expression followed by e-health, Health 2.0, and Medicine 2.0. "Blog" was the predominant category of any Web 2.0 application encountered in the biomedical literature, followed by social networks and wiki (1279, 1199, and 803 occurrences, respectively). Micro-blogging, cloud computing, social bookmarking/document sharing, and syndication were much less represented with 332, 260, 183, and 175 occurrences, respectively.

Principal Findings
The appearance of literature relating to Web 2.0 in the biomedical field is recent, and correlates with the year 2005, when Web 2.0 became popular [1]. Some Web 2.0 applications existed before this date, which is why some articles were identified earlier. The scientific production of Web 2.0 really started in 2006 and has been growing rapidly ever since. The comparison of average annual growth rate for Web 2.0 related articles and for the whole PubMed database (106.30% and 6.27%, respectively) has confirmed that the topic continues to be of much interest to the biomedical community.
Using Bradford's law of scattering, the theoretical ratio of number of journals (43.4) and theoretical number of journals in the last third (1172) were higher than the values obtained experimentally (15.1 and 409, respectively). Thus, articles related to Web 2.0 are published in a lesser number of journals (n=614) than the expected Bradford theoretical value (n=1377). This can be explained by the innovative nature of the subject studied, which has not yet been taken into account by a great number of journals.
In the list of the 38 journals that published more than six articles, including core journals according to Bradford's law (Table 2), widely disseminated journals with high impact factors (IF) are present: three are in the 100 journals that have the highest IF (Nature, Science, BMJ), and one is in the top 10 (Nature, IF=38.597). Six journals out of 38 (16%) have an IF greater than 5, whereas, in the complete Journal Citation Reports, the percentage with an IF higher than 5 is 6.2%, indicating that journals with significant scientific influence are interested in Web 2.0. Only one of the 38 journals that published more than six articles, the Journal of Medical Internet Research, specializes in Internet studies. It should be noted that many journals with educational objectives are in this list, because Web 2.0 tools and techniques are new and their comprehension and utilization require a learning period.
English was by far the most predominant language of the articles included in the study, and the percentage of articles in English was higher compared to the entire PubMed database (96.01% and 90.84%, respectively). This can be explained by the fact that English is the official language for scientific publications in most countries. As mentioned elsewhere [24,25], PubMed is a US database, so it may have introduced a bias because most of the journals indexed are written in English, which could accentuate its predominance. These observations match those of other studies done with bibliometrics in fields where information is predominantly found in English.
The United States was by far the most productive country. Europe came second globally, whereas Africa and South America were very poorly represented.
Distribution of number of articles per author shows that the great majority of authors (91.54%; 3444/3762) wrote only one article, whereas 73 (1.94%; 73/3762) wrote three or more. Thus, the core population of researchers working on Web 2.0 in the biomedical field can be estimated to approximately 75.
Considering generalist terms or expressions, the word frequency analysis reveals "Web 2.0" as the most common term, followed by "e-health", "Health 2.0", and "Medicine 2.0", which are the expressions most commonly used to describe Web 2.0 technologies applied in this field [8].
The most represented category of the eight was blog (1279 occurrences). This can be explained by the fact that blogs are among the oldest Web 2.0 applications and the facility of their implementation has established their popularity. Quite logically, in the second category, "social network", the well-known Facebook was by far the most represented. Among wikis, Wikipedia was the most represented term. The high number of terms in this category is due to the many applications based on wiki platforms developed by researchers, and most of the articles related to these terms are actually presentations of these applications. The most cited micro-blogging application was, as expected, Twitter, confirming its high popularity. Cloud computing applications, currently on the rise, are also well represented, even though access to them is fairly recent compared to that of blogs or wikis. Amazon, best known for its online shopping website, is cited because it also offers solutions for the development of cloud computing applications. The category social bookmarking / document sharing was predominantly represented by YouTube. Unexpectedly, social bookmarking sites specially developed for the scientific field were scarce (eg, Citeulike), or simply not present (eg, Connotea or Bibsonomy). The same can be said of other categories in which the most represented terms were related to popular applications (Facebook, YouTube, Wikipedia, and Twitter, respectively the first, third, sixth, and tenth most consulted sites in the world according to [26]). Of note, apart from wikis, applications specifically developed for science, biology, or medicine were rare (eg, PatientsLikeMe) or not represented in every category (eg, researchblogging.org for blogs, Researchgate and Academia for social networks).

Limitations
One should be aware that this study presents some limits: for even if PubMed is widely used for bibliometric analysis, it does not contain all biomedical journals [24], and some relevant articles may have been omitted. Furthermore, the methodology for identifying the country of authors (PubMed) indicates only one country per article and fails to identify transnational research. Moreover, some articles (395/1347; 29.32%) did not mention any country of affiliation for authors. Therefore, the geographical repartition of the latter might be underestimated in some locations [27]. Furthermore, some Web 2.0 applications, specifically developed for biology or medicine, may not have been retrieved by the search because they were only named and not described as Web 2.0 applications in articles.

Conclusions
This paper presents an exploration of the geographical distribution and temporal trends of the biomedical literature related to Web 2.0 found in PubMed, together with an analysis of related words and expressions. The study indicates the ongoing expansion of a field currently dominated by the United States. All categories of Web 2.0 applications abound within the literature, indicating that Web 2.0 has been integrated into the biomedical field. Of note, applications developed specifically for biology and medicine were less represented than their generalist counterparts (eg, Facebook, Twitter). The study of articles published clearly shows a great diversity of journals, including those with significant scientific influence, displaying interest in Web 2.0, and confirms the high level of interest the topic holds for the biomedical community. Therefore, the changes in the informational uses of researchers, initiated by the arrival of the World Wide Web and continued by Web 2.0, seem to be far from over.