How many journals do we have? An alternative approach to journal collection evaluation through local cited-article analysis
By Jason S. Price, The Claremont Colleges’ Libraries
A faculty member recently asked me “the number question”: How many print and online biology journals does the library own? In the past 5 years I’ve spent working in a life sciences library, I have avoided answering this question each time it was raised. I hope to continue to do so, because I strongly believe that the question—and, for an unrelated reason, its answer—is seriously flawed.
The question is seriously flawed because collection size is no longer an indication of collection quality. It may have been in the past, when each journal was individually selected by the subject specialist and faculty. Today, however, big electronic-journal package deals include large fixed sets of titles that are bundled in ways that do not reflect local interest. The faculty have access to more titles than they need or want, particularly at small and medium-sized institutions that have less comprehensive programs than their large counterparts. Furthermore, by their very nature, these “big deals” contain more second- and third-tier journals. As such, their addition may actually be a disservice to students, who tend not to discriminate among articles by journal quality. In my opinion, the only way to assign meaning to the number of journals we have is to know the number—and even more, the identity—of the journals we need.
The answer will be seriously flawed because the edges of the life sciences—biophysics, mathematical biology, biochemistry, bioengineering, nanotechnology, neuroscience—cannot be clearly delineated. Where do we draw the line between what is and what is not “life science” for any of these subdisciplines? Even the relatively small department that is the subject of this study (eight tenure-track faculty) includes a bioengineer, a biochemist, and a mathematical biologist; who can define the limit of journals pertinent to their research? Which journals qualify as biology journals for this department? Which journals do the faculty need? The best answers to these questions are found through an analysis of the articles that the faculty use.
I expect that the number of journals available in a given broad discipline must occasionally be estimated (e.g., for accreditation reports). However, this case offers an opportunity to redefine the question. The Harvey Mudd College (HMC) Department of Biology is preparing a self-study for an external review by selected faculty from peer departments. They want to include an assessment of library resources as part of the study, thus their interest in our biology journal holdings. To ensure that we would only be “counting” journals that are of value to the department, I proposed an analysis of the proportion of their published cited references that are available to the Claremont Colleges (of which Harvey Mudd is one) online, in print, and via interlibrary loan (ILL).
In this article, I present the results of this analysis, along with a discussion of its value and limitations. I conclude by advocating for the use of this approach whenever subject-specific journal collection assessment is called for and whenever the requestors are willing to accept a more sophisticated and context-sensitive answer than the number of journals can provide.
METHODS
Thomson ISI®’s Web of Science was used to identify the two most recent publications by each of the eight tenure-track faculty listed on the HMC Biology Department web page as of February 2007. All cited references from each paper were copied from Web of Science and pasted into a word processor as text. Find and replace was used to create tab-delimited entries for each citation that included first author, article title, journal title, volume, page, and year, with records separated by paragraph marks. This tab-delimited text was then imported into a spreadsheet program. Care was taken to ensure that all data appeared in the correct column; citations without article titles had empty cells inserted between author and journal fields. The local faculty author of the citing paper was added for each citation (each row), and duplicate records were removed, as were those lacking volume or page data (most of which were books or conference proceedings).
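For readers who would rather script the cleanup than use a word processor and spreadsheet, the steps above can be sketched in Python. This is only an illustration, not the workflow used in the study: the field order, the `clean_citations` name, the deduplication key, and the sample records are all assumptions made up for the example.

```python
import csv
import io

# Assumed column order of the tab-delimited export (hypothetical names).
FIELDS = ["author", "article_title", "journal", "volume", "page", "year"]

def clean_citations(tsv_text, citing_faculty):
    """Parse tab-delimited citations, drop records lacking volume or page,
    remove duplicates, and tag each record with the citing faculty member."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    seen = set()
    cleaned = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        # Strip whitespace and pad short rows so all six fields exist.
        row = [cell.strip() for cell in row] + [""] * (len(FIELDS) - len(row))
        record = dict(zip(FIELDS, row))
        # Records without volume or page were mostly books or proceedings.
        if not record["volume"] or not record["page"]:
            continue
        # Assumed duplicate key: same author, journal, volume, page, and year.
        key = (record["author"].lower(), record["journal"].lower(),
               record["volume"], record["page"], record["year"])
        if key in seen:
            continue
        seen.add(key)
        record["citing_faculty"] = citing_faculty
        cleaned.append(record)
    return cleaned

# Made-up sample: one duplicate, one book-like record, one untitled citation.
sample = (
    "Smith J\tA title\tCell\t100\t15\t2001\n"
    "Smith J\tA title\tCell\t100\t15\t2001\n"
    "Lee K\t\tSome Monograph\t\t\t1995\n"
    "Doe A\t\tNature\t400\t22\t1999\n"
)
records = clean_citations(sample, "Faculty A")
print(len(records))
```

The empty second field in the last sample row mirrors the empty cell inserted for citations without article titles.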
The combined unique journal article citations (n = 861) from all 16 faculty publications were reverse-sorted by year within journal title, with journal abbreviations normalized to full titles as necessary. Sorting helped identify each unique journal title (n = 247) and resulted in excellent scalability: As the number of articles analyzed increased, the growth in the number of unique titles slowed. Unique titles were then checked against The Claremont Colleges (TCC) e-journal A–Z list and online public access catalog (OPAC) holdings, and the access format (Online, Print, or ILL) was recorded for each cited reference, taking into account holdings dates for each journal. Online access type or availability was also recorded, as follows. (Note: Throughout the remainder of this article, accessibility refers to the current state of TCC access, and availability refers to the potential for future access.)
When an article was accessible online, print holdings were not checked for that citation. The type of online access (Open Access or Paid) at TCC was recorded according to listings in our e-journal A–Z list. When items were listed under both types of resource, open access was recorded.
When an article was not accessible online at TCC, it was marked Print or ILL, as appropriate. The status of its online availability via subscription was determined with Serials Solutions E-Catalog, which lists every electronic title that any vendor offers to any customer, along with default holdings dates.
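The decision rules in the preceding paragraphs can be expressed as a small function. The sketch below is a hypothetical rendering of those rules, not the procedure actually run against the A–Z list, OPAC, or E-Catalog; the function name, the dictionary layouts, and the sample holdings are invented for illustration.

```python
def classify_access(journal, year, online_holdings, print_holdings, subscribable):
    """Return (format, online_type) for one cited article.

    online_holdings: {journal: [(first_year, last_year, "Open Access" or "Paid")]}
    print_holdings:  {journal: [(first_year, last_year)]}
    subscribable:    {journal: [(first_year, last_year)]} for journal-years
                     offered online by some vendor (E-Catalog-style data)
    """
    types = {kind for first, last, kind in online_holdings.get(journal, [])
             if first <= year <= last}
    if types:
        # Open Access takes precedence when both types cover the article.
        return ("Online", "Open Access" if "Open Access" in types else "Paid")
    # Print holdings are checked only when there is no online access.
    in_print = any(first <= year <= last
                   for first, last in print_holdings.get(journal, []))
    fmt = "Print" if in_print else "ILL"
    available = any(first <= year <= last
                    for first, last in subscribable.get(journal, []))
    return (fmt, "Available" if available else "Not Available")

# Made-up holdings for three hypothetical journals.
online = {"Journal A": [(1997, 2007, "Paid"), (1990, 2005, "Open Access")]}
in_print = {"Journal B": [(1980, 1999)]}
offered = {"Journal B": [(1995, 2007)]}

print(classify_access("Journal A", 2000, online, in_print, offered))
print(classify_access("Journal B", 1998, online, in_print, offered))
print(classify_access("Journal C", 2001, online, in_print, offered))
```

Keeping holdings as year ranges reflects the point that access was judged per cited article against holdings dates, not per journal title.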
Pivot table functionality was then used to determine the accessibility and availability results as the percentage of citations in each format (Online, Print, ILL) and online access type (Open Access, Paid, Available [for Subscription], and Not Available). The percentages of overall accessibility by format were used to calculate expected values for χ² tests to determine whether individual faculty members’ cited article accessibility differed significantly from overall accessibility by format. Because this calculation involved eight separate tests, the significance level was adjusted to α = 0.01 to control the experiment-wide Type I error rate.
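As a rough illustration of the test described above, the following Python sketch compares one author's counts with expected values derived from overall proportions. The proportions are the rounded overall percentages reported in the Results; the function name and the observed counts are made up, and the closed-form p-value used here holds only for df = 2 (three format categories).

```python
import math

# Overall accessibility proportions for the pooled citations (rounded).
OVERALL = {"Online": 0.81, "Print": 0.08, "ILL": 0.11}
ALPHA = 0.01  # adjusted significance level for the eight tests

def chi_square_vs_overall(observed, overall=OVERALL):
    """Goodness-of-fit test of one author's counts against overall shares.

    Returns (statistic, p_value). With three categories df = 2, and the
    chi-square upper-tail probability is exactly exp(-statistic / 2).
    """
    n = sum(observed.values())
    stat = 0.0
    for fmt, share in overall.items():
        expected = n * share  # expected count under the overall distribution
        stat += (observed.get(fmt, 0) - expected) ** 2 / expected
    return stat, math.exp(-stat / 2)

# Hypothetical author with 100 cited articles (counts are invented).
stat, p = chi_square_vs_overall({"Online": 60, "Print": 10, "ILL": 30})
print(round(stat, 2), p < ALPHA)
```

A general implementation would use a statistics library for arbitrary degrees of freedom; the closed form keeps this sketch dependency-free.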
RESULTS
Overall, 81% of the cited articles were accessible online by TCC, an additional 8% were owned in print, and the final 11% were accessible only via ILL (Table 1). More than half of the articles that were accessible online were from freely accessible (Open Access) journal-years, representing 45% of all citations; an additional 36% were accessible only through library subscriptions (Figure 1). The majority of the remaining citations (13%) were available online via additional subscriptions (Figure 1, open yellow and red sections). Overall, only 6% of the citations were not available via potential or existing online subscriptions.
Table 1. Distribution of access format and type for the cited articles from 16 recent papers (2000–2007) by authors from the HMC Department of Biology. Mutually exclusive access categories were defined by assigning a single format and type category to each citation, with Online, Open Access taking precedence over less readily or widely available access. Cited articles accessible only in print or via ILL were determined to be Available Online or Not Available Online according to Serials Solutions E-catalog data.
Figure 1. HMC Department of Biology cited article access and availability at TCC. The hatched area indicates current online access requiring a subscription, and horizontal lines indicate that access is not available online to anyone.
The accessibility by format of the articles cited by three of the eight faculty members differed significantly from the overall article accessibility distribution (Table 2). The set of cited articles of one faculty member (Haushalter) was significantly better represented in TCC online collections than the overall pool of articles. Two others’ sets of citations were underrepresented in TCC online holdings, resulting in a three- to fourfold increase in the proportion of articles requiring ILL requests.
Table 2. Comparison of cited-article accessibility for HMC Department of Biology faculty. Significant χ² tests (df = 2) indicate cited article sets that differ significantly from overall accessibility percentages. Green values suggest favorable differences from the overall distribution, whereas red numbers suggest unfavorable differences.
DISCUSSION
Benefits of the Method
Department and faculty citation analysis of a journal collection provides several practical insights for library collection managers. The list of unique journal titles sorted by citation frequency ranks the most important journals for local faculty (Tables 3 and 4). Additional detail in this same list identifies the most popular titles that are not locally accessible online, creating a data-based priority list for future subscriptions or one-time purchases. For example, these data could be used to gauge the local importance of specific backfile purchases according to the level of increased access to cited articles they would provide. In addition, the data from the HMC Department of Biology revealed a monographic book series that would be a valuable electronic acquisition.
JOURNAL                                       ACCESS FORMAT  TOTAL
Journal of Biological Chemistry               1 Online       37
Nature                                        1 Online       31
Cell                                          1 Online       30
Nucleic Acids Research                                       30
Science                                       1 Online       29
Proceedings of the National Academy of
  Sciences of the United States of America    1 Online       24
Genes & Development                           1 Online       21
                                              3 ILL          1
Journal of Cell Biology                       1 Online       19
Journal of Experimental Biology               1 Online       19
Molecular and Cellular Biology                1 Online       18
Molecular Biology of the Cell                 1 Online       15
Cell Motility and the Cytoskeleton            1 Online       9
                                              2 Print        4
                                              3 ILL          1
Investigative Ophthalmology & Visual Science  1 Online       14
Plant Journal                                 1 Online       14
Embo Journal                                  1 Online       13
Plant Cell                                    1 Online       13
Plant Physiology                              1 Online       13
Journal of Physiology–London                  1 Online       12
Genetics                                      1 Online       11
Bioinformatics                                1 Online       10
Development                                   1 Online       10
Methods in Cell Biology                       2 Print        10
Journal of Applied Physiology                 1 Online       9
Biochemistry                                  1 Online       8
Journal of Eukaryotic Microbiology            1 Online       5
                                              2 Print        3
Coral Reefs                                   1 Online       7
Current Biology                               1 Online       7
Experimental Cell Research                    1 Online       3
                                              2 Print        4
Journal of Cell Science                       1 Online       7
Journal of Molecular Biology                  1 Online       5
                                              2 Print        2
Molecular and Biochemical Parasitology        1 Online       7
American Journal of Physiology                1 Online       6
Journal of Muscle Research and Cell Motility  1 Online       5
                                              3 ILL          1
Molecular Cell                                1 Online       6
Annual Review of Physiology                   1 Online       5
Biochemical Journal                           1 Online       5
Current Opinion in Genetics & Development     1 Online       5
Eukaryotic Cell                               1 Online       5
Genome Biology                                1 Online       5
Hydrobiologia                                 1 Online       5
Marine Biology                                1 Online       5
Trends in Plant Science                       1 Online       5
Zool Verh–Leiden                              3 ILL          5
Bulletin of Marine Science                    1 Online       2
                                              3 ILL
Ecology                                       1 Online       4
Evolution                                     1 Online       4
Table 3. Partial pivot chart of overall title data, sorted by citation frequency. Details from each access format cell can be shown as needed.
Table 4. Details from the Cell Motility and the Cytoskeleton print format articles listed in Table 3.
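The data-based priority list mentioned above can be derived mechanically from counts like those in Table 3. The sketch below uses a handful of rows copied from Table 3; the `offline_priority` function and the tie-breaking rule are illustrative assumptions, not part of the original spreadsheet analysis.

```python
from collections import defaultdict

# (journal, access format, citation count): a few rows taken from Table 3.
ROWS = [
    ("Nature", "Online", 31),
    ("Genes & Development", "Online", 21),
    ("Genes & Development", "ILL", 1),
    ("Cell Motility and the Cytoskeleton", "Online", 9),
    ("Cell Motility and the Cytoskeleton", "Print", 4),
    ("Cell Motility and the Cytoskeleton", "ILL", 1),
    ("Methods in Cell Biology", "Print", 10),
    ("Journal of Eukaryotic Microbiology", "Online", 5),
    ("Journal of Eukaryotic Microbiology", "Print", 3),
    ("Zool Verh–Leiden", "ILL", 5),
]

def offline_priority(rows):
    """Rank journals by the number of cited articles not accessible online."""
    offline = defaultdict(int)
    for journal, fmt, count in rows:
        if fmt != "Online":
            offline[journal] += count
    # Most offline citations first; ties broken alphabetically by title.
    return sorted(offline.items(), key=lambda item: (-item[1], item[0]))

for journal, count in offline_priority(ROWS):
    print(f"{count:3d}  {journal}")
```

Titles at the top of this ranking are the natural first candidates for new subscriptions or backfile purchases, subject to the availability check described in the Methods.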
This analysis also provides a valuable evaluation of journal resources for departmental self-studies. At HMC, 8 of 10 cited articles were instantly available online, 1 required a visit to the library (or 24-hour turnaround for electronic document delivery), and 1 could be delivered electronically via ILL in 2–14 days. This strongly positive overall picture is dampened somewhat by the individual faculty data, which show that one-quarter of the faculty have instant online access to only one-half to two-thirds of their cited articles. If the department sought to recommend subject areas that merit greater attention in the collection, two (tissue bioengineering and coral evolution, ecology, and systematics) would be appropriate candidates.
This last point brings up two related issues: benchmarking, and variation in electronic accessibility and availability across disciplines.
How strongly positive is instant online access to roughly 80% of cited articles? Valid benchmarks could come from two sources: equivalent data from peer institutions based on their own faculty and local collections (in Claremont, the other four biology departments are obvious candidates), or the same list of citations checked against holdings at a large research university library. The latter source would be simpler for single-department institutions, at least until and unless the proposed analytical method becomes more widely used. For such institutions, comparing local citations against the holdings of a large research university also seems more reasonable, because smaller institutions presumably are differently specialized.
Worth noting is the high likelihood of significant inherent variation among subdisciplines (represented here largely by single faculty members) in the number of references cited per article, as well as in the variety and age of the articles they rely on. For instance, the field of evolution, ecology, and systematics undoubtedly will use a larger proportion of older literature than nanotechnology or bioinformatics. I argue that these differences do not need to be taken into account, except when the online availability of the cited articles is decreased. As long as the source journals have been digitized, greater demand for breadth or historical depth of coverage should be treated as legitimate need.
This form of analysis of faculty candidate citations also has several perhaps less obvious benefits. For small and medium-sized institutions, it can be used as a recruiting tool to reassure candidates that local collections will effectively support their research needs. It also informs librarians and faculty of the significant impact of openly accessible journal content and provides for the objective assessment of assertions that “everything I need is accessible online.”
I also have found that working with faculty publications and citations has served to identify errors in our A–Z e-journal list and catalog machine-readable cataloging (MARC) records and to educate me about the emphases of local faculty research, which in turn helps me to more effectively select books that will be of value to our faculty and students. Moreover, this analysis can help to mitigate faculty and librarian bias toward favorite titles that are nonetheless rarely cited.
Limitations of the Method
One major problem with citation analysis is the potential for “ease of access” bias. If local authors are more likely to cite articles that are accessible to them online, then their cited article sets may reflect the set of accessible articles more strongly than the set of articles they truly need. However, several observations suggest that the strength of this bias should be quite limited, for both the HMC Department of Biology and faculty in general.
First, the data show extensive evidence of “offline citation” (3–48% among HMC Department of Biology authors; mean 22%). Furthermore, two observations strongly suggest that local authors have access to additional online resources not available to TCC: Many of the papers were written before individual faculty members were appointed to the HMC faculty or while faculty were on sabbatical in residence at another institution, and collaboration with coauthors at other institutions was nearly universal (15 of 16 papers included at least one author at a non-Claremont institution, with 30 different institutions represented).
Additionally, an identical analysis of the references cited in eight articles by four candidates for an HMC Department of Biology tenure-track position in December 2006–January 2007 indicated that their cited articles were more accessible in TCC online collections than those cited by resident tenure-track faculty (89% online, 2% print, 10% ILL; n = 331; df = 2; χ² = 17.59; p = 0.0002). This result is the opposite of that expected if an ease-of-access bias were influencing the citation patterns of local faculty.
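The reported statistics for the candidate comparison can be sanity-checked with a few lines of Python. This sketch assumes the expected counts come from the rounded overall percentages (81% online, 8% print, 11% ILL) applied to n = 331; the variable and function names are illustrative.

```python
import math

N_CANDIDATE_CITATIONS = 331
OVERALL_SHARES = {"Online": 0.81, "Print": 0.08, "ILL": 0.11}

# Expected counts if candidates' citations followed the overall distribution.
expected_counts = {fmt: N_CANDIDATE_CITATIONS * share
                   for fmt, share in OVERALL_SHARES.items()}

def p_value_df2(statistic):
    """Upper-tail chi-square probability for df = 2, which is exp(-x / 2)."""
    return math.exp(-statistic / 2)

p_reported = p_value_df2(17.59)  # statistic as reported in the text
for fmt, count in expected_counts.items():
    print(f"{fmt}: {count:.2f} expected")
print(f"p = {p_reported:.4f}")
```

Rounded to four decimal places, the closed-form value agrees with the p = 0.0002 given in the text.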
The most significant limitations of this form of citation analysis are related to its comprehensiveness. A sample of recent papers may not completely represent current research directions being pursued by local faculty. Furthermore, this research-based approach does not take into account teaching needs, which include subject-specific education-related journals as well as primary and review journals that are used extensively in teaching or student research but do not overlap with faculty research emphases. One approach to addressing these shortcomings is to complement citation-based analysis with local downloads per title, an ISI® Impact Factor subject category analysis, or both. To be sure, these analyses have their own limitations and would create largely overlapping lists of journal titles, but any unique titles identified by these additional methods would clearly complement a citation-based list.
CONCLUSIONS
Collection evaluation based on recently cited articles authored by local faculty is far more meaningful and effective than a subject-based count of available journal titles. This approach assesses the collection relative to demonstrated local need and does not require artificial delineation of subject boundaries, which is particularly difficult in the life sciences. Benefits of this approach derive from its exposure of the identity of important journals and their local accessibility and global availability. Every effort should be made to use this approach whenever a richer answer to “the number question” is acceptable.