had relatively few tools at hand for finding out which bits of their
work are appreciated and used by other academics. There are well-known,
first-generation, proprietary citation tracking systems (like ISI Web
of Science and Scopus) that cover only or chiefly well-established
journals, and with long time-lags. In the digital era there are also newer
Internet-based systems, drawing extensively on Google, which now offer a
much broader and more responsive picture of who is citing or using whom
in academia. Both types of systems have pros and cons that we will
discuss in detail, and we will give step-by-step guidance on how
academics can use the systems to look at their own work.
The best way for academics to find out how their work is being used by other academics is to use a
combination of the three best tools, which are:
- Harzing’s ‘Publish or Perish’ (HPoP) software, which is a
tweaked version of Google Scholar that delivers rapid feedback and
covers far more sources (and somewhat more diverse sources) than the conventional proprietary systems;
- ISI Web of Science, which is most useful for senior
academics with a slate of published work already in high impact
journals, and for academics in the physical sciences; and
- Google Book Search and Google Scholar for people working in disciplines where books and other non-journal academic outputs are important.
Below we discuss these three systems and quite a few alternatives in depth, and explain
how they work, what each of them is good for, their limitations, and how
to get the best possible results from each of them. Armed with our
advice notes we suggest that readers try out these systems and see which
ones seem to work best for their discipline and for tracking their
particular type of research.
We begin with advice on how to maximise finding an academic’s name in a search engine so that
her citations can be more easily tracked. Next we consider the older
citation tracking systems that focus only on (some) journal articles. In
section three we look at the new Internet-based systems.
If an author has a distinctive name (with an uncommon surname and plenty of initials to identify her
uniquely) then it will be easier to find out how many other authors are
citing her research. However, if an author has an indistinct name (like
Smith, Jones, Brown, Li, Dupont, etc. and only one initial), it will
take longer to obtain the same accurate information. It may not be
possible to efficiently use some of the best citation systems at all,
such as Harzing’s Publish or Perish (HPoP), and an academic may have to
piece together citations for each of their publications using the titles
to exclude references to many namesakes. A key implication arises here
for new researchers just starting out on an academic career (or for a
mentor advising a new researcher). She must choose her author name with great
care, using the full first name and adding her second name or initial if
applicable. Academics should keep in mind that from now on (for the
rest of their career) people will be looking for their work in a
global-sized haystack of competing information.
There is a huge extra problem for citation tracking arising from the restrictive
and old-fashioned practices of journal style sheets. It is still common
to find that most European social science journals (often based in
relatively small countries) include only the first initials of authors
in footnotes or reference lists, giving neither authors’ first names in
full nor their second or subsequent initials. Since academic
knowledge is now organised on a global scale this is very bad practice.
In the US, where there are now over 300 million people, the demands of
finding people in a larger society have generally meant that much better
author details are included. This is a pattern that European academics
and journal editors should urgently start to copy.
ISI completely ignores a vast majority of publications in the social sciences and humanities. – Anne-Wil Harzing (2010)
We look first at the proprietary systems for tracking citations, also known as bibliometric
systems. Compiled by hand and run on mainframe computers, they started
as far back as the 1970s, and the best-known now is the ISI Web of
Knowledge (which has a Social Science Citation Index). Its main rival is
the less well known Scopus. Since these mainframe systems went online
they have become a lot more accessible and somewhat easier to use. Most
academics, post-docs and PhD students should now be able to access one
of them from their offices or home computers via their university
library. (Few libraries will pay for both of them, because their
subscriptions are expensive).
The companies behind them (Thomson for ISI and Elsevier for Scopus) rightly stress that they are
well-established and well-founded on decades of experience. The systems
give accurate citation counts (without duplications or phantom
citations) because they are human-edited systems – one reason why they
are also expensive to produce and hence are charged for. Above all they
emphasise that the carefully guarded portals of the ISI and Scopus only
include academically verified journals and exclude irrelevant or
non-standard sources. However, there are conflicts of interest in Scopus
being run by a company that is itself a major global journal publisher.
Both databases also have a strong vested interest in running their
operations in a restrictive way, to protect their costly proprietary
databases. Many university research boards love the solid, IBM-era technology of these systems, and
view their costliness as a sign of quality. In addition, there is a
whole sub-community of scholars and consultants who have grown up to
analyse scientific referencing, especially in the physical sciences.
Practitioners in this sub-field of library science have invested a lot
of intellectual capital in learning how to use these large systems.
Because it requires years of apprenticeship to extract meaningful data
from ISI and Scopus, most bibliometrics experts favour a strategy that
presents these databases as comprehensively covering the best journals. This has
hindered the development and recognition of newer internet-based systems.
However, both ISI and Scopus have some severe limitations that need to be kept in
mind – especially by social scientists and academics in the humanities –
because these systems cover only a limited number of journals, and no or
few books. In addition, the indexing criteria for journals are lengthy
and heavily weighted towards journals that have already accumulated a
critical mass of citations from journals that are already in the index.
Both systems also show a bias in coverage towards English-language and towards older established
journals. ISI especially is heavily American-dominated. Because the US
is a large and rich society, with many more academics in most social
science fields than in Europe or any other region of the world, the
conventional systems automatically tend to deliver rankings and
statistics that are weighted heavily towards success in the US ‘market’,
compared with the rest of the world. The ISI system does not cover
references in books (although it does cover some book reviews in
journals); the Scopus system does cover some book series. Excluding books is a
fairly small problem in the physical sciences, which explains why the
ISI systems are set up in this way. But it is a serious, insurmountable
limitation across the humanities where books are the main mode of
scholarly communication and a key vehicle of disciplinary development.
The lack of book coverage poses serious difficulties for accurately
measuring citations within ‘softer’ social science fields where books
remain very important.
Neither system captures references in working papers or conference
papers, and hence both have very long time lags. Publishing in a journal across the social sciences
generally takes a minimum of two years from submission to publication,
and often up to 3.5 years in the most competitive and technical fields
like economics. In the interim, conference papers and working papers
often provide many indications of how much work is being cited. But
neither type of output is included in the ISI or the Scopus index.
Rather than reflecting the latest advances in academic research, these
systems tend to reflect the output component of the discipline three or
four years in the past. As a result of all these factors, ISI and
Scopus cover only a small fraction of the academic journal papers in social
science published worldwide, and far less than the coverage in the
physical sciences, which can be regarded as near complete.
One way to gauge this problem is to assess the ‘internal coverage’ of the ISI databases in 2006 by showing the percentage of references
made in ISI articles that were made to journal articles already included
in the database.
How far the ISI Citation Indexes for 2006 include the references cited by
articles contained in the database, across groups of related disciplines.
Percentage of references cited in the ISI database that are to other
items included in the database:

| High (80-100%) | Medium (60-80%) | Low (40-60%) | Very low (less than 40%) |
|---|---|---|---|
| Molecular biology and biochemistry (90%) | Applied physics and chemistry | Mathematics (64%) | Languages and communication (32 to 40%) |
| Biological sciences – humans (82 to 99%) | Biological sciences – animals and plants (c. 75%) | Engineering (45 to 69%) | All other social sciences (24 to 36%) |
| Chemistry (88%) | Psychology and psychiatry (c. 72%) | Computer sciences (43%) | Humanities and arts (11 to 27%) |
| Clinical medicine (85%) | Geosciences (62 to 74%) | Economics (43%) | |
| Physics and astronomy (84 to 86%) | Social sciences in medicine (62%) | | |
If the ISI databases included all the important work in a field, then
most of these references should be to articles elsewhere in the ISI
database. Figure 2.1 shows that ISI’s internal coverage was indeed high
in the medical and physical sciences – for instance, 90 per cent in
molecular biology and biochemistry, and 84 to 86 per cent in physics and
astronomy. Across other STEM disciplines from four fifths to nearly all
of the references are included. In more applied physical science fields
this proportion falls to two thirds or three fifths, and in maths and
engineering to between two and three fifths. Social sciences, however, are
strongly affected by ISI’s coverage bias. With the exception of social
sciences related to medicine, coverage for the rest of the social
sciences falls below 50%: for example, 43% for economics, and between
24% and 36% for all other social sciences. The humanities are the most
affected, with only 11-27% internal coverage. Most bibliometric experts acknowledge that the usefulness of
these systems declines sharply if they include fewer than two thirds to
three quarters of all journal articles world-wide.
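The ‘internal coverage’ measure used throughout this discussion is simple arithmetic: the share of all references made by a database’s own articles that point to items the database also indexes. The sketch below illustrates the computation with an invented index set and an invented reference list (all the journal names here are hypothetical):

```python
# Hypothetical set of journals indexed by a citation database.
indexed = {"J-AmEconRev", "J-Econometrica", "J-PolStud"}

# Invented reference list from indexed articles: the journal each cited
# item appeared in (None marks a book, working paper or other
# non-journal item, which the database does not hold).
references = ["J-AmEconRev", None, "J-PolStud", "J-SmallEuroJournal",
              None, "J-Econometrica", "J-AmEconRev", None]

internal = sum(1 for r in references if r in indexed)
coverage = 100 * internal / len(references)
print(f"internal coverage: {coverage:.0f}%")  # internal coverage: 50%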
The ISI’s restricted journal set and orientation towards American journals also affect coverage when we come to
look at research undertaken in other countries, like the UK. A detailed
analysis was undertaken of the research submitted to the UK’s Research
Assessment Exercise for 2001 (covering publications in 1996-2000),
providing a useful external measure of coverage. It found that the ISI
database included five out of every six RAE items submitted in the
physical sciences (the STEM disciplines), but only one in four items
for the social sciences, as Figure 2.2 demonstrates. These numbers are
very similar to the ISI internal coverage numbers above, even though
they relate to different dates. So the internal coverage estimates for
the database as a whole and the UK-specific external estimates of
coverage offer a similar picture.
A further question for the social sciences concerns the trends over time – has the ISI got better
at including social science materials? Do its continuing problems
perhaps reflect chiefly its origins in the physical sciences and
initially rather restrictive approach to including journals? As the
database has expanded along with the growth of social sciences journals
and publishing, has it become any more inclusive? Figure 2.3 shows how
the detailed ISI internal coverage of the social science disciplines
changed over a decade and a half. There has indeed been a general
substantial improvement in coverage of these disciplines, but one
starting from a pretty low base. By contrast, in humanities subjects the
ISI’s inclusiveness has generally either declined or increased only
slightly. Subjects bridging from the social sciences into STEM
disciplines also show increases in internal coverage, but with smaller
percentage changes because they start from a higher initial base.
How far the ISI Citation Indexes have improved over time in including
the references cited by articles contained in the database, across
social science and neighbouring disciplines, from 1991 to 2006:

| Discipline group | Internal coverage (%) in 2006 | Internal coverage (%) in 1991 | Percentage change 1991 to 2006 |
|---|---|---|---|
| For comparison: Life sciences | 93 | 87 | 7 |
| Inter-disciplinary social sciences | 40 | 33 | 21 |
| Languages and linguistics | 40 | 26 | 54 |
| Information science, Communication science | 32 | 32 | 0 |
| History, Philosophy, Religion | 27 | 24 | 13 |
| Political science, Public administration | 24 | 17 | 41 |
| Creative arts, Culture, Music | 14 | 17 | -18 |
Notes: ‘Internal coverage’ means the percentage of references
cited in articles in the ISI database that are to other items included
in the database. The rows shown cover the social sciences and
humanities, together with the life sciences as a comparator from the
primarily physical science or STEM subjects.
In the past, these limitations of the ISI databases in the social
sciences were routinely acknowledged, but none the less were put
somewhat to one side because the data represented one of the only
sources of insight. However, in the modern
era where there are viable alternatives (indeed superior options for
most social scientists) this stance is no longer appropriate.
Bibliometricians commissioned by the UK’s Higher Education Funding
Council to help them consider the use of citations data recommended that
it was not appropriate to rely on conventional citations systems like
ISI unless the internal coverage of items approached four fifths (the
‘high’ level in Figure 2.1) (Centre for Science and Technology, 2007,
pp.xx-xx). The lower that coverage gets in a field, the less useful ISI
ratings could be for assessing scholarly performance. They recommended
that in disciplines where less than 50 per cent of references are being
included in ISI, citations analysis could not contribute reliable
information to a research assessment process.
Given this limited coverage and geographical bias, academics should interpret ISI
citation data with some degree of caution. In the social sciences ISI
does not in any sense provide a more accurate insight into the overall
and global impacts of academic work than newer internet-based systems.
It can offer, however, a somewhat better picture of academic impact for
those disciplines which tend to focus on high-prestige American-based
journal articles. As the US is still normally rated as the first or
second most influential country in the world across all social science
disciplines, this is an important consideration.
There are somewhat complicated processes that are normally necessary to extract a
record from ISI of how your work has been cited.
ISI can be used for normal literature review searches. However, it
doesn’t provide the ‘snippet-view’ materials that Google Scholar does,
which can be very helpful in ascertaining what a paper is about if it
has an obscure title, and which are also useful for checking through the
backlist works of particular authors. But ISI does provide a relatively useful
means of checking for key terms in article titles. It has a good date
record and hence is an effective way of surfacing some of the main
journal articles with keywords in their titles in, say, the last 5 or 10
years, often the most relevant search periods.
Google has led the development of article-finding, book-finding and citations-tracking
systems free over the Internet, having ambitiously declared its mission
‘to organise the world’s information’. Less than a decade after its
founding, the company’s twin academic research engines Google Scholar (for journal articles and other academic papers) and Google Books now dominate the university sector.
Scirus is a newer system from Elsevier, a free-to-use counterpart to their Scopus system
which draws more widely on current working papers and conference papers.
It operates similarly to Scholar and is worth checking as an additional
source. In the US there are some other Scholar competitor sites, but
they all rely on academics registering and voluntarily uploading
materials. As many academics are unlikely to do this, the coverage of
these sites (like CiteSeerX and getCITED) is now far too restricted and
non-comprehensive to be very useful.
The inclusiveness of automated systems like Google Scholar (an approach also used by Scirus) derives
from the fact that they voraciously and automatically record all citations. In particular they include:
- all ‘black’ literature in journal articles or books that has been definitively and formally published, plus
- less conventional ‘grey’ literature, such as working papers,
conference papers, seminar discussions or teaching materials that has
been issued in a less formal or definitive form – often, of course,
including versions of material that is later formally published.
As a result, Scholar is far more up-to-date in its picture of academic debates and controversies
in each discipline, especially so in fields like computer science and
IT studies where the pace of change in technologies and social uses of
IT is very rapid. Scholar also gives users much more immediate
information about the work being found, and it often gives full-text
access to it if the material is not in a published book or placed behind
a journal pay wall.
These advantages have been strengthened (and the obsolescing of American voluntary
article-aggregator sites has been speeded up) by the growth of online
research depositories in most serious universities in the advanced
industrial countries. These university archives now host copies of their
professors’ and lecturers’ works that previously were accessible only
with great difficulty (by going to each individual author’s personal
website) or behind journal pay walls. University online depositories
also often contain conference and working papers that have not yet been
formally published in journals, which Scholar and Scirus can both access
and provide immediate full text access to.
A further boost for Scholar and Scirus has been the development of some important multi-institutional
sources hosting key research in pre-journal forms for free download. In
the physical sciences newsletters and research feeds now often sustain a
vigorous window into professional culture and current developments. In
the social sciences these networks are somewhat less developed, but
research paper depositories are big news. Two of the most important are
the multi-field Social Science Research Network (SSRN) and in American
economics the National Bureau of Economic Research (NBER). But there
are many others.
Given this wide coverage of articles, papers and related materials, at first sight it seems clear
that Scholar and Scirus should be the most useful search tools. However,
there are also four significant problems.
First, Scholar and Scirus clearly access a range of mainly academic sources, but unlike ISI and
Scopus neither company provides any full specification of exactly which
sources they use. Scholar clearly searches many conventional academic
index systems, as well as journals’ and publishers’ websites, conference
proceedings, university sites and depositories and other web-accessible
materials in academic contexts. But Google provides almost no
information on exactly how this is done. This non-disclosure creates a
big problem for government or professional bodies, and feeds their
resolution not to take what Google says on trust.
Second, Scholar and Scirus are both equally secretive about the algorithms that they use to
sort and search, in particular to discount duplicate entries for the
same material, and how they count the remaining citations (after
duplicates are removed). This is a highly sensitive subject and adds
another barrier. However, the companies also argue that only by keeping
their algorithms secret can they effectively counter spam, which is a
growing and huge problem. Clearly if the ranking of sites could be
distorted by spammers, the usefulness of Scholar or alternatives would
be completely devalued.
Third, because Scholar and Scirus are automated systems they sweep up together lots of different
academic sources, some major journal articles, key professional
conferences or major university e-depositories, but others quite likely
of questionable academic status and provenance. So citations become
blurred and over-inclusive, with far more marked variations in the
‘academic value’ or ‘research’ status of different citations than occur
within the walled gardens of the ISI database.
Fourth, the systems cannot recognise duplicated outputs – for example, a paper that
is available both on a standard journal website and on the author’s
personal website. This has implications for accurately counting the
number of outputs and citations.
These problems would matter most if the purpose of accessing Google Scholar (or Scirus) were to rank scholars’
standing or citations to their research comparatively, or if its
rankings were used to allocate rewards like research support funding
between departments or universities. However, we have chosen to focus on
two distinct features of these systems:
- allowing individual academics and researchers, or teams and departments to track their own citations; and
- expanding literature searches of other authors’ or researchers’ main works.
For these purposes the problems above are still worth bearing in mind, but they are only limitations that
emphasise the need for individual judgement by the person consulting
them. Authors and research teams know their own work better than anyone
else in the world, and are therefore better able to analyse the data.
There are several tweaked forms of accessing Google Scholar, of which the most important
is the ‘Publish or Perish’ software available for free download from www.harzing.com/pop.htm.
This is a valuable programme that combats many of the problems of
interpreting Google Scholar outputs and allows academics to easily check
their own or others’ performance. It presents academic outputs quickly
and computes excellent citation statistics about each author’s work,
including an overall ‘times cited’ score and times cited per year since
publication. We will continue the discussion of the more complex
versions of HPoP citation statistics in section 2.4. Box 2b explains
how to download and use the programme.
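The headline statistics that HPoP reports can be illustrated with a toy calculation. This sketch is not HPoP’s actual code, and the publication records are invented; it simply shows how a total ‘times cited’ score, times cited per year since publication, and the widely used h-index are derived from a raw citation list:

```python
# Invented publication records: (year published, times cited).
pubs = [(2001, 120), (2004, 45), (2006, 45), (2007, 12), (2009, 3)]

current_year = 2010  # fixed so the example is reproducible

# Overall 'times cited' score: sum across all publications.
total_cites = sum(cites for _, cites in pubs)  # 225

# Times cited per year since publication, item by item.
cites_per_year = [cites / max(current_year - year, 1)
                  for year, cites in pubs]

# h-index: the largest h such that h items each have at least h cites.
ranked = sorted((cites for _, cites in pubs), reverse=True)
h_index = sum(1 for rank, cites in enumerate(ranked, start=1)
              if cites >= rank)  # 4
```

HPoP also reports more complex variants of these statistics, discussed in section 2.4.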
Google Books is designed to make available a range of different online views of a
book’s contents to potential readers. Essentially Google has now run
around 10 million books through optical character readers so as to
create online images of each page. For books that are out of copyright,
Google makes available the full text for reading online, but the
material cannot be downloaded in the free use version of the programme.
The text of most out of copyright books is also fully searchable, so you
can easily find specific sentences, quotations, or words of interest
anywhere in the book. This software is so powerful and so good that many
scholars now use Google Books as an online index to find material
within books that they already have on their shelves, but which either
have no index or only the normally very inadequate academic book index.
There are also links through from Google Books to the
publisher’s website, to booksellers offering the book, or to libraries
nearby to the searcher’s location that stock it.
How much information is viewable on Google Books depends on what agreement the
book’s publisher has reached with them. The most restrictive ‘no
preview’ entry just replicates the publishers’ blurb and perhaps gives
the contents pages. The next most restrictive approach is a ‘snippet
view’ that offers only a few short glimpses of the book’s content, but
still allows readers to search the full text and to find relevant
material. If you want to find out if a book covers the kind of topic you
are interested in, even in snippet view you can very quickly check far
more material in a fraction of the time that would be needed for
previous literature searches. The most expansive Google Books preview
allows you to read many full pages of the text, but normally will leave
out some key chapters or sections. However, you can usually search
across the omitted sections as well as the full text pages (helpful for
knowing how much coverage a book gives to your topic of interest). But
again you cannot download a copy of the book in the free version.
Google also plans to launch worldwide a commercial version that will make all copyrighted books
in its database available for download, of course in return for a fee
that will be agreed between Google, the publishers and universities.
Google will potentially have an enormous monopoly position here, in a
market that is bound to grow very strongly in size and value over the
next decade, as e-books take off. How governments in the US, Europe and
other regions of the world decide to regulate Google’s operations of
this key intermediary role will have very substantial consequences for
how academic research develops, especially in the most book-based
disciplines, such as the humanities and ‘softer’ social sciences.
However, what concerns us here is the citations-counting capacity of
Google Books, and Box 2c explains how to use it.
We can now consider how the two categories of citation systems perform against each other. In general
the HPoP/Google Scholar database is much more inclusive than the ISI
one, especially in disciplines where books and book chapters are an
important means of professional communication. Figure 2.4 shows how the
ISI and HPoP indices compare. The top two parts show only the items
included in the ISI, first on a linear scale (which shows a strong
bunching of low-scoring items) and secondly on a logarithmic scale
(which helps to spread out the lower scores and shows the patterns of
data better). In every case the ISI cites score for a publication is
less than the HPoP/Google Scholar score (the point where they would be
equal being shown by the parity line).
Figures 2.4a and 2.4b include only items in the ISI database for this academic; Figure
2.4c includes all items in the person’s HPoP listing with at least three
cites. The HPoP scores have been manually cleaned to eliminate
duplicate Google Scholar entries.
Figure 2.4c gives a fuller picture of the person’s HPoP scores. The items scoring high on HPoP but zero on the ISI
are in all cases books, book chapters, or journal articles in journals
that are not indexed by ISI. Five of this author’s top 6 cited items
fall into this category, as do 12 of the person’s top 20 cited items.
This single-author comparison is only indicative, and so to get a broader picture we turn next to data
from the Impacts Project Database (IPD) described in Annex 1: Impacts Project Database.
Essentially we collated ISI and HPoP scores for all the traceable
publications of a sample of 120 academics spread across five social
science disciplines. We also carefully checked by hand all the
publications listed in HPoP/Scholar and looked at all the sources citing
them. We removed all duplicate entries, unacknowledged citations,
publishers’ publicity materials etc. to produce a completely ‘cleaned’
IPD score, one that also incorporated citations in books. We aggregated
the ISI, HPoP and IPD scores for each academic concerned, and compared
them. The correlations across the sample were as follows (figures in
brackets are significance levels):

|                | ISI scores  | Harzing scores |
|----------------|-------------|----------------|
| Harzing scores | 0.22 (0.24) |                |
| IPD scores     | 0.14 (0.46) | 0.95** (0.0)   |
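The pairwise figures in such a table are standard Pearson correlation coefficients. The sketch below shows the computation on invented per-author totals (the numbers are made up for illustration, not drawn from the project data); squaring a coefficient gives the R-squared ‘proportion of variance explained’ used later on:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Invented total-cites scores for five hypothetical academics.
isi = [3, 20, 5, 0, 12]
hpop = [120, 40, 300, 15, 90]
ipd = [130, 38, 310, 20, 95]

r_hpop_ipd = pearson(hpop, ipd)  # close to 1: the two move in step
r_isi_hpop = pearson(isi, hpop)  # weak: ISI tracks HPoP poorly here
```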
These results confirm the picture drawn above. Most ISI cites scores for authors are much
lower than their HPoP scores, although it is noticeable that one in 10
of the sample showed ISI scores that are higher than their HPoP score.
One in twelve of the sample were rated by ISI as having a minimal score
of 1, whereas their HPoP scores ranged from 0 to 2089 cites. (On a per
author basis there are obviously fewer instances of ISI registering zero
cites than was the case for the per-item basis in Figures 2.4a or 2.4b.)
The pattern of HPoP scores exceeding ISI scores is also strongly borne out at the author level in the manually checked IPD
scores. The key reason for this is shown in Figure 2.5c, where the
HPoP/Scholar and IPD scores are shown to correlate almost perfectly (and
of course significantly). By contrast, the ISI scores correlated weakly
with the HPoP/Scholar scores for our sample, and even less well with
the carefully checked IPD scores.
These are aggregate analyses, at the level of an author’s whole profile of work. By pooling
data across multiple authors, and looking instead at the level of
individual items we can examine how the relationships between the ISI,
HPoP and IPD scores operate at the level of individual publications.
Figure 2.6 shows the results for all the publications of a small
sub-sample of 15 academics taken from our 120 in the pilot survey. [We
are in the process of extending the item level analysis to cover all
authors in the pilot survey, so these early findings are indicative only
at this stage – although the patterns here are so strongly marked that
we do not expect them to change much on further analysis]. We
essentially repeat here the previous analysis, but at the level of
individual publications, charting the relationships between ISI and HPoP
scores, and between ISI and IPD scores, for all the publications of this
subset of 15 academics drawn from the IPD. At the item level the earlier
findings are strengthened. Figures 2.6a and 2.6b show that only around a quarter of
the score that individual social science publications get in the ISI
database can be explained in terms of HPoP scores, and much the same
applies for the manually cleaned and checked IPD scores (also including manually
checked Google Books scores). By contrast, Figure 2.6c shows that the
HPoP/Google Scholar scores for all publications included in the analysis
are very similar indeed to the checked IPD scores. Indeed the R squared
proportion of variance explained is 97 per cent, meaning that the two
indicators are clearly tapping the same phenomena. Interestingly,
although our analysis eliminated a good deal of double counting in the
HPoP/Google Scholar listings, none the less the checked IPD scores are
somewhat above the parity line here – reflecting the role of
Google Books in boosting item scores. The two indicators move closely in
step, but are not exactly the same. By contrast, the ISI citations
count for most social science publications is far less than the
HPoP/Google Scholar or IPD counts.
The implications are clear-cut for academics. The quickest, most reliable and most
comprehensive way of understanding how their research is being cited is
to run a HPoP/Scholar analysis of their outputs and to manually clean
the results so as to correct for problems, as discussed above. The ISI
cites scores perhaps add insight into which journal articles
are being cited in other US-orientated research articles. But in most
social science fields, and especially more book-orientated disciplines,
the ISI simply does not include enough materials to be a useful or
reliable guide to what is being found useful and cited by other members
of the profession.
Impact of Social Sciences – 2: Using citation tracking systems