Presented at the ICCC/IFIP Electronic Publishing Conference
- ElPub03 - From Information to Knowledge - held at Universidade do
Minho, Guimarães, Portugal, 25-28 June 2003
(A PDF
of the published version is now available from the Scix.Net archive)
John W T Smith,
The Templeman Library, University of Kent, Canterbury, Kent
CT2 7NU, UK
J.W.T.Smith@kent.ac.uk
Developments in net based academic publishing since 1999 are reviewed in light of the requirements of the Deconstructed Journal (DJ) academic publishing model. This model proposes that all the activities of traditional journal publishing could be carried out by a group of co-operating independent agents without the necessary requirement of a publisher to co-ordinate them. It is shown that most of the elements required for the DJ publishing model to operate are now available. The one major element missing is the independent Certification Agent (CA). The role of the CA in the DJ model is to provide quality control (refereeing, etc.) and to confirm this it should affix its ‘seal of approval’ to the document. Document integrity and authentication are described and the operation of Digital Signatures (DSs) reviewed including current usage problems. It is concluded that easy to use DSs or similar tools are needed to allow the emergence of truly independent CAs but the reward will be documents that are completely free.
Keywords: academic publishing, scholarly publishing, new publishing models, deconstructed journal model, document integrity, document authentication, digital signatures
In earlier papers (Smith, 1997; Smith,
1999a) a new academic publishing model was described entitled the Deconstructed
Journal (DJ model)[1]. This model proposed that all the roles
of the academic journal could be fulfilled by a group of quasi-independent[2]
co-operating agents each playing a part in the activity of academic publishing.
It also proposed that these agencies did not need to be organised or co-ordinated
by a central publisher. Between them these independent agents could fulfil all
the needs presently fulfilled by the current academic publishing industry.
It was shown in those earlier papers that the academic publishing industry (which
gives rise to the current form of the academic journal) plays certain clearly
defined roles vis-à-vis the activity of academic research. The following
main roles were identified along with the agencies that fulfilled these roles
in the current and proposed (DJ) model.
| Role | Agency in current model | Agency in DJ model |
| Quality control (content) | Referees, organised by publisher | Independent ‘certification agents’ (called ‘evaluator organisations’ in Smith, 1999) |
| Conferring recognition of work done | Referees and journal editorial board | Independent ‘certification agents’ or (less directly) editorial boards of overlay journals[3] (called ‘subject focal points’ in Smith, 1999) |
| Making available | Publisher – printing the article in an issue | Placing of material in local or centralised freely accessible electronic archives[4] |
| Making aware or marketing | Publisher – marketing of the journal to libraries and other customers | Overlay journals, general or specialised search engines, Web directories, subject portals, Weblogs. |
For the DJ model to operate fully, examples of all the agencies that form it must exist.
Although the model would operate at its most flexible if all the agents were
independent some agencies could be combined and the basic model would still
operate. For example overlay journals could also be certification agents but
they must not claim ‘ownership’ of the item certified or pointed
to.
When the model was originally proposed some of these agencies existed in a proto-form,
i.e., they had not been created to play a role within the DJ model but they
already had the basic necessary functionality. For example there were freely
accessible centralised archives like the Physics E-print archive. Also many
search engines already existed to allow users to find items of interest on the
Web thus performing the ‘Making aware’ role. However only a very
few subjects had substantial archives and the ‘finding tools’ were
rudimentary.
This section considers what elements of the DJ model already exist in proto or fully realised form. We will look at each role and its required agency (within the DJ model) in turn and see if there are existing examples.
As yet there do not appear to be fully independent certification agents. By ‘independent’ I mean providing certification separately from publication or ‘making available’. There are journals that make articles they have published in e-form freely available. For example the journal Learned Publishing makes its papers freely available, as does the New Journal of Physics. BioMed Central (BMC) makes the research articles in all of its e-journals freely available (BMC is discussed in more detail in the section Making available below). Many journals allow authors to place copies of final papers (the version as published in the journal) in e-print archives and some authors do this even if the journal does not explicitly allow it. Finally there are many more journals that make a subset of their articles freely available (usually limited to issues over a certain age). This pool of already certified articles allows the possibility of ‘proto overlay journals’ that point to selections of these articles. This is discussed further below under Making aware or marketing.
As noted above there are no examples of truly independent certification agents yet.
Obviously, by choosing to link to an article the producers of an overlay journal
are indicating they think the work ‘cited’ is of some value. However,
using the current normal form of linking they can only point to an address (URL),
they cannot guarantee that the item pointed to is the same one they originally
chose. So the link is really saying, “This is a good article (assuming
it is the one we read when we made this decision)”. This limits the extent
to which they can confer recognition of work done. The problem of document integrity
and authenticity in an electronic environment is discussed below in the section
The next steps.
See Making aware or marketing below for a detailed discussion of the
current status of overlay journals.
The growth in electronic archives for scientific and social science subjects
has been enormous over the past few years. Some have followed the Physics ArXiv[5]
model and other subjects have invented their own, for example, CogPrints[6]
(for papers related to the study of cognition, e.g. psychology, neuroscience,
linguistics, computer science, philosophy and biology), NCSTRL[7]
(computer science), and RePEc[8] (economics).
Production of new e-print archives has been made easier by the provision of
free software to build them. Already three different packages are available,
CDSware[9] from CERN, DSpace[10] from MIT,
and EPrints[11] from the University of Southampton. It is
planned that a fully supported commercial version of the EPrints software will
be available from Ingenta[12]. All of these packages are
OAI compliant (OAI is discussed in Making aware or marketing below).
Further impetus to the provision of e-print archives has been given by the Budapest
Open Access Initiative (BOAI)[13] from OSI[14].
This builds, in part, on the Open Archives Initiative and encourages the use
of open archives and open journals to make the results of research freely available.
BOAI have published two detailed guides to explain how to launch a new open
access journal (Crow and Goldstein 2003a) and how to
convert a subscription-based journal to open access (Crow
and Goldstein 2003b). The idea of open access repositories is also promoted
in a recent report from SPARC[15] (Crow,
2002), which argues strongly for institutional repositories. This
case is reiterated for a European context in Buckholtz
(2003).
In the area of commercial open publishing there is BioMed Central[16]
which publishes a range of e-journals in biomedicine. The papers are freely
accessible. BioMed Central has an ‘article processing charge’ (paid
by the author or author’s employer) which pays for the work in publishing
the article, including “obtaining peer reviews and in preparing the article
for publication”[17], the inclusion of a reference in
PubMed, and archiving the article in PubMed Central[18].
Thus there is a repository of freely available academic material, growing at
an accelerating rate, that is either already quality certified or needs to be
certified (or would benefit from certification). This also provides the
target material for overlay journals which are discussed next.
The name overlay journal comes (I believe) from a comment in Ginsparg
(1996) where he discusses the possibility of information services provided
as an ‘overlay’ on the Physics e-print archive. Such a service already
existed in 1996 according to Smith (2000). An excellent
example of a new operational overlay journal is Applications of Superconductivity[19].
This title happily describes itself as a ‘virtual journal’ and
it contains "a multijournal compilation of developments in superconducting
electronics, materials and largescale systems". This journal shows exactly
how an overlay journal can add value. In addition to links to relevant articles
it provides e-mail alerting of new items, the ability to search across the virtual
journal and links to article supply services if the text you want is not freely
available. Although it is currently free one can see how it could charge a small
fee and be worth the cost. Applications of Superconductivity is one
of a series of virtual journals (Virtual Journals in Science and Technology[20])
developed jointly by the American Physical Society and the American Institute
of Physics. There is also an experimental overlay journal concerned with electronic
publishing entitled Perspectives in Electronic Publishing (PeP)[21].
This is described as a “journal-centred portal” and is much more
than a simple set of links. In Krichel and Warner (2002)
the journal Geometry and Topology[22] is referred
to as an overlay journal but it does not fit my definition as it makes articles
available on its own server and only uses the ArXiv service as an archive.
A major step forward in the area of ‘making aware’ has been the
Open Archives Initiative (OAI)[23]. This promotes interoperability
between independent archives[24] by specifying a standardised
form of metadata presentation (the OAI metadata harvesting protocol) which allows
automated harvesting of metadata by external services to provide cross-archive
search services and current-awareness services. Currently there are 86 OAI compliant
archives[25]. These are contributing records to a small range
of service providers[26]. A particularly interesting service
is provided by DP9[27] from Old Dominion University. This
service forms a link between traditional search engines and the contents of
the OAI compliant archives allowing the search engines to index the contents.
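As a rough illustration of what harvesting involves, the sketch below builds the kind of request a service provider might send to an OAI compliant archive. The `ListRecords` verb and the `oai_dc` metadata prefix come from the OAI metadata harvesting protocol; the archive address and the helper name `list_records_url` are assumptions made purely for illustration.

```python
from urllib.parse import urlencode


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     from_date: str = "") -> str:
    """Build a ListRecords harvesting request for an OAI compliant archive.

    base_url is the archive's harvesting address (hypothetical here).
    from_date optionally limits the harvest to records changed since that date.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


# ASSUMPTION: archive.example.org is a placeholder, not a real archive.
print(list_records_url("http://archive.example.org/oai", from_date="2003-01-01"))
# http://archive.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2003-01-01
```

A harvester would fetch this address, read the metadata records returned as XML, and add them to its own cross-archive index.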
Although I see Weblogs as precursors to new overlay journals, or even new forms
of journal, I have given them a separate section in this article as I want to
describe them in detail.
At their most basic Weblogs are web pages containing lists of sites visited
(hence ‘web logs’) with comments by the producer or editor. Their
original form was like a diary recording interesting pages found. They are intended
to be constantly updated with the latest addition being at the top of the list.
There are variations on this format, for example, one might have a ‘thought
for today’ approach with links on a theme embedded in a few lines or paragraphs
of text then the next day another theme would be explored. Another Weblog might
be devoted to a single theme with more and more links being added over an extended
period. They have existed in their current form since 1997 (although some writers
on their development claim the earliest Web site listings produced by Tim Berners-Lee
and others in the early 1990s were proto Weblogs). More recent developments
have added a list of related Weblogs (or other Web sites) that the editor of
a Weblog regularly reads to the Weblog home page or a linked page. This helps
place the editor and the content of that Weblog in a context. This list is known
as a ‘blogrolling list’ (Paquet 2002).
It is possible to build and maintain a Weblog site with conventional Web authoring
tools (or just a plain text editor) but with the growth of interest in Weblogs
dedicated production and maintenance tools have become available. Some take
the form of a Web site where one can add a new item in a standard form and this
is added automatically to your existing Weblog. An example of this approach
is ‘blogger.com’[28]. Another approach is to install
a dedicated Weblog production package like Radio Userland[29]
on your own PC. This enables the user to build and maintain a Weblog on their
PC to be published on an external hosting service. The updating can be carried
out on or off-line. Such is the interest in producing Weblogs that there is now a
site that offers comparisons of a range of Weblog production tools[30].
Some writers have discussed the possibilities of Weblogs for researchers (Paquet
2002, Mortensen and Walker 2002). Jill Walker
also has a Web page entitled ‘Research Blogs’ which lists Weblogs
maintained by researchers[31].
There have been a few articles discussing Weblogs in the general library and
information literature over the past few years (2001-2002) but these have mainly
concentrated on their use in library and information work. No one appears to
have spotted that Weblogs have all the basic attributes of full scale overlay
journals. With very little (if any) modification one could take one of the Weblog
production packages (online or locally installed) and build a passable overlay
journal quite quickly. As was pointed out in Smith (1999b)
almost all the genuine innovation in e-publishing has come from net users not
from the commercial publishing world. Also, end users often use tools designed
for one thing for something the designers didn’t envisage. When Tim Berners-Lee
originally invented the Web he was thinking of hyperlinked technical documents
not the Web as we see it today.
Finally, it is interesting that Weblogs started as online diaries or journals
(in the original meaning of the word, ‘a record of the day’s activities’).
It would be somehow poetic if these new journals replaced those old journals.
Although we have all the new tools discussed above there is continuing need for the traditional search engines like Google[32] and AltaVista[33]. There are also more specialist services like Scirus[34] concentrating on specific areas of knowledge. In the future one can imagine specialist search engines so focussed that they border on being new overlay journals. There is also a continuing need for the general purpose directories like Yahoo[35] and the Open Directory Project[36] as starting points for less focussed searches. The specialist directories and subject portals like those that form the RDN (Resource Discovery Network)[37] will probably move towards becoming overlay journals over time.
As can be seen from the previous section the main elements of the DJ model are
beginning to form. This formation is not to satisfy the requirements of the
DJ model but is happening simply as outcomes of other activities. It is interesting
to note that there appears to be almost an inevitability about this process
– many of the precursors of a revolution are becoming available even though
the processes bringing about these changes do not have this as their specific
goal.
As stated in the Introduction above, for the DJ model to work all of the
main agencies (or elements) that form it
have to be extant. We already have the independent repositories in the form
of institutional and central open archives and the software packages to build
more. We have the beginnings of a mechanism to provide detailed search and retrieval
services with the OAI metadata harvesting protocol and the services being built
using this standard. Overlay journals already exist as such or in proto-form
as Web directories or subject portals. We may find ourselves with a surfeit
of overlay journals if Weblogs develop as I suspect they might. The only major
element that is missing is ‘independent certification agents’ (and
it is this critical element that most distinguishes the DJ model from other
similar proposed models).
Why is this element so critical to the DJ model? Because without the separation of quality control from making available (publishing) you still have remnants of the traditional journal model with articles only available from a specific source. It has to be admitted that the model adopted by BioMed Central almost escapes this criticism with the deposit of copies in PubMed Central but we still have a partly centralised model.
Any person or organisation that can claim expertise in a subject and is respected for that knowledge could set up as a CA. Learned or professional societies have a head start in this new world. They already have the necessary reputation and their members have the expertise. Commercial organisations could do it by ‘buying in’ or otherwise organising such expertise. This is what commercial publishers already do. They persuade recognised academics to sit on editorial boards of journals or act as referees for papers. So existing publishers could just move to become CAs. It is clear there is no reason why independent CAs as required by the DJ model should not exist. There is one technical requirement that is not yet fully available – this is described and discussed below.
The ideal envisaged in the DJ model is that a document can be anywhere (including the possibility of multiple copies in more than one place) and the CA can be anywhere. What is needed is a mechanism whereby the CA can attach a ‘seal of approval’ to the document that guarantees this is a true copy and it was certified by this CA. Once we have such a mechanism the document can be placed anywhere on the net with no continuing connection to the CA. This leads us to the problems of ‘document integrity’ and ‘document authentication’.
If you print out a page of an article and put it in a drawer for a year you
can be reasonably sure that it will still be there and readable when you look
again although the ink may fade and the paper become brittle. One thing you
can be absolutely certain about is that the words will not move around the page
or some of them disappear without trace or be replaced by others. This is not
true with electronic documents. They are just computer files and computer files
only exist physically as patterns of magnetism on a disc or even more ephemerally
as patterns of charges in memory. They can easily be altered intentionally or
unintentionally. Even when the file exists as a relatively stable magnetic pattern
on a disc, that pattern may not stay the same for long, even if the file is not
being deliberately changed, because utility programs move files around to de-fragment
them or make better use of the space on the disc. The integrity of computer
files (and hence electronic documents) has always been a problem. It is less
of a problem as long as the document stays on the same computer because it is
possible to track these moves and be sure that a file has remained unchanged
in terms of content even if its physical representation has changed. However
once the document is made available on the network and can be downloaded to
other computers this basic certainty is lost.
Fortunately there are ways to ensure integrity of the contents of a computer
file (and hence an electronic document). One of these is to use what is known
as a ‘one way hash function’ to compute information about the file
which can be used later to see if the file content has changed. Any hash function
takes an input string of a variable length (like a file containing an electronic
document) and returns a fixed length string usually much shorter. A one way
hash function takes this a step further such that it is very hard to reconstruct
the original string given the fixed length string and it is also very hard to
construct another input string that hashes to the same output string. The output
string is given a range of names, e.g., message digest, fingerprint, cryptographic
checksum, or message integrity check. The most commonly used name seems to be
‘message digest’. For a detailed description of this (and other
related techniques) see Schneier (1996). Hash functions
are not secret, so given a file, its message digest and the name of the hash
function used to produce that digest, it is possible to re-calculate
the message digest of the file you have and compare it with the one given with
the file. If they are the same you can be confident the file you have is identical
to the original.
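The integrity check just described can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: SHA-256 is assumed as the one way hash function (the text does not name one), and the helper names `message_digest` and `is_unchanged` are invented for the example.

```python
import hashlib


def message_digest(data: bytes, algorithm: str = "sha256") -> str:
    """Return the message digest of data as a fixed-length hex string."""
    return hashlib.new(algorithm, data).hexdigest()


def is_unchanged(data: bytes, published_digest: str,
                 algorithm: str = "sha256") -> bool:
    """Re-calculate the digest of the copy we hold and compare it with
    the digest supplied alongside the original file."""
    return message_digest(data, algorithm) == published_digest


# ASSUMPTION: short byte strings stand in for the contents of a document file.
original = b"The Deconstructed Journal revisited."
digest = message_digest(original)

tampered = b"The Deconstructed Journal revisited!"
print(is_unchanged(original, digest))   # True
print(is_unchanged(tampered, digest))   # False
print(len(digest))                      # 64
```

Changing a single character gives a different digest, while the digest itself stays 64 hexadecimal characters long however large the document is.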
So now we have a way of ensuring that the file is unchanged but how can we be
sure the sender is who they claim or in our case that this is the file certified
by the relevant CA? One way to do this is to use a Digital Signature (DS). The
use of a hash function as described above to check the integrity of a file is
the first half of a DS. A DS also uses public key cryptography (PKC) to ensure
the authenticity of a message by ensuring the sender (or the person or organisation
who ‘signs’ the message) is who they claim to be. PKC was originally
invented to get around the key exchange problem of basic symmetric cryptography
where the key that encrypts the message also decrypts it. With PKC there are
two keys, a private key and a public key; a message encrypted with one has
to be decrypted with the other. This has the added advantage that only the sender
knows the private key and the public key only decrypts messages encrypted with
the matching private key. So you can be sure that if someone’s public key decrypts
a message it must have been sent (or encrypted) by them. We could prove both
the integrity and authenticity of a message (or document) by encrypting the
whole thing but encryption and decryption are computationally expensive and
so a DS combines the use of a hash function with PKC to make it easier. The
procedure is as follows. A message digest is calculated for the document, this
message digest is encrypted using the sender’s private key, the two items
(the document and the encrypted message digest) are bundled together (e.g. in
an e-mail message). The recipient takes the document and calculates the message
digest, then finds the public key for the sender and decrypts the accompanying
message digest. If the two message digests are the same this is the document
sent (or certified) and the sender (or certification agent) is who they claim
to be.
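The procedure can be sketched with a toy RSA key pair and nothing beyond the Python standard library. Everything here is an assumption made for illustration: RSA is only one possible PKC scheme, the two primes are publicly known Mersenne primes (real keys use secret random primes), no padding is applied, and the names `sign` and `verify` are invented. It shows the shape of the process and is in no way secure.

```python
import hashlib

# ASSUMPTION: toy key pair for illustration only. These primes are well known,
# so this key offers no security whatsoever.
P = 2**89 - 1
Q = 2**127 - 1
N = P * Q                              # public modulus
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (needs Python 3.8+)


def digest_as_int(document: bytes) -> int:
    """Message digest of the document, reduced to fit the toy modulus."""
    return int.from_bytes(hashlib.sha256(document).digest(), "big") % N


def sign(document: bytes) -> int:
    """Signer's step: transform the message digest with the private key."""
    return pow(digest_as_int(document), D, N)


def verify(document: bytes, signature: int) -> bool:
    """Recipient's step: re-calculate the digest and compare it with the
    value recovered from the signature using the public key."""
    return pow(signature, E, N) == digest_as_int(document)


paper = b"Certified article text"
seal = sign(paper)                     # bundled with the document
print(verify(paper, seal))                    # True
print(verify(paper + b" (edited)", seal))     # False
```

Only the digest is transformed with the private key, which is why the scheme stays cheap even for long documents. The pair `(paper, seal)` can then be copied anywhere, and anyone holding the public values `N` and `E` can check it.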
Simple, isn’t it? Unfortunately, no it isn’t. Although the elements
that enable DSs to work are all known there appears to be no agreed standard
for how they are put together. It is possible to buy DS programs that run on
PCs which automatically do the calculations and encryption and package up the
file ready to send or to be downloaded but the recipient has to have the same
software for the unpacking and verification to be done automatically. It is
as if it was agreed that all cars have to have a steering wheel and brakes (and
also agreed how these things work) but there is no agreement on which side the
steering should be on or whether the brake is the middle or left pedal. Any
competent computer scientist could carry out the process described in the previous
paragraph, I am assured it is not particularly difficult, but we are not all
computer scientists. It is possible that in time commercial packages will converge
on a common standard, at least to the extent that someone using one DS program
will be able to accept and process a file processed and packaged by another.
However there is no guarantee of this. Word processing programs have been around
for many years but most still use their own proprietary file formats to store
documents even though there is a generally accepted interchange format (RTF[38]).
There is work being done at NIST[39] to enable DS services
to interwork[40]. However, since the basic FIPS[41]
Digital Signature Standard[42] was published in 1994 and we
still don’t have an agreed way for DS programs to interwork I am not expecting
a solution soon. Maybe what we need is an initiative similar to the OAI that
designs a simple standard sufficient for academic publishing needs.
There may also be simpler ways to achieve our goal. Since all we want is to
be sure the document we have is the one originally certified we could just calculate
the message digest, send this to the claimed certification agent (or a secure
site that maintains a list of certified documents) and ask it to return the
title (or bibliographic record) of the document. This may not be as secure as
using a DS but we are not dealing with national secrets or sensitive personal
information.
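A minimal sketch of this simpler lookup approach follows. The `CertificationRegistry` class is a hypothetical in-memory stand-in for the certification agent's list of certified documents; in practice the lookup would be a request to a remote site, and SHA-256 is again an assumed choice of hash function.

```python
import hashlib
from typing import Dict, Optional


def message_digest(document: bytes) -> str:
    """Message digest of a document as a hex string (SHA-256 assumed)."""
    return hashlib.sha256(document).hexdigest()


class CertificationRegistry:
    """Hypothetical list of certified documents kept by a certification agent."""

    def __init__(self) -> None:
        self._records: Dict[str, str] = {}

    def certify(self, document: bytes, citation: str) -> str:
        """Record the digest of a certified document with its bibliographic record."""
        digest = message_digest(document)
        self._records[digest] = citation
        return digest

    def lookup(self, digest: str) -> Optional[str]:
        """Return the bibliographic record for a digest, or None if unknown."""
        return self._records.get(digest)


# ASSUMPTION: the document text and the citation are placeholder values.
registry = CertificationRegistry()
article = b"Full text of the certified article"
registry.certify(article, "Example Author, 2003, An Example Certified Article")

# A reader holding a copy computes its digest and asks the registry about it.
print(registry.lookup(message_digest(article)))
print(registry.lookup(message_digest(article + b" with a change")))  # None
```

The reader never sends the document itself, only its digest, and an altered copy simply fails to match any record.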
If we want to achieve the full flexibility of the DJ model we need an easy to
use solution to the document integrity and authentication problem.
All the elements of the DJ model are available but one – the independent CA. In some ways they do exist – all publishers of reputable academic journals are CAs but in most cases there is still the ownership link between the CA and the item certified. Even if ownership is relinquished there is still a need for a guaranteed copy to be available on a server usually controlled by the CA. This limits the development of the distributed open archive or open repository model and prevents the emergence of truly independent CAs. As Krichel and Warner (2002) point out:
“since these papers are in places where they can be modified by authors, it does not appear to be possible to base a certification system on these papers.”
With digital signatures or a similar mechanism to allow the paper to carry its certification with it we can escape this limitation. The phrase ‘freeing the journal literature’ is used almost like a battle-cry by those that advocate making all the journal literature freely available in open archives or repositories. This single step finally achieves that: the certified article can be anywhere on the net. It is not tied to any server or site; it is completely free.
Buckholtz, Alison, et al, 2003, Open Access: Restoring scientific communication to its rightful owners, European Science Foundation Policy Briefing 21, April 2003. <http://www.arl.org/sparc/SPB21_OAI.pdf> (last accessed 14/5/03)
Crow, R, 2002, The Case for Institutional Repositories: A SPARC Position Paper <http://www.arl.org/sparc/IR/IR_Final_Release_102.pdf> (last accessed 14/5/03)
Crow, R, Goldstein, H, 2003a, Guide to Business Planning for Launching a New Open Access Journal, Open Society Institute, Edition 1.0, January 2003. <http://www.soros.org/openaccess/pdf/business_planning.pdf> (last accessed 15/5/03)
Crow, R, Goldstein, H, 2003b, Guide to Business Planning for Converting a Subscription-based Journal to Open Access, Open Society Institute, Edition 1.0, January 2003. <http://www.soros.org/openaccess/pdf/business_converting.pdf> (last accessed 15/5/03)
Ginsparg, P, 1996, Winners and Losers in the Global Research Village. Paper presented at the conference Electronic Publishing in Science, UNESCO HQ, Paris 19-23 February 1996. <http://arxiv.org/blurb/pg96unesco.html> (last accessed 14/5/03)
Krichel, T and Warner, S, 2002, Open Archives and Free Online Scholarship <http://openlib.org/home/krichel/papers/koganei.a4.pdf> (last accessed 14/5/03)
Mortensen, Torill & Walker, Jill, 2002, Blogging thoughts: personal publication as an online research tool (Chapter 11 of the proceedings of SKIKT-RESEARCHERS' CONFERENCE 2002 - Researching ICTs in Context, InterMedia, University of Oslo, 8 April 2002) <http://www.intermedia.uio.no/konferanser/skikt-02/docs/Researching_ICTs_in_context-Ch11-Mortensen-Walker.pdf> (last accessed 14/5/03)
Paquet, Sébastien, 2002, Personal knowledge publishing and its uses in research <http://radio.weblogs.com/0110772/stories/2002/10/03/personalKnowledgePublishingAndItsUsesInResearch.html> (last accessed 14/5/03)
Phelps, Charles, 1998, Achieving Maximal Value from Digital Technologies in Scholarly Communication, in the Proceedings of the 133rd Annual Meeting of the Association of Research Libraries – Confronting the Challenges of the Digital Era, Washington, D.C., October 14-16, 1998 <http://www.arl.org/arl/proceedings/133/phelps.html> (last accessed 14/5/03)
Schneier, Bruce, 1996, Applied Cryptography: protocols, algorithms, and source code in C, 2nd Ed., New York, Wiley, ISBN 0-471-12845-7
Smith, Arthur, 2000, The journal as an overlay on preprint databases, Learned Publishing 13 (1), pp 43-48 <http://www.ingentaselect.com/vl=747003/cl=42/nw=1/fm=docpdf/rpsv/catchword/alpsp/09531513/v13n1/s6/p43> (last accessed 15/5/03)
Smith, J W T, 1997, The Deconstructed Journal, in the Proceedings of ICCC/IFIP Conference on Electronic Publishing '97 - New Models and Opportunities, University of Kent at Canterbury, UK - 14-16 April 1997, ISBN 1-891365-00-2, pp 73-84 <http://library.kent.ac.uk/iccc/1997/papers/deconjnl.htm> (last accessed 14/5/03)
Smith, J W T, 1999a, The Deconstructed Journal, a new model for Academic Publishing, Learned Publishing 12 (2), pp 79–91 <http://www.ingentaselect.com/vl=11603875/cl=25/nw=1/fm=docpdf/rpsv/catchword/alpsp/09531513/v12n2/s3/p79> (last accessed 15/5/03)
Smith, J W T, 1999b, Prolegomena to any future e-publishing model, in the Proceedings of ICCC/IFIP Conference on Electronic Publishing '99 - Redefining the Information Chain, New Ways and Voices, Ronneby, Sweden, 10-12 May 1999, ISBN 1-891365-04-5, pp 293-298 <http://library.kent.ac.uk/library/papers/jwts/Prolegomena.htm> (last accessed 14/5/03)