08.04.2016

Reinhard F. Werner


Following our last blog post by Nicolas Gisin on the dangers of arXiv publishing procedures, Reinhard Werner takes the discussion on science publishing practice several steps further.

In a witty and insightful reflection on the machinery of current science journals, the forces that power them and the instruments that serve them, he throws the dangers of relying on bibliometrics as a decision-making tool into stark relief.

Is watering down one’s work to make it more “accessible”, or truncating a complex theoretical argument beyond recognition to meet PRL’s four-page limit (resulting in what Reinhard calls the four-page pest), truly what makes a good scientist? Posing a number of pertinent and provocative questions, he establishes the need for pest control measures.

Reinhard Werner, perhaps best known for his “Werner states”, is highly respected in both the mathematical physics and the quantum information communities. Having delivered groundbreaking results in the theory of entanglement and in quantum nonlocality, he is, as he himself puts it, interested in “anything in which the structure of quantum mechanics plays a non-trivial role”. He has published widely and broadly, in PRL and in more specialized journals whose role he strongly endorses.

Let’s continue the discussion.


Why we should not think of PRL and Nature as THE top journals in physics

By Reinhard F. Werner

Let me hasten to say that Physical Review Letters (PRL) and Nature are certainly fine journals, not least because many of us send our best work there. This self-amplifying process has always been characteristic of good journals. What I am talking about here, however, is a concentration process that goes far beyond this and is connected to the notion of a High Impact journal. This refers, of course, to the Journal Impact Factor (JIF), defined as the average number of citations per article in a two-year window. It is also a product of Thomson Reuters, who sell what they claim is the “official” count as an annual list, with JIFs given to three-digit precision. This claim of spurious accuracy should strike every scientist as ridiculous, yet practically all journals cite their JIFs (with three decimal places) on their websites, and thus make them part of their advertising.
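
To see how little those three decimal places mean, here is a minimal sketch (in Python, with invented numbers) of the standard two-year arithmetic; what exactly counts as a “citable item” is Thomson Reuters’ own definition and is glossed over here:

    # Sketch of the two-year JIF calculation; all counts below are invented.
    citations_in_2015 = {2013: 4200, 2014: 3800}  # citations received in 2015 by articles from 2013/14
    citable_items = {2013: 950, 2014: 1020}       # "citable items" published in 2013/14
    jif_2015 = sum(citations_in_2015.values()) / sum(citable_items.values())
    print(f"JIF 2015 = {jif_2015:.3f}")           # dutifully reported to three decimal places

A ratio of two counts of this size hardly deserves three significant decimals.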

PRL and Nature score highly on this scale, and I believe this is partly due to their basic journal models. For Nature this is the idea of an all-sciences journal based on original articles. While the interdisciplinary character will hardly make researchers read and appreciate papers from other disciplines, it helps the standing of Nature as a physics journal: Even if no physics paper in Nature were ever read or cited, the citation-happy community of life scientists alone would raise Nature’s JIF above that of all dedicated physics journals. Hence, following JIF logic, Nature should be considered the “best” physics journal. The defining feature of PRL is speed. It grew out of the “Letters to the Editor” section of the Physical Review in 1958, and Goudsmit’s opening editorial announced that, in the interest of speed, compromises on the quality of typesetting and the suspension of external refereeing were necessary. Since then refereeing has been introduced, but speed is still a main concern. This is seen, for example, in the adherence to the four-page length limit (references are not counted, which also helps the JIF). With the demise of hardcopy printing this limit is no longer necessary for distribution, and indeed electronic supplements of arbitrary length are allowed. But these are not supposed to carry any essential scientific substance, and referees do not necessarily look at them: that would get in the way of speedy review. How does speed help the citation count? Even long before the JIF craze, authors needed to show that they were familiar with the current literature, for which a recent PRL citation would always be handy. The self-amplifying process is also at work: some referees will assess the importance of a field by its coverage in High Impact journals, like the PRL referee who, rather hilariously, found the subject of one of my papers unimportant because there were not many PRLs about it.

Of course, nothing is wrong with the two journal models I have described, and I repeat that these are fine journals. The problem begins when the community treats them as THE top journals, so that good papers are adapted to these journal models just to gain recognition. This is bad, because the defining features of these journals are intrinsically anti-correlated with scientific quality. For Nature this feature is “broad interest”. This is sometimes nice to have, and indeed it is good practice to try to explain cutting-edge science in media such as Scientific American (like Nature, now majority-owned by the Holtzbrinck group) or the New York Times. But for an original article the demand can be damaging. The adaptation typically begins with the title, which must be free of specialized terminology and stated in simple terms. That is, you do not try to indicate in the title what kind of experiment you did, and explain in the abstract or outlook that this has potential relevance to some aspect of quantum computation, but instead go directly for a title like “Experimental Quantum Computation”. This simplified and therefore hyped-up style is also mandatory for the conclusions, and you would be well advised to stick to the mode of “happy science without ifs and buts” throughout the paper. For a paper in theoretical physics it is usually impossible to eliminate “technicalities” to the required degree, so this kind of “top quality” is basically denied to the whole subject.

In PRL the damaging effect is largely due to the page limit, leading to the so-called Four-Page Pest. This refers to papers whose content might have a natural length of six or ten pages, but which have been compressed to four pages, making them unintelligible. This would be no problem for the rare case of a little one-thought paper that naturally fits the format or, closer to the original plan of the journal, for breaking-news announcements which would be followed up by a full-length paper. But increasingly this no longer happens. Why bother with writing a long paper when you have already harvested the high prestige with a PRL? Why make your ideas and methods understandable to the competition? Better to move on to something else. Or, as James Thurber said: “Don’t get it right, just get it written.” Again the constraint is especially harsh on theoretical papers. Some people in that field have thought that one could use the electronic supplements to give the supporting arguments or proofs for claims made in the body of a letter. Indeed this would be a good paper format, and could make PRL attractive as a place to publish serious theory. But editorial policy seems to be against it, because letters are supposed to stand on their own, and referees typically don’t appreciate it[1]. This means that PRL-style “top quality” is largely denied to fields like argument-rich theoretical or mathematical physics, where the pudding is in the proof.

One effect of the high prestige of PRL is that they are flooded with papers. This makes it hard to find competent referees, so “broad interest” becomes an excuse for trying the paper on random referees, leading to the characterization of PRL as the Physical Review Lottery. Of course, some learn how to play this lottery, like the colleague whom I heard boasting at a conference that he could turn anything into a PRL. I checked some of his papers afterwards and found that he was right. I do understand that the editorial office is doing its best, but with thousands of submissions per month, what can they do? It certainly means that bad reviews, i.e., those which only give an opinion on importance without evidence that the paper was read, let alone understood, are not routinely discarded. This acts as a strong force of reversion to the mean. Or as Nicolas Gisin once put it: “When two random referees agree on a paper, can it be really new?” Consequently, the trash level in PRL is lower than on arXiv, but not by an order of magnitude. This may change a bit with the new acceptance criteria adopted last year, which require a paper to open a new field of physics, or present a method of pivotal future importance, or perhaps the signed commitment of three votes from a certain committee of the Swedish Academy. Even large teams of top experts find it hard to predict future scientific developments. So how good can referees possibly be at the crystal-ball reading of future importance that PRL asks them to perform? Their answers will mostly reaffirm current fashion, adding to the drive towards the mean. The new acceptance criteria will certainly increase the rejection rate, but in itself this does not mean better quality: it might just mean a lottery with worse odds. The new rules will probably reduce the trash level a bit, but mostly ensure that the remaining trash, and also the otherwise good papers, become more pretentious.

So where should we send our best results? Of course, if you have something that you can comfortably adapt to the journal profile, by all means send it to PRL or Nature. I will certainly continue to do this, if only for the benefit of my young co-authors. But don’t neglect writing full-length scientific papers[2], containing all the technical detail needed for other researchers to actually build on your results, and also to detect possible flaws. This is the kind of “peer review” that really matters to science, much more so than the idea that the quality control for journals should be done by unpaid volunteers. In the old days a good rule for the choice of journal was to select one with good circulation, in which the particular debate you are contributing to has largely been conducted. That is, you should maximize the scrutiny by an audience that knows and hopefully appreciates what you are talking about. That rule still holds, but it must now be augmented by the demand to post everything on arXiv as soon as it is ready. There is no better circulation, and if you select and cross-list the right subject classes you get a good critical audience, one which is much better at recognizing quality than the vague statistical promise of “impact” conveyed by the JIF.

Obviously, journals are no longer needed for making your results publicly available. Publishing in this literal sense is what the arXiv does: instantaneously, worldwide, for free, and with a useful search interface. So-called “publishers” actually do the opposite[3]: they grab your copyrights for the sole purpose of restricting access and erecting paywalls. The only remaining function of a journal these days is to issue seals of approval. A good journal therefore is one whose approval carries weight. This is entirely defined by its editorial board and practices, often by a long-term managing editor. When I see on a publication list a paper from, say, the Journal of Statistical Physics (JSP), which was managed and shaped for decades by Joel Lebowitz, I am confident that this is a paper of some quality. The JIF is entirely irrelevant. But the specialization of JSP helps to build a profile not just in subject matter, but also in quality. Even with the good work done at the editorial office of PRL, they have little chance of achieving this. Indeed, when I see a PRL, I only know that someone was successful at the lottery, possibly on the grounds of merit, but quite possibly not. Moreover, when I see a high ratio of PRLs to full papers, I know that this person may be spending more time playing the lottery and turning out short announcements than doing science, fulfilling the promises announced, and working out a larger picture. If we want to hire a new professor, it is important that this person be able to define his or her own research agenda in the future, and to speak to a variety of sub-communities and levels, from general audience to specialists. So variety in topics and also in journals is important. A single-topic applicant, even if currently highly cited and with exclusively high-JIF publications, is out. I recently saw an application in which the High Impact publications were listed separately from the rest. Whether the applicant was thereby showing what was important to him, or expressing his expectation that the committee would consist of JIF-counting robots, it did not help his case.

So who is actually promoting the silly identification of journal quality with the JIF? Apart from Thomson Reuters, the large journal companies (the Holtzbrinck Group, Elsevier, Wiley and a few others) are especially fond of it. They use it to justify huge differences in subscription prices, which incidentally show what their profit margins over production costs really are. High-JIF flagship journals also help to sell package deals, because supposedly you couldn’t possibly cancel one of those subscriptions. It is an extremely profitable business, in which those doing the essential and highly qualified work, i.e., authors, academic editors, and referees, are paid by the taxpayer anyway[4]. The taxpayer also has to foot the absurdly inflated subscription bills[5]. As if this were not bad enough, journals have found a new way to milk science budgets and authors, namely selling open access to individual articles in a subscription-financed journal, of course without lowering subscription prices. This is the rip-off model of open access. In any case, “publishers” do very little to recommend themselves as partners in the development of the new scientific information infrastructure utilizing the possibilities of the internet, which is bound to emerge eventually[6].

Strong support for JIF-based assessments also comes from administrators, who are naturally fond of a criterion for hiring and firing that they can apply without collecting any expert opinions, and without having to know anything about content. Politicians from countries like China and Poland in particular, wishing to show that they are playing in the premier science league, apparently put all the weight on this criterion. But it is too cheap to blame administrators here – they may well be acting in good faith. All too often they can see that the scientists themselves apply these criteria. After all, enough scientists seem to fancy the JIF as an “objective” criterion, as somehow more “scientific” than judgement by humans who understand content. I have mostly heard this from young and productive people working at a place where the big chiefs are some older guys who hardly publish internationally. But could one not make this valid complaint without resorting to dubious criteria?

The JIF is just the most idiotic tip of the iceberg called Bibliometry, the counting of citations for assessment purposes. Bibliometry is generally a bad idea, and fails miserably at identifying the best papers. It is a pseudoscience, even if Elsevier and Springer have each devoted a journal to it. The latest “improvement” from this quarter is altmetrics, which seeks to collect activity from social networks and news coverage. Of course, it may be nice to know how many people are tweeting about you, but the aim is, once again, assessment. The new judges of scientific quality are those who spend too much time clicking and tweeting, and the obvious way to come out on top is to make a silly claim that enrages as many people as possible.

If we do not put up some resistance, we will find ourselves in a kind of science in which the quest for knowledge is replaced by the quest for PRLs and Nature papers, for citations, and for Facebook likes. I think we should make a start by resisting JIF-based decisions wherever we can, and by looking beyond the high-impact section in every publication list.


[1] I have had different referee reactions to proofs in the supplement. Most will ignore them (as they would anyhow mostly ignore anything labelled “proof”). Some demand making a full-length paper by integrating letter and supplement (thus proposing a “downgrade” to PRA), and some demand that full proofs be given in the supplement (“since they apparently already exist”). I did not manage to get a clarification from the editors. So it seems editorial policy is also made by the referees at random, adding to the volatility of the process.
[2] To be fair, the PRL guidelines say under “Presentation” that “When appropriate, a Letter should be followed by a more extensive report in the Physical Review or elsewhere”. But the relative value of, say, PRL over PRA is made very clear to everyone by the downgrading process of referring good papers (which are maybe not of sufficiently “broad interest”) to PRA. I am waiting to see a standard button on the PRL website for each paper, pointing to the full version or else confirming that the authors had nothing more to say about the “new method of pivotal future importance” or the “new area of research”.
[3] In German, don’t call them “Herausgeber”, call them “Zurückhalter”.
[4] The two journals I have been talking about are actually unusual in that they employ professional in-house editors, so only 95% of the work is at the taxpayer’s expense.
[5] Many colleagues have made the point that this criticism of commercial publishers does not apply to the Physical Review, which is run by the physics community, more precisely by the American Physical Society. This is partly true. Their pricing is a bit more moderate. But taking copyright and erecting paywalls is part of their business model as well, and physicists from the rest of the world may not be entirely happy with supporting the APS through the profits from its journal operation.
[6] See, for example, this recently launched journal, which does not ship manuscripts around, but is an overlay to arXiv. Their selection process is otherwise pretty standard. If you want to think about new ways, a good starting point is Michael Nielsen’s blog or his book “Reinventing Discovery”.

Comments (7)

  1. Lídia del Rio, Christian Gogolin, Marcus Huber at 13.12.2016
    Dear Reinhard,

    Thank you for the spot-on article.

    As part of the effort to change the current publication model, we are preparing the launch of a free, open-access arXiv overlay journal for quant-ph.

    In particular, there will be no format restrictions, except that long papers should include a summary of results early on, for the benefit of the reader. There will be no need for cover letters as the paper should speak for itself.

    The journal will be free for authors and readers, and the small operating costs will be covered by external grants and donations. It will be legally set up as a non-profit.

    We are currently inviting researchers to join the editorial board, and we expect to launch towards the end of the year. We will publish a full version of the proposal soon, and we encourage feedback.
  2. Valerio Scarani at 13.12.2016
    This gives a great perspective! One small comment: I agree that some editors are overwhelmed and try to do their best — but I witnessed a few cases in which they did “their best to secure a nice press release”, rather than “their best to secure serious science”. Offline I can be more specific :)
  3. Anon at 13.12.2016
    As you highlight in your “three decimal places” line, ultimately the problem with metrics, or any decision process in general, is that the people behind them don’t accept that there are huge error bars. How can we be expected to judge work in even a slightly different area accurately? Even when reading work in my own research area I frequently fail to understand its significance or otherwise. I am sure that I am not alone. Yet referees and panel members rarely express doubt in their judgments. This culture really has to change.

    Ultimately the source of many of the problems of science is the ludicrous pursuit of “validation”: all scientists should be Einsteins, Einsteins are the only people that governments should fund or universities hire, every paper should be the next relativity, and all referees should be superhuman scientists who can offer well thought out opinions on everything that is sent their way. When those unrealistic expectations are not met, out comes the marketing phrase-book or the strongly worded opinion without any hint of hesitation. Reviewers and panel members get validation from their position, and seem to think that expressing doubt about their own ability to judge is a sign of weakness.

    When I started my university studies around 20 years ago I was full of curiosity but now I increasingly think about ways to leave academia. I’m not a “world leading” scientist (a phrase that is over used anyway) but I think that I have made some positive contributions and could continue to do so. The way that the journals operate, their influence on university administration, and similar problems in the way that we are expected to teach, have almost completely removed my passion for science. If I could go back in time I would probably have not chosen this career path.

    Thanks for writing publicly about these problems. I wish a few more highly respected figures would do the same.
  4. Marcelo França Santos at 13.12.2016
    Very good read. Enjoyed every bit of it. I’d only reinforce that one of the most important ways in which the community can help is to avoid the above-mentioned traps when refereeing a paper. In the end, we are the referees, so we are at least partially responsible for the editor’s opinions. Acceptance is almost always positive, so let’s focus on rejection: there’s nothing more frustrating than having your paper rejected on the basis of stupid and lazy arguments.

    On the other hand, careful, detailed and technical reports are most of the time pleasurable to read even when they lead to rejection or “downgrading” (terrible choice of words by the journals). You have the feeling that you are actually doing science properly, i.e. discussing your results with your peers. You also feel that you have already managed to reach at least another scientist with your effort.

    We should also, by all means, avoid the “small club arrangement”. Evaluate the paper for what it is, and not for the authors, their address, their citations, or any possible self-benefit or prejudice.
  5. Anon2 at 13.12.2016
    I strongly agree with the article, and would hope that the entire community would also see this. The problem, especially for young researchers, is the system that is built around the JIF. Publications in high-JIF journals are needed for positions and for grants. Having a good paper on the arXiv will generally not give you citations if you are not already well established. A PRL will, at least, increase your chances of being cited.

    The quantum-journal.org mentioned in the first comment could be a way out, but the entire system (authors, universities, institutes, government agencies, etc.) needs to embrace it, otherwise nothing will change. Scientists will still go for the high-impact-factor journals because they need them.
  6. Sabine at 13.12.2016
    Hi Werner,

    I basically agree with you. As I said on another occasion though, complaining that there are bad measures doesn’t help. What we need are better measures. In particular we need measures that scientists (not administrators or publishers!) can individually customize from available data.

    Concretely, what I am thinking is that, rather than providing some already-done universal index, we have a service that simply collects available data for researchers. Their bibliometric indices and co-author networks and the number of co-authors per paper (pet peeve: you’d think that the more co-authors a paper has, the less it counts towards your publication score, but no such weighting exists), the number of papers in certain journals. The length of the paper. Public outreach efforts. Talks given at prominent conferences in the field. Lectures held at schools. And so on. Whatever you can get.

    Then everyone can decide for themselves how much weight they want to give to what. So if someone thinks long papers in CQG should score highly, they can weight them that way. It would be highly individual and field-dependent. Open-source and constantly updated.

    A completely obvious type of information that is often relevant for hiring decisions is a keyword match. You want to know, e.g., how much the candidate’s research interests overlap with those of people already present at an institution. Maybe that’s something which shouldn’t matter, but it’s something that people do draw upon in practice.

    Now look. I actually tend to think that we would benefit from such a measure because it prevents some types of biases. It would probably bring up candidates that otherwise would not be considered because they’re not personally known to those in the committee – a big problem in academia.

    Measures aren’t going to go away. People have complained about measures for as long as they have existed, but that hasn’t changed anything. The only thing that will help is to make better measures. It would take but a few million Euro to fund a small group to develop this, and I really think it could solve a big problem. Best,

    Sabine
    1. Reinhard F. Werner at 13.12.2016
      Hi Sabine,

      I didn’t really go into much detail on bibliometry here, just one little corner, the JIF fallacy. My complaint regarding bibliometry would be not so much about bad measures, but that measures are bad, and using them uncritically is very bad. The most relevant situation where this matters is when a CV is assessed by a selection committee. I think we can quickly agree that a committee which automatically discards applications on the basis of length of publication list, total number of citations, h-index, or number of PRLs is just not doing a good job. Of course, these numbers are indicators for some very relevant questions, like a candidate’s general level of scientific activity. But these can be seen at a glance anyway. The same goes for subject area, which would be numerically covered by your proposed keyword statistics, although maybe not well, because there are so many keywords and they change so fast. But look at another question: How is the candidate organizing her research? Are there some lines followed persistently over some period? What triggered the opening of new lines (correlate this with the CV)? Is there a balance between letter-format announcements, thorough papers, conference contributions and reviews? Outreach and web activities are also relevant, of course. Here some stats would help to tell the difference between something read only by close friends and a blog with an actual audience. But do you really base your personal distinction of what you find worth reading on eyeball statistics? To put it differently, I like your blog and often find it worth reading, but that is quite unrelated to looking at your visitor counter (which I only did just now. It looks a bit like a fake.). In other words, metrics have little to do with it.

      Customizable criteria for looking at a publication list already exist: We each have our own. One could now spend time to set up a weighted numerical indicator that reflects these preferences, or rather several indicators for initial screening, for last round candidates, and for any stages in between. Initially you would have to play a lot with the weights, feed in new keywords and the like, until it begins to match your preferences. But you would never be sure. Your algorithm is bound to make pretty bad mistakes in some cases. You probably forgot to tell your weights that, for one part of the project you are hiring for, a brilliant guy from quantum combinatorial topology could make a difference. When you personally look at a publication list you can find surprising connections. Your carefully customized weights won’t. So why waste time with this process?

      Best regards, Reinhard
  7. Adrien Feix at 13.12.2016
    I completely agree with the post. However, I believe that the main issue is a consequence of a problem running somewhat deeper than usually recognised.

    Funding agencies need to make sure that they only fund science that is worth funding. While they might be able to judge what is good science *within a specific field* (by using impartial scientists from the field as referees) they cannot do a good job at deciding *which fields* to fund, because nobody is impartial when it comes to splitting the cake they will eat from.

    One (highly imperfect) method to compare contributions from different fields is to delegate this task to prestigious, international, multidisciplinary journals, such as Nature and Science (PRL and Nature Physics are also multidisciplinary, if only within physics). It is for this reason that journal impact factors have become crucial – with all the deleterious consequences we know of.

    But what to replace this method with? I don’t think there can be a really good way. The fundamental problem is that the criteria to allocate funds are necessarily *exogenous* to the individual fields. They follow a logic which is foreign and therefore more or less damaging to them.

    So I would contend that relying on journal impact factor and bibliometry in general is just one form of the inevitable waste of resources introduced by externally assessing scientific fields. As long as funds are limited, this issue will remain – in some form or other.