The Auditing Problem

Posted on September 18, 2020 by Willard

Auditing leads to an open problem. Let’s try to specify it as lightly as possible. Technical notes follow the main text; they’re tagged using curly brackets, like {this note}.

§1. Alvaro’s Story

Alvaro wrote a piece called What’s Wrong with Social Science and How to Fix It: Reflections After Reading 2578 Papers. Here’s his announcement:

Over the past year I have skimmed through more than 2500 social science papers. I wrote a giant post about everything that's wrong with them and how to fix the process that generates them: https://t.co/sWASWYlmls

Some highlights below.
— Alvaro de Menard (@AlvaroDeMenard) September 11, 2020

Since many of us are up in arms about how academia and science actually work, Alvaro’s piece was cheered. Last time I checked, his tweet had gathered nearly a thousand retweets, more than two hundred quote tweets, more than two thousand likes, and probably thousands of comments. The TL;DR for individual researchers:

Just stop citing bad research […] Read the papers you cite […] When doing peer review, reject claims that are likely to be false […] Stop assuming good faith.

For my purpose, the episode boils down to this. An auditor concludes that scientists should beware their citations. His piece goes viral. The speed and enthusiasm of the shares and likes indicate that very few did any due diligence before citing it.

When life gives you irony, make an iron-clad tirade.

So here goes. The story rests on a {DARPA} auction mechanism that I have not seen investigated. Alvaro claims to have spent about 2.5 minutes on each paper. He offers no data. No real number crunching can be found. The conclusions follow from prejudices more than analysis. There’s no way to replicate his piece.

There are two other tells. First, the triviality of the moralism. “Stop citing bad research” is as useful as “stop making bad bets” as poker advice. Second, check the social network. Lip service to Bryan Caplan in the About page. Citations of Scott Alexander and Gwern. Thanks to Alexey Guzey. Interactions with Eliezer Yudkowsky on Twitter. High five to Garett Jones’ {10% Less Democracy}. Lesswrongian would be my safe bet, which usually indicates to me Panglossian unreasonableness.

There’s nothing wrong with trying to make things better. Unless, of course, it creates unrealistic expectations. Here is one crucial problem that auditing can’t solve.

§2. Audit World

Suppose two worlds. In the first, Audit World, one would need to {double check} every single claim in Alvaro’s piece before approving or opining on it. The second is ours, in which we do more or less as we please. Which one do you believe is more efficient?

Alvaro obviously does not live in Audit World—witness how his piece was received. To put money where the mouth is, praising Audit World would commit us to live there. I would not bet on Audit World, as it implies what I would call omnipotent auditing.

To see why, let’s review the implications. Should you choose to live in Audit World, everything Alvaro and I say would need to be double checked before commenting below. To reply to you, I would need to double check your claims, but I would also need to double check if you double checked your claims about mine. And this would need to occur at each and every point of the exchange.

In Audit World, auditing would not end there; in fact it never does. The claims made by me or Alvaro do not stand alone—they are supported by others. Due diligence needs to be paid to their works, also to their sources, and so on and so forth. Notwithstanding all reciprocal checks and balances to ensure that communication flows smoothly!
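
To get a feel for the regress, here is a minimal sketch in Python. It assumes, purely for illustration, that every claim leans on the same number of sources and that each source must itself be double checked. Both `sources_per_claim` and `depth` are hypothetical parameters, not anything measured.

```python
def audits_needed(sources_per_claim: int, depth: int) -> int:
    """Count the checks needed to audit one claim down to `depth` levels.

    Level 0 is the claim itself; each further level multiplies the work by
    the number of sources every claim leans on (a simplifying assumption).
    """
    return sum(sources_per_claim ** level for level in range(depth + 1))


if __name__ == "__main__":
    for depth in (1, 3, 5, 10):
        print(depth, audits_needed(sources_per_claim=5, depth=depth))
```

With five sources per claim, three levels down already means 156 checks, and that is before counting the reciprocal checks between the people exchanging comments.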

Thus the Auditing Problem obtains: we can’t go on forever with suspicious minds. To resolve that predicament, {trust is key}.

§3. Toward Auditing Sciences

One way to build trust is to communicate. So I asked Alvaro about what he calls his bottom line, i.e. what it means to act like one knows what will replicate. His response: not citing weak papers. I find this response weak. Does it mean that my post perpetuates what makes social science bad? I don’t think so.

There are many reasons to cite a paper: to criticize, by deference, for historical background, as literature survey, etc. Among questions I ask myself before citing a paper, “would I bet if the research replicates” seldom appears. Only if I’d mention its robustness would I double check. Citing should save time, not waste mine.

Alvaro’s non-response shows other weaknesses. First, betting and double checking are opposite activities. Second, what to bet on isn’t quite clear: papers often contain many results. Third, most published papers are {barely cited}. Fourth, reinforcing reactance with incredulity goes against openness.

For me, the last one breaks the deal. In general, main claims (i.e. theses) are meant to be provocative; empty claims can be reproduced trivially. A knowledge system that prioritizes full replication over exploratory research ought to converge toward eternal boredom. I doubt that discovery success rates would improve with a Panopticon in which every scientist plays cop.

Look. I’m all for revising citation practices. My own policies are radical, e.g.:

https://twitter.com/nevaudit/status/1271213592017285121

As you may know, I try to mention people by their first names. From now on, I will also omit journal names and paper authors: year-month-title-url suffices. (See §5 for how it looks.) That said, I don’t impose my personal policies on others, and more importantly I don’t sell them as a recipe to improve scientific results. That’s just magical thinking.

ClimateBall connoisseurs already experience how easily contrarians weaponize the auditing biz. Research always comes with trade-offs. Writing is hard because it forces us to juggle between multiple contradictory constraints. We can’t say everything. We can’t be fully transparent. Editing, like auditing, never ends. At some point we need to hit “send” and accept that we may fail again, in the hope that this time we fail better.

All in all, here would be my bets as to what to expect next. Institutions should soon create {distrust networks} to establish auditing sciences as a scientific field. Scientific communities will continue to be powered by trust. Humans will find ways to exploit new metrics, starting with a {Replication Cop Game}. Distrust will continue to be gamified by reactionary forces.

Maybe one day we’ll learn to embrace crappiness, but I’m not holding my breath.

§4. Notes

{This note} – Known academic note conventions suck. Time to try something else. Renumbering pains me; numbers don’t replace titles; notes usually come first; etc.

{DARPA} ~ “DARPA” stands for Defense Advanced Research Projects Agency. The page Alvaro cites mentions:

DARPA’s Systematizing Confidence in Open Research and Evidence (SCORE) program aims to develop and deploy automated tools to assign “confidence scores” to different SBS research results and claims. Confidence scores are quantitative measures that should enable a DoD consumer of SBS research to understand the degree to which a particular claim or result is likely to be reproducible or replicable.
https://www.darpa.mil/program/systematizing-confidence-in-open-research-and-evidence

My educated guess is that Alvaro participated in a tournament with cash prizes to train a machine-learning textual analyzer. That’s just a guess. Military folks are only into Open Science when it suits them.

{10% Less Democracy} ~ From Garett’s blurb:

Discerning repeated patterns, Jones draws out practical suggestions for fine-tuning, focusing on the length of political terms, the independence of government agencies, the weight that voting systems give to the more-educated, and the value of listening more closely to a group of farsighted stakeholders with real skin in the game―a nation’s sovereign bondholders.

{Double Check} ~ By this term I’m referring to anything one would need to do to verify or validate what is being said or done in some work at some required level. The audit metaphor bypasses the need to create an explicit list for all tasks. Hopefully you get the point. But no, you can’t really double check this note, at least not without my help.

{Trust is Key} ~ This may sound farfetched, but technical issues resurface as soon as we seriously think about interaction between artificial agents, for which we can’t take anything for granted. The problem is worse when considering the relationship between humans and AI, like autonomous vehicles or medical devices.

{Barely Cited} ~ Citation numbers sound very noisy to me. They would deserve due diligence. I assume that the following gives us a fair ballpark:

Academics publishing in particular fields of chemistry or neuroscience are virtually guaranteed to be cited after five years, but more than three-quarters of papers in literary theory or the performing arts will still be waiting for a single citation.
2018-04. Uncited Research. https://www.insidehighered.com/news/2018/04/19/study-examines-research-never-receives-citation

It’s hard to tell how revising citation practices will prevent bad and barely cited papers.

{Distrust Networks} ~ I already mentioned that idea, which I still need to flesh out. Another time. For now, think of all the tools and facilities that verify textual products. Version control systems. Online stores that certify applications. Bitcoin machines. Banking transactions. Etc. In principle, replication could be as simple as a checksum.
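
To make the checksum idea concrete, here is a minimal sketch using Python’s standard `hashlib`. The function names and the little table are made up for illustration. It only tells us whether a rerun produced byte-for-byte the same output as what was published, which is a much narrower thing than replication.

```python
import hashlib


def checksum(payload: bytes) -> str:
    """Return the SHA-256 hex digest of some serialized research output."""
    return hashlib.sha256(payload).hexdigest()


def same_output(published: bytes, rerun: bytes) -> bool:
    """True when a rerun produced byte-for-byte the published output."""
    return checksum(published) == checksum(rerun)


if __name__ == "__main__":
    published = b"year,value\n2019,0.98\n"  # made-up table, for illustration only
    print(same_output(published, b"year,value\n2019,0.98\n"))
    print(same_output(published, b"year,value\n2019,0.99\n"))
```

A single changed digit flips the digest, so the check is cheap but unforgiving: it says nothing about whether a difference matters.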

{Replication Cop Game} ~ Take a Science Cop. Let’s name him S*, in honor of Stuart. Let a Game G be dominated by a strategy by which S* gets rewarded every time he finds faults. Exercise for readers: after how many iterations does S* turn into Anton Ego? The point of Ratatouille is that everyone can cook, not that everyone’s a critic.

§5. Further Readings

2014-03. Implementations are not specifications: Specification, replication and experimentation in computational cognitive modeling. https://doi.org/10.1016/j.cogsys.2013.05.001

2018-11. A Causal Replication Framework for Designing and Assessing Replication Efforts. https://doi.org/10.1027/2151-2604/a000385

2018-11. Openness and Reproducibility: Insights from a Model-Centric Approach. https://arxiv.org/abs/1811.04525

2019-10. The Value of Failure in Science: The Story of Grandmother Cells in Neuroscience. https://doi.org/10.3389/fnins.2019.01121

2020-04. The case for formal methodology in scientific reform. https://doi.org/10.1101/2020.04.26.048306

2020-07. What is Replication? https://doi.org/10.1371/journal.pbio.3000691

About Willard

neverendingaudit.tumblr.com


This entry was posted in Philosophy for Bloggers, Research, Science, Scientists, Sound Science (tm), The philosophy of science, The scientific method and tagged alvaro de menard, auditing science, darpa, distrust networks, less wrong, Replication, science cops, stuart richie.

28 Responses to The Auditing Problem

Dave_Geologist says:

September 19, 2020 at 10:52 am

At risk of attracting incoming, some (me) would say that neither social science nor the dismal science are real sciences, so replication failure is expected and should not be carelessly extrapolated to other fields. A common bait-and-switch is that psychology or sociology experiments can’t be replicated so climate science, even something as fundamental as the greenhouse effect, can’t be trusted.

The difference in my mind is not down to some sort of academic hierarchy of worth. It’s because people are different from one another, and different from themselves weeks or months or years apart, and can lie to interviewers or to themselves, and can fundamentally and unpredictably change their behaviour as a result of interaction with other people who are also like that. CO2 molecules are not like that (except for the last one, and even then it’s predictable and replicable).

Economics and social science don’t (just) have a lot of unreplicable crap because of poor or biased research. It’s because they have really, really hard subject matter, much harder than physics or chemistry. Of course you can still get crap research there, but at least one leg of the stool is secure, rather than both being insecure. For the same reason consilience is a powerful tool in the physical sciences, but much weaker in the social sciences. Why would devoutly religious people vote for a serial sinner? Good luck finding consilience between Friedman and Keynes, even if presented with the same data. In contrast, no-one ever has to ask why a C=O bond didn’t rotate like it was supposed to, because it just doesn’t happen (setting aside for the moment quantum mechanics, where the maths still works but for distributions rather than fixed values, and things like string theory which arguably belong more in the realm of philosophy). And there’s even consilience between the quantum mechanical and classical description of that molecule’s behaviour.
Dave_Geologist says:

September 19, 2020 at 11:06 am

To be slightly pedantic, the consilience is between the ensemble behaviour of a gaggle of molecules, and the upscaled behaviour of the quantum-mechanical description. That’s why it’s consilience, not replication.
Willard says:

September 19, 2020 at 3:45 pm

I think replication becomes troublesome as soon as one deals with datasets. The Auditor did not start a quest that turned into a vendetta out of thin air. Old records are far from telling us a straightforward story. (For instance, I recently heard that we only have the vaguest idea of how we could domesticate dogs.) Clinical trials are still muddy. Heck, even chemistry has to clean up its act:

Danheiser is the editor-in-chief of the unconventional journal Organic Syntheses that has verified the experiments of all the papers it has published since it launched in 1921. The journal does this by having the research replicated by independent chemists before publishing them – a practice that is almost unheard of in chemistry or any other research field (the exception being a few brief instances in history). All experiments are checked for reproducibility in the lab of one of the journal’s board of editors, often by graduate students and postdoctoral researchers working under the supervision of the Organic Syntheses editor. Danheiser, who has written about reproducibility, tells Chemistry World that the journal usually aims to check the work in submitted papers within six months.

https://www.chemistryworld.com/news/taking-on-chemistrys-reproducibility-problem/3006991.article

To replicate everything costs money. Our knowledge production systems have limited resources. At some point choices will have to be made, if only to find the right balance between incentives to replicate and incentives to dunk on competing research.

There are three questions I did not address in the post but are crucial for what I called the auditing sciences. The first two go hand in hand: what is it to replicate, and what do we replicate? The third problem is akin to bootstrapping: can we replicate our replication tools, i.e. what quality output should we expect from them? The mere task of building compilers is tough, and we’re only talking about what can be reduced to logic gates.

If a set of criteria leads us to reject everything, then we should find another one. Appealing to INTEGRITY(tm) never suffices. Sooner or later this may turn into legal contracts following industry standards. One day we might even see a market for replicability insurance.

The long and the short of it is that in contractual matters, trust is key.
Victor Venema says:

September 19, 2020 at 5:24 pm

“There are many reasons to cite a paper: to criticize, by deference, for historical background, as literature survey, etc.”

The real science nerd may be interested in an ontology of reasons to cite a paper trying to cover every case. https://sparontologies.github.io/cito/current/cito.html#objectproperties

It would also be great to have authors or (post-publication) peer reviewers indicate which of the cited papers are foundational, i.e. which ones would require revisiting the current study if they turned out to have problems. We could combine that with retraction databases or post-publication assessments of the cited studies to indicate when a paper may need an additional (post-publication) review.

Retraction Watch Database User Guide

https://grassroots.is

Such a system, which uses the information we gain as we work with a paper and build on it, is much more realistic than some imaginary perfect peer review (auditing) system.
Willard says:

September 19, 2020 at 6:11 pm

Great resources, Victor!

The ontology you showed still has a problem of scope. We cite papers the way we do because that’s how the reference system works, but we seldom cite them for all the things said in them. In other words, we need to say what we are referring to when we cite, which shows that the distinction between indirect quotation and citation can become very blurry.

(There are also technical problems, e.g. it’s just an OWL classification scheme, and it tends to age very poorly. Witness our actual bibliographies, which have yet to be properly normalized.)

My own preferred solution would be to drop review sections altogether and to incentivize lichurchur reviews for their own sake. (I.e. “see review R for a review.”) Metastudies can help in that regard. Just like there are research journals, there could be review journals with both synoptic and critical papers. We tend to separate data collection and analysis, we could separate citations collections and their analysis. That way there could be specialization in one or the other.

Ideally, the mapping out of a research field would be so convivial as to become a website that naturally connects with something that looks like an encyclopedia:

https://plato.stanford.edu/

So along the bibliography problem we could solve the textbook problem.
Victor Venema says:

September 20, 2020 at 12:43 am

Thanks. I do not fully get your argument about scope being a problem of ontologies. Isn’t it the other way around: this is the problem ontologies try to solve? Normally it is not communicated why something is cited; the ontology would give the reason. (Some fields at least distinguish between “this citation provides evidence for my claim” and “this citation is also jamming about this topic”. That is a start.)

Do you know of examples of older citation ontologies? Can you map them onto each other? Do the newer ones provide more detail or is it a mess? Even if the system changes, as long as the system is documented that is still better than doing nothing.

(I am interested as I am thinking of introducing such a system in the grassroots peer review system, it will be hard to make the publishers change their system, but we can do it as science in the post-publication peer review.)

There are systems being developed where one would not write an entire article, but parts of it. The designers seem to have life sciences in mind. I guess it could also work for the natural and Earth sciences; I would be interested in a view on such systems from the humanities side. Some such segments would have fewer citations; I do not expect them to go fully without.
https://science-octopus.org
https://www.libscie.org

Somewhat comparable to Plato would be the review paper on Sea Surface Temperature John Kennedy put on GitHub:
https://github.com/ET-NCMP/et-ncmp.github.io
If you would like to make a change, just file an issue (start a discussion) or make a pull request (propose a new text). I like the idea of making up to date review papers this way.

Even with such systems, it would be valuable to have citations. If used well, they really add value.

John Cook disagrees with you about the value of citations when it comes to science communication. 🙂
Willard says:

September 20, 2020 at 2:01 am

Victor,

Suppose you cite W19 because it mentions an expression you like, like my “crappiness.” You don’t endorse anything from the paper, not even the interpretation I give for that word. You only acknowledge that this is where you got the inspiration to develop an idea that will be yours. I think in CiTO that could be “credits.” So you write “cite:credits :W19” which means you credit W19 for some undefined reason. That’s a problem of scope.

It’s no big deal if all you need is to (say) exclude W19 from your argumentation base. (Credits have no real argumentative value.) It’s a problem if people extrapolate from your credits responsibilities for which you don’t commit, like a specific interpretation of the wordology you borrow. You would have every right to develop a concept of crappiness that is yours. You are not obliged to defend my own interpretation. You only refer to W19 as a matter of respect and honesty.

***

At a more formal level, OWL belongs to Description Logics. Something like “citation(W19)” is a concept. If all you got in your concepts are papers, you can’t refer to parts of papers like an argument or a table or a mere buzzword you like. For that you’d need to specify a role that ranges over a more fine-grained ontological domain than papers.
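
Here is a rough sketch of the scope problem, in plain Python rather than OWL. Everything in it is an assumption made for illustration: the `Citation` record, the optional `part` field, and the label “V20” for a hypothetical citing work. It only shows that a citation pointing at a whole paper says less than one that names the part it means.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Citation:
    citing: str                 # the work doing the citing (hypothetical label)
    relation: str               # why it cites, e.g. "credits"
    cited: str                  # the paper being cited
    part: Optional[str] = None  # what inside the paper is meant, if stated

    def scope(self) -> str:
        """Describe how much of the cited paper the citation commits to."""
        if self.part is None:
            return f"{self.cited} as a whole (reason left undefined)"
        return f"{self.part} in {self.cited}"


if __name__ == "__main__":
    coarse = Citation("V20", "credits", "W19")
    fine = Citation("V20", "credits", "W19", part='the word "crappiness"')
    print(coarse.scope())
    print(fine.scope())
```

The coarse version is what a paper-level ontology can express; the fine one needs a domain that includes parts of papers.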

It’s been a long while since I checked all this, but I’m quite confident there are paper ontologies around. Scratching my own itch, look what I found:

Several ontologies have been created in the last years for the semantic annotation of scholarly publications and scientific documents. This rich variety of ontologies makes it difficult for those willing to annotate their documents to know which ones they should select for such activity. This paper presents a classification and description of these state-of-the-art ontologies, together with the rationale behind the different approaches. Finally, we provide an example of how some of these ontologies can be used for the annotation of a scientific document

Source: http://ceur-ws.org/Vol-1155/paper-07.pdf

My own stance bypasses these considerations at the personal level. This work belongs to the citation owners themselves, or the institutions that manage bibliographies. And then it becomes their own house rules.

As I see it, that’s the only reasonable way to go. We have decades of bad experience with online citation repositories. There are trillions of hours of utterly useless BibTeX work. Until and unless institutions step in, it’s fine for your own garden, but it may not survive you.

Hope this helps,

PS: I don’t agree with John on many things. That’s fine. We’re not working with the same time frame in mind.
Steven Mosher says:

September 20, 2020 at 3:04 am

W.

Interesting.

next.

“– a practice that is almost unheard of in chemistry or any other research field (the exception being a few brief instances in history). All experiments are checked for reproducibility”

It helps to get a few things clear.

1. Reproducibility is not replication. Reproducibility is easy to define (replication is not) and “easy” to implement.
2. Reproducibility is an extension of QC.
3. Reproducibility applies to computational science, or sciences that rely heavily on computation.

Schematically.
Scientist X starts with data Y and applies method Z. This produces
A) graphical products such as charts
B) numerical products such as tables, p values, etc.
Scientist X then writes a paper. The paper consists of
A) writing about what he did (I took Y and applied Z)
B) copies of charts and graphs
C) copies of tables and numbers
D) claims about “what is shown”

This product is reproducible if R can take data Y, apply method Z, and reproduce the charts and numbers as published.
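
The schema above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data, the stand-in method `mean_and_range`, and the numeric `tolerance` are all made up, and real products (charts, tables) would need their own comparison rules.

```python
from typing import Callable, List, Sequence


def is_reproducible(
    data: Sequence[float],
    method: Callable[[Sequence[float]], Sequence[float]],
    published: Sequence[float],
    tolerance: float = 1e-9,
) -> bool:
    """R takes data Y, applies method Z, and compares with the published numbers."""
    rerun = method(data)
    if len(rerun) != len(published):
        return False
    return all(abs(a - b) <= tolerance for a, b in zip(rerun, published))


def mean_and_range(values: Sequence[float]) -> List[float]:
    """A stand-in for method Z: the mean and the range of the data."""
    return [sum(values) / len(values), max(values) - min(values)]


if __name__ == "__main__":
    data_y = [1.0, 2.0, 3.0, 6.0]
    print(is_reproducible(data_y, mean_and_range, published=[3.0, 5.0]))
    print(is_reproducible(data_y, mean_and_range, published=[3.1, 5.0]))
```

Note that the check needs both Y and Z to be available, which is exactly what the examples further down turn on.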

Reproducibility is not possible in all cases, especially cases where data is huge, or where computation resources are scarce.

For many disciplines, this QC process can be automated. It is normal business practice.
The point with reproducibility (markdown documents, for example) is that it can also be formalized as a separate activity within an independent department. Results go to formal QC before they leave the building. Journals can also formalize incoming QC as a process.

All of this is doable. The main point is that it is a simple process for fraud detection and a check for human error.

Example A: a researcher claims to have collated medical data from various sources. He publishes the results from this data collection. No one in his organization double checks. The journal does no incoming QC. Results are published… cited, become news; a retraction soon follows.
https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(20)31324-6/fulltext
Result? Less trust (for some) in all science.

Example B: a researcher thinks he used data A, when in fact he mistakenly used data B.

Surface Stations

A correction running data A is promised (and easy). 8 years pass while people wait for this 5 minute job. Had all the data and code been published, this simple fix would have been done in 5 minutes. Result? People still cite the bad results.

The amazing thing is that people still fight this simple idea of basic QC. They fight it with exceptions, they fight it by arguing it’s not important, they fight it by playing poor. They fight it by arguing that it is not replication. Well, duh. It’s not. It is a check against fraud and human error.
Steven Mosher says:

September 20, 2020 at 3:42 am

“The ontology you showed still has a problem of scope. We cite papers the way we do because that’s how reference system works, but we seldom cite them for all the things said in them. In other words, we need to say to what we are referring to when we cite, which shows that the distinction between indirect quotation and citation can become very blurry.”

I once had this dream that a science paper would have a CLAIMS section, much like a patent does. So you can cite to rely on claims or cite to refute claims.

You could theoretically build an economy of claims and transactions of claims. A blockchain of truth. Or a merklized truth tree.

It was a dream of course (foundationalist at its core) and beset with all sorts of problems; first and foremost is ontologies.

Anyway, after reading a shit ton of papers on UHI I note 1 thing. It is very hard to find one that doesn’t start with a throat clearing citation to Luke Howard. I bet 99% of people who cite Luke Howard never read the thing they are citing. They are just clearing their throat, making a hat tip to the origin and then carrying on.

A different oddity. Once I had reviewers ask for more background citations. Like they were encouraging me to be Pielke. Ok, here is a list of related stuff. This “proves” I have read the classical literature. Has anyone (besides me I guess) actually ever looked at all the citations Pielke Sr. typically throws out at the start of his papers?

As for citing shitty papers. I recently was wading through a UHI meta-analysis and found a claim I needed in my truth chain.
Meta-analysis A claims that Y showed X.
I need X.
So I go read Y? Or is citing A enough?
I go read Y. Damn, Y didn’t prove X, Y cited X from W. So I have to go to W. W got a bunch of things wrong, but I think he got X right.

Do I cite W for X? What if people check W and see all the shit he got wrong, despite getting X right (in my opinion)? Will anyone check W?
Do I cite Y and show that I have “done more” due diligence than simply cull claims from a meta-analysis?
Do I cite A and just forget that I drilled down all the way to W?

Quite literally this process of drilling down through citations keeps me from writing. Audits never end. Of course there is a halting problem here. Where to stop, can you stop. Well, things get written, so there was some depth level you don’t go beyond. Arbitrary I guess. Sometimes practical. A cites B, B cites C, and C cites D as the origin. D is out of print. Ouch. What do I do now? Cite the uncheckable D? Cite C to indicate that I did more reading than normal? Cite B? Cite A?
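
The drilling-down described above, with its arbitrary stopping depth, might be sketched like this in Python. The chain `CITES_FOR_X` simply restates the A, Y, W example from the comment; the function name and the `max_depth` parameter are assumptions for illustration.

```python
from typing import Dict, List, Optional

# The chain from the comment: A claims Y showed X; Y took X from W.
CITES_FOR_X: Dict[str, Optional[str]] = {"A": "Y", "Y": "W", "W": None}


def drill_down(start: str, cites: Dict[str, Optional[str]], max_depth: int) -> List[str]:
    """Follow a claim back through its citations, stopping at max_depth hops."""
    trail = [start]
    current = start
    for _ in range(max_depth):
        source = cites.get(current)
        if source is None or source in trail:  # origin reached, or a citation loop
            break
        trail.append(source)
        current = source
    return trail


if __name__ == "__main__":
    print(drill_down("A", CITES_FOR_X, max_depth=1))
    print(drill_down("A", CITES_FOR_X, max_depth=5))
```

The sketch halts only because someone picked `max_depth` or because the chain ran out; it says nothing about which stop along the trail deserves the citation.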
David B Benson says:

September 20, 2020 at 3:50 am

Steven Mosher, a big library has the out-of-print papers.
Willard says:

September 20, 2020 at 3:54 am

> The amazing thing is that people still fight this simple idea of basic QC.

Basic QC should not feel like targeted harassment and bullying. If data thugs behave like asshats, no amount of righteousness will erase it. It’s a suboptimal way to mainstream auditing if you ask me.

You might reply: oh, “I don’t care what you think.” I don’t care what you think either. I care about sound science, and getting personal for basic QC is self-defeating. And in fact that’s the conclusion reached by one of the best-known data thugs himself:

[I]t is neither wise nor fair to expect self-motivated data vigilantes to police scientific flaws, at least not without clearer reward mechanisms and rules of engagement. Instead, scientific funders should take on a checking role — it is in their own interests.

https://www.nature.com/articles/d41586-018-06903-2

Nobody likes cops. Nobody will like science cops.

***

The government does not have the resources to audit everyone, or to audit everyone at the same level of rigor. They randomize, and weight according to past behavior. So here would be an argument I would buy:

The CRA chooses a file for an audit based on a risk assessment. The assessment looks at a number of factors, such as the likelihood or frequency of errors in tax returns or whether there are indications of non-compliance with tax obligations. The CRA also looks at the information it has on file for the taxpayer and may compare that information to similar files or consider information from other audits or investigations.

https://www.canada.ca/en/revenue-agency/services/forms-publications/publications/rc4188/what-you-should-know-about-audits.html
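
A minimal sketch of what randomized, risk-weighted selection could look like, using Python’s standard `random.choices`. This is not the CRA’s procedure, which the quote does not spell out. The risk scores are hypothetical and assumed to be positive, and the fixed `seed` is only there so that a draw can be rerun.

```python
import random
from typing import Dict, List


def pick_audits(risk: Dict[str, float], how_many: int, seed: int = 0) -> List[str]:
    """Draw files to audit at random, without replacement, favouring higher risk."""
    rng = random.Random(seed)
    pool = dict(risk)  # copy, so the caller's scores are left untouched
    chosen: List[str] = []
    while pool and len(chosen) < how_many:
        names = list(pool)
        pick = rng.choices(names, weights=[pool[n] for n in names], k=1)[0]
        chosen.append(pick)
        del pool[pick]
    return chosen


if __name__ == "__main__":
    # Hypothetical risk scores, e.g. derived from past error rates.
    scores = {"lab_a": 1.0, "lab_b": 2.0, "lab_c": 6.0}
    print(pick_audits(scores, how_many=2))
```

Everyone stays in the draw, but past behavior moves the odds, which is the trade-off between auditing all and auditing none.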

Those who get caught ought to pay.

Tax audits have clearer protocols than scientific ones. Auditing sciences should ponder on that and propose ideas to increase QC. They should also find ways to make sure that auditing remains independent from corporate interests.

Once we get clear rules and an independent referee, sure, suit yourself. QC all the way. Auditing poxes in all the (random) houses.
Willard says:

September 20, 2020 at 4:23 am

> It is very hard to find one that doesn’t start with a throat clearing citation to Luke Howard.

You mean “cite:throat-clearing :Lnn.”

Check out Victor’s findings. This one blows my mind:

Why publish in Octopus?

Establish priority on your ideas and your work instantly. You can publish a paper later.

Publish work that you cannot publish elsewhere: hypotheses, small data sets, methods, peer reviews. Get credit for it, and let the scientific community benefit.

No need to write a whole ‘paper’. You only need to write up what is new: Octopus is fast and efficient.

Everything you do within Octopus – and how it is received by your peers – will appear on your public profile page, for funders, institutions and other researchers to see.

https://science-octopus.org/

Something tells me we’re past PDF-based science.

***

> there is a halting problem here.

My first version of the post was wayyy heavier. My editors convinced me to storify all the things. Now I got more notes than when I started.

Yet I try to write in the hope of reducing notes. Next time I’ll fail better.
jacksmith4tx says:

September 20, 2020 at 6:02 am

This might be interesting to watch and could be a useful tool to eliminate disinformation on issues obscured by ideology and bias. I hope the topics include timely subjects like the pandemic or climate change.
https://apnews.com/PR%20Newswire/b93aa4691f8c4a1c5b93ead02e0cef87
“Filmed in front of a live virtual audience, “That’s Debatable” will be conducted in the traditional Oxford-style format with two teams of two subject matter experts debating over four rounds. The live audience will select a winner via mobile, to be announced at the conclusion of the program.

The show will also demonstrate how AI can be used to bring a larger, more diverse range of voices and opinions to the public square and can help uncover new perspectives to enhance the debaters’ arguments. The general public is invited to submit a short argument for or against each episode’s position statement. During the debate, IBM Watson plans to use Key Point Analysis, a new capability in Natural Language Processing (NLP) developed by the same IBM Research team that created Project Debater, which is designed to analyze viewer submitted arguments and provide insight into the global public opinion on each episode’s debate topic.”
Dave_Geologist says:

September 20, 2020 at 11:19 am

On “what does a citation mean”.

There are some which are obviously not citing approvingly. Most Comments and Replies are like that. I don’t know if citation indexes filter that out.

An early publication of mine was a two-page Letter to the Editors in the Correspondence section, showing that a particular interpretation had a simpler (one-stage vs. two-stage) explanation when you looked at it in three dimensions on a broader scale. That journal has each paper in the contents list headed as Research Article, Review Article, or Correspondence so in principle they can be automatically filtered. I don’t think I ever got a Reply. I expanded the concept into a full-blown modelling paper, and referenced the Letter as the original source of the idea. As an audit trail rather than self-citation, but I would say that, wouldn’t I? Also to credit my supervisor and a co-author who’d helped me develop the initial idea, and who in the more developed versions of the model were acknowledged for contributing discussion rather than as co-authors. By that stage we’d already gone our separate ways and on to other things.
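The "in principle they can be automatically filtered" remark can be sketched as follows. This is only an illustration under stated assumptions: the records, titles and field names are hypothetical, and nothing here reflects how any real citation index handles article types.

```python
# Hypothetical citing items, each labelled with the article type that the
# journal's contents list would show. Titles are invented placeholders.
citing_items = [
    {"title": "Model of basin A", "type": "Research Article"},
    {"title": "Comment on model of basin A", "type": "Correspondence"},
    {"title": "Basin models: a review", "type": "Review Article"},
    {"title": "Reply to comment", "type": "Correspondence"},
]


def split_by_type(items, excluded=("Correspondence",)):
    """Separate items whose type is in `excluded` from the rest."""
    kept = [item for item in items if item["type"] not in excluded]
    set_aside = [item for item in items if item["type"] in excluded]
    return kept, set_aside


kept, set_aside = split_by_type(citing_items)
print(len(kept), "kept;", len(set_aside), "set aside")
```

A filter like this only catches what the journal labels as Correspondence. It says nothing about a Research Article that cites a paper in order to challenge it, which still takes reading.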

That (the full-blown modelling paper) and another modelling paper in a different area still get the occasional cite, mostly in textbooks and review papers. As part of a historical literature review, although the basic principles are still correct but with more bells and whistles and lots of computation not available in the early 80s outside of NASA. Partly I suppose as a courtesy (why I would still cite McKenzie (1978) if I wrote a basin modelling paper today), although in the case of textbooks it may be because it’s graphical with closed-form solutions for simple cases so is easier to understand and learn from than stonking great numerical models. Is it relevant in judging my activity today or how in touch I am with today’s field? No, because I’m retired now and I went on after that research to spend decades in industry (although both techniques were applied there, including by me, so I was a go-to person in certain circles but mostly outside the academic citation treadmill).

Unless it’s obvious I would always explain in the text why I’m citing someone. Sometimes it may be obvious to One Reasonably Skilled In The Art, but not to an Auditor from another field or an interested amateur. Sorry. Back in those days we were writing for a professional audience of peers only, and to read it you had to go to a dead-tree library.

I would never cite a paper I haven’t read (although I might skim past the bit that was not relevant). When I was an editor we had a rule about that. Although it’s obviously difficult to enforce and had to be self-policing. One of the annoying things about the Internet is the profusion of cites which don’t support the claim being made and where the poster probably has not read it but just liked the sound of the title, or is recycling it from somewhere else.
Dave_Geologist says:

September 20, 2020 at 12:05 pm

Interesting. I went on to Google Scholar to see if I got a Reply, and my memory seems to be correct. I did get ten cites in the late 80s (three from me, but by 1991 even I was no longer citing it, just the full-blown model), but the original Letter doesn’t appear as an article, just as citations. It is available online though, as a scanned pdf with searchable text.

That does seem to be a generic thing, at least with Google Scholar and that journal (an Elsevier one). They now label them Discussion and Reply, but a search for some more recent ones doesn’t link to the online article, only to cites in other papers or by abstracting services.

Obviously that only covers formal Discussions and Replies, not references within a paper that challenge the cited work, but it does suggest some attempt at discrimination.
Willard says:

September 20, 2020 at 3:05 pm

The Science Citation Index made sense in another world:

The Annual SCI Journal Citation Reports were officially launched in 1975. The JCR evolved to provide a statistical summation of the Journal Citation Index, which in turn was the result of resorting the Author Citation Index: instead of alphabetizing the file by author name, you simply sorted the file by the names of the journals in which papers were published. When this exercise was first performed in the early 1960s, the journals already covered in Current Contents included those that either produced the most papers or those that were cited the most.

Source: http://garfield.library.upenn.edu/papers/barcelona2007a.pdf
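The re-sorting Garfield describes is simple enough to sketch. What follows is a toy illustration, not the SCI's actual procedure: the records, author names and journal names are all invented, and the tally is just one plausible reading of "statistical summation."

```python
from collections import Counter

# Hypothetical citation records: (cited_author, cited_journal, citing_paper).
# Every name below is invented for illustration.
records = [
    ("Smith", "Journal B", "paper-1"),
    ("Adams", "Journal A", "paper-2"),
    ("Smith", "Journal B", "paper-3"),
    ("Jones", "Journal A", "paper-4"),
    ("Adams", "Journal C", "paper-5"),
]

# Author index: the file alphabetized by cited author name.
author_index = sorted(records, key=lambda rec: rec[0])

# Journal index: the same file, re-sorted by journal name instead.
journal_index = sorted(records, key=lambda rec: rec[1])

# A per-journal tally of how often each journal is cited.
journal_counts = Counter(journal for _, journal, _ in records)

for journal, count in sorted(journal_counts.items()):
    print(journal, count)
```

The tally counts citations and nothing else. It cannot tell an approving cite from a hostile one.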

Garfield admits the limitations of his “measure” (not sure it’s one) by suggesting at the end of his address that “a better evaluation system would involve actually reading each article for quality.” What he’s missing is that reading is also required to know why a paper is cited in the first place.

The SCI remains proprietary. In fact, one might argue that the impact factor and the citation index are controlled by a monopoly.
Steven Mosher says:

September 20, 2020 at 3:24 pm

“Stephen Mosher — A big library has the out-of-print papers.”

thanks, if I fly back to the USA I will have better access to one
Steven Mosher says:

September 20, 2020 at 3:40 pm

“Basic QC should not feel like targeted harassment and bullying. If data thugs behave like asshats, no amount of righteousness will erase it. It’s a suboptimal way to mainstream auditing if you ask me.”

there is nothing personal about journals asking for the data and code before you publish.
Nothing personal about universities and research organizations establishing independent
QC departments. heck even NOAA does something similar before publishing.

the point of demanding data and code upon publication is to PREVENT asshatery.
Willard says:

September 20, 2020 at 3:58 pm

> the point of demanding data and code upon publication is to PREVENT asshatery

You mean *one type* of asshatery, and you omit that this follows from everything I said. Making archives accessible renders obsolete the Auditor’s main selling point. But he could always request more, e.g. correspondence, intermediate values, etc. Judy’s still there, harping about INTEGRITY(tm) in the comfort of her CEO armchair. And Tony’s will always be around.

The only way I know for humans to live well their senescence is to get to an Uncle Murphy state of mind:

Uncle Murphy memes are next level wholesome pic.twitter.com/icFPUZ52Ic

— 1flym (@1flym) September 19, 2020

If we want science kids to be cool, we need to be cool with them.

But sure, let’s penalize scoundrels.
Willard says:

September 20, 2020 at 4:12 pm

Here’s a funny experiment:

Trying a horrible experiment…

Which will the Twitter algorithm pick: Mitch McConnell or Barack Obama? pic.twitter.com/bR1GRyCkia

— Tony "Abolish ICE" Arcieri 🦀🌹 (@bascule) September 19, 2020

When users insert an image in a tweet, Twitter-the-company uses some algorithm to choose which part to present. We have no idea how it chooses, perhaps not Twitter-the-company itself. So users are trying to reverse-engineer the choices. Whiteness looks like a thing even for AI.

Algorithmic opaqueness is a big problem.
Steven Mosher says:

September 20, 2020 at 4:14 pm

“You mean *one type* of asshatery, and you omit that this follows from everything I said. Making archives accessible renders obsolete the Auditor’s main selling point. But he could always request more, e.g. correspondence, intermediate values, etc. Judy’s still there, harping about INTEGRITY(tm) in the comfort of her CEO armchair. And Tony’s will always be around.”

if I thought it followed logically from what you said I would have remained silent.
And yes, it will not prevent all manner of asshatery, but will prevent the cases that
we know have caused issues.

1. The lost my data forms of denial
2. the you are not a scientist forms.
3. The why should I share it with you when you just want to find problems.

Heck, brandon once was an asshat to me asking for intermediate data.
the answer was simple.
you have the code, generate it yourself.

Now ask yourself how much traction did that asshatery get?
zero.

if preventing all asshatery were the metric then that’s clearly impossible. The goal was never
to prevent all asshatery. the goal was to diminish it while providing some positive benefit.

Over time the profession will improve its sharing. sometimes under pressure and sometimes
because teaching changes..and sometimes because people get smarter.
Willard says:

September 20, 2020 at 5:04 pm

> if I thought it followed logically from what you said I would have remained silent.

I said “follows,” not “follows logically,” and here’s one hint:

§3. Toward Auditing Sciences

Do you really think that the auditing sciences are possible without any kind of traceability? I don’t think so. Perhaps you’d prefer something more direct:

Tax audits have clearer protocols than scientific ones. Auditing sciences should ponder on that and propose ideas to increase QC. They should also find ways to make sure that auditing remains independent from corporate interests.

Once we get clear rules and an independent referee, sure, suit yourself. QC all the way. Auditing poxes in all the (random) houses.

I’m not sure why you’re trying to force this open door, Mosh. My hypothesis is that your “sometimes under pressure” elides the role of science cops and data thugs in that pressuring. Like the Auditor’s or yours.

That’s so 2010. Let’s see where the historical chips fall regarding all this. I’m in no hurry.
Willard says:

September 20, 2020 at 5:17 pm

To be filed under the *Things Take Time* department:

When smoking laws became stricter, there was a lot of discussion in society. One might even say there was a strong polarization, where on the one hand newspaper articles appeared that claimed how outlawing smoking in the train was ‘totalitarian’, while we also had family members who would no longer allow people to smoke inside their house, which led my parents (both smokers) to stop visiting these family members. Changing norms leads to conflict. People feel personally attacked, they become uncertain, and in the discussions that follow we will see all opinions ranging from how people should be free to do what they want, to how people who smoke should pay more for healthcare.

We’ve seen the same in scientific reform, although the discussion is more often along the lines of how smoking can’t be that bad if my 95 year old grandmother has been smoking a packet a day for 70 years and feels perfectly fine, to how alcohol use or lack of exercise are much bigger problems and why isn’t anyone talking about those.

But throughout all this discussion, norms just change. Even my parents stopped smoking inside their own home around a decade ago. The Dutch National Body for Scientific Integrity has classified p-hacking and optional stopping as violations of research integrity. Science is continuously improving, but change is slow. Someone once explained to me that correcting the course of science is like steering an oil tanker – any change in direction takes a while to be noticed. But when change happens, it’s worth standing still to reflect on it, and look at how far we’ve come.

https://daniellakens.blogspot.com/2020/09/p-hacking-and-optional-stopping-have.html

I’m not a huge fan of Daniel. But he’s cool enough. And his story involves smoking and oil tankers.
Willard says:

September 21, 2020 at 12:02 am

That escalated quickly:

https://twitter.com/TheSonOfGodVEVO/status/1307648883078635521
I. J. Khanewala says:

September 21, 2020 at 4:57 am

The mathematicians I know live in an approximation of the audit world.
Willard says:

September 21, 2020 at 2:38 pm

Indeed:
I. J. Khanewala says:

September 21, 2020 at 4:48 pm

I’d forgotten the cloud. Thanks for the reminder
Pingback: Berna’s Boat | …and Then There's Physics
