Sunday, October 30, 2011

Repetitive labour and Wikipedia

In an effort to understand the different ways in which 'editors' contribute to Wikipedia, I have been using this tool to survey the average number of edits per page of the current 726 active administrators on the English Wikipedia.  I completed this rather tedious task this morning.

The result is that there is a wide range of edits per page from a low value of 1.21 at one extreme, to a high value of 20.51 at the other.  The value distribution, as a percentage of the sample, is shown in the table at the bottom.

What does this mean? Clearly an editor with a very low edits-per-page count will be spending very little time on individual articles. The limiting value is 1, meaning that an editor never returns to a page once they have done something to it. The upper value is limited by the maximum number of edits to any single article, and will occur if one editor wrote that article entirely, with no help whatsoever, and edited nothing else.

What else can we say? My main question is whether editors with radically different edits per page contribute different value. Are the contributions of those with a high count of a higher or lower value than those of editors with a low count? Before you leap to conclusions, consider the following thought experiment. Suppose that the article on Caspar David Friedrich, which is not a bad article, and is indeed a Wikipedia 'Featured Article', had been written by about 1,700 different editors. Then (since there have been about 1,700 edits to this article) each editor would have contributed no more than one edit. The article would have grown to its present good quality entirely from the separate and probably disconnected contributions of the different editors. And then extend the thought experiment by supposing that all the Featured Articles - which are supposed to be the very best quality that Wikipedia has to offer - were written in this way.

As a limiting case, suppose there are 1,000 Featured Articles and only 1,000 editors working on them, that each editor has 1,000 edits, and that each editor has edited each article exactly once. It is theoretically possible that all Featured Articles grew to their currently 'good' state by such a process. In that case, edits per page would not be a good metric to determine whether the editor was what Wikipedians call a 'content contributor'. All editors would be 'content contributors', but they would distribute their content thinly and evenly across many different articles. This would be the 'classic crowdsourcing' that I discussed in earlier articles such as this.
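The arithmetic of the limiting case above can be checked with a few lines of Python. This is only a sketch: the function name `edits_per_page` and the idea of representing an editor's history as a plain list of page titles are my own assumptions for illustration, not how the survey tool works. The second editor, with 1,000 edits spread over 100 articles, is a made-up contrast and not taken from the sample.

```python
from collections import Counter


def edits_per_page(edit_log):
    """Average edits per distinct page for one editor.

    edit_log is a hypothetical representation: a list of page titles,
    one entry for every edit the editor has made.
    """
    if not edit_log:
        raise ValueError("editor has no edits")
    distinct_pages = Counter(edit_log)
    return len(edit_log) / len(distinct_pages)


# The thought experiment: 1,000 articles, each edited exactly once
# by this editor, giving 1,000 edits in total.
articles = [f"Featured article {i}" for i in range(1000)]
thin_editor = list(articles)
print(edits_per_page(thin_editor))  # 1.0

# Invented contrast: the same 1,000 edits concentrated on 100 articles.
focused_editor = [articles[i % 100] for i in range(1000)]
print(edits_per_page(focused_editor))  # 10.0
```

Both editors have made the same number of edits, so the metric only separates them by how thinly those edits are spread.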

But this is clearly not the case. More research is needed, but several pieces of evidence suggest that when 'value' or 'content' means the sort of quality assessed by the Wikipedia 'Featured' or 'Good' article process, it is editors with relatively high edits per page who contribute it. For example, look at the page here, which tells us who contributed to the Caspar David Friedrich article. Three editors stand out, namely Ceoil (8.43 edits per page), Modernist (8.07) and Fpenteado (9.29). Not only did these editors contribute significantly to this article, they also contributed significantly to many other articles on Wikipedia.

Another piece of evidence is the type of contribution made by those with low edits per page. For example, the lowest edits-per-page figure in my sample belonged to Andre. If you look carefully at what he is doing, he is simply adding links to articles on the Estonian Wikipedia, something which he seems to have been doing for a very long time. That doesn't mean he is not adding something of value to Wikipedia, but you couldn't build an article like the one on Caspar David Friedrich simply by adding links to the Estonian Wikipedia. Clearly not. Or consider the contribution history of 'Gaius Cornelius'. He is using what is called a 'bot' on Wikipedia, i.e. a robot or mechanised editing tool. As you see from its description here, it is a tool 'designed to make tedious and repetitive tasks quicker and easier'. Its work is mainly formatting and linking to other articles. Again, this doesn't mean he and his robot are not adding some sort of value to Wikipedia, but it's clearly not the sort of value that could build an article like the one on Caspar David Friedrich.

Now we could go further and bite that very difficult bullet: what is the economic value of the different contributions? That is, what would be the market value of the labour corresponding to the different edits per page? There are a number of considerations here, and please note I am not an economist. The first is that if the quality of articles, as measured by the Featured Article process, were the prime objective of the project, you would want to attract more 'content contributors' to increase quality. Second, given that the table below suggests that content contributors are scarcer than mechanical contributors, you would want to pay more to the content contributors. Finally, the principle that repetitive labour is easily learned, and thus less well paid than labour whose skill is difficult to acquire, would suggest paying the content contributors more, perhaps much more. This is the case in conventional encyclopedias, of course, where the bulk of the work is done by poorly paid penny-a-liners, often using custom-built databases such as Crystal, and the remaining 'flagship articles' are commissioned from skilled subject-matter experts for a premium fee.

This raises the question of why content contributors exist on Wikipedia at all, but that's a subject for another discussion, and I have rambled on enough for today.

By the way, Beyond Necessity is approaching a record number of page views this month: 3,848 views so far, compared to 3,490 last month, and looking to hit the 4,000 barrier by the end of this month. So, please feel freer than usual to click on some of the internal links here. With best wishes to all.


Edits per page    Percentage of sample
1-2               22%
2-3               44%
3-4               18%
4-5               8%
5-6               4%
6-7               2%
7-8               1%
greater than 9    1%
