De-indexing, unexpectedly solved

Contrary to our previous expectations, PIPEDA already guarantees a person the right to de-index material which is inaccurate, outdated or misleadingly incomplete, even if the publisher has a legally credible reason for not taking down the original document.

We failed to see this because we all looked at the very public ongoing suits involving Google, rather than at the publishers’ own control over indexing.

We can act now to have the publisher de-index inaccurate, outdated and incomplete material in Canada and in other rule-of-law countries. When we are dealing with scofflaw countries or publishers, we can fall back to the weaker remedy of de-indexing by intermediaries like Google.

Introduction

The existing authority of PIPEDA affords a great degree of protection to privacy and, indirectly, to reputation, with less risk to free expression than was expected.

This is because the community and the commission have been addressing the problem on the assumption that search engines are the primary decision makers as to whether a page should be indexed. In fact, that’s not the case. The choice to index a page or not is first made by the page’s publisher, and is honored by Google et al., independent of any decisions Google might itself make to de-index a page.

This was recognized by counsel for Mario Costeja González in the Google Spain case, who attempted to have the newspaper La Vanguardia “use certain tools made available by search engines in order to protect the data”. [Spain]

The fact that tools of this sort are available, and routinely used by publishers like CanLII, provides context to the guidance the Supreme Court has provided us in the de-indexing case of Google v. Equustek [Equustek], speaks to how the Privacy Commissioner can act immediately, and suggests how Parliament can address the larger problem.

Controlling whether pages are indexed

In discussions to this point, the commentators have spoken as if Google and the other search engines were the only ones deciding what is not indexed.

In fact, what not to index was one of the first problems encountered by early search engines, and problems caused by malfunctioning search engine robots were some of the first problems suffered by early publishers of web pages.

The publisher needed to indicate which files they did not wish to have indexed, and the search engine robots needed to know which files would cause failures. For both parties, a “robots” file was the mechanism of choice. Whatever the publisher said to “Disallow”, the search engine’s robot would not try to index. That prevented errors such as

  • indexing an unannounced site [Koster],

  • trying to index parts of the site which would require vast resources from the server providing the site, the very problem Stross caused Koster [Stross], or

  • the robot becoming lost in an infinite loop, causing the failure of the entire indexing process.

The robots file does not prevent individuals from visiting the site, but search engines honor it out of necessity, for self-preservation. That means that a publisher can exclude particular pages from all of the major search engines with considerable reliability.

That, in turn, means that a publisher can remove particular documents from the set indexed by search engines, but need not remove them from the site. This is particularly applicable to organizations such as the Internet Archive, which wish to preserve old and sometimes obsolete material, but not have it indexed by a search engine as if it were new.
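As a rough illustration of the publisher's side of this, the sketch below appends Disallow rules for particular documents to an existing robots file and then checks the result the way a well-behaved robot would, using Python's standard urllib.robotparser module. The helper disallow_documents, the example.org host and the document paths are hypothetical, chosen only for illustration, and the sketch assumes the simplest case of a file with a single "User-agent: *" group.

```python
from urllib.robotparser import RobotFileParser


def disallow_documents(robots_txt: str, paths: list[str]) -> str:
    """Return robots_txt with a Disallow line appended for each path.

    Hypothetical helper: assumes the file holds a single
    'User-agent: *' group, so appended rules apply to every robot.
    """
    lines = robots_txt.rstrip("\n").split("\n")
    for path in paths:
        rule = f"Disallow: {path}"
        if rule not in lines:  # avoid adding the same rule twice
            lines.append(rule)
    return "\n".join(lines) + "\n"


# Hypothetical starting file and document path, for illustration only.
existing = "User-agent: *\nDisallow: /search\n"
updated = disallow_documents(existing, ["/archive/1998/old-notice.html"])

# Check the result as a robot that honors the file would.
parser = RobotFileParser()
parser.parse(updated.splitlines())
old_notice_ok = parser.can_fetch(
    "Googlebot", "https://example.org/archive/1998/old-notice.html"
)
other_page_ok = parser.can_fetch(
    "Googlebot", "https://example.org/archive/1998/other.html"
)
print(old_notice_ok, other_page_ok)  # False True
```

The document itself is untouched: the only change is one line in the robots file, and a human visitor who already has the address can still read the page.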

This mechanism is actively used by privacy-aware sites such as the Canadian Legal Information Institute, CanLII. For example, CanLII’s robots.txt file contains:

User-agent: Googlebot
Disallow: /en/search
Disallow: /fr/search
Disallow: /search
...
Disallow: /en/nl/nlla/
Disallow: /fr/nl/nlla/
Disallow: /en/nt/ntro/
...

The first few lines are for Google’s convenience, preventing wasted effort spent indexing the CanLII search pages, but lines like /en/nl/nlla/ are different. “nlla” is the collection of Labour Arbitration Awards for Newfoundland and Labrador, which is not intended to be indexed.

Sure enough, Resource Development Trades Council v Muskrat Falls Employers’ Association Inc., 2017 CanLII 91589 (NL LA) is not indexed by Google.
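The effect of the excerpt can be checked with the same standard parser a simple robot might use. The sketch below is only an approximation: it contains just the lines quoted above (the elided rules are left out), the document paths are hypothetical, and Google's own robot may interpret some rules differently from Python's urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser

# Only the lines quoted above; the elided ("...") rules are omitted.
CANLII_EXCERPT = """\
User-agent: Googlebot
Disallow: /en/search
Disallow: /fr/search
Disallow: /search
Disallow: /en/nl/nlla/
Disallow: /fr/nl/nlla/
Disallow: /en/nt/ntro/
"""


def googlebot_may_index(path: str) -> bool:
    """Return True if the excerpt lets Googlebot fetch the given path."""
    parser = RobotFileParser()
    parser.parse(CANLII_EXCERPT.splitlines())
    # Only the path is compared against the rules; the host is ignored.
    return parser.can_fetch("Googlebot", "https://www.canlii.org" + path)


# Hypothetical document paths, for illustration only.
print(googlebot_may_index("/en/nl/nlla/doc/example.html"))  # False
print(googlebot_may_index("/en/ca/scc/doc/example.html"))   # True
```

Anything under /en/nl/nlla/ is refused to the robot, while a path under a collection that is not listed remains open to indexing.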

The problem initially raised by Google Spain

In the Google Spain case, the European Court of Justice ruled that Google could be ordered to remove links to a newspaper article about Mario Costeja González’s former social security debts.

Mr. Costeja González had previously tried to get the newspaper to remove or restrict the pages, and was strongly resisted. He then turned to Google, as search engines have a reputation as an “easy target”, being less inclined to defend newspapers than the newspapers themselves would be.

Google, however, strongly resisted this particular initiative. If they were to be treated as a publisher, they would be liable for everything they linked to. They are very aware that they would be out of business without the US’s “safe harbor” and without ensuring they are not seen as a publisher.

The courts and legislators are well aware of this concern, and sought a way to allow search engines to survive while at the same time protecting the public. The EU, in an elegant bit of pilpul, concluded Google was a sort of publisher, a “data controller”, and therefore open to being ordered to de-index the pages. However, they did not declare Google a publisher in the established sense of the term and therefore did not hold it responsible for the content of the pages it linked to.

What we have learned from existing jurisprudence in Canada

The courts will order take-downs under PIPEDA, including internationally. In the case of globe24h.com, the Federal Court ordered a Romanian site that was republishing Canadian court decisions to take them down. The court specifically denied the site the “journalistic purpose” exception of paragraph 4(2)(c) of PIPEDA. [Globe24h]

At the same time, the courts strongly protect legitimate journalists. In a recent case, the Supreme Court overturned an interlocutory injunction, sought alongside a contempt citation, that would have required the CBC to remove archival copies of information about a case for which a publication ban was later ordered. [CBC]

Finally, look carefully at the process in Google v. Equustek. At Google’s request, Equustek had first obtained a court order prohibiting the purveyor of the stolen goods from doing business worldwide. Google then removed links to the offending information from Google.ca. The legal debate was about whether it should also take down links worldwide, notably in the US. [Equustek]

Positive Results

We are in a substantially better state than I had expected a few weeks ago. Under PIPEDA, anyone can require that inaccurate, incomplete or obsolete information about them be taken down. In cases like CanLII, the Internet Archive, newspapers, or legitimately reluctant commentators, the pages can be de-indexed instead, by being marked “Disallowed from indexing” by the publisher. When either happens, the information disappears from search.

If the courts, or a commission with order-making powers, order a page taken down and the site refuses, Google has demonstrated its willingness to remove links from at least Google.ca. Whether Google will remove links globally is still a question before the courts here and in the EU, but it has indicated a positive willingness to de-index sites that have been found to be acting contrary to national law in Canada.

Conclusions

The major thing the commission can do right now is encourage people to require take-downs and, where there are reasons not to take a page down, require that the page in question be blocked from indexing, using the authority PIPEDA already provides.

It is, in my considered opinion, clearly desirable that the commissioner be given authority to issue orders to this effect, rather than having to apply to the courts.

Nevertheless, PIPEDA and the Supreme Court of Canada have already set an example and a standard for the rest of the world to follow.

Respectfully submitted,

David Collier-Brown


[CBC] R. v. Canadian Broadcasting Corp., 2018 SCC 5 (CanLII), retrieved on 2018-03-20

[Equustek] Google Inc. v. Equustek Solutions Inc., [2017] 1 SCR 824, 2017 SCC 34 (CanLII), <http://canlii.ca/t/h4jg2>, retrieved on 2018-03-20

[Globe24h] A.T. v. Globe24h.com, 2017 FC 114 (CanLII), retrieved on 2018-03-20

[Koster] A Method for Web Robots Control http://www.robotstxt.org/norobots-rfc.txt

[Spain] European Court of Justice in Google Spain SL, Google Inc. v Agencia Española de Protección de Datos, Mario Costeja González, C-131/12 [2014], CURIA. http://curia.europa.eu/juris/document/document.jsf?docid=152065&mode=req&pageIndex=1&dir=&occ=first&part=1&text=&doclang=EN&cid=34297#annotations:q5UroCNFEeiMnk8eapaOTQ

[Stross] How I got here in the end http://www.antipope.org/charlie/blog-static/2009/06/how_i_got_here_in_the_end_part_3.html The story of Charles Stross breaking Martijn Koster’s web site, and Koster’s invention of the robots.txt file.
