Tuesday, March 23, 2010

Letter to Harriet Harman about the Digital Economy Bill

Dear Harriet Harman,

I am writing to express my concern about the Digital Economy Bill.

I am worried by reports that this Bill will be rushed through the Commons without the scrutiny it deserves. I do not believe that the allegations about undue influence from commercial lobbyists can yet be refuted, since clauses drafted by such lobbyists remain in place in the Bill as it stands. Such unquestioning acceptance of lobbyists' material is disappointing, and at a time when politicians are under such grave suspicion of corruption it does nothing to restore confidence.

The current provisions of the Bill (especially clauses 11-18) also threaten to have a severe negative effect on the freedoms we currently have in the UK to use public internet access and public file transfer services, while also threatening users with disconnection following allegations of copyright infringement: both these measures represent a massively disproportionate response to the offences purported to occur.

It should not be possible for a conviction for infringement of copyright to result in curtailment of the defendant's access to something as fundamental as the Internet now is. That is like switching off the water supply to someone who has not paid their TV licence. Copyright is extremely important but this is not the way to encourage society to understand and respect it.

It is very difficult to prove accurately who is responsible for a copyright-infringing file transfer, and this might lead to the most vulnerable users - those who do not understand the technology - being exploited by unscrupulous individuals who use their networks to perform such infringing file transfers. The passing of the Bill will certainly drive determined individuals to make more concerted efforts to cover their tracks, and it seems likely that innocent people will suffer, at the very least, the pain of being unfairly charged and having to fight their case, even if they are not eventually convicted.

In addition, disconnection seems to restrict fundamental human rights, such as freedom of expression. Such rights cannot be infringed by a democratic government without exceptional reasons. The infringement of copyright has not attracted such draconian penalties since the days of the Court of Star Chamber - a long-past era of oppression in which freedoms were not valued as we value them now.

Please take note of these concerns, which I share with so many others, and resist the call to rush through this inappropriate Bill.

Yours,
Ben Weiner


Sunday, March 14, 2010

Firefox goes for Meta. Is there nothing better?

Good to see the Firefox 3.6 ‘what’s new’ page is swathed in styles that use Erik Spiekermann’s font Meta. But a shame that Mozilla or its design team couldn’t find an OFL-licensed font worthy of their use.

A better OFLB online today might have helped. The design and coding are done; we await implementation.


Tuesday, March 09, 2010

Screen scraping joy

Been doing a touch of screen scraping, scripting with Ruby, against a target that was ‘unwilling’. A few observations:

  • Using Mechanize (available in various forms for Perl, Python and Ruby [homepage for the latter]) is a must. I started with the Ruby HTTP library, then went to Curb (Ruby’s bindings for libcurl), but having the pages you retrieve abstracted into an object that you can manipulate in familiar terms (like, say, page.forms_with :name => "choose_colour") helps you concentrate on the peculiarities of your task
  • Replicating the path of a real user is important. Session variables at the server end may mean that jumping between items a regular user cannot navigate between directly generates error pages, but see below
  • Don’t count on friendly HTTP errors from the server, as it might not know it has done anything wrong
  • If the page output looks OK but you cannot parse it, check for funny characters hidden in the HTML. I found ASCII nulls dotted about; these are initially hard to spot for somewhat obvious reasons. Browsers can deal with this kind of dodginess but XML parsers, as @fidothe reminds me, must ignore the elements in which such characters occur. I was able to do this to get around the problem:

    require 'mechanize'

    @agent = Mechanize.new
    class << @agent
      alias :orig_get :get
      alias :orig_fetch_page :fetch_page

      # Remove the chaff characters (ASCII nulls) from pages retrieved with get
      def get(options, parameters = [], referer = nil)
        page = orig_get(options, parameters, referer)
        page.body = page.body.gsub(/\x00/, "")
        page
      end

      # Same clean-up for pages retrieved through fetch_page
      def fetch_page(params)
        page = orig_fetch_page(params)
        page.body = page.body.gsub(/\x00/, "")
        page
      end
    end

    /\x00/ matches an ASCII null. I was originally able to select and paste the literal character from an HTML dump with both vim and a GUI text editor, but it tends to be less than visible in the wild and YMMV.

  • Assume that what you’re doing is an unwelcome task. If the points above don’t give you that impression, other curiosities probably will.
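
To check whether a page dump has the hidden-character problem described above before blaming the parser, something like the following rough sketch may help. It uses only core Ruby, not Mechanize; the names CHAFF, find_chaff and strip_chaff are made up for illustration, and the set of characters treated as suspicious (control characters other than tab, newline and carriage return) is my assumption, since the nulls were the only ones I actually met.

```ruby
# Hypothetical helpers (not part of Mechanize): locate and strip control
# characters that browsers tolerate but XML parsers object to.

# Assumed "suspicious" set: control characters except tab, LF and CR.
CHAFF = /[\x00-\x08\x0B\x0C\x0E-\x1F]/

# Return [character offset, code point] pairs for each suspicious character.
def find_chaff(html)
  html.each_char.with_index
      .select { |char, _index| char.match?(CHAFF) }
      .map { |char, index| [index, char.ord] }
end

# Return a copy of the markup with the suspicious characters removed.
def strip_chaff(html)
  html.gsub(CHAFF, "")
end

dirty = "<p>col\x00our</p>\x00"
p find_chaff(dirty)      # offsets and code points of the hidden characters
puts strip_chaff(dirty)  # the markup with them removed
```

Printing the offsets first makes it easier to confirm that the chaff really is there, which is handy when your editor refuses to show it.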


TCO