Most recent Thoughts and Writings

The Sign Up With Google Mistake You Can't Fix

Tue, 1 Mar 2016 18:30:00 GMT (updated)

This blog post has been revised to more accurately reflect some details. Thanks to Fleep CEO Henn Ruukel for reaching out.

Today – by regrettable oversight – I exported my recent e-mail history to Fleep, a collaboration platform used by a new client, and let their software synchronize future e-mails. I thought I was just signing up, giving up my basic info.

The mistake was all mine. I did not notice that the authentication screen was requesting permission to allow Fleep to "manage my e-mail". But I did realize the consequences moments later, when I saw that many of my contacts and my recent e-mail history had been pulled into their system.

It turns out that Fleep actually pulls down only the 200 most recent e-mails. But this wasn't apparent to me at all, because I wasn't expecting any of my e-mail to be imported. As far as I could tell at the time, I had unwittingly allowed an app to download my entire personal correspondence, spanning about a decade.

While I do blame Fleep for using what I perceive to be a euphemism ("connect with Gmail") and for not making it crystal clear that you're not just signing up for a messenger app but fully integrating your e-mail account, the bigger problem lies with how Google makes this possible:

  1. At all
  2. Without asking for your password

It's just too easy to give away your personal information on the internet and this needs to be fixed.

We have a similar problem with apps on mobile devices that ask for permission to access all photos in order for you to be able to select just one. I think for the most part you can trust apps to do the right thing – but the way it's currently set up, there is no transparency in what these companies do with the data you have granted them access to.

Legally, I think the EU Data Protection Directive has me covered, but once you have handed over your data to an internet company, it's really out of your control.

Big internet companies, please take privacy seriously and help your users understand the consequences of their actions.


A Skip Dict for CPython

Fri, 26 Sep 2014 11:00:00 GMT

I have released an implementation of a skip dict data structure for CPython. It's written in C and works on both Python 2.7+ and Python 3.3+.

It uses a skip list combined with a mapping from keys to values, a design inspired by the sorted set data structure found in Redis.
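To illustrate the general design (not the released C code), here is a minimal pure-Python sketch under stated assumptions: a dict gives fast lookup by key, while a skip list keeps (value, key) pairs ordered by value, similar in spirit to how Redis pairs a hash table with a skip list for sorted sets. The names `SkipDict`, `items` and `value_range` are hypothetical, and values (and keys, for ties) are assumed to be mutually comparable.

```python
import random


class _Node:
    __slots__ = ("value", "key", "forward")

    def __init__(self, value, key, level):
        self.value = value
        self.key = key
        self.forward = [None] * level  # next node on each level


class SkipDict:
    """Hypothetical sketch: dict for key lookup + skip list ordered by value."""

    MAX_LEVEL = 16
    P = 0.5  # probability of promoting a node one level up

    def __init__(self):
        self._map = {}  # key -> value
        self._head = _Node(None, None, self.MAX_LEVEL)  # sentinel node
        self._level = 1  # number of levels currently in use

    def _random_level(self):
        level = 1
        while level < self.MAX_LEVEL and random.random() < self.P:
            level += 1
        return level

    def _predecessors(self, value, key):
        # On each level, find the last node ordered before (value, key).
        update = [self._head] * self.MAX_LEVEL
        node = self._head
        for i in range(self._level - 1, -1, -1):
            while node.forward[i] is not None and (
                (node.forward[i].value, node.forward[i].key) < (value, key)
            ):
                node = node.forward[i]
            update[i] = node
        return update

    def _unlink(self, value, key):
        update = self._predecessors(value, key)
        target = update[0].forward[0]  # the node holding (value, key)
        for i in range(self._level):
            if update[i].forward[i] is target:
                update[i].forward[i] = target.forward[i]
        while self._level > 1 and self._head.forward[self._level - 1] is None:
            self._level -= 1  # drop levels that became empty

    def __setitem__(self, key, value):
        if key in self._map:
            self._unlink(self._map[key], key)  # remove the old position
        self._map[key] = value
        level = self._random_level()
        self._level = max(self._level, level)
        update = self._predecessors(value, key)
        node = _Node(value, key, level)
        for i in range(level):
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node

    def __getitem__(self, key):
        return self._map[key]  # plain dict lookup, no skip list walk

    def __delitem__(self, key):
        value = self._map.pop(key)  # raises KeyError if missing
        self._unlink(value, key)

    def __len__(self):
        return len(self._map)

    def items(self):
        # Level 0 links every node in ascending (value, key) order.
        node = self._head.forward[0]
        while node is not None:
            yield node.key, node.value
            node = node.forward[0]

    def value_range(self, lo, hi):
        # Descend from the top level to skip past values below lo.
        node = self._head
        for i in range(self._level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].value < lo:
                node = node.forward[i]
        node = node.forward[0]
        while node is not None and node.value <= hi:
            yield node.key, node.value
            node = node.forward[0]
```

For example, after `d['a'] = 3; d['b'] = 1; d['c'] = 2`, `list(d.items())` yields `[('b', 1), ('c', 2), ('a', 3)]`. The dict answers "what is the value of this key?", and the skip list answers "which keys fall in this value range?" in expected logarithmic time to find the start of the range.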

Python's Missing String Type

Thu, 17 Apr 2014 17:35:00 GMT

When Python 3.0 came out in late 2008, it was expected that the eventual wide adoption of the 3.x series would take roughly five years.

And on some Linux systems today, it's even the default interpreter.

$ python
Python 3.5.2 (default, Jun 28 2016, 08:46:01)
>>>

Yet, I don't know anyone who actually uses Python 3 for application development. I think there are two primary reasons for this:

  1. The advantages are few.
  2. The disadvantages are many.

The most controversial change in Python 3 was that the string type changed from an 8-bit raw byte string to a Unicode-based string type. This makes sense: the string type is meant for human-readable text, and Unicode can represent any text.

Unfortunately, it broke almost every existing library. But it also missed the mark.

In Python 2 we have str and unicode. In Python 3 we have str and bytes. But there's a design that allows us to combine the functionality of both in a single type.
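For contrast, this is how Python 3 keeps the two types apart today. The sketch uses only standard Python 3 behavior: text and bytes cannot be mixed implicitly, and every crossing needs an explicit `encode()` or `decode()`.

```python
# Python 3 keeps text (str) and binary data (bytes) as separate types.
text = "caf\u00e9"           # str: 4 Unicode code points ("café")
data = text.encode("utf-8")  # bytes: b'caf\xc3\xa9'

print(len(text))  # 4
print(len(data))  # 5, because U+00E9 takes two bytes in UTF-8

try:
    text + data  # implicit mixing is not allowed
except TypeError as exc:
    print("TypeError:", exc)

# Crossing the boundary always takes an explicit decode() or encode().
combined = text + data.decode("utf-8")
print(combined == "caf\u00e9caf\u00e9")  # True
```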

Ropes

We can use a rope-like data structure in which each leaf is a sequence of bytes with an associated encoding such as UTF-8 (see also the 1994 paper by Boehm, Atkinson and Plass, "Ropes: an Alternative to Strings").

[Figure: rope data structure]

We can add any two str instances together, regardless of encoding, and use all of the common string methods and operators such as len and split. In all cases, the methods would respect the encodings of the various segments.

To "flatten" a rope, we encode it:

>>> string.encode('utf-8')

This is typically necessary only for I/O or use with external libraries.
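A minimal pure-Python sketch of the idea, assuming each leaf stores raw bytes plus the name of their encoding, and that no character is split across two leaves. `Rope` and its methods are hypothetical, not a real Python type. For brevity, `len` decodes leaf by leaf, and `split` simply decodes everything first instead of working segment by segment.

```python
class Rope:
    """Hypothetical encoding-aware rope. Each leaf keeps raw bytes together
    with the name of their encoding; concatenation never re-encodes."""

    def __init__(self, data=b"", encoding="utf-8"):
        self.data = data
        self.encoding = encoding
        self.children = ()  # () for a leaf, (left, right) for a concatenation

    def __add__(self, other):
        # O(1) concatenation: build a parent node, copy no bytes.
        node = Rope()
        node.children = (self, other)
        return node

    def leaves(self):
        # Yield the leaves from left to right.
        if not self.children:
            yield self
        else:
            for child in self.children:
                yield from child.leaves()

    def __str__(self):
        # Decode each leaf with its own encoding.
        return "".join(leaf.data.decode(leaf.encoding) for leaf in self.leaves())

    def __len__(self):
        # Length in characters, not bytes, summed per leaf.
        return sum(len(leaf.data.decode(leaf.encoding)) for leaf in self.leaves())

    def split(self, sep=None):
        # Simplification: decode everything, then reuse str.split.
        return str(self).split(sep)

    def encode(self, encoding):
        # "Flatten": transcode every leaf into a single target encoding.
        return b"".join(
            leaf.data.decode(leaf.encoding).encode(encoding) for leaf in self.leaves()
        )
```

For example, with `a = Rope("caf\u00e9 ".encode("latin-1"), "latin-1")` (5 bytes) and `b = Rope("na\u00efve".encode("utf-8"), "utf-8")` (6 bytes), `len(a + b)` is 10 characters and `(a + b).encode("utf-8")` returns `b'caf\xc3\xa9 na\xc3\xafve'`. A serious implementation would cache per-node character counts rather than decoding on every call.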

What about raw bytes? Easy:

>>> data = open('foo.png', 'rb').read()

And if we know that a particular substring is actually encoded:

>>> header = data[1:4].decode('utf-8')

This works because data was read from the file as raw bytes. Decoding the slice gives us a rope composed of a single segment in a Unicode-compatible encoding: UTF-8.
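The slice-and-decode step already behaves this way on Python 3's `bytes`, except that the result is a plain `str` rather than a rope. A small sketch using only standard Python 3; the `PNG_SIGNATURE` constant stands in for the contents of `foo.png` and is the 8-byte signature that begins every PNG file.

```python
# Every PNG file starts with this 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Stand-in for open('foo.png', 'rb').read(), so the example needs no file.
data = PNG_SIGNATURE

# Only bytes 1..3 are known to be text, so decode just that slice.
header = data[1:4].decode("utf-8")
print(header)  # PNG

# Decoding the whole buffer fails: 0x89 is not a valid UTF-8 start byte.
try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not UTF-8 starting at byte", exc.start)
```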

Discussion on Hacker News.