Sunday, July 08, 2018

How I think file-system tagging should work

The idea of file-system tags makes intuitive sense.  You might have files of different types -- documents, text files and images -- stored in dispersed places in the directory hierarchy, that may all share a common characteristic, such as all being recipes. The directories containing those files might also contain a whole lot of non-recipe-related files (e.g. one of those directories might be for your website and contain a lot of files that aren't recipes).  If you can tag all the recipe files with a 'recipe' tag then you can get a listing of all those files, and only those files. Or, as another example, you can restrict a search to be only within that collection of files.

Directory structure provides the primary basis for organising, and thus finding and viewing, collections of files. Tagging provides another layer on top of this, where files that may be unrelated in terms of directory structure can be related by the tags they share.

Nowadays, mainstream operating systems like Windows and macOS include some sort of file-system tagging.  (I should note that I only have direct experience with tagging on macOS, and have only read about how it works on Windows, so hopefully my depiction of the latter is accurate.)  While I think tagging is a good idea in principle, I think that current implementations of the idea leave room for improvement.

There are two common properties of tagging systems that I think can be improved:

1. the tag namespace is flat
2. tags are global

I think it would work better if:

1. tag namespaces were hierarchical
2. tags were private to particular directory subtrees


In short, hierarchy is an important tool for organising information, and that's what these two suggestions are about.

When the tag namespace is flat, there's a single tag namespace and all the tags sit at the same level within it.  You might have a set of tags associated with recipes, such as dish type (entree, main, dessert) and cuisine type (italian, chinese, mexican, etc). But if you also have sets of tags for topics unrelated to recipes, the overall collection soon becomes unwieldy: to find a recipe-related tag in the list, you have to pick it out from amongst all the other tags.

Being able to partition the tag namespace in a hierarchical fashion would help with this.

At this point we can note a trade-off.  In some sense, the whole point of tagging is to escape from hierarchy, to be able to specify categories that cut across the hierarchical structure of the file-system.  A tag like italian could be applied to recipes, but also to songs and images of paintings. Applying a hierarchical organisation to tags goes against this. But it doesn't go totally against it. Even with hierarchical organisation of tags, any tag could still be applied to a file anywhere in the file-system. I believe that a flat global namespace for tags is too unwieldy, so I think the trade-off is a reasonable one.

One way to do this would be for there to be a global tag namespace, which is then hierarchically structured.  On the top level there might be tags like recipe and work. Under recipe we could have tags like those listed above: entree, main, dessert, italian, chinese, mexican, etc. Under work there'd be various work-related tags, and so on.
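As a rough illustration of what a hierarchically structured namespace could look like, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `TagIndex` class, the use of `/` to separate tag levels, and the file paths are all hypothetical, and no existing operating system exposes tags this way as far as I know.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical in-memory index: full tag path -> set of file paths."""

    def __init__(self):
        self._files_by_tag = defaultdict(set)

    def tag(self, file_path, tag_path):
        """Attach a hierarchical tag such as 'recipe/italian' to a file."""
        self._files_by_tag[tag_path].add(file_path)

    def children(self, parent=""):
        """Return the tag names directly under `parent` ('' is the top level)."""
        prefix = parent + "/" if parent else ""
        names = set()
        for tag_path in self._files_by_tag:
            if tag_path.startswith(prefix):
                rest = tag_path[len(prefix):]
                if rest:
                    names.add(rest.split("/")[0])
        return sorted(names)

    def files(self, tag_path):
        """Return files tagged with `tag_path` or with any tag beneath it."""
        prefix = tag_path + "/"
        found = set()
        for existing, paths in self._files_by_tag.items():
            if existing == tag_path or existing.startswith(prefix):
                found |= paths
        return sorted(found)


# Illustrative data only; these paths are made up.
index = TagIndex()
index.tag("~/website/carbonara.html", "recipe/italian")
index.tag("~/docs/tiramisu.txt", "recipe/dessert")
index.tag("~/docs/tiramisu.txt", "recipe/italian")
index.tag("~/docs/budget.xlsx", "work/finance")

print(index.children())          # top-level tags
print(index.children("recipe"))  # tags under recipe
print(index.files("recipe"))     # every file tagged with anything under recipe
```

When looking for a recipe-related tag, you would only have to browse the entries under recipe rather than the whole list, while a query on recipe itself still returns every recipe file wherever it lives in the directory structure.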

Just as I've argued that a flat namespace is too unwieldy, I think a global namespace is too unwieldy as well.  My solution for the flat namespace was to make it hierarchical, and similarly my solution for the global namespace is a kind of hierarchical partitioning of it.

I think it would be better to associate tag hierarchies with particular directories (or more specifically, with the directory subtree rooted at that directory), so that the namespace for those tags exists only within that subtree.  Outside of that subtree, those tags do not exist.

Despite tags allowing us to cut across hierarchical structure, we may want to only apply tags to files within a particular directory subtree.  Perhaps some tags are only for music files, which are only under the ~/music directory, or photos, which are only under the ~/photos directory. Or if we need the tags to be a bit more general we could put them in ~, or if we do want some to be even more general we could put them in / or C:\.
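To make the scoping rule concrete, here is a small Python sketch of a tag set that only exists within one directory subtree. The `ScopedTags` class and the absolute paths (standing in for ~/music) are hypothetical, chosen just for this illustration.

```python
from pathlib import PurePosixPath


class ScopedTags:
    """A set of tags that exists only within the subtree rooted at `root`."""

    def __init__(self, root, tags):
        self.root = PurePosixPath(root)
        self.tags = set(tags)
        self.assignments = {}  # file path -> set of tags applied to it

    def in_scope(self, file_path):
        """True if `file_path` is the root itself or lies somewhere beneath it."""
        path = PurePosixPath(file_path)
        return path == self.root or self.root in path.parents

    def apply(self, file_path, tag):
        """Tag a file, refusing unknown tags and files outside the subtree."""
        if tag not in self.tags:
            raise KeyError(f"no tag {tag!r} defined under {self.root}")
        if not self.in_scope(file_path):
            raise ValueError(f"{file_path} is outside {self.root}")
        key = str(PurePosixPath(file_path))
        self.assignments.setdefault(key, set()).add(tag)


# Hypothetical example: music tags live only under /home/me/music.
music = ScopedTags("/home/me/music", {"jazz", "live"})
music.apply("/home/me/music/miles/so_what.flac", "jazz")

try:
    music.apply("/home/me/photos/gig.jpg", "live")
except ValueError as err:
    print(err)  # the tag does not exist outside the music subtree
```

Making a tag set more general is then just a matter of choosing a root higher up, such as the home directory or the file-system root.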

In GUI file managers like Finder, the sidebar could show only those tags that are applicable within the current directory.
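Here is one way such a sidebar listing might be computed, as a Python sketch. The `TAG_ROOTS` registry and its contents are made up for the example, and I'm assuming that "applicable" means the tags defined at the current directory or at any of its ancestors; a real design might also want to surface tags defined further down the tree.

```python
from pathlib import PurePosixPath

# Hypothetical registry: directory that owns a tag set -> the tags defined there.
TAG_ROOTS = {
    "/": ["archive"],
    "/home/me": ["urgent", "work"],
    "/home/me/music": ["jazz", "live"],
    "/home/me/photos": ["dog", "holiday"],
}


def visible_tags(current_dir):
    """Collect tags from current_dir and each of its ancestors, nearest first."""
    path = PurePosixPath(current_dir)
    visible = []
    for directory in [path, *path.parents]:
        visible.extend(TAG_ROOTS.get(str(directory), []))
    return visible


print(visible_tags("/home/me/music/miles"))
print(visible_tags("/home/me/photos"))
print(visible_tags("/tmp"))
```

Browsing the music folder would list the music tags plus the more general ones, but none of the photo tags.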

Such a scheme would limit the ability of tags to cut across hierarchy, but I think it is a reasonable trade-off.

Tags are ultimately a kind of metadata about the information in the directory structure, so it makes some sense to organise the tags within the directory structure, associating them to some extent with the information they're about.  A global namespace does not allow specifying any details of how the tags relate to the organisation of the information they're about.

Associating tag hierarchies with directory subtrees would make it more explicit what the intended use of the tags is - that is, what sort of information they were intended to classify. For example, it would make clear that dog is meant for pictures of dogs, not for any file in some way related to dogs. If we start using tags in a more haphazard fashion, the entropy in tag use increases over time.

One way to think of this is that what we're looking to organise is not just, on the one hand, the directories and files, and on the other, the tags, as if these were separate concerns.  We're looking to organise the overall structure including both the directories-and-files and the tags, and how they relate.