No More Jargon

Friday, January 19, 2007

Why is Acknowledging a New Paradigm Hard?

When you don't grasp a paradigm (a word I hate because of how cheapened it's become, incidentally), it's hard to admit that it has benefits over the way you know or do things as it is.

Learning a new paradigm is frustrating right up until you get the click moment. There are new techniques that you have to learn in order to understand it. This takes time and accounts for at least some of the resistance encountered when learning a paradigm. But this isn't where the real difficulty emerges. After all, if SICP is to be believed, "The Trick is Learning the Trick", and the cost isn't really in adding to your knowledge, but in something else that's very tricky to grasp.

I think that the true cost is in tossing away old beliefs. There's an element of shame in this that adds insult to injury. Not only are you forced to get rid of something that you valued before, you have to admit to people that may have chided you that you were wrong. If I were Buddhist, I would now mention something about earthly pride and (intellectual) property preventing true enlightenment, but I'm not, so I've just mentioned what I would've.

What prompted these thoughts is this: I haven't found a new paradigm that I've been able to grasp recently. This worries me. Does it mean that I am too concerned with my existing knowledge to gain more? I know of a few paradigms that I would like to learn, but I cannot seem to grok them when I try.

In closing, I leave you with a really great, really creepy quote that's both somewhat relevant and irrelevant:

"Where we're going, we won't need eyes to see."

Thursday, January 18, 2007

An Observation on Creativity Using a Fancy Chart

Monday, December 18, 2006

Dear Lord, Let Us Express in Code That Which Belongs in Code

This past semester I took a class called Advanced Systems Analysis and Design and in this class we ostensibly learned all about designing and managing the creation of a technical system from start to finish. I say ostensibly because I feel like I'm worse off now, if what I'm supposed to be able to do is design and manage such a system. Maybe it's my fault though. Maybe I should've started running for the hills when I saw that the course was listed under a MGMT heading in the course catalog.

Now, to be fair, a few of the management bits in the course made sense and I think they might've conveyed valid information. And if most of the course had been filled with content like that, I'd probably be a happy camper: a little bored, mind you, since it's material I find necessary but not interesting, but not angry, as I am now.

No, friends, the bulk of this course was about designing these systems. I'm not going to bore you with all the details of what went so god awful horribly wrong. Suffice it to say, we were getting a Manager's perspective on System Design: that of a person so far removed from the technical details of a system that those details hold no importance to them, and who thus has no business submitting design specifications to an engineer. However, there is one element from the course that I must elaborate on, if only for the sheer idiocy of it all: The Base Structural Grammar.

The first thing you have to know about the Base Structural Grammar is that no one else on the Internet knows jack or shit about it. Google searches for "Base Structure Grammar" and "Base Structural Grammar" turn up one result each. This led my teammates on the project and me to conclude that the BSG (as it was called for 6 weeks before the acronym was actually elaborated upon) was either a fancy acronym for Battlestar Galactica or something that the Professor made up on his own, in isolation, and without anyone to tell him just how full of bullshit it was.

At this point, you might be thinking to yourself, "Oh, I'm sure it wasn't that bad." Well friend, let me tell you just why this was so fucking stupid. It was a formal grammar for the description of control flow within a system that lacked BRANCHING CAPACITY. Okay? Get it? An attempt to describe operations that will be implemented in a Turing Complete programming system, in a language that, itself, is NOT TURING COMPLETE.

How were different courses of execution handled, you might be asking yourself? By rewriting the entire case, changed to account for the different operations that would need to be called in that case. As a comparison, if you were trying to do this while actually programming, you would need to write up to 2^N versions of every function containing N if statements, and then dispatch to each of these versions based on the values of the arguments (the conditions on which you would have to express by magic, because there's no way to describe them in code).
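To make that blow-up concrete, here is a minimal Python sketch. It is not the BSG itself (which I can't reproduce here), and the order-processing steps and function names are made up purely for illustration. One function with two independent if statements covers every case; a notation without branching would need each combination written out separately.

```python
from itertools import product

def process_order(is_member: bool, is_rush: bool) -> list[str]:
    """With branching, a single description covers every case."""
    steps = ["receive order"]
    if is_member:
        steps.append("apply discount")
    if is_rush:
        steps.append("expedite shipping")
    steps.append("send invoice")
    return steps

def enumerate_cases() -> dict[tuple[bool, bool], list[str]]:
    """Without branching, every combination of the two conditions
    needs its own fully written-out, straight-line case."""
    return {
        flags: process_order(*flags)
        for flags in product([False, True], repeat=2)
    }

cases = enumerate_cases()
print(len(cases))  # 2 conditions -> 4 separately written cases
```

Add a third condition and the table doubles to 8 cases, a fourth makes 16, and so on.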

When he explained this, I wanted to curb stomp his fat head. If I had been in the same room, I would've become physically violent. Luckily, I guess, I was watching the lecture remotely. I think I banged my head against my desk for about 10 minutes, but my memory is a little fuzzy from then.

Why, why, why, why couldn't a programming language be used for that? Something like a pared down Scheme would be perfect for those sorts of expressions! And then the code could actually be USED (if it, in fact, worked) after the lower level operations were implemented. Heck, you could actually get a jump start on that if a half-decent testing system existed for this theoretical design language!

I dunno, the mind boggles.

Monday, December 04, 2006

Metadata: The Darker Side of Meta

This series of posts is part of a short paper I am writing for Communication Design for the WWW.

The garden of metadata is not all sunshine and roses though. There are thorns on our flowers of knowledge. Cory Doctorow, prolific Internet auteur, released long ago a general list of problems that will exist on the semantic web whenever it materializes. I will mention but a few of these that I consider the most worrying, specifically poisoned metadata and a lack of investment.

It has been established that attention is the most valuable commodity on the Internet aside from hard currency (with the majority of concern about attention going towards converting it to said currency). In underhanded attempts to gather attention, people will make all sorts of audacious claims and use any technique to grab attention wherever they may. You need only look in your spam folder to see this reality. With the capacity to add new information about information, a less than scrupulous user may attempt to place their content in a place it does not belong. Within the context of tagging, this has been unimaginatively named "tag spam", a prime example of which can be seen here. The keywords that have been placed on this photograph have little, if anything, to do with the subject of the picture and seem to serve only promotional purposes.

Spam of this sort seems not to have taken off yet, perhaps in part because the spammers are forced to work with an application that requires user registration and thus limits the number of anonymous entries they can deploy. This low volume has left the signal to noise ratio relatively high, and thus it is not much of a problem. But when metadata moves towards non-federated creation, there will be no such guarantee of a central executor to punish those who seed poisoned data.

Apathy is what I would consider the next largest problem. Despite an inevitable trend towards a more technologically savvy population, there remains a large segment, even among younger users, that couldn't give a damn. Even though adding a tag takes but a few keystrokes, and adding geotags involves a few clicks on a map, it still does require work. Work that people just aren't interested in. For now, this is not a large concern among users since the majority do in fact care about adding metadata to their content.

Perhaps it won't be a problem though. If the visionaries are to be believed, not adding metadata will mean that not only will your content not be findable, it won't even be usable, even by yourself. Self-interested utility may be what drives metadata creation.

In many senses, this is already the case. When a user tags a link on del.icio.us they typically do so because they found the link noteworthy and would like to be able to find it again. When a Flickr user tags a photo and adds geotags, they do so because they would like to find a specific picture again, or see all the pictures they have taken someplace.

Will it fly in Peoria though? I'm not sure.

Sunday, December 03, 2006

Metadata: State of the Art

This series of posts is part of a short paper I am writing for Communication Design for the WWW.

Metadata's most modern incarnations exist in myriad forms. Of those, a few that rank highest on the cross section of hype and usefulness include tagging, geotagging, and microformats. These forms of metadata overlap in their concerns somewhat, but they are distinct in the way that they are employed by users.

Tagging is not a complicated concept and once it is understood some people dismiss it as too simple to be of serious use. Despite these doubts, tagging has proven to be an effective and easy way to add metadata to content.

Tagging is simply attaching an explicit keyword to some data. It is different from categorization in two important ways: as many tags as desired can be added, and there is no fixed vocabulary for tags. This is typically where tagging is dismissed as overly simplistic. However, when the uses of tags begin to be explored, what initially seemed like a simple system begins to gain some exciting emergent properties.

Tagging of course allows for findability, and that is how the vast majority of tag use is considered. Especially for image content, tags have been a boon to persons searching for particular subjects, photography styles, and even colors. With the additional metadata of Creative Commons licenses attached, it becomes trivial to find appropriate content from creators who are willing to share their work with you (as has been requested of me 3 times since I began using Flickr). Tag searches can use as many tags to filter results as desired, both by including and excluding terms. While this is the most popular, and most familiar, use of tags, it's probably the least interesting.
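The include/exclude filtering described above can be sketched in a few lines of Python. This is only an illustration with made-up photo names and a hypothetical `search_by_tags` helper; it is not how Flickr or any other site actually implements its search.

```python
def search_by_tags(items: dict[str, set[str]],
                   include: list[str] = (),
                   exclude: list[str] = ()) -> list[str]:
    """Return the names of items carrying every tag in `include`
    and none of the tags in `exclude`."""
    wanted, unwanted = set(include), set(exclude)
    return [
        name for name, tags in items.items()
        if wanted <= tags and not (unwanted & tags)
    ]

# Hypothetical sample data: photo name -> set of tags.
photos = {
    "sunset.jpg": {"sunset", "beach", "orange"},
    "city.jpg": {"sunset", "skyline"},
    "dog.jpg": {"dog", "beach"},
}

print(search_by_tags(photos, include=["sunset"], exclude=["skyline"]))
# prints ['sunset.jpg']
```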

Because tags are explicitly added metadata and the content systems have the capacity to track entries as they appear, it is possible, using syndication technology, to maintain a subscription to a particular keyword search and be notified of new entries as they are added. This allows a user of Technorati's blog tag search feature to track interest and discussion of particular topics across all weblogs on the internet.

Where this capacity truly becomes interesting is when tags are created for a single specific purpose, either to uniquely identify a concept that a limited subset of users are aware of or, instead of attempting to define the subject of data, to describe how, why, or for what purpose the data is to be used. Specifically, this tends to emerge around events such as conferences, festivals, and expos (see sxsw2006 for example), but other examples can be seen in meme-like activities such as 10placesofmycity or infiniteflickr. The uses for non-subject specification purposes are also fascinating: many people (the author included) have tagged things that they are interested in, but do not have the time to explore, as todo or toread. Content that a user feels would be interesting to another user will be tagged as for:username.

Of course, tagging is not without its difficulties, chief among them semantic confusion between tags that are homonyms and syntactic disparity between tags that are synonyms of each other. One solution that has begun to gain ground on this particular problem, however, is tag clustering. The basic concept is that a given tag will likely have a number of other tags it is commonly seen with. Groups of tags, or clusters, tend to emerge that are linked to a particular tag but are syntactically and semantically distinct.
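The raw material for clustering is just co-occurrence counting. Here is a minimal Python sketch of that first step only, with invented sample data and a hypothetical `cooccurring_tags` function; actually grouping the counts into clusters takes more work than this, and I make no claim that any real site does it this way.

```python
from collections import Counter

def cooccurring_tags(tagged_items: list[list[str]], tag: str) -> Counter:
    """Count how often every other tag appears alongside `tag`."""
    counts = Counter()
    for tags in tagged_items:
        if tag in tags:
            counts.update(t for t in tags if t != tag)
    return counts

# Hypothetical sample data: "apple" is a homonym (fruit vs. computer).
items = [
    ["apple", "fruit", "pie"],
    ["apple", "fruit", "orchard"],
    ["apple", "mac", "computer"],
    ["apple", "fruit"],
    ["banana", "fruit"],
]

related = cooccurring_tags(items, "apple")
print(related.most_common(1))  # prints [('fruit', 3)]
```

With enough data, the tags that travel with "apple" would split into a food group and a computer group, which is what lets a system tell the two senses apart.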

Geotagging is a similar, but distinct concept from tagging, as can be inferred by the name. It still involves the addition of specific keywords to a piece of content, but those keywords are very strict and have a direct correlation to a physical location on the globe, either as a recognizable location name or in latitude and longitude coordinates. The practice of geotagging probably emerged from the pseudo-sport Geocaching, but it has a wider appeal in its use.

Geotagging first began to emerge when the Google Maps API was hacked and people began producing mashups against existing databases that had locational information. One of the earliest and most striking of these mashups was Chicago Crime, which culled information from police reports and showed incidents against a map of the city. Another, less serious, example is overplot, a mashup of Overheard in New York (where all entries include a street address).

Flickr also began seeing use of geotagging on photos with an informally specified set of tags "geo:lat=xx.xxxx", "geo:lon=xx.xxxx" and "geotagged" which could be pulled from Flickr's databases using their tag access APIs. These tags were collected on external websites and allowed visitors to see tags within a specific geographic area, as well as determine precisely where a picture was taken. Flickr has since added a built in geotagging tool.
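Pulling coordinates back out of tags in that informal format is straightforward. The following Python sketch assumes the tags arrive as a plain list of strings and that a photo only counts as geotagged when the "geotagged" marker tag is present; both are my assumptions, and `parse_geotags` is a hypothetical helper, not part of any Flickr API.

```python
from typing import Optional

def parse_geotags(tags: list[str]) -> Optional[tuple[float, float]]:
    """Return (latitude, longitude) from geo:lat= / geo:lon= tags,
    or None if the tags are missing or malformed."""
    if "geotagged" not in tags:  # assumed to be a required marker
        return None
    coords = {}
    for tag in tags:
        for key in ("lat", "lon"):
            prefix = f"geo:{key}="
            if tag.startswith(prefix):
                try:
                    coords[key] = float(tag[len(prefix):])
                except ValueError:
                    return None  # e.g. "geo:lat=north"
    if "lat" in coords and "lon" in coords:
        return coords["lat"], coords["lon"]
    return None

# Made-up example tags.
print(parse_geotags(["geotagged", "geo:lat=42.7284", "geo:lon=-73.6918", "bridge"]))
# prints (42.7284, -73.6918)
```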

Of particular interest with regards to geotagging is the automatic creation of geolocational information by devices that are involved in the creation process themselves. At least one camera has supported an integrated GPS recording feature and there is a system available to add the capacity to dSLRs.

Of course, neither tagging nor geotagging addresses concerns of how automated agents will go about actually using this metadata. While the applications where all these data are being stored allow programmatic access through APIs, designing an agent that would be capable of accessing and understanding all those APIs would be nigh impossible. That is where microformats step up. Microformats are an especially simple conception; they don't even attempt to address new types of metadata. A microformat is simply a specifically structured, valid XHTML fragment that conforms to one of the predefined (micro)formats.

As an example, let's look at the hCard microformat, which corresponds to the vCard contact format that has gained popularity amongst communication applications. Here is a simple hCard:


<div class="vcard">
<a class="url fn" href="http://nomorejargone.blogspot.com/">Daniel Nugent</a>
<a class="email" href="mailto:nugend@fakemail.com">nugend@fakemail.com</a>
<div class="adr">
<div class="street-address">999 Madeup Street</div>
<span class="locality">Springfield</span>,
<span class="region">NY</span>,
<span class="postal-code">00001</span>
<span class="country-name">USA</span>
</div>
<div class="tel">518-867-5309</div>
</div>

and how the hCard appears without escaping the HTML tags:

Daniel Nugent
nugend@fakemail.com

999 Madeup Street

Springfield, NY, 00001
USA

518-867-5309

Not exactly the prettiest looking output on the block, but that can be corrected with some style sheets. More importantly, this text is easily machine parseable because it is in a commonly accepted format, and it is also human parseable because it is in clear-text in a common layout.
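To give a sense of how little a robot needs in order to read this, here is a rough Python sketch using only the standard library's `html.parser`. The `HCardParser` class is my own illustration, not an official microformats tool: it handles only the handful of class names used in the example above, assumes a single card per document, and would trip over void elements such as `<br>` because it pairs every start tag with an end tag.

```python
from html.parser import HTMLParser

class HCardParser(HTMLParser):
    """Collect text from elements whose class names an hCard field."""
    FIELDS = {"fn", "email", "street-address", "locality", "region",
              "postal-code", "country-name", "tel"}

    def __init__(self):
        super().__init__()
        self.card = {}
        self._stack = []  # one entry per open element: the fields it names

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self._stack.append([c for c in classes if c in self.FIELDS])

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self._stack:
            # Only the innermost open element claims the text.
            for field in self._stack[-1]:
                self.card[field] = self.card.get(field, "") + text

# A trimmed copy of the hCard from this post.
SAMPLE = """
<div class="vcard">
<a class="url fn" href="http://nomorejargone.blogspot.com/">Daniel Nugent</a>
<a class="email" href="mailto:nugend@fakemail.com">nugend@fakemail.com</a>
<div class="adr">
<span class="locality">Springfield</span>,
<span class="region">NY</span>
</div>
<div class="tel">518-867-5309</div>
</div>
"""

parser = HCardParser()
parser.feed(SAMPLE)
print(parser.card["fn"], "/", parser.card["tel"])
# prints Daniel Nugent / 518-867-5309
```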

Microformats exist for addresses, calendar entries, content licenses, and tags among others, with formats for resumes, reviews, and geotagging being developed.

Some people have raised the question as to why they should bother with microformats now if the full-hog semantic web is going to wipe the floor with them tomorrow. The answer is this: Data and metadata stored in microformats will be easily convertible to the official semantic web formats when they are finally decided upon. As an added bonus, robots that are developed to work with microformats will be able to recognize this data immediately and enhance the utility of content.

Next: The Darker Side of Meta