No More Jargon


The coalescence of thoughts regarding technical subject matter in the areas of software design and computer languages.


    Monday, December 18, 2006

    Dear Lord, Let Us Express in Code That Which Belongs in Code

    This past semester I took a class called Advanced Systems Analysis and Design and in this class we ostensibly learned all about designing and managing the creation of a technical system from start to finish. I say ostensibly because I feel like I'm worse off now, if what I'm supposed to be able to do is design and manage such a system. Maybe it's my fault though. Maybe I should've started running for the hills when I saw that the course was listed under a MGMT heading in the course catalog.

    Now, to be fair, a few of the management bits in the course made sense and I think they might've conveyed valid information. And if most of the course had been filled with content like that, I'd probably be a happy camper; a little bored, mind you, since it's material I find necessary but not interesting, but not angry, as I am now.

    No, friends, the bulk of this course was about designing these systems. I'm not going to bore you with all the details of what went so god awful horribly wrong. Suffice it to say, we were getting a Manager's perspective on System Design; the perspective of a person so far removed from the technical details of a system that those details hold no importance to them, and who thus has no business submitting design specifications to an engineer. However, there is one element from the course that I must elaborate on, if only for the sheer idiocy of it all: The Base Structural Grammar.

    The first thing you have to know about the Base Structural Grammar is that no one else on the Internet knows jack or shit about it. Google searches for "Base Structure Grammar" and "Base Structural Grammar" turn up one result each. This led me and the rest of my teammates on the project to conclude that the BSG (as it was called for 6 weeks before the acronym was actually expanded) was either a fancy acronym for Battlestar Galactica or something that the Professor made up on his own, in isolation, without anyone to tell him just how full of bullshit it was.

    At this point, you might be thinking to yourself, "Oh, I'm sure it wasn't that bad." Well friend, let me tell you just why this was so fucking stupid. It was a formal grammar for the description of control flow within a system that lacked BRANCHING CAPACITY. Okay? Get it? It was an attempt to describe operations that would be implemented in a Turing complete programming system, in a language that, itself, is NOT TURING COMPLETE.

    How were different courses of execution handled, you might ask? By rewriting the entire case, changed to account for the different operations that would need to be called in that case. By comparison, if you tried to do this while actually programming, you would need to write 2^N copies of every function, where N is the number of if statements in it, and then dispatch to each of these copies based on the values of the arguments (whose conditions you have to express by magic, because there's no way to describe them in code).
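
    To make the absurdity concrete, here's a hypothetical Ruby sketch (the helper names are invented for illustration; the BSG had its own notation). With branching, one routine covers every case:

    # One routine, two independent conditions, four paths.
    def process_order(order)
      order.rush? ? ship_overnight(order) : ship_ground(order)
      wrap(order) if order.gift?
    end

    # Without branching, each of the 2^N condition combinations needs its own
    # straight-line copy, and something outside the language has to pick one:
    def process_rush_gift(order);    ship_overnight(order); wrap(order) end
    def process_rush_plain(order);   ship_overnight(order) end
    def process_ground_gift(order);  ship_ground(order); wrap(order) end
    def process_ground_plain(order); ship_ground(order) end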

    When he explained this, I wanted to curb stomp his fat head. If I had been in the same room, I would've become physically violent. Luckily, I guess, I was watching the lecture remotely. I think I banged my head against my desk for about 10 minutes, but my memory of it is a little fuzzy.

    Why, why, why, why couldn't a programming language be used for that? Something like a pared-down Scheme would be perfect for those sorts of expressions! And then the code could actually be USED (if it, in fact, worked) after the lower level operations were implemented. Heck, you could actually get a jump start on that if a half-decent testing system existed for this theoretical design language!

    I dunno, the mind boggles.

    Monday, December 04, 2006

    Metadata: The Darker Side of Meta

    This series of posts is part of a short paper I am writing for Communication Design for the WWW.

    The garden of metadata is not all sunshine and roses, though. There are thorns on our flowers of knowledge. Cory Doctorow, prolific Internet auteur, long ago released a general list of problems that will exist on the semantic web whenever it materializes. I will mention but a few of these that I consider the most worrying, specifically poisoned metadata and a lack of investment.

    It has been established that attention is the most valuable commodity on the Internet aside from hard currency (with the majority of concern about attention going towards converting it to said currency). In underhanded attempts to gather attention, people will make all sorts of audacious claims and use any technique to grab attention wherever they can. You need only look in your spam folder to see this reality. With the capacity to add new information about information, a less than scrupulous user may attempt to place their content where it does not belong. Within the context of tagging, this has been unimaginatively named "tag spam", a prime example of which can be seen here. The keywords that have been placed on this photograph have little, if anything, to do with the subject of the picture and seem to serve only promotional purposes.

    Spam of this sort seems not to have taken off yet, perhaps in part because the spammers are forced to work with applications that require user registration, which limits the number of anonymous entries they can deploy. This low volume has left the signal-to-noise ratio relatively high, so it is not much of a problem yet. But when metadata moves towards non-federated creation, there will be no such guarantee of a central executor to punish those who seed poisoned data.

    Apathy is what I would consider the next largest problem. Despite an inevitable trend towards a more technologically savvy population, there remains a large segment, even among younger users, that couldn't give a damn. Even though adding a tag takes but a few keystrokes, and adding geotags involves a few clicks on a map, it still requires work. Work that people just aren't interested in. For now, this is not a large concern, since the majority of current users do in fact care about adding metadata to their content.

    Perhaps it won't be a problem though. If the visionaries are to be believed, not adding metadata will mean that not only will your content not be findable, it won't even be usable, even by yourself. Self-interested utility may be what drives metadata creation.

    In many senses, this is already the case. When a user tags a link on del.icio.us they typically do so because they found the link noteworthy and would like to be able to find it again. When a Flickr user tags a photo and adds geotags, they do so because they would like to find a specific picture again, or see all the pictures they have taken someplace.

    Will it fly in Peoria though? I'm not sure.

    Sunday, December 03, 2006

    Metadata: State of the Art

    This series of posts is part of a short paper I am writing for Communication Design for the WWW.

    Metadata's most modern incarnations exist in myriad forms. Of those, a few that rank highest at the intersection of hype and usefulness are tagging, geotagging, and microformats. These forms of metadata overlap somewhat in their concerns, but they are distinct in the way they are employed by users.

    Tagging is not a complicated concept, and once it is understood, some people dismiss it as too simple to be of serious use. Despite these doubts, tagging has proven to be an effective and easy way to add metadata to content.

    Tagging is simply attaching an explicit keyword to some data. It differs from categorization in two important ways: as many tags as desired can be added, and there is no fixed vocabulary for tags. This is typically where tagging is dismissed as overly simplistic. However, when the uses of tags begin to be explored, what initially seemed like a simple system begins to gain some exciting emergent properties.

    Tagging of course allows for findability, and that is how the vast majority of tag use is considered. Especially for image content, tags have been a boon to people searching for particular subjects, photography styles, and even colors. With the additional metadata of Creative Commons licenses attached, it becomes trivial to find appropriate content from creators who are willing to share their work with you (as has been requested of me 3 times since I began using Flickr). Tag searches can use as many tags to filter results as desired, both by including and excluding terms. While this is the most popular, and most familiar, use of tags, it's probably the least interesting.

    Because tags are explicitly added metadata, and because the systems hosting the content can track entries as they appear, it is possible, using syndication technology, to maintain a subscription to a particular keyword search and be notified of new entries as they are added. This allows a user of Technorati's blog tag search feature to track interest and discussion of particular topics across all weblogs on the internet.
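
    As a sketch of how cheap such a subscription is on the consumer's side (the feed URL here is hypothetical, and the code assumes a reasonably modern Ruby with its standard rss and open-uri libraries):

    require 'rss'
    require 'open-uri'

    # Poll a tag feed; a real subscription would just do this on a timer.
    url  = 'http://technorati.example.com/tag/metadata?format=rss'
    feed = RSS::Parser.parse(URI.open(url).read, false)  # false = don't validate

    feed.items.each do |item|
      puts "#{item.title} -- #{item.link}"
    end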

    Where this capacity truly becomes interesting is when tags are created for a single specific purpose: either to uniquely identify a concept that a limited subset of users are aware of, or, instead of attempting to define the subject of data, to describe how, why, or for what purpose the data is to be used. Specifically, this tends to emerge around events such as conferences, festivals, and expos (see sxsw2006 for example), but other examples can be seen in meme-like activities such as 10placesofmycity or infiniteflickr. The uses for non-subject specification are also fascinating; many people (the author included) have tagged things that they are interested in, but do not have the time to explore, as todo or toread. Content that a user feels would be interesting to another user will be tagged as for:username.

    Of course, tagging is not without its difficulties, chief among them semantic confusion between tags that are homonyms and syntactic disparity between tags that are synonyms of each other. One solution that has begun to gain ground on this particular problem, however, is tag clustering. The basic concept is that a given tag will likely have a number of other tags it is commonly seen with. Groups of tags, clusters, tend to emerge that are each linked to a particular tag, but are syntactically and semantically distinct from one another.
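
    Here's a minimal sketch of the raw material for clustering, plain tag co-occurrence counting, with made-up photo data:

    # Each photo is just a list of tags.
    photos = [
      %w[apple fruit orchard],
      %w[apple macintosh laptop],
      %w[apple pie baking],
      %w[apple macintosh osx],
    ]

    target = 'apple'
    cooccurrence = Hash.new(0)
    photos.each do |tags|
      next unless tags.include?(target)
      (tags - [target]).each { |t| cooccurrence[t] += 1 }
    end

    # Tags that travel together hint at the distinct senses of "apple":
    # {fruit, orchard, pie, baking} vs. {macintosh, laptop, osx}.
    p cooccurrence.sort_by { |_, count| -count }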

    Geotagging is similar to, but distinct from, tagging, as can be inferred by the name. It still involves the addition of specific keywords to a piece of content, but those keywords are very strict and have a direct correlation to a physical location on the globe, either as a recognizable location name or as latitude and longitude coordinates. The practice of geotagging probably emerged from the pseudo-sport Geocaching, but it has a wider appeal in its use.

    Geotagging first began to emerge when the Google Maps API was hacked and people began producing mashups against existing databases that had locational information. One of the earliest and most striking of these mashups was Chicago Crime, which culled information from police reports and showed incidents against a map of the city. Another, less serious, example is overplot, a mashup of Overheard in New York (where all entries include a street address).

    Flickr also began seeing use of geotagging on photos with an informally specified set of tags, "geo:lat=xx.xxxx", "geo:lon=xx.xxxx", and "geotagged", which could be pulled from Flickr's databases using their tag access APIs. These tags were collected on external websites and allowed visitors to see photos taken within a specific geographic area, as well as determine precisely where a picture was taken. Flickr has since added a built-in geotagging tool.
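
    Extracting machine-readable coordinates from that informal convention is nearly trivial, which is much of its charm. A sketch, with made-up tags:

    tags = %w[sunset beach geotagged geo:lat=43.0831 geo:lon=-73.7846]

    lat_tag = tags.find { |t| t.start_with?('geo:lat=') }
    lon_tag = tags.find { |t| t.start_with?('geo:lon=') }

    if tags.include?('geotagged') && lat_tag && lon_tag
      lat = lat_tag.split('=').last.to_f
      lon = lon_tag.split('=').last.to_f
      puts "Photo taken at #{lat}, #{lon}"
    end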

    Of particular interest with regards to geotagging is the automatic creation of geolocational information by devices that are involved in the creation process themselves. At least one camera has supported an integrated GPS recording feature and there is a system available to add the capacity to dSLRs.

    Of course, neither tagging nor geotagging addresses the question of how automated agents will actually go about using this metadata. While the applications where all these data are stored allow programmatic access through APIs, designing an agent capable of accessing and understanding all those APIs would be nigh impossible. That is where microformats step up. Microformats are an especially simple conception; they don't even attempt to address new types of metadata. A microformat is simply a specifically structured, valid XHTML fragment that conforms to one of the predefined (micro)formats.

    As an example, let's look at the hCard microformat, which corresponds to the vCard contact format that has gained popularity amongst communication applications. Here is a simple hCard:


    <div class="vcard">
      <a class="url fn" href="http://nomorejargone.blogspot.com/">Daniel Nugent</a>
      <a class="email" href="mailto:nugend@fakemail.com">nugend@fakemail.com</a>
      <div class="adr">
        <div class="street-address">999 Madeup Street</div>
        <span class="locality">Springfield</span>,
        <span class="region">NY</span>,
        <span class="postal-code">00001</span>
        <span class="country-name">USA</span>
      </div>
      <div class="tel">518-867-5309</div>
    </div>


    and here is how the hCard appears when the markup is rendered (i.e., when the HTML tags are not escaped):


    Daniel Nugent


    999 Madeup Street

    Springfield, NY, 00001
    USA

    518-867-5309



    Not exactly the prettiest looking output on the block, but that can be corrected with some style sheets. More importantly, this text is easily machine parseable because it is in a commonly accepted format, and it is also human readable because it is clear text in a common layout.
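
    To make "machine parseable" concrete, here's a sketch using the third-party Nokogiri gem (my choice of parser, not anything the microformat mandates), run against a trimmed copy of the fragment above:

    require 'nokogiri'  # gem install nokogiri

    html = <<~HTML
      <div class="vcard">
        <a class="url fn" href="http://nomorejargone.blogspot.com/">Daniel Nugent</a>
        <a class="email" href="mailto:nugend@fakemail.com">nugend@fakemail.com</a>
        <div class="tel">518-867-5309</div>
      </div>
    HTML

    doc = Nokogiri::HTML(html)

    # The class names double as field names; that's the whole trick.
    card = {
      name:  doc.at_css('.vcard .fn')&.text,
      email: doc.at_css('.vcard .email')&.text,
      tel:   doc.at_css('.vcard .tel')&.text,
    }
    p card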

    Microformats exist for addresses, calendar entries, content licenses, and tags among others, with formats for resumes, reviews, and geotagging being developed.

    Some people have raised the question of why they should bother with microformats now if the whole-hog semantic web is going to wipe the floor with them tomorrow. The answer is this: data and metadata stored in microformats will be easily convertible to the official semantic web formats when those are finally decided upon. As an added bonus, robots developed to work with microformats will be able to recognize this data immediately and enhance the utility of content.

    Next: The Darker Side of Meta

    Metadata: The Vision

    This series of posts is part of a short paper I am writing for Communication Design for the WWW.

    What does having all this extra data in the form of amateur publications do for us, though, that it ought to be organized in the first place? Aside from the intrinsic reward of having data in a coherent order, we gain the potential for software to better discover the data we need for various reasons and to assemble that data in a meaningful way.

    This paper was actually researched with the use of del.icio.us's metadata search tools. While I could've used Google to search for the specific keywords related to metadata, what I could not do was limit those searches' results primarily to conference proceedings, papers, or serious web articles (as opposed to blog postings, forum chatter, and news clippings).

    While even the early returns are promising, the real vision of metadata lies in what the Semantic Web can bring us:

    The agent promptly retrieved information about Mom's prescribed treatment from the doctor's agent, looked up several lists of providers, and checked for the ones in-plan for Mom's insurance within a 20-mile radius of her home and with a rating excellent or very good on trusted rating services. It then began trying to find a match between available appointment times (supplied by the agents of individual providers through their Web sites) and Pete's and Lucy's busy schedules.

    However, we are quite a way off from the technology for medical information to be correlated automatically with personal scheduling (to say nothing of the legal issues with such information being open and accessible to machine reasoners). The reality of cutting edge metadata is much less striking (and less unsettling), though still fairly useful.

    Next: State of the Art

    Metadata: Browsing the Web is Hard Work

    This series of posts is part of a short paper I am writing for Communication Design for the WWW.

    If Metadata is so important to organizing and finding data, why has it only recently become a topic under significant discussion?

    To answer this question properly, a brief history of the World Wide Web must be explored.

    In the summer of 1991, Tim Berners-Lee published the first web page, released the HTTP specification, and made available the first web browser and WYSIWYG editor. Sir Tim's original vision for the web was as a collaborative medium where all visitors were content creators and everyone had access to a publishing space of their own. Due to a number of technological, social, and other circumstances, however, web publishers were initially limited to an elite set of advanced users and business interests.

    Because these publishers were primarily concerned with content of a technical or business nature, they could rely on existing structures of information to categorize or organize the content they wanted to create. In situations where there was no existing structure, either the data wasn't important enough to properly categorize, or an Information Architect could be employed to create a new taxonomy or hierarchy for it. In addition, compared to the content creation rates of today, there was a minuscule influx of new data to organize, which allowed the data that was created to be structured by hand.

    Also of import is that the data being published was largely textual in nature. This allowed search engines to perform latent semantic analysis on web pages to obtain a general meaning of the words on a page. Google further refined this technique by exploiting a previously unconsidered set of metadata inherent in the structure of the web itself: by counting the incoming links to a page, Google could determine the esteem in which the page was held with regard to its subject and return better results for keyword searches.
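
    As a toy illustration of that link-counting idea (and only an illustration; Google's actual implementation is this power iteration at web scale, plus a mountain of engineering), here is the core of PageRank in Ruby:

    # Three pages and who they link to.
    links = {
      'a' => %w[b c],
      'b' => %w[c],
      'c' => %w[a],
    }

    damping = 0.85
    rank = {}
    links.each_key { |page| rank[page] = 1.0 / links.size }

    20.times do
      new_rank = {}
      links.each_key do |page|
        inbound = links.select { |_, outs| outs.include?(page) }
        new_rank[page] = (1 - damping) / links.size +
          damping * inbound.sum { |src, outs| rank[src] / outs.size }
      end
      rank = new_rank
    end

    # Pages linked to by well-regarded pages end up well-regarded themselves.
    p rank.sort_by { |_, r| -r }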

    Google's PageRank was likely the last stopgap against the torrent of new web content, though. In the last five years, the barriers to individual content creation on the web have begun to fall one by one. Technical knowledge, financial barriers, and connection requirements have been eliminated with the advent of free, ad-sponsored publishing platforms like Blogger, Flickr, YouTube, Odeo, and a galaxy of other sites.

    How does this change anything?

    One, most of the content being published is undifferentiated except by its format. People don't restrict themselves to a single topic when they write, or take pictures, or make podcasts. They can't easily rely on a taxonomy to describe their content, nor do many users feel compelled to create content around a single overarching subject. An amateur photographer on Flickr may be taking snapshots of their family one day and creating experimental Photoshop collages from those very same snapshots the next.

    Two, much of the new content being created isn't textual. Computers have gotten better at recognizing objects in pictures and spoken words, but they're still lagging far behind their capacity to read digital text. We can't yet rely on Google to search through terabytes of images, video, and audio without supplementing that data with text.

    Three, by giving every John Q. Public and his brother the capacity to publish, the amount of content created daily has increased at an exponential rate. No one could do the job the old way even if they wanted to.

    Metadata, data about data, is suddenly very important.

    Next: The Vision

    Metadata: Machine Accessibility

    This series of posts is part of a short paper I am writing for Communication Design for the WWW.

    A recent article in the New York Times heralded the arrival of what it called "Web 3.0", or the Semantic Web. This caused quite a bit of tittering among commentators on the Internet, mostly because the paint was still fresh on Web 2.0 (whatever that actually means), but also because the Semantic Web is nothing new.

    The Semantic Web is a format and specification project that has been underway for almost a decade. Its stated goal is the creation of a knowledge format that will allow machine intelligences to comprehend and reason about a wide and constantly evolving range of data. The format, and formats that will be derived from it, are what is known as metadata.

    The McGraw-Hill Dictionary of Scientific and Technical Terms defines metadata as:

    A description of the data in a source, distinct from the actual data; for example, the currency by which prices are measured in a data source for purchasing goods

    Of course, this definition has no strict association with computer data. By all rights, metadata has existed for centuries, the Dewey Decimal System being the most widely known and rigorous example. But even a convention as simple as alphabetical ordering by author, then title, in the non-fiction section is a use of metadata. The data is the work itself, the text on the pages; the metadata is the author and title.

    This brings me to what I feel is an important point about metadata: although it ostensibly exists to allow mechanical interaction with data, the chief beneficiaries are ultimately humans.

    To show this, consider a simple thought experiment:
    1. Tear off the cover of every book in a library.
    2. Try to find a book written by your favorite author.


    Next: Browsing the Web is Hard Work

    Wednesday, November 15, 2006

    Adversarial System Design

    So, very, very often, I see people characterizing the need for member and function privacy as a way of keeping people out and preventing them from mucking with your stuff.

    How messed up is that? You're a fool if you let untrusted users run code in your system without putting massive, massive restrictions on them. See _why's Sandbox efforts.

    When you're designing a system or library, restricting access to elements of an object is not like putting up a "KEEP OUT" sign; it's more along the lines of, "Here are the controls to drive the car. Really, please, don't unscrew the spark plugs." But if a competent mechanic needs to do something to the car, he doesn't feel forbidden from touching it. And someone who's happy to just drive the car so he can carry around his precision telescopes won't touch the spark plugs in the first place.
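
    Ruby, for what it's worth, models this attitude well: private methods keep the driver honest, but a mechanic who knows what he's doing can still reach them deliberately. A minimal sketch (the Car class is invented for illustration):

    class Car
      def drive; "vroom" end          # the controls: the public interface

      private

      def unscrew_spark_plugs; "maintenance access" end
    end

    car = Car.new
    car.drive                       # anyone can drive
    # car.unscrew_spark_plugs       # NoMethodError: private method called
    car.send(:unscrew_spark_plugs)  # the mechanic's route: deliberate and visible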

    Why design with an adversarial mindset? You're not helping the mechanics, and the astronomers probably weren't going to touch the spark plugs in the first place.

    Thursday, November 09, 2006

    An Idea About How to Keep Your Blog Updated

    Don't publish an entry until you have an idea for the next entry.

    The fact that you have an entry just sitting there, BEGGING to be tossed on the intarnet, will necessarily force you to come up with something. It doesn't have to be great, right? Just a little doodad you can expound on for a few paragraphs.

    Got the kernel for the next one sitting as a Draft, awwww yeah.

    Tuesday, November 07, 2006

    The Bondage of the Dumb Language Part 34

    Holy crap, a new post. It may not interest you, but one of the things that has been crushing my soul for the last half year is the fact that I haven't updated here. In fact, I know you don't care, because you don't exist. Onwards...

    First, let's establish something: C++ sucks at everything. Anyone who's used C++ after having used something else will readily acknowledge this, with the possible exception of people who have a chance of getting the C++ standards committee to actually listen to them; they've got too much invested to admit it. It exists in some quasi-temporal role as both a systems language and an application language. As we all know, C++ inherits its systemness from C and its applicationness from a misunderstanding of Smalltalk.

    What does this mean? I'll tell you what it means: it means that C++ is stuck on the hardware model it was developed on: high performance, single threaded servers. It's even worse, though. Because C++ lacks a certain... realness to its objects, and because there is a strong tradition in the C++ community of doing things in a very API-oriented way (see the Standard Template Library, which, having read Alexander Stepanov's notes, was originally intended to be a collection of templated structs with concept requirements), it's fundamentally hard for it to work efficiently on new hardware organizations. God forbid someone develops stackless hardware, or C++ programmers are severely doomed.

    More importantly, C++ has an impedance mismatch: it allows both complete programmer control of the system state, which invites concern about the byte structure of variables, stack state, and dereference operations for efficient computation, and higher level object capacities, which allow for ease of design, maintenance, and understandability. It's like trying to walk on two surfaces of significantly different heights at once.

    Originally, I was going to suggest that perhaps some sort of complexity manager could be added to the runtime code generated by the compiler, which would allow a programmer to stick more closely to the abstract model presented by most other object languages. But after reading about the truly horrific evolution of the Java MVC stack, I'm leaning against bandages from now on. C++ needs to be fixed. It's some sort of matter/anti-matter Frankenstein right now that's trying to annihilate itself. I'd feel bad for it if I didn't encounter such enormous pain whenever I tried programming in it.

    Monday, May 08, 2006

    Life Tends to Stop Very Important Blogging

    For the past month and two weeks, I have been in a crunch because of school. Typical of the last month of a semester: everyone suddenly needs that thing they told you not to worry about back when the semester started, and they promptly forgot to tell you they needed it while you still had any reasonable hope of getting it done on time.

    So, I hope that's a suitable explanation for the period of absence here. The good news is, god willing, I should be done with school in a week and able to resume the sorts of learning that I think are actually useful. One of those is writing entries for this blog! Hah hah, who knew that writing to the random people who happen to stumble across here could be construed as a learning exercise!

    But I really think it is useful and helpful to write about what you have a passion for, especially on a blog, where our own egos drive us to have a bigger digital dick than everyone else. My reasoning goes something like this: I need more readers. To get more readers, I need better content. To get better content, I need to know how to write better content. To know how to write better content, I need to learn more about the subject of this blog. The subject of this blog is Computer Science and Information Technology (no hardware though, *bleaugh*). Therefore, I need to learn more about Computer Science and Information Technology. QED.

    Well, I'm not a very good informal logician, so who knows if that holds water (and I'm sure as hell not going to put it in first order logic terms), but it's as good a reason as most to write a blog. Too bad it's not as good a reason as "I'm getting paid to write this blog." One can dream though.

    Saturday, April 01, 2006

    The Language of 2016? Part 2: Easy Datastructures and Rapid System Access

    One of the things that made Perl gain popularity in the late 80's and 90's was how easy it was to take a string and fiddle with the middle of it without overwriting any memory. Now, C++ has real strings as opposed to C's arrays of characters, but they are incompatible with many of the system calls that are critical to doing common tasks. There is, of course, a c_str method, but there's not really a quick way to go the other direction.

    What Perl did was make treating a string like an array as easy as indexing into it. You could insert, remove, push, pop, whatever. And many of the things you could do on an array, you could do on a hash, making playing with your data quick and flexible. Combine this with the ease and simplicity of system access, and the fact that those system calls tended to return things inside of data structures, and you had a lot of utility at low cost to the programmer. This is all possible in C and C++ of course, but there's a cost barrier. It hurts a little, not much, but it hurts. And the type incompatibility between C++ STL objects and C calls is just annoying.
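
    For flavor, here's roughly that spirit in Ruby, which inherited a good deal of it from Perl (the ls-style line is made up):

    line = "drwxr-xr-x  4 nugend staff  136 Dec 18 2006 src"

    line[0, 1]          # => "d" -- index into a string like an array
    line[0, 1] = '-'    # splice a replacement into the middle, no malloc in sight

    fields = line.split # loosely structured text lands in a real data structure
    perms, owner = fields[0], fields[2]
    puts "#{owner} (#{perms})"

    require 'etc'
    stat = File.stat('.')             # a system call with a structured result
    puts Etc.getpwuid(stat.uid).name  # which chains straight into another one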

    Java fixed a lot of this, and good on them for it, but there is still a thick layer of crap associated with data structures and file operations. Interfaces were a clever way of handling the incompatibility of types on which common operations could be performed, but the question always remained: why can't I just create an instance of this high level type and let YOU choose the underlying implementation for me? Plus, there's always the problem of casting to the required type to pass to an API. Very annoying.

    Quite often, performance is cited as the reason these things don't exist in the current popular languages. I'll grant that performance is an important topic... but only when I don't have the performance I need. As long as there's an algorithmic upgrade path, it makes a lot of sense to choose the simplest suitable implementation at construction time.

    The Languages of 2016 will build on the idea of easy, readily available datastructures and unfettered access to the system. The former because data manipulations are so critical to most of the work we do as Programmers, and the latter because dealing with hardware is a noxious, unloving task. I predict that in 2016 it will be common to see a language with at least basic Graph and Tree types in the standard namespace, with common searches over those structures as well as manipulation and metadata operations available. I also predict that system access will be further simplified and generalized, probably to treat many different distributed storage media as they would the local disk (but I'm getting a little ahead of myself there). It would also be nice to have better ways to interface with system buses like USB (and Wireless USB), but no one's ever seemed to care about that and devices are mostly proprietary, so I think that's a little unlikely.
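
    Pure speculation, but to make the prediction concrete, here is a sketch of what a batteries-included Graph in the standard namespace might feel like (the API is invented here):

    class Graph
      def initialize; @edges = Hash.new { |h, k| h[k] = [] } end
      def connect(a, b); @edges[a] << b; @edges[b] << a; self end

      # Breadth-first search shipped with the language, not rewritten per project.
      def path(from, to)
        queue, seen = [[from]], { from => true }
        until queue.empty?
          path = queue.shift
          return path if path.last == to
          @edges[path.last].each do |n|
            next if seen[n]
            seen[n] = true
            queue << (path + [n])
          end
        end
        nil
      end
    end

    g = Graph.new.connect(:a, :b).connect(:b, :c).connect(:a, :d)
    p g.path(:a, :c)  # => [:a, :b, :c]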

    This is an article in my series of entries entitled "The Language of 2016?"

    Friday, March 31, 2006

    The Language of 2016? Part 1: Simple Semantics

    "I conclude that there are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies." -- C. A. R. Hoare
    It's often said that since all languages are Turing complete, you can do the same thing in any of them.

    This is a dirty, dirty lie (no offense to Turing).

    What you can do is get the same output from two programs written in different languages given the same input. Internally, depending on what the individual semantic models of the languages are and what the desired conversion from input to output is, you will be doing very different things. If you attempted to do both with the same methodology, you would have to implement the semantics missing from one language in the language you were attempting to fit the methodology to.

    Because at some point you will find a problem that is exponentially easier in some other set of semantics than the one supported by the language you're using, making it easy to modify the semantics of that language is critical. What's the best way to enable this? Having simple underlying semantics in the language. This will be absolutely critical to the Language of 2016.

    Smalltalk and the Lisps both practice this. Since I haven't used Smalltalk, I will refer to it via the proxy of Ruby (which shares similar semantics).

    In Ruby, since everything is an object and all actions happen because of messages, I can model and manipulate new behavior easily. Changing the behavior of an object is as simple as creating a new method and slapping it into the object at the needed point. Likewise, Classes being objects allows me to talk to them and tell them how they should start acting. Since even the environment of execution is an object, I can capture that and use it for my nefarious purposes.
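
    A quick sketch of all three, with throwaway names:

    # Slap a new method into one specific object at the needed point:
    widget = Object.new
    def widget.render; "acting how you told me to" end

    # Talk to a class and tell it how to start acting:
    String.class_eval do
      def shout; upcase + "!" end
    end
    puts "hello".shout  # => HELLO!

    # Even the environment of execution is an object you can capture:
    here = binding
    puts here.eval("widget.render")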

    Lisp goes in a different, but possibly even more powerful, direction. Read-time macros allow programmers to turn semantics into syntactic abstractions with nigh unlimited power. This power is enabled because every statement in Lisp is an S-expression, and all data is expressed in S-expressions likewise (after all, the data has to be stated in some form), so macros can manipulate code exactly like data. Since I have little real experience with Lisp, I hesitate to venture further into a discussion of the macro system.

    Though I haven't read them yet, the two books in this area that I've heard are most relevant are The Art of the Metaobject Protocol and On Lisp.

    This is an article in my series of entries entitled "The Language of 2016?"

    The Language of 2016?

    It's been estimated by a few people that popular industry languages have a lifetime of about 10 years.

    It's been said that in the 70's FORTRAN was what you needed, in the 80's C++, in the 90's Java, and that right now we're on the precipice of a new language taking over the "Enterprise", or whatever the hell you want to call it.

    To be honest? I don't care about that. I feel like I'm on course to knowing the sorts of things that are going to be necessary to be a successful programmer in the next 10 years. By no means am I there yet, but I feel comfortable with where I'm headed right now.

    What I want to start getting answers to is where I and everyone else will be going in a decade, give or take.

    This entry is inspired by the "World's Most Maintainable Programming Language" article series written by chromatic.

    Because of its length, I'm breaking this down into a number of entries for easy consumption:

    Part 1: Simple Semantics
    Part 2: Common Datastructures and Manipulations Thereon
    Incomplete Entries:
    Part 3: Trivializing Common Tasks
    Part 4: Distributed Computational Models
    Part 5: Optimization Over Usage
    Part 6: Declarational Co-Language
    Part 7: Community

    Tuesday, March 28, 2006

    When the Walls are Closing In

    For my Distributed Computing class, I'm writing an Actor Computation framework in Ruby. I've got a little code written and the rudimentary architecture laid out.

    So far, I've had three "Oh Shit... is this gonna be a huge problem?" moments.

    The first came when I was trying to figure out how a computation on one machine was going to be delivered back to the place where it was requested. I knew that this was going to be something that I'd need to solve in the course of writing the framework, but I hadn't yet worked out a theoretical solution. I eventually figured out what I needed to do when I was in the shower (a classic "Eureka" moment to be sure).

    The second came when I was trying to determine how an Actor would migrate from one machine to another without losing any messages sent to it. I had a little help with this problem because my Professor has written a slightly similar system that implements a solution to this and he pointed me in the right direction.

    The third came over the course of the past few days, when I realized that I had failed to figure out how short-lived actors would be garbage collected (that is, how actors who have exhausted their message queues and to whom no one holds references would be deallocated). I began poking around the Remote Method Invocation library I'm using and looking for discussions of Distributed Reference Counting. The walls started closing in. It looked like I was going to have to write something brutish, nasty, and abusive of Ruby's meta-object model.

    I didn't like it, but I resigned myself to my fate. I'd just have to find the tests for the RMI library to make sure I wasn't breaking anything. Oh well, life sucks and then you die.

    Today, I was thinking about the problem again. Then I remembered something: the objects those distributed references point to are mobile! The RMI library's distributed references are static! I couldn't proceed with my nasty, brutish solution anyhow.

    Oh noes.

    Not 20 seconds later I thought of a solution that was immensely more satisfying and would be much more consistent with the rest of my system if not necessarily easier to program.

    I tell you this not because I want you to be impressed with how fast I think my way out of problems and into solutions, but because of how I came to my solution: Another Level of Indirection. My problem arose because I was thinking too close to the tool I was using; I was trying to answer a problem bound by the constraints of the wrong domain. Instead of fretting over what my library does, I should've known better and just gone up a level.
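
    For the curious, here's a hypothetical sketch of the trick, not my actual framework code: instead of handing out raw remote references to a mobile actor, hand out a stationary handle that always asks a home registry where the actor currently lives.

    class Registry
      def initialize; @where = {} end
      def register(id, actor); @where[id] = actor end
      def lookup(id); @where[id] end
    end

    class Actor
      def deliver(msg); puts "got: #{msg}" end
    end

    class ActorHandle
      def initialize(registry, actor_id)
        @registry, @actor_id = registry, actor_id
      end

      def send_message(msg)
        @registry.lookup(@actor_id).deliver(msg)  # the extra hop buys mobility
      end
    end

    registry = Registry.new
    registry.register(:worker, Actor.new)
    handle = ActorHandle.new(registry, :worker)
    handle.send_message("hello")

    registry.register(:worker, Actor.new)  # the actor "migrates"...
    handle.send_message("still reachable") # ...and every old handle still works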

    If Laziness, Impatience, and Hubris are the three greatest traits of a Programmer, Indirection is surely his greatest tool.

    Saturday, March 25, 2006

    Reason 20X6 Why Java Sucks: Discourages Laziness

    (A pronunciation guide for 20X6, for the uninformed).

    Right, so I was writing a few test cases for my Graphical User Interfaces assignment and I banged my head against Java AGAIN (quite naturally, I might add; I wasn't trying to be dynamic, it just happened).

    The first moment occurred when I was looking through the JUnit Assertions for something that would let me test whether an exception was thrown or not. I didn't see anything right off the bat, so I chalked it up to laziness and went to write one myself. I had typed in the method name when I realized, "Ahhhh yes, no closures... oh, and no easy way to invoke a method dynamically. Poop." Oh well.

    The next time was when I saw that I'd be writing a couple of assertions repeatedly with minor variations depending on the attribute I was vetting. No reason to repeat myself, right? Wrong. Once again, the lack of easy dynamic method invocation bites my ass (it also bites ass).

    Heck, for that matter, if Java had decent introspection facilities, I could've probably just written one special assertion and done all the attribute tests in one fell helper method.
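
    For contrast, here's what that laziness looks like in Ruby's Test::Unit, where blocks and dynamic invocation make both complaints evaporate (the widget and its attributes are invented for illustration):

    require 'test/unit'

    class WidgetTest < Test::Unit::TestCase
      # With blocks, "did this raise?" is a one-liner...
      def test_bad_input_raises
        assert_raise(ArgumentError) { Integer("not a number") }
      end

      # ...and repeated per-attribute assertions collapse into one loop.
      def test_defaults
        widget   = Struct.new(:size, :name, :tags).new(0, "", [])
        defaults = { size: 0, name: "", tags: [] }
        defaults.each do |attr, expected|
          assert_equal expected, widget.send(attr), "default for #{attr}"
        end
      end
    end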

    Now, if you're reading this, please note that I'm not saying these things are impossible, just that the path of least resistance is to not use them, but rather to write more repetitive code.

    When a language enables laziness to create better code, better code WILL be created. Every Programmer wants to do the right thing, they just don't want to work at it, and there's no good reason that they should have to.

    Wednesday, March 22, 2006

    UML Doesn't Know When to Stop

    I was looking at a UML sequence diagram today (one that had an internal state diagram inside one of the objects) when it dawned on me precisely what bothers me about UML: there's no easy way of knowing when to stop.

    I was looking at this diagram (beautifully rendered, by the way) and I thought to myself: "What is the inherent advantage of putting this state notation in a document rather than into the source code?" On the surface, for this particular example: none, of course. It was a trivial sample meant to show how you would diagram such a thing in UML. Here's the question that goes unanswered though: when, then, if not now?

    When would it be appropriate for a designer to put that state diagram inside his sequence diagram, or as a separate diagram altogether, or as any of the other hojillion diagram types (which are another barrel of monkeys, but not central here) that UML provides? Well, none of the tutorials I've seen talk about it; none of the documents on UML talk about that.

    The reason they don't, I think, is because it's called a "Language", and technical guys think about it like a language in that all of the paradigms should still apply. So they go and write the tutorials like it's a language and show you all the doodads inside of it without explaining their use, assuming that you can parlay your expertise in programming into writing UML.

    This is a dirty fallacy that's hurting everyone. UML is a tool (they should probably call it UMT). It has highly specific tasks that it is very good at, namely fleshing out ideas in areas of a system where there is ambiguity, either between the customer and the designer, or between the designer and the coding team. If you treat it like a language, something that gets used almost uniformly throughout a project, then all of the different screws and bolts start looking like nails. If we treat it like a tool, like a unit testing framework or a version control system, then we can use it when it's appropriate and leave it alone when it's not. After all, you don't try to implement your network stack with Subversion.

    Maybe this is a culture problem. I remember taking Software Design and Development, where we spent a whoooole lot of time talking about requirements, understanding them, and expressing them in UML diagrams. But, again, there wasn't a lot of discussion about when enough is enough, about issues of scale, or about responsibility for different parts of a design. Those things, the things I would think are more important than specific technical knowledge, just got glossed over.

    That's all I'm going to write about that for now. I've got some more ideas brewing on this subject, but there's nothing worse than a long-winded blog entry.

    About Me

    Truly a Simple Minded Fool