Saturday, October 29, 2005

Alphabetization Challenge Results

Original challenge here. Well, time's run out, and nobody has guessed the right order, which makes me feel better about my alphabetization skills. Here is the actual order in which the entries appeared in my phone book:
A Auto Emergency Lock Out
A B C Haircuts
A & F Auto Supply
A 4 V Audio
A & O Liquor Store
A Two Girl Experience
A 2 Z Construction Inc.
AA Auto Body
A & A Auto Performance Inc.
A & A Automotive
But is there any rhyme or reason to this sequence? I've found a set of rules that rationalizes it, but I don't know whether it's the actual rule set employed:
1. Space comes before any letter.
2. Spell out numbers.
3. Treat the sequence space-ampersand-space as if it were a single space…
4. …Unless the letters before and after the ampersand are the same, in which case treat space-ampersand-space as a null.
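The four rules above can be sketched as a sort key. This is only a minimal illustration of my guessed rule set, not the phone book's actual method: the name `sort_key` and the `DIGIT_WORDS` table are made up for the example, only single digits are spelled out, and Rule 4 is read as "the same letter sits on both sides of space-ampersand-space".

```python
import re

# Hypothetical table for Rule 2; only single digits are handled in this sketch.
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}

def sort_key(name):
    """Build a comparison key from the four guessed rules."""
    key = name.lower()
    # Rule 4: same letter on both sides of " & " -> treat " & " as null.
    key = re.sub(r"([a-z]) & \1", r"\1\1", key)
    # Rule 3: any remaining " & " -> a single space.
    key = key.replace(" & ", " ")
    # Rule 2: spell out each digit.
    key = re.sub(r"[0-9]", lambda m: DIGIT_WORDS[m.group()], key)
    # Rule 1 needs no extra work: Python compares strings by code point,
    # and the space character sorts before every letter.
    return key

# The ten entries, in the order they appeared in the phone book.
PHONE_BOOK_ORDER = [
    "A Auto Emergency Lock Out",
    "A B C Haircuts",
    "A & F Auto Supply",
    "A 4 V Audio",
    "A & O Liquor Store",
    "A Two Girl Experience",
    "A 2 Z Construction Inc.",
    "AA Auto Body",
    "A & A Auto Performance Inc.",
    "A & A Automotive",
]

# Sorting a scrambled copy with the key gives back the phone book order.
resorted = sorted(reversed(PHONE_BOOK_ORDER), key=sort_key)
print(resorted == PHONE_BOOK_ORDER)
```

That the key reproduces these ten entries only shows the rules are consistent with this sample; it says nothing about the rest of the book.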
I have not, however, combed through the phone book looking for exceptions to these rules. Even if this rule set is correct, it's not complete. Looking back at the longer list of entries that I compiled and then whittled down to ten for the challenge, I found the following sequence:
A-1 Acoustics
A1 Shredding
A 1 Tax Stop
All appeared between A & O Liquor Store and A Two Girl Experience. Can anyone rationalize this sequence? Apparently there must be an exception to Rule 1 above, since otherwise A 1 Tax Stop would come before the other two. Maybe all forms of A followed by 1, with space or hyphen or nothing in between, are treated as A 1.
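The "all forms of A followed by 1 are treated as A 1" guess can be checked the same way. In this rough sketch the function name `a1_key` and the regular expression are my own assumptions, and only the handling needed for these five entries is included.

```python
import re

def a1_key(name):
    """Sort key that treats 'A-1', 'A1' and 'A 1' alike (a guessed rule)."""
    key = name.lower()
    # Treat space-ampersand-space as a single space.
    key = key.replace(" & ", " ")
    # Guessed exception: a leading "a", then an optional space or hyphen,
    # then a lone "1", is read as "a one".
    key = re.sub(r"^a[ -]?1(?![0-9])", "a one", key)
    return key

# The three puzzling entries with their two neighbours, in phone book order.
NEIGHBOURS = [
    "A & O Liquor Store",
    "A-1 Acoustics",
    "A1 Shredding",
    "A 1 Tax Stop",
    "A Two Girl Experience",
]

print(sorted(reversed(NEIGHBOURS), key=a1_key) == NEIGHBOURS)
```

Under this guess the three entries differ only in what follows "a one", so Acoustics, Shredding and Tax Stop fall into plain alphabetical order between the liquor store and the Two Girl Experience.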

I have two alternative hypotheses about phone book alphabetization:
1. They have a list that was originally organized according to some set of rules, but they insert new entries by looking for similar entries and putting the new ones nearby. Each error therefore creates the seed for a pattern, which creates the illusion of a system of exceptions.
2. Money can buy exceptions to the rules.


kissmesoftly said...

I wouldn't be surprised if your second hypothesis was correct... and made less detectable because of your first one.

Jeanne Marie said...

I would personally guess inconsistencies in the method in which they were organized. Just imagine how many people had to have been working on the A section of it, and the number of businesses that appear, disappear and reappear over the years. One person being inconsistent throws things off.

I worked for a company that assembled the books for one of those restaurant review places. They'd put out a full-color book describing all the finest places to eat, every single year. But it was hard for a place to lose the status once they received it, so many entries remained the same every year, and pages had to be inserted in between, in alphabetical order.

While revising a section, I noticed that about two-thirds of the restaurants whose names started with "The", e.g. "The Blue Ginger", would be listed under the first letter of the second word. So that one, logically, would go under B. The other third were listed under T, where "The" would go, and then arranged under that in regular alphabetical order.

I noticed the inconsistencies lay in the ones placed under "The", then also noticed that was how the newest woman on the project organized her filing cabinets. The inconsistencies were kept the same because the company did not want the restaurants to be confused by their new placement in the book, and did not want, perhaps, to miff those who would be pushed further back into the book by all the "The"s being spread out.

That was a long way of saying, I think the same thing may have happened here. Someone may have made an error, but it is too much hassle to disrupt the order of things.

Randy said...

Wow, your phone book company either has custom sorting implemented (paying companies get bumped up in the sort order) or they just have a crappy database that is manually indexed.

A third possibility is that they are running a binary collation. This sorts solely on the Unicode value of each character. This might lead to visual inconsistencies when different languages/scripts have the same glyph (visual representation of a character). Because the same glyph will look the same on the page, yet have a different underlying value (like a Cyrillic 'А' as opposed to a Latin 'A'), such entries will be sorted differently by the binary collation algorithm than you might first expect.

However, based on the "the" example I'd lean towards data entry monkeys manually indexing the database.
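For what it's worth, the binary collation idea is easy to illustrate. Python's built-in `sorted` compares strings by Unicode code point, which behaves like a binary collation. The business names below are invented for the example, and the look-alike pair is a Latin "A" (U+0041) and a Cyrillic "А" (U+0410).

```python
# Two characters that print identically but have different code points.
LATIN_A = "\u0041"     # Latin capital letter A
CYRILLIC_A = "\u0410"  # Cyrillic capital letter A

# Hypothetical entries; the first and last appear to start with the same letter.
names = [
    CYRILLIC_A + "1 Taxi",
    "B Cars",
    LATIN_A + "1 Shredding",
]

# Plain sorted() orders strings by code point, like a binary collation.
binary_sorted = sorted(names)

for name in binary_sorted:
    print(name, hex(ord(name[0])))
```

The Cyrillic-initial entry lands after "B Cars" even though on the page it appears to begin with "A", which is the kind of surprise a binary collation would produce.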