Blue Light

David Roback and Hope Sandoval

Rolling Stone reports that Mazzy Star’s “Sandoval confirms her [sic] and her bandmate David Roback haven’t called it quits and they are still working on their anticipated fourth album. But she declines to give many specifics. ‘It’s true we’re still together,’ she says. ‘We’re almost finished [with the record]. But I have no idea what that means.’”

An album. Really? A NEW album.

Just when you knew you couldn’t get any luckier. I thought the group was done. This is huge. Entire swathes of a person’s life are painted in poetry. If you’re lucky, you’ll find your poet. Nearly a decade of my life, starting in the early nineties, basked in the smokey blue light of Hope Sandoval’s lyrics and David Roback’s music. Pure poetry.

BLUE LIGHT
So Tonight That I Might See (1993)

  There's a blue light in my best friend's room
  There's a blue light in his eyes
  There's a blue light, yeah
  I want to see it, shine

  There's a ship that sails by my window
  There's a ship that sails on by
  There's a world under it
  I think I see it, sailing away

  I think it's sailing
  Miles crashing me by
  Crashing me by
  Crashing me by

  There's a world outside my doorstep
  Flames over everyone's heart
  Don't you see them shining
  I want to hear them, beating for me

  I think I hear them
  Waves crashing me by
  Crashing me by
  Crashing me by

Mandarin Wednesday I

I finished Mandarin Tuesday III this past Spring. There’s a lot going on at work and I must admit, I didn’t put in the same level of effort as I showed in Mandarin I and II. I believe anyone who is learning a foreign language will concur, class is a bitch when you’ve not put in the requisite study time. Nevertheless, perseverance pays and I crawled my way to the end. I have “completed” the entirety of “Practical Chinese Reader Book 1” but I still babble like an idiot when confronted with native Chinese speakers. So, what to do? Sign up for the next course! Pain? Haha! I laugh. I have known pain in my time. The mild embarrassment and frustration of language school is nothing. NOTHING! Bring it.

Every language starts you out with basic phrases, useful phrases like, “la tiza está en la caja.” Very useful, for instance, if you are in a bank in La Paz. No one speaks English, so you emphatically repeat “THE CHALK IS IN THE BOX” until an interpreter arrives. Then you can cash your traveller’s check.

At some point, it pays to move beyond these essential basic phrases and develop real communication skills.

Intermediate Chinese Conversation (registration opens Aug 17, 2009) moves from basic phrases to actual communication.

“This course is designed for students who can talk about daily life in Mandarin, know Chinese phonetic spelling (pinyin) well, and can read 200 or more Chinese characters. We will work on conversational skills in speaking Chinese. The course will focus on communication skills for travel, business, and everyday use.”

And, the class meets on Wednesday. The last three classes were on Tuesday. Wednesday is much better for me. Sometimes you get lucky.

Mandarin is fun. It’s hard but fun. I have the rest of my life to learn it.

And I shall.

一步一个脚印

bash date tricks

A quick usage note regarding the date util under bash. I sometimes want to convert between a unix timestamp and a formatted date string. I do it infrequently enough that I forget the syntax. This article is me writing down my notes.

In the following example, I want to get timestamps and date strings for both today and yesterday. Why yesterday’s date? Because I want to get yesterday’s data from google analytics’ data API. I’ve seen numerous examples that get the day, month and year, then subtract one from the day and propagate the underflow through the month and year. Blech! If I have today’s timestamp, I simply subtract a day’s worth of seconds from it and voilà, yesterday!

#!/bin/bash

# Generate a current unix timestamp
#
day=$( date +%s )

# Adjust the timestamp above by 24 hours
#
seconds_in_a_day=$(( 24 * 60 * 60 ))
yesterday=$(( day - seconds_in_a_day ))
echo "timestamps"

echo "      day : ${day}"
echo "yesterday : ${yesterday}"
echo " "

# create a formatted date string (linux)
#
#echo "linux formatted string"
#echo "      day : $( date -d @${day} '+%Y%m%d' )"
#echo "yesterday : $( date -d @${yesterday} '+%Y-%m-%dT%H:%M:%S%Z' )"

# create a formatted date string (bsd/mac)
#
echo "bsd/mac formatted string"
echo "      day : $( date -r ${day} '+%Y%m%d' )"
echo "yesterday : $( date -r ${yesterday} '+%Y-%m-%dT%H:%M:%S%Z' )"

# create a formatted date string (win)
#
# echo "windows formatted string"
# echo "windows? really?"

echo " "
echo " "
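
The script above needs a line commented in or out depending on the platform. A minimal sketch of one way around that, assuming only the two date variants mentioned in this article: try the linux syntax first and fall back to the bsd/mac syntax if it fails. The function name format_timestamp is my own invention, not part of any tool.

```shell
#!/bin/bash

# format_timestamp TIMESTAMP FORMAT   (hypothetical helper)
# Print a unix timestamp as a formatted date string.
# Tries the linux syntax (date -d @TS) first; if that fails,
# falls back to the bsd/mac syntax (date -r TS).
format_timestamp() {
  local ts="$1" fmt="$2"
  date -d "@${ts}" "${fmt}" 2>/dev/null || date -r "${ts}" "${fmt}"
}

today=$( date +%s )
yesterday=$(( today - 24 * 60 * 60 ))

echo "    today : $( format_timestamp "${today}" '+%Y-%m-%d' )"
echo "yesterday : $( format_timestamp "${yesterday}" '+%Y-%m-%d' )"
```

The fallback relies on the bsd/mac date rejecting the @ syntax, so treat it as a sketch and check it on your own system before trusting it in a cron job.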

Another example? Okay, let’s say I had a text file, foo, and I wanted to embed a date and get the checksum. Furthermore, I wanted the filename to include the timestamp corresponding to the embedded date. (bsd/mac version)

#!/bin/bash

# scriptname: md5tagger
#
# generate a timestamp,
# generate output filename
# copy formatted date string to output file
# cat original file to output file
# copy the md5 sum to another output file
#
day=$( date +%s )
fname="$1-${day}"

date -r "${day}" >"${fname}"
cat "$1" >>"${fname}"
md5 "${fname}" >"${fname}.md5"

Let’s try it! (bsd/mac version)

$ printf "text to copy which\ncould be important\n" >foo
$ ./md5tagger foo

$ ll foo*
-rw-r--r--  1 kelly  kelly  38 Aug 16 13:50 foo
-rw-r--r--  1 kelly  kelly  67 Aug 16 13:51 foo-1250455870
-rw-r--r--  1 kelly  kelly  56 Aug 16 13:51 foo-1250455870.md5

$ cat foo
text to copy which
could be important

$ cat foo-1250455870
Sun Aug 16 13:51:10 PDT 2009
text to copy which
could be important

$ md5 foo-1250455870; cat foo-1250455870.md5
MD5 (foo-1250455870) = 5cf4d9f274f05b63dfde5f15659cdeb8
MD5 (foo-1250455870) = 5cf4d9f274f05b63dfde5f15659cdeb8

With linux, you substitute md5sum for md5 and date -d @${day} for date -r ${day}. Of course.
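
If the same script has to run on both kinds of machine, the checksum step can pick whichever tool is installed. A minimal sketch, assuming md5sum on linux and md5 on bsd/mac; md5_of is a made-up name, and unlike the script above it prints only the bare hash rather than the "MD5 (file) = hash" line.

```shell
#!/bin/bash

# md5_of FILE   (hypothetical helper)
# Print only the md5 hash of FILE, using md5sum if it is
# installed and the bsd/mac md5 tool otherwise.
md5_of() {
  if command -v md5sum >/dev/null 2>&1; then
    md5sum "$1" | cut -d ' ' -f 1    # md5sum prints "<hash>  <file>"
  else
    md5 -q "$1"                      # -q prints just the hash
  fi
}

# usage: md5_of foo-1250455870 >foo-1250455870.md5
```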

Verdana Hates Pinyin

I stumbled across an article on lostlaowai.com

www.lostlaowai.com/survival-chinese

which led me to poke around the site a bit. At the above URL, I noticed that some of the combining diacritical marks (tone marks) used in writing pinyin were not rendering properly. I had not seen this problem before. It didn’t make sense.

Things that don’t make sense bug me. And being something of a character geek, I couldn’t let it go. So I tried to reproduce the problem in a test example. I couldn’t. That’s when I discovered a quirky Mac OS X copy+paste issue. I sensed there was a problem but the truth was elusive. You can’t see that copy+paste changes the string characters unless you look at a binary dump of the file (which I did).

Okay, the Mandarin word for ‘good’ is 好 and in pinyin is written ‘hǎo’. It’s possible to write the pinyin using codepoints from just the unicode Latin block.

Latin Extended-B (Latin)
latin small letter a with caron
Unicode  01CE
UTF-8    C7 8E

   h    ǎ    o
0068 01CE 006F

It’s also possible to write the pinyin using Combining Diacritical Marks.

Combining Diacritical Marks (Combining Marks)
combining caron
Unicode  030C
UTF-8    CC 8C

   h    a 030C    o
0068 0061 030C 006F
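
The two spellings look the same on screen, so the only way to tell them apart is a byte dump. A minimal sketch of that check, assuming od and tr are available; hex_bytes is a name I made up for the illustration.

```shell
#!/bin/bash

# hex_bytes STRING   (hypothetical helper)
# Print the bytes of STRING as lowercase hex with no separators.
hex_bytes() {
  printf '%s' "$1" | od -An -tx1 | tr -d ' \t\n'
  echo
}

# Octal escapes are used because they are the portable printf form.
precomposed=$( printf 'h\307\216o' )    # 68, C7 8E (U+01CE), 6F
combining=$( printf 'ha\314\214o' )     # 68, 61, CC 8C (U+030C), 6F

echo "precomposed : $( hex_bytes "${precomposed}" )"   # 68c78e6f
echo "combining   : $( hex_bytes "${combining}" )"     # 6861cc8c6f
```

Four bytes versus five: same word, different strings.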

Note that the combining mark comes after the character it decorates. This is in contrast to Mac OS X’s U.S. Extended Keyboard input method, which precedes the character to decorate with a modifier letter. However, the modifier letter is not a combining mark. Written into a file directly, it does not give you a byte sequence that a browser renders as hǎo; it comes out as hˇao.

Spacing Modifier Letters (Modifier Letters)
caron
Unicode  02C7
UTF-8    CB 87

   h 02C7    a    o
0068 02C7 0061 006F

NOTE: the caron does not combine with the a; OS X does not
modify the 'a' to have a caron above.
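
The Unicode/UTF-8 pairs listed for each character can be worked out by hand from the UTF-8 bit layout, which is a handy way to double-check them. A minimal sketch in shell arithmetic, covering only code points up to FFFF (enough for the blocks discussed here); utf8_bytes is a hypothetical helper name.

```shell
#!/bin/bash

# utf8_bytes HEX_CODEPOINT   (hypothetical helper)
# Print the UTF-8 encoding of one code point as hex bytes.
# Handles 0000..FFFF only (one to three bytes) and does not
# reject surrogates; a sketch, not a validating encoder.
utf8_bytes() {
  local cp=$(( 0x$1 ))
  if [ "${cp}" -lt 128 ]; then          # 0xxxxxxx
    printf '%02X\n' "${cp}"
  elif [ "${cp}" -lt 2048 ]; then       # 110xxxxx 10xxxxxx
    printf '%02X %02X\n' \
      $(( 192 | (cp >> 6) )) \
      $(( 128 | (cp & 63) ))
  else                                  # 1110xxxx 10xxxxxx 10xxxxxx
    printf '%02X %02X %02X\n' \
      $(( 224 | (cp >> 12) )) \
      $(( 128 | ((cp >> 6) & 63) )) \
      $(( 128 | (cp & 63) ))
  fi
}

utf8_bytes 01CE    # latin small letter a with caron, prints C7 8E
utf8_bytes 030C    # combining caron, prints CC 8C
```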

The OS X input method uses the modifier letter to look up an equivalent codepoint in unicode’s latin block.

Using OS X's US Extended Keyboard Input Method
opt-v + a

   h 02C7    a    o              h    ǎ    o
0068 02C7 0061 006F    ==>    0068 01CE 006F

Note: the caron combines with the a; OS X automatically
converts 02C7 + 0061 into 01CE.

To check the code points, I used this handy tool:

people.w3.org/rishida/scripts/uniview/conversion.php

  1. open the OS X character palette
  2. Go to the URL above
  3. place the cursor in the upper left box labeled Characters
  4. type the letter h into the box
  5. type the letter a into the box
  6. from the character palette, insert character 030C into the box
  7. type the letter o into the box
  8. click the convert button just above the Characters box, the UTF-16 Code units box will have the sequence (in unicode code points) 0068 0061 030C 006F
  9. select and copy (cmd+c) the contents of the Characters box
  10. immediately paste contents back into the Characters box
  11. click the convert button just above the Characters box, the UTF-16 Code units box now has the sequence 0068 01CE 006F

Aha! The copy and paste operation changed the string’s character code points! Imagine my surprise.

That mystery solved, I next dove into the lostlaowai source code. This was my first encounter with using character entity encoding of the combining diacritical marks. Rather than type the characters directly into the source code, like this

hǎo

lostlaowai encoded the non-ascii characters like this

ha&#780;o

even though the page encoding was declared as UTF-8

<meta http-equiv="content-type" content="text/html; charset=utf-8" />

Maybe it’s a joomla thing. lostlaowai uses joomla.
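
The numeric entities in the page source (&#780; and friends) are just the code points written in decimal instead of hex. A minimal sketch of the conversion, so you can map an entity back to the charts above; char_ref is a made-up helper name.

```shell
#!/bin/bash

# char_ref HEX_CODEPOINT   (hypothetical helper)
# Print the decimal numeric character reference for a code point.
char_ref() {
  printf '&#%d;\n' $(( 0x$1 ))
}

char_ref 030C    # combining caron, prints &#780;
char_ref 0301    # combining acute accent, prints &#769;
```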

After a quick bout of deleting blocks of source code, I isolated the culprit!

<html>
<head>
  <meta
    http-equiv="content-type"
    content="text/html; charset=utf-8">
  <title>wonderful.html</title>
</head>
<body>

<pre>
   好極了!
1. ha&#780;o ji&#769;le!
2. hǎo jíle!
<span style="font-family: Verdana,
   Arial, Helvetica, sans-serif;">
3. ha&#780;o ji&#769;le!
4. hǎo jíle!
</span><span style="font-family:
   Arial, Helvetica, sans-serif;">
5. ha&#780;o ji&#769;le!
6. hǎo jíle!
</span></pre>
</body>
</html>

Source code: wonderful.html

Adding Verdana to the font family causes the problem. I searched to see if anyone else had seen this problem. Indeed. Wikipedia.org has an entry on a similar bug and fileformat.info lists the five marks supported by Verdana. That’s sad. Verdana only supports 5 of the 112 code points in unicode’s Combining Diacritical Marks block.

The Verdana typeface, released in 1996, was created for and is owned by Microsoft. If Microsoft hasn’t fixed Verdana after more than a decade, I’ll assume they never will and prudence suggests avoiding it.

At least avoid using Verdana in writing pinyin using combining diacritical marks. If you must use Verdana, then use codepoints from unicode’s latin block. On the Mac, this is the default when typing these characters in directly using the U.S. Extended keyboard.

Character     ā    á    ǎ    à
Unicode    0101 00E1 01CE 00E0
------------------------------
Character     ē    é    ě    è
Unicode    0113 00E9 011B 00E8
------------------------------
Character     ī    í    ǐ    ì
Unicode    012B 00ED 01D0 00EC
------------------------------
Character     ō    ó    ǒ    ò
Unicode    014D 00F3 01D2 00F2
------------------------------
Character     ū    ú    ǔ    ù
Unicode    016B 00FA 01D4 00F9
------------------------------
Character     ǖ    ǘ    ǚ    ǜ
Unicode    01D6 01D8 01DA 01DC

If you have to convert an existing web page (like the lostlaowai page mentioned above), you could take advantage of the copy+paste quirk in OS X. Simply open the web page, copy the pinyin and paste it into a text editor (e.g., back into the source). The original text is not rendered properly but that’s ok. The character codes are correct. When you paste it into the editor, OS X will convert the char+mark into a single char from the latin code block.

Finally, the character ‘a’ in pinyin is sometimes written using the unicode codepoint 0251 ‘ɑ’ which is still in the latin block but in the section called “IPA Extensions”. It has a different look from the standard ascii character ‘a’. There is no set of codepoints for this letter that replaces the accented characters in the chart above.

Blue Smoke IV

I’ve added this latest rendition of Blue Smoke to illustrate a point: The quantity and quality of non-video music is greater than that of video music. Okay. So it’s anecdotal. But this isn’t about science. It’s about sensation. In my world, the grooveshark player found every track but one: Helpless by Needle. But I’ll take Neil Young’s unplugged version. Not as good but it does have a nice base coat of maudlin piano. And I prefer Paul Weller’s Portishead remix of Wildwood.

One of the niftiest aspects of the grooveshark player is that if you change (add, delete, modify) tracks in the playlist on grooveshark /after/ you embed the playlist, those changes propagate to the embedded player. That way, if I find Needle’s version, I can swap out the track. And like magic, my playlist just gets better.

One last thing. The garish colors are mine. In fact, when you create a widget, there are color wheels to adjust color on /everything/. I suggest you get a palette of coordinated colors before you start building the widget. I didn’t and in the end, my widget looks like a three-year-old colored it.

Blue Smoke III

Yet another version of blue smoke using the embedr.com player. No one has the video of Needle’s cover of Neil Young’s Helpless. Shame. Needle has, in my opinion, the quintessential rendition. Anyway, Helpless didn’t make it.