<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Red Leopard &#187; bash</title>
	<atom:link href="http://www.redleopard.com/tag/bash/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.redleopard.com</link>
	<description>A Stranger in a Strange Land</description>
	<lastBuildDate>Mon, 07 Jun 2010 22:59:44 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>bash uuid generator</title>
		<link>http://www.redleopard.com/2010/03/bash-uuid-generator/</link>
		<comments>http://www.redleopard.com/2010/03/bash-uuid-generator/#comments</comments>
		<pubDate>Thu, 25 Mar 2010 04:20:50 +0000</pubDate>
		<dc:creator>kelly</dc:creator>
				<category><![CDATA[KellyBlog]]></category>
		<category><![CDATA[bash]]></category>

		<guid isPermaLink="false">http://www.redleopard.com/?p=913</guid>
		<description><![CDATA[One-liner bash scripts are handy, but bash and common utilities don&#8217;t always work the same on the two systems I use most: CentOS and OS X.


centos $ cat /etc/redhat-release 
CentOS release 5.4 (Final)




osx $ sw_vers &#124; head -n2
ProductName:	Mac OS X
ProductVersion:	10.6.2


For example, I recently wrote a simple script to generate a set of UUIDs using the [...]]]></description>
			<content:encoded><![CDATA[<p>One-liner bash scripts are handy, but bash and common utilities don&#8217;t always work the same on the two systems I use most: CentOS and OS X.</p>
<div class="terminal">
<pre>
centos $ <span style="color: green;">cat /etc/redhat-release </span>
CentOS release 5.4 (Final)
</pre>
</div>
<div class="terminal">
<pre>
osx $ <span style="color: green;">sw_vers | head -n2</span>
ProductName:	Mac OS X
ProductVersion:	10.6.2
</pre>
</div>
<p>For example, I recently wrote a simple script to generate a set of <a href="http://en.wikipedia.org/wiki/Uuid">UUIDs</a> using the <code>uuidgen</code> utility. The OS X and CentOS versions of <code>uuidgen</code> take very different parameters.</p>
<p>Of course they do.</p>
<p>CentOS <code>uuidgen</code> manpage</p>
<div class="terminal">
<pre>
UUIDGEN(1)                                                 UUIDGEN(1)

NAME
       uuidgen - command-line utility to create a new UUID value

SYNOPSIS
       uuidgen [ -r | -t ]
  ...
</pre>
</div>
<p>I like to use the <code>uuidgen -r</code> option to explicitly generate a random-based UUID. It&#8217;s not strictly necessary as this is the default behavior. Still, I like to put it in. That&#8217;s just me. OS X doesn&#8217;t have this option; its <code>-hdr</code> is a single flag that emits a C header snippet, not a bundle of switches that includes <code>-r</code>. Oh, well.</p>
<p>OS X <code>uuidgen</code> manpage</p>
<div class="terminal">
<pre>
UUIDGEN(1)           BSD General Commands Manual           UUIDGEN(1)

NAME
     uuidgen -- generates new UUID strings

SYNOPSIS
     uuidgen [-hdr]
  ...
</pre>
</div>
<p>Next up, OS X generates UUIDs in upper case whereas CentOS generates them in lower case.</p>
<div class="terminal">
<pre>
centos $ <span style="color: green;">uuidgen</span>
18722f8e-14cd-41fb-a63e-af9ff1c287ce
</pre>
</div>
<div class="terminal">
<pre>
osx $ <span style="color: green;">uuidgen</span>
81AE9EAC-0B8B-4DB9-B262-76AA8C285DD6
</pre>
</div>
<p>Again, not really a big deal but I like consistency. Easy to fix with a pipe and <code>tr</code>.</p>
<div class="terminal">
<pre>
osx $ <span style="color: green;">uuidgen | tr '[:upper:]' '[:lower:]'</span>
62a4d6b9-e0a9-4996-9e71-e7291158b700
</pre>
</div>
<p>But I needed a set of UUIDs. A simple loop would suffice.</p>
<div class="terminal">
<pre>
centos $ <span style="color: green;">for i in `seq 1 4`; do uuidgen | tr '[:upper:]' '[:lower:]'; done</span>
408bf1d7-80a6-41ee-8a75-f7bbb5b65dd7
ae5e0aa4-f0b2-48ff-9cfe-ab99fb37b5c7
7e0a7e69-364d-4259-9b3f-83d448e9b591
e1d1b257-974e-4754-a6d3-fe4566b55c93
</pre>
</div>
<div class="terminal">
<pre>
osx $ <span style="color: green;">for i in `seq 1 4`; do uuidgen | tr '[:upper:]' '[:lower:]'; done</span>
-bash: seq: command not found
</pre>
</div>
<p>Drat! No <code>`seq 1 4`</code> in OS X.</p>
<p>Okay. Use the alternate form to declare a sequence.</p>
<div class="terminal">
<pre>
osx $ <span style="color: green;">for i in {1..4}; do uuidgen | tr '[:upper:]' '[:lower:]'; done</span>
c861326b-bde8-4198-b45a-6bfb7016addb
ef813568-5d3d-4587-a170-8aab798fd83b
21fe8562-1511-4fd4-bd37-71b43c32e013
acb10051-9af8-42b8-9ac9-54010ad71d07
</pre>
</div>
<p>And verify that it also works on CentOS.</p>
<div class="terminal">
<pre>
centos $ <span style="color: green;">for i in {1..4}; do uuidgen | tr '[:upper:]' '[:lower:]'; done</span>
93c68aba-cbe5-4b79-a1cc-e00eaae0527a
c564a4f4-9d39-4d2d-8762-4ba506c97de8
f694000b-d2cc-4b31-aabd-c3facd13b081
86466e00-3948-45f7-9090-09ab816b8fb6
</pre>
</div>
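<p>The pieces above can be folded into one small function. This is a minimal sketch, not the script from the original session: the names <code>normalize_uuid</code> and <code>gen_uuids</code> are made up for illustration, and it assumes <code>uuidgen</code> is on the PATH. It avoids both <code>seq</code> and the <code>-r</code> flag so the same text should run on either system.</p>

```shell
#!/bin/bash

# normalize_uuid (hypothetical helper): lowercase whatever arrives on stdin.
# The character classes are quoted so the shell cannot glob-expand them.
normalize_uuid() {
  tr '[:upper:]' '[:lower:]'
}

# gen_uuids N (hypothetical helper): print N lowercase UUIDs, one per line.
# Assumes uuidgen is installed; N defaults to 1.
gen_uuids() {
  local n="${1:-1}" i=0
  while [ "$i" -lt "$n" ]; do
    uuidgen | normalize_uuid
    i=$(( i + 1 ))
  done
}
```

<p>Calling <code>gen_uuids 4</code> should then print four lowercase UUIDs on CentOS and OS X alike.</p>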
<p>Would ruby be easier? Probably not for this simple hack.</p>
<p>If I knew ruby better, dropping into irb would be just as easy as bash one-liners. But there would be other problems. For example, &#8220;Which ruby?&#8221;</p>
<div class="terminal">
<pre>
centos $ <span style="color: green;">ruby -v</span>
ruby 1.9.1p376 (2009-12-07 revision 26041) [x86_64-linux]
</pre>
</div>
<div class="terminal">
<pre>
osx $ <span style="color: green;">ruby -v</span>
ruby 1.8.7 (2008-08-11 patchlevel 72) [universal-darwin10.0]
</pre>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.redleopard.com/2010/03/bash-uuid-generator/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>bash date tricks</title>
		<link>http://www.redleopard.com/2009/08/bash-date-tricks/</link>
		<comments>http://www.redleopard.com/2009/08/bash-date-tricks/#comments</comments>
		<pubDate>Sun, 16 Aug 2009 21:14:39 +0000</pubDate>
		<dc:creator>kelly</dc:creator>
				<category><![CDATA[KellyBlog]]></category>
		<category><![CDATA[bash]]></category>

		<guid isPermaLink="false">http://www.redleopard.com/?p=724</guid>
		<description><![CDATA[A quick usage note regarding the date util under bash. I sometimes want to convert between a unix timestamp and a formatted date string. I do it infrequently enough that I forget the syntax. This article is me writing down my notes.
In the following example, I want to get timestamps and date strings for both [...]]]></description>
			<content:encoded><![CDATA[<p>A quick usage note regarding the date util under bash. I sometimes want to convert between a <a href="http://en.wikipedia.org/wiki/Unix_timestamp">unix timestamp</a> and a formatted date string. I do it infrequently enough that I forget the syntax. This article is me writing down my notes.</p>
<p>In the following example, I want to get timestamps and date strings for both today and yesterday. Why yesterday&#8217;s date? Because I want to get yesterday&#8217;s data from Google Analytics&#8217; data API. I&#8217;ve seen numerous examples that get the day, month and year, then subtract one from the day and propagate the underflow through the month and year. Blech! If I have today&#8217;s timestamp, I simply subtract a day&#8217;s worth of seconds from it and voilà, yesterday! (A local day is not always 86400 seconds long, so this can land on the wrong date right around a daylight saving change.)</p>
<div class="terminal">
<pre>
#!/bin/bash

# Generate a current unix timestamp
#
day=$( date +%s )

# Adjust the timestamp above by 24 hours
#
seconds_in_a_day=$(( 24 * 60 * 60 ))
yesterday=$(( day - seconds_in_a_day ))
echo "timestamps"

echo "      day : ${day}"
echo "yesterday : ${yesterday}"
echo " "

# create a formatted date string (linux)
#
#echo "linux formatted string"
#echo "      day : $( date -d @${day} '+%Y%m%d' )"
#echo "yesterday : $( date -d @${yesterday} '+%Y-%m-%dT%H:%M:%S%Z' )"

# create a formatted date string (bsd/mac)
#
echo "bsd/mac formatted string"
echo "      day : $( date -r ${day} '+%Y%m%d' )"
echo "yesterday : $( date -r ${yesterday} '+%Y-%m-%dT%H:%M:%S%Z' )"

# create a formatted date string (win)
#
# echo "windows formatted string"
# echo "windows? really?"

echo " "
echo " "
</pre>
</div>
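<p>The commented-out Linux lines can be avoided by picking the right flag at run time. A minimal sketch, assuming only GNU and BSD <code>date</code> need to be told apart and that GNU <code>date</code> is the one that accepts <code>--version</code>; the function name <code>format_epoch</code> is made up for illustration.</p>

```shell
#!/bin/bash

# format_epoch EPOCH FORMAT (hypothetical helper): print a unix timestamp
# using a strftime-style FORMAT, on either GNU or BSD date.
format_epoch() {
  local epoch="$1" fmt="$2"
  if date --version >/dev/null 2>/dev/null; then
    # GNU date (linux): "-d @N" means N seconds since the epoch
    date -d "@${epoch}" "+${fmt}"
  else
    # BSD date (mac): "-r N" means N seconds since the epoch
    date -r "${epoch}" "+${fmt}"
  fi
}

day=$( date +%s )
yesterday=$(( day - 24 * 60 * 60 ))
echo "      day : $( format_epoch "${day}" '%Y%m%d' )"
echo "yesterday : $( format_epoch "${yesterday}" '%Y-%m-%dT%H:%M:%S%Z' )"
```

<p>Other <code>date</code> implementations (busybox, for one) are not covered by this check and would need their own branch.</p>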
<p>Another example? Okay, let&#8217;s say I had a text file, foo, and I wanted to embed a date and get the checksum. Furthermore, I wanted the filename to include the timestamp corresponding to the embedded date. (bsd/mac version)</p>
<div class="terminal">
<pre>
#!/bin/bash

# scriptname: md5tagger
#
# generate a timestamp,
# generate output filename
# copy formatted date string to output file
# cat original file to output file
# copy the md5 sum to another output file
#
day=$( date +%s )
fname="$1-${day}"

# quote every expansion so filenames containing spaces survive
date -r "${day}" >"${fname}"
cat "$1" >>"${fname}"
md5 "${fname}" >"${fname}.md5"
</pre>
</div>
<p>Let&#8217;s try it! (bsd/mac version)</p>
<div class="terminal">
<pre>
$ printf "text to copy which\ncould be important\n" >foo
$ ./md5tagger foo

$ ll foo*
-rw-r--r--  1 kelly  kelly  38 Aug 16 13:50 foo
-rw-r--r--  1 kelly  kelly  67 Aug 16 13:51 foo-1250455870
-rw-r--r--  1 kelly  kelly  56 Aug 16 13:51 foo-1250455870.md5

$ cat foo
text to copy which
could be important

$ cat foo-1250455870
Sun Aug 16 13:51:10 PDT 2009
text to copy which
could be important

$ md5 foo-1250455870; cat foo-1250455870.md5
MD5 (foo-1250455870) = 5cf4d9f274f05b63dfde5f15659cdeb8
MD5 (foo-1250455870) = 5cf4d9f274f05b63dfde5f15659cdeb8
</pre>
</div>
<p>On Linux, you substitute <code>md5sum</code> for <code>md5</code>. Of course.</p>
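<p>If one script has to run on both, the choice can be made at run time. A minimal sketch, assuming one of <code>md5sum</code> or BSD <code>md5</code> is installed; <code>md5_of</code> is a made-up name. Note that it prints only the hex digest, so the output does not match the <code>MD5 (file) = ...</code> format shown above.</p>

```shell
#!/bin/bash

# md5_of FILE (hypothetical helper): print only the hex digest of FILE.
md5_of() {
  if command -v md5sum >/dev/null 2>/dev/null; then
    # linux: md5sum prints "DIGEST  FILENAME"; keep the first field
    md5sum "$1" | awk '{ print $1 }'
  else
    # bsd/mac: -q prints the digest alone
    md5 -q "$1"
  fi
}
```

<p>In md5tagger, the last line would then become <code>md5_of "${fname}" >"${fname}.md5"</code>.</p>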
]]></content:encoded>
			<wfw:commentRss>http://www.redleopard.com/2009/08/bash-date-tricks/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>use curl for api documentation</title>
		<link>http://www.redleopard.com/2009/04/use-curl-for-api-documentation/</link>
		<comments>http://www.redleopard.com/2009/04/use-curl-for-api-documentation/#comments</comments>
		<pubDate>Thu, 02 Apr 2009 18:08:03 +0000</pubDate>
		<dc:creator>kelly</dc:creator>
				<category><![CDATA[KellyBlog]]></category>
		<category><![CDATA[bash]]></category>

		<guid isPermaLink="false">http://www.redleopard.com/?p=574</guid>
		<description><![CDATA[I&#8217;ve been working quite a bit with the rest plugin for Struts2. The really nice thing about this plugin is the way it cleans up Struts URLs. Makes them more rails-like. I chuckled when depressed programmer suggested that struts2 is &#8220;WebWork on drugs.&#8221; I hate struts2. I really do.
Anyway, I have stripped down an AccountController [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been working quite a bit with the <a href="http://cwiki.apache.org/S2PLUGINS/rest-plugin.html">rest plugin</a> for <a href="http://struts.apache.org/">Struts2</a>. The really nice thing about this plugin is the way it cleans up Struts URLs. Makes them more rails-like. I chuckled when <a href="http://depressedprogrammer.wordpress.com/2007/04/04/struts-2-and-zero-configuration-for-actions-and-results/">depressed programmer</a> suggested that struts2 is &#8220;WebWork on drugs.&#8221; I hate struts2. I really do.</p>
<p>Anyway, I have stripped down an AccountController to show just the POST service. In reality, the create() method is wired to a middle tier service that authenticates username/password pairs, then updates session attributes with the member id and other bits of persistent session data I need.</p>
<div class="terminal">
<pre>
// imports omitted

@Results({
  @Result(
    name = "success",
    type = ServletActionRedirectResult.class,
    value = "account")
})
public class AccountController extends ActionSupport
{
  private String username;
  private String password;
  // getters/setters omitted

  public AccountController() { }

  public HttpHeaders index()   { return notImplemented(); }
  public HttpHeaders show()    { return notImplemented(); }
  public HttpHeaders edit()    { return notImplemented(); }
  public HttpHeaders editNew() { return notImplemented(); }
  public HttpHeaders update()  { return notImplemented(); }
  public HttpHeaders destroy() { return notImplemented(); }
  public HttpHeaders create()
  {
    // constant-first equals() avoids a NullPointerException when a form field is missing
    int status = ("alice".equals(username)
           &#038;&#038; "restaurant".equals(password))
      ? HttpServletResponse.SC_ACCEPTED
      : HttpServletResponse.SC_UNAUTHORIZED;

    return new DefaultHttpHeaders().withStatus(status);
  }

  private DefaultHttpHeaders notImplemented()
  {
    return new DefaultHttpHeaders()
      .withStatus(HttpServletResponse.SC_NOT_IMPLEMENTED);
  }

}
</pre>
</div>
<p>Note that I only return HTTP headers; the body content will always be empty. </p>
<p>I have found curl invaluable for documenting the API. This is a simple case but consider a much more complicated system with dozens of URLs and each URL implements many of the HTTP methods (including PUT and DELETE).</p>
<p>Third party developers are the bane of the support engineer. First, few people read documentation. They skim the material and furiously code. When their software fails, they file a bug that the API is broken. Usually, the API isn&#8217;t broken; the developer simply did not understand the API.</p>
<p>I subscribe to the <a href="http://www.agilemanifesto.org/">agile manifesto</a> value of &#8220;working software over comprehensive documentation.&#8221; In my work, I have found that a few <a href="http://curl.haxx.se/">curl</a> examples clear up most of these issues. For example, to exercise the create() method in the AccountController, simply post a form.</p>
<div class="terminal">
<pre>
curl                                    \
  --request POST                        \
  --include                             \
  --url "http://ws.example.com/account" \
  --form "username=alice"               \
  --form "password=restaurant"          \
  --cookie-jar "cookies"                \
  --cookie "cookies"
</pre>
</div>
<p>I like to add the <code>--include</code> flag because it prints the HTTP response headers along with the body. When I get a support call, I have the developer trot out the &#8220;documentation&#8221; curl examples and open a bash shell. This, of course, drives the Windows guys nuts, to which I reply, &#8220;buck up.&#8221; We work through the exercise of getting the HTTP request working with the curl example. Then a miracle occurs. The developer now has a working example on their machine from which to re-examine their code.</p>
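<p>Since this API answers with headers only, the status line is the interesting part of the <code>--include</code> output. A minimal sketch of a filter that pulls it out; <code>http_status</code> is a made-up name, and it assumes the response body never contains a line starting with <code>HTTP/</code> (true here, as the body is empty).</p>

```shell
#!/bin/bash

# http_status (hypothetical helper): read "curl --include" output on stdin
# and print the status code from the last status line seen. Taking the last
# one skips any interim "100 Continue" response.
http_status() {
  awk '/^HTTP\// { code = $2 } END { print code }'
}

# canned response standing in for real curl output
printf 'HTTP/1.1 401 Unauthorized\r\nContent-Length: 0\r\n\r\n' | http_status
```

<p>Piping the login example through <code>http_status</code> should print 202 for alice/restaurant and 401 for anything else, going by the controller above.</p>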
<p>A final note. The <code>--cookie-jar</code> and <code>--cookie</code> parameters handle cookies between the web server and your curl commands. In other words, you can log in to a website and these parameters will store your authenticated session id in a file. The file in this example is named &#8220;cookies&#8221; but it can be any legal filename. You can then make subsequent calls to URLs, passing the cookies (and, therefore, the session id) back up to the server.</p>
<p>For example, to upload your avatar picture to your new social network, first log in using the curl command above. This establishes an authenticated session. Then post your picture using the curl command below, making sure you pass the cookies back up.</p>
<div class="terminal">
<pre>
curl                                   \
  --request POST                       \
  --include                            \
  --url "http://ws.example.com/avatar" \
  --form "avatar=@somepix.jpg"         \
  --cookie-jar "cookies"               \
  --cookie "cookies"
</pre>
</div>
<p>Finally, if you need to add a description, publish the curl command as part of a bash script. For example,</p>
<div class="terminal">
<pre>
#!/bin/bash

# 1. you must login before you can upload the avatar
# 2. the web server will reject any avatar exceeding 2MB
# 3. do not forget the '@' symbol, a common mistake
# 4. do not forget to include --cookie and --cookie-jar

curl                                   \
  --request POST                       \
  --include                            \
  --url "http://ws.example.com/avatar" \
  --form "avatar=@somepix.jpg"         \
  --cookie-jar "cookies"               \
  --cookie "cookies"
</pre>
</div>
<p>Good luck!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.redleopard.com/2009/04/use-curl-for-api-documentation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>bash progress monitor</title>
		<link>http://www.redleopard.com/2009/01/bash-progress-monitor/</link>
		<comments>http://www.redleopard.com/2009/01/bash-progress-monitor/#comments</comments>
		<pubDate>Sat, 10 Jan 2009 05:55:28 +0000</pubDate>
		<dc:creator>kelly</dc:creator>
				<category><![CDATA[KellyBlog]]></category>
		<category><![CDATA[bash]]></category>

		<guid isPermaLink="false">http://www.redleopard.com/?p=489</guid>
		<description><![CDATA[I have a remote machine that is used to store and process XML files. Recently, I needed to duplicate a directory of XML files (e.g., cp -r a b). It&#8217;s not really germane to the subject here, but this particular server has a whack configuration and I gotta rant before I continue.
The office server [...]]]></description>
			<content:encoded><![CDATA[<p>I have a remote machine that is used to store and process XML files. Recently, I needed to duplicate a directory of XML files (e.g., cp -r a b). It&#8217;s not really germane to the subject here, but this particular server has a whack configuration and I gotta rant before I continue.</p>
<p>The office server (<a href="http://en.wikipedia.org/wiki/Scrappy-Doo">scrappy</a>) has pretty good specs.</p>
<div class="terminal">
<pre>
[scrappy ~]$ cat /proc/meminfo

MemTotal:      3980800 kB

[scrappy ~]$ cat /proc/cpuinfo

processor   : 0
model name  : Intel(R) Core(TM)2 CPU   6600  @ 2.40GHz
cpu MHz     : 2394.000
cache size  : 4096 KB

processor   : 1
model name  : Intel(R) Core(TM)2 CPU   6600  @ 2.40GHz
cpu MHz     : 2394.000
cache size  : 4096 KB

[scrappy ~]$ cat /proc/scsi/scsi

Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: SONY   Model: DVD RW AW-Q170A    Rev: 1.72
  Type:   CD-ROM                           ANSI SCSI revision: 05

[scrappy ~]$ cat /proc/ide/hd?/model

ST3320620AS
</pre>
</div>
<p>Whoa! What&#8217;s my SATA drive doing attached to the IDE driver? When I compare to my home CentOS box (<a href="http://en.wikipedia.org/wiki/Marmaduke">marmaduke</a>), I see that its drives are connected differently. Yes, <code>marmaduke</code> has one HDD connected via the IDE driver (ST3320620A) but <em>that drive is a PATA drive</em>. The four SATA drives are connected via SATA drivers. (The SATA drives will be configured as a software RAID 10, stay tuned. There&#8217;s a <a href="http://www.xen.org/">xen</a> project in the making.)</p>
<div class="terminal">
<pre>
[marmaduke ~]$ cat /proc/scsi/scsi

Attached devices:
Host: scsi2 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA      Model: ST3500630AS      Rev: 3.AA
  Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi3 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA      Model: ST3300620AS      Rev: 3.AA
  Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi4 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA      Model: ST3300620AS      Rev: 3.AA
  Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi5 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA      Model: ST3300620AS      Rev: 3.AA
  Type:   Direct-Access                    ANSI SCSI revision: 05

[marmaduke ~]$ cat /proc/ide/hd?/model

PIONEER DVD-RW DVR-111D
ST3320620A
</pre>
</div>
<p><code>scrappy</code> was configured before arriving at the office by a friend of a friend who runs a PC shop. &#8220;But it was such a deal!&#8221; Yeah, right. Bunch of monkeys. How hard is it to configure the BIOS to use the SATA interface rather than the IDE interface?</p>
<p>Anyway, I don&#8217;t have time to rebuild <code>scrappy</code> right now so I live with the dismal disk performance. Here&#8217;s the problem at hand. I have numerous XML files—some largish and some smallish. I have several sets and each set has about 4000 files.</p>
<div class="terminal">
<pre>
[scrappy ~]$ ls src/*xml | wc -w

4323

[scrappy ~]$ ls -l src/*xml | sort -n -r -k5

-rw-r--r-- 1 kelly kelly 315804120 Dec 19 15:46 0001.xml
-rw-r--r-- 1 kelly kelly 275651475 Dec 19 17:34 0002.xml
-rw-r--r-- 1 kelly kelly 260250994 Dec 19 16:15 0003.xml
-rw-r--r-- 1 kelly kelly 222402294 Dec 19 16:25 0004.xml
-rw-r--r-- 1 kelly kelly 204642813 Dec 19 15:52 0005.xml
     .
     .
     .
-rw-r--r-- 1 kelly kelly      1467 Dec 19 19:15 4321.xml
-rw-r--r-- 1 kelly kelly      1467 Dec 19 16:01 4322.xml
-rw-r--r-- 1 kelly kelly      1098 Dec 19 19:19 4323.xml
</pre>
</div>
<p>I wanted to duplicate the set of files as I needed to run some prototype code that I didn&#8217;t trust to be non-destructive. Simple.</p>
<div class="terminal">
<pre>
[scrappy ~]$ cp -r src tgt
</pre>
</div>
<p>However, the disk performance is agonizing. So bad that I leave it while I work on another machine. But I want to know the progress and see it as it changes. With six to ten shells open, I want something that can be resized to use minimal screen real estate. I want a quick command line progress monitor.</p>
<p><code>bash</code> to the rescue. I didn&#8217;t want to create a script file so I just jack it right into the terminal&#8217;s command line. When you open the <code>while</code> loop, bash will continue on the next line until you close it with the <code>done</code> keyword.</p>
<div class="terminal">
<pre>
[scrappy ~]$ while 'true'; do
>   ts=`date`
>   src=`ls src/*xml 2>/dev/null | wc -w`
>   tgt=`ls tgt/*xml 2>/dev/null | wc -w`
>   echo -ne "  ${ts}  ${src}  ${tgt}        \r"
>   sleep 1
> done

  Fri Jan  9 15:20:17 PST 2009  4323  2304
</pre>
</div>
<p>Recall we&#8217;ve <a href="http://www.redleopard.com/2008/11/fuser-detects-ftp-completion/">previously covered</a> that <code>2&gt;/dev/null</code> hides the error message generated by <code>ls</code> if no file is found.</p>
<p>The components are stored in local variables as a matter of convenience and displayed using <a href="http://unixhelp.ed.ac.uk/CGI/man-cgi?echo">echo</a>.</p>
<p><code>echo</code> is passed two switches. The <code>-n</code> switch suppresses the trailing newline so that the cursor remains on the same line as the displayed text. The <code>-e</code> switch causes backslash escape sequences in the text to be interpreted. This is useful since I want to add a trailing carriage return (<code>\r</code>), which <em>pushes</em> the cursor back to the beginning of the line while remaining on the same line as the text.</p>
<p>After sleeping for one second, the script generates a new <code>echo</code> output which overwrites the old text. I suppose I could add a test to the script to break when <code>${src}</code> equals <code>${tgt}</code>.</p>
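<p>That break test might look like the following. It is a minimal sketch rather than what I ran: <code>count_xml</code> and <code>monitor_copy</code> are made-up names, and the files are counted with a glob loop instead of <code>ls | wc</code>, so no error message needs hiding.</p>

```shell
#!/bin/bash

# count_xml DIR (hypothetical helper): print how many entries in DIR have
# names ending in "xml".
count_xml() {
  local dir="$1" n=0 f
  for f in "$dir"/*xml; do
    [ -e "$f" ] || continue   # an unmatched glob stays literal; skip it
    n=$(( n + 1 ))
  done
  echo "$n"
}

# monitor_copy SRC TGT (hypothetical helper): redraw one status line per
# second and stop once both directories hold the same number of files.
monitor_copy() {
  local src_dir="$1" tgt_dir="$2" src tgt
  while true; do
    src=$( count_xml "$src_dir" )
    tgt=$( count_xml "$tgt_dir" )
    printf '  %s  %s  %s        \r' "$( date )" "$src" "$tgt"
    if [ "$src" -eq "$tgt" ]; then
      break
    fi
    sleep 1
  done
  printf '\n'
}
```

<p>Run it as <code>monitor_copy src tgt</code>. Equal counts only mean the last file has been created, not that it has finished copying, so treat the exit as a hint rather than proof.</p>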
<p>I don&#8217;t know why disk I/O is so slow on <code>scrappy</code>. Perhaps the mode is set to use programmed I/O rather than DMA. Who knows? Who cares? Both <code>scrappy</code> and <code>marmaduke</code> have Intel ICH8 SATA controllers. <code>scrappy</code> has a faster processor with more cache. Yet, <code>marmaduke</code> smokes on disk throughput on either the SATA or IDE drives. Something is wonky.</p>
<p>I&#8217;d like to say that I can ignore the issue. I have way too much going on right now. But it bugs me.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.redleopard.com/2009/01/bash-progress-monitor/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>grep and UTF-8</title>
		<link>http://www.redleopard.com/2008/12/grep-and-utf-8/</link>
		<comments>http://www.redleopard.com/2008/12/grep-and-utf-8/#comments</comments>
		<pubDate>Wed, 24 Dec 2008 01:44:24 +0000</pubDate>
		<dc:creator>kelly</dc:creator>
				<category><![CDATA[KellyBlog]]></category>
		<category><![CDATA[bash]]></category>
		<category><![CDATA[utf8]]></category>

		<guid isPermaLink="false">http://www.redleopard.com/?p=417</guid>
		<description><![CDATA[I needed to look up the various strings Apple uses to name the iTunes Library. First I tried to get the name from the iTunes resource bundle:


echo "this won't work..."
echo "so don't even try it"

cd /Applications/iTunes.app/Contents/Resources/English.lproj
cat Localizable.strings &#124; grep 'PrimaryPlaylistName'


But I quickly learned that grep doesn&#8217;t work on the strings file. Why? Because Apple string files [...]]]></description>
			<content:encoded><![CDATA[<p>I needed to look up the various strings Apple uses to name the iTunes Library. First I tried to get the name from the iTunes resource bundle:</p>
<div class="terminal">
<pre>
echo "this won't work..."
echo "so don't even try it"

cd /Applications/iTunes.app/Contents/Resources/English.lproj
cat Localizable.strings | grep 'PrimaryPlaylistName'
</pre>
</div>
<p>But I quickly learned that grep doesn&#8217;t work on the strings file. Why? Because <a href="http://developer.apple.com/DOCUMENTATION/MacOSX/Conceptual/BPInternational/Articles/StringsFiles.html#//apple_ref/doc/uid/20000005-SW15">Apple string files</a> are not UTF-8. They are usually UTF-16, and these ones are. I wanted to iterate over the set of resource strings and extract just the string I wanted.</p>
<p>First I had to convert the file from UTF-16 to another format. Really, the only format that makes sense is UTF-8. After a bit of trial and error, I finally had my script just so.</p>
<div class="terminal">
<pre>
#!/bin/bash

cd /Applications/iTunes.app/Contents/Resources/

# file to look inside of
f='Localizable.strings'

# string to search for
s='PrimaryPlaylistName'

# look for directories of the form *.lproj
#
for d in *.lproj ; do
  echo -n "${d}: "
  iconv -f UTF-16 -t UTF-8 "${d}/${f}" | grep "${s}"
done
</pre>
</div>
<p>That&#8217;s it. That&#8217;s the script. Slap that puppy in a file (e.g., foo) and fire it off.</p>
<div class="terminal">
<pre>
$ ./foo
Dutch.lproj: "kMusicLibraryPrimaryPlaylistName" = "Bibliotheek";
English.lproj: "kMusicLibraryPrimaryPlaylistName" = "Library";
French.lproj: "kMusicLibraryPrimaryPlaylistName" = "Bibliothèque";
German.lproj: "kMusicLibraryPrimaryPlaylistName" = "Mediathek";
Italian.lproj: "kMusicLibraryPrimaryPlaylistName" = "Libreria";
Japanese.lproj: "kMusicLibraryPrimaryPlaylistName" = "ライブラリ";
Spanish.lproj: "kMusicLibraryPrimaryPlaylistName" = "Biblioteca";
da.lproj: "kMusicLibraryPrimaryPlaylistName" = "Bibliotek";
fi.lproj: "kMusicLibraryPrimaryPlaylistName" = "Kirjasto";
ko.lproj: "kMusicLibraryPrimaryPlaylistName" = "보관함";
no.lproj: "kMusicLibraryPrimaryPlaylistName" = "Bibliotek";
pl.lproj: "kMusicLibraryPrimaryPlaylistName" = "Biblioteka";
pt.lproj: "kMusicLibraryPrimaryPlaylistName" = "Biblioteca";
pt_PT.lproj: "kMusicLibraryPrimaryPlaylistName" = "Biblioteca";
ru.lproj: "kMusicLibraryPrimaryPlaylistName" = "Медиатека";
sv.lproj: "kMusicLibraryPrimaryPlaylistName" = "Bibliotek";
zh_CN.lproj: "kMusicLibraryPrimaryPlaylistName" = "资料库";
zh_TW.lproj: "kMusicLibraryPrimaryPlaylistName" = "資料庫";
</pre>
</div>
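<p>The reason plain grep comes up empty is that UTF-16 stores each ASCII character as two bytes, one of them NUL, so the bytes of the search string never appear next to each other in the file. The convert-then-search step can be wrapped up for reuse. A minimal sketch; <code>grep_utf16</code> is a made-up name and it assumes <code>iconv</code> is installed.</p>

```shell
#!/bin/bash

# grep_utf16 PATTERN FILE (hypothetical helper): search a UTF-16 file by
# converting it to UTF-8 on the fly.
grep_utf16() {
  iconv -f UTF-16 -t UTF-8 "$2" | grep -- "$1"
}
```

<p>For example, <code>grep_utf16 PrimaryPlaylistName English.lproj/Localizable.strings</code> run from the Resources directory.</p>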
<p>If I were better at command line perl, I could make a nicely formatted table. That is, if I were better at perl.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.redleopard.com/2008/12/grep-and-utf-8/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>centos l10n problem</title>
		<link>http://www.redleopard.com/2008/12/centos-l10n-problem/</link>
		<comments>http://www.redleopard.com/2008/12/centos-l10n-problem/#comments</comments>
		<pubDate>Thu, 18 Dec 2008 02:46:03 +0000</pubDate>
		<dc:creator>kelly</dc:creator>
				<category><![CDATA[KellyBlog]]></category>
		<category><![CDATA[bash]]></category>
		<category><![CDATA[utf8]]></category>

		<guid isPermaLink="false">http://www.redleopard.com/?p=353</guid>
		<description><![CDATA[Just about the time I believe the UTF-8 beast is in the cage, it escapes and runs amok. 
This AM, I started to deploy an update to the webapp on EC2. Seems that some of the static strings in the app contained UTF-8 encoded non-ASCII characters. The Java compiler barfed. &#8220;The heck?&#8221;, I thought. I [...]]]></description>
			<content:encoded><![CDATA[<p>Just about the time I believe the UTF-8 beast is in the cage, it escapes and runs amok. </p>
<p>This AM, I started to deploy an update to the webapp on <a href="http://aws.amazon.com/ec2/">EC2</a>. Seems that some of the static strings in the app contained UTF-8 encoded non-ASCII characters. The Java compiler barfed. &#8220;The heck?&#8221; I thought. I had just compiled the app on my MacBook. I checked the usual suspects (tomcat&#8217;s server.xml, JAVA_OPTS) but everything looked fine. However, when I looked at the code, it was indeed mangled.</p>
<p>Crap! Was this a bug in CVS? (Yes, we still use CVS.) Wait. What if I cut and paste the correct code from my Mac to the CentOS server version? No luck. Couldn&#8217;t be vi. Trusty old vi. Could it be that <a href="http://www.centos.org">CentOS</a> is confused? Let&#8217;s look:</p>
<div class="terminal">
<pre>
$ locale
LANG=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
</pre>
</div>
<p>What the&#8230;?</p>
<p>I don&#8217;t know what I did but when I created my EC2 image, I must have omitted a step. None of the googled web-geniuses had solved this exact problem but it seems everyone flails about with the <code>LANG</code> environment variable.</p>
<div class="terminal">
<pre>
export LANG=en_US.UTF-8
</pre>
</div>
<p>That did the trick!</p>
<div class="terminal">
<pre>
$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
</pre>
</div>
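<p>A deploy script could check for this before compiling anything. A minimal sketch, assuming <code>locale charmap</code> reports the active encoding on the box; <code>is_utf8_locale</code> is a made-up name.</p>

```shell
#!/bin/bash

# is_utf8_locale (hypothetical helper): succeed only when the active
# locale's character map is UTF-8.
is_utf8_locale() {
  [ "$( locale charmap 2>/dev/null )" = "UTF-8" ]
}

if ! is_utf8_locale; then
  echo "warning: locale is not UTF-8; try: export LANG=en_US.UTF-8" >/dev/stderr
fi
```

<p>Passing <code>-encoding UTF-8</code> to <code>javac</code> is another way to stop the compiler from depending on the locale, though it would not have fixed the mangled checkout.</p>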
<p>A fresh cvs checkout and I was back in business. I don&#8217;t feel I completely understand CentOS localization configuration. At least I&#8217;m aware of it, now.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.redleopard.com/2008/12/centos-l10n-problem/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>bash array crawler</title>
		<link>http://www.redleopard.com/2008/12/bash-array-crawler/</link>
		<comments>http://www.redleopard.com/2008/12/bash-array-crawler/#comments</comments>
		<pubDate>Thu, 18 Dec 2008 02:18:06 +0000</pubDate>
		<dc:creator>kelly</dc:creator>
				<category><![CDATA[KellyBlog]]></category>
		<category><![CDATA[bash]]></category>

		<guid isPermaLink="false">http://www.redleopard.com/?p=343</guid>
		<description><![CDATA[I wanted to complement my bash directory crawler post with a bash array crawler example.
Sometimes, it&#8217;s easier to jack a list of identifying tokens into an array and process them rather than to build an end-to-end script with database access. For this contrived example, I grab a list of UUIDs from MySQL with a simple [...]]]></description>
			<content:encoded><![CDATA[<p>I wanted to complement my <a href="http://www.redleopard.com/2008/12/bash-directory-crawler/">bash directory crawler</a> post with a bash array crawler example.</p>
<p>Sometimes, it&#8217;s easier to jack a list of identifying tokens into an array and process them rather than to build an end-to-end script with database access. For this contrived example, I grab a list of <a href="http://www.famkruithof.net/uuid/uuidgen">UUIDs</a> from MySQL with a simple SQL statement.</p>
<div class="terminal">
<pre>
mysql> SELECT id, uuid FROM icons;
+-----+--------------------------------------+
| id  | uuid                                 |
+-----+--------------------------------------+
|   1 | fe0b16ed-3369-4dda-8e60-faffb966375d |
|   3 | 82bfcbc2-84a2-4ca7-914b-13172b94feb6 |
|   6 | ab5e7265-3698-4205-b081-e6aec528fee2 |
|  11 | 4b6ca26b-c6ed-494f-aeb4-9bf369e2d465 |
|  19 | e7cc807b-7f15-46fa-b1c5-85d1f1050155 |
+-----+--------------------------------------+
5 rows in set (0.00 sec)
</pre>
</div>
<p>Next, jack the tokens into an array and simply crawl over the tokens.</p>
<div class="terminal">
<pre>
#!/bin/bash

uuids=(
 fe0b16ed-3369-4dda-8e60-faffb966375d
 82bfcbc2-84a2-4ca7-914b-13172b94feb6
 ab5e7265-3698-4205-b081-e6aec528fee2
 4b6ca26b-c6ed-494f-aeb4-9bf369e2d465
 e7cc807b-7f15-46fa-b1c5-85d1f1050155
)

for uuid in "${uuids[@]}" ; do

  # do something interesting here
  echo "http://icons.example.com/${uuid}.jpg"

  # curl \
  #   --request GET \
  #   --remote-name \
  #   --url "http://icons.example.com/${uuid}.jpg"

done
</pre>
</div>
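<p>If the list is long, pasting it by hand gets tedious. Here is a minimal sketch, assuming the UUIDs have been exported one per line to a file. The <code>uuids.txt</code> file name and the <code>icon_urls</code> function are hypothetical and not part of the script above.</p>
<div class="terminal">
<pre>
#!/bin/bash

# icon_urls FILE: print an icon URL for each UUID listed in FILE.
# FILE is assumed to hold one UUID per line.
icon_urls() {
  local uuids uuid

  # Word splitting is safe here because a UUID contains
  # no whitespace or glob characters.
  uuids=( $(cat "$1") )

  for uuid in "${uuids[@]}" ; do
    echo "http://icons.example.com/${uuid}.jpg"
  done
}
</pre>
</div>
<p>Calling <code>icon_urls uuids.txt</code> prints one URL per UUID. The mysql client&#8217;s <code>--batch</code> and <code>--skip-column-names</code> options are one way to get query output in that one-per-line shape.</p>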
]]></content:encoded>
			<wfw:commentRss>http://www.redleopard.com/2008/12/bash-array-crawler/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>EC2 and S3 Success Story</title>
		<link>http://www.redleopard.com/2008/12/ec2-and-s3-success-story/</link>
		<comments>http://www.redleopard.com/2008/12/ec2-and-s3-success-story/#comments</comments>
		<pubDate>Thu, 11 Dec 2008 02:09:36 +0000</pubDate>
		<dc:creator>kelly</dc:creator>
				<category><![CDATA[KellyBlog]]></category>
		<category><![CDATA[amazon]]></category>
		<category><![CDATA[bash]]></category>
		<category><![CDATA[ec2]]></category>
		<category><![CDATA[s3]]></category>

		<guid isPermaLink="false">http://www.redleopard.com/?p=150</guid>
		<description><![CDATA[I&#8217;ve been building systems lately on Amazon&#8217;s Elastic Compute Cloud (EC2). At first, I was only interested in Amazon&#8217;s Simple Storage Service (S3) after seeing the SmugMug slide show.
I hadn&#8217;t really considered using EC2 since we had more servers in colocation than I really needed. But I had a file storage problem. When you have [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been building systems lately on Amazon&#8217;s <a href="http://aws.amazon.com/ec2/">Elastic Compute Cloud</a> (EC2). At first, I was only interested in Amazon&#8217;s <a href="http://aws.amazon.com/s3/">Simple Storage Service</a> (S3) after seeing the SmugMug <a href="http://www.slideshare.net/techdude/scalability-set-amazons-servers-on-fire-not-yours/">slide show</a>.</p>
<p>I hadn&#8217;t really considered using EC2 since we had more servers in colocation than I really needed. But I had a file storage problem. When you have a thousand files, you stick them in a directory. When you have a million files, you cannot simply stick them in a single directory. You distribute them across multiple directories. What a PITA.</p>
<p>My first thought was to use <a href="http://www.danga.com/mogilefs/">MogileFS</a>. It handles the directory hashing for you and distributes redundant copies of files across multiple servers. I had extra servers. Sweet. But before I rushed off and started building my shiny new filesystem, I wanted to check out the competitors. That led me to SmugMug. And that led me to S3.</p>
<p>I work at a tiny <a href="http://www.sonicswap.com/index.do">startup</a>. I had a problem and very few developers to ask for help. Every hour I needed from them had a significant impact on another project. And dammit, all the open projects were on fire. I needed to solve my file system problem and fast.</p>
<p>So up on S3 the files went. XML files. Beaucoup XML files.</p>
<p>It was painless. It was simple. It was cheap. The monthly S3 cost is a fraction of a server&#8217;s cost in colocation. Sweet!</p>
<p>Wait! If that&#8217;s so yummy, why not move XML processing up to EC2? Our XML processing load was increasing&#8230;increasingly increasing. I rewrote our XML processing app, built a custom Amazon Machine Image (centos + apache + tomcat) and fired it up. Nice!</p>
<p>Building the machine instance was a pain but worth the effort. I learned a lot about centos that I didn&#8217;t previously know or really understand. However, I wish I had a real system administrator on staff. It would have hurt less.</p>
<p>One of the goals for the EC2-based XML processing was to shift from offline XML processing to a RESTful web service. That is, rather than queue the XML processing in a single process, I needed to finish the XML processing during the HTTP request. On demand processing. Done in seconds (not tens of minutes). And handle multiple concurrent processing requests.</p>
<p>Here is the EC2 &lt;--&gt; S3 connection. For each file received for processing, I write dozens to hundreds of files to S3 plus open scads of HTTP connections to other web servers. Running these in a single thread burned precious time. Even though we &#8220;write&#8221; to S3, the underlying mechanism is another HTTP request.</p>
<p>Simple. Build a thread pool for the HTTP requests and run multiple threads concurrently. That worked swimmingly but for one issue. It didn&#8217;t take long until I started seeing &#8220;Too many open files&#8221; in the exception logs.</p>
<p>Normally, the limit on open files is quite adequate. But you bolt Apache&#8217;s <a href="http://hc.apache.org/">HttpClient</a> to the backend of your webapp and supercharge it with a healthy <a href="http://java.sun.com/javase/6/docs/api/java/util/concurrent/ThreadPoolExecutor.html">thread pool</a> and you <em>will</em> overwhelm the default settings. Centos will not &#8220;garbage collect&#8221; the spent files from completed HTTP requests fast enough.</p>
<p>The solution: Up the limits on open files. The default is 1024. Simply edit <code>/etc/security/limits.conf</code> and change the soft and hard values for <code>nofile</code>. I&#8217;m sure there is a maximum size but these values have been working for me. What&#8217;s appropriate for your system is dependent on your system. You will need to pick size values for yourself.</p>
<div class="terminal">
<pre>
#*               soft    core            0
#*               hard    rss             10000
#@student        hard    nproc           20
#@faculty        soft    nproc           20
#@faculty        hard    nproc           50
#ftp             hard    nproc           0
#@student        -       maxlogins       4
*                soft    nofile          8192
*                hard    nofile          65536
</pre>
</div>
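<p>To check which limits a process actually ends up with, something like the following may help. It is only a sketch: the function names are made up for illustration, and <code>open_fd_count</code> assumes a Linux <code>/proc</code> filesystem.</p>
<div class="terminal">
<pre>
#!/bin/bash

# Print the soft and hard open-file limits of the current shell.
show_nofile_limits() {
  echo "soft nofile: $(ulimit -Sn)"
  echo "hard nofile: $(ulimit -Hn)"
}

# open_fd_count PID: count the file descriptors a process holds open.
# Assumes Linux, where each open descriptor appears under /proc/PID/fd.
open_fd_count() {
  ls /proc/"$1"/fd | wc -l
}

show_nofile_limits
</pre>
</div>
<p>New values in <code>limits.conf</code> are typically applied at login, so run the check from a fresh session. Comparing <code>open_fd_count</code> for the Tomcat process against the soft limit gives a rough idea of how much headroom is left.</p>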
<p>What was the net result of moving XML processing and storage up to the Amazon Cloud? Retired 60% of the servers in colocation. Built a scalable infrastructure. Reduced overall monthly hosting costs. Fewer moving parts.</p>
<p>Now, if only I had a system administrator&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.redleopard.com/2008/12/ec2-and-s3-success-story/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>bash directory crawler</title>
		<link>http://www.redleopard.com/2008/12/bash-directory-crawler/</link>
		<comments>http://www.redleopard.com/2008/12/bash-directory-crawler/#comments</comments>
		<pubDate>Thu, 04 Dec 2008 19:44:21 +0000</pubDate>
		<dc:creator>kelly</dc:creator>
				<category><![CDATA[KellyBlog]]></category>
		<category><![CDATA[bash]]></category>

		<guid isPermaLink="false">http://www.redleopard.site/?p=114</guid>
		<description><![CDATA[Currently, popular filesystems (ext3, hfs+) have a practical limit on the number of files and directories you can store in a single directory. Certainly, most of the unix command line tools will not work once you exceed some magic threshold. In my experience, 10,000 files and/or directories is the practical limit.
So what do you [...]]]></description>
			<content:encoded><![CDATA[<p>Currently, popular filesystems (ext3, hfs+) have a practical limit on the number of files and directories you can store in a single directory. Certainly, most of the unix command line tools will not work once you exceed some magic threshold. In my experience, 10,000 files and/or directories is the practical limit.</p>
<p>So what do you do when you have 1,000,000 XML files to process? I had this very problem recently. Fortunately, the problem was simplified as each file belonged to one of 27,000 categories.</p>
<p>I organized my hierarchy into three directory levels with all the xml files in the lowest level. I then use bash to traverse the directories.</p>
<div class="terminal">
<pre style="line-height: 100%;">
master/
  |
  +-- 0/
  |   |
  |   +-- 0/
  |   |   |
  |   |   +-- f494a6f9-fc57-4408-a637-d3b768d0cd99.xml
  |   |   |
  |   |   +-- 5be1a5ed-f159-41d1-bc2e-737b5d2bed8b.xml
  |   |   |
  |   |   +-- a4276d0f-a014-42c2-a5ec-dbf59dfee95a.xml
  |   ⋮
  |   +-- 9999/
  |
  +-- 1/
  |   |
  |   +-- 10000/
  |   ⋮
  |   +-- 19999/
  |
  +-- 2/
      |
      +-- 20000/
      ⋮
      +-- 26999/
</pre>
</div>
<p>In my problem space, I am guaranteed that each leaf directory has at least one and at most a few hundred xml files. The following script is in production use with the one exception that I&#8217;m doing more than simply counting words.</p>
<div class="terminal">
<pre>
#!/bin/bash

# stop if the master directory is missing rather than crawl the wrong place
cd /home/alice/work/master || exit 1
master_directory=$(pwd)

for hashed_directory in "$master_directory"/* ; do
  for leaf_directory in "$hashed_directory"/* ; do
    for xml_metadata in "$leaf_directory"/*.xml ; do

      # do something interesting
      cat "$xml_metadata" | wc

    done
  done
done
</pre>
</div>
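<p>Going the other way, from a category to its leaf directory, is simple arithmetic. This is a minimal sketch that assumes the rule implied by the tree above: leaf directories are named by category number and grouped 10,000 per top-level directory. The <code>leaf_path</code> function is hypothetical.</p>
<div class="terminal">
<pre>
#!/bin/bash

# leaf_path CATEGORY: print the leaf directory for a numeric category.
# Assumes 10,000 leaf directories per top-level directory.
leaf_path() {
  local category=$1
  local master=/home/alice/work/master

  # integer division picks the top-level directory (0, 1 or 2)
  echo "${master}/$(( category / 10000 ))/${category}"
}

leaf_path 12345   # prints /home/alice/work/master/1/12345
</pre>
</div>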
]]></content:encoded>
			<wfw:commentRss>http://www.redleopard.com/2008/12/bash-directory-crawler/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Fuser Detects FTP Completion</title>
		<link>http://www.redleopard.com/2008/11/fuser-detects-ftp-completion/</link>
		<comments>http://www.redleopard.com/2008/11/fuser-detects-ftp-completion/#comments</comments>
		<pubDate>Tue, 25 Nov 2008 19:28:57 +0000</pubDate>
		<dc:creator>kelly</dc:creator>
				<category><![CDATA[KellyBlog]]></category>
		<category><![CDATA[bash]]></category>

		<guid isPermaLink="false">http://www.redleopard.site/?p=113</guid>
		<description><![CDATA[At work, we have legacy systems with problems which no one had taken the time to fix. One such legacy problem involved an FTP server. Client applications would FTP files up to the server for processing. That part worked fine. What didn&#8217;t work was knowing when the FTP was complete so we could start processing [...]]]></description>
			<content:encoded><![CDATA[<p>At work, we have legacy systems with problems which no one had taken the time to fix. One such legacy problem involved an FTP server. Client applications would FTP files up to the server for processing. That part worked fine. What didn&#8217;t work was knowing when the FTP was complete so we could start processing the data.</p>
<p>Recently, I decided to fix this problem.</p>
<p>Many people have written on the subject. One of the approaches advised, &#8220;watching the file and when the file size stops changing, you can use it.&#8221; I didn&#8217;t like that one. For <em>so</em> many reasons. <a href="http://www.usenet-forums.com/linux-general/83329-how-detect-locked-files.html">Another</a> recommended using <code>lsof</code>. Hummmmmmm. I didn&#8217;t get a warm fuzzy feeling with that one either.</p>
<p>A coworker suggested I try <code>fuser</code>. That did the trick as <code>fuser</code> allowed me to monitor an incoming file and determine when it was no longer being used by the ftp (or any other) process.</p>
<div class="terminal">
<pre>
#!/bin/bash
DELAY="10"

while true; do
  for FILENAME in `ls -A1 /var/ftp/incoming/*.gz 2&gt;/dev/null`; do
    if [[ `fuser "$FILENAME" | wc -c` -eq 0 ]]
      then

        # do something interesting with the data here
        mv "$FILENAME" ./backup/.

    fi
  done
  sleep $DELAY
done
</pre>
</div>
<p>All our incoming FTP files arrive in a single directory. The above script loops through the list of files once every 10 seconds. Normally, <code>ls</code> complains when you ask for a file listing and there are no files to list. The <code>2&gt;/dev/null</code> code fragment will send the complaint quietly to the bit bucket.</p>
<p>For each file found, <code>fuser</code> lists all processes that are using the file. I simply count the number of characters, <code>wc -c</code>, in the response from <code>fuser</code>. If the file is not being used by any processes, <code>fuser</code> returns nothing and the character count is zero. At that point, I can safely process the file.</p>
<p>Update 2008-12-25: I should have added that this server runs Centos 5.1. I did the development on OS X (10.5.5), and fuser behaves a little differently on the two platforms.</p>
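<p>An alternative to listing the directory with <code>ls</code> is to let bash expand the pattern itself. The following is a sketch under that assumption, not the production script, and the <code>list_incoming</code> function name is made up. With the <code>nullglob</code> shell option set, a pattern that matches nothing expands to nothing, so there is no complaint to discard and file names containing spaces stay intact.</p>
<div class="terminal">
<pre>
#!/bin/bash

# list_incoming DIR: print each .gz file found in DIR, one per line.
list_incoming() {
  local dir=$1 filename

  # without nullglob, an unmatched pattern is passed through literally
  shopt -s nullglob
  for filename in "$dir"/*.gz ; do
    echo "$filename"
  done
  shopt -u nullglob
}
</pre>
</div>
<p>The body of the loop is where the <code>fuser</code> check would go, for example after calling <code>list_incoming /var/ftp/incoming</code> or by inlining the loop in the polling script.</p>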
]]></content:encoded>
			<wfw:commentRss>http://www.redleopard.com/2008/11/fuser-detects-ftp-completion/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
