bash progress monitor

I have a remote machine that is used to store and process XML files. Recently, I needed to duplicate a directory of XML files (e.g., cp -r a b). It’s not really germane to the subject here, but this particular server has a whack configuration and I gotta rant before I continue.

The office server (scrappy) has pretty good specs.

[scrappy ~]$ cat /proc/meminfo

MemTotal:      3980800 kB

[scrappy ~]$ cat /proc/cpuinfo

processor   : 0
model name  : Intel(R) Core(TM)2 CPU   6600  @ 2.40GHz
cpu MHz     : 2394.000
cache size  : 4096 KB

processor   : 1
model name  : Intel(R) Core(TM)2 CPU   6600  @ 2.40GHz
cpu MHz     : 2394.000
cache size  : 4096 KB

[scrappy ~]$ cat /proc/scsi/scsi

Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: SONY   Model: DVD RW AW-Q170A    Rev: 1.72
  Type:   CD-ROM                           ANSI SCSI revision: 05

[scrappy ~]$ cat /proc/ide/hd?/model

ST3320620AS

Whoa! What’s my SATA drive doing attached to the IDE driver? When I compare to my home CentOS box (marmaduke), I see that its drives are connected differently. Yes, marmaduke has one HDD connected via the IDE driver (ST3320620A), but that drive is a PATA drive. The four SATA drives are connected via SATA drivers. (The SATA drives will be configured as a software RAID 10, stay tuned. There’s a Xen project in the making.)

[marmaduke ~]$ cat /proc/scsi/scsi

Attached devices:
Host: scsi2 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA      Model: ST3500630AS      Rev: 3.AA
  Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi3 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA      Model: ST3300620AS      Rev: 3.AA
  Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi4 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA      Model: ST3300620AS      Rev: 3.AA
  Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi5 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA      Model: ST3300620AS      Rev: 3.AA
  Type:   Direct-Access                    ANSI SCSI revision: 05

[marmaduke ~]$ cat /proc/ide/hd?/model

PIONEER DVD-RW DVR-111D
ST3320620A

scrappy was configured before arriving at the office by a friend of a friend who runs a PC shop. “But it was such a deal!” Yeah, right. Bunch of monkeys. How hard is it to configure the BIOS to use the SATA interface rather than the IDE interface?

Anyway, I don’t have time to rebuild scrappy right now so I live with the dismal disk performance. Here’s the problem at hand. I have numerous XML files—some largish and some smallish. I have several sets and each set has about 4000 files.

[scrappy ~]$ ls src/*xml | wc -w

4323

[scrappy ~]$ ls -l src/*xml | sort -n -r -k5

-rw-r--r-- 1 kelly kelly 315804120 Dec 19 15:46 0001.xml
-rw-r--r-- 1 kelly kelly 275651475 Dec 19 17:34 0002.xml
-rw-r--r-- 1 kelly kelly 260250994 Dec 19 16:15 0003.xml
-rw-r--r-- 1 kelly kelly 222402294 Dec 19 16:25 0004.xml
-rw-r--r-- 1 kelly kelly 204642813 Dec 19 15:52 0005.xml
     .
     .
     .
-rw-r--r-- 1 kelly kelly      1467 Dec 19 19:15 4321.xml
-rw-r--r-- 1 kelly kelly      1467 Dec 19 16:01 4322.xml
-rw-r--r-- 1 kelly kelly      1098 Dec 19 19:19 4323.xml

I wanted to duplicate the set of files as I needed to run some prototype code that I didn’t trust to be non-destructive. Simple.

[scrappy ~]$ cp -r src tgt

However, the disk performance is agonizing. So bad that I leave it while I work on another machine. But I want to know the progress and see it as it changes. With six to ten shells open, I want something that can be resized to use minimal screen real estate. I want a quick command line progress monitor.

bash to the rescue. I didn’t want to create a script file so I just jack it right into the terminal’s command line. When you open the while loop, bash will continue on the next line until you close it with the done keyword.

[scrappy ~]$ while 'true'; do
>   ts=`date`
>   src=`ls src/*xml 2>/dev/null | wc -w`
>   tgt=`ls tgt/*xml 2>/dev/null | wc -w`
>   echo -ne "  ${ts}  ${src}  ${tgt}        \r"
>   sleep 1
> done

  Fri Jan  9 15:20:17 PST 2009  4323  2304

Recall we’ve previously covered that 2>/dev/null hides the error message generated by ls if no file is found.
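To see what the redirect actually buys you, here's a small sketch. The function names count_words and count_lines are made up for illustration; they are not part of the loop above. It also shows a wrinkle with wc -w: it counts words rather than files, so a file name containing a space is counted more than once, whereas wc -l counts one per line of ls output (assuming no newlines in the names).

```bash
# Hypothetical helpers for illustration only.
# Both hide ls's "No such file or directory" complaint with 2>/dev/null.
# Only stderr is discarded; stdout still flows through the pipe, so wc
# reports 0 when nothing matches. tr strips the padding some wc builds add.
count_words() { ls "$1"/*xml 2>/dev/null | wc -w | tr -d ' '; }
count_lines() { ls "$1"/*xml 2>/dev/null | wc -l | tr -d ' '; }

# Usage:
#   count_words src    # same counting method as the loop above
#   count_lines src    # one per file, even if a name contains spaces
```

For my numbered file names the two agree, so wc -w does the job here. I'd reach for wc -l on directories where I don't control the names.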

The components are stored in shell variables as a matter of convenience and displayed using echo.

echo is passed two switches. The -n switch suppresses the trailing newline so that the cursor remains on the same line as the displayed text. The -e switch tells echo to interpret backslash escape sequences in the text. This is useful since I want to end the text with a carriage return (\r), which moves the cursor back to the beginning of the line without advancing to the next one. The run of spaces before the \r blanks out any leftover characters when the new text is shorter than the old.
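How echo treats -n and -e varies between shells, so if you want the same effect somewhere other than bash, printf is the usual alternative. Here's a minimal sketch; status_line is a name I made up for illustration.

```bash
# status_line TEXT: print TEXT followed by padding, then return the cursor
# to the start of the line. printf adds no newline unless asked, and it
# interprets \r in the format string without needing a switch.
# (status_line is a hypothetical helper, not something from the loop above.)
status_line() {
  printf '  %s        \r' "$1"
}

# Usage:
#   status_line "$(date)  4323  2304"
```

Because the text is passed through %s, any backslashes in it are printed literally; only the \r in the format string is treated as an escape.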

After sleeping for one second, the script generates a new echo output which overwrites the old text. I suppose I could add a test to the script to break when ${src} equals ${tgt}.
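Something like the following would do it. This is only a sketch under a few assumptions: the function names count_xml and watch_copy and the optional interval argument are mine, and I count with wc -l rather than wc -w. Keep in mind that matching counts only mean cp has created every file; the last one may still be mid-write when the loop exits.

```bash
# count_xml DIR: print how many entries in DIR match *xml (0 if none).
# Hypothetical helper; assumes file names contain no newlines.
count_xml() {
  ls "$1"/*xml 2>/dev/null | wc -l | tr -d ' '
}

# watch_copy SRC TGT [INTERVAL]: redraw one status line every INTERVAL
# seconds (default 1) and stop once both directories hold the same count.
watch_copy() {
  local src_dir=$1 tgt_dir=$2 interval=${3:-1}
  local ts src tgt
  while true; do
    ts=$(date)
    src=$(count_xml "$src_dir")
    tgt=$(count_xml "$tgt_dir")
    echo -ne "  ${ts}  ${src}  ${tgt}        \r"
    if [ "$src" -eq "$tgt" ]; then
      break
    fi
    sleep "$interval"
  done
  echo    # step off the status line so the prompt doesn't overwrite it
}

# Usage:
#   watch_copy src tgt
```

Wrapping it in functions means it's no longer a quick paste-and-go loop, but the same if/break can be dropped straight into the while loop typed at the prompt.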

I don’t know why disk I/O is so slow on scrappy. Perhaps the mode is set to use programmed I/O rather than DMA. Who knows? Who cares? Both scrappy and marmaduke have Intel ICH8 SATA controllers. scrappy has a faster processor with more cache. Yet, marmaduke smokes on disk throughput on either the SATA or IDE drives. Something is wonky.

I’d like to say that I can ignore the issue. I have way too much going on right now. But it bugs me.
