<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Red Leopard &#187; amazon</title>
	<atom:link href="http://www.redleopard.com/tag/amazon/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.redleopard.com</link>
	<description>A Stranger in a Strange Land</description>
	<lastBuildDate>Mon, 07 Jun 2010 22:59:44 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>EC2 and S3 Success Story</title>
		<link>http://www.redleopard.com/2008/12/ec2-and-s3-success-story/</link>
		<comments>http://www.redleopard.com/2008/12/ec2-and-s3-success-story/#comments</comments>
		<pubDate>Thu, 11 Dec 2008 02:09:36 +0000</pubDate>
		<dc:creator>kelly</dc:creator>
				<category><![CDATA[KellyBlog]]></category>
		<category><![CDATA[amazon]]></category>
		<category><![CDATA[bash]]></category>
		<category><![CDATA[ec2]]></category>
		<category><![CDATA[s3]]></category>

		<guid isPermaLink="false">http://www.redleopard.com/?p=150</guid>
		<description><![CDATA[I&#8217;ve been building systems lately on Amazon&#8217;s Elastic Compute Cloud (EC2). At first, I was only interested in Amazon&#8217;s Simple Storage Service (S3) after seeing the SmugMug slide show.
I hadn&#8217;t really considered using EC2 since we had more servers in colocation than I really needed. But I had a file storage problem. When you have [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been building systems lately on Amazon&#8217;s <a href="http://aws.amazon.com/ec2/">Elastic Compute Cloud</a> (EC2). At first, I was only interested in Amazon&#8217;s <a href="http://aws.amazon.com/s3/">Simple Storage Service</a> (S3) after seeing the SmugMug <a href="http://www.slideshare.net/techdude/scalability-set-amazons-servers-on-fire-not-yours/">slide show</a>.</p>
<p>I hadn&#8217;t really considered using EC2 since we had more servers in colocation than I really needed. But I had a file storage problem. When you have a thousand files, you stick them in a directory. When you have a million files, you cannot simply stick them in a single directory. You distribute them across multiple directories. What a PITA.</p>
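<p>To make the directory-distribution chore concrete, here is a minimal sketch of one common approach: hash the file name and use the leading hex digits as nested directory names. The class name <code>ShardedPath</code>, the two-level layout, and the choice of MD5 are assumptions for illustration, not what the original system used.</p>
<div class="terminal">
<pre>
```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ShardedPath {
    // Hypothetical helper: maps a file name to "ab/cd/fileName", where "ab"
    // and "cd" are the first two bytes of the name's MD5 digest in hex.
    static String shardedPath(String fileName) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(fileName.getBytes(StandardCharsets.UTF_8));
            String level1 = String.format("%02x", Byte.toUnsignedInt(digest[0]));
            String level2 = String.format("%02x", Byte.toUnsignedInt(digest[1]));
            return level1 + "/" + level2 + "/" + fileName;
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        // The same name always maps to the same directory pair.
        System.out.println(shardedPath("playlist-000123.xml"));
    }
}
```
</pre>
</div>
<p>Two levels of 256 directories give 65,536 leaf directories, so a million files average out to roughly 15 per directory.</p>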
<p>My first thought was to use <a href="http://www.danga.com/mogilefs/">MogileFS</a>. It handles the directory hashing for you and distributes redundant copies of files across multiple servers. I had extra servers. Sweet. But before I rushed off and started building my shiny new filesystem, I wanted to check out the competitors. That led me to SmugMug. And that led me to S3.</p>
<p>I work at a tiny <a href="http://www.sonicswap.com/index.do">startup</a>. I had a problem and very few developers to ask for help. Every hour I took from them had a significant impact on another project. And dammit, all the open projects were on fire. I needed to solve my file system problem and fast.</p>
<p>So up on S3 the files went. XML files. Beaucoup XML files.</p>
<p>It was painless. It was simple. It was cheap. The monthly S3 cost is a fraction of a server&#8217;s cost in colocation. Sweet!</p>
<p>Wait! If that&#8217;s so yummy, why not move XML processing up to EC2? Our XML processing load was increasing&#8230;increasingly increasing. I rewrote our XML processing app, built a custom Amazon Machine Image (CentOS + Apache + Tomcat) and fired it up. Nice!</p>
<p>Building the machine image was a pain but worth the effort. I learned a lot about CentOS that I didn&#8217;t previously know or really understand. However, I wish I had a real system administrator on staff. It would have hurt less.</p>
<p>One of the goals for the EC2-based XML processing was to shift from offline XML processing to a RESTful web service. That is, rather than queue the XML processing in a single process, I needed to finish the XML processing during the HTTP request. On-demand processing. Done in seconds (not tens of minutes). And able to handle multiple concurrent processing requests.</p>
<p>Here is the EC2 &lt;--&gt; S3 connection. For each file received for processing, I write dozens to hundreds of files to S3 plus open scads of HTTP connections to other web servers. Running these in a single thread burned precious time. Even though we &#8220;write&#8221; to S3, the underlying mechanism is another HTTP request.</p>
<p>Simple. Build a thread pool for the HTTP requests and run multiple threads concurrently. That worked swimmingly but for one issue. It didn&#8217;t take long until I started seeing &#8220;Too many open files&#8221; in the exception logs.</p>
<p>Normally, the limit on open files is quite adequate. But every open HTTP connection is a socket, and every socket counts against the process&#8217;s open-file limit. Bolt Apache&#8217;s <a href="http://hc.apache.org/">HttpClient</a> to the backend of your webapp and supercharge it with a healthy <a href="http://java.sun.com/javase/6/docs/api/java/util/concurrent/ThreadPoolExecutor.html">thread pool</a> and you <em>will</em> overwhelm the default settings. CentOS will not &#8220;garbage collect&#8221; the spent file descriptors from completed HTTP requests fast enough.</p>
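<p>A minimal sketch of the thread-pool pattern described above, not the production code. The class <code>BoundedUploads</code> and the <code>upload</code> method are hypothetical; <code>upload</code> only sleeps to stand in for one HTTP request, so no real HttpClient or S3 call is shown. The point is that the pool size caps how many requests, and therefore how many sockets, are in flight at once.</p>
<div class="terminal">
<pre>
```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedUploads {
    // Hypothetical stand-in for one HTTP request (an S3 PUT, for example).
    static void upload(String key) {
        try {
            Thread.sleep(10); // simulate network latency
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Runs one upload per key on a fixed-size pool, waits for all of them,
    // and returns how many finished.
    static int uploadAll(String[] keys, int poolSize) {
        AtomicInteger completed = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(keys.length);
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            for (String key : keys) {
                pool.execute(() -> {
                    try {
                        upload(key);
                        completed.incrementAndGet();
                    } finally {
                        done.countDown(); // always release the latch
                    }
                });
            }
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        String[] keys = new String[200];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = "out/part-" + i + ".xml";
        }
        System.out.println(uploadAll(keys, 16) + " uploads finished");
    }
}
```
</pre>
</div>
<p>With a pool of 16, at most 16 uploads run at a time no matter how many keys are queued. A real client would also need to release each connection when its request completes, or descriptors pile up regardless of pool size.</p>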
<p>The solution: Up the limits on open files. The default is 1024. Simply edit <code>/etc/security/limits.conf</code> and change the soft and hard values for <code>nofile</code>. I&#8217;m sure there is a maximum size but these values have been working for me. What&#8217;s appropriate depends on your workload, so you will need to pick values for yourself.</p>
<div class="terminal">
<pre>
#*               soft    core            0
#*               hard    rss             10000
#@student        hard    nproc           20
#@faculty        soft    nproc           20
#@faculty        hard    nproc           50
#ftp             hard    nproc           0
#@student        -       maxlogins       4
*                soft    nofile          8192
*                hard    nofile          65536
</pre>
</div>
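<p>One way to check from inside the JVM whether a new limit took effect, and how close the webapp is running to it, is sketched below. It assumes a HotSpot/OpenJDK runtime on a Unix-like system, where the platform bean implements <code>com.sun.management.UnixOperatingSystemMXBean</code>; on other JVMs the method returns <code>null</code>. The class name <code>FdUsage</code> is made up for this example.</p>
<div class="terminal">
<pre>
```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

public class FdUsage {
    // Returns {open, max} file descriptor counts for this JVM process,
    // or null when the runtime does not expose them (assumption: only
    // Unix-like HotSpot/OpenJDK builds provide this bean).
    static long[] descriptorCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null;
    }

    public static void main(String[] args) {
        long[] counts = descriptorCounts();
        if (counts == null) {
            System.out.println("descriptor counts not available on this JVM");
        } else {
            System.out.println("open=" + counts[0] + " max=" + counts[1]);
        }
    }
}
```
</pre>
</div>
<p>Logging these two numbers periodically shows whether the open count keeps climbing toward the maximum under load.</p>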
<p>What was the net result of moving XML processing and storage up to the Amazon Cloud? Retired 60% of the servers in colocation. Built a scalable infrastructure. Reduced overall monthly hosting costs. Fewer moving parts.</p>
<p>Now, if only I had a system administrator&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.redleopard.com/2008/12/ec2-and-s3-success-story/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
