<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Red Leopard &#187; s3</title>
	<atom:link href="http://www.redleopard.com/tag/s3/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.redleopard.com</link>
	<description>A Stranger in a Strange Land</description>
	<lastBuildDate>Mon, 07 Jun 2010 22:59:44 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>EC2 and S3 Success Story</title>
		<link>http://www.redleopard.com/2008/12/ec2-and-s3-success-story/</link>
		<comments>http://www.redleopard.com/2008/12/ec2-and-s3-success-story/#comments</comments>
		<pubDate>Thu, 11 Dec 2008 02:09:36 +0000</pubDate>
		<dc:creator>kelly</dc:creator>
				<category><![CDATA[KellyBlog]]></category>
		<category><![CDATA[amazon]]></category>
		<category><![CDATA[bash]]></category>
		<category><![CDATA[ec2]]></category>
		<category><![CDATA[s3]]></category>

		<guid isPermaLink="false">http://www.redleopard.com/?p=150</guid>
		<description><![CDATA[I&#8217;ve been building systems lately on Amazon&#8217;s Elastic Compute Cloud (EC2). At first, I was only interested in Amazon&#8217;s Simple Storage Service (S3) after seeing the SmugMug slide show.
I hadn&#8217;t really considered using EC2 since we had more servers in colocation than I really needed. But I had a file storage problem. When you have [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been building systems lately on Amazon&#8217;s <a href="http://aws.amazon.com/ec2/">Elastic Compute Cloud</a> (EC2). At first, I was only interested in Amazon&#8217;s <a href="http://aws.amazon.com/s3/">Simple Storage Service</a> (S3) after seeing the SmugMug <a href="http://www.slideshare.net/techdude/scalability-set-amazons-servers-on-fire-not-yours/">slide show</a>.</p>
<p>I hadn&#8217;t really considered using EC2 since we had more servers in colocation than I really needed. But I had a file storage problem. When you have a thousand files, you stick them in a directory. When you have a million files, you cannot simply stick them in a single directory. You distribute them across multiple directories. What a PITA.</p>
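<p>A minimal sketch of that kind of directory distribution, in Java. It is an illustration only: the <code>ShardedPath</code> class, the MD5 hash and the two-level layout are assumptions, not anything taken from the original setup.</p>

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: spreads files across 256 * 256 subdirectories
// based on a hash of the file name.
final class ShardedPath {

    // Returns a relative path of the form "xx/yy/fileName", where xx and yy
    // are the first two bytes of the name's MD5 digest written as hex.
    static String forName(String fileName) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(fileName.getBytes(StandardCharsets.UTF_8));
            String level1 = String.format("%02x", digest[0] & 0xff);
            String level2 = String.format("%02x", digest[1] & 0xff);
            return level1 + "/" + level2 + "/" + fileName;
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the digests every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(forName("dataset-000001.xml"));
    }
}
```

<p>Two hex levels give 65,536 buckets, so a million evenly hashed files would average about 15 per directory.</p>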
<p>My first thought was to use <a href="http://www.danga.com/mogilefs/">MogileFS</a>. It handles the directory hashing for you and distributes redundant copies of files across multiple servers. I had extra servers. Sweet. But before I rushed off and started building my shiny new filesystem, I wanted to check out the competitors. That led me to SmugMug. And that led me to S3.</p>
<p>I work at a tiny <a href="http://www.sonicswap.com/index.do">startup</a>. I had a problem and very few developers to ask for help. Every hour I needed from them had a significant impact on another project. And dammit, all the open projects were on fire. I needed to solve my file system problem and fast.</p>
<p>So up on S3 the files went. XML files. Beaucoup XML files.</p>
<p>It was painless. It was simple. It was cheap. The monthly S3 cost is a fraction of a server&#8217;s cost in colocation. Sweet!</p>
<p>Wait! If that&#8217;s so yummy, why not move XML processing up to EC2? Our XML processing load was increasing&#8230;increasingly increasing. I rewrote our XML processing app, built a custom amazon machine image (centos + apache + tomcat) and fired it up. Nice!</p>
<p>Building the machine instance was a pain but worth the effort. I learned a lot about centos that I didn&#8217;t previously know or really understand. However, I wish I had a real system administrator on staff. It would have hurt less.</p>
<p>One of the goals for the EC2-based XML processing was to shift from offline XML processing to a RESTful web service. That is, rather than queue the XML processing in a single process, I needed to finish the XML processing during the HTTP request. On demand processing. Done in seconds (not tens of minutes). And handle multiple concurrent processing requests.</p>
<p>Here is the EC2 &lt;--&gt; S3 connection. For each file received for processing, I write dozens to hundreds of files to S3 plus open scads of HTTP connections to other web servers. Running these in a single thread burned precious time. Even though we &#8220;write&#8221; to S3, the underlying mechanism is another HTTP request.</p>
<p>Simple. Build a thread pool for the HTTP requests and run multiple threads concurrently. That worked swimmingly but for one issue. It didn&#8217;t take long until I started seeing &#8220;Too many open files&#8221; in the exception logs.</p>
<p>Normally, the limit on open files is quite adequate. But you bolt Apache&#8217;s <a href="http://hc.apache.org/">HttpClient</a> to the backend of your webapp and supercharge it with a healthy <a href="http://java.sun.com/javase/6/docs/api/java/util/concurrent/ThreadPoolExecutor.html">thread pool</a> and you <em>will</em> overwhelm the default settings. Every HTTP connection holds a socket, and each socket counts against the open-file limit. Centos will not &#8220;garbage collect&#8221; the spent sockets from completed HTTP requests fast enough.</p>
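<p>A thread pool along those lines might look like the sketch below. It assumes a fixed-size pool from <code>java.util.concurrent</code>; the <code>ConcurrentWrites</code> class and <code>simulateRequest()</code> are hypothetical stand-ins, and real tasks would call HttpClient rather than sleep.</p>

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the real work: the tasks described in the post
// issued HTTP requests (S3 writes and calls to other web servers).
final class ConcurrentWrites {

    // Runs taskCount simulated requests on at most poolSize threads and
    // returns how many completed. Each running task stands for one open
    // connection, so poolSize also caps the sockets this pool holds at once.
    static int runAll(int taskCount, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            pool.execute(() -> {
                simulateRequest();
                completed.incrementAndGet();
            });
        }
        pool.shutdown(); // accept no new tasks, let queued ones finish
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    // Placeholder for an HTTP PUT or GET; sleeps briefly instead of using the network.
    static void simulateRequest() {
        try {
            Thread.sleep(5);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(runAll(200, 16) + " requests completed");
    }
}
```

<p>A fixed pool size also caps how many requests are in flight at once, which may be worth tuning alongside the open-file limit.</p>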
<p>The solution: Up the limits on open files. The default is 1024. Simply edit <code>/etc/security/limits.conf</code> and change the soft and hard values for <code>nofile</code>. I&#8217;m sure there is a maximum size but these values have been working for me. What&#8217;s appropriate depends on your system, so you will need to pick values for yourself.</p>
<div class="terminal">
<pre>
#*               soft    core            0
#*               hard    rss             10000
#@student        hard    nproc           20
#@faculty        soft    nproc           20
#@faculty        hard    nproc           50
#ftp             hard    nproc           0
#@student        -       maxlogins       4
*                soft    nofile          8192
*                hard    nofile          65536
</pre>
</div>
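<p>To see how close a running JVM is to the limit, something like the following may help. It is a sketch that assumes a HotSpot/OpenJDK runtime, where the <code>com.sun.management.UnixOperatingSystemMXBean</code> interface is available; the <code>OpenFileReport</code> class name is made up for illustration.</p>

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

// Hypothetical helper: reports file descriptor usage for the current JVM.
final class OpenFileReport {

    // Returns {open, max} file descriptor counts for this JVM, or null when
    // the platform bean does not expose them (for example on Windows).
    static long[] descriptorCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null;
    }

    public static void main(String[] args) {
        long[] counts = descriptorCounts();
        if (counts == null) {
            System.out.println("descriptor counts not available on this platform");
        } else {
            System.out.println("open=" + counts[0] + " max=" + counts[1]);
        }
    }
}
```

<p>Run inside the webapp (or logged periodically), the two numbers show whether the limit the JVM actually sees matches what was configured and how much headroom is left.</p>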
<p>What was the net result of moving XML processing and storage up to the Amazon Cloud? Retired 60% of the servers in colocation. Built a scalable infrastructure. Reduced overall monthly hosting costs. Fewer moving parts.</p>
<p>Now, if only I had a system administrator&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.redleopard.com/2008/12/ec2-and-s3-success-story/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Flexible Web Services</title>
		<link>http://www.redleopard.com/2008/09/flexible-web-services/</link>
		<comments>http://www.redleopard.com/2008/09/flexible-web-services/#comments</comments>
		<pubDate>Mon, 01 Sep 2008 22:00:15 +0000</pubDate>
		<dc:creator>kelly</dc:creator>
				<category><![CDATA[KellyBlog]]></category>
		<category><![CDATA[flex]]></category>
		<category><![CDATA[s3]]></category>

		<guid isPermaLink="false">http://www.redleopard.site/?p=107</guid>
		<description><![CDATA[Things don&#8217;t change. You change your way of looking, that&#8217;s all.
&#8211; Carlos Castañeda
Early this spring, I made some big architectural changes in the company&#8217;s website. The two most far-reaching changes involved Amazon&#8217;s Web Services and Adobe&#8217;s Flex product. Sometimes you regret big changes. I only regret having not made the changes earlier.
I admit that I [...]]]></description>
			<content:encoded><![CDATA[<p>Things don&#8217;t change. You change your way of looking, that&#8217;s all.<br />
&#8211; Carlos Castañeda</p>
<p>Early this spring, I made some big architectural changes in the company&#8217;s <a href="http://www.sonicswap.com">website</a>. The two most far-reaching changes involved Amazon&#8217;s Web Services and Adobe&#8217;s Flex product. Sometimes you regret big changes. I only regret having not made the changes earlier.</p>
<p>I admit that I wasn&#8217;t always a flex fan. Indeed, I dismissed flex out of hand in the early days, mainly as a response to Macromedia&#8217;s steep pricing model. Ouch. Since then, Adobe has bought Macromedia and the pricing model has changed several times. I never noticed. Such is the cost of writing something off.</p>
<p><img width="150" height="188" style="float: left; margin: 0 0.5em 0.5ex 0; border: 1px solid black;" alt="Flexible Rails book cover" src="http://www.redleopard.com/images/flexiblerails-cover.jpg" /></p>
<p>Then I saw Peter Armstrong&#8217;s <a href="http://www.sdforum.org/_data/global/images/SDF_Images/events/ruby/Armstrong%20SDForum_Presentation">presentation</a> at the <a href="http://www.sdforum.org/index.cfm?fuseaction=Page.ViewPage&#038;PageID=853">Third Annual Silicon Valley Ruby Conference</a> (April 18-19, 2008). Turns out, flex is now free. (uhhhmmm, ok. you will want to buy the developer ide but it&#8217;s possible to use the free command line compiler.)</p>
<p>I bought Armstrong&#8217;s <a href="http://manning.com/armstrong/">book</a> and started working through the tutorials. By the time I finished Chapter 6, I knew that I was going to change our webapp from a javascript-driven front-end to a flex-based, actionscript-driven front-end. We deal with a <em>lot</em> of tabulated data. Over a million sets and growing. Some of the datasets are small and some are quite large. Furthermore, each row had numerous event listeners attached. We had plans to add more, add mashups, add slicky bells and whistles. The larger tables were dozens of megabytes large. Which isn&#8217;t a problem in and of itself except that&#8230;</p>
<p>Javascript sucks.</p>
<p>Well, not really. It&#8217;s the browsers that suck. All of them. We had to back off functionality just to get IE to render. Firefox&#8217;s javascript+dom engine (including ff3) would render the tables (if it didn&#8217;t crash) but not all the event listeners worked. Safari (web-kit) would render and all events worked but then Apple released the 5-second script warning. We schemed endlessly on how to break the problem up. Mostly we hoped one of the miracle javascript libraries would solve these problems. We knew it was unlikely but we hoped.</p>
<p>I made some side-by-side comparisons on my MacBookPro and actionscript beat javascript hands down. Orders of magnitude faster to load and render. Less code. Consistency between browsers. And finally, actionscript/flex worked. And worked well.</p>
<p>What didn&#8217;t work well was our MySQL database. Our million datasets occupied hundreds of millions of database rows in scads of tables and hundreds of gigabytes of disk space. A 3ware RAID10 disk subsystem with master/slave configuration was in complete IO overload <em>all the time</em>. I had Matt from <a href="http://www.mysql.com/consulting/">mysql</a> come in to help us unravel our mess. This guy was good. He identified, explained, documented and proposed a prioritized list of solutions.</p>
<p>But I started thinking, why the hell do we even store the datasets in MySQL in the first place? The only time we ever change a dataset is when we replace it. The relational database solution had always been a problem and was getting worse by the day. Thinking like a mathematician, I decided to solve a different problem. Why not just save the datasets in flatfiles and overwrite them when they change? Just store them in flatfiles. Like pictures.</p>
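<p>In Java, the overwrite-on-change idea could be sketched as below. This assumes local files and the <code>java.nio.file</code> API; the <code>FlatFileStore</code> class is hypothetical, and writing to a temporary file before moving it into place is a choice made for this example rather than a description of the original code.</p>

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: one flat file per dataset, replaced whole on change.
final class FlatFileStore {

    // Writes the dataset to a temporary file in the same directory, then
    // moves it over the target, replacing any previous copy.
    static void replace(Path target, byte[] dataset) {
        try {
            Path dir = target.toAbsolutePath().getParent();
            Files.createDirectories(dir);
            Path tmp = Files.createTempFile(dir, "dataset", ".tmp");
            Files.write(tmp, dataset);
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the whole dataset back; there is nothing to query, just bytes.
    static byte[] read(Path target) {
        try {
            return Files.readAllBytes(target);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

<p>There is no update path at all: a changed dataset is simply written again under the same name, which is the whole appeal over rows in a relational table.</p>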
<p>Of course, every solution brings its own problems. And storing a million flatfiles has its own: you cannot store a million files in a single directory (not on Centos+ext3 you cannot). I recalled reading about mogileFS some time back. It presented what looked like a flat directory in which you could store millions of files, distributed the copies over multiple machines with redundancy, and so on. Looking at that solution led me to smugFS. And to <a href="http://blogs.smugmug.com/don/2007/03/30/etech-2007-smugmug-amazon-slides-are-up/">Don MacAskill&#8217;s</a> presentation on <a href="http://blogs.smugmug.com/don/files/ETech-SmugMug-Amazon-2007.pdf">Scalability</a>. Crap! I said. I don&#8217;t need to host the flatfiles at our colocation center. I&#8217;ll put them on <a href="http://www.amazon.com/S3-AWS-home-page-Money/b/ref=sc_fe_l_2/103-0195969-2128613?ie=UTF8&#038;node=16427261&#038;no=3435361&#038;me=A36L942TSJ2AJA">Amazon S3</a>. And while I&#8217;m at it, I&#8217;ll build a RESTful API on <a href="http://www.amazon.com/EC2-AWS-Service-Pricing/b/ref=sc_fe_l_2/103-0195969-2128613?ie=UTF8&#038;node=201590011&#038;no=3435361&#038;me=A36L942TSJ2AJA">Amazon EC2</a> to import the datasets.</p>
<p>And my team has done it. We have completed the move to EC2/S3 and Flex. I couldn&#8217;t be happier. Armstrong&#8217;s book, <em>Flexible Rails</em>, lit the spark for our migration to <em>Flexible Web Services</em>.</p>
<p>I will write about some of the hurdles we&#8217;ve overcome in future posts. Some of the things we&#8217;ve solved are (i) building an EC2 image from scratch, (ii) compressing flat files in a format which actionscript&#8217;s ByteArray can uncompress, (iii) using the struts2 REST plugin, (iv) XML parsing using StAX.</p>
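<p>As a taste of item (ii), here is a hedged sketch. It assumes the format in question is zlib, which is what <code>ByteArray.uncompress()</code> reads by default and what <code>java.util.zip.Deflater</code> emits when created without the <code>nowrap</code> flag. The <code>ZlibFlatFile</code> class is hypothetical, and the solution actually used on the project is not described here.</p>

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper: zlib-compresses a flat file on the Java side.
final class ZlibFlatFile {

    // Compresses to the zlib format (RFC 1950): a Deflater created without
    // the nowrap flag writes the zlib header and trailing checksum.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Round-trip check on the Java side; stands in for ByteArray.uncompress().
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported zlib data");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException(e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

<p>The bytes from <code>compress()</code> are what would be uploaded to S3; the flex client would then fetch them as binary and call <code>uncompress()</code> on the resulting ByteArray.</p>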
]]></content:encoded>
			<wfw:commentRss>http://www.redleopard.com/2008/09/flexible-web-services/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
