Thoughts and tutorials on programming

Tuesday, June 23, 2009

rdocs for all gems from rubyforge

I am pleased to announce the release of my site that has the rdocs for [almost] all existing rubygems.

Why?

Because it's convenient to have all rdocs in a single known place where you can browse/search them. Because it's a central repository, it also eliminates the need to install local rdocs for gems, and once you turn local rdoc generation off, gems install *much* more quickly. It makes me happy every time I do a "gem install" :)
It also eliminates the need for running a local gem server.

These gems' rdocs are all in the hanna theme, which provides for method search and an easy on the eyes layout. Though darkfish is also quite pretty, it isn't as easy to read because of font contrast.

Check it out!

http://allgems.faithpromotingstories.org/gems

core docs: http://coredocs.faithpromotingstories.org/

Feedback welcome.

Note also that they're using a temporary subdomain url. If anybody is interested and could help me with a subdomain of a more ruby related url, that would be cool. I hate to fork over that $10 a year for another domain; you know me :)

Enjoy.
=r

Thursday, June 18, 2009

How to save MUCH RAM when running rails (linode/slicehost) and mod_rails passenger

If you're using mod_rails on a VPS with little RAM [ex: linode, slicehost], then there are a few things you'll want to do to save RAM when running rails. Here are a few things I did to allow me to run multiple rails processes on one linode slice.

1: Install a 32-bit OS

Ruby uses twice as much RAM on a 64-bit OS as it does on a 32-bit one [and most other things do, too]. Use 32-bit! [linode has options to do this easily].

Savings: 50%

2: Don't use as many processes per rails app.
If you have low volume sites (and most of us do), then don't create too many processes per app.

PassengerMaxInstancesPerApp 1

3: If you are only using one process per rails app, then turn off the spawner process--it doesn't help you.

By default mod_rails spins up one "spawner process" and then "x" "actual working processes". The spawner process just preloads rails so that it can be shared amongst instances of that app.

In my case, from top (~1 app process):
total mem, RSS, ... name

117m 49m 3156 S 0 13.8 0:00.93 ruby # spawner 49M
171m 89m 2240 S 0 24.7 0:00.61 ruby # instance 89m

fix:
use
RailsSpawnMethod conservative
in your apache2.conf

this results in [slower startup times and]
RAM, RSS...process name
143m 72m 3480 S 0 20.1 0:01.69 ruby
Savings: (assuming you only want one process per rails app): 49m (on 64-bit).

If you're worried about slow startup speed, you can set your rails processes to never expire[4].

4: If you use multiple processes per rails app, then set your spawner process to die quickly.

Though I haven't used it, theoretically the spawner process will be killed eventually--so set it to die quickly, to free up that expensive RAM[5].

You could do something like
ab -n 10 -c 10 http://yourhost/
to ensure it fires up all the processes from the spawner, then the spawner is free to die quickly.

If you don't use REE, then don't use the spawner at all--little savings there RAM wise.

I have a suspicion that even with REE, you don't see much RAM savings, but having not used it, I can't say for certain.

5: Use the MBARI patches to MRI.

Originally, one of my rails apps started at 89MB RSS then grew in total RAM usage by ~ 8MB per request, linearly. Odd? Yes. Fix: I recompiled ruby using the MBARI patches to 1.8.6/1.8.7. It now starts at 60MB RSS and stays there solid [1]. That's right--it stays solid at LESS than an unpatched ruby starts at. And with higher speeds [6].
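If you want to check whether your own app grows per request like this, here is a minimal sketch. It assumes Linux (it reads /proc/self/status), and the helper names and sample numbers are made up for illustration; the samples just mimic the growth described above.

```ruby
# Pulls the VmRSS figure (in kB) out of the text of a /proc/<pid>/status file.
# Returns nil if the field is not present.
def parse_vmrss_kb(status_text)
  match = status_text.match(/^VmRSS:\s+(\d+)\s+kB/)
  match && match[1].to_i
end

# Reads this process's resident set size in kB (Linux only).
# Returns nil where /proc is not available.
def current_rss_kb
  parse_vmrss_kb(File.read("/proc/self/status"))
rescue SystemCallError
  nil
end

# Average RSS change between consecutive samples (one sample per request).
def growth_per_request(samples)
  return 0.0 if samples.size < 2
  (samples.last - samples.first).to_f / (samples.size - 1)
end

# Hypothetical samples in MB, one taken after each request.
leaking = [89, 97, 105, 113]
steady  = [60, 60, 60, 60]

puts growth_per_request(leaking)  # prints 8.0
puts growth_per_request(steady)   # prints 0.0
```

In a real app you would call current_rss_kb after each request (from a filter or a log hook) and feed those numbers in instead of the hard-coded arrays.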

Also avoid the ruby that comes bundled with ubuntu--though it uses the same amount of RAM as normal, it is compiled with pthreads enabled so it is slower.

Using 1.9 might also yield RAM savings like this. Haven't tried it though.

Savings: unknown since it appeared to be growing forever (a lot though).

Overall Result:

With these suggestions in place, I can now run 3 or 4 rails apps on my "cheap grade" 360MB linode. Much better than the 1 I was able to originally.
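The arithmetic behind that can be sketched roughly as follows. The per-process figures are the RSS numbers quoted earlier in this post; the 100MB set aside for the OS, Apache and MySQL is purely an assumption for illustration, not something I measured, and the method name is made up.

```ruby
# How many single-process apps fit once a fixed amount is set aside
# for everything else (OS, Apache, MySQL).
def apps_that_fit(total_mb, reserved_mb, per_app_mb)
  usable = total_mb - reserved_mb
  return 0 if usable <= 0
  usable / per_app_mb   # integer division rounds down
end

TOTAL_MB    = 360  # the linode size mentioned above
RESERVED_MB = 100  # assumption: not a measured figure

puts apps_that_fit(TOTAL_MB, RESERVED_MB, 49 + 89)  # spawner + one instance: prints 1
puts apps_that_fit(TOTAL_MB, RESERVED_MB, 72)       # conservative spawning: prints 3
puts apps_that_fit(TOTAL_MB, RESERVED_MB, 60)       # MBARI-patched ruby: prints 4
```

Change RESERVED_MB to whatever your own box uses at idle and the answers will shift accordingly.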

Enjoy!

Other potential tricks:

use nginx instead of apache--faster, much better RAM usage. Potential savings:

Tweak mysql to use less memory [3]

Possibly tweak GC settings (37signals', evan weaver [2], etc.) though this appears to not increase speed too much over the MBARI patches [6].

refs:

http://groups.google.com/group/phusion-passenger/browse_thread/thread/df1fc1073dbef38
[1] http://www.ruby-forum.com/topic/170608#new
[2] http://blog.evanweaver.com/articles/2009/04/09/ruby-gc-tuning/
[3] http://articles.slicehost.com/2007/9/11/ubuntu-feisty-mysql-and-ror
[4] http://www.modrails.com/documentation/Users%20guide%20Apache.html#PassengerPoolIdleTime
[5] http://www.modrails.com/documentation/Users%20guide%20Apache.html#_railsframeworkspawneridletime_lt_integer_gt
[6] http://www.nabble.com/-ruby-core:19846---Bug--744--memory-leak-in-callcc--to20447794.html#a21140287

Thursday, June 04, 2009

state of the art in ruby compilation/JIT

There are several levels that one can take compilation of Ruby code to.
Ex: yarv compiles ruby code to yarv internal byte code. But there are other levels, which we hope to exploit in order to make a faster ruby. We'll discuss some different styles of compilation (JIT and otherwise).
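To see the yarv level for yourself, here is a small sketch. It assumes MRI 1.9 or later, since RubyVM is specific to MRI and other implementations may not provide it; the source string is just an arbitrary example.

```ruby
# Compile a snippet to YARV bytecode without running it.
source = "x = 20; x + 22"
iseq = RubyVM::InstructionSequence.compile(source)

puts iseq.disasm   # human-readable listing of the YARV instructions
puts iseq.eval     # runs the compiled bytecode; prints 42
```

The exact instruction listing differs between ruby versions, so treat the disasm output as something to browse rather than something to rely on.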


There are a few existent libraries that do translation.

ruby2c: translates code like

def fact(n)
return 1 if n <= 1 # base case so the recursion terminates
n*(fact(n-1))
end
fact(1000) # it needs this as a hint to know what type the fact method takes

to standard C (at least in all examples I've seen)

int fact(int n) {
if (n <= 1) return 1;
return(n*fact(n-1));
}


So their ANSI C aspect is the most hard-core: "I don't want any of this left in Ruby at all after it's done."

pros: fast as can be. cons: see below.

rubyinline, interestingly, wraps something like

"int fact(int n) {
if (n <= 1) return 1;
return(n*fact(n-1));
}"

with converters to call and return to and from Ruby, so you can call fact(1000) in your ruby code and it will work.

One interesting idea would be to pipe the output of ruby2c to rubyinline. Alas, ruby2c seems somewhat broken currently.

cons: you have to write in C. That is not what most of us want to ever have to do again.


I think where these approaches fall apart is that they're not as flexible/dynamic as normal Ruby. They might have trouble with:

a = [/abc/, 'abc', 33]
b = 33
a.each{|thing| puts thing.inspect; b+= 1} # dynamic arrays probably not well handled, as well as ruby internal methods like 'inspect', blocks, etc.

or

a = [/abc/, /def/]
a.each{|reg| if "abc" =~ reg then puts 'yes'; end} # regex probably isn't supported, a lot of the rest of the stdlib


Then again, I haven't tried it since ruby2c doesn't even work for me.

pros: as fast as fast can be. also comes with a dependency walker so that it can "surmise" which classes of methods you'll be passing in, then it programs for just those.
cons: doesn't handle all ruby (AFAIK).
Maybe in their current state they're useful for doing mathematically intense operations? Dunno. Rubyinline is a little more useful, but requires writing in C.


Ruby2Cext takes a "ruby to ruby C" approach. Its original aim was to produce ruby compatible code equivalents in C.

i.e. the ruby:

def go
3
end

translates to

VALUE go_c(VALUE self) {
return INT2NUM(3); // uses all ruby types
}
void Init_File(void) {
rb_define_method(someClass, "go", go_c, 0);
}

It also translates blocks and everything else to their ruby C equivalents (using the ruby C API). For 1.8 that was said to yield a 2X speed increase.

They also added some plugins, one gives you the ability to "freeze" certain C method calls, i.e.
if it encounters String#strip it always calls straight to the C function for String#strip [thus avoids doing a ruby method call]

ex:

def go
"".strip
end

Generates code that looks up String#strip and "saves off" the function's exact location [who would want to override String#strip, right?].

Thus the above ruby is converted roughly to something like:

// cached pointer to the C implementation of String#strip
static VALUE (*string_strip_function)(VALUE) = NULL;

static VALUE strip(VALUE fromThis) {
switch(TYPE(fromThis)) {
case T_STRING:
return (*string_strip_function)(fromThis); // this avoids rb_funcall to lookup then call strip
default:
return rb_funcall(fromThis, rb_intern("strip"), 0);
}
}

VALUE go_c(VALUE self) {
VALUE str = rb_str_new2("");
return strip(str);
}

void Init_File(void) {
rb_define_method(someClass, "go", go_c, 0);
// lookup_internal_ruby_c_method is a stand-in name for however the plugin finds the C function
string_strip_function = lookup_internal_ruby_c_method(rb_cString, "strip");
}

So you can see this avoids a few rb_funcalls to built-in methods, and is still almost entirely ruby compatible. This results in "up to 5x speedup" or so it says.
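The semantics of that "freeze" trick (not the speedup) can be imitated in plain Ruby, which makes the trade-off easier to see. This is only a sketch of the idea, not what the plugin does internally; ShoutyString, dynamic_strip and frozen_strip are made-up names.

```ruby
# Look String#strip up once and keep the result, the way the plugin keeps
# a pointer to the C function.
FROZEN_STRIP = String.instance_method(:strip)

# A subclass that overrides strip, standing in for "someone redefined it later".
class ShoutyString < String
  def strip
    "overridden"
  end
end

def dynamic_strip(str)
  str.strip                      # normal dispatch, sees the override
end

def frozen_strip(str)
  FROZEN_STRIP.bind(str).call    # always the original String#strip
end

s = ShoutyString.new("  hi  ")
puts dynamic_strip(s)  # prints overridden
puts frozen_strip(s)   # prints hi
```

So the frozen version is no longer fully ruby compatible: it ignores the override, which is exactly the bet the plugin is making.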

drawbacks to rb2cext: 1.9 compiles ruby to bytecode--perhaps rb2cext won't have as much of a gain if used with a 1.9 VM, since the VM already does some of this work, though profiling would help evaluate this.

All the above examples were static compilers. Run once before runtime [or at eval time].

Another class would be JIT dynamic compilers. JRuby is really the only one that has anything like that currently, and yet somehow it doesn't yet seem to be quite as fast as 1.9 without a JIT [1][4].
There exists a "demo" JIT using ruby2c, as well [2]. It optimizes a single method, but at least shows you how you could do something more dynamic.

So where can/should the future of ruby interpreters lie? There's lots of things you could try.
A few are making it tighter in C, making it more JIT'y, or making it more "dependency walking" so it can pre-optimize the paths that are guaranteed to only have a certain class passed in to them.

re: making it tighter in C

One drawback currently to ruby2cext is that if you define

class A
def this_method
calls_this_other_method
end
def calls_this_other_method
end
end

it will translate this (loosely) as

ID calls_this_other_method = rb_intern("calls_this_other_method"); // cache the symbol away
VALUE this_method_c(VALUE this) {
check_right_parameters();
return rb_funcall(this, calls_this_other_method, 0); // still a dynamic ruby call
}
VALUE calls_this_c(VALUE this) {
check_right_parameters();
return Qnil;
}


Note that it did not optimize the call between this_method and calls_this_other_method, but required ruby do rb_funcall. This is the perfectly valid ruby way--you can override calls_this_other_method later and it will work with the new method, but in most cases after a warm up phase methods aren't overridden, so the rb_funcall could be avoided by calling the method directly. So we could hard code the calls to the other known ruby (now C) methods.

If we were to assume that class method definitions were "frozen" after a specific setup phase, then a few more optimizations would be available, the above being one of them. [3]
In other words, rb2cext could become more C-y, with direct calls from method to method.
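Ruby can already express the "frozen after setup" assumption by freezing the class itself. A minimal sketch, assuming a reasonably modern MRI (the exception class name differs between versions, so the rescue catches RuntimeError, which covers both):

```ruby
class A
  def this_method
    calls_this_other_method
  end

  def calls_this_other_method
    :original
  end
end

A.freeze   # end of the "setup phase": no more method definitions on A

begin
  class A
    def calls_this_other_method
      :replaced
    end
  end
rescue RuntimeError => e   # FrozenError on newer rubies is a RuntimeError subclass
  puts "redefinition rejected: #{e.class}"
end

puts A.new.this_method   # prints original
```

A compiler that saw A.freeze could, in principle, treat the call from this_method to calls_this_other_method as fixed. Nothing here says rb2cext does that today; it just shows the guarantee is expressible.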

Interestingly, ruby2c also has a RubytoRubyC component (I never saw any examples of it posted, and couldn't get it to run) which might attempt something similar.

At the same time, ruby2c could become more ruby-y, i.e. integrating with the stdlib. For example, if you know a string is being passed in, and the method is

def go a
a << '3'
end

then you could call rb_str_concat directly.

So there are two courses of action: one to make ruby to c translators more c-y, one to make them more ruby-y. Static compile-time type analysis might let you guarantee an object's class and optimize for it. That's an option.

Another option would be to create something more JIT'y. Profile to discover the "hot paths", then write C that optimizes for the common path (then you could make direct C calls outside your current class, too).
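The "profile to discover the hot paths" half of that can be sketched in plain Ruby. This assumes a modern MRI with TracePoint (which is newer than the rubies discussed in this post), and hot_methods and Demo are made-up names for the example. It only counts calls; it doesn't generate any C.

```ruby
# Counts Ruby-level method calls made while the block runs and returns
# the most-called ones as [[class, method_name], count] pairs.
def hot_methods(limit = 3)
  counts = Hash.new(0)
  trace = TracePoint.new(:call) do |tp|
    counts[[tp.defined_class, tp.method_id]] += 1
  end
  trace.enable
  begin
    yield
  ensure
    trace.disable
  end
  counts.sort_by { |_, n| -n }.first(limit)
end

class Demo
  def fact(n)
    n <= 1 ? 1 : n * fact(n - 1)
  end

  def rarely_called
    :ok
  end
end

report = hot_methods do
  demo = Demo.new
  demo.fact(10)
  demo.rarely_called
end

report.each { |(klass, name), count| puts "#{klass}##{name}: #{count}" }
```

A JIT would take the top entries of a report like this as its candidates for compiling to C, and leave the rarely called methods alone.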

So what to do? Thoughts?
-=r

[1] http://blog.pluron.com/2009/05/ruby-19-performance.html
[2] http://github.com/seattlerb/zenhacks/tree/master
[3] For instance you could c-ify every ruby method in existence, with direct calls to any C methods within the same class (since we assume they all exist now). Ludicrous allows for this: http://betterlogic.com/roger/?p=1534 though I haven't experimented speed-wise.
[4] http://groups.google.com/group/ruby-benchmark-suite/browse_thread/thread/f56b4335cfd3ec57/c7babfb676d71450?lnk=gst&q=patch+gc#c7babfb676d71450 shows how 1.9 with a GC patch can be competitive to jruby.
