Tuesday, May 13, 2008

Forcing a process to garbage collect in Erlang

We upgraded our dynamic pricing service tonight with a new version of Thrift, so a few hours later I was checking top to make sure everything was still fine. I noticed that one of the pricer nodes was using 1.1 GB of RAM, significantly more than I'd ever seen it use before. Figuring it was a memory leak, I started a console node and connected it to the Erlang cluster:

amiest@app2:~$ erl -name console
Erlang (BEAM) emulator version 5.5.2 [source] [64-bit] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.5.2 (abort with ^G)
(console@app2.prod.amiestreet.com)1> P = 'pricer@app2.prod.amiestreet.com'.
'pricer@app2.prod.amiestreet.com'
(console@app2.prod.amiestreet.com)2> net_adm:ping(P).
pong

The first diagnostic step was to fire up etop with etop:start([{node, P}]). It showed that the process count was holding steady at a reasonable number. We'd had a process leak once before, but that didn't seem to be the cause this time. The memory used by processes was suspiciously high, though. Next step:

(console@app2.prod.amiestreet.com)4> rpc:call(P, shell_default, i, []).

This is equivalent to running i() in a shell on the remote node. It lists every running process along with its initial call, heap size, reductions, message queue length, registered name and current function.

This turned up the culprit: the rex process was using almost 900 MB of RAM for no apparent reason. I'd never heard of rex, but it turns out to be the registered name of the rpc module's server process, which handles incoming rpc:call requests from other nodes. On our other Erlang nodes, rex was using only a few hundred KB.
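If you want to repeat this check without reading through the whole i() listing, here is a minimal sketch of a helper that asks a node for its processes and returns the N largest by memory. The module name proc_mem and the function top/2 are hypothetical and not part of OTP. The helper only relies on erlang:processes/0 and erlang:process_info/2, both called through rpc:call/4. Note that process_info reports memory in bytes, while i() shows heap size in words.

```erlang
-module(proc_mem).
-export([top/2]).

%% Hypothetical helper: return the N processes on Node with the largest
%% memory footprint, largest first, as {Bytes, Pid, RegisteredName} tuples.
%% RegisteredName is [] for processes that are not registered.
top(Node, N) ->
    case rpc:call(Node, erlang, processes, []) of
        {badrpc, _} = Error ->
            Error;
        Pids ->
            Infos = [{Mem, Pid, Name}
                     || Pid <- Pids,
                        %% One rpc round trip per process. A process that
                        %% has exited returns undefined, which fails this
                        %% pattern and is silently skipped.
                        [{memory, Mem}, {registered_name, Name}] <-
                            [rpc:call(Node, erlang, process_info,
                                      [Pid, [memory, registered_name]])]],
            %% Tuples sort by their first element (Bytes), so reversing
            %% the sorted list puts the biggest processes first.
            lists:sublist(lists:reverse(lists:sort(Infos)), N)
    end.
```

From the console, proc_mem:top(P, 5) would list the five biggest processes on the pricer node. Making one round trip per process is chatty, but it avoids loading any custom module on the remote node.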

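After much googling, I came across this article, which gives the only plausible explanation I've found for how rex got so big. Erlang garbage collects each process separately, and a process is only collected when it needs more heap space. A process that grows a huge heap while handling a burst of work and then sits idle never triggers another collection, so it holds on to that memory indefinitely.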
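The following sketch reproduces this behaviour on a single node. It is my own illustration, not something taken from the article. A process builds a large list, lets it become garbage, and then blocks in receive. Because the idle process allocates nothing further, its heap stays large until something forces a collection. The module name idle_gc_demo is hypothetical, and the list size is arbitrary.

```erlang
-module(idle_gc_demo).
-export([run/0]).

%% Returns {BytesBeforeGC, BytesAfterGC} for an idle process that is
%% holding on to a large amount of garbage.
run() ->
    Parent = self(),
    Pid = spawn(fun() -> worker(Parent) end),
    %% Wait until the worker has built its list and is about to go idle.
    receive Pid -> ok end,
    {memory, Before} = erlang:process_info(Pid, memory),
    %% Force a collection of the idle process, as done for rex in the post.
    true = erlang:garbage_collect(Pid),
    {memory, After} = erlang:process_info(Pid, memory),
    Pid ! stop,
    {Before, After}.

worker(Parent) ->
    %% A million-element list: each cell is two words, so roughly 16 MB
    %% on a 64-bit VM.
    L = lists:seq(1, 1000000),
    Len = length(L),
    %% L is dead from here on. Sending our own pid and then waiting in
    %% receive allocates nothing on this heap, so no GC is triggered and
    %% the garbage stays put.
    Parent ! self(),
    receive stop -> Len end.
```

Running idle_gc_demo:run() in a shell should show Before many times larger than After. This is the same pattern as the bloated rex process, just on a smaller scale.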

The solution? rpc:call(P, erlang, garbage_collect, [pid(5038,10,0)]). (<5038.10.0> was the pid i() showed for rex.) This forced a collection on rex and brought memory usage back down to where it should be.
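Copying the pid out of the i() listing works, but it is easy to get wrong. Below is a minimal sketch of a helper that looks the process up by its registered name on the remote node and collects it there. The module and function names are hypothetical. Both steps go through rpc:call/4, so whereis and garbage_collect both execute on the remote node. That matters because erlang:garbage_collect/1 only accepts a pid that is local to the node it runs on.

```erlang
-module(remote_gc).
-export([gc_registered/2]).

%% Hypothetical helper: force a garbage collection of the process
%% registered as Name on Node. Returns true if the process was collected,
%% false if it exited in the meantime, or an error tuple.
gc_registered(Node, Name) ->
    case rpc:call(Node, erlang, whereis, [Name]) of
        Pid when is_pid(Pid) ->
            %% Executed on Node, where Pid is a local process.
            rpc:call(Node, erlang, garbage_collect, [Pid]);
        undefined ->
            {error, not_registered};
        {badrpc, _} = Error ->
            Error
    end.
```

With this helper, the fix from the post becomes remote_gc:gc_registered('pricer@app2.prod.amiestreet.com', rex).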

1 comment:

Pichi said...

You can easily diagnose or manage a remote node using Ctrl+G, which opens the "User switch command" console; typing '?' there prints a brief help. To start a shell on the remote node, type r 'pricer@app2.prod.amiestreet.com' and then connect to it with the c 2 command. After that you can work on the remote node as if you had a shell on it.