Hoping that someone is googling for the right terms here:
Anyone out there know how to calculate a confidence interval around an estimate of the Jaccard similarity coefficient?
For Pearson correlation you can use Fisher's Z-prime Transformation, but I can't quite figure a principled way of doing the same for Jaccard similarity.
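For reference, the Pearson case mentioned above looks like this in its standard textbook form. It assumes a roughly bivariate normal sample of size n with sample correlation r, and z_{\alpha/2} is the usual normal quantile. Nothing here is specific to Jaccard similarity; it only shows the shape of the interval I'm hoping to find an analogue for.

```latex
z = \operatorname{artanh}(r) = \tfrac{1}{2}\ln\frac{1+r}{1-r},
\qquad
\operatorname{SE}(z) \approx \frac{1}{\sqrt{n-3}},
\qquad
\rho \in \Bigl[\tanh\bigl(z - z_{\alpha/2}/\sqrt{n-3}\bigr),\;
               \tanh\bigl(z + z_{\alpha/2}/\sqrt{n-3}\bigr)\Bigr]
```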
Thursday, May 29, 2008
Friday, May 16, 2008
FB Engineering blog post on Facebook Chat
Eugene at Facebook posted an interesting article about the technology behind the new Facebook Chat. This new service has large parts written in Erlang and communicates with the rest of the system using the Thrift bindings Amie Street and Facebook have been collaborating on for the last couple of months.
The good news for us: our thrift bindings are pretty much guaranteed to be stable and leak/bug free now that they're used for millions of messages/second over at FB.
If you're interested, check it out over at the thrift git repository.
Tuesday, May 13, 2008
Forcing a process to garbage collect in Erlang
We upgraded our dynamic pricing service tonight with a new version of thrift, so I was checking top to make sure everything was cool a few hours later. I noticed that one of the pricers was using 1.1G of RAM - significantly more than I'd ever seen it using before. Figuring it was a memory leak, I started a console node and connected it to the erlang cluster:
amiest@app2:~$ erl -name console
Erlang (BEAM) emulator version 5.5.2 [source] [64-bit] [async-threads:0] [hipe] [kernel-poll:false]
Eshell V5.5.2 (abort with ^G)
(console@app2.prod.amiestreet.com)1> P = pricer@app2.prod.amiestreet.com.
'pricer@app2.prod.amiestreet.com'
(console@app2.prod.amiestreet.com)2> net_adm:ping(P).
pong
First step of diagnostics was to fire up etop with
etop:start([{node, P}]).
This showed that the process count was remaining stable at an appropriate number -- we'd had a bug with a process leak once before but it didn't seem to be the cause this time. The amount of RAM used by processes seemed pretty high though. Next step:
(console@app2.prod.amiestreet.com)4> rpc:call(P, shell_default, i, []).
This is equivalent to running the i() command on the remote node, and shows all the running processes along with some info.
This printed out something useful - the rex process was using almost 900MB of RAM for no apparent reason. I'd never heard of rex, but evidently it handles remote execution from other languages, and possibly RPC as well. Checking on some other erlang nodes I saw that rex usually only used a few hundred KB.
After much googling, I came across this article which is my only plausible explanation for how rex got so big -- the Erlang GC doesn't run on a process if the process isn't doing any work.
The solution?
rpc:call(P, erlang, garbage_collect, [pid(5038,10,0)]).
(5038.10.0 was the pid shown by i()). This kicked the memory usage back down where it should be.
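The hunt through the i() output can also be done programmatically. The following is a minimal sketch, not what I actually ran: the module name mem_check and both function names are made up for illustration, and it only uses erlang:processes/0, erlang:process_info/2 and erlang:garbage_collect/1.

```erlang
-module(mem_check).
-export([top_memory/1, gc_and_report/1]).

%% Return up to N {Pid, Bytes} pairs for the processes on the local node
%% that use the most memory, largest first. Processes that exit while we
%% are looking return 'undefined' from process_info/2 and are skipped by
%% the pattern in the second generator.
top_memory(N) ->
    Usage = [{Pid, Bytes} || Pid <- erlang:processes(),
                             {memory, Bytes} <- [erlang:process_info(Pid, memory)]],
    lists:sublist(lists:reverse(lists:keysort(2, Usage)), N).

%% Force a garbage collection on Pid and return {Before, After},
%% the process memory in bytes on either side of the collection.
gc_and_report(Pid) ->
    {memory, Before} = erlang:process_info(Pid, memory),
    true = erlang:garbage_collect(Pid),
    {memory, After} = erlang:process_info(Pid, memory),
    {Before, After}.
```

Both functions look at the node they run on, so from a console node they would have to go through rpc:call(P, mem_check, top_memory, [5]), which assumes the module is loaded on the remote node. If it isn't, calling the BIFs directly with rpc:call(P, erlang, process_info, [Pid, memory]) works without deploying anything.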
Friday, May 02, 2008
io_lib_pretty - a nice secret module
There are some modules in the erlang stdlib that aren't exactly advertised, but are quite useful. My newest discovery is io_lib_pretty. It hasn't got a manpage, but there are some docs if you less `locate io_lib_pretty.erl`.
io_lib_pretty is the module used by the shell to print records in a nicely formatted way. This isn't possible using plain io:format but can make program output a lot nicer.
Take for example a logging program that deals with records that look like this:
5> L = #logMessage{actor=23507, server_ip = <<123,234,123,234>>}.
#logMessage{actor = 23507,
server_ip = <<"{\352{\352">>,
timestamp = undefined,
level = undefined,
log_filename = undefined,
message = undefined}
If you just try to print it out, you get:
7> io:format("Logged: ~p", [L]).
Logged: {logMessage,23507,<<"{\352{\352">>,undefined,undefined,undefined,undefined}ok
Pretty useless output.
Using io_lib_pretty you can get:
9> io:format(io_lib_pretty:print(L, fun(logMessage, 6) -> [actor, server_ip, timestamp, level, log_filename, message] end)).
#logMessage{actor = 23507,
server_ip = <<"{\352{\352">>,
timestamp = undefined,
level = undefined,
log_filename = undefined,
message = undefined}ok
Just like the shell. I listed the record information manually in the function above, but you can easily use the record_info macro to accomplish the same without code duplication. Or even easier, use the exprecs parse transform (pretty printing example available there).
Next time: how to load record definitions dynamically at runtime.
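Here is roughly what the record_info variant could look like. Treat it as a sketch under assumptions: the module name log_pretty and the helper names are invented for the example, and the record definition is reconstructed from the field names in the shell output above rather than copied from the real header file.

```erlang
-module(log_pretty).
-export([format/1, demo/0]).

%% Assumed definition, rebuilt from the fields shown in the shell output.
-record(logMessage, {actor, server_ip, timestamp, level, log_filename, message}).

%% Callback for io_lib_pretty:print/2. It receives a tuple's tag and its
%% number of fields, and returns the field names for a known record or
%% the atom 'no' to have the term printed as a plain tuple.
record_fields(logMessage, N) ->
    Fields = record_info(fields, logMessage),
    case length(Fields) of
        N -> Fields;
        _ -> no
    end;
record_fields(_, _) ->
    no.

%% Pretty-print Term the way the shell does, returning an iolist.
format(Term) ->
    io_lib_pretty:print(Term, fun record_fields/2).

demo() ->
    L = #logMessage{actor = 23507, server_ip = <<123,234,123,234>>},
    %% Pass the result as an argument to ~s instead of using it as the
    %% format string, so a stray ~ in the data cannot break io:format.
    io:format("Logged: ~s~n", [format(L)]).
```

Unlike the one-clause fun in the shell example, this callback has a catch-all clause that returns no, so other tuples tagged with an atom fall back to ordinary tuple formatting instead of raising a function_clause error.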