Implementing a distributed semaphore.

08:49AM May 14, 2016 in category Java by Zoltan Farkas

I recently needed to limit the number of parallel accesses to a resource across a cluster: the classic use case for a semaphore.

So I went ahead and looked around for existing distributed implementations. The most solid one I could find is the ZooKeeper-based one that is part of Apache Curator. This is probably a good implementation. (I say probably since the quality of popular open-source libs is questionable more often than I would like.)

The one drawback in my case is that it requires ZooKeeper, an extra piece of infra I don’t have (yet). What I do have is an MVCC SQL database with ACID…

So a few hours of coding later the JdbcSemaphore was born!
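For readers less familiar with the pattern, here is a minimal single-JVM sketch of what a semaphore does, using the JDK's `java.util.concurrent.Semaphore`. It is not the JdbcSemaphore API: the class name `LimitedResource` and the limit of 2 permits are made up for illustration. A distributed variant would have to keep the permit count in shared storage (such as a database row) instead of in memory.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Single-JVM illustration of the acquire/release pattern (hypothetical class). */
final class LimitedResource {
  // At most 2 callers may use the resource at the same time (illustrative limit).
  private final Semaphore permits = new Semaphore(2);

  /** Runs the task if a permit becomes available within the timeout; returns false otherwise. */
  boolean tryUse(Runnable task, long timeoutMillis) {
    boolean acquired;
    try {
      acquired = permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    if (!acquired) {
      return false;
    }
    try {
      task.run();
      return true;
    } finally {
      permits.release(); // always give the permit back, even if the task throws
    }
  }

  /** Number of permits currently free. */
  int available() {
    return permits.availablePermits();
  }
}
```

The `finally` block is the important part: a permit that is never released is lost for good, which is even more painful when the count lives in a shared store.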

What is special (I think) about this implementation is:
1) You can increase/decrease the total number of reservations, and see the actual total/available reservations with JMX, which is pretty useful!
2) You can see the state in the DB with plain SQL!
3) It has relatively low overhead, and it is part of spf4j!

Currently it is in beta, and is being peer reviewed… concurrency is hard to get right!

Enjoy, for information, comments, suggestions go to and don't be afraid to use the issue tracker! :-)


The awesomeness of ZEL part 2.

10:00AM Apr 23, 2016 in category General by Zoltan Farkas

Since I started ZEL to calculate fibonacci for the PART 1 article, I have added some extra info about the result of an operation:

ZEL Shell
zel>func det fib(x) { fib(x-1) + fib(x-2) }
type>class org.spf4j.zel.vm.Program
executed in>6889844 ns
zel>fib(0) = 0
type>class java.lang.Integer
executed in>4319632 ns
zel>fib(1) = 1
type>class java.lang.Integer
executed in>317251 ns
type>class java.lang.Integer
executed in>50519054 ns
type>class java.lang.Long
executed in>5201067 ns
type>class java.math.BigInteger
executed in>7518551 ns
type>class java.math.BigInteger
executed in>572010 ns

As you might notice there are 2 "strange" things to observe: 

  1. The result types are different, thanks to ZEL's type auto-upgrade feature (no overflows in ZEL).
  2. Execution time for fib(101) < execution time for fib(100), and this is thanks to the "det" (deterministic) keyword used to declare the fibonacci function.
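To make the first point concrete, here is a rough Java sketch of what "type auto upgrade" could look like for addition. This is an illustration of the idea, not ZEL's actual implementation: the class `AutoUpgradeAdd` is hypothetical and only handles `Integer`, `Long` and `BigInteger` operands.

```java
import java.math.BigInteger;

/** Hypothetical helper: widens int -> long -> BigInteger only when a result no longer fits. */
final class AutoUpgradeAdd {

  static Number add(Number a, Number b) {
    if (a instanceof BigInteger || b instanceof BigInteger) {
      return toBig(a).add(toBig(b));
    }
    if (a instanceof Long || b instanceof Long) {
      try {
        return Math.addExact(a.longValue(), b.longValue());
      } catch (ArithmeticException overflow) {
        return toBig(a).add(toBig(b)); // long overflow: upgrade to BigInteger
      }
    }
    try {
      return Math.addExact(a.intValue(), b.intValue());
    } catch (ArithmeticException overflow) {
      return a.longValue() + b.longValue(); // int overflow: the sum always fits in a long
    }
  }

  private static BigInteger toBig(Number n) {
    return n instanceof BigInteger ? (BigInteger) n : BigInteger.valueOf(n.longValue());
  }
}
```

With this, `add(Integer.MAX_VALUE, 1)` yields a `Long` and `add(Long.MAX_VALUE, 1L)` yields a `BigInteger`, which mirrors the Integer, Long, BigInteger progression in the shell output above.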

For more detail please see 


The awesomeness of ZEL

09:33AM Apr 23, 2016 in category General by Zoltan Farkas

I recently showed someone how big fib(10000) is, and the best tool for this was ZEL:

ZEL Shell
zel>func det fib(x) { fib(x-1) + fib(x-2) }
zel>fib(0) = 0
zel>fib(1) = 1
zel>fib (10000)

More information about ZEL you will find at.



Anticompetitive behavior? or just stupid behavior?

10:45AM Apr 17, 2016 in category General by Zoltan Farkas

I was reading an article about how UC Davis attempted to manipulate search results related to a pepper-spraying incident that took place there (which I had already forgotten about :-)). I am sure they wish they were a European university, protected by the "right to be forgotten" laws :-).

The proposed solution was pretty interesting:

"flood of content with positive sentiment and off-topic subject matter", and proposed hosting content on Google's own services, which would appear higher in the firm's search results."

This raises eyebrows, since Google Search has a monopoly position in quite a few markets, and it is being investigated in Europe for the exact same behavior... Now there might be practical/technical explanations for this "behavior"... Ultimately crawling sites hosted in your own data centers is cheaper (less overhead)... On the other hand, when I use Google Search my expectation as a customer is to get the most relevant search results, and I don't believe the fact that content is hosted by Google has anything to do with relevance!

My suggestion to Google is: if this "behavior", which Nevins & Associates seem to think they can take advantage of, exists, correct it! Not because of the antitrust issues, but to improve the quality of the product!


SPF4j 7.2.25 is out!

10:43AM Apr 12, 2016 in category General by Zoltan Farkas

SPF4j 7.2.25 is out! UPDATE 7.2.26 is out to fix a bug with the slf4j formatting utility.

This is probably the last release that will be usable with JDK 1.7. Going forward the library will be compiled with JDK 1.8, and will not be usable with JDKs lower than 1.8.

7.2.25 contains a few notable additions:

1) Java nio TCP proxy. This is a useful utility for testing HA for tcp based services.

Here is how simple it is to use:

        ForkJoinPool pool = new ForkJoinPool(8);
        try (TcpServer server = new TcpServer(pool,
                new ProxyClientHandler(HostAndPort.fromParts("", 80), null, null, 10000, 5000),
                1976, 10)) {
            byte[] originalContent = readfromSite("");
            byte[] proxiedContent = readfromSite("http://localhost:1976");
            Assert.assertArrayEquals(originalContent, proxiedContent);
        }

2) Slf4jMessageFormatter - a useful utility that is more flexible and faster than the one shipped with slf4j. (see Slf4jMessageFormatterTest for more detail)

3) AppendableLimiterWithOverflow and a fast implementation of AppendableWriter

4) FastStackCollector allows usage with a more flexible filter: FastStackCollector(Predicate<Thread>) allowing you to reduce the profiling overhead and improve the relevance of the profiled data.

For more detail see:


To Spin or not to Spin!

11:02AM Sep 12, 2015 in category General by Zoltan Farkas

Both the spf4j LifoThreadPool and the Java ForkJoinPool use spinning to try to improve performance. It seems like FJP decided to stop spinning...

Latest JDK update disabled spinning for the fork-join pool:

the latest update also fixes serious bugs with FJP like: 

The spf4j thread pool spinning is configurable (on by default) and the decision to employ it is in the hands of the user... By default spf4j will have max nr_processors/2 spinning threads at any time (configurable via system property: lifoTp.maxSpinning).

There are significant differences between spf4j lifo thread pool and FJP, and these differences make them useful for different use cases...  

FJP lacks the configurability of the spf4j thread pool. It is not possible to configure an FJP to reject tasks before an unrecoverable error like an OOM happens. The FJP timeout to retire a thread is hardcoded to 2 s; why this isn't at least configurable via a system property is a bit of a mystery to me... I am a big fan of minimalistic configurability, but in the case of FJP it is a bit extreme... The cost of creating a thread can be significant if one factors in all the thread-local caches that are typically created by the average server software and which are lost when the thread is brought down... so it would be useful if one could specify a minimal number of threads to be kept alive and a custom timeout... I understand that you can fork the FJP code, but if you only need a thread pool you can use the spf4j thread pool instead... and you can even choose to spin.

SPF4J thread pool will schedule the tasks in FIFO order to worker threads who are picked in a LIFO order... 
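The FIFO-tasks / LIFO-workers pairing can be pictured with a toy, single-threaded model. This is only a sketch of the scheduling idea, not spf4j code: `LifoDispatchSketch` is a hypothetical class, and workers and tasks are plain strings.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Queue;

/** Toy model: tasks leave in arrival order, workers are picked most-recently-idle first. */
final class LifoDispatchSketch {
  private final Queue<String> tasks = new ArrayDeque<>();       // FIFO: oldest task first
  private final Deque<String> idleWorkers = new ArrayDeque<>(); // LIFO: used as a stack

  void submit(String task) {
    tasks.add(task);
  }

  void workerIdle(String worker) {
    idleWorkers.push(worker);
  }

  /** Pairs the oldest task with the most recently idle worker; null if either is missing. */
  String dispatchNext() {
    if (tasks.isEmpty() || idleWorkers.isEmpty()) {
      return null;
    }
    return tasks.poll() + " -> " + idleWorkers.pop();
  }
}
```

If workers w1, w2, w3 go idle in that order and two tasks arrive, w3 and w2 get the work while w1 stays idle. Under light load the same few workers keep getting reused, so the ones at the bottom of the stack can time out and be retired.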

On the performance side, in some of the tests FJP is faster than spf4j, in others it is significantly slower... which means that my tests need improvement...



JDK ThreadPoolExecutor NOT.

08:16AM Jun 20, 2015 in category General by Zoltan Farkas

I have finally reached the limit of my patience with the JDK thread pools...

1) JDK thread pools are biased towards queuing. So instead of spawning a new thread, they will queue the task. Only if the queue reaches its limit will the thread pool spawn a new thread.

2) Thread retirement does not happen when load lightens. For example if we have a burst of jobs hitting the pool that causes the pool to go to max, followed by light load of max 2 tasks at a time, the pool will use all threads to service the light load preventing thread retirement. (only 2 threads would be needed…)
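Point 1) is easy to reproduce with the stock `ThreadPoolExecutor`. The sketch below (the class name `QueueBiasDemo` and the sizes are arbitrary choices for illustration) submits 5 blocking tasks to a pool with core size 1, max size 4 and a queue of capacity 10, then reports the pool size and queue length.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class QueueBiasDemo {

  /** Returns {poolSize, queuedTasks} observed after submitting 5 tasks that all block. */
  static int[] observe() {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        1, 4, 60, TimeUnit.SECONDS, new ArrayBlockingQueue<>(10));
    CountDownLatch release = new CountDownLatch(1);
    try {
      for (int i = 0; i < 5; i++) {
        pool.execute(() -> {
          try {
            release.await(); // keep the worker busy until we have taken our measurement
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      }
      return new int[] {pool.getPoolSize(), pool.getQueue().size()};
    } finally {
      release.countDown(); // let the tasks finish
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    int[] r = observe();
    System.out.println("threads=" + r[0] + ", queued=" + r[1]);
  }
}
```

Even though the pool is allowed 4 threads, it stays at 1 thread with 4 tasks waiting in the queue; extra threads would only be started once the queue is full.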

Unhappy with the behavior above, I went ahead and implemented a pool to overcome the deficiencies above.

To resolve 1) there are several “hacks” using the existing JDK pools:

To resolve 2) Using Lifo scheduling resolves the issue. This idea was presented by Ben Maurer at ACM applicative 2015 conference:

So a new implementation was born:

So far this implementation improves async execution performance for ZEL (

The implementation is spin capable to reduce context switch overhead, yielding superior performance for certain use cases.



Spf4j 7.1.10 is Out

07:48PM Apr 19, 2015 in category General by Zoltan Farkas

Bug fixes, performance improvements and some new functionality to record CPU and thread usage for your process.

The open files monitor has been improved to automatically detect the open files limit; it will call the GC if the warn threshold is exceeded, and can even shut down the process when the limit is reached, if configured to do so.

JMX exporter utility has been improved to fully support "mixin"  MX beans. Spf4j monitors can now be controlled with JMX.

release is available in the central repo as always at 




Spf4j 7.1.8 is out

07:29PM Apr 01, 2015 in category General by Zoltan Farkas

New MemorizingBufferedInputStream is now available for troubleshooting purposes. (see where you fail)

Plus lots of bug fixes...

code available at


Spf4j 7.1.4 is out

08:26AM Mar 08, 2015 in category General by Zoltan Farkas

This release contains a lot of enhancements:

Object recycler had a few bugs fixed, and should be production ready.

Added SizedObjectRecyclers which are very useful for recycling buffers. (ByteArrayBuilder can now use a recycled array.)

PipedOutputStream and PipedInputStream: a significantly better implementation than the stock JDK one. Not only is it slightly faster, but the producer controls the byte handover with flush, giving it buffering semantics. This implementation supports timeouts as well, by integrating with spf4j Runtime.get|setDeadline().

New UpdateablePriorityQueue implementation.

New Strings utilities, for fast to/from utf8 coding/decoding.

release is available in the central maven repo.



spf4j + flight recorder

06:49PM Nov 02, 2014 in category General by Zoltan Farkas

Spf4j now has a Flight Recorder profiler integration for JMH; all you need to do to use it is:

        Options opt = new OptionsBuilder()
                .jvmArgs("-XX:+UnlockCommercialFeatures", "-Djmh.stack.profiles="
                        + System.getProperty("jmh.stack.profiles",
         new Runner(opt).run();

As you can see in the example above you can actually use spf4j profiler and flight recorder at the same time.

(not that it makes sense to do that :-) )

 enjoy, cheers!


Jmh + Spf4j

11:27AM Nov 02, 2014 in category General by Zoltan Farkas

I had some time this weekend to code due to bad weather :-), and I have integrated spf4j and jmh so that spf4j can be used to profile benchmarks. This way as you see a performance degradation you can immediately take a look at what potentially is the cause. All you need to do is to look at the ssdump files generated. (ex spf4j benchmark profiles).

Spf4j profiler is a better and lower overhead implementation compared with the JMH StackProfiler, however both suffer from safe point bias, which makes their results less accurate. (a lot of commercial profilers suffer from the same issue, I believe java flight recorder does not)

Spf4j will contain JMH profiler integration with java Flight recorder in the near future. 


Sleep sort in Zel

11:45AM Nov 01, 2014 in category General by Zoltan Farkas

One of the candidates we have been recently interviewing implemented, as an anecdote, a sleep sort during the interview.

So I thought to myself, this can easily be implemented in ZEL as well, so here it is:

func sleepSort(x) {
  l = x.length;
  if l <= 0 {
    return x
  };
  resChan = channel();
  max = x[0];
  sl = func (x, ch) {sleep x * 10; ch.write(x)};
  sl(max, resChan)&;
  for i = 1; i < l; i++ {
    val = x[i];
    sl(val, resChan)&;
    if (val > max) {
      max = val
    }
  };
  sleep (max + 1) * 10;
  for c =, i = 0; c != EOF; c =, i++ {
     x[i] = c
  };
  return x
}

 and it works like a charm, enjoy!


Generating a Unique ID

12:13PM Oct 26, 2014 in category General by Zoltan Farkas

Most applications I encounter use UUID.randomUUID().toString() to generate unique IDs for various things like requests, transactions... which is quite a slow implementation.

Since I implemented a UID generator in SPF4J, I decided to do a little bit of benchmarking with JMH: 

and here are the results on my 4 core macbook pro: 

Benchmark                              Mode  Samples         Score        Error  Units
o.s.c.UIDGeneratorBenchmark.jdkUid    thrpt       60    261797.856 ±  11388.450  ops/s
o.s.c.UIDGeneratorBenchmark.atoUid    thrpt       60   8102280.696 ± 159030.080  ops/s
o.s.c.UIDGeneratorBenchmark.scaUid    thrpt       60  25371629.029 ± 354517.591  ops/s

As you can see, the spf4j UID generator (scaUid) is about 100x faster than the JDK one.

It is also significantly (about 3x) faster than the implementation using atomic instructions. In a lot of the code I stumble upon I see a lot of unjustified use of atomics, and the scalability impact is significant.
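One way to avoid an atomic instruction per generated ID is to pay for it once per thread and use a plain counter afterwards. The sketch below illustrates that general idea only. It is not the spf4j generator: `ThreadLocalUidSketch` is a hypothetical class and its IDs are unique within a single JVM only.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: one atomic operation per thread, then a thread-local counter. */
final class ThreadLocalUidSketch {
  private static final AtomicLong NEXT_THREAD_ID = new AtomicLong();

  /** Per-thread state: a prefix claimed once atomically, plus a plain (non-atomic) counter. */
  private static final class State {
    final String prefix = Long.toString(NEXT_THREAD_ID.getAndIncrement(), 36) + '-';
    long counter;
  }

  private static final ThreadLocal<State> STATE = ThreadLocal.withInitial(State::new);

  /** Returns an ID of the form threadPrefix-counter, both encoded in base 36. */
  static String next() {
    State s = STATE.get();
    return s.prefix + Long.toString(s.counter++, 36);
  }
}
```

Because no two threads share a prefix and each thread only touches its own counter, there is no contention on the hot path.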


SPF4J release 6.5.17 is OUT

07:00AM Sep 21, 2014 in category General by Zoltan Farkas

Release 6.5.17 is out, code and binaries at. Some of the notable changes:

 1) Added 3 measurement stores: tsdbtxt a simple text based format to store measurements. Graphite UDP store, and Graphite TCP store.

 2) ObjectPool is now called RecyclingSupplier, an extension to Guava Supplier with 2 methods: get() and recycle(object)...

 3) Performance enhancements to further reduce the library overhead (and Heisenberg uncertainty principle)

 4) Retry methods in the Callable class have been further refined. A randomized Fibonacci back-off with immediate retries has been introduced as default.

 5) Added Either utility class.

 6) Easy to export JMX operations and attributes. Simply annotate the methods or getters and setters with @JmxExport, register the object with the new Registry class, and you're done.
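The "randomized Fibonacci back-off with immediate retries" from item 4 can be pictured roughly as follows. This is a sketch of the general technique under my own assumptions, not the spf4j implementation: the class `FibonacciBackoffSketch`, its parameters and the choice of a uniform random delay are all hypothetical.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical sketch: a few immediate retries, then randomized, capped Fibonacci delays. */
final class FibonacciBackoffSketch {
  private final int immediateRetries;
  private final long maxDelayMillis;
  private int attempts;
  private long prev;
  private long curr;

  FibonacciBackoffSketch(int immediateRetries, long firstDelayMillis, long maxDelayMillis) {
    this.immediateRetries = immediateRetries;
    this.maxDelayMillis = maxDelayMillis;
    this.prev = 0;
    this.curr = firstDelayMillis;
  }

  /** Upper bound for the next delay: 0 for the immediate retries, then Fibonacci growth up to the cap. */
  long nextDelayBound() {
    if (attempts < immediateRetries) {
      attempts++;
      return 0;
    }
    long bound = Math.min(curr, maxDelayMillis);
    if (curr < maxDelayMillis) { // stop growing once the cap is reached (also avoids overflow)
      long next = prev + curr;
      prev = curr;
      curr = next;
    }
    return bound;
  }

  /** Random delay in [0, bound], so that concurrent clients do not retry in lockstep. */
  long nextDelayMillis() {
    long bound = nextDelayBound();
    return bound == 0 ? 0 : ThreadLocalRandom.current().nextLong(bound + 1);
  }
}
```

With 2 immediate retries, a first delay of 1 ms and a cap of 4 ms, the successive bounds are 0, 0, 1, 1, 2, 3, 4, 4, ... Fibonacci growth is gentler than doubling, which keeps the retry latency lower for short outages.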


Apple Watch makes Yo obsolete(or not)

08:28PM Sep 09, 2014 in category General by Zoltan Farkas

One of the interesting features of apple watch is its instant messaging capability.

It allows you to send a YO in the most efficient way.

Based on popularity of YO, I see this as apple watch's killer feature :-).


spf4j jmx utilities enhanced

01:57PM Aug 22, 2014 in category General by Zoltan Farkas

Added some enhancements to the spf4j library to export attributes and operations.

All you need to do is annotate your getters, setters and operations with @JmxExport and call Registry.export() with your objects.

code @



Why does history have to repeat itself?

10:52PM Aug 02, 2014 in category Java by Zoltan Farkas

I wonder if the large loss of life in the Iraq and Afghanistan wars was worth it… and I am pretty sure it was not…

13 years after 9/11, and 10 years after the initial 9/11 commission report

"Al Qaeda–affiliated groups are now active in more countries than before 9/11."

“The struggle against terrorism is far from over—rather, it has entered a new and dangerous phase.”

“A senior national security official told us that the forces of Islamist extremism in the Middle East are stronger than in the last decade.”

“ISIS now controls vast swaths of territory in Iraq and Syria, creating a massive terrorist sanctuary. One knowledgeable former Intelligence Community leader expressed concern that Afghanistan could revert to that condition once most American troops depart at the end of 2014.”

On PBS Frontline on Jul 29 somebody said about the new terrorist threat:

“This is Al Qaeda 6.0, they make Bin Laden’s Al Qaeda look like boy scouts”

I see the same failed strategy being employed by Israel in Gaza…  

The Israeli army is creating the next generation of extremists that will make the previous one look like boy scouts…

Why does history have to repeat itself?


spf4j alternative java flight recorder

08:56PM Jul 11, 2014 in category General by Zoltan Farkas

With JDK 7 update 40 Oracle released Java Mission Control + Java Flight Recorder:

(for more detail see:

As with spf4j you can implement continuous profiling, and there are some pros and cons of using Java Flight Recorder:

Java Flight Recorder has some implementation advantages that in theory will provide better data quality. Oracle calls the impact "Zero performance overhead", which is sales BS; every engineer with an IQ greater than room temperature knows that there is no such thing. However the overhead can be minimal, and potentially lower than that of spf4j, although not significantly lower.

But don't get ready to throw spf4j out of the window: Java Flight Recorder is available only on the Oracle JVM, and is free to use in your test environments only; for production environments you will need to buy a license. Meanwhile spf4j runs on any JVM, and is free to use in any environment.

Also some of the visualizations in spf4j are in my view better...

In any case Java flight recorder is a great tool for implementing continuous profiling.



Easily expose attributes and operations via JMX

10:14AM Jul 05, 2014 in category General by Zoltan Farkas

I implemented a small utility to export attributes and operations via jmx.

All you need to do is:

1) Annotate your attribute getter/setter or operation with @JmxExport

2) invoke: Registry.export("test", "Test", testObj1, testObj2...);

 and your attributes and methods will be available via JMX.

This is available in the latest version of spf4j


Parallel qsort in zel

08:43PM Apr 04, 2014 in category Java by Zoltan Farkas

I had a bit of time to implement some extra features in zel, just enough so that I can write quick sort in zel:

func qSortP(x, start, end) {
  l = end - start;
  if l < 2 {
    return
  };
  pidx = start + l / 2;
  pivot = x[pidx];
  lm1  = end - 1;
  x[pidx] <-> x[lm1];
  npv = start;
  for i = start; i < lm1; i++ {
    if x[i] < pivot {
      x[npv] <-> x[i];
      npv ++
    }
  };
  x[npv] <-> x[lm1];
  qSortP(x, start, npv)&;
  qSortP(x, npv + 1, end)&
};

qSortP(x, 0, x.length)

As you can see it is pretty much the standard implementation, and since it is ZEL it is parallel.

Parallel exec time = 510 ms
Parallel exec time = 470 ms
Parallel exec time = 473 ms
Single Threaded exec time = 1640 ms
Single Threaded exec time = 1528 ms
Single Threaded exec time = 1527ms

Tests executed on a quad core MacBook pro and show good scalability of the execution engine.
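For comparison, here is a sketch of the same partition scheme in plain Java, where the parallelism has to be spelled out with the fork/join framework. The class `ParallelQuickSort` is written for this comparison and was not part of the benchmark above. A production version would also switch to a sequential sort below some size threshold, which is left out here to stay close to the ZEL code.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

/** Quick sort with the same pivot handling as the ZEL version, parallelized via fork/join. */
final class ParallelQuickSort extends RecursiveAction {
  private final int[] x;
  private final int start;
  private final int end; // exclusive

  ParallelQuickSort(int[] x, int start, int end) {
    this.x = x;
    this.start = start;
    this.end = end;
  }

  @Override
  protected void compute() {
    int l = end - start;
    if (l < 2) {
      return;
    }
    int pidx = start + l / 2;
    int pivot = x[pidx];
    int last = end - 1;
    swap(pidx, last); // park the pivot at the end
    int npv = start;
    for (int i = start; i < last; i++) {
      if (x[i] < pivot) {
        swap(npv, i);
        npv++;
      }
    }
    swap(npv, last); // move the pivot to its final position
    // Sort both halves as parallel subtasks and wait for both to finish.
    invokeAll(new ParallelQuickSort(x, start, npv), new ParallelQuickSort(x, npv + 1, end));
  }

  private void swap(int i, int j) {
    int tmp = x[i];
    x[i] = x[j];
    x[j] = tmp;
  }

  static void sort(int[] x) {
    ForkJoinPool.commonPool().invoke(new ParallelQuickSort(x, 0, x.length));
  }
}
```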

pretty cool, tests are at



ZEL now has channels.

09:22PM Mar 24, 2014 in category General by Zoltan Farkas

Here is a simple program where we have 1 producer and 10 consumers:

        ch = channel();
        func prod(ch) { for i = 0; i < 100 ; i++ { ch.write(i) }; ch.close()};
        func cons(ch, nr) {
            sum = 0;
            for v =; v != EOF; v = {
                out(v, ","); sum++
            };
            out("fin(", nr, ",", sum, ")")
        };
        prod(ch); // start producer
        for i = 0; i < 10; i++ { cons(ch, i) } //start consumers

As with ZEL futures, channel operations do not block a thread.

ZEL coroutines are multiplexed over a pool of threads where and future.get(transparent)

are points where execution can be suspended.

The current channel implementation is an unbounded channel.
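A rough Java analogue of the producer/consumers program uses an unbounded `LinkedBlockingQueue` as the channel. The sketch is mine (class name and the `EOF` sentinel value are arbitrary choices), and unlike ZEL channels, `take()` here does block the calling thread, so each consumer needs a thread of its own.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class ProducerConsumersSketch {
  private static final Integer EOF = Integer.MIN_VALUE; // sentinel marking a closed channel

  /** One producer writes 0..99, ten consumers drain the queue; returns how many values were consumed. */
  static int run() {
    BlockingQueue<Integer> ch = new LinkedBlockingQueue<>(); // unbounded
    AtomicInteger consumed = new AtomicInteger();
    ExecutorService pool = Executors.newFixedThreadPool(11);
    pool.execute(() -> {
      for (int i = 0; i < 100; i++) {
        ch.add(i);
      }
      ch.add(EOF); // "close" the channel
    });
    for (int c = 0; c < 10; c++) {
      pool.execute(() -> {
        try {
          for (Integer v = ch.take(); !v.equals(EOF); v = ch.take()) {
            consumed.incrementAndGet();
          }
          ch.add(EOF); // pass the sentinel on so the other consumers stop too
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
    }
    pool.shutdown();
    try {
      if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
        throw new IllegalStateException("consumers did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return consumed.get();
  }
}
```

Which consumer gets which value varies from run to run, but together they always consume all 100 values.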


spf4j release 6.5.2

11:06PM Feb 23, 2014 in category General by Zoltan Farkas

I finally managed to clean up and improve ZEL to make it worthy of being part of spf4j.

You can check out the source and download binaries (from the maven repo) at



zel and replicas

08:44PM Feb 19, 2014 in category General by Zoltan Farkas

Implemented zel system function "first", which will return the first value returned by a set of async invocations.

This is in general practical for implementing replica invocations, where we care about the first and fastest result.

Here is a dummy example:

replica = func async (x) {
    sleep random() * 1000;
    out(x, " finished\n");
    return x
};
out(first(replica(1), replica(2), replica(3)), " finished first\n");
sleep 1000

returns something like:

3 finished
3 finished first
2 finished
1 finished

As you can see in this case 3 finishes first. 2 and 1 finish afterwards, but their results are discarded.
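The closest thing in the JDK is probably `ExecutorService.invokeAny`, sketched below with hypothetical `replica` and `first` helpers and fixed delays instead of random ones. One difference from the ZEL example: `invokeAny` cancels the tasks that have not completed once it has a result, rather than letting them run to completion.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class FirstReplicaSketch {

  /** A hypothetical replica call: answers with its id after the given delay. */
  static Callable<Integer> replica(int id, long delayMillis) {
    return () -> {
      Thread.sleep(delayMillis);
      return id;
    };
  }

  /** Returns the result of whichever replica finishes first; the slower ones are cancelled. */
  static int first(List<Callable<Integer>> replicas) {
    ExecutorService pool = Executors.newFixedThreadPool(replicas.size());
    try {
      return pool.invokeAny(replicas);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e); // every replica failed
    } finally {
      pool.shutdownNow();
    }
  }
}
```

For example, `first(Arrays.asList(replica(1, 800), replica(2, 800), replica(3, 20)))` gives back the answer from replica 3, the fastest one.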

Next on my list are exceptions and canceling async tasks whose results are not needed anymore...


Zel performance part II

08:39PM Feb 13, 2014 in category Java by Zoltan Farkas

Zel's recursive Fibonacci implementation beats the Java, C++ and Erlang recursive implementations because of its O(n) characteristics.

You can't compensate for a bad algorithm with the language choice.

fib = func det (x) {fib(x-1) + fib(x-2)};
fib(0) = 0;
fib(1) = 1;

However Java, C and C++ significantly outperform zel in most cases.

I decided to compare zel against 2 similar languages: MVEL and SPEL.

Based on my micro-benchmarks ZEL looks similar in performance to MVEL and significantly faster than SPEL.

latest tests are at , enjoy!



Zel concurrent programming and performance

10:28PM Feb 12, 2014 in category Java by Zoltan Farkas

Let's take the example of calculating pi from my previous post and see how it performs in sync mode (single threaded):

pi = func (x) {
  term = func (k) {4 * (-1 ** k) / (2d * k + 1)};
  for i = 0; i < x; i = i + 1 { parts[i] = term(i) };
  for result = 0, i = 0; i < x; i = i + 1 { result = result + parts[i] };
  return result
}

executes in about 450 ms

In parallel mode it executes in 375 ms: we get a bit of a gain, but we pound the processors a bit more.

I have optimized the parallel implementation to:

piPart = func (s, x) {
  term = func sync (k) {4 * (-1 ** k) / (2d * k + 1)};
  for i = s; i < x; i = i + 1 {
    parts[i] = term(i)
  };
  for result = 0, i = s; i < x; i = i + 1 {
    result = result + parts[i]
  };
  return result
};

pi = func (x, breakup) {
  range = x / breakup;
  l = breakup - 1;
  for i = 0, result = 0, k = 0; i < l; i = i + 1 {
    part[i] = piPart(k, k + range);
    k = k + range
  };
  part[i] = piPart(k, x);
  for i = 0, result = 0; i < breakup; i = i + 1 {
     result = result + part[i]
  };
  return result
};
pi(100000, 5)

and it executes in about 230 ms, about twice as fast as the single threaded implementation.

Tests have been executed on a 4 core laptop, and they are significantly impaired by power management, which modifies the frequency, disables cores...

Performance is still far away from the single threaded Java implementation, which executes in 10 ms...

I will probably be able to get the times closer if I implement ++ and += in zel, but it will probably still be far away from Java.
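The chunking idea from the optimized version (a handful of coarse tasks instead of one async call per term) translates to Java roughly as below. This sketch is not the Java implementation that was timed above: `ChunkedPi` is a hypothetical class that runs each range as a `CompletableFuture` on the common pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

final class ChunkedPi {

  /** Sums the Leibniz terms 4 * (-1)^k / (2k + 1) for k in [from, to). */
  static double piPart(int from, int to) {
    double sum = 0;
    for (int k = from; k < to; k++) {
      sum += (k % 2 == 0 ? 4.0 : -4.0) / (2.0 * k + 1);
    }
    return sum;
  }

  /** Splits the first x terms into `breakup` ranges and sums each range as a separate async task. */
  static double pi(int x, int breakup) {
    int range = x / breakup;
    List<CompletableFuture<Double>> parts = new ArrayList<>();
    for (int i = 0; i < breakup; i++) {
      int from = i * range;
      int to = (i == breakup - 1) ? x : from + range; // the last chunk takes the remainder
      parts.add(CompletableFuture.supplyAsync(() -> piPart(from, to)));
    }
    double result = 0;
    for (CompletableFuture<Double> part : parts) {
      result += part.join(); // wait for each partial sum
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(pi(100000, 5));
  }
}
```

Fewer, larger tasks mean less scheduling overhead per term, which is the same reason the chunked ZEL version beats the one-task-per-term version.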


Concurrent programming comparison with GO

10:49PM Feb 11, 2014 in category Java by Zoltan Farkas

Here is the parallel implementation  of calculating PI in ZEL:

pi = func (x) {
  term = func (k) {4 * (-1 ** k) / (2d * k + 1)};
  for i = 0; i < x; i = i + 1 { parts[i] = term(i) };
  for result = 0, i = 0; i < x; i = i + 1 { result = result + parts[i] };
  return result
}

The GO implementation looks like:

package main

import (
    "fmt"
    "math"
)

func main() {
    fmt.Println(pi(5000)) // term count chosen for illustration
}

func pi(n int) float64 {
    ch := make(chan float64)
    for k := 0; k <= n; k++ {
        go term(ch, float64(k))
    }
    f := 0.0
    for k := 0; k <= n; k++ {
        f += <-ch
    }
    return f
}

func term(ch chan float64, k float64) {
    ch <- 4 * math.Pow(-1, k) / (2*k + 1)
}

ZEL async function calls make the code more readable by keeping the concurrency completely out of the way.


Concurrency/async programming in zel

08:45PM Feb 10, 2014 in category Java by Zoltan Farkas

One of the cool things about zel is concurrency.

Here is a simple example:

 f1 = func {sleep 5000; 1};
 f2 = func {sleep 5000; 2};
 f1() + f2()

This program will return 3 after about 5 seconds.

As you can see, currently all functions are executed asynchronously; no cumbersome futures syntax is needed. The language deals with the futures transparently.
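To show what "cumbersome futures syntax" means in practice, here is a sketch of the same program in Java with `CompletableFuture`. The class `AsyncSumSketch` and its helper `slow` are made up for this comparison, and the delay is a parameter so the example does not have to wait 5 seconds.

```java
import java.util.concurrent.CompletableFuture;

final class AsyncSumSketch {

  /** Stand-in for f1/f2: sleeps for the given time, then returns the value. */
  static int slow(int value, long delayMillis) {
    try {
      Thread.sleep(delayMillis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return value;
  }

  /** Starts both computations at once and adds the results when both are done. */
  static int sum(long delayMillis) {
    CompletableFuture<Integer> f1 = CompletableFuture.supplyAsync(() -> slow(1, delayMillis));
    CompletableFuture<Integer> f2 = CompletableFuture.supplyAsync(() -> slow(2, delayMillis));
    return f1.thenCombine(f2, Integer::sum).join();
  }

  public static void main(String[] args) {
    System.out.println(sum(5000)); // both sleeps overlap, so this takes about 5 s, not 10 s
  }
}
```

The futures, the combinator and the final `join()` are all visible in the Java version; in ZEL the same structure is implied by `f1() + f2()`.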


ZEL lives again.

10:51AM Feb 08, 2014 in category General by Zoltan Farkas

I have cleaned up my good old ZEL expression evaluator.
The code now is not only cleaner, but I have added new functionality to the language.

New additions are async programming and memoization, which allow for pretty cool implementations.

With this we can implement the fibonacci function like:

fib = func det (x) {fib(x-1) + fib(x-2)};
fib(0) = 0;
fib(1) = 1;

with O(n) time and O(n) space characteristics.

which makes it possible to actually calculate large fibonacci numbers, unlike the closest implementation in java:

    public long fib(final long i) {
        if (i <= 1) {
            return i;
        } else {
            return fib(i - 1) + fib(i - 2);
        }
    }

where fib(40) executes in about 5 ms in zel and 500 ms in java.

implementing fibonacci in java so that it actually works for large numbers looks like:

    public BigInteger fibBNr(final int i) {
        if (i <= 1) {
            return BigInteger.valueOf(i);
        } else {
            BigInteger v1 = BigInteger.ZERO;
            BigInteger v2 = BigInteger.ONE;
            BigInteger tmp;
            for (int j = 2; j <= i; j++) {
                tmp = v2;
                v2 = v1.add(v2);
                v1 = tmp;
            }
            return v2;
        }
    }

and outperforms zel significantly: calculating fib(10000) in 5 ms, while zel took about 1900 ms, 1600 ms, 1200 ms and 1000 ms in successive measurements.

This shows that there is significant overhead in the ZEL execution, but we are not really comparing apples with apples, since zel does memoization: after computing fib(10000), calling fib(x) where x <= 10000 will return in O(1) time.

If you need a fibonacci implementation with memoization the ZEL implementation is probably not a bad choice.
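For a like-for-like comparison, a memoizing Java version could look like the sketch below. The class `MemoizedFib` is hypothetical and assumes n >= 0. Unlike the ZEL function it fills its cache iteratively rather than recursively, to avoid deep Java call stacks for large n.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical memoizing Fibonacci: results are kept, so repeated calls are cache lookups. */
final class MemoizedFib {
  // cache.get(i) holds fib(i); seeded with fib(0) = 0 and fib(1) = 1
  private final List<BigInteger> cache = new ArrayList<>();

  MemoizedFib() {
    cache.add(BigInteger.ZERO);
    cache.add(BigInteger.ONE);
  }

  /** Returns fib(n) for n >= 0, extending the cache only past the largest index computed so far. */
  synchronized BigInteger fib(int n) {
    for (int i = cache.size(); i <= n; i++) {
      cache.add(cache.get(i - 1).add(cache.get(i - 2)));
    }
    return cache.get(n);
  }
}
```

The first call to `fib(10000)` does O(n) additions; any later call with a smaller or equal argument is a single list lookup, which matches the O(1) behavior described above.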

you can download the code from:



New call graph visualization...

09:57AM Aug 31, 2013 in category General by Zoltan Farkas

Hot methods are not really well visible in the flame charts, so I had an idea to improve them...

I really like the result:


The UI is quite usable, I was able to detect and fix several performance issues in real production code already.

There are a few things to be improved in the UI, but overall it is a big step forward!

Other approaches are to use graphviz to visualize call graphs as suggested in:

which seems to be the way they are visualized at Google as well: