09 September 2014

Part 1a. "Какая гадость... Какая гадость эта ваша заливная рыба"

These days, there are more tools to play with benchmarks. No doubts, its really hard to do especially if they are micro. Let's cheers up an JMH. Really pretty good and easy to use framework for a writing the micro benchmarks for a JVM. To play with, I decided to come back to my old investigation about the Spring proxying, I did some time ago. But now, the list of targets are extended and tests are more representative.

So, the purpose was to compare the cost of proxying the POJOs by different instruments widely used nova days. The list from the original was slightly extended so its looks like:

  1. Direct call - just an explicit call, for comparison purposes, o.k.e.CatBenchmark.t1_direct,
  2. Generated bytecode (JavAssist) - "manually" generated proxy with aids of JavAssist, o.k.e.CatBenchmark.t2_javassist,
  3. Generated bytecode (CGLib) - the same, except CGlib, o.k.e.CatBenchmark.t3_cglib,
  4. JDK's Dynamic proxy - proxy provided by default JDK's dynamic proxy, o.k.e.CatBenchmark.t4_dynamic,
  5. Hibernate proxy - proxy provided by Hibernate, o.k.e.CatBenchmark.t5_hibernate,
  6. Spring proxy (CGLib) - proxy, provided by Spring, o.k.e.CatBenchmark.t6_springOpt,
  7. Spring proxy (Dynamic) - "optimized" proxy, provided by Spring, o.k.e.CatBenchmark.t7_spring,

Here we need to make a couple remarks: a) Used framework allows achieve pretty stable results that, I think, could be trusted. Measurements were made for different "load", from simplest one, execution of which was comparable to the cost of proxying, to more complex case, where cost was amortized by weight of calculations (i.e. calculations took significantly more time then time spent on pure proxying). b) The version of competitors where up to dated to be more actual.

The code is available for inspecting here. Any notes / comments are welcome. :)

So, the 1st experiment was about proxying the hashing of the "I said meow!" string.

The result on my old laptop under JDK 1.8 u20 looks like:

 Benchmark                          Mode  Samples   Score  Score error  Units
 o.k.e.CatBenchmark.t1_direct       avgt       30   9.762        0.047  ns/op
 o.k.e.CatBenchmark.t2_javassist    avgt       30  10.268        0.035  ns/op
 o.k.e.CatBenchmark.t3_cglib        avgt       30  10.643        0.050  ns/op
 o.k.e.CatBenchmark.t4_dynamic      avgt       30  16.760        0.136  ns/op
 o.k.e.CatBenchmark.t5_hibernate    avgt       30  25.991        0.167  ns/op
 o.k.e.CatBenchmark.t6_springOpt    avgt       30  27.275        0.150  ns/op
 o.k.e.CatBenchmark.t7_spring       avgt       30  53.929        0.423  ns/op

For a self check, lets look at the border case, the Spring's dynamic proxy. Its cost, as we may see is 53.929 - 9.762 = 44.176±0.47 ns. Unexpectedly good looking the CGLib, its stands not so far from explicit invocation.

The hash calculation is pretty straight forward. Just iterating over character with some magic of prime numbers. So, the next step was to increase the number of iterations in 10 times:

 Benchmark                          Mode  Samples    Score  Score error  Units
 o.k.e.CatBenchmark.t1_direct       avgt       30  137.933        0.417  ns/op
 o.k.e.CatBenchmark.t2_javassist    avgt       30  138.760        0.508  ns/op
 o.k.e.CatBenchmark.t3_cglib        avgt       30  139.164        0.389  ns/op
 o.k.e.CatBenchmark.t4_dynamic      avgt       30  147.382        0.655  ns/op
 o.k.e.CatBenchmark.t5_hibernate    avgt       30  157.001        0.828  ns/op
 o.k.e.CatBenchmark.t6_springOpt    avgt       30  158.625        0.759  ns/op
 o.k.e.CatBenchmark.t7_spring       avgt       30  184.689        0.988  ns/op

And we may see, the time per operations is really about 10 times more. And observed cost is still reasonable: 46.756±1.405 ns.

And last case is about relatively heavyweight calculation of CRC32 of string from previous experiment:

 Benchmark                          Mode  Samples    Score  Score error  Units
 o.k.e.CatBenchmark.t1_direct       avgt       30  207.325        1.435  ns/op
 o.k.e.CatBenchmark.t2_javassist    avgt       30  206.799        1.581  ns/op
 o.k.e.CatBenchmark.t3_cglib        avgt       30  208.403        1.069  ns/op
 o.k.e.CatBenchmark.t4_dynamic      avgt       30  217.890        1.144  ns/op
 o.k.e.CatBenchmark.t5_hibernate    avgt       30  233.030        2.577  ns/op
 o.k.e.CatBenchmark.t6_springOpt    avgt       30  231.929        1.956  ns/op
 o.k.e.CatBenchmark.t7_spring       avgt       30  259.991        2.291  ns/op

With cost: 52.666±3.726 ns.

So, the worst case, Spring's proxy based on dynamic proxy costs ~50 ns. No so bad, right? Last time it was in twice as bigger, there is definite progress between Spring v. 2.5.6 and v. 4.0.6. But that is just a proxying a call... We didn't consider losses on bunch of JIT optimisations that just impossible to do on semi-reflective call wrapping implied with CGLib usage. And this is only 1 call. Modern applications uses tons of Spring config files, where interconnections may build the whole tree of proxy instances.

P.S. Was pointed to the nice explanation of "black hole". Definitely more "realistic" tool to measure such kind of experiment.

No comments: