the corner office

a blog, by Colin Pretorius

# Java view traversal performance - some sloppy tests

Note: this article is rather old now - these tests were performed using Notes R5. I'm not sure whether my findings are applicable to more recent versions of Notes. Please feel free to leave a comment if your experiences differ from mine.

The experiment

A single database, containing documents of two types, distinguished by the value of their form field. The first type of document, call it "FormA", had 50 fields of 10 characters each. Document size: 545 bytes. FormB had 50 fields of 50 characters each. Document size: 2,545 bytes. I created 5,000 documents for each 'form' type, for a total of 10,000 documents in the database.

The Java app

Based on some work I'm doing, I have an iterator class which provides an interface to a ViewNavigator. The iterator uses a view which contains two columns:

  1. the universal ID of the document, i.e. @text( @documentuniqueid )
  2. a "last modified" string, which is a formatted string based on the @modified function.

Note that the view doesn't display any of the fields in the document - purely metadata.

I have two of these views, identical in every respect except that one shows FormA documents, and the other shows FormB documents.
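For illustration, here's roughly what an iterator like that could look like. This is a minimal sketch and not the downloadable code: `Entry` and `Nav` are hypothetical stand-ins for the Notes classes (so it compiles without the Notes jar), and the in-memory `Nav` exists purely for the demo. The one idea it tries to show is that each entry gets recycled as soon as the caller moves past it.

```java
// Sketch with stand-in types: Entry and Nav are hypothetical, NOT the lotus.domino classes.
class EntryIteratorSketch {

    /** Stand-in for a view entry exposing the two columns described above. */
    interface Entry {
        String unid();          // column 1: universal ID as text
        String lastModified();  // column 2: formatted "last modified" string
        void recycle();         // release whatever back-end resources the entry holds
    }

    /** Stand-in for a navigator with getFirst/getNext-style traversal. */
    interface Nav {
        Entry getFirst();             // null if the view is empty
        Entry getNext(Entry current); // null at the end of the view
    }

    /** Adapts a Nav to java.util.Iterator, recycling each entry once the caller moves on. */
    static final class EntryIterator implements java.util.Iterator<Entry> {
        private final Nav nav;
        private Entry next;       // look-ahead entry, not yet handed out
        private Entry handedOut;  // entry most recently returned by next()

        EntryIterator(Nav nav) {
            this.nav = nav;
            this.next = nav.getFirst();
        }

        public boolean hasNext() {
            return next != null;
        }

        public Entry next() {
            if (next == null) throw new java.util.NoSuchElementException();
            if (handedOut != null) handedOut.recycle(); // caller is done with the previous entry
            handedOut = next;
            next = nav.getNext(handedOut);
            return handedOut;
        }

        /** Recycles the last entry handed out; call when traversal is finished. */
        void close() {
            if (handedOut != null) {
                handedOut.recycle();
                handedOut = null;
            }
        }
    }

    /** In-memory Nav over {unid, lastModified} rows; counts recycle() calls in recycled[0]. */
    static Nav inMemory(String[][] rows, int[] recycled) {
        final Entry[] entries = new Entry[rows.length];
        for (int i = 0; i < rows.length; i++) {
            final String[] row = rows[i];
            entries[i] = new Entry() {
                public String unid() { return row[0]; }
                public String lastModified() { return row[1]; }
                public void recycle() { recycled[0]++; }
            };
        }
        return new Nav() {
            public Entry getFirst() { return entries.length == 0 ? null : entries[0]; }
            public Entry getNext(Entry current) {
                for (int i = 0; i < entries.length - 1; i++) {
                    if (entries[i] == current) return entries[i + 1];
                }
                return null;
            }
        };
    }

    public static void main(String[] args) {
        int[] recycled = {0};
        String[][] rows = { {"UNID-1", "2004.03.01"}, {"UNID-2", "2004.03.02"} };
        EntryIterator it = new EntryIterator(inMemory(rows, recycled));
        while (it.hasNext()) {
            Entry e = it.next();
            System.out.println(e.unid() + " " + e.lastModified());
        }
        it.close();
        System.out.println("recycled: " + recycled[0]);
    }
}
```

Note that the iterator looks one entry ahead, so two entries are alive at any moment; whether that matters against the real API is something I haven't verified.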

You can download a less-than-polished version of the code here (6k), and a copy of the test database here (22k). The app is meant to access the database locally: DIIOP isn't an option if it's raw speed you're after. Have a ball, but please don't grouse if your machine self-destructs or something. If you have any questions about the code, feel free to ask.

The Objective

To get to the bottom of a few Notes view traversal performance and memory management questions.

The Disclaimer

This is hardly scientific. Numbers are undoubtedly slow because I'm using a really slow hard drive. Feel free to download my test DB and code (coming soon) and see what numbers you get on your end.

Now, let's play.

Test 1: Does size matter?

A standalone Java app, accessing the database locally, runs through all ViewEntries in view A before recycling the Database and starting again on view B. The first run through each view is ignored, to mitigate any cache issues. Thereafter the app alternates between the two views, running 20 times through each. For this test, the ViewEntry.isConflict() method is called.
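The procedure above can be sketched as a small timing harness. This is a hedged illustration, not the code from the download: `Traversal`, `Stats`, `summarize` and `benchmark` are names invented here, a `Traversal` stands in for one full pass over a view, and I'm assuming the average is the mean of the per-run rates (the original may have computed it differently). Recycling the Database between views isn't modelled.

```java
// Hypothetical harness: one warm-up pass per view, then alternating timed passes.
class TraversalBenchmark {

    /** Stand-in for one full pass over a view; returns the number of entries visited. */
    interface Traversal {
        int run();
    }

    /** Average and best rate, in entries per second, over the measured runs. */
    static final class Stats {
        final double avgRate;
        final double bestRate;

        Stats(double avgRate, double bestRate) {
            this.avgRate = avgRate;
            this.bestRate = bestRate;
        }
    }

    /** Turns per-run entry counts and elapsed nanoseconds into average and best rates. */
    static Stats summarize(int[] entries, long[] elapsedNanos) {
        double sum = 0;
        double best = 0;
        for (int i = 0; i < entries.length; i++) {
            // Guard against a zero reading from a very fast fake traversal.
            double seconds = Math.max(1L, elapsedNanos[i]) / 1e9;
            double rate = entries[i] / seconds;
            sum += rate;
            best = Math.max(best, rate);
        }
        return new Stats(sum / entries.length, best);
    }

    /** Returns { stats for view A, stats for view B }. */
    static Stats[] benchmark(Traversal viewA, Traversal viewB, int runs) {
        // First pass through each view is discarded to reduce cache effects.
        viewA.run();
        viewB.run();

        int[] countA = new int[runs];
        int[] countB = new int[runs];
        long[] nanosA = new long[runs];
        long[] nanosB = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            countA[i] = viewA.run();
            nanosA[i] = System.nanoTime() - start;

            start = System.nanoTime();
            countB[i] = viewB.run();
            nanosB[i] = System.nanoTime() - start;
        }
        return new Stats[] { summarize(countA, nanosA), summarize(countB, nanosB) };
    }

    public static void main(String[] args) {
        // Fake traversal so the sketch runs without Notes; swap in a real view pass.
        Traversal fake = () -> {
            int visited = 0;
            for (int i = 0; i < 5000; i++) visited++;
            return visited;
        };
        Stats[] stats = benchmark(fake, fake, 20);
        System.out.printf("A: avg=%.0f best=%.0f%n", stats[0].avgRate, stats[0].bestRate);
        System.out.printf("B: avg=%.0f best=%.0f%n", stats[1].avgRate, stats[1].bestRate);
    }
}
```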

| Form  | Doc Size (bytes) | Avg rate/sec | Best rate/sec |
|-------|------------------|--------------|---------------|
| FormA | 545              | 1,374        | 1,398         |
| FormB | 2,545            | 1,093        | 1,163         |

These results were quite consistent across a number of tests. A show database command showed that each freshly-created view index was exactly the same size (977,880 bytes), so any differences ought to be attributed to the underlying documents, and not the view index itself.

Clearly, size did matter. I decided there are two potential culprits:

  1. the isConflict() check
  2. entry.getDocument().recycle()

Test 2: Does isConflict() matter?

I had pondered whether the ViewEntry.isConflict() method hits the underlying document or relies purely on the view index. To test this, I re-ran the tests, alternating between the two views AND alternating between calling the isConflict() method and leaving it out.

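| Form  | Doc Size (bytes) | Check - Avg rate/sec | No Check - Avg rate/sec | Check - Best rate/sec | No Check - Best rate/sec |
|-------|------------------|----------------------|-------------------------|-----------------------|--------------------------|
| FormA | 545              | 1,236                | 1,237                   | 1,390                 | 1,394                    |
| FormB | 2,545            | 926                  | 955                     | 1,062                 | 1,090                    |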

I infer that the conflict check slows things down, but only marginally. (On some runs, the difference was slightly more marked.) Does this mean that isConflict() hits the view index only? We can't be sure, because of the pesky entry.getDocument().recycle() requirement which appears to be loading the document into memory anyway.
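To put rough numbers on "marginally", here's a tiny calculation using only the average rates reported in the Test 1 and Test 2 tables. The helper name `slowdownPercent` is made up for this sketch.

```java
// Relative slowdown computed from the averages reported in the Test 1 and Test 2 tables.
class SlowdownCalc {

    /** Percentage by which `slower` falls short of `faster`. */
    static double slowdownPercent(double faster, double slower) {
        return (faster - slower) / faster * 100.0;
    }

    public static void main(String[] args) {
        // Test 1: FormA vs FormB average rates.
        System.out.printf("size effect: %.1f%%%n", slowdownPercent(1374, 1093));
        // Test 2: "no check" vs "check" average rates, per form.
        System.out.printf("isConflict() on FormA: %.1f%%%n", slowdownPercent(1237, 1236));
        System.out.printf("isConflict() on FormB: %.1f%%%n", slowdownPercent(955, 926));
    }
}
```

That works out to about 20% for the document size effect, against roughly 0.1% (FormA) and 3% (FormB) for the conflict check, which is why the check looks like the lesser culprit.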

Test 3: So what about that entry.getDocument().recycle() thing?

See this post for details: one has to recycle a ViewEntry's underlying Document to avoid memory leaks. Jonvon had speculated that if you don't load the Document itself, then perhaps you don't need to recycle it. So, I disabled the conflict check (just in case it caused documents to load), commented out the recycle() call, and re-ran the tests. Hah! I got:

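| Form  | Doc Size (bytes) | Check - Avg rate/sec | No Check - Avg rate/sec | Check - Best rate/sec | No Check - Best rate/sec |
|-------|------------------|----------------------|-------------------------|-----------------------|--------------------------|
| FormA | 545              | 556                  | 556                     | 571                   | 570                      |
| FormB | 2,545            | 482                  | 486                     | 503                   | 507                      |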

Not recycling the underlying document halves the speed! I was unable to crash the app, but memory usage climbed steadily. I tried again with 20,000 documents in each view, and got:

Start testing Size. (Check conflicts = false)
pBUnidMdf :	10000	avg=233 | 233
MemAlloc: OUT OF PRIVATE HANDLES! -- pid 00000164 Handles used so far 16409, Maximum handles = 10495

(I've posted the exact error for posterity and Google). So Jon, that answers your question - you have to recycle the Document anyway.

The size-based discrepancy, and the above crash both tell me that when you get a handle on a ViewEntry, the underlying Document is also loaded, even if you never touch it.
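The behaviour I'm inferring can be modelled with a toy handle pool. To be clear, this is a simulation and not the Notes API: `HandlePool`, `FakeDocument` and `FakeEntry` are invented for the sketch, the handle limit is an arbitrary number, and the assumption baked in (each entry loads its document and takes a handle whether or not you touch it) is exactly the thing the tests above suggest but don't prove.

```java
// Toy model only: these classes imitate the handle behaviour described above.
// They are NOT the lotus.domino classes, and the handle limit is arbitrary.
class HandleLeakDemo {

    /** Pretend native handle table with a fixed capacity. */
    static final class HandlePool {
        private final int max;
        private int inUse;
        private int peak;

        HandlePool(int max) {
            this.max = max;
        }

        void acquire() {
            if (inUse >= max) {
                throw new IllegalStateException("out of handles after " + inUse);
            }
            inUse++;
            peak = Math.max(peak, inUse);
        }

        void release() {
            inUse--;
        }

        int peak() {
            return peak;
        }
    }

    /** Stand-in for a back-end document: takes a handle when loaded, frees it on recycle(). */
    static final class FakeDocument {
        private final HandlePool pool;
        private boolean recycled;

        FakeDocument(HandlePool pool) {
            this.pool = pool;
            pool.acquire();
        }

        void recycle() {
            if (!recycled) {
                recycled = true;
                pool.release();
            }
        }
    }

    /** Stand-in for a view entry whose document is loaded as soon as the entry is. */
    static final class FakeEntry {
        private final FakeDocument doc;

        FakeEntry(HandlePool pool) {
            this.doc = new FakeDocument(pool);
        }

        FakeDocument getDocument() {
            return doc;
        }
    }

    /** Walks `entries` fake entries and returns the peak number of handles held. */
    static int traverse(int entries, int maxHandles, boolean recycle) {
        HandlePool pool = new HandlePool(maxHandles);
        for (int i = 0; i < entries; i++) {
            FakeEntry entry = new FakeEntry(pool);
            if (recycle) {
                entry.getDocument().recycle();
            }
        }
        return pool.peak();
    }

    public static void main(String[] args) {
        System.out.println("peak with recycle: " + traverse(20000, 10000, true));
        try {
            traverse(20000, 10000, false);
        } catch (IllegalStateException e) {
            System.out.println("without recycle: " + e.getMessage());
        }
    }
}
```

With recycling, the model never holds more than one handle at a time; without it, the pool runs dry partway through the view, which mirrors the crash above in spirit if not in detail.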

My conclusion:

You can't avoid loading a back-end Document whenever you access a ViewEntry using the Java Notes API. That's a bit of a bummer, because it seems like something of an inefficiency, especially for large documents - it also casts some doubt on whether the ColumnValues approach to reading document values is as efficient in Java as it is purported to be in LotusScript (although I've admittedly not tested with LotusScript).

There are a few more things I'd like to play around with, while I'm at it:

  1. just how fast is using ColumnValues, as opposed to reading fields directly? I can't test that with the current views I'm using.
  2. what effect does the number of fields in a document have on load speeds?
  3. what effect does view size have on view traversal speed?
  4. what sorts of speed improvements do we see from ViewEntry traversal vs Document traversal in Java?

Finally, a gripe: we all love Notes, but again, I can't help but be frustrated by banging my head against the opacity of a proprietary, closed-source product. Sure, this doesn't really matter for the average app - but if you're worried about performance, they're valid questions: questions the right programmers at Lotus/IBM could answer in a flash. Ultimately it results in a frustrating inefficiency: I could have learned a lot more in the time it took me to do this, if I'd spent it reading through (or trying to decipher) source code rather than creating and running crude benchmarks.

No, I don't expect IBM to open up their software. However, I have to think that people working with open-source software do enjoy certain advantages we don't. Food for thought...

Last modified: {2006.04.21 22.05}

Comments:

1. jonvon (2004.03.11 - 23:42) #

this is excellent work colin. really well done, thanks a lot.

:-)

2. Jerry Carter (2004.03.15 - 14:27) #

Colin,

I'm with Jonvon, nice piece of work. Some more questions to throw on the heap:

If you're reusing the same Document object, I wonder how it would fare if you only called recycle once every 5000 or 10000 iterations (since that's in the neighborhood of the private handle limit). Does this free up all private handles or just the one being used?

You can take a look at the Notes Java wrappers with DJ Decompiler. I don't know or really care if that violates the EULA. I don't have any intent of fostering illegal competition against IBM through that act, so do what you will. I found it interesting but not particularly enlightening when I was working on a different bug in the Notes Java classes.

And on that note, so long as we're willing to be lazy (ie. rely on IBM to write most of the foundation code for us) we're kind of stuck with the closed source. Open source is a great idea, and exists because people came to this very same fork in the road and said, "Screw it.", and went down the path of writing their own server (aka PUAKMA). :-)

Thanks! Great article!

Jerry

3. Colin (2004.03.16 - 00:53) #

Jerry, I believe you'd still need to recycle each document. Even if you were re-using the same in-memory Java object, I'm pretty sure the C/C++ API objects would be instantiated separately for each distinct on-disk document. That's why it's essential that you recycle that back-end object before you lose the handle to it, because if you (say) point the Java object to another document before recycling it, you've just leaked that memory (at least until you recycle the parent object, e.g. the database).

Decompiling is explicitly forbidden by the EULA and if you do, you forfeit your right to use the software, which is a bit of a pain. I doubt anyone at IBM is going to get antsy about it but you never know when admitting to it might prove to be career-limiting down the line. Taking a peek is very tempting though...

You make another good point: there's nothing really stopping anyone from writing their own Java API, if they're willing to muck about with the C/C++ API directly. A lot of work, though, but it might be quite illuminating :)

4. Marius Waldal (2004.12.08 - 10:00) #

Great article, Colin! Very informative.

Have you had a chance to test the speed of accessing ColumnValues versus document fields?

/Marius

5. Colin (2004.12.22 - 00:30) #

Thanks, Marius. I actually did set up a test, but as I recall my preliminary look-see gave me erratic results and I wanted to re-run the tests on a clean machine with fewer processes running, etc. I think I've long since lost the test code, but I'll see what I can dig up, I'd completely forgotten about this :)

© Colin Pretorius