# Java view traversal performance - some sloppy tests
Note: this article is rather old now - these tests were performed using Notes R5. I'm not sure whether my findings are applicable to more recent versions of Notes. Please feel free to leave a comment if your experiences differ from mine.
## The experiment
A single database containing documents based on one of two forms. The first type of document, call it "FormA", had 50 fields of 10 characters each (document size: 545 bytes). FormB had 50 fields of 50 characters each (document size: 2,545 bytes). I created 5,000 documents of each type, for a total of 10,000 documents in the database.
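The reported sizes line up with the field data. FormA carries 50 × 10 = 500 characters of field text and FormB carries 50 × 50 = 2,500. That leaves the same 45 bytes in both cases, presumably a fixed per-document overhead (a guess: R5's storage layout isn't examined here). The sketch below generates the field data and checks that arithmetic. The field names (`Field01` … `Field50`) are hypothetical, and the article doesn't say how the test documents were actually created.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class DocSizeCheck {
    // Builds the field values for one test document: fieldCount fields of
    // charsPerField characters each. Field names are hypothetical.
    static Map<String, String> makeFields(int fieldCount, int charsPerField) {
        Map<String, String> fields = new LinkedHashMap<>();
        String value = "x".repeat(charsPerField);
        for (int i = 1; i <= fieldCount; i++) {
            fields.put(String.format("Field%02d", i), value);
        }
        return fields;
    }

    // Total length of the field values (plain ASCII, so one character is one byte).
    static int payloadBytes(Map<String, String> fields) {
        int total = 0;
        for (String v : fields.values()) {
            total += v.length();
        }
        return total;
    }

    public static void main(String[] args) {
        int payloadA = payloadBytes(makeFields(50, 10));
        int payloadB = payloadBytes(makeFields(50, 50));
        // Reported document size minus the field payload
        System.out.println("FormA: payload " + payloadA + ", remainder " + (545 - payloadA));   // payload 500, remainder 45
        System.out.println("FormB: payload " + payloadB + ", remainder " + (2545 - payloadB));  // payload 2500, remainder 45
    }
}
```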
## The Java app
Based on some work I'm doing, I have an iterator class which provides an interface to a ViewNavigator. The iterator uses a view which contains two columns:
- the universal ID of the document, i.e. @text( @documentuniqueid )
- a "last modified" string, which is a formatted string based on the @modified function.
Note that the view doesn't display any of the fields in the document - purely metadata.
I have two of these views, identical in every respect except that one shows FormA documents, and the other shows FormB documents.
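Below is a minimal sketch of such an iterator. It is not the author's downloadable code. It assumes hypothetical `Navigator`/`Entry`/`Doc` interfaces standing in for the Domino classes, so it compiles without Notes.jar. The real API uses `ViewNavigator.getFirst()`, `ViewNavigator.getNext(ViewEntry)`, `ViewEntry.getColumnValues()`, `ViewEntry.getDocument()` and `recycle()`. The adapter exposes the navigator as a `java.util.Iterator` of (UNID, last modified) pairs read from the two column values. It fetches the next entry *before* recycling the current one and its document.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class ViewIteratorSketch {
    // Hypothetical stand-ins for lotus.domino.Document, ViewEntry and ViewNavigator.
    interface Doc { void recycle(); }

    interface Entry {
        List<Object> columnValues();
        Doc document();
        void recycle();
    }

    interface Navigator {
        Entry first();
        Entry next(Entry current);
    }

    // One row of the two-column view: the UNID and the "last modified" string.
    static final class Row {
        final String unid;
        final String lastModified;
        Row(String unid, String lastModified) {
            this.unid = unid;
            this.lastModified = lastModified;
        }
    }

    // Iterator adapter: read both columns, advance, then release the old entry.
    static final class RowIterator implements Iterator<Row> {
        private final Navigator nav;
        private Entry current;

        RowIterator(Navigator nav) {
            this.nav = nav;
            this.current = nav.first();
        }

        @Override public boolean hasNext() { return current != null; }

        @Override public Row next() {
            if (current == null) throw new NoSuchElementException();
            List<Object> cols = current.columnValues();
            Row row = new Row(String.valueOf(cols.get(0)), String.valueOf(cols.get(1)));
            Entry following = nav.next(current); // advance before releasing the current entry
            current.document().recycle();        // release the underlying document (see Test 3)
            current.recycle();
            current = following;
            return row;
        }
    }

    // In-memory navigator for trying the adapter without Notes; counts recycle() calls.
    static final class ListNavigator implements Navigator {
        private final List<List<Object>> rows;
        int recycleCalls = 0;

        ListNavigator(List<List<Object>> rows) { this.rows = rows; }

        private final class ListEntry implements Entry {
            final int index;
            ListEntry(int index) { this.index = index; }
            public List<Object> columnValues() { return rows.get(index); }
            public Doc document() { return () -> recycleCalls++; }
            public void recycle() { recycleCalls++; }
        }

        private Entry at(int i) { return i < rows.size() ? new ListEntry(i) : null; }
        public Entry first() { return at(0); }
        public Entry next(Entry current) { return at(((ListEntry) current).index + 1); }
    }

    public static void main(String[] args) {
        ListNavigator nav = new ListNavigator(Arrays.asList(
                Arrays.<Object>asList("UNID-1", "modified-1"),
                Arrays.<Object>asList("UNID-2", "modified-2")));
        RowIterator it = new RowIterator(nav);
        while (it.hasNext()) {
            Row row = it.next();
            System.out.println(row.unid + " | " + row.lastModified);
        }
        System.out.println("recycle() calls: " + nav.recycleCalls); // 4: two documents + two entries
    }
}
```

Wrapping the navigator this way keeps the recycle bookkeeping in one place, so calling code can't forget it.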
You can download a less-than-polished version of the code here (6k), and a copy of the test database here (22k). The app is meant to access the database locally: DIIOP isn't an option if it's raw speed you're after. Have a ball, but please don't grouse if your machine self-destructs or something. If you have any questions about the code, feel free to ask.
## The Objective
To get to the bottom of a few Notes view traversal performance and memory management questions.
## The Disclaimer
This is hardly scientific. The absolute rates are undoubtedly low because I'm using a really slow hard drive. Feel free to download my test DB and code (coming soon) and see what numbers you get on your end.
Now, let's play.
## Test 1: Does size matter?
A standalone Java app, accessing the database locally, runs through all ViewEntries in view A, then recycles the Database and starts again on view B. The first run through each view is ignored, to mitigate any cache issues. Thereafter the app alternates between the two views, running 20 times through each. For this test, ViewEntry.isConflict() is called on every entry.
| Form | Doc size (bytes) | Avg rate (entries/sec) | Best rate (entries/sec) |
|---|---|---|---|
| FormA | 545 | 1,374 | 1,398 |
| FormB | 2,545 | 1,093 | 1,163 |
These results were quite consistent across a number of tests. A show database command showed that each freshly created view index was exactly the same size (977,880 bytes), so any differences ought to be attributed to the underlying documents, not to the view index itself.
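The measurement procedure described for Test 1 might look like the sketch below. It discards one warm-up pass per view, then runs 20 alternating passes through each view. This is not the author's app. Each view pass is modelled as a hypothetical `IntSupplier` that traverses the view and returns the number of entries visited. "Average rate" is taken as the mean of the per-pass rates, which is an assumption, since the article doesn't say how it averaged.

```java
import java.util.function.IntSupplier;

class AlternatingBenchmark {
    // Running average and best rate (entries/sec) for one view.
    static final class Stats {
        double sumRate;
        double bestRate;
        int runs;

        void add(double rate) {
            sumRate += rate;
            bestRate = Math.max(bestRate, rate);
            runs++;
        }

        double avgRate() { return runs == 0 ? 0.0 : sumRate / runs; }
    }

    // Times one full pass; the supplier walks a view and returns the number of entries visited.
    static double timePass(IntSupplier pass) {
        long start = System.nanoTime();
        int entries = pass.getAsInt();
        long elapsedNanos = Math.max(1L, System.nanoTime() - start); // avoid dividing by zero
        return entries / (elapsedNanos / 1e9);
    }

    // Discards one warm-up pass per view, then alternates A, B, A, B, ... runsPerView times.
    static Stats[] run(IntSupplier viewA, IntSupplier viewB, int runsPerView) {
        timePass(viewA);
        timePass(viewB);
        Stats a = new Stats();
        Stats b = new Stats();
        for (int i = 0; i < runsPerView; i++) {
            a.add(timePass(viewA));
            b.add(timePass(viewB));
        }
        return new Stats[] {a, b};
    }

    public static void main(String[] args) {
        // Stand-in passes; a real pass would walk every entry in the view and
        // recycle the Database afterwards, as described for Test 1.
        IntSupplier fakeViewA = () -> 5000;
        IntSupplier fakeViewB = () -> 5000;
        Stats[] stats = run(fakeViewA, fakeViewB, 20);
        System.out.printf("A: avg=%.0f best=%.0f%n", stats[0].avgRate(), stats[0].bestRate);
        System.out.printf("B: avg=%.0f best=%.0f%n", stats[1].avgRate(), stats[1].bestRate);
    }
}
```

The point of interleaving A and B passes, rather than running all A passes and then all B passes, is presumably to spread any drift (disk cache, other processes) evenly across both views.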
Clearly, size did matter. I suspected two potential culprits:
- the isConflict() check
- entry.getDocument().recycle()
## Test 2: Does isConflict() matter?
I had pondered whether the ViewEntry.isConflict() method hits the underlying document or relies purely on the view index. To test this, I re-ran the tests, alternating between the two views AND alternating between calling the isConflict() method and leaving it out.
| Form | Doc size (bytes) | Avg, with check (entries/sec) | Avg, no check (entries/sec) | Best, with check (entries/sec) | Best, no check (entries/sec) |
|---|---|---|---|---|---|
| FormA | 545 | 1,236 | 1,237 | 1,390 | 1,394 |
| FormB | 2,545 | 926 | 955 | 1,062 | 1,090 |
I infer that the conflict check slows things down, but only marginally (on some runs, the difference was slightly more marked). Does this mean that isConflict() hits the view index only? We can't be sure, because of the pesky entry.getDocument().recycle() requirement, which appears to load the document into memory anyway.
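To put "marginally" into numbers, the sketch below computes the relative slowdown of the check runs from the Test 2 table. On the averages it comes to about 0.1% for FormA and about 3% for FormB. `slowdownPercent` is just an illustrative helper name.

```java
class ConflictCheckCost {
    // Percentage by which the isConflict() runs were slower than the runs without it.
    static double slowdownPercent(double rateWithCheck, double rateWithoutCheck) {
        return 100.0 * (rateWithoutCheck - rateWithCheck) / rateWithoutCheck;
    }

    public static void main(String[] args) {
        // Average rates from the Test 2 table
        System.out.printf("FormA avg: %.2f%%%n", slowdownPercent(1236, 1237));  // 0.08%
        System.out.printf("FormB avg: %.2f%%%n", slowdownPercent(926, 955));    // 3.04%
        // Best rates from the Test 2 table
        System.out.printf("FormA best: %.2f%%%n", slowdownPercent(1390, 1394)); // 0.29%
        System.out.printf("FormB best: %.2f%%%n", slowdownPercent(1062, 1090)); // 2.57%
    }
}
```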
## Test 3: So what about that entry.getDocument().recycle() thing?
See this post for details: one has to recycle a ViewEntry's underlying Document to avoid memory leaks. Jonvon had speculated that if you don't load the Document itself, then perhaps you don't need to recycle it. So, I disabled the conflict check (just in case it caused documents to load), commented out the recycle() call, and re-ran the tests. Hah! I got:
| Form | Doc size (bytes) | Avg, with check (entries/sec) | Avg, no check (entries/sec) | Best, with check (entries/sec) | Best, no check (entries/sec) |
|---|---|---|---|---|---|
| FormA | 545 | 556 | 556 | 571 | 570 |
| FormB | 2,545 | 482 | 486 | 503 | 507 |
Not recycling the underlying document halves the speed! I was unable to crash the app, but memory usage climbed steadily. I tried again with 20,000 documents in each view, and got:
> `Start testing Size. (Check conflicts = false)`
>
> `pBUnidMdf : 10000 avg=233 | 233`
>
> `MemAlloc: OUT OF PRIVATE HANDLES! -- pid 00000164 Handles used so far 16409, Maximum handles = 10495`
(I've posted the exact error for posterity and Google). So Jon, that answers your question - you have to recycle the Document anyway.
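A toy model can illustrate why the leak ends in a hard failure rather than just growing memory. Suppose every entry visited pins one handle for itself and one for its document, and the process has a fixed handle ceiling. A traversal that never recycles then fails after visiting roughly half as many entries as there are handles, while one that recycles stays at a constant peak. This is a toy model only. The table size, the two-handles-per-entry cost and the class names are all assumptions, and this is not how Notes actually tracks handles.

```java
class HandleModel {
    // Toy model: a process-wide table with a fixed number of handles.
    // It does NOT reproduce how Notes actually allocates handles.
    static final class HandleTable {
        private final int max;
        private int inUse = 0;
        private int peak = 0;

        HandleTable(int max) { this.max = max; }

        void allocate() {
            if (inUse == max) {
                throw new IllegalStateException("OUT OF HANDLES: in use " + inUse + ", max " + max);
            }
            inUse++;
            peak = Math.max(peak, inUse);
        }

        void release() { inUse--; }

        int peak() { return peak; }
    }

    // Visits n view entries. Each visit is assumed to take one handle for the entry
    // and one for its document (the article found the document is loaded anyway).
    static void traverse(HandleTable table, int n, boolean recycle) {
        for (int i = 0; i < n; i++) {
            table.allocate(); // the ViewEntry
            table.allocate(); // its underlying Document
            if (recycle) {
                table.release(); // entry.getDocument().recycle()
                table.release(); // entry.recycle()
            }
        }
    }

    public static void main(String[] args) {
        HandleTable recycled = new HandleTable(1_000);
        traverse(recycled, 20_000, true);
        System.out.println("with recycle, peak handles: " + recycled.peak()); // 2
        try {
            traverse(new HandleTable(1_000), 20_000, false);
        } catch (IllegalStateException e) {
            System.out.println("without recycle: " + e.getMessage()); // OUT OF HANDLES: in use 1000, max 1000
        }
    }
}
```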
The size-based discrepancy and the above crash both tell me that when you get a handle on a ViewEntry, the underlying Document is also loaded, even if you never touch it.
My conclusion:
You can't avoid loading a back-end Document whenever you access a ViewEntry using the Java Notes API. That's a bit of a bummer, because it seems like an inefficiency, especially for large documents. It also casts some doubt on whether the ColumnValues approach to reading document values is as efficient in Java as it is purported to be in LotusScript (although admittedly I haven't tested with LotusScript).
There are a few more things I'd like to play around with, while I'm at it:
- just how fast is using ColumnValues, as opposed to reading fields directly? I can't test that with the current views I'm using.
- what effect does the number of fields in a document have on load speeds?
- what effect does view size have on view traversal speed?
- what sorts of speed improvements do we see from ViewEntry traversal vs Document traversal in Java?
Finally, a gripe: we all love Notes, but again, I can't help but be frustrated by banging my head against the opacity of a proprietary, closed-source product. Sure, this doesn't really matter for the average app - but if you're worried about performance, these are valid questions, and ones the right programmers at Lotus/IBM could answer in a flash. Ultimately it results in a frustrating inefficiency: I could have learned a lot more in the time it took me to do this if I'd spent it reading through (or trying to decipher) source code rather than creating and running crude benchmarks.
No, I don't expect IBM to open up their software. However, I have to think that people working with open-source software do enjoy certain advantages we don't. Food for thought...
Last modified: {2006.04.21 22.05}