Underperforming LabVIEW compiler when it comes to buffer allocations

Discussion:


AnthonV

2007-06-21 16:10:10 UTC

Have you ever noticed that, even though the LabVIEW dataflow paradigm already forces code to execute in a specific order, you can get a huge improvement in processing speed (depending on your cluster or array sizes) if you enforce the order with a sequence structure?  In many cases you can eliminate the buffer allocations that are (unnecessarily) made by the compiler if you don't rely on dataflow and use a sequence instead.
 
See the attached images.  'Without.png' shows the code as one would normally wire it - notice the buffer allocation indicated by the red circle.  This cluster of mine is very big, with large arrays in it, so this allocation is very costly.  'With.png' is exactly the same code, just with the order enforced using a sequence.  Notice that the compiler makes no buffer allocation here.   My cycle time on this diagram dropped from 53ms to 5ms (!), just by adding the sequence.
 
I was hoping that 8.2.1 would be better at this than 7.1 but it doesn't seem so.
 
Any ideas or am I missing something?
 

Attached images: with.PNG, without.PNG

AnthonV

2007-06-21 17:40:15 UTC

There is obviously some other code that isn't shown here that is executed when the flag is set so what you suggest won't work for me unfortunately.  I'm selecting specific elements of the array and checking the flag, if it is set I reset it and perform some operation on the rest of the element, place it back into the array and then bundle it back.
 
In any case, I've attached a simplified VI that shows the effect on performance.  Select which method with the 'method button'.  The only difference between the two methods is the use of dataflow (incorrectly resulting in a buffer allocation) or an explicit sequence.  In this case on my PC (3.8GHz Pentium running XP with 4GB RAM) the execution time is 1540ms for the dataflow method and 1ms for the sequence!  Maybe there is something to be said for the compiler not to 'err on the safe side'.
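Since a LabVIEW diagram can't be pasted as text, here is a rough Python analogy of the two methods. It is only a sketch: the dict layout and the names `make_cluster`, `update_with_copy` and `update_in_place` are made up for illustration, and Python lists are not LabVIEW buffers. The first function copies the whole array on every iteration (roughly what the unwanted buffer allocation does), the second modifies the array where it is.

```python
def make_cluster(n):
    # Hypothetical stand-in for the big cluster: a dict holding a large array of records.
    return {"controls": [{"changed": True, "state": 0} for _ in range(n)]}


def update_with_copy(cluster, indices):
    # Copies the whole array on every iteration, which is roughly
    # the cost of a buffer allocation inside the loop.
    for i in indices:
        controls = [dict(record) for record in cluster["controls"]]
        if controls[i]["changed"]:
            controls[i]["changed"] = False
        cluster = {**cluster, "controls": controls}
    return cluster


def update_in_place(cluster, indices):
    # Touches only the selected element; the array itself is never copied.
    for i in indices:
        element = cluster["controls"][i]
        if element["changed"]:
            element["changed"] = False
    return cluster
```

Both give the same result; the copying version does work proportional to the array size on every iteration, so its cost grows with the product of iterations and array size.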
 
Let me know what you think - maybe I'm missing something.  Going home now so I'll pick up your comments tomorrow morning.

TD - bbb.ctl:
http://forums.ni.com/attachments/ni/170/254539/1/TD - bbb.ctl

BufferExample.vi:
http://forums.ni.com/attachments/ni/170/254539/2/BufferExample.vi

TD - aaa.ctl:
http://forums.ni.com/attachments/ni/170/254539/3/TD - aaa.ctl

tst

2007-06-21 19:10:11 UTC

AnthonV wrote:
Maybe there is something to be said for the compiler not to 'err on the safe side'.

Not really. It's much better for such an error to slow down your application than for it to produce wrong results, and I can find all kinds (http://forums.ni.com/ni/board/message?board.id=170&message.id=162943) of examples (http://forums.ni.com/ni/board/message?board.id=170&message.id=187351). Inplaceness is complicated, and it's possible that what you're seeing here is a missed corner in the algorithm.
For a couple of threads demonstrating the complexity (and explaining some more), you can have a look here (http://forums.ni.com/ni/board/message?board.id=170&message.id=191622&view=by_date_ascending&page=1) and here (http://forums.ni.com/ni/board/message?board.id=170&message.id=231529&view=by_date_ascending&page=1).

AnthonV

2007-06-22 06:10:09 UTC

Thanks tst for the links to the other threads - this is obviously a known concern, and one has to use 'show buffer allocations' often to ensure efficient code.  The second thread, where there seemed to be a difference between 7.1 and 8.2, was especially concerning.  I am eagerly waiting for 8.5 (NI Week '07 I hear) to see what it does with this.
I have never tried VI Analyzer - it could be that this also gives tips regarding buffer allocations.

AnthonV

2007-06-22 09:10:15 UTC

Here is an example of a different case - a buffer allocation is performed on the entry shift register of a loop.  One can move this outside the loop by simply placing a case statement around the loop that is always true.  See the attached images.  Now I am not sure whether this buffer allocation happens on every cycle or only on the first entry into the loop - if it happens once only this is obviously not an issue.

Attached images: loop1.PNG, loop2.PNG

Ben

2007-06-22 13:10:08 UTC

Discussions of performance, buffer allocations and in-placeness require more time than I have at the moment to analyze fully.
 
So lacking time let me share an idea that has worked well for me.
 
I have borrowed the idea of using a "key" to do quick searches.
 
If I am designing my data structures for an application and I see that one of the elements of a cluster will be used to control whether or not I operate on the other elements of the cluster, I'll keep the "key" value in a separate array (usually in a shift register).
 
This lets me do searches on the array of "key" values and then use the results of that search to operate only on the selected "records" (I am showing my age by using that term!).
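 
In a text language the idea looks roughly like the sketch below. It is Python rather than LabVIEW, and the names `select_records` and `process_flagged` are made up for illustration. Resetting the key after processing is an assumption borrowed from the flag handling described earlier in the thread.

```python
def select_records(keys, wanted=True):
    # Search the small key array only; the large records are not touched here.
    return [i for i, key in enumerate(keys) if key == wanted]


def process_flagged(keys, records, operate):
    # Operate only on the records whose key matched the search.
    for i in select_records(keys):
        records[i] = operate(records[i])
        keys[i] = False  # assumption: the flag is reset once the record is handled
    return keys, records
```

For example, with `keys = [False, True, False, True]` only the records at indices 1 and 3 are passed to `operate`.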
 
Ben

altenbach

2007-06-21 20:40:11 UTC

I would think this is a real bug.
 
I simplified the code a bit. Using the code in the image (the other case is wired straight through), the absence of the 1-frame case sequence causes a 1000+ fold slowdown, with no other change in the code!
 
(I also tested with all TRUE for "Changed" in the generated data)
 
Can you be a bit more specific what else you need to change in the cluster?
 
 Message Edited by altenbach on 06-21-2007 01:18 PM

BufferAllocation.png:
http://forums.ni.com/attachments/ni/170/254581/1/BufferAllocation.png

altenbach

2007-06-22 01:40:07 UTC

AnthonV wrote:
There is obviously some other code that isn't shown here that is executed when the flag is set so what you suggest won't work for me unfortunately.  I'm selecting specific elements of the array and checking the flag, if it is set I reset it and perform some operation on the rest of the element, place it back into the array and then bundle it back.

See, you do all operations within a single array element! There is absolutely no need to constantly unbundle and rebundle the array from the parent cluster. Just unbundle the "controls" array, do whatever you need to do inside the loop, and at the very end bundle it back to the parent cluster.
Here's a quick, very simple example of what I had in mind. It checks if "changed" is TRUE, and if so, resets "changed" to FALSE and also sets state=3. The possibilities are unlimited; do whatever you need to do to each array element. Once you have edited the entire array and the FOR loop has finished, bundle it back into the parent. No outer loop! No shift registers!
If you only have selected elements, autoindex on the list of indices and basically do the same with "index array" and "replace array element".
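For readers without the image, a rough text-language sketch of the same structure follows. It is Python, not LabVIEW; the dict stands in for the parent cluster and the function names `reset_changed` and `reset_selected` are made up for illustration. The field names "controls", "changed" and "state" are the ones used above.

```python
def reset_changed(cluster):
    controls = cluster["controls"]      # unbundle the array once
    for element in controls:            # one FOR loop over all elements
        if element["changed"]:
            element["changed"] = False
            element["state"] = 3
    cluster["controls"] = controls      # bundle back once, after the loop
    return cluster


def reset_selected(cluster, indices):
    controls = cluster["controls"]      # unbundle the array once
    for i in indices:                   # autoindex on the list of indices
        element = controls[i]           # "index array"
        if element["changed"]:
            element["changed"] = False
            element["state"] = 3
        controls[i] = element           # "replace array element"
    cluster["controls"] = controls      # bundle back once, after the loop
    return cluster
```

The point of the sketch is only the shape: the parent is taken apart once before the loop and put back together once after it.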
 Message Edited by altenbach on 06-21-2007 06:28 PM

BufferAlternative.png:
http://forums.ni.com/attachments/ni/170/254616/1/BufferAlternative.png

altenbach

2007-06-22 02:10:06 UTC

altenbach wrote:
If you only have selected elements, autoindex on the list of indices and basically do the same with "index array" and "replace array element".

Here's what I had in mind.
 Message Edited by altenbach on 06-21-2007 06:42 PM

BufferAlternative2.png:
http://forums.ni.com/attachments/ni/170/254619/1/BufferAlternative2.png

AnthonV

2007-06-22 05:10:13 UTC

Thanks for the detailed feedback.  Unfortunately it seems I will have to keep on bundling inside the loop, as unbundling the array and autoindexing also creates copies at every iteration, with the following results:  bundling inside the loop: 1ms; bundling outside the loop: 1200ms (see attached picture).  I agree that what you suggest is the most elegant and esthetically pleasing solution, but in an embedded application (or any other for that matter) this slowdown is unacceptable. 
Any programmer who needs time-critical code will have to regularly do a 'show buffer allocations' and elaborate their code with sequences and less-than-esthetically-pleasing sections to eliminate unnecessary buffer allocations.
I have often had LabVIEW programmers tell me that LabVIEW should not be used for performance-critical tasks but rather for ease-of-use.  I disagree: with a little bit of effort and some elaboration you can get incredible speed-ups of your code, as long as you eliminate non-obvious buffer allocations.  It would be wonderful if the compiler could do this more intelligently - in fact I expect this from the compiler, and I hope it will do better in future versions if LabVIEW wants to compete in the mainstream.
 

Attached image: buffex2_1.PNG

Darren

2008-08-12 00:40:07 UTC

I tested the slowdown described in this thread and it appears to have disappeared in LabVIEW 8.6.  -D