Discussion:
Build a $100,000 LabVIEW supercomputer?
Root Canal
2008-08-07 23:10:06 UTC
Permalink
I've been testing experimental computer vision algorithms using LabVIEW. We plan to eventually convert the code into assembly language or VHDL or embed it in a processor or whatever to make it really fast.

However, my insane boss is willing to spend a lot of money to speed up development. He wants to buy or build a multiprocessor system that will speed up the execution of parallel LabVIEW code. A lot of our code can be made super parallel, so whoopie hooray.

Thing is, we already run on a dual-core Xeon 5140, so it's not like we can just buy a quad-core chip and get a couple orders of magnitude performance increase.

I see hints that LabVIEW can be made to take advantage of a large number of parallel processors, so my question is: how do I run LabVIEW on forty processors in tandem? What hardware should I buy? What OS do I use? And will it be faster than running on an 8-core processor, or will it be just a glorified space heater?

Let's assume I have a $100,000 budget.

please help me spend my crazy boss's money.

Thanks.
altenbach
2008-08-08 00:40:16 UTC
Permalink
Sorry, I don't have any specific suggestions, but have a look at the following <a href="http://www.ni.com/niweek/2008/keynote/mike_santori_mike_cerna.htm" target="_blank">NI-Week 2008 Keynote clip</a>. It discusses some impressive computing hardware. :)
(OTOH, my old LabVIEW 4.0 code still controls an instrument running on a 120 MHz Pentium I) :o
muks
2008-08-08 11:40:06 UTC
Permalink
Hi Root Canal, can you share what application you need to go this far for? What is the processing speed requirement?
Ben
2008-08-08 13:40:07 UTC
Permalink
I agree that you should check with NI, now that NI Week is behind us, to see what they want to sell us today.
Another idea:
Distributed processing: Curtiss-Wright offers a product called SCRAMNet that is shared memory linked using fiber. The throughput absolutely blows away anything else I have used, but I digress. If you split the work of your app across multiple high-end processors and pass the analysis work to separate machines, the processing capabilities are limited only by your budget. In your case you may want to skip the fiber switch they sell (unless you need the redundancy), since the switch alone will set you back about $20K. :smileysurprised:
Ben
Root Canal
2008-08-08 15:10:11 UTC
Permalink
The application is a flexible computer vision testbed. We have a generic problem of figuring out how to get a computer to recognize and properly identify objects in a video. We are presently looking for a small set of objects in a small set of scenes, but we will want to adapt to other objects and other scenes in the future so we want to be able to test different algorithms using a variety of software configurations. We can do that right now, but more complex algorithms take much longer to process than simple algorithms.
A lot of the image processing functions that we use can be optimized to run quickly in a parallel computing architecture, either because the image can be broken into overlapping segments which are then processed independently, or because we are solving large systems of linear equations. However, we cannot simply divide the entire computer vision process up into a bunch of completely separate parallel calculations. There has to be a controller that goes along and dynamically assigns computing tasks to different processors and then pulls the results back together, thinks about it for a fraction of a second, and then distributes new tasks.
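For what it's worth, the tile-splitting idea above can be sketched in ordinary textual code. This is Python purely as a stand-in for a LabVIEW diagram; the segment sizes and the per-segment "algorithm" (a 2-pixel moving average) are made up for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

def split_overlapping(row, seg_len, overlap):
    """Break one scanline into segments that share `overlap` pixels,
    so each segment can be processed independently of its neighbors."""
    step = seg_len - overlap
    return [row[i:i + seg_len] for i in range(0, len(row) - overlap, step)]

def process_segment(seg):
    # Stand-in per-segment kernel: a 2-pixel moving average.
    return [(seg[i] + seg[i + 1]) / 2 for i in range(len(seg) - 1)]

def process_row(row, seg_len=6, overlap=2, workers=4):
    """The 'controller' role: scatter segments to a pool, gather results.
    A real controller would also stitch the overlapped borders back together."""
    segments = split_overlapping(row, seg_len, overlap)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_segment, segments))
```

The overlap exists so that any filter with a neighborhood (convolution, morphology) gets valid pixels at segment edges; the controller discards the duplicated border results when merging.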
Because this will be a testbed for computer vision research and development, and because many problems in computer vision are as yet unsolved, the processing speed requirement is simply: "as fast as we can make it while maintaining programmatic flexibility." We are not trying to achieve a certain speed benchmark with a known algorithm; we are trying to keep from falling asleep while waiting for it to compute. You can only sit next to a grinding computer playing Guitar Hero for so many hours before that too begins to feel like work.
Ravens Fan
2008-08-08 16:10:06 UTC
Permalink
You might be able to do this with multiple desktop computers. Create an application as an executable that does the algorithm processing, and deploy it on numerous computers. Use VI Server working across the network to start up or command those individual applications as necessary. Have your master computer break up the process and determine which PC is doing what. Distribute data to each of the slave PCs by way of TCP/IP and command it to do whatever task by way of VI Server (or perhaps another command channel opened on another TCP/IP port). When they complete their task, they send the processed data back by TCP/IP and wait for their next command.
It sounds like an interesting application, but it would be complicated. I think most of the complication would be in handling the processing algorithms: how to break up data and merge it back together. But that seems like it would be your area of expertise.
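A rough sketch of this master/worker scheme, with plain TCP sockets standing in for the LabVIEW TCP/IP VIs (Python for illustration only; the length-prefixed framing and the squaring "task" are invented for the example):

```python
import pickle
import socket
import socketserver
import threading

class WorkerHandler(socketserver.StreamRequestHandler):
    """Each 'slave PC' handles one connection: receive a task, process, reply."""
    def handle(self):
        length = int.from_bytes(self.rfile.read(4), "big")
        chunk = pickle.loads(self.rfile.read(length))
        result = [x * x for x in chunk]          # stand-in "algorithm"
        payload = pickle.dumps(result)
        self.wfile.write(len(payload).to_bytes(4, "big") + payload)

def start_worker():
    """Launch a worker on an ephemeral localhost port; return its address."""
    server = socketserver.TCPServer(("127.0.0.1", 0), WorkerHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server.server_address

def dispatch(addr, chunk):
    """Master side: ship one chunk of data to a worker and wait for the result."""
    with socket.create_connection(addr) as s:
        payload = pickle.dumps(chunk)
        s.sendall(len(payload).to_bytes(4, "big") + payload)
        reply = s.makefile("rb")
        length = int.from_bytes(reply.read(4), "big")
        return pickle.loads(reply.read(length))
```

The master would split its data, call `dispatch` once per worker (concurrently, in practice), and merge the returned pieces; in LabVIEW the same roles would be played by VI Server calls plus TCP Read/Write.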
Good Luck!
Root Canal
2008-08-08 17:10:06 UTC
Permalink
I think I understand what you are saying, but the thing is that I've read all this hype about LabVIEW supporting multicore and multithreaded processing. LabVIEW is an inherently parallel language, ideal for parallel programming. It can already automatically break up computation between cores on a quad-core processor (or an eight-core processor, if you're on that crazy eight-core Mac). It does so without the programmer having to specifically delegate which task goes where and worry about timing issues between processor cores!
That's really neat! From the literature I've read from NI, all you have to do is make sure you wire up your block diagram in such a way as to not force linear execution (for instance, by running everything in series through a flat sequence), and then LabVIEW dynamically handles the load balancing between cores. So this raises the question: if LabVIEW can automatically do this with one eight-core processor, then can it do this with eight single-core processors? Can it utilize sixteen single-core processors? Four quad-core processors? A hundred quad-core processors!?!
It seems like it should allow a user to build a scalable supercomputer without having to go through the pain of creating some complicated control hub program to handle task distribution and timing, because LabVIEW already does that for you.
Does anyone know if the multithreading capability in LabVIEW can handle multiple processors (not just multiple cores) and if so, how many? How do I build a computer with thirty or forty processors running in tandem that would be well suited to parallel processing in LabVIEW?
What OS would I use? What kind of architecture should the computing platform have? Should I use special LabVIEW software or applications? What kind of data transfer constraints are there? How much memory do I need, and how should it be shared by the processors? Can I use multiple multicore processors? Would it be better to use all single-core or all multicore?
I don't know what I am doing here; all I know is that NI says that LabVIEW 8.6 can run super efficiently on a multicore system, and it sure makes it sound like LabVIEW can run super efficiently on a multiprocessor system. Cool! Please give me a solid example of how to buy or build such a system!
Anybody know?
Please Help!
Ben
2008-08-08 18:10:06 UTC
Permalink
Hi Root,
I have managed to load up 8 CPUs with LV apps, but it takes a little extra effort to harness the power of all of them. In a nutshell, you want to have a lot of parallel code in your number-crunching routines. In my case I broke up arrays into groups of eight and called all of my sub-VIs as reentrant.
LV will break up the code into threads and then use the OS to get the work done.
Ben
altenbach
2008-08-08 18:40:05 UTC
Permalink
Root Canal wrote:

Does anyone know if the multithreading capability in LabVIEW can handle multiple processors (not just multiple cores) and if so, how many? How do I build a computer with thirty or forty processors running in tandem that would be well suited to parallel processing in LabVIEW?


<a href="http://digital.ni.com/public.nsf/3efedde4322fef19862567740067f3cc/84eca015aa496b23862565bc006c0f19?OpenDocument" target="_blank">Here</a> is some related discussion, but it is a bit old (LV 7.1). Have a look at vi.lib\Utility\sysinfo.llb\threadconfig.vi in your LabVIEW installation.
It seems that the just-released LabVIEW 8.6 has a few new enhancements that provide even better support for multi-CPU systems (there was apparently some hype at NI Week). So that's probably what you should use.
Ben
2008-08-08 20:40:05 UTC
Permalink
Sorry! A 3 MB image is bad bad bad!
Ben
Brian_A
2008-08-11 19:40:06 UTC
Permalink
The basic idea is what Ben said: "LV will break up the code into threads and then use the OS to get the work done."

As has been noted, LabVIEW automatically multithreads your application (making it very simple for a programmer to take advantage of multicore computing). There are of course programming methods that will maximize your return, but the nature of graphical programming tends toward incidental parallel operations, which will multithread and take advantage of multiple cores with really no effort required from the programmer.

As for the maximum number of cores/processors, you would have to look at the operating system. I believe Windows XP supports up to 32 cores (you may want to check me on that). We actually had a demo at <a href="http://www.ni.com/niweek/" target="_blank">NI Week</a> where we had a single computer with 4 quad-core processors (16 total cores) running the LabVIEW Real-Time operating system. Real-Time is probably not necessary for you (it doesn't sound like you need a deterministic OS), but it was a good demonstration of a high core-count system.

There are a lot of good resources that explain a lot of this at <a href="http://www.ni.com/multicore/" target="_blank">http://www.ni.com/multicore/</a>. I think <a href="http://digital.ni.com/express.nsf/bycode/exyjqg" target="_blank">this download</a> is also very good. I hope this helps!
Ben
2008-08-11 19:40:09 UTC
Permalink
Thanks Brian for those numbers. I should expand on one thing that can be done that is "not taught in LV Basics".
If you have a collection of things (think multiple microphones) that you want to perform the same analysis on, the "natural" thing to do is to combine all of the microphone signals into an array and use a "lean and mean" sub-VI in a For Loop to do the number crunching. But if you want to use all of your processors, that construct would force each microphone to be processed one after the other. I picked up a lot of speed by using eight arrays (each with one microphone's worth of info) and then using a "re-entrant" sub-VI. The code looks a little weird, and if you did not know what I was doing you might say "Ben! You can simplify your code .... For Loop!"
So the idea is still the same: if you want to use multiple processors in parallel, make sure your code "looks parallel".
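Ben's pattern, translated out of G for illustration: the same analysis applied serially in a loop versus in parallel through independent "reentrant" copies. Python stands in for the block diagram here, and the per-microphone energy calculation is just a placeholder:

```python
from concurrent.futures import ThreadPoolExecutor

def analyze(channel):
    """Stand-in for the 'lean and mean' per-microphone sub-VI."""
    return sum(x * x for x in channel)   # e.g. signal energy

def analyze_serial(channels):
    # The "natural" For Loop: microphones processed one after the other.
    return [analyze(ch) for ch in channels]

def analyze_parallel(channels):
    # Eight (or N) independent, reentrant copies of the sub-VI running at once;
    # results come back in the same order the channels went in.
    with ThreadPoolExecutor(max_workers=len(channels)) as pool:
        return list(pool.map(analyze, channels))
```

Both versions compute the same answer; the parallel one simply gives the scheduler N independent pieces of work, which is exactly what "make your code look parallel" buys you in LabVIEW.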
Ben
Root Canal
2008-08-11 21:10:05 UTC
Permalink
Brian_A, mind sharing a little info on the specific hardware that you were running on? Was it a shared memory system, distributed memory, or hybrid?
Thanks,
root
Brian_A
2008-08-12 16:40:19 UTC
Permalink
Well, the demo was not my own, so I do not know the specifics and can't find any at the moment (but I can let you know if I do). I would imagine you could search around the internet to find high-performance computing solutions for sale (I know it's not too hard to find multiple-socket mainboards), along with considerations regarding memory management. There is also some discussion of memory schemes in the download I mentioned in my last post.

To expand on Ben's comments, check out the "Parallel Programming Strategies for Multicore Processing in LabVIEW" section of the <a href="http://zone.ni.com/devzone/cda/tut/p/id/6422" target="_blank">Multicore Programming Fundamentals</a> tutorial in our Developer Zone. It covers the three main ways to program in order to take maximum advantage of a multicore system: Task Parallelism, Data Parallelism, and Pipelining.
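Of those three strategies, pipelining is perhaps the least obvious, so here is a minimal textual sketch (Python as a stand-in for a LabVIEW diagram; the two "stages" are arbitrary placeholders). Frames flow through queue-connected stages, so stage 1 can work on frame N+1 while stage 2 is still working on frame N:

```python
import queue
import threading

def stage(inbox, outbox, func):
    """One pipeline stage: pull an item, transform it, push it onward.
    A None sentinel shuts the stage down and is forwarded downstream."""
    while True:
        item = inbox.get()
        if item is None:
            outbox.put(None)
            return
        outbox.put(func(item))

def run_pipeline(frames):
    """Two-stage pipeline; each stage runs on its own thread (its own core)."""
    q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()
    threading.Thread(target=stage, args=(q1, q2, lambda f: f * 2)).start()  # "filter"
    threading.Thread(target=stage, args=(q2, q3, lambda f: f + 1)).start()  # "classify"
    for f in frames:
        q1.put(f)
    q1.put(None)
    out = []
    while (item := q3.get()) is not None:
        out.append(item)
    return out
```

In LabVIEW the queues would be Queue functions (or channel wires in later versions) connecting parallel loops; throughput approaches one frame per slowest-stage interval rather than one frame per whole-chain interval.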
Root Canal
2008-08-12 19:40:04 UTC
Permalink
Thanks for the help. I think I'll just take a stab at ordering an off-the-shelf high-performance computing solution that will support several quad-core processors, and we'll try to benchmark various tasks to see where the bottleneck is before buying something more ambitious. Thanks again for all the advice.
root
Ben
2008-08-12 20:10:05 UTC
Permalink
Hi root,
If you get a chance try to post back to this thread about what you found.
Thank you,
Ben
Brian_A
2008-08-13 15:40:05 UTC
Permalink
Hey Root, I don't know why I didn't think of this before, but check out the <a href="http://www.ni.com/niweek/keynote_videos.htm" target="_blank">NI Week Keynote video</a> from Wednesday, August 6, titled "Introduction and Very Large Telescope" (it's the first one listed under Wednesday). About 6 min. in they start talking about the telescope application, which shows off some awesome LabVIEW computing power. Definitely take 10 min. to check it out. There is a bit of info in the video on the specific Dell hardware used, and for this application they are definitely trying to get the most computing power possible! SPOILER ALERT: at the end of the video they are running LabVIEW on a 128-core machine!!
unclebump
2008-08-09 15:40:05 UTC
Permalink
Sounds like an application for facial recognition in a large crowd.
Root Canal
2008-08-11 14:40:07 UTC
Permalink
nothing that innocent.
cguzman
2008-08-13 16:10:07 UTC
Permalink
Root,

As has been mentioned in this post, we were able to build a 16-core system running LabVIEW RT. We actually ran a few machine vision algorithms with our NI Vision Development Module (VDM) library and were able to see how the threads were balanced evenly among the 16 different cores (in 4 different CPUs). There is no theoretical limit to the number of cores LabVIEW and VDM can take advantage of. As has also been mentioned, all LabVIEW and VDM are doing is chunking up the processing into as many threads as you have parallel code sequences, and the OS is balancing the load between the cores and sending an incoming thread to the appropriate core (actually, with LabVIEW Real-Time you can choose which processor to execute on using RT's API or Timed Structures).

The constraint you are going to come up against is a physical one. I have no idea how many cores you can load onto a single motherboard at this time. The 16-core machine was pretty impressive to me, and I didn't build it, so I can't give you advice on that.

If you believe you can have a large number of tasks executing in parallel and even 16 cores might not do it for you, you might want to try parallel computing through "distributed computing" rather than "multicore computing". This is currently much more scalable than trying to add more cores to a single computer system, but it does require you to write code to coordinate thread dispatching and results gathering. You also have to regroup the results after the processing, but I would say the major downside is network latency. Of course, with the budget you are mentioning, you can easily create a local Gigabit Ethernet subnet and minimize this impact. The way you would dispatch processing to other computers/targets in LabVIEW is through VI Server.

Just some ideas to consider.
By the way, the vision algorithms which were multi-cored internally for version 8.6 of the NI Vision Development Module are:
• Convolution
• Cross Correlation
• Concentric Rake
• Gray Morphology
• Image Absolute Difference
• Image Addition
• Image Complex Division
• Image Complex Multiplication
• Image Division
• Image Expansion
• Image Logical AND
• Image Logical OR
• Image Logical XOR
• Image Multiplication
• Image Re-sampling
• Image Subtraction
• Image Symmetry
• Morphology
• Nth Order
• Rake
• Spoke

Sincerely,
Carlos
Software Group Manager
National Instruments
Sankara
2008-08-13 17:40:06 UTC
Permalink
Root,

Apart from the hardware support for the number of cores: if you are using Pharlap for running LabVIEW Real-Time, then you are currently limited to 32 cores (it is an OS limitation). If you are running LabVIEW on any Windows OS, then the limit is the number of cores that version of Windows supports.

Sincerely,
Sankara (National Instruments)