Discussion:
cRIO disconnects from Ethernet network and stops responding to pings
JohnZ
2008-08-01 00:10:05 UTC
We are running a cRIO application that periodically disconnects from the Ethernet network even though the VI continues to run on the cRIO.
The application has two parallel loops. One loop reads data from the FPGA and writes it to an RT FIFO. The second loop opens a TCP connection to a remote host, reads data from the RT FIFO, and transmits the data to the remote host. We have instrumented each loop to send status messages to the serial port. When the cRIO disconnects, both loops continue to send their status messages to the serial port. The first loop happily continues reading data from the FPGA and writing it into the RT FIFO. The RT FIFO overwrites the oldest data when it fills up to keep it from overflowing. The second loop reports a TCP write timeout (TCP write error 56) around the time that the cRIO disconnects, after which it closes the existing TCP connection and then repeatedly tries to open a new TCP connection (each attempt times out after 5 seconds).
All that appears normal, and usually the connection is automatically restored. Occasionally, however, the cRIO is not able to re-open the TCP connection. Furthermore, the cRIO stops responding to pings from the remote host. On two occasions, the cRIO was finally able to open a new connection after 30-40 minutes. Other than that, the only way we've found out of this dilemma is to reboot the cRIO.
The frequency of the disconnects and the time before they occur don't seem to depend on whether the cRIO is set to run the application at startup or the application is run via an interactive VI window on the remote computer.
It is possible that the disconnects have something to do with which computer the cRIO is connected to: the problems seem to happen much more frequently (often within minutes) when the cRIO is sending data to a laptop running Windows XP, but it can run for hours when sending data to a rackmount server running Windows Server 2003. Everything else (the speed and size of data being sent, the Ethernet hub and cables, the VI running on the remote computer) is the same.
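For readers who do not have the VI in front of them, the two-loop design described in this post can be sketched roughly as follows. This is only an illustration in Python, not the actual LabVIEW code: the names `OverwritingFifo` and `open_connection`, the retry delay, and the use of Python sockets are all assumptions made for the sketch. It shows a bounded FIFO that drops the oldest element when full, and a connect routine that retries after each timeout (5 seconds, as in the post).

```python
import socket
import threading
import time
from collections import deque


class OverwritingFifo:
    """Bounded FIFO that silently drops the oldest element when full (hypothetical helper)."""

    def __init__(self, capacity):
        # A deque with maxlen discards items from the opposite end on overflow.
        self._items = deque(maxlen=capacity)
        self._lock = threading.Lock()

    def write(self, item):
        with self._lock:
            self._items.append(item)

    def read_all(self):
        """Return everything currently buffered, oldest first, and empty the FIFO."""
        with self._lock:
            items = list(self._items)
            self._items.clear()
            return items


def open_connection(host, port, timeout_s=5.0, retry_delay_s=0.1, max_attempts=None):
    """Try to open a TCP connection, retrying after each timeout or error.

    Returns a connected socket, or None once max_attempts is exhausted.
    With max_attempts=None the function retries forever.
    """
    attempt = 0
    while max_attempts is None or attempt < max_attempts:
        attempt += 1
        try:
            return socket.create_connection((host, port), timeout=timeout_s)
        except OSError as exc:  # covers timeouts and refused connections
            print(f"open attempt {attempt} failed: {exc}")
            time.sleep(retry_delay_s)  # avoid a tight retry loop on immediate failures
    return None
```

In this sketch the producer loop would call `write()` on every acquisition, while the consumer loop would call `read_all()`, send the data, and fall back to `open_connection()` whenever a send fails.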
Jeremy_B
2008-08-01 18:40:18 UTC
Hi JohnZ,
Which real-time controller and chassis do you have? What versions of LabVIEW Real-Time and NI-RIO do you have?
Do you have any statistics on CPU and memory usage? If anything in your loops lets a loop run continuously without yielding, it can starve the lower-priority threads that handle network communications. Can Measurement & Automation Explorer see the cRIO after you have lost your TCP connection?
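To illustrate the starvation point: a loop that never waits can monopolize the CPU, whereas a loop that sleeps for the remainder of each period gives other threads a chance to run. The sketch below is a generic Python illustration, not LabVIEW code; `run_paced_loop` and its parameters are hypothetical names. In a LabVIEW RT application the corresponding measure would be a wait or a timed loop inside each loop.

```python
import time


def run_paced_loop(work, period_s, iterations):
    """Call work() once per period and sleep for the rest of each period."""
    next_deadline = time.monotonic()
    for _ in range(iterations):
        work()
        next_deadline += period_s
        remaining = next_deadline - time.monotonic()
        if remaining < 0:
            # work() overran the period: resynchronize instead of
            # running back-to-back iterations to catch up.
            next_deadline = time.monotonic()
            remaining = 0.0
        # Sleeping hands the CPU to other threads until the next period.
        time.sleep(remaining)
```

For example, `run_paced_loop(read_and_send, 0.01, 1000)` would run a hypothetical `read_and_send` function at roughly 100 Hz while leaving the idle part of each period to other threads.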
JohnZ
2008-08-01 22:10:06 UTC
We've done most of the testing with a 9014 real-time controller and a 9104 chassis. However, we've also seen similar behavior with 9014/9103 and 9102/9101 combinations.
The 9014 cRIO normally runs at around 50% CPU usage and less than 40% memory usage when the data is flowing steadily. The RT System Monitor did not indicate a spike in either before the cRIO disconnected. Measurement and Automation Explorer is not able to connect to the cRIO after it disconnects. We tried pinging the cRIO from a different computer, but that also failed.
After disconnecting, the cRIO sends error messages to the serial port while trying to open a new TCP connection. It reports an error 56 (time-out error) 3 or 4 times (for a total duration of 15 or 20 seconds), then reports an error 42 (generic error) at least 10 times per second for 18 seconds, then switches back to 15 seconds of timeouts, and so on. When getting error 42, it ALWAYS gets it for 18 seconds straight. It usually gets error 56 for 15 seconds (3 timeouts) but sometimes for 20 seconds (4 timeouts).
When the cRIO disconnects, it can't be contacted by any computer through the Ethernet port, yet the link lights still show that it is connected. If it's connected directly, both the green and orange link lights on the cRIO light up. When it is connected through the 10 Mb hub, only the orange link light lights up, but the hub shows that the cRIO is connected to it. In spite of this, it is still impossible to ping the cRIO in either scenario.
It is possible that, in addition to which computer the cRIO is connected to, the errors are related to whether the cRIO has a direct connection to a computer or is connected through a hub. Today, we have only managed to get errors when the cRIO is directly connected to a laptop (one error after 1.5 minutes, one after 10 minutes, one after 7 minutes, etc.). There were no disconnects when the cRIO was connected to a 10 Mbps hub for more than half an hour.
Do any of those symptoms give you clues about what is happening?
Thanks!
Jeremy_B
2008-08-04 15:40:09 UTC
Hi JohnZ,
What version of NI-RIO are you using? If it is not the latest (2.4.1), you could try updating it in case there were any bug fixes. http://joule.ni.com/nidu/cds/view/p/id/1057/lang/en
Would it be possible for you to post your real-time VI for us to take a look at? Normally, disconnections are related either to a hardware failure (which sounds unlikely, given that you've reproduced it with different hardware) or to thread starvation. If you post your VI, we can try to reproduce it here. If there is a bug in the driver software itself, we would definitely want to investigate. If you aren't comfortable posting your files here, I can arrange for you to be able to e-mail them to us.
JohnZ
2008-08-04 18:40:18 UTC
It turns out the cRIOs do have older versions of NI-RIO (2.3.0 and 2.3.1). We'll install the updates and see if that resolves the issue.
Thanks!
Jeremy_B
2008-08-04 19:10:05 UTC
Hi JohnZ,
Let us know if that helps. If it doesn't, I'll definitely want to investigate further.
JohnZ
2008-08-05 20:40:10 UTC
We just upgraded the cRIO software to NI-RIO 2.4.1, and the error has changed.
Now it will crash once, recover, and then crash again. After the second crash the status LED blinks 4 times (indicating that the cRIO crashed twice without rebooting correctly in between). The time between crashes varies; it has crashed after only 7 minutes, but has also lasted almost a full hour on another run.
Neither the CPU nor memory usage spiked before either crash.
Unlike before, the cRIO completely stops the program it's running when it crashes (even the serial port stops generating output), but it now responds to pings and can be rebooted through MAX. The application on the cRIO cannot be opened in debug mode after the cRIO's second crash.
With the old NI-RIO software, the cRIO would keep running our application but no data could pass through the Ethernet port in either direction. With the new NI-RIO software, the cRIO stops running our application, but the Ethernet connection remains open.
Jeremy_B, if you think it would be beneficial to see our code, we could email it to you.
Thanks!
Jeremy_B
2008-08-06 18:40:09 UTC
Hi JohnZ,
I would like to see it. I don't want you to have to post your e-mail address on the forum (unless you feel comfortable doing so), so if you would like to e-mail me, you can send us a support e-mail (http://www.ni.com/support) and reference service request number 1213457 in the text of your e-mail, and they'll route the e-mail to me. You are also welcome to post it here or to our ftp site at ftp://ftp.ni.com/incoming, if you are comfortable with that.