ConferenceRoom

T2000



Sun sent us a T2000 server as part of their Try N Buy program.

We had an amusing time trying to get the serial port on the machine to work. Apparently our RJ45-to-DB9 adapter was miswired or broken or something, and it seemed to be the only one we had. I advise Sun to either ship the T2000 with the network management port enabled and set to use DHCP or to ship with a cable that connects the serial management port to a PC.

Once we finally got that straightened out, we noticed that the management scheme is extremely cool. There's both a serial port and a Fast Ethernet port just for the monitoring system. From either port, you can power the main guts on and off, change some configuration settings, or access the system console. This is very nice stuff, and it works well.

One nit: the networking software for the management system does not seem to handle CIDR correctly. When I tried to configure it inside our main LAN, which is a /22, it gave nonsensical error messages. I had to put the management interface onto a /24 to get it to work. Sun, if you're listening, please fix this in a future firmware update.
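For anyone unsure what the difference between a /22 and a /24 amounts to, here is a minimal sketch of the arithmetic. The function names and the 10.0.x.x addresses used in the note below are made up for illustration; this is not the management firmware's code, just the standard prefix-to-mask calculation.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a CIDR prefix length (0 to 32) to an IPv4 netmask in host byte order. */
uint32_t prefix_to_netmask(unsigned prefix_len)
{
    assert(prefix_len <= 32);
    /* Shifting a 32-bit value by 32 bits is undefined in C, so treat /0 specially. */
    if (prefix_len == 0)
        return 0;
    return UINT32_MAX << (32 - prefix_len);
}

/* Nonzero if two host-order addresses fall in the same subnet for this prefix. */
int same_subnet(uint32_t a, uint32_t b, unsigned prefix_len)
{
    uint32_t mask = prefix_to_netmask(prefix_len);
    return (a & mask) == (b & mask);
}
```

A /22 gives the mask 255.255.252.0 and a /24 gives 255.255.255.0, so the hypothetical hosts 10.0.1.5 and 10.0.2.7 share a subnet under /22 but not under /24.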

Once you get the system started, it runs through the typical Solaris installation stuff. It starts sshd for you, a nice touch. Also, bash was pre-installed. God bless you, Sun.

Installing tools like gdb, gcc, and others was as simple as downloading from Sun's web site and running the package installer. Other tools were easy enough to compile from source once things like gcc were installed. (I can't live without joe.)

Getting ConferenceRoom to compile on Solaris for the SPARC processor was not that painful. Our code used to compile on Solaris 6, and most of the work was removing workarounds for warts in Solaris 6 that were fixed in Solaris 10.

The first problem was that the network I/O code would not work correctly: the DP_POLL ioctl was returning EINVAL. It turns out that if the limit on file descriptors is 256 and you ask for up to 256 notifications, the ioctl returns an error. I can't see why this shouldn't be allowed (Sun? Is this a defect?) but the fix was trivial -- don't let a poll set get that close to the maximum number of file descriptors we can support. (We need one for /dev/poll and we don't use 0-2 anyway.)
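The workaround can be sketched roughly like this. It is only an illustration: the names are hypothetical, and the margin of four descriptors (0-2 plus the /dev/poll descriptor itself) is an assumption based on the parenthetical above, not the exact number our code uses.

```c
#include <stddef.h>
#include <sys/resource.h>

/* Assumed margin: stdin, stdout, stderr, plus the /dev/poll descriptor. */
#define RESERVED_FDS 4

/* Clamp a requested poll-set size so it stays below the descriptor limit. */
size_t clamp_poll_set_size(size_t requested, size_t fd_limit)
{
    size_t cap = fd_limit > RESERVED_FDS ? fd_limit - RESERVED_FDS : 0;
    return requested < cap ? requested : cap;
}

/* Same, but reads the current soft limit with getrlimit().
   If the limit cannot be read or is unlimited, the request is left unchanged. */
size_t max_poll_set_size(size_t requested)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        return requested;
    return clamp_poll_set_size(requested, (size_t)rl.rlim_cur);
}
```

With a limit of 256, asking for 256 notifications is trimmed to 252, which keeps the count passed with DP_POLL away from the value that triggered EINVAL.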

Then there were a few issues in the spinlock code. Before Sun supported the pthread_spin_* functions, we used our own hand-coded ones. All of our other platforms either used our x86 spinlocks or didn't have POSIX spinlocks, so this was a first for that code. It had suffered from some bitrot when the lock debug instrumentation code changed.

The next issue was the atomic operation code. Sun includes a lot of nice atomic operations (increment, decrement, OR, AND, and CAS for 8-, 16-, and 32-bit values) that are presumably much faster than our portable code. So we added support for those operations. (It has functions for 64-bit values too, but they seem to be only usable by the kernel.)
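A wrapper over those operations might look something like the sketch below. The counter32_t type and the counter_* names are hypothetical, not our real code. The Solaris branch uses atomic_inc_32_nv, atomic_dec_32_nv, and atomic_cas_32 from <atomic.h>; the other branch substitutes C11 <stdatomic.h> purely as a stand-in so the sketch builds elsewhere, and is not our actual portable fallback.

```c
#include <stdint.h>

#if defined(__sun)
/* Solaris user-level atomics, documented in atomic_ops(3C). */
#include <atomic.h>
typedef volatile uint32_t counter32_t;

/* Both return the new value after the update. */
static inline uint32_t counter_inc(counter32_t *c) { return atomic_inc_32_nv(c); }
static inline uint32_t counter_dec(counter32_t *c) { return atomic_dec_32_nv(c); }

/* Nonzero if *c held `expected` and was replaced by `desired`. */
static inline int counter_cas(counter32_t *c, uint32_t expected, uint32_t desired)
{
    /* atomic_cas_32 returns the old value, so success means old == expected. */
    return atomic_cas_32(c, expected, desired) == expected;
}
#else
/* Stand-in for other platforms: C11 atomics. */
#include <stdatomic.h>
typedef _Atomic uint32_t counter32_t;

/* fetch_add/fetch_sub return the old value, so adjust to get the new one. */
static inline uint32_t counter_inc(counter32_t *c) { return atomic_fetch_add(c, 1) + 1; }
static inline uint32_t counter_dec(counter32_t *c) { return atomic_fetch_sub(c, 1) - 1; }

static inline int counter_cas(counter32_t *c, uint32_t expected, uint32_t desired)
{
    return atomic_compare_exchange_strong(c, &expected, desired);
}
#endif
```

Hiding the platform call behind one small set of inline functions means the rest of the code does not care which implementation it gets.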

That got most of our code working perfectly. One remaining issue was a certain programmer who shall remain nameless who assumed that all processors store 32-bit unsigned integers the same way if you access them as characters. In two places. That were very hard to find. He has been counseled.
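For the record, the safe way to do this is to never look at an integer's bytes through a char pointer, and to build the byte sequence with shifts instead. SPARC is big-endian and x86 is little-endian, so the pointer trick gives different bytes on each. The sketch below assumes the bytes are wanted most significant first; the function names are hypothetical and this is not the code that was actually fixed.

```c
#include <stdint.h>

/* Store v as four bytes, most significant first, on any processor. */
void put_u32_be(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Rebuild the value from four bytes stored most significant first. */
uint32_t get_u32_be(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) |
           ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |
            (uint32_t)in[3];
}
```

Because the shifts operate on the value rather than on its memory layout, 0x01020304 always comes out as the bytes 01 02 03 04, whichever machine runs it.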

The last time I did a make -j 32 on our code was when we were making a Cray build for laughs. Doing so on the T2000 unveiled a defect in three of our makefiles. I had to fix that; watching a page of gcc invocations scroll by every few seconds is just too much fun to miss out on.

There were only two other build issues. One is that I couldn't figure out the right way to get the GMT offset for a struct tm in local time. The timezone/altzone symbols were not getting defined and there is no tm_gmtoff. We have ugly fallback code, but I'd love to know the right way. The other was the absence of statfs, so I had to change the code to include the statvfs code instead (like we do for Linux).
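One portable possibility for the GMT offset, offered as a sketch and not as the "right" Solaris answer, is to break the same instant down with both localtime_r and gmtime_r and subtract the fields. The function name is hypothetical, and the sketch assumes the POSIX reentrant time functions are available and ignores leap seconds.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Seconds east of UTC for local time at instant t (negative west of UTC).
   Returns 0 if the time cannot be broken down. */
long gmt_offset_seconds(time_t t)
{
    struct tm local, utc;
    int day_diff;

    if (localtime_r(&t, &local) == NULL || gmtime_r(&t, &utc) == NULL)
        return 0;

    /* The two calendar dates differ by at most one day. Across a year
       boundary tm_yday wraps, so use the year to decide the direction. */
    day_diff = local.tm_yday - utc.tm_yday;
    if (local.tm_year != utc.tm_year)
        day_diff = local.tm_year > utc.tm_year ? 1 : -1;

    return day_diff * 86400L
         + (local.tm_hour - utc.tm_hour) * 3600L
         + (local.tm_min - utc.tm_min) * 60L
         + (local.tm_sec - utc.tm_sec);
}
```

Since both breakdowns come from the same time_t, daylight saving time is accounted for automatically: the result is whatever offset was in effect at that instant.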

That was it. The code built and ran. And ran. And ran.

I tested ConferenceRoom with 10,000 simulated flooding clients and had no problem. The machine and the chat server stayed responsive, with some infrequent burstiness from the chat server. I think the burstiness is a tuning issue, likely in our memory code which wasn't really designed to handle the level of concurrency this machine is capable of. Either that or we're just not starting enough threads.

Not a bad first day with the T2000.



Maintained by David Schwartz, questions may be sent to WebMaster Support

Contents Copyright (C) 2006, WebMaster