Frequent inactive GPU client problem

APOLLO

[H]ard|DCer of the Month - March 2009
Joined: Sep 17, 2000
Messages: 9,089
I have one system that has developed a rather annoying problem. Every couple of hours I notice that one or more of its GPU clients stalls and the GPU core goes idle. The client appears to be working but it isn't, and it won't resume unless I restart the client or 'coax' it with the affinity tool. It's as if the priority level drops to the lowest and the SMP client takes over the reins, if that makes any sense.

This is the same system that has another problem, which I posted about in an earlier thread: the GPU3 client runs very slowly with two G92 cards. I don't use GPU3 on this system because of the slowdown. In any case, I don't know whether the two problems are related, so I reluctantly decided to start a new thread. The inactive-client problem I'm seeing now is with the GPU2 client, and the stalling hits all three clients on this system at random intervals. Has anyone seen this before?

It's vexing issues like these that have really given me a feeling of wasted effort lately, and have woken me up to a lot of 'realities' about F@H. :rolleyes: :mad:
 
I'm wondering if there's something wrong with that system. Maybe the OS is screwed up, or a drive is dying, or hell, the motherboard. The only other thing I can think of is to stress the hell out of that system and see if anything fails.
 
This system has had unexplainable anomalies since day one, unlike anything I've seen before, and I've been working on systems for over a decade and a half. Despite this, I'm not yet convinced these issues stem from the system itself, because it works more often than not and remains my top producer. Otherwise, I would have ditched it two years ago when I first built the forsaken POC.
 
have you been running the same OS install since the problems started?
 
have you been running the same OS install since the problems started?
I have reinstalled the OS several times in the last two years. This problem is very recent, and the current OS was installed only a few months ago. I seriously doubt it's the OS (Win 7-64), but the answer to your question is yes.
 
Just wondering if maybe something got screwed up in the install. I've had it happen before.

Hell, maybe the system's just cursed, lol, who knows. :D
 
Just wondering if maybe something got screwed up in the install. I've had it happen before.
You've seen this before and reinstalling the OS solved it?

Hell, maybe the system's just cursed, lol, who knows. :D
That's what I always say, LOL. There are at least two other issues I've seen that are more serious and even more perplexing. Fortunately, I eventually overcame them for the most part. They are hardware-related motherboard anomalies, and they kept me from seeing the full potential of the system for at least a year and a half; my production suffered as a result. I don't want to calculate what I could have produced if this POS system had simply worked the way it was intended to when I built it back in Jan '09... :rolleyes:

Anyway, I'll be glad to replace the board whenever that happens, but I likely won't any time soon, now that my future in F@H is highly uncertain.
 
Let me guess, that's the Skulltrail system you're having the issues with, right?
 
Let me guess, that's the Skulltrail system you're having the issues with, right?
Yep, and now I'm noticing one client is crashing. I'm wondering if any WUs are having stability problems.
 
Yep, and now I'm noticing one client is crashing. I'm wondering if any WUs are having stability problems.

I've just checked my systems and had a quick look over at FF. Nothing major is happening with regard to GPU failures; most of the current queries seem to be about the classic client.
 
Well, the first thing I'd do is move that card to another box and see if the problem follows the card. If not, then definitely a reinstall of the OS, maybe moving to a different variety like WinXP to see whether that changes the situation at all. Unless you're running the exact same bad WU, I can't see how it's client-related...

I know this is OT, but I've got a MB I'm about to RMA just because I can't pin down the exact problem that causes it to reboot without warning sometimes. I've replaced the RAM, CPU and GPU, brought it back down to standard clock speeds, even reinstalled with a different OS and I still get the same results. Sometimes it'll go for a week and not reboot, and other times it'll reboot every other day, no rhyme or reason. The only common denominator left is the MB. :(
 
Hmm, the only thing that changed is that I raised the FSB by 1 MHz, to 424 MHz, yesterday. I set the clock back and restarted the system over 10 minutes ago, although I have no idea how a slight FSB OC would affect the GPU client when the SMP client is OK. I'll monitor the situation closely, and if I notice problems again, I'll need to do something more drastic...

Well, the first thing I'd do is move that card to another box and see if the problem follows the card. If not, then definitely a reinstall of the OS, maybe moving to a different variety like WinXP to see whether that changes the situation at all. Unless you're running the exact same bad WU, I can't see how it's client-related...
I wish I could move the card, but that would be a massive effort because I don't have any other system with free slots; I would need to pull parts from another system. Also, the card is huge and won't fit in most of my other cases without issues. I'm kind of stuck and will need to sort the problem out in this machine, even if it means reinstalling everything. I hope it won't come to that. /sigh

I know this is OT, but I've got a MB I'm about to RMA just because I can't pin down the exact problem that causes it to reboot without warning sometimes. I've replaced the RAM, CPU and GPU, brought it back down to standard clock speeds, even reinstalled with a different OS and I still get the same results. Sometimes it'll go for a week and not reboot, and other times it'll reboot every other day, no rhyme or reason. The only common denominator left is the MB. :(
This motherboard has definitely had weird issues. There's no way I'm going to RMA it, though. That should have been done two years ago when I bought it from a seller (or I should have sent it back to him). It will likely cost me, and I don't want to spend a dime on what is now a nearly worthless product.

It has 'worked' for the most part, just not the way it was meant to work or the way I expected. Big purchase mistake there. The only reasons I never RMAed the board were that it functioned OK most of the time, albeit with poor CPU performance until this past summer, and that I'd had bad experiences with the RMA procedure in the past. I know there are people who RMA components at the slightest cause. I'm the opposite and regret it now. /double sigh
 
Have you tried changing the power supply? I have found over the years that systems that do flaky things can sometimes be fixed with a different PSU.
 
Yah I was just thinking the same thing. I had a system that I fought for a year or two and finally the PSU died. I replaced it with a PC Power and Cooling and never had another problem with the board.

I know it's a stretch, but...
 
Really dumb question, but I know that on my motherboard I had a Molex connector that gave more power to the video cards when running more than one in the system. For months I never noticed it.

Does your board have a Molex connector that's not being used?
 
FWIW Apollo, I wouldn't be surprised if there are several people out there who would be interested in buying your Skulltrail system for a decent price. There is still something cool about them :cool:

Regarding your issue, I have never seen it before. Whenever my GPU client hangs, it is obvious.

One quick idea though: did windows update install a new nvidia driver?
 
I think I need to drive up to Apollo's and lend him a hand.
 
Have you tried changing the power supply?
No, not recently, mainly because I've changed PSUs a few times since I got the board. Whenever I restart the system it works fine for a little bit then I get the clients stalling on me. BTW, this always happens when the client downloads a new WU in case that means anything to you GPU gurus.

Yah I was just thinking the same thing. I had a system that I fought for a year or two and finally the PSU died. I replaced it with a PC Power and Cooling and never had another problem with the board.

I know it's a stretch, but...
It could be, but I don't have a spare PSU of equivalent wattage (the board is a power hog) to try without cannibalizing another system. If it comes to needing a new PSU, I think I would just shut the system down, because there's no way I'm putting a dime into it unless I can somehow recover the cost in the future. A PSU is reusable since it can be installed elsewhere, but I'm still reluctant to buy one for this system.

Really dumb question, but I know that on my motherboard I had a Molex connector that gave more power to the video cards when running more than one in the system. For months I never noticed it.

Does your board have a Molex connector that's not being used?
It has one Molex connector in addition to all the other standard power connectors. I've always connected a Molex cable to it in case it needed the extra juice, so the board should be getting that extra power. If I ever get an SR-2, I will use every single power connector on the board, because I'm thorough that way and don't want the 'what if' going through my head.

FWIW Apollo, I wouldn't be surprised if there are several people out there who would be interested in buying your Skulltrail system for a decent price. There is still something cool about them :cool:
It was cool when I got it, but the novelty quickly wore off when I found out what a major hassle it was to get it all set up and working properly; nothing like the relative ease of the SR-2, judging from all the posts I've read. I'm very envious of you guys and how minor your issues were in getting your systems running. It took me at least 18 months, and that's after trying 4 widely different pairs of processors, from dual-cores to quad-cores and from 65nm to 45nm. Not to mention memory modules galore, until I finally found a combination that worked as expected, meaning no weird performance issues like one-half to one-third the normal TPFs in SMP. Either it's the most finicky board ever made or I'm a total dunce. :confused:

Regarding selling it: there are two problems with that. First, I will never see a decent fraction of what I invested, even though I paid a 'discount' for it. Second, because there are known anomalies (whether they're normal for the product or I have a lemon), I could never feel comfortable selling this board. It goes against my code of honor.

One quick idea though: did windows update install a new nvidia driver?
I always disable Windows Update. It annoys me.

I think I need to drive up to Apollo's and lend him a hand.
Maybe some time when the season changes or next time you're in the area.
 
Maybe some time when the season changes or next time you're in the area.
It's been spring here the past week. However, I do need to check with the border agents for other reasons.. just kidding. :p
 
That's why you buy a nice-ass 80+ Gold PSU, so you can use it through a bunch of system upgrades. :D That, or if it doesn't fix the problem, throw it back in the box and resell it for what you paid, or just return it. That's what Best Buy/Fry's/Micro Center is good for. ;)
 
That's why you buy a nice-ass 80+ Gold PSU, so you can use it through a bunch of system upgrades. :D That, or if it doesn't fix the problem, throw it back in the box and resell it for what you paid, or just return it. That's what Best Buy/Fry's/Micro Center is good for. ;)
You know what, I might just get a PSU for the kick of it if all else fails. I'll have Zero recommend one for me.
 
Tiger has posted that he has a Skulltrail system as well; it might be worth dropping him a PM to see if he has any insights.
 
Is your GPU heavily overclocked (silly question around here)? I've had the same problem with GPU3 WUs sitting there idle (every couple of days or so, but never any corruption/rejections). I've eased the clocks back a little and haven't seen it happen all week. Worth a try...
 
Is your GPU heavily overclocked (silly question around here)? I've had the same problem with GPU3 WUs sitting there idle (every couple of days or so, but never any corruption/rejections). I've eased the clocks back a little and haven't seen it happen all week. Worth a try...
The card is a GTX 295 with 640/512 clocks. I'll lower the frequency if I run into the same problem again, but I don't believe that's the cause, nor do I think it's a PSU issue...

I think I may have found the cause, but it's still too early to be absolutely certain. I switched to the GPU3 client overnight, and the card folded GPU3 P1117x WUs fine for 10 hours straight. No stalls, no crashes. I don't want to stay with the GPU3 client on this card, though, because production is down nearly 2000 PPD per GPU since, as many of you already know, GPU3 is not optimized for the GT200 architecture.

So the problem seems to center on the recent spate of 787-point GPU2 WUs. These used to be rare, or at least relatively uncommon. Usually I see 353s, 450s, 783s, etc., and until the stalls and crashes started, I didn't fold 787s very often. Yesterday, the WUs in my few remaining GPU2 clients were almost exclusively 787s. The crashes were hitting GPU1 whenever both GPUs were working on 787s. Is it possible this card doesn't like running 787s on both GPUs at once?

I don't know whether I should pursue this WU series and my card's crashes further with PG if no one else is seeing the same problem. :confused:
 