Slow internet connection +bigadv

DeFex

[H]ard|DCer of the Month - June 2011
Joined: Feb 13, 2001
Messages: 5,248
I am using the GPU Tracker, and I accidentally stopped the client and the router right as it was trying to send work to the server while sitting at 100%. After I restarted it, a new WU started right away, and it sent the old WU at the same time the new one was running.

(Here I turned off bigadv because I'm using my computer for other things.)

Code:
[01:42:18] Assembly optimizations on if available.
[01:42:18] Entering M.D.
[01:42:24] Completed 0 out of 500000 steps  (0%)
[01:44:27] Completed 5000 out of 500000 steps  (1%)
[01:44:40] - Couldn't send HTTP request to server
[01:44:40] + Could not connect to Work Server (results)
[01:44:40]     (171.67.108.22:8080)
[01:44:40] + Retrying using alternative port
[01:46:22] Completed 10000 out of 500000 steps  (2%)
[01:48:07] Completed 15000 out of 500000 steps  (3%)
---------------etc---------------------------
[02:16:07] Completed 85000 out of 500000 steps  (17%)
[02:16:36] + Results successfully sent
[02:16:36] Thank you for your contribution to Folding@Home.
[02:18:20] Completed 90000 out of 500000 steps  (18%)
[02:20:37] Completed 95000 out of 500000 steps  (19%)

If there were a way to trick it into that kind of behavior automatically, it could be helpful. My upload speed is only 500k or so, and it takes ages (maybe 20 minutes to half an hour, it seems, but I never timed it) to send a bigadv. If you had an SR-2 and 128k upload, you could probably do most of a regular SMP WU while it's uploading instead of sitting idle.
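A back-of-the-envelope check on that upload time. The numbers here are assumptions: a bigadv result of roughly 100 MB (the a3 log later in the thread shows a ~100 MB core-data write), and "500k" read as 500 kbit/s:

```python
# Rough upload-time estimate for a bigadv result (assumed sizes, see above).
result_bytes = 100_000_000        # ~100 MB result file
uplink_bits_per_s = 500_000       # 500 kbit/s upload

upload_s = result_bytes * 8 / uplink_bits_per_s   # bits / (bits per second)
upload_min = upload_s / 60
# -> 1600 s, about 27 minutes: right in the "20 min to half an hour" range
```

So the observed 20-30 minute uploads are consistent with the line rate, which is exactly the window of folding time a decoupled upload would reclaim.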

Do you think I should request this as an option in GPU Tracker?
 
What I ended up doing with my network was setting up a pfSense box and enabling QoS. In the QoS wizard you can set a penalty IP, then choose how much upload and download bandwidth that IP is allowed to use. This helped my connection enormously: I went from not being able to browse the internet at all while uploading/downloading bigadv units to browsing without even noticing that a unit was transferring. If you go with this method and need more help, I can take screenshots of my setup.

BTW, I did this back when I had the 3/600k package from my local ISP.
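pfSense does this through its wizard, but the same "penalty IP" idea can be sketched on a Linux router with `tc` HTB classes. Everything here is an assumption for illustration: `eth0` as the WAN interface, a 600 kbit/s uplink, and `192.168.1.50` as the folding box:

```shell
# Cap total egress at the uplink rate, then pin one host to a slower class.
tc qdisc add dev eth0 root handle 1: htb default 10
tc class add dev eth0 parent 1:  classid 1:1  htb rate 600kbit                # whole uplink
tc class add dev eth0 parent 1:1 classid 1:10 htb rate 450kbit ceil 600kbit   # everyone else
tc class add dev eth0 parent 1:1 classid 1:20 htb rate 150kbit ceil 300kbit   # penalty class
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
    match ip src 192.168.1.50/32 flowid 1:20
```

The `ceil` lets the penalized host borrow idle bandwidth, so uploads still finish reasonably fast when nobody else is browsing.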
 
@dmolter The problem is that the client stops folding for the first upload attempt, yet it clearly can upload while running once that attempt "fails", so folding time is wasted while it waits for the upload.

@musky, that's interesting; maybe it could be added in!
 
If you can adjust the proxy settings in the config file through the tracker (or do it manually), it would work as-is.
 
It is not part of GPU Tracker, but there is a decoupler for uploading and downloading WUs called Langouste. It works very well; I have it on about half my machines. The biggest problem is forgetting to start it, which I do constantly.

http://foldingforum.org/viewtopic.php?f=14&t=11615

Bloody hell, that is a well-kept secret. Thanks for that - it could net me ~6000 PPD.

Want. To. Be. Folding. Again.
 
It has probably cost me 100K because I keep forgetting to turn it on after a reboot. Other than operator error, it works well.
 
Now who was I racing recently where I really could have used an extra 6000ppd?

Your discretion and timing are noted. ;)

EDIT: So I take it it splits out the work unit into a temp dir and sets the original working on a new unit. I suppose I could add to HFM the temp directories as well to check on them. But does it respect a oneunit flag? (I know I will forget to stop Langouste just like you will forget to start it)
 
I remember hearing about the decoupler when I was running Linux VMs but did not know there was a Windows version. Thanks, Musky.
 
EDIT: So I take it it splits out the work unit into a temp dir and sets the original working on a new unit. I suppose I could add to HFM the temp directories as well to check on them. But does it respect a oneunit flag? (I know I will forget to stop Langouste just like you will forget to start it)

To oversimplify: it just prevents the fah client from uploading results itself, and instead spawns a separate "helper" process that does the uploading. It lets download requests pass through, so the fah client still downloads normally. When you finish a WU, you see this:
Code:
[18:40:35] Completed 250000 out of 250000 steps  (100%)
[18:40:46] DynamicWrapper: Finished Work Unit: sleep=10000
[18:40:56] 
[18:40:56] Finished Work Unit:
[18:40:56] - Reading up to 52713120 from "work/wudata_09.trr": Read 52713120
[18:40:56] trr file hash check passed.
[18:40:56] - Reading up to 46989088 from "work/wudata_09.xtc": Read 46989088
[18:40:57] xtc file hash check passed.
[18:40:57] edr file hash check passed.
[18:40:57] logfile size: 201096
[18:40:57] Leaving Run
[18:40:59] - Writing 100071244 bytes of core data to disk...
[18:41:01]   ... Done.
[18:41:25] - Shutting down core
[18:41:25] 
[18:41:25] Folding@home Core Shutdown: FINISHED_UNIT
[18:41:30] CoreStatus = 64 (100)
[18:41:30] Sending work to server
[18:41:30] Project: 6900 (Run 28, Clone 14, Gen 22)


[18:41:30] + Attempting to send results [January 21 18:41:30 UTC]
[18:41:30] - Couldn't send HTTP request to server
[18:41:30] + Could not connect to Work Server (results)
[18:41:30]     (130.237.232.141:8080)
[18:41:30] + Retrying using alternative port
[18:41:30] - Couldn't send HTTP request to server
[18:41:30] + Could not connect to Work Server (results)
[18:41:30]     (130.237.232.141:80)
[18:41:30] - Error: Could not transmit unit 09 (completed January 21) to work server.
[18:41:30]   Keeping unit 09 in queue.
[18:41:30] Project: 6900 (Run 28, Clone 14, Gen 22)


[18:41:30] + Attempting to send results [January 21 18:41:30 UTC]
[18:41:30] - Couldn't send HTTP request to server
[18:41:30] + Could not connect to Work Server (results)
[18:41:30]     (130.237.232.141:8080)
[18:41:30] + Retrying using alternative port
[18:41:30] - Couldn't send HTTP request to server
[18:41:30] + Could not connect to Work Server (results)
[18:41:30]     (130.237.232.141:80)
[18:41:30] - Error: Could not transmit unit 09 (completed January 21) to work server.


[18:41:30] + Attempting to send results [January 21 18:41:30 UTC]
[18:41:30] - Couldn't send HTTP request to server
[18:41:30] + Could not connect to Work Server (results)
[18:41:30]     (130.237.165.141:8080)
[18:41:30] + Retrying using alternative port
[18:41:30] - Couldn't send HTTP request to server
[18:41:30] + Could not connect to Work Server (results)
[18:41:30]     (130.237.165.141:80)
[18:41:30]   Could not transmit unit 09 to Collection server; keeping in queue.
[18:41:32] - Preparing to get new work unit...
[18:41:32] Cleaning up work directory
[18:41:37] + Attempting to get work packet
[18:41:37] Passkey found
[18:41:37] - Connecting to assignment server
[18:41:38] - Successful: assigned to (171.67.108.22).
[18:41:38] + News From Folding@Home: Welcome to Folding@Home
[18:41:38] Loaded queue successfully.
[18:42:10] Project: 6900 (Run 28, Clone 14, Gen 22)


[18:42:10] + Attempting to send results [January 21 18:42:10 UTC]
[18:42:11] - Couldn't send HTTP request to server
[18:42:11] + Could not connect to Work Server (results)
[18:42:11]     (130.237.232.141:8080)
[18:42:11] + Retrying using alternative port
[18:42:11] - Couldn't send HTTP request to server
[18:42:11] + Could not connect to Work Server (results)
[18:42:11]     (130.237.232.141:80)
[18:42:11] - Error: Could not transmit unit 09 (completed January 21) to work server.


[18:42:11] + Attempting to send results [January 21 18:42:11 UTC]
[18:42:11] - Couldn't send HTTP request to server
[18:42:11] + Could not connect to Work Server (results)
[18:42:11]     (130.237.165.141:8080)
[18:42:11] + Retrying using alternative port
[18:42:11] - Couldn't send HTTP request to server
[18:42:11] + Could not connect to Work Server (results)
[18:42:11]     (130.237.165.141:80)
[18:42:11]   Could not transmit unit 09 to Collection server; keeping in queue.
[18:42:11] + Closed connections
[18:42:11] 
[18:42:11] + Processing work unit
[18:42:11] Core required: FahCore_a3.exe
[18:42:11] Core found.
[18:42:11] Working on queue slot 00 [January 21 18:42:11 UTC]
[18:42:11] + Working ...
[18:42:11] 
[18:42:11] *------------------------------*
[18:42:11] Folding@Home Gromacs SMP Core
[18:42:11] Version 2.22 (Mar 12, 2010)
[18:42:11] 
[18:42:11] Preparing to commence simulation
[18:42:11] - Assembly optimizations manually forced on.
[18:42:11] - Not checking prior termination.
[18:42:16] - Expanded 25469259 -> 31941441 (decompressed 125.4 percent)
[18:42:16] Called DecompressByteArray: compressed_data_size=25469259 data_size=31941441, decompressed_data_size=31941441 diff=0
[18:42:16] - Digital signature verified
[18:42:16] 
[18:42:16] Project: 2686 (Run 8, Clone 13, Gen 58)
[18:42:16] 
[18:42:16] Assembly optimizations on if available.
[18:42:16] Entering M.D.
[18:42:25] Completed 0 out of 250000 steps  (0%)
[18:49:22] Project: 6900 (Run 28, Clone 14, Gen 22)


[18:49:22] + Attempting to send results [January 21 18:49:22 UTC]
[18:49:22] - Couldn't send HTTP request to server
[18:49:22] + Could not connect to Work Server (results)
[18:49:22]     (130.237.232.141:8080)
[18:49:22] + Retrying using alternative port
[18:49:22] - Couldn't send HTTP request to server
[18:49:22] + Could not connect to Work Server (results)
[18:49:22]     (130.237.232.141:80)
[18:49:22] - Error: Could not transmit unit 09 (completed January 21) to work server.


[18:49:22] + Attempting to send results [January 21 18:49:22 UTC]
[18:49:22] - Couldn't send HTTP request to server
[18:49:22] + Could not connect to Work Server (results)
[18:49:22]     (130.237.165.141:8080)
[18:49:22] + Retrying using alternative port
[18:49:22] - Couldn't send HTTP request to server
[18:49:22] + Could not connect to Work Server (results)
[18:49:22]     (130.237.165.141:80)
[18:49:22]   Could not transmit unit 09 to Collection server; keeping in queue.
[18:55:17] Completed 2500 out of 250000 steps  (1%)

So the time between when it first starts trying to send results and when it is processing a new unit is around 1 minute:

[18:41:30] + Attempting to send results [January 21 18:41:30 UTC]
[18:42:25] Completed 0 out of 250000 steps (0%)

Eventually, you see this in the log:

[00:49:22] Project: 6900 (Run 28, Clone 14, Gen 22)
[00:49:22] - Error: Could not get length of results file work/wuresults_09.dat
[00:49:22] - Error: Could not read unit 09 file. Removing from queue.

This looks concerning, but it isn't. It is just telling you that the upload happened since the last time the fah client tried to upload. You will also notice that this message came over 2 hours after it first tried to upload. This used to worry me, until I looked at the log a little closer. The fah client tries to upload immediately, then tries again 5 minutes later. If it fails both times, it waits 2 hours before trying again. So the actual upload finished sometime between 5 minutes and 2 hours 5 minutes after the WU finished. Based on the actual WU value in the stats, the upload happened in the normal amount of time (15-20 minutes for me). So you really don't lose anything, and you gain 15-20 minutes of processing time on the new unit.

Like I said, the only issue is "operator error": forgetting to start the thing. As long as you remember to start it along with your fah client, it works perfectly.
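The mechanism described above - intercept the result upload, report it as "failed" so the client moves on, and do the real send in the background - can be sketched as a toy. This is NOT Langouste's actual code, just the decoupling idea; the `Decoupler` class, its method names, and the request shape are all made up for illustration:

```python
import queue
import threading

class Decoupler:
    """Toy decoupling proxy: downloads pass through, uploads are deferred."""

    def __init__(self, real_upload):
        self.real_upload = real_upload        # callable that does the slow send
        self.pending = queue.Queue()
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def handle(self, request):
        if request["type"] == "download":
            return "pass-through"             # work requests reach the server normally
        # Result upload: queue the payload and report failure immediately,
        # so the client keeps the unit in its queue and starts a new WU.
        self.pending.put(request["payload"])
        return "fake-failure"

    def _drain(self):
        while True:
            payload = self.pending.get()
            self.real_upload(payload)         # the slow upload, off the critical path
            self.pending.task_done()
```

The "fake-failure" answer is why the client logs `Could not transmit unit 09 ... Keeping unit 09 in queue` and then cheerfully fetches new work, while the helper finishes the real send on its own time.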

Now who was I racing recently where I really could have used an extra 6000ppd?

Your discretion and timing are noted. ;)

You still won...quit your griping... :p


I remember hearing about the decoupler when I was running Linux VMs but did not know there was a Windows version. Thanks, Musky.

Yeah, it was Linux-only for a long time. I used it back when I had four i7 920s running Linux and a painfully slow upload speed. When I went to 1 Mb up, it wasn't nearly as useful. I switched back (and found out it worked with Windows now) when my cable company decided I needed to go back to 128K up.
 
[H]ugh_Freak;1036739158 said:
Musky.. 18:00 to 00:00 is 6 hours .. not 2 BTW

True...way too early on a Saturday for math... :)

So it uploads somewhere between 5 minutes and 6 hours 5 minutes from finish.
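The corrected backoff checks out straight from the timestamps in the log quoted earlier (a quick sketch; the two times are the last failed attempt and the "Could not read unit" message, which means the upload had succeeded in the meantime):

```python
from datetime import datetime, timedelta

fmt = "%H:%M:%S"
last_fail = datetime.strptime("18:49:22", fmt)                 # last failed send
gone = datetime.strptime("00:49:22", fmt) + timedelta(days=1)  # next day, past midnight

gap_hours = (gone - last_fail).total_seconds() / 3600
# gap_hours == 6.0, matching the 18:00 -> 00:00 correction
```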
 