I have 2 tables roughly defined as follows (using mysql btw):
to help put it in context, a "bundle" is basically a list of software packages to be installed on the machine "machine_name" in a specific order. state is "new", "processing", or "complete" depending on whether the software packages havent begun installing, have begun installing but arent yet complete, or have finished installing, respectively.
the actual packages/install jobs are stored in a seperate table:
bundle_id is the id corresponding to the bundle/machine the install job is for. state is "pending" or "complete", depending on whether or not the installation of the package on machine_ipaddr is complete. seq_num is the number corresponding to the order in which that package is to be installed *for a particular bundle/machine*
so i might have something like:
i have a routine get_jobs() that needs to do the following:
for each machine_name that is not in a state "complete"
- find the package in install_jobs with the lowest seq_num, that is not in a state "complete", and belongs to that machine/bundle
- add (machine_name, package_name) to the @jobs array
so if i ran get_jobs() on the above tables i'd get the following list:
nothing for "machine1" is listed because it was already marked complete. "crysis" was chosen over "counterstrike" because it had a lower sequence number...ditto for "doom" over "mariokart".
make sense? then i'd do the install, mark those installs as "complete", and repeat.
so the lazy version is something like:
which is okay i guess, but this is gonna be fairly heavilly trafficked, and its a web query so if i do a bajillion SQL queries in the for() loop it might time out or something (could be over a thousand machines)... and possibly for reasons of concurrency i'd rather have all this done in 1, nice little SQL query. i.e.
@jobs = <insert awesome SQL statement here>
hopefully this doesnt come off too messy. its actually a nice, clean little problem to think about when you cut to the root of it.
i blew half a day trying to figure this out, reading up on joins and subselect statements and whatnot and i think i'm a bit out of my element here, so any help would be MUCH appreciated. and if you dont think its possible let me know so i can stop wasting my time on this, and maybe reconsider the db structure a bit
im hitting the sack but ill check back in the morning. TIA
Code:
[b]table: bundles[/b]
bundle_id INT
machine_name VARCHAR
state VARCHAR
to help put it in context, a "bundle" is basically a list of software packages to be installed on the machine "machine_name" in a specific order. state is "new", "processing", or "complete" depending on whether the software packages havent begun installing, have begun installing but arent yet complete, or have finished installing, respectively.
the actual packages/install jobs are stored in a seperate table:
Code:
[b]table: install_jobs[/b]
package_name VARCHAR
bundle_id INT
state VARCHAR
seq_num INT
bundle_id is the id corresponding to the bundle/machine the install job is for. state is "pending" or "complete", depending on whether or not the installation of the package on machine_ipaddr is complete. seq_num is the number corresponding to the order in which that package is to be installed *for a particular bundle/machine*
so i might have something like:
Code:
[b]bundle_id, machine_name, state[/b]
1, "machine1", "complete"
2, "machine2", "processing"
3, "machine3", "new"
[b]package_name, bundle_id, state, seq_num[/b]
"quake", 1, "complete", 1
"counterstrike", 2, "pending", 2
"crysis", 2, "pending", 1
"doom", 3, "pending", 1
"mariokart", 3, "pending", 2
i have a routine get_jobs() that needs to do the following:
for each machine_name that is not in a state "complete"
- find the package in install_jobs with the lowest seq_num, that is not in a state "complete", and belongs to that machine/bundle
- add (machine_name, package_name) to the @jobs array
so if i ran get_jobs() on the above tables i'd get the following list:
Code:
("machine2", "crysis")
("machine3", "doom")
nothing for "machine1" is listed because it was already marked complete. "crysis" was chosen over "counterstrike" because it had a lower sequence number...ditto for "doom" over "mariokart".
make sense? then i'd do the install, mark those installs as "complete", and repeat.
so the lazy version is something like:
Code:
@jobs = ()
@machines = SELECT machine_name FROM bundles WHERE state != 'complete'
foreach $machine_name in @machines:
$package_name = SELECT package_name FROM bundles, install_jobs WHERE bundles.bundle_id = install_jobs.bundle_id AND install_jobs.state = 'pending' ORDER BY seq_num ASC LIMIT 1
add ($machine_name, $package_name) to @jobs list
return @jobs list
which is okay i guess, but this is gonna be fairly heavilly trafficked, and its a web query so if i do a bajillion SQL queries in the for() loop it might time out or something (could be over a thousand machines)... and possibly for reasons of concurrency i'd rather have all this done in 1, nice little SQL query. i.e.
@jobs = <insert awesome SQL statement here>
hopefully this doesnt come off too messy. its actually a nice, clean little problem to think about when you cut to the root of it.
i blew half a day trying to figure this out, reading up on joins and subselect statements and whatnot and i think i'm a bit out of my element here, so any help would be MUCH appreciated. and if you dont think its possible let me know so i can stop wasting my time on this, and maybe reconsider the db structure a bit
im hitting the sack but ill check back in the morning. TIA