Keyword Search

BugginOuT

Can someone assist me on how to create a custom script that will search the local disk for keywords like "password" or "pw"? Or point me to the nearest site as to what I need to get started?

Much appreciated..
 
GOOD NEWS! Such a tool already exists, it's called 'grep'. Now you don't have to do anything!
 
GOOD NEWS! Such a tool already exists, it's called 'grep'. Now you don't have to do anything!

Wish it was that simple, or perhaps you can show me how simple it is. I downloaded Grep for Windows (CLI) and have been trying to execute a number of commands, but I don't think I have it right.

C:\Program Files (x86)\GnuWin32\bin>grep -r password *.txt "C:\Bin"
grep: *.txt: No such file or directory
C:\Bin/Test.txt:password

Though it came up with an error, it was able to locate the text file containing the word "password".

Is there a way to run the search against the volume vs. a specific directory?
 
I think your command is most likely malformed.

With grep on nix systems you generally use it like

grep [options] pattern file/fileglob

example
Code:
-bash-3.2$ echo "password" >> one.txt
-bash-3.2$ echo "password" >> two.txt

-bash-3.2$ grep 'password' /home/josh/temp/*.txt
/home/josh/temp/one.txt:password
/home/josh/temp/two.txt:password
-bash-3.2$

I usually use cygwin for stuff like this on Windows, but that's just because I'm more comfortable with nix-type commands.
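To answer the "whole volume vs. a directory" question above: grep's -r flag recurses from whatever directory you give it, so pointing it at the root of a mount searches the whole volume. A small bash sketch (the /tmp/scan_demo paths are invented for illustration; under cygwin the C: volume would be something like /cygdrive/c):

```shell
# Build a tiny demo tree (paths are invented for illustration)
mkdir -p /tmp/scan_demo/sub
echo "password=hunter2" > /tmp/scan_demo/sub/config.txt
echo "nothing here"     > /tmp/scan_demo/readme.txt

# -r recurse into subdirectories, -i ignore case, -l list file names only
grep -ril 'password' /tmp/scan_demo
# prints /tmp/scan_demo/sub/config.txt
```

Pointing the same command at a volume root works identically; it just takes as long as the disk takes to read.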
 
findstr comes with Windows and does the same thing as grep (mostly). For this example,

Code:
grep -r password *.txt "C:\Bin"

you'd just do:

Code:
findstr /s "password" C:\bin\*.txt

Is there a way to run the search against the volume vs. a specific directory?
What do you mean? byte-wise over the whole disk? Something else?
 
Certainly wouldn't be very speedy hitting a whole disk, depending on how much it is filled.

What exactly are you trying to accomplish beyond the obvious of finding files that have "password" in them?

If you're on a windows system and have indexing turned on, that's not a bad route either.
 
Certainly wouldn't be very speedy hitting a whole disk, depending on how much it is filled.

What exactly are you trying to accomplish beyond the obvious of finding files that have "password" in them?

If you're on a windows system and have indexing turned on, that's not a bad route either.

Due to compliance, we are required to search for password-related items in our Windows environment. Basically, I am trying to create a script that can be executed via Scheduled Task to run at night, generating output to a text file so that we can view and remediate accordingly. So yes, it would have to scan all local drives of a server.

Since "findstr" comes with the MS OS, it's more likely to be approved by Management and found it to work better than Grep, without having to install it.

Is there another way to approach this?
 
Is there another way to approach this?
What's wrong with using findstr? It's not clear what your requirements are -- we still don't know if you mean to look at all files or the whole disk surface (or only the occupied disk surface).

Your requirements might simply be too much. If you have a 3 TB disk drive on a server, and read that drive at 100 megabytes a second, you'll need 30,000 seconds to read the whole drive. That's 8.3 hours.
 
What's wrong with using findstr? It's not clear what your requirements are -- we still don't know if you mean to look at all files or the whole disk surface (or only the occupied disk surface).

Your requirements might simply be too much. If you have a 3 TB disk drive on a server, and read that drive at 100 megabytes a second, you'll need 30,000 seconds to read the whole drive. That's 8.3 hours.

I was just asking if there's another way to tackle this. To answer your question: any file that would contain the keywords (password and/or pw). So one would assume it needs to scan an entire volume for those keywords.

If I'm not answering your question or not making sense, please let me know and I'll try my best to elaborate as much as possible.
 
Are you checking client machines or network shares/file servers? What time frame do you have to complete this within and at what intervals?

mikeblas is asking whether you're checking just the files or the entire disk itself.
 
I was just asking if there's another way to tackle this.
Yes, there are dozens of ways to search a disk drive for information.

If I'm not answering your question or not making sense, please let me know and I'll try my best to elaborate as much as possible.
You've been offered a solution but you have rejected it without saying why you don't like it. You might write your own program; you might take backups of the machine and scan through the backups offline, so the machine's use isn't interrupted. You might index files so you can remember what you've scanned and what you haven't scanned from day-to-day, minimizing your work. If you have lots of servers, you might work out a repository so that you can get a fingerprint of a file and see if it's expected -- if not, scan it. (For example, all computers have FireFox.EXE installed. If it's one of the versions you've previously seen, don't scan it. If it's new, scan it, test it, and record the new fingerprint so no other machine has to scan it again.)

It seems like, if you're concerned with installing grep.exe, you won't be too interested in writing any more elaborate software. But since you're not being clear about what your requirements or limitations are, we are forced to just guess at something that might make you happy.
 
Are you checking client machines or network shares/file servers? What time frame do you have to complete this within and at what intervals?

mikeblas is asking whether you're checking just the files or the entire disk itself.

No client workstations, just file servers. Please see below for the requirements.

Yes, there are dozens of ways to search a disk drive for information.

You've been offered a solution but you have rejected it without saying why you don't like it. You might write your own program; you might take backups of the machine and scan through the backups offline, so the machine's use isn't interrupted. You might index files so you can remember what you've scanned and what you haven't scanned from day-to-day, minimizing your work. If you have lots of servers, you might work out a repository so that you can get a fingerprint of a file and see if it's expected -- if not, scan it. (For example, all computers have FireFox.EXE installed. If it's one of the versions you've previously seen, don't scan it. If it's new, scan it, test it, and record the new fingerprint so no other machine has to scan it again.)

It seems like, if you're concerned with installing grep.exe, you won't be too interested in writing any more elaborate software. But since you're not being clear about what your requirements or limitations are, we are forced to just guess at something that might make you happy.

The requirements:

Create a script to run during off-business hours, between 7pm and 7am
Scan all files on the local volume(s) for keywords like "password", "pw", "PWD", etc.
Exclude certain file types like .mdb, .ldf, .bak, etc.
Output to a text file
View the text file and remediate accordingly

I'm open to suggestions, but due to certain limitations, we can't install 3rd party and/or open source software unless it's on the approved list.
 
I'm open to suggestions, but due to certain limitations, we can't install 3rd party and/or open source software unless it's on the approved list.
Since the "approved list" is your pool of options, give us that list. Otherwise, suggestions that rely on helper utilities would continue to get blacklisted before truly being evaluated.

Alternatively, this topic could be a good starting point for you to push for a policy change to meet new demands and requirements.
 
Since the "approved list" is your pool of options, give us that list. Otherwise, suggestions that rely on helper utilities would continue to get blacklisted before truly being evaluated.

Alternatively, this topic could be a good starting point for you to push for a policy change to meet new demands and requirements.

As of now, based on the discussion I had with my manager, anything that's a Microsoft-supported/approved solution; utilize what's available on the OS.

PowerShell is another utility we can use, but it's not part of Windows Server 2003.

This is what we have so far:

@TITLE PSWD Check
@ECHO OFF
REM Timestamp the start of the run (overwrites any previous results)
@ECHO %DATE%-%TIME% >C:\Result.Out
REM /s recurse subdirectories, /n show line numbers, /p skip binary files,
REM /i case-insensitive; space-separated keywords are OR'd together
findstr /s /n /p /i "password pw pwd" c:\*.* >> C:\Result.Out
@ECHO %DATE%-%TIME% >>C:\Result.Out
exit

Is it possible to exclude certain file extensions when searching w. FINDSTR?
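For comparison (not a findstr feature): on the Unix side, the usual workaround is to let a file-lister do the exclusion and hand only the survivors to the search tool. A bash sketch of that idea, with invented demo paths:

```shell
# Demo files (invented paths)
mkdir -p /tmp/excl_demo
echo "password=abc" > /tmp/excl_demo/notes.txt
echo "password=abc" > /tmp/excl_demo/dump.bak

# find selects the files, excluding the unwanted extensions;
# grep then searches only what survived the filter
find /tmp/excl_demo -type f \
    ! -name '*.mdb' ! -name '*.ldf' ! -name '*.bak' \
    -exec grep -Hin 'password' {} +
# prints only /tmp/excl_demo/notes.txt:1:password=abc
```

The same "filter the file list first, then search" split can be built on Windows with a dir /s /b loop feeding findstr, if findstr itself turns out to have no exclusion switch.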
 
PowerShell is another utility we can use, but it's not part of Windows Server 2003.
True, but PowerShell is supported on Server 2003. MS has a download for it.

Is it possible to exclude certain file extensions when searching w. FINDSTR?
*EDIT* Some options seem possible as well. (Source)

But that should also be weighed against PowerShell and whether its capabilities would be more beneficial.
 
I would set up a Linux VM and, within Linux, map the entire file server to a directory (read-only if you want to be on the extra safe side). Linux has better tools for this, such as grep.

You can use grep to recursively search folders for file names as well as the contents of the files. Check the grep --help page and it will show the parameters. The recursive option is -r (-R also follows symlinks). Look up regex: with extended regexes (-E), the | character acts as an OR, so you can search for all keywords in one command.
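Putting the recursion and the OR together, the one-command search described above would look roughly like this (demo paths invented):

```shell
# Demo tree (invented paths)
mkdir -p /tmp/or_demo
echo "pw: 1234" > /tmp/or_demo/a.txt
echo "harmless" > /tmp/or_demo/b.txt

# -r recurse, -i ignore case, -n line numbers,
# -E extended regex so | acts as OR between the keywords
grep -rinE 'password|pwd|pw' /tmp/or_demo
# prints /tmp/or_demo/a.txt:1:pw: 1234
```

Note that short keywords like "pw" will also match inside longer words, so expect some noise in the output.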

I'm sure there are tools in Windows too, but they'll probably cost extra, be more complex, and require installing stuff on the server, which you probably don't want to do in a production environment.
 
Does findstr cost extra? Why is it more complicated than installing a VM, then Linux inside the VM, and administering both for the rest of the life of the server?
 
While I personally like the general gnu linux utilities better, a VM seems like a lot of overkill. If the two solutions are equal in result and only one is marginally slower, who cares in the end?

You do have to consider maintenance for who may come after you. Will it be a linux guy? If not be sure to document your solution.

If findstr is capable of doing the job (and it is) and it's what's on your approved list - then go for it. I don't know your background/history in IT/Dev work, but sometimes there is reason to stand your ground and go for the "better" (that's subjective) approach - I don't know what your office is like so I can't tell you if it's one of those times or not.

You may want to ask if you can create a process to get software approved, or the ability to install something like perl, python, ruby (though powershell may offer what you need - I'm a linux guy though so I generally try to go with multiplatform tools)

At any rate, findstr seems advanced enough for this (it supports its own regex implementation). It may not support the total range of control and you might need to use powershell or regular batch scripts to write control mechanisms around what files it does and does not check.
 
True, but PowerShell is supported on Server 2003. MS has a download for it.

Yes, but installing the packages on X number of 2003 (32-bit + x64) servers in the environment makes it harder to get pushed through: the number of Change Management forms that need to be approved, getting the application owner/developer for testing and validation, etc. It's an option, but a less likely one; I will certainly bring it to his attention.


*EDIT* Some options seem possible as well. (Source)

But that should also be weighed against PowerShell and whether its capabilities would be more beneficial.

I've been searching but have not seen any exclusion option. I think I may just have to specify which file types need to be searched instead.

I'll keep everyone updated as I progress along.
 
Yes, but installing the packages on X number of 2003 (32-bit + x64) servers in the environment makes it harder to get pushed through: the number of Change Management forms that need to be approved, getting the application owner/developer for testing and validation, etc. It's an option, but a less likely one; I will certainly bring it to his attention.
You only need PowerShell on the machine(s) running the audit, not on every machine being audited. This would also save you from filling out X-1 number of forms :)

Edit: But this also needs to be weighed against the FINDSTR command line and other options suggested by others from a rollout and maintenance perspective.
 
didn't you know that setting up a complete VM for each and every application you'll ever run is the cool thing to do these days? :p

Does findstr cost extra? Why is it more complicated than installing a VM, then Linux inside the VM, and administering both for the rest of the life of the server?
 
Ouch on the change forms. I can create a change and note in it that it will apply to X,Y,Z servers and boom, it's done. I had to dig through change exceptions to find that clause, maybe you have one as well.

As I recently learned, scanning large volumes file by file, line by line for strings of data is nowhere near the same performance as running it right on disk.

Also, if these are file servers - do you have the ability to access the archives/backups. If you can access the previous day's backups all from one machine it may help (depending on the storage medium again).

If it's going to be something you have to do over and over, I would seriously consider the previous suggestion to build a more complete solution. If the initial scan stores the full path and something along the lines of an md5sum, the first scan tells you which files match; anything after that is a fast check (you only compute the md5 rather than searching the whole file). If the md5 changed since the first scan, or if it's a new file, then you know you need to check it again. An exclusion list would also be a nice feature.
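The fingerprint idea can be sketched in a few lines of bash; all the paths and file names here (seen.md5, result.out) are invented for illustration, and a real version would need locking, error handling, and the exclusion list:

```shell
#!/bin/bash
# Incremental keyword scan: only re-search files whose md5 changed
# since the last run. Paths and file names are invented for the demo.
DIR=/tmp/md5_demo
INDEX="$DIR/seen.md5"    # fingerprints recorded on previous runs
OUT="$DIR/result.out"    # accumulated hits
mkdir -p "$DIR/data"
echo "password=x" > "$DIR/data/a.txt"
: > "$INDEX"; : > "$OUT"

scan() {
  find "$DIR/data" -type f | while read -r f; do
    sum=$(md5sum "$f" | cut -d' ' -f1)
    # skip files whose fingerprint was already recorded
    if ! grep -q "^$sum " "$INDEX"; then
      grep -Hin 'password' "$f" >> "$OUT"
      echo "$sum $f" >> "$INDEX"
    fi
  done
}

scan   # first run: a.txt is new, gets searched
scan   # second run: fingerprint unchanged, file skipped
wc -l < "$OUT"   # prints 1 -- the hit was recorded once, not twice
```

The second run does only a checksum pass, which is why this pays off on large, mostly static file servers.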

didn't you know that setting up a complete VM for each and every application you'll ever run is the cool thing to do these days? :p

People actually do follow this, it's amazing. Our operations center was shocked when I said I had a lightweight ETL package and a webserver running on the same box the other day when it was unreachable. Their first answer was "So you run more than one service on this machine" - Yeah, it comes out of my budgets, of course I'm going to use bunkbeds when space is at a premium.
 