Friday, October 30, 2009

Download search results from a CGI/CFM form

Get the second column, keep the IDs that match the pattern, then sort and remove duplicates:

cut -f 2 FLAT_RCL.txt | grep '^0[2-9][A-X]' | sort -u > dest.txt
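
To see what that pipeline does, here is a minimal sketch on made-up data. The sample rows and IDs are hypothetical; the only assumption about FLAT_RCL.txt is that it is tab-separated with the ID in the second column.

```shell
# Hypothetical sample data: tab-separated, ID in column 2.
tmpdir=$(mktemp -d)
printf '1\t09V123\tFORD\n2\t09V123\tFORD\n3\t05E001\tGM\n4\tXX0000\tNA\n' > "$tmpdir/sample.txt"

# Keep column 2, keep only IDs matching the pattern, sort and de-duplicate.
cut -f 2 "$tmpdir/sample.txt" | grep '^0[2-9][A-X]' | sort -u > "$tmpdir/dest.txt"

# Prints 05E001 and 09V123, one per line.
cat "$tmpdir/dest.txt"
```

The duplicate 09V123 collapses to one line, and XX0000 is dropped because it does not start with 0 followed by a digit from 2 to 9.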


You can use a command-line browser to download a page:
lynx -dump "$url" > file.txt


More importantly, you can use curl.
Suppose a page has a search form and you want to download all the results of a query.

You can use curl to submit the form fields as a POST request:
curl -d "SearchType=QuickSearch&rcl_id=$x&summary=true&Search=Search" "http://www-odi.nhtsa.dot.gov/recalls/results.cfm" > "$x.html"

The key parameter is rcl_id, which holds the ID you are searching for (passed here in the shell variable $x).
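
A small sketch of how the request could be wrapped in functions so that only the ID changes between calls. The helper names build_query and fetch_recall and the ID 09V123 are hypothetical; the form fields and URL are the ones from the curl command above.

```shell
# Build the POST body for one ID (hypothetical helper name).
build_query() {
    printf 'SearchType=QuickSearch&rcl_id=%s&summary=true&Search=Search' "$1"
}

# POST the query and save the response as <id>.html (hypothetical helper name).
fetch_recall() {
    curl -sS -d "$(build_query "$1")" \
        'http://www-odi.nhtsa.dot.gov/recalls/results.cfm' -o "$1.html"
}

# Show the body that would be sent for a made-up ID.
build_query 09V123
echo

# Actual download (needs network access):
# fetch_recall 09V123
```

Printing the body first is a cheap way to check the parameters before sending any requests.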

For more information about curl, refer to
http://curl.haxx.se/docs/manual.html




How to read lines from a file:
N=0
while IFS= read -r LINE; do
    N=$((N+1))
    x=$LINE
    # Save each result page under ~/fay, named after the ID
    curl -d "SearchType=QuickSearch&rcl_id=$x&summary=true&Search=Search" "http://www-odi.nhtsa.dot.gov/recalls/results.cfm" > ~/fay/"$x.html"
done < ./dest.txt
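
One detail about a counter such as N in a `while read` loop: in bash with default settings, a loop fed through a pipe runs in a subshell, so the counter is lost when the loop ends. The sketch below compares the two forms on a hypothetical two-line file; the variable names are made up for the illustration.

```shell
# Hypothetical input file with two IDs.
tmpfile=$(mktemp)
printf '05E001\n09V123\n' > "$tmpfile"

# Piped loop: in bash (default settings) the loop body runs in a subshell,
# so n_pipe is still 0 afterwards.
n_pipe=0
cat "$tmpfile" | while IFS= read -r line; do n_pipe=$((n_pipe+1)); done

# Redirected loop: runs in the current shell, so the count survives.
n_redirect=0
while IFS= read -r line; do n_redirect=$((n_redirect+1)); done < "$tmpfile"

echo "piped: $n_pipe, redirected: $n_redirect"
```

Shells such as zsh or ksh run the last pipeline stage in the current shell, so the piped count may differ there; the redirected form behaves the same everywhere.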
