June 22, 2009

Google word popularity (swrnum)

When Google returns a page with search results, the page shows an approximate number of hits, something like:

Results 1 - 10 of about 61,600,000 for banana [definition]. (0.13 seconds)

The page source reveals that the swrnum variable holds that approximate number.


This makes it easy to extract the (approximate) word popularity from the saved page:

count=`egrep -io "swrnum=([0-9]+)\"" ${temp_file} | sed 's/swrnum=//; s/"//'`
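To see what that pipeline does, here is a minimal sketch run against a made-up line instead of a real results page. The sample string and the helper name extract_count are hypothetical; only the swrnum=<digits>" shape is taken from the pattern above. It uses grep -E, which is equivalent to egrep.

```shell
#!/bin/bash
# Hypothetical fragment of page source, made up for illustration;
# only the  swrnum=<digits>"  shape comes from the pattern above.
sample='<a href="/search?q=banana&swrnum=61600000">'

# extract_count is an illustrative helper, not part of the original script
extract_count() {
    # -o prints only the matched text, -i ignores case;
    # sed then strips the variable name and the closing quote
    echo "$1" | grep -Eio 'swrnum=([0-9]+)"' | sed 's/swrnum=//; s/"//'
}

count=$(extract_count "$sample")
echo "count: ${count}"    # prints: count: 61600000
```

If the page contains no swrnum value, the pipeline prints nothing and count ends up empty, so an empty count is worth checking for before using the number.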

Here is a simple bash script:


#!/bin/bash

# NOTE: the variable assignments and the search URL did not survive in this
# listing; the three values below are assumptions
src_file="$1"                                  # file with search terms
temp_file="google_search.tmp"                  # assumed temp file name
search_url="http://www.google.com/search?q="   # assumed search URL

# if the file with search terms is not given -
# print the usage and exit
if [ -z "$src_file" ]; then
    echo "usage: $0 <search_terms_file>"
    echo ""
    echo " e.g.: $0 search_terms.txt"
    echo " where search_terms.txt has one term per line"
    echo " something like: "
    echo " aardvark"
    echo " tubulidentata"
    echo " durer"
    echo " random bs"
    echo " loose multiword search"
    echo " \"strict multiword search\""
    exit 1
fi

# print the name of the search terms file
echo "file with search terms: '$src_file'"

# the main loop - read the search terms line by line
while read -r line; do
    # replace spaces with '+' to build the query string
    line_ext=`echo "${line}" | sed 's/ /+/g'`

    # run google search, save results into temp_file
    wget -N -t2 -e robots=off -a wget.log -T 200 -O "${temp_file}" \
        -U "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040613 Firefox/0.8.0+" \
        "${search_url}${line_ext}"

    # confirm the search term (taken from the page title;
    # the <title> tag is reconstructed, it was stripped from the listing)
    search_term=`egrep -io "<title>(.+) - Google Search" "${temp_file}" | sed 's/<title>//; s/ - Google Search//'`

    # find the total count in the swrnum variable
    count=`egrep -io "swrnum=([0-9]+)\"" "${temp_file}" | sed 's/swrnum=//; s/"//'`

    # print results
    echo "${search_term}, ${count}"
done < "$src_file"

Sample output:

cat search_terms.txt

oracleaid
wordpress

run_search search_terms.txt

file with search terms: 'search_terms.txt'
oracleaid, 167
wordpress, 309000000
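
The script turns each search term into a query string by replacing spaces with plus signs. That step can be tried on its own with a small sketch. The helper name to_query is hypothetical, and the sketch assumes nothing beyond that one sed substitution: other characters, such as the double quotes of a strict search, are passed through unchanged rather than percent-encoded.

```shell
#!/bin/bash
# to_query is an illustrative helper wrapping the same sed substitution
# the script applies to each line of the search terms file
to_query() {
    echo "$1" | sed 's/ /+/g'
}

to_query 'aardvark'                    # prints: aardvark
to_query 'loose multiword search'      # prints: loose+multiword+search
to_query '"strict multiword search"'   # prints: "strict+multiword+search"
```

Single-word terms pass through untouched, so the same code path handles every line of the search terms file.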


