oracle aide

June 22, 2009

Google word popularity (swrnum)

Filed under: meme mining — Tags: — oracleaide @ 1:10 pm

When Google returns a page of search results, the page shows an approximate number of hits.
Something like:

Results 1 - 10 of about 61,600,000 for banana [definition]. (0.13 seconds)

The page source reveals that the swrnum variable holds the same
approximate number:

swrnum=61600000


This makes it easy to find the (approximate) popularity of a word: grep the saved page for swrnum and strip everything but the digits.

count=`egrep -io "swrnum=([0-9]+)\"" ${temp_file} | sed 's/swrnum=//; s/"//'`
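As a minimal sketch of that extraction, assuming the saved page still contains a swrnum=<digits>" fragment, the one-liner can be wrapped in a function and tried on a made-up line of HTML. The function name extract_swrnum and the sample fragment are assumptions for illustration, not part of the original script.

```shell
#!/bin/bash

# extract_swrnum FILE
# Print the digits of the first swrnum=<digits>" match found in FILE.
# (hypothetical helper name, used here only for illustration)
extract_swrnum() {
    grep -Eio 'swrnum=[0-9]+"' "$1" | head -n 1 | sed 's/^[^=]*=//; s/"$//'
}

# A made-up fragment standing in for a saved results page (assumption).
sample_file=$(mktemp)
printf '%s\n' '<a href="/search?q=banana&swrnum=61600000">more</a>' > "$sample_file"

extract_swrnum "$sample_file"    # prints 61600000
rm -f "$sample_file"
```

If the page holds no such fragment, the function prints nothing, so an empty result is the signal that the markup has changed.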

Here is a simple bash script:

#!/bin/bash
src_file=$1
temp_file="temp.txt"

# if no file with search terms was given -
# print the usage and exit

if [ -z "$src_file" ]; then
echo "usage: ./run_search.sh <search_terms_file>"
echo ""
echo " e.g.: ./run_search.sh search_terms.txt"
echo " where search_terms.txt has one term per line"
echo " something like: "
echo " aardvark"
echo " tubulidentata"
echo " durer"
echo " random bs"
echo " loose multiword search"
echo " \"strict multiword search\""
exit 1
fi

# print the name of the search terms file
echo "file with search terms: '${src_file}'"

# the main loop - read the search terms line by line
cat "$src_file" | while read -r line
do
# replace spaces with '+' so the term can go into the query string
line_ext=`printf '%s\n' "${line}" | sed 's/ /+/g'`
# run google search, save results into temp_file
wget -N -t2 -e robots=off -awget.log -T 200 -O "${temp_file}" -U "Mozilla/5.0(Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040613 Firefox/0.8.0+" "http://www.google.com/search?q=${line_ext}"

# confirm the search term
search_term=`egrep -io "<title>(.+) - Google Search" "${temp_file}" | sed 's/<title>//; s/ - Google Search//'`

# find the total count in the swrnum variable
count=`egrep -io "swrnum=([0-9]+)\"" ${temp_file} | sed 's/swrnum=//; s/"//'`

# print results
echo "${search_term}, ${count}"
done
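The script only turns spaces into '+', so terms containing characters such as '&' or double quotes go into the URL as they are. A hedged sketch of a fuller encoder in plain bash follows; the function name urlencode is an assumption for illustration, and it is only exercised here on ASCII input.

```shell
#!/bin/bash

# urlencode STRING
# Percent-encode STRING for use in a query string: letters, digits and
# . ~ _ - pass through, a space becomes '+', anything else becomes %XX.
# (hypothetical helper, not part of the original script)
urlencode() {
    local LC_ALL=C          # plain ASCII ranges in the case patterns below
    local s=$1 out='' c i
    for (( i = 0; i < ${#s}; i++ )); do
        c=${s:i:1}
        case $c in
            [a-zA-Z0-9.~_-]) out+=$c ;;
            ' ')             out+='+' ;;
            *)               out+=$(printf '%%%02X' "'$c") ;;
        esac
    done
    printf '%s\n' "$out"
}

urlencode 'loose multiword search'       # prints loose+multiword+search
urlencode '"strict multiword search"'    # prints %22strict+multiword+search%22
```

In the loop above, the line that builds line_ext could then call urlencode "${line}" instead of the sed substitution.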

Sample output:

cat search_terms.txt
oracleaid
wordpress

./run_search.sh search_terms.txt

file with search terms: 'search_terms.txt'
oracleaid, 167
wordpress, 309000000
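Since each result line has the form "term, count", the output can be ranked by popularity with sort. This is a small sketch that assumes the terms themselves contain no commas; the function name rank_by_count is made up for illustration.

```shell
#!/bin/bash

# rank_by_count
# Read "term, count" lines on stdin and print them with the largest count first.
# (hypothetical helper; assumes no commas inside the search terms)
rank_by_count() {
    sort -t, -k2,2nr
}

printf '%s\n' 'oracleaid, 167' 'wordpress, 309000000' | rank_by_count
# prints:
# wordpress, 309000000
# oracleaid, 167
```

When piping the script's real output, the leading "file with search terms" line would need to be dropped first, for example with tail -n +2.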
