Getting SEM Keyword Data From Apache Log Files

The use of adwords with the current form of The Dog Way is difficult because products are supplied through drop ship wholesalers that provide limited inventory and very small margins, but, on the plus side, no need to worry about fulfillment or inventory costs! Because the average gross product might be around $10, and I’m hoping the average margin per order will be around $20, there isn’t much room to bid. The CPCs I’m seeing right are $1.50. meaning a hopeful, break-even conversion rate of 7.5%, which is not the case – yet.

Lack of budget, lack of expected success, and a very limited product suite present some challenges. Couple that with a new domain, no relevant click-through-rate history, and the options of keywords I can profitably bid on became pretty limited.

I had initially planned on targeting very long tail phrases: add in the list of products, clean up the titles, and those titles became the keywords. That hit a wall when it became clear that inventory from Doba could change randomly and that Google would not enter keywords with insufficient search volume into the auction (which means the product names I had planned on targeting). Although, I imagine it’d still be possible to pick up those search queries through broad match somehow.

I did see a value on SEM though to generate a list of keywords to target for SEO,  and though there are other, cheaper ways to get these lists, which I’ll go over later, they don’t give an indication of conversion rates and usability stats.

I ended up making two very simple campaigns targeting ‘Dog Clothes’ and ‘Dog Coats’ and then the adgroups targeted the sub-categories. I put together about 5 keywords per ad group using broad match for each. I set a small budget and then ran it to see what would happen.

Of note:

-Most of the keywords did not end up showing, Google said the keywords were either too similar to other keywords, for example: ‘Affordable Dog Coats’ and ‘Affordable Dog Jackets’ and then that some were too low volume. Out of the roughly 50 keywords I entered, only 3 generated traffic. I wanted to see the actual search queries, but Google claimed there was insufficient volume to show those. This where grep comes in handy again.

I went to the access-logs file mentioned earlier and did a few other commands to specifically get the search queries. The command looked roughly like this, and then I’ll break it down.

grep -E “glcid|aclk” log_file.txt | awk ‘{print $11} | awk -Fq\= ‘{print $2}’ | awk -Fsource= ‘{print $1}’ | sed -e ‘s/%20/ /g’ | sed -e ‘s/”//g’ > ~/search_terms.txt

When reading anything that looks like code from me, please see my general disclaimer that basically says, “I’m a business guy, not a coder.”

The parts

grep -E “glcid|aclk” log_file.txt – Instructs the utility grep to look through the log file for instances of ‘glcid’ or aclk’, the parameters I’ve seen for adwords, and pulls out those lines.

The ‘|’, or pipe, is a way of outputting one commands output into another’s input through.

awk ‘{print $11}’ – awk is another unix utility for working with files, it’s very handy. Awk is very useful for looking at columns and uses whitespace as the column delimiter by default. The ‘{print $11}’ is a command to print just column 11, which, is the referring string, or what URL just sent the user to the landing page.

awk -Fq\= ‘{print $2} – The default column delimiter for awk is whitespace by default, but it’s possible to specify another delimiter using -Fpattern. In this case I used ‘q=’ because thats where Google puts the search query, but notice the ‘\’. The ‘\’ is a way to escape out a character and prevent it from being used as a special character. Once I specified the delimiter as ‘q=’, the referring string gets broken into two. The part I want, with the search query, is in column 2.

awk -Fsource= ‘{print $1}’ – Another use of the delimiter because there is still some part of the URL on the search query I don’t want. I basically used the same trick as a move to get down to just the search query.

sed -e ‘s/%20/ /g’ – Sed is a unix stream editor, another handy utility. The -e tells sed to edit the stream. The next part is essentially a find and replace. I am replacing the ‘%20’ with a ‘space’ to clean up the formatting. The ‘g’ at the end is a specification to make it global, or on all instances. The basic sed replace structure is this:

-sed ‘s/string_to_replace/new_string_to_enter/numberofinstancestoreplace’

sed -e ‘s/”//g’ – That gets rid of any quotation marks on the string, and we’re left with “ “.

> ~/search_terms.txt – Instead of piping the previous output to another command, we’re redirecting the output to a file. The ~/ specifies my home directory and then the file name is search_terms.txt.

And now we have a list of keywords looking like this:

benfica dog clothing

cheap pet clothes for small dog

cheap designer dog clothes

cheap designer dog clothes

dog clothes

clothyes for dogs



I’ll write another blog post about a couple of simple ways to clean up the file, you can you put it in excel to dedup, sort, count, et cetera.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: