Developing Goals

Before leaving my job to work full time on a startup, I wrote down a list of other things I’d like to learn and do with the time freed up by not working a 9 to 5. My days before going full time on the startup had been: wake up, work on the idea, go to work, come home, and work some more, plus some time thrown in for going to the gym and eating. I didn’t leave much time to develop other skills, volunteer, or pick up a hobby.

My days now aren’t significantly different: wake up, work, work some more, go to sleep. But there is more room to spend 30 to 60 minutes learning a skill, hanging out, or doing a hobby. With the newfound time (saved from not having to commute, sit in meetings, or prepare reports), here are some of the larger goals and then specific projects I want to pursue:

Larger Goals:

  • Learn more technical skills – Learn to set up a website, build emails, pull in data from various sources, and perform calculations on it. I have firsthand experience with how technical skills can significantly boost the productivity of marketing personnel, and I want to keep building on that.
  • Network & Blog – Several years ago I used to get out and network a decent amount, and blog; I just want to get back in the habit.
  • Pick up some old and new habits – Bouldering, stand-up paddle boarding, and maybe a martial art
  • Languages – Continue to develop French and learn Vietnamese
  • Stay Healthy/Get Healthier – Speaks for itself

That’s it; I’m trying to keep the high-level goals simple.


Getting Restarted

I left my job May 10th to start a company with two co-founders: Claire Vo and Dave McLain. Before I went full time, I came up with a list of things I wanted to accomplish with the available time; one of those items was to start blogging. It’s been 24 days since May 10th and there are no blog posts. Originally, I had wanted to write posts that were intelligent, creative, and totally awesome; 24 days in, I just want to get things moving along.

If I don’t have anything insightful to write, I’ll just share some of the things I learned today.

I’m working on a prototype for a product idea I’ve had since 2009. My goal is to get a working website up within the next two weeks. It involves Python and Web Scraping.

Things Learned Today:

I’m out


Preventing Wasted Crawls Part 1 of Many

Googlebot loves to crawl: it’ll crawl anything that looks like a URL, whether it finds it in JavaScript, in HTML, or in the text on the page. Great for Google, and probably great for web users because Google learns more about the web, but it can lead to wasted crawls for site owners.

As I mentioned in a post about initial SEO decisions for The Dog Way, I blocked all category pages. I’ve loosely monitored Google’s crawls and found Googlebot crawling any available URL: pagination, sorting by size, price, color, et cetera.


(For those who look at log files, you’ll notice the IPs aren’t Googlebot IPs. We’re using Cloudflare to try to speed up the site, and all requests come through their IPs.)

Now I need to find all of the URLs I should have blocked, but didn’t. A very, very simple unix command does it: wget -O- url | grep -o urlpath.*\" | sort | uniq

(What I actually ran: wget -O- | grep -o dog-boots-and-shoes.*\" | sort | uniq)

Here’s what the output looks like:


To break that down:

wget -O- url: wget is a program to download files. By default it’ll save the file in the current directory; the -O- tells it to write the download to standard output instead, so it can be piped into the next command. The url is the URL to download.

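grep -o urlpath.*\" : urlpath is the part of the URL after the domain. In this case it was dog-boots-and-shoes.*\" (the backslash escapes the quote so the shell passes a literal " to grep instead of treating it as the start of a quoted string). The .* matches whatever follows the path, up to the last double quote on the line. The -o outputs just the text that matches and nothing else.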

sort | uniq : sorts the lines and then just outputs the unique ones.
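Here is a minimal sketch of the grep, sort, uniq part that can be run without touching a live site. The sample HTML and the function names sample_html and extract_urls are made up for illustration; sample_html stands in for what wget -O- url would print. One deliberate difference from the command above: it uses [^"]* instead of .*\" so that each match stops at the first closing quote.

```shell
#!/bin/sh
# Hypothetical stand-in for the HTML that `wget -O- url` would print.
sample_html() {
cat <<'EOF'
<a href="/dog-boots-and-shoes?p=2">2</a>
<a href="/dog-boots-and-shoes?dir=asc">Sort</a>
<a href="/dog-boots-and-shoes?p=2">Next</a>
<a href="/dog-coats?p=2">Coats</a>
EOF
}

# -o prints only the matched text; [^"]* stops at the first closing quote,
# so each match is a single URL. sort | uniq removes the duplicates.
extract_urls() {
  grep -o 'dog-boots-and-shoes[^"]*' | sort | uniq
}

sample_html | extract_urls
```

The duplicate ?p=2 link collapses to one line, and the /dog-coats link is ignored because it does not match the path.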

What were the key takeaways from that? I need to improve the handling of pagination and prevent Google from crawling URLs with the limit, dir=, size=, and color= parameters. The /p/ URLs are products and are already blocked in robots.txt. The really simple change is to just block them all in robots.txt. Other options are configuring URL parameters in Google Webmaster Tools, trying to block the links through rel="nofollow", or, for pagination, using rel="next" and rel="prev". For right now, I’m just going with robots.txt because it’s the fastest way to fix the crawls.

Initial SEO Decisions for The Dog Way

Quick Discussion of SEO For The Dog Way

If you spend a few minutes looking around The Dog Way, you’ll notice there is almost no attempt to optimize the site for search engines. Further, almost all of the content is blocked in robots.txt, and in fact, until three weeks ago, the entire site was blocked. There are two reasons.

The first is that since the products come from drop shippers found through there is no original content on the site.

The second is that the main focus of customer acquisition will be through social media channels, hopefully.

Having said that, there will still be some attempts at SEO, and the plan looks like this.

1. Allow bots to crawl the home page, about us, and blog.

2. Do some keyword research to decide which keywords to target through category and sub-category pages.

3. Write entertaining, good content on the category pages, and then the sub-category pages. As each page gets content, I will unblock it in robots.txt and then submit it to Google to crawl.

4. Focus on image and video optimization after that. Dogs are cute, pictures of dogs are cute, people like clicking on them, so I’m hopeful about the last tactic.

One question that comes up with blocking an entire site in robots.txt is how long it takes to get re-included and re-crawled. It turns out that Google still checks robots.txt every day, even when the site is blocked. You can see this by going into the logs and looking at the crawls.

Hosting for The Dog Way is done through SimpleHelix, and they keep at least one day of log files on the shared Apache server under the symbolic link access-logs. Here’s how I monitored the crawl activity.

Step 1: cd access-logs

Step 2: nice grep "Googlebot" | more

This allows me to pull out all page requests from Googlebot and then page through them a few at a time. SimpleHelix updates the past 24 hours and always stores it in ‘’, so there was no need to specify a date or anything else; this isn’t always the case. Grep is a unix utility for searching through files for lines that match a pattern.
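As a rough sketch of the same idea, the snippet below filters a few made-up log lines for Googlebot and prints only the requested paths. The log lines, IP addresses, and function names are hypothetical, and it assumes the common Apache "combined" log layout, where the request path is the seventh whitespace-separated field.

```shell
#!/bin/sh
# Hypothetical log lines in Apache "combined" format (not real traffic).
sample_log() {
cat <<'EOF'
192.0.2.10 - - [03/Jun/2012:10:00:01 -0500] "GET /robots.txt HTTP/1.1" 200 120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
203.0.113.7 - - [03/Jun/2012:10:00:09 -0500] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 6.1)"
192.0.2.10 - - [03/Jun/2012:10:01:30 -0500] "GET /dog-coats?p=2 HTTP/1.1" 200 8300 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
EOF
}

# Keep the lines whose user agent mentions Googlebot, then print
# field 7, which is the requested path in this log layout.
googlebot_paths() {
  grep 'Googlebot' | awk '{print $7}'
}

sample_log | googlebot_paths
```

Seeing /robots.txt in that list is the sign that Googlebot is still checking the file.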

I unblocked Google from the homepage about three weeks ago and they started crawling other links right away. It took about 3 days for the meta content to show up for TheDogWay and it now ranks number 1 for ‘thedogway’ but just page one for ‘The Dog Way’.

Getting SEM Keyword Data From Apache Log Files

The use of AdWords with the current form of The Dog Way is difficult because products are supplied through drop ship wholesalers that provide limited inventory and very small margins. On the plus side, there’s no need to worry about fulfillment or inventory costs! Because the average gross product might be around $10, and I’m hoping the average margin per order will be around $20, there isn’t much room to bid. The CPCs I’m seeing right now are $1.50, meaning a hopeful break-even conversion rate of 7.5% ($1.50 per click divided by $20 per order), which is not the case – yet.
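The break-even number is easy to recompute when the CPC or margin changes. This is a small sketch using awk for the arithmetic; the function name break_even is made up, and the two inputs are the figures quoted above.

```shell
#!/bin/sh
# Break-even conversion rate = cost per click / margin per order.
# LC_ALL=C keeps the decimal separator a period regardless of locale.
break_even() {
  LC_ALL=C awk -v cpc="$1" -v margin="$2" \
    'BEGIN { printf "%.1f%%\n", 100 * cpc / margin }'
}

# $1.50 CPC and a hoped-for $20 margin per order, as in the post.
break_even 1.50 20
```

With those inputs it prints 7.5%, so roughly one click in thirteen has to turn into an order just to cover the ad spend.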

Lack of budget, lack of expected success, and a very limited product suite present some challenges. Couple that with a new domain and no relevant click-through-rate history, and the options of keywords I can profitably bid on become pretty limited.

I had initially planned on targeting very long tail phrases: add in the list of products, clean up the titles, and use those titles as the keywords. That hit a wall when it became clear that inventory from Doba could change randomly and that Google would not enter keywords with insufficient search volume into the auction (which meant the product names I had planned on targeting). Although, I imagine it’d still be possible to pick up those search queries through broad match somehow.

I did see value in SEM, though, as a way to generate a list of keywords to target for SEO. There are other, cheaper ways to get these lists, which I’ll go over later, but they don’t give an indication of conversion rates and usability stats.

I ended up making two very simple campaigns targeting ‘Dog Clothes’ and ‘Dog Coats’, with ad groups targeting the sub-categories. I put together about 5 keywords per ad group using broad match for each. I set a small budget and then ran it to see what would happen.

Of note:

- Most of the keywords did not end up showing. Google said some keywords were too similar to other keywords (for example, ‘Affordable Dog Coats’ and ‘Affordable Dog Jackets’) and that others were too low volume. Out of the roughly 50 keywords I entered, only 3 generated traffic. I wanted to see the actual search queries, but Google claimed there was insufficient volume to show those. This is where grep comes in handy again.

I went to the access-logs directory mentioned earlier and ran a few more commands to get the search queries specifically. The command looked roughly like this, and then I’ll break it down.

grep -E "gclid|aclk" log_file.txt | awk '{print $11}' | awk -Fq\= '{print $2}' | awk -Fsource= '{print $1}' | sed -e 's/%20/ /g' | sed -e 's/"//g' > ~/search_terms.txt

When reading anything that looks like code from me, please see my general disclaimer that basically says, “I’m a business guy, not a coder.”

The parts

grep -E "gclid|aclk" log_file.txt – Instructs the utility grep to look through the log file for instances of ‘gclid’ or ‘aclk’, the parameters I’ve seen for AdWords, and pulls out those lines. The -E turns on extended regular expressions, which is what lets the ‘|’ inside the quotes mean “or”.

The ‘|’, or pipe, is a way of sending one command’s output to another command’s input.

awk '{print $11}' – awk is another unix utility for working with files, and it’s very handy for looking at columns; it uses whitespace as the column delimiter by default. The '{print $11}' is a command to print just column 11, which is the referring string, or the URL that sent the user to the landing page.

awk -Fq\= '{print $2}' – Awk’s column delimiter is whitespace by default, but it’s possible to specify another delimiter using -Fpattern. In this case I used ‘q=’ because that’s where Google puts the search query. Notice the ‘\’: it is a way to escape a character and prevent it from being used as a special character (the shell would also accept a plain -Fq= here). Once the delimiter is ‘q=’, the referring string gets broken into two, and the part I want, with the search query, is in column 2.

awk -Fsource= '{print $1}' – Another use of the delimiter, because there is still some part of the URL after the search query that I don’t want. I used the same trick again to get down to just the search query.

sed -e 's/%20/ /g' – Sed is a unix stream editor, another handy utility. The -e tells sed that the next argument is the editing command to run. That command is essentially a find and replace: I am replacing each ‘%20’ with a space to clean up the formatting. The ‘g’ at the end makes it global, so it applies to every instance on the line. The basic sed replace structure is this:

sed 's/string_to_replace/new_string_to_enter/which_instances_to_replace'

sed -e 's/"//g' – That gets rid of any quotation marks in the string, and we’re left with just the search query.

> ~/search_terms.txt – Instead of piping the previous output to another command, we’re redirecting the output to a file. The ~/ specifies my home directory and then the file name is search_terms.txt.
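To try the pieces without a real log file, here is a minimal sketch that runs a similar pipeline over two made-up log lines. The log lines, the referrer URL, and the function names are hypothetical, and real referrer strings will look different. It also swaps one step: it splits on ‘&’ rather than ‘source=’, so the query is cut off at the next URL parameter whatever that parameter is called.

```shell
#!/bin/sh
# Two hypothetical log lines: one ad click (gclid in the request,
# search query in the referrer) and one ordinary visit.
sample_ad_log() {
cat <<'EOF'
203.0.113.7 - - [03/Jun/2012:10:00:01 -0500] "GET /dog-coats?gclid=EXAMPLE HTTP/1.1" 200 5120 "http://www.google.com/aclk?sa=l&q=cheap%20dog%20coats&source=web" "Mozilla/5.0"
203.0.113.8 - - [03/Jun/2012:10:02:13 -0500] "GET /dog-coats HTTP/1.1" 200 5120 "-" "Mozilla/5.0"
EOF
}

extract_terms() {
  grep -E 'gclid|aclk' |        # keep only the ad-click lines
    awk '{print $11}' |         # column 11 is the quoted referrer
    awk -F'q=' '{print $2}' |   # everything after q=
    awk -F'&' '{print $1}' |    # cut at the next URL parameter
    sed -e 's/%20/ /g' -e 's/"//g'   # decode spaces, drop stray quotes
}

sample_ad_log | extract_terms
```

Only the first line survives the grep, and what comes out the other end is the search query: cheap dog coats.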

And now we have a list of keywords looking like this:

benfica dog clothing

cheap pet clothes for small dog

cheap designer dog clothes

cheap designer dog clothes

dog clothes

clothyes for dogs



I’ll write another blog post about a couple of simple ways to clean up the file, or you can put it in Excel to dedup, sort, count, et cetera.
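In the meantime, the dedup and count can be done on the command line too. This is just one possible sketch, not necessarily the clean-up I’ll end up writing about; the function names are made up, and the input is the list of search terms shown above.

```shell
#!/bin/sh
# The search terms listed above, standing in for ~/search_terms.txt.
search_terms() {
cat <<'EOF'
benfica dog clothing
cheap pet clothes for small dog
cheap designer dog clothes
cheap designer dog clothes
dog clothes
clothyes for dogs
EOF
}

# sort groups identical terms together, uniq -c counts each group,
# sort -rn puts the most frequent term first, and awk rewrites each
# line as "count<TAB>term" so it pastes cleanly into a spreadsheet.
count_terms() {
  sort | uniq -c | sort -rn |
    awk '{count = $1; $1 = ""; print count "\t" substr($0, 2)}'
}

search_terms | count_terms
```

The first line of output is the term that showed up twice, ‘cheap designer dog clothes’, with a count of 2.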



Blog Intro

Welcome to the first post of Ad Free Marketing! This blog is for online marketers, or for people that like short, stilted sentences mixed in with a few typos, grammatical errors, and the occasional run on sentence. Take your pick.

I, with my incredibly talented partner, recently launched a side-passion to experiment with social commerce, keep exploring new areas of online marketing, and more importantly, make a more enjoyable shopping experience. This blog will document the steps I take to promote customer acquisition and retention. If we’re lucky, Claire will start talking about the design and UX decisions she makes for; she is also the one doing all of the front and back-end coding.

Why You Might Want To Read This Blog

The steps I take to promote The Dog Way will be chronicled here, including successes and failures. My goal with the blog is to share what works, what doesn’t, and to provide data on how the decision was made. The hope is that each blog post will provide some value to the average online marketer.

Who I Am

You can check out my LinkedIn Profile or my Indeed Resume here to get a sense of where I’ve worked, but at the end of the day, I’m an Austinite into web commerce, data stuff, and startups. I’ve worked in business development, product, analytics, and online marketing roles.

Feel free to reach out to me with questions through my Indeed Resume or leave a comment.


EJ Lawless