Keyword Categorisation for PPC Search Engine Marketing

it’s strange how you can go years without building a solution for a manual task you do all the time, even when it would be easy to build.
for me, that task was categorising keywords. i have categorised keywords manually for far too long, probably tens of thousands of them. often i cut corners: i would just wrap keywords into an ad group according to how i had generated them and run a quick sense check that they were in the right place.

it turns out that the way i was doing this was actually quite straightforward: i look at the keyword and decide, based on a few key phrases, which category to map that keyword to. let’s take an example. the company i (hypothetically) work for is an airline with the name paddyjet:

  • i have brand terms, eg: paddyjet, paddy jet and so on.
    everything that has these words or ‘paddy’ in it will be a brand term, with the possible exception of keywords that contain ‘review’ or ‘compare’. maybe i also want to exclude ‘contact’ and ‘customer service’, as those keywords belong in another category called ‘paddyjet cs terms’
  • i have competitor terms that i may or may not want to bid on: british airways, ba, delta, … i would probably want to create a rule with these other brand names and put them in another category, again with the exception of ‘review’ and ‘compare’ terms
  • there are the ‘review’ and ‘compare’ type terms
  • budget type terms will probably have the terms ‘cheap’, ‘discount’, ‘bargain’, ‘low cost’ and so on
  • cs terms should contain our brand name plus words like ‘customer service’, ‘contact’, ‘complaints’…
in short, there are rules in your mind that make you decide quite instantly where to put terms. these rules are:
  • must include this word
  • must exclude these words
  • must include these words together with these other words
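to make the three kinds of rule concrete, here is a minimal sketch of the idea (not the script itself). the function name matches and its parameters are made up for illustration, and it assumes a plain substring check, which means a short word like ‘ba’ would also match ‘bargain’:

```python
def matches(keyword, include_any, exclude=(), include_also=()):
    """Hypothetical helper: does a keyword satisfy one rule?

    include_any  - at least one of these words must appear
    exclude      - none of these words may appear
    include_also - if given, at least one of these must appear as well
    """
    text = keyword.lower()
    if not any(word in text for word in include_any):
        return False
    if any(word in text for word in exclude):
        return False
    if include_also and not any(word in text for word in include_also):
        return False
    return True


brand = ["paddyjet", "paddy jet", "paddy"]
print(matches("paddyjet flights", brand, exclude=["review", "compare"]))
print(matches("paddyjet review", brand, exclude=["review", "compare"]))
print(matches("paddyjet contact", brand, include_also=["contact", "customer service"]))
```

the first and third calls match, the second does not because of the excluded word ‘review’.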
so i wrote a little python script to categorise keywords based on your rules. click on the link to go to my website and get the file and some samples.
it needs the list of keywords in a text file (default: keywords.txt), a list of noise words to remove (default: noiseWords.txt; the file may be empty, but it has to exist) and the categorisation rules file (default: keywordCategories.txt).
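as a rough illustration of the noise-word step, here is a small sketch. it assumes one entry per line in the text files and whitespace-separated words; the helper names load_lines and remove_noise are hypothetical and not taken from the script:

```python
def load_lines(path):
    """Read a text file and return its non-empty lines, stripped."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def remove_noise(keyword, noise_words):
    """Drop every word of the keyword that is listed as a noise word."""
    noise = {word.lower() for word in noise_words}
    kept = [word for word in keyword.split() if word.lower() not in noise]
    return " ".join(kept)


# with real files this could be: noise = load_lines("noiseWords.txt")
noise = ["to", "the", "in"]
print(remove_noise("cheap flights to the dublin", noise))
```

an empty noise list simply leaves every keyword unchanged, which fits the requirement that the file exists but may be empty.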
the rules file needs to be marked up and structured in an ordered way:
  • more important rules need to go at the top
    eg: paddyjet tickets. if you would rather have this in ‘brand terms’, put the brand rules first. if you would rather have it categorised as ‘ticket terms’, put the ticket rule first
  • one line per rule – no exceptions
  • start the line with the category name, followed by a colon (:)
  • words that mean the keyword does not fit in this category need to be in curly brackets ({…})
  • words that need to appear in combination with the term need to be in square brackets ([…])
  • you can create combinations that glue a required word directly onto the search term.
    eg: you have vanity urls that you wish to treat differently from brand terms (‘flypaddyjet’, ‘premierpaddyjet’ or ‘paddyjetbusiness’). simply put a tilde at the position where the term should go. eg:
    vanity urls: [premier~, ~premier] paddyjet, tickets, bookings, paddy
    will create the word combinations:
    premierpaddyjet, premiertickets, premierbookings, premierpaddy, paddyjetpremier, ticketspremier, …
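the markup above can be read with a few lines of python. this is only a sketch of how such a line could be parsed, assuming comma-separated words inside and outside the brackets; parse_rule and expand_tilde are hypothetical names, not functions from the script:

```python
import re


def split_words(text):
    """Split a comma-separated string into clean lowercase words."""
    return [word.strip().lower() for word in text.split(",") if word.strip()]


def parse_rule(line):
    """Split 'category: {excluded} [combined] term, term' into its parts."""
    category, _, body = line.partition(":")
    exclude = split_words(",".join(re.findall(r"\{([^}]*)\}", body)))
    combine = split_words(",".join(re.findall(r"\[([^\]]*)\]", body)))
    # whatever is left once the bracketed parts are removed are the terms
    terms = split_words(re.sub(r"\{[^}]*\}|\[[^\]]*\]", "", body))
    return {"category": category.strip(), "exclude": exclude,
            "combine": combine, "terms": terms}


def expand_tilde(patterns, terms):
    """Put every term at the tilde position of every tilde pattern."""
    return [pattern.replace("~", term)
            for pattern in patterns if "~" in pattern
            for term in terms]


rule = parse_rule("vanity urls: [premier~, ~premier] paddyjet, tickets, bookings, paddy")
print(rule["category"])
print(expand_tilde(rule["combine"], rule["terms"]))
```

for the vanity url line this gives the eight combinations, starting with premierpaddyjet and ending with paddypremier.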
finally, anything that can’t be categorised is labelled ‘unknown’.
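the effect of rule order and the ‘unknown’ fallback can be sketched like this. it assumes the rules are already parsed into (category, include words, exclude words) tuples and that the first matching rule wins; the function name categorise is made up for this example:

```python
def categorise(keyword, rules):
    """Return the category of the first rule that matches, else 'unknown'."""
    text = keyword.lower()
    for category, include, exclude in rules:
        has_required = any(word in text for word in include)
        has_excluded = any(word in text for word in exclude)
        if has_required and not has_excluded:
            return category
    return "unknown"


rules = [
    ("brand terms", ["paddyjet", "paddy jet", "paddy"], ["review", "compare"]),
    ("ticket terms", ["tickets"], []),
]
print(categorise("paddyjet tickets", rules))        # brand rule comes first
print(categorise("paddyjet tickets", rules[::-1]))  # ticket rule comes first
print(categorise("hotel dublin", rules))            # no rule matches
```

swapping the two rules moves ‘paddyjet tickets’ from ‘brand terms’ to ‘ticket terms’, which is why the more important rules go at the top.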
at the end a file is generated with three columns: the original keyword, the keyword with the noise words removed, and the category based on the rules. the default file name is date-time stamped and looks like: YYYYMMDD_HHMMSS_categorisedKeywords.txt
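for illustration, the file name and one output row could be built like this. the tab as column separator is an assumption on my part for this sketch, and output_name and format_row are hypothetical helper names:

```python
from datetime import datetime


def output_name(now):
    """Build a name like YYYYMMDD_HHMMSS_categorisedKeywords.txt."""
    return now.strftime("%Y%m%d_%H%M%S") + "_categorisedKeywords.txt"


def format_row(original, cleaned, category):
    """Join the three columns with tabs (separator is an assumption)."""
    return "\t".join([original, cleaned, category])


print(output_name(datetime.now()))
print(format_row("cheap flights to dublin", "cheap flights dublin", "budget terms"))
```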
i hope this tool will be useful for people in digital marketing who currently struggle to categorise thousands of keywords manually. contact me if you have any questions, i’m happy to help.
it’s a very quick and dirty solution, so if you have problems, find bugs or would like more features, feel free to contact me.