If you use Google Analytics on a regular basis, you have probably run into the “matches RegExp” or “matches regex” options within advanced report filters or advanced segments. I know when I started learning Google Analytics, I looked up regex and was extremely overwhelmed by its complexity.
To be honest, I’m still a bit overwhelmed with what is possible using regex. But the beauty of using regex with GA is that you don’t need to be an IT geek or a developer to use it in your everyday work. I still consider myself a regex newbie, but I’ve mastered the very basics on my own through reading, creating and testing my own. And I use regex every single day when building new filters, advanced segments, goals, in-line report filters, and so on.
Ohhh, Regular Expression..
So what exactly is regex and what can it be used for within Google Analytics? Glad you asked. Why don’t you have a seat and I’ll explain (aka link to a better explanation). Regular expressions have been around since the 1950s, but I was first introduced to the concept a few years ago when I began my journey towards Google Analytics mastery. I’ve found regex to be extremely helpful because so many parts of GA accept it. My hope for this post is to provide people with non-technical backgrounds (the analyst, marketer types) with helpful regular expressions for Google Analytics.
Scenarios & Regular Expression Solutions
A very simple regex character that I tend to use most when building advanced segments or using inline report filters is the | (or “pipe”). This basically means “or” in regex. A great use-case is when creating a branded keyword advanced segment. My name is simple, Andy Gibson, so it’s fairly difficult to misspell it. So, for an advanced filter for my branded keyword, aka my name, I might build something simple like this:
(andy gibson|andygibson|andygibson google analytics|andy gibson google analytics)
Note: The parentheses ( ) work much as they do in algebra: they group everything inside them into a single unit.
You can use other regex characters to consolidate some of these branded keywords once you get more advanced, but that’s a bit more complicated.
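If you want to try the pipe outside of GA, here is a minimal Python sketch. It assumes GA treats “matches regex” as a partial match (the pattern can match anywhere in the keyword), which Python's re.search approximates. The keyword list is made up for illustration.

```python
import re

# The branded-keyword pattern from above; each | separates one alternative.
BRANDED = re.compile(
    r"(andy gibson|andygibson|andygibson google analytics|andy gibson google analytics)"
)

# Hypothetical keywords, invented for this example.
keywords = [
    "andy gibson",
    "andygibson google analytics",
    "google analytics regex",
    "who is andy gibson",
]

def is_branded(keyword: str) -> bool:
    """Return True if any alternative appears anywhere in the keyword."""
    return BRANDED.search(keyword) is not None

branded = [k for k in keywords if is_branded(k)]
print(branded)
```

Under that partial-match assumption, the two longer alternatives are already covered by the two shorter ones, since any keyword containing “andy gibson google analytics” also contains “andy gibson”.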
Another example would be when trying to use an inline filter to compare a few pages together.
Here are my top 10 pages for the last couple days:
If I want to look at only the Homepage, About Me page, and Contact Me page, I can build a quick regex using the pipe character (along with a few other ones), like so:
What this regex says is, match any page that starts (^) and ends with ($) “/” or “/contact-me/” OR “/about-me/”:
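The description above can be sketched in Python. The pattern here is my reconstruction from that description, so treat it as an assumption rather than the exact filter, and the page paths are invented for the example. The second pattern shows why the ^ and $ anchors matter.

```python
import re

# Reconstructed from the description: ^ and $ pin the match to the whole
# path, and | separates the three pages.
PAGES = re.compile(r"^(/|/contact-me/|/about-me/)$")

# Same alternatives without the anchors, for comparison.
UNANCHORED = re.compile(r"(/|/contact-me/|/about-me/)")

# Hypothetical page paths, invented for this example.
paths = ["/", "/about-me/", "/contact-me/", "/about-me/resume/", "/blog/"]

selected = [p for p in paths if PAGES.search(p)]
too_many = [p for p in paths if UNANCHORED.search(p)]

print(selected)
print(too_many)
```

Without the anchors, the lone “/” alternative matches every path, because every path contains a slash somewhere.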
Have you ever been trying to build a profile filter to include multiple different subdomains or directories of a site? Let me give you an example:
- This happens to me all the time on publishing clients’ sites. I want to build a profile for only a specific section of the site, let’s say the “Classifieds” section. But, the problem is, there are several different subdomains. Let me explain:
- The Classifieds homepage URL is www.example.com/classifieds. This page basically features a search bar to type in what you are looking for.
- Once you put in a search term and click Search, you are taken to classifieds.example.com/searchresults.aspx?some_additional_parameters
Hopefully you’ll notice how difficult it would be to create a profile that includes both of these parts of the site. There are two different subdomains in play here, www.example.com and classifieds.example.com. The Predefined Filters option in GA allows you to filter by hostname, but this won’t work for a few reasons:
- You could build a Predefined Filter to include only traffic to the hostname that contains classifieds.example.com, but that doesn’t account for www.example.com/classifieds (the hostname does not contain directories).
- You could build a Predefined Filter to include only traffic to the hostname that contains example.com, but this will pull in pretty much any URL on the site.
The way to do this in GA is to build a filter that includes URIs that match a regex. Here’s the regex I would use to include Classified pages with those two subdomains/directories:
It looks fairly complicated but it really isn’t:
^ = “starts with”
$ = “ends with”
.* = match any character, any number of times, including zero (“get anything”)
\ = make the following character (the character after the \ ) into plain text (“\.” means make the dot an actual dot)
| = “or”
So, translated, this regex says: “include all URIs that start with ‘classifieds.’ and can have anything after the .com, OR include all URIs that start with ‘www.’ and end with the ‘/classifieds’ part.”
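Here is a rough Python sketch of a pattern that fits that translation. It is a reconstruction built from the symbol list above, not a copy of the filter itself, and it assumes the field being matched holds the hostname followed by the path. The test URLs are invented.

```python
import re

# Reconstruction (assumption): assembled from the symbols explained above.
# Left of the |: starts with classifieds.example.com, then anything.
# Right of the |: starts with www.example.com and ends with /classifieds.
CLASSIFIEDS = re.compile(
    r"^classifieds\.example\.com.*|^www\.example\.com/classifieds$"
)

# Hypothetical hostname + path strings, invented for this example.
urls = [
    "classifieds.example.com/searchresults.aspx?q=bikes",
    "www.example.com/classifieds",
    "www.example.com/sports",
    "www.example.com/classifieds-archive",
]

included = [u for u in urls if CLASSIFIEDS.search(u)]
print(included)

# Why the backslash matters: an unescaped dot matches any character.
loose = re.search(r"^classifieds.example", "classifiedsXexample.com")
strict = re.search(r"^classifieds\.example", "classifiedsXexample.com")
print(loose is not None, strict is not None)
```

Note that “www.example.com/classifieds-archive” is left out because of the $ at the end, which is the kind of edge case worth checking against your own URL structure.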
And this is how you would create the filter:
Just be cognizant of your URL structure, because these can potentially pick up other URLs you aren’t intending to. That’s why you need to test these extensively with your site if you are using regex to build profile filters.
Remember: Filters work in order! If your first filter includes only traffic to the hostname www.example.com, you are already excluding every other subdomain. If I then add a second filter to include only traffic to the hostname classifieds.example.com, it won’t pick up that subdomain, because that traffic was already excluded by the first filter. If this does not make sense, let me know in the comments and I can explain better.
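A simplified way to picture filter order is as a chain, where each filter only sees what the previous one let through. The Python sketch below is a toy model under that assumption, not GA's actual implementation. The hits and the include_only helper are hypothetical.

```python
# Hypothetical hits, identified only by hostname for this example.
hits = ["www.example.com", "classifieds.example.com", "www.example.com"]

def include_only(hits, hostname):
    """Keep only hits whose hostname contains the given text."""
    return [h for h in hits if hostname in h]

# Filter 1 runs first and drops everything except www.example.com.
after_first = include_only(hits, "www.example.com")

# Filter 2 only sees what filter 1 kept, so it has nothing left to include.
after_second = include_only(after_first, "classifieds.example.com")

print(after_first)
print(after_second)
```

In this toy model the second filter ends up with no traffic at all, which is why a single regex filter covering both cases is the safer route.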
Other Helpful RegEx Resources
- gskinner regex testing tool (I use it almost every day to test regexes)
- LunaMetrics Guide to Regular Expression for Google Analytics (this is a GREAT resource)
What are your favorite regexes to use with Google Analytics? Would you like me to provide more examples of the regexes I use? Leave a comment, I’d love to hear from you!