If you use Google Analytics on a regular basis, you have probably run into the “matches RegExp” or “matches regex” options within advanced report filters or advanced segments. I know when I started learning Google Analytics, I looked up regex and was extremely overwhelmed by its complexity.

To be honest, I’m still a bit overwhelmed by what is possible using regex. But the beauty of using regex with GA is that you don’t need to be an IT geek or a developer to use it in your everyday work. I still consider myself a regex newbie, but I’ve mastered the very basics on my own through reading, and by creating and testing my own expressions. And I use regex every single day when building new filters, advanced segments, goals, in-line report filters, and so on.

Ohhh, Regular Expression..

So what exactly is regex and what can it be used for within Google Analytics? Glad you asked. Why don’t you have a seat and I’ll explain (aka link to a better explanation). Regular expressions have been around since the 1950s, but I was first introduced to the concept a few years ago when I began my journey towards Google Analytics mastery. I’ve found regex to be extremely helpful within Google Analytics, as many aspects of GA allow for the use of regex. My hope for this post is to provide people with non-technical backgrounds (the analyst and marketer types) with helpful regular expressions for Google Analytics.

Scenarios & Regular Expression Solutions

Beginner

A very simple regex character that I tend to use most when building advanced segments or using inline report filters is the | (or “pipe”). This basically means “or” in regex. A great use-case is when creating a branded keyword advanced segment. My name is simple, Andy Gibson, so it’s fairly difficult to misspell it. So, for an advanced filter for my branded keyword, aka my name, I might build something simple like this:

(andy gibson|andygibson|andygibson google analytics|andy gibson google analytics)

Note: The parentheses ( ) act the same way they do in algebra: they group everything inside them together.

You can use other regex characters to consolidate some of these branded keywords once you get more advanced, but that’s a bit more complicated.
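To give a rough idea of what that consolidation could look like, here is a minimal Python sketch. Python’s re module is only a stand-in for GA’s regex engine (an assumption on my part), the keyword list is made up for illustration, and the shorter pattern is one possible way to write it, not the only one.

```python
import re

# The pattern from above: four alternatives separated by pipes.
LONG = r"(andy gibson|andygibson|andygibson google analytics|andy gibson google analytics)"

# One possible consolidation (an illustrative assumption):
#   " ?"                    makes the space between the names optional
#   "( google analytics)?"  makes the trailing phrase optional
SHORT = r"andy ?gibson( google analytics)?"

# Made-up keywords to compare the two patterns against.
keywords = [
    "andy gibson",
    "andygibson",
    "andygibson google analytics",
    "andy gibson google analytics",
    "google analytics regex",
]

def matches(pattern, text):
    # fullmatch requires the whole keyword to fit the pattern.
    return re.fullmatch(pattern, text) is not None

long_hits = [k for k in keywords if matches(LONG, k)]
short_hits = [k for k in keywords if matches(SHORT, k)]

# Both patterns should accept the same keywords.
print(long_hits == short_hits)
```

The ? character means “the thing before me is optional”, which is what lets one short pattern cover all four spellings.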

Intermediate

Another example is using an inline filter to compare a few pages side by side.

Here are my top 10 pages for the last couple of days:

My Top 10 Pages in GA

If I want to look at only the Homepage, About Me page, and Contact Me page, I can build a quick regex using the pipe character (along with a few other ones), like so:

^(/|/contact-me/|/about-me/)$

What this regex says is: match any page whose path is exactly “/”, “/contact-me/”, or “/about-me/”. The ^ means “starts with” and the $ means “ends with”, so nothing is allowed before or after the matched text:

Inline report filter regex
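If you want to see what the anchors are doing, here is a small Python sketch. Python’s re module stands in for GA’s regex engine (an assumption), and the page paths are hypothetical, not my real pages.

```python
import re

# The inline filter regex from above.
PAGES = re.compile(r"^(/|/contact-me/|/about-me/)$")

# The same alternatives without the ^ and $ anchors, for comparison.
LOOSE = re.compile(r"(/|/contact-me/|/about-me/)")

# Hypothetical page paths, made up for illustration.
paths = [
    "/",
    "/contact-me/",
    "/about-me/",
    "/about-me/team/",
    "/blog/contact-me/",
    "/blog/",
]

# With anchors: only the three exact paths are kept.
kept = [p for p in paths if PAGES.search(p)]

# Without anchors: every path contains "/", so everything is kept.
loose_kept = [p for p in paths if LOOSE.search(p)]

print(kept)
print(len(loose_kept), "of", len(paths), "paths match without anchors")
```

Without ^ and $, the lone “/” alternative matches every page on the site, which is why the anchors matter here.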

 

Advanced

Have you ever tried to build a profile filter to include multiple different subdomains or directories of a site? Let me give you an example:

  1. This happens to me all the time on publishing clients’ sites. I want to build a profile for only a specific section of the site, let’s say the “Classifieds” section. But, the problem is, there are several different subdomains. Let me explain:
    • The Classifieds homepage URL is www.example.com/classifieds. This page basically features a search bar to type in what you are looking for.
    • Once you put in a search term and click Search, you are taken to classifieds.example.com/searchresults.aspx?some_additional_parameters

Hopefully you’ll notice how difficult it would be to create a profile that includes both of these parts of the site. There are two different subdomains in play here, www.example.com and classifieds.example.com. The Predefined Filters option in GA allows you to filter by hostname, but this won’t work for a few reasons:

  1. You could build a Predefined Filter to include only traffic to the hostname that contains classifieds.example.com, but that doesn’t account for www.example.com/classifieds (the hostname does not contain directories).
  2. You could build a Predefined Filter to include only traffic to the hostname that contains example.com, but this will pull in pretty much any URL on the site.

The way to do this in GA is to build a filter that includes URIs that match a regex. Here’s the regex I would use to include Classifieds pages on those two subdomains/directories:

^(classifieds\.example\.com.*|www\.example\.com\/classifieds$)

It looks fairly complicated but it really isn’t:

^ = “starts with”

$ = “ends with”

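.* = match any run of characters, including none at all (“get anything”). The dot means “any single character” and the * means “zero or more of the previous item”.

\ = treat the following character (the character after the \ ) as plain text (“\.” means make the dot an actual dot, and “\/” means an actual slash)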

|  = “or”

So, this regex translated is: “include all URIs that start with ‘classifieds.example.com’ and can have anything after the .com, OR include the URI that starts with ‘www.example.com’ and ends right after the ‘/classifieds’ part.”
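Here is a minimal Python sketch for checking that translation. It assumes the text being filtered is the hostname followed by the page path, and it uses Python’s re module as a stand-in for GA’s regex engine. The sample URLs (including the q=bikes parameter) are hypothetical.

```python
import re

# The Classifieds regex from above.
CLASSIFIEDS = re.compile(
    r"^(classifieds\.example\.com.*|www\.example\.com\/classifieds$)"
)

# Hypothetical URLs mapped to whether we expect the filter to include them.
samples = {
    "classifieds.example.com/searchresults.aspx?q=bikes": True,
    "www.example.com/classifieds": True,
    # The "$" only allows the URI to end right after "/classifieds",
    # so a trailing slash is NOT included.
    "www.example.com/classifieds/": False,
    "www.example.com/sports": False,
    "blog.example.com/classifieds": False,
}

# Run every sample through the regex and record whether it matched.
results = {url: CLASSIFIEDS.search(url) is not None for url in samples}

for url, included in results.items():
    print("include" if included else "exclude", url)
```

Notice the trailing-slash case: if your Classifieds homepage can also load as /classifieds/, this regex as written would leave it out.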

And this is how you would create the filter:

Regex filter to capture only Classifieds section traffic

Just be cognizant of your URL structure, because a regex like this can pick up URLs you didn’t intend to include. That’s why you need to test these extensively with your site if you are using regex to build profile filters.

Remember: Filters work in order! So if your first filter includes only traffic to the hostname www.example.com, you are already excluding every subdomain other than www.example.com. If a second filter then includes only traffic to the hostname classifieds.example.com, it won’t pick up that subdomain, because that traffic was already excluded by the first filter. If this does not make sense, let me know in the comments and I can explain better.
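The ordering problem can be sketched in a few lines of Python. This is only a toy model of how include filters chain together, not GA’s actual implementation; the hits and the include_hostname helper are made up for illustration.

```python
# Three hypothetical hits, each with a hostname and a page path.
hits = [
    {"hostname": "www.example.com", "path": "/classifieds"},
    {"hostname": "classifieds.example.com", "path": "/searchresults.aspx"},
    {"hostname": "www.example.com", "path": "/sports"},
]

def include_hostname(hits, hostname):
    # Toy "include" filter: keep only hits for the given hostname.
    return [h for h in hits if h["hostname"] == hostname]

# Filter 1 runs first and drops everything that is not www.example.com.
after_first = include_hostname(hits, "www.example.com")

# Filter 2 only sees what filter 1 let through, so there is nothing
# left for classifieds.example.com to match.
after_second = include_hostname(after_first, "classifieds.example.com")

print(len(after_first), len(after_second))
```

Two include filters in a row behave like “AND”, not “OR”, which is why a single regex filter covering both cases is the better fit here.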

Other Helpful RegEx Resources

What are your favorite regexes to use with Google Analytics? Would you like me to provide more examples of regexes I use? Leave a comment, I’d love to hear from you!

 

Comments
Posted at 3:37 am November 26, 2014
Nick Soper

Oh my gosh. That link to the gskinner regex testing tool is the most useful thing I’ve found in months! Thank you thank you, thank you!

    Posted at 8:45 am November 26, 2014
    Andy Gibson

    Glad you found it useful, Nick!

Posted at 6:01 pm February 10, 2015
Austin D. Trombley

How do you filter on multiple contains? Neither of these works (neither returns any values), but they do return values if I run them separately:

ga:pagePath=@GLDJE, ga:pagePath=@MSHFK2
ga:pagePath=@ (GLDJE|MSHFK2)

    Posted at 7:08 pm February 17, 2015
    Andy Gibson

    Hey Austin,

    To be honest, I haven’t used GA’s Reporting API much. I could show you in regular expression, but I’m not entirely sure with the API.

Posted at 3:18 am April 16, 2015
johnwise74

Hey Andy,

This is a great post. But the filter you have mentioned is not working for me. Could you please help me with this?

    Posted at 7:44 pm April 21, 2015
    Andy Gibson

    Hey John,

    I responded to your email!

Posted at 11:01 am September 10, 2015
Kevin

Great article!

I am a little lost though. I am creating a custom report so that a user can look at data from 3 years ago. The problem with that data is that there are a ton of self referrals from various sub-domains and multiple .ca, .com, .org, domains, which are all the same.

What is the regex to filter out all subdomains and all domain extensions?

Thanks so much, in advance. 🙂

    Posted at 12:09 pm September 10, 2015
    Andy Gibson

    Hey Kevin, thanks for the kind words!

    Are you looking to exclude any referral traffic from your domain? You can use an exclude filter in the Custom Report to match any subdomain or top-level domain on your site. Ex: .*\.domain\..* would match any subdomain or top-level domain for “domain”, like new.domain.com, new.domain.ca, etc.

    If you’re looking to only exclude a few subdomains or top-level domains, use a pipe | which means “or”:

    Exclude new.domain.com, new.domain.ca, login.domain.com, login.domain.org
    ((new|login)\.domain\.(ca|org|com))
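For anyone who wants to try those two patterns before putting them in a report, here is a minimal Python sketch. Python’s re module is a stand-in for GA’s regex engine (an assumption), and the hostnames are made up for illustration.

```python
import re

# Matches any subdomain and any top-level domain for "domain".
ANY_SUB = re.compile(r".*\.domain\..*")

# Matches only the listed subdomains and top-level domains.
SOME = re.compile(r"((new|login)\.domain\.(ca|org|com))")

# Hypothetical referral hostnames.
hosts = [
    "new.domain.com",
    "new.domain.ca",
    "login.domain.org",
    "shop.domain.com",
    "domain.com",        # no subdomain, so no dot before "domain"
    "otherdomain.com",   # a different site that merely ends in "domain"
]

any_hits = [h for h in hosts if ANY_SUB.search(h)]
some_hits = [h for h in hosts if SOME.search(h)]

print(any_hits)
print(some_hits)
```

Two things to watch for: the first pattern needs a dot before “domain”, so the bare domain.com is not matched, and the second pattern matches every combination of the two groups (for example new.domain.org), not just the four hostnames listed.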

Posted at 5:30 pm October 2, 2015
Larry Carillo

Hi Andy,
Thank you so much for writing this post. I know it’s an older one but it was very helpful in being able to compare a grouping of pages together i.e. ^(/|/contact-me/|/about-me/)$

Thank you again!

Larry


    Posted at 9:12 am October 12, 2015
    Andy Gibson

    Of course, you’re welcome!
