Our pricing is partially based on the number of monthly visits to your site, so we’d better have an accurate definition of “visits.” This is an interesting question in its own right, because visits are one of the primary web analytics metrics. But a visit is harder to define than it seems.
There are two fundamental questions:
- How should a “visit” be defined?
- How do you measure “visits” in practice?
Defining a “visit”
Let’s just write down some events that we think should and shouldn’t be a visit:
- When a human being first arrives on the site and loads the page, staying there for 31 seconds, that’s a visit.
- If that same human then clicks a link and sees another page, that’s not a new visit; that’s part of the same visit.
- If that same human doesn’t have cookies or JavaScript enabled, all of that should still count as one visit.
- If that same human loads the site in a different browser, that’s still not a new visit; that’s part of the same visit.
- If that same human bookmarks the site, then 11 days later comes back to the site, that is a new visit.
- When a robot loads your site, our servers render it just like they would for a human visitor. However, that visit may not represent any value to you, which is why we do not count bot traffic toward your total visits.
- There are additional cases where the “right thing to do” is less clear. For example, take a “quick bounce”: a human clicks a link to the site, then clicks “back” before the page has a chance to load. Does that count as a visit? Our servers still had to render and attempt to return the page, so in that sense, “yes.” But no human saw the site, and Google Analytics won’t see that hit, so in that sense, “no.” Because we need the notion of a “visit” to correspond to the amount of computing resources required to serve traffic, we err on the side of “yes.”
So rather than attempting to write down an exact definition of a “visit,” we’ll just say that whatever it is, it has to be consistent with all the cases listed above.
Exceptions
Exception: We do NOT count “image visits” towards traffic charges.
There’s a special kind of “visit” as defined above which we do NOT count towards your account. This is a visit which hits only static content (usually an image), but doesn’t hit a normal page on your site.
This is common when a site doesn’t use a CDN, when its images are hot-linked from other sites, during Twitter campaigns, and when images are embedded in email campaigns. While this does represent real traffic to your site, and real cost on our side to serve it, we also appreciate that sometimes this is out of your control, and that it’s less expensive for us to serve static content than it is to serve dynamic content.
If you get a lot of this sort of traffic, we’ll reach out to you to understand what’s happening, and see if we can work together to create a solution that doesn’t involve so much traffic, such as enabling our CDN, getting you signed up for a service like CloudFlare, moving content to a content service like S3, and so forth. But we won’t charge you extra.
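The rule above can be sketched as follows, assuming a visit is identified by an IP address and a day, as in the measurement rule later in this article. The list of “static” file extensions and the names `Request`, `is_static`, and `static_only_visits` are hypothetical illustrations, not WP Engine’s actual implementation.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple

# Hypothetical list of extensions treated as static content (assumption, not WP Engine's real list).
STATIC_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".svg", ".webp", ".css", ".js", ".ico")


class Request(NamedTuple):
    day: str   # e.g. "2024-05-01"
    ip: str
    path: str  # request path without the query string


def is_static(path: str) -> bool:
    """True if the path looks like a static asset, judged by file extension."""
    return path.lower().endswith(STATIC_EXTENSIONS)


def static_only_visits(requests: Iterable[Request]) -> set[tuple[str, str]]:
    """Return the (day, ip) visits in which every request was for a static asset."""
    saw_page: dict[tuple[str, str], bool] = defaultdict(bool)
    for r in requests:
        key = (r.day, r.ip)
        # Once a visit has hit a normal page, it stays marked as a page visit.
        saw_page[key] = saw_page[key] or not is_static(r.path)
    return {key for key, page in saw_page.items() if not page}
```

An IP that only fetched `/logo.png` on a given day lands in the returned set and would not be billed, while an IP that loaded a page and then the logo would not.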
Exception: We do NOT count visits from well-known “bot” User Agents.
When a request in the Nginx access logs carries a known bot User Agent, we do not count it as a billable visit. This exclusion applies only to requests whose User Agent identifies them as a known bot; a bot that presents an ordinary browser User Agent is counted like any other visitor.
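A known-bot check of this kind might look like the sketch below. The marker substrings and the function name `is_known_bot` are illustrative assumptions; the actual list of known bot User Agents is not given in this article.

```python
# Hypothetical sample of bot User Agent markers (assumption; the real list is not published here).
KNOWN_BOT_MARKERS = ("googlebot", "bingbot", "yandexbot", "baiduspider", "duckduckbot")


def is_known_bot(user_agent: str) -> bool:
    """Case-insensitive substring match against the known-bot markers."""
    ua = user_agent.lower()
    return any(marker in ua for marker in KNOWN_BOT_MARKERS)
```

Note the limitation this makes visible: a bot that sends a normal browser User Agent string passes the check and is counted, which is the spoofing case discussed under Google Analytics vs WP Engine visitor counts.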
Measuring a “visit”
This is where things can get tricky.
It’s tempting to say “Whatever Google Analytics says is the ‘number of visitors’ in a month, that’s the number of visits in a month.” But it’s clear that this metric does not satisfy the definition above. GA tries very hard to show only real, human traffic coming to your site, so it doesn’t measure bot traffic or “quick bounces.” Google Analytics would also double-count a human using two browsers and, sometimes, someone who has cookies disabled in their browser.
We also need something clear and simple, so that it’s trivial to compute and easy to investigate when it doesn’t behave as we expect.
So we’ve settled on this metric:
We take the number of unique IP addresses seen in a 24-hour period as the number of “visits” to the site during that period (excluding requests from well-known “bot” User Agents and requests for static content). The number of “visits” in a given month is the sum of those daily visits during that month.
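Here is a minimal sketch of that metric, assuming log entries have already been parsed into day, IP, User Agent, and path fields. The bot markers, static extensions, and the names `LogEntry` and `monthly_visits` are hypothetical stand-ins, not WP Engine’s real code. Dropping static requests before counting means an IP that fetched only static files on a given day never enters that day’s set, which matches the image-visit exception.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class LogEntry(NamedTuple):
    day: str         # "YYYY-MM-DD", the 24-hour bucket the request falls in
    ip: str
    user_agent: str
    path: str


# Hypothetical stand-ins for the real bot list and static-content rules.
BOT_MARKERS = ("googlebot", "bingbot")
STATIC_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".css", ".js", ".ico")


def monthly_visits(entries: Iterable[LogEntry], month: str) -> int:
    """Sum, over each day of `month` ("YYYY-MM"), the number of unique IPs seen that day."""
    daily_ips: dict[str, set[str]] = defaultdict(set)
    for e in entries:
        if not e.day.startswith(month + "-"):
            continue  # outside the billing month
        if any(m in e.user_agent.lower() for m in BOT_MARKERS):
            continue  # well-known bot User Agent
        if e.path.lower().endswith(STATIC_EXTENSIONS):
            continue  # static request
        daily_ips[e.day].add(e.ip)  # a set, so repeat hits from one IP count once per day
    return sum(len(ips) for ips in daily_ips.values())
```

Because the set of IPs is rebuilt for every day, the same address returning on a later day is counted again, while repeat hits within one day are not.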
Does this satisfy the conditions above?
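- First arrival on the site: Yes, because it comes from an IP address we haven’t seen yet that day.
- Clicking through to another page: Yes, because it’s the same IP address, so it won’t be counted again.
- No cookies or JavaScript: Yes, because we’re not using cookies, JavaScript, or any other feature that depends on the browser being used.
- A different browser: Yes, because the count is tied to the network, not the browser.
- Returning 11 days later: Yes, because we reset our notion of “unique IP address” every day.
- A “quick bounce”: Yes, because we’ll still see the hit in our logs.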
This does mean there are some cases where you could theoretically argue we’re counting visitors too often. For example, a person visits a site from work, then drives home and visits the site again later that day. That will count as two visits because the IP addresses will be different. But we’d argue that (a) this doesn’t happen much, (b) it’s not terribly unreasonable for it to count as two visits, and (c) those events are counterbalanced by times where we count only one visit where there are really two.
As an example of that last point, what if two people in the same office visit a site from two computers? That should be two visits; even Google Analytics would count it as two. But we count it as only one, because their IP addresses (from our perspective) are the same. So the cases where we count too few are counterbalanced, to a first approximation anyway, by those where we count too many, and therefore we think this is still a fair metric.
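A small worked example of both effects, using made-up people and IP addresses from the documentation-only ranges. The helper `daily_visits` is hypothetical and simply counts unique IPs for one day. In this contrived case the two errors cancel exactly; in real traffic they only offset each other roughly.

```python
def daily_visits(hits: list[tuple[str, str]]) -> int:
    """Count unique IPs for one day; each hit is a (person, ip) pair."""
    return len({ip for _person, ip in hits})


# Over-count: one person on two networks (office, then home) on the same day.
commuter = [("alice", "203.0.113.10"), ("alice", "198.51.100.7")]
# Under-count: two people behind the same office IP address.
office = [("bob", "192.0.2.44"), ("carol", "192.0.2.44")]

print(daily_visits(commuter))           # 2 visits for 1 person
print(daily_visits(office))             # 1 visit for 2 people
print(daily_visits(commuter + office))  # 3 visits for 3 people
```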
Google Analytics vs WP Engine visitor counts
It can be difficult to truly judge what visits are coming into your site if you’re comparing visitor counts from different sources. It’s important to keep in mind that no matter where you look, each visitor count is filtered and sorted based on a certain set of goals and principles. In the case of Google Analytics, they specifically try to show only actual “human” traffic to your site, to better gauge how your site is performing among real people and to provide marketing analytics. To do this, they filter out various kinds of traffic, including bot traffic.
In addition, some bots spoof a legitimate browser User Agent, so their requests look like legitimate visits in the Nginx access logs and get counted as billable visits. This can also contribute to the discrepancy between our count and Google Analytics’ count. For Google Analytics to count a visit or a page view, a JavaScript file must be executed on the user’s end. If the “user” is actually a bot spoofing a legitimate User Agent, then, given the way many bots hit a site, the JavaScript file that GA needs will never be executed.
This means that while WP Engine sees what looks like a legitimate hit in our Nginx access logs, Google Analytics is not able to track it.
If you’re seeing large discrepancies between Google Analytics and WP Engine’s visitor counts, it may be beneficial to use a service like CloudFlare. The benefit of CloudFlare in these cases is that, because your DNS routes traffic through CloudFlare first, it can filter out “spammy” requests based on origin IP, User Agent, etc. (CloudFlare maintains an extensive database of what should and should not be trusted), so those requests never reach the server. Most customers who have done this in the past have reported that the visits we track line up much more closely with what Google Analytics reports.